Select First Record only

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Select First Record only

Post by srinagesh »

Hi !

My Source File is as follows

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
A          12-march-06                    1200
B          15-march-06                    1500
A          17-march-06                    2200
C          18-march-06                    1600
C          19-march-06                    1700
I need to take first records only (based on Site)

ie; My output should be

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
A          12-march-06                    1200
B          15-march-06                    1500
C          18-march-06                    1600
If I use a hash file, with key as Site.. I am getting the Last record (as shown below)

Code: Select all

Site    ReplenishmentDate       ReplenishmentQty
--------------------------------------------------------
B          15-march-06                    1500
A          17-march-06                    2200
C          19-march-06                    1700
Please suggest me a way of acheiving this.

PS: Storing the data into a temporary database table with PrimaryKey as site and logic as "truncate and insert" is not an option. The requirement is such that we can use database at this stage.

Regards
Nagesh
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
There are several way of acheiving this. Pls do search the forum.
You are grouping based on the Site and getting the max of rest of the columns.
One of the way would be to add another column to flag out the change in group.
Set 2 stage variable in transformer like
Change = If Previous_col = Input_link.Site Then 0 Else 1
Previous_col = Input_link.Site
Assign Change to a column say Flag.
Pass the sorted data to the transformer and select only the records for which Flag =1.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Change your date to ISO format YYYY-MM-DD, pass thru a Sort stage and sort descending on site, date, then write to hash file. Last row into hash file with just site as key will be the oldest date for that site.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi »

If you are reading from database then do the order by in database SQL and then use a stage variable to hold the last record and compare with the current record. If it is same then do not send to output.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear Indeed, of all wealth that wealth is the crown." - Thirukural By Thiruvalluvar
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

There is a FAQ How do I remove duplicate rows from a data source? This will give you a few options.
chowdary
Participant
Posts: 38
Joined: Thu Jun 23, 2005 11:25 am

Re: Select First Record only

Post by chowdary »

Nagesh

First Sort the data on that particular column using sort stage and then in the transformer you can also use Transform function RowProcCompareWithPreviousValue to ignore duplicate values.
The routine will check the supplied value with the previous value.
If its same, then 1 is returned else 0 is returned. you can use this routing in the Stage variable.

I think this helps

Chowdary
[/b]
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Hi !

I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.

ie., If the file is

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
A       2006-02-16        1500
A       2006-02-18        1200
Then the output should be

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
For this reason I cant use any sort or any other feature.

Please suggest me an alternative
Poornagirija
Participant
Posts: 65
Joined: Fri Nov 19, 2004 12:00 am

Post by Poornagirija »

hi srinagesh, :D
Use aggregator stage along with Xfm.
Make Site(field) as group
for the rest of columns make derivation as FIRST.

Hope that will give your answer :?
With Regards,
Poornagirija.

"Don't limit your challenges - challenge your limits"
loveojha2
Participant
Posts: 362
Joined: Thu May 26, 2005 12:59 am

Post by loveojha2 »

You can achieve it through Stage Variables and Constraints

Create Two Stage Variables

Code: Select all

Prev_Key: Derivation Curr_Key (Initiallize with a value say '-1')
Curr_Key: Derivation Src.Site

Constraint for the Output Link

Code: Select all

Prev_Key<>Curr_Key
Hope this helps.
Success consists of getting up just one more time than you fall.
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Thanx Guys,

It works.


-Nagesh
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Belatedly throwing in an approach using UNIX ...

Code: Select all

sort -s -k1,1 yourfile |
  awk '
    BEGIN \
    {
      prev_key = "";
    }
    ( $1 != prev_key ) \
    {
      print $0;
    }
    {
      prev_key = $1;
    }
    ' > firstrowforkey
Caveats:
1) requires your sort command to have the "-s" option
2) requires my off-the-cuff awk script to be correct.

David
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

And if your flavour of unix does not have "-s" for stable sorting, you may be able to use "sort -k1,1 -u" instead. According to the man pages I have seen, this will suppress duplicates based solely on the key field. Unfortunately, the man page for sort gives no indication which of the lines will be output (e.g. always the first, always the last, always random).

Unscientific testing on a far too small a sample set yielded a behaviour of the first instance being output - your actual mileage may vary! Does anyone know definitively?

David.
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

srinagesh wrote:Hi !

I think my example has misled you a little.
I want to be able to select the First Rows only for a given Site.. Irrespective of the Date.

ie., If the file is

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
A       2006-02-16        1500
A       2006-02-18        1200
Then the output should be

Code: Select all

Site    Date                  Qty
------------------------------------
A       2006-02-17        1300
For this reason I cant use any sort or any other feature.

Please suggest me an alternative
Use Aggregator. Group by Site. For Date and Qty, use the Derivation 'First'.

gateleys
Post Reply