Lookup from delimited sequential file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Lookup from delimited sequential file

Post by narasimha »

What are the different appraoches that can be implemented to read a sequential file of the format,

Code: Select all

123,1020,1578
2344,5423,4353,2311
412
3213,4231
look up the values in a hashed file and replace with corresponding keys
(I already have a hashed file for this)
  • 123 =>1
    1020 => 24
    1578 => 32
    2344 => 52
    5423 => 82
    4353 => 75
    2311 => 48
    412 =>12
    3213 => 60
    4231 => 71
and insert into another table in the format shown below

Code: Select all

Group 	Detail 	Key
1			1			1
1			24			2
1			32			3
2			52			4
2			82			5
2			75			6
2			48			7
3			12			8
4			60			9
4			71			10
Some points-
Each number corresponds to a key in the detail column.
The numbers in the Group column unique set of numbers from the sequential file .
Dont worry about the key generation part of it.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »


What are the different appraoches that can be implemented to read a sequential file of the format,

Code:

123,1020,1578
2344,5423,4353,2311
412
3213,4231
Hi Narasimha,

I have been working on your issue. And considering the example given by you, I have simulated a job design to read a variable_width file.

Since your example contains a maximum of 4 columns, this sequential file has four columns (c1, c2, c3, c4) in its metadata. By declaring the 'Incomplete Column' option as 'Replace' for all the 4 columns, it was possible to read the variable length file.

For further ease, You can download this pdf for the sequential file settings.

Download

Let me know if it worked out for you,

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Naveen Thanks for the effort.

Before we get too far, the number of coulmns is not restricted to a maximum of 4, it could be more.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

Yes! I understand that. You should have the

Code: Select all


no. of columns in sequential file stage = the no. of columns in the longest record

I hope that you will be knowing the maximum no. of columns that a record can have.

Let me know, :)

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Unfortunately it is not known as of now :(
This group of numbers is formed dynamically.
I have to assume some number - 20. Does that help
But I would prefer something dynamic
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

It depends on your file. If you think that a record in the source file can never have more than 20 columns :wink: . Then, you should be good to go. :arrow:

Do let me know if you were able to finish your process succesfully.

Thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

I was going through the Plug-in Documentation of the Row Splitter Stage and found that it has a capability of generating the desired splitting up of multi-value columns into seperate rows. I could not try this.. (No access to DataStage). but the way to go would be

Code: Select all

Src---------->Transformer------->RowSplitter------->
In the Transformer, replace all commas to "char(10)" which is a newline character.

In the row Splitter general Page, Enable
Multiple Lines. This determines whether Row Splitter deconcatenates the input string into separate output rows, or whether it outputs each input string as a separate output row. Select Multiple Lines to have the rows deconcatenated. By default it is not selected.

And set Line termination to Unix.

Finally this is an untested theory

Let me know if it worked..
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

An alternative approach may be to preprocess the file using UNIX commands, deriving a commonly formatted record per line in the file. The target would be to have each row in the file with two columns; one is the "group by" and the other is the value. It will then not matter how many columns you have in the file.

Assuming that the field separator is a comma, an off-the-cuff solution would be something like:

Code: Select all

awk -F, '
    {
    gsub (/,/ , ( "\n" NR "," ) );
    print NR "," $0;
    }
  ' yourfile > newfile
This, when fed your sample data ...

Code: Select all

cat |
  awk -F, '
  {
    gsub (/,/ , ( "\n" NR "," ) );
    print NR "," $0;
  }' << EOD
123,1020,1578
2344,5423,4353,2311
412
3213,4231
EOD
... yields ...

Code: Select all

1,123
1,1020
1,1578
2,2344
2,5423
2,4353
2,2311
3,412
4,3213
4,4231
... which I'm presuming is straight forward enough for you to deal with.

If you want to understand the awk script, the starting point is the UNIX command "man awk" (unless ray requests I expand further! :D ).

Hope this helps.

David
Last edited by djm on Thu Mar 23, 2006 7:06 pm, edited 1 time in total.
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Search the forum to find a fast method for performing the pivot on the comma-delimited field using an intermediate sequential file. Follow this with a Transformer stage to do the lookup. Voila!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

David, your script works like magic :)
But i need to lookup with the tables to get the keys and then populate the group id's.
Vaidya, your untested theory works, You get nice single coulmn of data from a row of delimited values.
But i guess my need is something like described below
Ray your approach looks the best, but need to do a reverse vertical pivot functionality. i.e look into single rows and form multiple rows.
I guess i need to build a routine to implement this logic, unless otherwise you have a better approach.

Detail Description of Job
Based on this table
Dim_Metro_Grp
(I get my Metro_Grp_keys from here

Code: Select all

Metro_Grp_Dsply	Metro_Grp_key	Metro_Grp_Num_Metro
-------------------------------------------------------------------------------------------------
1030,126,432			1				3  
327,529					2				2
135						3				1
158,132,126			4				3
126,158,132,130		5				4

I have to populate this table
Metro_Grp_Dtl
(combination of the Metro_Grp_keys and Metro_Key

Code: Select all

Metro_Grp_Dtl_Key	Metro_Grp_key	Metro_Key
-------------------------------------------------------------------------------------------------
1					1				1
2					1				2
3					1				3
4					2				4
5					2				5
6					3				7
7					4				6
By looking into the Metro Keys from

Dim_Metro

Code: Select all

Metro_id	Metro_Nm	Metro_Key
-------------------------------------------------------------------------------------------------
1030			abc			1
126			asd			2
432			dfg			3
327			hjd			4
592			moi			5
158			oih			6
135			rfg			7

The end result populate the Metro_Grp_Dtl Table
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
Post Reply