Lookup from delimited sequential file

narasimha · Post by **narasimha** » Mon Mar 06, 2006 7:14 pm

What are the different appraoches that can be implemented to read a sequential file of the format,

123,1020,1578
2344,5423,4353,2311
412
3213,4231

look up the values in a hashed file and replace with corresponding keys
(I already have a hashed file for this)

123 =>1
1020 => 24
1578 => 32
2344 => 52
5423 => 82
4353 => 75
2311 => 48
412 =>12
3213 => 60
4231 => 71

and insert into another table in the format shown below

Code: Select all

Group 	Detail 	Key
1			1			1
1			24			2
1			32			3
2			52			4
2			82			5
2			75			6
2			48			7
3			12			8
4			60			9
4			71			10

Some points-
Each number corresponds to a key in the detail column.
The numbers in the Group column unique set of numbers from the sequential file .
Dont worry about the key generation part of it.

I_Server_Whale · Post by **I_Server_Whale** » Mon Mar 06, 2006 10:03 pm

What are the different appraoches that can be implemented to read a sequential file of the format,

Code:

123,1020,1578
2344,5423,4353,2311
412
3213,4231

Hi Narasimha,

I have been working on your issue. And considering the example given by you, I have simulated a job design to read a variable_width file.

Since your example contains a maximum of 4 columns, this sequential file has four columns (c1, c2, c3, c4) in its metadata. By declaring the 'Incomplete Column' option as 'Replace' for all the 4 columns, it was possible to read the variable length file.

For further ease, You can download this pdf for the sequential file settings.

Download

Let me know if it worked out for you,

Thanks,
Naveen.

narasimha · Post by **narasimha** » Mon Mar 06, 2006 10:27 pm

Naveen Thanks for the effort.

Before we get too far, the number of coulmns is not restricted to a maximum of 4, it could be more.

I_Server_Whale · Post by **I_Server_Whale** » Mon Mar 06, 2006 10:35 pm

Yes! I understand that. You should have the

Code: Select all


no. of columns in sequential file stage = the no. of columns in the longest record

I hope that you will be knowing the maximum no. of columns that a record can have.

Let me know,

Thanks,
Naveen.

narasimha · Post by **narasimha** » Mon Mar 06, 2006 10:53 pm

Unfortunately it is not known as of now

This group of numbers is formed dynamically.
I have to assume some number - 20. Does that help
But I would prefer something dynamic

I_Server_Whale · Post by **I_Server_Whale** » Mon Mar 06, 2006 10:59 pm

It depends on your file. If you think that a record in the source file can never have more than 20 columns

. Then, you should be good to go.

Do let me know if you were able to finish your process succesfully.

Thanks,
Naveen.

ameyvaidya · Post by **ameyvaidya** » Mon Mar 06, 2006 11:49 pm

I was going through the Plug-in Documentation of the Row Splitter Stage and found that it has a capability of generating the desired splitting up of multi-value columns into seperate rows. I could not try this.. (No access to DataStage). but the way to go would be

Code: Select all

Src---------->Transformer------->RowSplitter------->

In the Transformer, replace all commas to "char(10)" which is a newline character.

In the row Splitter general Page, Enable
Multiple Lines. This determines whether Row Splitter deconcatenates the input string into separate output rows, or whether it outputs each input string as a separate output row. Select Multiple Lines to have the rows deconcatenated. By default it is not selected.

And set Line termination to Unix.

Finally this is an untested theory

Let me know if it worked..

djm · Post by **djm** » Mon Mar 06, 2006 11:49 pm

An alternative approach may be to preprocess the file using UNIX commands, deriving a commonly formatted record per line in the file. The target would be to have each row in the file with two columns; one is the "group by" and the other is the value. It will then not matter how many columns you have in the file.

Assuming that the field separator is a comma, an off-the-cuff solution would be something like:

Code: Select all

awk -F, '
    {
    gsub (/,/ , ( "\n" NR "," ) );
    print NR "," $0;
    }
  ' yourfile > newfile

This, when fed your sample data ...

Code: Select all

cat |
  awk -F, '
  {
    gsub (/,/ , ( "\n" NR "," ) );
    print NR "," $0;
  }' << EOD
123,1020,1578
2344,5423,4353,2311
412
3213,4231
EOD

... yields ...

Code: Select all

... which I'm presuming is straight forward enough for you to deal with.

If you want to understand the awk script, the starting point is the UNIX command "man awk" (unless ray requests I expand further! :D ).

Hope this helps.

David

ray.wurlod · Post by **ray.wurlod** » Tue Mar 07, 2006 7:58 am

Search the forum to find a fast method for performing the pivot on the comma-delimited field using an intermediate sequential file. Follow this with a Transformer stage to do the lookup. Voila!

narasimha · Post by **narasimha** » Tue Mar 07, 2006 7:36 pm

David, your script works like magic

But i need to lookup with the tables to get the keys and then populate the group id's.
Vaidya, your untested theory works, You get nice single coulmn of data from a row of delimited values.
But i guess my need is something like described below
Ray your approach looks the best, but need to do a reverse vertical pivot functionality. i.e look into single rows and form multiple rows.
I guess i need to build a routine to implement this logic, unless otherwise you have a better approach.

Detail Description of Job
Based on this table
Dim_Metro_Grp
(I get my Metro_Grp_keys from here

Code: Select all

Metro_Grp_Dsply	Metro_Grp_key	Metro_Grp_Num_Metro
-------------------------------------------------------------------------------------------------
1030,126,432			1				3  
327,529					2				2
135						3				1
158,132,126			4				3
126,158,132,130		5				4

I have to populate this table
Metro_Grp_Dtl
(combination of the Metro_Grp_keys and Metro_Key

Code: Select all

Metro_Grp_Dtl_Key	Metro_Grp_key	Metro_Key
-------------------------------------------------------------------------------------------------
1					1				1
2					1				2
3					1				3
4					2				4
5					2				5
6					3				7
7					4				6

By looking into the Metro Keys from

Dim_Metro

Code: Select all

Metro_id	Metro_Nm	Metro_Key
-------------------------------------------------------------------------------------------------
1030			abc			1
126			asd			2
432			dfg			3
327			hjd			4
592			moi			5
158			oih			6
135			rfg			7

The end result populate the Metro_Grp_Dtl Table