Creating a Number of Files

Post questions here relating to DataStage Enterprise/PX Edition, for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Creating a Number of Files

Post by Cherukuri »

Hi,

I have a requirement to create a number of .txt target files based on the information below:

The source table is Ship_Que_T and it comes from an Oracle database.
The table has 10 fields, and one field, shipment_id (not a key field), has the data below:

The values in shipment_id: 10001,10001,10001,20002,20002,7111,7111,7111,7111,7111,7111,999,999

Now an individual .txt file has to be created for every unique shipment id.
Example: for 10001 --- ship_1.txt file
for 20002 --- ship_2.txt file
for 7111 --- ship_3.txt file
for 999 --- ship_43.txt file
etc.

The important thing is that every day the table will be updated with any number of new shipment_id values, and that many .txt files have to be created.

I think I have presented the question in an understandable way...
Could anyone please give a solution for this requirement?

Thanks and Regards,
Cheru
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Start by posting in the correct forum... which I've just moved you to.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There are a number of posts here on the subject of dynamically creating output files. However, they all involve using the actual file data in the filename rather than what looks like an incremented number.

Is there any reason you can't do that? Use the "shipment id" in the filename?

for 10001 --- ship_10001.txt file
for 20002 --- ship_20002.txt file
for 7111 ---- ship_7111.txt file
for 999 ------ ship_999.txt file

That would be the most straightforward approach. Otherwise you'll need to generate what is basically a surrogate key and assign it to any new ids you encounter, which in turn means you have to track the ids you have encountered so far.
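
Just to sketch what that id tracking might look like, here is a standalone C++ illustration with made-up names - not something DataStage gives you out of the box, and the map would have to be persisted between runs to keep the numbering stable:

Code:

#include <map>
#include <sstream>
#include <string>

// Returns a file name of the form ship_<n>.txt for a shipment id,
// assigning a new number the first time an id is seen.
std::string fileNameForShipment(const std::string &shipmentId,
                                std::map<std::string, int> &seen,
                                int &nextNumber)
{
    std::map<std::string, int>::iterator it = seen.find(shipmentId);
    if (it == seen.end())
    {
        // New id: hand out the next number and remember the assignment.
        it = seen.insert(std::make_pair(shipmentId, nextNumber++)).first;
    }
    std::ostringstream name;
    name << "ship_" << it->second << ".txt";
    return name.str();
}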
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I guess the answer to the above question could depend on what happens when you see the same shipment id later - do you create the file with the same number you used before or does it always get a new number? The latter would simplify things if all you need to do is separate the files by id and then always assign a new number to the file...
-craig

"You can never have too many knives" -- Logan Nine Fingers
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Re: Creating a Number of Files

Post by Cherukuri »

Thank you for your inputs,

The names of the files should be like ship_<shipment_id>. For example, if the values in shipment_id are: 10001,10001,10001,20002,20002,7111,7111,7111,7111,7111,7111,999,999

then the file names should be ship_10001, ship_20002, etc.

And as per our requirement, there is no chance of getting the same shipment id again...

Regards,
Cheru
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK... tell us, is the shipment id actually in the file as well or is it only used to determine the filename?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Re: Creating a Number of Files

Post by Cherukuri »

Thanks for the reply.

Yes, the shipment_id values are in the files.

Please note that shipment_id is not a key field, so the same value appears in multiple rows.
For example:

shipment_id,order1

100001,abc
100001,cvb
100001,catlogictscar
100001,catlogiticsMill
201111,x
201111,y
311011,1003x
311011,2003y
311011,5009z
.
.
.
etc.

The separate output files to be generated are:

ship_100001.txt

shipment_id,order1
100001,abc
100001,cvb
100001,catlogictscar
100001,catlogiticsMill
---------------------------------

ship_201111.txt

shipment_id,order1
201111,x
201111,y
--------------------------

ship_311011.txt

shipment_id,order1
311011,1003x
311011,2003y
311011,5009z

-----------------------------

Please note the source table will be updated with more shipment ids, so more files will need to be generated accordingly. Please help. Thanks.

Thanks and Regards,
Cheru
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Routine for creating dynamic files

Post by dinakaran_s »

Hi,

I have created a C++ routine that creates flat files dynamically and inserts each record into the appropriate file.

Logic:
Check for the file name; if the file exists, append the row, else create a new file.

This routine works only when the transformer is set to run in sequential mode.

attachString --> set of delimited column values
strPath --> directory path and file name

Code:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

// Appends one delimited record (attachString) to the file at strPath.
// If the file does not exist yet, it is created and the record becomes
// the first line; otherwise the record is appended on a new line.
int pxCreateFile(char *attachString, char *strPath)
{
    // Check whether the target file already exists.
    bool isExisted = true;
    ifstream fin(strPath);
    if (!fin)
    {
        isExisted = false;
    }
    fin.close();

    // Open (or create) the file in append mode.
    ofstream fout(strPath, ofstream::app);
    if (!fout)
    {
        cerr << "ERROR: cannot open output file " << strPath << endl;
        return -1;
    }

    // Write the record; prefix a newline only when appending to an
    // existing file so the first line is not left blank.
    if (!isExisted)
    {
        fout << attachString;
    }
    else
    {
        fout << endl << attachString;
    }
    fout.close();
    return 0;
}
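
A quick way to sanity-check the routine outside DataStage is a small standalone driver like the sketch below. The file names, path and test values are made up; compile the routine above to an object file and link the driver against it.

Code:

// Hypothetical standalone test driver for pxCreateFile.
// Build (assuming the routine above is compiled to Filename.o):
//   g++ testdriver.cpp Filename.o -o testdriver

int pxCreateFile(char *attachString, char *strPath);   // defined in the routine above

int main()
{
    char path[] = "/tmp/ship_100001.txt";

    // First call: the file does not exist yet, so it is created.
    pxCreateFile((char *)"100001,abc", path);

    // Second call: the file exists, so the row is appended on a new line.
    pxCreateFile((char *)"100001,cvb", path);

    return 0;
}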
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Re: Creating a Number of Files

Post by Cherukuri »

Hi Dinakaran,

Thank you very much for your solution.

Can you please explain, step by step, how to use this routine for my requirement?

Thanks and Regards,
Cheru
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Post by dinakaran_s »

Steps:

1. Use the above code and create a file "Filename.c".
2. Compile the file using the values available in APT_COMPILER and APT_COMPILEROPT, something like this:
g++ -c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small Filename.c
3. You will now see a new file, "Filename.o".

4. Now create a DataStage parallel routine and use the same parameters as defined in the routine; use char* for the data type.
5. Point the parallel routine at the file "Filename.o".

Once the above steps are completed, use the routine in the transformer and pass the parameters as defined.
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

A quick and dirty way of doing this is to use a server job with the Oracle table as source and a Folder stage as target. Create a filename column in the job stream, prefix it with "ship_", and concatenate the shipment_id values (10001, etc.) to it. Of course, some clients don't want to see server jobs, and since you are reading from Oracle (parallel select possible), you will not be able to take advantage of parallel processing with a server job!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or a Server Shared Container with the Folder stage in your Parallel jobs. And I sincerely doubt you really need a PX job just to dump Oracle data to files.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Post by dinakaran_s »

Hi,

The good thing about using a parallel job is performance, even with the transformer running in sequential mode.

I have tested a job with the above routine and it really gives good performance.

Source: Oracle
Volume: 150,000 records
Time taken: 2 min
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... and a Server job would probably be faster because there isn't all of the startup overhead that the parallel framework carries with it. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers