Creating a Number of Files

Post questions here relating to DataStage Enterprise/PX Edition, for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Creating a Number of Files

Post by Cherukuri »

Hi,

I have a requirement to create a number of .txt target files based on the information below:

The source table is Ship_Que_T and it comes from an Oracle database.
The table has 10 fields, and one field, shipment_id (not a key field), has the data below:

The values in shipment_id: 10001,10001,10001,20002,20002,7111,7111,7111,7111,7111,7111,999,999

Now an individual .txt file has to be created for every unique shipment id.
Example: for 10001 --- ship_1.txt file
for 20002 --- ship_2.txt file
for 7111 --- ship_3.txt file
for 999 --- ship_43.txt file
etc.

The important thing is that every day the table will be updated with any number of new shipment_id values, and that many .txt files have to be created.

I think I have presented the question in an understandable way...
Could anyone please give a solution for this requirement?

Thanks and Regards,
Cheru
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Start by posting in the correct forum... which I've just moved you to.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There are a number of posts here on the subject of dynamically creating output files. However, they all involve using the actual file data in the filename rather than what looks like an incremented number.

Is there any reason you can't do that? Use the "shipment id" in the filename?

for 10001 --- ship_10001.txt file
for 20002 --- ship_20002.txt file
for 7111 ---- ship_7111.txt file
for 999 ------ ship_999.txt file

That would be the most straightforward approach. Otherwise you'll need to generate what is basically a surrogate key and assign it to any new ids you encounter, which in turn means you have to track the ids you have encountered so far.
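
Just to sketch what that id tracking might look like, here is a standalone C++ illustration with made-up names - not something DataStage gives you out of the box, and the map would have to be persisted between runs to keep the numbering stable:

Code:

#include <map>
#include <sstream>
#include <string>

// Returns a file name of the form ship_<n>.txt for a shipment id,
// assigning a new number the first time an id is seen.
std::string fileNameForShipment(const std::string &shipmentId,
                                std::map<std::string, int> &seen,
                                int &nextNumber)
{
    std::map<std::string, int>::iterator it = seen.find(shipmentId);
    if (it == seen.end())
    {
        // New id: hand out the next number and remember the assignment.
        it = seen.insert(std::make_pair(shipmentId, nextNumber++)).first;
    }
    std::ostringstream name;
    name << "ship_" << it->second << ".txt";
    return name.str();
}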
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I guess the answer to the above question could depend on what happens when you see the same shipment id later - do you create the file with the same number you used before or does it always get a new number? The latter would simplify things if all you need to do is separate the files by id and then always assign a new number to the file...
-craig

"You can never have too many knives" -- Logan Nine Fingers
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Re: Creating a Number of Files

Post by Cherukuri »

Thank you for your inputs,

The names of the files should be like ship_<shipment_id>. For example, if the values in shipment_id are: 10001,10001,10001,20002,20002,7111,7111,7111,7111,7111,7111,999,999

then the file names should be ship_10001, ship_20002, etc.

And as per our requirement, there is no chance of getting the same shipment id again...

Regards,
Cheru
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

OK... tell us, is the shipment id actually in the file as well or is it only used to determine the filename?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Re: Creating a Number of Files

Post by Cherukuri »

Thanks for the reply.

Yes, the shipment_id values are in the files.

Please note that shipment_id is not a key field, so the same value appears in multiple rows.
For example:

shipment_id,order1

100001,abc
100001,cvb
100001,catlogictscar
100001,catlogiticsMill
201111,x
201111,y
311011,1003x
311011,2003y
311011,5009z
.
.
.
etc.

The separate output files to be generated are:

ship_100001.txt

shipment_id,order1
100001,abc
100001,cvb
100001,catlogictscar
100001,catlogiticsMill
---------------------------------

ship_201111.txt

shipment_id,order1
201111,x
201111,y
--------------------------

ship_311011.txt

shipment_id,order1
311011,1003x
311011,2003y
311011,5009z

-----------------------------

Please note the source table will be updated with more shipment ids, so more files will need to be generated accordingly. Please help. Thanks.

Thanks and Regards,
Cheru
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Routine for creating dynamic files

Post by dinakaran_s »

Hi,

I have created a C++ routine that creates flat files dynamically and inserts each record into the appropriate file.

Logic:
Check for the file name; if the file exists, append the row, else create a new file.

This routine works only when the transformer is set to run in sequential mode.

attachString --> set of delimited column values
strPath --> directory path and file name

Code:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

// Appends one delimited record (attachString) to the file at strPath.
// If the file does not exist yet, it is created and the record becomes
// the first line; otherwise the record is appended on a new line.
int pxCreateFile(char *attachString, char *strPath)
{
    // Check whether the target file already exists.
    bool isExisted = true;
    ifstream fin(strPath);
    if (!fin)
    {
        isExisted = false;
    }
    fin.close();

    // Open (or create) the file in append mode.
    ofstream fout(strPath, ofstream::app);
    if (!fout)
    {
        cerr << "ERROR: cannot open output file " << strPath << endl;
        return -1;
    }

    // Write the record; prefix a newline only when appending to an
    // existing file so the first line is not left blank.
    if (!isExisted)
    {
        fout << attachString;
    }
    else
    {
        fout << endl << attachString;
    }
    fout.close();
    return 0;
}
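
A quick way to sanity-check the routine outside DataStage is a small standalone driver like the sketch below. The file names, path and test values are made up; compile the routine above to an object file and link the driver against it.

Code:

// Hypothetical standalone test driver for pxCreateFile.
// Build (assuming the routine above is compiled to Filename.o):
//   g++ testdriver.cpp Filename.o -o testdriver

int pxCreateFile(char *attachString, char *strPath);   // defined in the routine above

int main()
{
    char path[] = "/tmp/ship_100001.txt";

    // First call: the file does not exist yet, so it is created.
    pxCreateFile((char *)"100001,abc", path);

    // Second call: the file exists, so the row is appended on a new line.
    pxCreateFile((char *)"100001,cvb", path);

    return 0;
}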
Cherukuri
Participant
Posts: 46
Joined: Wed Jul 25, 2007 2:43 am
Location: India

Re: Creating a Number of Files

Post by Cherukuri »

Hi Dinakaran,

Thank you very much for your solution.

Can you please explain, step by step, how to use this routine for my requirement?

Thanks and Regards,
Cheru
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Post by dinakaran_s »

Steps:

1. Use the above code and create a file "Filename.c".
2. Compile the file using the values available in APT_COMPILER and APT_COMPILEROPT, something like this:
g++ -c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small Filename.c
3. You will now see a new file, "Filename.o".

4. Now create a DataStage parallel routine and use the same parameters as defined in the routine; use char* for the data type.
5. Point the parallel routine at the file "Filename.o".

Once the above steps are completed, use the routine in the transformer and pass the parameters as defined.
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

A quick and dirty way of doing this is to use a server job with the Oracle table as source and a Folder stage as target. Create a filename column in the job stream, prefix it with "ship_", and concatenate the shipment_id values (10001, etc.) to it. Of course, some clients don't want to see server jobs, and since you are reading from Oracle (parallel select possible), you will not be able to take advantage of parallel processing with a server job!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Or a Server Shared Container with the Folder stage in your Parallel jobs. And I sincerely doubt you really need a PX job just to dump Oracle data to files.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dinakaran_s
Participant
Posts: 22
Joined: Wed Jul 02, 2008 7:01 am
Location: London

Post by dinakaran_s »

Hi,

The good thing about using a parallel job is performance, even with the transformer running in sequential mode.

I have tested a job with the above routine and it really gives good performance.

Source: Oracle
Volume: 150,000 records
Time taken: 2 min
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

... and a Server job would probably be faster because there isn't all of the startup overhead that the parallel framework carries with it. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers