Datastage Parallel schema file - indicate # of rows to skip

gaelynalmeida · Post by **gaelynalmeida** » Thu Sep 14, 2017 10:37 am

Hi,

We have a job that reads a flat file, and loads it to a target using RCP. The sequential file stage reads a schema file to determine the layout.

Some files may have headers, some may not. I do not see an option to set this at run time in the sequential file stage. The setting for "First line is column names" is a drop-down with only True or False values, and I cannot override it with a parameter.

Question: Can I specify, somehow, the number of rows to skip? Can this be done in the schema file? I have not found this in any documentation so far.

Thank you
G. Almeida

chulett · Post by **chulett** » Thu Sep 14, 2017 11:28 am

While Informatica has a "number of records to skip" for flat files, I don't recall DataStage having anything other than the "First line is column headers" true/false option. Perhaps a question for your official support provider?

Thomas.B · Post by **Thomas.B** » Tue Sep 19, 2017 5:53 am

You could use the "Filter" option of the Sequential File stage to skip the first rows of a file, for example:

sed -e '1,3d'

will skip the first 3 rows.
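
To see what that filter does outside of DataStage, a quick shell check along these lines should work. The sample data is made up for illustration; only the `sed` command itself is what would go in the Filter option.

```shell
# Made-up sample: three header lines followed by two data rows.
sample=$(printf 'header 1\nheader 2\nheader 3\nrow A\nrow B\n')

# Same command as in the Filter option: delete lines 1 through 3
# and pass everything else through unchanged.
result=$(printf '%s\n' "$sample" | sed -e '1,3d')

printf '%s\n' "$result"
```

Only the two data rows should come out the other end.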

gaelynalmeida · Post by **gaelynalmeida** » Tue Sep 19, 2017 4:21 pm

Thank you, Craig. Yes, we will reach out to IBM to see what they say. In the meantime, we are using a filter.

gaelynalmeida · Post by **gaelynalmeida** » Tue Sep 19, 2017 4:23 pm

Thank you for your response, Thomas. Yes, we are using an awk filter at this time as a work-around. sed would work just as well.

The trouble is that awk drops some of our records because of non-printable characters, which we would much rather handle further upstream. This is why I was looking for some native functionality.

Perhaps sed will not drop these records - we'll test this out.
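
Another candidate to test alongside sed is `tail -n +N`, which only counts newlines and, as far as I know, does not interpret the bytes in between, so it may be less likely to trip over non-printable characters. A minimal sketch, assuming a POSIX shell; `skip_rows` is a hypothetical helper name used here for testing, not anything DataStage provides:

```shell
# Hypothetical helper: skip the first N lines of stdin (N defaults to 0).
# tail -n +K starts output at line K, so skipping N lines means K = N + 1.
skip_rows() {
  n="${1:-0}"
  tail -n "+$((n + 1))"
}

# Made-up sample: one header line, then a row containing a control
# character (octal 001) to mimic the non-printable data.
printf 'header\n\001row A\nrow B\n' | skip_rows 1 | od -c
```

In the Filter option itself the literal form would be something like `tail -n +4` to skip three rows. A skip count of 0 passes the file through untouched, which would cover the files without headers.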

chulett · Post by **chulett** » Thu Sep 21, 2017 7:34 am

Still applicable in an RCP scenario, though?

Thomas.B · Post by **Thomas.B** » Fri Sep 22, 2017 1:46 am

Yes, you just need to disable it in the Sequential File and the Transformer stages, activate it in the Column Generator and the output will represent the schema file.

gaelynalmeida · Post by **gaelynalmeida** » Thu Oct 05, 2017 9:28 am

Thank you for all the excellent answers. We are pretty far down our development path, so it is hard to turn back and add another job to the flow.

For now, I think the filter is our best option.

But the other options are good to know for future reference.

ray.wurlod · Post by **ray.wurlod** » Thu Oct 05, 2017 11:13 pm

I would have thought that it is possible, since the data browser has that feature, as does the Sample stage. Why not create a job with a Sample stage that skips some rows and inspect the generated osh?