Datastage Parallel schema file - indicate # of rows to skip
Posted: Thu Sep 14, 2017 10:37 am
by gaelynalmeida
Hi,
We have a job that reads a flat file, and loads it to a target using RCP. The sequential file stage reads a schema file to determine the layout.
Some files may have headers, some may not. I do not see an option to set this at run time in the Sequential File stage. The setting for "First line is column names" is a drop-down with only True or False values, and I cannot override it with a parameter.
Question: Can I specify, somehow, the number of rows to skip? Can this be done in the schema file? I have not found this in any documentation so far.
Thank you
G. Almeida
Posted: Thu Sep 14, 2017 11:28 am
by chulett
While Informatica has a "number of records to skip" for flat files, I don't recall DataStage having anything other than the "First line is column headers" true/false option. Perhaps a question for your official support provider?
Posted: Tue Sep 19, 2017 5:53 am
by Thomas.B
You could use the "Filter" option of the Sequential File stage to skip the first rows from a file, for example :
will skip the first 3 rows.
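The command from the post above did not survive in this copy of the thread. As a minimal sketch only, assuming the Filter option pipes the file through a UNIX command before the stage parses it, a command of the following kind would have the described effect. The function name skip_header_rows and the sample data are hypothetical; only the sed expression would go into the Filter property.

```shell
# Hypothetical wrapper around a candidate Filter command:
# sed deletes lines 1 through 3 and passes every other line through unchanged.
skip_header_rows() {
  sed '1,3d'
}

# Made-up sample: three header lines followed by two data rows.
printf 'h1\nh2\nh3\nrow1\nrow2\n' | skip_header_rows
# prints:
# row1
# row2
```

Because the filter runs before the schema is applied, this kind of command should not depend on the column layout, which is why it can coexist with a schema file and RCP.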
Posted: Tue Sep 19, 2017 4:21 pm
by gaelynalmeida
Thank you, Craig. Yes, we will reach out to IBM to see what they say. In the meantime, we are using a filter.
Posted: Tue Sep 19, 2017 4:23 pm
by gaelynalmeida
Thank you for your response, Thomas. Yes, we are using an awk filter at this time as a work-around. sed would work just as well.
The trouble is that awk drops some of our records because of non-printable characters, which we would much rather handle further upstream. This is why I was looking for some native functionality.
Perhaps sed will not drop these records - we'll test this out.
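One more candidate to test alongside sed is tail, which only counts newlines and does not otherwise interpret the row content. The sketch below is an assumption, not something confirmed in this thread: the function name skip_rows and the sample data are hypothetical, and whether it preserves the problem records depends on which non-printable characters are involved.

```shell
# skip_rows N: write stdin starting at line N+1, i.e. skip the first N lines.
# LC_ALL=C asks tail to treat the input as plain bytes rather than locale text.
skip_rows() {
  LC_ALL=C tail -n +"$(( $1 + 1 ))"
}

# Made-up sample: one header line, then a data row containing byte 0x01.
# Counting the surviving lines shows whether the row with the control byte is kept.
printf 'hdr\nro\001w1\nrow2\n' | skip_rows 1 | wc -l
```

If the Filter property accepts job parameters in your release (an assumption worth verifying), the line number passed to tail could come from a parameter, which would make the number of skipped rows a run-time setting.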
Posted: Thu Sep 21, 2017 7:34 am
by chulett
Still applicable in an RCP scenario, though?
Posted: Fri Sep 22, 2017 1:46 am
by Thomas.B
Yes, you just need to disable RCP in the Sequential File and Transformer stages and enable it in the Column Generator; the output will then follow the schema file.
Posted: Thu Oct 05, 2017 9:28 am
by gaelynalmeida
Thank you for all the excellent answers. We are pretty far down our development path, so it is hard to turn back and add another job to the flow.
For now, I think the filter is our best option.
But the other options are good to know for future reference.
Posted: Thu Oct 05, 2017 11:13 pm
by ray.wurlod
I would have thought that it is possible, since the data browser has that feature, as does the Sample stage. Why not create a job with a Sample stage that skips some rows and inspect the generated osh?