Ignore columns from a sequential file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


rmcclure
Participant
Posts: 48
Joined: Fri Dec 01, 2006 7:50 am

Post by rmcclure »

Hi,

I am trying to read records from a sequential file to process and write to a table.

The problem is that the sequential file is a global company file, generated by a group and used by multiple divisions. I only need the first 70 of about 90 columns. When I run the job I get this warning:
"Import consumed only 1968bytes of the record's 2073 bytes (no further warnings will be generated from this partition)"
I could add 20 extra columns to make the warning go away, but the number of columns in the file could change if another division requests additional columns.

How can I read a sequential file and take just the first 70 columns without getting warnings?

The file is:
Delimiter = comma
Null field value = ''
Quote = double
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Worst case, you can use a Server Sequential File stage in a Server Shared Container, as it has an "Ignore Truncation" option from what I recall. Unless there is something similar somewhere in the Parallel version?
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

Another option could be to use the External Source stage with this Source Program:

Code:

cut -d',' -f1-70 <INFILE>

Post by chulett »

... or that in the Filter option of the Sequential File stage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

... or the Drop on Import property for columns in the Sequential File stage itself.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by chulett »

But wouldn't you have to define them so it knows what to drop? Or does it drop any not defined?

Post by ray.wurlod »

chulett wrote: But wouldn't you have to define them so it knows what to drop? Or does it drop any not defined?
Yes, you do have to define them.

Post by chulett »

Ah... which they don't want to do.

Post by ray.wurlod »

The columns could be generically named for the purposes of this exercise.

Post by rmcclure »

The cut command did not work because the sequential file has descriptions in quotes, and some of those descriptions contain commas. Others do not.

I ended up adding the generic columns. If they add more columns in the future I will need to modify my job.
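
To illustrate why cut breaks on this kind of data, here is a minimal sketch using a made-up five-column row (the values and the variable names `line` and `first3_cut` are assumptions for illustration, not taken from the real file):

```shell
# Hypothetical sample row; the second field is a quoted description with a comma.
line='101,"Widget, large",EA,extra1,extra2'

# cut counts every comma as a delimiter, including the quoted one,
# so requesting three fields stops in the middle of the row.
first3_cut=$(printf '%s\n' "$line" | cut -d',' -f1-3)
printf '%s\n' "$first3_cut"
```

With this row, cut returns `101,"Widget, large"` and the real third column (`EA`) is lost, because the quoted description was counted as two fields.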

Post by chulett »

You didn't try the Server stage? It would not require any modifications when / if the file changes.

Post by rkashyap »

Another option: use a Unix command to replace the delimiting commas (ignoring commas within quotes) with another delimiter that does not occur in the data, then use the replacement delimiter to extract the first seventy columns.

See the example below, which uses nawk (on Solaris) to replace the delimiting commas with a pipe (|) and then extracts the first 70 columns:

Code:

nawk -F\" 'BEGIN{OFS=FS;} {for(i=1;i<=NF;i=i+2){gsub(/,/,"|",$i);} print $0;}' <infile> | awk -F"|" '{ for(i=1; i<=70; i++) printf("%s|", $i); printf("\n") }'
It is possible to merge the nawk and awk commands given above.

I believe that a much simpler/elegant solution can be implemented using Perl.
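
One way the two commands might be merged is sketched below. It is a different technique from the delimiter swap above: a single awk pass walks each line character by character, tracks whether it is inside double quotes, and cuts the line at the Nth delimiting comma. The function name `first_n_csv_fields` is hypothetical, and the sketch assumes that no quoted field contains a newline.

```shell
# first_n_csv_fields N  (hypothetical helper name)
# Reads comma-delimited lines on stdin and prints the first N fields of each,
# treating commas inside double quotes as data rather than delimiters.
first_n_csv_fields() {
  awk -v n="$1" '
  {
    inq = 0; seen = 0; cutpos = 0
    len = length($0)
    for (i = 1; i <= len; i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        inq = !inq                      # toggle quoted state on each double quote
      } else if (c == "," && !inq && (++seen == n + 0)) {
        cutpos = i                      # position of the Nth delimiting comma
        break
      }
    }
    # Lines with N or fewer fields are passed through unchanged.
    print (cutpos ? substr($0, 1, cutpos - 1) : $0)
  }'
}

# Example with a made-up row; the real job would use 70 instead of 3.
first_n_csv_fields 3 <<'EOF'
101,"Widget, large",EA,extra1,extra2
EOF
```

Unlike the two-step pipeline, this keeps the original comma delimiter and the quotes, so the column definitions in the job would not need a different delimiter. A command like this would presumably be used as the Source Program or Filter suggested earlier in the thread; that has not been tested here.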

Post by chulett »

And I believe that a much simpler/elegant solution can be implemented using the Server version of the Sequential File stage. :wink:

Suppress row truncation warnings. If the sequential file being read contains more columns than you have defined, you will normally receive warnings about overlong rows when the job is run. If you want to suppress these messages (for example, you might only be interested in the first three columns and happy to ignore the rest), select this check box.

Post by rkashyap »

I agree. A Server job would also be more platform independent and may align better with the shop's existing skill set than awk/Perl, and thus have a lower lifetime cost.

Though if the OP wants to use a Parallel job, then Unix/Perl offers a viable solution.

Post by chulett »

I didn't mean a Server job in this particular case but, as mentioned earlier, a Server Sequential File stage in a Server Shared Container in their otherwise Parallel job.