Horizontal Pivot

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
basic
Participant
Posts: 3
Joined: Wed Dec 10, 2003 3:54 am

Horizontal Pivot

Post by basic »

Hi all,

Here's the requirement that i'm facing:

I have an input file with two columns :

For example, the first column contains :
"a"

The second column contains :

"1col1|2col2|3col3".

Do you know how i can do in the "derivation" of my pivot to have in my output file :

"a;1col1"
"a;2col2"
"a;2col3"

(use the "|" in the derivation???)

Thanks in advance,

:oops: cool
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Use the Pivot stage to do this. It shouldnt be that hard. Try to come up with a solution, then go on from there.
PS: Pivot Stage is a very sensitive stage to use and it totally positional. It does what you say to do.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kumar_s
Charter Member
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Another option.
User transformer to Split the second column into three.

Code: Select all

In.Column[1,INDEX(In.Column,'|',1)-1] 
for the first link

Code: Select all

In.Column[INDEX(In.Column,'|',2),(INDEX(In.Column,'|',2)-INDEX(In.Column,'|',1)-)]
for the second link

Code: Select all

In.Column[INDEX(In.Column,'|',3),(Len(In.Column)-INDEX(In.Column,'|',3))]  
for the second link.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
basic
Participant
Posts: 3
Joined: Wed Dec 10, 2003 3:54 am

Post by basic »

kumar_s wrote:Another option.
User transformer to Split the second column into three.

Code: Select all

In.Column[1,INDEX(In.Column,'|',1)-1] 
for the first link

Code: Select all

In.Column[INDEX(In.Column,'|',2),(INDEX(In.Column,'|',2)-INDEX(In.Column,'|',1)-)]
for the second link

Code: Select all

In.Column[INDEX(In.Column,'|',3),(Len(In.Column)-INDEX(In.Column,'|',3))]  
for the second link.
thanks for you help kumar_s, but in my job i must not to know the number of columns in my output file, i must to create a output file with a number of columns dynamically. That's why, i think the pivot is the best solution but i don't know what i put in the derivation...???
basic
Participant
Posts: 3
Joined: Wed Dec 10, 2003 3:54 am

Re: Horizontal Pivot

Post by basic »

basic wrote:Hi all,

Here's the requirement that i'm facing:

I have an input file with two columns :

For example, the first column contains :
"a"

The second column contains :

"1col1|2col2|3col3".

Do you know how i can do in the "derivation" of my pivot to have in my output file :

"a;1col1"
"a;2col2"
"a;3col3" //correction


(use the "|" in the derivation???)

Thanks in advance,

:oops: cool
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Re: Horizontal Pivot

Post by Ultramundane »

I don't know how to do this with pivot stage.

However, you could:
Use a sequential file stage with a delimeter that does not occur in the file. Maybe \xF8 Also, set Quote to none. Store this in one column. Example col1 varchar without any length.

Send col1 down a link to an external filter. Run this code in the filter.

Code: Select all

## AUTHOR: Ryan T. Putnam

awk 'BEGIN {FS=";"; OFS=";";}
{
   START=0;
   LEN=index(substr($2, START), "|");
   START=START + LEN + 1;

   if (( LEN == 0 ))
   {
      print $1,$2;
   }
   else
   {
      print $1,substr($2, 1, LEN -1);
      while (LEN > 0)
      {
         LEN=index(substr($2, START), "|");
         if (( LEN > 0 ))
         {
            print $1,substr($2, START, LEN -1);
         }
         else
         {
             print $1,substr($2, START);
         }
         START=START + LEN;
      }
   }
}'
Send the filtered results down a link to the column import. Create an output link to the destination. In the column import stage, set the delimeter to ";" and Quote to none. Also, create the columns in the output that you need based upon the assignment names.
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

basic wrote: That's why, i think the pivot is the best solution but i don't know what i put in the derivation...???
Hi,

In the derivations write the column names seperated by comma.

if u have 4 columns (Key1,Col1,col2,col3) then
in the output create "Key1" and "CombinedColOp" fields.
In derivation of "CombinedColOp" write the columns names seperated by coma i.e. "Col1,col2,col3"

Regards
Senthil
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

A much simpler awk script that that listed above would achieve what you need. Have a look at this thread. There is a reply that I contributed that is very close to what you are after. Tweak the awk script in that and you will have what you need. Suggestion: if you end up with an awk script with more than about 3 lines as a solution for this, you are probably not doing it very efficiently.

David
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Post by Ultramundane »

I will say that this 3 liner below is much easier to read, but is much slower. It takes 1.5 seconds for a 70000 record test I ran vs. the longer version which takes only 1 second. The mulitple delimeters have my head spinning.

Code: Select all

## Author Ryan T. Putnam
awk 'BEGIN {FS=";"; OFS=";";}
{
  TO=split($2, ARRAY, "|");
  if ( TO == 0 ) { print $1 OFS; }
  for (i=1;i<=TO;i++) { print $1,ARRAY[i]; }
}'
I am not sure I coded this with 100% efficiency (as it runs slower than the previous version), but I did want to comply with the 3 line rule.
Ultramundane
Participant
Posts: 407
Joined: Mon Jun 27, 2005 8:54 am
Location: Walker, Michigan
Contact:

Post by Ultramundane »

DJM's solution he provided earilier is much faster. I find it more difficult to read, but it runs in .5 seconds. I think he was being modest by saying 3 lines because his solution will fit on one line very nicely.

DJM's solution modified by Ryan.

Code: Select all

awk 'BEGIN {FS=";"; OFS=";";} { gsub(/|/, "\n" $1 OFS); print $0; }'
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

Ahh, but my programming layout style would have laid out your "core" code like so:

Code: Select all

{
  gsub(/|/, "\n" $1 OFS);
  print $0;
}
for maintainability reasons.

I don't know whether that makes mine 2 or 4 lines! By the way, nice implementation of a "leave this as an exercise for the reader" :D

David
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
Post Reply