Balance Optimization fails

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
audev
Participant
Posts: 11
Joined: Wed Jul 18, 2018 7:25 am

Balance Optimization fails

Post by audev »

Hello,

I have a job datastage that cannot be optimized: in other words the optimized is equal to the original.

In the Optimization logs I find 2 warnings that I'm translating from Italian, that state:

WARNING: It is not possible to verify the sorting keys in the stage Read_table_1. The pattern is abandoned. (ORIGINAL: Impossibile verificare le chiavi di ordinamento nello stage Read_table_1. Il pattern viene abbandonato.)

WARNING: It is not possible to generate combined query instructions for the link link_to_rem_dup, link_to_transformer(ORIGINAL: Impossibile generare istruzioni query combinate per i link link_to_rem_dup, link_to_transformer. Il pattern viene abbandonato.)

Properties of job optimization: isGenOrdBy,PushProcToSrcs,PushProcToTrgts,PushAllToDB

This is how the job design looks like:

Image

- the teradata connector simply read from a table join with another one (no order by clause defined)
- the remove duplicates stage
a) stage tab: uses field_1 to define duplicates with options "case insensitive" and "nulls last" defined. field_1 is set with option "duplicate to retain " = first
b) input tab: sorts data with hash partition. 4 fields are used to sort data
- the teradata connector and the remove duplicate stage have the same primary keys definition

Does anyone know why this cannot be optimized? Do the warnings suggest you anything in specific?
Thank you very much,
audev
Participant
Posts: 11
Joined: Wed Jul 18, 2018 7:25 am

Balance Optimization fails to create query

Post by audev »

Apparently I had not yet tried the most obvious solution, which was to insert a sort stage bewtween the teradata connector and the remove duplicate stage.

This is a step forward. but now have another problem. The query that is built, is wrong. The syntax is wrong.

Considering that
1) this is an attempt to modify an existing job that optimizes correctly
2) the original has several join stages (not shown in the picture of the design)
3) the modified one (the one I'm currently dealing with) has 1 join stage more than the original, 1 lookup stage less

is there a limit of joins that the optimizer can successfulyl handle or, as I suppose, it depends only on the job design?
I'm not talking about advanced option limit, but rather about a possible limit after that the optimizer just cannot handle properly.

This is the full job design

Image

Example of what I mean when I say that the syntax is wrong. There are several repeated table names in the query, that looks like:

[...] from DB_NAME.TABLE_NAME TABLE_NAME

Any suggestion is highly appreciated.
Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That doesn't look like incorrect syntax to me, that's just the table_name being used again as an alias. Does it in fact error when it runs? Not at all familiar with Teradata but for others like Oracle that would be perfectly fine syntax.
-craig

"You can never have too many knives" -- Logan Nine Fingers
audev
Participant
Posts: 11
Joined: Wed Jul 18, 2018 7:25 am

Balance Optimization fails to create query

Post by audev »

Thanks chulett, my mistake. After developing many prototypes, some of which actually failed to create a correct query, I assumed that the repetition was an error, while it is just the alias. I got confused. (plus, recently there was a problem with the optimization that was verified by our vendor, so I fell into the trap of the confirmation bias)

Now, I tried again and it does not give any error.
I would consider this a solution.
Thanks again
mouthou
Participant
Posts: 208
Joined: Sun Jul 04, 2004 11:57 pm

Post by mouthou »

Ofcourse if you want Optimize suggestion from DS, yours is the way. Just so you know, have you thought of doing the sorting and duplicate removals in source Teradata SQL itself if you have the liberty to modify the SQL.

There are ORDER BY NULLS FIRST and RANK() functions which satisfy your criteria.
audev
Participant
Posts: 11
Joined: Wed Jul 18, 2018 7:25 am

Balance Optimization fails to create query

Post by audev »

Hi mouthou, thank for your suggestion. I did not consider this solution because I'm not very familiar with that, but I will defintely will in the future, when time will be on my side.
Thanks
Post Reply