Hash file behaviour

Archive of postings to DataStageUsers@Oliver.com. This forum is intended only as a reference and cannot be posted to.

Moderators: chulett, rschirm

Locked
admin
Posts: 8720
Joined: Sun Jan 12, 2003 11:26 pm

Post by admin »

Hi Gurus

I've got a job here that is showing some odd behaviour when creating a hash
file, and I wonder if someone can confirm a suspicion for me.

The job in question has 5 stages: 1 Source, 2 lookups, 1 transform and a
Hash file as a target. It takes data from the source, does two lookups and
builds a hash file target.

While running this job, we notice that there are significant periods when
the job seems to be "idle", as the CPU usage falls to 0%. We suspect that
this is happening while the hash file index is being created.

Is this assumption correct, and is there a way that I can find out
why the CPU usage drops off for such a long period?

We have done all the "normal" things, like looking at the performance stats
in Designer and the Director's monitor window, but these do not tell me enough.

Any comments/suggestions?



Post by admin »

What is the source? If it's a database, then your
answer is that the DS job has exhausted the rows it
has already fetched and is waiting on the database to
supply more rows for processing. There is no index in
a plain old hash file, unless you've added secondary
indexes and are using very customized methods with
the hash file.
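
To illustrate why a plain hash file has no index to build: a record's location is computed directly from its key, so each write goes straight to its group. The following is only a toy sketch. The `ToyHashFile` class, the CRC32 hash and the modulus of 7 are illustrative assumptions, not the algorithm DataStage actually uses on disk.

```python
import zlib


class ToyHashFile:
    """Toy model of a static hashed file: a fixed number of groups, no index."""

    def __init__(self, modulus=7):
        self.modulus = modulus
        self.groups = [dict() for _ in range(modulus)]

    def _group_for(self, key):
        # The group is computed from the key alone, so nothing has to be
        # built or sorted after the rows are loaded.
        return zlib.crc32(key.encode("utf-8")) % self.modulus

    def write(self, key, record):
        # A second write with the same key overwrites the first.
        self.groups[self._group_for(key)][key] = record

    def read(self, key):
        return self.groups[self._group_for(key)].get(key)


hf = ToyHashFile()
hf.write("CUST001", {"name": "Acme"})
hf.write("CUST001", {"name": "Acme Ltd"})
print(hf.read("CUST001"))
print(hf.read("MISSING"))
```

In this model there is no separate phase after the load in which an index could be created, which is why an idle period has to be explained some other way.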

If a DS job is using little or no CPU, then external
influences are limiting the job's ability to run at
100% CPU utilization. If you have a job that is
seq -> xfm -> hash, you should see 100% CPU usage
unless you're I/O bound. If that is the case, then
you are most likely suffering contention for disks,
controllers, or both. This is easiest to prove if
Oracle is using the same resources as DS: a query
fired off against Oracle will slow the DS job because
of the congestion.
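
One rough way to tell these cases apart is to compare the CPU time the job consumed with the wall-clock time that passed, alongside the I/O-wait percentage reported by an OS tool such as iostat or sar. A minimal sketch follows. The function name, the 80% and 20% thresholds and the sample figures are all assumptions chosen for illustration, not measurements from any real job.

```python
def classify_interval(cpu_seconds, wall_seconds, iowait_pct,
                      busy_threshold=0.8, iowait_threshold=20.0):
    """Label one sampling interval for a single-process job.

    cpu_seconds  -- CPU time the job consumed during the interval
    wall_seconds -- elapsed time of the interval
    iowait_pct   -- system-wide I/O wait percentage for the interval
    """
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    utilization = cpu_seconds / wall_seconds
    if utilization >= busy_threshold:
        return "cpu-bound"
    if iowait_pct >= iowait_threshold:
        return "i/o-bound (disk or controller contention)"
    return "waiting on an external supplier (e.g. the database)"


# Hypothetical samples: (cpu_seconds, wall_seconds, iowait_pct)
samples = [(9.6, 10.0, 1.0), (2.0, 10.0, 45.0), (0.1, 10.0, 2.0)]
for cpu, wall, iowait in samples:
    print(classify_interval(cpu, wall, iowait))
```

Low CPU with high I/O wait points at disk contention; low CPU with low I/O wait suggests the job is simply waiting for rows to arrive.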

Now, if you're doing remotedb -> xfm -> hash, you're
waiting on the db. If you're doing remotedb ->
xfm+localdblookups -> hash, you can't tell whether
you're waiting on source data, whether your local db
lookups are slow, or whether your local db lookups are
congesting the very resources your job needs.
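
When source reads and lookups share one job, timing each step separately can show where the waiting happens. The sketch below is plain Python, not DataStage: `run_job`, `slow_source`, the stage names and the simulated 10 ms delay are all hypothetical stand-ins.

```python
import time


def slow_source(n_rows, delay):
    """Stand-in for a remote database cursor that is slow to supply rows."""
    for i in range(n_rows):
        time.sleep(delay)
        yield {"id": i}


def run_job(source, lookups, clock=time.perf_counter):
    """Pull rows, apply each lookup, load a dict target; return target and timings."""
    timings = {"source": 0.0, "target": 0.0}
    timings.update({name: 0.0 for name in lookups})
    target = {}
    rows = iter(source)
    while True:
        start = clock()
        row = next(rows, None)              # time spent waiting for the source
        timings["source"] += clock() - start
        if row is None:
            break
        for name, table in lookups.items():
            start = clock()
            row[name] = table.get(row["id"])  # time spent in this lookup
            timings[name] += clock() - start
        start = clock()
        target[row["id"]] = row             # time spent writing the target
        timings["target"] += clock() - start
    return target, timings


lookups = {"region": {0: "EU", 1: "US"}, "status": {0: "open"}}
target, timings = run_job(slow_source(5, 0.01), lookups)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds:.4f}s")
```

With the simulated delay, nearly all of the elapsed time lands in the "source" bucket, which is the pattern you would expect when the job sits idle waiting on a remote database.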

You're probably discovering why small, modular jobs
are the easiest to tune. If you have a transformation
job that uses only sequential files and hash files,
you'll see it run a CPU at 100%.
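
The modular approach can be sketched as two separate steps: land the remote data in a sequential file first, then run the transformation from that file alone. This is an illustration only. The function names, the CSV layout and the in-memory lookup are hypothetical, and a Python dict stands in for the hash file target.

```python
import csv
import os
import tempfile


def extract_job(source_rows, path):
    """Job 1: land source rows in a sequential (CSV) file.

    Only this job ever waits on the source system.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in source_rows:
            writer.writerow(row)


def transform_job(path, lookup):
    """Job 2: sequential file -> lookup -> hash target (a dict here)."""
    target = {}
    with open(path, newline="") as f:
        for key, amount in csv.reader(f):
            target[key] = {"amount": float(amount), "region": lookup.get(key)}
    return target


path = os.path.join(tempfile.mkdtemp(), "landed.csv")
extract_job([("A1", "10.5"), ("B2", "3")], path)
result = transform_job(path, {"A1": "EU"})
print(result)
```

Because the second step touches only local files, any idle time it shows cannot be blamed on the database, which narrows the search considerably.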

Good luck!
-Ken



