DSXchange: DataStage and IBM Websphere Data Integration Forum
samyamkrishna



Group memberships:
Premium Members

Joined: 04 Jul 2006
Posts: 256
Location: Toronto
Points: 1577

Posted: Thu Nov 26, 2015 2:44 pm

DataStage® Release: 8x
Job Type: Parallel
OS: Unix
Additional info: I am new to QualityStage
Hi All,

One of our ETL batches runs for 8 hours.
It contains 6 Match jobs.

The jobs use Unduplicate Match, and the matches are for Individual, Org, Address, Phone, etc.
Each of them runs for more than an hour.

However, the frequency file is generated from a Row Generator with 1 row and all the columns that are required for all six Match jobs.

My question is:

If we create actual frequency files from the data instead of using a Row Generator,
the frequency files will have more detail than the present frequency file.

Will this help in improving the performance of the match jobs because it has actual data rather than a dummy frequency file?

Regards,
Samyam
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54535
Location: Sydney, Australia
Points: 295717

Posted: Fri Nov 27, 2015 2:47 pm

Define what you mean by "performance" in this context. Certainly generating frequencies will generate more accurate results (for a large enough sample) than an artificially flat frequency distrib ...

_________________
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
samyamkrishna

Posted: Fri Nov 27, 2015 2:52 pm

Hi Ray,

I bought the premium membership yesterday, 26th Nov.
But I am still not able to see the content you posted under premium.

I got a mail from Rick stating that I will get another mail of confirmation.
How long do you think it will take for the membership to become active?

Should I also contact editor@, like in one of the recent posts?

Regards,
Samyam
ray.wurlod

Posted: Fri Nov 27, 2015 2:54 pm

Wait till the weekend is over.

It won't do any harm to contact editor@dsxchange.net, but these people actually have a life as well as running DSXchange.

stuartjvnorton
Participant



Joined: 19 Apr 2007
Posts: 523
Location: Melbourne
Points: 3890

Posted: Sun Nov 29, 2015 10:46 pm

Define "performance". Number of matches, "quality" of matches (lower false positives / negatives), execution time?

As Ray said, the quality of the match scores may improve a little by using accurate frequency data.
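To make the scoring point concrete, here is a rough sketch of why real frequencies can change scores. It assumes the textbook probabilistic-matching weight, log2(m/u), with u approximated by a value's relative frequency. The m value of 0.95, the sample surnames and the function names are all made up for illustration; this is not a description of QualityStage internals.

```python
import math
from collections import Counter

M_PROB = 0.95  # assumed chance that a true match agrees on the field (illustrative)

def weights_from_frequencies(values):
    """Agreement weight per value, with u taken from the observed relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: math.log2(M_PROB / (c / total)) for v, c in counts.items()}

def weights_from_flat(values):
    """Agreement weight per value when every distinct value is treated as equally common."""
    distinct = set(values)
    u = 1 / len(distinct)
    return {v: math.log2(M_PROB / u) for v in distinct}

# Hypothetical surname column: one very common value, one rare value.
surnames = ["SMITH"] * 90 + ["ZUCKERMAN"] * 10

real = weights_from_frequencies(surnames)
flat = weights_from_flat(surnames)

# With real frequencies, agreeing on the rare surname is stronger evidence
# than agreeing on the common one; with flat frequencies both look the same.
print(real)
print(flat)
```

Under these assumptions the frequencies only shift the weights. The number of comparisons the job performs is unchanged, which is why they are more likely to affect match quality than run time.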

As for execution time, the number of records and number of match passes would be the first things to look at for understanding how much time is reasonable to expect it to take, and where you might be able to improve the times.
Also note that it takes time to create match frequency files.
rjdickson
Participant



Joined: 16 Jun 2003
Posts: 378
Location: Chicago, USA
Points: 2531

Posted: Mon Nov 30, 2015 9:00 am

Take a look at your match specification. If you are using overrides for every column, then generating frequencies will not matter as the overrides can take priority.

Is the original question based on curiosity, or are you having quality issues in your matching?

_________________
Regards,
Robert
samyamkrishna

Posted: Mon Nov 30, 2015 12:20 pm

Stuart,

I am worried about the execution time.
Thanks for giving those hints on what to look at.

Will look at them to get to a conclusion.


rjdickson,

Yes, there are overrides, but I am not sure if they are for all the columns.
I will check that too.

The question is not based on curiosity. We are having issues with the run times due to a short execution window in production.

_________________
Cheers,
Samyam
rjdickson

Posted: Mon Nov 30, 2015 12:25 pm

The most frequent cause of bad match performance is (arguably) blocking fields that are too 'loose' (include too many candidate records).

Do you know what pass is causing issues? (Blocking fields are per pass).
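As a back-of-the-envelope illustration of the blocking point, the sketch below counts how many record pairs a pass would have to compare for a loose versus a tighter blocking key. The record layout, field names and `candidate_pairs` helper are hypothetical, and the count assumes every record in a block is compared with every other record in that block, which is a simplification.

```python
from collections import Counter

def candidate_pairs(records, blocking_key):
    """Count within-block pair comparisons for one pass (n * (n - 1) / 2 per block)."""
    block_sizes = Counter(blocking_key(r) for r in records)
    return sum(n * (n - 1) // 2 for n in block_sizes.values())

# Hypothetical sample: 1,000 records, 10 distinct surnames, 100 distinct postcodes.
records = [
    {"surname": f"NAME{i % 10}", "postcode": f"P{i % 100:03d}"}
    for i in range(1000)
]

# Loose blocking: surname only, so 10 blocks of 100 records each.
loose = candidate_pairs(records, lambda r: r["surname"])

# Tighter blocking: surname plus postcode, so 100 blocks of 10 records each.
tight = candidate_pairs(records, lambda r: (r["surname"], r["postcode"]))

print(loose, tight)  # 49500 4500
```

Because the pair count grows with the square of the block size, a few very large blocks in one pass can dominate the run time, so the block size distribution per pass is worth checking first.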

The next thing you can look at is the job design. I would assume there is some sort of read from a database for the reference link. Does that read have a 'where' clause, and if so, are the column(s) used in the where clause indexed?
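If it helps, this is one way to see the effect of an index on a 'where' clause. The sketch uses SQLite as a stand-in, because the actual database behind the job is not known here, and the table, column and index names are invented. The plan text differs between databases and versions, so treat the output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, postcode TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO customer (postcode, name) VALUES (?, ?)",
    [(f"P{i % 100:03d}", f"NAME{i}") for i in range(1000)],
)

def plan(sql):
    """Return the query plan detail text (last column of EXPLAIN QUERY PLAN rows)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM customer WHERE postcode = 'P042'"

before = plan(query)  # no index on postcode yet: expect a full table scan
conn.execute("CREATE INDEX idx_customer_postcode ON customer (postcode)")
after = plan(query)   # index available: expect an index search

print(before)
print(after)
```

On the real database, the equivalent check would be that vendor's explain-plan facility, run against the SQL the job actually issues.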

_________________
Regards,
Robert
samyamkrishna

Posted: Fri Dec 04, 2015 3:21 pm

I don't have access to Director on Prod.
I am trying to get access.

Will post my findings once I get hold of the logs.

_________________
Cheers,
Samyam
All times are GMT - 6 Hours