ProfileStage

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

ccatania
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh

ProfileStage

Post by ccatania »

I have a quick ProfileStage question concerning the Cardinality count. Is there a limit on the number of rows displayed for this field?
I ran a Column Analysis on a master file of 532,037 rows; the key field is Item Code, which is unique on the mainframe.
The Analysis Result report shows a Cardinality Count of 60,000 rows and a Uniqueness indicator of 100.
I expected the Cardinality count to equal the number of rows in the distribution. I verified my source and there are no duplicate occurrences.

:?: :?: :?:
Charlie
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I don't know the answer to your question, but I agree with your expectation. Did you have a sample size set anywhere that might have limited the reported number?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Basically, ProfileStage uses a sample that is collected along the way for the later stages (I don't recall exactly which one right now).
In the "Tools>Options>ProfileStage Options" menu, click the ProfileStage options tab and increase DistributionAnalysisLimit from its 60k default to at least the maximum number of rows you have, so that the full, heavy processing runs on all rows.
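To illustrate why a capped analysis can report a Cardinality count of 60,000 with 100% uniqueness on a 532,037-row file, here is a minimal Python sketch. It is a toy model, not ProfileStage's actual algorithm: the function `column_analysis`, its `distribution_limit` parameter, and the assumption that only the first N rows feed the distribution are all hypothetical.

```python
from collections import Counter


def column_analysis(values, distribution_limit=60_000):
    """Toy model (assumption): only the first `distribution_limit` rows
    are fed into the frequency distribution; later rows are only counted."""
    rows_total = 0
    freq = Counter()
    for value in values:
        rows_total += 1
        if rows_total <= distribution_limit:
            freq[value] += 1
    rows_analyzed = min(rows_total, distribution_limit)
    cardinality = len(freq)  # distinct values seen within the limit
    uniqueness = 100.0 * cardinality / rows_analyzed if rows_analyzed else 0.0
    return {"rows": rows_total, "cardinality": cardinality, "uniqueness": uniqueness}


# A unique key column with the row count from the original question.
item_codes = range(532_037)

capped = column_analysis(item_codes)                            # default limit
full = column_analysis(item_codes, distribution_limit=532_037)  # limit raised
```

Under this model the capped run reports a cardinality equal to the limit while still showing 100% uniqueness, and the cardinality matches the row count only once the limit is raised to cover every row.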

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
ccatania
Premium Member
Posts: 68
Joined: Thu Sep 08, 2005 5:42 am
Location: Raleigh

Post by ccatania »

I found the setting under the Tools drop-down menu, as Roy indicated; the default was 60000. I decided to keep the default so as not to impact performance. If PS shows 100% for the Uniqueness indicator, that is all right by me. Now that I know about this setting, I feel confident that the result PS returned is valid.

Thanks again for your assistance :D
Charlie
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Well, there are several sampling methods available, and someone a bit familiar with them might choose the one that suits the case best.
Bear in mind that sampling can't guarantee 100%, so be prepared for an occasional extreme case that is not covered.
I would recommend notifying the decision makers about this, if you're not one of them, and letting them decide.
Sometimes they will want 100% coverage of rows, knowing that it will slow the entire process down and require much heavier processing.
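As a rough illustration of the "extreme case not covered" point, the Python sketch below builds a hypothetical key column in which the only duplicate sits outside the first 60,000 rows. The helper `uniqueness_pct` and its definition (distinct values divided by rows) are assumptions made for this example, not ProfileStage's formula.

```python
def uniqueness_pct(values):
    """Hypothetical uniqueness measure: distinct values as a percentage of rows."""
    values = list(values)
    return 100.0 * len(set(values)) / len(values) if values else 0.0


# 532,037 rows where the very last row repeats the first key (made-up data).
rows = list(range(532_036)) + [0]

sample = rows[:60_000]               # what a 60k-row sample would see
sample_pct = uniqueness_pct(sample)  # looks perfectly unique
full_pct = uniqueness_pct(rows)      # the duplicate shows up only here
```

The sample looks fully unique while the complete data set is not, which is the trade-off the decision makers would be accepting by keeping the default limit.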

IHTH,
Roy R.