I want Hashes in PX!

clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

Frankly,
I'd like to see hash file functionality made available to PX jobs. There's nothing magical about a hash file; there are plenty of C++ solutions 'off the shelf'.
I'm not necessarily talking about Universe hashes, although that would be nice, since it would give interoperability between Server and PX jobs.
I'm talking about the high-level interface behavior, i.e. key columns, data columns, fast access, lookups in the Transformer stages, etc.
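
To make the interface being asked for concrete, here is a minimal sketch in Python. Everything in it (the `HashReference` class, its `write` and `lookup` methods, the column names) is hypothetical and made up for illustration; it is not a DataStage or Universe API. It only assumes the behavior described above: rows are stored under their key columns, carry their data columns, and a write to an existing key replaces the earlier row.

```python
from typing import Dict, Optional, Sequence, Tuple

Row = Dict[str, object]


class HashReference:
    """Hypothetical in-memory stand-in for a hash file (illustration only)."""

    def __init__(self, key_columns: Sequence[str], data_columns: Sequence[str]):
        self.key_columns = tuple(key_columns)
        self.data_columns = tuple(data_columns)
        self._rows: Dict[Tuple[object, ...], Row] = {}

    def _key(self, row: Row) -> Tuple[object, ...]:
        # Build the lookup key from the key columns, in declared order.
        return tuple(row[c] for c in self.key_columns)

    def write(self, row: Row) -> None:
        # Assumed behavior: a write to an existing key overwrites the old row.
        self._rows[self._key(row)] = {c: row[c] for c in self.data_columns}

    def lookup(self, row: Row) -> Optional[Row]:
        # Constant-time access by key; None signals a failed lookup.
        return self._rows.get(self._key(row))


ref = HashReference(key_columns=["cust_id"], data_columns=["name", "region"])
ref.write({"cust_id": 1, "name": "Acme", "region": "FL"})
ref.write({"cust_id": 1, "name": "Acme Corp", "region": "FL"})  # same key again

hit = ref.lookup({"cust_id": 1})
miss = ref.lookup({"cust_id": 2})
```

Unlike a read-only dataset, the same structure can be written to and read from within one run, which is the part of the hash file that the request is really about.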
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

I agree 150%. Unfortunately, the PX paradigm is to never land data. However, a common reference should first be put into a PX dataset so that many shared job streams can reuse it as a reference rather than regenerating it each and every time. The dataset becomes like a hash file in those respects, except that you can't modify the contents of a dataset. You could create a new dataset, but that kind of gums up the works.

Also, the shared-nothing idea means that a "sandbox" is not a feasible concept. You either have to partition the data and every referenced dataset identically, or deploy all of the referenced data to every node.
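
Those two choices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "nodes" are plain dictionary slots, and `node_for`, `partition`, `replicate` and `local_lookups` are hypothetical helper names, not PX operators. The point is that a lookup can only see what sits on its own node, so either the stream and the reference are hash-partitioned on the same key, or every node holds a full copy of the reference.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    # Deterministic hash, so equal keys always land on the same node.
    return zlib.crc32(str(key).encode()) % num_nodes


def partition(rows, key, num_nodes):
    # Option 1: split rows across nodes by hashing the key column.
    parts = defaultdict(list)
    for row in rows:
        parts[node_for(row[key], num_nodes)].append(row)
    return parts


def replicate(rows, num_nodes):
    # Option 2: give every node a complete copy of the reference data.
    return {node: list(rows) for node in range(num_nodes)}


def local_lookups(stream_parts, ref_by_node, key):
    # Each node may only consult the reference rows it holds itself.
    joined = []
    for node, rows in stream_parts.items():
        local = {r[key]: r for r in ref_by_node.get(node, [])}
        for row in rows:
            joined.append({**row, **local[row[key]]})  # KeyError if not local
    return joined


NUM_NODES = 4
reference = [{"cust_id": i, "region": f"R{i % 3}"} for i in range(10)]
stream = [{"cust_id": i % 10, "amount": i} for i in range(25)]

stream_parts = partition(stream, "cust_id", NUM_NODES)
joined_partitioned = local_lookups(
    stream_parts, partition(reference, "cust_id", NUM_NODES), "cust_id")
joined_replicated = local_lookups(
    stream_parts, replicate(reference, NUM_NODES), "cust_id")
```

Both variants resolve every row, but the first only works because the stream and the reference share one partitioning key, and the second pays for a full copy of the reference on each node.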

What it boils down to is that DataStage Server gives you the option of never landing the data, just like PX. But DataStage Server also has the elegant sandbox usage of the hash file, if you so desire. PX has only a very limited sandbox in the dataset, so if you're looking for that approach in PX, maybe expanding the dataset functionality is an option.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
bigpoppa
Participant
Posts: 190
Joined: Fri Feb 28, 2003 11:39 am

Post by bigpoppa »

FYI: back in the day, the output of an OSH step (a smaller unit of work than a PX job) could be put into a virtual dataset. The virtual dataset was kept in memory and could be used as an input to multiple OSH steps within an OSH 'job'. Unlike a server hash file, it couldn't be used by other jobs.

Maybe ASCL should bring back the concept of the virtual dataset.

-BP
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Ascential springs from roots in data warehousing. Back in the '90s, V-Mark acquired Prism, the Inmon company. That's where the Prism Warehouse Executive ETL tool went; it became DS/390. The company then became Ardent Software, and they sold, and continue to sell, data warehouse methodology. Then Informix bought Ardent. Informix had just acquired Red Brick, which was Ralph Kimball's company, along with its high-performance data warehouse database Red Brick and its ETL tool (can't remember the name; was it Evolution/Revolution/Evolve, something like that?).

So there sat Informix, the ultimate merger of Inmon and Kimball technologies, methodologies, and ETL tools. Then Ascential was spun off, and so on until today.

So you go over to Intelligent Enterprise and the readers have voted Kimball their favorite dude.
http://www.intelligententerprise.com/03 ... a1_1.shtml
The guy also wrote the single best-selling book on data warehousing, the original "The Data Warehouse Toolkit", not the newer Lifecycle yada yada.

So if you actually read Kimball's Lifecycle book, he devotes an entire chapter (16) to the concept of a data staging sandbox. He culminates with a typical process organization for a nightly ETL run on page 651. Now compare this with the PX mindset of never landing data. You have to wonder, has Kimball fallen by the wayside? This is a bone of contention of mine: how applicable is PX technology in a Kimball framework? I know Ascential is reading these posts; hopefully someone will go read some of these pages and explain why Kimball is wrong or outdated.
Kenneth Bland

clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

Since this is a wish list, I want both!
I want 'in memory', and persistent 'landed' storage.
Perl has libraries that provide this: memory-resident arrays and hashes that automatically persist to files without explicit developer direction, but that still expose methods for tuning the cache size, refresh interval, and cache method (LRU, round-robin, etc.), and for flushing.
So I want it.
iwantit iwantit iwantit iwantit iwantit iwantit iwantit iwantit iwantit !
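
For anyone who hasn't used that kind of library, here is a rough Python analogue of the idea, not the Perl code itself. It is a sketch under assumptions: `PersistentHash`, `cache_size` and `flush` are hypothetical names chosen for illustration, the backing store is Python's standard `shelve` module (so keys must be strings), writes go straight through to disk, and only an LRU cache method is shown.

```python
import shelve
from collections import OrderedDict


class PersistentHash:
    """Hypothetical dict-like store: LRU cache in memory, shelve file on disk."""

    def __init__(self, path, cache_size=1000):
        self._store = shelve.open(path)   # landed, persistent copy
        self._cache = OrderedDict()       # memory-resident copy, oldest first
        self.cache_size = cache_size      # tunable, as in the wish above

    def __setitem__(self, key, value):
        self._store[key] = value          # write-through: disk is always current
        self._remember(key, value)

    def __getitem__(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        value = self._store[key]          # cache miss: read from disk
        self._remember(key, value)
        return value

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        while len(self._cache) > self.cache_size:
            self._cache.popitem(last=False)  # evict least recently used

    def flush(self):
        self._store.sync()                # force pending writes to the file

    def close(self):
        self._store.close()
```

Something like `h = PersistentHash("/tmp/ref", cache_size=50000)` followed by `h["42"] = {"name": "Acme"}` would then behave like an ordinary hash in the job, and the data would still be there when another job opens the same path later. The path and sizes here are made up.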

Carter