Combining multiple XSD's

irvinew

Post by irvinew »

I hope someone can help me here.

I have 4 xsd documents that are interlinked. Their names are:

1) AcademicRecord_v1.9.0
2) AcademicRecordBatch_v2.1
3) CoreMain_v1.14.0
4) HighSchoolTranscript_v1.5.0

Crazy I know....


I have a test XML file that references the "AcademicRecordBatch" and "HighSchoolTranscript" schemas. I can post the XSD code of the other files; suffice it to say that "CoreMain" and "AcademicRecord" are pulled into the aforementioned files.

I am trying to use the Hierarchical stage with one parser for AcademicRecordBatch and another for HighSchoolTranscript. My problem is that if I don't use the Union step, everything hangs; if I do use the Union step, I don't have any other XSD document to map the parsed output to.

I can run one parser stage at a time and everything seems to map fine, but no rows are produced by the Hierarchical stage.


I'm kind of at a loss here. What is the way to read multiple XSD files to parse one XML file? Nothing seems to want to work. I did find some literature on how to do this, but that documentation had 2 parser stages mapping to yet another XSD file, which I don't have.

What is the preferred way to do what I am trying to do? I can supply the XSD documents later if need be. Does anyone have a good template project to go by?

Any help is appreciated.
Will
eostic

Post by eostic »

Combined XSDs are the norm, so that should be fine. At least one of them probably has an import or include statement, and chances are all four are intertwined.
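
If you want to see how the files point at each other before loading them into the Library Manager, something like the following can list the import/include statements in a schema. This is only a sketch in plain Python, outside DataStage; the sample schema, its namespaces and the file names in it are made up for illustration and are not taken from the four XSDs in this thread.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema language itself, in ElementTree's {uri}name form.
XS = "{http://www.w3.org/2001/XMLSchema}"


def schema_references(xsd_text):
    """Return (kind, namespace, schemaLocation) for each xs:import / xs:include."""
    root = ET.fromstring(xsd_text)
    refs = []
    for kind in ("import", "include"):
        # import/include are direct children of the xs:schema root element.
        for node in root.findall(XS + kind):
            refs.append((kind, node.get("namespace"), node.get("schemaLocation")))
    return refs


# Hypothetical stand-in for one schema; names and namespaces are invented.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:batch">
  <xs:import namespace="urn:example:core" schemaLocation="CoreMain.xsd"/>
  <xs:include schemaLocation="BatchTypes.xsd"/>
</xs:schema>"""

for ref in schema_references(SAMPLE_XSD):
    print(ref)
```

Running the same function over each real file (read from disk) should show which schema pulls in which, and that is the single related tree you would expect to see after the import.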

Were they in a single zip file? You can import the whole zip file into a single library entry....or import each resource. Ultimately, they should result in a single related "Tree" within the Library Manager. Be sure that you have that result.

The Hierarchical Stage has a steep learning curve. Keep at it. Things like "no rows retrieved", etc. are normal when first working with it. Start by trying to get only a simple list, such as the repeating instances of (say) the very first repeating node under the root.

But let's ask first what you need to do with this xml/xsd. Are you just reading xml documents? ...or do you need also to write them? ...or only write them? ...or both?

How much data do you have? Meaning --- how "big" are the xml documents that you are creating, or reading (how big is a typical single document)? How many of them do you have to read in a single run? 3, 10, 10000?

How deep are you going into the document? Which "paths"? This will be important to know regardless of your approach. When you open a document instance itself, which nodes, sub-nodes and sub-sub-sub nodes are you trying to retrieve.....

Have you validated your test documents? Something useful to do, regardless of this activity. There are many xml "validation" tools out there that will formally validate your document against the xsd. That is often a reason why you will get zero rows, although there could be mapping issues also.
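
As a quick pre-check before running a proper validator, the sketch below confirms that a document is well-formed and reports any element namespaces that none of the supplied schemas declares as a targetNamespace. It is not XSD validation (the Python standard library has no XSD validator), and the sample document, schemas and namespaces are invented for illustration.

```python
import xml.etree.ElementTree as ET


def target_namespace(xsd_text):
    """Return the targetNamespace declared on the xs:schema root (or None)."""
    return ET.fromstring(xsd_text).get("targetNamespace")


def element_namespaces(xml_text):
    """Parse the instance (raises ET.ParseError if it is not well-formed)
    and return the set of namespaces used by its elements."""
    namespaces = set()
    for elem in ET.fromstring(xml_text).iter():
        # ElementTree stores qualified names as "{namespace}localname".
        if elem.tag.startswith("{"):
            namespaces.add(elem.tag[1:].split("}", 1)[0])
    return namespaces


def uncovered_namespaces(xml_text, xsd_texts):
    """Namespaces used in the instance that no supplied schema targets."""
    covered = {target_namespace(xsd) for xsd in xsd_texts}
    return element_namespaces(xml_text) - covered


# Hypothetical instance document and schemas; all names are made up.
SAMPLE_XML = """<b:Batch xmlns:b="urn:example:batch" xmlns:t="urn:example:transcript">
  <t:Transcript><t:Student>Ann</t:Student></t:Transcript>
</b:Batch>"""

SAMPLE_XSDS = [
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="urn:example:batch"/>',
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="urn:example:core"/>',
]

print(uncovered_namespaces(SAMPLE_XML, SAMPLE_XSDS))
```

An empty result only means every namespace has a schema behind it; a formal validation tool is still needed to check the structure and the data types.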

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
irvinew

Post by irvinew »

eostic wrote:Combined xsd's is the norm. So that should be fine....at least one or more of them probably has an import or include statement. Chances are they are all four intertwined

The Hierarchical Stage has a steep learning curve. Keep at it. Things like "no rows retrieved", etc. are normal when first working with it. Only try to get a simple list, such as the repeating instances of (say) the very first repeating node under the root.

I have all 4 XSD documents imported fine; the schema manager doesn't cry when I do that.

I did finally get some things to work. When drilling down into the table definitions that were created, I found a lot of odd names for my created data types; I figure it was because of the depth of nested types in the XSD document.

So my rookie mistake was to think that the ETL tool could automatically map that to my test file. I didn't bother looking at what table definitions were created.
On that subject: I took the sample test file that will be supplied by the department of education and created table definitions from it. I then used the XSD document to validate the XML and mapped that test document to the definitions created from the same XML document.

In the examples I found, they used an extra XSD document in the Union step to map to, and I don't have that. I have 2 Hierarchical stages since there are 2 referenced XSD documents in my test XML. The head document, "AcademicRecordBatch", is not really needed: as indicated in its comments, all the data in it is public data describing who is sending the document and who is receiving it; it doesn't involve students. So really I could just delete one of them.

What I am doing is reading an XML file in; that file contains student transcript data, and from there I will update our transcript tables accordingly.

I don't know how deep the data will be or how many records there will be from one XML submission to the next.

Once it is clear I really don't have to use a Union layer in the hierarchical stage then I am off to the races.


Thanks for your help.
Will
eostic

Post by eostic »

Here are some other guidelines to research/consider/think-about/play-with/test....

1. For any given output link, you can pull in everything from one "path" of your hierarchy. So if you have (say) school, with department nodes under that, and student nodes under that, and classes under that and grades under that, they could all be retrieved on one output link. BUT, if you also have an "equipment" node under school, then school + equipment would be on its "own" output link.

2. ALWAYS start your Assembly development and mapping with the LOWEST level node that you care about (at that final mapping step). ...meaning...in the example above, if I want to get "grades", I find the blue "list icon" for the grades node, and map that immediately to the blue "output link" icon. Get some elements and attributes from that node mapped to some output columns and THEN, once you have tested that and it is working, start adding "parent" and "grandparent", etc. node columns to that mapping.

3. ALWAYS (especially when first using the Stage) do your mapping by using the "more...." option, and then expanding the little tree that you get. It will keep things straight. The "suggested" mappings you see are nice, but can get you into trouble with schemas you aren't familiar with or when first using the Stage.

4. Spend some time with the XML Schema Views feature in the Library Manager. This was added a few years ago but doesn't get used much. It allows you to basically "chop down" the schema. Focus only on the trees and nodes you care about. It DRAMATICALLY impacts performance, especially for HUGE schemas.

5. Caveat for notes number 1 and 2.....You can only have one path per output link, and you need to start your mapping action at the lowest node level you want..... BUT..... be aware that the Stage expects the paths to be "complete". This is kind of like normal JOIN defaults for relational databases and standard SQL. If you do a JOIN, but there are no "matches", you won't get any of the parents. Same here. If you, for example, define the link to get all the "Grades" in my example above, but you have a node in the xml for a student who didn't get any grades, or maybe a department node that has no students, you will NOT retrieve that department on that output link......... and would need a separate output link that is only as deep as "that" parent or grandparent node in order for it to be retrieved. This condition depends largely on the application and the xsd in question. Often, schemas require instances to be present, and it becomes a non-issue...but something to be aware of.

6. Keep the size in mind. If these are not huge documents or don't have thousands of elements, or don't need fancy validation, consider the xmlInput Stage instead. Simpler and for small documents, may be faster. ...and xmlInput Stage does not have the issue with point #5.
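
To make points 1 and 5 concrete, here is a small sketch in plain Python. It is not how the Hierarchical Stage is implemented; it only mimics the behaviour described above, where each output link follows one path and a parent shows up only when the path can be followed all the way down. The document, element names and `name` attributes are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one path school > department > student > grade,
# plus a sibling "equipment" node directly under school.
DOC = """<school name="Central">
  <department name="Math">
    <student name="Ann">
      <grade name="Algebra"/>
      <grade name="Geometry"/>
    </student>
    <student name="Bob"/>
  </department>
  <department name="Art"/>
  <equipment name="projector"/>
</school>"""


def rows_for_path(root, path):
    """Flatten one parent-to-leaf path into rows, inner-join style.

    A row is emitted only when every level of the path exists, so parents
    without children at the lowest level produce no rows at all.
    """
    rows = []

    def walk(node, remaining, context):
        if not remaining:
            rows.append(context)
            return
        for child in node.findall(remaining[0]):
            walk(child, remaining[1:], context + (child.get("name"),))

    walk(root, path, (root.get("name"),))
    return rows


root = ET.fromstring(DOC)

# "Output link" 1: mapped from the lowest node (grade) upward.
grade_rows = rows_for_path(root, ["department", "student", "grade"])
# "Output link" 2: a shallower path, needed to see every department.
department_rows = rows_for_path(root, ["department"])
# "Output link" 3: equipment sits on a different path, so it is separate.
equipment_rows = rows_for_path(root, ["equipment"])

print(grade_rows)
print(department_rows)
print(equipment_rows)
```

In the grade rows, Bob (no grades) and the Art department (no students) are missing, which is the JOIN-like effect described in point 5; the shallower department path is what brings Art back.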

...and... make use of this invaluable redbook:

http://www.redbooks.ibm.com/redbooks/pdfs/sg247987.pdf


Ernie
Ernie Ostic
