JSON file of 717 MB unable to import in DataStage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Vrisha
Premium Member
Posts: 60
Joined: Sat Jul 15, 2017 9:32 pm
Location: Texas,USA

JSON file of 717 MB unable to import in DataStage

Post by Vrisha »

Scenario: I have JSON files of 717 MB and above.
When I try to do the 'Import New Resource' using the steps below, it throws an error: 'TaskTime.json' UPLOAD FAILED. Internal Error.

Using Schema Library Manager ---> New Library ---> TaskTime ---> Import New Resource ---> select the file 'TaskTime.json' ---> it throws the error.

I didn't find any problem importing the schema for files smaller than 717 MB. I checked with the application team and they confirmed that the file is in the correct format.

Is there a file size limit in DataStage for JSON files? Is there an option to increase or set that limit in DataStage?

Please let me know. Thanks.
Suja
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

"Import new resource" is only a metadata import function...... the Job itself, once you have it coded and developed, can handle massive JSON documents. Just cut down the json document that you have so that you have a sample with at least "one complete set of nodes" (with at least one instance of all the properties you desire to parse or create). Do the import --- it will dynamically create the schema that you need, and continue to develop your Assembly.

You don't need to "import" that large JSON document, and it is quite unlikely that the 700+ meg is entirely unique instances of the properties, only one of each. More likely, your 700+ meg document is a full document, complete with many repeating sets of actual data, transactions, etc.
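
For illustration only, here is a minimal way to carve out such a sample. This is a rough sketch, assuming the document is a single top-level JSON array that fits in memory on the machine where you run it; the file names are placeholders, not your actual paths:

import json

# Rough sketch: read the full document (assumed to be one top-level JSON array),
# keep only the first record, and write it out as a small sample file to feed
# the Schema Library Manager import. File names are placeholders.
with open("TaskTime.json", "r", encoding="utf-8") as src:
    records = json.load(src)        # the entire document loaded as a Python list

sample = records[:1]                # one complete set of properties is enough

with open("TaskTime_sample.json", "w", encoding="utf-8") as dst:
    json.dump(sample, dst, indent=2)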

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Vrisha
Premium Member
Posts: 60
Joined: Sat Jul 15, 2017 9:32 pm
Location: Texas,USA

Post by Vrisha »

Thanks for your prompt response, Ernie. I will get back to you on this.
Suja
Vrisha
Premium Member
Posts: 60
Joined: Sat Jul 15, 2017 9:32 pm
Location: Texas,USA

Post by Vrisha »

Hi Ernie,

While still waiting for my membership upgrade to Premium Member, I want to clarify the following.

Instead of using the Schema Library Manager to import the metadata for the TaskBaseline JSON (since it fails due to file size), I used the Assembly Editor in the Hierarchical Data stage to create the output columns by pointing to the same-named columns in another library, Task\root. (The Task library has the same columns except 'TimeByDay', which is required in the TaskBaseline module.)

I mapped TimeByDay (TaskBaseline) to the FinishDate column (Task) in the output mapping of the Assembly Editor.

But while running the job, all the records got dropped with a warning saying that the incoming TimeByDay is '0', even though Task.json has date values in TimeByDay.

So I tried exporting TaskBaseline.json, which has a similar structure to Task.json, renamed the element as shown below, and saved it as TaskBaseline.json.

From
<xs:element name='FinishDate' minOccurs='0' nillable='true' type='xs:string'/>

To
<xs:element name='TimeByDay' minOccurs='0' nillable='true' type='xs:string'/>


While trying to 'Import New Resource' using the new TaskBaseline.json file, it throws the error below:

'TaskBaseline.json' UPLOAD FAILED. Unexpected character, location = 0, value = '<'

What could be the reason for this? Please let me know.
Suja
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Cut down your 717 MB JSON to one set of properties and import it. You should be fine. The content you posted below is XML, not JSON, so I can't speculate on what the error might be after your changes.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Vrisha
Premium Member
Posts: 60
Joined: Sat Jul 15, 2017 9:32 pm
Location: Texas,USA

Post by Vrisha »

Thanks for your reply, Ernie.

What do you mean by 'cut down the 717 MB file to one set of properties'? Sorry, I didn't understand. Please let me know.
Suja
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's no reason to import the whole dang thing. Cut the size down: create a new file with just a single set of properties, a single example of the data, and then import that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Vrisha
Premium Member
Posts: 60
Joined: Sat Jul 15, 2017 9:32 pm
Location: Texas,USA

Post by Vrisha »

Thank you, chulett. Got it; I will try it and get back with the result.
Suja
Vrisha
Premium Member
Posts: 60
Joined: Sat Jul 15, 2017 9:32 pm
Location: Texas,USA

Post by Vrisha »

The problem is resolved.

As suggested by chulett and Ernie, I took one record out of the huge JSON file
(277,775 records), with the metadata structure shown below, and saved it as TaskBaseline.json.
---------------------------------------------------------------------------------------
[{
"__metadata": {
"id": "https://dartorg.sharepoint.com/sites/pw ... A00\u0027)",
"uri": "https://dartorg.sharepoint.com/sites/pw ... A00\u0027)",
"type": "ReportingData.TaskBaselineTimephasedData"
},
"Project": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... 7)/Project"
}
},
"Task": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... 0027)/Task"
}
},
"TaskBaselines": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... kBaselines"
}
},
"ProjectId": "59f88501-bd76-e711-80cb-eba85625f89a",
"TaskId": "ebf88501-bd76-e711-80cb-eba85625f89a",
"TimeByDay": "\/Date(1502928000000)\/",
"BaselineNumber": 0,
"ProjectName": "161 - Proterra Window Decals",
"TaskBaselineBudgetCost": "0.000000",
"TaskBaselineBudgetWork": "0.000000",
"TaskBaselineCost": "0.000000",
"TaskBaselineFixedCost": "0.000000",
"TaskBaselineModifiedDate": "\/Date(1501595954740)\/",
"TaskBaselineWork": "16.000000",
"TaskName": "161 - Proterra Window Decals"
}]

----------------------------------------------------------------------------------

Then I imported the metadata using the Schema Library Manager by pointing to the new small file (TaskBaseline.json).
Then, in Edit Assembly of the Hierarchical Data stage, I pointed the 'JSON source' to the huge JSON source file using the 'Single file' option and did the mapping.

The job ran fine without any error.

Thank you for the support, chulett and Ernie.
Suja
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Congrats! A note for others reading this in the future: be sure your cut-down version includes at least two of any repeating subnodes, so that the metadata interpreter knows those subnodes repeat and should be defined as a list.
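
For example (an illustrative structure only; these property names are made up and not from the file above), a cut-down sample like this tells the importer that "Assignments" repeats and should become a list:

[{
  "TaskId": "t-001",
  "Assignments": [
    { "AssignmentId": "a-001", "Work": "8.000000" },
    { "AssignmentId": "a-002", "Work": "4.000000" }
  ]
}]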

Good work!

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>