Json File 717MB unable to import in Datastage
Scenario: I have JSON files of size 717 MB and above.
When I try to do the 'Import New Resource' using the steps below, it throws an error: 'TaskTime.json' UPLOAD FAILED. Internal Error.
Using Schema Library Manager ---> New library option ---> TaskTime - Import New Resource ---> Select the file 'Tasktime.json' ---> Throws an error.
I didn't find any problem importing the schema for files smaller than 717 MB. I checked with the application team and they confirmed that the file is in the correct format.
Is there a file size limit in DataStage for JSON files? Is there an option to increase or set the file size limit in DataStage?
Please let me know. Thanks.
Suja
"Import new resource" is only a metadata import function. The Job itself, once you have it coded and developed, can handle massive JSON documents. Just cut down the JSON document so that you have a sample with at least "one complete set of nodes" (with at least one instance of every property you want to parse or create). Do the import; it will dynamically create the schema that you need, and you can continue to develop your Assembly.
You don't need to "import" that large JSON document, and it is quite unlikely that the 700+ MB is entirely unique instances of the properties, with only one of each. More likely, your 700+ MB document is a full document, complete with many repeating sets of actual data, transactions, etc.
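If it helps, the trimming step can be done with a few lines of script rather than by hand. A minimal sketch (filenames hypothetical, and it assumes the export is a top-level JSON array; loading the full file needs enough memory for a one-off run):

```python
import json

def extract_sample(src_path, dst_path, n=1):
    """Load a top-level JSON array and write the first n records back out
    as a small sample file suitable for a metadata-only import."""
    with open(src_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    with open(dst_path, "w", encoding="utf-8") as f:
        json.dump(records[:n], f, indent=2)
    return len(records)

# Demo on a tiny stand-in file; the real source would be the 717 MB export.
demo = [{"TimeByDay": "/Date(1502928000000)/", "BaselineNumber": 0},
        {"TimeByDay": "/Date(1503014400000)/", "BaselineNumber": 0}]
with open("demo_full.json", "w", encoding="utf-8") as f:
    json.dump(demo, f)
total = extract_sample("demo_full.json", "demo_sample.json")
```

The resulting small file carries the full property structure, which is all the schema import needs.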
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
Hi Ernie,
While still waiting for my membership upgrade to Premium member, I want to clarify the below.
Instead of using the Schema Library Manager to import the metadata for the JSON TaskBaseline (as it fails due to file size), I used the Assembly Editor in the Hierarchical Data stage to create the output columns by pointing to the same-named columns in another library, Task\root. (The Task library has the same column names except 'TimeByDay', which is required in the TaskBaseline module.)
I mapped TimeByDay (TaskBaseline) to the FinishDate column (Task) in the output mapping of the Assembly Editor.
But while running the job, all the records got dropped with a warning saying that the incoming TimeByDay is '0', even though Task.json has date values in TimeByDay.
So I exported the TaskBaseline schema, which has a similar structure to Task.json, renamed the element as below, and saved it as TaskBaseline.json.
From
<xs:element name='FinishDate' minOccurs='0' nillable='true' type='xs:string'/>
To
<xs:element name='TimeByDay' minOccurs='0' nillable='true' type='xs:string'/>
While trying to 'Import New Resource' using the new Task.json file, it throws the error below:
'TaskBaseline.json' UPLOAD FAILED. Unexpected character, location = 0, value = '<'
What could be the reason for this? Please let me know.
Suja
Cut down your 717 MB JSON to one set of properties and import it. You should be fine. The result below is XML, not JSON, so I can't speculate on what the error might be after your changes.
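A quick local check makes this particular error self-explanatory: a JSON parser reporting an unexpected '<' at location 0 has almost certainly been handed an XML/XSD document, since JSON text starts with '{' or '['. A minimal sketch (filenames hypothetical) that peeks at the first non-whitespace character:

```python
def sniff_format(path):
    """Rough format guess from the first non-whitespace character:
    JSON text starts with '{' or '[', XML/XSD starts with '<'."""
    with open(path, "r", encoding="utf-8") as f:
        head = f.read(200)
    for ch in head:
        if ch.isspace():
            continue
        if ch == "<":
            return "xml"
        if ch in "{[":
            return "json"
        return "unknown"
    return "empty"

# Demo files (contents abbreviated, names hypothetical):
with open("looks_like_schema.xsd", "w", encoding="utf-8") as f:
    f.write("<xs:element name='TimeByDay' type='xs:string'/>")
with open("looks_like_data.json", "w", encoding="utf-8") as f:
    f.write('[{"TimeByDay": "/Date(1502928000000)/"}]')
```

Running this over the renamed file would show it is a schema export (XML), not JSON data, which is why the JSON resource import rejects it.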
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
The problem is resolved.
As mentioned by chulett and Ernie, I took 1 record out of the huge JSON file
(277,775 records) with the metadata structure below and saved it as TaskBaseline.json.
---------------------------------------------------------------------------------------
[{
"__metadata": {
"id": "https://dartorg.sharepoint.com/sites/pw ... A00\u0027)",
"uri": "https://dartorg.sharepoint.com/sites/pw ... A00\u0027)",
"type": "ReportingData.TaskBaselineTimephasedData"
},
"Project": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... 7)/Project"
}
},
"Task": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... 0027)/Task"
}
},
"TaskBaselines": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... kBaselines"
}
},
"ProjectId": "59f88501-bd76-e711-80cb-eba85625f89a",
"TaskId": "ebf88501-bd76-e711-80cb-eba85625f89a",
"TimeByDay": "\/Date(1502928000000)\/",
"BaselineNumber": 0,
"ProjectName": "161 - Proterra Window Decals",
"TaskBaselineBudgetCost": "0.000000",
"TaskBaselineBudgetWork": "0.000000",
"TaskBaselineCost": "0.000000",
"TaskBaselineFixedCost": "0.000000",
"TaskBaselineModifiedDate": "\/Date(1501595954740)\/",
"TaskBaselineWork": "16.000000",
"TaskName": "161 - Proterra Window Decals"
}]
----------------------------------------------------------------------------------
Then I imported the metadata using the Schema Library Manager by pointing to the new small file (TaskBaseline.json).
Then, in Edit Assembly of the Hierarchical Data stage, I pointed the 'JSON Source' at the huge JSON source file via the 'Single file' option and did the mapping.
The job ran fine without any errors.
Thank you for the support, chulett and Ernie.
Suja
Congrats! A note for others reading this in the future: be sure your cut-down version includes at least 2 of any repeating subnodes, so that the metadata interpreter knows those subnodes repeat and should be defined as a list.
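To illustrate the tip above with a sketch (the 'Assignments' subnode and its fields are invented for the example, not from the real export): keeping two entries in any nested array lets a schema inferrer see it as repeating rather than as a single nested record.

```python
import json

# Hypothetical trimmed sample: 'Assignments' is kept with TWO entries so
# a metadata importer infers a list, not a one-off nested structure.
sample = [{
    "ProjectId": "59f88501-bd76-e711-80cb-eba85625f89a",
    "Assignments": [
        {"TaskId": "a1", "Work": "8.000000"},
        {"TaskId": "a2", "Work": "4.000000"},
    ],
}]
with open("sample_with_repeats.json", "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=2)
```

With only one entry in the array, some importers would model it as a single child node and later drop or mangle the repeats in the full data.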
Good work!
Ernie
Ernie Ostic
blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>