Parse JSON using a Schema

Posted: Mon Nov 05, 2018 2:03 pm
by jneasy
Hi,

I have been trying to parse large JSON files (200MB+), which are delivered daily, using the schema supplied with them.

I know that in the Hierarchical stage the assembly editor can infer a schema when you import a JSON data file, and I have been able to parse a file with that approach. The problem is that I have no guarantee that any given 200MB+ file contains every field defined in the schema.

My question: has anyone been able to import a JSON schema and use it to parse JSON data?

I have even tried using a simple Person example found at https://json-schema.org/learn/miscellan ... mples.html

Using the sample data, the first parser step produces the following in the Downstream Output Test Data (the schema is shown first, then the output). As you can see, the firstName, lastName and age items are not being populated with the name and age values:

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <InputLinks/>
  <result>
    <root>
      <__24_id>
        <@originalName>$id</@originalName>
      </__24_id>
      <__24_schema>
        <@originalName>$schema</@originalName>
      </__24_schema>
      <@type>object</@type>
      <properties>
        <firstName>
          <@type>object</@type>
          <@@isPresent>false</@@isPresent>
        </firstName>
        <lastName>
          <@type>object</@type>
          <@@isPresent>false</@@isPresent>
        </lastName>
        <age>
          <@type>object</@type>
          <@@isPresent>false</@@isPresent>
        </age>
        <@type>object</@type>
        <@@isPresent>false</@@isPresent>
      </properties>
    </root>
  </result>
</top>
Cheers,
jneasy.

Posted: Tue Nov 13, 2018 10:46 pm
by chulett
Don't like seeing posts without a single reply so here I am... wondering if you made any progress with this.

Posted: Thu Nov 15, 2018 1:17 pm
by eostic
JSON Schema has seen some success but is not widespread, and it lacks the formality of XML Schema. The Hierarchical Stage works from a JSON document, so the best suggestion is to find a "complete" one that has at least one instance of each element and, preferably, two or more instances in any array node that can carry multiple values. Import that and it will mimic a schema for you, and then you can get "reasonable" validation functionality from this "inferred" schema.
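
One way to judge whether a candidate file is "complete" is to compare the property paths declared in the JSON schema with the paths actually present in the file. The following is a minimal Python sketch of that idea, not part of DataStage. The function names are hypothetical, and it assumes the schema uses only plain `properties` and `items` keywords (no `$ref`, `allOf` and so on).

```python
def schema_paths(schema, prefix=""):
    """Yield a dotted path for every property the schema declares."""
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        # Array elements are marked with "[]" in the path.
        yield from schema_paths(schema["items"], prefix + "[]")
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        yield path
        yield from schema_paths(sub, path)


def document_paths(value, prefix=""):
    """Yield a dotted path for every key actually present in the data."""
    if isinstance(value, dict):
        for name, sub in value.items():
            path = f"{prefix}.{name}" if prefix else name
            yield path
            yield from document_paths(sub, path)
    elif isinstance(value, list):
        for item in value:
            yield from document_paths(item, prefix + "[]")


def missing_fields(schema, document):
    """Return schema paths that never occur in the document, sorted."""
    return sorted(set(schema_paths(schema)) - set(document_paths(document)))


# Illustrative data only: a cut-down Person schema and an incomplete record.
person_schema = {
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},
        "lastName": {"type": "string"},
        "age": {"type": "integer"},
    },
}
incomplete = {"firstName": "John", "age": 21}
print(missing_fields(person_schema, incomplete))
```

An empty result suggests the file touches every declared field, which is what the schema inference needs.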

Ernie

Posted: Thu Nov 15, 2018 8:08 pm
by jneasy
@chulett: No progress so far. My next thought is to generate some test data based on the JSON schema. This is where I run into my next problem: the schema is full of cascading references.

@eostic: I thought someone would come back with the suggestion to find a "complete" JSON file. I've been working on this premise so far and it is mostly working, but I think I will need to dummy up a "complete" file to finish all the mappings.

Appreciate the help guys!

I'm going to mark this topic as Workaround, the workaround being to generate a "complete" JSON file.
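
Generating the "complete" file from a schema with cascading references could be sketched roughly as below. This is an illustration in Python under stated assumptions, not a DataStage feature: the function names are hypothetical, only local references (`#` and `#/...`) are followed, and constraints such as `maximum`, `pattern` or `format` are ignored. Arrays get two elements so that an inferred schema sees them as repeating.

```python
def resolve_ref(root, ref):
    """Follow a local reference such as '#/definitions/address'."""
    if ref == "#":
        return root
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are handled here: {ref}")
    node = root
    for part in ref[2:].split("/"):
        # Undo JSON Pointer escaping: ~1 means '/', ~0 means '~'.
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def make_sample(schema, root=None, depth=0, max_depth=20):
    """Build a dummy value that populates every property in the schema."""
    root = schema if root is None else root
    if not isinstance(schema, dict) or depth > max_depth:
        return None  # boolean schemas, or a guard against endless self-reference
    if "$ref" in schema:
        target = resolve_ref(root, schema["$ref"])
        return make_sample(target, root, depth + 1, max_depth)
    if "enum" in schema:
        return schema["enum"][0]
    kind = schema.get("type")
    if isinstance(kind, list):
        # e.g. ["string", "null"]: prefer the first non-null type
        kind = next((k for k in kind if k != "null"), "null")
    if kind == "object" or "properties" in schema:
        return {name: make_sample(sub, root, depth + 1, max_depth)
                for name, sub in schema.get("properties", {}).items()}
    if kind == "array":
        items = schema.get("items", {})
        if isinstance(items, list):
            items = items[0] if items else {}
        # Two elements, so the node is seen as repeating when inferred.
        return [make_sample(items, root, depth + 1, max_depth) for _ in range(2)]
    if kind == "string":
        return "sample"
    if kind == "integer":
        return max(1, schema.get("minimum", 1))
    if kind == "number":
        return 1.5
    if kind == "boolean":
        return True
    return None
```

The result can be written out with `json.dump` and imported into the assembly editor in place of a real data file. Branches under `oneOf` or `anyOf` would still need to be filled in by hand.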

Posted: Fri Nov 16, 2018 11:32 am
by eostic
That's the best approach, since there is no formal standard for JSON schemas.

Note: when completing your document, be careful to fully represent your arrays. If you have a truly repeating subnode, put more than one value in that node array. I haven't carefully checked our JSON-to-XML-schema functionality, but I've done similar things with an open source tool called trang, which is useful for converting "complete" XML documents into XML schema (also needed for this Stage). Trang will treat a singly occurring node as OCCURS=1, which may not be correct for a given situation.
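
A quick way to spot under-populated arrays before importing the file is to walk the document and report every array with fewer than two elements. This is a small Python sketch with a hypothetical function name. It flags each short occurrence individually, so it is deliberately conservative.

```python
def short_arrays(value, path="$"):
    """Yield (path, length) for every array holding fewer than two elements."""
    if isinstance(value, dict):
        for name, sub in value.items():
            yield from short_arrays(sub, f"{path}.{name}")
    elif isinstance(value, list):
        if len(value) < 2:
            yield path, len(value)
        for index, item in enumerate(value):
            yield from short_arrays(item, f"{path}[{index}]")


# Illustrative document: one "lines" array has a single entry, "tags" is empty.
document = {
    "orders": [
        {"lines": [{"sku": "a"}]},
        {"lines": [{"sku": "b"}, {"sku": "c"}]},
    ],
    "tags": [],
}
for location, length in short_arrays(document):
    print(location, length)
```

Any path it reports is a candidate for a second dummy value in the "complete" file.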

Ernie