openMINDS schema discussions

visakh · December 8, 2020, 10:24am

Discussion thread for data repositories using openMINDS schema #openminds
openMINDS: https://github.com/HumanBrainProject/openMINDS
openMINDS core: https://github.com/HumanBrainProject/openMINDS_core
openMINDS SANDS: https://github.com/HumanBrainProject/openMINDS_SANDS
openMINDS controlledTerms: https://github.com/HumanBrainProject/openMINDS_controlledTerms
openMINDS generator: https://github.com/HumanBrainProject/openMINDS_generator

Fred · December 11, 2020, 7:57am

Hi,

in the context of the recent #openminds model release, I am trying to build a first metadata instance for one of our datasets following the openMINDS model.

So I suppose that metadata instances, encoded in #json-ld according to the openMINDS model, should be validated against the openMINDS #json-schema schemata generated from the schema-templates found in the openMINDS repositories, right?

I noticed that the openMINDS_generator repository contains scripts to validate the generated schemata using the jsonschema #python lib. So I was wondering whether, similarly, there is any script or helper class to validate metadata instances against the collection of #openMINDS schemata?

Still, it isn’t clear to me what the best approach would be in the openMINDS case, where validating a single metadata instance means validating different nodes against different schemas. Is it better to store the nodes in separate documents or to consolidate them all into one complete JSON document? (Both could be valid #json-ld docs I suppose, but maybe one is better than the other for the validation process?)

Btw, I could not find test metadata examples in the repos; where should I look? (Or, even better, could someone share a real-life example of a dataset metadata instance? There might not be many out there, since the new version of openMINDS has just been released.)

Thanks for your help,

LyubaZ · December 14, 2020, 9:03pm

Dear Fred,

first, sorry for the late reply. I had too much on my desk…

Yes, you are right: the metadata instances encoded in #json-ld should be validated against the openMINDS #json-schemata generated from the schema-templates of openMINDS (currently we only support this format and validation).

The first validation step (checking that each #json-ld document is valid in itself, without checking whether the linked json-ld instances are of the correct schema type) can be done purely with the jsonschema #python lib, since at that level the documents only need to fulfill the requirements defined in normal #json-schema. The second validation step (checking that the linked json-ld instances are of the correct schema type as defined in openMINDS) is, or will be, also handled by the generator. I am not sure whether Stefan has already implemented this part; I need to check with him or Oli.

For separate json-ld documents vs. one big json-ld doc: the major difference lies in reusing instances in other contexts, as is frequently done in graph databases (e.g., a person can be an author of multiple datasets, but remains the same person). Each instance is indexed and can be independently referenced by other instances. Merging everything into one document complicates this feature. Oli can correct me if I’m wrong. I’ll ask him to join this conversation.
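To illustrate the reuse argument, here is a minimal Python sketch, not taken from the openMINDS repositories. It assumes that links are written as nested `{"@id": ...}` objects; the identifiers, the type names, the `author` and `givenName` properties, and the helper names `linked_ids` and `referrers` are all hypothetical.

```python
def linked_ids(value):
    """Yield every "@id" found in link objects nested below `value`."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
        for child in value.values():
            yield from linked_ids(child)
    elif isinstance(value, list):
        for item in value:
            yield from linked_ids(item)


def referrers(target_id, documents):
    """Return the "@id" of every document that links to `target_id`."""
    found = []
    for doc in documents:
        links = set()
        for key, value in doc.items():
            if key != "@id":  # skip the document's own identifier
                links.update(linked_ids(value))
        if target_id in links:
            found.append(doc["@id"])
    return found


# Hypothetical instances: one person document, reused by two datasets.
person = {
    "@id": "https://example.org/person/1",
    "@type": "https://example.org/types/Person",
    "givenName": "Ada",
}
dataset_a = {
    "@id": "https://example.org/dataset/a",
    "@type": "https://example.org/types/Dataset",
    "author": [{"@id": person["@id"]}],
}
dataset_b = {
    "@id": "https://example.org/dataset/b",
    "@type": "https://example.org/types/Dataset",
    "author": [{"@id": person["@id"]}],
}

print(referrers(person["@id"], [person, dataset_a, dataset_b]))
```

Because the person lives in its own document, both datasets point at the same `@id` instead of each carrying a copy.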

The examples are still missing, sorry for this. I will try to push a small example this week and hopefully provide a complete showcase (of a real dataset) before Xmas, or at least this year. The showcase might be provided via another channel or on GitHub as well. I’ll update you on the status of the examples by the end of this week.

I hope this already helps a bit?

Cheers

Fred · December 15, 2020, 2:17am

Dear Lyuba,

Thanks for your answer, it helps indeed.

I think merging everything into one document also complicates validation!

In the end, for my metadata instance I am creating separate #json-ld document nodes, one for each distinct #openminds schema, since, in my understanding, the jsonschema #python lib can only validate a node document against a single schema.
Therefore the “2nd validation step” can be performed by iterating over those separate #json-ld nodes, finding the corresponding openMINDS schema by parsing the @type property, and validating with the jsonschema lib.
But with such separate, independent validations, there is presumably no proper check that all referenced nodes are actually defined, and defined only once with the proper @type. So I guess some extra code needs to be added for that purpose.
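The loop and the extra reference check described above could be sketched roughly as follows. This is only an illustration under assumptions: `@type` is taken to be a single string, links are taken to be nested `{"@id": ...}` objects, and the names `validate_nodes`, `reference_problems`, `schemas_by_type`, and the `ex:` identifiers are hypothetical. The validator is passed in as a function; with the jsonschema lib installed it could be `jsonschema.validate`.

```python
from collections import Counter


def linked_ids(value):
    """Yield every "@id" found in link objects nested below `value`."""
    if isinstance(value, dict):
        if "@id" in value:
            yield value["@id"]
        for child in value.values():
            yield from linked_ids(child)
    elif isinstance(value, list):
        for item in value:
            yield from linked_ids(item)


def outgoing_links(node):
    """Return the set of "@id" values that `node` links to."""
    links = set()
    for key, value in node.items():
        if key != "@id":  # skip the node's own identifier
            links.update(linked_ids(value))
    return links


def validate_nodes(nodes, schemas_by_type, validate):
    """Validate each node against the schema selected by its "@type"."""
    problems = []
    for node in nodes:
        schema = schemas_by_type.get(node.get("@type"))
        if schema is None:
            problems.append(f"{node.get('@id')}: no schema for @type {node.get('@type')!r}")
            continue
        try:
            validate(node, schema)
        except Exception as err:  # broad on purpose: depends on the validator used
            problems.append(f"{node.get('@id')}: {err}")
    return problems


def reference_problems(nodes):
    """Report nodes defined more than once and links to undefined nodes."""
    counts = Counter(node["@id"] for node in nodes)
    problems = [f"{node_id}: defined {n} times" for node_id, n in counts.items() if n > 1]
    for node in nodes:
        for target in sorted(outgoing_links(node)):
            if target not in counts:
                problems.append(f"{node['@id']}: links to undefined node {target}")
    return problems


# Hypothetical collection: the dataset links to a person that is never defined.
nodes = [
    {"@id": "ex:p1", "@type": "ex:Person", "givenName": "Ada"},
    {"@id": "ex:d1", "@type": "ex:Dataset", "author": [{"@id": "ex:p1"}, {"@id": "ex:p9"}]},
]
print(reference_problems(nodes))
```

Checking that a linked node also has the proper `@type` would need to know which types each property allows, which this sketch leaves out.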

Anyhow I am interested to learn more about the instance validation feature (to be?) included in the generator.

Looking forward to the complete Xmas showcase!

Thanks

LyubaZ · December 15, 2020, 8:04am

Dear Fred,

I’m glad I could help a bit. And yes, one json-ld document would also complicate the validation.

For the validation, the generator will (and I think this is not yet implemented) help you to identify / generate the json-lds that belong to one metadata collection, meaning it defines the scope of the validation for step 2.

With that I think the complete validation workflow covers everything (with one minor thing missing):

1. check if each single #json-ld within the collection is correct according to its schema
2. check for each single #json-ld within the collection if it links to json-lds of the correct type
3. check if all json-lds form an actual connected graph structure, to avoid loose instances without any connections (missing, I think)
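The last two checks in this list could look roughly like the following Python sketch. It is not the generator's implementation. It assumes that `@type` is a single string and that links are top-level `{"@id": ...}` objects (or lists of them). The `allowed_types` mapping, which says which target types a property may link to, is a hypothetical stand-in for whatever the openMINDS schemas define, and all function names and `ex:` identifiers are made up for illustration.

```python
from collections import deque


def direct_links(node):
    """Yield (property, target_id) for each top-level link object in `node`."""
    for prop, value in node.items():
        if prop.startswith("@"):
            continue  # skip JSON-LD keywords such as "@id" and "@type"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict) and "@id" in item:
                yield prop, item["@id"]


def link_type_problems(nodes, allowed_types):
    """Report links whose target has a type the property does not allow."""
    by_id = {node["@id"]: node for node in nodes}
    problems = []
    for node in nodes:
        for prop, target_id in direct_links(node):
            expected = allowed_types.get((node["@type"], prop))
            target = by_id.get(target_id)
            if expected is None or target is None:
                continue  # no rule, or missing target: left to other checks
            if target["@type"] not in expected:
                problems.append(f"{node['@id']}.{prop} -> {target_id}: unexpected type {target['@type']}")
    return problems


def unconnected_ids(nodes):
    """Return ids that cannot be reached from the first node via links (either direction)."""
    if not nodes:
        return []
    by_id = {node["@id"]: node for node in nodes}
    neighbours = {node_id: set() for node_id in by_id}
    for node in nodes:
        for _, target_id in direct_links(node):
            if target_id in by_id:
                neighbours[node["@id"]].add(target_id)
                neighbours[target_id].add(node["@id"])
    start = nodes[0]["@id"]
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search over the undirected link graph
        current = queue.popleft()
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return sorted(set(by_id) - seen)


# Hypothetical collection with one loose instance.
nodes = [
    {"@id": "ex:d1", "@type": "ex:Dataset", "author": [{"@id": "ex:p1"}]},
    {"@id": "ex:p1", "@type": "ex:Person"},
    {"@id": "ex:o1", "@type": "ex:Organization"},
]
allowed_types = {("ex:Dataset", "author"): {"ex:Person"}}
print(link_type_problems(nodes, allowed_types))
print(unconnected_ids(nodes))
```

Treating links as undirected for the connectivity check is a design choice made here: an instance counts as connected whether it links to others or is only linked from them.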

To my understanding this workflow should cover the full validation of the openMINDS model, but please let me know if you see an issue. Your feedback is highly appreciated.

I’ll ask Oli and Stefan today to join the conversation so that you can get some more technical details.
Also feel free to raise issues with questions or suggestions directly on GitHub.

Best, Lyuba

Fred · December 15, 2020, 9:03am

Thank you @LyubaZ for clarifying the validation workflow:

Thanks for pointing out the connected-graph check, I hadn’t thought of potentially unconnected nodes indeed…
Seems pretty clear and complete to me now.

I’ll certainly get back to you later, when we go into more detail with the help of the examples you’ll be providing in the coming days/weeks.

Cheers,