Creating a Neurodata Repository: What are the required resources?

JohnOgbonoko · October 11, 2020, 9:03pm

Hello,
I am new here, and recently opted to venture into the field of neuroinformatics.

I am aware that there are many neurodata databases available for research purposes, but I want to explore the possibility of building one in Nigeria to pull the available neurodata together. I would like to know what resources are required to build such a neurodata repository or database, in terms of software and hardware tools, and what other factors should be considered.

Your suggestions will greatly help. Thanks.

PeerHerholz · October 13, 2020, 3:12pm

Ahoi hoi @JohnOgbonoko,

thank you very much for your super interesting post and welcome to Neurostars, it’s great to have you here!

Your idea sounds awesome and I think there are a lot of folks here who can help you and provide pointers. I’ll tag a few people (sorry for that, folks) who know a lot about data sharing platforms, the legal aspects, community-based development and support, etc., and hope that we can get a great discussion going from that: @effigies, @franklin, @robert, @StephanHeunis, @DorienHuijser, @KirstieJane, @emdupre, @cassgvp, @katjaq. I’m pretty sure I missed a lot of folks (also, I can only tag 10 people), sorry for that.

Cheers, Peer

effigies · October 13, 2020, 4:06pm

Hi @JohnOgbonoko,

I’m not 100% clear on what the scope is here. Is the goal to create an OpenNeuro-like repository for Nigerian researchers to submit data to, or a datasets.datalad.org-like registry to make it easy to access already existing open resources, or something else still?

If you can provide more details of what you would like to build (in an ideal world), then it will be easier to give recommendations or suggest people who can.

Best,
Chris

JohnOgbonoko · October 13, 2020, 11:15pm

Thanks for your response @effigies.
The idea is to build an OpenNeuro-like repository for researchers to submit data, and to provide access for researchers who want to use such data.

JohnOgbonoko · October 13, 2020, 11:17pm

Thanks @PeerHerholz

robert · October 14, 2020, 7:11am

The source code of OpenNeuro is available on GitHub and can be installed on your own server. But be aware that it is a narrow, domain-specific repository that is closely tied to BIDS; all datasets that are uploaded and hosted have to be BIDS compliant. Depending on the data you want to store, you may also want to consider a more generic repository system such as DataVerse, which can also be downloaded from GitHub. Besides the differences in domain and in the restrictions on data organization, the two systems also model “roles” differently: DataVerse has more elaborate roles for stakeholders with different responsibilities for data collection (e.g. for reviewing and granting access), whereas on OpenNeuro you have uploaders and downloaders, nothing more. Which one works best depends on the needs of the researchers who will deposit data.
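
As a rough illustration of what “BIDS compliant” implies for depositors, here is a minimal Python sketch that checks two basic BIDS requirements: a `dataset_description.json` file containing the `Name` and `BIDSVersion` fields, and at least one `sub-<label>` directory at the top level. The function name `quick_bids_check` is hypothetical, and the check is deliberately far from complete; it is an assumption-laden sketch, not what OpenNeuro itself runs, and not a replacement for the official BIDS validator.

```python
import json
import re
from pathlib import Path

# Subject directories are named "sub-<label>" with an alphanumeric label.
SUBJECT_DIR = re.compile(r"^sub-[A-Za-z0-9]+$")


def quick_bids_check(root):
    """Return a list of problems; an empty list means these basic checks passed.

    Hypothetical helper for illustration only: it covers a tiny subset of
    the BIDS specification.
    """
    root = Path(root)
    problems = []

    desc_path = root / "dataset_description.json"
    if not desc_path.is_file():
        problems.append("missing dataset_description.json")
    else:
        try:
            desc = json.loads(desc_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            desc = None
            problems.append("dataset_description.json is not valid JSON")
        if desc is not None and not isinstance(desc, dict):
            problems.append("dataset_description.json is not a JSON object")
        elif isinstance(desc, dict):
            for field in ("Name", "BIDSVersion"):
                if field not in desc:
                    problems.append(f"dataset_description.json lacks '{field}'")

    # Look for at least one top-level subject directory.
    subjects = [p for p in root.iterdir()
                if p.is_dir() and SUBJECT_DIR.match(p.name)]
    if not subjects:
        problems.append("no sub-<label> directories found")

    return problems
```

Calling `quick_bids_check("/path/to/dataset")` (a placeholder path) returns the list of detected problems, which could serve as a quick pre-upload sanity check before running the full validator.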

Regarding hardware, I don’t know precisely what is required, but I can imagine a single server or possibly a few (e.g. one for the web interface, one for management, and one for file storage). Requirements for redundancy/high availability and performance make the setup more complex. Nowadays I would say that a virtualized setup in the cloud makes the most sense to start with, since that way you can avoid large up-front hardware investments. But if you have some computers available, those should work well for setting up a (test) environment.

effigies · October 14, 2020, 6:44pm

Thanks for the clarification, @JohnOgbonoko. I think @robert summed up most of what I would have to say on this.

OpenNeuro is written by SquishyMedia. While it is open source and you’re welcome to modify and deploy it yourselves, it might still be worth talking with them. They can give you pretty detailed information about the startup and ongoing costs, and if you’re interested in contracting for particular features, they are the most familiar with the code base and best placed to make the necessary adjustments.

JohnOgbonoko · October 14, 2020, 10:18pm

Thanks @effigies and @robert for the detailed suggestions. I will take a look at DataVerse, as it sounds more appealing to me. Since power supply is a huge challenge here, and considering that the repository needs to be accessible 24/7, a cloud setup will be the most suitable for me.

Regards.

effigies · October 14, 2020, 11:40pm

To be clear, OpenNeuro is currently deployed in a cloud setup.