GSoC Project Idea 6.1: Make published datasets accessible on the platform



The aim of this part of the project is to develop code that collects published datasets (raw images, segmentations, skeletons, synapses, graphs, volume meshes, etc.) and converts them into formats that are compatible with the platform and the CATMAID data model.

We have a long list of datasets and online data sources scattered all over the internet, in a wide variety of formats and with many different ways to access them. After learning about the structure of the target formats, we would want you to get your hands dirty and work through this list. We want you to understand each dataset’s content and write code to convert it so that it is accessible and browsable on the platform and in CATMAID. This will be a nice learning experience, and we expect that you’ll become very fast at it after a few datasets, juggling a number of tools and libraries along the way.
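To give a flavour of what one such conversion step might look like, here is a minimal sketch in Python. It assumes a skeleton published in the plain-text SWC format (columns: id, type, x, y, z, radius, parent id, with -1 marking the root) and turns it into simple node records with a parent reference. The function names, the record layout, and the sample data are illustrative assumptions only; they are not the platform's or CATMAID's actual import API, which you would learn about first.

```python
from typing import Dict, List


def parse_swc(text: str) -> List[Dict[str, object]]:
    """Parse SWC-formatted text into a list of node records.

    The record keys (id, parent_id, x, y, z, radius) are a hypothetical
    intermediate layout chosen for this sketch, not a CATMAID schema.
    """
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and SWC comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 7:
            raise ValueError(f"Malformed SWC line: {line!r}")
        parent = int(fields[6])
        nodes.append({
            "id": int(fields[0]),
            # SWC marks the root node with a parent id of -1.
            "parent_id": None if parent == -1 else parent,
            "x": float(fields[2]),
            "y": float(fields[3]),
            "z": float(fields[4]),
            "radius": float(fields[5]),
        })
    return nodes


def find_problems(nodes: List[Dict[str, object]]) -> List[str]:
    """Return human-readable problems found in the parsed skeleton."""
    problems = []
    ids = {n["id"] for n in nodes}
    roots = [n["id"] for n in nodes if n["parent_id"] is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")
    for n in nodes:
        if n["parent_id"] is not None and n["parent_id"] not in ids:
            problems.append(f"node {n['id']} has unknown parent {n['parent_id']}")
    return problems


# Made-up sample data, for illustration only.
SAMPLE_SWC = """# hypothetical three-node skeleton
1 1 0.0 0.0 0.0 1.0 -1
2 3 1.0 0.0 0.0 0.5 1
3 3 2.0 1.0 0.0 0.5 2
"""

nodes = parse_swc(SAMPLE_SWC)
problems = find_problems(nodes)
```

Checking each converted dataset for structural problems like these before import is one way to keep the work careful and systematic; real datasets will of course need format-specific handling beyond this sketch.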

Skills required: Python, SQL; careful and systematic work

Some of the things you’ll be dealing with: JSON, XML parsing, REST APIs, SQL queries, NumPy arrays, HDF5, Jupyter Notebooks, dataset descriptions in publications, …

Nice-to-have skills (not strictly required): Linux, Bash, SysAdmin, WordPress

Mentor: Stephan Gerhard, PhD, Zurich, Switzerland.

