GSoC Project Idea 8.1: Continuous integration for research data


#1

The G-Node Data Infrastructure (GIN) services[1] provide a platform for management and sharing of data in neuroscience. Inspired by GitHub, the platform uses a git/git-annex backend for versioning and sharing of scientific data, offering the power of a web-based repository management service combined with distributed file storage. It addresses the range of research data workflows, from data analysis on the local workstation to remote collaboration and data publication. GIN also provides indexing services for convenient searching of data and metadata, including information in well-defined formats like the odML[2] metadata format and the NIX[3] format for scientific data.

Building on existing continuous integration services like Travis[4] or CircleCI[5] and build pipelines for the scientific field like SnakeMake[6], this project aims to prototype a continuous integration microservice for research data.

The scope of the project is to set up a GIN microservice for automated organization and processing of data and metadata using established CI technology. Development will be based on a use case of electrophysiological data.

Skills: A successful applicant will have some experience with the Python or Go programming languages and ideally is familiar with git, continuous integration services, and/or SnakeMake.

Mentors: Achilleas Koutsou, Michael Sonntag, G-Node

[1] https://gin.g-node.org
[2] https://github.com/G-Node/python-odml
[3] https://github.com/G-Node/nix
[4] https://travis-ci.org/
[5] https://circleci.com/
[6] https://snakemake.readthedocs.io/en/stable/


#2

Hi @malin,

My name is Rahul Verma. I am a 4th-year undergrad doing my bachelor's in computer science. It is really awesome how INCF is improving the lives of millions of people by advancing collaborative brain research. I want to become part of it and make people's lives better by doing the project “continuous integration for research data”. I have contributed to Mozilla (Release Engineering) and GNOME (Nautilus, their official file manager) before, and I am fairly proficient in Python, Git, and Linux. I don’t know much about Travis/Circle, but I am very interested in learning more about them. I am really excited to work on this project, so could you please guide me on what I should do next?

Thanks. :)


#3

Hello Rahul, you will want to talk to the mentors for this project, Achilleas Koutsou and Michael Sonntag from G-Node. They should contact you soon.


#4

@malin, thanks. :)
Hey @achilleas, besides learning about Travis CI and SnakeMake, as you suggested on IRC, is there anything else you would recommend?


#5

Hello Rahul. If you want to become more familiar with the project, you can have a look at the GIN services to get an idea of what this will be about.
The first link in the description is the main service. The code for that service is hosted here: https://github.com/G-Node/gogs
It’s a slightly modified version of the GOGS project.

The GOGS project has some support for Drone for CI and we experimented with this a bit in the past, but we’re open to trying out any available CI/CD platform that fits our needs.

The goal of this project is a little different from traditional continuous integration and continuous delivery services. As the project description mentions, the goal is to have automated processing of research data and it should be geared towards (but not limited to) electrophysiological data. While researching available technologies and designing the implementation of the project, this goal should be taken into account.

Feel free to ask any further questions once you start getting familiar with the relevant projects.


#6

@achilleas, thanks a lot for your advice. :) I will surely ping you again with any further queries.