GSoC Project Idea 16.1: Peer-to-peer file and metadata sharing for OpenWorm data management

python


The OpenWorm project has a data management tool called PyOpenWorm that aids in creating, storing, and sharing information about C. elegans and about the evidence that supports that information. This information is generally backed by data from the original work, such as CSV files, videos, and plots. Although in many cases these data are made available either as supplementary material to a published article or in a data repository, PyOpenWorm should also be able to share them, acting as a primary or secondary distribution mechanism.

Aims: Ideally, the student would design and implement a solution for peer-to-peer file sharing that integrates with the existing PyOpenWorm codebase and allows for access control (so that researchers can limit access to sensitive information, for instance), identity management (to support access control), and data-integrity checking (to protect against accidental and malicious changes). For redundancy and to reduce infrastructure costs, a peer-to-peer framework is desired. The student should review file sharing protocols (e.g., eD2k, Kademlia, BitTorrent) and determine which (if any) best align with OpenWorm’s goals.
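As an illustration of the data-integrity aim, a minimal sketch of content-hash verification follows. It is not part of PyOpenWorm. The function names `file_digest` and `verify_file`, the chunk size, and the choice of SHA-256 are assumptions made for this example. The idea is that a digest published alongside the metadata lets a peer detect whether a downloaded file was altered. On its own this only detects changes; guarding against a malicious peer would also require obtaining the expected digest from a trusted source (for example, signed metadata).

```python
import hashlib

# Hypothetical chunk size; reading in chunks keeps memory use flat for large videos.
CHUNK_SIZE = 64 * 1024


def file_digest(path, algorithm="sha256"):
    """Return the hex digest of the file at `path` (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path, expected_digest, algorithm="sha256"):
    """Return True if the file's digest matches the digest recorded in the metadata."""
    return file_digest(path, algorithm) == expected_digest.strip().lower()
```

A peer that receives a file would call `verify_file(local_path, digest_from_metadata)` and discard the file if the result is `False`. Protocols such as BitTorrent apply the same principle per piece rather than per file.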

Skills: Comfort with independent study, software development, and testing is expected. Python (2 and 3) experience is required. Familiarity with principles and practice of data management (equivalent to an undergraduate course in that topic) is recommended. Experience with peer-to-peer file sharing protocols is useful, but not required.

Mentors: Mark Watts (mark@openworm.org), Arnab Banerjee (arnab1896@gmail.com).