GSoC 2021 Project Idea 18.1: Securing Neuroimaging BIDS Datasets Stored on Decentralized File Sharing Networks with Ethereum Public Keys, Git Annex and IPFS

Problem. A significant barrier to open science practice is the sharing and accessibility of neuroimaging datasets. The InterPlanetary File System (IPFS) addresses this barrier with peer-to-peer sharing of data and storage on distributed networks such as BitTorrent, Filecoin, and Cloudflare. However, files stored on the IPFS distributed hash table are public by default, which makes IPFS inappropriate for sharing protected health information or other confidential data.
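To make "public by default" concrete, here is a minimal sketch of content addressing. It assumes a toy in-memory store (`ToyContentStore` is a hypothetical name, not the IPFS API) and uses a bare SHA-256 hex digest as the address, whereas real IPFS identifiers (CIDs) add multihash encoding and chunking on top.

```python
import hashlib


class ToyContentStore:
    """In-memory stand-in for a content-addressed network (hypothetical, not the IPFS API)."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, so it is not a secret.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        # No access control: knowing the address is enough to read the block.
        return self._blocks[address]


store = ToyContentStore()
scan_header = b"sub-01 T1w: PatientName=DOE^JANE"  # made-up example bytes
address = store.add(scan_header)

# Any peer that learns or recomputes the address retrieves the plaintext.
assert store.get(address) == scan_header
```

Because nothing in this model restricts reads, any confidentiality has to come from encrypting the bytes before they are added.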

Project. Ethereum public keys tied to an identity under the decentralized identifier (DID) standard for web3.0 can be used to encrypt datasets so that only those with access privileges can read the data on the IPFS network. This project involves the creation of an open-source toolkit to encrypt datasets with Ethereum public keys prior to storage on IPFS. Contributors should have a working knowledge of Java, Python, Git, or Unix.
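As a rough illustration of what "encrypt with an Ethereum public key" can mean, the sketch below does an ECIES-style exchange on secp256k1 (the curve behind Ethereum keys) using only the Python standard library. It is an assumption-laden toy, not the project's implementation: the function names are hypothetical, the scalar multiplication is not constant time, and the SHA-256 counter keystream is a stand-in for a vetted cipher such as AES-GCM. A real toolkit should rely on an audited library.

```python
import hashlib
import hmac
import secrets

# secp256k1 domain parameters (the curve used for Ethereum keys).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (
    0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
    0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8,
)


def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (illustrative, not constant time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def derive_keys(shared_point):
    """Derive separate encryption and MAC keys from the shared x-coordinate."""
    x_bytes = shared_point[0].to_bytes(32, "big")
    return hashlib.sha256(b"enc" + x_bytes).digest(), hashlib.sha256(b"mac" + x_bytes).digest()


def keystream_xor(key, data):
    """Toy stream cipher: XOR with SHA-256(key || counter) blocks (stand-in for AES-GCM)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + (offset // 32).to_bytes(8, "big")).digest()
        out.extend(c ^ b for c, b in zip(data[offset:offset + 32], block))
    return bytes(out)


def encrypt_for(recipient_public, plaintext):
    """Encrypt so that only the holder of the matching private key can decrypt."""
    ephemeral_private = secrets.randbelow(N - 1) + 1
    ephemeral_public = scalar_mult(ephemeral_private, G)
    enc_key, mac_key = derive_keys(scalar_mult(ephemeral_private, recipient_public))
    ciphertext = keystream_xor(enc_key, plaintext)
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ephemeral_public, ciphertext, tag


def decrypt_with(recipient_private, envelope):
    """Recompute the shared secret, verify the tag, then recover the plaintext."""
    ephemeral_public, ciphertext, tag = envelope
    enc_key, mac_key = derive_keys(scalar_mult(recipient_private, ephemeral_public))
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication tag mismatch")
    return keystream_xor(enc_key, ciphertext)


# Usage: a made-up recipient key pair and a placeholder payload.
recipient_private = secrets.randbelow(N - 1) + 1
recipient_public = scalar_mult(recipient_private, G)
payload = b"placeholder bytes standing in for sub-01_T1w.nii.gz"
envelope = encrypt_for(recipient_public, payload)
assert decrypt_with(recipient_private, envelope) == payload
```

Only the envelope (ephemeral public key, ciphertext, and tag) would be stored on the network; the sender needs nothing beyond the recipient's public key.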

Project Lead + Mentor: Shady El Damaty, Ph.D.

This project will build on the 2020 decentralized science brainhack project. More information can be found on the GitHub repository.

Hi, please feel free to reach out to me with any questions!

-Shady

Hello, Achintya here. I really liked the idea of implementing DID for IPFS storage and would like to know more and work on this idea. Could you provide more details about using Ethereum public keys here? Do you intend to put this on-chain?

Hey @Arnab, can you contact @seldamat regarding this?

Hi Achintya,

Thanks for kicking off the conversation! The long-term goals of this project are to explore the ecosystem of existing toolkits for 1) self-sovereign data ownership and sharing, 2) secure & private decentralized file storage, and 3) confidential cloud computing to improve upon the performance of current federated learning techniques.

DID is a way for us to safely decentralize user lists and metadata and give individuals first-claim rights over how, when, and whether they share their data with external services. The most fleshed-out toolkits for linking DID with existing services are based on Ethereum applications, so signing with public keys is a great first step to explore composability with APIs such as iExec (confidential cloud computing), Ocean (private data token), 3box (DID API), & Ceramic (mutable IPFS records).

An exploration of these extensions would be down the line once we have a better idea of how well IPFS would work for securely storing neuroimaging datasets and how that compares with traditional AWS or gcloud buckets.

For this project, we would build on the existing git-annex plug-in for IPFS, explore interoperability with DataLad, and demonstrate storage of encrypted datasets – with Ethereum public key encryption as a bonus for future plug-and-play with other services.

IPFS is still a relatively new technology and there is rapid development in this area. We are working on up-to-date project documentation to quickly onboard our summer scholars. You can continue the dialog on our Discord server for a more fast-paced conversation.

Hey @arnab1896 @malin, I’ll be updating my weekly progress reports on my blog here: GSoC 2021 Updates | Kinshuk's Place

Noted, thanks for the report.

I have summarized my GSoC project and outlined directions for future work here: blog post