Problem. A significant barrier to open science practice is the difficulty of sharing and accessing neuroimaging datasets. The InterPlanetary File System (IPFS) addresses this barrier with peer-to-peer sharing of data and storage on distributed networks such as BitTorrent, Filecoin, and Cloudflare. However, files stored on the IPFS distributed hash table are public by default, which makes IPFS unsuitable for sharing protected health information or other confidential data.
Project. Ethereum public keys, associated with an identity following the decentralized identifier (DID) standard for Web 3.0, can be used to encrypt datasets so that only those with access privileges can read the data on the IPFS network. This project involves the creation of an open-source toolkit to encrypt datasets with Ethereum public keys prior to storage on IPFS. Contributors should have a working knowledge of Java, Python, Git, or Unix.
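To make the "encrypt before storing" idea concrete, here is a minimal sketch of ECIES-style hybrid encryption to a secp256k1 public key (the curve Ethereum keys use), written with the Python standard library only. It is an illustration under stated assumptions, not the project's toolkit: the function names (`encrypt_for`, `decrypt_with`, and the helpers) are hypothetical, and the symmetric layer is a toy SHA-256 keystream with an HMAC tag. A real toolkit would use a vetted cryptography library and a standard cipher such as AES-GCM.

```python
import hashlib
import hmac
import secrets

# secp256k1 domain parameters (the curve behind Ethereum key pairs).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def point_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add scalar multiplication (not constant-time)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result


def derive_keys(shared_point):
    """Derive separate encryption and MAC keys from the ECDH shared x-coordinate."""
    shared_x = shared_point[0].to_bytes(32, "big")
    return (hashlib.sha256(b"enc" + shared_x).digest(),
            hashlib.sha256(b"mac" + shared_x).digest())


def keystream(key, length):
    """Toy keystream: SHA-256 in counter mode. Illustration only."""
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def encrypt_for(recipient_pub, plaintext):
    """Encrypt bytes so that only the holder of the matching private key can read them."""
    eph_priv = secrets.randbelow(N - 1) + 1      # fresh ephemeral key per message
    eph_pub = scalar_mult(eph_priv, G)
    enc_key, mac_key = derive_keys(scalar_mult(eph_priv, recipient_pub))
    stream = keystream(enc_key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return eph_pub, ciphertext, tag


def decrypt_with(recipient_priv, eph_pub, ciphertext, tag):
    """Verify the tag, then recover the plaintext with the recipient's private key."""
    enc_key, mac_key = derive_keys(scalar_mult(recipient_priv, eph_pub))
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = keystream(enc_key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


# Usage: a throwaway key pair stands in for a collaborator's Ethereum key.
recipient_priv = secrets.randbelow(N - 1) + 1
recipient_pub = scalar_mult(recipient_priv, G)
envelope = encrypt_for(recipient_pub, b"placeholder dataset bytes")
assert decrypt_with(recipient_priv, *envelope) == b"placeholder dataset bytes"
```

In this sketch, only the ephemeral public key, the ciphertext, and the tag would be added to IPFS, so the content can stay publicly retrievable while remaining unreadable without the recipient's private key.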
Hello, Achintya here. I really liked the idea of implementing DID for IPFS storage and would like to know more and work on this idea. Could you provide more details about using Ethereum public keys here? Do you intend to put this on-chain?
Thanks for kicking off the conversation! The long-term goals of this project are to explore the ecosystem of existing toolkits for 1) self-sovereign data ownership and sharing, 2) secure and private decentralized file storage, and 3) confidential cloud computing to improve upon the performance of current federated learning techniques.
DID is a way for us to safely decentralize user lists and metadata and to grant individuals first-claim rights over how, when, and whether they share their data with external services. The most fleshed-out toolkits for linking DID with existing services are based on Ethereum applications, so encrypting with public keys is a great first step to explore composability with APIs such as iExec (confidential cloud computing), Ocean (private data token), 3box (DID API), and Ceramic (mutable IPFS records).
We would explore these extensions further down the line, once we have a better idea of how well IPFS works for securely storing neuroimaging datasets and how it compares with traditional AWS or gcloud buckets.
For this project, we would build on the existing git-annex plug-in for IPFS, explore interoperability with DataLad, and demonstrate storage of encrypted datasets, with Ethereum public key encryption as a bonus for future plug-and-play with other services.
IPFS is still a relatively new technology and there is rapid development in this area. We are working on up-to-date project documentation to quickly onboard our summer scholars. You can continue the dialog on our Discord server for a faster-paced conversation.