GSoC Project Idea 1: Accessible High-Performance Computing with CBRAIN

hpc
jupyter
rest
python


Research in neuroinformatics increasingly involves resource-intensive processing of large datasets, which exceeds the capabilities of the laptop or desktop computers available to neuroscience researchers. While high-performance computing (HPC) resources are available to researchers in many countries, using them requires specialized knowledge and access credentials. Researchers must invest time in learning the required computational interfaces, adapting their tools so that they function properly on each cluster they wish to use, and manually transferring their research data to and from compute servers.

CBRAIN is a web-based platform that harmonizes access to arbitrary back-end compute servers and data repositories, and performs much of this tedious work automatically in order to accelerate research. While CBRAIN has been deployed in Canada for over 8 years, until now it has been accessed mostly through its web interface. The goals of this project are two-fold: (1) to create a library of functions as a prototype software development kit (SDK) for the CBRAIN application programming interface (API), and (2) to use the SDK to develop a tutorial application that shows how to access CBRAIN with the newly created library and run a computational pipeline.

Tool presentation

While the CBRAIN platform is written in Ruby on Rails, its API follows the Representational State Transfer (REST) architectural style, so it is accessible from any programming environment that can issue Hypertext Transfer Protocol (HTTP) requests. The API is documented using the OpenAPI specification. We suggest that the library of CBRAIN functions be written in Python and packaged so that it can be installed with pip, as Python is increasingly used in neuroinformatics research. Additionally, Jupyter notebooks have become a popular format for teaching new coding skills to scientists, and allow text and images to be embedded alongside code to make it easier to follow. We suggest that the scientific pipeline run by the tutorial be FreeSurfer, one of the most widely used tools currently in the main CBRAIN server’s library. These suggested technologies and tools are not requirements for a successful application, and proposals to implement the goals of the project by different means will still be considered.
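Because the API is plain HTTP, a client needs little more than a way to build and send JSON requests. The following is a minimal sketch using only the Python standard library. The base URL, the `/tasks` path, and the bearer-token header are illustrative assumptions, not taken from the CBRAIN OpenAPI documentation; the real endpoint names and authentication scheme should be read from that documentation.

```python
import json
import urllib.request

# Hypothetical base URL; replace with the address of a real CBRAIN instance.
BASE_URL = "https://cbrain.example.org"


def build_request(path, token=None, method="GET", payload=None):
    """Build (but do not send) a JSON HTTP request for a REST endpoint."""
    url = BASE_URL.rstrip("/") + "/" + path.lstrip("/")
    headers = {"Accept": "application/json"}
    data = None
    if payload is not None:
        # Serialize the payload as a JSON request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token is not None:
        # Assumed authentication scheme; the real API may expect
        # the token in a different header or as a query parameter.
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send a prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating `build_request` from `send` keeps the request construction testable without network access; a call such as `send(build_request("/tasks", token="..."))` would only succeed against a live server that actually exposes such an endpoint.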

Aims

The project will consist of the following stages:

  1. Familiarization with CBRAIN and the scientific tool. The successful applicant will use the documentation and the graphical user interface of the main CBRAIN instance to go through the process of running a scientific pipeline manually. The mentors will provide the data and the steps necessary to complete the process.
  2. CBRAIN API testing. The applicant will use the interactive OpenAPI documentation of CBRAIN’s API to run through the same steps as above.
  3. Library development. A library of functions will be written to access the CBRAIN endpoints necessary to complete the project.
  4. Tutorial development. A small script will be written that leverages the new library to upload and register data into CBRAIN, select an available computational server that provides the required computational tool, launch a task to run the tool, query CBRAIN about the status of the task, and download the results once they are available. This script should be written as concisely as possible and accompanied by explanatory diagrams, which can be re-used from the existing CBRAIN documentation or developed by the applicant.
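The tutorial workflow in stage 4 (upload, select a server, launch, poll, download) can be sketched as a single function. This is only an outline under stated assumptions: every endpoint path, field name, and status string below is a hypothetical placeholder rather than the documented CBRAIN API, and `call` stands in for whatever request function the library from stage 3 ends up providing. An in-memory fake transport is included so the sketch runs without a server.

```python
import time

# All paths, field names, and status strings are hypothetical placeholders.


def run_pipeline(call, file_name, tool_name, poll_seconds=30.0, max_polls=100):
    """Run the tutorial steps through `call(method, path, payload) -> dict`."""
    # 1. Upload and register the input data.
    file_id = call("POST", "/files", {"name": file_name})["id"]

    # 2. Select a compute server that offers the required tool.
    servers = call("GET", "/servers", {"tool": tool_name})["servers"]
    if not servers:
        raise RuntimeError(f"no server offers {tool_name!r}")
    server_id = servers[0]["id"]

    # 3. Launch a task that runs the tool on the uploaded file.
    task = call(
        "POST",
        "/tasks",
        {"tool": tool_name, "server_id": server_id, "file_id": file_id},
    )
    task_id = task["id"]

    # 4. Poll the task status until it finishes, fails, or we give up.
    for _ in range(max_polls):
        status = call("GET", f"/tasks/{task_id}", None)["status"]
        if status == "completed":
            break
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"task {task_id} did not finish in time")

    # 5. Retrieve the list of result files.
    return call("GET", f"/tasks/{task_id}/results", None)["files"]


def make_fake_transport():
    """Return an in-memory stand-in for a server, for offline experimentation."""
    polls = {"count": 0}

    def call(method, path, payload):
        if (method, path) == ("POST", "/files"):
            return {"id": 1}
        if (method, path) == ("GET", "/servers"):
            return {"servers": [{"id": 7}]}
        if (method, path) == ("POST", "/tasks"):
            return {"id": 42}
        if (method, path) == ("GET", "/tasks/42"):
            # Report "running" twice, then "completed".
            polls["count"] += 1
            return {"status": "completed" if polls["count"] >= 3 else "running"}
        if (method, path) == ("GET", "/tasks/42/results"):
            return {"files": ["results.tar.gz"]}
        raise ValueError(f"unexpected call: {method} {path}")

    return call
```

Calling `run_pipeline(make_fake_transport(), "subject01.nii.gz", "freesurfer", poll_seconds=0)` exercises all five steps against the fake transport. Passing the transport in as a parameter is one possible design choice: it lets the same workflow code be demonstrated in a notebook offline and later pointed at a real server.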

Skills: Interns’ enthusiasm, commitment, work ethic and communication skills are essential for the success of any project. In addition, prior experience with any of the following would be beneficial to the project’s success: RESTful web development, API usage, the MVC design pattern, Python, Jupyter, high-performance computing, technical writing, graphic design, or neuroinformatics.

Mentors: Andrew Doyle and Shawn Brown, McGill University, Canada.