GSoC Project Idea 1: Accessible High-Performance Computing with CBRAIN

malin · January 14, 2019, 10:58am

Research in neuroinformatics increasingly involves resource-intensive processing of large datasets, which exceeds the capabilities of the laptop or desktop computers available to neuroscience researchers. While high-performance computing (HPC) tools are available to researchers in many countries, using them requires specialized knowledge and access credentials. Researchers must invest time learning the required computational interfaces, adapting their tools so that they function properly on each cluster they wish to use, and manually transmitting their research data to and from compute servers. CBRAIN is a web-based platform that seamlessly harmonizes access to arbitrary back-end compute servers and data repositories, and performs much of this tedious work automatically in order to accelerate research. While CBRAIN has been deployed in Canada for over 8 years, up until now it has been accessed mostly through the web interface. The goals of this project are two-fold: (1) to create a library of functions as a prototype software development kit (SDK) to access the CBRAIN application programming interface (API), and (2) to use the SDK to develop a tutorial application showing how to access CBRAIN using the newly created library and run a computational pipeline.

Tool presentation

While the CBRAIN platform is written in Ruby on Rails, the API is implemented using the Representational State Transfer (REST) software architecture, so it is accessible from any programming language that can issue Hypertext Transfer Protocol (HTTP) requests. It is documented under the OpenAPI specification. We suggest that the library of CBRAIN functions be written in Python and wrapped as a pip-installable package, as Python is increasingly used in neuroinformatics research. Additionally, Jupyter notebooks have become a popular format for teaching new coding skills to scientists, and allow text and images to be embedded alongside code to make explanations clearer. We suggest that the scientific pipeline run by the tutorial be FreeSurfer, one of the most widely used tools currently in the main CBRAIN server’s library. These suggested technologies and tools are not requirements for a successful application, and proposals to implement the goals of the project by different means will still be considered.
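Because the API is plain REST over HTTP, no special client is needed to talk to it. As a minimal sketch, the snippet below builds (but does not send) a JSON request using only the Python standard library. The base URL, the `/files` and `/tasks` paths, and the `api_token` query parameter are placeholders chosen for illustration, not confirmed details of the CBRAIN API; the real names should be taken from the OpenAPI documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values: the base URL, endpoint paths, and token parameter name
# are placeholders for illustration, not confirmed details of the CBRAIN API.
BASE_URL = "https://cbrain.example.org"


def build_request(path, token, params=None, body=None, method="GET"):
    """Build (but do not send) an HTTP request for a JSON REST endpoint."""
    # Credentials and filters go in the query string in this sketch.
    query = urlencode({"api_token": token, **(params or {})})
    url = f"{BASE_URL}{path}?{query}"

    # Only attach a JSON payload (and its Content-Type) when a body is given.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"

    return Request(url, data=data, headers=headers, method=method)


# Example: a read-only listing request with one query parameter.
req = build_request("/files", token="SECRET", params={"page": 1})
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; a library built for this project could equally use a third-party HTTP client or a client generated from the OpenAPI specification.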

Aims

The project will consist of the following stages:

Familiarization with CBRAIN and the scientific tool. The successful applicant will use the documentation and the main CBRAIN instance’s graphical user interface to manually go through the process of running a scientific pipeline. The data and the steps necessary to complete the process will be provided by the mentors.
CBRAIN API testing. The applicant will use the interactive OpenAPI documentation of CBRAIN’s API to run through the same steps as above.
Library development. A library of functions will be written to access the CBRAIN endpoints necessary to complete the project.
Tutorial development. A small script will be written that leverages the new library to upload and register data into CBRAIN, select an available computational server with the required computational tool, launch a task to run the tool, query CBRAIN about the status of the task, and download results once they are available. This script should be written as concisely as possible, with explanatory diagrams, which can be re-used from the existing CBRAIN documentation or developed by the applicant.
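The tutorial workflow described above (upload, pick a server, launch, poll, download) could be sketched roughly as follows. This is only an illustration of the control flow: the five client methods, the `"Completed"` and `"Failed"` status strings, and the `FakeClient` stand-in are all hypothetical names invented here, not part of the CBRAIN API or of any existing library.

```python
import time


def run_pipeline(client, local_path, tool_name, poll_seconds=30.0,
                 max_polls=1000, sleep=time.sleep):
    """Upload a file, run a tool on it, wait for completion, fetch results.

    `client` is any object providing the five hypothetical methods used
    below; none of these names are taken from the real CBRAIN API.
    """
    file_id = client.upload_file(local_path)       # upload and register data
    server_id = client.find_server(tool_name)      # server that offers the tool
    task_id = client.launch_task(tool_name, server_id, file_id)

    # Poll the task status until it finishes, fails, or we give up.
    for _ in range(max_polls):
        status = client.task_status(task_id)
        if status == "Completed":                  # assumed status string
            return client.download_results(task_id)
        if status.startswith("Failed"):            # assumed status prefix
            raise RuntimeError(f"task {task_id} ended with status {status!r}")
        sleep(poll_seconds)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


class FakeClient:
    """In-memory stand-in so the sketch can run without a server."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)

    def upload_file(self, local_path):
        return 1

    def find_server(self, tool_name):
        return 2

    def launch_task(self, tool_name, server_id, file_id):
        return 3

    def task_status(self, task_id):
        return next(self._statuses)

    def download_results(self, task_id):
        return ["results.tar.gz"]


# Dry run against the fake client, with a no-op sleep so it returns at once.
fake = FakeClient(["Queued", "Running", "Completed"])
print(run_pipeline(fake, "scan.nii.gz", "freesurfer", sleep=lambda s: None))
```

Keeping the workflow separate from the HTTP details means the same function could be exercised in a notebook against a stand-in like this one before being pointed at a real server.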

Skills: Interns’ enthusiasm, commitment, work ethic and communication skills are essential for the success of any project. In addition, prior experience with any of the following would be beneficial to the project’s success: RESTful web development, API usage, the MVC design pattern, Python, Jupyter, high-performance computing, technical writing, graphic design, or neuroinformatics.

Mentors: Andrew Doyle and Shawn Brown, McGill University, Canada.

crocodoyle · February 26, 2019, 9:37pm

What ISN’T in the project description is that with a CBRAIN account, you get access to the largest computing budget available in Canada… for free! And if you do this project, Google will even PAY you!

Davide95 · February 27, 2019, 1:03am

Hi. I’m writing here because I would like to apply for GSoC this year and I find this project idea interesting.

My name is Davide, I’m an MSc student of Computer Science and I’ve a BSc in Computer Science. If you want to know more about me:

What can I do before the proposal to understand better what I’ll have to do?
Thanks in advance.

shots47s · February 27, 2019, 12:51pm

Hi Davide, great to meet you.

I would recommend that you sign up for an account on our CBRAIN portal and familiarize yourself with our platform. You can sign up for an account at https://portal.cbrain.mcgill.ca. We have some documentation to get you started on our GitHub Wiki at https://github.com/aces/cbrain/wiki. And if you have any questions, your CBRAIN account will let you sign into our discussion forum at https://forum.cbrain.mcgill.ca/. I will have our developers publish the REST API on SwaggerHub and post the link in a subsequent message. Even though CBRAIN is written in Ruby on Rails, you shouldn’t need to learn it, as you will be working on clients for, and testing of, the RESTful API, which can be done in any language.

Thanks, looking forward to working with you.
Shawn

Davide95 · February 27, 2019, 1:43pm

I’ve just requested an account. Thanks!

Davide95 · February 27, 2019, 5:14pm

Update: I’ve read the wiki and played a little bit with the web app. It seems cool!
I’m wondering if there are more tutorials (e.g. something that gives you a list of files and a task, and teaches you how it works in a real scenario).
Let me know!
Thanks in advance.

crocodoyle · February 27, 2019, 9:07pm

Here’s a copy of my brain: https://www.dropbox.com/s/7x65xi24v00qcj5/andrew_mri_nov_2015.nii.gz?dl=0

See if you can upload it to CBRAIN, register it, and launch a task to convert it to .mnc!

Davide95 · February 27, 2019, 9:51pm

@crocodoyle it works, thank you so much!

@shots47s is it possible to have the REST API docs?
Also, how should the library be designed? Does it have to cover the entire API or just a subset?

crocodoyle · February 27, 2019, 10:48pm

There is a link to the docs in the project description. The bare minimum API coverage for the project would be the calls required to upload data, run a task, and download the results.

Davide95 · March 2, 2019, 5:45pm

I think I now have enough information and knowledge to write a good proposal.
Thanks!

crocodoyle · March 12, 2019, 2:07pm

Shawn and I are happy to give you feedback on your proposals before they are submitted to Google.

Shabirmean · March 14, 2019, 8:06pm

Hi,

Do you have to be a registered student to take part in GSoC? I just graduated from McGill in December 2018. Will I be eligible?

Thank you

Logan_Martel · March 14, 2019, 9:12pm

Hi Shawn, Andrew, Malin, and team!

My name is Logan. I’m currently an M.S. CS student at Georgia Tech’s online (distance-learning) OMSCS program, and presently living in Montreal. As a recent B.Sc. (Hons) Soft. Eng. alum @McGill, I came across this opportunity through Ann Jack’s message to McGill grads, suggesting interested parties should reach out here.

Looking into the CBRAIN project, this opportunity is immensely appealing to me, builds on my strong affinity for projects in Bioinformatics / applied Data Eng., and could fit very well with my current plans!

As a brief pitch: my experience ranges across academic research (~12 months on McGill Bioinf. & NLP ML projects) & industry software development roles (~20 months: mix of MVC / REST development and, recently, Data Eng. heavy work in Python / Jupyter Notebooks at Shopify).

Linking to my profiles:

To familiarize myself with the CBRAIN platform, in anticipation of submitting a Google Summer of Code proposal, I’ve recently requested an account through the CBRAIN portal (note: applied with my GA Tech email, logan.martel@gatech.edu).

Hope to hear from you soon. I will also have a look at the available documentation on the GitHub Wiki in the meantime.

Best,
Logan

malin · March 15, 2019, 7:20am

Hello Shabirmean, this is what it says in the GSoC guidelines: “You must currently be a full or part-time student (or have been accepted and committed to the fall term) at an accredited university as of the date accepted student proposals are announced” (May 6). As I interpret that, you will not be eligible. However, some of the projects may also be looking for new project members, outside of GSoC.

crocodoyle · March 19, 2019, 4:22pm

Hi Logan & others,

Thanks for your interest. Have you had your account approved yet, Logan? There is not a whole lot of time left to prepare an application. A good proposal would indicate some familiarity with the CBRAIN API and the workflows necessary to do the science.

What I have suggested to others is that they first use the web interface to run a tool on some data, try it again using the OpenAPI live docs, and document the steps.

If you can install CBRAIN locally, that could be verrry instructive (and might be difficult), but is not strictly necessary for the completion of the project.

Best,
Andrew

Davide95 · March 26, 2019, 3:11pm

Hi everyone
Sorry to bother you again
I’m writing the project proposal, but the GSoC website asks me for an INCF proposal tag (screenshot down below).
What should I select?

OT: I’ve just submitted a draft; it would be great to get feedback!

crocodoyle · March 26, 2019, 8:51pm

I’m not sure, I can’t access that page. What are the options in that dropdown?

Davide95 · March 26, 2019, 9:18pm

The tags are:

genn_project;
tvb_project;
gnode_project;
catmaid_project;
openworm_project;
other_project;
brian_project.

malin · March 27, 2019, 7:51am

(Org admin here)

Hello Davide, you can select the other_project tag. (The relatively low limit on the number of tags, which I think is 10, doesn’t allow us to give every project a specific tag.) The tags are not super important, but they help us keep track.

Davide95 · March 27, 2019, 10:43am

Great, thanks.

One last thing: is there a way to get feedback on the draft proposal before submitting it as final?