GSoC 2025 Project #16 CBRAIN :: A Python Command Line Interface (CLI) for the CBRAIN Distributed Computing Platform (175/350h)

arnab1896 · March 4, 2025, 2:46pm

Mentors: Bryan Caron <bryan.caron@mcgill.ca>, Pierre Rioux, Natacha Beck, Serge Boroday, Darcy Quesnel

Skill level: Intermediate - Advanced

Required skills: Python; experience with version control systems (e.g. git) and team-based development methodologies; good understanding of the Linux operating system and development in a Linux environment

Time commitment: part time or full time (350 hours)

Forum for discussion

About: CBRAIN is a web-enabled distributed computing platform that facilitates collaborative research on large, distributed data by creating an easy-to-use interface for users (or groups of collaborating users) to access high-performance computing (HPC) and cloud computing resources. Through a series of web-based services, CBRAIN manages data access, transfer, caching and provenance, as well as data processing and reporting. While predominantly used to support researchers in neuroinformatics, CBRAIN is generic and modular, and can easily be extended with new data models and tools for a broad range of research disciplines. It is an open source, flexible Ruby on Rails framework for accessing and processing large amounts of data across a distributed network of HPC and cloud computing infrastructures. With over 1800 users from over 35 countries, CBRAIN is a key resource that lowers the technical barriers for scientists to conduct neuroinformatics research. More information about CBRAIN can be found at https://cbrain.ca and in the aces/cbrain repository on GitHub.

Aims: The objective of the project is to create a Python-based command line interface (CLI) that leverages the CBRAIN APIs. It will enable more advanced users to perform all the typical CBRAIN operations (data upload and download, file querying and selection, and processing task creation, execution and monitoring) from a CLI that can be run on a remote resource, without having to perform the same actions through the CBRAIN web interface. A CLI approach would give users the ability to create more complex workflows while still leveraging CBRAIN’s core abilities to manage data movement and large-scale data processing.

Website: https://cbrain.ca and the aces/cbrain repository on GitHub.

Tech keywords: Python, imaging, CBRAIN, distributed computing, cloud computing

namita-ach · March 10, 2025, 5:18pm

Hi!
This project looks really exciting! I’m a Computer Science undergrad specializing in Advanced and Applied ML, with experience in computational neuroscience, spiking neural networks (SNNs), and cloud computing. Given my background, I’d love to contribute to this. Is there a particular direction I should focus on to be most effective? Would diving deeper into CBRAIN’s API, distributed computing architectures, or HPC workflows be most beneficial for getting started? Thanks in advance! : )

PRIMUS · March 13, 2025, 4:12pm

Hi @bryan.caron, I mentioned you because I have a question: could you let me know whether there have been any recent updates or changes to the CBRAIN API that would affect the design and implementation of the CLI? Additionally, are there specific new functionalities or improvements that should be prioritized in the CLI beyond the original project scope?

NISHCHAY_RAJPUT · March 19, 2025, 12:31pm

Hey Mentors,
I am really interested in this project to develop the Python command line interface for the CBRAIN DCP. Could you share the preferred communication channel where I can get help and engage with the community?

Additionally, I have a few questions regarding this project.

Should we aim for a minimal viable CLI first, or a more feature-rich implementation from the start?
Is there an existing codebase or prototype for this CLI, or any other reference I could study before starting from scratch?

@natacha-beck @Serge_Boroday @dlq @bryan.caron

Seems like @bryan.caron has been inactive for a long time

dlq · March 19, 2025, 5:51pm

I’ll poke Bryan to get back to you, but I think we’re looking for a Python CLI built with something like Click that directly uses the Swagger API spec.

Wait to hear from Bryan about how to proceed, though.

NISHCHAY_RAJPUT · March 19, 2025, 7:28pm

Okay, I’ll have a look at both of these until Bryan is back!

NISHCHAY_RAJPUT · March 20, 2025, 9:36am

I had a look at Click and also found another option, Typer, which looks fine too. I will update you on which one would be the better fit.

NISHCHAY_RAJPUT · March 21, 2025, 5:18pm

Hey @dlq ,
I had a look at both of these, along with quite a few other things, and considered their feasibility. I have written up what I have done so far. Could you say more about what to do next?

PRIMUS · March 25, 2025, 3:58pm

looks like @bryan.caron is still busy

dle · March 25, 2025, 7:38pm

Hello everyone,

We wish to inform you that any questions/inquiries from potential project mentees regarding this project can be sent to cbrain-support.mni@mcgill.ca (preferred email).

If your inquiry has already been posted in this thread, we will respond to it directly here.

We are currently receiving a high number of inquiries, and will be responding as soon as we are able. Thank you for your patience!

Shruti_Parmar · March 26, 2025, 11:56am

Hi everyone! I’m Shruti Parmar, a 3rd-year Computer Science student specializing in Data Science. I have experience working with AI/ML, open-source projects, and handling real-world datasets. Excited to contribute to CBRAIN by developing the Python CLI, making data processing and HPC access more efficient for researchers. Looking forward to learning and collaborating with you all!

NISHCHAY_RAJPUT · March 30, 2025, 9:54am

Hey @dle ,
I have prepared my proposal and would like to have it reviewed, get feedback on it, and find out how to proceed further. However, I have received no response either here or by email. Please look into it.
Thanks

dle · April 1, 2025, 5:52pm

To all GSoC candidates interested in this project, please see below for some updated information regarding the project.

The CLI project should be developed in a separate GitHub repo. Do not make pull requests on the main CBRAIN project repo.
The CLI project’s codebase should contain the minimal set of Python files to implement its core functionality. Do not include external libraries imported by your project, or files installed as part of a virtual environment.

Implementation

Single command ‘cbrain’ that runs in a Unix (Linux/Mac) terminal
The command is a Python script that depends on CBRAIN API
libraries to perform its functions
The command has a standard argument structure:
- options start with ‘-’, e.g. -j or --json
- other arguments are subcommands and their parameters
Session is maintained in a credentials.json file; a possible
advanced feature would be the ability to maintain several
distinct sessions and switch between them
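
As a rough illustration of the argument structure described above, here is a minimal sketch using argparse from the Python standard library (a library such as Click would work equally well). The function name build_parser, the long option --verbose, and the argument names resource, filters and id are assumptions made for this sketch; only -j/--json, -v and the subcommand words come from the description and examples in this post.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical 'cbrain' command (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cbrain")
    # Global options come before the subcommand, e.g. `cbrain --json list projects`.
    parser.add_argument("-j", "--json", action="store_true",
                        help="print output as JSON")
    # `--verbose` is an assumed long form of the `-v` option.
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print debug messages")

    subcommands = parser.add_subparsers(dest="command")
    for name in ("login", "whoami", "logout"):
        subcommands.add_parser(name)

    list_parser = subcommands.add_parser("list")
    list_parser.add_argument("resource")            # e.g. projects, files, tasks
    list_parser.add_argument("filters", nargs="*")  # e.g. where group_id=55

    show_parser = subcommands.add_parser("show")
    show_parser.add_argument("resource")            # e.g. file, task, project
    show_parser.add_argument("id", nargs="?")       # optional, e.g. `show project`
    return parser
```

Calling build_parser().parse_args(["--json", "list", "projects"]) yields a namespace with json=True, command="list" and resource="projects". When no subcommand is given, command is None, which is where the basic usage statement could be printed.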

Examples

These are ‘vision’ examples, not to be taken as the literal product
that this project expects. The developer is free to make adjustments
and make things more convenient/pretty at any level.

The lines starting with ‘> ’ represent the terminal’s prompt
and show the ‘cbrain’ command with sample options and arguments.
The lines that follow show sample outputs. If these lines are in
parentheses, they describe the output instead of showing it.

> cbrain
(shows the basic usage statement for the command)

> cbrain login
Enter CBRAIN server URL prefix: http://localhost:3000
Enter CBRAIN username: jdoe
Enter CBRAIN password: *********
Connection successful, API token saved in $HOME/.config/cbrain/credentials.json

> cbrain whoami
Current user: jdoe (John Doe) on server http://localhost:3000

> cbrain -v whoami
DEBUG: Found credentials $HOME/.config/cbrain/credentials.json
DEBUG: User in credentials: jdoe on server http://localhost:3000
DEBUG: Token found: a3****b2
DEBUG: Verifying token...
DEBUG: GET /session
DEBUG: Got JSON reply {"user_id":23,"cbrain_api_token":"0123456789abcdeffedcba9876543210"}
DEBUG: GET /users/23
DEBUG: Got JSON reply {"id":23,"login":"jdoe","full_name":"John Doe","email":...}
Current user: jdoe (John Doe) on server http://localhost:3000
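
The login and whoami examples above rely on a saved credentials file and on a token that is masked in debug output. A minimal sketch of that bookkeeping could look like the following. The function names and the JSON keys server_url and login are assumptions for illustration; the cbrain_api_token key and the default path are taken from the sample output above. No request to a CBRAIN server is made here.

```python
import json
import os
from pathlib import Path

# Default location taken from the sample `cbrain login` output.
DEFAULT_PATH = Path.home() / ".config" / "cbrain" / "credentials.json"


def save_credentials(path, server_url, login, token):
    """Write the session to a JSON file, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {"server_url": server_url, "login": login, "cbrain_api_token": token}
    # Mode 0o600 (owner read/write only) applies when the file is first created.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)


def load_credentials(path):
    """Return the saved session as a dict, or None when no file exists."""
    path = Path(path)
    if not path.exists():
        return None
    with path.open() as handle:
        return json.load(handle)


def mask_token(token):
    """Show only the first and last two characters of a token."""
    if len(token) <= 4:
        return "****"
    return f"{token[:2]}****{token[-2:]}"
```

A logout subcommand could then simply delete the file, and support for several sessions could be layered on top by keying the stored data by server and login, although that design is left open here.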

> cbrain list projects
ID Type        Project Name
-- ----------- ----------------
34 UserProject jdoe
55 WorkProject my_research

> cbrain --json list projects
[
  { "id": 34, "type": "UserProject", "name": "jdoe" },
  { "id": 55, "type": "WorkProject", "name": "my_research" }
]
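
One way to support both the plain table and the --json form of the same listing is to keep the records as a list of dictionaries and choose the renderer at the end. The sketch below assumes that approach; the names format_table, render and PROJECT_COLUMNS are hypothetical, and the column widths are computed from the data, so the separator line may be shorter than in the sample above.

```python
import json

# (record key, column header) pairs for the project listing.
PROJECT_COLUMNS = [("id", "ID"), ("type", "Type"), ("name", "Project Name")]


def format_table(rows, columns):
    """Render a list of dicts as a left-aligned text table with a dashed rule."""
    widths = [
        max([len(header)] + [len(str(row[key])) for row in rows])
        for key, header in columns
    ]
    lines = [
        " ".join(header.ljust(width) for (_, header), width in zip(columns, widths)),
        " ".join("-" * width for width in widths),
    ]
    for row in rows:
        cells = (str(row[key]).ljust(width) for (key, _), width in zip(columns, widths))
        lines.append(" ".join(cells))
    # Drop the padding after the last column.
    return "\n".join(line.rstrip() for line in lines)


def render(rows, columns, as_json=False):
    """Return JSON text when as_json is set, otherwise a text table."""
    if as_json:
        return json.dumps(rows, indent=2)
    return format_table(rows, columns)
```

With this split, every list subcommand can share the same renderer and only needs to supply its own column definition.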

> cbrain switch project 34
Current project is now "jdoe" ID=34

> cbrain show project
Current project is "jdoe" ID=34

> cbrain switch project ALL
No current project selected, everything is unfiltered.

> cbrain list files
ID   Type        File Name
---- ----------- -----------------------
2616 TextFile    license.txt
9221 CivetOutput sub-1234_feb24-beluga-1

> cbrain --json list files
(same list but in JSON)

> cbrain list tools
> cbrain list tool_configs
> cbrain list tasks
> cbrain list users
(note: would only list one user unless the user is admin)
> cbrain list data_providers
> cbrain list remote_resources
> cbrain list background_activities

> cbrain list files where group_id=55 data_provider_id=27
(only shows files based on filters, which are just sent back to the server
as query parameters; the cbrain client can validate the keys based on known
allowed filters)
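
The filter handling described above could be sketched as follows: the words after ‘where’ are split into key=value pairs, checked against a list of known keys, and turned into query parameters. The ALLOWED_FILTERS table is a placeholder invented for this sketch, not the real set of filters accepted by the CBRAIN API, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical allow-list per resource; the real keys would come from the API spec.
ALLOWED_FILTERS = {
    "files": {"group_id", "data_provider_id", "user_id"},
    "tasks": {"bourreau_id", "group_id", "user_id"},
}


def parse_filters(resource, tokens):
    """Turn ['where', 'group_id=55', ...] into a dict of query parameters."""
    if not tokens:
        return {}
    if tokens[0] != "where":
        raise ValueError(f"expected 'where', got {tokens[0]!r}")
    allowed = ALLOWED_FILTERS.get(resource)
    params = {}
    for token in tokens[1:]:
        key, separator, value = token.partition("=")
        if not separator or not key or not value:
            raise ValueError(f"filter must look like key=value: {token!r}")
        # Resources without an allow-list are passed through unchecked.
        if allowed is not None and key not in allowed:
            raise ValueError(f"unknown filter {key!r} for {resource}")
        params[key] = value
    return params


def to_query_string(params):
    """Encode the parameters for use in a request URL."""
    return urlencode(params)
```

For the first example above, parse_filters("files", ["where", "group_id=55", "data_provider_id=27"]) gives a dictionary that encodes to group_id=55&data_provider_id=27.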

> cbrain list tasks where bourreau_id=45

> cbrain upload 27 TextFile chapter1.txt < chapter1.txt
(27 is the ID of a data provider, the other two args are the type and name; the
content of the file is fed in the unix way from standard input)

> cbrain create task Civet < task_structure.json

> cbrain copy file 2616 27
Background activity ID: 453
(Arguments are file ID, destination data provider ID)

> cbrain move file 2617 27
Background activity ID: 454
(Arguments are file ID, destination data provider ID)

> cbrain --json show background_activity 453
{"id": 453, "user_id": 34, "type": "BackgroundActivity::CopyFile",
 "status": "Completed", "items": [2616], ... }

> cbrain show file 2616
id: 2616
type: TextFile
name: license.txt
data_provider: 27
size: 4533
num_files: 1
user_id: 34
group_id: 55

> cbrain show task 12345
> cbrain show tool 13
> cbrain show tool_config 422
> cbrain show data_provider 27
> cbrain show remote_resource 45
(etc etc)

> cbrain logout

The tool should accept 'ls' and 'list', 'mv' and 'move', and 'show' and 'info', as equivalent, for example.
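
A simple way to treat these words as equivalent is to rewrite the short form to a canonical subcommand before parsing. The sketch below assumes that approach; which spelling counts as canonical (here ‘list’, ‘move’ and ‘show’) is an arbitrary choice made for the example, and the function name normalize_args is hypothetical.

```python
# Alias -> canonical subcommand, using the pairs named in the sentence above.
ALIASES = {"ls": "list", "mv": "move", "info": "show"}


def normalize_args(argv):
    """Replace an aliased subcommand with its canonical name.

    Only the first word that is not an option (does not start with '-')
    is treated as the subcommand; everything else is left untouched.
    """
    result = list(argv)
    for index, word in enumerate(result):
        if not word.startswith("-"):
            result[index] = ALIASES.get(word, word)
            break
    return result
```

For example, normalize_args(["--json", "ls", "files"]) returns ["--json", "list", "files"]. With argparse, the aliases parameter of add_parser is another way to get the same effect.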

Naman_Sharma · April 2, 2025, 9:01am

This is amazing. I have gone through the codebase and understood the Bourreau and BrainPortal and its functionalities.
I graduated in 2024 and have extensive experience in backend technologies. I have my proposal ready. If it needs to be reviewed before submission, please feel free to contact me.

Kush_Thakkar · April 26, 2025, 5:45am

Hello, I know it is a little bit late but is it still possible to join the project?

I have all the required skills, with project experience in Python, Git, and Linux, and I have a strong interest in computational neuroscience: I participated in a Google Docathon analyzing women’s brain health data.

Deekshitha_Mallepula · July 10, 2025, 5:34am

Hello! I’m Deekshitha Mallepula, a Python developer and a GSoC 2025 aspirant.
I’m very interested in contributing to Project 16 (CLI for CBRAIN), and I’d like to get started with understanding the codebase and contributing.
Could you please guide me toward beginner-friendly issues or places I can explore first?

Looking forward to working with you!
Thank you.