GSoC 2026 Project #23: A Python Command Line Interface (CLI) for the CBRAIN Distributed Computing Platform

Mentors: Bryan Caron <bryan.caron@mcgill.ca>, Pierre Rioux, Natacha Beck, Serge Boroday, Darcy Quesnel

Skill level: Intermediate - Advanced

Required skills: Python; experience with version control systems (e.g., git) and team-based development methodologies; good understanding of the Linux operating system and development in a Linux environment

Time commitment: part time or full time (350 hours)

About: CBRAIN is a web-enabled distributed computing platform that facilitates collaborative research on large, distributed data by providing an easy-to-use interface for users (or groups of collaborating users) to access high-performance computing (HPC) and cloud computing resources. Through a series of web-based services, CBRAIN manages data access, transfer, caching and provenance, as well as data processing and reporting. While predominantly used to support researchers in neuroinformatics, CBRAIN is generic and modular, and can easily be extended with new data models and tools for a broad range of research disciplines. CBRAIN is an open source, flexible Ruby on Rails framework for accessing and processing large amounts of data across a distributed network of HPC and cloud computing infrastructures. With over 1800 users from over 35 countries, CBRAIN is a key resource that lowers the technical barriers for scientists to conduct neuroinformatics research. More information about CBRAIN can be found at https://cbrain.ca and on GitHub (https://github.com/aces/cbrain).

Aims: The objective of the project is to create a Python-based command line interface (CLI), built on the CBRAIN APIs, that lets more advanced users perform all the typical CBRAIN operations from a terminal on a remote resource: data upload and download, file querying and selection, and the creation, execution and monitoring of processing tasks, without having to repeat the same actions through the CBRAIN web interface. A CLI would let users build more complex workflows while still relying on CBRAIN's core abilities to manage data movement and large-scale data processing.

Website: https://cbrain.ca and https://github.com/aces/cbrain

Tech keywords: Python, imaging, CBRAIN, distributed computing, cloud computing

Hi @bryan.caron , @natacha-beck, @prioux and @Serge_Boroday,

My name is Ravi, and I am very excited about the CBRAIN CLI (Project #23). I've been actively contributing to the core repository and recently fixed the high-priority ActionMailer regression (issue #1588) via PR #1599, which has been merged into master.

My main goal for the CLI is to make it reliable, scriptable, and resilient for researchers in production workflows. Based on the recent technical discourse in the core repository, I am focusing my proposal on the following architectural pillars:

  • Pre-submission checks: I plan to leverage the newly refactored object-oriented methods (such as is_ssh_key_installed? from @prioux in PR #1598) to catch configuration issues locally, before a task is even submitted to the portal.
  • Boutiques-aware validation: Integrating local parameter checks against Boutiques descriptors to reduce failed runs and provide immediate feedback on incorrect inputs.
  • Modular Python structure: Building a clean, maintainable codebase using Click for command orchestration and Pydantic for robust data validation and schema integrity.

I am currently finalizing my full proposal and would really appreciate hearing if there are specific CLI workflows, edge cases, or high-priority features (e.g., handling specific large-file transfer scenarios) that the team would like me to prioritize in the roadmap.

Looking forward to your feedback!

Best regards, Ravi

We initially decided against third-party libraries such as Click, to keep the CLI easy to install for neuroscientists without extensive Python or Linux experience.

That said, it’s an interesting suggestion, and we will consider it.

Thank you for your contribution.

Sergiy

Hi @bryan.caron @prioux @natacha-beck @Serge_Boroday @dlq and fellow contributors!

My name is Shashwat Pratap Singh, and I am very excited to express my interest in Project #23: A Python CLI for the CBRAIN Platform.

My technical background is heavily grounded in Python systems development, Linux environments, and High-Performance Computing.
Recently, I fine-tuned the Gemma LLM on a supercomputing cluster, which required working extensively in the terminal to manage distributed workflows, analyze SLURM execution logs, and optimize job scheduling. Alongside this, I am developing an OS-level virtualization environment to securely execute uncompiled source code, and I recently built an LLM-powered tool that automates YAML configuration generation.

Because my day-to-day work involves driving complex, remote HPC tasks directly from the command line, building a production-ready Python wrapper for CBRAIN’s REST API is a challenge I am highly motivated to tackle.

I am currently spinning up a local CBRAIN instance to map out the API endpoints. Looking forward to collaborating with you all!

Hi @Serge_Boroday, thanks for pointing that out! That makes a lot of sense: accessibility is key, and the last thing a researcher needs is more installation friction.

I am more than happy to pivot to argparse. Sticking to the standard library is a great call for keeping the tool lightweight and ensuring it works “out of the box” without any messy environment management. I will make “Zero-Dependency Portability” a core pillar of my proposal. Appreciate the steering!
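A rough sketch of how nested subcommands could be laid out with argparse alone, so nothing beyond the Python standard library is needed. The command names (`login`, `files list`, `tasks status`) and their options are hypothetical, and the handlers only print what they would do instead of calling the CBRAIN API.

```python
import argparse

# Hypothetical command layout for a zero-dependency CBRAIN CLI. The handlers
# only echo their arguments; a real version would call the CBRAIN API.


def cmd_login(args):
    print(f"would log in to {args.server} as {args.username}")
    return 0


def cmd_files_list(args):
    print(f"would list files (limit={args.limit})")
    return 0


def cmd_tasks_status(args):
    print(f"would show status of task {args.task_id}")
    return 0


def build_parser():
    parser = argparse.ArgumentParser(
        prog="cbrain", description="CBRAIN command line interface (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    login = sub.add_parser("login", help="authenticate against a CBRAIN portal")
    login.add_argument("--server", required=True, help="portal base URL")
    login.add_argument("--username", required=True)
    login.set_defaults(func=cmd_login)

    # Second-level subcommands: "cbrain files list", "cbrain tasks status".
    files = sub.add_parser("files", help="work with userfiles")
    files_sub = files.add_subparsers(dest="files_command", required=True)
    files_list = files_sub.add_parser("list", help="list accessible files")
    files_list.add_argument("--limit", type=int, default=25)
    files_list.set_defaults(func=cmd_files_list)

    tasks = sub.add_parser("tasks", help="work with processing tasks")
    tasks_sub = tasks.add_subparsers(dest="tasks_command", required=True)
    tasks_status = tasks_sub.add_parser("status", help="show a task's status")
    tasks_status.add_argument("task_id", type=int)
    tasks_status.set_defaults(func=cmd_tasks_status)
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:]; tests can pass a list.
    args = build_parser().parse_args(argv)
    return args.func(args)
```

In an installed script the module would end with `if __name__ == "__main__": sys.exit(main())`, so that `cbrain tasks status 42` dispatches to `cmd_tasks_status` through the `func` default attached to each subparser.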

:rocket: Exploring Distributed Computing with a Python CLI for CBRAIN

I’ve recently been diving into the idea of building a Python-based Command Line Interface (CLI) for the CBRAIN Distributed Computing Platform — a tool designed to simplify interaction with large-scale distributed systems used in research and high-performance computing.

:light_bulb: The core idea is to create a developer-friendly CLI that allows users to:

  • Submit and monitor jobs on distributed clusters

  • Manage datasets and workflows efficiently

  • Automate repetitive tasks in computational pipelines

  • Integrate CBRAIN capabilities into scripts and research tools
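As one illustration of the "monitor jobs" item, a status-polling loop with capped exponential backoff could look like the sketch below. `wait_for_task` is a hypothetical helper, and `get_status` stands in for whatever function would query the CBRAIN API; the terminal status names are placeholders, not an authoritative list of CBRAIN task states.

```python
import time

# Hypothetical helper: polls a task until it reaches a terminal state.
# `get_status(task_id)` is any callable returning a status string; the
# terminal names below are placeholders, not real CBRAIN task states.

DEFAULT_TERMINAL = frozenset({"Completed", "Failed", "Terminated"})


def wait_for_task(get_status, task_id, interval=10.0, max_interval=120.0,
                  timeout=3600.0, terminal=DEFAULT_TERMINAL,
                  sleep=time.sleep, clock=time.monotonic):
    """Return the final status, backing off between polls; raise on timeout."""
    deadline = clock() + timeout
    delay = interval
    while True:
        status = get_status(task_id)
        if status in terminal:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"task {task_id} still '{status}' after {timeout}s")
        sleep(delay)
        delay = min(delay * 2, max_interval)  # exponential backoff, capped
```

Passing `sleep` and `clock` in as parameters keeps the loop testable without a live portal, and the capped backoff limits how often a long-running task is polled.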

This aligns strongly with my growing interest in AI, Machine Learning, and scalable systems, where compute-intensive workloads require efficient orchestration across distributed environments.

:bullseye: Why this excites me:
Working on such a system bridges the gap between theoretical ML models and real-world deployment at scale. It's not just about building models, but about enabling them to run efficiently on powerful infrastructure.

:school: My interest in
I am particularly interested in pursuing this direction at, where strong research culture and exposure to systems, AI, and applied engineering can help me deepen my understanding of:

  • Distributed computing systems

  • AI-driven pipelines

  • Scalable software architecture

I’m eager to contribute, learn, and build impactful solutions that combine AI + Systems + Real-world applications.

:speech_balloon: Open to discussions, collaborations, and feedback!

Hi, I’ve started contributing by improving the README documentation (PR submitted).

I’m interested in working on the Python CLI project and currently exploring how CBRAIN APIs can be used for task automation.

Looking forward to feedback and guidance from the mentors.