Required skills: Python; experience with version control systems (e.g. git) and team-based development methodologies; good understanding of the Linux operating system and development in a Linux environment
Time commitment: part time or full time (350 hours)
About: CBRAIN is a web-enabled distributed computing platform that facilitates collaborative research on large, distributed data by giving users (or groups of collaborating users) an easy-to-use interface to high-performance computing (HPC) and cloud computing resources. Through a series of web-based services, CBRAIN manages data access, transfer, caching and provenance, as well as data processing and reporting. While predominantly used to support researchers in neuroinformatics, CBRAIN is generic and modular, and can be extended with new data models and tools for a broad range of research disciplines. It is open source and built as a flexible Ruby on Rails framework. With over 1800 users from over 35 countries, CBRAIN is a key resource that lowers the technical barriers for scientists to conduct neuroinformatics research. More information about CBRAIN can be found at https://cbrain.ca and in the aces/cbrain repository on GitHub.
Aims: The objective of the project is to create a Python-based command line interface (CLI) that leverages the CBRAIN APIs. It will let more advanced users perform all the typical CBRAIN operations (data upload and download, file querying and selection, and processing task creation, execution and monitoring) from a CLI that can run on a remote resource, without going through the CBRAIN web interface. A CLI would also let users build more complex workflows while still leveraging CBRAIN’s core abilities to manage data movement and large-scale data processing.
Hi!
This project looks really exciting! I’m a Computer Science undergrad specializing in Advanced and Applied ML, with experience in computational neuroscience, spiking neural networks (SNNs), and cloud computing. Given my background, I’d love to contribute to this. Is there a particular direction I should focus on to be most effective? Would diving deeper into CBRAIN’s API, distributed computing architectures, or HPC workflows be most beneficial for getting started? Thanks in advance! : )
Hi @bryan.caron, I mentioned you with a question: have there been any recent updates or changes to the CBRAIN API that would affect the design and implementation of the CLI? Additionally, are there specific new functionalities or improvements that should be prioritized in the CLI beyond the original project scope?
Hey Mentors,
I am really interested in this project to develop the Python command line interface for the CBRAIN DCP. Could you share the preferred communication channel where I can get help and engage with the community?
Additionally, I have a few questions regarding this project.
Should we prefer a minimal viable CLI first, or a more feature-rich implementation from the start?
Is there an existing codebase or prototype for this CLI, or any other reference that I could study before starting from scratch?
Hey @dlq ,
I had a look at both of these, and at quite a few other things as well, including feasibility. I have written up what I have done so far. Could you say more about what to do next?
We wish to inform you that any questions/inquiries from potential project mentees regarding this project can be sent to cbrain-support.mni@mcgill.ca (preferred email).
If your inquiry has already been posted in this thread, we will respond to it directly here.
We are currently receiving a high number of inquiries, and will be responding as soon as we are able. Thank you for your patience!
Hi everyone! I’m Shruti Parmar, a 3rd-year Computer Science student specializing in Data Science. I have experience working with AI/ML, open-source projects, and handling real-world datasets. Excited to contribute to CBRAIN by developing the Python CLI, making data processing and HPC access more efficient for researchers. Looking forward to learning and collaborating with you all!
Hey @dle ,
I have prepared my proposal and would like to have it reviewed, get feedback on it, and learn how to proceed further. However, I have received no response either here or by email. Please look into it.
Thanks
To all GSoC candidates interested in this project, please see below for some updated information regarding the project.
The CLI project should be developed in a separate GitHub repo. Do not make pull requests on the main CBRAIN project repo.
The CLI project’s codebase should contain the minimal set of Python files needed to implement its core functionality. Do not include external libraries imported by your project, or files installed as part of a virtual environment.
Implementation
Single command ‘cbrain’ that runs in a Unix (Linux/Mac) terminal
The command is a Python script that depends on the CBRAIN API
libraries to perform its functions
The command has a standard argument structure:
options start with ‘-’, e.g. -j or --json
other arguments are subcommands and their parameters
Session is maintained in a credentials.json file; an advanced
feature would be the ability to maintain several distinct
sessions and switch between them.
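As a rough illustration of the session idea above, here is a minimal sketch of a credentials.json store that can hold several named sessions. The file layout (a "current" key plus a "sessions" map), the session names and all function names are assumptions made for this example, not part of the CBRAIN API or of the expected design; only the file location and the "cbrain_api_token" key come from the examples in this thread.

```python
import json
import os
from pathlib import Path


def default_path() -> Path:
    # Location taken from the login example in this thread.
    return Path.home() / ".config" / "cbrain" / "credentials.json"


def load_store(path: Path) -> dict:
    """Return the whole store, or an empty one if the file does not exist."""
    if not path.exists():
        return {"current": None, "sessions": {}}
    with path.open() as fh:
        return json.load(fh)


def write_store(path: Path, store: dict) -> None:
    """Write the store; a newly created file is owner-only since it holds API tokens."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(store, fh, indent=2)


def save_session(path: Path, name: str, server: str, login: str, token: str) -> None:
    """Add or overwrite a named session (hypothetical layout) and make it current."""
    store = load_store(path)
    store["sessions"][name] = {
        "server": server,
        "login": login,
        "cbrain_api_token": token,
    }
    store["current"] = name
    write_store(path, store)


def switch_session(path: Path, name: str) -> None:
    """Make an already saved session the current one."""
    store = load_store(path)
    if name not in store["sessions"]:
        raise KeyError(f"no saved session named {name!r}")
    store["current"] = name
    write_store(path, store)


def current_session(path: Path):
    """Return the current session dict, or None if nothing is selected."""
    store = load_store(path)
    return store["sessions"].get(store["current"])
```

With this layout, a login would call save_session(...) and a hypothetical "switch session" subcommand would call switch_session(...); everything else would only read current_session(...).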
Examples
These are ‘vision’ examples, not to be taken as the literal product
that this project expects. The developer is free to make adjustments
and make things more convenient/pretty at any level.
The lines starting with '> ’ represent the terminal’s prompt
and show the ‘cbrain’ command with sample options and argument.
The lines that follow show sample outputs. If these lines are in
parentheses, they describe the output instead of showing it
literally.
> cbrain
(shows the basic usage statement for the command)
> cbrain login
Enter CBRAIN server URL prefix: http://localhost:3000
Enter CBRAIN username: jdoe
Enter CBRAIN password: *********
Connection successful, API token saved in $HOME/.config/cbrain/credentials.json
> cbrain whoami
Current user: jdoe (John Doe) on server http://localhost:3000
> cbrain -v whoami
DEBUG: Found credentials $HOME/.config/cbrain/credentials.json
DEBUG: User in credentials: jdoe on server http://localhost:3000
DEBUG: Token found: a3****b2
DEBUG: Verifying token...
DEBUG: GET /session
DEBUG: Got JSON reply {"user_id":23,"cbrain_api_token":"0123456789abcdeffedcba9876543210"}
DEBUG: GET /users/23
DEBUG: Got JSON reply {"id":23,"login":"jdoe","full_name":"John Doe","email":...}
Current user: jdoe (John Doe) on server http://localhost:3000
> cbrain list projects
ID Type Project Name
-- ----------- ----------------
34 UserProject jdoe
55 WorkProject my_research
> cbrain --json list projects
[
{ "id": 34, "type": "UserProject", "name": "jdoe" },
{ "id": 55, "type": "WorkProject", "name": "my_research" }
]
> cbrain switch project 34
Current project is now "jdoe" ID=34
> cbrain show project
Current project is "jdoe" ID=34
> cbrain switch project ALL
No current project selected, everything is unfiltered.
> cbrain list files
ID Type File Name
---- ----------- -----------------------
2616 TextFile license.txt
9221 CivetOutput sub-1234_feb24-beluga-1
> cbrain --json list files
(same list but in JSON)
> cbrain list tools
> cbrain list tool_configs
> cbrain list tasks
> cbrain list users
(note: would only list one user unless the user is admin)
> cbrain list data_providers
> cbrain list remote_resources
> cbrain list background_activities
> cbrain list files where group_id=55 data_provider_id=27
(only shows files matching the filters, which are just sent to the server
as query parameters; the cbrain client can validate the keys based on known
allowed filters)
> cbrain list tasks where bourreau_id=45
> cbrain upload 27 TextFile chapter1.txt < chapter1.txt
(27 is the ID of a data provider, the other two args are the type and name; the
content of the file is fed in the unix way from standard input)
> cbrain create task Civet < task_structure.json
> cbrain copy file 2616 27
Background activity ID: 453
(Arguments are file ID, destination data provider ID)
> cbrain move file 2617 27
Background activity ID: 454
(Arguments are file ID, destination data provider ID)
> cbrain --json show background_activity 453
{"id": 453, "user_id": 34, "type": "BackgroundActivity::CopyFile",
"status": "Completed", "items": [2616], ... }
> cbrain show file 2616
id: 2616
type: TextFile
name: license.txt
data_provider: 27
size: 4533
num_files: 1
user_id: 34
group_id: 55
> cbrain show task 12345
> cbrain show tool 13
> cbrain show tool_config 422
> cbrain show data_provider 27
> cbrain show remote_resource 45
(etc etc)
> cbrain logout
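The 'where key=value' filters in the examples above could be handled roughly as follows: split each argument on '=', check the key against a list of known filters, and encode the result as query parameters. This is only a sketch; the ALLOWED_FILTERS table and both function names are hypothetical, and the keys listed are just the ones that appear in the examples, not an authoritative list of what the CBRAIN server accepts.

```python
from urllib.parse import urlencode

# Hypothetical whitelist: only the keys used in the examples of this thread.
ALLOWED_FILTERS = {
    "files": {"group_id", "data_provider_id"},
    "tasks": {"bourreau_id"},
}


def parse_where(resource: str, args: list) -> dict:
    """Turn ['group_id=55', 'data_provider_id=27'] into a validated dict."""
    allowed = ALLOWED_FILTERS.get(resource, set())
    filters = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key not in allowed:
            raise ValueError(f"unknown filter {key!r} for {resource}")
        filters[key] = value
    return filters


def to_query_string(filters: dict) -> str:
    """Encode the filters as URL query parameters, in the order given."""
    return urlencode(filters)


if __name__ == "__main__":
    chosen = parse_where("files", ["group_id=55", "data_provider_id=27"])
    print(to_query_string(chosen))
```

Validating on the client side gives the user an immediate error for a mistyped key instead of a silently unfiltered list.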
The tool should accept 'ls' and 'list', 'mv' and 'move', and 'show' and 'info', as equivalent, for example.
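One possible way to get both the option/subcommand structure and the equivalent names is Python's standard argparse module, whose subparsers accept an aliases list. The sketch below is an assumption about how this could be wired, not the expected implementation: the long name --verbose, the 'action' attribute and the positional argument names are made up for the example, and only three subcommands are shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cbrain")
    # Global options come before the subcommand, as in 'cbrain --json list projects'.
    parser.add_argument("-j", "--json", action="store_true",
                        help="print results as JSON")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print debug messages (long name is an assumption)")
    sub = parser.add_subparsers(dest="command", metavar="command")

    # 'list' and 'ls' are equivalent; 'action' records the canonical name.
    p_list = sub.add_parser("list", aliases=["ls"], help="list resources")
    p_list.add_argument("resource")
    p_list.add_argument("rest", nargs="*", help="e.g. where key=value ...")
    p_list.set_defaults(action="list")

    # 'show' and 'info' are equivalent; the ID is optional ('cbrain show project').
    p_show = sub.add_parser("show", aliases=["info"], help="show one resource")
    p_show.add_argument("resource")
    p_show.add_argument("item_id", nargs="?", type=int)
    p_show.set_defaults(action="show")

    # 'move' and 'mv' are equivalent: move file <file ID> <data provider ID>.
    p_move = sub.add_parser("move", aliases=["mv"], help="move a file")
    p_move.add_argument("resource")
    p_move.add_argument("item_id", type=int)
    p_move.add_argument("data_provider_id", type=int)
    p_move.set_defaults(action="move")

    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.command is None:
        build_parser().print_usage()
    else:
        print(args.action, vars(args))
```

Because args.command holds whatever name the user typed ('ls' or 'list'), the dispatch code would look at args.action, which is always the canonical name.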
This is amazing. I have gone through the codebase and understood Bourreau and BrainPortal and their functionality.
I graduated in 2024 and have extensive experience in backend technologies. I have my proposal ready. If it needs to be reviewed before submission, please feel free to contact me.