GSoC 2023 Project Idea 16.2 Converting existing scientific workflows to new dataflow engine written in Python: Pydra (350 h)

There are many scientific workflows, written using a variety of languages and frameworks. In neuroimaging, many use Nipype 1 or bash. Pydra is intended to be a generic dataflow engine, and we would like to demonstrate its utility by converting existing workflows from different scientific domains. There is flexibility according to participant interest and experience to select the specific workflows. This project will require the creation of new Pydra Task classes that wrap the necessary tools. Depending on the tools to be wrapped, it may be possible to automatically generate classes from a pre-existing specification, or they may be written manually as needed. Likewise, there may be opportunities to convert specifications of entire workflows into Pydra workflows. Generated workflows will be made accessible through the Niflows framework (or similar) and submitted for reuse to workflow hubs such as workflowhub.eu and dockstore.org.

Skill level: Beginner/intermediate

Required skills:

  • Python 3: novice +
  • Bash: novice +
  • Scientific software (e.g. AFNI, SPM, ITK, SpikeInterface, CaImAn): intermediate +
  • Creating data workflows: beginner +
  • Data file format, e.g. NWB, NIfTI, OME-TIFF, PLINK, HDF5: beginner +

Time commitment: Full-time (350 h)

Lead mentor: Dorota Jarecka, Chris Markiewicz, Hao-Ting Wang, Satra Ghosh, collaboration with groups responsible for specific workflows

Project website: TBD

Backup mentors: TBD

Tech keywords: Workflow, Python, Pydra, Nipype, CWL

Hi,

I have good experience (5+ years) with Python and with neuroimaging tools (SPM, FreeSurfer, AFNI, fmriprep, Nipype, etc.). I am interested in Pydra, and I would be happy to contribute to the project.

Please, let me know how to learn more about the project, and if there is any preferred communication channel with the mentors.

Best,
Andrea

@arnab1896 how can I get in touch with the mentors?

First of all, nice to hear from you and thanks for your interest.

Apologies for the delay.
I have notified the mentors, and they will be in touch soon with more details, such as resources and code bases/issues to look through. You can then ask them more questions based on your ideas.

Hi @costantinoai - thanks for reaching out!

If you want to learn a bit more about the project, the best place to start would be the pydra tutorial.

Could you please share your CV/resume or GitHub account?

Thank you!

Sure! I sent you a DM 🙂

Hi @costantinoai, how is the work with the tutorial going?

Sorry for the late reply. I went through the tutorials and everything looks pretty clear so far. Can we continue this conversation over email to discuss some additional details about the project?

@costantinoai - I think we should keep the discussion here.

Can you tell me which neuroimaging package you’re most interested in and what kind of workflow you’re using in your work?

The tools I use most often and am most familiar with are SPM, marsbar, FreeSurfer, fmriprep, nilearn, nibabel, Heudiconv, fmridenoise, and various MVPA/RSA packages in Python and MATLAB, in voxel or surface space. I have a general understanding of nipype (although I didn’t use it in my projects), and I used some retinotopic mapping packages, such as neuropythy.

Generally speaking, I use workflows for task-based fMRI to preprocess and denoise data and to run univariate and multivariate analyses. All of (well, most of) my workflows are within the BIDS ecosystem. If that’s relevant to the project, I also compare DNNs and brain data in my workflows.

@costantinoai - perhaps you can try writing a simple workflow using Pydra Workflows and Tasks; examples of workflows can be found in the tutorial. If your workflow has a Python function, you can use FunctionTask (example here); if you use a shell command, you should be able to create a ShellCommandTask (example here).
If you use Nipype interfaces, you have two options: either use the pydra-nipype1 wrapper or create a new Pydra task for the specific command; an example for fsl.bet can be found here. The second approach requires more work, so feel free to start with the first one.
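For readers following along, the difference between the two task kinds mentioned above can be illustrated with a minimal plain-Python sketch. This is not Pydra's API: `FunctionStep`, `ShellStep`, and `make_label` are hypothetical names invented for this illustration, and the "shell command" is simply the current Python interpreter so that the example runs anywhere. The Pydra tutorial remains the reference for the real `FunctionTask` and `ShellCommandTask` classes.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FunctionStep:
    """Hypothetical stand-in for a function-based task: wraps a Python callable."""
    name: str
    func: Callable

    def run(self, **inputs):
        # Call the wrapped function with keyword inputs and return its output.
        return self.func(**inputs)


@dataclass
class ShellStep:
    """Hypothetical stand-in for a shell-command task: wraps an executable."""
    name: str
    executable: List[str]

    def run(self, *args):
        # Run the command with the given arguments and return its stdout.
        proc = subprocess.run(
            [*self.executable, *map(str, args)],
            capture_output=True,
            text=True,
            check=True,
        )
        return proc.stdout.strip()


def make_label(subject: int) -> str:
    """Build a BIDS-style subject label, e.g. 1 -> 'sub-01'."""
    return f"sub-{subject:02d}"


# One step wraps a Python function, the other wraps an external command.
label_step = FunctionStep("label", make_label)
upper_step = ShellStep(
    "upper",
    [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"],
)

# Chain the two: the output of the first step is the input of the second.
label = label_step.run(subject=1)
shouted = upper_step.run(label)
print(label, shouted)
```

In a real dataflow engine the wiring between steps is declared rather than executed by hand, which is what lets the engine cache results and run independent steps in parallel.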

Please feel free to ask any questions. Preferably, share your code on GitHub (in any form, it doesn’t have to be working) so I can comment on it. But please start with something relatively easy, without too many nodes/tasks.

Thanks, @djarecka.

I will probably start by trying to convert my current fMRI workflow to pydra, and I will set up a GitHub repo to share the progress. I’ll keep you posted as soon as I have any news.

@costantinoai - how are you doing? please feel free to share whatever you have

@djarecka - the GitHub repo can be found here. I can give you edit permissions if you think that would make things easier.

So far, I implemented the first two tasks of the workflow, one as a FunctionTask and the other using a ShellCommandTask. The tasks are very basic:

  1. We get the DICOM images from the dcm2bids tutorial
  2. We convert the DICOMs into a BIDS structure using dcm2bids
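As a rough illustration of the second step, the command line that a shell-type task would wrap can be assembled as below. This is only a sketch: the helper name `dcm2bids_command` and all paths are hypothetical, and the flags (`-d`, `-p`, `-c`, `-o`) reflect my understanding of the dcm2bids CLI, so please check `dcm2bids --help` for the installed version. The command is built but not executed.

```python
from typing import List


def dcm2bids_command(
    dicom_dir: str, participant: str, config: str, output_dir: str
) -> List[str]:
    """Assemble (but do not run) a dcm2bids call for one participant.

    The flags are assumptions based on the dcm2bids CLI:
    -d DICOM directory, -p participant ID, -c config file, -o output directory.
    """
    return [
        "dcm2bids",
        "-d", dicom_dir,
        "-p", participant,
        "-c", config,
        "-o", output_dir,
    ]


# Hypothetical paths, used only for illustration.
cmd = dcm2bids_command(
    dicom_dir="sourcedata/dicoms",
    participant="01",
    config="code/dcm2bids_config.json",
    output_dir="bids",
)
print(" ".join(cmd))
```

Keeping the command construction in a small function like this makes it easy to test separately from the task that eventually runs it.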

In terms of what I plan to implement, you can find a list of TODO functions in the file ./test/fmri_wf_funcs_test.py. Please, let me know if anything is unclear or if you have any feedback/suggestion.

By the way, I recently released pydra-dcm2bids if you don’t want to maintain your own task for it. Feel free to try it out and report any issue you may find.

In addition to the tutorial, you can also check this workflow implementing a simple defacing algorithm using FSL tasks. It will give you a feel of how it looks end-to-end, including CLI parsing and optional branching.

Thank you @costantinoai for creating the repository, I can see that you were able to create multiple pydra tasks for the tools you were using! As @ghisvail mentioned, some of the tasks could easily be added to specific packages, e.g. pydra-dcm2bids.

The deadline for the project submission is next Tuesday, so I would be happy to read a draft if you prepare something this week.
Also, @yibeichen, last year’s contributor and a co-mentor this year, agreed to share her proposal, so you can see an example: proposal.pdf - Google Drive
You can also find more examples on the PSF website (https://blogs.python-gsoc.org/en/).

Thanks, @djarecka, and thanks @yibeichen for sharing last year’s proposal.

I had a look at the proposal and at some Pydra workflows, and it seems that most of my current fMRI pipeline is already implemented in Pydra, so I am not sure what would make sense to implement here. Any suggestions? What workflows would you like to see implemented in Pydra?

Hi @costantinoai - not everything that was planned was completed, so there is definitely much more to do.
Perhaps we can try to set up a quick 30-min call this week to think about specific tasks/workflows. Can you please fill in the poll: meeting with Andrea Costantino - When2meet

@ghisvail and @yibeichen - can you also fill in the poll? Perhaps it would be easier to meet for a quick call to discuss the current work on tasks. We don’t have to have everyone on the call, but let’s try.

Thanks everyone for filling in the poll. You should have received an invitation for Thursday (10am EST, 4pm CEST); if you haven’t, please DM me.

In the meantime, @costantinoai please think and let us know about your summer schedule. This project is 350h. The standard period for GSoC is May 29th to August 28th. You could choose to extend this time, but I would prefer to not extend too much (not later than September).