GSoC 2023 Project Idea 16.1 Adding new workers and resource management to new dataflow engine written in Python: Pydra (175 h)

Pydra workflows are intended to be written independent of considerations of the computational resources that will ultimately execute them. Pydra “workers” are classes that describe how to submit nodes in an execution graph to a computational resource, for instance a process pool on a local machine or a high-performance computing cluster. Currently Pydra has support for local multiprocessing, Slurm, Dask, and Oracle/Sun Grid Engine (SGE). We would like to expand the range of systems that Pydra can manage, as well as improve utilization of features for existing workers. One goal of particular interest to us is to track resource (CPU, GPU, RAM) allocation to allow scheduling that makes efficient use of those resources. The specific task can depend on the participant’s interest and experience.
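To make the resource-tracking goal concrete, here is a minimal sketch of the bookkeeping it implies. It is not Pydra code and does not use Pydra's worker API: the names `Resources`, `ResourcePool`, and `schedule_round` are hypothetical, and the greedy "start whatever fits" policy is only an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource bundle, used for both capacity and task requests."""
    cpus: int = 1
    gpus: int = 0
    ram_gb: float = 1.0

    def fits_in(self, other: "Resources") -> bool:
        # A request fits only if every resource dimension fits.
        return (self.cpus <= other.cpus
                and self.gpus <= other.gpus
                and self.ram_gb <= other.ram_gb)

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus + other.cpus,
                         self.gpus + other.gpus,
                         self.ram_gb + other.ram_gb)

    def __sub__(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus,
                         self.gpus - other.gpus,
                         self.ram_gb - other.ram_gb)


class ResourcePool:
    """Tracks what is free on one machine and what each running task holds."""

    def __init__(self, capacity: Resources):
        self.capacity = capacity
        self.free = capacity
        self.running = {}  # task name -> Resources it currently holds

    def try_start(self, name: str, request: Resources) -> bool:
        """Reserve resources for a task; return False if it does not fit now."""
        if name in self.running:
            raise ValueError(f"task {name!r} is already running")
        if not request.fits_in(self.free):
            return False
        self.free = self.free - request
        self.running[name] = request
        return True

    def finish(self, name: str) -> None:
        """Release the resources held by a finished task."""
        self.free = self.free + self.running.pop(name)


def schedule_round(pool: ResourcePool, pending: dict) -> dict:
    """Greedily start every pending task that fits; return those still waiting."""
    waiting = {}
    for name, request in pending.items():
        if not pool.try_start(name, request):
            waiting[name] = request
    return waiting


# Example: three tasks competing for a 4-CPU, 1-GPU, 16 GB machine.
pool = ResourcePool(Resources(cpus=4, gpus=1, ram_gb=16))
pending = {
    "align": Resources(cpus=2, gpus=0, ram_gb=4),
    "train": Resources(cpus=2, gpus=1, ram_gb=8),
    "stats": Resources(cpus=2, gpus=0, ram_gb=4),
}
waiting = schedule_round(pool, pending)   # "stats" must wait: no CPUs left
pool.finish("align")                      # frees 2 CPUs and 4 GB
waiting_after = schedule_round(pool, waiting)  # now "stats" can start
```

A real worker would also have to learn each task's requirements, for example from user annotations or from the scheduler, and handle tasks that can never fit. The sketch leaves both out.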

Skill level: Beginner/intermediate

Required skills:

  • Programming, OOP: intermediate +
  • Python 3: novice +
  • Bash/Shell: novice +
  • HPC and schedulers: beginner +

Time commitment: Half-time (175 h)

Lead mentors: Dorota Jarecka, Chris Markiewicz, Satra Ghosh

Project website: TBD

Backup mentors: TBD

Tech keywords: HPC, schedulers, Python, Pydra, Nipype

Dear Mentors @djarecka @effigies @satra ,

I am a pre-final year engineering undergraduate student at Indian Institute of Information Technology Kalyani studying Bachelor of Technology in Computer Science and Engineering.

I am interested in participating in GSoC 2023 with INCF and would like to apply for Project 16.1. My skills and interests align with the project requirements and I am confident in my ability to make a valuable contribution.

Please advise on next steps and provide any additional information about the project. I look forward to the opportunity to work with the INCF community.

Thank you,
Aditya Agarwal
GitHub: adi611
LinkedIn: Aditya Agarwal

First of all, nice to hear from you and thanks for your interest.

Apologies for the delay.
The mentors will be in touch soon with more details, such as resources and code bases or issues to look through. You can then ask them more questions based on your ideas.

Hi @adi611 - thank you for your message. If you want to learn about the project, the best place to start would be the pydra tutorial.

Can you please say more about yourself and why you are interested in this project?

Thank you,

Hi @djarecka,

Thank you for your response. I would be happy to share more about myself and why I’m interested in this project.

I’ve had some great experiences in tech so far. I was part of the winning team at the Smart India Hackathon (often referred to as the “world’s biggest hackathon”) as well as the ETHIndia 2022 Hackathon (often referred to as the “world’s biggest Ethereum hackathon”), and I was a world finalist at the UNESCO India Africa International Hackathon. I’ve also contributed to open-source development for organizations like Microsoft, Facebook, Ethereum, Pandas, and more. These experiences have given me a good understanding of how to work collaboratively with teams, both nationally and internationally.

I’m studying Cognitive Science and Technology in my current semester of college, a course that focuses on neuroinformatics, and I am fascinated by the potential for technology to advance our understanding of the brain. That is why I see INCF as an exciting place to contribute and learn.

Regarding Project 16.1, I feel my skills match the listed requirements well. While I’m not as familiar with HPC, I have learned about schedulers in my Operating Systems classes, so I have a good grasp of the underlying concepts. I think I can quickly learn what I need to know to be effective on this project.

Finally, I’m impressed by the list of mentors. Working under experienced engineers and researchers at leading universities is a unique opportunity, and I would be thrilled to learn from them and contribute to the project.

Thank you,
Aditya Agarwal

@adi611 - please follow the tutorial and feel free to suggest some improvements. After the tutorial, I can suggest some work on the code, so you have a chance to familiarize yourself with pydra and schedulers.

Okay. I have started working on the tutorial and will keep you updated on my progress.

I have noticed some minor typos in the notebooks. Should I create a GitHub issue to address them?

For example, here: “pf” instead of “of”.

Yes, please use GitHub to report any issues and suggest changes.

Thank you for this project. I’m Can, a student at the University of Cambridge, and I would be interested in working on this project. What grabs my attention is its capability, namely the potential range of computations it supports without requiring the user of the library to write heavy, bottleneck-prone code. I will have finished the Operating Systems course covering scheduling and multitasking by this summer, and I have hands-on experience with pthreads: I wrote a multithreaded hash bruteforcer from scratch. Having worked with C many times (and produced mini-projects), I hope to contribute to this project as if implementing an OS scheduler, but in higher-level Python. Looking forward to talking to you. Regards, Can.

Hi @CGDogan - thanks for your message. Could you please point me to your CV/resume and GitHub account, if you have one?

Thank you, emailed dj…@g mail com

@CGDogan - you can also look at the pydra tutorial.

Thank you, I’ll start it now.

I apologize for the delay in my response. I have been occupied by my mid-semester university exams this past week. Now that they are behind me, I am ready to resume work and give it my full attention. Thank you for your understanding.

Have you tried going through the OpenAI ports? That’s the world’s biggest conda-forge.

Hello, I am a new contributor and would like to work with global communities. To get started with this project, could you please share a tutorial?

@Aman123lug - welcome! The tutorial can be found here

@djarecka - Hello, I have been reviewing the ShellCommandTask module and I have noticed that the Binder environment/notebook has stopped working for all modules since yesterday. This is the response I am getting on clicking the Binder button:

@adi611 - thanks for letting me know. Could you please report it on GitHub?

Please also try to run the notebooks locally.