GSoC 2025 Project #24 GestureCap :: Markerless gesture-recognition and motion-capture ML/AI for music, speech generation; develop neuroscientific, psychological theories (music creativity, music-movement-dance interactions) (350h)

Title: Markerless gesture-recognition and motion-capture ML/AI-based tool to drive music and speech generation, and develop neuroscientific/psychological theories of music creativity and music-movement-dance interactions

Mentors: Louis Martinez <louis.martinez@telecom-paris.fr>, Yohai-Eliel Berreby <yohaiberreby@gmail.com>, and Suresh Krishna <suresh.krishna@mcgill.ca>

Skill level: Intermediate - Advanced

Required skills: Comfortable with Python and modern AI tools. Experience with image/video processing and with deep-learning-based image-processing models. Familiarity with C/C++ programming, low-latency sound generation, image processing, and audiovisual displays, as well as with MediaPipe, is an advantage, though not necessary. Max/CSound/PureData familiarity preferred.

Time commitment: Full time (350 hours)

Forum for discussion

About: Last year, we developed GestureCap (GSoC 2024 report · GitHub), a tool that uses markerless gesture recognition and motion capture (via Google’s MediaPipe) to provide a working framework in which movements are translated into sound with short latency, enabling, for example, new gesture-driven musical instruments and dancers who control their own music while dancing.
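For orientation, here is a minimal illustrative sketch of this kind of pipeline (not the actual GestureCap code): MediaPipe Pose landmarks from a webcam mapped to a simple sound parameter. It assumes the mediapipe and opencv-python packages, and the mapping is deliberately trivial.

```python
# Minimal illustrative sketch (not the GestureCap codebase): track a body
# landmark with MediaPipe Pose and map it to a sound parameter.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        # Right wrist is landmark index 16; y is normalized, 0 = top of frame.
        wrist_y = result.pose_landmarks.landmark[16].y
        # Map vertical position to a frequency range (higher hand = higher pitch).
        freq_hz = 220.0 + (1.0 - wrist_y) * 660.0
        print(f"wrist height -> {freq_hz:.1f} Hz")  # a real system would drive a synth here
cap.release()
```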

Aims: This year, we aim to build on this initial proof of concept to create a usable tool that enables gesture-responsive music and speech generation, and to characterize the low-latency properties of the system and the sense of agency it enables. The development of GestureCap will facilitate both artistic creation and scientific exploration of multiple areas, including, for example, how people engage interactively with vision, sound, and movement and combine their respective latent creative spaces. Such a tool will also have therapeutic/rehabilitative applications for people with a limited ability to generate music, in whom agency and creativity in producing music have been shown to have beneficial effects.

Website: GSoC 2024 report · GitHub

Tech keywords: Sound/music generation, Image processing, Python, MediaPipe, Wekinator, AI, Deep-learning


Thanks for sharing the information about the GestureCap project for GSoC 2025. I am very keen to work on this project and extend the efforts of 2024.

As a machine learning, deep learning, and computer vision practitioner, I find the concept of gesture-controlled music and audio synthesis compelling. I am intrigued by the cross-disciplinary nature of this project, spanning AI, music, and motion capture. I would be keen to learn more about furthering gesture recognition with MediaPipe, especially in areas like low-latency performance optimization, time-synchronized gesture tracking, and tighter integration with audio synthesis.

It would be a big help if you could let me know how I can contribute and whether there are any microtasks or first steps I can take to get started. I look forward to your guidance!

Sincerely,
Harsh Gupta
GitHub: 4444Harsh (Harsh) · GitHub

Respected Sir,

I hope you are doing well. My name is Yash Pathak, and I am a third-year engineering student from India with experience in AI, machine learning, and real-time control systems. I came across the GestureCap project in the GSoC 2025 project list and found it highly aligned with my skills and interests.

I have experience in Python, deep learning, image processing, and real-time signal processing. I have previously worked with MediaPipe, OpenCV, and neural networks for gesture recognition and AI-driven applications. The idea of using markerless gesture recognition for music and speech generation excites me, and I would love to contribute to improving the system’s latency, accuracy, and user interaction.

I have already explored the GestureCap repository/documentation. I would greatly appreciate your guidance on how to get started.

Looking forward to your guidance and the opportunity to contribute to GestureCap!

Best regards,
Yash Pathak

yashpradeeppathak@gmail.com
https://www.linkedin.com/in/vindicta07/

Respected Sir,

I hope you’re doing well. I’ve outlined the full technical roadmap for the Markerless Gesture-Recognition and Motion-Capture AI Tool, detailing the key phases, methodologies, and deliverables. I’ve ensured that the plan is structured to align with GSoC’s timeline while keeping the scope practical and impactful.

📌 Key Highlights of the Roadmap:

  • Uses MediaPipe/OpenPose/BlazePose for real-time gesture tracking.
  • Implements AI-based gesture-to-sound mapping using CNNs, RNNs, and Transformers.
  • Integrates with PureData, MAX/MSP, and CSound for real-time sound synthesis.
  • Focuses on low-latency inference for real-time interaction.
  • GUI & customization will be minimal during GSoC, with a focus on core AI functionalities first.

I’ve attached the roadmap for your review. Please let me know if any refinements are needed. Looking forward to your feedback and excited to work on this project!

Best regards,
Yash Pathak
https://www.linkedin.com/in/vindicta07/
yashpradeeppathak@gmail.com

To: @suresh.krishna @yberreby

Dear Louis, Yohai-Eliel, and Suresh,

I hope you’re doing well. I am Chinmaya, a final-year undergraduate student at NITK Surathkal, specializing in AI/ML and deep learning. I came across the GestureCap project and found it highly aligned with my experience and interests. I would love to contribute as part of GSoC 2025.

I have worked extensively with deep learning-based image processing, including segmentation and classification tasks in medical imaging and computer vision. My projects have involved applying PyTorch-based models for gesture and object recognition, and I have experience with real-time inference, video processing, and multimodal AI applications. While I am not yet familiar with MediaPipe or Max/CSound/PureData, I am eager to learn them.

Before submitting my proposal, I wanted to ask if there are any prerequisite tasks or areas I should explore to better understand the project’s scope. Looking forward to your guidance!

With regards,
Chinmaya

Good evening!

My name is Emilia Dobra, and I am a first-year computer science student at the Polytechnic University of Bucharest. I’m reaching out about the process of applying for GSoC 2025. I find the subject of neuroscience fascinating and challenging, which is why I researched INCF’s opportunities, also on the recommendation of a friend who took part in one of your projects in previous years. From what I’ve seen, project #24: GestureCap is the best match for me.

I’d like to tell you a bit more about myself: in high school I achieved strong results in the Informatics Olympiad, which demonstrates my proficiency in C/C++ as well as my problem-solving abilities under pressure and my debugging skills. I have also developed projects in different teams and competitions, such as AstroPi (my first contact with ML and Python) and FTC (a robotics competition, where I was team leader). During my first year at university, I have become even more experienced with C/C++, having developed an image editor entirely in C. I am currently studying signal and sound processing using advanced Fourier analysis concepts, and I would like to deepen my understanding of them, as I believe they could be of great help in this particular project. Would you be so kind as to provide me with more information and possibly documentation that could guide me through the process?

Kind regards,

Emilia Dobra

Please see the information here. One of those pages (Recommendations for GSoC contributors | INCF) also has a link to a template - please use it.

https://www.incf.org/activities/gsoc

@csking101 @vindicta_07 @Emi_Dobra and anyone I may have missed, thank you for your interest. I encourage you to put forward ideas for development after going through the GitHub page with the work done so far. Please use the template. As usual, a good proposal answers these questions: what do you propose to do (and why), how will you do it, why are you the person to do it, and why is it feasible that you will be able to do it?

The basic prototype has been built and the code written. This year, we expect to extend this basic prototype by connecting it to better sound-generation tools (with better mappings created) and by improving the video end, so you can focus on the video end, the mapping end, or both. A survey of work like AUMI may also help generate ideas.

Knowing the underlying software stack will definitely give you a leg up, so that is also something that can be worked on. Good luck.

Thank you for the guidance, Sir! I’ll go through the GitHub repository and the provided template to structure my proposal accordingly. I’ll also explore related work like AUMI to generate ideas.

I’m particularly interested in working on [mention your area of interest—video processing, sound mapping, or both] and will ensure I understand the underlying software stack to contribute effectively.

Respected Sir @suresh.krishna @yberreby,
I have started working on the FPS side of the code; updates so far:
Improving the gesture-based music system by:
✅ Focusing on hand tracking (instead of full-body tracking) for better precision.
✅ Using MediaPipe Hands for accurate real-time gesture detection.
✅ Implementing error filtering with an exponential moving average to smooth movement.
✅ Optimizing FPS and latency to ensure fast, responsive interaction; I was getting roughly 25-40 FPS.
✅ Sending hand-movement data via OSC to control music in PureData.
✅ Additionally, using the Cyclone external library in PureData for better optimization.
Currently I am testing against a GitHub repository available for sound interaction, with a focus on the optimization side; a rough sketch of the smoothing and OSC step is included below.
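This sketch is illustrative only (not my working code); it assumes the mediapipe, opencv-python, and python-osc packages, and the OSC address and port are placeholders for whatever the PureData patch listens on:

```python
# Illustrative sketch: MediaPipe Hands landmarks, smoothed with an exponential
# moving average, sent to PureData over OSC.
import cv2
import mediapipe as mp
from pythonosc.udp_client import SimpleUDPClient

ALPHA = 0.4          # EMA smoothing factor (higher = less smoothing)
client = SimpleUDPClient("127.0.0.1", 9000)   # port is a placeholder
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

smoothed = None
cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        raw = [coord for p in lm for coord in (p.x, p.y, p.z)]  # 21 landmarks x 3 = 63 values
        if smoothed is None:
            smoothed = raw
        else:
            smoothed = [ALPHA * r + (1 - ALPHA) * s for r, s in zip(raw, smoothed)]
        client.send_message("/gesturecap/hand", smoothed)  # address handled by the PureData patch
cap.release()
```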

Sir, I wanted to know what the next steps should be and whether I am on the right track. Please let me know.
Also, Sir, I have provided a video of this in the Drive link below; please have a look:
Drive Link

Weren’t almost all of these features already implemented last year? What is the improvement here? In any case, getting familiar with the codebase and writing a draft proposal is indeed the right track. If you have made specific improvements that actually involve changing the existing code and getting it to work, include them in your proposal with details. Good luck.


Sir! After going through the project, I’m really excited about the possibilities and would love to explore some new ideas to enhance GestureCap.

A few directions I’ve been thinking about:

  • Adaptive AI for Gesture Mapping – Letting the system learn and personalize gesture-to-sound mappings over time.
  • Multimodal Inputs – Combining gestures with voice or even EEG signals for more expressive control.
  • AI-Powered Dance Synchronization – Generating real-time dance movements based on gestures.
  • Haptic Feedback – Adding vibration or force feedback for a more immersive experience.
  • Generative AI for Music – Using AI to compose music dynamically based on gestures.

Respected Sir @suresh.krishna @yberreby,
I propose integrating SuperCollider for low-latency, real-time synthesis, together with a browser-based GUI for accessibility. This combination ensures high-performance sound generation while keeping the tool easy to use and install. Could this also be extended with an alternative such as Tone.js?
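As a rough illustration of what I have in mind (assuming the python-osc package; the /gesture address and the value are placeholders, and 57120 is sclang’s default language port), the Python side could simply stream control values to SuperCollider over OSC, and an OSCdef on the SuperCollider side would map them onto synth parameters:

```python
# Hypothetical sketch: stream a normalized control value to SuperCollider (sclang) over OSC.
from pythonosc.udp_client import SimpleUDPClient

sc = SimpleUDPClient("127.0.0.1", 57120)  # sclang's default port

def send_gesture_value(value: float) -> None:
    """Send one normalized (0-1) gesture value, e.g. hand height, to sclang."""
    sc.send_message("/gesture", value)    # an OSCdef in sclang would handle "/gesture"

send_gesture_value(0.42)
```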
Sir, any opinion on this?

Integrating SuperCollider is fine, and your proposal could show how you will do it. Good luck.


Respected Sir @suresh.krishna @yberreby,
I have submitted a proposal based on the template provided by INCF.
If there are any areas for improvement, I would appreciate your feedback. Please let me know if any modifications are needed.
Looking forward to your guidance.


We are not able to tell you whether a proposal is good enough, sufficient, etc.; we rank the proposals, and GSoC and then INCF allot slots. How much you put into and work on the proposal depends on your time, interest, and availability/ability. All the best!


Hi everyone!

I’m Mrityunjay, and I’m working on a GSoC 2025 proposal to extend GestureCap — a fascinating AI-based framework that uses markerless motion capture to translate body and hand gestures into music in real time.

What’s the Project About?
This year, the goal is to build a fully usable tool that allows gesture-responsive music and speech generation, powered by MediaPipe, OSC, and modern deep learning techniques. The project also aims to investigate the neuroscience and psychology of music-movement-dance creativity and could even have therapeutic applications.

Key focus areas:

  • Improve gesture mapping (e.g., time-dependent gestures using LSTMs; a rough sketch appears below the tech stack)
  • Lower system latency for real-time performance
  • Enable speech generation from gestures
  • Expand accessibility (e.g., smartphone cameras, better GUI)
  • Study sense of agency in gesture-to-sound interaction

Tech Stack: Python, MediaPipe, OSC, PureData/MaxMSP, deep learning, and optionally C++ for low-latency optimization.
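As a rough sketch of the time-dependent gesture idea (assuming PyTorch, which is not fixed by the project; the window length, feature size, and class count are placeholders), an LSTM over short windows of hand-landmark vectors could classify gestures:

```python
# Assumed sketch (not existing project code): classify a short window of
# hand-landmark frames with an LSTM, for time-dependent gestures.
import torch
import torch.nn as nn

class GestureLSTM(nn.Module):
    def __init__(self, n_features=63, hidden=128, n_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):            # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)   # h_n: (1, batch, hidden)
        return self.head(h_n[-1])    # logits per gesture class

model = GestureLSTM()
window = torch.randn(1, 30, 63)      # e.g. 30 frames (~1 s at 30 FPS), 21 landmarks x 3 coords
logits = model(window)
print(logits.shape)                  # torch.Size([1, 8])
```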

I’d love to collaborate, get feedback, or discuss ideas with others interested in creative AI, gesture-based interaction, or neuroscience + music!

GitHub: weeebhu (Mrityunjay Kukreti) · GitHub

Looking forward to connecting!

Subject: Excited to Contribute to GestureCap – GSoC 2025

Respected Sir,

I’m Akshitha, a third-year engineering student, specializing in Artificial Intelligence and Machine Learning. I recently came across the “GestureCap” project under INCF for GSoC 2025, and I found the idea incredibly exciting and aligned with my interests in AI, creativity, and real-time interaction.

While I haven’t worked with MediaPipe specifically yet, I do have hands-on experience with Python, image processing, and machine learning. I’ve built ML-based projects such as “gender classification” and “crop prediction”, and I’m always eager to explore innovative applications of AI—especially those that intersect with human expression, like GestureCap.

This would be my first time contributing to open source, and I’m excited about the opportunity. I’ve read through the GestureCap documentation, and I’d love to get involved and start preparing a proposal. I’d really appreciate any guidance or suggestions you might have on how I can contribute meaningfully to the project.

Looking forward to your response and hopefully collaborating with you!

Best regards,
Akshitha Polagani
GitHub: Akshitha-11 (AKSHITHA POLAGANI) · GitHub