GSoC 2025 Project #24 GestureCap :: Markerless gesture-recognition and motion-capture ML/AI for music, speech generation; develop neuroscientific, psychological theories (music creativity, music-movement-dance interactions) (350h)

Title: Markerless gesture-recognition and motion-capture ML/AI-based tool to drive music and speech generation, and develop neuroscientific/psychological theories of music creativity and music-movement-dance interactions

Mentors: Louis Martinez <louis.martinez@telecom-paris.fr>, Yohai-Eliel Berreby <yohaiberreby@gmail.com>, and Suresh Krishna <suresh.krishna@mcgill.ca>

Skill level: Intermediate - Advanced

Required skills: Comfortable with Python and modern AI tools. Experience with image/video processing and with deep-learning-based image-processing models. Familiarity with C/C++ programming, low-latency sound generation, image processing, audiovisual displays, and MediaPipe is an advantage, though not necessary. Familiarity with Max/CSound/PureData is preferred.

Time commitment: Full time (350 hours)

Forum for discussion

About: Last year, we developed GestureCap (GSoC 2024 report · GitHub), a tool that uses markerless gesture recognition and motion capture (via Google’s MediaPipe) to create a working framework in which movements are translated into sound with short latency, allowing, for example, new gesture-driven musical instruments and dancers who control their own music while dancing.
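For contributors new to MediaPipe, the following is a minimal sketch of the kind of real-time landmark tracking GestureCap builds on. It uses the legacy `mp.solutions` Hands API with a webcam; the actual pipeline in the GestureCap repository may be organized quite differently, so treat this only as an illustration of turning video frames into normalized gesture coordinates.

```python
import cv2
import mediapipe as mp

# Minimal hand-landmark tracking loop (legacy MediaPipe "solutions" API).
# Illustration only; GestureCap's actual pipeline may differ.
mp_hands = mp.solutions.hands

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB frames; OpenCV delivers BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # Landmark 8 is the index fingertip; x/y are normalized to [0, 1].
            tip = results.multi_hand_landmarks[0].landmark[8]
            print(f"index fingertip: x={tip.x:.3f} y={tip.y:.3f}")
cap.release()
```

In a gesture-to-sound setup, the printed coordinates would instead be mapped onto synthesis parameters (pitch, amplitude, filter cutoff, etc.) and sent on to the audio engine.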

Aims: This year, we aim to build on this initial proof of concept to create a usable tool that enables gesture-responsive music and speech generation, and to characterize the low-latency properties of the system and the sense of agency it enables. The development of GestureCap will facilitate both artistic creation and scientific exploration of multiple areas, including, for example, how people engage interactively with vision, sound, and movement and combine their respective latent creative spaces. Such a tool will also have therapeutic/rehabilitative applications in populations with limited ability to generate music, in whom agency and creativity in producing music have been shown to have beneficial effects.

Website: GSoC 2024 report · GitHub

Tech keywords: Sound/music generation, Image processing, Python, MediaPipe, Wekinator, AI, Deep-learning


Thanks for sharing the information about the GestureCap project for GSoC 2025. I am very keen to work on this project and extend the efforts of 2024.

As a machine learning, deep learning, and computer vision practitioner, I find that the concept of gesture-controlled music and audio synthesis resonates with me. I am intrigued by the cross-disciplinary nature of this project, spanning AI, music, and motion capture. I would like to learn more about furthering gesture recognition with MediaPipe, especially in areas such as low-latency performance optimization, time-synchronous gesture tracking, and deeper integration with audio synthesis.

It would be a big help if you could let me know how I can contribute and whether there are any microtasks or first steps I can take to get started. I look forward to your instructions!

Sincerely,
Harsh Gupta
GitHub: 4444Harsh (Harsh) · GitHub

Respected Sir,

I hope you are doing well. My name is Yash Pathak, and I am a third-year engineering student from India with experience in AI, machine learning, and real-time control systems. I came across the GestureCap project in the GSoC 2025 project list and found it highly aligned with my skills and interests.

I have experience in Python, deep learning, image processing, and real-time signal processing. I have previously worked with MediaPipe, OpenCV, and neural networks for gesture recognition and AI-driven applications. The idea of using markerless gesture recognition for music and speech generation excites me, and I would love to contribute to improving the system’s latency, accuracy, and user interaction.

I have already explored the GestureCap repository/documentation. I would greatly appreciate your guidance on how to get started.

Looking forward to your guidance and the opportunity to contribute to GestureCap!

Best regards,
Yash Pathak

yashpradeeppathak@gmail.com
https://www.linkedin.com/in/vindicta07/

Respected Sir,

I hope you’re doing well. I’ve outlined the full technical roadmap for the Markerless Gesture-Recognition and Motion-Capture AI Tool, detailing the key phases, methodologies, and deliverables. I’ve ensured that the plan is structured to align with GSoC’s timeline while keeping the scope practical and impactful.

Key Highlights of the Roadmap:

Uses MediaPipe/OpenPose/BlazePose for real-time gesture tracking.
Implements AI-based gesture-to-sound mapping using CNNs, RNNs, and Transformers.
Integrates with PureData, MAX/MSP, and CSound for real-time sound synthesis (see the OSC sketch after this list).
Focuses on low-latency inference for real-time interaction.
GUI & customization will be minimal during GSoC, with a focus on core AI functionalities first.
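As a concrete illustration of the PureData integration mentioned above, here is a hedged sketch of how normalized gesture coordinates could be streamed as OSC messages using python-osc. The address pattern /gesture/hand, port 9000, and the send_gesture helper are placeholders of my own, not GestureCap's actual protocol; on the Pd side, a [netreceive -u -b] feeding [oscparse] (Pd vanilla 0.46+) could decode the messages.

```python
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical OSC bridge: stream gesture features to a Pure Data patch.
# Address pattern and port are placeholders, not GestureCap's actual protocol.
PD_HOST, PD_PORT = "127.0.0.1", 9000
client = SimpleUDPClient(PD_HOST, PD_PORT)

def send_gesture(x: float, y: float) -> None:
    """Send normalized hand coordinates; the Pd patch maps them to pitch/amplitude."""
    client.send_message("/gesture/hand", [x, y])

# Example: fingertip near the top-left corner of the frame.
send_gesture(0.12, 0.08)
```

Keeping the vision loop and the OSC sender in separate threads (or processes) is one common way to keep audio-control messages flowing at a steady rate even when frame processing stalls, which matters for the low-latency goal above.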

I’ve attached the roadmap for your review. Please let me know if any refinements are needed. Looking forward to your feedback and excited to work on this project!

Best regards,
Yash Pathak
https://www.linkedin.com/in/vindicta07/
yashpradeeppathak@gmail.com

To: @suresh.krishna @yberreby

Dear Louis, Yohai-Eliel, and Suresh,

I hope you’re doing well. I am Chinmaya, a final-year undergraduate student at NITK Surathkal, specializing in AI/ML and deep learning. I came across the GestureCap project and found it highly aligned with my experience and interests. I would love to contribute as part of GSoC 2025.

I have worked extensively with deep learning-based image processing, including segmentation and classification tasks in medical imaging and computer vision. My projects have involved applying PyTorch-based models for gesture and object recognition, and I have experience with real-time inference, video processing, and multimodal AI applications. While I am not yet familiar with MediaPipe or Max/CSound/PureData, I am eager to learn them.

Before submitting my proposal, I wanted to ask if there are any prerequisite tasks or areas I should explore to better understand the project’s scope. Looking forward to your guidance!

With regards,
Chinmaya

Good evening!

My name is Emilia Dobra, and I am a first-year computer science student at the Polytechnic University of Bucharest. I’m reaching out regarding the process of applying for GSoC 2025. I find the subject of neuroscience fascinating and challenging, which is why I researched INCF’s opportunities; I was also encouraged by the recommendation of a friend who took part in one of your projects in previous years. From what I’ve seen, project #24, GestureCap, is the best match for me.

I’d like to tell you a bit more about myself. In high school I achieved strong results in the Informatics Olympiad, which reflects my proficiency in C/C++ as well as my problem-solving and debugging skills under pressure. I have also developed projects in teams and competitions such as AstroPi (my first contact with ML and Python) and FTC (a robotics competition, where I was team leader). During my first year at university, I have become even more experienced with C/C++, having developed an image editor entirely in C. I am currently studying signal and sound processing using advanced Fourier analysis concepts, and I would like to deepen my understanding of them, as I believe they could be of great help in this particular project.

Would you be so kind as to provide me with more information and possibly documentation that could guide me through the application process?

Kind regards,

Emilia Dobra