Mentors: Alison Wang <jiaxi.wang@durham.ac.uk>, Deepansh Goel <deepansh.04614815623@cseaiml.mait.ac.in>, Suresh Krishna <suresh.krishna@mcgill.ca>
Skill level: Intermediate – Advanced
Required skills: If interested in music generation, experience/familiarity with a framework like Max/CSound/PureData/SuperCollider required and some experience with Python preferred. If interested in speech generation, fluency in Python required and experience with sign-language transcription / speech generation libraries preferred.
Time commitment: Full time (350 hours)
About: Over the last two years, we have developed GestureCap (GSoC 2024 report · GitH), a tool that uses markerless gesture recognition and motion capture (via Google’s MediaPipe) to translate movements into sound with short latency, enabling, for example, new gesture-driven musical instruments. Our current pipeline achieves a gesture-to-sound latency as low as 12 ms, which widens the range of possibilities for markerless gesture-driven musical expression. We have also created elementary mappings from gesture to sound.
Aims: This year, we aim to build on this initial proof of concept to create a usable tool that enables gesture-responsive music and speech generation. Of particular interest is a workflow/framework for creating new mappings from detected gestures to sound. The development of GestureCap will facilitate both artistic creation and scientific exploration of multiple areas, including, for example, how people engage interactively with vision, sound, and movement and combine their respective latent creative spaces. Such a tool will also have therapeutic/rehabilitative applications for people with limited ability to generate music, in whom agency and creativity in producing music have been shown to have beneficial effects.
Project website: m2b3/gesturecap2025 (GitHub)
Tech keywords: Sound/music generation, Image processing, Python, MediaPipe, Wekinator, AI, Deep-learning
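Illustrative sketch: to give applicants a feel for what an "elementary mapping" from gesture to sound might look like, the following minimal Python example maps a normalized hand position to a pitch and an amplitude. It is not taken from the GestureCap codebase. The function names, the `SoundParams` type, and the pentatonic scale are all hypothetical choices made for illustration. The only assumption about the input is that coordinates are normalized to [0, 1] with y increasing downward, as MediaPipe landmarks are.

```python
from dataclasses import dataclass

# Hypothetical scale for illustration: C major pentatonic from middle C (MIDI 60).
PENTATONIC = [60, 62, 64, 67, 69, 72]


@dataclass(frozen=True)
class SoundParams:
    """Hypothetical container for the parameters sent to a sound engine."""
    midi_note: int
    frequency_hz: float
    amplitude: float


def midi_to_hz(note: int) -> float:
    """Convert a MIDI note number to frequency (equal temperament, A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def clamp01(value: float) -> float:
    """Clamp a value to the range [0, 1]."""
    return min(1.0, max(0.0, value))


def map_hand_to_sound(x: float, y: float, scale=PENTATONIC) -> SoundParams:
    """Map a normalized hand position to pitch and loudness.

    Assumes x and y are normalized image coordinates in [0, 1],
    with y increasing downward.
    """
    x = clamp01(x)
    y = clamp01(y)
    # Horizontal position selects a note from the scale (left = low, right = high).
    index = min(int(x * len(scale)), len(scale) - 1)
    note = scale[index]
    # Vertical position sets loudness: a raised hand (small y) is louder.
    amplitude = 1.0 - y
    return SoundParams(note, midi_to_hz(note), amplitude)


if __name__ == "__main__":
    # Hand at the horizontal center, raised three quarters of the way up.
    print(map_hand_to_sound(0.5, 0.25))
```

In a real pipeline, the resulting parameters would be forwarded to a synthesis environment such as PureData or SuperCollider. A mapping framework of the kind described in the Aims would let users define and swap functions like `map_hand_to_sound` without touching the capture code.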