Mentors: Katarzyna Jurewicz <Katarzyna.jurewicz@mcgill.ca>, Buxin Liao <buxin.liao@mail.mcgill.ca>, Suresh Krishna <suresh.krishna@mcgill.ca>
Skill level: Advanced
Required Skills: Fluency in Python and PyTorch. Familiarity with open-source vision and multimodal AI models. Familiarity with Slurm and working with clusters preferred. Basic web-development skills or interest in learning them will be useful.
Time commitment: Full time (350 hours)
Forum for discussion
About: Saliency-map research in computer vision has extensively examined where human observers look in images and videos during free viewing. Although cognitive psychology has recognized the role of behavioral goals for over 50 years, integrating task dependence into quantitative models and large open datasets is a recent development.
Aims: This project aims to create an open portal that consolidates existing machine learning/AI models and eye-tracking datasets related to goal-directed vision (e.g., visual search) while providing tools for model testing and validation. A key focus will be on multimodal AI, particularly language-vision integration. Additionally, this platform will serve as a prototype for similar data+model initiatives on public hardware platforms.
Project website: m2b3/ActiveVision
Tech keywords: Python, PyTorch, Visual search, Saliency, Science portals, Vision AI, Vision-language models.
Dear @suresh.krishna, Buxin Liao and Katarzyna Jurewicz,
I hope you are doing well.
My name is Ishaan Chandak, and I am a final-year Information Technology student at VJTI Mumbai. I am deeply passionate about exploring the intersection of deep learning and biology.
I’m excited about the opportunity to contribute to this project. I’ve been exploring different approaches and research directions to better understand how multimodal systems can predict and evaluate task-driven visual attention, and I’ve also been looking into existing visual search datasets and the evaluation metrics commonly used in saliency research, which has given me a broader perspective on how saliency models are evaluated. Along the way, I’ve been documenting my research and learnings to structure my understanding and keep track of useful insights. I look forward to learning from you and understanding how I can meaningfully support the ongoing work.
I would be happy to take on any starter tasks that would help me become more familiar with the project. Please let me know if any are available.
I am also attaching my resume here for your reference.
Best regards,
Ishaan Chandak
Hello @suresh.krishna, Katarzyna Jurewicz and Buxin Liao,
I hope you’re doing well. My name is Aniruddha Roy, and I’m a machine learning enthusiast eager to learn more about this project. However, I’m unsure how to get started or contribute effectively, and I would greatly appreciate your guidance.
With hands-on experience in Python, PyTorch, and computer vision, I have worked on projects involving CNNs, U-Net architectures, and vision-language models. Additionally, I have explored LangChain, LangGraph, Hugging Face, and OCR engines like Tesseract and EasyOCR. While I haven’t worked extensively with Slurm, I am keen to learn and adapt quickly while contributing to the project.
Any advice or resources to help me better understand and engage with the project would be highly valuable. Looking forward to your insights!
Thank you,
Aniruddha Roy
Hello everybody, thank you for your interest.
We will have more information on the GitHub page by Monday next week, including more specific goals, example codebases that could be brought together, evaluation tasks, etc. We will also indicate the rough direction of the project, to give you a sense of the range of ideas you could propose yourselves.
Your proposal should then indicate in detail what you will do, how you will do it, why you are the right person to do it (given your background, skill set, etc.), and why it is feasible for you to complete it.
Good luck, and thank you for your patience!