GSoC 2020 project idea 14: Pre-trained models for Developmental Neuroscience

This project will center on building a pre-trained model for shapes and processes related to Developmental Biology and Neurobiology, as extracted from image data. Our organization’s Machine Learning interest group (DevoWormML) has published a blog post [1] on the advantages of and need for pre-trained models in this area. In short, biological development is defined by characteristic shapes, movements, changes in shape, and temporal processes. Pre-trained models are already used in NLP and Deep Learning, for example for sequence modeling in language processing (GPT-2) and for semantic segmentation of complex images (DeepLabv3). Models specialized for biology, however, do not exist. A suitable pre-trained model would greatly reduce the need for input data without sacrificing the ability to generalize to different contexts.

Our main interest is in extracting spatiotemporal features from image data. We will focus on microscopy data such as that found in the DevoZoo or from more specialized sources [2]. For a typical pre-trained model, the network is pre-trained with non-random weights that approximate the generalized versions of the features we would like to discover. However, we are also interested in a semantic component, particularly the ability to incorporate elements such as meaning assigned to static knowledge (semantics) and multiple meanings for a single feature (polysemy). This will enable relational modeling and the mapping of segmented image data to lineage trees and taxonomies. Our model, tentatively called DevLearningv1, should be applicable to a wide range of neural network and deep learning techniques.
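
To make the idea of starting from non-random weights concrete, here is a minimal toy sketch in plain Python. It is not the DevLearningv1 model: a one-dimensional linear model stands in for a deep network, and the "source" and "target" tasks, the function names (`mse`, `train`), and all numbers are assumptions chosen purely for illustration. It compares fine-tuning from pre-trained weights against training from scratch on the same small dataset with the same small training budget.

```python
# Toy illustration only: a 1-D linear model stands in for a deep network.

def mse(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the given points."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(w, b, xs, ys, steps, lr=0.1):
    """Plain gradient descent on the MSE, starting from the weights (w, b)."""
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical "source" task with plenty of data: y = 2.0*x + 1.0
source_x = [i / 10 for i in range(-10, 11)]
source_y = [2.0 * x + 1.0 for x in source_x]

# Hypothetical, related "target" task with only five samples: y = 2.2*x + 0.9
target_x = [-1.0, -0.5, 0.0, 0.5, 1.0]
target_y = [2.2 * x + 0.9 for x in target_x]

# Pre-training: learn weights on the source task, starting from zeros.
pre_w, pre_b = train(0.0, 0.0, source_x, source_y, steps=200)

# Fine-tuning: the same budget of 10 steps on the target task,
# once from the pre-trained weights and once from scratch (zeros).
ft_w, ft_b = train(pre_w, pre_b, target_x, target_y, steps=10)
sc_w, sc_b = train(0.0, 0.0, target_x, target_y, steps=10)

loss_finetuned = mse(ft_w, ft_b, target_x, target_y)
loss_scratch = mse(sc_w, sc_b, target_x, target_y)
print(f"fine-tuned loss: {loss_finetuned:.4f}, from-scratch loss: {loss_scratch:.4f}")
```

In this toy setting the fine-tuned model ends with the lower loss, because its starting weights are already close to what the target task needs. A real pre-trained network for microscopy data would replace the linear model with learned image features, but the workflow (pre-train on a large, related dataset, then fine-tune on limited data) would have the same shape.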

As a student, you will become a contributor at the OpenWorm Foundation, where we are attempting to build a virtual organism. You will learn about developmental neurobiology, and join the DevoWorm group. OpenWorm has an active interest in data science, and DevoWorm in particular has an active interest in machine learning research and education. We seek someone with experience with programming languages C++ and Python, and a machine learning platform such as TensorFlow or Keras.

Mentors: Bradly Alicea (balicea@openworm.org) and Stephen Larson (stephen@openworm.org), OpenWorm Foundation (https://openworm.org).

NOTES

[1] Blogpost on pre-trained models: https://thenode.biologists.com/pre-trained-machine-learning-models-for-developmental-biology/uncategorized/

[2] Crawford-Young, S.J., Dittapongpitch, S., Gordon, R., and Harrington, K.I.S. (2018). Acquisition and reconstruction of 4D surfaces of axolotl embryos with the flipping stage robotic microscope. Biosystems, 173, 214-220. doi:10.1016/j.biosystems.2018.10.006.

Hello there,

I’m Mayukh Deb, an undergrad student at Amrita University, Kerala, India. I’m a deep learning enthusiast and have been busy making fun deep learning projects. I’ve also been working with image processing, data wrangling, augmentation, etc. I’m also a member of amFOSS, a student-run community of open source enthusiasts. I spend most of my time with PyTorch, NumPy and Pandas, and most of my projects use image datasets from Kaggle.

Recently I’ve been working on a deep neural network that derives medical inferences from ECG data, which works pretty well already. Another of my recent projects uses deep learning to steer a car and keep it on the racetrack in a game purely from visual data.

I got really interested in this project because I’ve always been fascinated by the application of deep learning in biology and medical sciences, and it intersects with my skillset pretty well.

I’m really looking forward to contributing to this organization and learning a ton of new things on the way.

It would be great if I could get a few pointers and microtasks to help me gain a better understanding of this project.

Hi @Mayukhdeb,

Thanks for your interest.

There are 3 ways to get pointers/microtasks for the project.

Let me begin with the fastest way:

  1. Visit OpenWorm’s website linked above and you will find an option to join OpenWorm’s Slack channel. This is the fastest route because OpenWorm’s mentors are usually very active on Slack, so your queries will be answered quickly.
  2. Another way is to email the mentors. Their email IDs are listed above.
  3. Otherwise, you can tag them here and ask any specific questions that you have -> @b.alicea and @Stephen_Larson1

Hello Mayukh,

Sounds like you would be a good candidate for this project. This builds on projects from previous years, so please see our previous project presentations for some context. You might also be interested in the DevoWormML course, which was held last Fall and covers some of the topics our group is interested in.

As Arnab has suggested, please join the OpenWorm Slack, and join #devoworm and #devowormml. The blog post (mentioned in the project description) and DevoWormML materials should give you a better idea of what we are looking for. I can guide you with proposal development, so send me a draft version for feedback when you get to that stage. Good luck!


Thanks for the reply!

I’ve already read the blog post given in the project description, and I’m planning to go through the previous project presentations and the DevoWormML course today.

Apart from this, since yesterday I’ve also been setting up an image augmentation pipeline and experimenting with different augmentation techniques on images of blood cells.

See you on Slack!
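
As a rough illustration of the kind of augmentation step mentioned in the post above, here is a minimal sketch in plain Python. The helper names (`hflip`, `vflip`, `rot90`, `augment`) and the nested-list image format are assumptions made for this example; the actual pipeline is not shown in this thread and may well rely on PyTorch or a dedicated image library instead.

```python
# Minimal augmentation sketch. An "image" here is a 2-D list of pixel
# values (rows of a grayscale image); this format is an assumption.

def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus three simple geometric variants."""
    return [[list(row) for row in img], hflip(img), vflip(img), rot90(img)]


# Tiny 2x3 example "image" with made-up pixel values.
sample = [
    [1, 2, 3],
    [4, 5, 6],
]
for variant in augment(sample):
    print(variant)
```

Flips and rotations are a reasonable starting point for microscopy-style images, where orientation usually carries no meaning; whether that holds for a particular embryo dataset is something to check against the data itself.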

Hi @Mayukhdeb. Once you’ve gone through all the links, you can start writing the proposal with the solutions and approaches that you can think of. We can give our feedback on it.

If you find any of the biological terminology difficult to understand, you can look at WormBook. There you can learn about the C. elegans worm (its developmental stages, its adult stage, etc.). This way you can gain a better understanding of the data we have.


@nvinayvarma189 thanks for the reply!

Since I don’t have access to your dataset for now, I’ll use a dummy dataset containing images of blood cells to produce the examples in the proposal.

I’ll work on these and keep you guys updated for sure.

Hi everyone, I’m Adarsh Kumar, an undergraduate Computer Science student at IIIT-Delhi, India. I’m a deep learning enthusiast with experience on several machine learning and deep learning projects. I’ve completed the five-course Deep Learning Specialization on Coursera as well as Andrew Ng’s Machine Learning course on Coursera, and I have a good hold on DL/ML fundamentals as well as some of the latest advancements in the field.

I have good experience working with object detection (YOLO) and several CNN architectures such as AlexNet, VGG-16, Inception networks, and ResNets (including ResNet-50). I’ve also worked with PCA, SVMs, RNNs, LSTMs, and sequence-to-sequence architectures such as attention models. I’m fluent in Python and Java, and in frameworks like TensorFlow and Keras (I’ve used one or the other for most of my projects). I love solving challenging problems and learning new things, and I’m hard-working and dedicated. Over the last couple of days I’ve been doing the following:

  1. I’ve gone through the previous years’ GSoC projects recommended above (I just realised that the GSoC 2017 project was done by my senior, Siddharth).
  2. I’ve also read the blog post provided in the Notes.
  3. I have skimmed over the DevoZoo dataset and now have a good feel for what the data will look like.
  4. I’ve also watched the DevoWormML course session on pre-trained models, which gave me even more clarity. After all that, I now have a good enough idea about the project.

I have always wanted to work at the intersection of Biology and Deep Learning, and this project gives me that opportunity. I’d love to contribute to the project and learn from others here.
Any pointers on what I should do next?

The next step is to write a draft proposal. I can help you through the process if it is helpful. Base your proposal on the above project announcement. Please include a task timeline for how you will execute the project. Look forward to hearing from you soon.

Thanks for the reply @b.alicea

Sure, I’ll start on my proposal and keep you informed.
Thanks

Hi, this is Yamini, a third-year Computer Science student from BITS Pilani, Goa. I have gone through the material mentioned above. The dataset has microscopy videos and a few images of embryos. My question is: are we going to feed images or videos into the model?

MANAGED BY INCF