GSoC 2020 project idea 20: A reduced time-series feature library to efficiently characterize neural dynamics

malin · January 15, 2020, 11:51am

Large volumes of high-quality and diverse open neuroimaging datasets continue to be measured and shared, from microscopic single-neuron spike recordings to macroscopic whole-brain fMRI data. These incredible time-series datasets encapsulate the rich temporal dynamics underlying our ability to process information around us, but the dominant tools we use to analyze them are mostly very simple. For example, time series are most commonly analyzed using features derived from the Fourier transform, despite many more sophisticated analysis methods having been developed over the past decades across a range of scientific disciplines. There is a real need to unleash more sophisticated analyses on these data, but there are challenges in selecting suitable methods and implementing them efficiently in open-source environments to enable their application to large time-series datasets.

Aims: In this project, we will leverage a comprehensive library of time-series analysis methods that we recently developed (Matlab-based hctsa, https://github.com/benfulcher/hctsa) to formulate a new reduced feature set tailored to neuroimaging data. We recently showed that this library of >7000 features can be reduced to just 22 with minimal loss in accuracy: the C-coded catch22 feature set (https://github.com/chlubba/catch22). This project will follow a similar methodology to produce an efficiently coded set of time-series features for neuroimaging data, enabling the diversity and sophistication of the time-series analysis literature to be leveraged by the neuroscience community. As with catch22, the features will be coded in C, with wrappers for other open-source languages, including Python.

Skills: Python and C coding, familiarity with Matlab (to run existing code), and familiarity with statistical analyses.

Mentors: Ben Fulcher (ben.fulcher@sydney.edu.au) and Joseph Lizier (joseph.lizier@sydney.edu.au)

sugam45 · January 18, 2020, 1:30pm

Hello,
My name is Sugam Srivastava. I am an undergraduate from the Indian Institute of Technology Kanpur. I am an open-source enthusiast and have been a part of the community for about three years now. I am particularly interested in the area of deep learning and neuroscience.

During my undergrad, I have already learned the skills required for this project through my projects and classes. I would love to contribute to this project over the summer and learn a bunch of things along the way.

It would be great if anyone can tell me how to get started and also give a few pointers regarding the project.

arnab1896 · January 18, 2020, 7:06pm

Hi @sugam45,
Welcome and thanks for your interest.

RE: How to get started?
You can get started by working through the repositories linked in the project idea above. Please reply with any specific queries/issues you have about them. Also, try sketching a flowchart of how you would go about solving the problem statement given above; that will help you ask the mentors the questions that you are finding difficult to answer yourself.

RE: Specific pointers regarding the project:
Tagging the mentor @ben.fulcher so that he can guide you here. Ben, could you also notify Joseph? It seems that I am unable to tag him here.

ben.fulcher · January 19, 2020, 11:31pm

Thanks Sugam and welcome! The best place to start is with reading the paper about catch22—this will form a model for the work we will do in this project (but using open neuroscience datasets to quantify the performance of each time-series feature).
Ben

vinay · January 21, 2020, 6:18am

Hi all,
I am Vinay, a student at the International Institute of Information Technology, Hyderabad. Over the past few years, I have worked on a wide variety of data problems in both academic and open-source projects.

I have gone through the initial procedures in catch22, where 7658 hctsa features are pre-filtered to 4791 and these are finally reduced to 22 with methods such as performance filtering (best features on all tasks) and redundancy minimization (hierarchical linkage clustering).

So, are we expected to come up with similar methods of feature reduction for neuroscience datasets, rather than limiting ourselves to the UEA/UCR repository?
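
For readers following the thread, the two reduction steps named above can be sketched roughly as follows. This is only an illustration, not the catch22 authors' code: the function names, the toy accuracy table, the `keep_fraction` and `max_dist` values, and the choice of correlation distance with complete linkage are all assumptions made for the example.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def performance_filter(scores, keep_fraction):
    """Keep the features with the highest mean accuracy across tasks."""
    ranked = sorted(scores, key=lambda f: mean(scores[f]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def complete_linkage_clusters(names, scores, max_dist):
    """Agglomerative clustering on the distance 1 - Pearson correlation."""
    clusters = [[n] for n in names]

    def dist(a, b):
        # Complete linkage: distance between the two least similar members.
        return max(1 - pearson(scores[p], scores[q]) for p in a for q in b)

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > max_dist:
            break  # remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def reduce_features(scores, keep_fraction=0.5, max_dist=0.2):
    """Filter by performance, cluster, then keep one feature per cluster."""
    kept = performance_filter(scores, keep_fraction)
    clusters = complete_linkage_clusters(kept, scores, max_dist)
    return sorted(max(c, key=lambda f: mean(scores[f])) for c in clusters)


# Hypothetical per-task accuracies (one list entry per task).
scores = {
    "feat_a": [0.90, 0.80, 0.70, 0.85],
    "feat_b": [0.88, 0.79, 0.69, 0.84],  # nearly the same pattern as feat_a
    "feat_c": [0.70, 0.85, 0.90, 0.71],  # useful on different tasks
    "feat_d": [0.50, 0.52, 0.49, 0.51],  # close to chance everywhere
}
selected = reduce_features(scores, keep_fraction=0.75, max_dist=0.2)
print(selected)
```

In this toy table, `feat_d` is dropped by the performance filter, and `feat_a` and `feat_b` fall into one cluster because their accuracies rise and fall together across tasks, so only the better of the two is kept.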

ben.fulcher · January 28, 2020, 2:17am

Dear Vinay,
Welcome to Neurostars and thanks for your interest in this project (and apologies for the delay—still setting up my notifications).
Yes, you’re right—a major limitation of catch22 is that the performance of each feature was evaluated on a set of datasets that are very different to neuroimaging data. Thus it’s not clear whether these are well-suited to neuroimaging. Our aim here is to apply a similar procedure to the one in catch22 to neuroscience datasets of a given modality (e.g., lots of EEG datasets with a clear outcome, e.g., patient/control labels for classification). The resulting reduced feature sets (coded efficiently) could provide a succinct, accessible, and influential way of representing neuroimaging data.

vinay · January 28, 2020, 2:00pm

Thanks for the clarification @ben.fulcher.

As an initial step, I will familiarize myself with the related repositories to get a deeper understanding of the work.

imraniac · March 8, 2020, 9:27pm

Hi, my name is Imran Alam. I am an undergraduate student at Jadavpur University. I recently studied the functional connectivity of brain regions in an fMRI dataset, analysed regional time-series data, and published a paper on the study. So I believe I am familiar with the tools (like FSL) and Python programming, and I have some theoretical knowledge of neuroscience. I would like to work on this project. I have gone through some of the links above and read all the comments. To start, should I share my ideas, or is there a task that I need to complete? @ben.fulcher

ben.fulcher · March 8, 2020, 11:35pm

Hi Imran, and welcome to Neurostars!
Very happy to have you interested in the project—that is indeed a very useful set of background skills to have for this project.
Our aim will be to apply a large set of features to a range of openly available classification tasks using neuroimaging data. Example data sources are https://openneuro.org/ and http://fcon_1000.projects.nitrc.org/.
We can use a pipeline similar to that described in the catch22 paper to score features on their usefulness and hopefully distill down to a useful subset of features for this task…
Ben

LauraM · March 9, 2020, 7:33am

Hi!
My name is Laura Masaracchia and I will start a PhD in Computational Neuroscience at Aarhus University within the next couple of months. I am writing here because I think this project idea might be the closest to what I am thinking of doing.
My idea is to convert into Python a MATLAB toolkit for time series data in computational neuroscience which has been developed by my future supervisor and which I will be using in my future research.
The toolbox is called HMM-MAR (Hidden Markov Model - Multivariate Autoregressive) and it segments multivariate time series into states that are characterized by their unique quasi-stationary spectral properties. The toolkit supports different data modalities (EEG, MEG, LFP, fMRI, etc) and includes, besides the analysis methods, utilities for preprocessing, interrogation of the results, semi-supervised prediction of events and some other utilities. You can find the toolkit here https://github.com/OHBA-analysis/HMM-MAR/wiki .
@ben.fulcher do you think my idea is pertinent to this project? (Sorry if it is not, I am new and I am still trying to figure my way around here!)

Thank you in advance!
Looking forward to your comment,
Laura

imraniac · March 9, 2020, 8:01am

OK, I read the paper describing the catch22 pipeline. @ben.fulcher, I have a query: should I use the existing catch22 library and modify it to work on 4D data, or create a fresh one with the same pipeline?
Currently, I am trying to read the NIfTI images in C and extract the time series from individual voxels. I will then apply the catch22 pipeline to those time series. But there is an issue: a typical brain fMRI scan (3T or 7T) contains a very large number of voxels, so extraction can take a very long time.

imraniac · March 9, 2020, 8:13am

One thing we can do is extract time series only for ROIs (regions of interest) in rs-fMRI (resting-state fMRI), where each ROI time series is the average time series of all voxels within the specified region. In this way we will have a relevant and relatively small amount of data that represents the whole functional unit.
Please tell me if I should proceed in this way.

Thanks
Imran
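
To make the ROI-averaging idea above concrete, here is a minimal sketch. It assumes the voxel time series have already been loaded and flattened into plain Python lists, with one integer parcel label per voxel and 0 marking background; the function name and the toy data are hypothetical, and reading real NIfTI files is not covered.

```python
from collections import defaultdict


def roi_mean_time_series(voxel_series, voxel_labels, background=0):
    """Average voxel time series within each labelled region.

    voxel_series: list of time series (one list of floats per voxel).
    voxel_labels: list of integer region labels, one per voxel.
    Returns a dict mapping region label -> mean time series.
    """
    if len(voxel_series) != len(voxel_labels):
        raise ValueError("need exactly one label per voxel")

    grouped = defaultdict(list)
    for series, label in zip(voxel_series, voxel_labels):
        if label != background:
            grouped[label].append(series)

    roi_series = {}
    for label, members in grouped.items():
        # zip(*members) steps through time points; each tuple holds the
        # value of every voxel in the region at that time point.
        roi_series[label] = [sum(vals) / len(vals) for vals in zip(*members)]
    return roi_series


# Toy example: four voxels, three time points, two regions plus background.
voxel_series = [
    [1.0, 2.0, 3.0],
    [3.0, 4.0, 5.0],
    [10.0, 10.0, 10.0],
    [0.0, 0.0, 0.0],
]
voxel_labels = [1, 1, 2, 0]
roi_series = roi_mean_time_series(voxel_series, voxel_labels)
print(roi_series)
```

Each region then contributes a single time series, so the number of series to analyse equals the number of regions rather than the number of voxels.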

ben.fulcher · March 10, 2020, 5:45am

Hi Laura,
Welcome to the discussion, and thanks for sharing your interesting ideas! The HMM-MAR analysis seems like a specific processing pipeline for multivariate time series, whereas here we’re concerned with features of univariate time series. That is, we’re interested in algorithms that take a single univariate time series and output summary statistics of the structure in it. This project is about distilling a large, interdisciplinary literature on these types of summary statistics (or features) to a smaller, interpretable set, and coding them efficiently. To do this, we need to know which of the thousands of possible features are most useful for the types of questions researchers tend to be interested in addressing with neuroimaging data. We will follow an analytic pipeline similar to the one we used for catch22, evaluating the performance of different features on a set of neuroimaging tasks. You can have a read of that paper if interested.
I’ve written an introduction to using features to answer time-series analysis questions here.
Hope this is helpful!
Best,
Ben
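
As a small illustration of what "a feature of a univariate time series" means in the post above: a feature is a function that maps one time series to one number. The three features below (mean, standard deviation, lag-1 autocorrelation) are simple stand-ins chosen for the example and are not taken from hctsa or catch22; the function names are hypothetical.

```python
from statistics import mean, pstdev


def lag1_autocorrelation(x):
    """Correlation between the series and itself shifted by one sample."""
    m = mean(x)
    denom = sum((v - m) ** 2 for v in x)
    if denom == 0:
        return 0.0  # constant series: autocorrelation is undefined, report 0
    return sum((a - m) * (b - m) for a, b in zip(x, x[1:])) / denom


def extract_features(x):
    """Map a single univariate time series to a few summary statistics."""
    return {
        "mean": mean(x),
        "std": pstdev(x),
        "lag1_autocorr": lag1_autocorrelation(x),
    }


# A rapidly alternating series has strongly negative lag-1 autocorrelation.
features = extract_features([1.0, -1.0, 1.0, -1.0])
print(features)
```

Applying the same set of functions to every time series in a dataset gives a table with one row per series and one column per feature, which is the representation that feature-scoring pipelines then work with.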

ben.fulcher · March 10, 2020, 5:50am

We will want to use hctsa as the starting point, following the same logical steps as in catch22 to reduce the >7000 features to a smaller number (perhaps ~20).
If we go with classifying psychiatric disorders from resting-state fMRI, I suspect we will need properly processed and corrected data (e.g., using ICA-AROMA) at a parcellation of ~100 brain regions (you are right that analysis at the voxel level will be far too computationally intensive).
There are other, more straightforward ways to formulate data with which to evaluate the performance of individual features, e.g., basing each evaluation on a single brain region. We also don’t have to use disorder classification as our only outcome: we could compare conditions in a task, or evaluate the accuracy of fingerprinting individuals on the basis of their neuroimaging data, etc.
Hope this clarifies.
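
One possible reading of the fingerprinting evaluation mentioned above is sketched below: each individual has a feature vector from two sessions, and a feature set scores well if the session-two vector is closest to the same person's session-one vector. The two-session setup, the squared Euclidean distance, the function name, and the toy numbers are all assumptions for illustration.

```python
def fingerprint_accuracy(session_a, session_b):
    """Fraction of subjects in session_b matched to themselves in session_a.

    session_a, session_b: dicts mapping subject id -> feature vector.
    """

    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    correct = 0
    for subject, vec in session_b.items():
        # Nearest neighbour among all subjects' session_a vectors.
        best = min(session_a, key=lambda s: sq_dist(session_a[s], vec))
        correct += best == subject
    return correct / len(session_b)


# Hypothetical two-feature vectors for three subjects in two sessions.
session_a = {"s1": [0.1, 0.9], "s2": [0.8, 0.2], "s3": [0.5, 0.5]}
session_b = {"s1": [0.15, 0.85], "s2": [0.75, 0.25], "s3": [0.7, 0.3]}
accuracy = fingerprint_accuracy(session_a, session_b)
print(accuracy)
```

Here subject `s3` drifts towards `s2` in the second session and is misidentified, so two of the three subjects are matched correctly.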

imraniac · March 16, 2020, 3:10pm

Okay, I understand the overview of this project. I have started writing the proposal and will ask if I have any further doubts/queries. Where should I share my proposal with you?
Thank You.

ben.fulcher · March 17, 2020, 1:51am

Great, thanks for your interest in this project. You may submit proposals directly through the GSoC platform: https://summerofcode.withgoogle.com/

Aldo_Camargo · March 24, 2020, 10:20pm

Hi Ben,

The project looks very interesting and has a lot of potential to be used in many research areas. I am currently studying Alzheimer's disease with neuroimaging (MRI and fMRI). I read the paper that you suggested to one of the participants. I am interested in this project, but the aim is not clear to me. Do you want to port the Matlab code to C, C++, or Python?

Have a nice day,

Aldo