GSoC 2020 project idea 21: Automated comparison of scientific methods for time-series analysis

ben.fulcher · March 11, 2020, 2:22am

Thank you @Salmankhancodes. @ram, this is correct. You can imagine a new function comes in and generates a new ~column of hctsa_datamatrix.csv, and we want to see how the patterns in this column (characterizing the new feature) relate to existing patterns contained in the columns of hctsa_datamatrix.csv. Descriptions of each file are in the figshare repo.

ram · March 11, 2020, 3:35am

@ben.fulcher Thanks, now I’m getting a clear picture of what we actually need to do. What about the comparison? Are we just going to compare on the basis of the values in our new column, so that the most similar columns are the best match, or do we need to find some pattern for the comparison?

vinay · March 11, 2020, 4:58am

@Salmankhancodes Maybe a small addition: I guess one has to use OutputToCSV to convert the .mat file to .csv files.

Salmankhancodes · March 11, 2020, 5:09am

We already have the output in .csv format, so that is not required.

ben.fulcher · March 11, 2020, 11:44am

A simple Spearman correlation-based similarity metric is a good place to start.
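
As a minimal sketch of that starting point (not the project's actual implementation): Spearman correlation is the Pearson correlation of the rank-transformed values, so two features that order the time series in the same way score 1 regardless of scale. The function names and sample values below are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Note: a constant input has zero variance and raises ZeroDivisionError here.
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical outputs of two features evaluated on the same five time series.
new_feature = [0.1, 0.4, 0.2, 0.9, 0.7]
existing_feature = [1.0, 16.0, 4.0, 81.0, 49.0]  # same ordering, different scale
print(spearman(new_feature, existing_feature))
```

Because only the ordering matters, the metric is insensitive to monotonic rescaling of a feature's outputs, which is useful when features live on very different scales.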

Salmankhancodes · March 11, 2020, 6:08pm

@ben.fulcher Sir, when a user contributes their analysis method:

1. Will the new method be added to the data matrix at an arbitrary position, or does it have to be placed at a specific position by applying a clustering algorithm like k-medoids?
2. Will adding the analysis method be an optional feature, or will it be added automatically as soon as the user uploads it for comparison?

ben.fulcher · March 12, 2020, 9:54am

Thanks @Salmankhancodes.
1—We will not need to place the new method in the data matrix—the ordering of columns does not need to be special.
2—I think we will allow the user to upload at their discretion (to avoid noise). For this they will need to provide some information about their function (check out how this is done currently on the CompEngine website for uploading time-series data). If possible, it would be great to have a backend where a website manager could approve the upload.

Salmankhancodes · March 12, 2020, 10:27am

OK, thanks @ben.fulcher Sir. So, as in CompEngine, adding an analysis method will be an optional choice for the user.

harsht24 · March 14, 2020, 5:02am

@ben.fulcher Sir, so is our task to rank all input features using Spearman rank correlation?

ben.fulcher · March 14, 2020, 5:24am

Each input algorithm/feature will need to be ranked for its similarity to the existing library of algorithms/features, and the result visualized.

Salmankhancodes · March 14, 2020, 7:13am

Hi @harsht24,
We do not need to compute the correlation manually. We can simply use the Spearman correlation function from the scipy library (scipy.stats.spearmanr) and compare our output with each of the data-matrix columns.
Hope this helps.
Thanks
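
A rough sketch of that comparison, assuming numpy and scipy are installed. The data here are synthetic stand-ins for hctsa_datamatrix.csv, and `rank_similar_features`, `top_k`, and the choice to rank by absolute correlation are illustrative assumptions rather than anything specified by the project.

```python
import numpy as np
from scipy.stats import spearmanr


def rank_similar_features(data_matrix, new_column, top_k=3):
    """Rank existing feature columns by |Spearman rho| with a new feature column."""
    scores = []
    for j in range(data_matrix.shape[1]):
        existing = data_matrix[:, j]
        # Use only the time series where both features returned a finite value.
        valid = np.isfinite(existing) & np.isfinite(new_column)
        if valid.sum() < 3:
            continue
        rho, _ = spearmanr(existing[valid], new_column[valid])
        if np.isnan(rho):  # e.g. a constant column
            continue
        scores.append((j, float(rho)))
    # Assumption: strong negative correlations count as similar too.
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_k]


# Synthetic data: rows are time series, columns are existing features.
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(50, 6))
# A hypothetical new feature that is a monotonic transform of column 2.
new_column = np.exp(data_matrix[:, 2])

for column_index, rho in rank_similar_features(data_matrix, new_column):
    print(f"column {column_index}: rho = {rho:+.3f}")
```

The sorted list of (column index, rho) pairs is the kind of ranking that could then be passed to a plot or table for visualization.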

ram · March 14, 2020, 8:00am

@ben.fulcher Since our output file contains only values, how will we tell which column corresponds to which feature?

ben.fulcher · March 15, 2020, 2:03am

@ram Please try your best to answer your question before asking it. In this case @Salmankhancodes has already described this to you above, and pointed you to the figshare documentation which describes it. The info is in hctsa_features.csv.
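
For illustration only, a lookup from a data-matrix column to its feature description might look like the sketch below. The CSV contents, the `Name` and `Keywords` headers, and the assumption that the k-th row of hctsa_features.csv describes the k-th column of hctsa_datamatrix.csv are all hypothetical here and should be checked against the file descriptions in the figshare repo.

```python
import csv
import io

# Hypothetical stand-in for hctsa_features.csv; the real column names may differ.
features_csv = io.StringIO(
    "ID,Name,Keywords\n"
    "1,feature_a,distribution\n"
    "2,feature_b,correlation\n"
    "3,feature_c,entropy\n"
)

# Assumption: the k-th data row describes the k-th column of hctsa_datamatrix.csv.
feature_rows = list(csv.DictReader(features_csv))


def describe_column(column_index):
    """Return the metadata row for a 0-based data-matrix column index."""
    return feature_rows[column_index]


print(describe_column(1)["Name"])
```

With the real file, `io.StringIO(...)` would be replaced by `open("hctsa_features.csv", newline="")`.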

Salmankhancodes · March 20, 2020, 3:09pm

@ben.fulcher Sir, there is a decent interval between the final proposal submission and the announcement of selected students, and I have already submitted my final proposal. What would you suggest I do during this period? Are there any resources I should go through, tools and frameworks I should brush up on, or contributions to related repositories that would help us in the long run during the coding period?

ben.fulcher · March 21, 2020, 1:13am

Thanks for the question @Salmankhancodes

Go through and familiarize yourself with the open code repositories for CompEngine (frontend and backend) described above.
Read the CompEngine preprint: https://arxiv.org/abs/1905.01042
Any papers related to hctsa (listed in the README of the hctsa repo) would be useful background reading. This one puts the main points in context: https://arxiv.org/abs/1709.08055
You could even try running a sample of the underlying algorithm with the steps described in the hctsa documentation (these steps are what we will try to automate in the online platform). https://hctsa-users.gitbook.io/hctsa-manual/analyzing_visualizing/feature-comparison

Salmankhancodes · March 21, 2020, 4:48am

@ben.fulcher Thanks for your advice, Sir. I’ll surely go through it.

ram · March 21, 2020, 3:15pm

@ben.fulcher Is it necessary for this project to use CompEngine? Can’t we build a whole new web portal using different technologies?

ben.fulcher · March 22, 2020, 12:43am

Open to hearing any plan, but my assumption is that building on an open-source platform (CompEngine) that achieves the same thing (but for data instead of features) would be time-efficient.

ram · March 22, 2020, 8:15am

I am asking because Marionette and Backbone are quite outdated…

Also, I followed the steps given in the CompEngine frontend repo but am getting an error. How do I set it up successfully?

Salmankhancodes · March 22, 2020, 5:41pm

@ram I think you haven’t set up the configuration settings.