CNS*2020 Software Showcase S1: Information theory and directed network inference (using JIDT and IDTxl)

Information theoretic measures including transfer entropy are widely used to analyse neuroimaging time series and to infer directed connectivity. The JIDT and IDTxl software toolkits provide efficient measures and algorithms for these applications:

  • JIDT provides a fundamental computation engine for efficient estimation of information theoretic measures for a variety of applications. It can be easily used in Matlab, Python, and Java, and provides a GUI for push-button analysis and code template generation.
  • IDTxl is a specific Python toolkit for directed network inference in neuroscience. It employs multivariate transfer entropy and hierarchical statistical tests to control false positives and has been validated at realistic scales for neural data sets. The inference can be run in parallel using GPUs or a high-performance computing cluster.

This tutorial session will help you get started with software analyses via brief overviews of the toolkits and demonstrations.

Resources and links (including the slides for the tutorial) are available at the showcase homepage.

Please use this topic to post any follow-up questions, or the Google Groups for each toolkit (listed at the showcase homepage) for more technical questions.

I would like to know if it is possible to get the estimated pdf from your software.
We are trying to reproduce results from AIS using parzen windows estimators, but so far there is no match with your computations. It would be great to have access to the estimated pdfs.
Thanking you in advance!

Hi Pablo – the answer is sort of yes and sort of no – it depends in a big way on which estimator you are using.

You mention a Parzen window, so the best match for that in JIDT is our box-kernel estimator (i.e. a rectangular kernel which weights points with 1 if they are inside the box or 0 if outside). For this estimator, yes, it’s relatively easy to access the estimated PDFs since they come directly from the counts of how many points were within the box kernel. If your Parzen estimator also uses a simple box, then everything will be directly comparable.

You can access these probabilities for the AIS kernel estimator in two ways:

  1. In debug mode they will be printed to the screen for each sample. Add a line, e.g. “calc.setDebug(true);”, before the calculation to turn on debug mode. During the calculation you will then get a printout for each sample of: the sample index, the probabilities of each marginal and the joint probability (note these are probability densities, not masses), the ratio of these probabilities inside the log for that sample, the log ratio, and finally the running sum of the log ratios. This comes from the internal MI kernel estimator inside the AIS class – you can see the debug print statement in the source code at line 193 in the current version (as of today) in this class.
  2. The underlying MI kernel estimator does actually provide getProbability() methods – you can see calls to those in the MI code a little above the debug line I pointed you to above. However, that estimator is a private member inside the AIS class, so you would have to hack the source code and recompile if you want to get access to it. :)
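
To make the comparison with your own Parzen code easier, here is a minimal pure-Python sketch of a box-kernel AIS estimate with history length 1. It is not JIDT code, and it skips things JIDT may do internally (such as data normalisation). The names `box_kernel_ais` and `ar1_series` are made up for this illustration. Each row it returns mirrors the quantities listed for the debug printout: sample index, the two marginal densities, the joint density, the log ratio, and the running sum.

```python
import math
import random


def box_kernel_ais(series, width, base=2.0):
    """Box-kernel AIS estimate (history length 1); illustrative only.

    `width` is the half-width of the box. Returns (ais, per_sample), where
    each per_sample row is
    (index, p_past, p_next, p_joint, log_ratio, running_sum).
    """
    past, nxt = series[:-1], series[1:]
    n = len(past)
    per_sample = []
    running_sum = 0.0
    for i in range(n):
        c_past = c_next = c_joint = 0
        for j in range(n):
            # A point counts if it falls inside the box centred on sample i.
            in_past = abs(past[j] - past[i]) <= width
            in_next = abs(nxt[j] - nxt[i]) <= width
            c_past += in_past
            c_next += in_next
            c_joint += in_past and in_next
        # Counts -> densities: divide by N and by the box volume.
        p_past = c_past / (n * 2 * width)
        p_next = c_next / (n * 2 * width)
        p_joint = c_joint / (n * (2 * width) ** 2)
        log_ratio = math.log(p_joint / (p_past * p_next), base)
        running_sum += log_ratio
        per_sample.append((i, p_past, p_next, p_joint, log_ratio, running_sum))
    return running_sum / n, per_sample


def ar1_series(n, coupling, seed):
    """Toy AR(1) process x[t+1] = coupling * x[t] + unit Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(coupling * x[-1] + rng.gauss(0.0, 1.0))
    return x


ais, rows = box_kernel_ais(ar1_series(400, 0.8, seed=1), width=0.5)
print(f"AIS estimate: {ais:.3f} bits")
for row in rows[:3]:
    print(row)
```

Each sample is included in its own box, so the joint count is never zero and the log is always defined. If your Parzen estimator uses a different kernel shape or width convention, the per-sample densities will differ even where the final averages look close.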

Let me know if that helps?

I’m also adding the other Q&A that were leftover from the Crowdcast session and addressing them here:

1. from @Alice_Schwarze – idtxl question: is there some greedy optimization going on in finding the network structure? should I expect to get slightly different results if I run it several times on the same data?
(Summary of our live verbal answer) Yes, it is precisely a greedy optimisation. Stochasticity comes in only from empirically sampling the null distribution for the transfer entropies, in order to determine statistical significance of candidate parent nodes. This is the case for the KSG (non-linear) estimator, but not for the linear estimator nor for discrete data. In practice, it is only likely to flip a few nodes between being classified as false positives and true negatives (assuming some ground truth), and the number of potentially impacted nodes is low (given that the false positive rate is strictly controlled). True positive / false negative switches are possible but unlikely to occur, except for weak connections.
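
To illustrate where that run-to-run variation comes from, here is a small pure-Python sketch of a surrogate (permutation) test. It is not IDTxl code. For brevity it uses a linear-Gaussian TE with history length 1 as a cheap stand-in statistic, even though that estimator would not normally need an empirical null. The function names, the shuffling scheme and the p-value convention (fraction of surrogates at or above the observed value) are assumptions made for this example.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def gaussian_te(source_past, target_past, target_next):
    """Linear-Gaussian TE (history 1, in nats) via the partial correlation."""
    r_sn = corr(source_past, target_next)
    r_pn = corr(target_past, target_next)
    r_sp = corr(source_past, target_past)
    partial = (r_sn - r_pn * r_sp) / math.sqrt((1 - r_pn ** 2) * (1 - r_sp ** 2))
    return -0.5 * math.log(1 - partial ** 2)


def surrogate_p_value(source, target, n_surrogates, seed):
    """Return (observed TE, p-value) from a shuffled-source null distribution."""
    source_past, target_past, target_next = source[:-1], target[:-1], target[1:]
    observed = gaussian_te(source_past, target_past, target_next)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_surrogates):
        shuffled = source_past[:]
        rng.shuffle(shuffled)  # breaks source-target timing, keeps target dynamics
        if gaussian_te(shuffled, target_past, target_next) >= observed:
            exceed += 1
    return observed, exceed / n_surrogates


def coupled_pair(n, coupling, seed):
    """Toy data: target[t+1] = 0.5*target[t] + coupling*source[t] + noise."""
    rng = random.Random(seed)
    source = [rng.gauss(0.0, 1.0) for _ in range(n)]
    target = [rng.gauss(0.0, 1.0)]
    for t in range(n - 1):
        target.append(0.5 * target[-1] + coupling * source[t] + rng.gauss(0.0, 1.0))
    return source, target


# Same weakly coupled data, different surrogate seeds: only the p-value moves.
src, tgt = coupled_pair(200, 0.1, seed=7)
for seed in range(5):
    te, p = surrogate_p_value(src, tgt, n_surrogates=200, seed=seed)
    print(f"seed={seed}  TE={te:.4f} nats  p={p:.3f}")
```

The observed statistic is identical across runs; only the sampled null distribution changes. A borderline p-value can therefore land on either side of the significance threshold, which is why weak connections are the ones most likely to flip.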

2. from Qiang Li – What’s the difference between Transfer entropy (TE) and directed information (DI)?
I’ve written a detailed comparison of these before in Appendix C “Relation of transfer entropy to Massey’s directed information” of my PhD thesis (submitted version or in the Springer published version).
In short, TE measures the information from the previous sample of a source to the next value of a target, given the target past.
DI gives a sum of TE-like terms, as we slide along the time series. These TE-like terms are different to TEs though: they are the information from the source to the target at the same time step, whilst still conditioning on the target past. DI is more about measuring a relationship between two series as a whole, rather than getting at their directed dynamic relationship. The lack of a time difference in DI is probably related to that philosophy of how the time series as a whole relate, rather than the philosophy of TE being about how the source is used in the dynamic update of the target.
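
For reference, the two quantities can be written side by side. The notation is assumed here for illustration rather than copied from the thesis: $Y$ is the source, $X$ the target, $X_n^{(k)}$ the last $k$ values of the target up to time $n$, and $Y^n = (Y_1, \dots, Y_n)$.

```latex
\begin{align}
  T_{Y \to X}(k) &= I\left(Y_n ; X_{n+1} \mid X_n^{(k)}\right) \\
  I\left(Y^N \to X^N\right) &= \sum_{n=1}^{N} I\left(Y^n ; X_n \mid X^{n-1}\right)
\end{align}
```

In the TE, the source value at time $n$ is paired with the target at time $n+1$. In each term of the DI sum, the source history $Y^n$ runs up to the same time step $n$ as the target value $X_n$.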

3. from @Pablo_Estevez – What estimator do you recommend?
We go into this question in some detail in the slides for Module 4 “Estimators and JIDT” (and the corresponding video lectures) of my short course distributed with JIDT.
In short, for continuous-valued data, I would point you to two top options:

  1. Linear-Gaussian estimator, which assumes Gaussian PDFs, with linear coupling. Pro: it is very fast, can work with little data, and has analytically established null distributions. Con: it assumes a linear model.
  2. KSG estimator, which is the best of breed estimator for handling non-linear relationships. Pro: it is completely model-free. Con: it is significantly slower, and requires more data.
    Both are useful to evaluate; even where you know or suspect non-linear relationships, comparing the KSG value to the linear-Gaussian value gives useful insights into how strong the non-linear relationships are over what you see from the linear component alone.
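
As a rough illustration of that comparison, here is a pure-Python sketch that estimates mutual information on toy data with both approaches. It is neither JIDT nor IDTxl code. `gaussian_mi` uses the closed form for a bivariate Gaussian, and `ksg_mi` is a brute-force version of the first KSG algorithm (max-norm distances, k nearest neighbours) that is only practical for a few hundred samples. The function names, data and parameter values are all assumptions chosen for the example.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma function at a positive integer n."""
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def gaussian_mi(xs, ys):
    """Linear-Gaussian MI in nats: -0.5 * ln(1 - rho^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return -0.5 * math.log(1 - cov * cov / (vx * vy))


def ksg_mi(xs, ys, k=4):
    """Brute-force KSG (algorithm 1) MI estimate in nats."""
    n = len(xs)
    psi_sum = 0.0
    for i in range(n):
        dx = [abs(x - xs[i]) for x in xs]
        dy = [abs(y - ys[i]) for y in ys]
        # Distance to the k-th nearest neighbour in the joint space (max norm).
        joint = sorted(max(a, b) for j, (a, b) in enumerate(zip(dx, dy)) if j != i)
        eps = joint[k - 1]
        # Neighbours strictly inside that distance in each marginal space.
        n_x = sum(1 for j, a in enumerate(dx) if j != i and a < eps)
        n_y = sum(1 for j, b in enumerate(dy) if j != i and b < eps)
        psi_sum += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - psi_sum / n


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(400)]
datasets = {
    "linear": [v + rng.gauss(0.0, 1.0) for v in x],            # y = x + noise
    "nonlinear": [v * v + 0.1 * rng.gauss(0.0, 1.0) for v in x],  # y = x^2 + noise
}
results = {}
for label, y in datasets.items():
    results[label] = (gaussian_mi(x, y), ksg_mi(x, y))
    print(f"{label:9s}  gaussian={results[label][0]:.3f}  ksg={results[label][1]:.3f} nats")
```

On the linear toy data the two estimates should land close together, while on the quadratic data the linear-Gaussian value stays near zero and the KSG value does not. That gap is the kind of insight the comparison gives you.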

4. from Rajanikant Panda – for resting state fMRI data, at a time how many nodes (brain region), its feasible to test?
I assume you mean for the effective network inference with IDTxl. In our validation paper in Network Neuroscience, we were able to run the algorithm on networks of up to 100 nodes for 10000 time steps, using both linear (faster) and KSG (slower) estimators. The latter required a week or two using several tens of cores on our high-performance cluster, so having access to that compute power was critical. For rs-fMRI, you’ll typically have more nodes (but of a similar order) and an order of magnitude fewer time steps, so performance would be similar.

5. from Wouter Klijn "Could your tool be used in an online/streaming fashion? Stream in data and report the current stats in an continues way?"
I’m going to answer this one on the issue Wouter has posted on our GitHub page for it.