I’m also adding the other Q&As that were left over from the Crowdcast session and answering them here:

**1. from @Alice_Schwarze – idtxl question: is there some greedy optimization going on in finding the network structure? should I expect to get slightly different results if I run it several times on the same data?**

(Summary of our live verbal answer) Yes, it is precisely a greedy optimisation. Stochasticity comes in only from empirically sampling the null distribution of the transfer entropies, which is done to determine the statistical significance of candidate parent nodes. This applies to the KSG (Kraskov-Stögbauer-Grassberger, non-linear) estimator, but not to the linear estimator or to discrete data. In practice it is only likely to flip some nodes between false positive and true negative classifications (assuming some ground truth), and the number of potentially affected nodes is low, given that the false positive rate is strictly controlled. True positive / false negative switches are possible but unlikely to occur, except for weak connections.
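To make the source of that run-to-run variation concrete, here is a minimal sketch of a surrogate (permutation) significance test. It is not IDTxl's code: the function names are hypothetical, the data are synthetic, and absolute correlation stands in for a transfer entropy estimate purely to keep the example short. The only random ingredient is the shuffling used to sample the null distribution.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def surrogate_p_value(source, target, n_surrogates, rng):
    """Estimate a p-value by shuffling the source to build an empirical null.

    |correlation| is a stand-in statistic (an assumption for brevity);
    a real analysis would use a transfer entropy estimate here instead.
    """
    observed = abs(pearson(source, target))
    shuffled = list(source)
    exceed = 0
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)  # destroys any source-target relationship
        if abs(pearson(shuffled, target)) >= observed:
            exceed += 1
    return exceed / n_surrogates


# Synthetic data: one weakly coupled and one strongly coupled pair.
data_rng = random.Random(1)
source = [data_rng.gauss(0, 1) for _ in range(200)]
weak_target = [0.1 * s + data_rng.gauss(0, 1) for s in source]
strong_target = [s + 0.1 * data_rng.gauss(0, 1) for s in source]

# Same data, different surrogate seeds: only the sampled null changes.
weak_p_values = [
    surrogate_p_value(source, weak_target, 200, random.Random(seed))
    for seed in range(5)
]
strong_p_values = [
    surrogate_p_value(source, strong_target, 200, random.Random(seed))
    for seed in range(5)
]
print("weak coupling p-values:  ", weak_p_values)
print("strong coupling p-values:", strong_p_values)
```

Re-running with the same seed reproduces a p-value exactly. With different seeds, the estimate for the weakly coupled pair can move around and may land on either side of a significance threshold, whereas the strongly coupled pair gives the same answer every time.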

**2. from Qiang Li – What’s the difference between Transfer entropy (TE) and directed information (DI)?**

I’ve written a detailed comparison of these before in Appendix C, “Relation of transfer entropy to Massey’s directed information”, of my PhD thesis (submitted version, or the Springer published version).

In short, TE measures the information from the previous sample of a source to the next value of a target, given the target past.

DI gives a sum of TE-like terms as we slide along the time series. These terms differ from TEs, though: each is the information from the source to the target at the same time step, whilst still conditioning on the target past. DI is trying to measure a relationship between the two series as a whole, rather than getting at their directed dynamic relationship. The lack of a time difference in DI probably reflects that philosophy of how the time series relate as a whole, in contrast to TE’s philosophy of how the source is used in the dynamic update of the target.
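In symbols, the contrast can be sketched as follows. The notation is an assumption made for this illustration and may not match the thesis appendix: $Y$ is the source, $X$ the target, $\mathbf{X}_n^{(k)}$ the length-$k$ target past at time $n$, and $Y^n = (Y_1, \dots, Y_n)$.

$$
T_{Y \to X} = I\left(Y_n \, ; \, X_{n+1} \mid \mathbf{X}_n^{(k)}\right)
$$

$$
I\left(Y^N \to X^N\right) = \sum_{n=1}^{N} I\left(Y^n \, ; \, X_n \mid X^{n-1}\right)
$$

In the TE, the source sample at time $n$ informs the target value at time $n+1$. In each DI term, the source is taken up to and including the same time step $n$ as the target value $X_n$, which is the missing time difference described above.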

**3. from @Pablo_Estevez – What estimator do you recommend?**

We go into this question in some detail in the slides for Module 4, “Estimators and JIDT” (and the corresponding video lectures), of my short course distributed with JIDT.

In short, for continuous-valued data, I would point you to two top options:

- Linear-Gaussian estimator, which assumes Gaussian PDFs, with linear coupling. Pro: it is very fast, can work with little data, and has analytically established null distributions. Con: it assumes a linear model.
- KSG estimator, which is the best-of-breed estimator for handling non-linear relationships. Pro: it is completely model-free. Con: it is significantly slower, and requires more data.

Both are useful to evaluate; even where you know or suspect non-linear relationships, comparing the KSG value to the linear-Gaussian value gives useful insight into how much the non-linear relationships add beyond what you see from the linear component alone.
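As a rough illustration of why that comparison is informative, the sketch below computes a linear-Gaussian mutual information, $-\tfrac{1}{2}\ln(1-\rho^2)$ for a bivariate Gaussian with correlation $\rho$, on synthetic data. It does not use JIDT and does not implement KSG; the function names and the data are assumptions made for this example. It shows only the linear side: the estimate is sizeable for a linear coupling and close to zero for a purely quadratic one, where a model-free estimator such as KSG would be expected to report a clearly non-zero value.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(xs, ys):
    """Mutual information in nats, assuming a bivariate Gaussian model."""
    rho = pearson(xs, ys)
    return -0.5 * math.log(1.0 - rho ** 2)


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(2000)]
# Linear coupling: visible to the linear-Gaussian estimator.
y_linear = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
# Purely quadratic coupling: x and x**2 are uncorrelated for symmetric x.
y_quadratic = [xi ** 2 + 0.5 * rng.gauss(0, 1) for xi in x]

mi_linear = gaussian_mi(x, y_linear)
mi_quadratic = gaussian_mi(x, y_quadratic)
print(f"linear coupling:    {mi_linear:.3f} nats")
print(f"quadratic coupling: {mi_quadratic:.3f} nats")
```

A large gap between a model-free estimate and a linear-Gaussian value like `mi_quadratic` would point to dependence that the linear model cannot see.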

**4. from Rajanikant Panda – for resting state fMRI data, at a time how many nodes (brain region), its feasible to test?**

I assume you mean for effective network inference with IDTxl. In our validation paper in Network Neuroscience, we were able to run the algorithm on networks of up to 100 nodes with 10,000 time steps, using both the linear (faster) and KSG (slower) estimators. The latter required a week or two using several tens of cores on our high-performance cluster; having access to that compute power was critical. For rs-fMRI, you’ll typically have more nodes (though of a similar order) but an order of magnitude fewer time steps, so performance would be similar.

**5. from Wouter Klijn "Could your tool be used in an online/streaming fashion? Stream in data and report the current stats in an continues way?"**

I’m going to answer this one on the issue Wouter has posted on our GitHub page.