Question about the approximate random effects model in alleninf

Dear experts,

I have a question about the random effects model in the alleninf software (https://github.com/chrisfilo/alleninf). I have read the OHBM poster carefully (https://f1000research.com/posters/1097120). The poster states that the approximate random effects analysis calculates the slope of the best linear fit for each donor and performs a one-sample t-test on those estimates.

Now I would like to investigate whether a significant correlation exists between a statistical map and gene expression, and if so, I want to use the random effects model to examine whether this correlation generalizes beyond the six donors.

Here I am wondering why the one-sample t-test is performed on the slopes rather than on the correlation values. I have run several one-sample t-test simulations using r and beta values and obtained different results (t and p values). In fact, when the beta value is large, the r value may be small, and vice versa.
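Here is a minimal sketch of the kind of simulation I mean (synthetic data for six donors; the donor-specific noise is just there to make the slopes and r values diverge):

import numpy as np
from scipy.stats import linregress, ttest_1samp

rng = np.random.RandomState(0)
slopes, rs = [], []
for donor in range(6):  # six donors, as in the AHBA dataset
    x = rng.randn(100)                           # e.g. statistical map values
    y = 0.3 * x + rng.randn(100) * (donor + 1)   # expression with donor-specific noise
    fit = linregress(x, y)
    slopes.append(fit.slope)
    rs.append(fit.rvalue)

# The slopes stay near 0.3 while r shrinks as the noise grows, so the two
# one-sample t-tests give different t and p values.
print(ttest_1samp(slopes, 0))
print(ttest_1samp(rs, 0))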

In addition, if I use beta values for the one-sample t-test, I first need to determine which variable is dependent and which is independent. The neuroimaging statistical map is the independent variable and the expression value is the dependent variable, right?

So, which one is more appropriate for the one-sample t-test in my case?

BTW, what about the random-effects model in NeuroVault? Does it use the same method as alleninf?

Best,
Feng


I don’t remember a good justification for using slope (or parameter) estimates instead of r values; it might relate to the fact that r is bounded to the -1:1 range and thus not normally distributed. I used parameter estimates because this is how the Friston-Holmes approximation also works (using contrast images).

As for the dependent/independent variable: as far as I remember, due to the normalization the slope estimates are symmetrical, so it does not matter which variable is dependent and which is independent.

NeuroVault uses the same approach: https://github.com/NeuroVault/NeuroVault/blob/master/neurovault/apps/statmaps/ahba.py#L50

How did you do the probe selection in NeuroVault? The mean of all probes for a given gene, the first principal component from a PCA, or some other method?

I think it was the mean. See: https://github.com/NeuroVault/NeuroVault/blob/master/ahba_docker/preparing_AHBA_data.py#L97
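If it helps, the probe-to-gene step boils down to something like this sketch (the gene_symbol column name is illustrative, not necessarily what that script uses):

import pandas as pd

def mean_over_probes(expression_df):
    # expression_df: probes x samples, with a 'gene_symbol' column;
    # collapse multiple probes of the same gene to their mean expression
    return expression_df.groupby("gene_symbol").mean()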

OK, here is the question: why do I get different results (t and p values from the one-sample t-test) when I run gene expression decoding with both NeuroVault and alleninf using the default settings?

The difference might come from the fact that in NeuroVault maps are downsampled (to 4mm, as far as I remember) and values from multiple wells are averaged within one 4mm voxel. I would not expect large differences, though.

The p-values in NeuroVault are corrected for multiple comparisons.
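To illustrate what that means, here is a sketch assuming Benjamini-Hochberg FDR; check the NeuroVault source for the exact correction method used:

from statsmodels.stats.multitest import multipletests

def correct_p_values(p_values, alpha=0.05):
    # Benjamini-Hochberg FDR correction across all tested genes (assumed method)
    rejected, p_corrected, _, _ = multipletests(p_values, alpha=alpha, method="fdr_bh")
    return rejected, p_corrected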

The NeuroVault implementation has been validated against PET and myelin maps, so I would trust it more (see https://github.com/NeuroVault/NeuroVault/pull/519).

I have checked the code of alleninf and NeuroVault. There are several differences between them.

In NeuroVault, a z-transformation (subtracting the mean and dividing by the standard deviation) is performed on both the expression values and the image values. However, alleninf does not seem to perform this z-transformation.

NeuroVault:
print "z scoring (%s)" % donor_id
expression_data = pd.DataFrame(zscore(expression_data, axis=1), columns=expression_data.columns, index=expression_data.index)
nifti_values = zscore(nifti_values)

alleninf:

def approximate_random_effects(data, labels, group):
    correlation_per_donor = {}
    for donor_id in set(data[group]):
        # linregress returns (slope, intercept, r, p, stderr); only the slope
        # is kept, despite the variable name "correlation_per_donor"
        correlation_per_donor[donor_id], _, _, _, _ = linregress(list(data[labels[0]][data[group] == donor_id]),
                                                                 list(data[labels[1]][data[group] == donor_id]))
    average_slope = np.array(correlation_per_donor.values()).mean()
    t, p_val = ttest_1samp(correlation_per_donor.values(), 0)

When the z-transformation is applied, the parameter estimate (beta value) is actually equal to the correlation coefficient; moreover, the slope estimates are then symmetrical, so it does not matter which variable is dependent and which is independent (if and only if the z-score normalization was performed).
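A quick numeric check with synthetic data confirms this:

import numpy as np
from scipy.stats import linregress, zscore

rng = np.random.RandomState(0)
x = zscore(rng.randn(200))
y = zscore(0.5 * x + rng.randn(200))

fit_xy = linregress(x, y)
fit_yx = linregress(y, x)
print(fit_xy.slope, fit_xy.rvalue)  # slope equals r after z-scoring
print(fit_yx.slope, fit_yx.rvalue)  # swapping x and y gives the same slope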

However, in this case the parameter estimate (being equal to the correlation coefficient) is bounded to the -1:1 range and thus not normally distributed. So we cannot use a one-sample t-test on these parameter estimates, and we should use the Fisher r-to-z transformation to improve normality, right?
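What I have in mind is something like this (the r values are made up for illustration):

import numpy as np
from scipy.stats import ttest_1samp

r_per_donor = np.array([0.12, 0.25, 0.08, 0.30, 0.18, 0.22])  # illustrative values
z_per_donor = np.arctanh(r_per_donor)  # Fisher r-to-z transformation
t, p = ttest_1samp(z_per_donor, 0.0)
print(t, p)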

In the NeuroVault code, I do not find any Fisher r-to-z transformation being performed.

        print "z scoring (%s)" % donor_id
        expression_data = pd.DataFrame(zscore(expression_data, axis=1), columns=expression_data.columns,
                                       index=expression_data.index)
        nifti_values = zscore(nifti_values)

        print "Calculating linear regressions (%s)" % donor_id
        regression_results = np.linalg.lstsq(np.c_[nifti_values, np.ones_like(nifti_values)], expression_data.T)
        results_df = pd.DataFrame({"slope": regression_results[0][0]}, index=expression_data.index)

        results_df.columns = pd.MultiIndex.from_tuples([(donor_id[1:], c,) for c in results_df.columns],
                                                       names=['donor_id', 'parameter'])

        results_dfs.append(results_df)

    print "Concatenating results"
    results_df = pd.concat(results_dfs, axis=1)
    del results_dfs

t, p = ttest_1samp(results_df, 0.0, axis=1)

What’s your opinion?

In your preparing_AHBA_data.py code, I notice the following:
df.columns = list(sample_annot_df["reduced_coordinate"])
# removing outside of the brain coordinates
df.drop(sample_annot_df.index[sample_annot_df.reduced_coordinate.isnull()], axis=1, inplace=True)
# averaging measurements from similar locations
df = df.groupby(level=0, axis=1).mean()

df.columns = pd.MultiIndex.from_tuples([(donor_id, c,) for c in list(df.columns)],
                                       names=['donor_id', 'reduced_coordinate'])
dfs.append(df)

So you first remove the coordinates that fall outside the brain? I am wondering why some samples would be outside the brain, given that all the samples come from the six donors’ brains.

The second question is how measurements from similar locations are averaged. Are both the NIfTI values and the expression values averaged? How are adjacent samples grouped? What is the nature of this algorithm? I ask because I would like to implement it in MATLAB.
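My current reading of the quoted code, written out as a self-contained sketch (the reduce_coordinates helper and voxel_size are my guesses, not NeuroVault’s actual names):

import numpy as np
import pandas as pd

def reduce_coordinates(mni_coords, voxel_size=4.0):
    # Map each MNI coordinate onto a coarse voxel index; samples landing in
    # the same coarse voxel get the same label
    return [str(tuple(np.floor(np.asarray(c) / voxel_size).astype(int))) for c in mni_coords]

# genes x samples, columns labeled by each sample's reduced coordinate,
# mirroring df.columns = list(sample_annot_df["reduced_coordinate"]) above
coords = [(10.0, -20.0, 30.0), (9.0, -19.0, 29.0), (50.0, 0.0, 0.0)]
df = pd.DataFrame(np.random.randn(5, 3), columns=reduce_coordinates(coords))

# Equivalent to the quoted df.groupby(level=0, axis=1).mean(): samples
# sharing a reduced coordinate are averaged into a single column
averaged = df.T.groupby(level=0).mean().T
print(averaged.shape)  # (5, 2): the first two samples collapse into one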

I think many technical details are still unclear, such as the z-transformation issues and the non-independence of overlapping values from two spheres centered at their MNI coordinates, as asked by Feng Liu above. Please give your explanations.

Hi there. I’m currently swamped with other work, but since you already know where to find the source code for the alleninf and NeuroVault implementations, I’m sure you’ll manage to figure out the nitty-gritty details.