Breaking down Jackknife analysis

Hello all

I am trying to understand the Jackknife analysis. My CBMA steps are pretty similar to the ones used in the NiMARE documentation. Starting from studies_df.csv and coords_df.csv files:

  1. I create the dataset db via the following code:

import json
from tempfile import NamedTemporaryFile

import pandas as pd
from nimare.dataset import Dataset

studies_df = pd.read_csv("studies_df.csv")
coords_df = pd.read_csv("coords_df.csv")

dset_dict = {}
for i, row in studies_df.iterrows():
    this_study_coords = coords_df[coords_df['study_id'] == row[0]]
    contrast = {"coords": {"space": this_study_coords['space'].unique()[0],
                           "x": list(this_study_coords['x']),
                           "y": list(this_study_coords['y']),
                           "z": list(this_study_coords['z'])},
                "metadata": {"sample_sizes": [row[1]]}}
    dset_dict[row[0]] = {"contrasts": {"1": contrast}}

# Write the dict to a temporary JSON file and load it as a NiMARE Dataset
with NamedTemporaryFile(mode='w', suffix=".json") as fp:
    json.dump(dset_dict, fp)
    fp.flush()
    db = Dataset(fp.name)
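As a quick check that the Dataset parsed everything correctly, the standard Dataset.ids and Dataset.coordinates attributes can be inspected:

print(db.ids)                 # one ID per study/contrast pair
print(db.coordinates.head())  # parsed peak coordinates as a DataFrame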

Sanity check: Does my Step 1 look okay for both ALE and MKDA analyses?

  2. Then I run the MKDA workflow and the Jackknife diagnostic:

from nimare.correct import FWECorrector
from nimare.diagnostics import Jackknife
from nimare.meta import kernel
from nimare.meta.cbma.mkda import MKDADensity

kernel_mkda = kernel.MKDAKernel(r=radius)
meta_mkda_app = MKDADensity(kernel_transformer=kernel_mkda, null_method='approximate')
mkda_uncorrected = meta_mkda_app.fit(db)
corr_mkda = FWECorrector(method="montecarlo", n_iters=10000, n_cores=8)
mkda_corrected = corr_mkda.transform(mkda_uncorrected)
jackknife = Jackknife(target_image='z_desc-mass_level-cluster_corr-FWE_method-montecarlo', voxel_thresh=None)
JK = jackknife.transform(mkda_corrected.copy())
JK.tables['z_desc-mass_level-cluster_corr-FWE_method-montecarlo_diag-Jackknife_tab-counts_tail-positive']
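To see which diagnostic tables are attached to the result (the exact keys depend on the NiMARE version and the chosen target image), the tables dict can be listed:

print(list(JK.tables.keys()))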

Question #1: Why do the values in each column add up to 1? Is it because proportional summary statistics (z-scores in this case) are averaged across all voxels in each cluster? Are they mass-averaged? What does this look like? Can you explain it with a simple toy example?

Question #2: How should I interpret the nonzero values in this data frame? My understanding is that the values have something to do with the FWE-corrected z-scores (or uncorrected z-scores?) computed as if a given experiment hadn’t been included in the dataset, and hence provide a score for the heterogeneity of the dataset.

Say we are looking at the 1st row (Apps-1): we have 0.125 for both the 5th and 6th clusters. What does having a 0.125 z-score say about the relationship between clusters 5 and 6? More generally, what do clusters that have nonzero z-scores in the same row have in common?

Alternatively, focusing on column 5 (PositiveTail5), how do the studies that have nonzero values in this column (8 experiments, each with a z-score of 0.125) relate to each other?

Hi @ulgenklc,

Welcome!

Apologies for the long delay.

Question #1: Why do the values in each column add up to 1? Is it because proportional summary statistics (z-scores in this case) are averaged across all voxels in each cluster? Are they mass-averaged? What does this look like? Can you explain it with a simple toy example?

The values in the table are proportions that were averaged across the voxels in the cluster. So, by necessity, if you had two studies and they each contributed equally to a cluster, they would each be weighted 0.5.

Here is an outline of the steps we take in NiMARE:

  1. remove the target study from the dataset
  2. run ALE/MKDA/whatever algorithm on the remaining data.
  3. divide the stat values from step 2 by the original stat values and subtract that ratio from 1, giving the proportion of the statistic that is lost when the study is removed (see the sketch below)
  4. average those voxelwise proportions across the clusters identified in the original analysis
  5. add row to the table.
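Here is a minimal NumPy sketch of steps 3–4 (the function name and inputs are made up for illustration; in NiMARE this happens inside Jackknife.transform):

import numpy as np

def study_contribution(original_stats, jackknifed_stats, cluster_labels):
    # original_stats: voxelwise stat values from the full dataset
    # jackknifed_stats: voxelwise stat values with one study removed
    # cluster_labels: integer cluster IDs per voxel (0 = background)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = jackknifed_stats / original_stats
    # Proportion of the statistic lost when the study is removed
    prop = 1 - np.nan_to_num(ratio, nan=0.0, posinf=0.0, neginf=0.0)
    # Average the voxelwise proportions within each original cluster
    return {k: prop[cluster_labels == k].mean()
            for k in np.unique(cluster_labels) if k != 0}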

In a hypothetical scenario with 3 studies (Study-1, Study-2, and Study-3) and 2 clusters (Cluster-A and Cluster-B), let’s say Study-1 was the sole contributor to Cluster-A. When Study-1 is removed from the analysis during one of the jackknife iterations, the stat values for the voxels within Cluster-A will drop to zero, meaning the cluster does not exist anymore and that study was its sole contributor. For the other cluster: when the next iteration of the jackknife is run, removing Study-2 halves the values of the voxels found in Cluster-B, meaning Study-2 contributed half of the influence; the same goes for Study-3.

| Study/Cluster | A | B |
| --- | --- | --- |
| 1 | 1 | 0 |
| 2 | 0 | 0.5 |
| 3 | 0 | 0.5 |
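Plugging the toy scenario into the sketch above (the voxel values are hypothetical, chosen so that removing Study-1 zeroes out Cluster-A and removing Study-2 halves Cluster-B):

import numpy as np

# Two voxels in Cluster-A (label 1), two in Cluster-B (label 2)
cluster_labels = np.array([1, 1, 2, 2])
original = np.array([3.0, 3.0, 2.0, 2.0])
no_study1 = np.array([0.0, 0.0, 2.0, 2.0])  # Cluster-A vanishes
no_study2 = np.array([3.0, 3.0, 1.0, 1.0])  # Cluster-B is halved

print(study_contribution(original, no_study1, cluster_labels))  # {1: 1.0, 2: 0.0}
print(study_contribution(original, no_study2, cluster_labels))  # {1: 0.0, 2: 0.5}

These are exactly the rows of the table above.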

Question #2: How should I interpret the nonzero values in this data frame? My understanding is that the values have something to do with the FWE-corrected z-scores (or uncorrected z-scores?) computed as if a given experiment hadn’t been included in the dataset, and hence provide a score for the heterogeneity of the dataset.

The values in the table are proportions, not z-scores; the corrected z-scores come into play when generating the clusters. The cluster table is generated within the jackknife analysis, and we give the user control over which statistical map and which threshold to apply to find clusters.

The literal interpretation of a value is the proportion of the statistical value in the cluster that the study accounts for. You can use this information to decide whether a study has undue influence on the outcome of a particular cluster. For example, if one study is solely responsible for a cluster, I would interpret that cluster much more cautiously, since it was driven by a single study. I don’t have good thresholds for how many studies should contribute to a cluster for it to be considered robust.
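As an illustration, assuming the contribution table has been pulled out as a DataFrame tab (one row per experiment, an id column, and one column per cluster, as in the output above), a quick screen for clusters dominated by a single study could look like:

# Largest single-study contribution for each cluster
top_contrib = tab.drop(columns="id").max()
# 0.5 is an arbitrary illustrative cutoff, not an established threshold
print(top_contrib[top_contrib > 0.5])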

Say we are looking at the 1st row (Apps-1): we have 0.125 for both the 5th and 6th clusters. What does having a 0.125 z-score say about the relationship between clusters 5 and 6? More generally, what do clusters that have nonzero z-scores in the same row have in common?

It would say this study contributed to multiple clusters in the analysis to some degree, i.e., the study reported coordinates close to/in both clusters.

Alternatively, focusing on column 5 (PositiveTail5), how do the studies that have nonzero values in this column (8 experiments, each with a z-score of 0.125) relate to each other?

All those experiments reported coordinates close to/in that cluster.

Jackknife’s diagnostic purpose is a sanity check on which studies influenced which results, broken down by cluster. One could imagine doing this at the voxel level as well, but that would take more time to interpret, or taking an average across all voxels to see the overall influence of each study. Looking at clusters takes a middle-of-the-road approach, trying to summarize results in “meaningful” blobs.

Opinions on how to use jackknife

You may notice patterns that could inspire additional analyses (MACM): if a subset of the studies contributes to clusters A/B and another subset primarily loads onto clusters C/D, this may suggest there is more organization to which regions tend to coactivate with each other. Jackknife is not a method to quantitatively tell you this, but it can illustrate patterns you can follow up on with further analyses, prompt you to scrutinize certain studies, or help couch your interpretation of the results. When reporting a cluster, readers may believe every study contributed equally to it; this analysis lets the reader know how many studies, which specific studies, and the extent to which each study contributed to a cluster.

Hope this helps!
James