EDIT: Chapter 7 (“Statistical inference on images”) from the Poldrack/Mumford/Nichols Handbook of Functional MRI Data Analysis more or less answered my question.
I’m writing one of a series of papers based on data from a larger study in which participants did a bunch of different tasks while they were in the MRI scanner. Each of the three papers (one resting-state, two on different tasks) is led by a different author. I used SPM (task paper 1), and the authors of the other two (task paper 2 and the resting-state paper) used AFNI, because we’re now at different labs/institutions and those are the packages we each use.
I report results (peak Z’s and T’s) at a voxelwise FWE-corrected threshold of p < .05, whereas the AFNI task paper used a cluster-defining threshold (CDT) of p < .005 with a minimum cluster size of k = 30 voxels, based on 3dClustSim. A reviewer was concerned that their CDT was too lenient. My general sense from what I’ve read (e.g., Woo et al., 2014; Eklund et al., 2016; Carter et al., 2016; the OHBM blog) is that if the AFNI folks follow the updated recommendations (e.g., a CDT of p < .001), that should meet field standards for controlling false positives.
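One way to see why a CDT doesn't directly "map onto" a voxelwise FWE threshold is to put both on the same z scale. The sketch below is only an illustration under stated assumptions: one-sided tests, and a hypothetical search volume of 60,000 voxels / 1,000 resels (not numbers from our data). The RFT function keeps only the leading 3D Euler-characteristic term, so it approximates rather than reproduces SPM's calculation, and Bonferroni is included as a conservative upper bound for smooth data. A two-sided CDT would give a slightly higher z.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_for_one_sided_p(p):
    """z value whose upper-tail probability under N(0, 1) equals p."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def bonferroni_fwe_z(n_voxels, alpha=0.05):
    """Voxelwise FWE height threshold from Bonferroni (conservative for smooth data)."""
    return z_for_one_sided_p(alpha / n_voxels)


def rft_fwe_z(resels, alpha=0.05):
    """Approximate RFT voxelwise FWE height threshold for a 3D Gaussian field.

    Assumption: only the leading 3D Euler-characteristic term is used,
        P(max Z > u) ~= resels * (4 ln 2)^(3/2) / (2 pi)^2 * (u^2 - 1) * exp(-u^2 / 2),
    solved for u by bisection on [2, 10], where this expression is decreasing.
    """
    const = (4.0 * math.log(2.0)) ** 1.5 / (2.0 * math.pi) ** 2

    def expected_ec(u):
        return resels * const * (u * u - 1.0) * math.exp(-u * u / 2.0)

    lo, hi = 2.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if expected_ec(mid) > alpha:
            lo = mid  # still too many expected false-positive blobs: raise the threshold
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical search volume: 60,000 voxels, 1,000 resels.
    for label, p in [("CDT p < .005", 0.005), ("CDT p < .001", 0.001)]:
        print(f"{label}: z > {z_for_one_sided_p(p):.2f}")
    print(f"Bonferroni FWE p < .05 (60,000 voxels): z > {bonferroni_fwe_z(60_000):.2f}")
    print(f"RFT FWE p < .05 (1,000 resels): z > {rft_fwe_z(1_000):.2f}")
```

With these assumed numbers the CDTs sit near z = 2.6 and z = 3.1, while both voxelwise FWE heights are above z = 4.5. A CDT on its own is therefore not a multiple-comparisons correction; in a cluster test, the familywise control comes from the extent threshold k that is paired with it.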
However, the PIs are asking us to standardize our analyses as much as possible across the three papers, and I can’t figure out how a particular CDT maps onto my voxelwise FWE-corrected p < .05. I understand some aspects of how the two programs handle multiple-comparisons correction differently (e.g., SPM using random field theory; clusterwise vs. voxelwise inference), but in general, should we expect more or less comparable results from the two approaches? If yes, why? If no, why not?
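To illustrate why the two approaches need not agree, here is a toy Monte Carlo in the spirit of what 3dClustSim does (simulate smooth noise-only data and record how large clusters get by chance). It is a sketch, not AFNI's or SPM's implementation, and every setting is an assumption: a 1D field of 400 points, Gaussian smoothing with an FWHM of 8 points, 200 simulations, and clusters defined as contiguous runs above the CDT. On the same null fields it calibrates (a) a voxelwise FWE height threshold from the distribution of the maximum statistic and (b) a cluster-extent threshold at a given CDT, both at 5% FWE.

```python
import math
import random
import statistics


def smooth_noise(n, fwhm, rng):
    """1D Gaussian white noise smoothed with a Gaussian kernel, rescaled to unit variance."""
    sigma = fwhm / math.sqrt(8.0 * math.log(2.0))  # FWHM -> Gaussian sigma
    half = int(math.ceil(4.0 * sigma))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    norm = math.sqrt(sum(w * w for w in kernel))  # unit-norm kernel keeps each point N(0, 1)
    kernel = [w / norm for w in kernel]
    # Pad the raw noise so every output point sees a full kernel (no edge effects).
    raw = [rng.gauss(0.0, 1.0) for _ in range(n + 2 * half)]
    return [sum(w * raw[i + j] for j, w in enumerate(kernel)) for i in range(n)]


def largest_cluster(field, z_cdt):
    """Size of the largest run of contiguous points above the cluster-defining threshold."""
    best = run = 0
    for value in field:
        run = run + 1 if value > z_cdt else 0
        best = max(best, run)
    return best


def percentile_95(values):
    """Empirical 95th percentile (nearest-rank method)."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]


def null_thresholds(n_sims=200, n=400, fwhm=8.0, p_cdt=0.001, seed=0):
    """Calibrate both FWE procedures on the same simulated null fields.

    Returns (z_cdt, u_fwe, k_fwe):
      z_cdt  height of the one-sided cluster-defining threshold,
      u_fwe  voxelwise FWE height: 95th percentile of the per-field maximum,
      k_fwe  cluster-extent threshold: 95th percentile of the per-field largest
             cluster above z_cdt (clusters larger than k_fwe would be reported).
    """
    rng = random.Random(seed)
    z_cdt = statistics.NormalDist().inv_cdf(1.0 - p_cdt)
    max_stats, max_extents = [], []
    for _ in range(n_sims):
        field = smooth_noise(n, fwhm, rng)
        max_stats.append(max(field))
        max_extents.append(largest_cluster(field, z_cdt))
    return z_cdt, percentile_95(max_stats), percentile_95(max_extents)


if __name__ == "__main__":
    for p_cdt in (0.005, 0.001):
        z_cdt, u_fwe, k_fwe = null_thresholds(p_cdt=p_cdt)
        print(f"CDT p < {p_cdt}: z > {z_cdt:.2f}, clusters of > {k_fwe} points survive; "
              f"voxelwise FWE height on the same noise: z > {u_fwe:.2f}")
```

Both thresholds hold the familywise error rate near 5% on pure noise, so neither is "wrong" under the null, but they license different claims. Voxelwise FWE control lets you interpret each surviving voxel (the chance of even one false-positive voxel anywhere is held at 5%), whereas a significant cluster only tells you that the cluster as a whole is unlikely under the null, not which of its voxels carry signal. On real data, a weak but spatially extended effect can pass the cluster test and fail the height test, and a small, sharp peak can do the reverse, so results from the two procedures will overlap but should not be expected to match.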
Please help me explain this to my PIs.