Preregistrations: How much preliminary knowledge is allowed?

This is probably a very open-ended question, but I feel this is particularly relevant for studies using ABCD data, since we will inevitably revisit the same data again and again.

Ideally, preregistration should be done before any analysis has been conducted on the data. To quote the OSF template, registration following analysis of the data…

includes preliminary analysis of variables, calculation of descriptive statistics, and observation of data distributions. This may affect the interpretability of your results and likely prevent you from describing this as a preregistration.

However, when studying large, ongoing datasets like ABCD, avoiding this is impractical. We most certainly will use the data we have seen and analyzed before to ask many different questions. In this situation, what should we be doing? How can we prevent our latent preliminary knowledge from biasing preregistrations? How ‘different’ should new preregistrations be from our previous studies?


As a somewhat related personal example, I recently did an unregistered exploratory analysis relating genetic risk for various psychiatric disorders to dMRI metrics. This was supposed to be a side project for learning to process ABCD genomic data, but I ended up finding some fairly intriguing correlations. Unfortunately, after reviewing my methods, I discovered some errors, such as not including ancestry principal components as covariates.

For my next analysis, I would like to fix those problems and expand the scope. I am also excited about preregistrations after the week 4 lectures, but I am not sure whether I should preregister. On one hand, there are a lot of benefits to preregistering, but on the other, I feel like this would be a post-hoc registration. Is it ‘too late’ to preregister?

I think as long as you do not already have results that you are planning to write up, it is not “too late” to preregister, though as you note, the earlier the better.

I think this is enough to warrant a preregistration.

All science builds on prior knowledge, and those who design big datasets like ABCD want you to use these observations to inform new studies. A nice thing about preregistering is that it makes public what you do and do not know, and puts in writing what you plan to do for your analyses. It holds you accountable for doing the analysis as described, rather than trying a bunch of exploratory analyses and only reporting what worked.


Looks like there is a template for Secondary Data Preregistration on OSF! The section on prior knowledge addresses my concern.