I have multi-site data (patients with depression) and would like to use ComBat to harmonize across the different sites before performing any statistical analysis. My question is: if I am interested in an association analysis (e.g., after combining the harmonized data from all sites, correlating a feature with a clinical scale that measures depression severity), could (or should) I then include that clinical scale as a covariate to preserve within ComBat?
Going over the literature, diagnosis status (e.g., clinical group or healthy control) is commonly included, along with other common “biological” covariates such as age and sex. There’s also this paper (A Guide to ComBat Harmonization of Imaging Biomarkers in Multicenter Studies - PMC), which gives some examples of how disease severity, when it is not uniformly distributed across the different sites, could (and in some cases, should) be included.
Not to extrapolate too much, but if disease/diagnosis status can be included, is there any reason why other clinical data, such as scores from a scale that will be used in the statistical analysis, should not be included as a covariate within ComBat?
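To make the concern concrete, here is a toy sketch of what I understand "preserving" a covariate to mean. It is not ComBat itself: it only removes an additive site shift by least squares, with no scale adjustment and no empirical Bayes step. All the numbers (sample size, severity distributions, site offset, slope) are made up for illustration, and `remove_site_shift` is a hypothetical helper, not a function from any ComBat package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # subjects per site (made-up value)

# Two sites whose patients differ in average severity (made-up distributions)
site = np.repeat([0, 1], n)
severity = np.concatenate([rng.normal(10, 3, n), rng.normal(20, 3, n)])

# Feature = true severity effect + additive site shift + noise
true_slope = 0.5
site_offset = np.array([0.0, 5.0])
feature = true_slope * severity + site_offset[site] + rng.normal(0, 1, 2 * n)

def remove_site_shift(y, site, covars=None):
    """Location-only site adjustment (a simplification, not full ComBat)."""
    sites = np.unique(site)
    # One indicator column per site; covariates to preserve are appended
    S = (site[:, None] == sites[None, :]).astype(float)
    X = S if covars is None else np.column_stack([S, covars])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Subtract only the estimated site part, keeping the overall level
    site_part = S @ coef[:len(sites)]
    return y - site_part + site_part.mean()

adj_without = remove_site_shift(feature, site)
adj_with = remove_site_shift(feature, site, covars=severity)

# Pooled association between severity and the adjusted feature
slope_without = np.polyfit(severity, adj_without, 1)[0]
slope_with = np.polyfit(severity, adj_with, 1)[0]
print("slope without severity as covariate:", slope_without)
print("slope with severity as covariate:   ", slope_with)
```

In this toy setup, leaving severity out lets the site term absorb the between-site difference in severity, so the pooled slope comes out well below the simulated 0.5, whereas including it recovers a slope close to 0.5. Whether the same logic carries over to full ComBat and to real data is exactly what I am unsure about.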
Any insight would be greatly appreciated.