Get original DICOM name from either dcm2niix processed files or fmriprep metadata?

kbond · December 8, 2023, 8:34pm

Hello,

I recently preprocessed some files using dcm2niix, BIDS-formatted, and then ran fmriprep. Of course the original DICOM name looks something like: MR.1.3.12.2.1107.5.2.43.166033.202310040936006694730084
And of course the BIDS output is more human-readable, e.g. sub-01_dwi.bval

Is there a way to recover the original DICOM name from files that have been BIDS-formatted or fmriprepped? I looked into the json metadata & didn’t find much.

Thanks in advance for your help!

effigies · December 8, 2023, 8:57pm

DICOMs are intended to be understandable purely from the metadata, and so the names are not standardized.

Your center may follow a convention, though, so if the full dcm2niix output is preserved in the BIDS sidecars, you may be able to come up with a method for reconstructing the filenames, at least partially. Then you could use pattern matching to fill in the holes. For example, if you find “3.12.2.1107” somewhere in the metadata and you know the scan date/time was 2023-10-04@09:36, you could match MR*3.12.2.1107*202310040936*.
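
A minimal sketch of that pattern-matching idea in Python, using the standard fnmatch module. The function name match_dicom_names and the list of candidate file names are hypothetical; the UID fragment and timestamp are the ones from this thread. It assumes your site's DICOM names really do embed both pieces, which may not hold everywhere.

```python
from fnmatch import fnmatchcase


def match_dicom_names(names, uid_fragment, timestamp):
    """Return the names matching MR*<uid_fragment>*<timestamp>*.

    `names` is any iterable of DICOM file names (hypothetical input,
    e.g. from os.listdir on your DICOM archive).
    """
    pattern = f"MR*{uid_fragment}*{timestamp}*"
    # fnmatchcase is case-sensitive on every platform; '.' is literal here
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, match_dicom_names(os.listdir(dicom_dir), "3.12.2.1107", "202310040936") would keep only the files whose names contain that UID fragment followed by that timestamp.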

neurolabusc · December 12, 2023, 5:36pm

You can have dcm2niix retain personally identifying information by using the -ba n option (BIDS anonymization: no).

Consider the dcm_qa datasets converted with the command dcm2niix -ba n -f %s_%t_%p ~/dcm_qa. In this case, the file 25_20140310133834_fMRI_MB_asc.json will include the details:

	"SeriesInstanceUID": "1.3.12.2.1107.5.2.32.35131.2014031013014324219590803.0.0.0",
	"StudyInstanceUID": "1.3.12.2.1107.5.2.32.35131.30000014022817282751500000052",
	"StudyID": "1",
	"PatientName": "stc_test",
	"PatientID": "crlab",
	"PatientBirthDate": "1980-07-07",
	"PatientSex": "M",
	"PatientWeight": 100.698,
...
	"AcquisitionTime": "14:01:49.417500",
	"AcquisitionDateTime": "2014-03-10T14:01:49.417500",

Note that DICOM filenames are typically the mediaObjectInstanceUID, which is unique for each DICOM image, and a single BIDS NIfTI file may concatenate thousands of DICOM images. However, all of those images should share a SeriesInstanceUID (0020,000E) that you could look up with your DICOM database system.
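
As a rough sketch of how one might collect those UIDs for lookup, the Python below walks a BIDS tree and maps each SeriesInstanceUID to the sidecar files that record it. The function name index_sidecars_by_series_uid is hypothetical, and the field is only expected to be present if the data were converted with -ba n.

```python
import json
from collections import defaultdict
from pathlib import Path


def index_sidecars_by_series_uid(bids_root):
    """Map each SeriesInstanceUID to the sidecar JSON files that contain it."""
    index = defaultdict(list)
    for sidecar in sorted(Path(bids_root).rglob("*.json")):
        try:
            meta = json.loads(sidecar.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # skip files that are not valid JSON
        if not isinstance(meta, dict):
            continue  # sidecars are JSON objects; ignore anything else
        uid = meta.get("SeriesInstanceUID")
        if uid:
            index[uid].append(sidecar)
    return dict(index)
```

Each key of the returned dictionary is a UID you could then query in your DICOM database.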

You could use dcm2niix to rename your DICOMs so that all images that share the SeriesInstanceUID are in a single folder where the folder name matches the SeriesInstanceUID using the %j in the naming:

dcm2niix -r y -f %t/%s_%p/%j/%4r_%o.dcm ~/path/to/DICOMs
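
Once the DICOMs are renamed that way, finding the source folder for a given sidecar could look something like the sketch below. The function name find_series_folders is hypothetical, and it assumes the %j folder name is exactly the SeriesInstanceUID string stored in the JSON; check that against your own output, since file-name sanitizing could alter it.

```python
from pathlib import Path


def find_series_folders(renamed_root, series_uid):
    """Return every directory under renamed_root whose name equals series_uid."""
    return sorted(
        path
        for path in Path(renamed_root).rglob("*")
        if path.is_dir() and path.name == series_uid
    )
```

Normally you would expect exactly one match per UID; an empty list suggests the series was not in the renamed set.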

For many users, a key aspect of DICOM to BIDS conversion is to hide personally identifiable information, and depending on where you live this may be regulated by federal laws. Therefore, you should be very careful about retaining these UIDs in datasets you are sharing. For the USA, you will want to ensure you meet the Safe Harbor principles, but laws are much stricter for EU data. The first part of the UID does reveal details of the system that generated those images, and the latter part often reveals date stamps (e.g. I suspect your data was acquired on 4 October 2023 at 9:36am). As @effigies noted, without the -ba n option, dcm2niix will store the AcquisitionTime in the JSON but not the AcquisitionDate (for Siemens data, the time of day helps align the images with physiological data).
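
To illustrate how much a UID can leak, here is a heuristic sketch that looks for a YYYYMMDDHHMM prefix in any component of a UID or DICOM file name. The function name guess_uid_timestamp and the plausible-year window are assumptions of this example; vendors are free to build UIDs differently, so a hit is a guess, not a guarantee.

```python
import re
from datetime import datetime


def guess_uid_timestamp(uid):
    """Return the first plausible YYYYMMDDHHMM timestamp found in uid, or None."""
    for component in uid.split("."):
        match = re.match(r"\d{12}", component)  # component must start with 12 digits
        if not match:
            continue
        try:
            stamp = datetime.strptime(match.group(0), "%Y%m%d%H%M")
        except ValueError:
            continue  # digits do not form a valid date and time
        if 1990 <= stamp.year <= 2100:  # assumed window of plausible scan years
            return stamp
    return None
```

Applied to the file name from the original question, this recovers 2023-10-04 09:36, which is the kind of detail you may need to strip before sharing data.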

neurolabusc · December 12, 2023, 5:44pm

As an aside, some users may find that dcm2niix reveals more private information than they would like. Here is a simple Python script I wrote (with the help of ChatGPT) to further anonymize BIDS json files:

import os
import json

def remove_keys(json_data):
    # Sidecar fields to strip (duplicate entries removed from the list)
    keys = ["SAR", "InstitutionAddress", "InstitutionalDepartmentName", "SeriesNumber", "AcquisitionNumber", "DeviceSerialNumber", "StationName", "AcquisitionTime", "InstitutionName"]
    for key in keys:
        # pop with a default deletes the key if present and is a no-op otherwise
        json_data.pop(key, None)


def process_json_file(file_path):
    print(file_path)
    with open(file_path, 'r') as file:
        data = json.load(file)

    remove_keys(data)

    with open(file_path, 'w') as file:
        json.dump(data, file, indent=4)

def process_files_in_directory(directory):
    for root, _, files in os.walk(directory):
        files = [f for f in files if not f.startswith('.')]  # skip hidden files
        for file_name in files:
            if file_name.endswith(".json"):
                file_path = os.path.join(root, file_name)
                process_json_file(file_path)

if __name__ == '__main__':
    target_directory = '/Volumes/bids/'  # Change this to the root directory where you want to start the search
    
    process_files_in_directory(target_directory)

kbond · December 12, 2023, 8:39pm

Thanks for both of these tips, Chris! Extraordinarily helpful. Much appreciated.