I want to anonymise a bunch of cases with DICOM files while keeping the folder structure and file names.
What would be the quickest way to do that?
I have not done this myself (stripping DICOMs of identifying data without format conversion), since I almost always go from DICOM to NIfTI/BIDS, but this package looks promising: https://pydicom.github.io/deid/
@vsoch might know more
hey @lehnin I’d be glad to help! Please post an issue on the deid board with an example of your directory structure (and some example or dummy data without identifiers if you have it) and I’d be glad to show you a little snippet of code to do this.
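For the "keep the folder structure and names" part of the question, here is a minimal sketch using only the Python standard library. It is not the deid API: `mirror_tree` and `copy_bytes` are hypothetical names made up for illustration, and `copy_bytes` is only a placeholder for whatever validated de-identification step you end up using.

```python
from pathlib import Path
from typing import Callable


def mirror_tree(src_root: Path, dst_root: Path,
                process_file: Callable[[Path, Path], None]) -> int:
    """Recreate the folder layout of src_root under dst_root and call
    process_file(src, dst) for every regular file. Returns the file count."""
    count = 0
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        # Same relative path and file name, but under the output root.
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        process_file(src, dst)
        count += 1
    return count


def copy_bytes(src: Path, dst: Path) -> None:
    # Hypothetical placeholder: this copies the file unchanged.
    # Replace it with a real de-identification step that reads src
    # and writes the cleaned file to dst.
    dst.write_bytes(src.read_bytes())
```

Usage would be something like `mirror_tree(Path("raw"), Path("anon"), copy_bytes)`. Note that empty folders are not recreated, and that folder or file names can themselves contain identifiers (patient names, dates), so keeping them unchanged is only safe if you have checked them.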
I strongly suggest you carefully validate any method you use to anonymize DICOM files.
- In my experience (helping develop dcm2niix), DICOM anonymization is the leading cause of corrupted DICOM files that either do not convert or do not convert with rich metadata. While anonymization is laudable, one may want to consider Chris’ approach of sharing NIfTI/BIDS data and privately archiving the raw DICOM in an encrypted form.
- Be aware that some vendors’ DICOM creation has been afflicted by leakage of private data into unused portions of other DICOM tags. In these cases, the arrays were not filled with zeros and therefore contained data from the last time that part of memory was used. These errors would slip through most anonymization routines (as private data is now appearing in non-private tags). The community and the vendors have been pretty good about detecting and fixing these bugs. However, these problems have certainly existed, might still exist, and are exceptionally hard to detect. This means it is hard to be 100% sure a DICOM file has been anonymized. In contrast, the NIfTI/BIDS data is much smaller and more explicit, so there really are virtually no pockets to hide unexpected data (and since the NIfTI header is a known size, zeroing it once at header creation avoids this problem). Note, I always advocate keeping the raw DICOM data privately: this more complicated format is a richer source of metadata. However, I would be wary of sharing DICOM data if you are especially worried about privacy.
- Be aware that the built-in Siemens Vida XA10 Anonymize function strips out not only personal data, but also many private tags and timing information required for subsequent processing. Siemens recommends using offline/in-house anonymization software instead.
- Most anonymization tools were developed before the Enhanced DICOM format was created, and many of these tools hopelessly mangle images stored in Enhanced DICOM format. In particular, Philips’ interpretation of Enhanced DICOM is exceptionally verbose and is very susceptible to tag manipulation.
- You may want to consider the gdcmanon anonymize/encrypt feature that uses a 512-bit RSA key. This means the private data is encrypted, and can be recovered (with the key) to allow thorough auditing.
- I think you need to be very clear about what tags you consider identifiable. Can your participant’s identity be unmasked by knowing the time and place of the scan? Do anatomical scans need to be de-faced? Do you have a good method to audit data at a later date?
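As one crude way to audit output for the leakage problem described above, the sketch below scans the raw bytes of every file for strings you already know to be identifying. It uses only the standard library; `find_leaks` is a hypothetical helper name, and the approach is an assumption of mine rather than a feature of any of the tools mentioned. It will miss anything stored compressed, encoded differently, or burned into the pixels, so a clean result does not prove a file is anonymized.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def find_leaks(root: Path, identifiers: Iterable[str]) -> Dict[str, List[str]]:
    """Return {relative file path: [identifiers found]} for files under root
    whose raw bytes contain any of the given strings (ASCII case-insensitive)."""
    # Pair each identifier with its lower-cased byte form; skip empty strings.
    needles = [(s, s.lower().encode("utf-8")) for s in identifiers if s]
    hits: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Whole-file read: fine for a sketch, slow for very large archives.
        data = path.read_bytes().lower()
        found = [s for s, needle in needles if needle in data]
        if found:
            hits[path.relative_to(root).as_posix()] = found
    return hits
```

Because this looks at the whole file rather than at parsed tags, it can also catch a name sitting in leftover padding bytes. DICOM stores person names with `^` between components (for example `DOE^JANE`), so it is safer to search for the surname and given name separately than for "Jane Doe".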