Datalad - how to check file sizes

I am reading through the documentation and am unable to find instructions on how to get the file sizes of the current dataset.

What I’d like to do is run a command that gives me the total directory sizes of my raw and processed directories, which are in BIDS format and already stored in the datalad framework. Note: I do not want the file size across all the different branches of the dataset; I just need the file size of the committed dataset. I hope this makes sense.

Datalad 0.13.1 on Linux

On a per-dataset basis, datalad status --annex all reports how many annexed files exist, how much data they amount to, and how much of this data is present locally (i.e., retrieved via datalad get or git annex get).

$ datalad status --annex all
783 annex'd files (290.0 MB/6.5 GB present/total size)
nothing to save, working tree clean

You can supply a path to a directory or file as well:

$ datalad status --annex all sub-01
11 annex'd files (290.0 MB/290.0 MB present/total size)
nothing to save, working tree clean

You could also use git annex info --fast <path>:

# example: get size of local data in single directory in an OpenNeuro dataset
(handbook) adina@muninn in ~/repos/ds001241 on git:master
❱ git annex info --fast sub-01
directory: sub-01
local annex keys: 11
local annex size: 304.08 megabytes
annexed files in working tree: 11
size of annexed files in working tree: 304.08 megabytes
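
As far as I understand the git-annex output, "local annex size" counts only content that is present locally, while "size of annexed files in working tree" is the total for the checked-out branch whether or not the content has been retrieved, which sounds closer to what the question asks for. A minimal sketch for pulling that one field out per directory follows; annex_info_field is a hypothetical helper name, and raw and processed are placeholder directory names taken from the question:

```shell
# Hypothetical helper: reads `git annex info` text output on stdin and
# prints the value of the field whose name exactly matches $1.
annex_info_field() {
    awk -F ': ' -v key="$1" '$1 == key { print $2 }'
}

# Usage inside a dataset (raw and processed are placeholder directory names):
# for d in raw processed; do
#     size=$(git annex info --fast "$d" | annex_info_field 'size of annexed files in working tree')
#     printf '%s\t%s\n' "$d" "$size"
# done
```

The field name has to match the printed label exactly, so this will need adjusting if a git-annex version words the output differently.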

If you add some command line filtering, you can get a sorted list:

# I only retrieved data from one subject, so all other sub directories don't have local data
(handbook) adina@muninn in ~/repos/ds001241 on git:master
❱ git annex info --fast sub-* --json | jq -j '."local annex size", "\t", .directory, "\t", "\n"' | sort -h
0 bytes	sub-02	
0 bytes	sub-03	
0 bytes	sub-04	
0 bytes	sub-05	
0 bytes	sub-06	
0 bytes	sub-07	
0 bytes	sub-08	
0 bytes	sub-09	
0 bytes	sub-10	
0 bytes	sub-11	
0 bytes	sub-12	
0 bytes	sub-13	
0 bytes	sub-14	
0 bytes	sub-15	
0 bytes	sub-16	
0 bytes	sub-17	
0 bytes	sub-18	
0 bytes	sub-19	
0 bytes	sub-20	
0 bytes	sub-21	
0 bytes	sub-22	
0 bytes	sub-23	
0 bytes	sub-24	
304.08 megabytes	sub-01		
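
One caveat: because the unit is spelled out and separated from the number by a space, I would expect sort -h to compare only the leading number, so a directory at "1.5 gigabytes" could end up listed before one at "304.08 megabytes". A rough sketch that sorts on plain byte counts instead and adds a grand total is below. It assumes git annex info's --bytes option is available in your git-annex version, and sort_and_total is a hypothetical helper name:

```shell
# Hypothetical helper: reads "<bytes><TAB><directory>" lines on stdin,
# prints them sorted by size (ascending), then a grand total line.
sort_and_total() {
    sort -n -k1,1 |
        awk -F '\t' '{ total += $1; print } END { printf "%.0f\ttotal\n", total }'
}

# Usage inside a dataset (assumes --bytes is supported and jq is installed):
# git annex info --fast --bytes --json sub-* \
#     | jq -r '[."local annex size", .directory] | @tsv' \
#     | sort_and_total
```

Both sort -n and awk read only the leading number of the first column, so this should still work if the size field carries a trailing "bytes" label.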

Hope this is what you had in mind 🙂