Summary of what happened:
I am evaluating the RIA store model for a dataset, but I am not clear on the details of a workflow that uses the optional 7-zipped archive.
Command used (and if a helper script was used, a link to the helper script or the command generated):
Given a dataset, I know that I can create a RIA sibling and add annexed files to it via something like the following:

```sh
ria=[somepath]
alias=mydata
datalad create-sibling-ria -s ria-backup --alias ${alias} --new-store-ok "ria+file://${ria}"
datalad push --to ria-backup
```
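As a sanity check (the file name below is just a placeholder), `git annex whereis` confirms that the annexed content actually reached the new sibling:

```sh
# Placeholder file name; any annexed file in the dataset works.
# The output should list the ria-backup (ORA) remote as holding the content.
git annex whereis some_file.nii.gz
```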
After that, my impression is that the recommended way to create the archive is:

```sh
datalad export-archive-ora -d . ${ria}/alias/${alias}/archives/archive.7z
```
This creates a RIA store that looks something like the following:

```
[...]
├── 825
│   └── 647de-74c1-4a38-8163-e03cf23c1814
│       ├── annex
│       │   └── objects
│       │       ├── 0p
│       │       │   └── mp
│       │       │       └── SHA256E-s197785421--bfe1f8cc2daab0b7758579a8a1a787e2283f7e47fe49c37ea5ae83766992e83c.nii.gz
│       │       │           └── SHA256E-s197785421--bfe1f8cc2daab0b7758579a8a1a787e2283f7e47fe49c37ea5ae83766992e83c.nii.gz
[...]
│       ├── archives
│       │   └── archive.7z
[...]
├── alias
│   └── mydata -> ../825/647de-74c1-4a38-8163-e03cf23c1814
├── error_logs
└── ria-layout-version
```
But I'm confused about how I'd update the RIA store.

- What happens after I annex more files in the original dataset, or modify previously annexed files? That is, does `archive.7z` need to be recreated from scratch?
- How should I drop the regular annex in the RIA store? Is there a tool for deduplicating the RIA store so that the only copy of annexed files is stored in `archive.7z`?
- From experimenting, it seems like I can delete `825/647de-74c1-4a38-8163-e03cf23c1814/annex/objects`, but that seems risky because there isn't a guarantee that files are actually stored inside `archive.7z`. (I sketch the check I'd want after this list.)
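This is the kind of pre-delete check I have in mind; a rough sketch, assuming the archive's member paths end in the annex key names (I haven't confirmed the internal layout that `export-archive-ora` uses):

```sh
# Compare annex keys in the store's object tree against archive.7z members.
dsdir="${ria}/825/647de-74c1-4a38-8163-e03cf23c1814"

# Keys currently present as regular annex objects.
find "${dsdir}/annex/objects" -type f -exec basename {} \; | sort -u > /tmp/keys-objects

# Basenames of everything packed into the archive (assumption: keys appear
# as member path components).
7z l -slt "${dsdir}/archives/archive.7z" \
  | sed -n 's/^Path = //p' | xargs -n1 basename | sort -u > /tmp/keys-archive

# Any key printed here is in annex/objects but NOT in the archive,
# i.e. deleting annex/objects would lose it.
comm -23 /tmp/keys-objects /tmp/keys-archive
```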
Version:

```
git annex version
git-annex version: 10.20230407
build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Benchmark Feeds Testsuite S3 WebDAV
dependency versions: aws-0.24 bloomfilter-2.0.1.0 cryptonite-0.30 DAV-1.3.4 feed-1.3.2.1 ghc-9.4.4 http-client-0.7.13.1 persistent-sqlite-2.13.1.1 torrent-10000.1.3 uuid-1.3.15 yesod-1.6.2.1
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL X*
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg hook external
operating system: darwin aarch64
supported repository versions: 8 9 10
upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10
```
```
datalad.__version__
Out[2]: '0.18.3'
```
Environment (Docker, Singularity, custom installation):

```
❯ mamba env export
name: datalad-demo
channels:
- conda-forge
dependencies:
- bzip2=1.0.8=h3422bc3_4
- ca-certificates=2022.12.7=h4653dfc_0
- libcxx=16.0.2=h4653b0c_0
- libexpat=2.5.0=hb7217d7_1
- libffi=3.4.2=h3422bc3_5
- libsqlite=3.40.0=h76d750c_1
- libzlib=1.2.13=h03a7124_4
- ncurses=6.3=h07bb92c_1
- openssl=3.1.0=h53f4e23_2
- p7zip=16.02=hbdafb3b_1001
- pip=23.1.1=pyhd8ed1ab_0
- python=3.11.3=h1456518_0_cpython
- readline=8.2=h92ec313_1
- setuptools=67.7.2=pyhd8ed1ab_0
- tk=8.6.12=he1e0b03_0
- tzdata=2023c=h71feb2d_0
- wheel=0.40.0=pyhd8ed1ab_0
- xz=5.2.6=h57fd34a_0
```
Data formatted according to a validatable standard? Please provide the output of the validator:
na
Relevant log outputs (up to 20 lines):
na
Screenshots / relevant information:
Thanks!