Datalad unable to 'get' subjects to ephemeral clone although saved

Dear Community,

We work in a neuroimaging lab and are currently setting up DataLad for image processing.
I created a clone of our test dataset, which contains 10 subjects, to set up an analysis pipeline (BIANCA for WMH segmentation). The clone has the following structure:

ROOT/data/SUBDATASET/SUBJECT-SUBDATASET
(caps=datalad dataset)

For validation, I wanted to test my pipeline on additional subjects without cloning a new dataset. I therefore copied subjects from a different dataset (a DataLad dataset with the structure DATASET/SUBJECT-SUBDATASET) into my clone with the command

cp -rfL /path/to/subjectfolder/ DATASET/data/xxx/

After that, I saved the newly copied subjects with

datalad save

# and

datalad save -d^

Running datalad status in the folders containing the SUBDATASETS, in the DATASET folder, and in ROOT shows that all changes are saved. However, testing the pipeline leads to errors related to the new subjects. The pipeline creates a clone of the DATASET, but datalad get -n DATASET only fetches the original 10 subjects, not the newly added ones.
Am I missing an important step when copying new subjects into a dataset that did not originally contain them? Or is it better to create a completely new clone of the dataset the subjects originally come from? Are clones even an adequate method for working collaboratively on a dataset, i.e., for testing new pipelines on it?

I am thankful for any advice.

All the best,
Carola

datalad wtf

WTF

configuration <SENSITIVE, report disabled by configuration>

credentials

  • keyring:
    • active_backends:
      • PlaintextKeyring with no encyption v.1.0 at /home/bax0929/.local/share/python_keyring/keyring_pass.cfg
    • config_file: /home/bax0929/.config/python_keyring/keyringrc.cfg
    • data_root: /home/bax0929/.local/share/python_keyring

datalad

  • full_version: 0.14.6
  • version: 0.14.6

dataset

  • id: 6278ef92-e226-4646-8368-a127b7e0483b
  • metadata: <SENSITIVE, report disabled by configuration>
  • path: /work/bax0929/spielwiese/CSI_TEST_bianca/code
  • repo: AnnexRepo

dependencies

  • annexremote: 1.5.0
  • appdirs: 1.4.4
  • boto: 2.49.0
  • cmd:7z: 16.02
  • cmd:annex: 8.20201104-g13bab4f2c
  • cmd:bundled-git: 2.29.2
  • cmd:git: 2.29.2
  • cmd:system-git: 2.29.2
  • cmd:system-ssh: 7.4p1
  • humanize: 3.9.0
  • iso8601: 0.1.14
  • keyring: 23.0.1
  • keyrings.alt: 4.0.2
  • msgpack: 1.0.2
  • requests: 2.25.1
  • wrapt: 1.12.1

environment

  • LANG: en_US.UTF-8
  • PATH: /work/fatx405/miniconda3/envs/datalad/bin:/work/fatx405/miniconda3/condabin:/work/fatx405/software/mrtrix3/bin:/work/fatx405/miniconda3/bin:/sw/link/git/2.32.0/bin:/sw/env/system-gcc/singularity/3.5.2-overlayfix/bin:/sw/link/nano/5.7/bin:/sw/batch/slurm/19.05.6/bin:/sw/rrz/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin

extensions

  • container:
    • description: Containerized environments
    • entrypoints:
      • datalad_container.containers_add.ContainersAdd:
        • class: ContainersAdd
        • load_error: None
        • module: datalad_container.containers_add
        • names:
          • containers-add
          • containers_add
      • datalad_container.containers_list.ContainersList:
        • class: ContainersList
        • load_error: None
        • module: datalad_container.containers_list
        • names:
          • containers-list
          • containers_list
      • datalad_container.containers_remove.ContainersRemove:
        • class: ContainersRemove
        • load_error: None
        • module: datalad_container.containers_remove
        • names:
          • containers-remove
          • containers_remove
      • datalad_container.containers_run.ContainersRun:
        • class: ContainersRun
        • load_error: None
        • module: datalad_container.containers_run
        • names:
          • containers-run
          • containers_run
    • load_error: None
    • module: datalad_container
    • version: 1.1.5
  • metalad:
    • description: DataLad semantic metadata command suite
    • entrypoints:
      • datalad_metalad.aggregate.Aggregate:
        • class: Aggregate
        • load_error: None
        • module: datalad_metalad.aggregate
        • names:
          • meta-aggregate
          • meta_aggregate
      • datalad_metalad.dump.Dump:
        • class: Dump
        • load_error: None
        • module: datalad_metalad.dump
        • names:
          • meta-dump
          • meta_dump
      • datalad_metalad.extract.Extract:
        • class: Extract
        • load_error: None
        • module: datalad_metalad.extract
        • names:
          • meta-extract
          • meta_extract
    • load_error: None
    • module: datalad_metalad
    • version: 0.2.1
  • neuroimaging:
    • description: Neuroimaging tools
    • entrypoints:
      • datalad_neuroimaging.bids2scidata.BIDS2Scidata:
        • class: BIDS2Scidata
        • load_error: None
        • module: datalad_neuroimaging.bids2scidata
        • names:
          • bids2scidata
    • load_error: None
    • module: datalad_neuroimaging
    • version: 0.3.1
  • ukbiobank:
    • description: UKBiobank dataset support
    • entrypoints:
      • datalad_ukbiobank.init.Init:
        • class: Init
        • load_error: None
        • module: datalad_ukbiobank.init
        • names:
          • ukb-init
          • ukb_init
      • datalad_ukbiobank.update.Update:
        • class: Update
        • load_error: None
        • module: datalad_ukbiobank.update
        • names:
          • ukb-update
          • ukb_update
    • load_error: None
    • module: datalad_ukbiobank
    • version: 0.3.3

git-annex

  • build flags:
    • Assistant
    • Webapp
    • Pairing
    • Inotify
    • DBus
    • DesktopNotify
    • TorrentParser
    • MagicMime
    • Feeds
    • Testsuite
    • S3
    • WebDAV
  • dependency versions:
    • aws-0.22
    • bloomfilter-2.0.1.0
    • cryptonite-0.26
    • DAV-1.3.4
    • feed-1.3.0.1
    • ghc-8.8.4
    • http-client-0.6.4.1
    • persistent-sqlite-2.10.6.2
    • torrent-10000.1.1
    • uuid-1.3.13
    • yesod-1.6.1.0
  • key/value backends:
    • SHA256E
    • SHA256
    • SHA512E
    • SHA512
    • SHA224E
    • SHA224
    • SHA384E
    • SHA384
    • SHA3_256E
    • SHA3_256
    • SHA3_512E
    • SHA3_512
    • SHA3_224E
    • SHA3_224
    • SHA3_384E
    • SHA3_384
    • SKEIN256E
    • SKEIN256
    • SKEIN512E
    • SKEIN512
    • BLAKE2B256E
    • BLAKE2B256
    • BLAKE2B512E
    • BLAKE2B512
    • BLAKE2B160E
    • BLAKE2B160
    • BLAKE2B224E
    • BLAKE2B224
    • BLAKE2B384E
    • BLAKE2B384
    • BLAKE2BP512E
    • BLAKE2BP512
    • BLAKE2S256E
    • BLAKE2S256
    • BLAKE2S160E
    • BLAKE2S160
    • BLAKE2S224E
    • BLAKE2S224
    • BLAKE2SP256E
    • BLAKE2SP256
    • BLAKE2SP224E
    • BLAKE2SP224
    • SHA1E
    • SHA1
    • MD5E
    • MD5
    • WORM
    • URL
    • X*
  • local repository version: 8
  • operating system: linux x86_64
  • remote types:
    • git
    • gcrypt
    • p2p
    • S3
    • bup
    • directory
    • rsync
    • web
    • bittorrent
    • webdav
    • adb
    • tahoe
    • glacier
    • ddar
    • git-lfs
    • httpalso
    • hook
    • external
  • supported repository versions:
    • 8
  • upgrade supported from repository versions:
    • 0
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
  • version: 8.20201104-g13bab4f2c

location

  • path: /work/bax0929/spielwiese/CSI_TEST_bianca/code
  • type: dataset

metadata_extractors

  • annex (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: None
    • module: datalad.metadata.extractors.annex
    • version: None
  • audio (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: No module named ‘mutagen’ [audio.py::17]
    • module: datalad.metadata.extractors.audio
  • bids (datalad-neuroimaging 0.3.1):
    • distribution: datalad-neuroimaging 0.3.1
    • load_error: None
    • module: datalad_neuroimaging.extractors.bids
    • version: None
  • datacite (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: None
    • module: datalad.metadata.extractors.datacite
    • version: None
  • datalad_core (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: None
    • module: datalad.metadata.extractors.datalad_core
    • version: None
  • datalad_rfc822 (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: None
    • module: datalad.metadata.extractors.datalad_rfc822
    • version: None
  • dicom (datalad-neuroimaging 0.3.1):
    • distribution: datalad-neuroimaging 0.3.1
    • load_error: None
    • module: datalad_neuroimaging.extractors.dicom
    • version: None
  • exif (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: No module named ‘exifread’ [exif.py::16]
    • module: datalad.metadata.extractors.exif
  • frictionless_datapackage (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: None
    • module: datalad.metadata.extractors.frictionless_datapackage
    • version: None
  • image (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: No module named ‘PIL’ [image.py::16]
    • module: datalad.metadata.extractors.image
  • metalad_annex (datalad-metalad 0.2.1):
    • distribution: datalad-metalad 0.2.1
    • load_error: None
    • module: datalad_metalad.extractors.annex
    • version: None
  • metalad_core (datalad-metalad 0.2.1):
    • distribution: datalad-metalad 0.2.1
    • load_error: None
    • module: datalad_metalad.extractors.core
    • version: None
  • metalad_custom (datalad-metalad 0.2.1):
    • distribution: datalad-metalad 0.2.1
    • load_error: None
    • module: datalad_metalad.extractors.custom
    • version: None
  • metalad_runprov (datalad-metalad 0.2.1):
    • distribution: datalad-metalad 0.2.1
    • load_error: None
    • module: datalad_metalad.extractors.runprov
    • version: None
  • nidm (datalad-neuroimaging 0.3.1):
    • distribution: datalad-neuroimaging 0.3.1
    • load_error: None
    • module: datalad_neuroimaging.extractors.nidm
    • version: None
  • nifti1 (datalad-neuroimaging 0.3.1):
    • distribution: datalad-neuroimaging 0.3.1
    • load_error: None
    • module: datalad_neuroimaging.extractors.nifti1
    • version: None
  • xmp (datalad 0.14.6):
    • distribution: datalad 0.14.6
    • load_error: No module named ‘libxmp’ [xmp.py::20]
    • module: datalad.metadata.extractors.xmp

metadata_indexers

python

  • implementation: CPython
  • version: 3.8.1

system

  • distribution: centos/7/Core
  • encoding:
    • default: utf-8
    • filesystem: utf-8
    • locale.prefered: UTF-8
  • max_path_length: 301
  • name: Linux
  • release: 4.14.240-1.0.33.el7.rrz.x86_64
  • type: posix
  • version: #1 SMP Thu Jul 22 18:29:43 CEST 2021

Thank you @carlo-may for trying to distill the situation into a concise description, but unfortunately (unlike plain cut/pasted output or a reproducer script such as http://www.onerussian.com/tmp/test-merge-renamed.sh, just a random recent one) I find it difficult to figure out what the issue is. E.g.

  • datalad get -n DATASET (without -r) should not get any subdatasets, but you said that it got some, which would mean that those directories aren’t subdatasets… check for that. Also – was there any error or message?

  • Was /path/to/subjectfolder/ just some directory or a dataset? It seems like a directory, but then you should have created that subdataset first (datalad create -d DATASET/data/xxx/ -f subjectfolder/ or alike)
    before calling save, and the save should also have had -r, since you want to save the data in that subject subdataset.
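
The two points above could be checked roughly as follows. This is only a sketch, not a tested recipe for your setup: the helper names (is_subdataset, register_subject, install_all_subdatasets) are made up for illustration, the paths are the placeholders from this thread, and the check relies on a subdataset being recorded in its superdataset's git index as a gitlink entry (mode 160000). It also assumes the copied folder is not yet tracked as regular content by the superdataset.

```shell
#!/bin/sh
# Helper names are hypothetical; paths are the placeholders used in this thread.

# is_subdataset SUPERDATASET RELPATH
# Succeeds only if RELPATH is recorded in SUPERDATASET's index as a
# gitlink (mode 160000), i.e. as a registered subdataset/submodule.
is_subdataset() {
    rel=${2%/}   # drop one trailing slash, if present
    mode=$(git -C "$1" ls-files --stage -- "$rel" | awk 'NR == 1 { print $1 }')
    [ "$mode" = "160000" ]
}

# register_subject SUPERDATASET RELPATH
# Assumption: RELPATH is a plain directory that SUPERDATASET does not
# track yet. Turns it into a subdataset, then saves recursively.
# Datasets above SUPERDATASET still need their own save afterwards.
register_subject() {
    datalad create -d "$1" -f "$1/${2%/}" &&
        datalad save -d "$1" -r -m "Add subject ${2%/}"
}

# install_all_subdatasets DATASET
# Installs subdatasets at every level without fetching file content.
install_all_subdatasets() {
    datalad get -n -r "$1"
}

# Example (not run here):
#   is_subdataset DATASET/data/xxx subjectfolder ||
#       register_subject DATASET/data/xxx subjectfolder
```

If is_subdataset fails for one of the newly copied subjects but succeeds for the original 10, that would fit with the new folders having been saved as plain directory content rather than as subdatasets.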

PS. FWIW, datalad addurls uses // as an indicator for a (sub)dataset boundary, so ROOT/data/SUBDATASET/SUBJECT-SUBDATASET could be shown as root//data/subdataset//subject-subdataset// in this (non)convention.
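
To make that notation concrete, here is a small sketch that splits such a path at each // so that every printed line is the path of one dataset relative to its parent. The function name split_dataset_path is made up for illustration and is not part of DataLad.

```shell
#!/bin/sh
# split_dataset_path PATH   (hypothetical helper, not a DataLad command)
# Prints one line per segment delimited by "//"; each segment is the
# location of one (sub)dataset relative to the dataset above it.
split_dataset_path() {
    printf '%s\n' "$1" |
        awk -F'//' '{ for (i = 1; i <= NF; i++) if ($i != "") print $i }'
}

split_dataset_path 'root//data/subdataset//subject-subdataset//'
# prints:
#   root
#   data/subdataset
#   subject-subdataset
```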