I created a dataset using the datalad ukb extension and downloaded some UK Biobank data using the `datalad ukb-update` command. My data is stored in the `incoming` branch as ZIP files. I want to have them in the `incoming-native` branch, which holds the extracted files. What is the best way to do that? Should I do something like merging the two branches?
This should be happening automatically. The `incoming` branch should have the raw downloads (ZIP and other formats provided by UKB). The `incoming-native` branch has the extracts; see http://docs.datalad.org/projects/ukbiobank/concepts.html#branches
Can you possibly post the exact invocation of `ukb-init` and `ukb-update` that you have used? I suspect that something might have gone wrong if you don't see any extracts.
I should have explained the situation more accurately. I used `ukb-init` with the individual ID and field IDs as advised in the documentation, and then used `ukb-update` with a surrogate `ukbfetch` to move already downloaded data from another location into the created dataset folder. All of these operations were done with an older version of datalad-ukb, which resulted in the data being in the `incoming` branch. When I rerun those commands with the updated version for a new individual, the data is extracted and ends up in the `incoming-native` branch. So the main question is: how can I get the already moved data, which is in the `incoming` branch, into `incoming-native`?
Ah, thanks for the clarification. I think the system is not meant to be able to perform what you desire without further action.
What `ukb-update` does is rewrite the managed branches completely with every “new” download. So if you only feed it a “partial” update, it will not include any of the old downloads.
This is done because UKB does not provide any indication of a version. They may change any data record content, data record availability, and even participant availability at any point in time and without notice. Therefore we use a fresh download of all data records for a participant at the time of an update, so that the record is consistent and matches the time of the update.
So if you really do know that nothing has changed, or you choose not to care about this, you have to take the original ZIPs out of the previous state of the `incoming` branch and reinject them together with the other, new downloads. Symlinks to the annex should be good enough (but I have not tried that recently).
If there is demand for this to become a regular update feature, please file an issue in the datalad/datalad-ukbiobank repository on GitHub.
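The reinjection described above could be prepared along the following lines. This is only a minimal sketch and not part of datalad-ukbiobank. It assumes that the old files are reachable in a checkout of the `incoming` branch with their content present, that they sit at the top level of that checkout, and that the new downloads are in a second folder. The function name `stage_archives` and all folder arguments are hypothetical.

```python
from pathlib import Path


def stage_archives(old_dir, new_dir, staging_dir):
    """Link files from old_dir and new_dir into staging_dir.

    Hypothetical helper: when both folders hold a file with the same
    name, the one from new_dir wins.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = set()
    # Old files are processed first so that new files can replace them.
    for source in (Path(old_dir), Path(new_dir)):
        for item in sorted(source.iterdir()):
            if item.name.startswith("."):
                continue  # skip .git, .gitattributes, .datalad and similar
            if item.is_symlink() and not item.exists():
                # A dangling annex symlink means the content was not retrieved.
                raise FileNotFoundError(f"content missing for {item}")
            if not item.is_file():
                continue
            link = staging / item.name
            if link.is_symlink() or link.exists():
                link.unlink()
            # resolve() follows annex symlinks down to the actual content file.
            link.symlink_to(item.resolve())
            staged.add(item.name)
    return sorted(staged)
```

The staging folder would then be the single place a surrogate `ukbfetch` reads from, so that old and new downloads are presented to `ukb-update` together.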
Thanks for the explanation. Now the behavior of `ukb-update` is clear to me. I will follow your suggested procedure.
I tried your suggestion and, again, the data ended up in the `incoming` branch with nothing in `incoming-native`. Here are the steps that I took:
- Copied the data file to a folder outside the repo (I did it simply with `cp`).
- Changed the `ukbfetch` code to get the file from that folder.
- Ran `datalad ukb-update` on that dataset.
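For reference, a surrogate fetch step like the one in these steps might look roughly as follows. This is a sketch under stated assumptions, not the surrogate shipped with datalad-ukbiobank: it assumes a batch file with one `<participant-id> <record-id>` pair per line and pre-downloaded files named `<participant-id>_<record-id>.<ext>`. The function name `fetch_from_folder` and its arguments are hypothetical, and how `ukb-update` actually invokes `ukbfetch` should be checked against the installed version.

```python
import shutil
from pathlib import Path


def fetch_from_folder(batch_file, source_dir, dest_dir="."):
    """Copy the pre-downloaded files named in batch_file into dest_dir."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    copied = []
    for line in Path(batch_file).read_text().splitlines():
        if not line.strip():
            continue
        # ASSUMPTION: each line reads "<participant-id> <record-id>".
        participant, record = line.split()
        # ASSUMPTION: files are named "<participant-id>_<record-id>.<ext>".
        matches = sorted(source.glob(f"{participant}_{record}.*"))
        if not matches:
            raise FileNotFoundError(
                f"no file for {participant} {record} in {source}"
            )
        for match in matches:
            # copy2 follows symlinks, so real content lands in dest.
            shutil.copy2(match, dest / match.name)
            copied.append(match.name)
    return copied
```

Raising an error for a missing record, rather than skipping it, makes an incomplete set of downloads visible before any branch is rewritten.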
When I try `ukb-update` on a dataset that is newly created and initialized with `ukb-init`, it works fine and I end up getting the data extracted in the `incoming-native` branch! Can it be related to the active branch? For the already moved data, the active branch is `incoming`.