Connecting DataLad to local S3 Object Store (MinIO)

… just a quick reply about the environment:

I did a clean install of Ubuntu 22.04 when starting and only added the required software around DataLad.

(base) super@vm-datalad:~$ which git-annex
/home/super/miniconda3/bin/git-annex
(base) super@vm-datalad:~$
(base) super@vm-datalad:~$ alias | grep git
(base) super@vm-datalad:~$

Neither command yields any unwanted surprises, IMHO.
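The `which git-annex` check above only reports the first match on PATH. A minimal Python sketch that reports the first match and also every other `git-annex` further down PATH could help spot a shadowed binary. It assumes a POSIX system; the function names are hypothetical.

```python
import os
import shutil
from typing import List, Optional


def resolve_git_annex(path: Optional[str] = None) -> Optional[str]:
    """Return the git-annex that would run first, like `which git-annex`.

    `path` defaults to the current PATH when None.
    """
    return shutil.which("git-annex", path=path)


def list_all_git_annex(path: Optional[str] = None) -> List[str]:
    """Return every executable git-annex on PATH, in lookup order."""
    search = path if path is not None else os.environ.get("PATH", "")
    found: List[str] = []
    for directory in search.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries for simplicity
        candidate = shutil.which("git-annex", path=directory)
        if candidate and candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    print("first:", resolve_git_annex())
    for candidate in list_all_git_annex():
        print("candidate:", candidate)
```

If more than one candidate shows up (for example a conda copy and a system copy), the first one is what `git annex` will use in that shell.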

I will get back to the issue tomorrow …

Hi there and sorry for the delay,

As I enter the commands manually while testing, there are definitely no other commands between the two exports and git annex initremote.

I’ve now tried running all three commands on one line as proposed, with and without a leading env, but I still receive the same error as before …
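Putting the variables on the same line (`env AWS_ACCESS_KEY_ID=... git annex initremote ...`) hands them directly to the child process. A minimal Python sketch of the same idea passes an explicit environment to `subprocess.run`; the helper name is hypothetical, and the demo uses a child Python process instead of git-annex so it runs anywhere.

```python
import os
import subprocess
import sys
from typing import Mapping, Optional, Sequence


def run_with_env(cmd: Sequence[str], extra_env: Mapping[str, str],
                 base_env: Optional[Mapping[str, str]] = None) -> subprocess.CompletedProcess:
    """Run `cmd` with `extra_env` layered on top of `base_env` (default: os.environ).

    Roughly what `env VAR=value ... cmd` does in the shell.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(extra_env)
    return subprocess.run(list(cmd), env=env, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # The child process prints what it actually sees for the variable.
    probe = [sys.executable, "-c", "import os; print(os.environ.get('AWS_ACCESS_KEY_ID'))"]
    print(run_with_env(probe, {"AWS_ACCESS_KEY_ID": "ROOTNAME"}).stdout.strip())
```

Replacing `probe` with the `git annex initremote ...` argument list would pass the credentials the same way.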

Well, coming back to your “Otherwise …”: I don’t have access to Docker and would be a complete newbie with it, which would add another layer of “troubles”, and that is not good IMHO …

But as far as I understand it, someone must have set up and configured that container and its surroundings. Can you provide the OS and the commands/settings used for that?

On the other hand, it needs to run on bare metal without Docker, right?
What OS are you currently using?

Looking at the issue from another perspective, would you be able to set up a plain MinIO server and try to establish a connection?

Tnx & gooood night for now …

The problem seems to be on the client side, I don’t think Minio is the problem.

To start from scratch, I ran the following on my laptop (from the MinIO docs):

docker run \
   -p 9000:9000 \
   -p 9090:9090 \
   --name minio \
   -v ~/minio/data:/data \
   -e "MINIO_ROOT_USER=ROOTNAME" \
   -e "MINIO_ROOT_PASSWORD=CHANGEME123" \
   quay.io/minio/minio server /data --console-address ":9090"
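Before involving git-annex at all, it can help to confirm that the container answers on the S3 port. MinIO documents a liveness endpoint at `/minio/health/live`; the minimal Python sketch below probes it with the standard library. The host and port defaults mirror the `docker run` above; the helper names are hypothetical.

```python
import urllib.error
import urllib.request


def minio_health_url(host: str = "localhost", port: int = 9000, protocol: str = "http") -> str:
    """Build the URL of MinIO's liveness endpoint on the S3 API port."""
    return f"{protocol}://{host}:{port}/minio/health/live"


def minio_is_live(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200, False on any connection/HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # URLError covers refused connections and HTTP errors; OSError covers timeouts.
        return False


if __name__ == "__main__":
    url = minio_health_url()
    print(url, "->", "live" if minio_is_live(url) else "not reachable")
```

Note that port 9090 from the command above is the web console, not the S3 API, so the probe targets 9000.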

I set up a new virtualenv, because I don’t like conda’s bulkiness and slowness (but I still fetch the latest conda-forge git-annex nodep build manually):

virtualenv ~/.virtualenvs/test_gitannex
cd ~/.virtualenvs/test_gitannex/
wget https://anaconda.org/conda-forge/git-annex/10.20220724/download/linux-64/git-annex-10.20220724-nodep_h1234567_0.tar.bz2
tar -xjf git-annex-10.20220724-nodep_h1234567_0.tar.bz2
rm git-annex-10.20220724-nodep_h1234567_0.tar.bz2
source bin/activate
pip install datalad

and then, in an existing DataLad repo:

git config --add annex.security.allowed-ip-addresses 127.0.0.1
export AWS_ACCESS_KEY_ID=ROOTNAME
export AWS_SECRET_ACCESS_KEY=CHANGEME123
git annex initremote -d testremote type=S3 encryption=none exporttree=no autoenable=true host=localhost port=9000 protocol=http chunk=1GiB bucket=testbucket requeststyle=path

It worked right away.

Let me know if these commands work for you or where it fails.
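When the same remote has to be created repeatedly (e.g. while testing), the `initremote` call above can be assembled from a parameter mapping so that every key=value pair stays in one place. A minimal Python sketch; the helper and constant names are hypothetical, and the parameters are copied from the command above.

```python
import shlex
from typing import List, Mapping

# Parameters from the working command above (local MinIO, plain http).
LOCAL_MINIO_PARAMS = {
    "type": "S3",
    "encryption": "none",
    "exporttree": "no",
    "autoenable": "true",
    "host": "localhost",
    "port": "9000",
    "protocol": "http",
    "chunk": "1GiB",
    "bucket": "testbucket",
    "requeststyle": "path",
}


def initremote_argv(name: str, params: Mapping[str, str], debug: bool = False) -> List[str]:
    """Build the argv for `git annex initremote [-d] NAME key=value ...`."""
    argv = ["git", "annex", "initremote"]
    if debug:
        argv.append("-d")  # debug output, as in the command above
    argv.append(name)
    argv += [f"{key}={value}" for key, value in params.items()]
    return argv


if __name__ == "__main__":
    # Print the shell command; pass the list to subprocess.run(...) to execute it.
    print(shlex.join(initremote_argv("testremote", LOCAL_MINIO_PARAMS, debug=True)))
```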

Hi all

I just wanted to post an update.

We had the opportunity to chat with @quantelchen and take a closer look at the “no S3 credentials configured” issue. In this case, the problem was that some of the commands had been run with sudo and some had not, hence the environment mismatch (in fact, none of the commands need to be run with sudo, and none of them should be). Running everything again from the beginning made the problem go away.
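On many distributions sudo resets the environment by default (the `env_reset` sudoers option), so variables exported in the user's shell may never reach a command run under sudo. A minimal Python sketch of a pre-flight check for this situation; the function name is hypothetical, and `os.geteuid` is POSIX-only.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credential_problems(env: Optional[Mapping[str, str]] = None,
                        euid: Optional[int] = None) -> List[str]:
    """List reasons why git-annex might report missing S3 credentials."""
    env = os.environ if env is None else env
    if euid is None:
        euid = os.geteuid()  # POSIX only
    problems = [f"{var} is not set" for var in REQUIRED_VARS if not env.get(var)]
    if euid == 0:
        # Mixing root and non-root runs is exactly the mismatch described above.
        problems.append("running as root: sudo usually starts with a different environment")
    return problems


if __name__ == "__main__":
    for problem in credential_problems():
        print("WARNING:", problem)
```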

As a side note, we then ran into another problem, this time related to http/https. IIRC, http worked, but since this was a local (testing) MinIO instance, our https connections were rejected due to an untrusted certificate. This has been discussed in this git-annex forum thread (see the comments). In short, while certificate verification for https cannot be bypassed, one can add a self-signed certificate to be trusted locally (or, alternatively, use Let's Encrypt certificates). But since this is just a testing setup for a future “proper” deployment, we weren’t too concerned with https.
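To check whether a self-signed MinIO certificate validates once it is trusted, a minimal Python sketch can build a verifying TLS context that adds the certificate without switching verification off. This only tests the certificate from Python: git-annex is not a Python program, so for git-annex itself the certificate still has to be trusted at the system level. The file path and host in the usage note are hypothetical.

```python
import ssl
from typing import Optional


def make_tls_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Create a verifying TLS context, optionally trusting one extra certificate.

    The system's default CAs stay loaded; `cafile` is added on top, and
    hostname checking plus certificate verification remain enabled.
    """
    context = ssl.create_default_context()
    if cafile is not None:
        context.load_verify_locations(cafile=cafile)
    return context
```

For example, `urllib.request.urlopen("https://minio.local:9000/minio/health/live", context=make_tls_context("/path/to/minio-selfsigned.crt"))` should succeed only if the certificate matches the hostname and is the one loaded.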

Thanks a lot @bpinsard for your input on configuration options, and your troubleshooting efforts, they are very much appreciated!


Created an account just to reply to this old thread. @bpinsard, I’ve been playing around with a nearly identical setup today and your code did the trick:

datalad version: 0.19.2
annexremote: 1.6.0

A remote MinIO instance running via Docker, with Traefik v2 as a reverse proxy. With a little ChatGPT magic, I quickly created an IAM policy (shown below). The policy was able to do the following:

  • Allow read-write access for the user “pranav”, who is also a member of “group/admin”
  • Block anonymous downloads (datalad clone <https public target> worked, datalad get -r . failed, as it should)
  • Allow read-only access for authenticated users in “group/rush” from MinIO
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam:::group/admin"]
      },
      "Resource": [
        "arn:aws:s3:::test-datalad-s3-1/*"
      ]
    },
    {
      "Action": ["s3:GetObject"],
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam:::group/rush"]
      },
      "Resource": [
        "arn:aws:s3:::test-datalad-s3-1/*"
      ]
    }
  ]
}
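To reuse the policy above for other buckets or groups, it can be generated rather than edited by hand. A minimal Python sketch that reproduces the same two statements; the function name is hypothetical, and whether a given MinIO version enforces group principals exactly this way is not verified here.

```python
import json
from typing import Any, Dict, List


def bucket_policy(bucket: str, rw_group: str, ro_group: str) -> Dict[str, Any]:
    """Read-write for one group, read-only for another, on all objects in `bucket`."""
    resource = f"arn:aws:s3:::{bucket}/*"

    def statement(actions: List[str], group: str) -> Dict[str, Any]:
        return {
            "Action": actions,
            "Effect": "Allow",
            "Principal": {"AWS": [f"arn:aws:iam:::group/{group}"]},
            "Resource": [resource],
        }

    return {
        "Version": "2012-10-17",
        "Statement": [
            statement(["s3:GetObject", "s3:PutObject"], rw_group),
            statement(["s3:GetObject"], ro_group),
        ],
    }


if __name__ == "__main__":
    print(json.dumps(bucket_policy("test-datalad-s3-1", "admin", "rush"), indent=2))
```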

Just wanted to say thanks for pointing out that port=443 is required along with protocol=https; it was not working without it.
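Since the setup described here only worked once port=443 was given explicitly together with protocol=https, one way to avoid forgetting it is to always derive and emit the port from the protocol. A minimal Python sketch; the function name and example host are hypothetical, and the returned pairs are meant for `git annex initremote` key=value parameters.

```python
from typing import Dict, Optional

DEFAULT_PORTS = {"http": 80, "https": 443}


def endpoint_params(host: str, protocol: str = "https", port: Optional[int] = None) -> Dict[str, str]:
    """Return host/port/protocol parameters with the port always set explicitly."""
    if protocol not in DEFAULT_PORTS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    if port is None:
        port = DEFAULT_PORTS[protocol]  # e.g. https -> 443, never left implicit
    return {"host": host, "port": str(port), "protocol": protocol}


if __name__ == "__main__":
    params = endpoint_params("s3.example.org")
    print(" ".join(f"{k}={v}" for k, v in params.items()))
```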
