I’m trying to query the MRIQC API (e.g., https://mriqc.nimh.nih.gov/api/v1/T1w) and I’m getting an “HTTP Error 502: Bad Gateway”. I am able to connect to the server itself (https://mriqc.nimh.nih.gov).
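For context, a minimal sketch of the kind of request that fails, using plain urllib (nothing fancier than a bare GET):

```
import urllib.request

# A bare GET against the T1w endpoint; this currently raises
# urllib.error.HTTPError: HTTP Error 502: Bad Gateway.
with urllib.request.urlopen('https://mriqc.nimh.nih.gov/api/v1/T1w') as response:
    print(response.status)
```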
Does anybody know if the API server is down?
Thanks.
Looks like there is an issue with its database connection. Will let you know when it’s back online.
Should be up now, let me know if you see any issues with it.
It works now.
Thanks!
Hi,
I am also encountering HTTP Error 502 (Bad Gateway) and HTTP Error 504 (Gateway Time-out) errors when trying to query the API. I’ve tried several times over the past few weeks, and the errors don’t consistently occur on the same pages. I’m wondering whether these issues could also be due to the server being down or unstable.
Does anyone have recommendations on handling these errors effectively or on improving the connection to the server? Any advice would be greatly appreciated.
I am using the function below, adapted from here.
```
import json
import urllib.error
import urllib.request

import pandas as pd


def get_iqms_despite_errors(modality, versions=None, software='mriqc', page_limit=None):
    """
    Grab all IQMs for the given modality and the list of versions.
    """
    print(f"Running query for {modality}")
    url_root = 'https://mriqc.nimh.nih.gov/api/v1/{modality}?{query}'
    dfs = []
    if versions is None:
        versions = ['*']
    for version in versions:
        page = 1  # Reset pagination for each version
        while True:
            if page_limit is not None and page > page_limit:
                print("Reached specified page limit (page_limit)")
                break
            query = []
            if software is not None:
                query.append('"provenance.software":"%s"' % software)
            if version != '*':
                query.append('"provenance.version":"%s"' % version)
            page_url = url_root.format(
                modality=modality, query='where={%s}&page=%d' % (','.join(query), page)
            )
            print(f"Fetching {page_url}")
            try:
                with urllib.request.urlopen(page_url) as url:
                    data = json.loads(url.read().decode())
                dfs.append(pd.json_normalize(data['_items']))
                # Stop when there is no 'next' link (i.e., the last page)
                if 'next' not in data['_links']:
                    break
                page += 1  # Continue to the next page
            except urllib.error.HTTPError as e:
                print(f"HTTP error on page {page}: {e}")
                # Skip this page and move on; note that without a page_limit,
                # persistent errors will keep the loop running.
                page += 1
            except Exception as e:
                print(f"Other error on page {page}: {e}")
                page += 1  # Skip this page and move on
    if not dfs:
        return pd.DataFrame()  # Nothing was fetched successfully
    # Compose a pandas dataframe
    return pd.concat(dfs, ignore_index=True)
```
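For reference, a hypothetical invocation, with page_limit kept small while debugging and an illustrative version string:

```
df = get_iqms_despite_errors('T1w', versions=['22.0.6'], page_limit=10)
print(df.shape)
```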
The server has been hit pretty hard by a few clients crawling it. I updated the Python server’s worker model to be more asynchronous, to hopefully survive timeouts from certain requests better.
Putting a timeout/sleep between urllib.request.urlopen calls may help reduce the number of errors you’re seeing. I haven’t implemented any server-side throttling so far, so hugging the server to death isn’t too hard.
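On the client side, that could look something like this (a rough sketch; the retry count, delay, and backoff factor are arbitrary illustrative values):

```
import time
import urllib.error
import urllib.request


def fetch_with_retry(page_url, retries=3, delay=1.0):
    """Fetch a URL, retrying on 502/504 with an increasing sleep between tries."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(page_url) as response:
                return response.read().decode()
        except urllib.error.HTTPError as e:
            # Re-raise anything that isn't a gateway error, or the final failure
            if e.code not in (502, 504) or attempt == retries - 1:
                raise
            time.sleep(delay * (2 ** attempt))  # Back off: 1 s, 2 s, 4 s, ...
```

Sleeping for a second or so between successive pages, even on success, would also lighten the load.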
Let me know if you see any reduction in 502s or 504s; I’m curious to see whether my configuration changes made any difference.