How to obtain the DOI of a scientific paper using the exact title, author name and year of publication and using an API?

JohannesWiesner · January 16, 2020, 3:33pm

Dear Neurostars,

I know that this is not a neuro question, and I already feel bad about breaking this rule. I already posted this question on Stack Overflow, but I haven’t got any replies. Since most of you are researchers who have dealt with scientific publications, DOIs, etc., you’re basically my last hope for solving this problem.

I have the titles, author names, and years of publication for around 100 scientific papers. My aim is to get the DOI for each of those papers. My idea was to use the crossref api and the crossrefapi package. Unfortunately, I can’t get that to work.

Here’s an example, where I would like to get the DOI for the paper called ‘Cortical and subcortical gray matter abnormalities in schizophrenia determined through structural magnetic resonance imaging with optimized volumetric voxel-based morphometry’. The author’s name is Ananth and the paper was published in 2002:

import numpy as np
import pandas as pd
from crossref.restful import Works

works = Works()

author = 'Ananth'
paper_title = 'Cortical and subcortical gray matter abnormalities in schizophrenia determined through structural magnetic resonance imaging with optimized volumetric voxel-based morphometry.'

w = works.query(author=author).query(container_title=paper_title).filter(from_online_pub_date='2002',until_online_pub_date='2002')

w_size = w.count()
paper_list = np.empty(shape=(w_size,1),dtype=object)

for idx,item in enumerate(w):
    if 'title' in item:
        paper_list[idx] = item['title']
    else:
        paper_list[idx] = 'NaN'

for title in paper_list:
    print(title)

This gives me the following output:

['Pregnancy outcomes in women treated with elective versus ultrasound-indicated cervical cerclage']
['Revisiting sonographic abdominal circumference measurements: a comparison of outer centiles with established nomograms']
['Cervical length and spontaneous prematurity: laying the foundation for future interventional randomized trials for the short cervix']
['A comparison of sonographic cervical parameters in predicting spontaneous preterm birth in high-risk singleton gestations']
['Algebraic Techniques for Analysis of Large Discrete-Valued Datasets']

All of these papers have one author called Ananth, but the titles are not related at all and some of these papers also are not from 2002.

Does someone know how to fix this?

If I just use the Crossref GUI and type in the title, the first entry in the output is the correct paper. I thought that the Crossref API would work similarly.

I read this post, but I would like an automated version using an API, so that I don’t have to upload a .csv or .xml file each time. I also read this discussion on GitHub, but I am not sure whether it is related to my problem.

Also, I don’t necessarily have to use crossref or the crossrefapi package, so any other suggestions on how to automatically obtain a DOI are also appreciated.
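
One possible explanation for the unrelated hits in the snippet above: in the Crossref REST API, container-title refers to the journal (the container), not the article title, and the online-publication-date filter may exclude a 2002 paper that has no online date recorded. Below is a minimal sketch, not a definitive solution, that calls the REST API directly with `query.bibliographic`, `query.author`, and the `from-pub-date`/`until-pub-date` filters. The function names are hypothetical, and the top hit is only the best-scoring match, so the returned title should still be compared against the expected one.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def build_crossref_url(title, author=None, year=None, rows=3):
    """Build a Crossref /works query URL for a free-text title search."""
    params = {
        "query.bibliographic": title,  # matches titles, not journal names
        "rows": rows,                  # only the best few hits are needed
        "select": "DOI,title",         # keep the response small
    }
    if author:
        params["query.author"] = author
    if year:
        # Any publication date (print or online) within the given year.
        params["filter"] = f"from-pub-date:{year},until-pub-date:{year}"
    return f"{CROSSREF_WORKS}?{urlencode(params)}"


def best_match(response):
    """Return (doi, title) of the first item in a Crossref response, or None."""
    items = response.get("message", {}).get("items", [])
    if not items:
        return None
    top = items[0]
    title = (top.get("title") or [""])[0]  # Crossref stores titles as a list
    return top.get("DOI"), title


def lookup_doi(title, author=None, year=None):
    """Query Crossref over the network and return the best (doi, title) hit."""
    with urlopen(build_crossref_url(title, author, year), timeout=30) as resp:
        return best_match(json.load(resp))
```

Calling `lookup_doi(paper_title, author='Ananth', year=2002)` for each of the ~100 papers, with a short pause between requests, would then give one candidate DOI per paper to check by hand.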

dangom · January 17, 2020, 7:18am

This seems like an XY problem. Would you care to share why you want to collect the DOIs? What would you like to do with them?

If you have the papers in pdf form, that’s essentially a solved problem.

Most pdfs you’ll download from journals will contain the DOI in their metadata, so if you run:

exiftool paper.pdf | grep Digital
Digital Object Identifier       : 10.1038/nature25988

you should get the DOIs.
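
To do this for a whole folder, a rough Python sketch could look like the following. It assumes exiftool is installed and on the PATH, and that the DOI shows up under the "Digital Object Identifier" tag as in the output above. The function names and the DOI regex are illustrative choices, not part of exiftool, and unusual DOIs may not match the pattern.

```python
import re
import subprocess
from pathlib import Path

# Loose DOI pattern (an assumption; some legacy DOIs use other characters).
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_doi(exiftool_output):
    """Return the DOI found in exiftool's text output, or None if absent."""
    for line in exiftool_output.splitlines():
        if line.startswith("Digital Object Identifier"):
            match = DOI_PATTERN.search(line)
            if match:
                return match.group(0)
    return None


def dois_in_folder(folder):
    """Map each PDF file name in `folder` to its DOI (None if no tag found)."""
    results = {}
    for pdf in sorted(Path(folder).glob("*.pdf")):
        proc = subprocess.run(
            ["exiftool", str(pdf)], capture_output=True, text=True
        )
        results[pdf.name] = extract_doi(proc.stdout)
    return results
```

Files that come back with `None` are the ones whose metadata would need to be filled in some other way.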

If it so happens that some of the papers don’t come with their metadata, then adding them to Zotero, allowing it to grab the metadata for you, and exporting your collection as a .bib file should give you all DOIs.

If you don’t have the papers, then:

Either get the papers and go back to the first case (papers in PDF form), or download the Unpaywall database snapshot and query for your titles. Don’t even bother with author names or publication years. Alternatively, you can make a request against their API (less overkill than downloading the whole database): https://unpaywall.org/products/api

JohannesWiesner · January 17, 2020, 2:08pm

@dangom: Yes, I would like to download the papers using their DOIs. I also want to compare the DOIs with other pdf-files (with the DOIs as filenames), to check whether I already have them.

So is it possible to query the Unpaywall API using the title only? There’s only an example of how to make a request using the DOI, but I don’t know how to do this the other way around. How would I do this, for example, with my title (‘Cortical and subcortical gray matter abnormalities in schizophrenia determined through structural magnetic resonance imaging with optimized volumetric voxel-based morphometry’)?