GSoC 2022 Project Idea 19.1: Automatic reviewer matching using Natural Language Processing: Infrastructure for NBDT Journal (350 h)

malin · February 3, 2022, 12:15pm

The peer-review process is one of the most crucial processes in science. It allows scientists who work in a similar research area to assess and review a manuscript before publication. However, as the research community grows, it gets harder to find relevant reviewers. This puts a burden not only on editors, who need to search for relevant reviewers, but also on publications, which may be reviewed more slowly. Moreover, a manual selection process may introduce bias. An automatic selection approach promises to speed up the review process and reduce bias during assignment. Here, we propose an automatic pipeline for paper-reviewer assignment for the Neurons, Behavior, Data analysis, and Theory (NBDT) platinum open access overlay journal, which allows editors to create, search, find, and assign reviewers in the field with the help of NLP and machine learning. We envision that the open-source repository can be used by other journals or conferences in the future.

Planned effort: Approximately 300-350 hours

Intended skill level: Intermediate, Advanced

Project effort: Preferred full-time

Pre-requisite skills: Python, REST, JavaScript

Lead mentor: Titipat Achakulvisut (Mahidol University) – BKK timezone

Co-mentors: Daniele Marinazzo @Daniele_Marinazzo (Ghent University), Konrad Kording @Konrad_Kording (University of Pennsylvania) – EST timezone

Project aims and tasks:

We describe the project structure and the approximate work as follows:

Create and design a reviewer database structure for the NBDT journal. This may include name, email, institution, research interests, published papers, relevant papers, and Semantic Scholar ID. (50 hours)
Create and design a web application with log-in using ReactJS that allows editors to store and add information about the reviewers. This should store active reviewers, the reviews they have done, and the timestamps of those reviews. (100 hours)
Write a Python script to parse reviewers from publications on arXiv, PsyArXiv, bioRxiv, and relevant neuroscience journals in the MEDLINE database. (50 hours)
Create an API using FastAPI for a web application that allows an editor to enter a given abstract and then find relevant reviewers using Natural Language Processing and deep learning techniques. (50 hours)
Optimize the assignment function to maximize relevance and reduce potential biases. (50 hours)
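
As a rough illustration of the first two tasks, a reviewer record could look something like the sketch below. The field names mirror the list above, but the class names (`Reviewer`, `Review`), the `manuscript_id` field, and the `add_review` helper are assumptions made for illustration, not a schema the mentors have specified.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Review:
    """One completed review (hypothetical structure)."""
    manuscript_id: str      # assumed identifier for the reviewed manuscript
    completed_at: datetime  # timestamp of the review


@dataclass
class Reviewer:
    """One reviewer entry, with fields taken from the task list."""
    name: str
    email: str
    institution: str
    research_interests: List[str] = field(default_factory=list)
    published_papers: List[str] = field(default_factory=list)
    relevant_papers: List[str] = field(default_factory=list)
    semantic_scholar_id: Optional[str] = None
    active: bool = True
    reviews: List[Review] = field(default_factory=list)

    def add_review(self, manuscript_id: str,
                   completed_at: Optional[datetime] = None) -> None:
        """Record a review; defaults to the current UTC time."""
        when = completed_at or datetime.now(timezone.utc)
        self.reviews.append(Review(manuscript_id, when))


# Made-up example data, only to show the shape of a record.
reviewer = Reviewer(
    name="Jane Doe",
    email="jane@example.org",
    institution="Example University",
    research_interests=["spike sorting"],
)
reviewer.add_review("ms-001", datetime(2022, 2, 3, tzinfo=timezone.utc))
record = asdict(reviewer)  # plain dict, convenient for a document store
```

A plain dictionary like `record` maps naturally onto a document database, but the actual storage layout is left open here.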

Tech keywords: Python, FastAPI, SQL, REST, ReactJS

vars · February 6, 2022, 9:58am

Hey!

I am an undergraduate sophomore currently majoring in Computational Natural Science.

As I was going through INCF and its work, I was quite intrigued by this project. I like how the idea could be generalized for use elsewhere too. With some tweaking, such an algorithm could be used for many other journals (as mentioned in the project proposal) and perhaps even version control systems like GitHub/GitLab.

As a less experienced researcher and contributor, it’s often very intimidating to add someone as a reviewer for my work. Such a tool would make it much easier!

It would be great if anyone could help me out on how I could get started on this project. If this project is currently not being worked on, please let me know if there’s anything else I could do.

titipata · February 9, 2022, 4:01am

Hi @vars, thanks for your interest. Yes, I know that developing a project requires some tools to work with. Here is the list of tools that we may use alongside the project:

Python libraries: numpy, pandas, lxml (to parse data from the MEDLINE dataset), scikit-learn (for implementing basic natural language processing tasks), Hugging Face (deep learning models for NLP tasks), FastAPI (API library for Python).
Frontend development: These can be a little tricky. We are mostly familiar with NextJS and GatsbyJS, which are based on ReactJS.
Cloud: Firebase, Firebase Storage (for storing user data and the database)
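
To give a feel for the MEDLINE parsing step, here is a minimal sketch that pulls author names and affiliations out of PubMed-style XML. It uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra installs; the same `fromstring`, `iter`, and `findtext` calls are also available in `lxml.etree`. The sample record and the function name `parse_candidate_reviewers` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up record that follows the PubMed/MEDLINE XML element layout.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0000001</PMID>
      <Article>
        <ArticleTitle>A made-up title about spike sorting</ArticleTitle>
        <AuthorList>
          <Author>
            <LastName>Doe</LastName>
            <ForeName>Jane</ForeName>
            <AffiliationInfo>
              <Affiliation>Example University</Affiliation>
            </AffiliationInfo>
          </Author>
          <Author>
            <LastName>Roe</LastName>
            <ForeName>Richard</ForeName>
          </Author>
        </AuthorList>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_candidate_reviewers(xml_text):
    """Return one row per (article, author) pair found in the XML."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext(
            "MedlineCitation/Article/ArticleTitle", default="")
        for author in article.iter("Author"):
            fore = author.findtext("ForeName", default="")
            last = author.findtext("LastName", default="")
            # Affiliation is optional, so fall back to an empty string.
            affiliation = author.findtext(
                "AffiliationInfo/Affiliation", default="")
            rows.append({
                "pmid": pmid,
                "title": title,
                "name": " ".join(part for part in (fore, last) if part),
                "affiliation": affiliation,
            })
    return rows


rows = parse_candidate_reviewers(SAMPLE_XML)
```

Each row could then be loaded into a pandas DataFrame for deduplication, which is left out of this sketch.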

We will point out more specific publications and implementations later on. The project mentors are familiar with all of these tools. We also have some developers who can help guide tool selection.

Let us know if this addresses your concerns!

vars · February 9, 2022, 12:51pm

Heyy, thanks a lot for sharing!

I’m familiar with most of the Python libraries, and frontend technologies mentioned here. I’ll learn the usage of Firebase in the coming days.

I’ll start working on the project proposal. Or, instead, is it preferred that I just start working on a prototype based on the following plan?

Based on the project description, I see the following subtasks to work on:

Create and design a reviewer database structure for the NBDT journal. (Initially a view of the schema for the database, then actually designing the database using Firebase)
Create and design a web application with log-in using ReactJS that allows editors to store and add information about the reviewers
Parse reviewers from publications
Create an API using FastAPI to find relevant reviewers
Optimize the assignment function
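
For the last subtask, one very simple way to think about the assignment function is a greedy pass over relevance scores with a cap on how many papers each reviewer can take. This is only a sketch under stated assumptions: the function name `assign_reviewers`, the `per_paper` and `max_load` parameters, and the scores are hypothetical. Greedy selection is not guaranteed to be optimal, and a load cap only limits how often the same few reviewers are picked; it does not by itself address other sources of bias.

```python
def assign_reviewers(scores, per_paper=2, max_load=1):
    """Greedily assign reviewers to papers.

    scores: {paper_id: {reviewer_id: relevance}} (higher is better).
    per_paper: reviewers wanted per paper.
    max_load: maximum number of papers per reviewer.
    """
    # Flatten to (relevance, paper, reviewer) and visit best pairs first.
    # Ties are broken by paper and reviewer id to keep results stable.
    pairs = sorted(
        ((s, p, r) for p, row in scores.items() for r, s in row.items()),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    load = {}
    assignment = {p: [] for p in scores}
    for _, paper, reviewer in pairs:
        if len(assignment[paper]) >= per_paper:
            continue  # this paper already has enough reviewers
        if load.get(reviewer, 0) >= max_load:
            continue  # this reviewer is already at capacity
        assignment[paper].append(reviewer)
        load[reviewer] = load.get(reviewer, 0) + 1
    return assignment


# Made-up relevance scores for two papers and three reviewers.
example_scores = {
    "p1": {"A": 0.9, "B": 0.5, "C": 0.1},
    "p2": {"A": 0.8, "B": 0.2, "C": 0.3},
}
result = assign_reviewers(example_scores, per_paper=1, max_load=1)
```

In this toy case reviewer A is the best match for both papers, so the cap forces the second paper onto its next-best available reviewer.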

Is following this order fine?

titipata · February 16, 2022, 6:26pm

Hi @vars, that sounds great! Exploring the tools would be a perfect start for the project. The steps you mentioned are what we planned to implement and explore.

byzhang · March 6, 2022, 7:56pm

Hi @titipata @Konrad_Kording, nice to meet you virtually! I’m an incoming masters student in CS at the University of Pennsylvania. I received my B.S. in neuroscience (computational) from USC. As someone who has previous experience with neuroscience research, software engineering, and full-stack web development, I think this project is a perfect match for me. The tech stacks required are right up my alley.
To prepare for this project, I will familiarize myself with the tools and try to draft a proposal soon. Look forward to working with you.

titipata · March 7, 2022, 3:56am

@byzhang that’s great to hear. I think this is a perfect match for you. Definitely, feel free to reach out and we can discuss further too!

byzhang · March 7, 2022, 11:13pm

Hi @titipata, thanks for your response. As I’m working on a draft proposal, what’s the best way to get some feedback?

titipata · March 8, 2022, 5:22am

@byzhang Feel free to send it to me by email, cc’ing Konrad and Daniele! I won’t type out our email addresses here, but you can find them easily online. We can also arrange a short meeting before you proceed with the project so that it best aligns with your expertise. Also, we might have more people working on it from multiple locations, which would be great.

Aadi207 · March 17, 2022, 7:13am

Hi @titipata
I am proficient in programming languages, including Python and Java. I have done basic machine learning projects in Python and intermediate projects using Java. I want to start working on this project. Could you please guide me on where to start? More specifically, is there a current issue or piece of work that could be assigned to me?
I am looking forward to working with you.

titipata · March 17, 2022, 7:31am

Hi @Aadi207, thanks for reaching out! We haven’t divided the work among everyone yet. However, there are roughly 2-3 workflows we plan to pursue: 1) gathering reviewers’ data from open datasets (Python), 2) building a recommendation engine for suggesting reviewers (Natural Language Processing and the Hugging Face library), and 3) creating a database and frontend for NBDT editors to find reviewers (ReactJS). I’ll plan things out and try to reach out to everyone so that we can work efficiently over the next few months and in the summer. Hope this works for you!
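
To make workflow 2 a bit more concrete, here is a minimal baseline sketch that ranks reviewers by TF-IDF cosine similarity between a submitted abstract and text describing each reviewer's work. It is a plain bag-of-words stand-in written with the standard library, not the Hugging Face model the project may end up using; the function names and the example texts are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_reviewers(abstract, reviewer_texts):
    """Rank reviewers (name -> text about their work) for an abstract."""
    names = list(reviewer_texts)
    vectors = tfidf_vectors([abstract] + [reviewer_texts[n] for n in names])
    query = vectors[0]
    scored = [(n, cosine(query, v)) for n, v in zip(names, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up reviewer descriptions and abstract.
reviewers = {
    "Reviewer A": "spike sorting neural recordings electrophysiology",
    "Reviewer B": "reinforcement learning dopamine reward",
}
ranking = rank_reviewers(
    "a new method for spike sorting of neural recordings", reviewers)
```

Swapping `tfidf_vectors` for sentence embeddings from a pretrained model would keep the same ranking logic while capturing similarity beyond exact word overlap.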

Aadi207 · March 17, 2022, 7:51pm

Yes, that’s perfect. To be of more help, I am gonna try to build a recommendation system using NLP that could be used in the project. Does that work?

titipata · March 29, 2022, 2:54pm

@Aadi207 Let’s discuss further details via email and a call to see the fit first, and then you can start implementing after the selection process. You do not have to start working on it now!