GSoC 2022 Project Idea 19.2: Developing a LaTeX to XML pipeline and exploring a standalone platform for NBDT Journal (175 h)

We launched a platinum open access overlay neuroscience journal (Neurons, Behavior, Data Analysis, and Theory – NBDT). With 30 articles accepted, we are now ready for inclusion in PubMed Central (PMC). This requires providing the full text of articles in an XML format that conforms to an acceptable journal article DTD (Document Type Definition). The following pages provide detailed information about PMC’s technical requirements:

How to Include a Journal in PMC.

This section details all of our data sample requirements.

See the PMC file specifications at: http://www.ncbi.nlm.nih.gov/pmc/pub/filespec/.

This page details the XML and image file requirements, and describes how these files must be named and packaged for delivery.

The minimum criteria for evaluation of sample data can be found at: http://www.ncbi.nlm.nih.gov/pmc/pub/min_requirements/.
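To make the target format a bit more concrete, here is a minimal sketch of an article skeleton built with Python's standard library. The element names (`article`, `front`, `article-meta`, `title-group`, `article-title`, `body`, `p`) come from the JATS tag set, which is assumed here to be the DTD family we would target; the helper names `build_article_skeleton` and `is_well_formed` are hypothetical. Note that the check below only tests well-formedness, not validity against a DTD, and a real PMC submission needs far more metadata than this.

```python
import xml.etree.ElementTree as ET


def build_article_skeleton(title, paragraphs):
    """Build a bare-bones JATS-style <article> element (illustrative only)."""
    article = ET.Element("article")
    front = ET.SubElement(article, "front")
    meta = ET.SubElement(front, "article-meta")
    title_group = ET.SubElement(meta, "title-group")
    ET.SubElement(title_group, "article-title").text = title
    body = ET.SubElement(article, "body")
    for text in paragraphs:
        ET.SubElement(body, "p").text = text
    return article


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML. This is NOT DTD validation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


skeleton = build_article_skeleton("A sample title", ["First paragraph."])
xml_text = ET.tostring(skeleton, encoding="unicode")
```

Validating against an actual DTD would need a validating parser; the third-party `lxml` package offers one through `lxml.etree.DTD`, which might be a reasonable next step.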

At the moment, papers are submitted to NBDT in LaTeX format.

NBDT is hosted by Scholastica, and they already have something in place for archiving in XML: Why archiving is essential for open access journals and how to get started. We can certainly start from there, but it would be good to have an independent pipeline in case we decide to change platforms.

Planned effort: Approximately 175 hours

Intended skill level: Intermediate, Advanced

Project effort: Half-time

Pre-requisite skills: LaTeX, XML, Python, HTML

Lead mentor: Daniele Marinazzo @Daniele_Marinazzo (Ghent University) – CET timezone

Co-mentor: Konrad Kording @Konrad_Kording (University of Pennsylvania) – EST timezone

Project aims and tasks:

The project structure and approximate workload are as follows:

  • Create a pipeline to convert LaTeX papers to XML; a good starting point is Tools for Converting LaTeX to XML (45 hours)
  • Adapt the final format to PMC requirements (see above) (30 hours)
  • Integrate with the current Scholastica platform (30 hours)
  • Explore and prototype a standalone platform (70 hours)
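As a rough illustration of the first task, the conversion step could be a thin Python wrapper around an existing converter. The sketch below assumes pandoc is installed and uses its `jats` output format; pandoc is only one candidate tool, the function names `build_convert_command` and `convert` are hypothetical, and whether the resulting XML meets PMC requirements has not been checked here.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(tex_path, xml_path):
    """Return the pandoc command line for a LaTeX -> JATS XML conversion."""
    return [
        "pandoc",
        "--from", "latex",
        "--to", "jats",
        "--standalone",
        "--output", str(xml_path),
        str(tex_path),
    ]


def convert(tex_path, out_dir):
    """Convert one LaTeX file and return the path of the XML file written."""
    tex_path = Path(tex_path)
    xml_path = Path(out_dir) / tex_path.with_suffix(".xml").name
    if shutil.which("pandoc") is None:
        # Assumption: pandoc is the converter; fail early if it is missing.
        raise RuntimeError("pandoc not found on PATH")
    subprocess.run(build_convert_command(tex_path, xml_path), check=True)
    return xml_path
```

Keeping the command construction separate from the call makes it easy to swap in another converter later, or to test the pipeline without the tool installed.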

Tech keywords: LaTeX, XML, Python, HTML, DTD (Document Type Definition)

Hi @Daniele_Marinazzo @Konrad_Kording,
I’m Naresh Kumar, a junior pursuing my B.Tech (CSE) at Amrita School of Engineering. I’m interested in working on this project, and I feel that my skill set matches it. I’ve submitted my project proposal on the GSoC portal. I’m attaching it here for your reference.