Grobid
GROBID is a machine learning library for extracting, parsing, and re-structuring raw documents.
It is designed for parsing academic papers, where it works particularly well.
Note: if the articles supplied to Grobid are large documents (e.g. dissertations) exceeding a certain number of elements, they might not be processed.
This page covers how to use Grobid to parse articles for LangChain.
Installation
The Grobid installation is described in detail at https://grobid.readthedocs.io/en/latest/Install-Grobid/. However, it is probably easier to run Grobid in a Docker container, as described in the Grobid Docker documentation.
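Once a Grobid server is running, whether installed locally or in a container, it can help to confirm that it is reachable before handing it any documents. The following is a minimal sketch, not part of Grobid or LangChain: the helper names `isalive_url` and `grobid_is_alive` are hypothetical, and the base URL `http://localhost:8070` and the `/api/isalive` health endpoint are assumptions based on Grobid's default service configuration, so adjust them if your deployment differs.

```python
import urllib.error
import urllib.request


def isalive_url(base_url: str) -> str:
    """Build the health-check URL for a Grobid server rooted at base_url.

    The "/api/isalive" path is an assumption about the default Grobid service.
    """
    return base_url.rstrip("/") + "/api/isalive"


def grobid_is_alive(base_url: str = "http://localhost:8070", timeout: float = 5.0) -> bool:
    """Return True if the server answers the health check with HTTP 200.

    Any connection failure, timeout, HTTP error, or malformed URL yields False.
    """
    try:
        with urllib.request.urlopen(isalive_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        return False


if __name__ == "__main__":
    # Assumed default address; change it to match your own setup.
    print("Grobid reachable:", grobid_is_alive())
```

If the check reports that the server is not reachable, the usual suspects are a container that has not finished starting or a port mapping that differs from the one assumed here.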