(On Hold) HT-MAX: High-throughput Materials Discovery for Extremes

2023 - Present Active Project

Project Overview

Description

High-Throughput Materials Discovery for Extreme Conditions (HTMDEC) is a multi-institution center led by Johns Hopkins University, with support from the Army Research Laboratory (ARL). The CMU project (Rollett & Strubell) focuses on scraping data from the literature.

Summary of CMU Contribution:

CMU will build upon its prior work to develop Large Language Models (LLMs) for scraping data from the literature that is relevant to the aims of the HT-MAX project, e.g., the discovery of novel materials in the area of ceramic composites. This will include identifying relevant papers in the literature and extracting numerical data from them. We will use appropriate software tools to extract data from tables in PDFs, together with the associated metadata: the PDF title, table title, DOI, and date of publication. Further, we will expand the existing pretrained NER (Named Entity Recognition) models for table titles and extend user queries to table contents. We will save the data in a suitable database, and we will expand the existing graphical interface through which users search the database by keyword and save the retrieved data. We will continue to leverage state-of-the-art language models, e.g., MatBERT, to develop a multi-label classifier that assigns abstracts relevant to the overall project to one or more specific domains among the various disciplines involved.
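The metadata fields named in the description (PDF title, table title, DOI, date of publication) map naturally onto one relational record per extracted table. The sketch below is a hypothetical illustration only, assuming SQLite as the "suitable database" and plain case-insensitive substring matching as the keyword search; the project does not specify a database, schema, or search method, and the names `ExtractedTable`, `create_store`, `save_table` and `keyword_search` are invented for this example.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExtractedTable:
    # Metadata fields listed in the project description.
    pdf_title: str
    table_title: str
    doi: str
    publication_date: str  # assumed ISO 8601 string, e.g. "2022-01-01"
    # Hypothetical: extracted table cells as rows of strings.
    rows: List[List[str]] = field(default_factory=list)


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite store with one row per extracted table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tables (
               id INTEGER PRIMARY KEY,
               pdf_title TEXT,
               table_title TEXT,
               doi TEXT,
               publication_date TEXT,
               body TEXT)"""
    )
    return conn


def save_table(conn: sqlite3.Connection, t: ExtractedTable) -> None:
    """Store one table; cells are flattened to text so contents are searchable."""
    body = "\n".join("\t".join(row) for row in t.rows)
    conn.execute(
        "INSERT INTO tables (pdf_title, table_title, doi, publication_date, body) "
        "VALUES (?, ?, ?, ?, ?)",
        (t.pdf_title, t.table_title, t.doi, t.publication_date, body),
    )
    conn.commit()


def keyword_search(conn: sqlite3.Connection, keyword: str) -> list:
    """Return (doi, table_title) pairs whose title or contents contain keyword.

    SQLite's LIKE is case-insensitive for ASCII characters by default.
    """
    pattern = f"%{keyword}%"
    cur = conn.execute(
        "SELECT doi, table_title FROM tables "
        "WHERE table_title LIKE ? OR body LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = create_store()
    # Placeholder record; the DOI and cell values are dummies, not real data.
    save_table(conn, ExtractedTable(
        pdf_title="Example paper",
        table_title="Hardness of sample A",
        doi="10.0000/placeholder",
        publication_date="2022-01-01",
        rows=[["Sample", "Property"], ["A", "x"]],
    ))
    print(keyword_search(conn, "hardness"))
```

Flattening cells into one text column keeps the sketch short; a production store would more likely keep individual cells or parsed numerical values with units, so that queries could filter on numbers rather than matching strings.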