Ruhr-Universität Bochum
MatNexus: A tool for systematic text extraction and analysis in materials science
- Date: 19.09.2024
- Place: GC-MAC Summer School 2023, KIT Karlsruhe, Germany
Abstract
Materials science is a data-rich field whose key source of data is the scientific literature. Yet the ever-increasing volume of publications makes it difficult for researchers to extract useful insights and make informed decisions. To address this, we have developed 'MatNexus', a systematic method for extracting and analyzing textual data. The method begins with the automated collection of relevant scientific papers, continues with careful preprocessing of the raw text, and ends with the generation of word embeddings using the word2vec model. This stepwise approach yields a clean corpus in which words, or "entities", are transformed into embeddings that are amenable to mathematical operations. It thereby enables a detailed comparison of materials based on their properties and characteristics as described in the literature. Further, computed similarity measures can help cluster materials into categories, improving data comprehension and analysis. 'MatNexus' thus serves as a tool for creating a corpus for text mining, predictive analysis, and overviews of the latest research. We demonstrate the utility of our method with a specific application in electrocatalysis, where 'MatNexus' helps researchers discover novel materials and identify robust electrocatalyst candidates by leveraging existing published results. 'MatNexus' therefore represents a significant step toward navigating the vast body of textual data in materials science.
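The preprocessing and similarity steps described in the abstract can be illustrated with a minimal sketch. It is not the MatNexus implementation: the toy corpus, the stopword list, and all function names are hypothetical, and a simple co-occurrence count stands in for a trained word2vec model so that the example runs with the Python standard library alone.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real corpus would come from collected papers.
CORPUS = [
    "Platinum is an active catalyst for oxygen reduction.",
    "Palladium is an active catalyst for oxygen reduction.",
    "Silicon is a common semiconductor for solar cells.",
]

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"is", "an", "a", "for", "the", "of"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def cooccurrence_vectors(sentences, window=2):
    """Count-based stand-in for word2vec: one context-count vector per word."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


sentences = [preprocess(s) for s in CORPUS]
vectors = cooccurrence_vectors(sentences)

# Materials that appear in similar contexts receive similar vectors.
sim_pt_pd = cosine(vectors["platinum"], vectors["palladium"])
sim_pt_si = cosine(vectors["platinum"], vectors["silicon"])
print(f"platinum vs palladium: {sim_pt_pd:.2f}")
print(f"platinum vs silicon:   {sim_pt_si:.2f}")
```

In this toy corpus, platinum and palladium share the same context words and therefore score higher than platinum and silicon. With a trained word2vec model, the counting step would be replaced by learned dense embeddings, while the cosine comparison and any subsequent clustering would work the same way.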