TY - JOUR
T1 - pathfinder
T2 - A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy
AU - Iyer, Kartheik G.
AU - Yunus, Mikaeel
AU - O’Neill, Charles
AU - Ye, Christine
AU - Hyk, Alina
AU - McCormick, Kiera
AU - Ciucă, Ioana
AU - Wu, John F.
AU - Accomazzi, Alberto
AU - Astarita, Simone
AU - Chakrabarty, Rishabh
AU - Cranney, Jesse
AU - Field, Anjalie
AU - Ghosal, Tirthankar
AU - Ginolfi, Michele
AU - Huertas-Company, Marc
AU - Jabłońska, Maja
AU - Kruk, Sandor
AU - Liu, Huiling
AU - Marchidan, Gabriel
AU - Mistry, Rohit
AU - Naiman, J. P.
AU - Peek, J. E. G.
AU - Polimera, Mugdha
AU - Rodríguez Méndez, Sergio J.
AU - Schawinski, Kevin
AU - Sharma, Sanjib
AU - Smith, Michael J.
AU - Ting, Yuan Sen
AU - Walmsley, Mike
N1 - The authors are extremely grateful to all the beta testers who provided feedback on pathfinder while it was being developed. Part of this work was done at the 2024 Jelinek Memorial Summer Workshop on Speech and Language Technologies and was supported with discretionary funds from Johns Hopkins University and from the EU Horizon 2020 program's Marie Skłodowska-Curie grant No. 101007666 (ESPERANTO). Advanced Research Computing at Hopkins provided cloud computing to support the research. K.I. would like to thank the organizers of the Galevo23 workshop and KITP for providing an ideal environment in which to meet I.C., Y.S.T., and J.P. and get this project started. K.I. is also grateful to Michael Kurtz for reminding him that the embedding space is a Hausdorff space, not a pure vector space. Support for K.I. was provided by NASA through the NASA Hubble Fellowship grant HST-HF2-51508 awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS5-26555. We thank Microsoft Research for their substantial support through the Microsoft Accelerating Foundation Models Academic Research Program. We are deeply grateful to Dr. Kenji Takeda from MSFR for his constant support for UniverseTBD projects. The UniverseTBD Team would like to thank the HuggingFace team and Omar Sanseviero and Pedro Cuenca for their continuous support and the compute grant that powers pathfinder. We are also grateful for the support from OpenAI through the OpenAI Researcher Access Program.
PY - 2024/12/1
Y1 - 2024/12/1
AB - The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural language instead of syntactic searches with keywords. Utilizing state-of-the-art large language models (LLMs) and a corpus of 385,166 peer-reviewed papers from the Astrophysics Data System, pathfinder offers an innovative approach to scientific inquiry and literature exploration. Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context as a complement to currently existing methods that use keywords or citation graphs. It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes. We demonstrate the tool’s versatility through case studies, showcasing its application in various research scenarios. The system’s performance is evaluated using custom benchmarks, including single-paper and multipaper tasks. Beyond literature review, pathfinder offers unique capabilities for reformatting answers in ways that are accessible to various audiences (e.g., in a different language or as simplified text), visualizing research landscapes, and tracking the impact of observatories and methodologies. This tool represents a significant advancement in applying artificial intelligence to astronomical research, aiding researchers at all career stages in navigating modern astronomy literature.
UR - https://www.scopus.com/pages/publications/85210920034
U2 - 10.3847/1538-4365/ad7c43
DO - 10.3847/1538-4365/ad7c43
M3 - Article
AN - SCOPUS:85210920034
SN - 0067-0049
VL - 275
JO - Astrophysical Journal, Supplement Series
JF - Astrophysical Journal, Supplement Series
IS - 2
M1 - 38
ER -