ULEO: Unified Language of Experiment Operations for Representation of Synthesis Protocols
Fu Yun1,2, Zhu Liya1, Li Dan1, Sun Mengge1,2, Zhang Jianfeng3, Liu Xiwen1,2
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
Abstract [Objective] This study addresses the unified representation of experimental operation verbs in synthesis protocols, in order to provide high-quality experimental protocol data for science intelligence and robotics. [Methods] We used a collaborative approach, driven by both data and expert knowledge, to identify and standardize experimental operation verbs in synthesis-related literature and patent texts. First, we used open-source large language models such as ChatGLM2-6B to identify experimental operation verbs. Then, we combined Wu-Palmer similarity and cosine similarity to standardize these verbs. Finally, we assessed their classification accuracy with expert knowledge. [Results] The study identified 149 operation verbs for inorganic synthesis experiments and 141 operation verbs for organic synthesis experiments. Expert judgment revealed that many of the 124 operation terms appearing in both groups do not possess distinct category characteristics. We therefore merged the two categories, yielding 166 experimental operation verbs (149 + 141 - 124) that represent the operations in organic, inorganic, and hybrid synthesis experiments. [Limitations] The study employed only basic prompt engineering techniques to direct the large language model to recognize experimental operation verbs in publicly accessible datasets. It focused on operation terms involved in synthesis, engineering, and basic steps, and did not consider operation terms in dynamic, analytical, and name reactions. [Conclusions] This study establishes a unified language for representing experimental operations in synthesis, applicable to organic, inorganic, and hybrid synthesis reactions. It could inform the future development of robotic scientific experiments.
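The abstract does not say how the Wu-Palmer and cosine scores are combined, so the following is only a minimal sketch under stated assumptions: a hypothetical toy verb taxonomy stands in for a lexical resource, hypothetical three-dimensional vectors stand in for word embeddings, and the two scores are blended with an assumed weight `alpha`. None of these names or values come from the study.

```python
import math

# Hypothetical toy taxonomy (child -> parent). A real pipeline would rely on a
# lexical resource such as WordNet; these entries are illustrative only.
PARENT = {
    "operate": None,
    "combine": "operate",
    "separate": "operate",
    "mix": "combine",
    "stir": "mix",
    "blend": "mix",
    "filter": "separate",
}

# Hypothetical embedding vectors; real ones would come from a trained model.
VEC = {
    "stir": (0.9, 0.1, 0.3),
    "blend": (0.8, 0.2, 0.4),
    "filter": (0.1, 0.9, 0.2),
}


def path_to_root(node):
    """Return the chain [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(node))


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_b = set(path_to_root(b))
    # The first ancestor of a that is also an ancestor of b is the deepest one.
    lcs = next(n for n in path_to_root(a) if n in ancestors_b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def combined(a, b, alpha=0.5):
    """Assumed combination rule: weighted average of the two scores."""
    return alpha * wu_palmer(a, b) + (1 - alpha) * cosine(VEC[a], VEC[b])


if __name__ == "__main__":
    for pair in [("stir", "blend"), ("stir", "filter")]:
        print(pair, round(combined(*pair), 3))
```

In this sketch, verbs whose combined score exceeds some chosen threshold would be treated as candidates for merging under one standardized term, with the final decision left to expert review, as the abstract describes.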
Received: 04 September 2023
Published: 06 February 2024
Fund: National Natural Science Foundation of China (72234005)
Corresponding Author: Liu Xiwen, ORCID: 0000-0003-0820-3622, E-mail: liuxw@mail.las.ac.cn