Data Analysis and Knowledge Discovery  2024, Vol. 8 Issue (1): 30-39    DOI: 10.11925/infotech.2096-3467.2023.0867
ULEO: Unified Language of Experiment Operations for Representation of Synthesis Protocols
Fu Yun 1,2; Zhu Liya 1; Li Dan 1; Sun Mengge 1,2; Zhang Jianfeng 3; Liu Xiwen 1,2
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
3 Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] This study addresses the unified representation of experimental operation verbs in synthesis protocols, with the aim of providing high-quality protocol data for science intelligence and robotics. [Methods] We combined data-driven methods with expert knowledge to identify and standardize experimental operation verbs in synthesis-related literature and patent texts. First, we used open-source large language models such as ChatGLM2-6B to identify the verbs. Then, we standardized them using a combination of Wu-Palmer similarity and cosine similarity. Finally, experts assessed the accuracy of their classification. [Results] The study identified 149 operation verbs for inorganic synthesis experiments and 141 for organic synthesis experiments. Of these, 124 appear in both groups, and expert judgment indicated that many of them have no distinct category characteristics. We therefore merged the two groups into 166 experimental operation verbs that represent the operations in organic, inorganic, and hybrid synthesis experiments. [Limitations] The study used only basic prompt engineering to direct the large model to recognize experimental operation verbs in publicly accessible datasets. It covered the operation terms of synthesis, engineering, and basic steps, and did not consider those of dynamic, analysis, and name reaction steps. [Conclusions] This study establishes a unified language for representing experimental operations in synthesis, applicable to organic, inorganic, and hybrid synthesis reactions. It could inform the future development of robotic scientific experiments.

Keywords: Unified Language of Experiment Operations; AI for Science; Synthesis Experimental Protocols; Experiment Operations; Science Robotics
Received: 04 September 2023      Published: 06 February 2024
ZTFLH: G35; N19
Fund: National Natural Science Foundation of China (72234005)
Corresponding Author: Liu Xiwen, ORCID: 0000-0003-0820-3622, E-mail: liuxw@mail.las.ac.cn

Cite this article:

Fu Yun, Zhu Liya, Li Dan, Sun Mengge, Zhang Jianfeng, Liu Xiwen. ULEO: Unified Language of Experiment Operations for Representation of Synthesis Protocols. Data Analysis and Knowledge Discovery, 2024, 8(1): 30-39.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2023.0867     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2024/V8/I1/30

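| Overlapping papers between datasets | Solution-based synthesis | Solid-state synthesis | Sol-gel synthesis | Gold nanoparticle synthesis |
|---|---|---|---|---|
| Solution-based synthesis | 29,881 | - | - | - |
| Solid-state synthesis | 167 | 15,144 | - | - |
| Sol-gel synthesis | 462 | 719 | 7,579 | - |
| Gold nanoparticle synthesis | 135 | 0 | 0 | 5,154 |
| Papers after deduplication | 56,302 | | | |

Number of Papers After Duplicate Elimination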
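The table above reads as a lower-triangular overlap matrix: the diagonal holds each dataset's size and the off-diagonal cells hold the papers shared by two datasets. A minimal sketch of how such counts could be computed from sets of paper identifiers follows. The identifiers are toy values and the function names are hypothetical; this is not the authors' code.

```python
from itertools import combinations

# Toy paper identifiers standing in for the four datasets (hypothetical values).
datasets = {
    "solution-based": {"d1", "d2", "d3", "d4"},
    "solid-state": {"d3", "d5"},
    "sol-gel": {"d4", "d5", "d6"},
    "gold nanoparticle": {"d1", "d7"},
}


def overlap_matrix(sets_by_name):
    """Return {(a, b): count}: dataset size on the diagonal, shared papers off it."""
    names = list(sets_by_name)
    matrix = {(name, name): len(sets_by_name[name]) for name in names}
    for a, b in combinations(names, 2):
        matrix[(a, b)] = len(sets_by_name[a] & sets_by_name[b])
    return matrix


def deduplicated_count(sets_by_name):
    """Number of distinct papers across all datasets (size of the union)."""
    return len(set().union(*sets_by_name.values()))


matrix = overlap_matrix(datasets)
total = deduplicated_count(datasets)
```

Taking the size of the union, rather than subtracting pairwise overlaps from the sum of dataset sizes, stays correct when a paper appears in more than two datasets.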
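| Dataset | Field marking the start of a record | Experimental protocol field | Field for the manually assigned operation class | Field for the operation word used in the original text |
|---|---|---|---|---|
| Solution-based synthesis | doi | operations | type | string |
| Solid-state synthesis | doi | operations | type | string |
| Sol-gel synthesis | targets_string | operations | type | token |
| Gold nanoparticle synthesis | targets_string | synth_actions | type | string |
| | | procedure_graph | op_type | op_string |

The Representation of Experimental Operation Fields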
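Because the datasets name their operation fields differently, reading them uniformly needs a per-dataset field mapping. The sketch below takes the field names from the table above. The record layout (each protocol field holding a list of dictionaries), the sample record, its class labels, and the function name are assumptions made for illustration.

```python
# Field names are taken from the table above; everything else is hypothetical.
FIELD_MAP = {
    "solution-based": {"protocol": "operations", "op_class": "type", "op_word": "string"},
    "solid-state": {"protocol": "operations", "op_class": "type", "op_word": "string"},
    "sol-gel": {"protocol": "operations", "op_class": "type", "op_word": "token"},
    "gold nanoparticle": {"protocol": "synth_actions", "op_class": "type", "op_word": "string"},
}


def extract_operations(record, dataset):
    """Return (operation class, operation word) pairs from one record.

    Assumes the protocol field holds a list of dicts, one per operation.
    """
    fields = FIELD_MAP[dataset]
    return [
        (op.get(fields["op_class"]), op.get(fields["op_word"]))
        for op in record.get(fields["protocol"], [])
    ]


# Hypothetical sol-gel record; the class labels are invented for the example.
record = {
    "targets_string": ["TiO2"],
    "operations": [
        {"type": "MixingOperation", "token": "stirred"},
        {"type": "HeatingOperation", "token": "calcined"},
    ],
}
pairs = extract_operations(record, "sol-gel")
```

The `procedure_graph` layout (`op_type`, `op_string`) could be added as one more mapping entry in the same way.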
Examples of Experimental Operation Description
Framework of Constructing ULEO
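| Step type | Step description | Operation verb | Verb description | Source |
|---|---|---|---|---|
| Dynamic steps* | Steps executed based on feedback | HeatUntilComplete | Heat until the specified temperature is reached | Basic, engineering, synthesis, and analysis steps |
| Analysis steps* | Non-synthesis steps that perform analysis | RunNMR | Use nuclear magnetic resonance spectroscopy | Basic and engineering steps |
| Name reaction steps* | Common name reactions | SuzukiCoupling | Suzuki coupling reaction | Basic, engineering, synthesis, and name reaction steps |
| Synthesis steps | Common synthesis procedures | Evaporate | Evaporate the contents of the rotary evaporator at a given temperature and pressure for a given time | Basic, engineering, and synthesis steps |
| Engineering steps | Common low-level handling | HeatChillToTemp | Heat or chill the vessel to the specified temperature and leave the heater or chiller on | Basic and engineering steps |
| Basic steps | Device-specific, directly executable steps produced during compilation | CChillerSetTemp | Set the temperature of the condenser | - |

The Experimental Steps and Examples of Operation Verbs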
Example of Using ChemicalTagger and ChatGLM2-6B
Example of Using ChatGLM2-6B
The Standardization of Experimental Operation Verbs
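The abstract states that verbs are standardized by combining Wu-Palmer similarity with cosine similarity. The sketch below shows both measures on toy data. The small hand-made taxonomy, the example vectors, and the weighted-average combination rule are assumptions; the abstract does not say how the two scores are combined, and in practice a lexical hierarchy such as WordNet and learned word embeddings would usually replace the toy inputs.

```python
import math

# Hypothetical mini-taxonomy of operation verbs: child -> parent (root has None).
PARENT = {
    "operate": None,
    "change_temperature": "operate",
    "separate": "operate",
    "heat": "change_temperature",
    "cool": "change_temperature",
    "calcine": "heat",
    "filter": "separate",
}


def path_to_root(node):
    """List of nodes from `node` up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)), root depth 1."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    # Walking up from a, the first node also above b is the deepest common ancestor.
    lcs = next(node for node in path_a if node in ancestors_b)
    return 2 * len(path_to_root(lcs)) / (len(path_a) + len(path_b))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def combined_similarity(a, b, vectors, weight=0.5):
    """Hypothetical combination: weighted average of the two scores."""
    return weight * wu_palmer(a, b) + (1 - weight) * cosine(vectors[a], vectors[b])


# Toy embedding vectors (invented values).
vectors = {"heat": [1.0, 0.0], "calcine": [1.0, 0.0], "filter": [0.0, 1.0]}
score_close = combined_similarity("heat", "calcine", vectors)
score_far = combined_similarity("heat", "filter", vectors)
```

Under these toy inputs, `heat` and `calcine` score higher than `heat` and `filter` on both measures, so a threshold on the combined score would group the first pair under one standard verb and keep the second pair apart.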
| Operation verbs found only in inorganic synthesis experiments | Operation verbs found only in organic synthesis experiments |
|---|---|
| press, peptize, flocculate, fire, polish, sterilize, sonicate, decarbonate, sulfurize, ground, passivate, drain, dialyze, nitride, siphon, autoclave, oxygenate, graphitize, etch, densify, nebulize, carbonize, gelatinize, plasticize, hydrate | start, alkylate, check, ionize, compare, deionize, catalyze, liquefy, weigh, saponify, thicken, graft, characterize, demineralize, acetylate, halogenate, accumulate |

Experimental Operation Verbs Appearing Only in Inorganic or Only in Organic Synthesis
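The verb counts reported in the abstract can be cross-checked against the table above by inclusion-exclusion: 149 inorganic verbs plus 141 organic verbs, minus the 124 shared ones, gives the 166 merged verbs, and the two columns should hold 149 - 124 = 25 and 141 - 124 = 17 verbs. A short check, with the verb lists copied from the table and variable names chosen for illustration:

```python
# Verbs copied from the table above.
inorganic_only = {
    "press", "peptize", "flocculate", "fire", "polish", "sterilize",
    "sonicate", "decarbonate", "sulfurize", "ground", "passivate", "drain",
    "dialyze", "nitride", "siphon", "autoclave", "oxygenate", "graphitize",
    "etch", "densify", "nebulize", "carbonize", "gelatinize", "plasticize",
    "hydrate",
}
organic_only = {
    "start", "alkylate", "check", "ionize", "compare", "deionize",
    "catalyze", "liquefy", "weigh", "saponify", "thicken", "graft",
    "characterize", "demineralize", "acetylate", "halogenate", "accumulate",
}

# Counts reported in the abstract.
INORGANIC_TOTAL = 149
ORGANIC_TOTAL = 141
SHARED = 124

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|.
merged_total = INORGANIC_TOTAL + ORGANIC_TOTAL - SHARED

inorganic_only_expected = INORGANIC_TOTAL - SHARED
organic_only_expected = ORGANIC_TOTAL - SHARED
```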
Differences in Usage Preference
Top 20% Most Commonly Used Operation Verbs in Inorganic Synthesis Experiments
Top 20% Most Commonly Used Operation Verbs in Organic Synthesis Experiments
Top 20% of Operation Verbs Preferentially Used in Inorganic Synthesis Experiments
Top 20% of Operation Verbs Preferentially Used in Organic Synthesis Experiments