Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (8): 1-16     https://doi.org/10.11925/infotech.2096-3467.2023.0335
Review
Review of Semantic Representation of Experimental Protocols at Process-Level*
Fu Yun1,2, Liu Xiwen1,2, Zhu Liya1, Han Tao1,2
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 Department of Information Resources Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract

[Objective] This paper reviews research progress on the process-level semantic representation of experimental protocols, identifies the key research questions that remain open, and explores development trends. [Coverage] Using related subject terms, we searched Web of Science, arXiv, Engineering Village, CNKI, Wanfang, and VIP and selected 76 papers. We also consulted the submission requirements and review principles of well-known journals that specialize in experimental protocols. [Methods] After defining experimental protocols and their process-level semantic representation, we analyze the literature from three angles: representation methods, methods for extracting representation elements, and applications of the represented data. [Results] Research on the process-level semantic representation of experimental protocols is at an early stage overall. Representation frameworks are not yet unified, and representation elements differ from one method to another. Because protocols are written mainly in natural language, automatically extracting process-level representation elements is difficult and current results are modest. Protocol data with process-level semantic representation has been applied in some research directions, but considerable room for improvement remains. [Limitations] This paper does not discuss in detail the techniques for automatically extracting representation elements or the procedures of the data application methods. [Conclusions] Future work should combine the strengths of the existing representation methods to build a unified representation framework with a more complete set of elements, explore automatic extraction methods based on advanced intelligent technologies, and explore broad applications of experimental data with process-level semantic representation.

Keywords: Experimental Protocols; Process-Level Semantic Representation; Representation Method; Representation Element Extraction Method; Data Application
Received: 2023-04-14      Published: 2023-09-13
CLC number (ZTFLH): G35; N19
Funding: * Key Program of the National Natural Science Foundation of China (72234005); National Social Science Fund of China (22BTQ019)
Corresponding author: Liu Xiwen, ORCID: 0000-0003-0820-3622, E-mail: liuxw@mail.las.ac.cn.
Cite this article:
Fu Yun, Liu Xiwen, Zhu Liya, Han Tao. Review of Semantic Representation of Experimental Protocols at Process-Level. Data Analysis and Knowledge Discovery, 2023, 7(8): 1-16.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.0335      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I8/1
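| Ontology | Classes | Object properties | Annotation properties | Data properties | Axioms | Logical axioms | Declaration axioms | Annotation assertions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| EXPO | 325, including the Entity class and its two subclasses Abstract and Physical; the class hierarchy is 9 levels deep; each class explicitly defines its label and meaning, its direct subclasses, its axioms, its disjoint classes, etc. | 78, including hasAttribute and hasPart and their sub-properties; each property explicitly defines its usage, parent property, associated classes, etc. | 2 | 0 | 2,067 | 1,019 | 402 | 646 |
| EXACT2 | 162, including experimental action, experimental procedure, alert message, etc.; each class explicitly defines its label and meaning, its direct subclasses, its axioms, its disjoint classes, etc. | 16, including hasAttribute and hasPart and their sub-properties; each property explicitly defines its usage, parent property, associated classes, etc. | 7 | 1 | 858 | 371 | 168 | 319 |
| P-PLAN | 12, of which Plan and MultiStep each belong to more than one class; each class explicitly defines its label and meaning, its direct subclasses, its axioms, etc. | 15, including correspondsToStep, hasInputVar, etc.; each property explicitly defines its label and meaning, its form in the representation language, associated classes, etc. | 13 | 0 | 142 | 47 | 61 | 319 |
| OntoSafe | 513, including Sensor, controller, Hazard, etc.; each class explicitly defines its label and meaning, its direct subclasses, its axioms, etc. | 80, including define, hasElment, isCausedBy, etc.; each property explicitly defines its usage, its form in the representation language, etc. | 5 | 70 | 2,496 | 1,172 | 720 | 604 |

Table 1  Summary of the four ontologies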
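Fig.1  Classes related to experimental protocols in EXPO and their relation structure
Fig.2  Classes and relation structure of EXACT2
Fig.3  P-PLAN example
Fig.4  OntoSafe example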
| Data model | Experimental actions and representation parameters |
| --- | --- |
| Action sequence model for organic synthesis experiments [14] | Defines 28 types of experimental action, each with predefined attributes: InvalidAction(error), Add(material, dropwise, temperature, atmosphere, duration), CollectLayer(layer), Concentrate(none), Degas(gas, duration), DrySolid(duration, temperature, atmosphere), DrySolution(material), Extract(solvent, repetitions), Filter(phase_to_keep), FollowOtherProcedure(none), MakeSolution(materials), Microwave(duration, temperature), OtherLanguage(none), Partition(material_1, material_2), PH(material, ph, dropwise, temperature), PhaseSeparation(none), Purify(none), Quench(material, dropwise, temperature), Recrystallize(solvent), Reflux(duration, dean_stark), SetTemperature(temperature), Sonicate(duration, temperature), Stir(duration, temperature), Triturate(solvent), Wait(duration, temperature), Wash(material, repetitions), Yield(material), NoAction(none) |
| Process-level semantic representation data model for inorganic synthesis protocols: solid-state synthesis [31] | Defines 6 types of experimental action: HeatingOperation, ShapingOperation, DryingOperation, LiquidGrinding, QuenchingOperation, SolutionMixing. Predefines 8 generic attributes: token, type, conditions, heating_temperature(max_value, min_value, values, units), heating_time(max_value, min_value, values, units), heating_atmosphere, mixing_device, mixing_media |
| Process-level semantic representation data model for inorganic synthesis protocols: solution-based inorganic synthesis [32] | Defines 6 types of experimental action: MixingOperation, PurificationOperation, HeatingOperation, DryingOperation, CoolingOperation, ShapingOperation. Predefines 8 generic attributes: token, type, conditions, temperature(max_value, min_value, values, units), time(max_value, min_value, values, units), atmosphere, mixing_device, mixing_media |
| Process-level semantic representation data model for inorganic synthesis protocols: gold nanoparticles [33] | Defines 5 types of experimental action: MIXING, STARTSYNTHESIS, HEATING, COOLING, DRYING. Predefines 14 generic attributes: congtain_recipe, conditions, temperature(value, unit, max_value, min_value), time(value, unit, max_value, min_value), string, subject, type, env_toks, op_token, op_type, ref_op, subject, temp_values(max, min, tok_ids, units, values), time_values(max, min, tok_ids, units, values) |
| ULSA, the Unified Language of Synthesis Actions for inorganic synthesis [30] | Defines 8 types of action: Starting, Mixing, Purification, Heating, Cooling, Shaping, Reaction, Non-Altering |

Table 2  Process-level semantic representation data models for synthesis protocols
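As a rough illustration of how an action sequence under the organic-synthesis model in Table 2 could be held in code, the sketch below encodes five of the 28 action types with the attribute names given in the table. The `Action` class, the `render` helper, and every example value are hypothetical and are not taken from reference [14].

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Attribute names for five of the 28 action types listed in Table 2.
ACTION_SCHEMA: Dict[str, Tuple[str, ...]] = {
    "Add": ("material", "dropwise", "temperature", "atmosphere", "duration"),
    "Stir": ("duration", "temperature"),
    "Filter": ("phase_to_keep",),
    "Wash": ("material", "repetitions"),
    "Yield": ("material",),
}


@dataclass
class Action:
    """One protocol step: an action type plus the attributes that were filled in."""
    name: str
    params: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject action types and attributes that the schema does not define.
        if self.name not in ACTION_SCHEMA:
            raise ValueError(f"unknown action type: {self.name}")
        unknown = set(self.params) - set(ACTION_SCHEMA[self.name])
        if unknown:
            raise ValueError(f"{self.name} does not accept {sorted(unknown)}")


def render(sequence: List[Action]) -> str:
    """Serialize an action sequence as 'Name(attr=value, ...) -> ...'."""
    parts = []
    for action in sequence:
        args = ", ".join(f"{k}={v}" for k, v in action.params.items())
        parts.append(f"{action.name}({args})")
    return " -> ".join(parts)


# Invented example values, for illustration only.
protocol = [
    Action("Add", {"material": "reagent A", "dropwise": "true"}),
    Action("Stir", {"duration": "2 h", "temperature": "25 °C"}),
    Action("Filter", {"phase_to_keep": "precipitate"}),
    Action("Yield", {"material": "product"}),
]
print(render(protocol))
```

Validating attributes against a fixed schema mirrors the table's idea that each action type carries predefined attributes. Attributes that a text does not mention are simply left out.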
Fig.5  Schematic of aspirin synthesis represented with the CRF model [34]
Fig.6  Example of the χDL model [41]
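| Graph | Node labels | Edge labels |
| --- | --- | --- |
| Action graph | Nodes represent operations and arguments. Omitted or missing arguments are handled by adding "implicit argument" nodes; gray nodes lack a reference edge and denote "raw" nodes. 18 types are defined: TARGET, MATERIAL, DESCRIPTOR, AMT_UNIT, CND_MISC, CND_UNIT, INTERMED, OPERATION, NUMBER, AMT_MISC, PROP_UNIT, PROP_TYPE, PROP_MISC, SYNTH_APRT, CHAR_APRT, BRAND, META, REF | Edges represent relations between an operation and its arguments. 2 types are defined: association, reference |
| Synthesis graph | Nodes represent materials, operations, and properties. 11 types are defined: material-start, material-intermedium, material-final, material-solvent, materials-others, operation, property-time, property-temp, property-rot, property-press, property-atmosphere | Edges represent relations between nodes. 3 types are defined: condition, next, coreference |
| Flow graph | Nodes represent materials, operations, and properties. 19 types are defined: operation, material, nonrecipe-material, number, property-misc, property-type, property-unit, amount-unit, amount-misc, condition-misc, condition-type, synthesis-apparatus, apparatus-unit, apparatus-property-type, material-descriptor, apparatus-descriptor, brand, meta, reference | Edges represent relations between nodes. 3 types are defined: Operation-Operation, Operation-Material, Remaining relations |
| Process execution graph | Nodes represent operations and arguments. Operations, marked in orange, are experimental action verbs; 14 types are defined: Transfer, Temperature, Treatment, General, Mix, Spin, Create, Destroy, Remove, Measure, Wash, Time, Seal, Convert. Arguments, marked in blue, are physical objects of the experiment; 8 types are defined: Reagent, Measurement, Setting, Location, Modifier, Device, Method, Seal | Edges represent relations between an operation and its arguments. 3 types are defined: core-roles, non-core roles, temporal edges |

Table 3  Structured graph representation methods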
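The synthesis graph row of Table 3 can be made concrete with a small labeled-graph sketch. The node and edge label sets below are copied from the table; the `SynthesisGraph` class, the edge directions (a `next` edge points to the following node in the process flow, a `condition` edge points from a property to its operation), and the example nodes are assumptions made for illustration and are not taken from reference [42].

```python
from typing import Dict, List, Tuple

# Label vocabularies as listed for the synthesis graph in Table 3.
NODE_LABELS = {
    "material-start", "material-intermedium", "material-final",
    "material-solvent", "materials-others", "operation",
    "property-time", "property-temp", "property-rot",
    "property-press", "property-atmosphere",
}
EDGE_LABELS = {"condition", "next", "coreference"}


class SynthesisGraph:
    """Hypothetical container for a labeled synthesis graph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Tuple[str, str]] = {}   # id -> (text, label)
        self.edges: List[Tuple[str, str, str]] = []   # (source, target, label)

    def add_node(self, node_id: str, text: str, label: str) -> None:
        if label not in NODE_LABELS:
            raise ValueError(f"unknown node label: {label}")
        self.nodes[node_id] = (text, label)

    def add_edge(self, source: str, target: str, label: str) -> None:
        if label not in EDGE_LABELS:
            raise ValueError(f"unknown edge label: {label}")
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target, label))

    def flow_order(self) -> List[str]:
        """Follow 'next' edges from the first node that has no predecessor."""
        successor = {s: t for s, t, label in self.edges if label == "next"}
        targets = set(successor.values())
        starts = [n for n in successor if n not in targets]
        order: List[str] = []
        current = starts[0] if starts else None
        while current is not None and current not in order:
            order.append(current)
            current = successor.get(current)
        return order


# Invented example: a precursor is mixed, heated at a given temperature, and yields a product.
graph = SynthesisGraph()
graph.add_node("m1", "precursor powder", "material-start")
graph.add_node("o1", "mixed", "operation")
graph.add_node("o2", "heated", "operation")
graph.add_node("t1", "900 °C", "property-temp")
graph.add_node("m2", "product", "material-final")
graph.add_edge("m1", "o1", "next")
graph.add_edge("o1", "o2", "next")
graph.add_edge("o2", "m2", "next")
graph.add_edge("t1", "o2", "condition")
print(graph.flow_order())
```

Keeping `condition` edges separate from `next` edges lets the process order be recovered without touching the property nodes, which is one plausible reason to distinguish the two edge types.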
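| Model | Document type | Technique | Extracted content |
| --- | --- | --- | --- |
| ChemDataExtractor 2.0 [57] | Scientific literature | The User Model has three parts: Quantity Model (any physical quantity, such as time or density), Compound Model (names, labels, and roles), and Base Model (user-defined fields) | Based on a chemical knowledge ontology predefined by the user |
| ChemicalTagger [58] | Patents | Rule-based syntactic parse trees | Experimental entities, actions, and their relations |
| Synthesis Project [4] | Journal articles | Uses ChemDataExtractor [59] and the SpaCy parser to identify synthesis verbs; combines neural-network word labeling with traversal of dependency parse trees to extract synthesis parameters; cannot yet handle several separate synthesis routes in one text | Synthesis operations and parameters |
| Action graph extraction [5] | Journal articles | Neural network model for entity extraction; syntactic tree parsing for action graph extraction | Experimental actions, arguments, and relations |
| SynthReader [9] | Journal articles | NLP algorithm based on expert-defined pattern-matching heuristics | Converts experimental procedures from text into χDL |
| Patent protocol extraction [14] | Patents | Rule-based models in Pistachio (LeadMine + ChemicalTagger) and a custom rule-based NLP model | Experimental actions and the associated compounds, quantities, reaction conditions, etc. |
| Synthesis graph extraction [42], flow graph extraction [43] | Journal articles | Deep-learning sequence labeling models (Mat-ELMo and Bi-LSTM-CRF) for entity extraction; a relation extractor based on simple heuristic rules for relation extraction | Synthesis graphs for solid-state battery fabrication experiments; materials synthesis protocols |
| Process execution graph (PEG) extraction [15] | Annotated text corpus | Pipeline for PEG prediction based on SciBERT and a message-passing neural network that jointly learns span and relation representations | Biochemistry experimental protocols |
| Solid oxide fuel cell protocol extraction [60] | Journal articles | BiLSTM-CRF with Mat2Vec + Word2Vec | Materials, values, devices, experiment slots, etc. |
| Inorganic materials synthesis protocol extraction [54] | Journal articles | ChemDataExtractor for tokenization and preprocessing; LDA + RF for classifying experimental information; Markov chain for predicting action sequences | Synthesis methods, experimental steps, and detailed processing parameters |
| Inorganic synthesis protocol extraction [32-33,46] | Journal articles | MatBERT-BiLSTM-CRF for material entity recognition; RNN plus a rule-based sentence dependency tree parsing algorithm for recognizing and classifying synthesis actions | Materials (targets, precursors, or other materials) and the sequence of synthesis actions |
| Pipeline extraction of protocol entities and relations [61] | Annotated text corpus | Neural exhaustive model based on PubMedBERT for entity recognition; an extension of the neural exhaustive model for relation recognition (experimental action sequences) | The best-performing method for extracting experimental entities and action sequences; for weaker methods see [62] |
| Pipeline extraction of protocol entities and relations [63] | Annotated text corpus | The model has three parts: span representation, which extracts key features for cross-sentence relations; relation encoder + convolution + decoder, which finds local relations between input entities that are far apart; and multi-head R-GCN (multi-head relational graph convolutional network), which resolves implicit arguments | Resolves implicit arguments (hidden relations between actions and entities) and long-range semantic relations across sentences |

Table 4  Methods for extracting experimental action sequences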
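To give a feel for the rule-based end of Table 4, here is a toy sketch of lexicon and regular-expression matching over protocol text. It is not the implementation of ChemicalTagger, SynthReader, or any other system in the table. The verb lexicon, the regular expressions, and the one-action-per-clause rule are all assumptions chosen for brevity; only the action class names (Mixing, Heating, Cooling, Purification) come from the ULSA row of Table 2.

```python
import re
from typing import Dict, List, Optional

# Hypothetical lexicon mapping past-tense verbs to ULSA-style action classes.
VERB_TO_ACTION = {
    "mixed": "Mixing", "stirred": "Mixing",
    "heated": "Heating", "calcined": "Heating",
    "cooled": "Cooling",
    "washed": "Purification", "filtered": "Purification",
}

# Simple patterns for a temperature in degrees Celsius and a duration in h or min.
TEMP_RE = re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C")
TIME_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(h|min)\b")


def extract_actions(text: str) -> List[Dict[str, Optional[str]]]:
    """Return one action record per clause that contains a known verb."""
    actions = []
    # Split after '.' or ';' followed by whitespace, so decimals such as 2.5 stay intact.
    for clause in re.split(r"(?<=[.;])\s+", text):
        words = re.findall(r"[a-z]+", clause.lower())
        verb = next((w for w in words if w in VERB_TO_ACTION), None)
        if verb is None:
            continue
        temp = TEMP_RE.search(clause)
        time = TIME_RE.search(clause)
        actions.append({
            "action": VERB_TO_ACTION[verb],
            "token": verb,
            "temperature": f"{temp.group(1)} °C" if temp else None,
            "time": f"{time.group(1)} {time.group(2)}" if time else None,
        })
    return actions


# Invented example text, for illustration only.
sample = ("The powders were mixed in ethanol. "
          "The mixture was heated at 900 °C for 12 h; "
          "then it was cooled to room temperature.")
for record in extract_actions(sample):
    print(record)
```

A sketch like this breaks as soon as a clause holds two operations or leaves an argument implicit. Those are the cases that the neural and graph-based methods in the lower rows of the table are designed to handle.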
[1] Jumper J, Evans R, Pritzel A, et al. Highly Accurate Protein Structure Prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589.
doi: 10.1038/s41586-021-03819-2
[2] De Pablo J J, Jackson N E, Webb M A, et al. New Frontiers for the Materials Genome Initiative[J]. NPJ Computational Materials, 2019, 5: 41.
doi: 10.1038/s41524-019-0173-4
[3] Girault I, D’Ham C, Ney M, et al. Characterizing the Experimental Procedure in Science Laboratories: A Preliminary Step Towards Students Experimental Design[J]. International Journal of Science Education, 2012, 34(6): 825-854.
doi: 10.1080/09500693.2011.569901
[4] Kim E, Huang K, Saunders A, et al. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning[J]. Chemistry of Materials, 2017, 29(21): 9436-9444.
doi: 10.1021/acs.chemmater.7b03500
[5] Mysore S, Kim E, Strubell E, et al. Automatically Extracting Action Graphs from Materials Science Synthesis Procedures[OL]. arXiv Preprint, arXiv:1711.06872.
[6] Baker M. 1,500 Scientists Lift the Lid on Reproducibility[J]. Nature, 2016, 533(7604): 452-454.
doi: 10.1038/533452a
[7] Seifrid M, Pollice R, Aguilar-Granda A, et al. Autonomous Chemical Experiments: Challenges and Perspectives on Establishing a Self-Driving Lab[J]. Accounts of Chemical Research, 2022, 55(17): 2454-2466.
doi: 10.1021/acs.accounts.2c00220 pmid: 35948428
[8] Coley C W, Eyke N S, Jensen K F. Autonomous Discovery in the Chemical Sciences Part II: Outlook[J]. Angewandte Chemie, 2020, 59(52): 23414-23436.
[9] Mehr S H M, Craven M, Leonov A I, et al. A Universal System for Digitization and Automatic Execution of the Chemical Synthesis Literature[J]. Science, 2020, 370(6512): 101-108.
doi: 10.1126/science.abc2986 pmid: 33004517
[10] Soldatova L N, King R D. An Ontology of Scientific Experiments[J]. Journal of the Royal Society, Interface, 2006, 3(11): 795-803.
pmid: 17015305
[11] Lewis T. Design and Inquiry: Bases for an Accommodation Between Science and Technology Education in the Curriculum?[J]. Journal of Research in Science Teaching, 2006, 43(3): 255-281.
doi: 10.1002/(ISSN)1098-2736
[12] Yang X J, Zhang X L, Zuo J, et al. An Analysis of Relation Extraction Within Sentences from Wet Lab Protocols[C]// Proceedings of the 2021 IEEE International Conference on Big Data. 2021: 562-570.
[13] Soldatova L N, Nadis D, King R D, et al. EXACT2: The Semantics of Biomedical Protocols[J]. BMC Bioinformatics, 2014, 15(14): S5.
[14] Vaucher A C, Zipoli F, Geluykens J, et al. Automated Extraction of Chemical Synthesis Actions from Experimental Procedures[J]. Nature Communications, 2020, 11: 3601.
doi: 10.1038/s41467-020-17266-6 pmid: 32681088
[15] Tamari R, Bai F, Ritter A, et al. Process-Level Representation of Scientific Protocols with Interactive Annotation[C]// Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, 2021: 2190-2202.
[16] Steiner S, Wolf J, Glatzel S, et al. Organic Synthesis in a Modular Robotic System Driven by a Chemical Programming Language[J]. Science, 2019, 363(6423): eaav2211.
[17] Arch-int N, Arch-int S. Semantic Ontology Mapping for Interoperability of Learning Resource Systems Using a Rule-Based Reasoning Approach[J]. Expert Systems with Applications, 2013, 40(18): 7428-7443.
doi: 10.1016/j.eswa.2013.07.027
[18] Daraio C, Lenzerini M, Leporelli C, et al. The Advantages of an Ontology-Based Data Management Approach: Openness, Interoperability and Data Quality[J]. Scientometrics, 2016, 108(1): 441-455.
doi: 10.1007/s11192-016-1913-6
[19] Nelson E K, Piehler B, Eckels J, et al. LabKey Server: An Open Source Platform for Scientific Data Integration, Analysis and Collaboration[J]. BMC Bioinformatics, 2011, 12: 71.
doi: 10.1186/1471-2105-12-71 pmid: 21385461
[20] Rodríguez M, Laguía J. An Ontology for Process Safety[J]. Chemical Engineering Transactions, 2019, 77: 67-72.
[21] McGuinness D L, Harmelen F V. Web Ontology Language [A]// Encyclopedia of Social Network Analysis and Mining[M]. New York: Springer, 2014.
[22] Kügler P, Marian M, Schleich B, et al. tribAIn—Towards an Explicit Specification of Shared Tribological Understanding[J]. Applied Sciences, 2020, 10(13): 4421.
doi: 10.3390/app10134421
[23] King R D, Rowland J, Oliver S G, et al. The Automation of Science[J]. Science, 2009, 324(5923): 85-89.
doi: 10.1126/science.1165620 pmid: 19342587
[24] Qi D, King R D, Hopkins A L, et al. An Ontology for Description of Drug Discovery Investigations[J]. Journal of Integrative Bioinformatics, 2010, 7(3): 126.
[25] Vanschoren J, Soldatova L N. Exposé: An Ontology for Data Mining Experiments[C]// Proceedings of International Workshop on the 3rd Generation Data Mining: Towards Service-Oriented Knowledge Discovery. 2010: 31-46.
[26] Cheung K, Drennan J, Hunter J. Towards an Ontology for Data-Driven Discovery of New Materials[C]// Proceedings of Semantic Scientific Knowledge Integration AAAI/SSS Workshop. 2008: 9-14.
[27] Soldatova L N, Aubrey W, King R D, et al. The EXACT Description of Biomedical Protocols[J]. Bioinformatics, 2008, 24(13): i295-i303.
doi: 10.1093/bioinformatics/btn156
[28] Celebi R, Moreira J R, Hassan A A, et al. Towards FAIR Protocols and Workflows: The OpenPREDICT Use Case[J]. PeerJ Computer Science, 2020, 6: e281.
doi: 10.7717/peerj-cs.281 pmid: 33816932
[29] Barrows E, Martin K, Smith T. Markup Language for Chemical Process Control and Simulation[J]. Computers & Chemical Engineering, 2022, 160: 107702.
doi: 10.1016/j.compchemeng.2022.107702
[30] Wang Z R, Cruse K, Fei Y X, et al. ULSA: Unified Language of Synthesis Actions for the Representation of Inorganic Synthesis Protocols[J]. Digital Discovery, 2022, 1(3): 313-324.
doi: 10.1039/D1DD00034A
[31] Kononova O, Huo H Y, He T J, et al. Text-Mined Dataset of Inorganic Materials Synthesis Recipes[J]. Scientific Data, 2019, 6: 203.
doi: 10.1038/s41597-019-0224-1 pmid: 31615989
[32] Wang Z R, Kononova O, Cruse K, et al. Dataset of Solution-Based Inorganic Materials Synthesis Procedures Extracted from the Scientific Literature[J]. Scientific Data, 2022, 9: 231.
doi: 10.1038/s41597-022-01317-2 pmid: 35614129
[33] Cruse K, Trewartha A, Lee S, et al. Text-Mined Dataset of Gold Nanoparticle Synthesis Procedures, Morphologies, and Size Entities[J]. Scientific Data, 2022, 9: 234.
doi: 10.1038/s41597-022-01321-6 pmid: 35618761
[34] Coley C W, Thomas III D A, Lummiss J A M, et al. A Robotic Platform for Flow Synthesis of Organic Compounds Informed by AI Planning[J]. Science, 2019, 365(6453): eaax1566.
doi: 10.1126/science.aax1566
[35] Hammer A J S, Leonov A I, Bell N L, et al. Chemputation and the Standardization of Chemical Informatics[J]. JACS Au, 2021, 1(10): 1572-1587.
doi: 10.1021/jacsau.1c00303 pmid: 34723260
[36] Wang Z, Zhao W, Hao G F, et al. Automated Synthesis: Current Platforms and Further Needs[J]. Drug Discovery Today, 2020, 25(11): 2006-2011.
doi: 10.1016/j.drudis.2020.09.009
[37] Collins N, Stout D, Lim J P, et al. Fully Automated Chemical Synthesis: Toward the Universal Synthesizer[J]. Organic Process Research & Development, 2020, 24(10): 2064-2077.
[38] Bubliauskas A, Blair D J, Powell-Davies H, et al. Digitizing Chemical Synthesis in 3D Printed Reactionware[J]. Angewandte Chemie, 2022, 61(24): e202116108.
[39] Angelone D, Hammer A J S, Rohrbach S, et al. Convergence of Multiple Synthetic Paradigms in a Universally Programmable Chemical Synthesis Machine[J]. Nature Chemistry, 2021, 13(1): 63-69.
doi: 10.1038/s41557-020-00596-9 pmid: 33353971
[40] Wilbraham L, Mehr S H M, Cronin L. Digitizing Chemistry Using the Chemical Processing Unit: From Synthesis to Discovery[J]. Accounts of Chemical Research, 2021, 54(2): 253-262.
doi: 10.1021/acs.accounts.0c00674 pmid: 33370095
[41] Rohrbach S, Šiaučiulis M, Chisholm G, et al. Digitization and Validation of a Chemical Synthesis Literature Database in the ChemPU[J]. Science, 2022, 377(6602): 172-180.
doi: 10.1126/science.abo0058 pmid: 35857541
[42] Kuniyoshi F, Makino K, Ozawa J, et al. Annotating and Extracting Synthesis Process of All-Solid-State Batteries from Scientific Literature [OL]. arXiv Preprint, arXiv:2002.07339.
[43] Makino K, Kuniyoshi F, Ozawa J, et al. Extracting and Analyzing Inorganic Material Synthesis Procedures in the Literature[J]. IEEE Access, 2022, 10: 31524-31537.
doi: 10.1109/ACCESS.2022.3160201
[44] Guo J, Ibanez-Lopez A S, Gao H Y, et al. Automated Chemical Reaction Extraction from Scientific Literature[J]. Journal of Chemical Information and Modeling, 2022, 62(9): 2035-2045.
doi: 10.1021/acs.jcim.1c00284
[45] Mysore S J Z, Kim E, Huang K, et al. The Materials Science Procedural Text Corpus: Annotating Materials Synthesis Procedures with Shallow Semantic Structures[C]// Proceedings of the 13th Linguistic Annotation Workshop (LAW XIII). 2019: 56-64.
[46] Kononova O, Huo H Y, He T J, et al. Author Correction: Text-Mined Dataset of Inorganic Materials Synthesis Recipes[J]. Scientific Data, 2019, 6: 273.
doi: 10.1038/s41597-019-0297-x pmid: 31729397
[47] Kim E, Huang K, Kononova O, et al. Distilling a Materials Synthesis Ontology[J]. Matter, 2019, 1(1): 8-12.
doi: 10.1016/j.matt.2019.05.011
[48] Artrith N, Butler K T, Coudert F X, et al. Best Practices in Machine Learning for Chemistry[J]. Nature Chemistry, 2021, 13(6): 505-508.
doi: 10.1038/s41557-021-00716-z pmid: 34059804
[49] Hiszpanski A M, Gallagher B, Chellappan K, et al. Nanomaterial Synthesis Insights from Machine Learning of Scientific Articles by Extracting, Structuring, and Visualizing Knowledge[J]. Journal of Chemical Information and Modeling, 2020, 60(6): 2876-2887.
doi: 10.1021/acs.jcim.0c00199 pmid: 32286818
[50] Zhang Y, Wang C, Soukaseum M, et al. Unleashing the Power of Knowledge Extraction from Scientific Literature in Catalysis[J]. Journal of Chemical Information and Modeling, 2022, 62(14): 3316-3330.
doi: 10.1021/acs.jcim.2c00359
[51] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python[J]. Journal of Machine Learning Research, 2011, 12: 2825-2830.
[52] Kim E, Huang K, Tomala A, et al. Machine-Learned and Codified Synthesis Parameters of Oxide Materials[J]. Scientific Data, 2017, 4: 170127.
doi: 10.1038/sdata.2017.127
[53] Wang W R, Jiang X, Tian S H, et al. Automated Pipeline for Superalloy Data by Text Mining[J]. NPJ Computational Materials, 2022, 8: 9.
doi: 10.1038/s41524-021-00687-2
[54] Huo H Y, Rong Z Z, Kononova O, et al. Semi-Supervised Machine-Learning Classification of Materials Synthesis Procedures[J]. NPJ Computational Materials, 2019, 5: 62.
doi: 10.1038/s41524-019-0204-1
[55] Krippendorff K. Content Analysis: An Introduction to Its Methodology[M]. The 4th Edition. Thousand Oaks: SAGE Publications, 2019.
[56] Cohen J. Weighted Kappa: Nominal Scale Agreement with Provision for Scaled Disagreement or Partial Credit[J]. Psychological Bulletin, 1968, 70(4): 213-220.
doi: 10.1037/h0026256 pmid: 19673146
[57] Mavračić J, Court C J, Isazawa T, et al. ChemDataExtractor 2.0: Autopopulated Ontologies for Materials Science[J]. Journal of Chemical Information and Modeling, 2021, 61(9): 4280-4289.
doi: 10.1021/acs.jcim.1c00446 pmid: 34529432
[58] Hawizy L, Jessop D M, Adams N, et al. ChemicalTagger: A Tool for Semantic Text-Mining in Chemistry[J]. Journal of Cheminformatics, 2011, 3(1): 1-13.
doi: 10.1186/1758-2946-3-1
[59] Swain M C, Cole J M. ChemDataExtractor: A Toolkit for Automated Extraction of Chemical Information from the Scientific Literature[J]. Journal of Chemical Information and Modeling, 2016, 56(10): 1894-1904.
doi: 10.1021/acs.jcim.6b00207 pmid: 27669338
[60] Friedrich A, Adel H, Tomazic F, et al. The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1255-1268.
[61] Sohrab M G, Duong Nguyen A K, Miwa M, et al. mgsohrab at WNUT 2020 Shared Task-1: Neural Exhaustive Approach for Entity and Relation Recognition over Wet Lab Protocols[C]// Proceedings of the 6th Workshop on Noisy User-Generated Text. 2020: 290-298.
[62] Tabassum J, Xu W, Ritter A, et al. WNUT-2020 Task 1 Overview: Extracting Entities and Relations from Wet Lab Protocols[C]// Proceedings of the 6th Workshop on Noisy User-Generated Text. 2020: 260-267.
[63] Kulkarni C, Chan J, Fosler-Lussier E, et al. Learning Latent Structures for Cross Action Phrase Relations in Wet Lab Protocols[C]// Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1:Long Papers). 2021: 6737-6750.
[64] Gopalaswamy V, Betti R, Knauer J P, et al. Tripled Yield in Direct-Drive Laser Fusion Through Statistical Modelling[J]. Nature, 2019, 565(7741): 581-586.
doi: 10.1038/s41586-019-0877-0
[65] Walker E, Kammeraad J, Goetz J, et al. Learning to Predict Reaction Conditions: Relationships Between Solvent, Molecular Structure, and Catalyst[J]. Journal of Chemical Information and Modeling, 2019, 59(9): 3645-3654.
doi: 10.1021/acs.jcim.9b00313 pmid: 31381340
[66] Gao H Y, Struble T J, Coley C W, et al. Using Machine Learning to Predict Suitable Conditions for Organic Reactions[J]. ACS Central Science, 2018, 4(11): 1465-1476.
doi: 10.1021/acscentsci.8b00357 pmid: 30555898
[67] Maser M R, Cui A Y, Ryou S, et al. Multilabel Classification Models for the Prediction of Cross-Coupling Reaction Conditions[J]. Journal of Chemical Information and Modeling, 2021, 61(1): 156-166.
doi: 10.1021/acs.jcim.0c01234 pmid: 33417449
[68] Vaucher A C, Schwaller P, Geluykens J, et al. Inferring Experimental Procedures from Text-Based Representations of Chemical Reactions[J]. Nature Communications, 2021, 12: 2573.
doi: 10.1038/s41467-021-22951-1 pmid: 33958589
[69] Miyao T, Kaneko H, Funatsu K. Inverse QSPR/QSAR Analysis for Chemical Structure Generation (from y to x)[J]. Journal of Chemical Information and Modeling, 2016, 56(2): 286-299.
doi: 10.1021/acs.jcim.5b00628 pmid: 26818135
[70] Tagade P M, Adiga S P, Pandian S, et al. Attribute Driven Inverse Materials Design Using Deep Learning Bayesian Framework[J]. NPJ Computational Materials, 2019, 5: 127.
doi: 10.1038/s41524-019-0263-3
[71] Onishi T, Kadohira T, Watanabe I. Relation Extraction with Weakly Supervised Learning Based on Process-Structure-Property-Performance Reciprocity[J]. Science and Technology of Advanced Materials, 2018, 19(1): 649-659.
doi: 10.1080/14686996.2018.1500852 pmid: 30245757
[72] Fukada K, Seyama M. Designing a Multilayer Film via Machine Learning of Scientific Literature[J]. Scientific Reports, 2022, 12: 930.
doi: 10.1038/s41598-022-05010-7 pmid: 35042971
[73] MacLeod B P, Parlane F G L, Morrissey T D, et al. Self-Driving Laboratory for Accelerated Discovery of Thin-Film Materials[J]. Science Advances, 2020, 6(20): eaaz8867.
doi: 10.1126/sciadv.aaz8867
[74] Li J G, Tu Y X, Liu R L, et al. Toward “On‐Demand” Materials Synthesis and Scientific Discovery Through Intelligent Robots[J]. Advanced Science, 2020, 7(7): 1901957.
doi: 10.1002/advs.v7.7
[75] Kusne A G, Yu H S, Wu C M, et al. On-the-Fly Closed-Loop Materials Discovery via Bayesian Active Learning[J]. Nature Communications, 2020, 11: 5966.
doi: 10.1038/s41467-020-19597-w pmid: 33235197
[76] Zhao H T, Chen W, Huang H, et al. A Robotic Platform for the Synthesis of Colloidal Nanocrystals[J]. Nature Synthesis, 2023, 2(6): 505-514.
doi: 10.1038/s44160-023-00250-5
[77] Burger B, Maffettone P M, Gusev V V, et al. A Mobile Robotic Chemist[J]. Nature, 2020, 583(7815): 237-241.
doi: 10.1038/s41586-020-2442-2
[78] Zhu Q, Zhang F, Huang Y, et al. An All-Round AI-Chemist with a Scientific Mind[J]. National Science Review, 2022, 9(10): nwac190.
doi: 10.1093/nsr/nwac190
[79] Williams K, Bilsland E, Sparkes A, et al. Cheaper Faster Drug Development Validated by the Repositioning of Drugs Against Neglected Tropical Diseases[J]. Journal of the Royal Society, Interface, 2015, 12(104): 20141289.
doi: 10.1098/rsif.2014.1289
[80] Wei Z P, Su J L, Wang Y, et al. A Novel Cascade Binary Tagging Framework for Relational Triple Extraction[C]// Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020: 1476-1488.
[81] OpenAI. GPT-4 Technical Report [OL]. arXiv Preprint, arXiv: 2303.08774.