[Objective] This paper proposes a method to automatically extract technical indicators from the U.S. “Commerce Control List”, aiming to better understand the technical details of the listed products and U.S. export control policies. [Methods] We represented each technical indicator by four elements: its object, name, relationship, and value. We then proposed an automated model to extract technical indicators and stored them as structured four-element records. [Results] The proposed method effectively extracts technical indicators from the “Commerce Control List” in an unsupervised manner. Its precision and F1 value reached 87.34% and 86.52%, respectively. [Limitations] The method targets the text of the “Commerce Control List”; more research is needed to examine it on other corpora. [Conclusions] The proposed method can effectively extract technical indicators from the U.S. “Commerce Control List”.
Yuan Yue, Pang Na, Li Guangjian. Automatically Extracting Technical Indicators from U.S. Commerce Control List[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 35-48.
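The four-element record described in the abstract can be pictured with a small data structure. The following is a minimal sketch only: the class name `IndicatorRecord`, its field names, and the extra `eccn` provenance field are assumptions made for illustration, not the storage format used in the paper. The two records are built by hand from CCL item 1A008.a.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class IndicatorRecord:
    """One technical indicator as a four-element record (hypothetical field names)."""
    eccn: str          # assumed provenance field: the item number the text came from
    obj: str           # indicator object: the product, technology, or component
    name: str          # indicator name: the measured property
    relationship: str  # indicator relationship: comparison or range phrase
    value: str         # indicator value: number plus unit


# Hand-built records for CCL item 1A008.a (not model output)
records = [
    IndicatorRecord("1A008.a.1", "Shaped charges", "Net Explosive Quantity (NEQ)",
                    "greater than", "90 g"),
    IndicatorRecord("1A008.a.2", "Shaped charges", "Outer casing diameter",
                    "equal to or greater than", "75 mm"),
]

# Serialize the records so they can be stored or exchanged as structured data
serialized = json.dumps([asdict(r) for r in records], indent=2)
print(serialized)
```

Keeping the object on every record, even when it is inherited from a parent item, makes each record readable on its own.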
| Element | Lexical form | Position | Syntactic dependency | Examples from the CCL |
| --- | --- | --- | --- | --- |
| Indicator value | | | | 0A987.f. Laser aiming devices or laser illuminators “specially designed” for use on firearms, and having an operational wavelength exceeding 400 nm but not exceeding 710 nm. |
| | | | | 2B006.b.2. Linear position feedback units “specially designed” for machine tools and having an overall “accuracy” less (better) than (800 + (600 × L/1,000)) nm (L equals effective length in mm); |
| Indicator relationship | Comparatives, comparison phrases, range phrases | Usually appears near the indicator value | Syntactic dependency with the indicator value | 2B116.c. Vibration thrusters (shaker units), with or without associated amplifiers, capable of imparting a force equal to or greater than 50 kN (11,250 lbs.), measured ‘bare table’, and usable in vibration test systems described in 2B116.a; |
| | | | | 2B209.b. Rotor-forming mandrels designed to form cylindrical rotors of inside diameter between 75 mm and 400 mm. |
| Indicator name | Domain-specific indicator terms | Usually in the same item sentence as the indicator value and relationship, and positioned earlier in it | Syntactic dependency with the indicator value or relationship | 2A991.a.1. Manufactured for use at operating temperatures above 573 K (300 °C) either by using special materials or by special heat treatment; or |
| | | | | 2B005.b. Ion implantation “production equipment” having beam currents of 5 mA or more; |
| Indicator object | Nouns denoting technologies, products, and their components | Usually at the beginning of the controlled item text; sometimes in the text of a parent item under the same ECCN | Syntactic dependency with the indicator value, relationship, and name (possibly across item sentences) | 9A990.a. Diesel engines, n.e.s., for trucks, tractors, and automotive applications of continuous brake horsepower of 400 BHP (298 kW) or greater (performance based on SAE J1349 standard conditions of 100 Kpa and 25°) |
| | | | | 1A008.a. ‘Shaped charges’ having all of the following: 1A008.a.1. Net Explosive Quantity (NEQ) greater than 90 g; and 1A008.a.2. Outer casing diameter equal to or greater than 75 mm; |

Table 2 Manifestations of the four indicator elements in the CCL
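The positional cue in Table 2, that relationship phrases sit next to the indicator value, can be illustrated with a small rule-based sketch. This is not the extraction model of the paper, which also relies on syntactic dependencies. The unit list, the phrase lists, and the function name `extract_value_relations` are assumptions chosen to cover only the example sentences quoted in the table.

```python
import re

# Assumed, deliberately tiny vocabularies covering only the Table 2 examples
UNITS = r"(?:nm|mm|kN|mA|K|g|kW|BHP)"
NUMBER = r"\d[\d,]*(?:\.\d+)?"
VALUE = rf"{NUMBER}\s?{UNITS}\b"
# Phrases that precede the value; longer alternatives are listed first
PREFIX_REL = r"(?:equal to or greater than|not exceeding|greater than|less than|exceeding|above)"
# Phrases that follow the value
SUFFIX_REL = r"(?:or more|or greater|or less)"

PATTERN = re.compile(
    rf"(?:between\s+(?P<low>{VALUE})\s+and\s+(?P<high>{VALUE}))"             # range phrase
    rf"|(?:(?P<pre>{PREFIX_REL})\s+(?P<v1>{VALUE}))"                          # relation before value
    rf"|(?:(?P<v2>{VALUE})(?:\s*\([^)]*\))?\s+(?P<post>{SUFFIX_REL}))"        # relation after value
)


def extract_value_relations(sentence):
    """Return (relationship, value) pairs found by adjacency alone."""
    pairs = []
    for m in PATTERN.finditer(sentence):
        if m.group("low"):
            pairs.append(("between", f"{m.group('low')} and {m.group('high')}"))
        elif m.group("pre"):
            pairs.append((m.group("pre"), m.group("v1")))
        else:
            pairs.append((m.group("post"), m.group("v2")))
    return pairs


examples = {
    "0A987.f": "having an operational wavelength exceeding 400 nm but not exceeding 710 nm.",
    "2B116.c": "capable of imparting a force equal to or greater than 50 kN (11,250 lbs.), measured 'bare table'",
    "2B209.b": "cylindrical rotors of inside diameter between 75 mm and 400 mm.",
    "2B005.b": "having beam currents of 5 mA or more;",
}

for eccn, text in examples.items():
    print(eccn, extract_value_relations(text))
```

A pattern of this kind fails as soon as the value is a formula, as in 2B006.b.2, or the relationship is separated from the value by other words, which is where dependency-based rules become necessary.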
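Fig.1 Technical indicator extraction method for controlled items
Fig.2 Example of indicator value extraction
Fig.3 Example of indicator relationship extraction
Fig.4 Example of indicator name extraction
Fig.5 Example of indicator object extraction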
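| Statistic | Count |
| --- | --- |
| Domains covered by the annotated data | 10 |
| Groups covered by the annotated data | 5 |
| Item sentences covered by the annotated data | 991 |
| Indicator values | 1,121 |
| Indicator relationships | 1,123 |
| Indicator names | 882 |
| Indicator objects | 567 |

Table 3 Statistics of the experimental dataset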
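| Statistic | Indicator value | Indicator relationship | Indicator name | Indicator object |
| --- | --- | --- | --- | --- |
| Domains covered | 10 | 10 | 10 | 10 |
| Groups covered | 5 | 5 | 5 | 5 |
| Item sentences covered | 991 | 964 | 870 | 567 |

Table 4 Distribution of the four indicator elements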
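| Extraction target | Precision/% | Recall/% | F1/% |
| --- | --- | --- | --- |
| Indicator value | 98.14 | 98.66 | 98.40 |
| Indicator relationship | 99.38 | 99.73 | 99.69 |
| Indicator name | 72.73 | 88.89 | 80.00 |
| Indicator object | 93.75 | 74.07 | 82.76 |
| Overall extraction result | 90.74 | 92.88 | 91.80 |

Table 5 Indicator extraction results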
Fig.6 Example sentence for syntactic dependency analysis
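| Authors | Precision/% | Recall/% | F1/% |
| --- | --- | --- | --- |
| Xie Weijia et al.[14] | ↓0.56 | ↑0.37 | - |
| Guo Shaoqing et al.[15] | ↓6.73 | ↓12.12 | ↓9.45 |
| Wu Sheng et al.[16] | ↓5.24 | ↓13.70 | ↓9.58 |
| Shi Gongze et al.[17] | ↓24.74 | ↓24.13 | ↓24.46 |
| Kim et al.[18] | ↓20.24 | ↓50.88 | ↓39.20 |

Table 6 Extraction results of related unsupervised studies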
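| Extraction method | Precision/% | Recall/% | F1/% |
| --- | --- | --- | --- |
| Proposed method | 87.34 | 85.71 | 86.52 |
| Manual annotation | 100 | 100 | 100 |

Table 7 Extraction results for the four elements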
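The F1 value in Table 7 follows from the reported precision and recall through the standard harmonic-mean definition. The short check below is a sketch; the helper name `f1_score` is chosen here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall of the proposed method as reported in Table 7 (percent)
overall_f1 = round(f1_score(87.34, 85.71), 2)
print(overall_f1)
```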
Fig.7 Example of errors in dependency parsing and syntax tree recognition
[1] Commerce Control List (CCL)[EB/OL]. [2022-07-05]. https://www.bis.doc.gov/index.php/regulations/commerce-control-list-ccl.
[2] Zheng Yanning, Deng Bo. Analysis of the Application of Information Extraction Technology in Information Science[J]. Information Studies: Theory & Application, 2008, 31(5): 769-772.
[3] Wu Chao, Zheng Yanning, Hua Bolin. Numerical Information Extraction: A Review of Research[J]. Journal of Library Science in China, 2014, 40(2): 107-119.
[4] Li Guangjian, Wang Kai, Zhang Qingzhi. Analysis Framework Based on Multi-Source Data for US Export Control: An Empirical Study[J]. Data Analysis and Knowledge Discovery, 2020, 4(9): 26-40.
[5] Song Rui, Lin Hongfei, Chang Fuyang. Chinese Comparative Sentences Identification and Comparative Relations Extraction[J]. Journal of Chinese Information Processing, 2009, 23(2): 102-107.
[6] Wang Y, Wang L, Rastegar-Mojarad M, et al. Clinical Information Extraction Applications: A Literature Review[J]. Journal of Biomedical Informatics, 2018, 77: 34-49. pii: S1532-0464(17)30256-3; pmid: 29162496
[7] Zhou P, El-Gohary N. Ontology-Based Automated Information Extraction from Building Energy Conservation Codes[J]. Automation in Construction, 2017, 74: 103-117. doi: 10.1016/j.autcon.2016.09.004
[8] Wanichayapong N, Pruthipunyaskul W, Pattara-Atikom W, et al. Social-Based Traffic Information Extraction and Classification[C]// Proceedings of 2011 11th International Conference on ITS Telecommunications. 2011: 107-112.
[9] Tang Xiaobo, Tan Mingliang, Hu Xiaoran, et al. A Review of Financial Decision-Making Support-Oriented Knowledge Acquisition[J]. Journal of Information Resource Management, 2020, 10(3): 27-35.
[10] Rao Qi, Wang Peiyan, Zhang Guiping. Text Feature Analysis on SAO Structure Extraction from Chinese Patent Literatures[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 51(2): 349-356.
[11] Du Wentao, Liu Peiyu, Fei Shaodong, et al. Chinese Comparative Sentences Recognition Based on Associated Feature Vocabulary[J]. Journal of Computer Applications, 2013, 33(6): 1591-1594. doi: 10.3724/SP.J.1087.2013.01591
[12] Hu X, Wu J, He J R. Textual Indicator Extraction from Aviation Accident Reports[C]// Proceedings of AIAA Aviation 2019 Forum. 2019. doi: 10.2514/6.2019-2939
[13] Li Chunjie, Ma Jianling, Zhu Xuemei. A Overview of Numerical Information Extraction Research and Application Analysis[J]. Information Science, 2019, 37(2): 40-45, 124.
[14] Xie Weijia, Wang Yingtao. Research on the Extraction of Laboratory Data and Information in the Electronic Medical Records System[J]. China Digital Medicine, 2015, 10(3): 69-70, 96.
[15] Guo Shaoqing, Le Xiaoqiu. Identifying Actual Value of Numerical Indicator from Scientific Paper[J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 21-28.
[16] Wu Sheng, Liu Maofu, Hu Huijun, et al. Unsupervised Extraction of Attribute-Value Entity Relation from Chinese Texts[J]. Journal of Wuhan University(Natural Science Edition), 2016, 62(6): 552-560.
[17] Shi Gongze, Wang Haochang. Ontology Concept Extraction for Product Indicators Based on Double Strategy Combination[J]. Information Technology, 2017(3): 26-29, 33.
[18] Kim S, Jeong M, Lee G G. A Local Tree Alignment Approach to Relation Extraction of Multiple Arguments[J]. Information Processing & Management, 2011, 47(4): 593-605. doi: 10.1016/j.ipm.2010.12.002
[19] ibiblio. A Dictionary of Units of Measurement[EB/OL]. [2022-05-18]. http://www.ibiblio.org/units.
[20] 15 CFR Supplement No. 1 to Part 774 - The Commerce Control List[EB/OL]. [2022-07-05]. https://www.govinfo.gov/content/pkg/CFR-2020-title15-vol2/pdf/CFR-2020-title15-vol2-part774-appNo-.pdf.