Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (1): 35-48    DOI: 10.11925/infotech.2096-3467.2022.0571
Automatically Extracting Technical Indicators from U.S. Commerce Control List
Yuan Yue, Pang Na, Li Guangjian
Department of Information Management, Peking University, Beijing 100871, China
Abstract  

[Objective] This paper proposes a method to automatically extract technical indicators from the U.S. “Commerce Control List” (CCL), aiming to better understand the technical details of the listed products and U.S. export control policies. [Methods] We represented each technical indicator by four elements: its object, name, relationship, and value. We then proposed an automated model that extracts technical indicators and stores them as structured four-element records. [Results] The proposed method effectively extracts technical indicators from the “Commerce Control List” in an unsupervised manner, reaching a precision of 87.34% and an F1 value of 86.52%. [Limitations] The extraction method is designed mainly for the text of the “Commerce Control List”; more research is needed to test it on other corpora. [Conclusions] The proposed method can effectively extract technical indicators from the U.S. “Commerce Control List”.
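The four-element representation described in the abstract can be pictured as a small record type. The sketch below is illustrative only: the class `IndicatorRecord` and its field names are hypothetical, the extra `eccn_item` field is added here just for traceability, and the sample values are read off item 1A008.a.1 of the CCL, whose object (“Shaped charges”) is stated in its parent item 1A008.a.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class IndicatorRecord:
    """One technical indicator as a four-element record (hypothetical schema)."""
    eccn_item: str  # extra field for traceability, e.g. "1A008.a.1"
    obj: str        # indicator object: the controlled product or component
    name: str       # indicator name: the technical parameter being limited
    relation: str   # indicator relationship: the comparison phrase
    value: str      # indicator value: number plus unit, kept as text


# Item 1A008.a.1 takes its object from the parent item 1A008.a.
record = IndicatorRecord(
    eccn_item="1A008.a.1",
    obj="Shaped charges",
    name="Net Explosive Quantity (NEQ)",
    relation="greater than",
    value="90 g",
)

print(asdict(record))
```

A frozen dataclass keeps each extracted record immutable, which suits storing the records as structured rows once extraction is finished.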

Key words: Commerce Control List; Technical Indicators; Automatic Extraction; Non-supervision
Received: 03 June 2022      Published: 09 November 2022
CLC Number: G353
Fund: National Social Science Fund of China (15ZDB129)
Corresponding Author: Li Guangjian, ORCID: 0000-0002-2897-6246, E-mail: ligj@pku.edu.cn

Cite this article:

Yuan Yue, Pang Na, Li Guangjian. Automatically Extracting Technical Indicators from U.S. Commerce Control List. Data Analysis and Knowledge Discovery, 2023, 7(1): 35-48.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022.0571     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2023/V7/I1/35

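| Author | Extraction target | Dataset | Dataset annotation |
| Rao et al. [10] | Indicator name | Chinese patent texts | Annotated |
| Du et al. [11] | Indicator relationship (common Chinese comparative words such as “差” (worse) and “不如” (not as good as)) | Automobile and electronic product reviews | Annotated |
| Hu et al. [12] | Indicator value | Aviation accident reports | Annotated |
| Xie et al. [14] | Indicator value | Electronic medical record texts | Unannotated |
| Guo et al. [15] | Indicator value, indicator unit, indicator relationship (comparative words for equal to, greater than, less than, etc.) | Corpora from the climate change and astronomy domains | Unannotated |
| Wu et al. [16] | Indicator value, indicator name | Texts from the steel, shipbuilding, and other industries | Unannotated |
| Shi et al. [17] | Indicator name | Texts from the liquid milk domain | Unannotated |
| Kim et al. [18] | Indicator value | News texts reporting on the Nobel Prize | Unannotated |
| Li et al. [4] | Indicator value, indicator name | U.S. Commerce Control List | Unannotated |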
Characteristics of Relevant Studies
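| Element | Content features | Position features | Grammatical features | Examples |
| Indicator value | Mathematical operators, units, and quantifiers | Usually follows a preposition in an item sentence | Linked to the indicator relationship by a syntactic dependency | 0A987.f. Laser aiming devices or laser illuminators “specially designed” for use on firearms, and having an operational wavelength exceeding 400 nm but not exceeding 710 nm. |
| | | | | 2B006.b.2. Linear position feedback units “specially designed” for machine tools and having an overall “accuracy” less (better) than (800 + (600 × L/1,000)) nm (L equals effective length in mm); |
| Indicator relationship | Comparatives, comparative phrases, and range phrases | Usually appears near the indicator value | Linked to the indicator value by a syntactic dependency | 2B116.c. Vibration thrusters (shaker units), with or without associated amplifiers, capable of imparting a force equal to or greater than 50 kN (11,250 lbs.), measured ‘bare table’, and usable in vibration test systems described in 2B116.a; |
| | | | | 2B209.b. Rotor-forming mandrels designed to form cylindrical rotors of inside diameter between 75 mm and 400 mm. |
| Indicator name | Indicator terms of the specialized domain | Usually appears in the same item sentence as the indicator value and relationship, at a relatively early position | Linked to the indicator value or relationship by a syntactic dependency | 2A991.a.1. Manufactured for use at operating temperatures above 573 K (300 °C) either by using special materials or by special heat treatment; or |
| | | | | 2B005.b. Ion implantation “production equipment” having beam currents of 5 mA or more; |
| Indicator object | Nouns denoting technologies, products, and their components | Usually appears at the beginning of the controlled item text; sometimes appears only in the text of a higher-level item under the same ECCN | Linked to the indicator value, relationship, and name by syntactic dependencies (possibly across item sentences) | 9A990.a. Diesel engines, n.e.s., for trucks, tractors, and automotive applications of continuous brake horsepower of 400 BHP (298 kW) or greater (performance based on SAE J1349 standard conditions of 100 Kpa and 25°) |
| | | | | 1A008.a. ‘Shaped charges’ having all of the following: |
| | | | | 1A008.a.1. Net Explosive Quantity (NEQ) greater than 90 g; and |
| | | | | 1A008.a.2. Outer casing diameter equal to or greater than 75 mm; |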
The Expression of Four Elements in CCL
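The object row of the table notes that an indicator object may sit in a higher-level item under the same ECCN, so item sentences need to be keyed by their item identifiers. A minimal sketch, assuming each item starts on its own line with its identifier followed by a period, as in the examples above; the pattern `ITEM_LINE` and both helper names are hypothetical and are not the paper's preprocessing code.

```python
import re
from typing import Optional

# Hypothetical pattern for a CCL item line: an ECCN such as "1A008" (category
# digit, product group A-E, three digits), optional paragraph labels such as
# ".a" or ".a.1", a trailing period, then the item sentence.
ITEM_LINE = re.compile(r"^(?P<item>\d[A-E]\d{3}(?:\.[a-z0-9]+)*)\.\s+(?P<text>.+)$")


def parse_item_line(line: str) -> Optional[tuple[str, str]]:
    """Split a CCL item line into (item identifier, item sentence)."""
    match = ITEM_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("item"), match.group("text")


def parent_item(item_id: str) -> Optional[str]:
    """Return the higher-level item under the same ECCN, or None for a bare ECCN."""
    head, sep, _ = item_id.rpartition(".")
    return head if sep else None


item_id, text = parse_item_line("1A008.a.1. Net Explosive Quantity (NEQ) greater than 90 g; and")
print(item_id, "->", parent_item(item_id))
```

Here `parse_item_line` returns `("1A008.a.1", "Net Explosive Quantity (NEQ) greater than 90 g; and")`, and the parent of `1A008.a.1` is `1A008.a`, the item that actually names the object.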
Technical Indicator Extraction Method of Controlled Items
Example of Indicator Value Extraction
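Indicator values are characterized above by mathematical operators, units, and quantifiers, and the references include a dictionary of units of measurement [19]. The sketch below illustrates only the number-plus-unit part with a regular expression; the short `UNITS` list is a hypothetical stand-in for a full unit dictionary, and this is not the authors' rule set.

```python
import re

# Hypothetical stand-in for a unit dictionary; a real system would load far more units.
UNITS = ["nm", "mm", "kN", "lbs.", "mA", "kW", "BHP", "Kpa", "K", "°C", "g"]

# Longest units first, so the longer alternative wins when units share a
# prefix (e.g. "Kpa" vs "K").
_unit_alt = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number (optionally with thousands separators or decimals), optional space,
# then a unit that is not immediately followed by another letter.
VALUE = re.compile(
    rf"(?P<number>\d{{1,3}}(?:,\d{{3}})+|\d+(?:\.\d+)?)\s*(?P<unit>{_unit_alt})(?![A-Za-z])"
)


def extract_values(sentence: str) -> list[tuple[str, str]]:
    """Return (number, unit) pairs found in an item sentence, in textual order."""
    return [(m.group("number"), m.group("unit")) for m in VALUE.finditer(sentence)]


print(extract_values("capable of imparting a force equal to or greater than 50 kN (11,250 lbs.)"))
```

This prints `[('50', 'kN'), ('11,250', 'lbs.')]`. A formula-valued threshold such as the one in 2B006.b.2 is not captured by this simple pattern.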
Example of Indicator Relationship Extraction
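Indicator relationships appear as comparatives, comparative phrases, and range phrases near the value. A lexicon lookup is the simplest way to illustrate this; the phrase list and the normalized labels below are assumptions drawn from the CCL examples quoted in this article, not the paper's own lexicon.

```python
import re

# Hypothetical lexicon of comparison phrases paired with a normalized label.
# At each position the alternation tries patterns in list order, so longer
# phrases come before the shorter phrases they contain
# (e.g. "not exceeding" before "exceeding").
RELATION_PATTERNS = [
    (r"equal to or greater than", ">="),
    (r"equal to or less than", "<="),
    (r"not exceeding", "<="),
    (r"greater than", ">"),
    (r"less(?:\s*\([^)]*\))?\s+than", "<"),  # also covers "less (better) than"
    (r"exceeding", ">"),
    (r"above", ">"),
    (r"below", "<"),
    (r"between", "range"),
    (r"or more", ">="),
    (r"or greater", ">="),
]

# One named group per pattern, so the group that matched identifies the label.
_RELATION = re.compile(
    "|".join(rf"(?P<r{i}>\b{pat}\b)" for i, (pat, _) in enumerate(RELATION_PATTERNS)),
    re.IGNORECASE,
)
_LABELS = {f"r{i}": label for i, (_, label) in enumerate(RELATION_PATTERNS)}


def extract_relations(sentence: str) -> list[tuple[str, str]]:
    """Return (matched phrase, normalized relation) pairs in textual order."""
    return [(m.group(0), _LABELS[m.lastgroup]) for m in _RELATION.finditer(sentence)]


print(extract_relations("having an operational wavelength exceeding 400 nm but not exceeding 710 nm."))
```

For the 0A987.f wording this yields `[('exceeding', '>'), ('not exceeding', '<=')]`, i.e. the lower and upper bounds of a wavelength range.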
Example of Indicator Name Extraction
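The table above ties indicator names to the value or relationship through syntactic dependencies. As a much cruder stand-in that needs no parser, the sketch below takes, for each relation phrase, the words in front of it within the same clause, starting after the last cue word such as “having”, “of”, or an article. The cue list, the simplified relation pattern, and the function name are hypothetical; when the text before a relation is only the lower bound of a range (as in “exceeding 400 nm but not exceeding 710 nm”), the previous name is reused.

```python
import re

# Simplified, hypothetical relation pattern (a subset of CCL comparison phrases).
RELATION = re.compile(
    r"\b(?:equal to or greater than|equal to or less than|not exceeding|greater than|"
    r"less(?:\s*\([^)]*\))?\s+than|exceeding|above|below|between)\b",
    re.IGNORECASE,
)

# Cue words after which an indicator name is assumed to start (heuristic).
CUES = {"having", "with", "of", "at", "a", "an", "the"}
QUOTES = "\"'“”‘’"


def guess_indicator_names(sentence: str) -> list[str]:
    """Guess one indicator name for each relation phrase in an item sentence."""
    names: list[str] = []
    previous_end = 0
    for match in RELATION.finditer(sentence):
        # Text between the previous relation (or sentence start) and this one,
        # restricted to the current clause.
        segment = sentence[previous_end:match.start()]
        clause = re.split(r"[,;:]", segment)[-1]
        words = [w.strip(QUOTES) for w in clause.split()]
        words = [w for w in words if w]
        cue_positions = [i for i, w in enumerate(words) if w.lower() in CUES]
        start = cue_positions[-1] + 1 if cue_positions else 0
        candidate = " ".join(words[start:])
        previous_end = match.end()
        if candidate and not re.search(r"\d", candidate):
            names.append(candidate)
        elif names:
            # E.g. "... 400 nm but not exceeding": the upper bound of a range
            # shares the name found for the lower bound.
            names.append(names[-1])
    return names


print(guess_indicator_names(
    "Laser aiming devices or laser illuminators “specially designed” for use on "
    "firearms, and having an operational wavelength exceeding 400 nm but not exceeding 710 nm."
))
```

This prints `['operational wavelength', 'operational wavelength']`. A relation that follows its value, such as “5 mA or more”, is outside what this heuristic handles.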
Example of Indicator Object Extraction
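For indicator objects, the table notes that the object usually opens the item text but may have to be taken from a higher-level item under the same ECCN, as in 1A008.a.1, whose object “Shaped charges” is stated only in 1A008.a. A minimal heuristic sketch, assuming a sub-item inherits its parent's object whenever the parent text ends with a colon (as in “having all of the following:”); the stop-word list and helper names are hypothetical, whereas the paper's features are based on syntactic dependencies.

```python
from typing import Optional

# Words assumed to end the leading noun phrase of an item sentence (heuristic).
STOP_WORDS = {"having", "with", "for", "designed", "specially", "capable",
              "usable", "manufactured", "that", "which"}
QUOTES = "\"'“”‘’"


def parent_item(item_id: str) -> Optional[str]:
    """Return the higher-level item under the same ECCN, or None for a bare ECCN."""
    head, sep, _ = item_id.rpartition(".")
    return head if sep else None


def leading_noun_phrase(text: str) -> str:
    """Take the words before the first comma or stop word as the object candidate."""
    words = []
    for raw in text.split(",")[0].split():
        word = raw.strip(QUOTES)
        if word.lower() in STOP_WORDS:
            break
        if word:
            words.append(word)
    return " ".join(words)


def resolve_object(item_id: str, items: dict[str, str]) -> Optional[str]:
    """Find the indicator object, climbing to the parent item when needed."""
    parent = parent_item(item_id)
    # A parent that ends with ":" introduces a list whose entries share its object.
    if parent in items and items[parent].rstrip().endswith(":"):
        return resolve_object(parent, items)
    candidate = leading_noun_phrase(items[item_id])
    if candidate:
        return candidate
    return resolve_object(parent, items) if parent in items else None


items = {
    "1A008.a": "‘Shaped charges’ having all of the following:",
    "1A008.a.1": "Net Explosive Quantity (NEQ) greater than 90 g; and",
    "1A008.a.2": "Outer casing diameter equal to or greater than 75 mm;",
    "9A990.a": "Diesel engines, n.e.s., for trucks, tractors, and automotive applications",
}
print(resolve_object("1A008.a.1", items), "|", resolve_object("9A990.a", items))
```

This prints `Shaped charges | Diesel engines`: the two 1A008.a sub-items inherit their object across item sentences, while 9A990.a states its own.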
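| Statistic | Count |
| Number of domains in the annotated data | 10 |
| Number of groups in the annotated data | 5 |
| Number of item sentences in the annotated data | 991 |
| Indicator values | 1,121 |
| Indicator relationships | 1,123 |
| Indicator names | 882 |
| Indicator objects | 567 |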
Statistics of Experimental Dataset Information
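| Element | Domains covered | Groups covered | Item sentences involved |
| Indicator value | 10 | 5 | 991 |
| Indicator relationship | 10 | 5 | 964 |
| Indicator name | 10 | 5 | 870 |
| Indicator object | 10 | 5 | 567 |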
Statistics of the Four Indicator Elements
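| Extraction target | Precision (%) | Recall (%) | F1 (%) |
| Indicator value | 98.14 | 98.66 | 98.40 |
| Indicator relationship | 99.38 | 99.73 | 99.69 |
| Indicator name | 72.73 | 88.89 | 80.00 |
| Indicator object | 93.75 | 74.07 | 82.76 |
| Overall | 90.74 | 92.88 | 91.80 |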
Experimental Results of Indicator Extraction
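Each F1 value in the table is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The short check below recomputes F1 for several rows from their reported precision and recall; `f1_score` is an illustrative helper, not code from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported precision and recall (%) for four rows of the table.
rows = {
    "Indicator value": (98.14, 98.66),
    "Indicator name": (72.73, 88.89),
    "Indicator object": (93.75, 74.07),
    "Overall": (90.74, 92.88),
}
for label, (p, r) in rows.items():
    print(f"{label}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals it prints 98.40, 80.00, 82.76, and 91.80, matching the reported F1 values for those rows.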
Example of Syntactic Dependency Analysis
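| Study | Precision (%) | Recall (%) | F1 (%) |
| Xie et al. [14] | ↓0.56 | ↑0.37 | - |
| Guo et al. [15] | ↓6.73 | ↓12.12 | ↓9.45 |
| Wu et al. [16] | ↓5.24 | ↓13.70 | ↓9.58 |
| Shi et al. [17] | ↓24.74 | ↓24.13 | ↓24.46 |
| Kim et al. [18] | ↓20.24 | ↓50.88 | ↓39.20 |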
Results of Relevant Unsupervised Studies
| Extraction method | Precision (%) | Recall (%) | F1 (%) |
| Proposed method | 87.34 | 85.71 | 86.52 |
| Manual annotation | 100 | 100 | 100 |
Experimental Results of Four-Element Extraction
Example of Syntactic Dependency Analysis and Syntax Tree Error Recognition Cases
[1] Commerce Control List (CCL)[EB/OL]. [2022-07-05]. https://www.bis.doc.gov/index.php/regulations/commerce-control-list-ccl.
[2] Zheng Yanning, Deng Bo. Analysis of the Application of Information Extraction Technology in Information Science[J]. Information Studies: Theory & Application, 2008, 31(5): 769-772.
[3] Wu Chao, Zheng Yanning, Hua Bolin. Numerical Information Extraction: A Review of Research[J]. Journal of Library Science in China, 2014, 40(2): 107-119.
[4] Li Guangjian, Wang Kai, Zhang Qingzhi. Analysis Framework Based on Multi-Source Data for US Export Control: An Empirical Study[J]. Data Analysis and Knowledge Discovery, 2020, 4(9): 26-40.
[5] Song Rui, Lin Hongfei, Chang Fuyang. Chinese Comparative Sentences Identification and Comparative Relations Extraction[J]. Journal of Chinese Information Processing, 2009, 23(2): 102-107.
[6] Wang Y, Wang L, Rastegar-Mojarad M, et al. Clinical Information Extraction Applications: A Literature Review[J]. Journal of Biomedical Informatics, 2018, 77: 34-49. PII: S1532-0464(17)30256-3; PMID: 29162496.
[7] Zhou P, El-Gohary N. Ontology-Based Automated Information Extraction from Building Energy Conservation Codes[J]. Automation in Construction, 2017, 74: 103-117. DOI: 10.1016/j.autcon.2016.09.004.
[8] Wanichayapong N, Pruthipunyaskul W, Pattara-Atikom W, et al. Social-Based Traffic Information Extraction and Classification[C]// Proceedings of 2011 11th International Conference on ITS Telecommunications. 2011: 107-112.
[9] Tang Xiaobo, Tan Mingliang, Hu Xiaoran, et al. A Review of Financial Decision-Making Support-Oriented Knowledge Acquisition[J]. Journal of Information Resource Management, 2020, 10(3): 27-35.
[10] Rao Qi, Wang Peiyan, Zhang Guiping. Text Feature Analysis on SAO Structure Extraction from Chinese Patent Literatures[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2015, 51(2): 349-356.
[11] Du Wentao, Liu Peiyu, Fei Shaodong, et al. Chinese Comparative Sentences Recognition Based on Associated Feature Vocabulary[J]. Journal of Computer Applications, 2013, 33(6): 1591-1594. DOI: 10.3724/SP.J.1087.2013.01591.
[12] Hu X, Wu J, He J R. Textual Indicator Extraction from Aviation Accident Reports[C]// Proceedings of AIAA Aviation 2019 Forum. 2019. DOI: 10.2514/6.2019-2939.
[13] Li Chunjie, Ma Jianling, Zhu Xuemei. A Overview of Numerical Information Extraction Research and Application Analysis[J]. Information Science, 2019, 37(2): 40-45, 124.
[14] Xie Weijia, Wang Yingtao. Research on the Extraction of Laboratory Data and Information in the Electronic Medical Records System[J]. China Digital Medicine, 2015, 10(3): 69-70, 96.
[15] Guo Shaoqing, Le Xiaoqiu. Identifying Actual Value of Numerical Indicator from Scientific Paper[J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 21-28.
[16] Wu Sheng, Liu Maofu, Hu Huijun, et al. Unsupervised Extraction of Attribute-Value Entity Relation from Chinese Texts[J]. Journal of Wuhan University (Natural Science Edition), 2016, 62(6): 552-560.
[17] Shi Gongze, Wang Haochang. Ontology Concept Extraction for Product Indicators Based on Double Strategy Combination[J]. Information Technology, 2017(3): 26-29, 33.
[18] Kim S, Jeong M, Lee G G. A Local Tree Alignment Approach to Relation Extraction of Multiple Arguments[J]. Information Processing & Management, 2011, 47(4): 593-605. DOI: 10.1016/j.ipm.2010.12.002.
[19] ibiblio. A Dictionary of Units of Measurement[EB/OL]. [2022-05-18]. http://www.ibiblio.org/units.
[20] 15 CFR Supplement No. 1 to Part 774 - The Commerce Control List[EB/OL]. [2022-07-05]. https://www.govinfo.gov/content/pkg/CFR-2020-title15-vol2/pdf/CFR-2020-title15-vol2-part774-appNo-.pdf.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn