Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (9): 26-40    DOI: 10.11925/infotech.2096-3467.2020.0645
Analysis Framework Based on Multi-Source Data for US Export Control: An Empirical Study
Li Guangjian, Wang Kai, Zhang Qingzhi
Department of Information Management, Peking University, Beijing 100871, China
Abstract  

[Objective] This paper proposes a fine-grained, multi-dimensional analysis framework based on multi-source data and in-depth semantic content, aiming to address deficiencies in the analysis of U.S. export controls. [Methods] We constructed the framework on the concept of multi-source data fusion, integrating data from the Commerce Control List (CCL) for items, the Export Administration Regulations (EAR) for regulations, the blacklist for entities, and the Federal Register for policies. First, we identified technical terms, exact technical indicator values, and the relationships between controlled items from the multi-source data. Second, we built an index using a semantic dictionary and a semantic model. Third, we used named entity recognition to link the controlled items to the controlled entities. The framework supports four analysis modes: status quo, specific items, time sequences, and countries. [Results] We examined the effectiveness of the framework with an empirical study on lithography. The recall for recognizing controlled items reached 97.3% when ECCNs sharing the same tail number were also recognized. The precision of recognizing the domains of Chinese mainland entities reached 83.8%. [Limitations] Only lithography was selected for the empirical study, and the framework could be further improved. [Conclusions] The proposed framework provides an effective method for analyzing the texts of U.S. export control documents.

Key words: Multi-Source Data Fusion; Export Control; Commerce Control List; Multi-Dimensional Analysis Framework
Received: 03 July 2020      Published: 22 July 2020
ZTFLH:  TP391  
Corresponding Authors: Li Guangjian     E-mail: ligj@pku.edu.cn

Cite this article:

Li Guangjian, Wang Kai, Zhang Qingzhi. Analysis Framework Based on Multi-Source Data for US Export Control: An Empirical Study. Data Analysis and Knowledge Discovery, 2020, 4(9): 26-40.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0645     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I9/26

A Multi-Dimensional Analysis Framework Based on Multi-Source Data Fusion for Export Control Analysis
The Structure and Available Content of the Commerce Control List
The Structure and Available Content of the Entity List and the Unverified List
The Structure and Available Content of the EAR
The Structure and Available Content of the Federal Register
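Document type | Original text and extractable noun-term entities (bold in the original) | Purpose
Commerce Control List | Power generating or propulsion equipment specially designed | Determine the specific controlled items
Entity List and other "blacklist" data | Beijing Aeronautical Manufacturing Technology Research Institute | Identify the entity's domain, geographic location, and related information
Export Administration Regulations | UNSC Resolutions 707 and 687 require that Iraq eliminate its nuclear weapons program and restrict its nuclear activities to the use of isotopes for medical | Identify the specific domains, products, countries, resolutions, etc. that the document involves
Federal Register | or entering nuclear power plants - unless the license or card is issued by a State that meets the requirements set forth in the Act | Identify the specific domains, products, countries, etc. that the document involves
Samples of Noun-Entity Extraction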
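Matched part-of-speech rule | Part of speech after merging
NNP + NNP (proper noun + proper noun) | NNP (proper noun)
NN(S) + NN(S) (common noun + common noun) | NNI (noun compound)
NNI + NN (noun compound + common noun) | NNI (noun compound)
JJ + NN (adjective or ordinal + common noun) | NNI (noun compound)
The Part-of-Speech Merging Rules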
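The merging rules in the table above can be read as a rewrite step over a part-of-speech-tagged token sequence. The following is a minimal Python sketch, not the authors' implementation: it assumes the tokens are already tagged, that rules are applied greedily from left to right, and that NN(S) stands for both NN and NNS. The names `MERGE_RULES` and `merge_tagged_tokens` and the hand-assigned tags in the example are illustrative assumptions.

```python
# Merging rules transcribed from the table: (left tag, right tag) -> merged tag.
# "NNI" is the table's own label for a merged noun compound.
# Assumption: "NN(S)" in the table covers both NN and NNS.
MERGE_RULES = {
    ("NNP", "NNP"): "NNP",
    ("NN", "NN"): "NNI",
    ("NN", "NNS"): "NNI",
    ("NNS", "NN"): "NNI",
    ("NNS", "NNS"): "NNI",
    ("NNI", "NN"): "NNI",
    ("JJ", "NN"): "NNI",
}


def merge_tagged_tokens(tagged):
    """Greedily merge adjacent (text, tag) pairs from left to right."""
    merged = []
    for text, tag in tagged:
        if merged:
            prev_text, prev_tag = merged[-1]
            new_tag = MERGE_RULES.get((prev_tag, tag))
            if new_tag is not None:
                # Fold the current token into the previous span.
                merged[-1] = (prev_text + " " + text, new_tag)
                continue
        merged.append((text, tag))
    return merged


# Hand-tagged example (the tags are assumptions, not tagger output).
sample = [("second-layer", "JJ"), ("overlay", "NN"), ("error", "NN")]
print(merge_tagged_tokens(sample))
# [('second-layer overlay error', 'NNI')]
```

Here JJ + NN first yields an NNI span, and the NNI + NN rule then absorbs the following noun, so a three-token phrase collapses into a single candidate term.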
Noun Entity Recognition Process and Results
Document type | Original text and extractable numerical entities (bold in the original)
Commerce Control List | A second-layer overlay error of less than 23 nm on the mask
Export Administration Regulations | Test kits containing no more than 300 grams of any chemical
Federal Register | the technology is maturing, and is expected to be widely used at the 45nm technology node
Samples of Technical Indicator Value Extraction
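The three sample sentences above suggest that an indicator value consists of an optional comparator, a number, and a unit. The following is a minimal Python sketch that assumes a regular expression is enough for these samples; the extraction procedure actually used in the paper is not given here. The pattern, the short unit list, and the function name `extract_indicator_values` are illustrative assumptions.

```python
import re

# Assumption: only the comparators and units seen in the three samples
# (plus a few obvious variants) are covered; a real list would be longer.
VALUE_PATTERN = re.compile(
    r"(?:(?P<comparator>less than|no more than|more than|greater than)\s+)?"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>nm|grams?)\b",
    re.IGNORECASE,
)


def extract_indicator_values(text):
    """Return (comparator or None, numeric value, unit) for each match."""
    results = []
    for match in VALUE_PATTERN.finditer(text):
        results.append(
            (match.group("comparator"), float(match.group("value")), match.group("unit"))
        )
    return results


print(extract_indicator_values("A second-layer overlay error of less than 23 nm on the mask"))
# [('less than', 23.0, 'nm')]
```

Keeping the comparator with the number matters because "less than 23 nm" and "23 nm" describe different control thresholds.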
Technical Indicator Recognition Process and Results
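Relationship type | Cue words | Meaning | Example
Inclusion | controlled | The controlled scope includes the scope of the related controlled item | refurbishing of commodities controlled by ECCN 0A604 or for bombs
Extension | not controlled, except | The controlled scope does not include the related controlled item | Smoke hand grenades and stun hand grenades not controlled by ECCN 1A984
See also | None of the cue words controlled, except, or not controlled | The related controlled item needs to be consulted | 0A018: See ECCN 0A919 for foreign-made military commodities
Types of Relationships Between the Controlled Items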
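The cue words in the table above can be turned into a small rule-based classifier. The following is a minimal Python sketch under stated assumptions: an ECCN is matched as one digit, one letter from A to E, and three digits; "not controlled" and "except" are checked before "controlled"; and any remaining sentence that mentions an ECCN counts as "see also". The names `ECCN_PATTERN` and `classify_relation` are illustrative and do not come from the paper.

```python
import re

# Assumption: ECCNs look like 0A604 or 3B001 (digit, letter A-E, three digits).
ECCN_PATTERN = re.compile(r"\b\d[A-E]\d{3}\b")


def classify_relation(sentence):
    """Return (relation type or None, list of ECCNs mentioned in the sentence)."""
    eccns = ECCN_PATTERN.findall(sentence)
    if not eccns:
        return None, []
    lowered = sentence.lower()
    # Check the negative cues first, because "not controlled" contains "controlled".
    if "not controlled" in lowered or re.search(r"\bexcept\b", lowered):
        return "extension", eccns
    if "controlled" in lowered:
        return "inclusion", eccns
    # No cue word at all: treat the mention as a cross-reference.
    return "see also", eccns


print(classify_relation("Smoke hand grenades and stun hand grenades not controlled by ECCN 1A984"))
# ('extension', ['1A984'])
```

In an entry such as "0A018: See ECCN 0A919 ...", the returned list also contains the entry's own number (0A018), so a caller would need to drop the source ECCN before recording the link.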
The Process and Results of Recognizing the Correlative Relationship
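Controlled ECCN | Content | Relevance
3C992 | Resist materials for lithography machines | 0.8997
3C002 | Resist materials for lithography machines | 0.5579
3B001 | Equipment for manufacturing semiconductors (lithography machines) | 0.5431
3B991 | Equipment for manufacturing semiconductors (lithography machines) | 0.5195
The Results of Recognizing Semantic Relationships Between the Controlled ECCNs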
The Method to Construct the Relationship Between the Controlled Items and the Controlled Entities
The Process and Results of Recognizing the Entity
The Heat Map of Changes in the Controlled Lithography Items in the CCL
Year | CCL, manual recognition | CCL, proposed method | MRF, manual recognition | MRF, proposed method
1997 | 370 | 370 | / | /
1998 | 370 | 370 | / | /
1999 | 370 | 370 | 700 | 700
2000 | 350 | 350 | / | /
2001 | 350 | 350 | 500 | 500
2002 | 350 | 350 | 500 | 500
2003 | 350 | 350 | 500 | 500
2004 | 350 | 350 | 350 | 350
2005 | 350 | 350 | / | /
2006 | 350 | 350 | 180 | 180
2007 | 350 | 350 | 180 | 180
2008 | 245 | 245 | 180 | 180
2009 | 245 | 245 | 180 | 180
2010 | 245 | 245 | 180 | 180
2011 | 245 | 245 | 180 | 180
2012 | 245 | 245 | 95 | 95
2013 | 245 | 245 | 95 | 95
2014 | <245, ≥15; <15, >1 | 245, 15; 15, 1 | 95 | 95
2015 | <245, ≥15; <15, >1 | 245, 15; 15, 1 | 95 | 95
2016 | <245, ≥15; <15, >1 | 245, 15; 15, 1 | 45 | 45
2017 | <245, ≥15; <15, >1 | 245, 15; 15, 1 | 45 | 45
2018 | <245, ≥15; <15, >1 | 245, 15; 15, 1 | 45 | 45
2019 | <193, ≥15; <15, >1 | 193, 15; 15, 1 | 45 | 45
Numerical Indicators Identified by Manual Recognition and by the Algorithm Proposed in This Paper (Unit: nm)
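Recognition method | Controlled items identified
Noun-entity matching / semantic index | 3B001, 3B991, 3C002, 3C992
Correlative relationship recognition: inclusion | 3A001, 3A991, 3C001
Correlative relationship recognition: extension | 3C003, 3C004, 3C005
Lithography-Related Controls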
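Recognition method | Number of categories | Number of items | Recall by category | Recall by item
Manual recognition | 15 | 291 | 100% | 100%
Machine recognition, same tail number not recognized | 10 | 279 | 66.7% | 95.9%
Machine recognition, same tail number recognized | 14 | 283 | 93.3% | 97.3%
The Recall for Controlled Items by Different Recognition Methods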
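The recall figures in the table above follow directly from the counts, taking manual recognition (15 categories, 291 items) as the reference set. The following short Python check of the arithmetic uses counts copied from the table; the helper name `recall` is an illustrative assumption.

```python
def recall(found, total):
    """Recall = items found by the machine / items found by manual recognition."""
    return found / total


# Counts taken from the table; manual recognition is the reference.
TOTAL_CATEGORIES, TOTAL_ITEMS = 15, 291

rows = {
    "same tail number not recognized": (10, 279),
    "same tail number recognized": (14, 283),
}

for label, (categories, items) in rows.items():
    by_category = recall(categories, TOTAL_CATEGORIES)
    by_item = recall(items, TOTAL_ITEMS)
    print(f"{label}: {by_category:.1%} by category, {by_item:.1%} by item")
# same tail number not recognized: 66.7% by category, 95.9% by item
# same tail number recognized: 93.3% by category, 97.3% by item
```

Recognizing ECCNs with the same tail number recovers four more categories but only four more items, which is why recall by category rises far more than recall by item.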
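Scope | Number of entities | Number with recognizable domain words | Recognizable rate | Recognition precision
All | 1,108 | 791 | 71.4% | 31.7%
Chinese mainland | 101 | 68 | 67.3% | 83.8%
The Precision of Recognizing the Entity Domain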