Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (6): 1-14    DOI: 10.11925/infotech.2096-3467.2019.1145
CLOpin: A Cross-Lingual Knowledge Graph Framework for Public Opinion Analysis and Early Warning
Liang Ye1,2, Li Xiaoyuan3, Xu Hang2, Hu Yiran2
1Artificial Intelligence and Human Languages Lab, Beijing Foreign Studies University, Beijing 100089, China
2School of Information Science and Technology, Beijing Foreign Studies University, Beijing 100089, China
3School of Asian Studies, Beijing Foreign Studies University, Beijing 100089, China
[Objective] This paper explores how information maps across different languages, aiming to monitor public opinion around the world effectively and to guide domestic audiences. [Methods] We propose CLOpin, a cross-lingual knowledge graph framework for public opinion analysis and early warning. The framework provides several toolsets for processing cross-lingual data sets in different scenarios. CLOpin integrates data from various sources and constructs a knowledge graph to support cross-lingual public opinion analysis and early warning. [Results] Within the first hour after breaking news, the knowledge integrity of our model was 13.9% higher than that of single-language knowledge graph models. Its first-hour knowledge integrity was 5.2% lower than that reached by the latter after 24 hours. [Limitations] The construction of our model was constrained by the scarcity of domain experts, which is the bottleneck for building knowledge graphs in less commonly used languages. [Conclusions] The CLOpin framework helps us grasp public opinion accurately and issue early warnings accordingly.

Key words: Cross-Lingual; Knowledge Graph; Public Opinion Analysis; Early Warning; Machine Learning
Received: 18 October 2019      Published: 07 July 2020
Xu Hang

Liang Ye, Li Xiaoyuan, Xu Hang, Hu Yiran. CLOpin: A Cross-Lingual Knowledge Graph Framework for Public Opinion Analysis and Early Warning. Data Analysis and Knowledge Discovery, 2020, 4(6): 1-14.

Relationship Among the Three Knowledge Graphs
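Framework | Input data sources | Combines expert toolsets with machine learning and deep learning methods | Output
CLOpin | ① RDF data converted from structured instances (English and non-English) | |
XLore | Unstructured data | | CKG, IKG
WikiCiKE | Structured data | | CKG
ConceptNet5.5 | Structured data, unstructured data, and expert prior knowledge | | CKG
DBpedia NIF | Unstructured data | | A corpus
EventKG | Structured data, unstructured data | | CKG
Body-Mind-Language | Europarl corpus | | CKG
CrossOIE | Structured data | | A classifier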
Frameworks of Multi-Lingual Knowledge Graph
Framework of CLOpin
Generation of CUOL
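Concept identifier | Lexical identifier | String identifier | Word-source identifier
 | | S0008563: 特朗普 (Sino-Tibetan language family) | A0008123: 特朗普 (Chinese)
 | | S0008548: K?p (Undetermined) | A0009306: K?p (Vietnamese)
 | | S0008521: (Undetermined) | A0008966: (Lao)
 | | S0005623: Trump (Indo-European language family) | A0001452: Trump (English)
 | | S0004578: Trompete (Indo-European language family) | A0007896: Trompete (Portuguese)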
Conceptual Characteristics
Concept term | Chinese gloss | Unique identifier
terrorist attack | 恐怖袭击 | C0008532
blast | 爆炸 | C0008745
casualties | 受害者 | C0005241
Expert Corpus Samples in the Process of Fusion
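Input text | Extracted concepts | New words
恐怖分子承认了这一行动,受害者人数可能会增加。爆炸对周围的商店造成了巨大的破坏。 (The terrorists claimed responsibility for the operation, and the number of victims may rise. The blast caused massive damage to the surrounding shops.) | 1. 恐怖分子: Concept: [C0005622] terrorist; 2. 爆炸: Concept: [C0008745] blast; 3. 受害者: Concept: [C0005241] casualties | 恐怖分子: Concept: [C0005622]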
Examples of Word Discovery
Concept and Relationship Fusion Subsystem
CUI | String | Source
C0005896 | 特朗普 | Chinese-language media
C0005896 | Trump | English-language media
Conceptual Fusion Results
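The fusion result above attaches surface strings from different language sources to a single concept identifier (CUI). A minimal sketch of that idea follows, assuming the fused records are available as simple (CUI, string, source) tuples. The function names `fuse_by_cui` and `build_lookup` are hypothetical and only illustrate the data shape; they are not taken from the CLOpin implementation.

```python
from collections import defaultdict

# Rows mirror the fusion table: one CUI shared by strings from two sources.
records = [
    ("C0005896", "特朗普", "Chinese-language media"),
    ("C0005896", "Trump", "English-language media"),
]

def fuse_by_cui(rows):
    """Group every (string, source) pair under the CUI it belongs to."""
    fused = defaultdict(list)
    for cui, string, source in rows:
        fused[cui].append((string, source))
    return dict(fused)

def build_lookup(rows):
    """Map each surface string back to its CUI for cross-lingual lookup."""
    return {string: cui for cui, string, _ in rows}

fused = fuse_by_cui(records)
lookup = build_lookup(records)

# Both surface forms resolve to the same concept.
print(lookup["特朗普"] == lookup["Trump"])
print(fused["C0005896"])
```

With such a lookup, a mention in either language can be resolved to the same node of the knowledge graph before relations are merged.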
Construction Process of IKG
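Pattern type | Pattern-based extraction rule
Time of the event | 情况出现在**** (The situation occurred at ****)
Consequence of the event | 本次事件造成**** (This event caused ****)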
Rule Base in Entity Extraction
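The two rules in this rule base can be read as trigger phrases followed by a wildcard slot. A minimal sketch with regular expressions is given below. It assumes that the `****` wildcard runs up to the next clause-ending punctuation mark, which the source does not specify. The slot names, the function `extract_slots`, and the sample sentence are hypothetical.

```python
import re

# Assumption: the "****" wildcard captures text up to the next
# clause-ending punctuation mark (full-width or ASCII).
SLOT = r"([^,。;!?,.;!?]+)"

# One compiled pattern per rule in the table.
RULES = {
    "event_time": re.compile("情况出现在" + SLOT),
    "event_consequence": re.compile("本次事件造成" + SLOT),
}

def extract_slots(text):
    """Return {pattern_type: [slot fillers]} for every rule that fires."""
    found = {}
    for pattern_type, regex in RULES.items():
        fillers = [m.strip() for m in regex.findall(text)]
        if fillers:
            found[pattern_type] = fillers
    return found

# Hypothetical input sentence, not taken from the paper's corpus.
sample = "情况出现在周五晚间。本次事件造成多人受伤,救援仍在进行。"
print(extract_slots(sample))
# {'event_time': ['周五晚间'], 'event_consequence': ['多人受伤']}
```

Keeping the rules in a dictionary makes it straightforward to add further pattern types without touching the matching logic.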
Entity and Relationship Fusion Process
Process of Clustering Using Canopy+K-means
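The caption names a Canopy + K-means pipeline without giving its parameters. A minimal sketch of that general combination is shown below: a cheap canopy pass decides how many clusters there are and where to start, and K-means then refines those centers. The 2-D points, the thresholds `t1` and `t2`, and the Euclidean distance are illustrative assumptions, not values from the paper, where the inputs would presumably be feature vectors of entities or news items.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def canopy_centers(points, t1, t2):
    """Canopy pass: pick initial centers using loose (t1) and tight (t2) radii."""
    if not t1 > t2 > 0:
        raise ValueError("expected t1 > t2 > 0")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Every point within t1 of the seed belongs to this canopy.
        members = [p for p in points if math.dist(seed, p) <= t1]
        centers.append(centroid(members))
        # Points within t2 of the seed can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(seed, p) > t2]
    return centers

def kmeans(points, centers, max_iter=100):
    """Standard K-means (Lloyd) iterations started from the given centers."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        updated = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, clusters

# Toy data: two well-separated groups (hypothetical).
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
seeds = canopy_centers(points, t1=3.0, t2=1.5)
centers, clusters = kmeans(points, seeds)
print(len(seeds), [len(c) for c in clusters])  # 2 [3, 3]
```

The appeal of this combination is that the number of clusters does not have to be fixed in advance; it follows from the canopy thresholds.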
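Relation type | Subject | Relation | Object
Relation between two concepts | C0008532 | evades | C0001235 (security check)
Relation between two instances | I0008745 | causes | I0005241 (victims appear)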
Triple Example
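The two example rows can be represented as (subject, relation, object) triples. The sketch below assumes that identifiers starting with `C` denote concepts and those starting with `I` denote instances, which is inferred from the table rather than stated in it. The English relation labels `evades` and `causes` are renderings of 避开 and 导致, and the names `Triple` and `relation_level` are hypothetical.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    relation: str
    obj: str

def relation_level(triple):
    """Classify a triple by the identifier prefixes of its two ends.

    Assumption: 'C' marks a concept identifier, 'I' an instance identifier.
    """
    prefixes = {triple.subject[0], triple.obj[0]}
    if prefixes == {"C"}:
        return "concept-concept"
    if prefixes == {"I"}:
        return "instance-instance"
    return "mixed"

triples = [
    Triple("C0008532", "evades", "C0001235"),  # concept-level relation
    Triple("I0008745", "causes", "I0005241"),  # instance-level relation
]

for t in triples:
    print(t.subject, t.relation, t.obj, "->", relation_level(t))
```

Storing both levels in the same triple shape lets concept-level and instance-level relations be fused and queried with the same code.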
Results of Entity and Relationship Extraction
Results of Cross-Lingual Fusion
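Event ID | Event | Date | Chinese | English | German | Indonesian | Vietnamese
11468 | Indonesia tsunami | 2018/9/30 | 42 | 24 | 9 | 265 | 5
11793 | Saudi journalist dismemberment case | 2018/10/2 | 21 | 33 | 17 | 1 | 2
14854 | French "Yellow Vest" movement | 2018/11/17 | 34 | 18 | 30 | 6 | 4
15298 | Russia's seizure of Ukrainian naval vessels | 2018/11/25 | 15 | 42 | 26 | 2 | 5
17583 | Chang'e-4 lunar far-side exploration | 2019/1/3 | 213 | 8 | 6 | 4 | 3
18820 | US withdrawal from the INF Treaty | 2019/2/1 | 8 | 23 | 13 | 8 | 2
20136 | Terrorist attack in the Somali capital | 2019/3/1 | 11 | 18 | 10 | 0 | 3
21033 | Ethiopian Airlines Boeing crash | 2019/3/10 | 78 | 36 | 19 | 5 | 6
21812 | New Zealand mosque shootings | 2019/3/15 | 39 | 15 | 11 | 1 | 2
23515 | Notre-Dame de Paris fire | 2019/4/15 | 53 | 27 | 23 | 4 | 3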
The Number of News for the Same Event in Several Languages
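Event ID | Event | Information points at 1 hour (mono-lingual average) | Information points at 1 hour (cross-lingual composite) | Information points at 24 hours
11468 | Indonesia tsunami | 26 | 29 | 30
11793 | Saudi journalist dismemberment case | 18 | 22 | 25
14854 | French "Yellow Vest" movement | 12 | 15 | 16
15298 | Russia's seizure of Ukrainian naval vessels | 26 | 28 | 29
17583 | Chang'e-4 lunar far-side exploration | 68 | 69 | 71
18820 | US withdrawal from the INF Treaty | 19 | 21 | 21
20136 | Terrorist attack in the Somali capital | 14 | 17 | 18
21033 | Ethiopian Airlines Boeing crash | 37 | 42 | 45
21812 | New Zealand mosque shootings | 35 | 39 | 40
23515 | Notre-Dame de Paris fire | 42 | 48 | 51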
Fusion Effects of Cross-Lingual and Mono-Lingual Information
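The page does not state how the 13.9% and 5.2% figures in the abstract were derived from this table. One reading that reproduces both numbers is the mean of per-event relative differences, sketched below as an assumption rather than the authors' documented method. The second figure matches when the gap to the 24-hour count is taken relative to the cross-lingual 1-hour count.

```python
# (event id, mono-lingual average @1h, cross-lingual composite @1h, count @24h)
rows = [
    (11468, 26, 29, 30),
    (11793, 18, 22, 25),
    (14854, 12, 15, 16),
    (15298, 26, 28, 29),
    (17583, 68, 69, 71),
    (18820, 19, 21, 21),
    (20136, 14, 17, 18),
    (21033, 37, 42, 45),
    (21812, 35, 39, 40),
    (23515, 42, 48, 51),
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Per-event gain of the cross-lingual graph over the mono-lingual average
# within the first hour, averaged over the ten events.
gain_vs_mono = mean((cross - mono) / mono for _, mono, cross, _ in rows)

# Per-event shortfall of the 1-hour cross-lingual count relative to the
# 24-hour count, expressed relative to the 1-hour cross-lingual count.
gap_to_24h = mean((day - cross) / cross for _, _, cross, day in rows)

print(f"{gain_vs_mono:.1%}")  # 13.9%
print(f"{gap_to_24h:.1%}")    # 5.2%
```

Pooling the counts before dividing gives different values (about 11.1% and 4.8%), so the per-event averaging appears to be the scheme behind the reported figures.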
