Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (2/3): 239-248    DOI: 10.11925/infotech.2096-3467.2019.0550
Real-time Analysis Model for Short Texts with Relationship Graph of Domain Semantics
Tian Zhonglin (1,2), Wu Xu (1,2,3), Xie Xiaqing (1,2), Xu Jin (1,2), Lu Yueming (1,2)
1. School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876, China
3. Beijing University of Posts and Telecommunications Library, Beijing 100876, China
Abstract  

[Objective] This paper studies domain discrimination of public opinion in online communities, aiming to improve the knowledge base and the effectiveness of machine learning models. [Methods] We collected 478,303 texts from multiple online communities for college students. We then built a semantic relationship graph with 5,248 nodes and 16,488 edges, which can also be extended automatically. Finally, we proposed a short-text analysis model to perform domain analysis of the texts. [Results] The F value of the proposed model reached 83.74%, which was 8.56, 5.97 and 4.27 percentage points higher than those of the SVM, NB and CNN methods, respectively. [Limitations] The sample size needs to be expanded and the parameter feedback mechanism needs to be refined. [Conclusions] Compared with machine-learning-based methods, the proposed model is more accurate. It can also perform real-time analysis.

Key words: Semantic Relation Graph; Text Analysis; Semantic Computation
Received: 24 May 2019      Published: 26 April 2020
Chinese Library Classification (ZTFLH): TP391
Corresponding Author: Wu Xu     E-mail: wux@bupt.edu.cn

Cite this article:

Tian Zhonglin, Wu Xu, Xie Xiaqing, Xu Jin, Lu Yueming. Real-time Analysis Model for Short Texts with Relationship Graph of Domain Semantics. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 239-248.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.0550     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I2/3/239

System Architecture
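Word | POS (natural attribute) | English (natural attribute) | Attention index (domain-specific attribute) | Type (domain-specific attribute)
偷窃 | v | steal | 8 | PS (property security)
小偷 | n | thief | 6 | PS (property security)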
Semantic Node Property
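The node properties in the table above can be pictured as a small record type. The following is only a minimal sketch: the class name, field names and helper function are hypothetical and are not taken from the paper; only the two example rows come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticNode:
    """One word in the semantic relationship graph (field names are illustrative)."""
    word: str             # surface form, e.g. "偷窃"
    pos: str              # natural attribute: part of speech
    english: str          # natural attribute: English gloss
    attention_index: int  # domain-specific attribute
    domain_type: str      # domain-specific attribute, e.g. "PS" (property security)


# The two example rows from the table above, keyed by word.
NODES = {
    node.word: node
    for node in (
        SemanticNode("偷窃", "v", "steal", 8, "PS"),
        SemanticNode("小偷", "n", "thief", 6, "PS"),
    )
}


def nodes_of_type(nodes, domain_type):
    """Return the words of one domain type, highest attention index first."""
    matches = [n for n in nodes.values() if n.domain_type == domain_type]
    return [n.word for n in sorted(matches, key=lambda n: -n.attention_index)]


print(nodes_of_type(NODES, "PS"))
```

Keeping the natural and domain-specific attributes on the same record is a simplification made for this sketch; the page does not say how the authors store them.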
Examples of Semantic
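Relation start term | Relation end term | Semantic relation
食堂 (canteen) | location relation
今天中午 (today at noon) | time relation
苍蝇 (fly) | active relation
苍蝇 (fly) | 恶心 (disgusting) | causal relation
恶心 (disgusting) | coordinate relation, synonymy relation
恶心 (disgusting) | coordinate relation, synonymy relation
恶心 (disgusting) | 领导 (leaders) | purpose relation
领导 (leaders) | purpose relation
领导 (leaders) | 管理 (manage) | active relation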
Semantic Relation
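One way to hold such relations is a directed graph whose edges carry relation labels. The sketch below is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, the English labels are translations, edges are assumed to point from the start term to the end term, and only the three table rows that list both terms are loaded.

```python
from collections import defaultdict


class RelationGraph:
    """Directed graph whose edges carry one or more semantic relation labels."""

    def __init__(self):
        # start term -> end term -> set of relation labels
        self._edges = defaultdict(lambda: defaultdict(set))

    def add_relation(self, start, end, relation):
        """Record that `start` is linked to `end` by `relation`."""
        self._edges[start][end].add(relation)

    def relations(self, start, end):
        """Labels on the edge start -> end (empty set if there is no edge)."""
        return set(self._edges.get(start, {}).get(end, set()))

    def successors(self, start):
        """End terms reachable from `start` in one step, sorted for stable output."""
        return sorted(self._edges.get(start, {}))


graph = RelationGraph()
# Rows of the table above that give both a start term and an end term.
graph.add_relation("苍蝇", "恶心", "causal")
graph.add_relation("恶心", "领导", "purpose")
graph.add_relation("领导", "管理", "active")

print(graph.relations("苍蝇", "恶心"))
print(graph.successors("恶心"))
```

Storing a set of labels per edge allows a pair of terms to carry more than one relation, as in the rows that list both a coordinate and a synonymy relation.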
The Process of Building Semantic Diagrams
Part of the Semantic Diagram of University Public Opinion
Data set | Number of domain-related texts | Total number of texts
Training set | 7,350 | 20,000
Test set | 6,840 | 20,000
The Statistics of Experimental Data Set
The Accuracy at Different Thresholds
Method | P (%) | R (%) | F (%)
SVM | 78.23 | 72.35 | 75.18
NB | 76.36 | 79.24 | 77.77
CNN | 80.64 | 78.35 | 79.47
Proposed short-text real-time analysis method | 84.32 | 83.12 | 83.74
The Results of the Accuracy Test
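The page does not state how the F value is defined. Assuming it is the balanced F-measure (the harmonic mean of precision P and recall R), the short check below recomputes it from the P and R columns and derives the gap between the proposed method and each baseline. The function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean) of precision and recall, in the units of the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) in percent, copied from the table above.
REPORTED = {
    "SVM": (78.23, 72.35, 75.18),
    "NB": (76.36, 79.24, 77.77),
    "CNN": (80.64, 78.35, 79.47),
    "proposed": (84.32, 83.12, 83.74),
}

for name, (p, r, f_reported) in REPORTED.items():
    print(f"{name}: computed F = {f_measure(p, r):.2f}, reported F = {f_reported:.2f}")

# Gap between the proposed method and each baseline, in percentage points,
# using the reported F column.
GAPS = {
    name: round(REPORTED["proposed"][2] - REPORTED[name][2], 2)
    for name in ("SVM", "NB", "CNN")
}
print(GAPS)
```

Under this assumption the recomputed values agree with the reported F column to within a few hundredths of a point, and the gaps come out at 8.56, 5.97 and 4.27 percentage points.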
Method | Throughput (texts/s) | Latency (s)
Real-time topic detection (TopicSketch) | 50 | 0.71±0.5
Proposed short-text real-time analysis method | 22 | 1.36±0.4
The Results of the Timeliness Test