[Objective] This study addresses the high dimensionality and sparsity that hamper traditional methods for analyzing large-scale corpora. [Methods] First, we used co-occurrence probability to measure the mutual information between words and extracted the word combinations whose values exceeded a threshold. Then, we constructed the initial network from third-level entries based on syntactic structure. Finally, we applied a correction algorithm to obtain a text complex network that expresses topic semantics. [Results] We retrieved 6,936 microblog posts from the trending topic “global outbreak of network ransomware” as the experimental corpus and built a network model with 217 nodes and 2,019 edges. We also used the new model to explore microblog topics. [Limitations] More research is needed on assigning node weights in text complex networks. [Conclusions] The proposed model can effectively reduce the redundancy of network nodes and improve the semantic expression of the topic complex network.
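The first step of the method (scoring word pairs by mutual information estimated from co-occurrence probability, then keeping the pairs above a threshold) can be illustrated with a minimal sketch. The abstract does not give the exact formula, the threshold value, the syntactic-structure step, or the correction algorithm, so everything here is an assumption: pointwise mutual information over post-level co-occurrence stands in for the score, and the names `pmi_pairs`, `build_network`, and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(docs, threshold):
    """Return {(w1, w2): pmi} for word pairs whose PMI exceeds `threshold`.

    Assumption: probabilities are estimated as the fraction of posts that
    contain a word (or both words of a pair). The paper may estimate them
    differently.
    """
    n = len(docs)
    word_df = Counter()  # number of posts containing each word
    pair_df = Counter()  # number of posts containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))  # sort so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    kept = {}
    for (a, b), count in pair_df.items():
        p_ab = count / n
        p_a = word_df[a] / n
        p_b = word_df[b] / n
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi > threshold:
            kept[(a, b)] = pmi
    return kept


def build_network(pairs):
    """Build an undirected weighted adjacency map: node -> {neighbor: weight}."""
    graph = {}
    for (a, b), weight in pairs.items():
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


# Hypothetical, already-tokenized toy corpus (not the paper's data).
docs = [
    ["ransomware", "outbreak", "global"],
    ["ransomware", "outbreak", "patch"],
    ["patch", "update"],
    ["global", "news"],
]
pairs = pmi_pairs(docs, threshold=0.5)  # threshold chosen arbitrarily
graph = build_network(pairs)
print(len(graph), "nodes,", len(pairs), "edges")
```

Raising the threshold drops weakly associated pairs, which is one plausible way such a filter would shrink the node and edge sets before any later correction step.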
刘冰瑶, 马静, 李晓峰. 一种“特征降维”文本复杂网络的话题表示模型[J]. 数据分析与知识发现, 2017, 1(11): 53-61.
Liu Bingyao, Ma Jing, Li Xiaofeng. Topic Representation Model Based on “Feature Dimensionality Reduction”[J]. Data Analysis and Knowledge Discovery, 2017, 1(11): 53-61.