Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (11): 53-61    DOI: 10.11925/infotech.2096-3467.2017.0707
Original Article
Topic Representation Model Based on “Feature Dimensionality Reduction”
Liu Bingyao, Ma Jing, Li Xiaofeng
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract  

[Objective] This study addresses the high-dimensionality and sparsity problems facing traditional methods for analyzing large-scale corpora. [Methods] First, we used co-occurrence probability to measure the mutual information between words and extracted the word combinations whose values exceeded a threshold. Then, we constructed an initial network from third-level entries based on syntactic structure. Finally, we applied a correction algorithm to build a text complex network that expresses topic semantics. [Results] We retrieved 6,936 microblog posts on the trending topic “global outbreak of network ransomware” as the experimental corpus and built a network model with 217 nodes and 2,019 edges. We also used the new model to explore microblog topics. [Limitations] The assignment of node weights in text complex networks needs further research. [Conclusions] The proposed model effectively reduces the redundancy of network nodes and improves the semantic expression of the topic complex network.
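The word-pair extraction step in [Methods] can be illustrated with a minimal sketch. The abstract does not give the exact formula, co-occurrence window, or threshold, so the code below assumes pointwise mutual information (PMI) computed from post-level co-occurrence with a base-2 logarithm. The function name `pmi_pairs`, the toy posts, and the threshold value are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(docs, threshold):
    """Return {(word_a, word_b): pmi} for pairs whose PMI exceeds threshold.

    docs is a list of token lists (one list per segmented post).
    Assumption: two words co-occur when they appear in the same post.
    """
    n_docs = len(docs)
    word_df = Counter()   # number of posts containing each word
    pair_df = Counter()   # number of posts containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))          # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    kept = {}
    for (a, b), n_ab in pair_df.items():
        p_ab = n_ab / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        pmi = math.log2(p_ab / (p_a * p_b))  # PMI = log2( P(a,b) / (P(a) P(b)) )
        if pmi > threshold:
            kept[(a, b)] = pmi
    return kept


# Toy corpus (illustrative only, not the paper's data).
posts = [
    ["ransomware", "bitcoin", "hacker"],
    ["ransomware", "bitcoin", "patch"],
    ["thesis", "campus", "virus"],
    ["thesis", "campus", "patch"],
]
pairs = pmi_pairs(posts, threshold=0.5)
for pair, score in sorted(pairs.items()):
    print(pair, round(score, 3))
```

In this toy corpus, "patch" appears with both topics equally often, so its pairs score a PMI of 0 and fall below the threshold, while pairs confined to one topic are kept. This is the sense in which thresholding on mutual information prunes weakly associated word combinations before the network is built.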

Keywords: Feature Dimensionality Reduction; Text Complex Network; Topic Representation
Received: 18 July 2017      Published: 27 November 2017
CLC number (ZTFLH): TP391.1

Cite this article:

Liu Bingyao,Ma Jing,Li Xiaofeng. Topic Representation Model Based on “Feature Dimensionality Reduction”. Data Analysis and Knowledge Discovery, 2017, 1(11): 53-61.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0707     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I11/53

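Network: entries 1 to 10
G1: less than (不到); open (打开); thesis (论文); hour (小时)
G2: several organizations (多家); patch (修补); education network (教育网); ransom (勒索); Bitcoin (比特币); hacker (黑客); thesis (论文); virus (病毒); boot up (开机); implant (植入)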
Network    N        M         <k>      C            L
G1         8,005    45,591    4.02     2.1×10^-3    3.78
G2         217      2,019     18.96    0.420        2.914
GR         217      2,019     3.84     6.9×10^-5    3.39
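The statistics in the table above can be computed for any undirected word network with a short sketch. It assumes the column symbols carry their conventional complex-network meanings: N is the number of nodes, M the number of edges, <k> the average degree, C the average clustering coefficient, and L the average shortest-path length. The page itself does not define them. The helper names and the four-node toy graph are hypothetical, and the code uses only the standard library rather than the authors' tooling.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    """<k>: mean number of neighbors per node (equals 2M/N for a simple graph)."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """C: mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges that exist between pairs of this node's neighbors.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """L: mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy graph (illustrative only): a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
adj = build_adjacency(edges)
n_nodes = len(adj)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(n_nodes, n_edges, average_degree(adj),
      average_clustering(adj), average_path_length(adj))
```

Comparing C and L for a network against a random graph with the same N and M is the usual way to check for small-world structure: a much higher C at a similar L suggests that the word network is clustered rather than random.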
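Community: keywords
Community 1: ransomware (勒索病毒); outbreak (爆发); expert (专家); update (更新); security patch (安全补丁); operating system (操作系统)
Community 2: hacker (黑客); ransom (勒索); Bitcoin (比特币); renminbi (人民币); unlock (解锁); attack (攻击)
Community 3: virus (病毒); suffer (遭受); graduation season (毕业季); universities (高校); thesis (论文); education network (教育网); slow down (放缓)
Community 4: United States (美国); denounce (怒斥)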