Data Analysis and Knowledge Discovery  2017, Vol. 1 Issue (11): 53-61    DOI: 10.11925/infotech.2096-3467.2017.0707
Topic Representation Model Based on “Feature Dimensionality Reduction”
Liu Bingyao, Ma Jing, Li Xiaofeng
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
[Objective] This study addresses the high dimensionality and sparsity that hamper traditional methods for analyzing large-scale corpora. [Methods] First, we used co-occurrence probability to measure the mutual information between words and extracted the word combinations whose values exceeded a threshold. Then, we constructed the initial network from third-level entries based on syntactic structure. Finally, we built the text complex network with a correction algorithm to express topic semantics. [Results] We retrieved 6,936 microblog posts on the trending topic “global outbreak of network ransomware” as the experimental corpus and built a network model with 217 nodes and 2,019 edges. We also explored microblog topics with the new model. [Limitations] More research is needed on assigning node weights in text complex networks. [Conclusions] The proposed model effectively reduces the redundancy of network nodes and improves the semantic expression of the topic complex network.
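
The first step in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "mutual information" means pointwise mutual information (PMI), that co-occurrence is counted once per post, and that posts are already tokenized. The function name `pmi_pairs`, the toy posts, and the threshold value are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_pairs(docs, threshold):
    """Return {(w1, w2): pmi} for word pairs whose PMI exceeds `threshold`.

    Assumption: co-occurrence means "both words appear in the same post",
    and probabilities are estimated as document frequencies.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of posts containing each word
    pair_df = Counter()  # number of posts containing both words of a pair
    for tokens in docs:
        vocab = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))

    kept = {}
    for (a, b), n_ab in pair_df.items():
        p_ab = n_ab / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        pmi = math.log2(p_ab / (p_a * p_b))
        if pmi > threshold:
            kept[(a, b)] = pmi
    return kept


# Toy corpus (hypothetical), one token list per post.
docs = [
    ["hacker", "ransom", "bitcoin"],
    ["hacker", "ransom", "virus"],
    ["virus", "thesis"],
    ["virus", "thesis", "university"],
]
pairs = pmi_pairs(docs, threshold=0.5)
print(sorted(pairs))
```

In this toy corpus, "virus" appears in three of the four posts, so its pairs score low and are filtered out, while rarer but consistently co-occurring pairs such as ("hacker", "ransom") are kept. Filtering of this kind is one way to reduce the number of candidate nodes before a network is built.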

Key words: Feature Dimensionality Reduction; Text Complex Network; Topic Representation
Received: 18 July 2017; Published: 27 November 2017
ZTFLH: TP391.1

Liu Bingyao, Ma Jing, Li Xiaofeng. Topic Representation Model Based on “Feature Dimensionality Reduction”. Data Analysis and Knowledge Discovery, 2017, 1(11): 53-61.

Network   Keywords (columns 1-10)
G1        less than; open; thesis; hour
G2        several; patch; education network; ransom; Bitcoin; hacker; thesis; virus; boot up; implant
Network   N       M        <k>     C          L
G1        8,005   45,591   4.02    2.1×10⁻³   3.78
G2        217     2,019    18.96   0.420      2.914
GR        217     2,019    3.84    6.9×10⁻⁵   3.39
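
For readers unfamiliar with these quantities, the sketch below computes them for a toy graph. It assumes the column symbols follow the usual complex-network conventions (N nodes, M edges, <k> average degree, C average clustering coefficient, L average shortest path length) and that the graph is undirected, unweighted, and connected. The paper's own definitions may differ, for example by using weights, so this does not reproduce the table. The function name `network_stats` and the toy edge list are hypothetical.

```python
from collections import deque


def network_stats(edges):
    """Return (N, M, avg_degree, avg_clustering, avg_path_length).

    Assumes an undirected, unweighted, connected graph without self-loops,
    given as a list of (u, v) pairs.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    avg_degree = 2 * m / n

    # Local clustering: share of a node's neighbour pairs that are linked.
    total_c = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        total_c += 2 * links / (k * (k - 1))
    avg_clustering = total_c / n

    # Average shortest path length: breadth-first search from every node.
    total_dist = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total_dist += sum(dist.values())
    avg_path_length = total_dist / (n * (n - 1))

    return n, m, avg_degree, avg_clustering, avg_path_length


# Toy graph (hypothetical): a triangle a-b-c with a pendant node d on c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
n, m, k_avg, c_avg, l_avg = network_stats(edges)
print(n, m, k_avg, round(c_avg, 3), round(l_avg, 3))
```

A high C combined with a short L relative to a random graph of the same size is the usual small-world signature, which is the kind of comparison the G2 and GR rows allow.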
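Community     Keywords
Community 1   ransomware; outbreak; expert; update; security patch; operating system
Community 2   hacker; ransom; Bitcoin; RMB; unlock; attack
Community 3   virus; suffer; graduation season; universities; thesis; education network; slow down
Community 4   United States; denounce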
