New Technology of Library and Information Service  2016, Vol. 32 Issue (7-8): 78-86    DOI: 10.11925/infotech.1003-3513.2016.07.10
Original Article
Extracting Topic and Opinion from Microblog Posts with New Algorithm
Yao Zhaoxu, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

[Objective] This paper proposes an algorithm that automatically extracts topic and opinion information from microblog posts. [Methods] First, we used an improved TF-IDF algorithm to build a topic feature word vector. Second, we generated lexical chains for the topics based on the relevance among the words in this vector. Finally, we extracted topic and opinion information with a sentiment dictionary and generated “topic+opinion” entries. [Results] Using a purpose-built crawler, we retrieved 24,598 Sina Weibo posts about four trending events from June 2014 to June 2015. The precision and recall of the proposed method were 80.3% and 76.67%, respectively. [Limitations] The dataset was small, and the topic model's extraction of Weibo-specific features still needs improvement. [Conclusions] The proposed algorithm can effectively extract “topic+opinion” information from microblog posts.
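The three-stage pipeline in the abstract (TF-IDF feature words, lexical chains, dictionary-based opinion matching) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the "improved" TF-IDF, the word-relevance measure, or the sentiment dictionary. So plain smoothed TF-IDF is swapped in, relevance is assumed to be the Jaccard overlap of the posts containing two words, chaining is a greedy threshold rule, and the posts, lexicon, and all function names (`tfidf_top_words`, `relevance`, `lexical_chains`, `topic_opinions`) are hypothetical. Posts are assumed to be already word-segmented.

```python
import math
from collections import Counter

# Toy, already word-segmented posts (hypothetical data for illustration only).
DOCS = [
    ["earthquake", "rescue", "moving"],
    ["earthquake", "rescue", "moving"],
    ["earthquake", "slow"],
    ["iphone", "price", "expensive"],
    ["iphone", "price", "expensive"],
]
# Hypothetical sentiment dictionary: opinion word -> polarity.
LEXICON = {"moving": "positive", "slow": "negative", "expensive": "negative"}


def tfidf_top_words(docs, k, exclude=frozenset()):
    """Rank candidate topic words by TF-IDF summed over all posts; keep the top k.

    Plain smoothed TF-IDF, standing in for the paper's unspecified improved variant.
    """
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    scores = Counter()
    for doc in docs:
        for w, c in Counter(doc).items():
            if w in exclude:  # assumption: keep opinion words out of the topic vector
                continue
            idf = math.log((1 + n) / (1 + df[w])) + 1
            scores[w] += (c / len(doc)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:k]]


def relevance(a, b, docs):
    """Assumed relevance: Jaccard overlap of the sets of posts containing a and b."""
    da = {i for i, doc in enumerate(docs) if a in doc}
    db = {i for i, doc in enumerate(docs) if b in doc}
    union = da | db
    return len(da & db) / len(union) if union else 0.0


def lexical_chains(words, docs, threshold):
    """Greedily attach each word (in rank order) to the first chain it is related to."""
    chains = []
    for w in words:
        for chain in chains:
            if max(relevance(w, c, docs) for c in chain) >= threshold:
                chain.append(w)
                break
        else:  # no sufficiently related chain: start a new topic
            chains.append([w])
    return chains


def topic_opinions(chains, docs, lexicon):
    """Pair each chain's top-ranked word with the most frequent opinion word in its posts."""
    entries = []
    for chain in chains:
        opinions = Counter(
            w for doc in docs if any(t in doc for t in chain) for w in doc if w in lexicon
        )
        if opinions:
            # Most frequent opinion word; ties broken alphabetically.
            word = min(opinions, key=lambda w: (-opinions[w], w))
            entries.append(f"{chain[0]}+{word}({lexicon[word]})")
    return entries


if __name__ == "__main__":
    chains = lexical_chains(tfidf_top_words(DOCS, 4, exclude=set(LEXICON)), DOCS, 0.5)
    print(chains)
    print(topic_opinions(chains, DOCS, LEXICON))
```

With a threshold of 0.5, the toy posts form two chains, and each chain's first (highest-ranked) word serves as its topic label in the "topic+opinion" entry. Greedy chaining of this kind is cheap but depends on word order and on the threshold; the relevance measure the paper actually uses may behave differently.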

Key words: Text mining; Extraction; Topic model; Microblog topic
Received: 28 January 2016      Published: 29 September 2016

Cite this article:

Yao Zhaoxu,Ma Jing. Extracting Topic and Opinion from Microblog Posts with New Algorithm. New Technology of Library and Information Service, 2016, 32(7-8): 78-86.


  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938