[Objective] This paper proposes an algorithm to extract topic and opinion information from the microblog posts automatically. [Methods] First, we used the improved TF-IDF algorithm to build the topic characteristic word vector. Second, we generated lexical chain for the topics based on the relevance among words of the vector. Finally, we extracted the topic and opinion information with the sentiment dictionary, and then generated the “topic+opinion” entries. [Results] We analyzed 24,598 Sina microblog posts of four trending events from June 2014 to June 2015 retrieved by a specially designed crawler. The precision and recall rates of the proposed method were 80.3% and 76.67%, respectively. [Limitations] The data size was small, the effect that the topic model extracted the feature about Weibo still required to be improved. [Conclusions] The proposed algorithm could effectively extract the “topic and opinion” information from micoblog posts.
姚兆旭,马静. 面向微博话题的“主题+观点”词条抽取算法研究*[J]. 现代图书情报技术, 2016, 32(7-8): 78-86.
Yao Zhaoxu,Ma Jing. Extracting Topic and Opinion from Microblog Posts with New Algorithm. New Technology of Library and Information Service, 2016, 32(7-8): 78-86.
(Hong Yu, Zhang Yu, Liu Ting, et al.Topic Detection and Tracking Review[J]. Journal of Chinese Information Processing, 2007, 21(6): 71-87.)
[4]
Becker H, Naaman M, Gravano L.Beyond Trending Topics: Real-World Event Identification on Twitter[C]. In: Proceedings of the 5th International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain. AAAI Press, 2011.
[5]
Popescu A M, Etzioni O.Extracting Product Features and Opinions from Reviews[A]. // Natural Language Processing and Text Mining[M]. Springer London, 2007.
[6]
Ritter A, Mausam, Etzioni O, et al.Open Domain Event Extraction from Twitter[C]. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2012.
[7]
Blei D M, Ng A Y, Jordan M I, et al.Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3: 993-1022.
[8]
Lin C H, He Y L.Joint Sentiment/Topic Model for Sentiment Analysis [C]. In: Proceeding of the 18th ACM Conference on Information and Knowledge Management. New York: ACM, 2009: 375-384.
(Tang Xiaobo, Xiang Kun.Topic Mining Based on LDA Model and Popularity of Weibo[J]. Library and Information Service, 2014, 58(5): 58-63.)
[10]
Rosen-Zvi M, Griffiths T, Steyvers M, et al.The Author- Topic Model for Authors and Documents [C]. In: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence. 2012.
(Zhang Chenyi, Sun Jianling, Ding Yiqun.Topic Mining for Microblog Based on MB-LDA Model[J]. Journal of Computer Research and Development, 2011, 48(10): 1795-1802.)
(Qian Zheyi, Li Fang.Keyword and Name Entity Identification Based News Topic Thread Extraction[J]. Computer Applications and Software, 2011, 28(12): 168-171.)
[14]
Hoffman M D, Blei D M, Bach F R.Online Learning for Latent Dirichlet Allocation[C]. In: Proceedings of the 24th Annual Conference on Neural Information Processing Systems. 2010.
[15]
Ramage D, Hall D, Nallapati R, et al.Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-labeled Corpora [C]. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore. 2009.
[16]
Darling W, Song F.Probabilistic Topic and Syntax Modeling with Part-of-Speech LDA[OL]. arXiv: 1303.2826.
[17]
闫泽华. 基于LDA的新闻线索抽取研究[D]. 上海: 上海交通大学, 2012.
[17]
(Yan Zehua.News Threading Based on LDA Model[D]. Shanghai: Shanghai Jiaotong University, 2012.)
[18]
王宇阳. 基于本体进化的自适应中文话题跟踪算法研究[D]. 南京: 南京航空航天大学, 2013.
[18]
(Wang Yuyang.Research on Algorithm of Adaptive Chinese Topic Tracking Based on Ontology Evolution [D]. Nanjing: Nanjing University of Aeronautics and Astronautics, 2013.)
(Guo Yixiu, Lv Xueqiang, Li Zhuo.Burstyn Topics Detection Approach on Chinese Microblog Based on Burst Words Clustering[J]. Journal of Computer Applications, 2014, 34(2): 486-490.)
[20]
Kim S M, Hovy E.Determining the Sentiment of Opinions [C]. In: Proceedings of the 20th International Conference on Computational Linguistics. 2004.
[21]
陈建美. 中文情感词汇本体的构建及其应用[D]. 大连: 大连理工大学, 2008.
[21]
(Chen Jianmei.The Construction and Application of Chinese Emotion Word Ontology [D]. Dalian: Dalian University of Technology, 2008.)