New Technology of Library and Information Service  2016, Vol. 32 Issue (7-8): 78-86    DOI: 10.11925/infotech.1003-3513.2016.07.10
Original Article
Extracting Topic and Opinion from Microblog Posts with New Algorithm
Yao Zhaoxu, Ma Jing
College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
Abstract  

[Objective] This paper proposes an algorithm that automatically extracts topic and opinion information from microblog posts. [Methods] First, we used an improved TF-IDF algorithm to build a vector of characteristic words for each topic. Second, we generated lexical chains for the topics based on the relevance among the words in the vector. Finally, we extracted the topic and opinion information with a sentiment dictionary and generated “topic+opinion” entries. [Results] We analyzed 24,598 Sina microblog posts on four trending events from June 2014 to June 2015, retrieved by a specially designed crawler. The precision and recall of the proposed method were 80.3% and 76.67%, respectively. [Limitations] The data set was small, and the topic model's extraction of features from Weibo posts still needs improvement. [Conclusions] The proposed algorithm can effectively extract “topic+opinion” information from microblog posts.
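
The three steps in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: it substitutes standard smoothed TF-IDF for the paper's improved variant, approximates word relevance by post-level co-occurrence counts, and uses a toy English sentiment lexicon in place of a real sentiment dictionary. All function names, the `min_cooc` threshold, and the sample data are hypothetical, and posts are assumed to be tokenized already.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_top_terms(docs, k):
    """Step 1 (assumption: plain smoothed TF-IDF, not the paper's variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            idf = math.log((1 + n) / (1 + df[term])) + 1
            scores[term] += (count / len(doc)) * idf
    return [term for term, _ in scores.most_common(k)]


def build_chains(docs, terms, min_cooc=2):
    """Step 2: greedily chain terms that co-occur in >= min_cooc posts."""
    term_set = set(terms)
    cooc = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc) & term_set), 2):
            cooc[(a, b)] += 1
    chains = []
    for term in terms:
        for chain in chains:
            if any(cooc[tuple(sorted((term, other)))] >= min_cooc
                   for other in chain):
                chain.append(term)
                break
        else:
            chains.append([term])  # no related chain found: start a new one
    return chains


def extract_entries(docs, chains, lexicon):
    """Step 3: count (topic, opinion word, polarity) entries per post."""
    entries = Counter()
    for doc in docs:
        words = set(doc)
        opinions = [w for w in doc if w in lexicon]
        for chain in chains:
            if words & set(chain):
                for opinion in opinions:
                    # The first word of a chain serves as the topic label.
                    entries[(chain[0], opinion, lexicon[opinion])] += 1
    return entries


# Toy, pre-tokenized posts and lexicon (hypothetical data).
docs = [
    ["flight", "delay", "angry"],
    ["flight", "delay", "terrible"],
    ["flight", "crew", "helpful"],
    ["weather", "sunny", "happy"],
]
lexicon = {"angry": "negative", "terrible": "negative",
           "helpful": "positive", "happy": "positive"}

topic_terms = tfidf_top_terms(docs, 2)
chains = build_chains(docs, topic_terms)
entries = extract_entries(docs, chains, lexicon)
for (topic, opinion, polarity), count in entries.items():
    print(topic, opinion, polarity, count)
```

A real pipeline would also need Chinese word segmentation and stop-word filtering before the TF-IDF step, neither of which is shown here.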

Key words: Text mining; Extraction; Topic model; Microblog topic
Received: 28 January 2016      Published: 29 September 2016

Cite this article:

Yao Zhaoxu, Ma Jing. Extracting Topic and Opinion from Microblog Posts with New Algorithm. New Technology of Library and Information Service, 2016, 32(7-8): 78-86.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.07.10     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I7-8/78
