New Technology of Library and Information Service  2016, Vol. 32 Issue (10): 70-80    DOI: 10.11925/infotech.1003-3513.2016.10.08
Original Article
Identifying Food Topics from User-Generated Contents in Microblogs
Zhang Xiaoyong 1,2, Zhou Qingqing 1,2, Zhang Chengzhi 1,2,3
1 School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2 Alibaba Research Center for Complex Sciences, Hangzhou Normal University, Hangzhou 311121, China
3 Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093, China
Abstract  

[Objective] This study aims to identify the topics of microblog posts and to automatically extract the high-quality ones using text clustering. [Methods] We collected food-related microblog posts from Sina Weibo as raw data, then applied text clustering and deep learning to detect the target topics. First, we grouped the posts into the four seasons by publishing date. Second, we built a vector space model and used text clustering to obtain candidate topics. Finally, we used deep learning to automatically identify the high-quality topics. [Results] The method automatically identified the high-quality topics that researchers had found manually, and the topic coverage of each was higher than 0.5. [Limitations] Topic quality was judged on qualitative data. [Conclusions] The proposed method extracts high-quality topics effectively. The retrieved topics reflect how food-related microblog posts are distributed across the four seasons.

Key words: Topic detection; User-Generated Contents; Topic coverage; Food mining
Received: 26 May 2016      Published: 23 November 2016
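
The first two steps named in the abstract (grouping posts by season, then representing them in a vector space model) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' implementation: the month-to-season mapping, the TF-IDF weighting, the cosine similarity measure, and the toy English tokens are all hypothetical, and the clustering and deep learning steps are not shown. A clustering algorithm would take the resulting vectors as input.

```python
import math
from collections import Counter, defaultdict
from datetime import date

# Assumption: meteorological seasons (Northern Hemisphere) keyed by month.
SEASON_BY_MONTH = {
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
    12: "winter", 1: "winter", 2: "winter",
}


def season_of(published):
    """Map a publishing date to one of the four seasons."""
    return SEASON_BY_MONTH[published.month]


def group_by_season(posts):
    """Group (publish_date, tokens) pairs into {season: [tokens, ...]}."""
    groups = defaultdict(list)
    for published, tokens in posts:
        groups[season_of(published)].append(tokens)
    return dict(groups)


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized post."""
    n_docs = len(docs)
    # Document frequency: number of posts in which each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical, already-tokenized posts; not data from the study.
    posts = [
        (date(2015, 7, 3), ["watermelon", "ice", "cream"]),
        (date(2015, 8, 10), ["watermelon", "juice"]),
        (date(2015, 7, 20), ["cold", "noodles"]),
        (date(2015, 12, 24), ["hotpot", "mutton"]),
    ]
    summer = group_by_season(posts)["summer"]
    vectors = tfidf_vectors(summer)
    print(cosine(vectors[0], vectors[1]))  # positive: both mention watermelon
    print(cosine(vectors[0], vectors[2]))  # 0.0: no shared terms
```

Vectors are built per season so that document frequencies reflect only the posts of that season. This is one possible reading of the abstract's step order, not a detail it states.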

Cite this article:

Zhang Xiaoyong, Zhou Qingqing, Zhang Chengzhi. Identifying Food Topics from User-Generated Contents in Microblogs. New Technology of Library and Information Service, 2016, 32(10): 70-80.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.10.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I10/70
