New Technology of Library and Information Service, 2016, Vol. 32, Issue 10: 70-80     https://doi.org/10.11925/infotech.1003-3513.2016.10.08
Research Paper
Identifying Food Topics from User-Generated Contents in Microblogs*
Zhang Xiaoyong1,2, Zhou Qingqing1,2, Zhang Chengzhi1,2,3
1 School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2 Alibaba Research Center for Complex Sciences, Hangzhou Normal University, Hangzhou 311121, China
3 Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210093, China
Abstract

[Objective] This study detects topics in microblog posts with large-scale text clustering and automatically selects the high-quality ones. [Methods] Food-related posts from Sina Weibo serve as the data source, and topics are detected by combining text clustering with deep learning techniques. First, posts are divided into four seasonal sets according to the month of publication. Second, a vector space model and text clustering are applied to each season's posts to obtain candidate topics. Finally, drawing on deep learning techniques, the study proposes a topic coverage measure to evaluate topic quality automatically and discard low-quality topics. [Results] Topic selection based on topic coverage agrees with manual selection, and the retained high-quality topics have topic coverage above 0.5. [Limitations] Topic detection quality is evaluated mainly qualitatively. [Conclusions] Selecting high-quality topics automatically by computing topic coverage is efficient and broadly applicable; the resulting topics are easy to interpret and reveal the distribution of food-related microblog topics across the four seasons.

Keywords: Topic detection; User-generated content; Topic coverage; Food mining
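The first two steps in the abstract (splitting posts by season, then representing them in a vector space) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The month-to-season mapping, the smoothed TF-IDF weighting, the function names, and the toy posts are all assumptions made for illustration; posts are assumed to be already segmented into tokens.

```python
import math
from collections import Counter, defaultdict

# Assumed mapping (meteorological seasons); the abstract does not state it.
SEASONS = {3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn",
           12: "winter", 1: "winter", 2: "winter"}

def split_by_season(posts):
    """Group (month, tokens) pairs into per-season lists of token lists."""
    buckets = defaultdict(list)
    for month, tokens in posts:
        buckets[SEASONS[month]].append(tokens)
    return dict(buckets)

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy data: (publication month, segmented tokens).
posts = [(7, ["ice", "cream"]), (7, ["ice", "watermelon"]), (1, ["hotpot", "lamb"])]
by_season = split_by_season(posts)
summer_vectors = tfidf_vectors(by_season["summer"])
similarity = cosine(summer_vectors[0], summer_vectors[1])
```

Per-season vectors of this kind could then be passed to a clustering algorithm to produce candidate topics. The topic coverage measure is left out because the abstract does not define it.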
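Received: 2016-05-26      Published: 2016-11-23
Funding: *This work is one of the research outputs of the National Social Science Fund of China project "Research on User-Based Knowledge Organization Patterns in Online Social Networks" (Grant No. 14BTQ033), the National Social Science Fund of China Key Project "Research on a Methodological System for Public Opinion and Decision Support in the Big Data Environment" (Grant No. 14AZD084), and the Jiangsu Province Graduate Research and Innovation (Practice) Program project "Research on Multi-Granularity Movie Review Mining Based on Social Media" (Grant No. SJLX15_0166).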
Cite this article:
Zhang Xiaoyong, Zhou Qingqing, Zhang Chengzhi. Identifying Food Topics from User-Generated Contents in Microblogs[J]. New Technology of Library and Information Service, 2016, 32(10): 70-80.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.10.08      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I10/70