New Technology of Library and Information Service  2014, Vol. 30 Issue (6): 79-86     https://doi.org/10.11925/infotech.1003-3513.2014.06.09
Information Analysis and Research
The Effect of the Quality of Textual Features on Retrieval in Micro-blog
Tang Xiaobo, Fang Xiaoke
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] To measure the quality of feature terms on four major Chinese microblogging platforms and examine how these quality indicators affect retrieval performance. [Methods] The TF-IDF weighting scheme is lifted from the level of individual feature terms to the level of whole features, and the quality of each feature on four mainstream Chinese microblogging platforms is assessed with two quality measures: descriptive power and discriminative power. [Results] The descriptive power and discriminative power of textual features in microblogs have a positive effect on retrieval performance. The quality of different features on each platform affects classification to different degrees, and classification performs best when the two measures are considered together. [Limitations] Microblog features such as conversational replies, number of followers, and number of accounts followed are not taken into account, and semantic issues such as polysemy and synonymy of feature terms are not addressed. [Conclusions] This study better reveals how much each microblog feature matters to retrieval performance and helps researchers understand in depth the role of features on each platform, thereby helping to improve the retrieval quality of social media platforms at a fundamental level.

Key words: Micro-blog    Text features    Descriptive power    Discriminative power    Retrieval
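For readers unfamiliar with the term-level weighting that the abstract takes as its starting point, the following is a minimal sketch of classic TF-IDF in Python. It is not the authors' feature-level measure, whose formulas are not given on this page. The function name `tf_idf` and the toy posts are hypothetical. Reading the TF factor as a rough analogue of descriptive power and the IDF factor as a rough analogue of discriminative power is an assumption made here for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Classic term-level TF-IDF (illustrative only, not the paper's measure):
    TF  = term count / document length
    IDF = log(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up posts (already tokenized); not data from the paper.
posts = [
    ["weather", "rain", "rain", "city"],
    ["football", "match", "city"],
    ["rain", "forecast", "city"],
]
weights = tf_idf(posts)
```

In this toy corpus, "city" occurs in every post, so its IDF is log(1) and its weight is zero: a term that appears everywhere does nothing to tell posts apart. A term confined to one post, such as "football", receives a comparatively high weight.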
Received: 2013-12-04      Published: 2014-07-09
CLC number:  G203  
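Funding:

This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Integrated Retrieval and Semantic Analysis Methods for Social Media" (No. 71273194) and the Wuhan University 2013 Graduate Independent Research Project "Research on Social Media Retrieval Strategies" (No. 2013104010206).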

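Corresponding author: Fang Xiaoke     E-mail: fangxiaoke1987218@163.com
Author contributions: Tang Xiaobo proposed the research idea, designed the research plan, and revised the final version. Fang Xiaoke collected and cleaned the data, computed the quality indicators for each feature, and wrote the paper.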
Cite this article:
Tang Xiaobo, Fang Xiaoke. The Effect of the Quality of Textual Features on Retrieval in Micro-blog. New Technology of Library and Information Service, 2014, 30(6): 79-86.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.06.09      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I6/79

