New Technology of Library and Information Service  2014, Vol. 30 Issue (6): 79-86    DOI: 10.11925/infotech.1003-3513.2014.06.09
The Effect of the Quality of Textual Features on Retrieval in Micro-blog
Tang Xiaobo, Fang Xiaoke
School of Information Management, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] To examine how feature quality affects search results on the four major domestic microblogging platforms. [Methods] The TF-IDF weighting indicator is enhanced from the perspective of the whole feature, and the quality of each microblog feature is then assessed with two measures: descriptive power and discriminative power. [Results] Descriptive power and discriminative power both have a positive effect on search results; features of differing quality on each platform affect the classification results differently; and combining the two indicators gives the best classification performance. [Limitations] Some other microblog features, such as conversational replies and the number of followers, are not taken into account, and semantic ambiguity between words, such as synonymy, is not yet discussed. [Conclusions] By discussing the effect of feature quality on search results, this study contributes to a deeper understanding of microblog features and thereby helps improve retrieval quality on social media platforms.
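
As background for the Methods statement, the following is a minimal sketch of the standard TF-IDF weighting, not the enhanced variant developed in the paper, whose formula is not given in this abstract. The function name `tf_idf`, the tokenized sample posts, and the choice of relative term frequency with an unsmoothed natural-log IDF are all assumptions made for illustration. The descriptive power and discriminative power measures are not defined here, so they are not sketched.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Standard formulation (an assumption, not the paper's enhanced version):
    weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized microblog posts.
posts = [
    ["library", "search", "microblog"],
    ["microblog", "retrieval", "quality"],
    ["microblog", "feature", "quality"],
]
weights = tf_idf(posts)
```

In this toy collection a term that occurs in every post, such as "microblog", receives a weight of zero, while a term confined to a single post receives the highest weight in that post.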

Key words: Micro-blog; Text features; Descriptive power; Discriminative power; Retrieval
Received: 04 December 2013      Published: 09 July 2014
CLC number: G203

Cite this article:

Tang Xiaobo, Fang Xiaoke. The Effect of the Quality of Textual Features on Retrieval in Micro-blog. New Technology of Library and Information Service, 2014, 30(6): 79-86.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.06.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I6/79

