Automatic Abstracting Generating Based on Mobile Short Message Text Information Flow

doi:10.11925/infotech.1003-3513.2013.02.07

New Technology of Library and Information Service

2013, Vol. 29, Issue (2): 43-49 https://doi.org/10.11925/infotech.1003-3513.2013.02.07

Knowledge Organization and Knowledge Management


Liu Jinling¹, Ni Xiaohong², Wang Xingong²

1. Computer Engineering Faculty, Huaiyin Institute of Technology, Huaian 223003, China;
2. Department of Computer, Cangzhou Teachers College, Cangzhou 061001, China

Abstract: An automatic digest generation model is designed for the characteristics of mobile short message text information flows. The model uses word co-occurrence to define semantic similarity, and uses TF-IDF to define the weights of feature words and of candidate abstract sentences. The algorithm removes isolated points, selects abstract sentences by weight, and then orders the selected sentences, producing a digest of the short message text flow with low redundancy and good readability. Experiments on relevant data show that the model generates abstract sentences of relatively high quality and that the algorithm is efficient.

Key words: Mobile short message text; Information flow; Abstracts; Weights
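The pipeline outlined in the abstract (TF-IDF weights, a similarity measure, removal of isolated points, selection by weight, ordering) can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the formulas, so the Jaccard overlap used as a stand-in for co-occurrence similarity, the summed TF-IDF sentence weight, the thresholds `isolate_below` and `redundant_above`, and ordering by position in the stream are all assumptions, as are the function names. Messages are assumed to be already tokenized.

```python
import math
from collections import Counter


def tfidf_weights(messages):
    """Return one {word: tf-idf} dict per tokenized message."""
    n = len(messages)
    # Document frequency: number of messages containing each word.
    df = Counter(word for msg in messages for word in set(msg))
    weights = []
    for msg in messages:
        tf = Counter(msg)
        weights.append(
            {w: (c / len(msg)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return weights


def similarity(a, b):
    """Assumed stand-in for co-occurrence similarity: Jaccard overlap of word sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def summarize(messages, k=2, isolate_below=0.1, redundant_above=0.5):
    """Return the indices of up to k digest messages, in stream order."""
    weights = tfidf_weights(messages)
    # Candidate weight: sum of the tf-idf weights of its words (assumption).
    score = [sum(w.values()) for w in weights]
    # Drop isolated points: messages that resemble no other message.
    kept = [
        i for i in range(len(messages))
        if any(
            similarity(messages[i], messages[j]) >= isolate_below
            for j in range(len(messages)) if j != i
        )
    ]
    # Greedy selection by weight, skipping near-duplicates of chosen messages.
    chosen = []
    for i in sorted(kept, key=lambda i: score[i], reverse=True):
        if all(similarity(messages[i], messages[j]) < redundant_above for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Order the digest by position in the stream (assumed to be arrival order).
    return sorted(chosen)


messages = [
    ["meeting", "moved", "to", "friday"],
    ["meeting", "moved", "to", "friday", "noon"],
    ["lunch", "on", "friday", "at", "noon"],
    ["win", "a", "free", "prize"],
]
print(summarize(messages))  # [1, 2]
```

In this toy stream the last message has the highest TF-IDF weight but shares no words with the others, so it is discarded as an isolated point; the first message is skipped because it nearly duplicates the second.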

Received: 2012-08-23 Published: 2013-04-24

TP391

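Funding: This work is one of the research results of the Hebei Province Science and Technology Support Program project "Semantic Recognition and Classification of Mobile Phone Spam Short Messages" (No. 10213581) and the Huaian Social Support Fund project "Research on Human Resources and Employment in Huaian Based on Data Mining" (No. HASZ2012046).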

Corresponding author: Liu Jinling, E-mail: liujinlingg@126.com

Cite this article:

Liu Jinling, Ni Xiaohong, Wang Xingong. Automatic Abstracting Generating Based on Mobile Short Message Text Information Flow. New Technology of Library and Information Service, 2013, 29(2): 43-49.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2013.02.07 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2013/V29/I2/43

