New Technology of Library and Information Service, 2011, Vol. 27, Issue 12: 39-45. DOI: 10.11925/infotech.1003-3513.2011.12.06
Research on Chinese Keywords Extraction Based on Characters Sequence Annotation
Wang Hao, Deng Sanhong, Su Xinning
Department of Information Management, Nanjing University, Nanjing 210093, China
Abstract  Drawing on the complete Chinese book catalogue of a university library and an analysis of its indexing records, the paper summarizes the characteristics of Chinese keywords and the regularities governing their extraction. It then builds a Chinese keyword extraction model based on character sequence annotation and sets out the basic idea and implementation scheme for extracting keywords. Large-scale experiments verify the feasibility, rationality and practicality of the model, which largely solves the problem of extracting Chinese keywords without performing word segmentation. The results indicate that character sequence annotation outperforms word sequence annotation.
Key words: Sequence annotation; Conditional random fields; Keywords extraction; Machine learning; Characters sequence; Words sequence
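To make the idea of character sequence annotation concrete, the following is a minimal sketch of how a title and its indexed keywords might be turned into a per-character label sequence and decoded back, with no word segmentation step. The B/I/E/S/O tag set, the function names `annotate_characters` and `decode_keywords`, and the sample title are illustrative assumptions; the abstract does not state which tag set or features the paper actually uses.

```python
from typing import List, Tuple

# Assumed tag set (not taken from the paper):
#   B = first character of a keyword, I = inside, E = last character,
#   S = single-character keyword, O = outside any keyword.

def annotate_characters(text: str, keywords: List[str]) -> List[Tuple[str, str]]:
    """Label every character of `text` according to the known keywords."""
    tags = ["O"] * len(text)
    # Longer keywords first, so a short keyword never overwrites part of a long one.
    for kw in sorted(keywords, key=len, reverse=True):
        if not kw:
            continue
        start = text.find(kw)
        while start != -1:
            end = start + len(kw)
            if all(t == "O" for t in tags[start:end]):
                if len(kw) == 1:
                    tags[start] = "S"
                else:
                    tags[start] = "B"
                    for i in range(start + 1, end - 1):
                        tags[i] = "I"
                    tags[end - 1] = "E"
            start = text.find(kw, start + 1)
    return list(zip(text, tags))


def decode_keywords(labeled: List[Tuple[str, str]]) -> List[str]:
    """Recover keyword strings from a labeled character sequence."""
    keywords: List[str] = []
    current = ""
    for ch, tag in labeled:
        if tag == "S":
            keywords.append(ch)
            current = ""
        elif tag == "B":
            current = ch
        elif tag == "I" and current:
            current += ch
        elif tag == "E" and current:
            keywords.append(current + ch)
            current = ""
        else:
            current = ""  # O tag or an ill-formed sequence: drop the partial span
    return keywords


if __name__ == "__main__":
    title = "基于条件随机场的关键词抽取"  # hypothetical sample title
    labeled = annotate_characters(title, ["条件随机场", "关键词抽取"])
    for ch, tag in labeled:
        print(f"{ch}\t{tag}")  # one character and its label per line
    print(decode_keywords(labeled))
```

In a setup like this, the labeled sequences would serve as training data for a sequence model such as a conditional random field, and `decode_keywords` would be applied to the model's predicted tags on unseen text.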
Received: 08 October 2011      Published: 02 February 2012
CLC Number: TP391.1

Cite this article:

Wang Hao, Deng Sanhong, Su Xinning. Research on Chinese Keywords Extraction Based on Characters Sequence Annotation. New Technology of Library and Information Service, 2011, 27(12): 39-45.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.12.06     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I12/39
