New Technology of Library and Information Service  2012, Vol. 28 Issue (6): 36-42    DOI: 10.11925/infotech.1003-3513.2012.06.06
A Method to Improve Accuracy of Automatic Indexing for Chinese-English Mixed Text
Zhao Yan1, Chen Heng2,3
1. College of International Business, Shanghai International Studies University, Shanghai 200083, China;
2. Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China;
3. Shanghai Information Center for Life Sciences, Chinese Academy of Sciences, Shanghai 200031, China
Abstract  This paper describes the inherent characteristics of Chinese-English mixed text in the life sciences. Based on cybernetic principles, it proposes an integrated indexing control method to improve the accuracy of automatic indexing for such text. The method comprises three relatively independent yet interdependent parts: feed-forward control, in-progress control, and feedback control. The three parts, and the effectiveness of their combined application in improving indexing accuracy, are then described in detail. Experimental results show that the method was successfully applied to indexing the Literature and Knowledge Database for Hepatitis B Subject, and that indexing accuracy was greatly improved.
Key words: Chinese-English mixed text; String matching; Accuracy of automatic indexing; Cybernetics; Literature and knowledge database for Hepatitis B subject
Received: 26 April 2012      Published: 30 August 2012
CLC number: G254

Cite this article:

Zhao Yan, Chen Heng. A Method to Improve Accuracy of Automatic Indexing for Chinese-English Mixed Text. New Technology of Library and Information Service, 2012, 28(6): 36-42.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.06.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V28/I6/36
