New Technology of Library and Information Service, 2012, Vol. 28, Issue 6: 36-42. DOI: 10.11925/infotech.1003-3513.2012.06.06
Knowledge Organization and Knowledge Management
A Method to Improve Accuracy of Automatic Indexing for Chinese-English Mixed Text
Zhao Yan1, Chen Heng2,3
1. College of International Business, Shanghai International Studies University, Shanghai 200083, China;
2. Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China;
3. Shanghai Information Center for Life Sciences, Chinese Academy of Sciences, Shanghai 200031, China
Abstract: This paper analyzes the inherent characteristics of Chinese-English mixed text in the life sciences. Based on cybernetic principles, it proposes a new integrated indexing-control method for improving the accuracy of automatic indexing of such text. The method consists of three parts that are relatively independent yet interrelated: feed-forward control, in-progress control, and feedback control. Each part, and the effectiveness of applying the three together to improve indexing accuracy, is described in detail. Experimental results show that the method was successfully applied to indexing the Literature and Knowledge Database for Hepatitis B Subject and substantially improved indexing accuracy.
Keywords: Chinese-English mixed text; String matching; Accuracy of automatic indexing; Cybernetics; Literature and knowledge database for Hepatitis B subject
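The abstract names the three control stages but does not describe their internals. The following is a minimal, hypothetical sketch, not the authors' method, of how the three stages could wrap dictionary-based string matching over mixed Chinese-English text. It assumes a controlled vocabulary of mixed-language terms and treats (1) Unicode NFKC normalization plus case folding as feed-forward control, (2) longest-match-first matching with ASCII word-boundary checks as in-progress control, and (3) a reviewer-maintained set of rejected terms as feedback control. The function names (`feed_forward`, `in_progress_match`, `feedback`, `index_document`) and the sample terms are illustrative only.

```python
from __future__ import annotations

import unicodedata


def feed_forward(text: str) -> str:
    """Feed-forward control (assumed form): normalize BEFORE matching.

    NFKC folds full-width Latin letters, digits and punctuation (e.g. 'ＨＢＶ')
    to ASCII; casefold() makes English matching case-insensitive.
    Chinese characters pass through unchanged.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def _ascii_alnum(ch: str) -> bool:
    return ch.isascii() and ch.isalnum()


def in_progress_match(text: str, vocabulary: list[str]) -> list[str]:
    """In-progress control (assumed form): constrain matches DURING matching.

    - Longer terms are tried first and matched characters are not reused,
      so '乙型肝炎' suppresses the nested '肝炎' (maximum matching).
    - An English term may not touch a neighbouring ASCII letter or digit,
      so 'HBV' is not indexed inside 'DHBV'.
    `text` must already be normalized by feed_forward().
    Returns original vocabulary forms in order of first occurrence.
    """
    normalized = {feed_forward(t): t for t in vocabulary if t}  # normalized -> original
    used = [False] * len(text)
    first_pos: dict[str, int] = {}
    for term in sorted(normalized, key=lambda t: (-len(t), t)):
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            left_ok = not (start > 0 and _ascii_alnum(term[0]) and _ascii_alnum(text[start - 1]))
            right_ok = not (end < len(text) and _ascii_alnum(term[-1]) and _ascii_alnum(text[end]))
            if left_ok and right_ok and not any(used[start:end]):
                used[start:end] = [True] * (end - start)
                first_pos.setdefault(normalized[term], start)
            start = text.find(term, start + 1)
    return sorted(first_pos, key=first_pos.get)


def feedback(candidates: list[str], rejected: set[str] | frozenset[str]) -> list[str]:
    """Feedback control (assumed form): drop terms that human reviewers
    rejected in earlier runs; `rejected` grows as review results are fed back."""
    return [t for t in candidates if t not in rejected]


def index_document(text: str, vocabulary: list[str],
                   rejected: set[str] | frozenset[str] = frozenset()) -> list[str]:
    """Run the three hypothetical control stages in sequence."""
    return feedback(in_progress_match(feed_forward(text), vocabulary), rejected)


if __name__ == "__main__":
    doc = "慢性乙型肝炎患者ＨＢＶ DNA阳性，DHBV模型，Lamivudine治疗"
    vocab = ["乙型肝炎", "肝炎", "HBV", "HBsAg", "DNA", "Lamivudine"]
    print(index_document(doc, vocab))           # ['乙型肝炎', 'HBV', 'DNA', 'Lamivudine']
    print(index_document(doc, vocab, {"DNA"}))  # ['乙型肝炎', 'HBV', 'Lamivudine']
```

In this sketch the full-width "ＨＢＶ" is indexed only because of the normalization step, "HBV" inside "DHBV" is blocked by the boundary check, and "DNA" disappears once a reviewer has rejected it. A real system would also need the reviewed rejections to be context-sensitive rather than global; that refinement is left out here.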
Received: 2012-04-26
Chinese Library Classification: G254
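Funding:

This paper is one of the research outputs of the following projects: the Shanghai International Studies University Planning Fund Project (2011) "Web Data Mining Technology in a Multilingual Environment" (Project No. 2011114061); the "Shanghai International Studies University Innovative Research Team" (2011); the Shanghai Pujiang Talent Program Project (2009) "Major Infectious Diseases: Construction of a Hepatitis B Subject Knowledge Base and Application of Integrated Information Mining Technology"; and the Chinese Academy of Sciences "Xiao Bai Ren" Talent Program merit-based support project (2010) "Major Infectious Diseases: Design and Construction of Knowledge-Mining Subject Literature Databases for AIDS and Viral Hepatitis".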

Cite this article:
Zhao Yan, Chen Heng. A Method to Improve Accuracy of Automatic Indexing for Chinese-English Mixed Text[J]. New Technology of Library and Information Service, 2012, 28(6): 36-42. DOI: 10.11925/infotech.1003-3513.2012.06.06.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2012.06.06
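Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing, Postal code: 100190
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn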