New Technology of Library and Information Service  2007, Vol. 2 Issue (11): 33-39     https://doi.org/10.11925/infotech.1003-3513.2007.11.07
Selected papers from the 21st Computer Information Retrieval Conference (机检会)
Review and Prospect of Automatic Indexing Research*
Zhang Chengzhi
1 (Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094, China)
2 (Institute of Scientific & Technical Information of China, Beijing 100038, China)
Abstract

This paper reviews research on automatic indexing. It first defines the object of indexing, then analyzes the three phases of automatic indexing research and lists representative methods from its 50-year history. It describes the road map of automatic indexing research in detail and gives a detailed classification of keyword extraction and keyword assignment methods. Finally, it identifies open problems in automatic indexing and discusses directions for future research and applications.

Key words: Automatic indexing; Keyword extraction; Keyword assignment
Received: 2007-09-13      Published: 2007-11-25
Chinese Library Classification (ZTFLH): TP391; G252
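Funding:

* This work is one of the research results of the 2006 Jiangsu Province Graduate Education Innovation Project "Research on Topic Clustering and Its Applications" and of the sub-project "Research and Application of Dynamic Monitoring Technology for Science and Technology Hotspots" under a Key Project of the National Science and Technology Support Program of the 11th Five-Year Plan (Project No. 2006BAH03B00).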

Corresponding author: Zhang Chengzhi     E-mail: zcz51@citiz.net
Cite this article:
Zhang Chengzhi. Review and Prospect of Automatic Indexing Research. New Technology of Library and Information Service, 2007, 2(11): 33-39.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2007.11.07      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2007/V2/I11/33

