New Technology of Library and Information Service  2007, Vol. 2 Issue (11): 33-39    DOI: 10.11925/infotech.1003-3513.2007.11.07
Review and Prospect of Automatic Indexing Research
Zhang Chengzhi
1 (Department of Information Management, Nanjing University of Science & Technology, Nanjing 210094, China)
2 (Institute of Scientific & Technical Information of China, Beijing 100038, China)
Abstract  

This paper reviews research on automatic indexing. First, it identifies the objects of indexing in automatic indexing. It then describes the three phases of automatic indexing research over the past 50 years, together with the representative methods of each phase, and explains the road map of the research in detail. Classifications of keyword extraction methods and of keyword assignment methods are put forward. Finally, the paper summarizes the open issues in automatic indexing and discusses future research topics and applications.

Key words: Automatic indexing; Keyword extraction; Keyword assignment
Received: 13 September 2007      Published: 25 November 2007
Classification codes: TP391; G252
Corresponding Authors: Zhang Chengzhi     E-mail: zcz51@citiz.net
About author: Zhang Chengzhi

Cite this article:

Zhang Chengzhi. Review and Prospect of Automatic Indexing Research. New Technology of Library and Information Service, 2007, 2(11): 33-39.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.11.07     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I11/33

