New Technology of Library and Information Service, 2008, Vol. 24, Issue 8: 31-36. DOI: 10.11925/infotech.1003-3513.2008.08.05
A Method for Automatic Keyword Extraction and Filtration from Medical Texts
Yin Shumei (1), Zhang Zhixiong (2), Wu Zhenxin (2)
(1) Peking University Health Science Library, Beijing 100083, China
(2) National Science Library, Chinese Academy of Sciences, Beijing 100190, China
Abstract  

Because keywords and key phrases can represent the features of a text, keyword extraction and filtration are of great significance for information retrieval, information extraction and knowledge discovery. This paper first surveys current keyword extraction methods. It then draws on existing thesauri and tools in the medical field, together with the BM25F model, to propose a method for extracting and filtering keywords from medical texts. The proposed method addresses two key problems: (1) the identification and extraction of keywords, and (2) the evaluation of keyword value and the filtration of keywords. The method is applied to documents on osteoarthritis published from 2001 to 2007, and the results verify its effectiveness, offering a practical way to extract keywords for knowledge discovery.
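The abstract states that keyword value is evaluated with BM25F and that low-value keywords are then filtered out, but it does not give the fields, parameters or cut-off used. The following is a minimal sketch under stated assumptions, not the authors' implementation: candidate keywords are assumed to be already extracted per field (for example, concept names from an MMTx-style mapping to the UMLS), and the field weights, `b` values, `k1`, `threshold`, `top_k` and the toy corpus are all hypothetical. It uses the common simple BM25F form: a field-weighted, length-normalised term frequency, saturated by `k1`, multiplied by an IDF (here the non-negative `log(1 + ...)` variant, chosen so scores never go below zero).

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical parameters: the paper's actual values are not stated in the abstract.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 1.0}  # w_f: boost per field
FIELD_B = {"title": 0.5, "abstract": 0.75}       # b_f: length normalisation per field
K1 = 1.2                                         # term-frequency saturation


@dataclass
class Doc:
    # Field name -> candidate keywords found in that field (assumed to come
    # from a thesaurus/MMTx-style extraction step, not reproduced here).
    fields: dict[str, list[str]]


def avg_field_lengths(docs: list[Doc]) -> dict[str, float]:
    """Average number of candidate keywords per field across the collection."""
    return {f: sum(len(d.fields.get(f, [])) for d in docs) / len(docs) for f in FIELD_WEIGHTS}


def bm25f_score(term: str, doc: Doc, docs: list[Doc], avglen: dict[str, float]) -> float:
    """Simple BM25F weight of `term` in `doc` relative to the collection `docs`."""
    # Field-weighted, length-normalised pseudo term frequency.
    tf = 0.0
    for f, w in FIELD_WEIGHTS.items():
        tokens = doc.fields.get(f, [])
        if not tokens:
            continue
        norm = 1 - FIELD_B[f] + FIELD_B[f] * len(tokens) / avglen[f]
        tf += w * tokens.count(term) / norm
    if tf == 0:
        return 0.0
    n = len(docs)
    df = sum(1 for d in docs if any(term in toks for toks in d.fields.values()))
    idf = math.log(1 + (n - df + 0.5) / (df + 0.5))  # non-negative IDF variant
    return idf * tf / (K1 + tf)


def filter_keywords(doc: Doc, docs: list[Doc], threshold: float = 0.3, top_k: int = 5) -> list[str]:
    """Score every candidate in `doc`, drop those below `threshold`, keep the top_k."""
    avglen = avg_field_lengths(docs)
    candidates = {t for toks in doc.fields.values() for t in toks}
    scored = {t: bm25f_score(t, doc, docs, avglen) for t in candidates}
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scored.items(), key=lambda item: (-item[1], item[0]))
    return [t for t, s in ranked if s >= threshold][:top_k]


# Hypothetical toy corpus on osteoarthritis, for illustration only.
EXAMPLE_DOCS = [
    Doc({"title": ["osteoarthritis", "glucosamine"],
         "abstract": ["glucosamine", "cartilage", "osteoarthritis", "chondrocyte"]}),
    Doc({"title": ["osteoarthritis", "knee"], "abstract": ["knee", "pain", "osteoarthritis"]}),
    Doc({"title": ["osteoarthritis"], "abstract": ["osteoarthritis", "pain", "exercise"]}),
]

if __name__ == "__main__":
    print(filter_keywords(EXAMPLE_DOCS[0], EXAMPLE_DOCS))
    # prints ['glucosamine', 'cartilage', 'chondrocyte']
```

In this toy run, "osteoarthritis" occurs in every document, so its IDF is small and its score (about 0.10) falls below the hypothetical threshold, while "glucosamine", which appears in both the title and the abstract of only one document, ranks highest. This illustrates the general idea of BM25F-based filtration: terms that are frequent in highly weighted fields of a document but rare across the collection are retained.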

Key words: Keyword extraction; Keyword filtration; BM25F; MMTx; Text mining; Medical data mining
Received: 16 June 2008      Published: 25 August 2008
CLC number: G250.73

 
Corresponding Authors: Yin Shumei     E-mail: Yinshumei@lib.bjmu.edu.cn
About the authors: Yin Shumei, Zhang Zhixiong, Wu Zhenxin

Cite this article:

Yin Shumei, Zhang Zhixiong, Wu Zhenxin. A Method for Automatic Keyword Extraction and Filtration from Medical Texts. New Technology of Library and Information Service, 2008, 24(8): 31-36.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2008.08.05     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2008/V24/I8/31
