New Technology of Library and Information Service  2009, Vol. 25 Issue (5): 22-27    DOI: 10.11925/infotech.1003-3513.2009.05.05
A Feature Representation Method of Scientific Data Based on Complex Text Description
Sun Wei
(National Science Library, Chinese Academy of Sciences, Beijing 100190,China)
(Graduate University of Chinese Academy of Sciences, Beijing 100049,China)
Abstract  

Feature representation is one of the key issues in data clustering. Current feature representations of scientific data are inadequate, which degrades clustering results. This paper proposes the concept of complex text description and a feature representation method based on it. The method applies a different feature weighting computation to the candidate features from each of two kinds of data sources, then strengthens the feature set by merging the two resulting feature sets. Experiments show that the method outperforms several traditional feature representation methods and markedly improves data clustering performance.
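
The weight-then-merge idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the two data sources or the weighting formulas, so everything here is an assumption made for illustration: the function names, the choice of TF-IDF for one source and relative term frequency for the other, the `secondary_scale` parameter, and the toy token lists are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf_weights(docs):
    """Weight terms by TF-IDF (assumed scheme for the first source)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        weights.append({
            term: (count / total) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return weights


def relative_frequency_weights(docs):
    """Weight terms by relative frequency (assumed scheme for the second source)."""
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1
        weights.append({term: count / total for term, count in counts.items()})
    return weights


def merge_feature_sets(primary, secondary, secondary_scale=0.5):
    """Merge two weighted feature dicts for one data item.

    Shared terms have their weights summed; the secondary source is
    down-weighted by `secondary_scale` (a hypothetical tuning parameter).
    """
    merged = dict(primary)
    for term, weight in secondary.items():
        merged[term] = merged.get(term, 0.0) + secondary_scale * weight
    return merged


# Hypothetical toy data: two data items, each described by two text sources.
source_a = [["kinase", "signal", "kinase"], ["transport", "membrane"]]
source_b = [["kinase", "phosphorylation"], ["membrane", "ion", "channel"]]

weights_a = tf_idf_weights(source_a)
weights_b = relative_frequency_weights(source_b)
features = [merge_feature_sets(a, b) for a, b in zip(weights_a, weights_b)]
```

In this sketch a term that appears in both sources for the same item, such as `kinase` in the first item, ends up with a larger merged weight than either source would give it alone, which is one simple way a merged feature set could be "strengthened". The merged dictionaries could then be turned into vectors for any standard clustering algorithm.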

Key words: Complex Text Description; Scientific Data; Feature Representation; Weighting Computation
Received: 09 December 2008      Published: 25 May 2009
CLC Number: TP391
Corresponding Author: Sun Wei     E-mail: sunwei@mail.las.ac.cn
About the author: Sun Wei

Cite this article:

Sun Wei. A Feature Representation Method of Scientific Data Based on Complex Text Description. New Technology of Library and Information Service, 2009, 25(5): 22-27.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.05.05     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I5/22

