New Technology of Library and Information Service  2009, Vol. 3 Issue (2): 1-8    DOI: 10.11925/infotech.1003-3513.2009.02.01
Survey on Document Clustering Description
Zhang Chengzhi 1,2
1(Institute of Scientific and Technical Information of China, Beijing 100038, China)
2(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract  

This paper presents the research background of Document Clustering Description (DCD) and reviews related work. It explains how DCD relates to automatic indexing, automatic summarization, and conceptual clustering, and defines the research scope of DCD. Based on the requirements of DCD, its tasks are formalized. Evaluation methods for DCD are also described.
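The abstract does not reproduce the paper's own formalization of DCD tasks. As an illustration only, one common baseline for describing a cluster is to label it with the terms most positively associated with it, scored by the 2×2 χ² statistic between "document contains term" and "document belongs to cluster". The sketch below assumes documents are already tokenized and clustered; the function names `chi_square` and `label_clusters` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """2x2 chi-square statistic for one term/cluster contingency table.

    a: documents in the cluster that contain the term
    b: documents outside the cluster that contain the term
    c: documents in the cluster that lack the term
    d: documents outside the cluster that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:  # degenerate table, e.g. the term occurs in every document
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def label_clusters(docs: List[List[str]], assignment: List[int], k: int = 3) -> Dict[int, List[str]]:
    """Hypothetical helper: describe each cluster by its k highest-scoring terms."""
    doc_sets: List[Set[str]] = [set(tokens) for tokens in docs]
    df_all = Counter(t for s in doc_sets for t in s)  # document frequency in the whole collection
    n_docs = len(doc_sets)
    labels: Dict[int, List[str]] = {}
    for cluster in sorted(set(assignment)):
        members = [s for s, cid in zip(doc_sets, assignment) if cid == cluster]
        df_in = Counter(t for s in members for t in s)  # document frequency inside the cluster
        n_in = len(members)
        scored = []
        for term, a in df_in.items():
            b = df_all[term] - a
            c = n_in - a
            d = (n_docs - n_in) - b
            if a * d > c * b:  # keep only terms over-represented in this cluster
                scored.append((chi_square(a, b, c, d), term))
        scored.sort(key=lambda x: (-x[0], x[1]))  # highest score first, ties broken alphabetically
        labels[cluster] = [term for _, term in scored[:k]]
    return labels


# Toy example (hypothetical data): two clusters of two documents each.
docs = [
    ["apple", "banana", "fruit"],
    ["banana", "fruit", "juice"],
    ["python", "code", "compiler"],
    ["code", "python", "bug"],
]
assignment = [0, 0, 1, 1]
print(label_clusters(docs, assignment, k=2))
# prints {0: ['banana', 'fruit'], 1: ['code', 'python']}
```

Scoring on document counts rather than raw term counts keeps one long document from dominating a label; a full DCD system would still have to address the readability and coverage of the resulting description, which a bare term list does not.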

Key words: Document clustering description; Document clustering; Document mining
Received: 18 November 2008      Published: 25 February 2009
CLC Number: TP391; G252
Corresponding author: Zhang Chengzhi  E-mail: zhangchz@istic.ac.cn
About the author: Zhang Chengzhi

Cite this article:

Zhang Chengzhi. Survey on Document Clustering Description. New Technology of Library and Information Service, 2009, 3(2): 1-8.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.02.01     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V3/I2/1
