New Technology of Library and Information Service (现代图书情报技术)  2009, Vol. 3 Issue (2): 1-8     https://doi.org/10.11925/infotech.1003-3513.2009.02.01
Special Topic of the 22nd 机检会 Conference
Survey on Document Clustering Description* (文本聚类结果描述研究综述)
Zhang Chengzhi (章成志)1,2
1(Institute of Scientific and Technical Information of China, Beijing 100038, China)
2(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract

This paper first describes the research background of Document Clustering Description (DCD) and the related research work. It then analyzes the relationship between DCD and automatic indexing, automatic summarization, and conceptual clustering, and defines the research content of DCD. Next, the DCD problem is formalized according to its specific requirements. Finally, evaluation methods for DCD are presented.

Key words: Document clustering description; Document clustering; Document mining
Received: 2008-11-18      Published: 2009-02-25
CLC numbers: TP391; G252
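Funding:

* This work is one of the research outcomes of the China Postdoctoral Science Foundation project "Research on Key Technologies of Multilingual Domain Ontology Learning" (No. 20080430463), the Nanjing University of Science and Technology research start-up fund project "Research on Key Technologies of Topic Clustering" (No. AB41123), and the key project of the National Science and Technology Support Program of the 11th Five-Year Plan "Research on Key Technologies of Multilingual Information Service Environments" (No. 2006BAH03B02).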

Corresponding author: Zhang Chengzhi     E-mail: zhangchz@istic.ac.cn
About the author: Zhang Chengzhi
Cite this article:
Zhang Chengzhi. Survey on Document Clustering Description*[J]. New Technology of Library and Information Service, 2009, 3(2): 1-8.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2009.02.01      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2009/V3/I2/1

