New Technology of Library and Information Service  2009, Vol. 25 Issue (6): 31-36    DOI: 10.11925/infotech.1003-3513.2009.06.07
Survey on Multilingual Documents Clustering
Zhang Chengzhi1,2   Huilin Wang2
1(Institute of Scientific & Technical Information of China, Beijing 100038, China)
2(Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract  

This paper surveys multilingual document clustering. It first introduces the potential applications of multilingual document clustering. It then classifies the clustering methods according to the resources they use. Finally, it describes the open problems and future trends of the field.

Key words: Multilingual documents clustering; Cross language documents clustering; Text mining; Multilingual information processing
Received: 13 May 2009      Published: 25 June 2009
ZTFLH: TP391; G252
Corresponding Authors: Zhang Chengzhi     E-mail: zhangchz@istic.ac.cn
About authors: Zhang Chengzhi, Huilin Wang

Cite this article:

Zhang Chengzhi, Huilin Wang. Survey on Multilingual Documents Clustering. New Technology of Library and Information Service, 2009, 25(6): 31-36.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2009.06.07     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2009/V25/I6/31

