New Technology of Library and Information Service  2012, Vol. Issue (11): 40-46    DOI: 10.11925/infotech.1003-3513.2012.11.07
Research of Mining the Category Knowledge Based on English-Chinese Humanities and Social Sciences Parallel Corpus in Phrase Level
Wang Dongbo1, Han Pu2, Shen Si2, Wei Xiangqing3
1. College of Information Science and Technology, Nanjing Agricultural University, Nanjing 210095, China;
2. School of Information Management, Nanjing University, Nanjing 210093, China;
3. Bilingual Dictionary Research Center, Nanjing University, Nanjing 210093, China
Abstract  This paper reports an experiment in mining category knowledge from a phrase-level English-Chinese parallel corpus in the humanities and social sciences, using an established clustering algorithm. The clustering and morphological conversion algorithms are selected according to the experimental data and the specific research needs. A comparison of knowledge clustering on Chinese, English, and English-Chinese word-level features shows that bilingual word features perform better than monolingual ones. The resulting category knowledge is directly applicable to knowledge bases and machine translation systems, and the way English and Chinese words express that knowledge is explored in the course of mining it.
Key words: CSSCI; English-Chinese parallel corpus in phrase level; Bisecting K-means clustering algorithm; Category knowledge
Received: 09 October 2012      Published: 06 February 2013
CLC Number: TP391
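
The abstract names bisecting K-means over English-Chinese word features as the clustering approach. The following is only a minimal sketch of that general technique, not the authors' implementation (the keywords and reference list suggest a dedicated toolkit was used). Everything in it is an assumption made for illustration: the toy phrase pairs, the helper names, unit-length bag-of-words features, splitting the cluster with the largest squared error, and a deterministic farthest-point seeding in place of the random restarts that bisecting K-means commonly uses. Word segmentation and morphological conversion are assumed to have been done already.

```python
from collections import Counter

def vectorize(docs):
    """Unit-length bag-of-words vectors over the joint English+Chinese vocabulary."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(doc).items():
            vec[index[tok]] = float(count)
        norm = sum(x * x for x in vec) ** 0.5 or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(vectors):
    """Sum of squared distances from each vector to the cluster centroid."""
    c = centroid(vectors)
    return sum(sq_dist(v, c) for v in vectors)

def two_means(vectors, iters=20):
    """Split one cluster in two; returns two lists of positions in `vectors`."""
    # Assumed deterministic seeding: the point farthest from the centroid,
    # then the point farthest from that first seed.
    c = centroid(vectors)
    first = max(vectors, key=lambda v: sq_dist(v, c))
    second = max(vectors, key=lambda v: sq_dist(v, first))
    centers = [first, second]
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for i, v in enumerate(vectors):
            nearer = 0 if sq_dist(v, centers[0]) <= sq_dist(v, centers[1]) else 1
            groups[nearer].append(i)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all vectors identical
        new_centers = [centroid([vectors[i] for i in g]) for g in groups]
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return groups

def bisecting_kmeans(vectors, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) > 1]
        if not candidates:
            break
        target = max(candidates, key=lambda c: sse([vectors[i] for i in c]))
        left, right = two_means([vectors[i] for i in target])
        if not left or not right:
            break
        clusters.remove(target)
        clusters.append([target[i] for i in left])
        clusters.append([target[i] for i in right])
    return clusters

# Hypothetical toy phrase pairs: each item joins the English tokens of a
# phrase with the tokens of its Chinese counterpart (bilingual features).
pairs = [
    ["market", "economy", "市场", "经济"],
    ["economy", "growth", "经济", "增长"],
    ["poetry", "novel", "诗歌", "小说"],
    ["novel", "literature", "小说", "文学"],
]
clusters = bisecting_kmeans(vectorize(pairs), k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Restricting each item to its English or its Chinese tokens alone gives the monolingual feature sets that the abstract compares against the bilingual one.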

Cite this article:

Wang Dongbo, Han Pu, Shen Si, Wei Xiangqing. Research of Mining the Category Knowledge Based on English-Chinese Humanities and Social Sciences Parallel Corpus in Phrase Level. New Technology of Library and Information Service, 2012, (11): 40-46.
