Research of Mining the Word Category Knowledge for Chinese Syntactic Function Distribution Knowledge Base
Wang Dongbo1, Zhu Danhao2
1. College of Information and Technology Science, Nanjing Agricultural University, Nanjing 210095, China; 2. International Institute for Software Technology, United Nations University, Macao 3058, China
Abstract: Based on the syntactic function distribution of Chinese words, this paper constructs a syntactic function distribution knowledge base from the Tsinghua treebank and stores it in a multi-way tree structure. Chinese word category knowledge is then mined from this knowledge base using the K-medoids clustering algorithm of Sparse Feature Clustering.
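To make the clustering step concrete, the following is a minimal sketch of K-medoids over sparse feature vectors. It is not the authors' implementation: the abstract does not specify the distance measure, the initialization, or the feature inventory, so the cosine distance, the "first k items" initialization, the function names, and the toy syntactic-function labels (`subject`, `object`, `predicate`, and so on) are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def k_medoids(vectors, k, max_iter=100):
    """Cluster sparse vectors with K-medoids; returns (medoid_indices, labels)."""
    n = len(vectors)
    # Precompute a symmetric pairwise distance matrix with a zero diagonal.
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(vectors[i], vectors[j])
            dist[i][j] = d
            dist[j][i] = d

    medoids = list(range(k))  # assumption: the first k items seed the clusters
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: attach every item to its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: within each cluster, pick the member with the
        # smallest total distance to the other members.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            best = min(members, key=lambda m: sum(dist[m][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, labels


# Hypothetical syntactic-function distributions for four words.
vectors = [
    {"subject": 0.5, "object": 0.4, "attributive": 0.1},  # noun-like
    {"predicate": 0.8, "adverbial": 0.2},                 # verb-like
    {"subject": 0.4, "object": 0.6},                      # noun-like
    {"predicate": 0.9, "object": 0.1},                    # verb-like
]
medoids, labels = k_medoids(vectors, 2)
print(medoids, labels)
```

On this toy input the two noun-like words fall into one cluster and the two verb-like words into the other. Storing each word as a dictionary of only its nonzero features keeps the distance computation proportional to the number of functions a word actually fills, which is the usual motivation for a sparse representation.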
王东波, 朱丹浩. 面向汉语句法功能分布知识库的词汇类别知识挖掘研究[J]. 现代图书情报技术, 2013, 29(3): 33-37.
Wang Dongbo, Zhu Danhao. Research of Mining the Word Category Knowledge for Chinese Syntactic Function Distribution Knowledge Base. New Technology of Library and Information Service, 2013, 29(3): 33-37.