New Technology of Library and Information Service  2007, Vol. 2 Issue (3): 43-45    DOI: 10.11925/infotech.1003-3513.2007.03.09
A Text Categorization System with C#
Liu Hua
(College of Chinese Language and Culture / Center for Overseas Huayu Research, Jinan University, Guangzhou 510610, China)
Abstract  

Based on Vector Space Model(VSM) and Nave-Bayes(NB), completed a multilayer and multi-classification text categorization system. Introduce detailedly four modules: words’ segmentation and frequency statistics, calculating between classifications’ and document, emendating the veracity of parent-class by emendation of subclass, judging whether document has multi-classification and multi-label. Text representation based on Vector Space Model has 89.7% MicroF1 of parent- category, 77.8% of sub- category; text representation based on Nave-Bayes has 67.6% MicroF1 of parent- category, 66.5% of sub- category.

Key words: Text categorization; Vector space model; Naive Bayes
Received: 27 January 2007      Published: 25 March 2007

TP93

 
Corresponding Authors: Liu Hua     E-mail: liuhua0461@sina.com
About author: Liu Hua

Cite this article:

Liu Hua. A Text Categorization System with C#. New Technology of Library and Information Service, 2007, 2(3): 43-45.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2007.03.09     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2007/V2/I3/43

