New Technology of Library and Information Service  2005, Vol. 21 Issue (5): 46-49    DOI: 10.11925/infotech.1003-3513.2005.05.11
Development of Text Automatic Categorization Measurement Research.
Tan Jinbo   Li Yi   Yang Xiaojiang
(Department of Educational Technology, Nanjing Normal University, Nanjing 210097, China)
Abstract  

Text categorization is the foundation and core of text mining and has been a research focus of data mining and Internet mining in recent years. This article reviews the state of research on text categorization in China and abroad from both qualitative and quantitative perspectives. It analyzes the main factors that affect text categorization and seeks to identify common problems by summarizing evaluations of text categorization systems and algorithms. The goal is to provide a theoretical and empirical basis for optimizing and improving automatic text categorization.

Key words: Automatic categorization; Evaluate; Feature selection
Received: 03 December 2004      Published: 25 May 2005
ZTFLH: G354.4
Corresponding Authors: Tan Jinbo     E-mail: yttjb@163.com
About authors: Tan Jinbo, Li Yi, Yang Xiaojiang

Cite this article:

Tan Jinbo, Li Yi, Yang Xiaojiang. Development of Text Automatic Categorization Measurement Research. New Technology of Library and Information Service, 2005, 21(5): 46-49.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2005.05.11     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2005/V21/I5/46

