New Technology of Library and Information Service, 2012, (12): 39-44. DOI: 10.11925/infotech.1003-3513.2012.12.08
An Application of Sharpen Gaussian Template in a Text Feature Weight Adjustment Methodology
Lu Yonghe, He Xinyu
School of Information Management, Sun Yat-Sen University, Guangzhou 510006, China
Abstract  This paper introduces the Gaussian Template and the Sharpen Gaussian Template from computer image processing, summarizes the main ideas behind text feature weight adjustment, and then proposes a text feature weight adjustment methodology based on the Sharpen Gaussian Template. The methodology is evaluated on the Sogou Lab text corpus with a KNN classifier and a class-center classifier, using the macro-averaged F-measure. The experimental results show that the KNN classifier performs better with this methodology than with the traditional method, whereas the class-center classifier shows no significant improvement.
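The abstract names its building blocks without defining them. As a minimal sketch, assuming the common image-processing definitions (a discrete Gaussian template normalized to sum to 1, and a sharpened variant built by unsharp masking as (1 + α)·δ − α·G), the code below constructs both templates in one dimension and also computes the macro-averaged F-measure used in the evaluation. The function names `gaussian_template`, `sharpen_gaussian_template`, `apply_template` and `macro_f1`, and the parameters `sigma` and `alpha`, are hypothetical. The abstract does not say how the paper maps text feature weights onto the template, so `apply_template` only illustrates the sharpening effect on a 1-D sequence of values; it is not the authors' method.

```python
import math
from typing import List, Sequence


def gaussian_template(size: int, sigma: float) -> List[float]:
    """1-D discrete Gaussian template of odd length, normalized to sum to 1."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    half = size // 2
    raw = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(raw)
    return [v / total for v in raw]


def sharpen_gaussian_template(size: int, sigma: float, alpha: float = 1.0) -> List[float]:
    """Assumed unsharp-mask form: (1 + alpha) * identity - alpha * Gaussian.

    The centre tap exceeds 1 and the neighbours turn negative, so the template
    still sums to 1 but exaggerates differences between neighbouring values.
    """
    sharp = [-alpha * g for g in gaussian_template(size, sigma)]
    sharp[size // 2] += 1.0 + alpha
    return sharp


def apply_template(values: Sequence[float], template: Sequence[float]) -> List[float]:
    """Slide a symmetric template over a 1-D sequence, replicating edge values."""
    half = len(template) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(template):
            j = min(max(i + k - half, 0), n - 1)  # clamp the index at the borders
            acc += w * values[j]
        out.append(acc)
    return out


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 over every label seen in either list, then the unweighted mean."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    t = sharpen_gaussian_template(3, 1.0)
    print(apply_template([0.2, 0.2, 0.8, 0.2, 0.2], t))
    print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Because the sharpened template sums to 1, a flat run of values passes through unchanged, while an isolated peak is pushed higher and its neighbours lower. Macro-averaging gives every class equal weight regardless of its size, which is why it is a common choice for multi-class corpora such as Sogou's.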
Key words: Text categorization; Sharpen Gaussian template; Vector space model; Text feature
Received: 03 November 2012      Published: 12 March 2013
CLC Number: TP391

Cite this article:

Lu Yonghe, He Xinyu. An Application of Sharpen Gaussian Template in a Text Feature Weight Adjustment Methodology. New Technology of Library and Information Service, 2012, (12): 39-44.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2012.12.08     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2012/V/I12/39
