New Technology of Library and Information Service  2011, Vol. 27 Issue (2): 48-53    DOI: 10.11925/infotech.1003-3513.2011.02.08
Research Towards Chinese String Similarity Based on the Clustering Feature of Chinese Characters
Wang Jingting
Department of Military Information Management, Shanghai Branch of Nanjing Institute of Politics, Shanghai 200433, China
Abstract  

This paper applies cluster analysis to the features of Chinese characters in order to uncover their internal regularities. Based on the clustering features of Chinese characters, it refines the results of string matching and proposes a two-level similarity model. Experimental results show that the model reflects string similarity more accurately.
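
The abstract does not spell out the model, so the following is only a minimal sketch of the general idea, not the author's method. It assumes a two-level comparison: identical characters match at no cost, and characters that fall in the same cluster get partial credit inside an otherwise standard edit distance. The `CLUSTERS` table, the `same_cluster_cost` value of 0.5, and all function names are hypothetical and chosen for illustration.

```python
from typing import Dict

# Hypothetical cluster assignment: characters judged similar share a cluster id.
# These groupings are illustrative only and are not taken from the paper.
CLUSTERS: Dict[str, int] = {"己": 0, "已": 0, "巳": 0, "未": 1, "末": 1}


def substitution_cost(a: str, b: str, same_cluster_cost: float = 0.5) -> float:
    """Level 1: identical characters cost 0.
    Level 2: different characters in the same cluster cost less than 1."""
    if a == b:
        return 0.0
    cluster_a, cluster_b = CLUSTERS.get(a), CLUSTERS.get(b)
    if cluster_a is not None and cluster_a == cluster_b:
        return same_cluster_cost  # assumed partial-match cost
    return 1.0


def weighted_edit_distance(s: str, t: str) -> float:
    """Levenshtein distance with the cluster-aware substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,                          # deletion
                curr[j - 1] + 1.0,                      # insertion
                prev[j - 1] + substitution_cost(a, b),  # substitution
            ))
        prev = curr
    return prev[-1]


def similarity(s: str, t: str) -> float:
    """Normalize the distance to a similarity score in [0, 1]."""
    if not s and not t:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t) / max(len(s), len(t))
```

Under these assumptions, `similarity("己经", "已经")` scores higher than `similarity("天经", "已经")`, because 己 and 已 share a cluster while 天 and 已 do not.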

Key words: Chinese string matching; Clustering of Chinese characters; Similarity
Received: 18 October 2010      Published: 25 March 2011
CLC Number: TP391

Cite this article:

Wang Jingting. Research Towards Chinese String Similarity Based on the Clustering Feature of Chinese Characters. New Technology of Library and Information Service, 2011, 27(2): 48-53.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2011.02.08     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2011/V27/I2/48


