Data Analysis and Knowledge Discovery  2018, Vol. 2 Issue (5): 11-22    DOI: 10.11925/infotech.2096-3467.2017.1065
Original Article
Recognizing Semantics of Continuous Strings in Chinese Patent Documents
Xueying Wang, Hao Wang, Zixuan Zhang
School of Information Management, Nanjing University, Nanjing 210023, China
Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University), Nanjing 210023, China
Abstract  

[Objective] This paper aims to extract semantic information from continuous strings in Chinese patent documents in the field of iron and steel metallurgy. [Methods] First, we collected strings whose semantics had already been identified to serve as the learning corpus. Then, using this corpus, we examined basic features as well as the characteristics of Chinese characters and of the strings themselves to establish the best model. Finally, we used this model to recognize the semantics of other strings. [Results] The proposed model could effectively extract the semantics of the continuous strings. [Limitations] We did not include the identified characters in the training corpus. [Conclusions] The new model could identify the semantics of continuous strings in Chinese patent documents and could also be used to study continuous strings in English literature.

Key words: Chinese Patent Documents; Iron and Steel Metallurgy; Continuous Strings; Semantic Recognition
Received: 26 October 2017      Published: 20 June 2018

Cite this article:

Xueying Wang, Hao Wang, Zixuan Zhang. Recognizing Semantics of Continuous Strings in Chinese Patent Documents. Data Analysis and Knowledge Discovery, 2018, 2(5): 11-22.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.1065     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2018/V2/I5/11
