New Technology of Library and Information Service, 2014, Vol. 30, Issue 10: 76-83. DOI: 10.11925/infotech.1003-3513.2014.10.12
Information Analysis and Research
Identification of Non-nest Coordination for Chinese Patent Literature
Shi Cui, Wang Yang, Yang Bin, Yao Ye
Department of Information Technology, Liaoning School of Administration, Shenyang 110161, China
Abstract

[Objective] To improve the accuracy of coordinate structure identification, this paper proposes a method that combines rules with Conditional Random Fields (CRFs), based on the characteristics of coordinate structures in Chinese patent literature. [Methods] Rules are first used to extract symmetrical coordinate structures. The rule-extracted structures are then bundled, and CRFs are used to identify non-nested (single-layer) coordinate structures. Finally, an error-driven method applies post-processing rules to these results to produce the final identification results. [Results] Experiments show that the method effectively identifies non-nested coordinate structures in patent literature, reaching an F-value of 76.57%. [Limitations] The rules used in the experiments can be further improved, and the way the rules are applied directly affects identification performance. [Conclusions] Combining rules with CRFs is effective for identifying non-nested coordinate structures in Chinese patent literature.
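
The rule-and-bundle step and the F-value evaluation mentioned in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not give the actual rules, so the symmetry rule used here (the same part-of-speech tag on both sides of a conjunction), the conjunction list, the tag names, and the function names `bundle_symmetric` and `f_value` are all assumptions made for illustration. The CRF stage is omitted; the sketch only shows how bundled units might be prepared for a sequence labeler, and how a balanced F-value (assumed here to be F1) is computed over predicted spans.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set is hypothetical
Span = Tuple[int, int]   # (start, end) token offsets of a coordinate structure

# Assumed list of coordinating conjunctions, for illustration only.
CONJUNCTIONS = {"和", "与", "及", "或"}


def bundle_symmetric(tokens: List[Token]) -> List[Token]:
    """Merge 'X conj Y' into one unit when X and Y share a POS tag.

    A toy symmetry rule; the paper's actual rules are not stated
    in the abstract.
    """
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        if (
            i + 2 < len(tokens)
            and tokens[i + 1][0] in CONJUNCTIONS
            and tokens[i][1] == tokens[i + 2][1]
        ):
            merged = "".join(word for word, _ in tokens[i:i + 3])
            out.append((merged, "COORD"))  # hypothetical tag for a bundled unit
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def f_value(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact span matching."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sentence = [("所述", "r"), ("电机", "n"), ("和", "c"), ("齿轮", "n"), ("连接", "v")]
    print(bundle_symmetric(sentence))
    print(f_value({(1, 3), (5, 7)}, {(1, 3), (8, 9)}))
```

In a full pipeline of the kind the abstract describes, the bundled sequence would be passed to a CRF labeler, and error-driven post-processing rules would then correct systematic mistakes in its output before scoring.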

Key words: Patent literature; Coordinate structures; CRFs; Rules
Received: 2014-03-31
CLC number: TP391.1
Corresponding author: Shi Cui, E-mail: aaasc@163.com
Author contributions: Shi Cui proposed the research idea, designed the research plan, carried out the experimental analysis, and wrote the paper; Wang Yang contributed to the experimental analysis and edited the paper; Yang Bin designed the framework of the paper and revised it; Yao Ye revised the paper.
Cite this article:
Shi Cui, Wang Yang, Yang Bin, Yao Ye. Identification of Non-nest Coordination for Chinese Patent Literature [J]. New Technology of Library and Information Service, 2014, 30(10): 76-83. DOI: 10.11925/infotech.1003-3513.2014.10.12.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.10.12

