New Technology of Library and Information Service  2014, Vol. 30 Issue (10): 76-83    DOI: 10.11925/infotech.1003-3513.2014.10.12
Identification of Non-nest Coordination for Chinese Patent Literature
Shi Cui, Wang Yang, Yang Bin, Yao Ye
Department of Information Technology, Liaoning School of Administration, Shenyang 110161, China
Abstract  

[Objective] To improve identification accuracy, this paper presents a method that combines rules with Conditional Random Fields (CRFs), based on the characteristics of coordinate structures in Chinese patent literature. [Methods] First, rules derived from these characteristics extract the symmetrical coordinate structures. Next, the extracted coordinate structures are bundled, and CRFs identify the non-nested coordinate structures. Finally, an error-driven method post-processes these results to produce the final identification. [Results] Experiments show that the method identifies non-nested coordination in patent literature effectively, reaching an F-value of 76.57%. [Limitations] The rules used in the experiments can be further improved, since their application directly affects the identification results for coordinate structures. [Conclusions] Combining rules with CRFs is effective for identifying non-nested coordination in Chinese patent literature.
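
The rule-based first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' rule set: the conjunction list, the POS tags, the window size, and the function names `symmetric_span`, `extract_symmetric`, and `bio_labels` are all assumptions made for illustration. The sketch treats a coordination as symmetrical when the POS-tag sequences on both sides of a conjunction are identical, and then converts the extracted spans into B/I/O labels of the kind a sequence labeler such as a CRF is commonly trained on.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, POS tag); the tag set is hypothetical

# Assumed list of overt coordination markers, not taken from the paper.
CONJUNCTIONS = {"和", "与", "及", "或"}


def symmetric_span(tokens: List[Token], conj_idx: int,
                   max_len: int = 3) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest coordination around conj_idx whose
    left and right conjuncts share the same POS-tag sequence, else None."""
    tags = [tag for _, tag in tokens]
    for n in range(max_len, 0, -1):  # prefer the longest symmetric match
        left_start, right_end = conj_idx - n, conj_idx + 1 + n
        if left_start < 0 or right_end > len(tokens):
            continue  # window would run past the sentence boundary
        if tags[left_start:conj_idx] == tags[conj_idx + 1:right_end]:
            return left_start, right_end
    return None


def extract_symmetric(tokens: List[Token]) -> List[Tuple[int, int]]:
    """Apply the symmetry rule at every conjunction in the sentence."""
    spans = []
    for i, (word, _) in enumerate(tokens):
        if word in CONJUNCTIONS:
            span = symmetric_span(tokens, i)
            if span is not None:
                spans.append(span)
    return spans


def bio_labels(length: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Encode half-open spans as B/I/O labels for a sequence labeler."""
    labels = ["O"] * length
    for start, end in spans:
        labels[start] = "B"
        for i in range(start + 1, end):
            labels[i] = "I"
    return labels


# "includes intake valve and exhaust valve"
sentence = [("包括", "v"), ("进气", "n"), ("阀门", "n"),
            ("和", "c"), ("排气", "n"), ("阀门", "n")]
spans = extract_symmetric(sentence)
print(spans)                             # [(1, 6)]
print(bio_labels(len(sentence), spans))  # ['O', 'B', 'I', 'I', 'I', 'I']
```

Spans found this way could be bundled into single units before statistical labeling, and asymmetric cases, which the rule deliberately leaves unmatched, would be left to the CRF and the error-driven post-processing.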

Key words: Patent literature; Coordinate structures; CRFs; Rules
Received: 31 March 2014      Published: 28 November 2014
CLC number: TP391.1

Cite this article:

Shi Cui, Wang Yang, Yang Bin, Yao Ye. Identification of Non-nest Coordination for Chinese Patent Literature. New Technology of Library and Information Service, 2014, 30(10): 76-83.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2014.10.12     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2014/V30/I10/76
