New Technology of Library and Information Service  2015, Vol. 31 Issue (1): 24-30    DOI: 10.11925/infotech.1003-3513.2015.01.04
Hierarchical Filtering Method for Patent Term Extraction
Hou Ting, Lv Xueqiang, Li Zhuo
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
[Objective] Patent terms carry the core content of patent documents, and extracting them is the basis for further research on patents. [Methods] A hierarchical filtering method is presented to extract terms. Using a suffix array, the method takes repeated strings as candidate terms and divides the invalid strings into three classes according to their features in the candidate set: broken strings, redundant strings and common words. Patent terms are obtained by removing these invalid strings. To filter broken strings and redundant strings, the authors propose an independence calculation method, a relative activity calculation method and a word segmentation error correction method. [Results] Experimental results show that the proposed method performs well on Chinese patent term extraction, with an average precision of 90.54% and an average recall of 87.33%. [Limitations] The method applies only to repeated strings and cannot identify terms that occur only once. [Conclusions] The method is effective for patent term extraction.
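
The candidate-generation step described in the abstract (repeated strings found through a suffix array) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the naive suffix-array construction, and the `min_len`/`max_len` bounds are assumptions made for illustration, and the filtering stages (independence, relative activity, segmentation error correction) are not reproduced because the abstract does not define them.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a short illustration,
    # too slow for a real patent corpus.
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_len(a, b):
    # Length of the longest common prefix of two strings.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def repeated_strings(text, min_len=2, max_len=8):
    """Return {substring: frequency} for substrings occurring at least twice.

    min_len and max_len are hypothetical bounds on candidate length.
    """
    sa = build_suffix_array(text)
    counts = {}
    # Suffixes sharing a prefix are contiguous in the suffix array, so a string
    # occurring k times is seen in exactly k - 1 adjacent suffix pairs.
    for prev, cur in zip(sa, sa[1:]):
        lcp = common_prefix_len(text[prev:], text[cur:])
        for length in range(min_len, min(lcp, max_len) + 1):
            s = text[cur:cur + length]
            counts[s] = counts.get(s, 1) + 1
    return counts


if __name__ == "__main__":
    candidates = repeated_strings("abcabcab")
    for s, freq in sorted(candidates.items()):
        print(s, freq)
```

Strings that occur only once never appear in an adjacent-suffix overlap, which is why a frequency-based candidate set of this kind cannot recover terms with a single occurrence, as the limitations statement notes.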

Key words: Patent terms, Hierarchical filtering method, Independence calculation, Relative Active Degree
Received: 11 June 2014      Published: 12 February 2015
Hou Ting, Lv Xueqiang, Li Zhuo. Hierarchical Filtering Method for Patent Term Extraction. New Technology of Library and Information Service, 2015, 31(1): 24-30.

