Extracting Chinese Patent Keywords with LSTM and Logistic Regression
Wei Tingting¹, Jiang Tao¹, Zheng Shuling², Zhang Jiantao¹
¹ College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
² Patent Examination Cooperation Guangdong Center of the Patent Office, Guangzhou 510535, China
Abstract [Objective] This paper proposes a method for extracting keywords from Chinese patents based on LSTM and logistic regression, aiming to identify low-frequency and long-tail keywords effectively. [Methods] First, we combined an LSTM neural network with a logistic regression model to extract candidate keywords. Then, we reconstructed the filtering rules to retrieve the target keywords. [Results] The extraction accuracy for all keywords, low-frequency keywords, long-tail keywords, and low-frequency long-tail keywords was 5%, 24%, 11%, and 26% higher, respectively, than that of existing methods. [Limitations] The proposed model classifies keywords by setting thresholds, so words near the thresholds may be classified imprecisely. [Conclusions] The new model can effectively discover low-frequency key terms and key terms with many characters in texts, which benefits patent analysis and other services.
Received: 01 September 2021
Published: 14 April 2022
Fund: Young Talents Program of Colleges and Universities of Chinese Guangdong Province Office of Education (2019KQNCX012); Regional Joint Fund of Chinese Guangdong Province (2019A1515110396); Project of Humanities and Social Sciences, Ministry of Education (20YJC740067)
Corresponding Author: Zhang Jiantao, ORCID: 0000-0002-1646-2643
E-mail: zhangjiantao@yeah.net