1 College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
2 Patent Examination Cooperation Guangdong Center of the Patent Office, Guangzhou 510535, China
[Objective] This paper proposes a method for extracting keywords from Chinese patents that combines an LSTM network with logistic regression, aiming to identify low-frequency and long-tail keywords effectively. [Methods] First, we combined an LSTM neural network with a logistic regression model to extract candidate keywords. Then, we reconstructed the filtering rules to retrieve the target keywords. [Results] The extraction accuracy for all keywords, low-frequency keywords, long-tail keywords, and low-frequency long-tail keywords was 5%, 24%, 11%, and 26% higher, respectively, than that of existing methods. [Limitations] The proposed model classifies keywords by setting thresholds, so words near the thresholds may be classified imprecisely. [Conclusions] The new model can effectively discover key terms with low frequency and long character length in texts, which benefits patent analysis and other services.
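The limitation noted above can be illustrated with a minimal sketch. The abstract does not specify the model's features, weights, or threshold values, so everything below is an assumption made for illustration only: the raw scores, the term names, the `classify` helper, and the 0.5 threshold are hypothetical and are not taken from the paper. The sketch only shows how a logistic output combined with a fixed threshold can separate two candidates whose scores are nearly identical.

```python
import math


def logistic(z):
    """Standard logistic (sigmoid) function mapping a raw score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(raw_scores, threshold=0.5):
    """Label each candidate term as a keyword when its logistic
    probability reaches the threshold (hypothetical decision rule)."""
    return {term: logistic(z) >= threshold for term, z in raw_scores.items()}


# Hypothetical raw scores for four candidate terms (not from the paper).
raw_scores = {"term_a": 2.0, "term_b": 0.05, "term_c": -0.05, "term_d": -2.0}
labels = classify(raw_scores)

# term_b and term_c differ only slightly in probability,
# yet they land on opposite sides of the threshold.
gap = logistic(raw_scores["term_b"]) - logistic(raw_scores["term_c"])
print(labels, round(gap, 3))
```

Under these assumed values, `term_b` and `term_c` receive different labels even though their probabilities are very close, which is the kind of near-threshold sensitivity the Limitations statement describes.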