Patent Keyphrase Extraction Based on Patent Term and Layer Information
Yu Yan1,2(),Wang Li1,Zheng Siyu1
1Institute of Information Management and Technology, Nanjing Tech University, Nanjing 210009, China 2College of Electronic and Computer Engineering, Southeast University Chengxian College, Nanjing 210088, China
[Objective] This paper proposes a patent key phrase extraction method incorporating terminology and hierarchical information to improve the accuracy of patent key phrase extraction. It tries to improve the existing graph-based model, which tends to select long key phrases and ignores the phrases’ positional information. [Methods] Based on the traditional graph model, we constructed a new terminology degree metric to measure the terminological information of candidate key phrases. Considering the characteristics of patent documents, we divided patents into several hierarchies and used their weight metrics to measure the positional information of candidate key phrases. [Results] By incorporating terminology information, the F value of the new method improved by 7.615% (nanotechnology), 11.515% (image recognition), 9.813% (chip), and 8.839% (LCD). By incorporating the hierarchical information, the new method’s F value improved by 9.880% (nanotechnology), 6.929% (image recognition), 6.099% (chip), and 5.576% (LCD). [Limitations] The candidate key phrase selection method based on part-of-speech rules may produce more noise. [Conclusions] The proposed method effectively enhances the accuracy of patent key phrase extraction.
俞琰, 王丽, 郑斯煜. 融入术语与层级信息的专利关键短语抽取方法研究[J]. 数据分析与知识发现, 2023, 7(6): 99-112.
Yu Yan, Wang Li, Zheng Siyu. Patent Keyphrase Extraction Based on Patent Term and Layer Information. Data Analysis and Knowledge Discovery, 2023, 7(6): 99-112.
Mihalcea R, Tarau P. TextRank: Bringing Order into Texts[C]// Proceedings of the 2004 Empirical Methods in Natural Language Processing. ACL, 2004: 404-411.
[2]
Page L, Brin S, Motwani R, et al. The PageRank Citation Ranking: Bringing Order to the Web[C]// Proceedings of the 1999 International Conference of World Wide Web. 1999.
(Cui Zhenxin, Lu Haowen. A Method for Extraction of Keywords from Safety Information in Civil Aviation[J]. Journal of Transport Information and Safety, 2016, 34(5): 82-86.)
(Chen Yiqun, Zhou Ruqi, Zhu Weiheng, et al. Mining Patent Knowledge for Automatic Keyword Extraction[J]. Journal of Computer Research and Development, 2016, 53(8): 1740-1752.)
[5]
Hu J, Li S B, Yao Y, et al. Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification[J]. Entropy (Basel), 2018, 20(2): 104.
doi: 10.3390/e20020104
[6]
Das Gollapalli S, Li X L, Yang P. Incorporating Expert Knowledge into Keyphrase Extraction[C]// Proceedings of the 31st AAAI Conference on Artificial Intelligence. New York: ACM, 2017: 3180-3187.
[7]
Aquino G, Lanzarini L. Keyword Identification in Spanish Documents Using Neural Networks[J]. Journal of Computer Science and Technology, 2015, 15: 55-60.
[8]
Zhang Q, Wang Y, Gong Y Y, et al. Keyphrase Extraction Using Deep Recurrent Neural Networks on Twitter[C]// Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. ACL, 2016: 836-845.
(Cheng Bin, Shi Shuicai, Du Yuncheng, et al. Keyword Extraction for Journals Based on Part-of-Speech and BiLSTM-CRF Combined Model[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 101-108.)
[10]
Wang L T, Li F. SJTULTLAB: Chunk Based Method for Keyphrase Extraction[C]// Proceedings of the 5th International Workshop on Semantic Evaluation. New York: ACM, 2010: 158-161.
[11]
Noh H, Jo Y, Lee S. Keyword Selection and Processing Strategy for Applying Text Mining to Patent Analysis[J]. Expert Systems with Applications, 2015, 42(9): 4348-4360.
doi: 10.1016/j.eswa.2015.01.050
(Niu Ping, Huang Degen. TF-IDF and Rules Based Automatic Extraction of Chinese Keywords[J]. Journal of Chinese Computer Systems, 2016, 37(4): 711-715.)
[16]
Joung J, Kim K. Monitoring Emerging Technologies for Technology Planning Using Technical Keyword Based Analysis from Patent Data[J]. Technological Forecasting and Social Change, 2017, 114: 281-292.
doi: 10.1016/j.techfore.2016.08.020
[17]
Nguyen K L, Shin B J, Yoo S J. Hot Topic Detection and Technology Trend Tracking for Patents Utilizing Term Frequency and Proportional Document Frequency and Semantic Information[C]// Proceedings of the 2016 International Conference on Big Data and Smart Computing. IEEE, 2016: 223-230.
[18]
Hofmann T. Probabilistic Latent Semantic Indexing[C]// Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 1999: 50-57.
[19]
Blei D M, Ng A Y, Jordan M I, et al. Latent Dirichlet Allocation[J]. Journal of Machine Learning Research, 2003, 3(1): 993-1022.
[20]
Song Y Q, Pan S M, Liu S X, et al. Topic and Keyword Re-Ranking for LDA-Based Topic Modeling[C]// Proceedings of the 18th ACM Conference on Information and Knowledge Management. New York: ACM, 2009: 1757-1760.
[21]
Wei H, Gao G, Su X. LDA-Based Word Image Representation for Keyword Spotting on Historical Mongolia Documents[C]// Proceedings of the 23rd International Conference on Neural Information Processing. Springer, 2016: 432-441.
(Gu Yijun, Xia Tian. Study on Keyword Extraction with LDA and TextRank Combination[J]. New Technology of Library and Information Service, 2014(7/8): 41-47.)
(Liu Xiaojian, Xie Fei, Wu Xindong. Graph Based Keyphrase Extraction Using LDA Topic Model[J]. Journal of the China Society for Scientific and Technical Information, 2016, 35(6): 664-672.)
(Ma Li, Jiao Licheng, Bai Lin, et al. Research on a Compound Keywords Detection Method Based on Small World Model[J]. Journal of Chinese Information Processing, 2009, 23(3): 121-128.)
(Zuo Xiaofei, Liu Huailiang, Fan Yunjie, et al. Research of Text Clustering Algorithm Based on Conceptual Semantic Field[J]. Journal of Intelligence, 2012, 31(5): 180-184.)
[26]
Boudin F. A Comparison of Centrality Measures for Graph-Based Keyphrase Extraction[C]// Proceedings of the IJCNLP 2013 Workshop on Natural Language Processing for Social Media. ACL, 2013: 834-838.
(Li Hang, Tang Chaolan, Yang Xian, et al. TextRank Keyword Extraction Based on Multi Feature Fusion[J]. Journal of Intelligence, 2017, 36(8): 183-187.)
[28]
Florescu C, Caragea C. PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 1105-1115.
(Liu Zhuchen, Chen Hao, Yu Yanhua, et al. Extracting Keywords with TextRank and Weighted Word Positions[J]. Data Analysis and Knowledge Discovery, 2018, 2(9): 74-79.)
(Ning Jianfei, Liu Jiangzhen. Using Word2Vec with Text Rank to Extract Keywords[J]. New Technology of Library and Information Service, 2016(6): 20-27.)
[32]
Wang R, Liu W, McDonald C. Using Word Embeddings to Enhance Keyword Identification for Scientific Publications[C]// Proceedings of the 26th Australasian Database Conference on Databases Theory and Applications. ACM, 2015: 257-268.
[33]
Li D C, Li S J, Li W J, et al. A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases Through a Document Semantic Network[C]// Proceedings of the ACL 2010 Conference Short Papers. New York: ACM, 2010: 296-300.
[34]
Li D C, Li S J. Hypergraph-Based Inductive Learning for Generating Implicit Key Phrases[C]// Proceedings of the 20th International Conference Companion on World Wide Web. New York: ACM, 2011: 77-78.
[35]
Lynn H M, Choi C, Choi J, et al. The Method of Semi-Supervised Automatic Keyword Extraction for Web Documents Using Transition Probability Distribution Generator[C]// Proceedings of the 2016 International Conference on Research in Adaptive and Convergent Systems. New York: ACM, 2016: 1-6.
[36]
Frantzi K, Ananiadou S, Mima H. Automatic Recognition of Multi-Word Terms: The C-Value/NC-Value Method[J]. International Journal on Digital Libraries, 2000, 3(2): 115-130.
doi: 10.1007/s007999900023
[37]
Muralikumar J, Seelan S A, Vijayakumar N, et al. A Statistical Approach for Modeling Inter-Document Semantic Relationships in Digital Libraries[J]. Journal of Intelligent Information Systems, 2017, 48(3): 477-498.
doi: 10.1007/s10844-016-0423-6
[38]
Le T, Jeong D H. NLP-Based Approach to Semantic Classification of Heterogeneous Transportation Asset Data Terminology[J]. Journal of Computing in Civil Engineering, 2017, 31(6): Article No. 04017057.
[39]
Yan E J, Williams J, Chen Z. Understanding Disciplinary Vocabularies Using a Full-Text Enabled Domain-Independent Term Extraction Approach[J]. PLoS One, 2017, 12(11): Article No. e0187762.
[40]
Thanawala P, Pareek J. MwTExt: Automatic Extraction of Multi-Word Terms to Generate Compound Concepts within Ontology[J]. International Journal of Information Technology, 2018, 10(3): 303-311.
doi: 10.1007/s41870-018-0111-6
[41]
Bagheri A, Nadi S. Sentiment Miner: A Novel Unsupervised Framework for Aspect Detection from Customer Reviews[J]. International Journal of Computational Linguistics Research, 2018, 9(2): 120-130.
doi: 10.6025/jcl/2018/9/2/120-130
[42]
Haque R, Penkale S, Way A. TermFinder: Log-Likelihood Comparison and Phrase-Based Statistical Machine Translation Models for Bilingual Terminology Extraction[J]. Language Resources and Evaluation, 2018, 52(2): 365-400.
doi: 10.1007/s10579-018-9412-4
[43]
Li Z, Tong X, Zhang Y. Constructing the Phrase Dictionary and Visualizing Consumer Behaviors in the Food Industry Based on Online Reviews During the COVID-19 Pandemic[J]. CONVERTER, 2021: 624-632.
[44]
Lahiri S, Mihalcea R, Lai P H. Keyword Extraction from Emails[J]. Natural Language Engineering, 2017, 23(2): 295-317.
doi: 10.1017/S1351324916000231
(Wang Zhihong, Guo Yi. Automatic Keywords Extraction from Chinese Patents Based on Sentence Importance Ranking[J]. Information Studies: Theory & Application, 2018, 41(9): 123-129.)
doi: 10.16353/j.cnki.1000-7490.2018.09.021
[46]
Carletta J. Assessing Agreement on Classification Tasks: The Kappa Statistic[J]. Computational Linguistics, 1996, 22(2): 249-254.
(Lu Wei, Cheng Qikai. An Information Retrieval Model Based on Weighted Graph and Sentence[J]. Journal of the China Society for Scientific and Technical Information, 2013, 32(8): 797-804.)
(Liu Linqing, Yu Han, Fei Ning, et al. Key-Word Extracting Algorithm from Single Text Based on TextRank[J]. Application Research of Computers, 2018, 35(3): 705-710.)
[49]
Wan X, Xiao J. CollabRank: Towards a Collaborative Approach to Single-Document Keyphrase Extraction[C]// Proceedings of the 22nd International Conference on Computational Linguistics. 2008: 969-976.