|
|
Unsupervised Cross-Language Model for Patent Recommendation Based on Representation |
Zhang Jinzhu1,2(),Zhu Lipeng1,Liu Jingjie1 |
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China 2Jiangsu Provincial Social Public Safety Science and Technology Collaborative Innovation Center, anjing 210094, China |
|
|
Abstract [Objective] This paper designs a cross-language recommendation model for patents based on text semantic representation, aiming to reduce the number of bilingual dictionaries and large-scale corpus, as well as improve the ability of domain adaptation.[Methods] First, we designed a word vector mapping method with unsupervised cross-language algorithm. Then, we mapped Chinese and English word vectors to the unified semantic vector space with linear transformation, which constructed the semantic mapping relationship between Chinese and English words. Third, we created semantic representation of patent texts based on cross-language word vector with smooth inverse frequency (SIF) reweighting method. It realized the semantic representation of Chinese-English patent texts in the same vector space. Finally, we calculated the semantic similarity between patent texts and recommend the cross-language patents.[Results] We examined the proposed method with patents on “wireless communication” and the recommendation accuracy rate of the top 1 and the top 5 reached 55.63% and 77.82%, which were 0.66% and 1.45% higher than those of the weak supervised based cross-language recommendation. They were also 4.29% and 3.90% better than the machine translation based ones.[Limitations] We only examined the proposed method with Chinese and English patents from one specific field.[Conclusions] This proposed method could recommend Chinese and English patents effectively, which help future research in cross-language patent recommendations.
|
Received: 31 March 2020
Published: 28 July 2020
|
|
Corresponding Authors:
Zhang Jinzhu
E-mail: zhangjinzhu@njust.edu.cn
|
[1] |
Jochim C, Lioma C, Schütze H, et al. Preliminary Study into Query Translation for Patent Retrieval[C]//Proceedings of the 3rd Workshop on Patent Information Retrieval. 2010: 57-66.
|
[2] |
Magdy W, Jones G J F. Studying Machine Translation Technologies for Large-Data CLIR Tasks: A Patent Prior-Art Search Case Study[J]. Information Retrieval, 2014,17(5):492-519.
doi: 10.1007/s10791-013-9231-6
|
[3] |
Magdy W, Jones G J F. An Efficient Method for Using Machine Translation Technologies in Cross-Language Patent Search[C]//Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011: 1925-1928.
|
[4] |
Shen X, Huang H Y, Li L Z, et al. A Parallel Cross-Language Retrieval System for Patent Documents[C]//Proceedings of the 6th IEEE International Conference on Software Engineering and Service Science. 2015: 672-676.
|
[5] |
Lee C S, Wang M H, Hsiao Y C, et al. Ontology-Based GFML Agent for Patent Technology Requirement Evaluation and Recommendation[J]. Soft Computing, 2019,23(2):537-556.
doi: 10.1007/s00500-017-2859-1
|
[6] |
Ji X, Gu X J, Dai F, et al. Patent Collaborative Filtering Recommendation Approach Based on Patent Similarity[C]//Proceedings of the 8th International Conference on Fuzzy Systems and Knowledge Discovery. 2011: 1699-1703.
|
[7] |
Rui X H, Min D. HIM-PRS: A Patent Recommendation System Based on Hierarchical Index-Based MapReduce Framework[C]//Proceedings of UCAWSN 2016, CUTE 2016, CSA 2016: Advances in Computer Science and Ubiquitous Computing. 2016: 843-848.
|
[8] |
李枫林, 柯佳. 词向量语义表示研究进展[J]. 情报科学, 2019,37(5):155-165.
|
[8] |
( Li Fenglin, Ke Jia. Research Progress of Word Vector Semantic Representation[J]. Information Science, 2019,37(5):155-165.)
|
[9] |
涂存超, 杨成, 刘知远, 等. 网络表示学习综述[J]. 中国科学:信息科学, 2017,47(8):980-996.
|
[9] |
( Tu Cunchao, Yang Cheng, Liu Zhiyuan, et al. Network Representation Learning: An Overview[J]. SCIENTIA SINICA Informationis, 2017,47(8):980-996.)
|
[10] |
刘知远, 孙茂松, 林衍凯, 等. 知识表示学习研究进展[J]. 计算机研究与发展, 2016,53(2):247-261.
|
[10] |
( Liu Zhiyuan, Sun Maosong, Lin Yankai, et al. Knowledge Representation Learning: A Review[J]. Journal of Computer Research and Development, 2016,53(2):247-261.)
|
[11] |
彭晓娅, 周栋. 跨语言词向量研究综述[J]. 中文信息学报, 2020,34(2):1-15.
|
[11] |
( Peng Xiaoya, Zhou Dong. Survey of Cross-Lingual Word Embedding[J]. Journal of Chinese Information Processing, 2020,34(2):1-15.)
|
[12] |
Mikolov T, Le Q V, Sutskever I. Exploiting Similarities among Languages for Machine Translation[OL].arXiv Preprint, arXiv: 1309.4168.
|
[13] |
Dinu G, Baroni M. Improving Zero-Shot Learning by Mitigating the Hubness Problem[C]// Proceedings of the 3rd International Conference on Learning Representations. 2014. DOI: 10.1007/978-3-319-23528-8_9.
|
[14] |
Faruqui M, Dyer C. Improving Vector Space Word Representations Using Multilingual Correlation[C]//Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014: 462-471.
|
[15] |
Lu A, Wang W, Bansal M, et al. Deep Multilingual Correlation for Improved Word Embeddings[C]//Proceedings of 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 250-256.
|
[16] |
Smith S L, Turban D H P, Hamblin S, et al. Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax[C]//Proceedings of the 5th International Conference on Learning Representations. 2017.
|
[17] |
Xing C, Wang D, Liu C, et al. Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation[C]//Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 1006-1011.
|
[18] |
Barone A V M. Towards Cross-Lingual Distributed Representations Without Parallel Text Trained with Adversarial Autoencoders[C]//Proceedings of the 1st Workshop on Representation Learning for NLP. 2016: 121-126.
|
[19] |
Zhang M, Liu Y, Luan H B, et al. Adversarial Training for Unsupervised Bilingual Lexicon Induction[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017. DOI: 10.18653/v1/P17-1179.
|
[20] |
Artetxe M, Labaka G, Agirre E. A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. DOI: 10.18653/v1/P18-1073.
|
[21] |
Arora S, Liang Y, Ma T. A Simple but Tough-to-Beat Baseline for Sentence Embeddings[C]//Proceedings of the 5th International Conference on Learning Representations. 2017.
|
[22] |
Conneau A, Lample G, Ranzato M A, et al. Word Translation Without Parallel Data[C]//Proceedings of the 6th International Conference on Learning Representations. 2017.
|
[23] |
Artetxe M, Labaka G, Agirre E. Learning Bilingual Word Embeddings with (Almost) no Bilingual Data[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 451-462.
|
[24] |
Oh S, Lei Z, Lee W C, et al. CV-PCR: A Context-Guided Value-Driven Framework for Patent Citation Recommendation[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. 2013: 2291-2296.
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|