Please wait a minute...
Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 93-103    DOI: 10.11925/infotech.2096-3467.2020.0272
Current Issue | Archive | Adv Search |
Unsupervised Cross-Language Model for Patent Recommendation Based on Representation
Zhang Jinzhu1,2(),Zhu Lipeng1,Liu Jingjie1
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2Jiangsu Provincial Social Public Safety Science and Technology Collaborative Innovation Center, anjing 210094, China
Download: PDF (1501 KB)   HTML ( 2
Export: BibTeX | EndNote (RIS)      
Abstract  

[Objective] This paper designs a cross-language recommendation model for patents based on text semantic representation, aiming to reduce the number of bilingual dictionaries and large-scale corpus, as well as improve the ability of domain adaptation.[Methods] First, we designed a word vector mapping method with unsupervised cross-language algorithm. Then, we mapped Chinese and English word vectors to the unified semantic vector space with linear transformation, which constructed the semantic mapping relationship between Chinese and English words. Third, we created semantic representation of patent texts based on cross-language word vector with smooth inverse frequency (SIF) reweighting method. It realized the semantic representation of Chinese-English patent texts in the same vector space. Finally, we calculated the semantic similarity between patent texts and recommend the cross-language patents.[Results] We examined the proposed method with patents on “wireless communication” and the recommendation accuracy rate of the top 1 and the top 5 reached 55.63% and 77.82%, which were 0.66% and 1.45% higher than those of the weak supervised based cross-language recommendation. They were also 4.29% and 3.90% better than the machine translation based ones.[Limitations] We only examined the proposed method with Chinese and English patents from one specific field.[Conclusions] This proposed method could recommend Chinese and English patents effectively, which help future research in cross-language patent recommendations.

Key wordsCross-Language      Patent Recommendation      Representation Learning      Semantic Representation     
Received: 31 March 2020      Published: 28 July 2020
ZTFLH:  G254  
Corresponding Authors: Zhang Jinzhu     E-mail: zhangjinzhu@njust.edu.cn

Cite this article:

Zhang Jinzhu,Zhu Lipeng,Liu Jingjie. Unsupervised Cross-Language Model for Patent Recommendation Based on Representation. Data Analysis and Knowledge Discovery, 2020, 4(10): 93-103.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0272     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/93

Visualization of Patent Word Vectors Before Mapping
Visualization of Patent Word Vectors After Mapping
检索方法 CSLS KNN
弱监督 46.72 43.68
无监督 49.08 46.87
Accuracy of Cross-Language Word Mapping
Influence of the Number of Patent Words on Mapping Accuracy
中文单词 英文映射单词 常见匹配单词
移动终端 mobile-terminal; terminal; mobile-phone mobile-terminal
接入点 access-point; AP; access-points access-point
选择 selecting; selection; selected select
检测 reducing; reduced; reduce reduce
快速的 quickly; rapid; rapidly fast
准确的 accurately; accuracy; accurate accurate
Examples of Cross-Language Patent Word Mapping
Visualization of Patent Text Representation
跨语言专利推荐方法 Top-1 Accuracy Top-5 Accuracy
机器翻译 51.34 73.92
无监督+平均词向量 33.75 56.50
无监督+TF-IDF 42.01 65.45
弱监督+SIF 54.97 76.37
无监督+SIF 55.63 77.82
Accuracy of Cross-language Patent Recommendation
Examples of Cross-language Patent Recommendation
[1] Jochim C, Lioma C, Schütze H, et al. Preliminary Study into Query Translation for Patent Retrieval[C]//Proceedings of the 3rd Workshop on Patent Information Retrieval. 2010: 57-66.
[2] Magdy W, Jones G J F. Studying Machine Translation Technologies for Large-Data CLIR Tasks: A Patent Prior-Art Search Case Study[J]. Information Retrieval, 2014,17(5):492-519.
doi: 10.1007/s10791-013-9231-6
[3] Magdy W, Jones G J F. An Efficient Method for Using Machine Translation Technologies in Cross-Language Patent Search[C]//Proceedings of the 20th ACM International Conference on Information and Knowledge Management. 2011: 1925-1928.
[4] Shen X, Huang H Y, Li L Z, et al. A Parallel Cross-Language Retrieval System for Patent Documents[C]//Proceedings of the 6th IEEE International Conference on Software Engineering and Service Science. 2015: 672-676.
[5] Lee C S, Wang M H, Hsiao Y C, et al. Ontology-Based GFML Agent for Patent Technology Requirement Evaluation and Recommendation[J]. Soft Computing, 2019,23(2):537-556.
doi: 10.1007/s00500-017-2859-1
[6] Ji X, Gu X J, Dai F, et al. Patent Collaborative Filtering Recommendation Approach Based on Patent Similarity[C]//Proceedings of the 8th International Conference on Fuzzy Systems and Knowledge Discovery. 2011: 1699-1703.
[7] Rui X H, Min D. HIM-PRS: A Patent Recommendation System Based on Hierarchical Index-Based MapReduce Framework[C]//Proceedings of UCAWSN 2016, CUTE 2016, CSA 2016: Advances in Computer Science and Ubiquitous Computing. 2016: 843-848.
[8] 李枫林, 柯佳. 词向量语义表示研究进展[J]. 情报科学, 2019,37(5):155-165.
[8] ( Li Fenglin, Ke Jia. Research Progress of Word Vector Semantic Representation[J]. Information Science, 2019,37(5):155-165.)
[9] 涂存超, 杨成, 刘知远, 等. 网络表示学习综述[J]. 中国科学:信息科学, 2017,47(8):980-996.
[9] ( Tu Cunchao, Yang Cheng, Liu Zhiyuan, et al. Network Representation Learning: An Overview[J]. SCIENTIA SINICA Informationis, 2017,47(8):980-996.)
[10] 刘知远, 孙茂松, 林衍凯, 等. 知识表示学习研究进展[J]. 计算机研究与发展, 2016,53(2):247-261.
[10] ( Liu Zhiyuan, Sun Maosong, Lin Yankai, et al. Knowledge Representation Learning: A Review[J]. Journal of Computer Research and Development, 2016,53(2):247-261.)
[11] 彭晓娅, 周栋. 跨语言词向量研究综述[J]. 中文信息学报, 2020,34(2):1-15.
[11] ( Peng Xiaoya, Zhou Dong. Survey of Cross-Lingual Word Embedding[J]. Journal of Chinese Information Processing, 2020,34(2):1-15.)
[12] Mikolov T, Le Q V, Sutskever I. Exploiting Similarities among Languages for Machine Translation[OL].arXiv Preprint, arXiv: 1309.4168.
[13] Dinu G, Baroni M. Improving Zero-Shot Learning by Mitigating the Hubness Problem[C]// Proceedings of the 3rd International Conference on Learning Representations. 2014. DOI: 10.1007/978-3-319-23528-8_9.
[14] Faruqui M, Dyer C. Improving Vector Space Word Representations Using Multilingual Correlation[C]//Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics. 2014: 462-471.
[15] Lu A, Wang W, Bansal M, et al. Deep Multilingual Correlation for Improved Word Embeddings[C]//Proceedings of 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 250-256.
[16] Smith S L, Turban D H P, Hamblin S, et al. Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax[C]//Proceedings of the 5th International Conference on Learning Representations. 2017.
[17] Xing C, Wang D, Liu C, et al. Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation[C]//Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2015: 1006-1011.
[18] Barone A V M. Towards Cross-Lingual Distributed Representations Without Parallel Text Trained with Adversarial Autoencoders[C]//Proceedings of the 1st Workshop on Representation Learning for NLP. 2016: 121-126.
[19] Zhang M, Liu Y, Luan H B, et al. Adversarial Training for Unsupervised Bilingual Lexicon Induction[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017. DOI: 10.18653/v1/P17-1179.
[20] Artetxe M, Labaka G, Agirre E. A Robust Self-Learning Method for Fully Unsupervised Cross-Lingual Mappings of Word Embeddings[C]//Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018. DOI: 10.18653/v1/P18-1073.
[21] Arora S, Liang Y, Ma T. A Simple but Tough-to-Beat Baseline for Sentence Embeddings[C]//Proceedings of the 5th International Conference on Learning Representations. 2017.
[22] Conneau A, Lample G, Ranzato M A, et al. Word Translation Without Parallel Data[C]//Proceedings of the 6th International Conference on Learning Representations. 2017.
[23] Artetxe M, Labaka G, Agirre E. Learning Bilingual Word Embeddings with (Almost) no Bilingual Data[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics. 2017: 451-462.
[24] Oh S, Lei Z, Lee W C, et al. CV-PCR: A Context-Guided Value-Driven Framework for Patent Citation Recommendation[C]//Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. 2013: 2291-2296.
[1] Yu Chuanming, Wang Manyi, Lin Hongjun, Zhu Xingyu, Huang Tingting, An Lu. A Comparative Study of Word Representation Models Based on Deep Learning[J]. 数据分析与知识发现, 2020, 4(8): 28-40.
[2] Chuanming Yu,Haonan Li,Manyi Wang,Tingting Huang,Lu An. Knowledge Representation Based on Deep Learning:Network Perspective[J]. 数据分析与知识发现, 2020, 4(1): 63-75.
[3] Qingtian Zeng,Xiaohui Hu,Chao Li. Extracting Keywords with Topic Embedding and Network Structure Analysis[J]. 数据分析与知识发现, 2019, 3(7): 52-60.
[4] Zhu Fu,Yuefen Wang,Xuhui Ding. Semantic Representation of Design Process Knowledge Reuse[J]. 数据分析与知识发现, 2019, 3(6): 21-29.
[5] Qingtian Zeng,Mingdi Dai,Chao Li,Hua Duan,Zhongying Zhao. Discovering Important Locations with User Representation and Trace Data[J]. 数据分析与知识发现, 2019, 3(6): 75-82.
[6] Jinzhu Zhang,Yiming Hu. Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning[J]. 数据分析与知识发现, 2019, 3(5): 68-76.
[7] Jinzhu Zhang,Yue Wang,Yiming Hu. Analyzing Sci-Tech Topics Based on Semantic Representation of Patent References[J]. 数据分析与知识发现, 2019, 3(12): 52-60.
[8] Yu Chuanming,Feng Bolin,An Lu. Sentiment Analysis in Cross-Domain Environment with Deep Representative Learning[J]. 数据分析与知识发现, 2017, 1(7): 73-81.
[9] Deng Sanhong,Wan Jiexi,Wang Hao,Liu Xiwen. Experimental Study of Multilingual Text Clustering[J]. 现代图书情报技术, 2014, 30(1): 28-35.
[10] Liu Sa Zhang Chengzhi. Survey of Multilingual Document Representation[J]. 现代图书情报技术, 2010, 26(6): 33-41.
[11] Zhang Liyi,Zhang Zhenyun. A New Cross-Language Commodity Information Retrieval Approach in Book Searching[J]. 现代图书情报技术, 2010, 26(1): 9-14.
[12] Wu Dan. Design and Implementation of an English-Chinese Interactive Cross-Language Information Retrieval System[J]. 现代图书情报技术, 2009, 3(2): 89-95.
[13] Wang Miaoya,Lai Maosheng. Query Translation Techniques and It’s Research Development in  Cross-Language Information Retrieval[J]. 现代图书情报技术, 2005, 21(4): 37-41.
[14] Huang Guocai. Design of Cross-language Meta Search Engine[J]. 现代图书情报技术, 2001, 17(4): 31-33.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn