Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (10): 93-103    DOI: 10.11925/infotech.2096-3467.2020.0272
Unsupervised Cross-Language Model for Patent Recommendation Based on Representation
Zhang Jinzhu1,2, Zhu Lipeng1, Liu Jingjie1
1School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2Jiangsu Provincial Social Public Safety Science and Technology Collaborative Innovation Center, Nanjing 210094, China
Abstract  

[Objective] This paper proposes a cross-language patent recommendation model based on semantic text representation. It aims to reduce reliance on bilingual dictionaries and large-scale corpora and to improve domain adaptability. [Methods] First, we designed an unsupervised cross-language word vector mapping method. Second, we mapped Chinese and English word vectors into a unified semantic vector space with a linear transformation, which established the semantic mapping between Chinese and English words. Third, we built semantic representations of patent texts from the cross-language word vectors using the smooth inverse frequency (SIF) reweighting method, so that Chinese and English patent texts are represented in the same vector space. Finally, we calculated the semantic similarity between patent texts and recommended cross-language patents. [Results] We evaluated the proposed method on patents in the field of “wireless communication”. The top-1 and top-5 recommendation accuracies reached 55.63% and 77.82%, which are 0.66 and 1.45 percentage points higher than those of the weakly supervised cross-language method, and 4.29 and 3.90 percentage points higher than those of the machine translation based method. [Limitations] We only examined the proposed method with Chinese and English patents from one specific field. [Conclusions] The proposed method recommends Chinese and English patents effectively and can support future research on cross-language patent recommendation.

Key words: Cross-Language; Patent Recommendation; Representation Learning; Semantic Representation
Received: 31 March 2020      Published: 28 July 2020
ZTFLH:  G254  
Corresponding Authors: Zhang Jinzhu     E-mail: zhangjinzhu@njust.edu.cn

Cite this article:

Zhang Jinzhu,Zhu Lipeng,Liu Jingjie. Unsupervised Cross-Language Model for Patent Recommendation Based on Representation. Data Analysis and Knowledge Discovery, 2020, 4(10): 93-103.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0272     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I10/93

Visualization of Patent Word Vectors Before Mapping
Visualization of Patent Word Vectors After Mapping
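The two visualizations above contrast word vectors before and after they are mapped into a shared space. As a rough illustration of what such a linear mapping does, the Python sketch below learns an orthogonal transformation with the closed-form Procrustes solution. It assumes a set of already aligned word pairs, which the unsupervised method described in the abstract does not require, so it shows only the mapping step and not the paper's algorithm. The function name `learn_orthogonal_map` and the toy data are hypothetical.

```python
import numpy as np

def learn_orthogonal_map(src_vecs, tgt_vecs):
    """Return the orthogonal W minimizing ||src_vecs @ W - tgt_vecs||_F.

    Row i of src_vecs and row i of tgt_vecs are assumed to be a translation pair.
    """
    # The SVD of the cross-covariance matrix gives the Procrustes solution.
    u, _, vt = np.linalg.svd(src_vecs.T @ tgt_vecs)
    return u @ vt

# Toy data (assumption): random 3-d "Chinese" vectors and a rotated copy
# standing in for the "English" vectors of the same words.
rng = np.random.default_rng(0)
zh = rng.normal(size=(50, 3))
true_rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
en = zh @ true_rotation

W = learn_orthogonal_map(zh, en)
mapped = zh @ W  # Chinese vectors expressed in the English space
print(np.allclose(mapped, en))
```

Constraining the map to be orthogonal preserves distances and angles within each language, which is one common reason for preferring it over an unconstrained least-squares fit.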
Mapping method \ Retrieval method   CSLS    KNN
Weakly supervised                   46.72   43.68
Unsupervised                        49.08   46.87
Accuracy of Cross-Language Word Mapping
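The table above compares two ways of retrieving a translation once both languages share a vector space: plain nearest-neighbor search (KNN) and cross-domain similarity local scaling (CSLS), which penalizes target words that are close to many source words. The Python sketch below illustrates the difference under stated assumptions: the vectors are hand-built 2-d toys, the helper names are hypothetical, and `k=1` is used only because the toy vocabulary has two words per side.

```python
import numpy as np

def normalize(m):
    """Scale each row to unit length so dot products are cosine similarities."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)

def knn_retrieve(src, tgt):
    """For each source vector, return the index of the most similar target."""
    sims = normalize(src) @ normalize(tgt).T
    return sims.argmax(axis=1)

def csls_retrieve(src, tgt, k=10):
    """Retrieve with CSLS: 2*cos(x, y) - r_tgt(x) - r_src(y)."""
    sims = normalize(src) @ normalize(tgt).T
    # Mean similarity of each source word to its k nearest target words.
    r_src = np.sort(sims, axis=1)[:, -k:].mean(axis=1)
    # Mean similarity of each target word to its k nearest source words.
    r_tgt = np.sort(sims, axis=0)[-k:, :].mean(axis=0)
    csls = 2 * sims - r_src[:, None] - r_tgt[None, :]
    return csls.argmax(axis=1)

def unit(deg):
    """2-d unit vector at the given angle in degrees (toy helper)."""
    rad = np.deg2rad(deg)
    return [np.cos(rad), np.sin(rad)]

# Toy setup (assumption): target 0 sits between both source words and acts as a hub.
src = np.array([unit(0), unit(40)])
tgt = np.array([unit(15), unit(70)])

print(knn_retrieve(src, tgt))         # both source words pick the hub
print(csls_retrieve(src, tgt, k=1))   # the second source word moves to target 1
```

In this toy case nearest-neighbor search assigns both source words to the same target, while CSLS separates them because the hub's score is discounted by its high similarity to its own neighbors.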
Influence of the Number of Patent Words on Mapping Accuracy
Chinese word   Mapped English words                      Common matching word
移动终端       mobile-terminal; terminal; mobile-phone   mobile-terminal
接入点         access-point; AP; access-points           access-point
选择           selecting; selection; selected            select
检测           reducing; reduced; reduce                 reduce
快速的         quickly; rapid; rapidly                   fast
准确的         accurately; accuracy; accurate            accurate
Examples of Cross-Language Patent Word Mapping
Visualization of Patent Text Representation
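The abstract describes building patent text representations from cross-language word vectors with smooth inverse frequency (SIF) reweighting. The Python sketch below shows the two SIF steps in a minimal form: a frequency-weighted average of word vectors, followed by removal of the first principal component shared across texts. It assumes that the word vectors are already in one shared space and that word probabilities are estimated from the toy texts themselves; the function names, the vocabulary, and the value of `a` are illustrative assumptions, not settings taken from the paper.

```python
import numpy as np
from collections import Counter

def sif_weighted_average(docs, word_vecs, a=1e-3):
    """Average word vectors per text with weights a / (a + p(w))."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    dim = len(next(iter(word_vecs.values())))
    emb = np.zeros((len(docs), dim))
    for i, doc in enumerate(docs):
        words = [w for w in doc if w in word_vecs]
        if not words:
            continue  # leave an all-zero vector for texts with no known words
        weights = np.array([a / (a + counts[w] / total) for w in words])
        vecs = np.array([word_vecs[w] for w in words])
        emb[i] = weights @ vecs / len(words)
    return emb

def remove_first_component(emb):
    """Subtract each row's projection onto the first principal direction."""
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    pc = vt[0]
    return emb - np.outer(emb @ pc, pc)

# Toy shared-space vectors (assumption): Chinese and English tokens together.
word_vecs = {
    "the": np.array([1.0, 1.0, 0.0]),
    "antenna": np.array([1.0, 0.0, 0.0]),
    "天线": np.array([0.9, 0.1, 0.0]),
    "handover": np.array([0.0, 0.0, 1.0]),
}
docs = [["the", "antenna"], ["天线"], ["the", "handover"]]

raw = sif_weighted_average(docs, word_vecs)
text_vecs = remove_first_component(raw)
print(text_vecs.shape)
```

Frequent words such as "the" receive small weights, so they contribute less to the text vector than rarer technical terms.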
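Cross-language patent recommendation method   Top-1 Accuracy (%)   Top-5 Accuracy (%)
Machine translation                           51.34                73.92
Unsupervised + average word vectors           33.75                56.50
Unsupervised + TF-IDF                         42.01                65.45
Weakly supervised + SIF                       54.97                76.37
Unsupervised + SIF                            55.63                77.82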
Accuracy of Cross-language Patent Recommendation
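The table above reports top-1 and top-5 accuracy. As a sketch of how such figures can be computed, the Python example below ranks candidate patents by cosine similarity to each query patent and counts a hit when the known counterpart appears among the first k results. The evaluation protocol is an assumption (one gold counterpart per query), and the function name `top_k_accuracy` and the 2-d toy vectors are hypothetical.

```python
import numpy as np

def top_k_accuracy(query_vecs, candidate_vecs, gold_idx, k):
    """Share of queries whose gold candidate is among the k most similar."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = candidate_vecs / np.linalg.norm(candidate_vecs, axis=1, keepdims=True)
    sims = q @ c.T                                # cosine similarity matrix
    top_k = np.argsort(-sims, axis=1)[:, :k]      # best k candidates per query
    hits = [gold in row for gold, row in zip(gold_idx, top_k)]
    return float(np.mean(hits))

# Toy text vectors (assumption): three query patents and three candidates.
queries = np.array([[0.9, 0.1], [0.6, 0.8], [1.0, 1.0]])
candidates = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gold = [0, 1, 2]  # index of the correct candidate for each query

top1 = top_k_accuracy(queries, candidates, gold, k=1)
top2 = top_k_accuracy(queries, candidates, gold, k=2)
print(top1, top2)
```

Here the second query ranks its gold candidate second, so it counts as a miss at k=1 and a hit at k=2, which is why top-k accuracy can only stay equal or rise as k grows.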
Examples of Cross-language Patent Recommendation