Analyzing Sci-Tech Topics Based on Semantic Representation of Patent References
Jinzhu Zhang1,2, Yue Wang1, Yiming Hu1
1 School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China
2 Jiangsu Collaborative Innovation Center of Social Safety Science and Technology, Nanjing 210094, China
[Objective] This paper explores a content mining method for scientific references in patents (SRPs) based on semantic text representation, aiming to improve the accuracy, comprehensiveness, and interpretability of knowledge flow analysis. [Methods] First, we extracted keywords and abstracts from patents to represent the SRPs and created vectors for these items. Then, we computed the distances between the vectors to measure their semantic similarity. Finally, we identified and mapped the topics of patents and SRP contents in the field of nanotechnology. [Results] The method effectively mapped the relationships among sci-tech topics from a content perspective. [Limitations] This exploratory study used only abstracts and keywords rather than full texts. [Conclusions] The proposed method improves the knowledge flow analysis of patents.
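The similarity step in the Methods can be illustrated with a minimal sketch. The abstract does not name the distance measure, so cosine similarity is an assumption here, and the vectors, the labels (`srp_a`, `srp_b`), and the helper names `cosine_similarity` and `most_similar` are hypothetical toy values chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; treat its similarity as 0.0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query, candidates):
    """Return the (label, similarity) pair of the candidate closest to query."""
    scored = ((label, cosine_similarity(query, vec))
              for label, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])


# Toy vectors (assumed, not taken from the paper): one patent, two SRPs.
patent_vec = [0.9, 0.1, 0.3]
srp_vecs = {
    "srp_a": [0.8, 0.2, 0.4],
    "srp_b": [0.1, 0.9, 0.2],
}

best_label, best_score = most_similar(patent_vec, srp_vecs)
print(best_label, round(best_score, 3))
```

In practice the toy vectors would be replaced by embeddings of the keywords and abstracts; a higher score then suggests that a patent and an SRP are closer in content.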