[Objective] This paper proposes an algorithm that combines latent Dirichlet allocation (LDA) and decision tree models to identify potential patent collaboration opportunities, with the aim of strengthening cross-region innovation. [Methods] First, we retrieved 22,855 patents from the incoPat database, all developed by higher education institutions in Guangdong Province and Wuhan City. Second, we used LDA to extract and cluster patent topics. Third, we constructed a decision tree and adjusted its decision boundaries to identify the most promising potential cooperative relationships. Finally, we selected the optimal data mining strategy based on the effective size of the inventors' network and used it to identify and recommend cooperative relationships. [Results] We found 18 pairs of potential cross-regional partners in the top four patent categories of the data set, a result much better than that of the link prediction method. [Limitations] The coverage of the patent data needs to be expanded. More research is also needed on how universities and industry affect the innovation ecology. [Conclusions] The proposed method can identify potential cross-region partners for patent collaboration and innovation.
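The abstract does not state how the effective size of the inventors' network is computed. As a minimal sketch, the example below assumes Burt's effective size in Borgatti's simplified form for unweighted, undirected ego networks, n - 2t/n, where n is the number of an inventor's direct co-inventors and t is the number of ties among those co-inventors. The function name `effective_size` and the toy network `coinventors` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def effective_size(adjacency, ego):
    """Effective size of ego's network (unweighted, undirected).

    Assumes Borgatti's simplification of Burt's measure: n - 2t/n,
    where n = number of ego's neighbours and t = number of ties
    among those neighbours (ties to ego itself are not counted).
    """
    neighbours = adjacency[ego] - {ego}
    n = len(neighbours)
    if n == 0:
        return 0.0
    # Count each tie between two of ego's neighbours exactly once.
    ties = sum(
        1 for a, b in combinations(sorted(neighbours), 2) if b in adjacency[a]
    )
    return n - 2 * ties / n


# Hypothetical co-inventor network: inventor -> set of co-inventors.
coinventors = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

for inventor in sorted(coinventors):
    print(inventor, round(effective_size(coinventors, inventor), 3))
```

Under this assumption, an inventor whose co-inventors rarely work with one another scores higher (less redundant contacts) than one embedded in a tightly knit group, which is the property a recommendation strategy based on effective size would rely on.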
Chen Hao, Zhang Mengyi, Cheng Xiufeng. Identifying Cross-Region Patent Collaboration Opportunities Using LDA and Decision Trees: Case Study of Universities from Guangdong and Wuhan[J]. Data Analysis and Knowledge Discovery, 2021, 5(10): 37-50.