Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (10): 37-50    DOI: 10.11925/infotech.2096-3467.2021.0194
Identifying Cross-Region Patent Collaboration Opportunities Using LDA and Decision Trees——Case Study of Universities from Guangdong and Wuhan
Chen Hao, Zhang Mengyi, Cheng Xiufeng
School of Information Management, Central China Normal University, Wuhan 430079, China
Abstract  

[Objective] This paper proposes a method that combines LDA topic modeling with decision trees to identify potential patent collaboration opportunities, with the aim of strengthening cross-region innovation. [Methods] First, we retrieved 22,855 patents filed by higher education institutions in Guangdong Province and Wuhan City from the incoPat database. Second, we used LDA to extract and cluster patent topics. Third, we built a decision tree and adjusted its decision boundaries to identify the most promising potential cooperative relationships. Finally, we selected the optimal data mining strategy based on the effective size of nodes in the inventor network and used it to identify and recommend cooperative relationships. [Results] The method identified 18 pairs of potential cross-regional partners in the four largest patent categories in the data set, a markedly better result than the link prediction method. [Limitations] The coverage of the patent data needs to be expanded, and more research is needed on how universities and industry affect the innovation ecosystem. [Conclusions] The proposed method can identify potential cross-region partners for patent collaboration and innovation.

Key words: Patent Cooperation; Decision Tree; Topic Model; Cross-Region
Received: 01 March 2021      Published: 23 November 2021
ZTFLH:  G306  
Fund: National Natural Science Foundation of China (71974069)
Corresponding Author: Chen Hao, ORCID: 0000-0002-3460-2769, E-mail: 18071283828@163.com

Cite this article:

Chen Hao, Zhang Mengyi, Cheng Xiufeng. Identifying Cross-Region Patent Collaboration Opportunities Using LDA and Decision Trees——Case Study of Universities from Guangdong and Wuhan. Data Analysis and Knowledge Discovery, 2021, 5(10): 37-50.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0194     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I10/37

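[Figure: Research Framework]
Table: Characteristics of Patents with Potential Partnerships (Standard Reference Table)
Characteristics | First-level indicator | Second-level indicators
1. Covers many types of fields; 2. Fields are widely dispersed | A. Field discreteness | A1. Probability that a word belongs to a patent abstract; A2. Probability that the abstract belongs to a topic; A3. Probability that a word belongs to a topic
1. Relatively many inventors; 2. High inventor authority | B. Authority | B1. Inventor degree centrality
1. High technical difficulty; 2. High technical relatedness; 3. Strong technical protection | C. Patent technicality | C1. Number of claims; C2. Patent protection scope; C3. Number of patents cited; C4. Number of citing patents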
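Indicator B1 above measures inventor authority by degree centrality. A minimal Python sketch of how that could be computed from per-patent inventor lists follows. The inventor labels, the `patents` sample, and the function name `degree_centrality` are hypothetical, and the normalization by n - 1 is the standard definition of normalized degree centrality rather than anything this page specifies.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(patents):
    """Return {inventor: (degree, normalized_degree)} for a co-inventor network.

    Two inventors are linked when they appear together on at least one patent.
    """
    neighbors = defaultdict(set)
    for inventors in patents:
        for name in inventors:
            neighbors[name]  # register solo inventors as isolated nodes
        for u, v in combinations(set(inventors), 2):
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    result = {}
    for name, linked in neighbors.items():
        degree = len(linked)
        # Normalized degree: share of all other inventors this one is linked to.
        normalized = degree / (n - 1) if n > 1 else 0.0
        result[name] = (degree, normalized)
    return result


# Hypothetical sample: each inner list is the inventor list of one patent.
patents = [
    ["inv_a", "inv_b", "inv_c"],
    ["inv_a", "inv_d"],
    ["inv_e"],
]
centrality = degree_centrality(patents)
for name in sorted(centrality):
    print(name, centrality[name])
```

How inventor-level degrees are aggregated into the patent-level Authority values shown in the later tables is not stated on this page, so the sketch stops at the inventor level.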
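Table: Original Data Description
Region/University | Number of patents | Mean number of inventors | Mean number of claims | Mean protection scope | Mean citations made | Mean citations received
Wuhan/Wuhan University | 3,488 | 4.6061 | 4.8905 | 6.5611 | 2.9492 | 0.0396
Wuhan/Huazhong University of Science and Technology | 7,642 | 5.1697 | 6.0919 | 6.9045 | 3.0974 | 0.0573
Guangdong/Sun Yat-sen University | 2,921 | 4.3314 | 5.7963 | 6.9534 | 2.9705 | 0.0308
Guangdong/South China University of Technology | 8,804 | 4.0172 | 6.0892 | 6.9426 | 2.9924 | 0.0411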
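[Figure: Model Perplexity versus Number of Topics]
Table: Topic Classification Results
Topic1 | Probability | Topic2 | Probability | Topic3 | Probability | Topic4 | Probability | Topic5 | Probability
system (系统) | 0.011 | preparation (制备) | 0.025 | device (装置) | 0.019 | image (图像) | 0.013 | module (模块) | 0.017
device (装置) | 0.010 | material (材料) | 0.011 | connection (连接) | 0.017 | LED | 0.011 | system (系统) | 0.012
protein (蛋白) | 0.009 | mixing (混合) | 0.007 | structure (结构) | 0.010 | region (区域) | 0.009 | control (控制) | 0.011
yeast (酵母) | 0.007 | solution (溶液) | 0.006 | installation (安装) | 0.008 | laser (激光) | 0.008 | signal (信号) | 0.011
fermentation (发酵) | 0.006 | nano (纳米) | 0.006 | fixing (固定) | 0.007 | processing (加工) | 0.007 | data (数据) | 0.010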
Table: Matrix Sample for Clustering Analysis
ID | Authority | Technicality | Discreteness
1 | 14 | 0.9273 | 0.0727
2 | 122 | 0.4732 | 0.5268
3 | 45 | 0.5912 | 0.4088
4 | 14 | 0.4433 | 0.5567
[Figure: Change in Aggregation Coefficient with the Number of Clusters]
Table: A Data Sample for the Decision Tree
ID | Authority | Technicality | Discreteness | Classification | Inventors | Place
1 | 14 | 0.0001 | 0.0727 | 1 | 宋保亮; 李云峰; 魏健 | Wuhan University
3490 | 142 | 0.0006 | 0.4974 | 1 | 孙燕华; 冯晓宇; 马文家; 姜宵园; 谢菲; 刘世伟 | Huazhong University of Science and Technology
11131 | 88 | 0.0011 | 0.5405 | 1 | 肖仕; 周颖; 俞陆军; 陈武; 曾静 | Sun Yat-sen University
3502 | 1,537 | 0.0017 | 0.1173 | 2 | 李中伟; 钟凯; 叶浩; 陈瀚; 周钢; 陈然; 刘洁; 王从军; 史玉升 | Huazhong University of Science and Technology
11156 | 1,743 | 0.0035 | 0.2681 | 2 | 于涛; 黄秋忆; 谢宗良; 王乐宇; 郑世昭; 杨志涌; 赵娟; 刘四委; 张艺; 池振国; 许家瑞 | Sun Yat-sen University
14057 | 722 | 0.0003 | 0.0216 | 2 | 肖文勋; 胡建雨; 张波 | South China University of Technology
[Figure: Distribution of the Mean Value of the Feature Indicator]
[Figure: Decision Tree]
[Figure: Relationship Between the Maximum Depth of the Decision Tree and the Outlier Ratio of the Node Effective Size Index in the Corresponding Network]
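Table: Patent Samples in BCCC under the Best Decision Boundary
Serial | Authority | Technicality | Discreteness | Classification | Inventors | Place
10903 | 108 | 0.1555 | 0.4837 | 4 | 黄剑; 王永骥; 高学山; 霍卫光 | Huazhong University of Science and Technology
3255 | 291 | 0.0889 | 0.3701 | 4 | 何克清; 李征; 王健; 张能; 李昭 | Wuhan University
11015 | 1,001 | 0.0667 | 0.6152 | 2 | 曾晓雁; 胡乾午; 王泽敏 | Huazhong University of Science and Technology
14011 | 933 | 0.0667 | 0.0933 | 2 | 许宁生; 陈军; 张思秘; 邓少芝; 佘峻聪 | Sun Yat-sen University
18616 | 1,473 | 0.0445 | 0.0212 | 3 | 宁洪龙; 彭俊彪; 王磊; 兰林锋 | South China University of Technology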
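Table: Preselected Candidates for Cross-Regional Cooperation Recommendations
Inventor | Effective size | Place | Inventor | Effective size | Place
金海 | 367.9731 | Huazhong University of Science and Technology | 彭俊彪 | 134.1806 | South China University of Technology
李斌 | 287.6576 | Huazhong University of Science and Technology | 陈军 | 104.6814 | Sun Yat-sen University
冯丹 | 281.4620 | Huazhong University of Science and Technology | 邱学青 | 97.3208 | South China University of Technology
张天序 | 227.6333 | Huazhong University of Science and Technology | 曹镛 | 94.9216 | South China University of Technology
尹周平 | 211.3348 | Huazhong University of Science and Technology | 杨东杰 | 84.2043 | South China University of Technology
史玉升 | 188.2995 | Huazhong University of Science and Technology | 汤勇 | 81.0000 | South China University of Technology
谢长生 | 171.4494 | Huazhong University of Science and Technology | 苏薇薇 | 76.1529 | Sun Yat-sen University
周建中 | 152.0989 | Huazhong University of Science and Technology | 张艺 | 72.2619 | Sun Yat-sen University
曾晓雁 | 151.0248 | Huazhong University of Science and Technology | 张波 | 69.6842 | South China University of Technology
胡瑞敏 | 145.8280 | Wuhan University | 赖学军 | 67.5867 | South China University of Technology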
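The effective size values above refer to the effective size of each inventor's ego network, a structural-holes measure. As a rough illustration only, the Python sketch below uses the simplified formula for unweighted, undirected networks, n - 2t/n, where n is the number of an inventor's direct collaborators and t is the number of ties among those collaborators. The toy edge list and the function names are hypothetical, and the values in the table may have been computed differently (for example, on a weighted network).

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} without self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def effective_size(adj, ego):
    """Effective size of ego's network: n - 2t/n (unweighted, undirected case)."""
    alters = adj.get(ego, set())
    n = len(alters)
    if n == 0:
        return 0.0
    # t counts ties among the alters themselves, excluding ties to ego.
    ties = sum(1 for u, v in combinations(alters, 2) if v in adj[u])
    return n - 2 * ties / n


# Hypothetical co-inventor ties: "a" works with b, c, and d; b and c also work together.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
adj = build_adjacency(edges)
sizes = {node: effective_size(adj, node) for node in sorted(adj)}
print(sizes)
```

An inventor whose collaborators rarely work with each other keeps an effective size close to the raw collaborator count, while redundant ties among collaborators pull the value down.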
[Figure: Number of Patents of Inventors in Various Fields]
Table: List of Joint Recommendations in the Field of Medicine, Veterinary Medicine or Hygiene
Main inventor | Recommended partner (Guangdong Province) | Recommended partner (Wuhan City)
苏薇薇 | 邱学青 | 谢长生
 | 杨东杰 | 
Table: List of Joint Recommendations in the Field of Metalworking Not Otherwise Provided For
Main inventor | Recommended partner (Guangdong Province) | Recommended partner (Wuhan City)
曾晓雁 | 汤勇 | 史玉升
 |  | 李斌
Table: List of Joint Recommendations in the Field of Organic Polymer Compounds
Main inventor | Recommended partner (Guangdong Province) | Recommended partner (Wuhan City)
赖学军 | 曹镛 | 史玉升
邱学青 | 张艺 | 
杨东杰 | 彭俊彪 | 
Table: List of Joint Recommendations in the Field of Measurement and Testing
Main inventor | Recommended partner (Guangdong Province) | Recommended partner (Wuhan City)
张天序 | 苏薇薇 | 尹周平
曾晓雁 |  | 史玉升
谢长生 |  | 李斌
Table: CN Similarity Index Matrix
 | 冯丹 | 张天序 | 金海 | 王高辉 | 余龙江 | 侯慧杰
冯丹 | 3 | 2 | 2 | 0 | 0 | 1
张天序 | 2 | 3 | 2 | 0 | 0 | 1
金海 | 2 | 2 | 5 | 0 | 0 | 0
王高辉 | 0 | 0 | 0 | 2 | 0 | 0
余龙江 | 0 | 0 | 0 | 0 | 2 | 0
侯慧杰 | 1 | 1 | 0 | 0 | 0 | 1
Table: Adjacency Matrix
 | 冯丹 | 张天序 | 金海 | 王高辉 | 余龙江 | 侯慧杰
冯丹 | 1 | 1 | 1 | 0 | 0 | 0
张天序 | 1 | 1 | 1 | 0 | 0 | 0
金海 | 1 | 1 | 1 | 0 | 0 | 1
王高辉 | 0 | 0 | 0 | 1 | 0 | 0
余龙江 | 0 | 0 | 0 | 0 | 1 | 0
侯慧杰 | 0 | 0 | 1 | 0 | 0 | 1
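Link prediction with the common neighbors (CN) index scores a pair of inventors who have not yet collaborated by the number of collaborators they share. The Python sketch below illustrates the idea on a hypothetical toy network. The node labels and function names are assumptions, and the sketch is not expected to reproduce the sample matrices above, which presumably come from the full inventor network.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} without self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbor_scores(adj):
    """Score every not-yet-linked pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already collaborators, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared > 0:
            scores[(u, v)] = shared
    return scores


# Hypothetical existing collaborations.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("b", "d"), ("d", "e")]
adj = build_adjacency(edges)
scores = common_neighbor_scores(adj)

# Rank candidate pairs: highest CN score first, ties broken alphabetically.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for pair, score in ranked:
    print(pair, score)
```

Because the score depends only on shared neighbors, candidates tend to come from the same local cluster of the network, which is consistent with the same-university recommendations listed in the link prediction tables on this page.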
Table: Final Recommendations from Link Prediction
Collaborator 1 | Collaborator 2 | CN index
薛龙建 | 杨威嘉 | 9
薛龙建 | 马伟超 | 9
薛龙建 | 李敬雨 | 9
薛龙建 | 陈燕鸣 | 9
薛龙建 | 郭嘉琳 | 9
薛龙建 | 李正刚 | 9
谭俊雄 | 郑国兴 | 6
蒋燕鞠 | 郑国兴 | 6
曾文治 | 郑国兴 | 6
吴伟 | 郑国兴 | 6
牛小骥 | 曹强 | 5
蒋燕鞠 | 曹强 | 5
曾文治 | 曹强 | 5
吴伟 | 曹强 | 5
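Table: Category Presentation by “Xue Longjian” as Main Collaborator
Main collaborator | Recommended collaborator
薛龙建 (Professor, School of Power and Mechanical Engineering, Wuhan University) | 杨威嘉 (Associate Professor, School of Water Resources and Hydropower Engineering, Wuhan University)
 | 马伟超 (Alumnus, School of Civil Engineering, Wuhan University)
 | 李敬雨 (Master's student, School of Power and Mechanical Engineering, Wuhan University)
 | 陈燕鸣 (Lecturer, School of Power and Mechanical Engineering, Wuhan University)
 | 郭嘉琳 (Lecturer, School of Power and Mechanical Engineering, Wuhan University)
 | 李正刚 (Laboratory technician, School of Power and Mechanical Engineering, Wuhan University)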
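Table: Category Presentation by “Zheng Guoxing” as Main Collaborator
Main collaborator | Recommended collaborator
郑国兴 (Professor, Electronic Information School, Wuhan University) | 谭俊雄 (Graduate student, Satellite Navigation Technology Research Center, Wuhan University)
 | 蒋燕鞠 (Graduate student, Department of Architectural Engineering, Wuhan University)
 | 曾文治 (Associate Professor, School of Water Resources and Hydropower Engineering, Wuhan University)
 | 吴伟 (Head, Department of Printing and Engineering, Wuhan University)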
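Table: Category Presentation by “Cao Qiang” as Main Collaborator
Main collaborator | Recommended collaborator
曹强 (Specially appointed researcher, School of Industrial Science, Wuhan University) | 牛小骥 (Professor, Satellite Navigation and Positioning Technology Research Center, Wuhan University)
 | 蒋燕鞠 (Graduate student, Department of Architectural Engineering, Wuhan University)
 | 曾文治 (Associate Professor, School of Water Resources and Hydropower Engineering, Wuhan University)
 | 吴伟 (Head, Department of Printing and Engineering, Wuhan University)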