Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (12): 25-36     https://doi.org/10.11925/infotech.2096-3467.2021.0524
Research Paper
Matching Model for Technology Supply and Demand Texts Based on Multi-Layer Semantic Similarity*
Li Gang, Yu Hui, Mao Jin
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract

[Objective] This paper proposes a high-accuracy matching model for technology supply and demand texts, aiming to improve the efficiency of supply-demand matching and promote technology transfer. [Methods] We treated the title and the body of each text as two structural levels, calculated word similarity and sentence similarity between supply and demand texts with several methods, and fused the results with a deep learning model to construct a Multi-layer Semantic Text Matching (MSTM) model. [Results] Information from different levels affected the matching results to different degrees. With multi-level fusion, the accuracy of MSTM reached 96.50%, higher than that of BERT alone (90.70%), DSSM (87.80%), and ESIM (87.50%). [Limitations] The model considers only two structural levels of text and does not examine the effect of additional levels. [Conclusions] The proposed model can serve as a reference for supply-demand matching on online technology trading platforms and help promote technology transfer.

Key words  Technology Supply and Demand Texts    Supply and Demand Matching    Technology Transfer    Data Fusion
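Received: 2021-05-25      Published: 2022-01-20
CLC number:  G201
Funding: * Innovative Research Group Project of the National Natural Science Foundation of China (71921002); National Key R&D Program of China (2018YFB1404300)
Corresponding author: Mao Jin, ORCID: 0000-0001-9572-6709     E-mail: maojin@whu.edu.cn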
Cite this article:
Li Gang, Yu Hui, Mao Jin. Matching Model for Technology Supply and Demand Texts Based on Multi-Layer Semantic Similarity[J]. Data Analysis and Knowledge Discovery, 2021, 5(12): 25-36.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2021.0524      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2021/V5/I12/25
Fig.1  Multi-layer semantic matching model for technology supply and demand texts
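The abstract describes computing similarities between a supply text and a demand text at two structural levels (title and body) before fusing them. The following is only a minimal sketch of that idea, not the authors' implementation: it assumes Jaccard overlap as the word co-occurrence measure and uses a hypothetical `toy_embed` count vector as a stand-in for the word-vector or BERT representations a real system would use. The names `jaccard`, `cosine`, `toy_embed`, `layer_features`, and `VOCAB` are all illustrative.

```python
from math import sqrt

# Hypothetical fixed vocabulary for the toy embedding (illustration only).
VOCAB = ["lithium", "battery", "recycling", "process", "cathode", "material"]


def jaccard(a_tokens, b_tokens):
    """Word co-occurrence similarity: shared words over all distinct words."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def toy_embed(tokens):
    """Placeholder embedding: word counts over VOCAB, standing in for a learned model."""
    return [float(tokens.count(w)) for w in VOCAB]


def layer_features(supply, demand, embed):
    """Return one similarity per (level, method) pair: title first, then body."""
    feats = []
    for level in ("title", "body"):
        s_tok = supply[level].split()
        d_tok = demand[level].split()
        feats.append(jaccard(s_tok, d_tok))               # co-occurrence similarity
        feats.append(cosine(embed(s_tok), embed(d_tok)))  # embedding similarity
    return feats


supply = {
    "title": "lithium battery recycling process",
    "body": "a low cost process for recycling lithium battery cathode material",
}
demand = {
    "title": "battery recycling technology wanted",
    "body": "seeking a process for recycling cathode material",
}

features = layer_features(supply, demand, toy_embed)
print(features)
```

The resulting feature vector would then be passed to a fusion classifier. Keeping the title and body similarities as separate features, rather than averaging them, lets the classifier weight each level differently.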
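Matching degree   Meaning                    Count    Share      Cumulative share
1                 Irrelevant                 3 301    44.75%     44.75%
2                 Weakly relevant            3 157    42.80%     87.55%
3                 Fairly strongly relevant   662      8.98%      96.53%
4                 Strongly relevant          256      3.47%      100.00%
Total             -                          7 376    100.00%    100.00%
Table 1  Distribution of matching degrees in the dataset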
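The share and cumulative-share columns follow directly from the counts, so they can be recomputed as a check. The short sketch below does this; the function name `distribution` is illustrative. From the counts, the level-4 share works out to 256 / 7 376 ≈ 3.47%, and levels 1 and 2 together account for 87.55% of the pairs, so the labels are heavily imbalanced.

```python
# Counts per matching degree, taken from Table 1.
label_counts = {1: 3301, 2: 3157, 3: 662, 4: 256}


def distribution(counts):
    """Return the total and (level, count, share %, cumulative share %) rows."""
    total = sum(counts.values())
    rows, running = [], 0
    for level in sorted(counts):
        running += counts[level]
        share = round(100 * counts[level] / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((level, counts[level], share, cumulative))
    return total, rows


total, rows = distribution(label_counts)
for level, n, share, cumulative in rows:
    print(f"{level}  {n:>5}  {share:6.2f}%  {cumulative:6.2f}%")
print(f"total  {total}")
```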
Layer (type)       Output shape    Parameters
dense (Dense)      (None, 64)      832
dense_1 (Dense)    (None, 32)      2 080
dense_2 (Dense)    (None, 5)       165
Table 2  Summary of the information fusion model
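The parameter counts in Table 2 are consistent with fully connected layers that each carry a bias term, where a layer with `n_in` inputs and `n_out` units has `(n_in + 1) * n_out` parameters. The sketch below checks this arithmetic. The input width of 12 is inferred here from the first layer's 832 parameters and is an assumption, since the page itself does not state how many similarity features feed the fusion model. The helper names are illustrative.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer with bias: weights plus one bias per unit."""
    return (n_in + 1) * n_out


def infer_input_dim(params, n_out):
    """Solve (n_in + 1) * n_out = params for n_in; fail if it is not an integer."""
    n_in, remainder = divmod(params - n_out, n_out)
    if remainder:
        raise ValueError("parameter count is not consistent with a biased dense layer")
    return n_in


# Inferred, not stated in the source: 832 = (n_in + 1) * 64 gives n_in = 12.
n_features = infer_input_dim(832, 64)

layer_shapes = [(n_features, 64), (64, 32), (32, 5)]
param_counts = [dense_params(n_in, n_out) for n_in, n_out in layer_shapes]

print(n_features)         # 12
print(param_counts)       # [832, 2080, 165]
print(sum(param_counts))  # 3077
```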
Method                              Title level           Body level            Title + body level
                                    4-class    Binary     4-class    Binary     4-class    Binary
Word co-occurrence                  55.49%     87.80%     47.56%     87.06%     56.57%     88.01%
Word semantics                      52.85%     87.60%     46.88%     87.47%     53.73%     88.28%
BERT                                85.91%     94.17%     83.74%     94.31%     89.23%     95.87%
Co-occurrence + semantics           56.44%     88.28%     48.51%     87.87%     56.98%     88.82%
Co-occurrence + BERT                86.31%     93.70%     84.55%     95.39%     89.36%     95.87%
Semantics + BERT                    86.92%     93.29%     84.69%     94.58%     89.02%     96.00%
Co-occurrence + semantics + BERT    87.80%     93.90%     85.37%     95.12%     89.50%     96.48%
Table 3  Accuracy comparison of multi-layer semantic information fusion results
Method    Title level    Body level    Title + body level
BERT      91.10%         90.10%        90.70%
DSSM      87.70%         87.80%        87.80%
ESIM      87.50%         87.50%        87.50%
MSTM      93.90%         95.10%        96.50%
Table 4  Accuracy comparison between the MSTM model and baseline methods