Data Analysis and Knowledge Discovery, 2021, Vol. 5, Issue 12: 25-36. DOI: 10.11925/infotech.2096-3467.2021.0524
Matching Model for Technology Supply and Demand Texts Based on Multi-Layer Semantic Similarity
Li Gang, Yu Hui, Mao Jin
Center for Studies of Information Resources, Wuhan University, Wuhan 430072, China
Abstract  

[Objective] This paper proposes a high-accuracy model to improve the matching of technology supply and demand texts and thereby promote technology transfer. [Methods] First, we treated titles and body texts as two structural levels. Then, we calculated word similarity and sentence similarity with several methods. Finally, we built a Multi-layer Semantic Text Matching (MSTM) model that combines multi-layer semantic similarity with a deep learning model. [Results] Different levels of information yielded different matching results. The accuracy of MSTM was 96.50%, higher than that of BERT alone (90.70%), DSSM (87.80%), and ESIM (87.50%). [Limitations] The model considers only two levels of text structure. [Conclusions] The model can help online technology trading services match supply and demand and promote technology transfer.

Key words: Technological Text for Supply and Demand; Supply and Demand Matching; Technology Transfer; Data Fusion
Received: 25 May 2021      Published: 20 January 2022
ZTFLH:  G201  
Fund: National Natural Science Foundation of China (71921002); National Key R&D Program of China (2018YFB1404300)
Corresponding Author: Mao Jin, ORCID: 0000-0001-9572-6709, E-mail: maojin@whu.edu.cn

Cite this article:

Li Gang, Yu Hui, Mao Jin. Matching Model for Technology Supply and Demand Texts Based on Multi-Layer Semantic Similarity. Data Analysis and Knowledge Discovery, 2021, 5(12): 25-36.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0524     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I12/25

[Figure: Multi-layer Semantic Text Matching Model]
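Matching degree | Meaning | Count | Share | Cumulative share
1 | Unrelated | 3,301 | 44.75% | 44.75%
2 | Weakly related | 3,157 | 42.80% | 87.55%
3 | Fairly strongly related | 662 | 8.98% | 96.53%
4 | Strongly related | 256 | 3.47% | 100.00%
Total | - | 7,376 | 100.00% | 100.00%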
Data Set Matching Degree and Proportion Distribution
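The share and cumulative-share columns follow directly from the counts, so they can be recomputed as a consistency check. The sketch below does that in Python; the function name `share_table` is illustrative and does not come from the paper.

```python
# Counts per matching degree, copied from the data set table.
counts = {1: 3301, 2: 3157, 3: 662, 4: 256}


def share_table(counts):
    """Return (total, rows); each row is (degree, count, share %, cumulative %)."""
    total = sum(counts.values())
    rows = []
    running = 0
    for degree in sorted(counts):
        running += counts[degree]
        share = round(100 * counts[degree] / total, 2)
        cumulative = round(100 * running / total, 2)
        rows.append((degree, counts[degree], share, cumulative))
    return total, rows


total, rows = share_table(counts)
for degree, count, share, cumulative in rows:
    print(f"{degree}  {count:>5}  {share:6.2f}%  {cumulative:6.2f}%")
print(f"total  {total}")
```

The two weaker degrees (1 and 2) account for roughly 87.5% of the 7,376 labelled pairs, so the data set is heavily skewed toward non-matching pairs.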
Layer (type) | Output shape | Parameters
dense (Dense) | (None, 64) | 832
dense_1 (Dense) | (None, 32) | 2,080
dense_2 (Dense) | (None, 5) | 165
Summary of Information Fusion Model
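The parameter counts in the summary are consistent with ordinary fully connected layers that include a bias term, where a layer with `n_in` inputs and `n_out` units has `n_in * n_out + n_out` parameters. Under that assumption (the page does not state it explicitly), the width of the fusion model's input vector can be inferred from the first layer. The helper names below are illustrative only.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer with bias: weights plus biases."""
    return n_in * n_out + n_out


def infer_input_width(params, n_out):
    """Solve params = n_in * n_out + n_out for n_in."""
    n_in, remainder = divmod(params - n_out, n_out)
    if remainder:
        raise ValueError("parameter count does not fit a biased dense layer")
    return n_in


# The first layer has 64 units and 832 parameters, per the summary table.
input_width = infer_input_width(832, 64)

# Layer widths from input to output, then the parameters of each layer.
widths = [input_width, 64, 32, 5]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]

print(input_width)      # inferred number of input features
print(per_layer)        # parameters per layer
print(sum(per_layer))   # total trainable parameters
```

Under this reading the fusion network takes 12 input features and has 3,077 parameters in total. What those 12 features are is not stated on this page.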
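Method | Title level, four-class | Title level, binary | Body level, four-class | Body level, binary | Title + body, four-class | Title + body, binary
Word co-occurrence | 55.49% | 87.80% | 47.56% | 87.06% | 56.57% | 88.01%
Word semantics | 52.85% | 87.60% | 46.88% | 87.47% | 53.73% | 88.28%
BERT | 85.91% | 94.17% | 83.74% | 94.31% | 89.23% | 95.87%
Co-occurrence + semantics | 56.44% | 88.28% | 48.51% | 87.87% | 56.98% | 88.82%
Co-occurrence + BERT | 86.31% | 93.70% | 84.55% | 95.39% | 89.36% | 95.87%
Semantics + BERT | 86.92% | 93.29% | 84.69% | 94.58% | 89.02% | 96.00%
Co-occurrence + semantics + BERT | 87.80% | 93.90% | 85.37% | 95.12% | 89.50% | 96.48%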
Accuracy Comparison of Multi-Layer Semantic Information Fusion Results
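The "word co-occurrence" rows compare a supply text and a demand text by the vocabulary they share, computed separately for titles and bodies. This page does not say which co-occurrence measure the authors used. As a minimal sketch, assuming Jaccard overlap of already-segmented tokens (an assumption, not the paper's stated method), the two structural levels could each contribute one feature as follows. The sample records are made up for illustration.

```python
def jaccard(tokens_a, tokens_b):
    """Jaccard overlap of two token lists: |A and B| / |A or B|."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def two_level_features(supply, demand):
    """One co-occurrence score per structural level: [title, body]."""
    return [
        jaccard(supply["title"], demand["title"]),
        jaccard(supply["body"], demand["body"]),
    ]


# Hypothetical, pre-tokenized records (Chinese text would need word segmentation first).
supply = {
    "title": ["lithium", "battery", "recycling"],
    "body": ["recover", "cobalt", "from", "spent", "battery"],
}
demand = {
    "title": ["battery", "recycling", "process"],
    "body": ["need", "method", "to", "recover", "cobalt"],
}

features = two_level_features(supply, demand)
print(features)
```

Keeping the title and body scores as separate features, rather than averaging them, lets a downstream classifier weight each level on its own.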
Model | Title level | Body level | Title + body
BERT | 91.10% | 90.10% | 90.70%
DSSM | 87.70% | 87.80% | 87.80%
ESIM | 87.50% | 87.50% | 87.50%
MSTM | 93.90% | 95.10% | 96.50%
MSTM vs. Baseline Accuracy
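The margins behind the comparison can be read off with simple arithmetic. The sketch below copies the accuracies from the table and computes, for each model, the change from title-only input to title-plus-body input, and MSTM's lead over each baseline on the combined input. The variable and function names are illustrative.

```python
# Accuracy (%) per model: (title level, body level, title + body).
accuracy = {
    "BERT": (91.10, 90.10, 90.70),
    "DSSM": (87.70, 87.80, 87.80),
    "ESIM": (87.50, 87.50, 87.50),
    "MSTM": (93.90, 95.10, 96.50),
}

TITLE, BODY, BOTH = 0, 1, 2


def fusion_gain(model):
    """Percentage-point change from title-only to title + body input."""
    return round(accuracy[model][BOTH] - accuracy[model][TITLE], 2)


def lead_over(baseline, level=BOTH):
    """Percentage-point lead of MSTM over a baseline at the given level."""
    return round(accuracy["MSTM"][level] - accuracy[baseline][level], 2)


gains = {model: fusion_gain(model) for model in accuracy}
leads = {model: lead_over(model) for model in accuracy if model != "MSTM"}

print(gains)
print(leads)
```

On these figures only MSTM gains noticeably from combining both levels (2.6 points over title-only input), while the baselines stay flat or drop slightly.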