Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (11): 101-113     https://doi.org/10.11925/infotech.2096-3467.2022.0993
Research Paper
Early Recognition of User-Generated Content Value with Text Semantics and Associative Network Dual-Link Fusion
Wang Song1, Luo Ying1, Liu Xinmin2
1College of Economics & Management, Shandong University of Science and Technology, Qingdao 266590, China
2College of Economics & Management, Qingdao Agricultural University, Qingdao 266109, China

Abstract

[Objective] This paper builds a feature system and an algorithmic model to make early recognition more efficient, easing the time lag and information overload that hamper the recognition of valuable content in virtual communities. [Methods] We constructed a dual-link fusion algorithm that combines the early text semantics of user-generated content with the network structure of explicit and implicit interactions among users and texts. In the text semantic link, BERT+BiLSTM+Linear extracts deep semantic features. In the association network link, GAT processes the shallow numerical features and association features of the nodes. A convolution layer then refines the fused information from the two links to complete early value recognition. [Results] On data from the Meizu Flyme community, the dual-link fusion model reached an accuracy of 89.80%, which is 3.45 and 3.20 percentage points higher than the text semantic link alone and the association network link alone, respectively. Accuracy and F1 also improved to varying degrees over the other baseline models. [Limitations] The generalization ability of the model needs further improvement, and rich-text content such as pictures and external links has not been mined in depth. [Conclusions] By jointly processing sequential text semantics and topological network structure, the deep learning fusion model further improves the accuracy of early recognition of valuable texts.
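
The accuracy gains in the results can be read as absolute differences (percentage points) rather than relative improvements. The short check below is only an illustration of that reading: it recovers the two single-link accuracies by subtraction, which agree with the values listed in Table 5, and shows what the same gaps would be as relative gains. The variable names are illustrative and not from the paper.

```python
# Accuracy (%) of the dual-link fusion model and the reported gains over each
# single link, taken from the abstract.
fusion_acc = 89.80
reported_gains = {
    "text semantic link": 3.45,
    "association network link": 3.20,
}

# A gain in percentage points is an absolute difference, so each single-link
# accuracy follows by subtraction.
single_link_acc = {
    name: round(fusion_acc - gain, 2) for name, gain in reported_gains.items()
}

# The same gaps expressed as relative improvements, for contrast only.
relative_gain = {
    name: round(100 * reported_gains[name] / acc, 2)
    for name, acc in single_link_acc.items()
}

print(single_link_acc)
print(relative_gain)
```

The relative figures come out larger than the percentage-point figures, which is why the two should not be used interchangeably.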

Keywords: Early Recognition; Fusion Model; BiLSTM; Graph Attention Network
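Received: 2022-09-21      Published: 2023-03-22
Chinese Library Classification (ZTFLH):  G206
Funding: *Supported by the National Natural Science Foundation of China (71471105) and the Shandong Provincial Social Science Planning Project (18CGLJ38)
Corresponding author: Wang Song, ORCID: 0000-0001-9101-7702, E-mail: tiatusw@126.com
Cite this article: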
Wang Song, Luo Ying, Liu Xinmin. Early Recognition of User-Generated Content Value with Text Semantics and Associative Network Dual-Link Fusion. Data Analysis and Knowledge Discovery, 2023, 7(11): 101-113.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.0993      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I11/101
Fig.1  Explicit and implicit interaction network of user and text entities
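| Network | Name | Symbol | Meaning |
| --- | --- | --- | --- |
| User association network | User authority | authority | Prestige of the user node |
| User association network | User activity | activity | Number of posts by the user node |
| User association network | Degree centrality | degree | Number of connections to other nodes |
| User association network | Betweenness centrality | betweenness | Importance of the user node |
| User association network | Closeness centrality | closeness | Distance from the user node to other nodes |
| User association network | User leadership | pagerank | Influence of the user node |
| Text association network | Text length | length | Length of the text |
| Text association network | Richness | richness | Whether the text contains pictures or links |
| Text association network | Sentiment polarity | emotion | Sentiment expressed by the text |
| Text association network | Interactivity | interactivity | Personal-pronoun expressions used in the text |
| Text association network | Accuracy | accuracy | Topic probability of the text content |

Table 1  Node features of the dual networks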
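
Two of the user-node features in Table 1, degree and closeness, can be illustrated on a toy graph. The sketch below is not the authors' code: the four-user graph is hypothetical, and the normalizations (degree divided by n - 1, closeness as reachable nodes divided by the sum of shortest-path distances) are common conventions assumed here, since the page does not state which variants were used.

```python
from collections import deque

# Hypothetical undirected user interaction graph as adjacency sets
# (illustrative only, not taken from the Flyme data).
graph = {
    "u1": {"u2", "u3"},
    "u2": {"u1", "u3"},
    "u3": {"u1", "u2", "u4"},
    "u4": {"u3"},
}


def degree_centrality(g):
    """Number of neighbours, normalised by the n - 1 possible neighbours."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}


def shortest_path_lengths(g, source):
    """Breadth-first search distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in g[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(g):
    """Reachable nodes (excluding the source) divided by the sum of distances."""
    result = {}
    for node in g:
        dist = shortest_path_lengths(g, node)
        total = sum(dist.values())
        result[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree)
print(closeness)
```

In this toy graph `u3` is connected to every other user, so it scores highest on both measures, while the peripheral `u4` scores lowest.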
Fig.2  Early recognition model of content value based on dual-link fusion
Fig.3  Processing structure of the BERT model
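| Symbol | Meaning | Symbol | Meaning |
| --- | --- | --- | --- |
| post_id | Text ID | post_them | Text topic |
| post_content | Text content | post_picture | Whether the text contains pictures |
| author_id | Poster ID | post_num | Number of posts |
| author_vip | Poster title | author_reputation | Poster reputation |
| listen_num | Number of accounts followed | fans_num | Number of fans |
| review_author_id | Commenter ID | review_content | Comment content |

Table 2  Data symbols and their meanings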
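
The fields in Table 2 can be pictured as one record per post with its attached comments. The sketch below is a hypothetical container, not the authors' data format: the field names come from Table 2, but the Python types, the nesting of comments under a post, the `Review`/`Post` class names, the `interaction_edges` helper, and the sample values are all assumptions made for illustration. It also shows one plausible way explicit commenter-to-poster links could be read off such records.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Review:
    review_author_id: str   # commenter ID
    review_content: str     # comment content


@dataclass
class Post:
    post_id: str            # text ID
    post_them: str          # text topic (identifier spelled as in Table 2)
    post_content: str       # text content
    post_picture: bool      # whether the text contains pictures
    author_id: str          # poster ID
    post_num: int           # number of posts by the poster
    author_vip: str         # poster title
    author_reputation: int  # poster reputation
    listen_num: int         # number of accounts the poster follows
    fans_num: int           # number of fans
    reviews: List[Review] = field(default_factory=list)


def interaction_edges(posts: List[Post]) -> List[Tuple[str, str]]:
    """Explicit (commenter, poster) pairs; an assumed reading of the interaction network."""
    return [(r.review_author_id, p.author_id) for p in posts for r in p.reviews]


# Hypothetical sample record (values invented for illustration).
example = Post(
    post_id="p1",
    post_them="system feedback",
    post_content="Could haptic feedback be added to the swipe-back gesture?",
    post_picture=False,
    author_id="u1",
    post_num=12,
    author_vip="member",
    author_reputation=30,
    listen_num=5,
    fans_num=8,
    reviews=[Review(review_author_id="u2", review_content="We will consider it.")],
)

edges = interaction_edges([example])
print(edges)
```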
Fig.4  Changes in topic perplexity and coherence
Fig.5  Word cloud
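| Content | Authoritative staff comment | Initial label | Refined label | Final label |
| --- | --- | --- | --- | --- |
| 1. After the 17 Pro is updated to the F9 system, the screen flickers while the display is off; 2. Occasionally the fingerprint sensor area stays lit and cannot recognize fingerprints, so a restart is needed…… | Wait for optimization after the feedback is submitted. | 1 | 1 | 1 |
| Could haptic feedback be added to the swipe-back gesture? I used a Xiaomi 11 for a while and its swipe-back vibration felt quite nice; after switching back to Meizu I am still not quite used to it. I hope this is added later. | We will take this into consideration. | 1 | 1 | 1 |
| Unknown-app installation check: as the title says, every installation checks whether the app has been verified by the Meizu store. Can this be turned off? Personally I find it useless. | This is a security prompt; some users may need it. | 1 | 1 | 1 |
| Anyone want a base-model Meizu 18? Bought in June, white, low price. Any takers? | Friendly reminder: please be aware of the risks of online transactions. | 1 | 0 | 0 |
| I think I need to replace my screen. Meizu 17. Do fellow Meizu fans have a shop to recommend? I need a new screen; after a drop the touch response is a bit intermittent. Happy New Year, Meizu fans. | We recommend choosing official parts. | 1 | 0 | 0 |

Table 3  Annotation examples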
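| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Learning rate (lr) | 1×10⁻⁵ | Association network link learning rate (g_lr) | 0.0001 |
| Training epochs (epochs) | 30 | Association network link iterations (g_epochs) | 1000 |
| Training batch size (batch_size) | 64 | Number of graph attention heads (layer) | 8 |
| Text sequence length (max_length) | 100 | GAT hidden dimension (hidden_dim) | 8 |
| Character embedding dimension (in_dim) | 768 | Convolution kernel size (kernel_size) | 1 |
| BiLSTM hidden dimension (hidden_dim) | 100 | Conv output dimension (out_channels) | 2 |
| Linear output dimension (output_size) | 2 | Fusion model dropout rate (dropout) | 0.5 |
| Optimizer (Optimizer) | Adam | L2 regularization parameter (Weight_decay) | 5×10⁻⁴ |

Table 4  Experimental parameters and their values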
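
For anyone trying to reproduce the setup, the values in Table 4 could be gathered into a small configuration object. The sketch below is one hypothetical way to do that, not the authors' code. The field names follow the symbols in Table 4, while the class names and the grouping into text link, graph link, fusion, and shared settings are assumptions; in particular, Table 4 does not say which components the optimizer and the L2 term apply to. Splitting by link also keeps the two different `hidden_dim` values (BiLSTM and GAT) from colliding.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TextLinkConfig:
    lr: float = 1e-5        # learning rate
    epochs: int = 30        # training epochs
    batch_size: int = 64    # training batch size
    max_length: int = 100   # text sequence length
    in_dim: int = 768       # character embedding dimension
    hidden_dim: int = 100   # BiLSTM hidden dimension
    output_size: int = 2    # Linear output dimension


@dataclass(frozen=True)
class GraphLinkConfig:
    g_lr: float = 1e-4      # association network link learning rate
    g_epochs: int = 1000    # association network link iterations
    layer: int = 8          # number of graph attention heads
    hidden_dim: int = 8     # GAT hidden dimension


@dataclass(frozen=True)
class FusionConfig:
    kernel_size: int = 1    # convolution kernel size
    out_channels: int = 2   # Conv output dimension
    dropout: float = 0.5    # fusion model dropout rate


@dataclass(frozen=True)
class SharedConfig:
    # Assumed to be shared; Table 4 does not state which link(s) these apply to.
    optimizer: str = "Adam"
    weight_decay: float = 5e-4  # L2 regularization parameter


text_cfg = TextLinkConfig()
graph_cfg = GraphLinkConfig()
fusion_cfg = FusionConfig()
shared_cfg = SharedConfig()
print(asdict(text_cfg))
print(asdict(graph_cfg))
```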
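| Model | Acc/% | F1/% | P/% | R/% |
| --- | --- | --- | --- | --- |
| Dual-link fusion model | 89.80 | 77.21 | 84.34 | 71.19 |
| Text semantic link (BERT+BiLSTM+Linear) | 86.35 | 73.65 | 68.64 | 79.45 |
| Association network link (GAT) | 86.60 | 72.50 | 78.35 | 67.46 |

Table 5  Dual-link fusion results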
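
As a quick consistency check on Table 5, the F1 column can be recomputed from the precision and recall columns. The sketch assumes the standard harmonic-mean definition, F1 = 2PR / (P + R), which the page itself does not spell out. Because the inputs are already rounded to two decimals, a tolerance of about 0.01 is allowed.

```python
# (precision %, recall %, reported F1 %) for each row of Table 5.
rows = {
    "Dual-link fusion model": (84.34, 71.19, 77.21),
    "Text semantic link (BERT+BiLSTM+Linear)": (68.64, 79.45, 73.65),
    "Association network link (GAT)": (78.35, 67.46, 72.50),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed standard definition)."""
    return 2 * precision * recall / (precision + recall)


recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _) in rows.items()}

for name, (_, _, reported) in rows.items():
    gap = abs(recomputed[name] - reported)
    print(f"{name}: reported {reported:.2f}, recomputed {recomputed[name]:.2f}, gap {gap:.2f}")
```

Under this definition all three reported F1 values are reproduced within rounding. The rows also show that the fusion model gains precision over the text semantic link at the cost of some recall.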
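| Single link | Model | Acc/% | F1/% | P/% | R/% |
| --- | --- | --- | --- | --- | --- |
| Text semantic link | BERT+BiLSTM+Linear | 86.35 | 73.65 | 68.64 | 79.45 |
| Text semantic link | Embedding+BiLSTM | 83.96 | 53.01 | 89.43 | 37.67 |
| Text semantic link | BERT+BiLSTM | 85.44 | 60.05 | 89.86 | 45.08 |
| Text semantic link | BERT+CNN+Linear | 84.70 | 70.94 | 65.23 | 77.74 |
| Association network link | Graph attention network | 86.60 | 72.50 | 78.35 | 67.46 |
| Association network link | Convolutional neural network | 82.89 | 46.90 | 94.57 | 31.18 |
| Association network link | Fully connected network | 32.99 | 33.45 | 21.97 | 70.04 |

Table 6  Single-link results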