Data Analysis and Knowledge Discovery, 2024, Vol. 8, Issue (4): 99-111     https://doi.org/10.11925/infotech.2096-3467.2023.1383
Research Paper
Predicting Emerging Interdisciplinary Communities with Functional Attributes of Academic Texts
Cao Yujie¹, Xiang Rongrong¹, Mao Jin² (corresponding author), Yuan Danni¹
¹School of Information Management, Central China Normal University, Wuhan 430079, China
²School of Information Management, Wuhan University, Wuhan 430072, China
Abstract

[Objective] This paper explores the diverse characteristics of communities in scientific knowledge networks to improve the prediction of emerging trends in a field. [Methods] Tracing back the growth paths along which emerging communities in the e-Health field developed into hot communities, we propose a multi-feature emerging-trend prediction model that integrates lexical functional attributes. [Results] In the e-Health field, integrating lexical functional attribute features such as topic and technology terms improves the prediction of emerging trends, and the random forest (RF) model combining structure, influence, sequence, and attribute features performs best. Communities with a larger scale of lexical functional attributes, lower density, higher betweenness centrality, and higher volatility are more likely to become emerging communities. Sequence features perform poorly in predicting emerging communities, possibly because of the forward-looking nature of emerging communities. [Limitations] The lexical function identification results are somewhat domain-dependent, so whether the conclusions extend to other fields needs further verification. [Conclusions] Fully exploiting the fine-grained semantic features of terms in scientific texts can effectively improve the prediction of emerging trends and offers a useful reference for scientific content evaluation and science and technology decision support.

Key words: Emerging Trend; Lexical Function; Community Prediction; Machine Learning
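Received: 2023-12-15      Published: 2024-05-17
CLC number: G250
Funding: *This work is supported by the National Social Science Fund of China (Grant No. 20CTQ024).
Corresponding author: Mao Jin, ORCID: 0000-0001-9572-6709, E-mail: danveno@163.com.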
Cite this article:
Cao Yujie, Xiang Rongrong, Mao Jin, Yuan Danni. Predicting Emerging Interdisciplinary Communities with Functional Attributes of Academic Texts. Data Analysis and Knowledge Discovery, 2024, 8(4): 99-111.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2023.1383      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2024/V8/I4/99
Fig. 1  Analytical framework for predicting emerging interdisciplinary communities
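| Community type | Metric | Measurement | Decision rule |
| --- | --- | --- | --- |
| Hot community | Community size | Number of nodes in the community | A community in year t whose size exceeds the average size of all communities |
| Emerging community | Community similarity | $\mathrm{Jaccard}(C_i^{t-1}, C_j^{t}) = \frac{\lvert C_i^{t-1} \cap C_j^{t} \rvert}{\lvert C_i^{t-1} \cup C_j^{t} \rvert}$ | Among the communities in year t-1 whose similarity to some hot community in year t exceeds the mean (communities with zero similarity are excluded when computing the mean), those whose weighted growth rate exceeds the mean and is nonzero |
| | Weighted growth rate | $\mathrm{Weighted\_growth\_rate}(C_i^{t-1}, C_j^{t}) = \frac{\mathrm{join}(n_j^{t}) - \mathrm{leave}(n_j^{t})}{n_i^{t-1}}$ | |

Table 1  Indicators for identifying hot communities and emerging communities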
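Table 1 gives the hot/emerging rules in formula form only. The following Python sketch is one possible reading of them, not the authors' code. It assumes that a community is a set of node labels, that join(n_j^t) and leave(n_j^t) count the nodes entering and leaving between C_i^{t-1} and C_j^t, and that the mean growth rate is taken over the community pairs that pass the similarity filter; all function names are hypothetical.

```python
"""Hypothetical sketch of the Table 1 rules for hot and emerging communities.

Assumptions (not specified in Table 1):
  * a community is a set of node labels (e.g. keywords);
  * join(n_j^t)  = nodes of C_j^t that are not in C_i^{t-1};
  * leave(n_j^t) = nodes of C_i^{t-1} that are not in C_j^t;
  * the growth-rate mean is taken over the pairs that pass the similarity filter.
"""
from statistics import mean


def jaccard(prev: set[str], curr: set[str]) -> float:
    """|C_i^{t-1} ∩ C_j^t| / |C_i^{t-1} ∪ C_j^t|."""
    union = prev | curr
    return len(prev & curr) / len(union) if union else 0.0


def weighted_growth_rate(prev: set[str], curr: set[str]) -> float:
    """(join - leave) / n_i^{t-1}, using the assumed join/leave counts."""
    joined = len(curr - prev)
    left = len(prev - curr)
    return (joined - left) / len(prev)


def hot_communities(year_t: list[set[str]]) -> list[int]:
    """Indices of year-t communities whose size exceeds the average size."""
    avg_size = mean(len(c) for c in year_t)
    return [j for j, c in enumerate(year_t) if len(c) > avg_size]


def emerging_communities(year_prev: list[set[str]], year_t: list[set[str]]) -> list[int]:
    """Indices of year-(t-1) communities that satisfy the emerging rule."""
    hot = hot_communities(year_t)
    # Similarity of every (t-1) community to every hot community of year t.
    pairs = [(i, j, jaccard(c, year_t[j])) for i, c in enumerate(year_prev) for j in hot]
    nonzero = [s for _, _, s in pairs if s > 0]
    if not nonzero:
        return []
    sim_mean = mean(nonzero)  # zero similarities are excluded from the mean
    candidates = [(i, j) for i, j, s in pairs if s > sim_mean]
    if not candidates:
        return []
    rates = {(i, j): weighted_growth_rate(year_prev[i], year_t[j]) for i, j in candidates}
    rate_mean = mean(rates.values())
    # Emerging: growth rate above the candidates' mean and not zero.
    return sorted({i for (i, _), r in rates.items() if r > rate_mean and r != 0})


if __name__ == "__main__":
    year_t = [set("abcdef"), set("ghijkl"), {"m"}, {"n"}]
    year_prev = [set("abc"), set("ghijk"), {"a", "x", "y", "z"}, {"g", "h", "w"}]
    print(hot_communities(year_t))                  # [0, 1]
    print(emerging_communities(year_prev, year_t))  # [0]
```

In the toy data only community 0 passes both filters: its similarity to the first hot community (0.5) is above the mean non-zero similarity, and its growth rate (1.0) is above the mean of the two candidate pairs (0.6).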
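| Function category | Description | Examples |
| --- | --- | --- |
| Research topic | Topic terms related to medical research questions, such as diseases, research areas, and background information | information, depression, diabetes, health information |
| Theory | Named theoretical knowledge phrases, including common theoretical frameworks and models | TAM, social cognitive theory, transtheoretical model |
| Research method | Research methods commonly used in the e-Health field, including general research methods, analysis techniques, measurement scales, guidelines, and evaluation metrics | systematic review, meta analysis, randomize control trial |
| Technology | Medical instruments, physical devices, and healthcare management systems mentioned in studies and used for treatment, diagnosis, or intervention in healthcare services | mobile phone, Web, smartphone, APP |
| Human entity | Individuals or organizations that form the target population of a study, including patient groups, healthcare worker groups, and healthcare-related organizations | patient, woman, child, adolescent |
| Data | Phrases related to datasets, data sources, and data materials | tweet, qualitative datum, clinical datum |
| Other | Phrases that fit none of the above categories, such as geographic locations, projects, and meaningless phrases | study, use, result, outcome, Canada |

Table 2  Categories of lexical functional attributes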
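| Feature dimension | Feature | Measure |
| --- | --- | --- |
| Structure | Size | Community node ratio [5] |
| | | Community edge ratio [5] |
| | Connectedness | Community density [17] |
| | | Community relationship strength [20] |
| Influence | Popularity | Community node occurrence rate [4] |
| | Influence | Community degree centrality [33] |
| | | Community eigenvector centrality [33] |
| | | Community closeness centrality [13] |
| | | Community betweenness centrality [13] |
| Sequence | Temporal change | Ratio of newly joined community nodes [5] |
| | | Community volatility [34] |
| Attribute | Interdisciplinarity | Community disciplinary diversity [35] |
| | Lexical functional attributes | Number of topic/technology terms in the community |
| | | Scale of topic/technology terms in the community |
| | | Share of topic/technology terms by number |
| | | Share of topic/technology terms by scale |

Table 3  Summary of community features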
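Table 3 names the features and their sources but not their formulas. The sketch below computes a few of them under assumed definitions: the standard density of an undirected graph, node and edge ratios relative to the whole year-t network, and a hypothetical reading of term "scale" as the summed occurrence frequency of a category's terms. The `Community` class and all function names are illustrative; the authors' exact operationalizations may differ.

```python
"""Hypothetical sketch of a few Table 3 community features.

Assumed operationalizations (not given in Table 3):
  * node/edge ratio = community nodes (edges) / all nodes (edges) in the year-t network;
  * density         = 2|E| / (|V|(|V|-1)) for an undirected community subgraph;
  * new-node ratio  = share of year-t community nodes absent from its year-(t-1) predecessor;
  * term "scale"    = summed occurrence frequency of the terms of one functional category.
"""
from dataclasses import dataclass


@dataclass
class Community:
    nodes: set[str]
    edges: set[frozenset[str]]  # undirected edges between community nodes


def node_ratio(c: Community, network_nodes: int) -> float:
    return len(c.nodes) / network_nodes


def edge_ratio(c: Community, network_edges: int) -> float:
    return len(c.edges) / network_edges


def density(c: Community) -> float:
    n = len(c.nodes)
    return 2 * len(c.edges) / (n * (n - 1)) if n > 1 else 0.0


def new_node_ratio(curr: Community, prev: Community) -> float:
    return len(curr.nodes - prev.nodes) / len(curr.nodes)


def function_term_features(c: Community, labels: dict[str, str],
                           freq: dict[str, int], category: str) -> dict[str, float]:
    """Count, scale, and their shares for one category (e.g. "topic", "technology")."""
    terms = [n for n in c.nodes if labels.get(n) == category]
    scale = sum(freq.get(n, 0) for n in terms)
    total_scale = sum(freq.get(n, 0) for n in c.nodes)
    return {
        "count": len(terms),
        "scale": scale,
        "count_share": len(terms) / len(c.nodes),
        "scale_share": scale / total_scale if total_scale else 0.0,
    }


if __name__ == "__main__":
    c = Community(
        nodes={"diabetes", "smartphone", "app", "study"},
        edges={frozenset(p) for p in [("diabetes", "smartphone"), ("smartphone", "app"), ("app", "study")]},
    )
    labels = {"diabetes": "topic", "smartphone": "technology", "app": "technology", "study": "other"}
    freq = {"diabetes": 5, "smartphone": 3, "app": 2, "study": 10}
    print(density(c))  # 0.5
    print(function_term_features(c, labels, freq, "technology"))
    # {'count': 2, 'scale': 5, 'count_share': 0.5, 'scale_share': 0.25}
```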
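| Lexical functional attribute | Number of terms | Share (%) |
| --- | --- | --- |
| Research topic | 16,365 | 25 |
| Technology | 5,363 | 8 |
| Other | 43,664 | 67 |

Table 4  Lexical functional attributes identified in the e-Health field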
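Each share in Table 4 is a category's proportion of all 65,392 identified terms. Assuming rounding to whole percentages, the reported values can be reproduced directly from the counts:

```python
# Recompute the Table 4 shares from the term counts (whole-percent rounding assumed).
counts = {"Research topic": 16365, "Technology": 5363, "Other": 43664}
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}

print(total)   # 65392
print(shares)  # {'Research topic': 25, 'Technology': 8, 'Other': 67}
```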
| Year | Number of communities | Average number of nodes | Average number of edges | Average node frequency |
| --- | --- | --- | --- | --- |
| 1999 | 26 | 4.8462 | 3.8462 | 5.2692 |
| 2000 | 56 | 4.7679 | 3.7679 | 5.3929 |
| 2001 | 79 | 4.7468 | 3.7468 | 5.8861 |
| 2002 | 107 | 4.8505 | 3.8505 | 6.4206 |
| 2003 | 140 | 4.8357 | 7.6714 | 6.6857 |
| 2004 | 185 | 4.8865 | 3.8865 | 6.9351 |
| 2005 | 247 | 5.3077 | 4.3077 | 7.6842 |
| 2006 | 294 | 5.4422 | 4.4422 | 8.1361 |
| 2007 | 358 | 5.6480 | 4.6480 | 8.7430 |
| 2008 | 441 | 5.8163 | 4.8163 | 9.3651 |
| 2009 | 507 | 5.9073 | 4.9073 | 9.8974 |
| 2010 | 589 | 6.0085 | 5.0085 | 10.4839 |
| 2011 | 713 | 6.1346 | 5.1346 | 11.2665 |
| 2012 | 922 | 6.3210 | 5.3210 | 11.9631 |
| 2013 | 1,221 | 6.5684 | 5.5684 | 12.9337 |
| 2014 | 1,471 | 6.7702 | 5.7702 | 14.0156 |
| 2015 | 1,806 | 6.8444 | 5.8444 | 14.9424 |
| 2016 | 2,215 | 6.9576 | 17.8727 | 15.8135 |
| 2017 | 2,661 | 6.9440 | 5.9440 | 16.5321 |
| 2018 | 3,175 | 6.9868 | 5.9868 | 17.4387 |
| 2019 | 3,892 | 6.9782 | 11.9563 | 18.2328 |
| 2020 | 5,042 | 7.0262 | 6.0262 | 19.5851 |
| 2021 | 6,077 | 7.1223 | 6.1223 | 20.9778 |
| 2022 | 6,775 | 7.1442 | 6.1442 | 19.5723 |

Table 5  Basic statistics of communities, 1999-2022
Fig. 2  Distribution of emerging communities, 1999-2021
Fig. 3  F1 scores of emerging community prediction models under four feature combinations
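Fig. 3 compares the feature combinations by F1, the harmonic mean of precision and recall. As a minimal sketch, assuming binary labels with 1 marking an emerging community (the averaging scheme behind Fig. 3 is not stated here), the score can be computed as follows; `f1_score` here is a hypothetical standalone helper.

```python
def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1 for the positive class (1 = emerging community, an assumed encoding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # F1 = 2PR / (P + R), which simplifies to 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # 0.6666666666666666
```

For binary labels this should agree with scikit-learn's `sklearn.metrics.f1_score` under its default `average='binary'`.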
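Fig. 4  Feature evaluation of the RF model based on all features
(Note: The horizontal axis shows SHAP values; the vertical axis lists the features, sorted in descending order of importance (mean absolute SHAP value). Each row represents one feature and each dot one sample. A positive SHAP value indicates that the feature has a positive effect on the model output; a negative value indicates a negative effect. Redder dots correspond to larger feature values and bluer dots to smaller feature values.)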
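The ordering described in the note (features sorted by mean absolute SHAP value) can be reproduced from a matrix of SHAP values, for example one returned by a SHAP explainer. The sketch below uses toy values and hypothetical feature names. The group score, which sums the mean |SHAP| of the features in each Table 3 dimension, is only an assumed way of producing a grouped view; the aggregation used for the paper's group-level figures is not described here.

```python
def mean_abs_shap(shap_values: list[list[float]], features: list[str]) -> list[tuple[str, float]]:
    """Rank features by mean |SHAP| (rows = samples, columns = features), descending."""
    n = len(shap_values)
    scores = [sum(abs(row[k]) for row in shap_values) / n for k in range(len(features))]
    return sorted(zip(features, scores), key=lambda fs: fs[1], reverse=True)


def group_importance(ranking: list[tuple[str, float]], groups: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical grouping: sum the mean |SHAP| of the features in each group."""
    score = dict(ranking)
    return {g: sum(score[f] for f in members) for g, members in groups.items()}


if __name__ == "__main__":
    features = ["community_density", "betweenness_centrality", "volatility"]
    shap_values = [  # toy values: 3 samples x 3 features
        [0.2, -0.6, 0.1],
        [-0.4, 0.3, 0.0],
        [0.3, -0.3, -0.2],
    ]
    ranking = mean_abs_shap(shap_values, features)
    print([name for name, _ in ranking])  # ['betweenness_centrality', 'community_density', 'volatility']
    print(group_importance(ranking, {"structure": ["community_density"],
                                     "influence": ["betweenness_centrality"],
                                     "sequence": ["volatility"]}))
```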
Fig. 5  Importance evaluation of feature groups
Fig. 6  Evaluation of feature groups
[1] Matsumura N, Matsuo Y, Ohsawa Y, et al. Discovering Emerging Topics from WWW[J]. Journal of Contingencies and Crisis Management, 2002, 10(2): 73-81.
[2] Zhang S T, Han F. Identifying Emerging Topics in a Technological Domain[J]. Journal of Intelligent & Fuzzy Systems, 2016, 31(4): 2147-2157.
[3] Tu Y N, Seng J L. Indices of Novelty for Emerging Topic Detection[J]. Information Processing & Management, 2012, 48(2): 303-325.
[4] Ye Guanghui, Wang Cancan, Li Songye. Recognition and Prediction of Emerging Topics in Interdisciplinary Scientific Research Collaboration Based on SciTS Conference Text[J]. Information Science, 2022, 40(7): 126-135 (in Chinese).
[5] Bródka P, Saganowski S, Kazienko P. Group Evolution Discovery in Social Networks[C]// Proceedings of the 2011 International Conference on Advances in Social Networks Analysis and Mining. Piscataway: IEEE, 2011: 247-253.
[6] Yin G S, Chi K, Dong Y X, et al. An Approach of Community Evolution Based on Gravitational Relationship Refactoring in Dynamic Networks[J]. Physics Letters A, 2017, 381(16): 1349-1355.
[7] Chen J, Zhao H T, Yang X Y, et al. Community Evolution Prediction Based on Multivariate Feature Sets and Potential Structural Features[J]. Mathematics, 2022, 10(20): 3802.
[8] Tajeuna E G, Bouguessa M, Wang S R. Modeling and Predicting Community Structure Changes in Time-Evolving Social Networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(6): 1166-1180.
[9] Saganowski S, Bródka P, Koziarski M, et al. Analysis of Group Evolution Prediction in Complex Networks[J]. PLoS One, 2019, 14(10): e0224194.
[10] Yu W, Wang W J, Chen X, et al. Boosting Temporal Community Detection via Modeling Community Evolution Characteristics[C]// Proceedings of the 2019 IEEE International Conference on Big Data and Cloud Computing. Piscataway: IEEE, 2019: 1291-1296.
[11] Dakiche N, Benbouzid-Si Tayeb F, Benatchba K, et al. Tailored Network Splitting for Community Evolution Prediction in Dynamic Social Networks[J]. New Generation Computing, 2021, 39(1): 303-340.
[12] İlhan N, Öğüdücü Ş G. Predicting Community Evolution Based on Time Series Modeling[C]// Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. New York: ACM, 2015: 1509-1516.
[13] Wang Z, Xu Q G, Li W M. Multi-Layer Feature Fusion-Based Community Evolution Prediction[J]. Future Internet, 2022, 14(4): 113.
[14] Gao J Q, Luo X F, Wang H. An Uncertain Future: Predicting Events Using Conditional Event Evolutionary Graph[J]. Concurrency and Computation: Practice and Experience, 2021, 33(9): e6164.
[15] Hu W J, Yang Y, Cheng Z Q, et al. Time-Series Event Prediction with Evolutionary State Graph[C]// Proceedings of the 14th ACM International Conference on Web Search and Data Mining. New York: ACM, 2021: 580-588.
[16] Pang Yunxia. Rethinking the Relational Data and Attribute Data: The Status and Development of Social Network Analysis[J]. Journalism & Communication Review, 2019, 72(3): 117-128 (in Chinese).
[17] Hu C P, Hu J M, Deng S L, et al. A Co-Word Analysis of Library and Information Science in China[J]. Scientometrics, 2013, 97(2): 369-382.
[18] Zheng J, Gong J Y, Li R, et al. Community Evolution Analysis Based on Co-Author Network: A Case Study of Academic Communities of the Journal of “Annals of the Association of American Geographers”[J]. Scientometrics, 2017, 113(2): 845-865.
[19] Wang Yuefen, Li Dongqiong, Yu Houqiang. Research on the Evolution of the Scientific Collaboration Network and the Growth of the High-Impact Author in the Life Cycle Phase[J]. Journal of the China Society for Scientific and Technical Information, 2018, 37(2): 121-131 (in Chinese).
[20] Li L J, Fang S Y, Bai S S, et al. Effective Link Prediction Based on Community Relationship Strength[J]. IEEE Access, 2019, 7: 43233-43248. DOI: 10.1109/ACCESS.2019.2908208.
[21] Lu Wei, Li Pengcheng, Zhang Guobiao, et al. Recognition of Lexical Functions in Academic Texts: Automatic Classification of Keywords Based on BERT Vectorization[J]. Journal of the China Society for Scientific and Technical Information, 2020, 39(12): 1320-1329 (in Chinese).
[22] Kadkhoda Mohammadmosaferi K, Naderi H. AFIF: Automatically Finding Important Features in Community Evolution Prediction for Dynamic Social Networks[J]. Computer Communications, 2021, 176: 66-80.
[23] Huo C G, Ma S T, Liu X Z. Hotness Prediction of Scientific Topics Based on a Bibliographic Knowledge Graph[J]. Information Processing & Management, 2022, 59(4): 102980.
[24] Bok K, Noh Y, Lim J, et al. Hot Topic Prediction Considering Influence and Expertise in Social Media[J]. Electronic Commerce Research, 2021, 21(3): 671-687.
[25] Wei T, Li M H, Wu C S, et al. Do Scientists Trace Hot Topics?[J]. Scientific Reports, 2013, 3: Article No. 2207.
[26] Zhang Xuewu, Shen Haodong, Zhao Peiran, et al. Research on Community Evolution Prediction Based on Event-Based Frameworks[J]. Chinese Journal of Computers, 2017, 40(3): 729-742 (in Chinese).
[27] Zhang Liu. Research on Construction of User Topic Graph in Social Networks and Guiding Strategy[D]. Changchun: Jilin University, 2021 (in Chinese).
[28] Ma Feicheng, Chen Rui, Yuan Hong. Study on the Law of Scattering Distribution of Scientific Information: Demonstration Analysis from Document Level to Content Level (I): Overall Research Frame[J]. Journal of the China Society for Scientific and Technical Information, 1999(1): 79-84 (in Chinese).
[29] Wang S Y, Mao J, Cao Y J, et al. Integrated Knowledge Content in an Interdisciplinary Field: Identification, Classification, and Application[J]. Scientometrics, 2022, 127(11): 6581-6614.
[30] Abuzayed A, Al-Khalifa H. BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique[J]. Procedia Computer Science, 2021, 189: 191-194.
[31] Wang S Y, Mao J, Lu K, et al. Understanding Interdisciplinary Knowledge Integration Through Citance Analysis: A Case Study on eHealth[J]. Journal of Informetrics, 2021, 15(4): 101214.
[32] Wang S Y, Mao J, Tang J, et al. Content Characteristics of Knowledge Integration in the eHealth Field: An Analysis Based on Citation Contexts[J]. Journal of Data and Information Science, 2021, 6(3): 58-74. DOI: 10.2478/jdis-2021-0015.
[33] Behrouzi S, Shafaeipour Sarmoor Z, Hajsadeghi K, et al. Predicting Scientific Research Trends Based on Link Prediction in Keyword Networks[J]. Journal of Informetrics, 2020, 14(4): 101079.
[34] Xu L W, Qiu J N, Zhai J. Trend Prediction Model of Online Public Opinion in Emergencies Based on Fluctuation Analysis[J]. Natural Hazards, 2023, 116(3): 3301-3320.
[35] Hou Haiyan, Wang Yajie, Liang Guoqiang, et al. Interdisciplinary Feature Identification Method Based on Journal Subject Category: A Case Study of Biomedical Engineering[J]. Chinese Journal of Scientific and Technical Periodicals, 2017, 28(4): 350-357 (in Chinese). DOI: 10.11946/cjstp.201611070913.
[36] Shang Xianli, Wang Shiyun, Cao Yujie, et al. Analyzing the Knowledge Internalization Process of Interdisciplinary Field Based on Citation Content[J]. Information Science, 2023, 41(2): 118-125 (in Chinese).
[37] Li Gang, Tang Jing, Mao Jin, et al. Research on the Evolution Characteristics of Scientific Research Communities in Subject Fields Based on Evolutionary Event Detection: An Example of LIS[J]. Library and Information Service, 2021, 65(17): 79-90 (in Chinese). DOI: 10.13266/j.issn.0252-3116.2021.17.008.
[38] Hu Changping, Chen Guo. Research on Ternary Relationship of the Conceptual Knowledge Network from the Hierarchy Perspective[J]. Library and Information Service, 2014, 58(4): 11-16 (in Chinese).