Data Analysis and Knowledge Discovery  2023, Vol. 7 Issue (1): 89-101    DOI: 10.11925/infotech.2096-3467.2022.0111
Detecting Research Frontiers Based on Twitter
Wuxihong Jiangbulati1,2, Wang Xiaomei1, Chen Ting1,3,4
1Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
2School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China
3National Science Library, Chinese Academy of Sciences, Beijing 100190, China
4Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China

[Objective] This paper proposes a Twitter-based method for identifying emerging research topics, so that the latest developments in a given discipline can be tracked. [Methods] First, we analyzed the principles and practice of using Twitter to identify research topics. Second, we proposed a monitoring indicator system based on the influence of scholars and of content. Third, we conducted an empirical analysis in the field of natural language processing (NLP). [Results] The detection model identified emerging research topics in NLP in a timely manner. Compared with reports on the state of NLP research, it identified 8 of the 13 research frontiers. [Limitations] Because social media is open, off-topic noise is difficult to exclude completely when constructing the dataset. [Conclusions] The proposed method, which draws on scholarly user-generated content (UGC) on Twitter, is a feasible and effective way to detect a discipline's research frontiers in a timely and forward-looking manner.

Key words: Research Frontiers; Social Media; Twitter; Domain Dataset Construction
Received: 13 February 2022      Published: 16 February 2023
ZTFLH:  G350  
Fund: Project of Literature and Information Capacity Building, Chinese Academy of Sciences (GHJ-QBZX-2021-04)
Corresponding Author: Wang Xiaomei, ORCID: 0000-0002-9895-1511

Cite this article:

Wuxihong Jiangbulati, Wang Xiaomei, Chen Ting. Detecting Research Frontiers Based on Twitter. Data Analysis and Knowledge Discovery, 2023, 7(1): 89-101.


A Twitter-based Framework for Identifying Research Frontiers
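First-level indicator | Second-level indicator | Third-level indicator
Content influence indicators | Value influence |
Scholar influence indicators | Recognition | Audience attention
 | Interactivity | Number of tweets posted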
Twitter Research Frontiers Detecting Indicator System
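 | Value influence | Dissemination influence | Novelty
Value influence | 1 | 1/3 | 3
Dissemination influence | 3 | 1 | 5
Novelty | 1/3 | 1/5 | 1
Maximum eigenvalue: 3.039; CR value: 0.037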
Content Impact Indexes Judgment Matrix and Consistency Test Results
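The judgment matrix and consistency test above can be checked with a short analytic hierarchy process (AHP) calculation. The sketch below is an illustration, not the authors' code: it assumes the weights come from the column-normalization (arithmetic mean) method and that the random consistency index for a 3x3 matrix is 0.52. Neither assumption is stated in this excerpt; both are inferred from the reported figures. The function name `ahp_weights` is hypothetical.

```python
def ahp_weights(matrix, ri=0.52):
    """Return (weights, lambda_max, CR) for a pairwise judgment matrix.

    ASSUMPTIONS (not stated in the article excerpt): weights use the
    column-normalization method, and ri=0.52 is the random consistency
    index for a 3x3 matrix.
    """
    n = len(matrix)
    # Sum of each column, used to normalize columns to sum to 1.
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    # Weight of each row = mean of its column-normalized entries.
    weights = [sum(row[j] / col_sums[j] for j in range(n)) / n for row in matrix]
    # Estimate the maximum eigenvalue from (A w)_i / w_i, averaged over rows.
    ratios = [
        sum(row[j] * weights[j] for j in range(n)) / weights[i]
        for i, row in enumerate(matrix)
    ]
    lambda_max = sum(ratios) / n
    ci = (lambda_max - n) / (n - 1)  # consistency index
    return weights, lambda_max, ci / ri


# Content influence judgment matrix from the table above.
content = [
    [1, 1 / 3, 3],      # value influence
    [3, 1, 5],          # dissemination influence
    [1 / 3, 1 / 5, 1],  # novelty
]
weights, lambda_max, cr = ahp_weights(content)
print([round(w, 4) for w in weights], round(lambda_max, 3), round(cr, 3))
```

Under these assumptions the result agrees with the reported maximum eigenvalue (3.039) and CR value (0.037), and the same routine can be applied to the other two judgment matrices in this article. A CR below 0.1 is the usual threshold for accepting a judgment matrix as consistent.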
 | Recognition | Activity | Interactivity
Recognition | 1 | 6 | 3
Activity | 1/6 | 1 | 1/4
Interactivity | 1/3 | 4 | 1
Maximum eigenvalue: 3.054; CR value: 0.052
Scholars’ Influence Indexes Judgment Matrix and Consistency Test Results
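 | Audience attention | External value influence | External dissemination influence
Audience attention | 1 | 1/3 | 1/5
External value influence | 3 | 1 | 1/4
External dissemination influence | 5 | 4 | 1
Maximum eigenvalue: 3.087; CR value: 0.084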
Recognition Indexes Judgment Matrix and Consistency Test Results
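First-level indicator | Weight | Second-level indicator | Weight | Third-level indicator | Weight
Content influence indicators | 0.6345 | Value influence | 0.2605 | |
 | | Dissemination influence | 0.6333 | |
 | | Novelty | 0.1062 | |
Scholar influence indicators | 0.3655 | Recognition | 0.6393 | Audience attention | 0.1038
 | | | | External value influence | 0.2311
 | | | | External dissemination influence | 0.6651
 | | Activity | 0.0870 | |
 | | Interactivity | 0.2737 | Number of tweets posted | 0.5000
 | | | | | 0.5000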
Model for Detecting Research Frontiers
Monthly Number of NLP-Related Tweets
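Indicator | Indicator attribute | Avg. | Max. | Min. | Std.
Value influence | | 103.86 | 42,082 | 0 | 268.76
Dissemination influence | | 24.78 | 11,069 | 0 | 71.82
Novelty | | 322.98 | 725 | 3 | 202.28
Audience attention | | 8,019.18 | 393,261 | 66 | 29,344.56
External value influence | | 115.24 | 2,387.80 | 0 | 171.15
External dissemination influence | | 22.72 | 367.90 | 0 | 35.64
Activity | | 8.60 | 111.80 | | 12.98
Number of tweets posted | | 283.83 | 1,766 | 1 | 348.59
Number of comments | | 162.71 | 1,599 | 0 | 230.16
Number of accounts followed* | | 654.30 | 6,049 | 0 | 740.84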
Statistics of the Element Scores of the Emerging Frontier Component Indicators
No. | Date published | Emerging research content | Scholar influence score | Content influence score | Research frontier score
1 | 2020-7-13 | This is mind blowing. With GPT-3, I built a layout generator where you just describe any layout you want, and it generates the JSX code for you. | 2.09 | 3.48 | 2.97
2 | 2021-6-29 | Meet GitHub Copilot - your AI pair programmer. Powered by OpenAI Codex: a large neural network that can code pretty well. | 2.09 | 3.43 | 2.94
3 | 2021-12-8 | The ongoing consolidation in AI is incredible. Thread: When I started decade ago vision, speech, natural language, reinforcement learning, etc. were completely separate; You couldn’t read papers across areas - the approaches were completely different, often not even ML based. / Every ML model is converging into a Transformer that can basically be defined in 200 lines of PyTorch code. This is a great thread, Models designed to generate words (transformers) & model language (BERT) were reused in #AlphaFold to solve the protein folding problem, mapping a bunch of letters, to 3D coordinates. | 2.29 | 2.80 | 2.61
4 | 2021-11-18 | Our new AI system learned speech recognition in English with *zero* speech to text training data: researchers just gave it lots of audio, and it figured out what the words were. But it goes way beyond that - it learned Swahili too! / Wav2vec enables AI systems learn a language based on audio recordings with no matching text - as we’ve said before it’s a game changer for building speech AI that works in all languages, not just the dominant ones. | 2.35 | 2.51 | 2.45
5 | 2021-9-17 | New benchmark testing if models like GPT3 are truthful (= avoid generating false answers). We find that models fail and they imitate human misconceptions. Larger models (with more params) do worse! | 2.29 | 2.36 | 2.34
6 | 2020-8-3 | Why You Should Do NLP Beyond English: 7000+ languages are spoken around the world but NLP research has mostly focused on English. In this post, I give an overview of why you should work on languages other than English. | 2.23 | 2.37 | 2.32
7 | 2021-1-6 | We’ve developed two neural networks which have learned by associating text and images. CLIP maps images into categories described in text, and DALL-E creates new images. A step toward systems with deeper understanding of the world. / @OpenAI is exploring the multimodal direction and discover how far we push the ability to learn vision from language supervision in massive data+compute scenarios! CLIP: maps images to categories by taking class names as inputs; beats the original RN50 on ImageNet zero-shot(!), while being far more robust on unusual images; DALL-E: text2im that works for a wide variety of sentences | 2.01 | 2.54 | 2.35
8 | 2020-2-11 | Microsoft researchers and engineers release Zero Redundancy Optimizer (ZeRO) and DeepSpeed library, a system able to train 100-billion-parameter deep learning models. Learn about this breakthrough and how it led to Turing Natural Language Generation. | 2.11 | 2.37 | 2.28
9 | 2021-8-20 | We use big language models to synthesize computer programs, execute programs, solve math problems, and dialog with humans to iteratively refine code. The models can solve 60% and 81% of the programming and math problems, respectively. | 2.33 | 2.22 | 2.26
10 | 2021-9-10 | We’re introducing GSLM, the first language model that breaks free completely of the dependence on text for training. This “textless NLP” approach learns to generate expressive speech using only raw audio recordings as input. / There is lot more to natural languages than text: tone, accent, expression, prosody, timbre, pitch..... Textless NLP represents speech through a stream of discrete tokens, automatically learned through self-supervised learning, directly fed with raw speech waveform! A new era. | 2.35 | 2.19 | 2.25
Results of the Research Frontiers
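The three score columns in the results table appear to be linked by a weighted sum using the first-level weights from the weight table (0.6345 for content influence, 0.3655 for scholar influence). The sketch below checks that reading against the first three reported rows. The formula is an inference from the published numbers, not something this excerpt states, and the name `frontier_score` is hypothetical.

```python
# First-level AHP weights taken from the weight table in this article.
W_CONTENT = 0.6345
W_SCHOLAR = 0.3655


def frontier_score(scholar, content):
    """Weighted sum of the two first-level scores.

    ASSUMPTION: this combination rule is inferred from the reported
    figures; the article excerpt does not spell it out.
    """
    return W_SCHOLAR * scholar + W_CONTENT * content


# (scholar influence, content influence, reported frontier score)
rows = [
    (2.09, 3.48, 2.97),  # result 1
    (2.09, 3.43, 2.94),  # result 2
    (2.29, 2.80, 2.61),  # result 3
]
for scholar, content, reported in rows:
    computed = frontier_score(scholar, content)
    print(f"computed={computed:.3f} reported={reported:.2f}")
```

Because the published inputs are rounded to two decimals, the recomputed score can differ from the reported one by up to about 0.01. Content influence carries nearly twice the weight of scholar influence, so a tweet from a moderately influential scholar can still rank first if its content spreads widely.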
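No. | Interpretation | Publisher | Non-literature form
1 | Application of GPT-3 to text generation | OpenAI researcher |
2 | Code generation: Codex integrated into GitHub Copilot for the first time | OpenAI researcher |
3 | Trend toward general-purpose AI models across subfields | Director of AI, Tesla |
4 | Wav2vec-U: a speech recognition model that works across languages and needs no transcribed speech data | Chief Technology Officer, Meta |
5 | TruthfulQA: testing how well language models answer open-ended questions | University of Oxford researcher |
6 | Call for attention to applying NLP models to multiple languages | Google AI researcher |
7 | CLIP & DALL·E: multimodal text-and-image neural networks, used for text-to-image generation | OpenAI |
8 | ZeRO & DeepSpeed open-source library: a system able to train deep learning models with 100 billion parameters | Microsoft Research |
9 | Program synthesis with large language models | Google Brain researcher |
10 | GSLM, a new way to train language models: training starts from speech and needs no labels or large-scale data, so every language can benefit from large language models | Meta AI |
11 | Announcement of the lab's new research focus: developing tools that support collaborative building of large models | Professor, University of North Carolina |
12 | Training on human preferences instead of automatic evaluation metrics (such as ROUGE and BLEU), using human feedback as the reward for reinforcement learning; performance on text summarization surpasses humans across the board | OpenAI |
13 | Testing GPT-3 and GPT-Neo on programming tasks | Hugging Face researcher |
14 | BioMed Explorer: application of NLP models in the biological domain | Google AI (Research and Health) researcher |
15 | Current state and future of large language models | Stanford University researcher |
16 | nlp open-source library: provides corpus management and evaluation functions | Hugging Face researcher |
17 | T0: by combining prompting with multi-task learning, outperforms GPT-3 in zero-shot tests on multiple downstream tasks | BigScience |
18 | Large language models are still advancing, and their capabilities continue to grow | DeepMind |
19 | REALM: a new paradigm for language model pre-training that augments pre-trained language models with a knowledge retriever | Google AI |
20 | Call to focus on underlying linguistic phenomena rather than on improving algorithms and models | Professor, Bar-Ilan University |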
Interpretation of the Content of Research Frontiers
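ML and NLP research frontier | NLP? | Identified in this study? | Corresponding result No. in this study
General-purpose pre-trained models | | | 3, 4
Large-scale multi-task learning | | | 8, 15, 17, 18
Prompting | | | 17
More efficient architectures and more efficient fine-tuning methods | | | -
Benchmarking | | | 5
Conditional image generation | | | 7
Machine learning combined with the natural sciences | | - | -*
Program synthesis | | | 2, 9, 13
Harmful biases in large pre-trained models | | | -
Retrieval augmentation | | | 19
Temporal adaptation | | | -
The importance of data | | | 16
Meta-learning | | - | -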
The Matching of ML and NLP Research Frontiers with the Results Identified in This Paper