1 Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
2 School of Public Policy and Management, University of Chinese Academy of Sciences, Beijing 100049, China
3 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
4 Department of Library, Information and Archives Management, University of Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper designs a Twitter-based method for identifying emerging research topics, so that the latest developments in a specific discipline can be tracked. [Methods] First, we analyzed the principles and practices of using Twitter to identify research topics. Second, we proposed a monitoring index system based on the influence of scholars and of content. Third, we conducted an empirical analysis in the field of natural language processing (NLP). [Results] The detection model identified emerging research topics in NLP in a timely manner. Of the 13 research frontiers described in reports on the state of NLP, 8 were successfully identified. [Limitations] Because social media is open, it is difficult to completely exclude noisy content unrelated to the subject when constructing the dataset. [Conclusions] The proposed method draws on scholarly user-generated content (UGC) on Twitter and is a feasible, effective way to detect a discipline's research frontiers in a timely and forward-looking manner.
Wuxihong Jiangbulati, Wang Xiaomei, Chen Ting. Detecting Research Frontiers Based on Twitter[J]. Data Analysis and Knowledge Discovery, 2023, 7(1): 89-101.
| Rank | Date | Tweet | Score 1 | Score 2 | Score 3 |
|---|---|---|---|---|---|
| 1 | | This is mind blowing. With GPT-3, I built a layout generator where you just describe any layout you want, and it generates the JSX code for you. | 2.09 | 3.48 | 2.97 |
| 2 | 2021-6-29 | Meet GitHub Copilot - your AI pair programmer. Powered by OpenAI Codex: a large neural network that can code pretty well. | 2.09 | 3.43 | 2.94 |
| 3 | 2021-12-8 | The ongoing consolidation in AI is incredible. Thread: When I started decade ago vision, speech, natural language, reinforcement learning, etc. were completely separate; You couldn’t read papers across areas - the approaches were completely different, often not even ML based. Every ML model is converging into a Transformer that can basically be defined in 200 lines of PyTorch code. This is a great thread, Models designed to generate words (transformers) & model language (BERT) were reused in #AlphaFold to solve the protein folding problem, mapping a bunch of letters, to 3D coordinates. | 2.29 | 2.80 | 2.61 |
| 4 | 2021-11-18 | Our new AI system learned speech recognition in English with *zero* speech to text training data: researchers just gave it lots of audio, and it figured out what the words were. But it goes way beyond that - it learned Swahili too! Wav2vec enables AI systems learn a language based on audio recordings with no matching text — as we’ve said before it’s a game changer for building speech AI that works in all languages, not just the dominant ones. | 2.35 | 2.51 | 2.45 |
| 5 | 2021-9-17 | New benchmark testing if models like GPT3 are truthful (= avoid generating false answers). We find that models fail and they imitate human misconceptions. Larger models (with more params) do worse! | 2.29 | 2.36 | 2.34 |
| 6 | 2020-8-3 | Why You Should Do NLP Beyond English: 7000+ languages are spoken around the world but NLP research has mostly focused on English. In this post, I give an overview of why you should work on languages other than English. | 2.23 | 2.37 | 2.32 |
| 7 | 2021-1-6 | We’ve developed two neural networks which have learned by associating text and images. CLIP maps images into categories described in text, and DALL-E creates new images. A step toward systems with deeper understanding of the world. @OpenAI is exploring the multimodal direction and discover how far we push the ability to learn vision from language supervision in massive data+compute scenarios! CLIP: maps images to categories by taking class names as inputs; beats the original RN50 on ImageNet zero-shot(!), while being far more robust on unusual images; DALL-E: text2im that works for a wide variety of sentences | 2.01 | 2.54 | 2.35 |
| 8 | 2020-2-11 | Microsoft researchers and engineers release Zero Redundancy Optimizer (ZeRO) and DeepSpeed library, a system able to train 100-billion-parameter deep learning models. Learn about this breakthrough and how it led to Turing Natural Language Generation. | 2.11 | 2.37 | 2.28 |
| 9 | 2021-8-20 | We use big language models to synthesize computer programs, execute programs, solve math problems, and dialog with humans to iteratively refine code. The models can solve 60% and 81% of the programming and math problems, respectively. | 2.33 | 2.22 | 2.26 |
| 10 | 2021-9-10 | We’re introducing GSLM, the first language model that breaks free completely of the dependence on text for training. This “textless NLP” approach learns to generate expressive speech using only raw audio recordings as input. There is lot more to natural languages than text: tone, accent, expression, prosody, timbre, pitch..... Textless NLP represents speech through a stream of discrete tokens, automatically learned through self-supervised learning, directly fed with raw speech waveform! A new era. | | | |
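
The abstract describes a monitoring index system built on the influence of scholars and of content. As a rough illustration only, the sketch below ranks tweets by a weighted mean of two such influence scores. The field names, the equal default weighting, and the sample values are all hypothetical; the indicators and weights actually used in the paper are not given in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    scholar_influence: float  # hypothetical score for the tweeting scholar
    content_influence: float  # hypothetical score for the tweet content


def composite_score(tweet: Tweet, scholar_weight: float = 0.5) -> float:
    """Weighted mean of the two influence scores (the weight is an assumption)."""
    if not 0.0 <= scholar_weight <= 1.0:
        raise ValueError("scholar_weight must lie in [0, 1]")
    return (scholar_weight * tweet.scholar_influence
            + (1.0 - scholar_weight) * tweet.content_influence)


def rank_tweets(tweets, scholar_weight: float = 0.5):
    """Return the tweets sorted by composite score, highest first."""
    return sorted(
        tweets,
        key=lambda t: composite_score(t, scholar_weight),
        reverse=True,
    )


# Illustrative inputs only; these are not values from the paper.
sample = [
    Tweet("tweet A", scholar_influence=2.0, content_influence=3.0),
    Tweet("tweet B", scholar_influence=2.5, content_influence=2.0),
    Tweet("tweet C", scholar_influence=1.0, content_influence=4.5),
]
ranked = rank_tweets(sample)
for position, tweet in enumerate(ranked, start=1):
    print(position, tweet.text, composite_score(tweet))
```

Changing `scholar_weight` shifts the ranking toward who is tweeting or toward how the tweet itself performs, which is the trade-off any index of this two-part kind has to settle.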