Visualizing Appropriation of Research Funding with t-SNE Algorithm
Chen Ting1,2,3, Li Guopeng3, Wang Xiaomei3
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
[Objective] This paper proposes a method for visualizing the appropriation of research funding, aiming to present the distribution of funded projects more effectively. [Methods] First, we retrieved 4,669 funded projects from the NSF's Information and Intelligent Systems (IIS) division. Second, we added topic tags to these projects using a clustering algorithm combined with human interpretation. Third, we extracted high-dimensional text features from the application documents with the TF-IDF and LSA models. Fourth, we used the t-SNE algorithm to project the high-dimensional features into two- or three-dimensional space for visualization. Finally, we checked the visualization results against the pre-classified topic labels. [Results] The proposed method produced maps of the funded projects in both two- and three-dimensional space. [Limitations] The algorithm parameters need to be adjusted manually. More research is needed to evaluate the method on documents of projects funded by other agencies. [Conclusions] The proposed method can generate maps of funded projects, which makes it a helpful tool for science management.
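The feature extraction and projection steps described in [Methods] can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not say which software was used, so scikit-learn is an assumption, as are the toy documents, the topic labels, the `embed` helper, and every parameter value (`lsa_dims`, `perplexity`, `seed`).

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Toy stand-ins for project documents (hypothetical; not the NSF data).
docs = [
    "deep learning for image recognition and computer vision",
    "neural networks for visual object detection in images",
    "convolutional models for image segmentation",
    "database query optimization for large scale data systems",
    "indexing and transaction processing in distributed databases",
    "scalable storage systems for data management",
    "human computer interaction and user interface design",
    "usability studies of interactive user interfaces",
    "accessible interaction design for mobile users",
]
# Hypothetical pre-assigned topic labels, used only to inspect the map.
labels = ["vision"] * 3 + ["data"] * 3 + ["hci"] * 3


def embed(texts, lsa_dims=5, out_dims=2, perplexity=3.0, seed=0):
    """Return t-SNE coordinates of shape (len(texts), out_dims)."""
    # Step 1: sparse TF-IDF term weights, one row per document.
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    # Step 2: LSA, i.e. a truncated SVD of the TF-IDF matrix.
    lsa = TruncatedSVD(n_components=lsa_dims, random_state=seed).fit_transform(tfidf)
    # Step 3: t-SNE projection; perplexity must be below the number of documents.
    tsne = TSNE(n_components=out_dims, perplexity=perplexity,
                init="random", random_state=seed)
    return tsne.fit_transform(lsa)


coords_2d = embed(docs, out_dims=2)
coords_3d = embed(docs, out_dims=3)

# Print the 2-D coordinates next to the topic label of each document.
for label, (x, y) in zip(labels, coords_2d):
    print(f"{label:>6}  {x:8.2f} {y:8.2f}")
```

Plotting `coords_2d` colored by `labels` gives a visual check of the kind described in the final step of [Methods]. The perplexity chosen here only suits a nine-document toy corpus; a collection of several thousand projects would need a larger value, tuned by hand as noted in [Limitations].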