Data Analysis and Knowledge Discovery, 2018, Vol. 2, Issue 8: 1-9. https://doi.org/10.11925/infotech.2096-3467.2018.0251
Research Paper
Visualizing Appropriation of Research Funding with t-SNE Algorithm*
Chen Ting (1, 2, 3), Li Guopeng (3), Wang Xiaomei (3)
1 National Science Library, Chinese Academy of Sciences, Beijing 100190, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 Institutes of Science and Development, Chinese Academy of Sciences, Beijing 100190, China
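Abstract (Chinese-language version)

[Objective] To design a visualization scheme that reduces the dimensionality of text features by combining a topic model with manifold learning, so that the layout of research funding can be discovered more effectively and displayed more intuitively. [Methods] Using ten years (2008-2017) of funded-project data from the US NSF Information and Intelligent Systems (IIS) division, project topic labels were built with a clustering algorithm combined with manual interpretation. High-dimensional features of the project proposals were built separately with the TF-IDF vector space model and the latent semantic analysis (LSA) topic model. The nonlinear t-SNE dimensionality-reduction algorithm from manifold learning then mapped these features into two- or three-dimensional space for visualization. The visualization was checked against the constructed topic labels, again with manual interpretation. [Results] The experiments show that t-SNE combined with the LSA model reduces the dimensionality of the experimental data effectively: in both the two- and three-dimensional maps, projects on the same topic cluster well, and the topics show clear outlines and boundaries. [Limitations] Presetting and tuning the algorithm parameters requires manual involvement, and the method's applicability to fund text data from other funding agencies has not been verified. [Conclusions] The method is feasible, and the resulting maps intuitively reflect a funding agency's funding layout, which can help research managers and decision makers review the macro-level research layout.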

Keywords (Chinese-language version): science fund projects; research layout; LSA; t-SNE; visualization map
Abstract

[Objective] This paper designs a method for visualizing how research funding is allocated, aiming to present the layout of funded projects more effectively. [Methods] First, we retrieved 4,669 funded projects from the NSF Division of Information and Intelligent Systems (IIS). Second, we assigned topic tags to these projects using a clustering algorithm combined with human interpretation. Third, we extracted high-dimensional text features from the application documents with the TF-IDF and LSA models. Fourth, we used the t-SNE algorithm to project the high-dimensional features into two- or three-dimensional space for visualization. Finally, we examined the visualization results against the pre-classified topic labels. [Results] The proposed method produced maps of the funded projects in both two- and three-dimensional space. [Limitations] The algorithm parameters must be adjusted manually, and the method has yet to be evaluated on documents from projects funded by other agencies. [Conclusions] The proposed method can generate maps of funded projects that are a helpful tool for scientific management.
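As a rough illustration of the TF-IDF step named in [Methods], the sketch below turns a few short texts into L2-normalised TF-IDF vectors and compares them by cosine similarity. It is not the authors' code: the toy texts are hypothetical, whitespace tokenisation is a simplification, and the smoothed IDF formula is one common variant chosen here as an assumption.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one L2-normalised TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy texts, not NSF project data.
docs = [
    "robot motion planning and control",
    "robot grasp planning with vision",
    "database query optimization and indexing",
]
vocab, vecs = tfidf_vectors(docs)
sim_robot = cosine(vecs[0], vecs[1])   # two robotics-like texts
sim_mixed = cosine(vecs[0], vecs[2])   # robotics-like vs database-like text
print(len(vocab), sim_robot > sim_mixed)
```

Each document becomes a vector with one dimension per vocabulary term, which is why real proposal collections yield very high-dimensional features and why a reduction step such as LSA or t-SNE is needed before plotting.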

Key words: Research Awards; Funding Map; LSA; t-SNE; Visualization
Received: 2018-03-07      Published: 2018-09-08
Chinese Library Classification number (ZTFLH): P315 G312
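Funding: *This work is one of the outcomes of the National Natural Science Foundation of China project "Application Research on Analysis Methods for Science Structure Characteristics and Their Evolutionary Dynamics" (Grant No. 71173211) and the Youth Fund project of the Institutes of Science and Development, Chinese Academy of Sciences, "Research on Key Technical Methods in the Analysis of Research Project Layout" (Grant No. Y7X1161Q01).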
Cite this article:
Chen Ting, Li Guopeng, Wang Xiaomei. Visualizing Appropriation of Research Funding with t-SNE Algorithm[J]. Data Analysis and Knowledge Discovery, 2018, 2(8): 1-9.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2018.0251      or      http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2018/V2/I8/1
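Figure: Comparison of PCA and LDA dimensionality-reduction visualizations of funded projects
Figure: Workflow for visualizing funded projects
Figure: Workflow for validating the visualization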
Table: Key parameters of the t-SNE algorithm
Perplexity    Early exaggeration    Learning rate    n_iter
50            40                    600              1,000
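The parameter table and the TF-IDF, LSA, t-SNE sequence described in the abstract can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the source does not say which software was used, so scikit-learn is an assumption; the synthetic three-topic corpus and the choice of 10 LSA components are hypothetical. The table's iteration count of 1,000 matches scikit-learn's default, so it is left unset here because the keyword name differs between library versions.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import TSNE

# Hypothetical stand-in corpus: 3 topics x 40 short synthetic "abstracts".
rng = np.random.default_rng(0)
topics = {
    "robotics": ["robot", "motion", "planning", "control", "grasp", "sensor"],
    "databases": ["query", "index", "storage", "transaction", "schema", "join"],
    "vision": ["image", "pixel", "camera", "segmentation", "object", "scene"],
}
docs, labels = [], []
for name, words in topics.items():
    for _ in range(40):
        docs.append(" ".join(rng.choice(words, size=12)))
        labels.append(name)

# Step 1: TF-IDF features (sparse, one column per vocabulary term).
tfidf = TfidfVectorizer().fit_transform(docs)

# Step 2: LSA as a truncated SVD of the TF-IDF matrix.
# The number of components is an assumption, not a value from the paper.
lsa = TruncatedSVD(n_components=10, random_state=0).fit_transform(tfidf)

# Step 3: t-SNE with the perplexity, early exaggeration and learning rate
# from the table. Perplexity must be smaller than the number of samples.
tsne = TSNE(
    n_components=2,
    perplexity=50.0,
    early_exaggeration=40.0,
    learning_rate=600.0,
    init="random",
    random_state=0,
)
coords = tsne.fit_transform(lsa)
print(lsa.shape, coords.shape)
```

Setting `n_components=3` in the `TSNE` call would give coordinates for a three-dimensional map instead. Plotting `coords` coloured by `labels` is the kind of check the validation workflow describes: points sharing a topic label should form visibly separate groups.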
Figure: Comparison of t-SNE visualizations for the two types of text features
Figure: Interactive three-dimensional map of funded projects