New Technology of Library and Information Service, 2014, Vol. 30, Issue 10: 84-92     https://doi.org/10.11925/infotech.1003-3513.2014.10.13
Information Analysis and Research
Application of Improved TFIDF Algorithm in Mining Potential Cooperation Relationship
Sun Hongfei (孙鸿飞), Hou Wei (侯伟)
School of Economics and Management, Northeast Dianli University, Jilin 132012, China
Abstract

[Objective] To remedy the shortcomings of traditional methods for mining potential cooperation relationships and to improve the mining results. [Methods] After analyzing the shortcomings of the simple calculation method, the minimum value calculation method, and the traditional TFIDF algorithm, the paper proposes an improved TFIDF algorithm and applies it to mining potential cooperation relationships. [Results] The sample data are papers on the application of information science research methods published over the past five years in 19 library and information science journals listed in the "Chinese Core Journal of Peking University Directory (2012 Edition)". On this sample, the simple calculation method and the minimum value calculation method are strongly affected by the number of papers an author has published, and the results of the traditional TFIDF algorithm are hard to convert from potential cooperation relationships into actual ones, whereas the improved TFIDF algorithm performs markedly better in this respect. [Limitations] The improved TFIDF algorithm does not consider how the order of authors within a paper affects potential cooperation relationships. [Conclusions] Comparing and evaluating the four sets of mining results shows that the improved TFIDF algorithm is more scientific and has greater advantages and practical value than the other traditional methods.

Keywords: Improved TFIDF algorithm; Potential cooperation relationship; Data mining; Coupling analysis
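The abstract names three baseline measures without defining them. The sketch below illustrates one common reading of each in author keyword coupling, and is not the paper's improved TFIDF algorithm, which this page does not specify. The definitions used here are assumptions: the simple method counts distinct shared keywords, the minimum value method sums the smaller of the two authors' frequencies for each shared keyword, and the traditional TFIDF method compares textbook TF-IDF keyword vectors by cosine similarity. The authors `A`, `B`, `C` and their keywords are hypothetical toy data.

```python
import math
from collections import Counter

# Hypothetical toy data: author -> keywords collected from that author's papers.
author_keywords = {
    "A": ["citation analysis", "citation analysis", "co-word analysis", "bibliometrics"],
    "B": ["citation analysis", "bibliometrics", "bibliometrics"],
    "C": ["content analysis", "bibliometrics"],
}

# Keyword frequency per author.
counts = {author: Counter(kws) for author, kws in author_keywords.items()}


def simple_coupling(c1, c2):
    """Assumed 'simple calculation': number of distinct shared keywords."""
    return len(c1.keys() & c2.keys())


def minimum_coupling(c1, c2):
    """Assumed 'minimum value calculation': sum of min frequencies over shared keywords."""
    return sum(min(c1[k], c2[k]) for k in c1.keys() & c2.keys())


def tfidf_vectors(counts):
    """Textbook TF-IDF, treating each author as one 'document' of keywords."""
    n_authors = len(counts)
    # Document frequency: how many authors use each keyword.
    df = Counter(k for c in counts.values() for k in c)
    vectors = {}
    for author, c in counts.items():
        total = sum(c.values())
        vectors[author] = {
            k: (freq / total) * math.log(n_authors / df[k]) for k, freq in c.items()
        }
    return vectors


def cosine(v1, v2):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(v1[k] * v2[k] for k in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


vectors = tfidf_vectors(counts)

for a, b in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(
        a, b,
        simple_coupling(counts[a], counts[b]),
        minimum_coupling(counts[a], counts[b]),
        round(cosine(vectors[a], vectors[b]), 3),
    )
```

In this toy data, authors A and C share only "bibliometrics", which every author uses, so its inverse document frequency is zero and their TF-IDF similarity vanishes even though both count-based measures report a link. Count-based measures also grow with the number of keywords an author has accumulated, which is consistent with the abstract's remark that they depend heavily on author productivity.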
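Received: 2014-04-10      Published: 2014-11-28
CLC number: G350
Corresponding author: Sun Hongfei     E-mail: sunny_bird@126.com
Author contributions: Sun Hongfei proposed the research idea and the paper framework, designed the research plan, and revised the final version of the paper; Hou Wei collected and analyzed the data and drafted the paper.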
Cite this article:
Sun Hongfei, Hou Wei. Application of Improved TFIDF Algorithm in Mining Potential Cooperation Relationship [J]. New Technology of Library and Information Service, 2014, 30(10): 84-92.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.10.13      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I10/84


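Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing 100190, China
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn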