The Penalized Matrix Decomposition Method of Extracting Core Characteristic Words: Taking Co-word Analysis as an Example

Yu Xianzi (俞仙子)¹, Gao Yinglian (高英莲)², Ma Chunxia (马春霞)¹, Liu Jinxing (刘金星)¹

1 Department of Information Technology and Communication, Qufu Normal University, Rizhao 276826, China;
2 Library of Qufu Normal University, Rizhao 276826, China

New Technology of Library and Information Service (现代图书情报技术), 2014, Vol. 30, Issue 3: 88-95

doi:10.11925/infotech.1003-3513.2014.03.13 https://doi.org/10.11925/infotech.1003-3513.2014.03.13

Section: Intelligence Analysis and Research (情报分析与研究)

Abstract

[Objective] To sparsify and reduce the dimensionality of the high-dimensional co-word matrix used in co-word analysis, so that its core characteristic words stand out quickly and intuitively. [Methods] The paper proposes a method for extracting core characteristic words from text based on penalized matrix decomposition (PMD). The experiments use literature on university libraries' use of social networks, and the co-word matrix built from it is decomposed and reduced by PMD in Matlab R2012a. [Results] PMD extracts 65 core characteristic words from 1,648 characteristic words, more than the 34 extracted by principal component analysis (PCA), and reveals the research hotspots in university libraries' use of social networks. [Limitations] The characteristic words extracted in the experiment do not fully cover the topic of university libraries' use of social networks, and their selection is somewhat subjective. [Conclusions] After PMD sparsifies the high-dimensional co-word matrix, the resulting core characteristic words are easier to understand and interpret, and they can also indicate some marginal topics.

Keywords: penalized matrix decomposition (PMD), core characteristic word extraction, principal component analysis (PCA)
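The abstract describes sparsifying a co-word matrix with PMD so that only a few words keep nonzero weight. The following is a minimal Python sketch of that idea, not the authors' Matlab implementation. It uses the rank-1 PMD with L1 penalties on both factors as formulated by Witten, Tibshirani, and Hastie (2009): maximize uᵀXv subject to ‖u‖₂ ≤ 1, ‖v‖₂ ≤ 1, ‖u‖₁ ≤ c_u, ‖v‖₁ ≤ c_v. The toy documents, the helper names (`build_coword_matrix`, `sparse_unit_vector`, `pmd_rank1`), the bounds `c_u = c_v = 1.5`, the choice to put word frequencies on the diagonal, and the rule "core words are the nonzero entries of u" are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def build_coword_matrix(docs):
    """Symmetric co-word matrix: off-diagonal entries are co-occurrence counts;
    the diagonal holds each word's document frequency (an assumed convention)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for doc in docs:
        words = sorted(set(doc))
        for w in words:
            C[index[w], index[w]] += 1
        for a, b in combinations(words, 2):
            C[index[a], index[b]] += 1
            C[index[b], index[a]] += 1
    return vocab, C


def soft_threshold(a, delta):
    """Shrink every entry of `a` toward zero by `delta`."""
    return np.sign(a) * np.maximum(np.abs(a) - delta, 0.0)


def sparse_unit_vector(a, c, n_iter=60):
    """Soft-threshold `a` and rescale to unit L2 norm so that the L1 norm
    is at most c (assumes c >= 1). The threshold is found by bisection."""
    norm = np.linalg.norm(a)
    if norm == 0:
        return a.copy()
    if np.abs(a).sum() <= c * norm:
        return a / norm  # constraint already satisfied, no shrinkage needed
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        s_norm = np.linalg.norm(s)
        if s_norm == 0 or np.abs(s).sum() <= c * s_norm:
            hi = mid  # threshold is large enough, try a smaller one
        else:
            lo = mid
    s = soft_threshold(a, hi)
    s_norm = np.linalg.norm(s)
    if s_norm == 0:
        # Degenerate case (ties at the maximum): keep one largest entry.
        s = np.zeros_like(a)
        k = np.argmax(np.abs(a))
        s[k] = np.sign(a[k])
        return s
    return s / s_norm


def pmd_rank1(X, c_u, c_v, n_iter=100, tol=1e-8):
    """Rank-1 PMD(L1, L1) by alternating updates of u and v."""
    v = np.linalg.svd(X)[2][0]  # start from the leading right singular vector
    for _ in range(n_iter):
        u = sparse_unit_vector(X @ v, c_u)
        v_new = sparse_unit_vector(X.T @ u, c_v)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return u @ X @ v, u, v


# Hypothetical toy corpus: each document is a list of keywords.
docs = [
    ["library", "social network", "microblog"],
    ["library", "social network", "reader service"],
    ["library", "social network", "marketing"],
    ["library", "microblog"],
    ["social network", "privacy"],
    ["library", "social network"],
]
vocab, C = build_coword_matrix(docs)
d, u, v = pmd_rank1(C, c_u=1.5, c_v=1.5)
core_words = [w for w, weight in zip(vocab, u) if weight != 0]
print(core_words)
```

Smaller values of `c_u` and `c_v` force more entries to exactly zero, which is what makes the surviving words easy to read off; a plain PCA loading vector, by contrast, is generally dense and needs a separate cutoff.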

Received: 2013-09-10; Published: 2014-04-15

CLC number: G250

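Funding:

This paper is one of the research outputs of the Qufu Normal University university-level fund project "Research on Advanced Modeling Methods for Multivariable Control" (Project No. XJ200947).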

Corresponding author: Yu Xianzi, E-mail: yuxianzi2010@163.com

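Author contributions: Yu Xianzi: data collection, cleaning, and analysis, and drafting of the paper; Gao Yinglian: data analysis and revision of the paper; Ma Chunxia: experiment debugging; Liu Jinxing: proposed the research idea, designed the research plan, and revised the paper.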

Cite this article:

Yu Xianzi, Gao Yinglian, Ma Chunxia, Liu Jinxing. The Penalized Matrix Decomposition Method of Extracting Core Characteristic Words: Taking Co-word Analysis as an Example[J]. New Technology of Library and Information Service, 2014, 30(3): 88-95.

Link to this article:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.03.13 or https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I3/88
