New Technology of Library and Information Service  2016, Vol. 32 Issue (3): 41-49    DOI: 10.11925/infotech.1003-3513.2016.03.06
Research Paper
Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites*
Zhao Yiping, Bi Qiang
School of Management, Jilin University, Changchun 130022, China
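Abstract (Chinese original, translated)

[Objective] To exploit the strengths of linked data (machine readability, semantic representation, relational description, and its nature as a Web resource) to compensate for weaknesses in how academic resource websites organize information, and thereby to support the discovery of similar documents. [Methods] Latent semantic analysis is used to compute the overall similarity of the documents published on an academic resource website. Hierarchical clustering is used to determine a similarity threshold for filtering, which produces a document relation matrix. On that basis, dynamic document technology is used to construct linked data for the academic resource website, supporting semantic retrieval of related documents. [Results] A preliminary linked data implementation for academic resource websites with a similar-document query function was built. It makes it easy to obtain the documents highly related to any given document and helps discover similar documents efficiently. [Limitations] Similar documents are discovered only from a statistical perspective. Deeper discovery that draws on the knowledge system, semantic content, and organization of the document collection awaits further research. [Conclusions] Computing document similarity with latent semantic analysis effectively discovers similar documents. Recording the similarity relations in linked data supports semantic retrieval of precise similar documents and can substantially reduce the latency of real-time similarity computation.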

Keywords: linked data; latent semantic analysis; academic resource websites; similarity
Abstract

[Objective] This paper uses linked data, which is machine-readable, semantically expressive, and relationally descriptive, to improve the information organization of academic resource websites (ARWs) and to support the discovery of similar documents. [Methods] We first calculated the overall similarity of the documents published on the ARWs using Latent Semantic Analysis (LSA). We then used hierarchical clustering to determine a similarity threshold, filtered the documents by that threshold, and created a document relation matrix. Finally, we used dynamic document technology to generate ARW linked data that supports semantic retrieval of related documents. [Results] We built preliminary ARW linked data with a similar-document query function, which makes it easy to find the documents highly related to any given document. [Limitations] We approached similar-document discovery only from a statistical perspective. Further research is needed on deeper discovery that draws on the knowledge system, semantic content, and organization of the document collection. [Conclusions] Computing document similarity with LSA effectively discovers similar documents. Recording the similarity relations in linked data supports semantic retrieval of precise similar documents and substantially reduces the delay of real-time similarity calculation.
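The similarity and filtering steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the use of Python and NumPy, the toy term-document counts, the rank `k=2`, and the function names are all assumptions made for illustration. In particular, the fixed threshold of 0.8 is a placeholder; the paper derives its threshold from hierarchical clustering, which is not reproduced here.

```python
import numpy as np


def lsa_similarity(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarity between documents in a rank-k latent space."""
    # SVD: term_doc = U @ diag(s) @ Vt; each column of Vt belongs to one document.
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates in the latent space, one row per document.
    docs = (s[:k, None] * vt[:k, :]).T
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero document vectors
    unit = docs / norms
    return unit @ unit.T


def relation_matrix(sim: np.ndarray, threshold: float) -> np.ndarray:
    """Binary document relation matrix: 1 where similarity >= threshold."""
    rel = (sim >= threshold).astype(int)
    np.fill_diagonal(rel, 0)  # a document is not listed as similar to itself
    return rel


# Toy term-document counts (assumed data): rows are terms, columns are documents.
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

sim = lsa_similarity(counts, k=2)
rel = relation_matrix(sim, threshold=0.8)  # 0.8 is a placeholder threshold
print(rel)
```

In this toy input, documents 0 and 1 share one vocabulary and documents 2 and 3 share another, so the relation matrix links each pair and nothing else.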

Key words: Linked data; Latent Semantic Analysis (LSA); Academic Resource Websites (ARWs); Similarity
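The abstract's last step, recording similarity relations as linked data so that similar documents can be looked up rather than recomputed, might be sketched as follows. The abstract does not state which vocabulary or URI scheme the authors used, so the `example.org` document URIs, the `similarTo` predicate, and both function names are hypothetical. The output format is N-Triples, chosen here only because it is simple to write by hand.

```python
def relation_to_ntriples(rel, doc_uris, predicate):
    """Serialize a binary document relation matrix as N-Triples lines."""
    lines = []
    for i, row in enumerate(rel):
        for j, flag in enumerate(row):
            if flag and i != j:
                lines.append(f"<{doc_uris[i]}> <{predicate}> <{doc_uris[j]}> .")
    return lines


def similar_documents(triples, subject_uri):
    """Look up the precomputed similar documents of one subject URI."""
    prefix = f"<{subject_uri}> "
    return [t.split(" ")[2].strip("<>") for t in triples if t.startswith(prefix)]


# Hypothetical relation matrix, URIs, and predicate (all assumptions).
rel = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
doc_uris = [f"http://example.org/doc/{n}" for n in (1, 2, 3)]
SIMILAR_TO = "http://example.org/vocab#similarTo"

triples = relation_to_ntriples(rel, doc_uris, SIMILAR_TO)
for line in triples:
    print(line)
```

Because the links are stored ahead of time, answering "which documents are similar to this one?" becomes a lookup over the triples instead of a fresh similarity computation, which is the latency benefit the conclusion describes.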
Received: 2015-08-13
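Funding: *This work is one of the research outcomes of the National Natural Science Foundation of China project "Research on Multi-dimensional Aggregation and Visualization of Digital Library Resources in the Semantic Web Environment" (No. 71273111) and of the Jilin University Peak Discipline (Group) Construction Project.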
Cite this article:
Zhao Yiping, Bi Qiang. Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites[J]. New Technology of Library and Information Service, 2016, 32(3): 41-49. DOI: 10.11925/infotech.1003-3513.2016.03.06.
Link to this article:
http://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.03.06