New Technology of Library and Information Service, 2016, Vol. 32, Issue (3): 41-49     https://doi.org/10.11925/infotech.1003-3513.2016.03.06
Research Paper
Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites*
Zhao Yiping, Bi Qiang
School of Management, Jilin University, Changchun 130022, China
Abstract

[Objective] To exploit the strengths of linked data (machine readability, semantic representation, relationship description, and its nature as a Web resource) to remedy shortcomings in how academic resource websites (ARWs) organize information, and thereby support the discovery of similar documents. [Methods] We compute the overall similarity of the documents published on an ARW with Latent Semantic Analysis (LSA), use hierarchical clustering to determine a similarity threshold for filtering document pairs, and generate a document relation matrix. On this basis, we use dynamic document technology to build linked data for the ARW that supports semantic retrieval of related documents. [Results] We built a preliminary implementation of ARW linked data with a similar-document query function, which conveniently returns the documents most closely related to any given document and helps users find similar documents efficiently. [Limitations] Similar documents are discovered only from a statistical perspective; deeper discovery that draws on the knowledge system, semantic content, and organization of the document collection requires further research. [Conclusions] Computing document similarity with LSA discovers similar documents effectively. Recording the similar-document links in linked data supports semantic retrieval of precise similar documents and greatly reduces the latency of real-time similarity computation.

Key words: Linked data; Latent Semantic Analysis (LSA); Academic Resource Websites (ARWs); Similarity
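The [Methods] pipeline (LSA similarity, threshold filtering, a document relation matrix, then linked data) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the term-document matrix holds raw term counts, the number of latent dimensions `k` and the similarity `threshold` are supplied by hand rather than derived by hierarchical clustering, and the output is plain N-Triples using a hypothetical `http://example.org/vocab/similarTo` predicate and hypothetical document URIs instead of pages generated with dynamic-document tools.

```python
import numpy as np


def lsa_doc_vectors(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Map each document (a column of term_doc) to a k-dimensional LSA vector.

    Truncated SVD: term_doc ~ U_k S_k V_k^T; document coordinates are the rows of V_k S_k.
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return vt[:k].T * s[:k]


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero documents as zero vectors
    unit = vectors / norms
    return unit @ unit.T


def relation_matrix(sim: np.ndarray, threshold: float) -> np.ndarray:
    """0/1 document relation matrix: 1 where similarity reaches the threshold."""
    rel = (sim >= threshold).astype(int)
    np.fill_diagonal(rel, 0)  # do not link a document to itself
    return rel


def to_ntriples(rel: np.ndarray, doc_uris: list[str], predicate: str) -> list[str]:
    """Serialize each related pair (i, j) as one N-Triples statement."""
    return [
        f"<{doc_uris[i]}> <{predicate}> <{doc_uris[j]}> ."
        for i, j in zip(*np.nonzero(rel))
    ]


if __name__ == "__main__":
    # Toy data (hypothetical): 3 terms x 3 documents; documents 0 and 1 share vocabulary.
    term_doc = np.array([[1, 1, 0],
                         [1, 1, 0],
                         [0, 0, 1]], dtype=float)
    sim = cosine_similarity_matrix(lsa_doc_vectors(term_doc, k=2))
    rel = relation_matrix(sim, threshold=0.5)  # threshold chosen by hand here
    uris = [f"http://example.org/doc/{n}" for n in range(3)]  # hypothetical URIs
    for line in to_ntriples(rel, uris, "http://example.org/vocab/similarTo"):
        print(line)
```

On the toy data, documents 0 and 1 come out with cosine similarity 1 and are linked in both directions, while document 2 gets no link. Storing the precomputed relations as triples means a query for the documents similar to a given one becomes a lookup rather than a fresh similarity computation, which is the latency reduction the Conclusions describe.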
Received: 2015-08-13      Published: 2016-04-12
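Funding: *This paper is one of the research outcomes of the National Natural Science Foundation of China project "Research on Multi-dimensional Aggregation and Visual Presentation of Digital Library Resources in the Semantic Web Environment" (Grant No. 71273111) and of the Jilin University Peak Discipline (Group) Construction Project.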
Cite this article:
Zhao Yiping, Bi Qiang. Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites. New Technology of Library and Information Service, 2016, 32(3): 41-49.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.03.06      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I3/41
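Related articles:
[1] 韩辉, 刘秀文. Research on Automatic Scoring of Subjective Questions in Maritime Competency Assessment[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 113-121.
[2] 刘文斌, 何彦青, 吴振峰, 董诚. Research on Sentence Alignment Based on BERT and Multi-Similarity Fusion[J]. Data Analysis and Knowledge Discovery, 2021, 5(7): 48-58.
[3] 闫强, 张笑妍, 周思敏. Keyword Extraction Method Based on Sememe Similarity[J]. Data Analysis and Knowledge Discovery, 2021, 5(4): 80-89.
[4] 向卓元, 刘志聪, 吴玉. Research on an Adaptive Recommendation Model Based on User Behavior[J]. Data Analysis and Knowledge Discovery, 2021, 5(4): 103-114.
[5] 吕学强, 罗艺雄, 李家全, 游新冬. A Review of Research on Chinese Patent Infringement Detection[J]. Data Analysis and Knowledge Discovery, 2021, 5(3): 60-68.
[6] 吴彦文, 蔡秋亭, 刘智, 邓云泽. Digital Resource Recommendation Integrating Multi-source Data and Scenario Similarity Computation[J]. Data Analysis and Knowledge Discovery, 2021, 5(11): 114-123.
[7] 盛嘉祺, 许鑫. Scholar Tag Expansion Method Integrating Topic Similarity and Co-authorship Networks[J]. Data Analysis and Knowledge Discovery, 2020, 4(8): 75-85.
[8] 徐以聪, 田学东, 李新福, 杨芳, 史青宣. Mathematical Expression Retrieval Based on Hesitant Fuzzy Weights[J]. Data Analysis and Knowledge Discovery, 2020, 4(7): 118-126.
[9] 苏庆, 陈思兆, 吴伟民, 李小妹, 黄佃宽. Personalized Learning Recommendation Model Based on a Learning-Status Collaborative Filtering Algorithm[J]. Data Analysis and Knowledge Discovery, 2020, 4(5): 105-117.
[10] 刘萍, 彭小芳. Computing Word Similarity Based on Formal Concept Analysis[J]. Data Analysis and Knowledge Discovery, 2020, 4(5): 66-74.
[11] 高原, 施元磊, 张蕾, 曹天奕, 冯筠. Reconstructing Tourist Itineraries from Travel Notes[J]. Data Analysis and Knowledge Discovery, 2020, 4(2/3): 165-172.
[12] 李家全, 李宝安, 游新冬, 吕学强. Computing Patent Term Similarity Based on a Patent Knowledge Graph[J]. Data Analysis and Knowledge Discovery, 2020, 4(10): 104-112.
[13] 俞琰, 陈磊, 姜金德, 赵乃瑄. Patent Similarity Measurement Combining Word Vectors and Statistical Features[J]. Data Analysis and Knowledge Discovery, 2019, 3(9): 53-59.
[14] 关鹏, 王曰芬, 傅柱. LDA-based Analysis of Topic Semantic Evolution: A Case Study of the Lithium-ion Battery Field[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 61-72.
[15] 张佩瑶, 刘东苏. Topic Evolution Analysis of Short Texts Based on Word Vectors and BTM[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 95-101.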
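Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postal code: 100190
Tel/Fax: (010)82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn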