New Technology of Library and Information Service, 2016, Vol. 32, Issue (3): 41-49. DOI: 10.11925/infotech.1003-3513.2016.03.06
Original Article
Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites
Zhao Yiping, Bi Qiang
School of Management, Jilin University, Changchun 130022, China
Abstract  

[Objective] This paper studies linked data on the Web, which is machine-readable, semantically meaningful, and able to describe relationships between resources. We examine whether such data can improve the information organization of academic resource websites (ARWs) so that more similar documents can be retrieved. [Methods] We first calculated the similarity of documents published on the ARWs with Latent Semantic Analysis (LSA). We then selected highly similar documents with hierarchical clustering and built a document relation matrix. Finally, we used dynamic document technology to generate a linked data index for searching the ARWs. [Results] We built a preliminary linked data index for the ARWs, which helped us find similar documents on them more effectively. [Limitations] We approached similar-document retrieval from a statistical-analysis perspective; further research is needed to locate similar documents across various subject areas with the support of deep learning. [Conclusions] Computing document similarity with LSA lets us discover documents related to a given article. Linked data helps retrieve more similar documents while reducing the waiting time for similarity calculation.
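The pipeline summarized in [Methods] (LSA similarity, hierarchical clustering, a document relation matrix, then a linked data index) can be sketched in Python with NumPy. This is not the authors' implementation: the toy term-document matrix, the choice of average linkage, the 0.5 similarity cutoff, the `dcterms:relation` predicate, and the hypothetical `http://example.org/doc/` URIs are all assumptions made for illustration, and the paper's dynamic-document step is replaced here by simply emitting N-Triples.

```python
import numpy as np


def lsa_similarity(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Cosine similarity between documents in a k-dimensional LSA space.

    term_doc has one row per term and one column per document
    (raw counts or tf-idf weights).
    """
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Document coordinates: V_k scaled by the top-k singular values
    # (one common LSA convention; some variants use V_k alone).
    docs = (np.diag(s[:k]) @ vt[:k, :]).T
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero (empty) documents
    unit = docs / norms
    return unit @ unit.T


def average_linkage_clusters(sim: np.ndarray, threshold: float) -> list:
    """Agglomerative clustering with average linkage on a similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity; stops when that mean falls below `threshold`.
    """
    clusters = [[i] for i in range(sim.shape[0])]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = sim[np.ix_(clusters[a], clusters[b])].mean()
                if score > best:
                    best, pair = score, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


def relation_matrix(n_docs: int, clusters: list) -> np.ndarray:
    """Binary relation matrix: 1 where two distinct documents share a cluster."""
    rel = np.zeros((n_docs, n_docs), dtype=int)
    for members in clusters:
        for i in members:
            for j in members:
                if i != j:
                    rel[i, j] = 1
    return rel


def to_ntriples(rel: np.ndarray, base_uri: str) -> list:
    """Emit one dcterms:relation triple per related (ordered) document pair.

    base_uri is a hypothetical namespace for document identifiers.
    """
    pred = "<http://purl.org/dc/terms/relation>"
    rows, cols = np.nonzero(rel)
    return [f"<{base_uri}{i}> {pred} <{base_uri}{j}> ." for i, j in zip(rows, cols)]


# Toy term-document matrix; rows = linked, data, semantic, cluster, patent.
term_doc = np.array([
    [2, 1, 0, 0],
    [2, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

sim = lsa_similarity(term_doc, k=2)
clusters = average_linkage_clusters(sim, threshold=0.5)
rel = relation_matrix(term_doc.shape[1], clusters)
triples = to_ntriples(rel, "http://example.org/doc/")
```

On this toy matrix, documents 0 and 1 (sharing "linked"/"data" terms) and documents 2 and 3 (sharing "cluster"/"patent" terms) each end up in one cluster, so the relation matrix links only those pairs. Average linkage is used only because it is easy to follow; SciPy's `scipy.cluster.hierarchy.linkage` and `fcluster` offer library implementations of several linkage criteria, and on real corpora `k` would be far larger than 2.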

Key words: Linked data; Latent Semantic Analysis (LSA); Academic Resource Websites (ARWs); Similarity
Received: 13 August 2015      Published: 12 April 2016

Cite this article:

Zhao Yiping,Bi Qiang. Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites. New Technology of Library and Information Service, 2016, 32(3): 41-49.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.03.06     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I3/41

Related articles:

[1] Peng Guan, Yuefen Wang, Zhu Fu. Analyzing Topic Semantic Evolution with LDA: Case Study of Lithium Ion Batteries[J]. Data Analysis and Knowledge Discovery, 2019, 3(7): 61-72.
[2] Peiyao Zhang, Dongsu Liu. Topic Evolutionary Analysis of Short Text Based on Word Vector and BTM[J]. Data Analysis and Knowledge Discovery, 2019, 3(3): 95-101.
[3] Dan Wu, Liuxing Lu. Semantic Changes of Queries from Cross-device Searching[J]. Data Analysis and Knowledge Discovery, 2018, 2(8): 69-78.
[4] Haixia Sun, Lei Wang, Yingjie Wu, Weina Hua, Junlian Li. Matching Strategies for Institution Names in Literature Database[J]. Data Analysis and Knowledge Discovery, 2018, 2(8): 88-97.
[5] Ya’nan Zhao, Yuqing Wang. Research on Collaborative Filtering Traveling Products Recommendation Algorithm Based on IUNCF[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 63-71.
[6] Mansheng Xiao, Lijuan Zhou, Zhicheng Wen. A Fuzzy C-Means Algorithm Based on Huffman Tree[J]. Data Analysis and Knowledge Discovery, 2018, 2(7): 81-88.
[7] Daoping Wang, Zhongyang Jiang, Boqing Zhang. Collaborative Filtering Algorithm Based on Gray Correlation Analysis and Time Factor[J]. Data Analysis and Knowledge Discovery, 2018, 2(6): 102-109.
[8] Lin Li, Hui Li. Computing Text Similarity Based on Concept Vector Space[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 48-58.
[9] Yong Wang, Yongdong Wang, Huifang Guo, Yumin Zhou. Measuring Item Similarity Based on Increment of Diversity[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 70-76.
[10] Lingfeng Hua, Gaoming Yang, Xiujun Wang. Recommending Diversified News Based on User’s Locations[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 94-104.
[11] Junwan Liu, Bo Yang, Feifei Wang. Ranking Scholarly Impacts Based on Citations and Academic Similarity[J]. Data Analysis and Knowledge Discovery, 2018, 2(4): 59-70.
[12] Yuying Wu, Ping Sun, Xijun He, Guorui Jiang. Predicting Transactions Among Agents in Patent Transfer Weighted Networks for New Energy[J]. Data Analysis and Knowledge Discovery, 2018, 2(11): 73-79.
[13] Jianmin Xu, Caiyun Xu. Computing Similarity of Sci-Tech Documents Based on Texts and Formulas[J]. Data Analysis and Knowledge Discovery, 2018, 2(10): 103-109.
[14] Zhihong Shen, Chang Yao, Yanfei Hou, Linhuan Wu, Yuepeng Li. Big Linked Data Management: Challenges, Solutions and Practices[J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 9-20.
[15] Erjing Chen, Enbo Jiang. Review of Studies on Text Similarity Measures[J]. Data Analysis and Knowledge Discovery, 2017, 1(6): 1-11.
  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn