[Objective] This paper studies linked data from the Web, which are machine-readable, semantically meaningful and relationally descriptive. We examine whether these data can improve the information organization of academic resource websites (ARWs) so that more similar documents can be retrieved. [Methods] We first calculated the similarity of documents published on the ARWs using Latent Semantic Analysis (LSA). We then selected highly similar documents by hierarchical clustering and created a document relation matrix. Finally, we used dynamic document technology to generate a linked data index for searching the ARWs. [Results] We built a preliminary linked data index for the ARWs, which helped us find similar documents on the ARWs more effectively. [Limitations] We investigated similar-document retrieval only from the perspective of statistical analysis. Further research is therefore needed to locate similar documents across subject areas with the support of deep learning technology. [Conclusions] Computing document similarity with LSA reveals the documents related to a specific article. The linked data index helps find more similar documents while reducing the waiting time for similarity calculation.
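The first two steps of the Methods (LSA similarity, then hierarchical clustering into a document relation matrix) can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything below is an assumption made for illustration: the toy term-document matrix, the choice of two latent dimensions, average linkage, the 0.5 distance cut, and all function names. The truncated SVD behind LSA is approximated in pure Python by power iteration on the document Gram matrix, not by a library routine.

```python
import math

# Hypothetical toy term-document matrix: rows are terms, columns are documents.
# Documents 0-1 share "linked data" terms; documents 2-3 share "clustering" terms.
TERM_DOC = [
    [2, 1, 0, 0],  # linked
    [1, 2, 1, 0],  # data
    [1, 1, 0, 0],  # rdf
    [0, 0, 2, 1],  # cluster
    [0, 0, 1, 1],  # distance
    [0, 0, 0, 2],  # tree
]

def gram(term_doc):
    """Document-by-document Gram matrix A^T A of the term-document matrix A."""
    n_docs = len(term_doc[0])
    return [[sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)]
            for i in range(n_docs)]

def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    size = len(sym)
    mat = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(size)]  # fixed start vector (assumption)
        norm = math.sqrt(sum(x * x for x in vec))
        vec = [x / norm for x in vec]
        for _ in range(iters):
            nxt = [sum(mat[i][j] * vec[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue (a squared singular value of A).
        lam = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(size))
                  for i in range(size))
        pairs.append((lam, vec))
        for i in range(size):  # deflate so the next pass finds the next eigenvector
            for j in range(size):
                mat[i][j] -= lam * vec[i] * vec[j]
    return pairs

def lsa_doc_vectors(term_doc, k):
    """Document coordinates in the k-dimensional latent space (rows of V_k Sigma_k)."""
    pairs = top_eigenpairs(gram(term_doc), k)
    return [[math.sqrt(max(lam, 0.0)) * vec[d] for lam, vec in pairs]
            for d in range(len(term_doc[0]))]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0

def average_linkage(dist, cut):
    """Agglomerative clustering; stop once the closest pair is farther than cut."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > cut:
            break
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return sorted(clusters)

doc_vecs = lsa_doc_vectors(TERM_DOC, k=2)
n = len(doc_vecs)
sim = [[cosine(doc_vecs[i], doc_vecs[j]) for j in range(n)] for i in range(n)]
dist = [[1.0 - sim[i][j] for j in range(n)] for i in range(n)]
clusters = average_linkage(dist, cut=0.5)

# Document relation matrix: 1 where two distinct documents share a cluster.
relation = [[0] * n for _ in range(n)]
for group in clusters:
    for i in group:
        for j in group:
            if i != j:
                relation[i][j] = 1
```

Because `relation` is computed once and ahead of time, an index built from it could return related documents without recomputing similarities at query time. This is one plausible reading of the reduced waiting time mentioned in the Conclusions, not a description of the authors' system.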
Zhao Yiping, Bi Qiang. Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites[J]. New Technology of Library and Information Service, 2016, 32(3): 41-49.