New Technology of Library and Information Service  2016, Vol. 32 Issue (3): 41-49    DOI: 10.11925/infotech.1003-3513.2016.03.06
Original Article
Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites
Zhao Yiping, Bi Qiang
School of Management, Jilin University, Changchun 130022, China
Abstract  

[Objective] This paper examines whether linked data from the Web, which is machine-readable, semantically meaningful, and relationally descriptive, can improve the information organization of academic resource websites (ARWs) so that more similar documents can be retrieved. [Methods] We first calculated the similarity of documents published on the ARWs using Latent Semantic Analysis (LSA). We then selected highly similar documents by hierarchical clustering and created a document relation matrix. Finally, we used dynamic document technology to generate a linked data index for searching the ARWs. [Results] We built a preliminary linked data index for the ARWs, which helped us find similar documents on the ARWs more effectively. [Limitations] We investigated similar-document retrieval only from the perspective of statistical analysis. Further research is needed to locate similar documents across subject areas with the support of deep learning. [Conclusions] Computing document similarity with LSA reveals the documents related to a specific article. The linked data index helps find more similar documents while reducing the waiting time for similarity calculation.
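The steps in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the four-document corpus, the `http://example.org/doc/` base URI, the use of `rdfs:seeAlso` as the relation predicate, and the 0.8 threshold are all assumptions made for illustration. A plain similarity threshold stands in for the hierarchical clustering step, and a small power iteration stands in for the singular value decomposition that a linear algebra library would normally provide.

```python
import math
import random


def term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term counts."""
    tokens = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    return [[doc.count(term) for doc in tokens] for term in vocab]


def top_eigenpairs(sym, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration, deflation)."""
    n = len(sym)
    m = [row[:] for row in sym]
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * mv[i] for i in range(n))
        pairs.append((lam, v))
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_doc_vectors(docs, k):
    """Rank-k LSA coordinates of each document.

    With A = U S V^T, the Gram matrix A^T A has eigenvectors V and
    eigenvalues S^2, so document i has coordinates S_r * V[i][r].
    """
    a = term_document_matrix(docs)
    n = len(docs)
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)]
            for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(n)]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similar_document_triples(docs, k=2, threshold=0.8,
                             base="http://example.org/doc/"):
    """Emit one N-Triples line per ordered pair of similar documents.

    The base URI is hypothetical; the threshold replaces the clustering step.
    """
    see_also = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
    vecs = lsa_doc_vectors(docs, k)
    triples = []
    for i in range(len(docs)):
        for j in range(len(docs)):
            if i != j and cosine(vecs[i], vecs[j]) >= threshold:
                triples.append(f"<{base}{i}> <{see_also}> <{base}{j}> .")
    return triples


# Toy corpus (assumed): two documents per topic, disjoint vocabularies.
docs = [
    "linked data semantic web",
    "semantic web linked data index",
    "hierarchical cluster analysis",
    "cluster analysis dendrogram",
]
triples = similar_document_triples(docs)
print("\n".join(triples))
```

Because the triples are computed once and stored, a lookup of similar documents becomes a read of the index rather than a fresh similarity computation, which is the waiting-time saving the abstract refers to.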

Keywords: Linked data; Latent Semantic Analysis (LSA); Academic Resource Websites (ARWs); Similarity
Received: 13 August 2015      Published: 12 April 2016

Cite this article:

Zhao Yiping, Bi Qiang. Using Linked Data to Retrieve Similar Documents from the Academic Resource Websites. New Technology of Library and Information Service, 2016, 32(3): 41-49.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.1003-3513.2016.03.06     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2016/V32/I3/41
