New Technology of Library and Information Service  2015, Vol. 31 Issue (7-8): 148-154    DOI: 10.11925/infotech.1003-3513.2015.07.20
Practice of Data Collection in Building Characteristic Digital Resources Based on Drupal
Li Dan, Yan Xiaodi, Wei Qingshan
Xi'an Jiaotong University Library, Xi'an 710049, China

[Objective] To address problems in building characteristic databases, in particular the difficulty of collecting Web data and of integrating multiple types of digital resources. [Context] Information in characteristic digital resources has a short life span, and the heterogeneous database platforms in Shaanxi differ greatly from one another, offer limited RSS interface support, and contain complex data formats. [Methods] Web data collection techniques such as Drupal Feeds, XPath Parser, Crawls, and Image Grabber are combined with data cleaning and removal to make Web data collection specialized and systematic. [Results] The study explores RSS collection with Feeds, automatic acquisition of HTML/XML, rules customized to the characteristics of different resources, and Web streaming media collection. [Conclusions] The approach enriches the platform's data sources and offers partial solutions to problems such as difficult data collection, non-standardized data formats, and limited data source channels.
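The collection step named under [Methods] can be illustrated with a minimal sketch. It is not the authors' Drupal configuration: it uses Python's standard `xml.etree.ElementTree` as a stand-in for the Feeds and XPath Parser modules, and the sample feed, the `clean_text` and `collect_items` helpers, and the rule of dropping items with a repeated link are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 sample standing in for a regional database feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>  Shadow   puppetry archive </title>
      <link>http://example.org/a</link>
    </item>
    <item>
      <title>Shadow puppetry archive</title>
      <link>http://example.org/a</link>
    </item>
    <item>
      <title>Qinqiang opera recordings</title>
      <link>http://example.org/b</link>
    </item>
    <item>
      <title></title>
      <link>http://example.org/c</link>
    </item>
  </channel>
</rss>"""


def clean_text(value):
    """Collapse runs of whitespace and trim; None becomes an empty string."""
    return " ".join((value or "").split())


def collect_items(rss_text):
    """Select items with an XPath expression, then clean and filter them.

    Assumed cleaning rules: skip items with an empty title or link,
    and skip items whose link has already been collected.
    """
    root = ET.fromstring(rss_text)
    seen_links = set()
    records = []
    for item in root.findall("./channel/item"):
        title = clean_text(item.findtext("title"))
        link = clean_text(item.findtext("link"))
        if not title or not link or link in seen_links:
            continue
        seen_links.add(link)
        records.append({"title": title, "link": link})
    return records


records = collect_items(SAMPLE_RSS)
for record in records:
    print(record["title"], record["link"])
```

On the sample feed, the duplicate item and the item with an empty title are dropped, leaving two records. In a Drupal deployment the XPath expressions would be entered in the parser's mapping settings rather than written as code.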

Received: 16 December 2014      Published: 25 August 2015
CLC Number: G250.7

Cite this article:

Li Dan, Yan Xiaodi, Wei Qingshan. Practice of Data Collection in Building Characteristic Digital Resources Based on Drupal. New Technology of Library and Information Service, 2015, 31(7-8): 148-154.



[1] Wang Sili, Zhu Zhongming, Yang Heng, Liu Wei. Research on Automatic Identification of Hypernym-Hyponym Relations of Domain Concepts Based on Pattern and Projection Learning [J]. Data Analysis and Knowledge Discovery, 0, (): 1-.
[2] Guo Shaoqing, Le Xiaoqiu. Identifying Actual Value of Numerical Indicator from Scientific Paper [J]. Data Analysis and Knowledge Discovery, 2018, 2(1): 21-28.
[3] Chen Guo, Xiao Lu. Linking Knowledge Elements from Online Community [J]. Data Analysis and Knowledge Discovery, 2017, 1(11): 75-83.
[4] Yin Xiangquan, Li Shuning. Analyzing Website Navigation Features of Top U.S. Academic Libraries [J]. Data Analysis and Knowledge Discovery, 2017, 1(3): 90-95.
[5] Sun Yi'nan, Ku Liping, Song Xiufang, Liu Jingjing, Jiang Xian. The Policy Research and Analysis of Subject Data Repository: Cases Study of Life Sciences [J]. New Technology of Library and Information Service, 2015, 31(12): 13-20.
[6] Bi Qiang, Liu Jian. Research on the Service Recommendation of the Content of Digital Literature Resources [J]. New Technology of Library and Information Service, 2015, 31(12): 21-27.
[7] Zhu Guang. Copyright Protection Scheme of Color Images for Libraries, Museums and Archives Based on Zero-Watermarking [J]. New Technology of Library and Information Service, 2015, 31(12): 89-94.
[8] Liu Yueru, Guo Limin. The New Utilizes of WeChat Platform with Interactive Functions [J]. New Technology of Library and Information Service, 2015, 31(11): 104-109.
[9] Liu Dan. Personalized Book Recommender Service Deployment Using Apache Mahout [J]. New Technology of Library and Information Service, 2015, 31(10): 102-108.
[10] Guo Zhenying, Zhao Wenbing, Wei Yuhui. Construction of Linked Data with Lightweight Book Bibliography Ontology [J]. New Technology of Library and Information Service, 2015, 31(7-8): 139-143.
[11] Guo Limin, Liu Yueru, Xiang Mingqiong. Application of WeChat QR Code in Reader Authentication [J]. New Technology of Library and Information Service, 2015, 31(7-8): 144-147.
[12] Zhou Yao, Liu Chang, Li Jiandong. Application of WeChat for Library Seat Reservation: Taking Northwest University for Nationalities as an Example [J]. New Technology of Library and Information Service, 2015, 31(7-8): 155-159.
[13] Shi Hongbo, Qian Li, Zhang Xiaolin, Liang Na. Router Service Engine iSwitch for Open Access Articles: Articles Reception and Resolving [J]. New Technology of Library and Information Service, 2015, 31(6): 1-6.
[14] Wang Ying, Wu Zhenxin, Xie Jing. Review on Semantic Retrieval System for Scientific Literature [J]. New Technology of Library and Information Service, 2015, 31(5): 1-7.
[15] Bai Haiyan, Liu Yao, Guo Xiaofeng. Introduction of Construction Mechanism of New Contributor Identifier System ORCID [J]. New Technology of Library and Information Service, 2015, 31(5): 8-14.