New Technology of Library and Information Service  2015, Vol. 31 Issue (7-8): 148-154    DOI: 10.11925/infotech.1003-3513.2015.07.20
Practice of Data Collection in Building Characteristic Digital Resources Based on Drupal
Li Dan, Yan Xiaodi, Wei Qingshan
Xi'an Jiaotong University Library, Xi'an 710049, China

[Objective] To address problems in building characteristic databases, such as Web data collection and the difficulty of integrating multiple types of digital resources. [Context] Characteristic digital resource information has a short lifespan, and the heterogeneous database platforms in Shaanxi differ greatly from one another, offer limited RSS interface support, and contain complex data formats. [Methods] Web data collection technologies such as Drupal Feeds, XPath Parser, Crawls, and Image Grabber are combined with data cleaning and removal to make Web data collection specialized and systematic. [Results] The study explores Feeds-based RSS collection, automatic HTML/XML acquisition, collection rules tailored to the characteristics of different resources, and Web streaming media collection. [Conclusions] The approach enriches the platform's data sources and offers partial solutions to difficult data collection, non-standardized data formats, and limited data source channels.
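
As a rough illustration of the RSS collection and cleaning steps named in the abstract, the following minimal sketch extracts items from a feed with XPath-style paths, normalizes whitespace, and drops duplicate links. It is not the authors' Drupal configuration: it uses Python's standard library instead of Drupal Feeds, and the sample feed, the `collect_items` function, and the deduplicate-by-link rule are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real collector would fetch this over HTTP.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example collection</title>
    <item>
      <title>  Item   one </title>
      <link>http://example.org/a</link>
    </item>
    <item>
      <title>Item one (duplicate)</title>
      <link>http://example.org/a</link>
    </item>
    <item>
      <title>Item two</title>
      <link>http://example.org/b</link>
    </item>
  </channel>
</rss>"""


def collect_items(rss_text):
    """Extract title/link records via XPath, then clean and deduplicate them."""
    root = ET.fromstring(rss_text)
    seen_links = set()
    records = []
    # ElementTree supports a limited XPath subset, enough for this path.
    for item in root.findall("./channel/item"):
        # Cleaning step: collapse runs of whitespace in the title.
        title = " ".join((item.findtext("title") or "").split())
        link = (item.findtext("link") or "").strip()
        # Removal step (assumed rule): skip items with no link or a repeated link.
        if not link or link in seen_links:
            continue
        seen_links.add(link)
        records.append({"title": title, "link": link})
    return records


records = collect_items(SAMPLE_RSS)
```

Keeping the first occurrence of each link is only one possible removal rule; a production collector would more likely key on a feed GUID or a content hash.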

Received: 16 December 2014      Published: 25 August 2015
CLC Number: G250.7

Cite this article:

Li Dan, Yan Xiaodi, Wei Qingshan. Practice of Data Collection in Building Characteristic Digital Resources Based on Drupal. New Technology of Library and Information Service, 2015, 31(7-8): 148-154.

