Data Analysis and Knowledge Discovery, 2017, Vol. 1, Issue 10: 85-93. DOI: 10.11925/infotech.2096-3467.2017.0783
Original Article
Tracking Scientific Information with CSpace Technology
Sili Wang1,2, Wei Liu1, Zhongming Zhu1, Zhiqiang Wu1, Jinping Wang1
1 Lanzhou Literature and Information Center, Chinese Academy of Sciences, Lanzhou 730000, China
2 University of Chinese Academy of Sciences, Beijing 100049, China
Abstract  

[Objective] This paper proposes a new system that automatically tracks, acquires, stores, and manages scientific information, aiming to support research in related fields. [Methods] We developed the system on the CSpace institutional repository platform and resolved several technical issues along the way. We then tested the system with marine scientific information. [Results] The proposed system could automatically retrieve multi-source, heterogeneous scientific information, supporting the construction of a science and technology platform. [Limitations] The system's information acquisition procedure is complex, and the system cannot retrieve documents from password-protected sites. [Conclusions] The proposed method extends CSpace's data acquisition and integration functions and could be transferred to other fields.

Key words: CSpace; Institutional Repository; Scientific and Technological Information; Automatic Monitoring; Information Acquisition
Received: 05 August 2017; Published: 08 November 2017

Cite this article:

Sili Wang, Wei Liu, Zhongming Zhu, Zhiqiang Wu, Jinping Wang. Tracking Scientific Information with CSpace Technology. Data Analysis and Knowledge Discovery, 2017, 1(10): 85-93.

URL: http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2017.0783 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2017/V1/I10/85

  Copyright © 2016 Data Analysis and Knowledge Discovery   Tel/Fax:(010)82626611-6626,82624938   E-mail:jishu@mail.las.ac.cn