Tracking Scientific Information with CSpace Technology
Wang Sili1,2, Liu Wei1, Zhu Zhongming1, Wu Zhiqiang1, Wang Jinping1
1Lanzhou Literature and Information Center, Chinese Academy of Sciences, Lanzhou 730000, China
2University of Chinese Academy of Sciences, Beijing 100049, China
[Objective] This paper proposes a system that automatically tracks, acquires, stores and manages scientific information to support research in related fields. [Methods] We built the system on CSpace, solved a number of technical issues along the way, and then tested it with marine information. [Results] The proposed system could automatically retrieve multi-source, heterogeneous scientific information, which supported the construction of a science and technology platform. [Limitations] The system's information acquisition procedure is complex, and it cannot retrieve documents from password-protected sites. [Conclusions] The proposed method could expand CSpace's data acquisition and integration functions, and might be transferred to other fields.