Tracking Scientific Information with CSpace Technology
Wang Sili1,2, Liu Wei1, Zhu Zhongming1, Wu Zhiqiang1, Wang Jinping1
1 Lanzhou Literature and Information Center, Chinese Academy of Sciences, Lanzhou 730000, China
2 University of Chinese Academy of Sciences, Beijing 100049, China

Abstract [Objective] This paper proposes a new system that automatically tracks, acquires, stores and manages scientific information, with the aim of supporting research in related fields. [Methods] We developed the system on top of CSpace and resolved a number of technical issues along the way. We then tested it with marine information. [Results] The proposed system could automatically retrieve multi-source, heterogeneous scientific information, supporting the construction of a science and technology platform. [Limitations] The information acquisition procedure of the new system is complex, and the system cannot retrieve documents from password-protected sites. [Conclusions] The proposed method could expand CSpace's data acquisition and integration functions and might be transferred to other fields.
Received: 05 August 2017
Published: 08 November 2017