This paper gives an overview of the technical architecture and principles of cloud storage, then analyzes the challenges faced by traditional storage technologies and the strategies cloud storage adopts when applied to digital preservation. It then examines two digital preservation cases to further assess the applicability of cloud storage to digital preservation.
Based on research into SKOS and its applications, this paper describes a technique for converting LASC to SKOS and explores a method for handling the auxiliary tables in LASC. It can serve as a reference for other Chinese knowledge organization systems adopting SKOS.
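As a rough illustration of what such a conversion can look like, the sketch below maps classification entries to SKOS concepts serialized as Turtle. It is not the paper's method: the entry fields (notation, label, parent), the sample values, and the `http://example.org/scheme/` namespace are all assumptions made for the example, and auxiliary tables are not handled. Only the SKOS terms themselves (`skos:Concept`, `skos:ConceptScheme`, `skos:notation`, `skos:prefLabel`, `skos:broader`, `skos:topConceptOf`) come from the SKOS vocabulary.

```python
# Hypothetical entries: (notation, label, parent notation or None).
# These values are illustrative only and are not taken from LASC.
ENTRIES = [
    ("10", "Example top class", None),
    ("10.1", "Example subclass", "10"),
]

BASE = "http://example.org/scheme/"  # assumed namespace for the scheme


def escape_literal(text):
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def entry_to_turtle(notation, label, parent, lang="en"):
    """Serialize one classification entry as a skos:Concept."""
    lines = [
        f"<{BASE}{notation}> a skos:Concept ;",
        f'    skos:notation "{escape_literal(notation)}" ;',
        f'    skos:prefLabel "{escape_literal(label)}"@{lang} ;',
    ]
    if parent is None:
        # Entries without a parent become top concepts of the scheme.
        lines.append(f"    skos:topConceptOf <{BASE}> .")
    else:
        # The class hierarchy is expressed with skos:broader.
        lines.append(f"    skos:broader <{BASE}{parent}> .")
    return "\n".join(lines)


def to_turtle(entries):
    """Serialize the concept scheme and all entries as one Turtle document."""
    blocks = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"<{BASE}> a skos:ConceptScheme .",
    ]
    blocks.extend(entry_to_turtle(*entry) for entry in entries)
    return "\n\n".join(blocks) + "\n"
```

Calling `to_turtle(ENTRIES)` returns a Turtle document in which the subclass points to its parent through `skos:broader`.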
Based on Rodriguez and Egenhofer’s semantic similarity measure, combined with the characteristics of MeSH, this paper puts forward a semantic similarity measure for MeSH. The experimental results show that the method is effective.
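For orientation, Rodriguez and Egenhofer's measure builds on a Tversky-style set-matching ratio, S(a, b) = |A ∩ B| / (|A ∩ B| + α|A \ B| + (1 − α)|B \ A|). The sketch below computes only that ratio. How the paper derives feature sets from MeSH is not stated in the abstract, so using tree-number prefixes as features, the sample tree numbers, and the fixed α = 0.5 are all assumptions for illustration.

```python
def tree_prefixes(tree_number):
    """Return a dotted tree number plus all of its ancestor prefixes.

    Assumption for this sketch: these prefixes stand in for the
    feature set of a descriptor.
    """
    parts = tree_number.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def ratio_similarity(a, b, alpha=0.5):
    """Tversky-style ratio between two feature sets.

    alpha weights the features found only in `a`; (1 - alpha) weights
    the features found only in `b`.
    """
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    denominator = common + alpha * only_a + (1 - alpha) * only_b
    return common / denominator if denominator else 0.0


# Hypothetical tree numbers that share two ancestor levels.
left = tree_prefixes("A01.100.200")
right = tree_prefixes("A01.100.300")
score = ratio_similarity(left, right)
```

With these inputs the two sets share two prefixes and each has one of its own, so the ratio is 2 / (2 + 0.5 + 0.5).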
The paper mainly discusses the construction of a natural language thesaurus for an automatic assisted literature indexing system. Based on keywords accumulated over years of large-scale manual indexing, it analyzes patterns in word frequency, length, type, and co-occurrence, and proposes a method for constructing a thesaurus for automatic assisted indexing and a post-controlled vocabulary.
To address the load pressure and capacity expansion requirements of the international science citation database (DISC), we chose widely used load balancing and clustering technologies to construct the service system. The Web server load balancing mechanism, MySQL cluster technology, and the distributed indexing technology of Solr are introduced in detail. The performance of the architecture is examined through both theoretical analysis and experimental testing. Test results show that the DISC system architecture has good scalability, availability, and reliability, and supports current application requirements well.
This article discusses document representation in multilingual information processing. First, it describes the process of multilingual document representation, introduces the different methods in detail, and compares their strengths and weaknesses. It then summarizes the characteristics of multilingual document representation and points out some existing problems. Finally, it outlines development trends in multilingual document representation.
The paper introduces the background and motivation of Named Entity Recognition, summarizes its historical development at home and abroad as well as the related technologies and evaluation methods, and finally discusses new development trends in Named Entity Recognition.
This article uses a CRF model to compare the recognition performance and efficiency of character-based labeling and word-based labeling. The experimental results show that the character-based approach performs better than the word-based approach, at the cost of more time. The article also examines how the number of features influences the experimental performance.
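The difference between the two labeling schemes can be sketched as follows. This is only an illustration of how the training sequences differ, not the article's experimental setup: the BIO tag set, the sample sentence, its word segmentation, and the entity spans are assumptions, and no CRF is trained here.

```python
def char_labels(text, entities):
    """Assign one BIO tag per character.

    `entities` holds (start, end, kind) character spans, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end, kind in entities:
        tags[start] = f"B-{kind}"
        for i in range(start + 1, end):
            tags[i] = f"I-{kind}"
    return list(zip(text, tags))


def word_labels(words, entities):
    """Assign one BIO tag per pre-segmented word.

    A word is tagged only if it lies fully inside an entity span, so
    the result depends on the quality of the word segmentation.
    """
    spans = {start: (end, kind) for start, end, kind in entities}
    labeled = []
    pos = 0
    current = None  # (end, kind) of the entity being continued, if any
    for word in words:
        start, end = pos, pos + len(word)
        if start in spans and end <= spans[start][0]:
            current = spans[start]
            tag = f"B-{current[1]}"
        elif current is not None and end <= current[0]:
            tag = f"I-{current[1]}"
        else:
            current = None
            tag = "O"
        labeled.append((word, tag))
        pos = end
    return labeled


# Hypothetical example: a person name and a place name.
TEXT = "张三在北京工作"
WORDS = ["张三", "在", "北京", "工作"]
ENTITIES = [(0, 2, "PER"), (3, 5, "LOC")]

by_char = char_labels(TEXT, ENTITIES)
by_word = word_labels(WORDS, ENTITIES)
```

Here the character-based sequence has seven labeled units and the word-based one has four. Longer sequences are one plausible reason the character-based approach takes more time, although the article's abstract does not state the cause.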
This paper presents a method for building modular ontologies through module fission, module reorganization, and module reuse, and analyzes in detail the modular building workflow and the rules for module fission and module integration. Finally, it discusses the application of the method to building a water resources ontology and the problems encountered in the building process. The method enables fine-grained, collaborative building of modular ontologies and improves the efficiency and reusability of ontology building.
This paper proposes a visual analytical framework named NeSVA based on link analysis. NeSVA helps analysts explore the structural information of networks and gain deep insights from massive dynamic link data, supporting timely, defensible, and understandable assessments in dynamic network analysis.
Taking China's rare earth standard system as an example, and based on the current state of that system, this paper analyzes and estimates standard updating information by searching the standard database CSSN, in order to quantify standard updating. Standard updating is divided into two kinds: one to one and many to one. A numerical model of standard updating is developed to distinguish and calculate the updating modes of the national and industry standard systems for rare earth. The updating modes of rare earth standards and their annual update counts are quantified, which to some extent reflects the technological progress and development of the rare earth industry. Finally, the standard-updating model is also applied to similar standard systems in other industries.
To address problems with traditional information retrieval in the current network information environment, the paper puts forward a patent information retrieval model based on a domain ontology and studies in depth the process and technical implementation of user retrieval request handling, ontology construction, ontology visualization, semantic expansion, retrieval, and storage. A prototype patent information retrieval system is also implemented. A series of retrieval tests indicates that the model ensures retrieval accuracy and greatly improves retrieval comprehensiveness.
To address problems of the self-built characteristic database such as inefficient searching and a single retrieval path, the paper proposes a full-text retrieval method based on Sphinx, an open source full-text search engine, and introduces the key technologies of the system implementation in detail. The test results show that the system improves search speed and search quality to meet users’ needs.
To address the passive checking mode, poor timeliness, and low efficiency of traditional methods for checking which papers are indexed by the three major indexes, an e-mail push service system is established. This paper describes the system's design and implementation in detail, including how to import data, read records, convert English names to Chinese names, unify data formats, and push e-mails.
To collect market quotation information for agricultural products, the paper combines the WebClient class with the open source class library HtmlParser.net to automatically generate the download links of dynamic market quotation pages, then downloads each page and converts it to a static page. A general method for accurately extracting web data is established based on HTML structure, and all market quotation data are extracted in a loop.
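The two steps described above, generating page links and extracting data from the HTML structure, can be sketched as follows. The paper uses the .NET WebClient class and HtmlParser.net. This sketch swaps in Python's standard `html.parser` module instead, performs no network access, and assumes that quotations sit in table rows. The URL template and sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return the table rows found in one downloaded page."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def page_urls(template, first, last):
    """Generate one download link per page number (template is assumed)."""
    return [template.format(page=n) for n in range(first, last + 1)]


# Hypothetical page content standing in for a downloaded static page.
SAMPLE = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Cabbage</td><td>1.20</td></tr>
</table>
"""

rows = extract_rows(SAMPLE)
urls = page_urls("http://example.org/quotes?page={page}", 1, 3)
```

In a full pipeline, each generated link would be downloaded and passed to `extract_rows` in a loop. That step is left out here because the real site structure is not given in the abstract.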
To address the fact that the current version of the Unicorn system used in the library does not support the SIP2 protocol, the paper designs and develops an interface program to communicate with the Unicorn system. Through an intermediate program combined with the self-check machine, it achieves real-time self-check service and thereby avoids the risk of accessing the database directly. The paper also summarizes the problems encountered in the course of self-check service and proposes appropriate solutions and improvements.