Using Connotea as a social tagging test bed, this study analyzes resources and users and identifies several patterns: broad resource coverage of tags, strong co-occurrence of tags, high tagging activity among active users, a low average number of tags per resource, regular shifts in users’ tagging interests, and high usage of thesaurus terms by users in scientific fields. Suggestions are made to improve structural relations among tags and to increase tagging granularity.
The paper studies in detail the localization and use of open source digital software, and discusses problems and solutions in the development of open source software in Chinese digital libraries.
This paper introduces the concept and importance of data provenance, proposes a W7 annotation schema, and compares several description models, including query inversion, time sequence, directed graph and XML/RDF. Finally, an example of data provenance in bioinformatics is given.
Drawing on current Semantic Web and ontology theory, this paper explores tools and methods for constructing an OWL-based ontology of defense organizations and products. An ontology of the world's defense industry organizations and products is constructed with Protégé 3.2. Finally, a visualization of this ontology is presented.
The paper analyzes the network structure of digital library resource services and the technology of multi-ISP network outlets, and proposes a network framework in which dynamic DNS solves the key technical problem of serving digital library resources across multiple ISP outlets. An example is then given of an optimized, fast-response digital library resource service system that applies dynamic DNS across CERNET, CNC and ChinaNET network outlets.
Matching is one of the most important techniques in information integration. This paper elucidates string-based matching algorithms, mainly distance-based, token-based and N-gram methods. Their deficiencies and directions for further research are also outlined.
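As a rough illustration of the three families of string-based matching named above, the following sketch gives one common representative of each: Levenshtein edit distance (distance-based), Jaccard similarity over whitespace tokens (token-based), and the Dice coefficient over character bigrams (N-gram). The function names and the choice of these particular measures are assumptions made for illustration; the abstract does not say which variants the paper covers.

```python
def levenshtein(a: str, b: str) -> int:
    """Distance-based: edit distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def jaccard_tokens(a: str, b: str) -> float:
    """Token-based: shared whitespace tokens divided by all distinct tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def ngrams(s: str, n: int = 2) -> set:
    """Set of character n-grams of s (empty if s is shorter than n)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_dice(a: str, b: str, n: int = 2) -> float:
    """N-gram based: Dice coefficient over character n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))                      # 3
    print(jaccard_tokens("digital library", "library service"))
    print(ngram_dice("night", "nacht"))                          # 0.25
```

Edit distance tolerates typos but is sensitive to word order, while the token and N-gram measures ignore order; this kind of trade-off is presumably among the deficiencies such a survey weighs.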
In this paper, an information retrieval model for SGML documents based on a multi-granularity 2-tuple linguistic approach is proposed to improve the efficiency of information retrieval. Because the fuzzy linguistic approach has been applied successfully in many areas, especially information retrieval, it serves as the theoretical basis of this model.
The paper puts forward a UDDI-based reputation evaluation model for Web services by extending the existing UDDI specification. The paper also introduces quantification and uses consumer feedback and an active monitoring mechanism to evaluate and adjust the reputation value of Web services in UDDI, realizing a Web service discovery mechanism constrained by reputation.
The main concepts of P2P networks and their information retrieval mechanisms are analyzed. Based on the basic principles of Information Retrieval (IR), a P2P network information retrieval study model (P2PNIRSM) is presented. Using P2PNIRSM, the current state of P2P information retrieval is discussed in detail from three aspects (resource location, retrieval model and user model), and the prospects of P2P network information retrieval are analyzed.
This paper introduces the architecture and features of the DotNetNuke open source software, proposes a design and technical architecture for a Web 2.0-based knowledge management platform, and describes how DotNetNuke is used to build it, providing a feasible method for quickly constructing a Web 2.0-based knowledge communication and management platform.
This paper describes the shortcomings of a visualization network teaching system under Web 1.0, outlines the advantages of blogs and podcasts in Web 2.0, and introduces a method for updating the existing system with them. The paper describes the whole reconstruction process using the visualization network teaching system of Nanjing Army Command College as an example.
The well-known maximum matching algorithm is implemented for knowledge extraction, and conclusions are drawn about the critical techniques of vector segmentation. Nested vector segmentation is designed and implemented to address the disadvantages of single-pass scanning. Experiments show that nested vector segmentation, when used in knowledge extraction, improves precision and recall, resolves the word-within-word problem at its root, and facilitates subsequent syntactic analysis.
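For readers unfamiliar with the maximum matching method mentioned above, here is a minimal sketch of the textbook forward variant: at each position, take the longest dictionary word that starts there. The function name, the toy dictionary and the single-character fallback are illustrative assumptions. This is not the nested vector segmentation the paper proposes, whose details the abstract does not give.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Segment text greedily, left to right, preferring the longest dictionary word."""
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Toy dictionary (hypothetical); a real system would load a full lexicon.
    words = {"the", "cat", "cats", "sat", "at"}
    print(forward_max_match("thecatsat", words))  # ['the', 'cats', 'at']
```

The sample output shows the weakness of a single greedy pass: "cats" swallows the "s" that belongs to "sat". Errors of this kind may be what the abstract means by the disadvantages of single-pass scanning and the word-within-word problem.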
Based on an analysis of the functionality of current intelligence analysis tools, this paper details the basic design and integration study of an intelligent analysis platform. The paper also provides solutions to users’ requirements by combining existing analysis tools with ongoing work on the platform.
This paper analyzes the requirements of data mining and introduces its common forms and existing problems. In light of these problems, the authors discuss a grid-based association mining method for multimedia data, which applies the Apriori algorithm on a grid system. Analysis of an example shows that the method retains the accuracy of the classic Apriori algorithm while gaining the parallel mining capability of the grid. It can therefore greatly improve data mining speed and operating efficiency.
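The classic Apriori algorithm that the method above builds on can be sketched in a few lines. This is a single-process illustration only: the function name, the absolute `min_support` count and the toy transactions are assumptions, and the grid distribution described in the abstract is not modelled here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the data: count how many transactions contain each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    frequent = count({frozenset([item]) for t in transactions for item in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(apriori(baskets, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

The support-counting pass inside `count` dominates the cost and is independent per transaction, so it is the natural part to partition across nodes in a parallel setting. How the paper actually divides the work on the grid is not stated in the abstract.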
To improve the precision and quality of literature retrieval, this paper explores a similarity computing scheme based on the Vector Space Model (VSM) and describes a clustering algorithm for biomedical literature built on that similarity model. By applying the similarity algorithm and cluster analysis, retrieved papers can be ranked by degree of similarity.
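A minimal sketch of similarity ranking under the Vector Space Model follows, assuming raw term-frequency weights, whitespace tokenization and cosine similarity. The abstract does not specify the weighting scheme or the clustering algorithm, so these choices and all function names are illustrative only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Represent a text as a sparse term-frequency vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, documents):
    """Return (score, document) pairs sorted from most to least similar."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    docs = ["gene expression in yeast",
            "protein folding dynamics",
            "yeast gene regulation"]
    for score, doc in rank_by_similarity("yeast gene", docs):
        print(f"{score:.3f}  {doc}")
```

The same pairwise cosine scores could serve as the similarity measure for a clustering step; a production system would more likely use TF-IDF weights and domain-aware tokenization.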
The authors have studied a multi-agent mechanism based on a network with a complementary architecture. Some simple functions, such as custom download, intelligent download, user-profile generation and information filtering, have been initially realized with the intelligent agent software programmed for this paper.
ETL needs to distinguish each kind of data, so the paper designs a new data model to describe and support all data sources, and analyzes the mapping between the data sources and the target database. The paper also introduces implementation methods for the core of ETL: data source access and data extraction.
To improve the double-array Trie, this paper presents a Chinese dictionary based on a tri-array Trie mechanism, and gives a recursive algorithm that automatically constructs the table of word-building states.
When a dissertation online submission system is separated from the template file, students have to fill out a metadata form. While completing the form, problems may arise, such as entry errors and inconsistencies with the printed version. To address these problems, the paper describes how to extract metadata from the dissertation Word file and transmit the values to the local dissertation online submission system. The extraction process is implemented in VB.NET.
Through an analysis of MARC data for multimedia information resources, the differences among various kinds of MARC data are summarized. Based on the MARC field characteristics of different resources, automatic computer classification is designed and implemented, so that information service quality can be improved.
This paper discusses the main ideas behind the database management and design of the SINOPEC Geologic Data Management System, and describes the application of TRS content search technology in combination with the Oracle database system. The method has proved to be safe and efficient.
Targeting the principal channels of virus transmission, such as the Internet, e-mail and downloads, the author builds HAVP on a Linux system. HAVP blocks viruses effectively, thereby protecting the LAN from viruses and enhancing its security.