This paper discusses how to build general metadata application rules for Chinese digital libraries. It aims to resolve how metadata is applied in Chinese digital libraries and to develop a series of related metadata standards, criteria and platforms that meet the requirements of describing, organizing, managing, serving and preserving Chinese digital objects. Drawing on the work of DCMI and other leading international metadata projects, it also presents the metadata application principles and framework, the mechanism for metadata openness and interoperability, and the metadata application workflows. The authors seek best practice in metadata application for developing digital libraries in China.
This article introduces the content and application of the digitization standard for object resources and its guidelines. Particular attention is paid to the metadata standard and to the relationship between naming rules and DOI, among other topics. The authors aim to provide theoretical and practical references for the digitization of library resources.
Building on the persona approach to user modeling in human-computer interaction design, the authors analyze user behavior logs from an institutional repository, use K-means clustering to identify user behavior patterns and classify user groups, and create quantitative persona-feature matrix models for the institutional repository.
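The clustering step can be illustrated with a minimal sketch of K-means (Lloyd's algorithm). The abstract does not state which log features were used, so the feature tuples below (searches, browses and downloads per session) are hypothetical, and seeding the centroids with the first k points is a simplification for reproducibility rather than the authors' procedure.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(float(x) for x in p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-user features: (searches, browses, downloads) per session.
users = [(9, 1, 1), (1, 8, 2), (1, 1, 9), (8, 2, 1), (2, 9, 1), (2, 2, 8)]
labels, centroids = kmeans(users, k=3)
```

Each row of `centroids` can be read as one row of a persona-feature matrix: the average behavior profile of the user group assigned to that cluster. In practice a seeding strategy such as k-means++ and feature scaling would normally be used.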
This paper introduces the background of the Subject Knowledge Environment (SKE) platform. Then, by analyzing the composition and functional characteristics of Vitro, the authors propose a design solution for the SKE platform. The main methods for localizing Vitro are also described.
The article first introduces the basic features of terms and discusses methods for automatically identifying scientific terms. It then proposes the V-value, which improves on the two main statistical indicators, TF-IDF and C-value, according to text characteristics. Candidate terms are also given different weights by position to reflect their effect. Finally, a term extraction system based on statistics and rules is implemented. Because the system combines the position weight, C-value and TF-IDF, it achieves higher extraction precision.
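For readers unfamiliar with the two baseline indicators, the sketch below computes a commonly used form of C-value and a basic TF-IDF. The abstract does not define the V-value or the position weights, so neither is reproduced here; the candidate terms and frequencies are hypothetical.

```python
from math import log, log2


def c_value(freq):
    """freq maps a candidate term (a tuple of words) to its corpus frequency.

    C-value(a) = log2(|a|) * f(a)                      if a is not nested,
                 log2(|a|) * (f(a) - mean f(b))        over longer candidates b containing a.
    Single-word candidates score 0 because log2(1) = 0.
    """
    def contains(longer, shorter):
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
        )

    scores = {}
    for a, f_a in freq.items():
        nesting = [f_b for b, f_b in freq.items() if contains(b, a)]
        adjusted = f_a - sum(nesting) / len(nesting) if nesting else f_a
        scores[a] = log2(len(a)) * adjusted
    return scores


def tf_idf(term_count, doc_length, docs_with_term, total_docs):
    """Relative term frequency times the natural-log inverse document frequency."""
    return (term_count / doc_length) * log(total_docs / docs_with_term)


# Hypothetical candidate terms and frequencies.
freq = {
    ("digital", "library"): 10,
    ("chinese", "digital", "library"): 4,
    ("metadata", "standard"): 6,
}
scores = c_value(freq)
```

Here ("digital", "library") is discounted because four of its ten occurrences sit inside the longer candidate ("chinese", "digital", "library"). A combined indicator such as the one the article proposes would weigh scores like these together with the position of each candidate.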
To make topic extraction more complete, this paper divides a text into themes, reconstructs a sentence relation graph for each theme block, and ranks the sentences with the PageRank algorithm. The sentence with the maximum weight is then taken as the topic sentence. Experiments show that the topic sentence extraction algorithm covers the full text well.
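The ranking step can be sketched as weighted PageRank over a sentence similarity graph. This is only an illustration under stated assumptions: the theme division is skipped (all sentences are treated as one theme block), Jaccard word overlap stands in for the paper's unspecified sentence relation, and the sample sentences are invented.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Power iteration on a weighted graph given as an adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor j passes on its score in proportion to the edge weight j -> i.
            incoming = sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def similarity(a, b):
    """Jaccard overlap of the two sentences' word sets (an assumed relation measure)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def topic_sentence(sentences):
    """Return the sentence with the highest PageRank score in the similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = pagerank(weights)
    return sentences[max(range(n), key=scores.__getitem__)]


# Invented sample sentences forming one theme block.
sentences = [
    "metadata standards support digital library interoperability",
    "metadata standards describe objects",
    "digital library services need preservation",
    "interoperability needs open mechanisms",
]
best = topic_sentence(sentences)
```

The first sentence shares words with every other sentence while the others share none among themselves, so it collects the most weight and is selected. Running the same procedure once per theme block would yield one topic sentence per theme.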
Because domestic research on image retrieval behavior is lacking, this paper designs a user experiment in which the image retrieval process is recorded with behavior tracking technology so that key behaviors can be analyzed. Results on image retrieval strategies, characteristics and user psychology are discussed from perspectives such as behavior distribution, browsing or researching, page turning and relevance judgment. Finally, some suggestions for networked image retrieval systems are provided.
Targeting patent data and taking into account the characteristics of patent documents and the requirements of patent analysis, this paper proposes the IRPU algorithm, an improved method for detecting approximately duplicate attributes and records in patent data based on the RFMA and PCM algorithms. The IRPU algorithm is then applied to patent data to detect approximate duplicates in the inventor attribute and in whole records. Experimental comparison with previous work indicates that the proposed method suits patent data and achieves higher identification accuracy.
Based on the principle of disjoint literature knowledge discovery, the transitive closure from discrete mathematics is applied to find potential associations among drug targets, which confirms that disjoint literature knowledge discovery based on transitive closure is achievable and effective. Moreover, the paper extends the original three-step model into a multi-step knowledge discovery model, which finds more potential associations while maintaining relatively high precision and recall.
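The core idea can be sketched with Warshall's algorithm: if the literature links A to B and B to C, the closure suggests an implicit link from A to C, and longer chains give the multi-step case. The entities A to D and their directed links below are hypothetical placeholders, and the directed treatment of links is an assumption made for illustration, not a detail taken from the paper.

```python
def transitive_closure(nodes, edges):
    """Warshall's algorithm on a directed relation, using reachability sets."""
    reach = {a: set() for a in nodes}
    for a, b in edges:
        reach[a].add(b)
    for k in nodes:
        for i in nodes:
            # If i reaches k, then i also reaches everything k reaches.
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def implicit_links(nodes, edges):
    """Pairs connected only through intermediaries, i.e. closure minus direct links."""
    reach = transitive_closure(nodes, edges)
    direct = set(edges)
    return sorted(
        (a, b) for a in nodes for b in reach[a] if a != b and (a, b) not in direct
    )


# Hypothetical entities and literature-derived links (placeholders, not real data).
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
candidates = implicit_links(nodes, edges)
```

The pairs (A, C) and (B, D) correspond to the classic three-step pattern, while (A, D) only appears when chains of more than one intermediary are allowed. Candidate associations found this way would still need filtering and expert validation.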
Based on query logs, this paper gives a comprehensive description of noun phrases with the “N1+N2” structure according to the characteristics of the corpus itself, including the characteristics of each element and its syntactic function. Basic methods for mining and proofreading this type of noun phrase are also given. Through analysis of the experimental results, the authors further illustrate that the study of phrases is important for search engines.
The paper performs an association analysis of authors, affiliations and documents based on data on papers published in Chinese periodicals from Wanfang Data (2003-2007). The analysis helps reveal latent relationships among authors, affiliations and documents. An effective entity recognition method is also proposed to improve the accuracy of association analysis in this application. The application is intended to serve as a basis for further semantic retrieval.
To make full use of its collections, Tsinghua University Library has opened up a new approach to mobile services, building on its earlier work on integrated search of electronic resources. The library implements a mobile search service for heterogeneous electronic resources based on the MetaLib system and its X-Server interface. The service is composed of three key components (UI customization, search service and status monitoring) and provides a continuously available retrieval service for heterogeneous resources to mobile users. This paper describes the technical implementation of the key components in detail.
This paper introduces an experimental system (DAAS) that automatically harvests articles by an institution's researchers and ingests their metadata into the local DSpace platform. The system implements a semi-automatic approach to populating IRs, consisting of information filtering, metadata extraction, copyright verification, metadata mapping and data archiving. The paper describes in detail how, based on key Nutch components, URLs are parsed and metadata is extracted from unstructured Web pages according to a rule-based filter. Future research will focus on machine-learning algorithms.
This article briefly reviews the state of research on identity authentication. Focusing on the traditional smart card, it presents a method that uses a high-speed CCD to capture card images without human intervention and then applies image processing and pattern recognition methods to read the card information. Finally, the cardholder’s identity information is compared with the database to complete identity authentication.
This article draws on the actual work of the Library of Shandong Normal University and presents ways to program auxiliary tools for ILAS. The tools automatically close ILAS message boxes and enable the ILAS query module to work automatically and continuously. These tools can improve librarians' efficiency.