After describing the difficulties users face with the website of the National Science Library (NSL), the authors present principles of library website development, such as user-driven design, simplicity, convenience, integration and modularity. They design a service layer based on user information processes and a supporting layer with a service knowledge base and an SRU encapsulating engine, and develop functions such as integrated resource search, user-driven process composition, process-driven service navigation and context-sensitive help information.
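As a rough illustration of what an SRU encapsulating engine might emit, the sketch below builds SRU `searchRetrieve` request URLs for several resources from one CQL query. The endpoint URLs, the `SOURCES` mapping and the function names are hypothetical and are not taken from the NSL system; only the request parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) come from the SRU 1.2 protocol.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoints standing in for the integrated resources.
SOURCES = {
    "catalog": "http://sru.example.org/catalog",
    "journals": "http://sru.example.org/journals",
}


def build_sru_url(base_url: str, cql_query: str, start: int = 1, maximum: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve URL for a single endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,          # CQL query string, URL-encoded below
        "startRecord": start,        # SRU record positions are 1-based
        "maximumRecords": maximum,
    }
    return f"{base_url}?{urlencode(params)}"


def build_integrated_search(cql_query: str) -> dict:
    """Return one request URL per source for the same CQL query."""
    return {name: build_sru_url(url, cql_query) for name, url in SOURCES.items()}
```

Calling `build_integrated_search('dc.title = "open access"')` yields one URL per source, so a single user query can be sent to every wrapped resource in the same form.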
This paper introduces a service-embedded desktop information system, which brings science and technology literature services to the desktop by tracking the user's operations and ongoing workflow context. The paper also presents the design ideas and implementation approaches of the system.
Based on a questionnaire on “cognition and requirement of IR” completed by researchers working in the Chinese Academy of Sciences, the authors analyze the current state of the institutional repository in the academy. They then point out its deficiencies and outline the plan for the next stage of work.
This paper introduces the general situation and the development track of ACE. By examining the changes in its evaluation tasks, teams, corpora and results, it analyzes the state of the art in information extraction and then offers some thoughts on future directions.
This paper first introduces the characteristics and structure of Web tables and describes the process of information extraction from Web tables. It then analyzes four key technologies: Web table detection, Web table structure recognition, Web table interpretation and presentation of table extraction. It also analyzes applications of this research, points out the problems in current work, and finally discusses future prospects.
This paper presents a method for calculating the degree of trust between entities in a Web environment. It also designs a trust transfer protocol that uses XML to represent messages, together with an XML-based encryption protocol.
This paper introduces Finite State Transducer technology and, drawing on the ideas behind the development of the Penn Treebank, simplifies complicated sentences through rule analysis and the combined use of the results of POS tagging, recognition of discourse connectives and punctuation, vocabulary mapping, and chunking. The final results are expressed in the form of propositions.
Provided that a certain error rate is tolerable, the Bloom filter and its improved variants can be used to filter homologous (duplicate) URL pages through URL hashing. Experiments show that satisfactory results can be achieved through reasonable adjustment of its parameters.
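A minimal sketch of Bloom-filter URL filtering follows, assuming the standard sizing formulas and a double-hashing scheme over a SHA-256 digest. The hashing scheme, the class and function names, and the example URLs are assumptions for illustration and are not the algorithm evaluated in the paper. The two constructor arguments are the adjustable parameters: the expected number of URLs and the tolerated false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter; may report false positives, never false negatives."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, url: str):
        # Derive k bit positions from two 64-bit halves of one digest (double hashing).
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))


def filter_new_urls(urls, seen: BloomFilter) -> list:
    """Return the URLs not already recorded in the filter, recording each one."""
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Lowering the false-positive rate enlarges the bit array and raises the number of hash functions, which is the trade-off between memory and wrongly discarded URLs that parameter tuning has to balance.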
This paper presents a new algorithm for eliminating noise in Web pages based on a group of content-related rules. First, we present an algorithm that peels off noise by iteratively comparing the tables on the same level of the page’s table tree. Next, we present an algorithm that evaluates the topic similarity between anchor text and the content of the page. Because the new algorithm takes semantic factors of the pages into consideration to some extent, it achieves higher accuracy than purely rule-based algorithms while keeping time complexity fairly low. The experiment indicates that the algorithm is very effective at cleaning large numbers of Web pages.
In view of the shortcomings of traditional methods for analyzing public opinion, this paper proposes a new approach to public opinion analysis on the Web and designs a model for it. Experiments show that the proposed model is an effective solution for analyzing public opinion on the Web.
Using the Visual Studio .NET development platform, the C# programming language, and XML for knowledge description and data storage, an automatic knowledge element extraction system for network special-subject knowledge organization has been designed and developed. The design and development of the system's main functions have been studied, including text preprocessing, fast self-growing word segmentation based on connection patterns of Chinese characters, and accurate full-text word frequency statistics.
It is very difficult for librarians to process the mass of non-plain-text information on the Web. In this study, the problems of inputting and outputting mathematical formulas and some special data were solved using MathML, which makes it possible to retrieve and use this information. The results offer a new method for visualizing non-plain-text content such as mathematical formulas on the Web.
After studying the development and application of ontologies at home and abroad, and drawing on the achievements of the E-Government Thesauri, this paper puts forward a method for constructing an e-government ontology and provides a demonstration.
This paper introduces a new algorithm for Chinese word segmentation based on a new data structure for the Chinese dictionary. Theory and experiments show that this data structure is considerably more efficient.
This paper discusses why it is difficult to migrate a MARC editor from the conventional C/S mode to the B/S mode, and presents a way to resolve the problem using Ajax.
Purpose-To propose improvements to the identification of authors’ names in digital repositories.
Design/methodology/approach-Analysis of current name authorities in digital resources, particularly in digital repositories, and analysis of some features of existing repository applications.
Findings-This paper finds that variations in authors’ names have negatively affected the retrieval capability of digital repositories. Two possible solutions are using composite identifiers and asking authors to supply the variants of their name, if any, at the time of depositing articles.
Originality/value-This is the first time that the approach of authors self-depositing their name variations has been proposed. This approach can reduce confusion in name identification.
Purpose: Overview of the certification of institutional repositories as a means to support Open Access in Germany, and description of the DINI Certificate 2006 developed by DINI, the German Initiative for Networked Information.
Design/methodology/approach: The DINI Certificate for Document and Publication Repositories shows potential users and authors of digital documents that a certain level of quality in operating the repository is guaranteed and that this distinguishes it from common institutional Web servers. The Certificate can also be used as an instrument to support Open Access.
Findings: Repository certification will not be the main factor in achieving open access to academic information globally, but it can support the spread of institutional repositories and enhance the visibility of the “Institutional Repository” service.
Research limitations/implications: The DINI Certificate, as a “soft” certificate, aims at interoperability of digital repositories; the coaching idea prevails. It does not provide an exhaustive auditing tool for trusted digital long-term preservation archives.
Practical implications: The DINI Certificate for Document and Publication Repositories has advanced the development of institutional repositories in Germany according to certain organisational and technical standards, and contributes to interoperability among digital repositories worldwide.
Originality/value: This paper describes a unique approach that has been implemented in Germany and could be transferred to other countries and communities.