Following a review of existing relation extraction research based on GATE, this paper proposes an open relation extraction model built on GATE and the Stanford Parser. The results show that the model can effectively extract relation triples guided by verbs or prepositions, which could have an important impact on intelligence analysis.
Based on the features of Ajax, this paper describes an Asynchronous Communication Model (ACM) between the viewer and the retrieval engine, and presents a solution for transmitting complex classes in semantic retrieval. It demonstrates the feasibility of the ACM through an experiment in developing an Ontology navigator, and verifies the advantages of Ajax for semantic retrieval in terms of client experience and redundancy resolution.
This paper analyzes why open source software is widely used in Web archiving, introduces some commonly used open source software, summarizes its application status and trends, and discusses how to use open source software effectively in Web archiving.
From an investigation of related studies and projects, the paper identifies two main construction models for building Overlay journal systems: a model based on digital repository systems and a model based on electronic journal systems. The technical features and scope of use of each model are also presented. Based on a trend analysis, the paper proposes a construction framework for Overlay journal systems, which suggests a method of extending OJS with an OAI harvesting service plugin.
Feature representation is one of the key issues in data clustering. Currently, the feature representation of scientific data is deficient, which degrades clustering results. The paper proposes the concept of complex text description and a feature representation method based on it. The method uses different feature weighting computations to represent candidate features from two kinds of data sources, and strengthens the feature set by merging the two resulting feature sets. Experiments show that the method outperforms several traditional feature representation methods and markedly improves the performance of data clustering.
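The merging step described above can be illustrated with a minimal sketch. The abstract does not give the actual weighting formulas, so everything here is an assumption for illustration: `tf_weights` (normalized term frequency for a free-text source), `binary_weights` (presence weights for a keyword-like source), and the mixing parameter `alpha` are all hypothetical names and choices.

```python
from collections import Counter


def tf_weights(tokens):
    """Normalized term frequency for a free-text source (assumed weighting)."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def binary_weights(keywords):
    """Presence/absence weights for a keyword-like source (assumed weighting)."""
    return {term: 1.0 for term in set(keywords)}


def merge_features(primary, secondary, alpha=0.7):
    """Merge two weighted feature sets by a convex combination.

    alpha is a hypothetical mixing parameter, not taken from the paper.
    """
    merged = {}
    for term in set(primary) | set(secondary):
        merged[term] = (alpha * primary.get(term, 0.0)
                        + (1 - alpha) * secondary.get(term, 0.0))
    return merged


description = ["protein", "structure", "protein", "dataset"]
keywords = ["protein", "crystallography"]
features = merge_features(tf_weights(description), binary_weights(keywords))
```

Terms that appear in both sources end up with a higher merged weight than terms seen in only one, which is one simple way a merged feature set can be "strengthened".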
Based on a hierarchical DL grid service publication structure, this paper adopts a user-oriented, multi-stage service discovery model that divides service discovery into four stages: the “my service” stage, the main service domain to service sub-domain stage, the transfer within the main service domain stage, and the root domain to main service domain stage.
As an important feature of Web 2.0, mashup technology can integrate data resources and increase the value of data. Based on the characteristics of this technology, this paper studies a methodology for building mashup application systems and, drawing on the structured system development methodology, outlines the development process for such systems. Finally, the paper builds a data integration system in the CSOCO project, setting directions for deeper applications.
This paper studies keyword extraction algorithms and analyzes the factors that may influence extraction. Based on the quantification of these factors, it proposes a complete framework for a model that includes word segmentation and part-of-speech tagging, text preprocessing, a weighted linear algorithm, generation and filtering of word combinations, and combination of candidate keywords.
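A weighted linear scoring step of the kind mentioned above might look like the following sketch. The abstract does not list the factors or their weights, so the three factors used here (relative frequency, part of speech, occurrence in the title), the `POS_WEIGHT` table, and the default coefficients are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical part-of-speech weights; the paper's actual values are not given.
POS_WEIGHT = {"noun": 1.0, "verb": 0.5}


def score_candidates(tagged_tokens, title_terms,
                     w_tf=0.5, w_pos=0.3, w_title=0.2):
    """Rank words by a weighted linear combination of three assumed factors.

    tagged_tokens: list of (word, pos_tag) pairs from segmentation and tagging.
    title_terms:   set of words that occur in the document title.
    """
    if not tagged_tokens:
        return []
    counts = Counter(word for word, _ in tagged_tokens)
    max_count = max(counts.values())
    pos_of = {word: pos for word, pos in tagged_tokens}  # last tag seen wins
    scores = {}
    for word, count in counts.items():
        tf = count / max_count                       # relative frequency
        pos = POS_WEIGHT.get(pos_of[word], 0.0)      # part-of-speech factor
        in_title = 1.0 if word in title_terms else 0.0
        scores[word] = w_tf * tf + w_pos * pos + w_title * in_title
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


tokens = [("ontology", "noun"), ("ontology", "noun"), ("extract", "verb")]
ranked = score_candidates(tokens, title_terms={"ontology"})
```

The top-ranked words would then feed the later stages (word combination and candidate merging), which are not sketched here.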
Information extraction based on an information-item Ontology of Web pages cannot accurately partition the areas to be extracted. To address this weakness, an improved Web information extractor based on Ontology and DOM is designed. The paper uses the DOM tree to design an inductive learning algorithm that learns the paths of information items in sample Web pages. With this algorithm, the extraction areas can be partitioned accurately, the noise in sample Web pages can be reduced, and the Web pages can be preprocessed. The experiment shows that the improved approach increases the precision of information extraction.
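The idea of learning a DOM path from sample pages can be sketched as follows. This is not the paper's algorithm, which the abstract does not specify; it is a simplified stand-in that records the tag path of every text node, votes for the path at which a known sample value occurs, and then reuses the winning path on new pages. The names `PathCollector`, `induce_path` and `extract` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Record the tag path (e.g. 'html/body/div/h1') of every text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_paths = []  # list of (path, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:  # void elements never get a closing tag
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_paths.append(("/".join(self.stack), text))


def collect(html):
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.text_paths


def induce_path(samples):
    """samples: list of (html, known_value). Return the most frequent path."""
    votes = Counter()
    for html, value in samples:
        for path, text in collect(html):
            if text == value:
                votes[path] += 1
    return votes.most_common(1)[0][0] if votes else None


def extract(html, path):
    """Return the text nodes found at the learned path, ignoring the rest."""
    return [text for p, text in collect(html) if p == path]
```

Restricting extraction to the learned path is what discards noise such as advertisements located elsewhere in the tree.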
Representing the Classified Chinese Thesaurus as an XML document is important for improving the efficiency with which the thesaurus can be used. Based on an analysis of the logical relationships between the data tables of the Classified Chinese Thesaurus, this paper uses JDOM to generate the XML document for the thesaurus automatically.
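The paper uses JDOM (a Java library); the sketch below shows the same idea in Python with the standard `xml.etree.ElementTree` module as a stand-in. The element names (`thesaurus`, `term`, `label`, `relation`), the relation codes, and the in-memory table layout are assumptions for illustration and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def thesaurus_to_xml(terms, relations):
    """Build an XML string from two table-like inputs.

    terms:     {term_id: label}
    relations: list of (source_id, relation_type, target_id) rows
    """
    root = ET.Element("thesaurus")
    for term_id, label in terms.items():
        node = ET.SubElement(root, "term", id=str(term_id))
        ET.SubElement(node, "label").text = label
        # Attach every relation row whose source is this term.
        for source_id, relation_type, target_id in relations:
            if source_id == term_id:
                ET.SubElement(node, "relation",
                              type=relation_type, ref=str(target_id))
    return ET.tostring(root, encoding="unicode")


xml_text = thesaurus_to_xml(
    {1: "计算机", 2: "微型计算机"},
    [(1, "NT", 2), (2, "BT", 1)],  # narrower-term / broader-term links
)
```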
This paper introduces the design and implementation of a prototype system for acquiring and analyzing patent information. Patent search keywords in a given domain are expanded through concept retrieval, which improves search performance. Patent text information in the Web pages of the search results is then accurately parsed and extracted using XML technology. Finally, the system applies social network analysis methods to patent citation analysis.
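As one simple example of a social network measure applied to citations, the sketch below computes normalized in-degree centrality (how often each patent is cited). The abstract does not say which measures the system uses, so this choice, the function name `citation_in_degree`, and the patent identifiers are assumptions.

```python
def citation_in_degree(citations):
    """Normalized in-degree centrality for a directed citation network.

    citations: list of (citing_patent, cited_patent) pairs.
    Returns {patent: citations_received / (number_of_patents - 1)}.
    """
    received = {}
    for citing, cited in citations:
        received[cited] = received.get(cited, 0) + 1
        received.setdefault(citing, 0)  # make sure citing patents are nodes too
    n = len(received)
    if n < 2:
        return {patent: 0.0 for patent in received}
    return {patent: count / (n - 1) for patent, count in received.items()}


centrality = citation_in_degree([("P2", "P1"), ("P3", "P1"), ("P3", "P2")])
```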
This paper first analyzes the limitations of existing methods of aspect identification. It then presents a novel method that uses a Self-Organizing Map (SOM) to identify aspects from product reviews. A new SOM display named “Attribute Accumulative Matrix” is defined. To verify the validity of the method, product aspects are extracted from restaurant reviews on a website. The experimental results show that this approach can effectively extract product aspects.
This paper introduces the design of the union website and analyzes the related development work, including the hardware and software environment, the resource organization structure, and metadata harvesting based on OAI. Key issues in developing the union website are also analyzed, and the results achieved and future work are described at the end.
To trace the changes of a Web page, a search engine needs to save many snapshots of it, which increases storage usage on the server. This paper introduces delta encoding to save disk space. To help users easily understand both the global changes across all snapshots and the detailed changes between any two snapshots, the paper also introduces a visualization method.
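Delta encoding of snapshots can be sketched with Python's standard `difflib`: instead of storing every snapshot in full, only the first one is kept and each later version is stored as copy/insert instructions against its predecessor. The delta format and the names `make_delta` and `apply_delta` are illustrative assumptions, not the paper's encoding.

```python
import difflib


def make_delta(old_lines, new_lines):
    """Encode new_lines as copy/insert operations against old_lines."""
    delta = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))            # reuse old lines i1..i2
        elif tag in ("replace", "insert"):
            delta.append(("insert", new_lines[j1:j2]))  # store only new text
        # "delete" needs no entry: those old lines are simply never copied.
    return delta


def apply_delta(old_lines, delta):
    """Rebuild the newer snapshot from the older one plus its delta."""
    rebuilt = []
    for op in delta:
        if op[0] == "copy":
            rebuilt.extend(old_lines[op[1]:op[2]])
        else:
            rebuilt.extend(op[1])
    return rebuilt


snapshot_1 = ["<h1>News</h1>", "<p>old story</p>", "<p>footer</p>"]
snapshot_2 = ["<h1>News</h1>", "<p>new story</p>", "<p>footer</p>"]
delta = make_delta(snapshot_1, snapshot_2)
restored = apply_delta(snapshot_1, delta)
```

The same opcodes could also drive a visualization, since they mark exactly which regions were kept, replaced, or removed between two snapshots.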
This paper extends the analysis function of a library service quality evaluation system in three respects: analysis of readers' characteristics, analysis of the correlation between service quality and readers' willingness to use the library, and analysis of the influence of service quality evaluation dimensions on readers' willingness to use the library. Correlation and regression analysis methods are then studied to implement the function. The work can serve as a reference for extending service quality evaluation systems for libraries.
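The correlation and regression analyses mentioned above can be illustrated with a minimal sketch that computes the Pearson correlation coefficient and a simple least-squares line. The function name and the sample scores are made up for illustration and are not data from the paper.

```python
def pearson_and_ols(x, y):
    """Return (r, slope, intercept) for paired observations x and y.

    r is the Pearson correlation; slope and intercept define the
    least-squares line y = slope * x + intercept.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical survey scores: perceived service quality vs. willingness to use.
quality = [1, 2, 3, 4]
willingness = [2, 4, 6, 8]
r, slope, intercept = pearson_and_ols(quality, willingness)
```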
This paper introduces a new implementation of a site search engine, built as a secondary development on top of Google, Baidu and other common search engines. Compared with similar applications, this search engine supports multi-domain-name search and can search the main site, sub-sites and similar sites simultaneously, and its results page carries no advertisements or promotional information.
This paper introduces a management system that takes digital resource collections as its operating objects. The overall design of the system is described in detail, including collection granularity, collection description metadata, and the functional operations on collections. The implementation details of some relevant parts of the system are also described.
This paper first describes the Ezproxy custom authentication script and the ILASII Web authentication process. It then presents the system architecture of a portal integrated with ILASII user information authentication and the Ezproxy remote access system. Finally, it describes the implementation of reader status checking, secure passwords and group authentication.