The paper introduces the major general-purpose Ontology libraries in China and abroad (WordNet, DBpedia, Cyc, and HowNet) and two successful domain-specific Ontology libraries (Biomedical Ontology and Enterprise Ontology). It then compares and analyzes them in five respects: description language, storage mode, query language, platform construction, and application, in order to support research on Ontology libraries and their applications.
Ontology integration is a process that eliminates Ontology heterogeneity in order to achieve the highest level of semantic communication and semantic integration, and ultimately knowledge reuse and interoperability. The paper reviews four main methods and five main tools for Ontology integration and gives a comparative analysis of them.
To resolve the conflict between generality and specificity that exists between Ontology concepts and natural language words, this paper takes the WordNet thesaurus and the SUMO Ontology as research objects. It briefly introduces both, analyzes in detail the motivations for mapping between them, proposes a mapping model that links natural language words, WordNet synsets, and SUMO Ontology concepts, and examines in depth the mapping instances, mapping effects, and applications of the mappings between WordNet synsets and SUMO Ontology concepts. The authors hope that better use of the mapping relations between WordNet and SUMO can resolve this conflict and allow Ontologies to be applied more widely in intelligent retrieval, semantic classification, data mining, and related areas.
To address the problems of traditional text classification methods and of current semantic classification methods, a new text classification model based on the integration of SUMO and WordNet is proposed. The model uses the mapping relations between WordNet synsets and SUMO Ontology concepts to map the terms of the document-term vector space to the corresponding Ontology concepts, forming a document-concept vector space in which texts are classified automatically. The experimental results show that the proposed method greatly reduces the dimensionality of the vector space and improves text classification performance.
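The term-to-concept step described in the abstract above can be illustrated with a minimal sketch. The mapping table `TERM_TO_CONCEPT` and the concept labels in it are hypothetical placeholders, not actual SUMO identifiers; a real system would derive the table from the WordNet synset to SUMO concept mappings. Keeping unmapped terms as their own dimension is also an assumption of this sketch.

```python
from collections import Counter

# Hypothetical term -> concept table (placeholder labels, not real SUMO names).
TERM_TO_CONCEPT = {
    "car": "Vehicle",
    "automobile": "Vehicle",
    "truck": "Vehicle",
    "doctor": "MedicalProfessional",
    "physician": "MedicalProfessional",
}


def to_concept_vector(tokens, mapping=TERM_TO_CONCEPT):
    """Count concepts instead of terms; unmapped terms stay as themselves."""
    return Counter(mapping.get(tok, tok) for tok in tokens)


doc = ["car", "automobile", "truck", "doctor", "physician", "road"]
term_vector = Counter(doc)               # one dimension per distinct term
concept_vector = to_concept_vector(doc)  # one dimension per distinct concept

print(len(term_vector), len(concept_vector))
```

In this toy document, several synonymous terms collapse into a single concept dimension, which is the source of the dimensionality reduction the abstract reports.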
This paper classifies related-article retrieval from a bibliometric perspective, analyzes the key technologies involved in its implementation, and focuses on the text similarity computation algorithms, main research history, and recent progress of the PubMed and CBM systems. After outlining the evaluation methods and indicators, the paper analyzes the effectiveness of related-article retrieval from both positive and negative aspects. Finally, it discusses directions for the development of related-article retrieval.
Designing system functions that address non-user behavior can increase system utilization. This article systematically introduces the relevant studies and the types of non-users, then proposes a non-user theory and discusses the use of scenario analysis, personas, and living laboratories to capture non-user behavior. It can serve as a guideline for enhancing and improving digital library service systems.
To address the term mismatch problem of existing information retrieval systems, a novel pseudo-relevance-feedback query expansion algorithm based on feature term extraction and correlation fusion is proposed, together with a new method for computing the weights of expansion terms. The algorithm extracts feature terms related to the original query from the top n retrieved local documents, and then selects the final expansion terms according to the frequency with which each feature term appears in the local documents and the correlation between each feature term and the entire original query. The experimental results show that the method is effective and improves information retrieval performance.
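A minimal sketch of the selection step described in the abstract above follows. The abstract does not give the scoring formula, so the one used here is an assumption: a candidate's score is its document frequency in the top-n local documents multiplied by the fraction of query terms it co-occurs with. The function name `expand_query` and the parameters `n` and `k` are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, n=3, k=2):
    """Return the k best (term, score) pairs drawn from the top-n documents.

    ranked_docs is a list of token lists, best-ranked first.
    """
    local_docs = [set(doc) for doc in ranked_docs[:n]]
    query = set(query_terms)
    if not local_docs or not query:
        return []
    # Document frequency of each candidate term in the local document set.
    doc_freq = Counter(t for doc in local_docs for t in doc if t not in query)
    scores = {}
    for term, df in doc_freq.items():
        # Correlation with the whole query (assumed definition): number of
        # query terms that share at least one local document with the term.
        co = sum(
            any(term in doc and q in doc for doc in local_docs) for q in query
        )
        scores[term] = (df / len(local_docs)) * (co / len(query))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


docs = [
    ["ontology", "mapping", "wordnet", "semantic"],
    ["ontology", "wordnet", "synset"],
    ["ontology", "semantic", "retrieval"],
    ["cooking", "recipe"],
]
expansion = expand_query(["ontology", "mapping"], docs, n=3, k=2)
print([term for term, _ in expansion])
```

Terms that appear in more of the local documents and alongside more of the query terms rank higher; the fourth document is ignored because it falls outside the top n.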
This paper introduces the combination of query fusion and relevance feedback methods. After analyzing previous strategies for selecting the TopN documents, it puts forward a query fusion algorithm that uses a correlation coefficient to select a variable number of TopN documents for query expansion, called the variable TopN feedback-based query fusion algorithm. Fixed and variable TopN query fusion experiments are analyzed separately, and the test results show that the variable TopN feedback method improves retrieval performance to some extent.
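One possible reading of "variable TopN" selection is sketched below. The abstract does not say which correlation coefficient is used or how the cut-off is chosen, so this sketch assumes the Pearson coefficient between the query vector and each document vector, and stops at the first ranked document that falls below a threshold. The names `select_feedback_docs`, `threshold`, and `max_n` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_feedback_docs(query_vec, ranked_doc_vecs, threshold=0.5, max_n=10):
    """Keep leading documents until one correlates too weakly with the query."""
    selected = []
    for vec in ranked_doc_vecs[:max_n]:
        if pearson(query_vec, vec) < threshold:
            break  # assumed stopping rule: cut the list at the first weak doc
        selected.append(vec)
    return selected


query = [1, 1, 0, 0]
ranked = [[2, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
feedback = select_feedback_docs(query, ranked, threshold=0.5)
print(len(feedback))
```

Unlike a fixed TopN, the number of feedback documents here depends on the query: raising `threshold` shortens the list, lowering it lengthens the list up to `max_n`.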
To detect hot topics in a research field, the paper proposes a method that combines co-word analysis with self-organizing maps (SOM). The co-occurrence of high-frequency keywords in the literature is used as input data, SOM Toolbox is used for SOM clustering, and the resulting clusters give the set of hot research topics. Finally, a case study on traditional medicine is carried out, and the experimental results show that the method is efficient at detecting hot research topics.
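The input-preparation step mentioned in the abstract above, building a co-occurrence matrix of high-frequency keywords, can be sketched as follows. The SOM clustering itself is not reproduced here. The function name `cooccurrence_matrix`, the `top_k` cut-off, and the sample records are illustrative assumptions, not data from the paper.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_matrix(records, top_k=3):
    """Return (keywords, matrix) for the top_k most frequent keywords.

    records is a list of keyword lists, one list per article.
    """
    freq = Counter(kw for rec in records for kw in set(rec))
    # Most frequent first; ties broken alphabetically for determinism.
    keywords = sorted(freq, key=lambda kw: (-freq[kw], kw))[:top_k]
    index = {kw: i for i, kw in enumerate(keywords)}
    matrix = [[0] * len(keywords) for _ in keywords]
    for rec in records:
        present = sorted(kw for kw in set(rec) if kw in index)
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return keywords, matrix


records = [
    ["acupuncture", "herbal medicine", "pain"],
    ["acupuncture", "pain"],
    ["herbal medicine", "pharmacology"],
    ["acupuncture", "herbal medicine"],
]
keywords, matrix = cooccurrence_matrix(records, top_k=3)
print(keywords)
print(matrix)
```

Each row of the matrix is the co-occurrence profile of one keyword and could serve as one input vector for a clustering step such as SOM.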
To address the problem that some texts are irregularly structured, this paper presents a method for evaluating text structure based on complex network theory. Each sentence is represented by a node, and an edge between two nodes represents a word shared by the two sentences; together these form the complex network of a text. The authors then analyze the characteristics of the text structure through the topological characteristics of this network. A text complex network is built for a selected article, and its degree, strength, shortest paths, and weighted clustering coefficients are calculated. The results show that the proposed method can effectively evaluate the structure of the text content. Moreover, the results provide useful references for understanding the main ideas of a given text, generating summaries, and filtering text retrieval results.
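The network construction described in the abstract above can be sketched in a few lines. This sketch assumes whitespace tokenization and takes the edge weight to be the number of distinct words two sentences share; it computes only degree and strength (weighted degree), leaving out shortest paths and weighted clustering coefficients. Function names are hypothetical.

```python
from itertools import combinations


def build_text_network(sentences):
    """Map (i, j) sentence pairs to the number of distinct words they share."""
    word_sets = [set(s.lower().split()) for s in sentences]
    edges = {}
    for i, j in combinations(range(len(word_sets)), 2):
        shared = len(word_sets[i] & word_sets[j])
        if shared:
            edges[(i, j)] = shared
    return edges


def degree_and_strength(n_nodes, edges):
    """Degree counts neighbors; strength sums the weights of incident edges."""
    degree = [0] * n_nodes
    strength = [0] * n_nodes
    for (i, j), weight in edges.items():
        degree[i] += 1
        degree[j] += 1
        strength[i] += weight
        strength[j] += weight
    return degree, strength


sentences = [
    "ontology maps words to concepts",
    "concepts reduce vector dimensions",
    "words and concepts differ",
    "retrieval improves",
]
edges = build_text_network(sentences)
degree, strength = degree_and_strength(len(sentences), edges)
print(edges)
print(degree, strength)
```

The last sentence shares no word with the others and is therefore an isolated node, the kind of structural signal such a network is meant to expose.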
This paper compares twelve Chinese and English Q&A communities in terms of basic information, interaction, and personalized services. A Q&A experiment covering four types of questions in three fields is also conducted to evaluate these communities on the quality and efficiency of their answers, among other criteria. The research results yield some suggestions for the development strategies of Q&A communities.
The co-word clustering method is improved in the following ways: high-frequency words are selected according to a formula derived from Zipf's law; adhesive force is used to identify the core major MeSH words that label the content of each cluster; and a contrastive analysis of two periods helps to reveal changes in topics. Bibliographic data on medical informatics are collected from PubMed for two periods (1999-2003 and 2004-2008). The major MeSH words of the articles in each period are extracted separately and clustered by co-word analysis, so that the evolution of the subject structure can be explored by comparing the two periods.
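The abstract above does not state which Zipf-derived formula is used. One formula commonly used for this purpose in co-word studies is Donohue's high-frequency/low-frequency boundary, T = (-1 + sqrt(1 + 8 * I1)) / 2, where I1 is the number of words that occur exactly once. The sketch below assumes that formula and treats words with frequency strictly above T as high-frequency; the sample frequencies are invented for illustration.

```python
import math
from collections import Counter


def donohue_threshold(freqs):
    """Boundary T = (-1 + sqrt(1 + 8 * I1)) / 2, with I1 = words seen once."""
    i1 = sum(1 for f in freqs.values() if f == 1)
    return (-1 + math.sqrt(1 + 8 * i1)) / 2


def high_frequency_words(freqs):
    """Words whose frequency is strictly above the boundary (assumed rule)."""
    t = donohue_threshold(freqs)
    return sorted(w for w, f in freqs.items() if f > t)


# Invented frequencies: three recurring headings plus six one-off words.
freqs = Counter({
    "Medical Informatics": 9, "Internet": 5, "Algorithms": 3,
    "w1": 1, "w2": 1, "w3": 1, "w4": 1, "w5": 1, "w6": 1,
})
threshold = donohue_threshold(freqs)
selected = high_frequency_words(freqs)
print(threshold, selected)
```

With six singleton words the boundary is 3, so only headings occurring more than three times are kept for clustering.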
To prevent and control the mismanagement problems that exist in ETL and to ensure the efficient implementation of the data warehouse, the paper designs a CWM-based ETL metadata system model. The model can describe the specific steps of data transformation, and the specific system modules are designed according to it, so that the ETL process can be managed effectively.
Because the introductions of primary and secondary schools contain few feature items with unequal weights, the authors apply denoising and fuzzy-set-based processing of synonym features to build category vocabularies, and then classify the short texts with a classification model based on category vocabularies and fuzzy rules. The results show that, for short texts with few feature items and unevenly distributed weights, classification with fuzzy rules outperforms VSM, Rocchio, and other classification algorithms.
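As a rough illustration of classifying with category vocabularies and fuzzy memberships, consider the sketch below. The abstract does not specify the rules, so everything here is an assumption: each vocabulary entry carries a membership degree in [0, 1], a text's membership in a category is the maximum over its matched terms (a fuzzy OR), and the category with the highest membership wins. The vocabulary contents and the name `classify` are hypothetical.

```python
# Hypothetical category vocabularies: term -> membership degree in [0, 1].
CATEGORY_VOCAB = {
    "primary": {"pupil": 0.9, "grade": 0.6, "elementary": 1.0},
    "secondary": {"senior": 0.9, "grade": 0.4, "exam": 0.7},
}


def classify(tokens, vocab=CATEGORY_VOCAB):
    """Return (best_category, scores) using max-membership (fuzzy OR)."""
    scores = {
        category: max((memberships.get(t, 0.0) for t in tokens), default=0.0)
        for category, memberships in vocab.items()
    }
    # Iterate categories alphabetically so ties resolve deterministically.
    best = max(sorted(scores), key=scores.get)
    return best, scores


label, scores = classify(["grade", "exam", "school"])
print(label, scores)
```

Because a single strongly indicative term is enough to decide the category, a rule of this kind does not depend on long texts with many features, which matches the short-text setting the abstract describes.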