[Objective] This study examines the construction of the Institutional Repository of the Academy of Military Medical Sciences (AMMS IR), in order to promote the scientific organization and the centralized disclosure, storage, management and reuse of AMMS knowledge assets. [Context] In full compliance with DSpace development principles, AMMS IR is built on a B/S architecture in Java, with PostgreSQL as the underlying database. [Methods] The DSpace-Core API is redesigned: its core logic and functionality are retained, while an "event mechanism", a "plug-in mechanism", an "Access chain" and other mechanisms are added, instead of merely relying on the default DSpace presentation layer. [Results] With Solr as the search engine, progressive faceted search and browsing, technical file management, and data analysis for institutions and authors are realized. [Conclusions] Useful exploration and practice are carried out in faceted retrieval and semantic analysis for institutional repositories built with DSpace.
[Objective] Through analysis, summarization and experiment, this paper proposes an effective method of semantic knowledge acquisition, in order to provide a theoretical basis and a feasible technological route for the semantization of Institutional Repositories. [Methods] Based on a contrastive analysis of semantic knowledge acquisition methods in China and abroad, the paper proposes a system framework of semantic knowledge acquisition for Institutional Repositories, analyzes its key technologies in depth, and then conducts an experimental study on the CAS IR GRID. [Results] The method automatically and effectively acquires semantic knowledge from the data and entity-relationship structure of the underlying Institutional Repository's relational database and converts it into RDF triples for browsing and searching. [Limitations] Defining reasonable and effective mapping rules may require domain expert evaluation, considerable manual intervention and repeated experiments. Semantic knowledge acquisition and relevance analysis for the same entity across different Institutional Repositories are not covered. [Conclusions] This study helps follow-up researchers and developers quickly understand and master the method and key technologies of semantic knowledge acquisition, laying a foundation for enhancing the knowledge service capabilities of Institutional Repositories.
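The core relational-to-RDF conversion step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual framework: the "item" table, its columns, and the base URI are all hypothetical.

```python
# Minimal sketch of mapping relational rows to RDF triples.
# Table name, column names, and base URI are hypothetical; real mapping
# rules would require domain-expert evaluation, as the paper notes.

BASE = "http://example.org/ir/"

def row_to_triples(table, row):
    """Map one relational row to (subject, predicate, object) triples.
    The primary key becomes the subject URI; other columns become predicates."""
    subject = f"{BASE}{table}/{row['id']}"
    triples = []
    for column, value in row.items():
        if column == "id":
            continue
        predicate = f"{BASE}vocab/{table}#{column}"
        triples.append((subject, predicate, value))
    return triples

rows = [{"id": 1, "title": "Semantic IR", "author": "Li"}]
triples = [t for row in rows for t in row_to_triples("item", row)]
for s, p, o in triples:
    print(f'<{s}> <{p}> "{o}" .')  # serialize as N-Triples-style lines
```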
[Objective] This paper builds a book recommender system based on folksonomy, which forms triple relations among users, resources and tags. [Methods] The paper calculates cosine similarities and weights of books and tags, and uses sparse vector representation for each resource's input matrix to compress the sparse matrix. [Results] Experimental results show that the book weights vary from 0 to 200 and the tag weights follow a power-law distribution. The recommendations are then evaluated with the AP and MAP indicators. [Limitations] Sufficient data could not be obtained from library catalogs, so additional data were collected from book.douban.com. [Conclusions] The recommender system can help OPACs improve their functions and personalized services.
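The cosine similarity over sparse tag vectors can be sketched roughly as below. The tag names and weights are made up, and storing each vector as an {index: weight} dict is just one common way to compress a sparse matrix, not necessarily the paper's exact encoding.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {index: weight} dicts.
    Zero entries are simply absent, so only non-zero dimensions are touched."""
    dot = sum(w * v.get(i, 0.0) for i, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical tag-weight vectors for two books (tag -> weight).
book_a = {"novel": 3.0, "history": 1.0}
book_b = {"novel": 2.0, "war": 4.0}
sim = cosine(book_a, book_b)
```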
[Objective] To help users retrieve and read the literature of a topic progressively, from introductory to advanced. [Context] Literature recommendation is one of the core services of a digital library and plays an important role in readers' literature searching and querying. [Methods] This paper introduces a user-searching-behaviour Common evolution pAtterns based Literature retrievaL method (CALL for short). First, it extracts features of the literature, readers and retrieval logs; it then clusters the literature into n stages, uses the longest common subsequence method to mine frequent article-name sequences exceeding the length and frequency thresholds, and finally outputs these frequent subsequences as recommendations. [Results] Extensive experiments on real literature and retrieval-log datasets demonstrate the accuracy, efficiency and scalability of the method, which can enrich the recommendation capability of digital libraries. [Conclusions] The proposed method can greatly enhance the efficiency of existing literature recommendation systems and diversify the directions of literature recommendation.
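The longest-common-subsequence step can be illustrated with a standard dynamic-programming sketch; the article-name sequences below are hypothetical reading sessions, not data from the paper.

```python
def lcs(a, b):
    """Longest common subsequence of two sequences of article names,
    via the classic bottom-up dynamic program."""
    m, n = len(a), len(b)
    dp = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                dp[i + 1][j + 1] = dp[i][j] + [a[i]]
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j], key=len)
    return dp[m][n]

# Two hypothetical per-reader article-name sequences; their LCS captures
# a shared reading order that could be surfaced as a recommendation.
s1 = ["intro", "survey", "method", "advanced"]
s2 = ["intro", "method", "evaluation", "advanced"]
common = lcs(s1, s2)
```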
[Objective] To fully exploit expert resources, the authors carry out information fusion research based on multi-sensor expert features. [Methods] First, from the perspective of sensor workflow, the paper presents three methods based on a knowledge sensor, a Web sensor and a social network sensor in sequence. Second, focusing on resource balancing degree, it designs a multi-sensor expert feature recognition method to resolve the conflicts among the three obtained feature vectors. [Results] When matching expert features against C-DBLP, the similarity is close to 39%, which is acceptable among similar methods. [Limitations] On one hand, many of the identified experts are from universities and institutes, so academic resources account for a large share of the feature recognition; on the other hand, the site collection for the Web sensor can be further extended. [Conclusions] Under controlled relationships between keywords, this method can be applied in many scenarios, such as the construction of expert teams and the recommendation and retrieval of experts.
[Objective] This paper realizes user task-oriented query suggestion at the session level based on the AOL query log dataset. [Methods] It first measures the relationship between queries based on user tasks, and then realizes user task-oriented query recommendation by applying a random walk to traverse the graph model. [Results] The results show that this query recommendation method outperforms methods that measure query relationships using only query co-occurrence information. [Limitations] Spelling correction is not applied to misspelled candidate queries; query recommendation is not realized at the query level; and the recommendation effect for rare and ambiguous queries is poor. [Conclusions] Measuring the relationship between queries based on user tasks can improve the performance of query recommendation.
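A random walk over a query graph can be sketched as below. The graph, edge weights, restart probability and step count are illustrative assumptions, not the paper's settings; the variant shown is the common random walk with restart.

```python
def random_walk(graph, start, restart=0.15, steps=50):
    """Random walk with restart over a query graph given as an adjacency
    dict {node: {neighbor: weight}}; returns approximate visit probabilities."""
    nodes = list(graph)
    prob = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(steps):
        nxt = {n: 0.0 for n in nodes}
        for n, p in prob.items():
            out = graph.get(n, {})
            total = sum(out.values())
            if total:  # spread (1 - restart) of the mass along weighted edges
                for m, w in out.items():
                    nxt[m] += (1 - restart) * p * w / total
            else:      # dangling node: send the mass back to the start query
                nxt[start] += (1 - restart) * p
        nxt[start] += restart  # restart mass always returns to the start query
        prob = nxt
    return prob

# Hypothetical task-based query graph: edge weights encode task relatedness.
g = {"laptop": {"notebook": 2.0, "macbook": 1.0},
     "notebook": {"laptop": 2.0},
     "macbook": {"laptop": 1.0}}
scores = random_walk(g, "laptop")
suggestions = sorted((n for n in scores if n != "laptop"),
                     key=scores.get, reverse=True)
```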
[Objective] Inspired by Lesk's research on word sense disambiguation, an approach based on term definitions is proposed to find synonyms. [Methods] The experiment builds a test set from the Chinese scientific and technical vocabulary system (new energy vehicles). First, Chinese word segmentation, part-of-speech tagging and manual correction are applied to the term definitions. Then the verb and noun content words are extracted, and the similarity of two terms is calculated from the number of content words their definitions share and the positions of those shared words. Finally, synonym relations are recommended according to the similarity and a given threshold. [Results] Precision, recall and F-measure are used to evaluate the synonyms found and demonstrate the effectiveness of the method. The results show that the method achieves high precision, but recall is low. [Limitations] The method cannot exclude terms with antonymous or merely related relationships, resulting in a lower recall rate. [Conclusions] The method is simple and effective and achieves high accuracy, while a higher recall rate remains to be achieved.
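The definition-overlap similarity can be sketched roughly as below; the content-word lists, the position-discount factor and the threshold are illustrative assumptions, not the paper's actual scoring formula.

```python
def definition_similarity(def_a, def_b, position_weight=0.1):
    """Similarity of two term definitions, given as ordered lists of content
    words (verbs and nouns). Each shared word contributes 1, discounted by
    how far apart its positions are; the discount factor is an assumption."""
    score = 0.0
    for i, w in enumerate(def_a):
        if w in def_b:
            j = def_b.index(w)
            score += 1.0 / (1.0 + position_weight * abs(i - j))
    denom = max(len(def_a), len(def_b))
    return score / denom if denom else 0.0

# Hypothetical content words from two term definitions.
d1 = ["vehicle", "powered", "battery", "motor"]
d2 = ["vehicle", "driven", "battery", "electricity"]
sim = definition_similarity(d1, d2)
threshold = 0.4  # hypothetical threshold for recommending a synonym relation
is_synonym = sim >= threshold
```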
[Objective] To comprehensively analyze feature extraction methods and improve the traditional feature extraction process. [Methods] The paper first uses a feature pool to pre-extract features, then extracts the best feature set with a genetic algorithm using group coding. [Results] When the fitness function uses the KNN classification algorithm, the proposed method shows the best performance, and the effect is more obvious with fewer feature dimensions. The method also shows better stability in text classification across different feature dimensions and corpora. [Limitations] The corpus is not abundant enough; only IG and CHI are used to build the feature pool; group coding ignores semantic relationships among words; and the population size and number of iterations of the genetic algorithm are restricted by experimental conditions. [Conclusions] Adding a feature pool to pre-extract features improves the stability of text classification, and adding a genetic algorithm to feature extraction makes classification more accurate. Group coding in the genetic algorithm reduces feature overfitting and improves efficiency.
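The genetic-algorithm selection loop can be sketched with a toy fitness function. The paper's actual fitness is KNN classification accuracy, so the per-feature scores, size penalty and GA parameters below are placeholders, not the paper's configuration.

```python
import random

random.seed(42)

# Hypothetical per-feature scores (e.g., IG/CHI values from a feature pool).
feature_scores = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.6, 0.15]
N = len(feature_scores)
PENALTY = 0.3  # cost per selected feature, to favor compact subsets

def fitness(chrom):
    """Toy fitness: reward high-scoring features, penalize subset size."""
    return sum(s for bit, s in zip(chrom, feature_scores) if bit) \
        - PENALTY * sum(chrom)

def evolve(pop_size=20, generations=40):
    """Elitist GA over bit-string chromosomes (1 = feature selected)."""
    pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # keep the fitter half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N)               # low-rate point mutation
            child[i] ^= random.random() < 0.1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
selected = [i for i, bit in enumerate(best) if bit]
```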
[Objective] Aiming at the problem of quality control in sentiment analysis research, the paper constructs a filter model to select more suitable reviews. [Methods] It selects four indexes, namely product words, review length, emotional strength and adjunct words, as judgment references, and builds the model with multiple linear regression on data from a shopping website. [Results] The four indexes are related to review quality, and the filter model achieves high recall and precision, providing a new method for selecting data sources in sentiment analysis research. [Limitations] Data scarcity limits the filter model. [Conclusions] The model can judge the quality of customer reviews within the permitted error range.
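Fitting such a filter by multiple linear regression can be sketched with ordinary least squares on the normal equations. The two features (review length and emotional strength) and the sample data below are made up for illustration; the paper uses four indexes and real shopping-site data.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(A, b):
    """Gaussian elimination with partial pivoting for the system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(x * yv for x, yv in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Columns: intercept, review length, emotional strength (made-up data);
# y is a hypothetical quality score for each review.
X = [[1, 50, 2], [1, 120, 5], [1, 30, 1], [1, 200, 4], [1, 80, 3]]
y = [2.0, 4.5, 1.5, 5.0, 3.0]
beta = ols(X, y)
predicted = [sum(b * xi for b, xi in zip(beta, row)) for row in X]
```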
[Objective] This paper proposes a supply-chain-oriented method for mining customer focus features. [Methods] Association rule mining is improved by adding data preprocessing, which includes a product evaluation concept tree, a product evaluation feature database and the MA_Apriori algorithm. Using tablet PC data from Jingdong Mall, the experiment mines customer focus features in Weka. [Results] The experiments show that the recall ratio of the new method is 90.5%, versus 71.4% for the plain association rule method. In addition, it yields hierarchical and standardized product features. [Limitations] To ensure segmentation accuracy, the user dictionary of the segmentation system needs to be supplemented with product-specific vocabulary. [Conclusions] This approach can help enterprises flexibly select product evaluation concept hierarchies and thereby improve the quality of products and services.
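The frequent-itemset core of Apriori, on which a variant such as MA_Apriori builds, can be sketched as follows; the review transactions and support threshold are illustrative, not the paper's data.

```python
def apriori(transactions, min_support):
    """Frequent itemsets via Apriori: grow candidates level by level,
    keeping only itemsets whose support meets the threshold."""
    n = len(transactions)

    def support(s):
        return sum(s <= t for t in transactions) / n

    frequent = {}
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: union pairs of frequent (k-1)-itemsets, then prune.
        level = {a | b for a in level for b in level
                 if len(a | b) == k and support(a | b) >= min_support}
    return frequent

# Hypothetical preprocessed reviews: each transaction is the set of
# evaluation features mentioned in one tablet-PC review.
reviews = [
    {"screen", "battery"}, {"screen", "battery", "price"},
    {"screen", "price"}, {"battery", "price"}, {"screen", "battery"},
]
freq = apriori([frozenset(r) for r in reviews], min_support=0.6)
```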
[Objective] The article studies automatic identification of book pages and extraction of their thematic information, taking relevant book pages as the research objects. [Methods] Based on analysis of the label usage, layout structure and thematic information representation of different book pages, the article establishes a model for automatic book-page identification and thematic information extraction by defining general rules and using co-occurrence words and page analysis. [Results] The results show that the model's book-page identification rate on general Web sites reaches nearly 80%, and the average extraction rate of thematic information across kinds of book pages reaches nearly 79%. [Limitations] The threshold-setting method comprehensively considers the Web information characteristics of various types of books, but misjudgments occur for webpages with extremely atypical features; further improvement of the algorithm may help. [Conclusions] The method obtains ideal results for automatic identification of all kinds of book pages and thematic information extraction, has strong universality, and lays a foundation for research on the organization, management and automatic classification of book Web page information.
[Objective] Software to identify a scientific institute's authors of scientific papers is designed to meet the demands of statistics on SCI-indexed papers. [Context] It helps departments performing statistical analysis of SCI papers determine which Chinese author names, in Chinese characters, belong to their institute and its corresponding labs. [Methods] Author discrimination is implemented by comprehensively exploiting a characteristic of scientific research that people from the same research unit are more likely to co-author papers, together with custom unique keywords or co-authors and the text features of author fields in SCI. [Results] Automated, highly accurate author discrimination can be achieved by maintaining a personnel list of the institute. [Conclusions] It effectively solves the problem of duplicate Chinese names in the analysis of SCI papers, and its design ideas also apply to other databases such as EI and Inspec.
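The co-author heuristic can be sketched roughly as below; the personnel list, the name strings, and the overlap threshold are illustrative assumptions, not the software's actual rules.

```python
# Hedged sketch: attribute an SCI author string to the institute and its lab
# when enough of the paper's other authors also appear on a maintained
# personnel list. All names, labs, and the threshold are hypothetical.

personnel = {"Wang, L": "Lab of Virology", "Zhang, Q": "Lab of Virology",
             "Liu, H": "Lab of Immunology"}

def attribute_author(target, coauthors, min_overlap=1):
    """Return the target author's lab if co-author evidence supports
    attributing this paper's 'target' name to the institute, else None."""
    if target not in personnel:
        return None
    known = sum(c in personnel for c in coauthors)
    return personnel[target] if known >= min_overlap else None

lab = attribute_author("Wang, L", ["Zhang, Q", "Smith, J"])
```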
[Objective] To reduce users' perception and decision-making costs in obtaining Web life service information. [Context] Life service information in the Web environment needs to be combined with users' situations to help them obtain information quickly. [Methods] This paper summarizes four kinds of ordinary user requirements, draws on trip chain theory and the Bertin coding principle from information visualization, designs an algorithm based on the properties of weighted graphs, and implements the visualization of Web life service information. [Results] Taking group-buying life service information as an example, the paper realizes an interactive prototype. [Conclusions] It is verified that Web life service information visualization can help users quickly form a psychological orientation.
[Objective] To analyze existing product evaluation models in electronic commerce, identify their shortcomings, and propose a new model to remedy them. [Methods] 1,687 microblog posts about a product are collected from the largest microblogging platform in China, and the sample data are analyzed and modeled by text sentiment classification. [Results] Analyzing the microblog posts about a product and summarizing their inherent semantic information shows that they can be used to evaluate product characteristics; because these data are generated spontaneously, the analysis results are more objective. [Limitations] Analysis of a larger data sample is not fully covered, nor is product evaluation based on dynamic microblog data. [Conclusions] The analysis indicates that this model overcomes the weaknesses of the original ones to a certain extent and may accordingly draw more companies' attention to microblog-based product evaluation information.
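Text sentiment classification of posts can be illustrated with a minimal lexicon-based sketch; the word lists and posts below are made up and this is not the paper's actual model or data.

```python
# Minimal lexicon-based sentiment sketch for product microblog posts.
# The lexicons and example posts are illustrative assumptions.

POSITIVE = {"great", "fast", "love", "reliable"}
NEGATIVE = {"slow", "broken", "hate", "noisy"}

def sentiment(text):
    """Classify a post as 'positive', 'negative', or 'neutral' by counting
    lexicon hits in its lowercased, whitespace-split tokens."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

posts = ["Love this tablet, fast and reliable",
         "Screen is broken and the fan is noisy",
         "Just bought it today"]
labels = [sentiment(p) for p in posts]
```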