[Objective] This survey aims to better understand the current state of institutional repositories (IRs) in mainland China and to recommend a strategy for their future development. [Methods] An online questionnaire survey was conducted on 130 selected samples. [Results] Many repositories in China have developed their content deposit, system platforms and management policies to varying degrees. [Limitations] The scale of the survey is limited, and the analysis focuses on institutions that have already developed repositories. [Conclusions] The report comprehensively summarizes and analyzes the current state of institutional repositories in Chinese academic and research libraries. It recommends compiling best practices for institutional repositories in the future.
[Objective] To better query the Ontology that corresponds to a user’s demand and then find the corresponding service. [Methods] The authors transform the user’s description of a service into an Ontology in the grid service and, on the basis of the IOPE algorithm, match the content step by step and quantitatively analyze the whole process, thereby improving the existing service Ontology matching algorithm. [Results] The method matches the Ontology in the grid service description against the Ontologies stored in the grid service repository. Simulation results show that it improves the accuracy of grid service Ontology matching. [Limitations] The improved algorithm is validated on a simulation platform but has not yet been verified in a more complex grid environment. [Conclusions] The new algorithm supports quantitative analysis of the full matching process and effectively improves precision.
[Objective] To improve the classification of bibliographic information for books, journal articles and similar materials. [Context] Classification performance under the traditional vector space model (VSM) is unsatisfactory, whereas the LDA model can effectively improve classification by mining implied semantic information. [Methods] Each text is represented by implied topics using the LDA model, the optimal number of topics is determined from the classification results, and the SVM classification algorithm is then applied. [Results] Experiments show that Macro_F1 reaches 95.5% and 93.5% on the Fudan and Sogou corpora respectively, and 77.4% and 87.6% on real data from a catalogue and an electronic journal database respectively. [Conclusions] Compared with the VSM, classification performance on the real data increases by 10% and 3% respectively, reaching a practical level.
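The Macro_F1 figures reported above are unweighted averages of per-class F1 scores. As a minimal sketch of that standard metric (not the authors' evaluation code; the function name `macro_f1` and the toy labels are illustrative assumptions), it could be computed as follows:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example with two classes (made-up labels, not data from the paper).
score = macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"])
```

Because every class contributes equally regardless of its size, Macro_F1 penalizes poor performance on rare categories, which matters for skewed bibliographic classification schemes.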
[Objective] To find synonymous relations for knowledge organization system integration. [Methods] This paper presents an automatic algorithm consisting of lemmatization and semantic merging, together with several methods to control the effects induced by vocabulary granularity. [Results] Its efficiency and effectiveness are demonstrated by large-scale tests on many source vocabularies, in comparison with a well-known integrated knowledge organization system. [Conclusions] The proposed algorithm can be used for large-scale knowledge organization system integration and is helpful for integrating Chinese knowledge organization systems.
[Objective] This article explores a mechanism for optimizing tag clouds by revealing and presenting the relationships among tags in folksonomy. [Context] The traditional mode of knowledge organization of tag clouds in folksonomy cannot reflect the knowledge relevance between themes, which restricts the perceived usefulness of tag clouds. [Methods] Through attribute analysis of the user tag network and modular processing, the tags in a cloud are divided into a number of knowledge communities. By combining links, color and font size, the tag cloud is optimized from the perspective of knowledge relevance between themes. [Results] The latent knowledge communities are robust and are able to show the relationships between knowledge. [Conclusions] Optimizing tag clouds based on knowledge relevance can improve perceived usefulness at multiple granularities and promote the research and development of more scientific and practical tag cloud systems.
[Objective] To extract Ontology concepts from Chinese UGC information sources. [Methods] This paper proposes a hybrid Ontology extraction method that extracts fine-grained words and combines them into concepts using linguistic methods, then filters the concepts using statistical methods. To validate the method, the paper establishes an Ontology extraction model and develops a prototype concept extraction system based on UGC sources. [Results] In experiments on concept extraction from UGC, the method outperforms the four other concept extraction methods used for comparison. Its accuracy rate and recall rate reach 68.42% and 85.35% respectively. [Limitations] The test set for concept extraction comes from high-quality UGC sources and part of it is filtered manually, so the scale of the corpus is insufficient. [Conclusions] This concept extraction method and technology have some significance for Ontology concept extraction based on UGC.
[Objective] Using the frequency and time of users’ tag usage, this paper discusses the impact of dynamic changes in user interest on the accuracy of personalized recommendation. [Methods] A personalized recommendation model based on social tagging in a P2P environment is constructed, and the calculation of user preferences and the recommendation process are described in detail. An experiment on a P2P movie sharing system verifies the validity of the model. [Results] For eight of 10 randomly selected target users, the recommendation hit rate is higher than that of traditional score-based collaborative filtering, demonstrating the advantage of making full use of tag frequency and the time factor in recommendation. [Limitations] Because the main task of this paper is to study the impact of dynamic changes in user interest on personalized recommendation, meaningless tags are deleted and similar tags are merged only by hand; there is no effective mechanism to control tag ambiguity. [Conclusions] Considering the dynamic changes in user interest can help to improve the accuracy of personalized recommendation.
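The abstract above does not give the formula for user preferences, so the following is only a hedged sketch of one common way to combine tag frequency with a time factor. It assumes exponential decay with a hypothetical `half_life_days` parameter; the function name `tag_preferences`, the event format and the sample data are all illustrative and not taken from the paper.

```python
from collections import defaultdict


def tag_preferences(tag_events, now, half_life_days=30.0):
    """Return normalized per-tag preference weights for one user.

    tag_events: iterable of (tag, day_used) pairs, days as numbers.
    Each use contributes 0.5 ** (age / half_life_days), so frequent tags
    and recently used tags both score higher (assumed weighting scheme).
    """
    raw = defaultdict(float)
    for tag, day_used in tag_events:
        age = max(0.0, now - day_used)          # clamp future timestamps
        raw[tag] += 0.5 ** (age / half_life_days)

    total = sum(raw.values())
    if total == 0:
        return {}
    return {tag: weight / total for tag, weight in raw.items()}


# Made-up usage: "drama" was used more often, but "comedy" more recently.
events = [("comedy", 100), ("comedy", 100),
          ("drama", 70), ("drama", 70), ("drama", 70)]
prefs = tag_preferences(events, now=100, half_life_days=30.0)
```

With these toy values the two recent "comedy" uses outweigh the three older "drama" uses, which illustrates how a time factor can let a recommender follow shifts in user interest that raw frequency alone would miss.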
[Objective] This study examines whether the commodity characteristics described by sellers are consistent with buyers’ comments by building a conformity model between sellers’ descriptions and buyers’ comments. [Methods] The description and comment texts are studied, the key product attributes are extracted and the polarity of emotional words is determined; three Taobao shops are then selected to evaluate the model. [Results] Shop B shows the highest degree of consistency, followed by shop A, with shop C the lowest. In shop C, two attributes, “in line” and “authentic”, are not consistent with the comments. [Limitations] Not all information from sellers and customers is included, such as product titles and pictures and customers’ photos. [Conclusions] The results indicate which attributes are consistent with the seller’s description and how closely they match, which can support consumers’ decisions more effectively.
[Objective] By studying the features of microblog networks, this paper develops a local network evolution model of Sina Microblog. [Methods] Using data from the entire Sina Microblog network and the topological structure of a typical user, a model is explored based on the theories of public opinion dynamics and complex networks. [Results] A framework for microblog users’ behaviors, a basis for distinguishing ordinary users from opinion leaders, and the local network evolution model are obtained. [Limitations] The selection of the typical user has its limitations, and the analysis of the entire network data has a certain deviation. [Conclusions] The local network evolution model accords with the real microblog network topology, and the work helps in understanding the structure of microblog networks.
[Objective] In a specific domain, sentiment analysis based mostly on a general lexicon cannot identify the context-specific sentiment belonging to that domain. In addition, the same word in a specific domain can show different polarities (positive, negative, neutral) when describing different properties. This paper aims to solve these problems. [Methods] A sentiment analysis approach based on domain-oriented specific sentiment phrases is proposed. By developing a feature-sentiment Ontology, general sentiment and specific sentiment can be separated during sentiment analysis. [Results] The proposed method achieves better precision and recall in phrase-level sentiment analysis. [Limitations] For better analysis, the Ontology should be well built and cover as many concepts in the related field as possible; the authors ignore syntactic rules during concept extraction and sentiment analysis because product comments are not normative; and in the sentiment analysis phase, the authors assume that context such as conjunctions does not affect polarity. [Conclusions] The new method not only improves sentiment analysis by solving the problems described above but also proposes a new approach to sentiment lexicon management.
[Objective] To reduce the cost of machine learning by reducing the size of the training dataset required for annotating Chinese species description text. [Methods] Based on the Bootstrapping method, a weakly supervised learning method is designed that performs learning and tagging iteratively, starting from a small amount of data. The iterations continuously improve annotation ability by expanding the knowledge base. [Results] The average F-value reaches 0.9112 on a dataset of 15,041 sentences. [Limitations] Annotation efficiency might be relatively low on sparse data. [Conclusions] The experimental data show that the algorithm not only dramatically reduces the dataset size required for machine learning but also increases annotation efficiency.
[Objective] This paper aims to expand the retrieval approaches of the “Classic Reading” Teaching System and improve the utilization of classical teaching resources. [Context] The “Classic Reading” Teaching System is a credit-based innovative teaching platform; adding an image retrieval function can greatly extend the existing text-based retrieval and improve teaching effects. [Methods] This paper establishes a Semantic-Based Image Retrieval Model comprising feature extraction, vector normalization and similarity measurement, and implements four modules: query-submit, image-retrieval, result-feedback and image management. [Results] The images in the platform are classified automatically, students can find a book through a related image, and the precision of image retrieval lies between 92% and 100%. [Conclusions] The function can improve both user experience and the teaching effects of “Classic Reading”.
[Objective] To integrate library services into the WeChat public platform in order to improve the level of library information services. [Context] The number of active WeChat users is rising, and public accounts are divided into service accounts and subscription accounts. [Methods] The WeChat platform interfaces, together with Java Servlet and WebService technologies, are used to integrate library services such as OPAC search, book renewal, reminder service and reference work. [Results] Users can access library resources and services through one-key operations. [Conclusions] This application enriches the forms of library service and allows users to use library services conveniently.