[Objective] To understand Chinese arXiv.org users' awareness and usage of arXiv, and their suggestions for its development.[Methods] The authors surveyed teachers, researchers and students from 9 domestic universities and institutes, collecting data from 510 respondents and analyzing it with SPSS.[Results] The questionnaire results show that Chinese researchers are not yet fully aware of arXiv, but respondents who have used arXiv regard it as an important channel for securing first publication rights and peer review.[Limitations] The sample covers only users from the universities and institutes of the China arXiv service group and excludes other users.[Conclusions] The China arXiv service group should organize more promotional activities through multiple channels in order to deliver arXiv's benefits to Chinese researchers.
[Objective] Building service interfaces for the STKOS sharing infrastructure can help the information services industry standardize semantic annotation, semantic search and browsing, and knowledge inference and discovery.[Context] Building open interfaces on a standard specification is one of the important ways for STKOS to deliver its services.[Methods] Based on the STKOS API, the modular design of the STKOS query and inference interfaces is described, and an interface specification is proposed.[Results] By combining the modular interface methods, three demonstration scenarios are implemented: taxonomic clustering, resource annotation, and intelligent retrieval.[Conclusions] The objective of enhancing the knowledge service capabilities of third-party information systems through STKOS is achieved.
[Objective] This paper studies automatic mapping between classification schemes, aiming to realize integrated retrieval, browsing and downloading of information across regions and languages through classification interoperation.[Methods] A semantic similarity algorithm is discussed which considers characteristic sets, category matching rules and semantic relations, based on manual mapping data.[Results] In the experiment, 80% of the mapped categories agree with the results of manual mapping.[Limitations] The category similarity based on characteristic sets still lacks matching at the level of semantic operations, and the experiment covers only the field of science; other fields need to be tested in the future.[Conclusions] Compared with existing methods that rely solely on matching category names, the algorithm comprehensively considers category names, notations, subject vocabularies, and the semantic relations that define the connotation and denotation of concepts.
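As an illustration of the kind of similarity computation this abstract describes, the following Python sketch combines category names, notations and subject vocabularies into one weighted score; the field names and weights are hypothetical, not the authors' actual algorithm.

# Minimal sketch of a category-similarity score that combines several evidence
# sources. Feature names and weights are hypothetical, not the paper's method.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def category_similarity(cat_a, cat_b, weights=(0.4, 0.2, 0.4)):
    """cat_* are dicts with 'name_tokens', 'notation', 'subject_terms'."""
    w_name, w_notation, w_terms = weights
    name_sim = jaccard(cat_a["name_tokens"], cat_b["name_tokens"])
    notation_sim = 1.0 if cat_a["notation"] == cat_b["notation"] else 0.0
    term_sim = jaccard(cat_a["subject_terms"], cat_b["subject_terms"])
    return w_name * name_sim + w_notation * notation_sim + w_terms * term_sim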
[Objective] To improve text categorization of digital resources in a hierarchical structure by adjusting the skewed class distribution in training sets.[Methods] This paper proposes a new method named B-LDA, which improves text categorization by integrating granule partitions with LDA. The method first divides rare classes according to granular partition criteria so as to transfer the granularity space of the training set, then models important texts with probabilistic topic models, and generates new texts from the global semantic information represented by the topic models until the distribution of the categories becomes more balanced.[Results] The results show that, as the number of features changes, the F1 value on training sets with different imbalance levels is improved by 2.7% to 9.9%.[Limitations] Because of the limited corpus scale, the experiments cover only part of the possible imbalance conditions when constructing the training sets. In addition, the degree of overlap between the two randomly selected categories affects the classification performance of the new method.[Conclusions] The new method achieves better performance on imbalanced data sets composed of book bibliographic records, journal titles and Web pages.
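A minimal sketch of the balancing idea, generating synthetic minority-class texts from an LDA topic model with gensim; the parameters and the sampling scheme are assumptions and only approximate the B-LDA procedure.

# Sketch: generate synthetic minority-class documents from an LDA topic model,
# in the spirit of the balancing step above. Simplified illustration only.
import numpy as np
from gensim import corpora, models

def oversample_with_lda(rare_docs, n_new_docs=50, doc_len=80, num_topics=5):
    """rare_docs: list of token lists from the minority class."""
    dictionary = corpora.Dictionary(rare_docs)
    corpus = [dictionary.doc2bow(d) for d in rare_docs]
    lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    topic_word = lda.get_topics()          # shape: (num_topics, vocab_size)
    synthetic = []
    for _ in range(n_new_docs):
        k = np.random.randint(num_topics)  # pick a topic for the new document
        probs = topic_word[k] / topic_word[k].sum()
        word_ids = np.random.choice(len(probs), size=doc_len, p=probs)
        synthetic.append([dictionary[i] for i in word_ids])
    return synthetic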
[Objective] This study aims to identify the structural components of scientific abstracts automatically by classifying abstract sentences with machine learning, using limited training samples.[Methods] A variety of text features are designed to represent abstract sentences. These features are extracted from academic abstracts with natural language processing techniques and used to train a Naive Bayes model and Support Vector Machines, which then identify the structure of academic abstracts automatically.[Results] Experiments show that the method achieves comparable or even better recognition accuracy than previous methods while using a smaller training corpus.[Limitations] Because sentences labeled "METHOD" lack distinctive feature words and core verbs, the recognition accuracy on these sentences is lower.[Conclusions] The method is an effective approach to automatic recognition of academic abstract structure with a limited corpus.
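A minimal scikit-learn sketch of the training and prediction pipeline; plain TF-IDF stands in for the paper's richer hand-crafted features, and the example sentences and labels are invented.

# Sketch: classify abstract sentences into structural moves with NB and SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

sentences = ["We propose a new ranking model for expert search.",
             "Experiments show a five percent gain in MAP."]   # training sentences
labels = ["METHOD", "RESULT"]                                   # structural labels

nb_clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(sentences, labels)
svm_clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(sentences, labels)
print(nb_clf.predict(["The aim of this study is to identify abstract structure."]))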
[Objective] To extract keywords by merging the internal structural information of a single document with topic information shared across documents.[Methods] LDA is used for topic modeling and for computing the topic influence of candidate keywords. The TextRank algorithm is then improved so that the importance of candidate words is transferred unevenly according to topic influence and word adjacency relations; the resulting probability transition matrix is built and iterated to extract keywords.[Results] LDA and TextRank are combined effectively, and the keyword extraction results improve significantly when the data set shows a strong topic distribution.[Limitations] The combined method requires costly multi-document topic analysis.[Conclusions] Document keywords are related both to the document itself and to the collection of related documents; combining these two aspects is an effective way to improve keyword extraction.
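A sketch of a topic-biased TextRank using networkx's personalized PageRank; the topic influence scores are assumed to come from an LDA model, and the exact transition matrix used in the paper may differ.

# Sketch: word adjacency builds the graph, LDA-derived influence biases the walk.
import networkx as nx

def topic_weighted_textrank(words, topic_influence, window=2, topn=5):
    """words: token sequence of one document;
    topic_influence: dict word -> LDA-based influence score (assumed given)."""
    g = nx.Graph()
    for i, w in enumerate(words):
        for v in words[i + 1:i + window]:
            if w != v:
                g.add_edge(w, v)
    # Bias the transition probabilities toward topically influential words.
    personalization = {w: topic_influence.get(w, 1e-6) for w in g.nodes}
    scores = nx.pagerank(g, personalization=personalization)
    return sorted(scores, key=scores.get, reverse=True)[:topn]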
[Objective] This paper tries to re-rank search results with the help of subject indexing during pseudo relevance feedback.[Methods] User queries are represented as probability distributions over subject terms by mining the associations between queries and subject terms with language modeling. The weights of subject terms in documents are calculated by incorporating generative language models for subject terms. The scores of the documents returned by the initial retrieval are then re-calculated and the documents re-ranked accordingly.[Results] The proposed method constructs generative language models for subject terms and estimates the weights of subject terms in documents appropriately; the re-ranked results consistently improve over the initial retrieval.[Limitations] Different methods of mining the associations between subject terms and documents are not compared, and the approach is not tested on data sets of different scales or in different languages.[Conclusions] The re-ranking approach, which exploits the associations among user queries, documents and subject terms, can improve retrieval precision.
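A sketch of the re-scoring step: the initial retrieval score is interpolated with subject-term evidence shared by the query and the document. The interpolation weight and the two probability tables are assumptions, not the paper's estimated models.

# Sketch: blend the first-pass score with subject-term evidence, then re-rank.
def rescore(initial_score, p_term_given_query, term_weight_in_doc, lam=0.6):
    """p_term_given_query: dict subject term -> P(t|q);
    term_weight_in_doc: dict subject term -> weight of t in this document."""
    subject_evidence = sum(p * term_weight_in_doc.get(t, 0.0)
                           for t, p in p_term_given_query.items())
    return lam * initial_score + (1 - lam) * subject_evidence

def rerank(results, p_term_given_query, doc_term_weights, lam=0.6):
    """results: list of (doc_id, initial_score) from the first retrieval."""
    return sorted(results,
                  key=lambda r: rescore(r[1], p_term_given_query,
                                        doc_term_weights[r[0]], lam),
                  reverse=True)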
[Objective] To improve the recommendation quality of collaborative filtering by combining Self-Organizing Map (SOM) and Radial Basis Function Neural Network (RBFN).[Context] Aiming at the sparsity problem of collaborative filtering, this paper predicts missing rating values with artificial neural networks and puts forward a new way to improve recommendation accuracy.[Methods] Users are pre-clustered with an SOM neural network based on the user rating matrix. Using the similarity of users within the same cluster, an RBFN is then used to fill the missing values in the sparse rating matrix, and collaborative filtering generates recommendations from the completed rating matrix.[Results] Compared with the traditional mainstream collaborative filtering method, MAE and F-Measure results show that the proposed method is more effective in both accuracy and relevance of recommendations.[Limitations] The proposed method is only tested on the public MovieLens data set and needs further examination on other data sets.[Conclusions] The recommender method proposed in this paper alleviates the sparsity problem of collaborative filtering recommendation to a certain extent, and also provides guidance for addressing the cold start and scalability problems.
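A rough Python sketch of the pre-clustering and filling pipeline using the MiniSom library; the cluster-mean fill stands in for the paper's RBFN predictor, and the grid size and training parameters are assumptions.

# Sketch: SOM groups users by rating vectors, then missing ratings are filled
# per cluster (cluster mean here, RBFN in the paper).
import numpy as np
from minisom import MiniSom   # pip install minisom

def fill_sparse_ratings(ratings):
    """ratings: numpy users x items matrix with np.nan for missing values."""
    data = np.nan_to_num(ratings)                       # SOM input: zeros for missing
    som = MiniSom(3, 3, ratings.shape[1], sigma=1.0, learning_rate=0.5)
    som.random_weights_init(data)
    som.train_random(data, 500)
    clusters = [som.winner(row) for row in data]        # SOM cell per user
    filled = ratings.copy()
    for cell in set(clusters):
        members = [i for i, c in enumerate(clusters) if c == cell]
        cluster_mean = np.nanmean(ratings[members], axis=0)
        for i in members:
            missing = np.isnan(filled[i])
            filled[i, missing] = cluster_mean[missing]
    return filled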
[Objective] This paper explores the visualization and measurement of the semantic distance between tags in a folksonomy, laying a foundation for optimizing related-tag navigation algorithms.[Context] By visualizing semantic distance, the paper weakens the "topic drift" that occurs during related-tag navigation and improves the knowledge services of folksonomy websites such as BibSonomy.[Methods] An algorithm is designed to select the test tag sets and measure their semantic distance, and the results are visualized as a map with a threshold value, based on data from BibSonomy.[Results] The test set contains both semantically close and semantically distant tags, which affects the degree of topic drift during related-tag navigation.[Conclusions] The semantic visualization method helps users distinguish the semantic attributes of related tag sets and improves the performance of tag navigation.
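One simple way to quantify tag semantic distance is sketched below as the Jaccard distance between the resource sets of two tags; the paper's actual measure and threshold map may differ.

# Sketch: distance between tags based on the resources they annotate.
def tag_distance(tag_a_resources, tag_b_resources):
    """Each argument is the set of resource ids annotated with that tag."""
    a, b = set(tag_a_resources), set(tag_b_resources)
    if not (a | b):
        return 1.0
    return 1.0 - len(a & b) / len(a | b)   # Jaccard distance in [0, 1]

def close_pairs(tag_resources, threshold=0.7):
    """Keep only pairs below the threshold when drawing the distance map."""
    tags = list(tag_resources)
    return [(s, t, tag_distance(tag_resources[s], tag_resources[t]))
            for i, s in enumerate(tags) for t in tags[i + 1:]
            if tag_distance(tag_resources[s], tag_resources[t]) < threshold]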
[Objective] Hierarchy visualization is an intuitive way to analyze the semantic relations among folksonomy tags and to enhance users' cognition.[Context] Folksonomies reflect the meaning of Web resources well from the perspective of ordinary users. Hierarchical information visualization, as a precise tool for representing abstract information, is widely used to help users understand and analyze hierarchical data sets.[Methods] First, a five-tuple model is improved to describe the semantics of a folksonomy. Second, an existing classification is used to give the folksonomy tags hierarchical relations. Finally, an information visualization algorithm is proposed to display the folksonomy set based on its hierarchical structure.[Results] Experiments show that the algorithm represents the hierarchical relations of folksonomy tags clearly and intuitively and improves the layout effectively. Other semantic relationships are stored in the folksonomy nodes so as to reduce their influence on users' cognition.[Conclusions] The method is proved to be an effective and simple way to visualize hierarchical information, both optimizing the overall layout and enhancing users' cognition.
[Objective] This article evaluates the credit of e-businessmen who sell through third-party e-business platforms.[Methods] First, the weights of the indices in the e-businessman credit evaluation system are determined. Second, customer reviews are quantified with Chinese word segmentation and sentiment-word polarity identification. Third, the credit of the e-businessman is calculated with the grey correlation analysis method.[Results] The degrees of membership of four levels (best, better, general and poor) are calculated, from which the credit of the e-businessman can be derived.[Conclusions] Using grey correlation analysis under incomplete information and small samples, the authors formulate a reasonable method for evaluating the credit of e-businessmen from customer reviews. The method quantifies review content under a relatively unified standard and obtains the distribution of the different evaluation levels.
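A sketch of a grey relational grade computation of the kind used above; the distinguishing coefficient rho = 0.5 is a common default rather than the authors' setting, and the reference sequences for the four credit levels are assumptions.

# Sketch: grey relational grade between a reference sequence (one credit level)
# and a candidate sequence (a seller's quantified review indicators).
import numpy as np

def grey_relational_grade(reference, candidate, rho=0.5):
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    diff = np.abs(reference - candidate)
    if diff.max() == 0:
        return 1.0                     # identical sequences: perfect correlation
    coeff = (diff.min() + rho * diff.max()) / (diff + rho * diff.max())
    return float(coeff.mean())         # higher grade = closer to this level

# Example: membership of one seller relative to the "best" level reference.
print(grey_relational_grade([1.0, 1.0, 1.0], [0.9, 0.7, 0.95]))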
[Objective] This paper aims to reveal the common structural features of keyword networks of scientific research areas at both the macro and micro levels.[Methods] Three keyword networks are constructed. Their macro properties are compared with ER, BA and SW networks, and regression analysis is performed on their micro properties.[Results] The degree sequence of the keyword networks follows a power-law distribution, their average clustering coefficient is extremely high, and their average path length is short. The degree, betweenness centrality, eigenvector centrality and triad closure of nodes are positively and linearly correlated with keyword frequency, while the local clustering coefficient of nodes is inversely related to their degree.[Limitations] The samples need to be extended to more disciplines.[Conclusions] The keyword networks of scientific research areas are special scale-free networks with the small-world effect, modularity, hierarchy and high centripetalism.
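A networkx sketch of computing the properties mentioned above on a toy keyword co-occurrence graph; the edge list is invented for illustration.

# Sketch: macro and micro network measures for a keyword co-occurrence graph.
import networkx as nx

g = nx.Graph()
g.add_edges_from([("data mining", "clustering"),
                  ("data mining", "text classification"),
                  ("clustering", "text classification")])   # toy co-occurrence edges

degrees = [d for _, d in g.degree()]                  # degree sequence (fit power law elsewhere)
print("average clustering:", nx.average_clustering(g))
print("average path length:", nx.average_shortest_path_length(g))
print("betweenness:", nx.betweenness_centrality(g))
print("eigenvector:", nx.eigenvector_centrality(g))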
[Objective] The online "water army" causes the distortion of network information. The paper proposes two methods to detect water army.[Context] Use the methods to detect the "water army" existed on movie website,e-commerce website and so on.[Methods] The paper proposes static and dynamic methods to detect water army, and designs an intensity index to show the fluctuations of the number of reviews relative to the overall in one day.[Results]The paper uses mining technology to collect rating data of Douban movie site, then analyses the ratings to identify the"water army", which verifies the effectiveness of two detection methods.[Conclusions] The combination of the static and dynamic detection methods can detect the existence of "water army" phenomenon effectively. But it also has some limitations, for example, the insufficient rating data affects the detection.
[Objective] A topic classification and extraction model named SM_F_HT is proposed to find multiple topics more effectively in a Chinese SMS text message flow (SM_F).[Methods] In this model, SM_F is divided into SMS text subsets; TF-IDF combined with hierarchical Dirichlet processes is used to build multiple probability distributions over the SMS text vector set. Finally, the topic classification of SM_F is extracted using Gibbs sampling together with the probabilities of the characteristic words belonging to each local topic.[Results] Experimental results show that SM_F_HT is superior to the CCLDA and CCMix models in perplexity and log-likelihood ratio.[Limitations] The algorithm still needs further optimization in SMS text preprocessing and keyword extraction.[Conclusions] The SM_F_HT scheme is effective for multi-topic classification and extraction from SM_F.
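A gensim sketch of fitting a hierarchical Dirichlet process topic model to SMS subsets; the tokenized messages and the subset split are invented for illustration, and the paper's TF-IDF weighting and Gibbs sampling details are not reproduced.

# Sketch: one HDP topic model per SMS text subset.
from gensim.corpora import Dictionary
from gensim.models import HdpModel

subsets = [[["discount", "phone", "order"], ["order", "delivery"]],
           [["meeting", "tomorrow"], ["tomorrow", "report", "meeting"]]]

for messages in subsets:                       # messages: tokenized SMS texts
    dictionary = Dictionary(messages)
    corpus = [dictionary.doc2bow(m) for m in messages]
    hdp = HdpModel(corpus, id2word=dictionary)
    print(hdp.print_topics(num_topics=2, num_words=3))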
[Objective] In order to reduce noise and enhance user satisfaction in expert retrieval systems, the authors put forward a credibility evaluation mechanism under the user's control.[Methods] First, based on the binary independence retrieval model, the paper sets out the principles and assumptions that the design of the evaluation mechanism needs to follow. Second, focusing on expert features to define the parameters, it designs a front-end credibility evaluation mechanism and a back-end credibility evaluation mechanism.[Results] Taking academic expert retrieval as an example, the authors show that the front-end mechanism, which corresponds to information organization, reduces noise in expert feature recognition by finding the best length of the expert eigenvector, while the back-end mechanism integrates users deeply into retrieval by using user relevance feedback as the necessary reference for path selection.[Limitations] The front-end mechanism cannot deal with user queries containing many words, and the back-end mechanism places higher requirements on expert information organization.[Conclusions] Combining the two mechanisms can expand the resources associated with expert feature recognition and increase user involvement.
[Objective] Translation correspondence in English-Chinese cross-lingual plagiarism documents is studied.[Methods] Similarity analysis is carried out with bilingual lexicons. To improve the precision and efficiency of recognizing corresponding words, several bilingual lexicons are merged and sorted. To handle disambiguation and multiple matching, the paper proposes a method that uses word distribution and matching location to select the proper translation items. Similarities between sentences and paragraphs are defined over stratified complex features such as word matching category and word position.[Results] Experiments on real translated documents show that the precision and recall of retrieval reach 0.841 and 0.748 respectively.[Limitations] Out-of-Vocabulary (OOV) correspondences are still hard to judge with lexicons.[Conclusions] The approach of cross-lingual similarity detection based on bilingual lexicons is easy to implement and has a wide range of applications.
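A sketch of lexicon-based cross-lingual sentence similarity: the share of words on each side that find a translation match on the other side. The lexicon is a toy example, and the paper's positional and matching-category features are not modeled.

# Sketch: bilingual-lexicon matching between an English and a Chinese sentence.
def cross_lingual_similarity(en_tokens, zh_tokens, lexicon):
    """lexicon: dict English word -> set of Chinese translations."""
    zh_set = set(zh_tokens)
    matched_en = [w for w in en_tokens if lexicon.get(w, set()) & zh_set]
    matched_zh = {t for w in matched_en for t in lexicon[w] & zh_set}
    if not en_tokens or not zh_tokens:
        return 0.0
    return 0.5 * (len(matched_en) / len(en_tokens) +
                  len(matched_zh) / len(zh_tokens))

lexicon = {"library": {"图书馆"}, "digital": {"数字"}}
print(cross_lingual_similarity(["digital", "library"], ["数字", "图书馆", "服务"], lexicon))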
[Objective] To extend the service scope of 'Xiaotu', the smart talking robot of Tsinghua University Library, by designing and developing a mobile APP and a WeChat public platform service.[Context] With the development and popularity of smart phones and the mobile Internet, mobile APPs and WeChat have become the main portals on mobile terminals.[Methods] Based on the development modes of the mobile terminal and the WeChat public platform, the services use various interfaces to communicate with the main server of 'Xiaotu' and to transfer commands and messages, embedding the basic functions of 'Xiaotu' into the mobile APP and WeChat.[Results] Users can conveniently talk with 'Xiaotu' and search for information on mobile terminals and in social network environments.[Conclusions] The application expands the application environment of 'Xiaotu', a special service of Tsinghua University Library, and provides a ubiquitous service.
[Objective] To monitor the running state of the library's document databases automatically through a purpose-built program.[Context] Given the large number of document databases, manual state inspection is inefficient and faults are not discovered in time, so automatic monitoring and analysis has clear advantages.[Methods] The system is developed with VB.NET under Windows 7. It simulates readers accessing the databases and obtains status information on three aspects: access, retrieval and reading.[Results] The system checks the database running state regularly and automatically, sends fault messages automatically by Email or QQ alarm, and provides multi-dimensional visual analysis of the status information.[Conclusions] In practical application, faults are discovered and handled in a more timely manner.
[Objective] To extend the library's service channels and enhance the patron experience through the WeChat public platform.[Context] WeChat has become a very popular mobile communication platform and is favored by readers.[Methods] In development mode, the article selects .NET as the development environment and embeds library business into WeChat based on an open-source SDK: the XML messages sent from the public platform are parsed, queries are built against the library's operating system, and the query results are packaged back into XML.[Results] Readers can conveniently access library resources and services through command interaction with the library's WeChat public account.[Conclusions] The application can expand mobile library services and improve service quality.
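A Python sketch (the paper itself uses .NET) of the message-handling loop described above: parse the XML pushed by the WeChat public platform, run a library query, and package the result as a reply XML. The lookup function is a hypothetical stand-in for the library system query.

# Sketch: turn an incoming WeChat text message into a reply with query results.
import time
import xml.etree.ElementTree as ET

def handle_wechat_message(xml_body, lookup_book):
    msg = ET.fromstring(xml_body)
    user = msg.findtext("FromUserName")
    account = msg.findtext("ToUserName")
    content = msg.findtext("Content", default="")
    reply_text = lookup_book(content)            # e.g. query the OPAC by title
    return ("<xml>"
            f"<ToUserName><![CDATA[{user}]]></ToUserName>"
            f"<FromUserName><![CDATA[{account}]]></FromUserName>"
            f"<CreateTime>{int(time.time())}</CreateTime>"
            "<MsgType><![CDATA[text]]></MsgType>"
            f"<Content><![CDATA[{reply_text}]]></Content>"
            "</xml>")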