[Objective] Develop a Web archive system for international institutions. [Methods] Based on the IIPC open source software framework, this paper applies a three-layer expansion strategy at the acquisition terminal, provides automatic uploading and reporting functions in the acquisition client, develops a WARC parser that can analyze the content of WARC files, and uses Solr as the indexer. [Results] This paper implements acquisition expansion, raises the automation level of the system workflow by adding more function modules to the acquisition client, extracts more information through the WARC parser modules, and uses Solr to enrich the index and retrieval service. [Limitations] A large-scale Web archive to verify this platform is still lacking. [Conclusions] The expanded Web archive framework is distributed, extensible and fully automatic.
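The WARC parsing step the abstract mentions can be illustrated with a minimal sketch that reads the header block of a single WARC record. This is not the paper's parser: real WARC files contain many concatenated, often gzip-compressed records, which this toy record and function do not handle.

```python
def parse_warc_headers(record_bytes: bytes) -> dict:
    """Parse the header block of one uncompressed WARC record (WARC/1.0).
    Minimal sketch: stops at the blank line separating headers from payload."""
    head, _, _payload = record_bytes.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    headers = {"WARC-Version": lines[0]}       # e.g. "WARC/1.0"
    for line in lines[1:]:
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip()] = value.strip()
    return headers

# A toy record for illustration.
record = (b"WARC/1.0\r\n"
          b"WARC-Type: response\r\n"
          b"WARC-Target-URI: http://example.org/\r\n"
          b"Content-Length: 0\r\n\r\n")
h = parse_warc_headers(record)
```

Fields such as `WARC-Target-URI` extracted this way are what an indexer like Solr would ingest as document metadata.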
[Objective] Expand queries to identify the query topic. [Methods] Obtain query expansion text using pseudo-relevance feedback, extract text features, and combine them through the proposed partial matching rules and vector space compression algorithm. Finally, query topics are classified by cosine similarity and SVM. [Results] The precision reaches 90.34%, the recall 89.34%, the F value 89.67% and the accuracy 89.24%. [Limitations] Online processing efficiency is not high because queries are expanded with search results. [Conclusions] The proposed method is effective for query topic classification. The machine learning method obtains better experimental results than cosine similarity alone, which is significant for improving search engine quality.
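The cosine-similarity baseline in this abstract can be sketched as follows: the expanded query is turned into a term-frequency vector and assigned to the topic whose profile vector it is closest to. The topic profiles and tokenizer here are toy assumptions, not the paper's feature pipeline.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify_query(expansion_text: str, topic_profiles: dict) -> str:
    """Assign the expanded query to the topic with the highest cosine score."""
    qv = Counter(expansion_text.lower().split())
    return max(topic_profiles, key=lambda t: cosine(qv, topic_profiles[t]))

# Hypothetical topic profiles built from labelled training queries.
profiles = {
    "sports": Counter("match team score league goal player".split()),
    "finance": Counter("stock market price share bank interest".split()),
}
topic = classify_query("the stock price fell as the market opened", profiles)
```

An SVM, as the abstract reports, replaces this nearest-profile rule with a learned decision boundary over the same feature vectors.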
[Objective] To achieve higher precision, this paper improves the feature weighting method by introducing the effect of part of speech. [Methods] The effectiveness of introducing part of speech into feature weighting is contrasted with the classical TF-IDF in text classification. In this approach, part-of-speech weights are used in the feature weighting calculation, and Particle Swarm Optimization is used to find the best part-of-speech weights. All parallel tests use an SVM classifier. [Results] The experimental results show that the improved feature weighting method outperforms the classical TF-IDF method, and the precision of text classification achieves obvious improvement in different dimensions of the feature space, with increments between 2% and 6%. [Limitations] Because of limited experimental conditions, the weights obtained in the experiment are only close to the best weights; the scale of the data and the number of iterations need to be increased to obtain better weights. [Conclusions] Introducing part of speech into text classification yields higher precision. The influence of part of speech decreases in the order of nouns, verbs and strings. The modified feature weighting method is applicable not only to a particular corpus but also to general ones.
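The core idea of POS-aware feature weighting can be sketched in a few lines: the classical TF-IDF score is scaled by a weight that depends on the term's part of speech. The weights below are fixed placeholders for illustration; in the paper they are tuned by Particle Swarm Optimization.

```python
import math

# Hypothetical POS weights (nouns > verbs > other), stand-ins for the
# PSO-optimized values the paper searches for.
POS_WEIGHTS = {"n": 1.0, "v": 0.8, "other": 0.5}

def pos_tfidf(term_freq: int, doc_freq: int, n_docs: int, pos_tag: str) -> float:
    """Classical TF-IDF scaled by a part-of-speech weight."""
    tfidf = term_freq * math.log(n_docs / (1 + doc_freq))
    return tfidf * POS_WEIGHTS.get(pos_tag, POS_WEIGHTS["other"])
```

With equal raw TF-IDF, a noun thus contributes more to the feature vector than a verb, matching the decreasing influence order the abstract reports.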
[Objective] To improve the efficiency of concept filtering with a triple-layer filtering method using a thesaurus and text. [Methods] This paper proposes a triple-layer filtering method for domain concepts. Domain concepts are extracted from data sources containing a thesaurus and text, focusing on calculating the concept properties and field properties of domain concepts through concept correlation, concept context and concept territoriality. [Results] Experimental results show that the precision reaches 74.71% and the recall reaches 71.25% with the triple-layer filtering method. [Limitations] The data sources are only about mapping; data from other fields are not used to demonstrate the feasibility of the method. [Conclusions] This paper improves the precision and recall of domain concept filtering. The comprehensive efficiency is higher than that of other methods, and the method can filter domain concepts from different subjects with high efficiency.
[Objective] Design a method to automatically compute Chinese word abstractness, and introduce it into the metaphor identification task in natural language understanding. [Methods] Word abstractness is computed by a logistic regression model. The features are word vectors computed by a neural network model, and the feature weight vectors come from a hand-coded abstractness dictionary. A metaphor identification algorithm based on word abstractness is proposed to demonstrate the validity of this method. [Results] Compared with existing methods of word abstractness computing, this method accords better with human cognition and is effective in the metaphor identification task. [Limitations] The utilization of word vectors for word abstractness is imperfect, and the scale of the abstract words affects the learning of the feature weight vectors. [Conclusions] Word abstractness computing reflects the capacity for concept classification. Chinese word abstractness computed by this method fits human cognition better, and the experimental results show that word abstractness can improve the effect of metaphor identification.
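The two pieces named in this abstract can be sketched together: a logistic regression over word vectors that yields an abstractness score in [0, 1], and a toy metaphor rule over those scores. The vectors, weights and threshold below are illustrative assumptions, not the paper's learned parameters or identification algorithm.

```python
from math import exp

def abstractness(word_vec, weight_vec, bias=0.0):
    """Logistic-regression score: higher means more abstract.
    word_vec: a word embedding; weight_vec: weights that would be learned
    from a hand-coded abstractness dictionary (both hypothetical here)."""
    z = sum(x * w for x, w in zip(word_vec, weight_vec)) + bias
    return 1.0 / (1.0 + exp(-z))

def is_metaphor(subj_score, obj_score, threshold=0.3):
    """Toy rule: flag a copular pair ("X is Y") as metaphorical when the
    abstractness gap between subject and object is large."""
    return abs(subj_score - obj_score) > threshold

score = abstractness([1.0, 0.0], [2.0, 0.0])   # sigmoid(2), fairly abstract
```

A concrete subject paired with an abstract predicate (large gap) would then be flagged, which is the intuition behind abstractness-based metaphor identification.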
[Objective] Establish a model to improve out-of-vocabulary identification capability and reduce the cost of manual intervention. [Methods] On the basis of the hypothesis, an out-of-vocabulary identification model is set up combining CRFs with a domain Ontology element set. Using biodiversity texts as samples, the rationality of the model is verified by comparing the performance differences among models and testing the hypothesis. [Results] The experimental results show that the model established in this study has the best identification capability. The results prove that the hypothesis is true and that the model is reasonable and scientific. [Limitations] The tagging accuracy of the model remains to be improved. [Conclusions] The model established in this paper has better identification capability while greatly reducing the cost of manually building the training dataset.
[Objective] Research the relationship between the first 80 chapters and the last 40 chapters of “A Dream of Red Mansions”. [Methods] Combining quantitative and qualitative methods, the first 40 chapters, the middle 40 chapters and the last 40 chapters are compared with each other to calculate the ratio of unique words in each part. Clustering is conducted using function words, N-gram models of words and parts of speech, all content words, and word length, and the similarities among the three parts are computed according to high-frequency words. [Results] There are differences between the first 80 chapters and the last 40 chapters. The first 80 chapters contain fewer long words and are more readable and coherent than the last 40 chapters. The first 80 chapters pay more attention to the description of details, while the last 40 chapters focus more on the description of actions and scenes. [Limitations] Only words and N-gram models are considered; semantic and pragmatic features are not utilized. [Conclusions] According to these features, the author of the first 80 chapters and the author of the last 40 chapters are not the same.
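The function-word comparison this abstract describes can be sketched as computing each chapter group's function-word frequency profile and measuring cosine similarity between profiles. The tiny function-word set and character-level tokenization below are simplifying assumptions (the single-character words make character counting workable); the paper's actual feature sets are much richer.

```python
from math import sqrt

# Illustrative subset of Chinese function words (all single characters here).
FUNCTION_WORDS = {"的", "了", "是", "在", "也", "都"}

def func_word_profile(text: str) -> dict:
    """Relative frequency of each function word in a text segment."""
    chars = list(text)
    total = len(chars) or 1
    return {w: chars.count(w) / total for w in FUNCTION_WORDS}

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two frequency profiles over the same keys."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

p1 = func_word_profile("的的的了")
p2 = func_word_profile("是是在在")
```

In authorship studies, chapter groups by the same author are expected to show systematically higher profile similarity than groups by different authors.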
[Objective] This paper aims to enhance the quality and efficiency of community detection in recommendation systems by controlling the propagation direction of labels. [Methods] A community detection algorithm via neighborhood-node-influence-based label propagation is proposed to optimize label propagation paths and update node labels stably and effectively. [Results] The experimental analysis on artificial and real social network datasets verifies that updating and propagating labels based on neighborhood influence can reduce the label updating space and time. [Limitations] The datasets used in this paper are insufficient due to the restriction of the website, and the notion of neighborhood node influence needs to be generalized. [Conclusions] This study proposes a feasible solution to enhance community detection quality by reducing label propagation instability based on neighborhood influences.
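Influence-weighted label propagation can be sketched as follows: each node adopts the label with the largest influence-weighted vote among its neighbors, with degree standing in for node influence. Two assumptions for determinism, not taken from the paper: seed nodes with fixed labels (the paper's algorithm is fully unsupervised) and a sorted update order.

```python
from collections import defaultdict

def propagate(adj: dict, seeds: dict, max_iter: int = 20) -> dict:
    """Label propagation where each neighbour's vote is weighted by its
    degree, used here as a simple stand-in for 'node influence'."""
    labels = dict(seeds)
    deg = {v: len(adj[v]) for v in adj}
    for _ in range(max_iter):
        changed = False
        for v in sorted(adj):                    # deterministic update order
            if v in seeds:
                continue                         # seed labels stay fixed
            votes = defaultdict(float)
            for u in adj[v]:
                if u in labels:
                    votes[labels[u]] += deg[u]   # influence-weighted vote
            if votes:
                best = max(sorted(votes), key=lambda l: votes[l])
                if labels.get(v) != best:
                    labels[v], changed = best, True
        if not changed:
            break
    return labels

# A triangle (0,1,2) bridged to a denser cluster around node 5.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5],
       4: [3, 5], 5: [3, 4, 6, 7], 6: [5], 7: [5]}
labels = propagate(adj, seeds={0: "A", 5: "B"})
```

Weighting votes by influence biases each update toward high-degree neighbors, which is what shrinks the label updating space relative to unweighted propagation.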
[Objective] The paper aims to predict cooperation between scholars according to the structural information of the academic research network. [Methods] A novel mixture topological factor predictive model called MTF is proposed, which combines local feature factors and global community factors. The model first introduces the Naïve Bayesian algorithm to calculate the local factors and then uses community contribution to compute the global factors. [Results] Experimental results show that the MTF method can effectively handle relationship prediction in real scientific collaboration networks and performs better than some classic and newly proposed algorithms. [Limitations] The data used in the experiments should be at a larger scale. [Conclusions] This paper proves that the proposed model can mine community information to improve prediction performance, which opens a new path in this area.
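The local-plus-global structure of such a predictor can be sketched with a deliberately simplified score: a common-neighbor count as the local factor and a same-community bonus as the global factor. Both are stand-ins for the paper's Naïve Bayesian local factor and community-contribution factor; `alpha` is a hypothetical mixing weight.

```python
def predict_score(adj: dict, communities: dict, u, v, alpha: float = 0.7) -> float:
    """Hybrid link-prediction score for the pair (u, v):
    local common-neighbour count blended with a global community bonus."""
    local = len(set(adj[u]) & set(adj[v]))                  # local factor
    same_comm = 1.0 if communities[u] == communities[v] else 0.0  # global factor
    return alpha * local + (1 - alpha) * same_comm

# Toy collaboration graph with two communities.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
communities = {"a": 1, "b": 1, "c": 1, "d": 2}
```

Pairs sharing both neighbors and a community score highest, mirroring how the MTF model lets community information boost purely local evidence.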
[Objective] To study how the star role and broker role in social media knowledge collaboration networks affect knowledge dissemination. [Methods] This paper constructs a knowledge collaboration network using 197 related biological knowledge samples from Wikipedia, analyzes the relevant indicators of knowledge nodes with social network analysis tools, and estimates models using statistical methods. [Results] Star units in the network center position and broker units with more structural holes achieve better dissemination effects. The scale of a node's fan group collaboration in the network plays a partial mediating role in social media knowledge dissemination. [Limitations] The samples are limited to knowledge nodes in the biological science field, and from the perspective of the overall network, whether boundary selection and discipline affect the study remains to be further analyzed. [Conclusions] The advantages of star nodes and broker nodes play a direct role in knowledge dissemination and an indirect role through the media effect of the fan group.
[Objective] Design a framework for STKOS multi-version and inter-version revision management, and implement an STKOS version management system based on the framework. [Context] The sharing service platform of STKOS defines the content of STKOS version management, including multi-version and inter-version revision management. [Methods] Firstly, three types of STKOS versions are defined, namely the historic version, the active service version and the temporary version. Then the data structure of STKOS change information and a framework for STKOS multi-version and inter-version revision management are designed. Finally, an STKOS version management system is implemented based on the framework and the medical STKOS data. [Results] The system implements STKOS version management at a scale of ten million records. [Conclusions] The system can simultaneously support multi-version and inter-version revision management of large-scale STKOS data.
[Objective] Take advantage of Ontology reasoning for linkage discovery in linked data. [Context] Based on library applications and with book resources as the research object, Ontology reasoning is applied to establish linkage relationships between resources. [Methods] The linkage discovery framework is proposed and each of its layers is described. Fuseki, Jena, Pubby and PHP are used to implement the framework, and an effectiveness inspection scheme is designed and executed. [Results] The experimental results show that the framework can establish linkage relationships between book resources. Compared with the similarity matching method, the average recall ratio of linkage discovery is increased by about 15%, and semantic knowledge discovery is also achieved. [Conclusions] Ontology reasoning can be effectively applied to the linkage discovery of linked data and has high engineering application value.
[Objective] Expand self-service channels and improve user experience by creating a new mode for graduation deactivation. [Context] The traditional graduation deactivation system is inefficient and isolated. The new one-stop self-service mode is far preferable to the traditional one in facing the challenges caused by the further developed digital campus network and massively increased workloads. [Methods] Based on MVC and an upgraded three-layer structure, the new system uses the open source library ‘Duilib’ and combines ODBC, API and Web Services technologies to integrate all graduation information into one processing platform. [Results] 90% of graduation affairs, including balance settling and account cancellation, can be completed by users themselves at the self-service terminals. [Conclusions] This graduation deactivation system makes the whole process more intelligent, standard and clear, and transparent to third-party development as well.