[Objective] This paper aims to improve automatic text classification based on social tags by controlling the relations among tags and their quality. [Methods] A classification model called “core controlled, shell uncontrolled” is built on a controlled Social tagging-Keyword concept space, so that social tags are regulated by subject headings. [Results] Validity tests show that the new method performs better in social-tag-based text classification when both efficiency and cost are considered. [Limitations] The data available for the concept space is limited by restrictions of the Website. The concept space also lacks deep semantic relations, which will be enriched in future work. [Conclusions] This study proposes a feasible solution for improving the quality of social tags and the capability of automatic text classification.
[Objective] To reuse and share implicit knowledge in complex product design. [Methods] Ontology reasoning is applied to product design. After analyzing the requirements and types of product design rules, the rules are divided into logical relationship rules and parameter constraint rules. The rules are then converted into inference rules, and inference is carried out through a product design-oriented Ontology reasoning framework. [Results] In the manipulator design reasoning experiment, a manipulator design Ontology is built. Reasoning over it dynamically drives the design and derives parameter values, thereby expanding the available knowledge. [Limitations] Reasoning over qualitative parameters in the design constraints is not yet implemented. [Conclusions] The experimental results verify the feasibility of the Ontology reasoning framework.
[Objective] This article aims to extract sentence-level innovation points from scientific research papers in a specific domain. [Methods] A domain thesaurus and Ontology are used to construct rules for extracting innovation points from sentences in research papers, and a redundancy measure based on keyword overlap is used to filter redundant innovation points. [Results] An experiment on a Neoplasm data set shows that the accuracy rate is 89.42% and the recall rate is 60.14%. [Limitations] The rules need further refinement, and the recall rate needs to be improved. [Conclusions] Using a domain thesaurus and the relationships in an Ontology is effective for extracting innovation points from scientific research papers.
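The redundancy filtering step described in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the overlap formula or threshold, so the Jaccard measure, the 0.6 threshold, and the function names `keyword_overlap` and `filter_redundant` are all assumptions made for illustration only.

```python
def keyword_overlap(a, b):
    """Jaccard overlap between two keyword collections (an assumed measure)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_redundant(candidates, threshold=0.6):
    """Keep a candidate only if its keywords do not overlap too much
    with any candidate that has already been kept.

    candidates: list of (sentence, keywords) pairs, in extraction order.
    threshold:  hypothetical cut-off; the abstract does not report one.
    """
    kept = []
    for sentence, keywords in candidates:
        if all(keyword_overlap(keywords, k) < threshold for _, k in kept):
            kept.append((sentence, keywords))
    return kept


if __name__ == "__main__":
    candidates = [
        ("Sentence A", ["egfr", "mutation", "lung"]),
        ("Sentence B", ["egfr", "mutation", "lung", "cancer"]),
        ("Sentence C", ["biomarker", "survival"]),
    ]
    for sentence, _ in filter_redundant(candidates):
        print(sentence)
```

A greedy pass like this keeps the first of any group of near-duplicate sentences, so the result depends on the order in which candidates are extracted.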
[Objective] This paper surveys and summarizes research on music recommendation, discusses existing problems, and proposes corresponding research hot spots. [Methods] Using literature analysis, the paper briefly introduces each recommendation strategy from the perspective of the recommendation algorithm, and categorizes and summarizes the main articles on music recommendation according to how they describe music resources. [Results] The paper puts forward a new idea: use rough set theory to extract important context information, then combine user preferences under that context with collaborative filtering to realize context-aware music recommendation. [Conclusions] Existing studies have several problems, such as a lack of systematic research on user behavior and demand, low-level feature extraction, and reliance on a single evaluation index. Future directions for music recommendation are discussed in terms of group music recommendation, Ontology modeling, and context-aware music recommendation in the mobile environment.
[Objective] This paper presents a clustered graph layout algorithm with non-uniform nodes in order to realize intuitive, lively and attractive information visualization. [Methods] After examining the relationship between force-directed algorithms and information visualization, the paper proposes an algorithm based on the force-directed model that introduces the concepts of clusters and non-uniform nodes, taking charge theory as its starting point. The algorithm adopts a hierarchical layout, and each layout unit is produced independently by similar but distinct layout strategies. [Results] A visualization prototype system for the NKOS is implemented with the algorithm, and it can be widely applied to visualizing the instances of concept classes in the NKOS (especially the Chinese NKOS). [Limitations] The convergence conditions of the proposed algorithm are not effective enough, so redundant node vibration occurs during layout. Temperature and other related concepts of neural computation can be introduced to solve this in the future. [Conclusions] The paper finds a way to transform a graph structure with semantic information into a tree structure and, based on the cluster concept, uses force-directed algorithms to solve its layout problem. The algorithm can handle the visualization of concept instances in Chinese NKOS such as OntoThesaurus, and the graph drawing community can use it as a reference for similar problems.
[Objective] To obtain product design knowledge as quickly and accurately as possible in order to meet the needs of complex product design processes. [Methods] Ontology is used as the knowledge representation model to organize and represent product design knowledge, providing a common understanding of it. A Bayesian algorithm identifies the type of a retrieval question in order to narrow the scope of candidate questions. Keyword similarity between the retrieval question and the candidate questions is then calculated based on TF and cosine similarity, and syntactic similarity is calculated based on the word forms and sentence length of the retrieval question. [Results] Test results show that the accuracy rate is 91.3% and the recall rate is 86.2%, and that the accuracy rate is better than that of other algorithms. [Limitations] The search result depends on the number of candidate questions. For large-scale data, the complexity of the similarity algorithm is very high, and the algorithm needs further optimization. [Conclusions] The method is effective and has positive significance for question type identification and similarity computation.
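The keyword similarity step in the abstract above (TF weighting followed by cosine similarity) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: questions are assumed to be already tokenized into keywords, and the names `tf_vector`, `cosine_similarity` and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Term-frequency vector: each term count divided by the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(query_tokens, candidates):
    """Return (score, candidate) pairs sorted from most to least similar."""
    query_vec = tf_vector(query_tokens)
    scored = [(cosine_similarity(query_vec, tf_vector(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    query = ["gear", "ratio", "design"]
    candidates = [
        ["bearing", "load", "limit"],
        ["gear", "ratio", "selection"],
    ]
    for score, candidate in rank_candidates(query, candidates):
        print(round(score, 3), candidate)
```

In the pipeline the abstract describes, the candidate list would first be narrowed by the Bayesian question-type classifier, which is why the ranking cost depends on the number of candidates that remain.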
[Objective] To help readers select communities of interest from a massive number of reader communities. [Methods] This paper proposes a virtual reader community recommendation method based on a probabilistic topic model. The method builds reader-reader and reader-community relations on different topics by finding latent topics of reader communities, and then recommends reader communities by considering the topic similarities of both communities and readers. [Results] Experiments on real data show that the method effectively finds latent topics of reader communities and recommends virtual reader communities more accurately than existing recommendation methods. [Limitations] The method suffers from the cold start problem of recommendation. [Conclusions] This method helps readers accurately and quickly find topic-related virtual reader communities of interest, promoting communication among readers and the development of virtual reader communities.
[Objective] By analyzing the structure and organization of the intelligent knowledge repository of IETM, this paper investigates the evolution modes of its mapping approaches and the changes of its mapping sets. [Methods] Based on Ontology mapping evolution methods, this paper introduces the concept of mapping sets to describe the form in which mappings exist, and represents the rules by which mapping relationships change when data modules are added, deleted and modified. [Results] A set of algorithms is put forward to support the mapping evolution of intelligent IETM. The algorithms satisfy the comprehensiveness and accuracy requirements of mapping evolution and also improve its efficiency. [Limitations] The mapping algorithm is a preliminary study on two important database mappings of IETM. It involves only changes of mapping sets and does not address mapping generation algorithms in depth. [Conclusions] The mapping evolution algorithm improves the standardization and efficiency of mapping evolution for the intelligent knowledge base of IETM, and lays a foundation for automating mapping evolution.
[Objective] To improve on the classification performance and speed of the KNN algorithm. [Methods] This paper proposes a classification algorithm based on average category similarity, which determines the category of a test text by calculating the mean similarity between the test text and all texts of each category in the training set. [Results] Experimental results on the Fudan, balanced Sogou and unbalanced Sogou public corpora show that, compared with the KNN classification algorithm, the Macro_F1 of the proposed method increases by 3.5%, 3.2% and 3.3% respectively on the three corpora, and its classification time is 1/22, 1/6 and 1/5 respectively of that of KNN. [Limitations] Because of the low time efficiency of the KNN algorithm, the experimental data sets contain relatively few texts. [Conclusions] Compared with KNN, the proposed method is a practical algorithm for large-scale text classification.
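The decision rule in the abstract above (assign the category whose training texts have the highest mean similarity to the test text) can be sketched as follows. The abstract does not state which similarity measure or text representation is used, so cosine similarity over raw term counts is an assumption here, as are the names `cosine` and `classify_by_average_similarity`.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(count * v.get(term, 0) for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify_by_average_similarity(test_tokens, training_set):
    """Return (best_label, mean_similarity_per_label).

    training_set: list of (label, tokens) pairs.
    """
    test_vec = Counter(test_tokens)
    similarities = defaultdict(list)
    for label, tokens in training_set:
        similarities[label].append(cosine(test_vec, Counter(tokens)))
    means = {label: sum(s) / len(s) for label, s in similarities.items()}
    return max(means, key=means.get), means


if __name__ == "__main__":
    training_set = [
        ("sports", ["football", "match", "goal"]),
        ("sports", ["basketball", "match", "score"]),
        ("finance", ["stock", "market", "bank"]),
        ("finance", ["bank", "loan", "rate"]),
    ]
    label, means = classify_by_average_similarity(
        ["football", "match", "score"], training_set
    )
    print(label, means)
```

Unlike KNN, this rule needs no sorting of neighbours and no choice of k: each test text is reduced to one score per category.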
[Objective] To address changing and overlapping class feature vectors, this paper improves the classification algorithm based on the hyper-sphere support vector machine (HS-SVM). [Methods] After reviewing the operating mechanisms of LDA and HS-SVM and the related studies, this paper constructs a text classification model based on LDA and HS-SVM. The traditional HS-SVM is improved by taking incremental learning and the degree of density into account, so that the support vectors of each hyper-sphere class can change dynamically and the decision function of the hyper-sphere support vector machine can be calculated accurately. [Results] Comparative experiments demonstrate that the proposed method is feasible and effective, improving text classification in terms of precision and recall. In addition, the method reduces modeling time and has little influence on prediction accuracy. [Limitations] The proposed method is more complex than the original algorithm and needs continued improvement, and the results need to be verified on more data sets. Meanwhile, the improvement to the core of the algorithm is not optimal and requires further study. [Conclusions] This study helps improve accuracy and reduce training time in large-scale text categorization, and improves the efficiency and performance of text classification.
[Objective] A Review Attribute of Product-Based Emotion Evaluate (RAPBEE) model is proposed to detect fake reviews of online products. [Methods] Drawing on existing research on review effectiveness evaluation, emotion outlier detection based on the product attributes mentioned in reviews is used to rank the reviews comprehensively by reliability, so as to detect fake reviews. [Results] The model is run on the test data set in the R language. The results show that the review ranking produced by the RAPBEE model achieves 86.2% agreement with the real situation, which indicates that the model is practical and fits the data well. [Limitations] The stability of the model depends on how the attribute dictionary is built, and the method can be improved for large data sets (Big Data). [Conclusions] The paper proposes a new method for detecting Chinese fake reviews of online products, and the method is highly extensible in practice.
[Objective] To segment Chinese patent claims and meet the research needs of patent similarity analysis. [Methods] This paper summarizes the segmentation words, the rules of substring segmentation and the rules of domain term extraction, and constructs a domain dictionary. A method based on the domain dictionary and rules is then presented to segment Chinese patent claims. [Results] The experimental results show that the precision is 90%, the recall is 95%, and the F-score is 92%. [Limitations] The large size of the domain dictionary reduces the efficiency of large-scale segmentation. [Conclusions] The proposed method improves the effectiveness and efficiency of Chinese patent claim segmentation.
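As a quick consistency check on the figures reported in the abstract above, the sketch below computes the F-score from the reported precision and recall. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract.
    print(f"{f1_score(0.90, 0.95):.4f}")
```

Under that assumption the value is about 0.924, which agrees with the reported 92% after rounding.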
[Objective] This study explores the types of institutions with which high-tech talent is affiliated and the cooperation among them, based on data from ESI and WoS. [Methods] The types, cooperation and dynamic evolution of affiliated institutions are identified with the triple helix model. [Results] It is found that the authors' institutions are mainly universities. Cooperation between university and industry, as well as between university and government, is becoming a trend. However, few authors' affiliations span university, government and industry simultaneously. The results also show that the main mode of teamwork is still cooperation within universities, and cross-sector collaboration is rather weak. Meanwhile, the increase of T(ugi) over time suggests that interdisciplinary development will be the future direction. [Limitations] Since the research is based on typical sampling, it has some limitations, which will be addressed in future work. [Conclusions] This paper expands the application range of the triple helix model and has implications for the growth of high-tech talent and for theory.