[Objective] This paper explores the theoretical foundations and key research topics of social public opinion analysis and decision-making support with the help of big data. [Methods] We first reviewed related theories and methods from library and information science, communication studies, public administration, computer science, psychology, system dynamics, and complex networks. Then, we summarized the key research topics based on field studies and practice analysis. [Results] We proposed six perspectives to guide research design and content organization, and also tried to solve five key issues, including the effects of social public opinion dissemination on government decision making. [Conclusions] Big data brings new opportunities for social public opinion analysis and decision-making support, which call for much more new research.
[Objective] This paper proposes a new method to detect real-time bursty events accurately and efficiently from massive micro-blog posts. It provides decision-making information for public opinion emergency management. [Methods] First, we introduced a reference time window mechanism, and designed an algorithm to process word frequency, document frequency, Hashtags, and word frequency growth rates. Second, we used this dynamic-threshold algorithm to extract bursty words. Third, we transformed micro-blog texts into feature vectors over the bursty words. Finally, we detected bursty events with an agglomerative hierarchical clustering algorithm. [Results] The bursty event detection method reached an accuracy rate of 80% against real-world cases. Thus, the proposed method is feasible and effective. [Limitations] We could not describe the detected emergencies automatically due to the limits of the data and the size of the current study. More research is needed to analyze users' emotions and the semantic relationships among bursty events. [Conclusions] Our study fills knowledge gaps left by previous research, and improves the efficiency of retrieving bursty events from micro-blog posts.
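The dynamic-threshold idea in this abstract can be sketched as follows. This is an illustrative reading, not the paper's exact statistic: each word's frequency in the current window is compared against a threshold derived from its history in the reference time windows (assumed here to be mean plus k standard deviations; the function and parameter names are hypothetical).

```python
import statistics

def extract_bursty_words(current_counts, reference_windows, k=2.0):
    """Flag words whose frequency in the current time window exceeds a
    dynamic, per-word threshold built from the reference time windows:
    mean + k standard deviations of the word's historical frequency."""
    bursty = []
    for word, freq in current_counts.items():
        history = [window.get(word, 0) for window in reference_windows]
        threshold = statistics.mean(history) + k * statistics.pstdev(history)
        if freq > threshold:
            bursty.append(word)
    return bursty
```

A common word with a stable history ("the") stays below its threshold, while a rare word that suddenly spikes ("quake") is flagged.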
[Objective] This paper analyzes the motivation and evolution of netizens' behavior in crises with the help of the "Belief-Desire-Intention (BDI)" model to guide netizens' emotions, and then builds a computing model for crisis response in the complex network environment. [Methods] First, we designed a model of the interaction among netizens, government and media in public opinion crises to simulate netizens' emotional changes, based on the BDI-Agent theoretical model. This model could reveal the reasons for changes in public opinion and help us create better crisis response strategies. Second, we built an experimental model with the Agent properties, reasoning rules and interaction designs to examine the algorithm with real-world cases. [Results] Our empirical study showed that the proposed model was feasible. [Limitations] More real-world cases are needed to further optimize the new model. [Conclusions] The proposed BDI-Agent model could map the complicated public opinion context to a reasonable computing model, which could help us predict the future development of public opinion crises and design better response strategies.
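A minimal sketch of the BDI-Agent loop described above, under loudly stated assumptions: the belief is reduced to a single perceived-severity score, the update rule and the emotion thresholds are hypothetical stand-ins for the paper's richer reasoning rules, and all names are illustrative.

```python
class NetizenAgent:
    """Toy BDI-style agent: beliefs are updated from media/government
    messages, and an intention (the emotional stance to express) is
    selected from the current belief. Hypothetical rules only."""

    def __init__(self):
        self.belief = 0.5          # perceived severity of the crisis, in [0, 1]

    def perceive(self, message_tone, weight=0.3):
        # Belief update: move toward the tone of an incoming message
        # (message_tone = 1.0 alarming, 0.0 reassuring).
        self.belief += weight * (message_tone - self.belief)

    def intention(self):
        # Desire-to-intention rule: choose the emotion to express.
        if self.belief > 0.7:
            return "anger"
        return "concern" if self.belief > 0.4 else "calm"
```

Simulating many such agents exchanging messages is what lets a model of this kind trace how reassuring official responses can calm collective emotion over time.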
[Objective] Technical fields that are closely related to basic research require radical-innovation identification from the content of scientific knowledge cited in patents (SKCP). [Methods] This paper first extracts keywords and subject categories of scientific references in patents to represent the SKCP, then identifies topics in the keyword co-occurrence network and in combinations of subject categories, and finally proposes a topic mutation degree calculation based on keywords and subject categories to identify technical topics of radical innovation. [Results] In the domain of nano-electronics, the nano circuit is a recognized radical innovation. The proposed method confirmed the related topics, including nano wires, carbon nanotubes, computing circuits, and nano materials and their manufacturing technologies. Moreover, the corresponding combination of subject categories is materials science, chemistry, optics, biology and applied physics. [Limitations] The accuracy of SKCP extraction, preprocessing and matching needs to be improved, and the generality of the method needs to be validated in other areas. [Conclusions] This method is an important improvement on and supplement to radical innovation identification based on patent information, and could be extended to other technical fields that are closely related to basic research.
[Objective] The paper examines the role of comment-clusters in public opinion mining. [Methods] We proposed a model to study comment-clusters based on social network analysis techniques. First, we collected the comments received by online news reports on three trending events as raw data. Second, we analysed the structures and contents of these comments with the help of the vector relationships among them to identify the best comment-clusters. Finally, we conducted semantic analysis of the key players and their comments to investigate their sentiments, and then compared them with those of the whole data set. [Results] The sentiments obtained from the whole data set and from the comment-clusters were very close to each other. Comment-clusters improved the performance of the public opinion mining algorithm. [Limitations] The method of identifying and extracting sentiment words might yield errors. [Conclusions] Comment-clusters improve sentiment orientation computing, which helps us obtain public opinion more efficiently.
[Objective] To identify emerging technologies from academic papers and patents. [Methods] We adopted the Latent Dirichlet Allocation (LDA) model to find technical topics and used similarity theory to retrieve emerging technologies from the electric car data. [Results] The proposed method was more efficient than existing ones. It reduced the subjectivity of the experts' evaluation and the amount of data to be analyzed. [Limitations] We did not include an expert scoring experiment in this study; thus, we could not compare the new model's performance with those involving human judgements. [Conclusions] The proposed model could identify emerging technologies effectively and thus reduce the document reading load of the experts.
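One plausible reading of "similarity theory" here is to compare each newly found LDA topic against the topics of an earlier period and treat the dissimilar ones as candidate emerging technologies. The sketch below uses cosine similarity over topic-word distributions; the similarity measure, threshold, and all names are assumptions, not the paper's confirmed method.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic-word distributions stored as
    {word: probability} dicts."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0

def emerging_topics(new_topics, known_topics, threshold=0.3):
    """A topic from the new period counts as emerging when it is
    dissimilar to every topic of the earlier period."""
    return [name for name, dist in new_topics.items()
            if all(cosine(dist, known) < threshold for known in known_topics)]
```

A topic that overlaps an earlier one in its top words is filtered out; a topic with no word overlap survives as an emerging-technology candidate.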
[Objective] This paper studies the mechanism of knowledge dissemination in virtual communities with the help of a new theoretical model. [Methods] First, we collected data from the virtual community GitHub. Second, we examined these data with social network and regression analysis techniques. Finally, we explored the influence of community members' online position, physical location and attitude towards innovation on the speed and scope of knowledge dissemination. [Results] We found that the number of online community members could change the scope of knowledge dissemination. The attitude towards innovation could affect the knowledge dissemination speed. The extent of clustering had negative effects on the scope and speed of knowledge dissemination. [Limitations] This study was based on one virtual community. More research is needed to generalize the findings. [Conclusions] This study provides some strategic suggestions for virtual community management as well as for members' knowledge sharing and innovation activities.
[Objective] This paper proposes an algorithm to extract topic and opinion information from microblog posts automatically. [Methods] First, we used an improved TF-IDF algorithm to build the topic characteristic word vector. Second, we generated lexical chains for the topics based on the relevance among the words of the vector. Finally, we extracted the topic and opinion information with a sentiment dictionary, and then generated the "topic+opinion" entries. [Results] We analyzed 24,598 Sina microblog posts on four trending events from June 2014 to June 2015, retrieved by a specially designed crawler. The precision and recall rates of the proposed method were 80.3% and 76.67%, respectively. [Limitations] The data size was small, and the topic model's feature extraction for Weibo posts still needs to be improved. [Conclusions] The proposed algorithm could effectively extract the "topic and opinion" information from microblog posts.
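The first step above builds on TF-IDF weighting. As a baseline for comparison, here is the standard (unimproved) formula over a set of tokenized posts; the paper's "improved" variant adds further weighting not reproduced here, and the function name is illustrative.

```python
import math
from collections import Counter

def topic_feature_words(docs, top_k=5):
    """Rank candidate topic characteristic words by plain TF-IDF.
    `docs` is a list of tokenized posts (lists of words); tf is the
    corpus-wide term frequency and df the document frequency."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))   # document frequency
    tf = Counter(w for d in docs for w in d)        # term frequency
    scores = {w: tf[w] * math.log(n / df[w]) for w in tf}
    return [w for w, _ in sorted(scores.items(), key=lambda x: -x[1])[:top_k]]
```

Words that occur in every post score zero (log(n/df) = 0), so event-specific terms rise to the top of the vector.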
[Objective] This paper tries to identify important and implicit semantic relations among subject-indexed papers. [Methods] Based on subject-indexed biomedical papers from MEDLINE, we proposed an algorithm consisting of subject coordinating and indexing rules, as well as optimization rules for weighted indexing results and relation occurrences. The new algorithm was then examined with experimental disease data. [Results] With the help of domain experts' verification, the precision of the new algorithm was higher than 95%. [Limitations] The proposed method is only appropriate for papers with subject indexing. [Conclusions] The proposed algorithm can be used to identify semantic relations among English and Chinese subject-indexed biomedical papers, and can help us develop algorithms in other areas.
[Objective] This paper evaluates academic journals with the help of their source indexes and impact indicators. [Methods] We collected data for Mathematics journals from the 2015 Journal Citation Report (JCR), and then conducted canonical correlation analysis on the data. [Results] Firstly, the eigenfactor was the major indicator of the influence of journals. Secondly, the journal source indexes and impact indicators were significantly correlated with each other. Thirdly, total citation numbers, citation half-life, journal impact factors and journal impact factor percentile contributed more to the eigenfactor. Fourthly, impact indicators contain more information than the source indexes. [Limitations] More research is needed to investigate the relationship between the source indexes and impact indicators. [Conclusions] The impact indicators are more important than the source indexes. We need to increase the eigenfactor score and the weight of the normalized eigenfactor. We should also decrease the weights of the impact factor without journal self-citations, the 5-year impact factor and the immediacy index.
[Objective] This paper tries to improve the performance of the traditional collaborative filtering and recommendation algorithm. [Methods] We used the MovieLens dataset to evaluate the proposed algorithm. First, we chose datasets with a sparsity of 0.9605, which included the scoring records of 1,102 users for 2,920 movies. Second, we identified the optimal number of expert users and the recommendation weight coefficient alpha through a series of experiments. Finally, we evaluated the algorithm's performance with comparative methods. [Results] The precision of the algorithm was influenced by the expert users. When the recommendation weight coefficient was 0.6, the precision of the new algorithm was better than that of the traditional ones. Once the proportion of expert users increased from 2% to 20%, the coverage value increased by 0.21. Thus, the new algorithm could analyze long-tail goods more effectively. [Limitations] We did not take into account the possible correlations among different categories. [Conclusions] The proposed algorithm could effectively solve the data sparsity and cold start issues, which significantly improves the performance of the recommendation system.
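The role of the weight coefficient alpha can be illustrated with a minimal blended prediction: an expert-based estimate and an ordinary neighbor-based estimate are mixed with alpha = 0.6, echoing the abstract's reported optimum. The function, argument names, and the similarity-weighted-average form are assumptions, not the paper's published formula.

```python
def predict_rating(user_sims, expert_sims, item, ratings, alpha=0.6):
    """Blend an expert-based and a neighbor-based prediction:
    alpha * expert_estimate + (1 - alpha) * neighbor_estimate.
    `user_sims`/`expert_sims` map user ids to similarity weights;
    `ratings` maps user ids to {item: rating} dicts."""
    def weighted_avg(sims):
        pairs = [(s, ratings[u][item]) for u, s in sims.items()
                 if item in ratings.get(u, {})]
        total = sum(s for s, _ in pairs)
        return sum(s * r for s, r in pairs) / total if total else 0.0
    return alpha * weighted_avg(expert_sims) + (1 - alpha) * weighted_avg(user_sims)
```

Because experts have rated broadly, the expert term still produces a prediction for sparsely rated long-tail items where the neighbor term has no data, which is how a scheme like this eases the sparsity and cold-start problems.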
[Objective] This paper proposes a context-aware recommendation system for mobile digital libraries, with the help of the latter's collection features and users' behaviors. [Methods] Based on the theory that similar users make similar choices, we first modeled the users' interests by introducing the concept of roles. Second, we designed an effective Weighted Set Similarity Query (WSSQ) algorithm to build a role-based trust network for the users. Finally, we modified the existing context-aware recommendation system, which was then evaluated with the Extended Epinions dataset. [Results] The proposed new recommendation system was feasible, and had better performance than other methods. [Limitations] The contexts and roles were not rich enough to process large user samples. [Conclusions] This study could help us improve the mobile digital library's resource recommendation system.
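One plausible reading of a weighted set similarity between two users' interest profiles is the weighted Jaccard measure sketched below; the paper's exact WSSQ definition may differ, and the names here are illustrative.

```python
def weighted_set_similarity(a, b):
    """Weighted-Jaccard similarity between two users' interest
    profiles, each a {tag: weight} dict: the sum of element-wise
    minima over the sum of element-wise maxima."""
    keys = set(a) | set(b)
    inter = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    union = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return inter / union if union else 0.0
```

Users whose similarity exceeds some threshold would then be linked in the role-based trust network, so recommendations can flow along trusted edges.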
[Objective] This study builds a knowledge-oriented standard literature service system, which could generate more knowledge for the users. [Context] The proposed system is able to extract semantic knowledge units from the standard literature, to organize information based on knowledge relationships, and to provide a standard knowledge service to users. [Methods] We used the technologies of optical character recognition, natural language processing and information visualization to finish the tasks of semantic organization, knowledge extraction, Ontology construction, knowledge mapping and Ontology-based retrieval of standard literature. [Results] The users enjoyed a knowledge-oriented standard literature information service, including a standard knowledge map and Ontology-based retrieval. [Conclusions] The proposed system improves user experience and meets their knowledge demands.
[Objective] This paper explores the theoretical foundation and practical experience of building a computational Chinese grammar system. [Methods] This study discussed the development process of the Mandarin Grammar Online (ManGO), a Head-driven Phrase Structure Grammar (HPSG) system with Minimal Recursion Semantics. It built the lexicon and hierarchy rules for the idiosyncratic structures of Chinese grammar. [Results] The successful development of the ManGO system showed that HPSG is an ideal theoretical framework for Chinese computational grammar applications. [Limitations] ManGO is still underdeveloped, and its coverage could not yet be examined with large-scale natural language data. [Conclusions] ManGO connects the theories of formal and computational linguistics, and therefore becomes the foundation for developing a large-scale resource grammar.
[Objective] This paper builds a co-topics network to analyze the relationships among the topics of research articles and then optimize the terms representing these topics. [Methods] First, we transformed the "document-topics" bipartite graph into co-topics networks in accordance with weighted projection rules. Second, we identified the key topics with a combination of betweenness centrality and topic probability. Third, we divided the co-topics network into communities with the GN algorithm. Finally, we optimized the topic terms with the relevance method. [Results] We compared the co-topics networks with JSD-based K-means clustering, testing the optimal topic number (28) and randomly chosen topic numbers (20, 30). Their cluster numbers were the same, and the consistency of the clustering content reached 100%, 95% and 87%, respectively. [Limitations] We did not test other community partition methods with the proposed co-topics networks. [Conclusions] The co-topics network meets the demands of high-dimensional data and identifies the key topics and the closely linked topics of the target documents.
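The first step above, projecting a document-topic bipartite graph onto a co-topics network, can be sketched with one simple weighted-projection rule: two topics are linked whenever they co-occur in a document, with the edge weight accumulating the product of their topic probabilities. The paper's actual projection rule may differ; the names here are illustrative.

```python
from collections import defaultdict
from itertools import combinations

def project_to_cotopics(doc_topics):
    """Project a document-topic bipartite graph onto a weighted
    co-topics network. `doc_topics` is a list of per-document
    {topic: probability} dicts; the returned dict maps sorted topic
    pairs to accumulated edge weights."""
    edges = defaultdict(float)
    for topics in doc_topics:
        for t1, t2 in combinations(sorted(topics), 2):
            edges[(t1, t2)] += topics[t1] * topics[t2]
    return dict(edges)
```

Betweenness centrality and GN community detection would then run on this projected weighted network rather than on the high-dimensional document-topic matrix itself.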
[Objective] This study tries to provide better services to academic library users and improve the efficiency of library system administration with the help of a new WeChat service platform. [Context] At present, generating and modifying the customized menu of a WeChat service platform for academic libraries requires editing the source code, and the platform contains no network administration functions. [Methods] The proposed model used Java's reflection mechanism, and utilized API technologies, Java programming and the Hibernate database framework to develop a WeChat service platform for academic libraries. [Results] The new platform helped administrators manage the network system and edit the customized menu's features in real time. [Conclusions] The new WeChat service platform improves user experience and administration efficiency significantly.