[Objective] This paper reviews studies of online anti-terrorism in China and identifies their limitations as well as future research trends. [Coverage] We retrieved 60 Chinese journal articles and books from the CNKI, Wanfang, Web of Science, ScienceDirect and Engineering Village databases, all published after 2002 on the topic of “Online Anti-terrorism”. [Methods] We examined the Chinese literature from the perspectives of anti-terrorism data analysis, public opinion dissemination, and early warning and crisis response mechanisms. [Results] Most Chinese online anti-terrorism research collected terrorism data and then analyzed terrorism-related remarks and public opinion. However, the big data processing and non-textual data analysis techniques adopted by these studies were insufficient. Meanwhile, online anti-terrorism laws and education need to be improved. [Limitations] We only collected the target literature from scholarly journals and books; data from real-world counter-terrorism cases should be added. [Conclusions] Online anti-terrorism research in China is still developing and requires coordinated support from the technology, management and regulation sectors to promote its advancement and integration with big data.
[Objective] This research proposes a selection scheme for applying big data to monitor Internet financial platforms, and verifies it with real-world cases. [Methods] First, we adopted a big data model to integrate multi-source heterogeneous data from the Solarbao platform. Second, we utilized the CHAID decision tree to summarize multi-dimensional monitoring indicators based on an analysis of each project’s investment risks. Finally, we employed the R-Q factor analysis method to extract the key investment risks. [Results] We obtained 8 indicators for tracking investment risks, which could be identified by another 10 indicators for the photovoltaic projects. [Limitations] More research is needed on the indicators of the R-Q factor analysis, which also requires a dynamic update mechanism. [Conclusions] The proposed scheme could help investors assess the risks of individual projects and select appropriate ones. It will also support the risk management work of regulatory agencies.
[Objective] This paper employs text mining technology to automatically identify research topics from large amounts of scientific literature and to detect future trends. [Methods] First, we used the LDA model to find both the topical prevalence and the contents of articles published by the top ten computer science journals in China. Second, we described the evolution of major topics using publication dates. [Results] We extracted 18 topics from 29,621 computer science papers and then identified 7 trending topics as well as 6 less popular ones. [Limitations] Our study did not include papers published overseas by Chinese authors. [Conclusions] The proposed method could help us understand the evolution of computer science research and grasp emerging trends.
[Objective] This paper aims to identify authors with features extracted from non-standard online texts. [Methods] First, we used the non-standard text similarity M defined by the Jaccard coefficient. Second, we adopted the frequency of non-standard text in the corpus. [Results] The recognition accuracies of the two features were 85.1% and 80.2%. Adding the two features to the traditional recognition mechanism increased the precision of the system by 5.8% and 4%, respectively. [Limitations] We did not study the online texts at the syntactic and structural levels. [Conclusions] The proposed method could effectively extract non-standard text features and improve the accuracy of author identification.
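The Jaccard coefficient mentioned in the Methods above can be sketched as follows. This is only a minimal illustration: the abstract does not state how the similarity M is built on top of the coefficient, so the function name, the token lists and the empty-set convention are all assumptions.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections, treated as sets."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 0.0  # convention assumed here for two empty sets
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical non-standard tokens observed in the texts of two authors
author_a = ["gr8", "lol", "u", "thx"]
author_b = ["lol", "u", "omg"]

# 2 shared tokens out of 5 distinct tokens overall
similarity = jaccard(author_a, author_b)
print(similarity)
```

A higher value means the two token sets overlap more, which is the sense in which such a coefficient could serve as an authorship feature.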
[Objective] This paper analyzes the impacts of query specificity on the effectiveness of information retrieval systems, aiming to improve the performance of search engines and the user experience. [Methods] First, we manually constructed a labeling set for queries from the TREC Web Track. Second, we adopted the Dirichlet language model, the linear interpolation language model and the BM25 model to examine each query’s performance. Finally, we used the average information retrieval evaluation index as the benchmark to explore the impacts of query specificity. [Results] For the highest-ranked results, queries with narrower specificity had better retrieval performance than their broader counterparts. [Limitations] The proposed method was only examined with data provided by TREC. More studies are needed to evaluate its performance with other data sets. [Conclusions] Search engines should focus on the precision of the highest-ranked results and modify their retrieval models accordingly.
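As a rough illustration of one of the retrieval models named above, the sketch below scores documents with a Dirichlet-smoothed query likelihood model. It is not the authors' implementation: the tokenized toy documents, the function name and the value of the smoothing parameter `mu` are assumptions made for the example.

```python
import math
from collections import Counter


def dirichlet_score(query, doc, collection, mu=2000.0):
    """Log query likelihood of `doc` under a Dirichlet-smoothed unigram model.

    p(w | d) = (c(w, d) + mu * p(w | C)) / (|d| + mu)
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(collection)
    coll_len = len(collection)
    score = 0.0
    for term in query:
        p_coll = coll_tf[term] / coll_len
        if p_coll == 0.0:
            continue  # terms unseen in the whole collection are skipped
        p_term = (doc_tf[term] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p_term)
    return score


# Hypothetical toy collection of tokenized documents
docs = [
    ["solar", "panel", "price"],
    ["solar", "energy", "policy", "news"],
]
collection = [token for doc in docs for token in doc]
query = ["solar", "panel"]

# A small mu is used because the toy documents are very short
scores = [dirichlet_score(query, doc, collection, mu=10.0) for doc in docs]
print(scores)
```

The first document contains both query terms and therefore receives the higher score; the smoothing keeps the second document's score finite even though it lacks one term.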
[Objective] This paper establishes a model to analyze the sentiment fluctuation of consumers using online product reviews. [Methods] We constructed the model with product review mining and sentiment analysis techniques. We also examined the influence of conjunctions on the sentiment tendency of sentences and calculated their weights. [Results] The proposed model effectively analyzed online reviews of one mobile phone posted on Jingdong and Zhongguancun Online from November 2013 to January 2015. [Limitations] We only included the total number and frequency of product feature keywords from reviews posted in neighboring time slots. [Conclusions] The proposed model could effectively analyze the trends in, and reasons for, consumer sentiment fluctuation over a period of time, which provides valuable information for enterprise decision making.
[Objective] This empirical case study aims to validate the effectiveness of using Altmetrics indicators to identify high-quality articles. [Methods] First, we retrieved the online usage and sharing data of highly cited papers published by the PLOS journals from social platforms (i.e., CiteULike, Mendeley and Figshare). Second, we examined the relationship between these Altmetrics and the SCI citation counts of the target papers. [Results] The correlation between the SCI citation data and the Altmetrics generated by Mendeley was the strongest of the three (r = 0.376, p = 0.01), while the other two correlation coefficients were weaker. The online usage data from Mendeley might help us identify high-impact literature published by specific journals. [Limitations] This research only investigated a few subjects covered by the PLOS serial journals. More research is needed to check the relationship between Altmetrics and citation counts in other fields. [Conclusions] Online usage and sharing data from CiteULike, Mendeley and Figshare might not be able to effectively identify high-impact literature.
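The abstract above reports a correlation coefficient r but does not say which coefficient was computed. Assuming a Pearson correlation, the calculation could be sketched as below; the function name and the reader and citation counts are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-paper counts: Mendeley readers vs. SCI citations
mendeley_readers = [12, 40, 7, 55, 23, 31]
sci_citations = [30, 45, 28, 90, 33, 25]

r = pearson_r(mendeley_readers, sci_citations)
print(round(r, 3))
```

Citation and readership counts are usually skewed, so a rank-based coefficient such as Spearman's is a common alternative; which one the study used is not stated.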
[Objective] This paper aims to quantitatively analyze the targeting technological innovation network with patent data, and then measure its evolution from different perspectives. [Methods] First, we collected patent data of targeting technology from the Derwent Innovations Index Database. Second, we applied the patent dynamic network analysis index to the technological innovation network analysis. Finally, we built the evolution measurement system based on the technological innovation network. [Results] We analyzed the four measurements of the technological innovation network and then presented the developments and technology hot-spots. [Limitations] More in-depth research is needed to expand the evaluation index of the technological innovation network. [Conclusions] The proposed method could effectively measure the trends of technological innovation network evolution.
[Objective] We developed a paper authoring tool for semantic publishing, which makes an article’s content structured and object-oriented. Each paper is a system with executable, interactive and experiential features. [Methods] First, we divided the content of each paper (metadata, chapters, data, media etc.) into objects organized by a digital template. Second, these elements interacted with each other through an event trigger mechanism. Finally, the paper was edited and presented as HTML5 pages and then saved as XML documents. [Results] DPaper is available at iDPaper.las.ac.cn and provides functions such as material collection (cloud notes), digital object creation, automatic reference indexing, and Word document format conversion in accordance with periodical layouts. The paper’s content is object-oriented and partially semantic. [Limitations] Compared to conventional paper editors, DPaper’s digital object editor cannot create formulas or graphics and offers little flexibility for changing layouts. [Conclusions] DPaper could help us compose a structured paper that meets the requirements of semantic publishing. Keywords: DPaper, Semantic publishing, Structured paper, Digital object, Authoring tool
[Objective] This paper aims to identify search terms more effectively in sci-tech novelty retrieval, reducing the subjectivity, heavy workload, lack of standardization and time consumption of manual methods. [Context] We used the corpus generated by sci-tech novelty retrieval as the source of domain knowledge for extracting search terms. Then, we discussed the relationship between the corpus and keyword extraction. [Methods] We proposed an incremental iterative method to extract keywords from sci-tech novelty retrieval projects with the help of domain feature expansion. [Results] Compared to the search terms used in real-world sci-tech novelty retrieval, the recall of the 10 search terms extracted by the new method reached 80%. [Conclusions] The proposed method could identify most keywords and improve the efficiency and effectiveness of novelty retrieval tasks.
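For readers unfamiliar with the recall measure reported above, a minimal sketch follows. The abstract does not describe how extracted terms were matched against the reference terms, so exact string matching is assumed here, and all terms in the example are hypothetical.

```python
def recall(extracted, reference):
    """Share of reference search terms that also appear among the extracted terms."""
    reference_set = set(reference)
    if not reference_set:
        raise ValueError("reference term set must not be empty")
    return len(set(extracted) & reference_set) / len(reference_set)


# Hypothetical terms: what a human searcher used vs. what a method extracted
reference_terms = ["graphene", "coating", "corrosion", "anode", "sol-gel"]
extracted_terms = ["graphene", "coating", "corrosion", "anode", "polymer", "film"]

# 4 of the 5 reference terms were recovered
term_recall = recall(extracted_terms, reference_terms)
print(term_recall)
```

Recall ignores how many extra terms were extracted, so it is normally read together with precision.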
[Objective] This paper aims to improve the retrieval performance for the rapidly growing Mongolian information resources with ontology-based semantic technology. [Methods] We designed a semantic retrieval system with the help of a Mongolian music domain ontology and the semantic analysis and inference engine Jena. [Results] Compared to keyword matching retrieval systems, the recall and precision of the proposed system were significantly improved (95.6% and 93.2%, respectively). [Limitations] The experimental data only included Mongolian multi-voice music. [Conclusions] The proposed semantic retrieval system lays theoretical and technological foundations for research on Mongolian semantic Web applications.