Data Analysis and Knowledge Discovery

Reviews on Temporal Information Retrieval

Zhang Xiaojuan,Han Yi

Data Analysis and Knowledge Discovery. 2017, 1(1): 3-15. https://doi.org/10.11925/infotech.2096-3467.2017.01.02

[Objective] This study summarizes the state of research on temporal information retrieval (T-IR) and provides a theoretical basis to help scholars better understand T-IR problems. [Coverage] We first searched Google Scholar with the keywords “temporal information retrieval” in Chinese and English, without a time limit. We then traced the references of the retrieved papers to find more related work, collecting 92 publications in total. [Methods] Using literature survey, induction and summarization, we reviewed the existing literature on T-IR from three aspects: extracting temporal information from documents, identifying temporal information in queries, and temporal ranking models. [Results] The problems and challenges in T-IR are as follows: little related work has been done in China, with most coming from other countries; data collection and indexing methods that reflect the dynamic characteristics of the real network are lacking; the important role of entities and events in representing time information is ignored when identifying the focus time of a document; intent prediction for non-periodic queries is lacking; and the reproducibility of experiments on T-IR models needs to be improved. [Limitations] This paper did not review document crawling, document indexing or the applications of T-IR. [Conclusions] The construction of standardized evaluation datasets and non-parametric T-IR models will be future research trends in T-IR.

Integrated Analysis and Visualization of Sci-Tech Roadmaps: Case Study of Renewable Energy

Xie Xiufang,Zhang Xiaolin

Data Analysis and Knowledge Discovery. 2017, 1(1): 16-25. https://doi.org/10.11925/infotech.2096-3467.2017.01.03

[Objective] This study aims to predict development trends in science and technology (S&T) using knowledge extracted from S&T roadmaps (STRs). [Methods] First, we constructed an STR information database based on the “extraction - synchronization - classification” text mining method. Second, we analyzed the demands and trends of global S&T progress. Finally, we compared different countries’ S&T strategies in the field of renewable energy. [Results] We used open source tools such as Timeflow and Gephi to visualize the results of this case study, including global development trends and national strategic planning in renewable energy up to 2050. [Limitations] The automation and personalization features of this study need to be improved. [Conclusions] The proposed method could effectively retrieve strategic intelligence from STRs.

Cross Language Information Retrieval Model Based on Matrix-weighted Association Patterns Mining

Huang Mingxuan

Data Analysis and Knowledge Discovery. 2017, 1(1): 26-36. https://doi.org/10.11925/infotech.2096-3467.2017.01.04

[Objective] This paper addresses the query drift problem in cross language information retrieval and proposes a new model for retrieving Chinese documents with Indonesian queries. [Methods] The new model integrates matrix-weighted association pattern mining, query expansion and user click-download behaviors. [Results] On the CLIR NTCIR-5 data set, the R_prec, p@10 and p@20 values of the proposed model were higher than the 60% benchmark of monolingual retrieval. These results were 37% higher than the cross language retrieval baseline and 28% higher than existing algorithms based on pseudo relevance feedback. [Limitations] The proposed model was only examined in a cross language retrieval system built on the vector space model, and still needs to be tested with real world search engines. [Conclusions] The proposed model could effectively reduce query drift in cross language retrieval and retrieve more relevant Chinese documents with long Indonesian queries.
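
For readers unfamiliar with the reported metrics, the sketch below shows how precision at k (p@10, p@20) and R-precision (R_prec) are conventionally computed for a single query. It is an illustration only, not the authors' evaluation code; the function names, document IDs and relevance judgments are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return precision_at_k(ranked, relevant, r)


# Hypothetical ranking and relevance judgments for one query.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}

p_at_5 = precision_at_k(ranked, relevant, 5)  # 2 relevant documents in the top 5
r_prec = r_precision(ranked, relevant)        # 1 relevant document in the top 3
```

In a full evaluation these per-query values would be averaged over all queries in the test collection.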

Extracting Semantic Knowledge from Plant Species Diversity Collections

Liu Jianhua,Wang Ying,Zhang Zhixiong,Li Chuanxi

Data Analysis and Knowledge Discovery. 2017, 1(1): 37-46. https://doi.org/10.11925/infotech.2096-3467.2017.01.05

[Objective] This paper aims to extract semantic knowledge from biodiversity studies. [Methods] We proposed a new species-oriented knowledge extraction framework that covers various entities and the relationships among them. The new method was then examined with various specialized databases. [Results] The species-oriented knowledge extraction framework could successfully retrieve semantic information about the target entities and the relations among them. This method expanded the scope of knowledge extraction practice in the biodiversity field. [Limitations] The recall and precision of the new method were affected by the dictionaries and rules. More studies are needed to examine the semantic relationships among named entities beyond co-occurrence, hierarchical and simple syntactic relations. [Conclusions] The proposed method expands the contents and methods of knowledge extraction in biodiversity research. It supports semantic information retrieval and computation.

Automatically Detecting and Tagging Foreign Language Citation Metadata

Jiang Lin,Wang Dongbo

Data Analysis and Knowledge Discovery. 2017, 1(1): 47-54. https://doi.org/10.11925/infotech.2096-3467.2017.01.06

[Objective] This paper proposes a new method to automatically extract bibliographic metadata with the help of semantic knowledge and machine learning technologies. [Methods] We used a neural network model to create word vectors from manually split data, and found that metadata of the same type is relatively concentrated at certain locations in the vector space. Thus, we proposed a new SVM classification algorithm to classify and annotate the bibliographic metadata automatically. [Results] The proposed method achieved high recall and precision on citation data, especially for citations in various languages and with abbreviations. [Limitations] The fine-grained extraction of time-related content could be improved. [Conclusions] The proposed method could effectively detect and tag bibliographic metadata, and improve the system’s compatibility and fault tolerance.

Analyzing Emerging Issues with Technology Entropy Method Based on Patents: Case Study of Carbon Capture

Hou Jianhua,Guo Shuang

Data Analysis and Knowledge Discovery. 2017, 1(1): 55-63. https://doi.org/10.11925/infotech.2096-3467.2017.01.07

[Objective] This paper proposes a patent-based technology entropy analysis method to effectively monitor the development of emerging issues from patent data. [Methods] First, we built a multi-dimensional technology entropy model for the patent-based system. Second, we analyzed carbon capture technology from the macro and micro perspectives. [Results] We found that carbon capture technology in China was at a crucial development stage. Most of the studies were conducted by universities and focused on materials with absorption and adsorption abilities. [Limitations] The data collection method needs to be refined to remove irrelevant records. [Conclusions] The technology entropy method could effectively analyze the evolution trends of technologies. It provides a feasible tool for managing and evaluating the evolution and prediction of new technologies.

Impacts of Mobile Tools on Students’ Academic Reading Efficiency

Wu Dan,Lu Liuxing

Data Analysis and Knowledge Discovery. 2017, 1(1): 64-72. https://doi.org/10.11925/infotech.2096-3467.2017.01.08

[Objective] This study investigates the impacts of cell phone screen sizes and apps on students’ efficiency in reading academic literature. [Methods] We conducted questionnaire surveys, interviews and experiments to analyze the reading time, comprehension rate and memory rate for academic papers. [Results] Cell phone screen size had a significant effect on reading time but no significant effect on reading comprehension or memory rates. The app’s user experience affected reading comprehension, but had no significant effect on reading time or memory rate. [Limitations] The number of participants was limited, and the assessment method for reading comprehension and memory rates needs to be improved. [Conclusions] Screen size and apps have different impacts on reading efficiency, which could be improved by optimizing mobile devices and the app’s user experience.

Retrieving 3D Models from Institutional Repository

Wu Zhiqiang,Zhu Zhongming,Liu Wei,Zhang Wangqiang,Yao Xiaona

Data Analysis and Knowledge Discovery. 2017, 1(1): 73-80. https://doi.org/10.11925/infotech.2096-3467.2017.01.09

[Objective] This paper aims to explore new content-based technology to retrieve and display 3D models, and to expand the services of institutional repositories. [Methods] First, we modified the open source 3D model retrieval algorithm created by the Taiwan University. Second, we obtained the orthogonal projections and features of the 3D models with the off-screen rendering technique. Finally, we used Java3D to generate thumbnails of the 3D models, and then presented them online with the help of Three.js. [Results] Users could retrieve 3D models from the institutional repository by submitting their URLs or by uploading them. Users could also use the mouse to rotate or zoom in and out of the 3D models while browsing them online. [Limitations] The proposed 3D model retrieval technique met the needs of institutional repositories. However, the recall and precision of the new system could be improved with the help of the latest techniques in 3D model retrieval. [Conclusions] The proposed method helps the CSpace system manage 3D model collections effectively, and provides more options to retrieve and use the 3D models.

Linked Data for Mobile Visual Search System of Digital Library

Qi Yunfei,Zhao Yuxiang,Zhu Qinghua

Data Analysis and Knowledge Discovery. 2017, 1(1): 81-90. https://doi.org/10.11925/infotech.2096-3467.2017.01.10

[Objective] This paper proposes a new method for mobile visual search, which retrieves visual and semantic information from the digital library simultaneously. [Methods] First, we used BIBFRAME, linked data and image processing techniques to extract semantic and feature information from the visual resources. Second, we combined visual and semantic search with the help of linked data. [Results] The proposed method improved the performance of visual and semantic information retrieval. [Limitations] The system efficiency, the feature identification algorithm and the SPARQL retrieval procedure need to be optimized. [Conclusions] The proposed method could successfully search visual and semantic information, which might enable more innovative services for the digital library.

Optimizing Feature Selection Method for Text Classification with Shuffled Frog Leaping Algorithm

Lu Yonghe,Chen Jinghuang

Data Analysis and Knowledge Discovery. 2017, 1(1): 91-101. https://doi.org/10.11925/infotech.2096-3467.2017.01.11

[Objective] This paper introduces the shuffled frog leaping algorithm (SFLA) to remove irrelevant terms from texts, and optimizes the feature selection method to improve the accuracy of text classification. [Methods] First, we used the CHI and IG techniques to pre-select feature terms at different dimensions, and then adopted the modified SFLA to refine the list of text features. Second, we used a frog to represent a feature selection rule, and applied the classification precision as the fitness function. Finally, the SVM and KNN classifiers were adopted to calculate the classification precision. [Results] The modified SFLA achieved better classification precision than CHI and IG, with a maximum increase of 12%. [Limitations] Feature overfitting occurred in a small portion of the space dimensions. [Conclusions] Using feature preselection and the modified SFLA could effectively exclude irrelevant or invalid terms, and thus improve the precision of feature selection.
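
As an illustration of the CHI preselection step mentioned above, the following minimal sketch scores each term against one target class with the standard chi-square statistic over a 2x2 term/class contingency table and keeps the top-scoring terms. It is an assumption-laden sketch, not the authors' implementation: the function names, the toy corpus and the single-class scoring are hypothetical, and the SFLA refinement stage is not shown.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/class contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def preselect(docs, labels, target, top_n):
    """Return the top_n terms ranked by chi-square against the target class."""
    vocab = set().union(*docs)
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == target)
        d = sum(1 for doc, y in zip(docs, labels) if term not in doc and y != target)
        scores[term] = chi_square(a, b, c, d)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical toy corpus: each document is a set of terms.
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"market", "team"}]
labels = ["sports", "sports", "finance", "finance"]

selected = preselect(docs, labels, "sports", 2)
```

In this toy corpus, "team" appears equally often in both classes and scores zero, so it is dropped, while terms confined to one class rank highest.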

25 January 2017, Volume 1 Issue 1