Data Analysis and Knowledge Discovery

Methods and Applications for Technology Roadmap

Wei Ling, Li Shuying, Fang Shu

Data Analysis and Knowledge Discovery. 2020, 4(9): 1-14. https://doi.org/10.11925/infotech.2096-3467.2020.0625

[Objective] This study systematically reviews the methods and applications of technology roadmaps (TRM) in order to identify future research and application trends. [Coverage] We retrieved 285 articles published between 2004 and 2020 from the Web of Science Core Collection and selected 76 representative papers for review. [Methods] We reviewed the history and concepts of TRM and introduced the related research methods and tools, formulation and implementation processes, current applications, and future trends. [Results] The research and development methods and tools for TRM can be divided into three categories. The development ideas and implementation guidelines for TRM are clear. TRM is widely used by enterprises, networks, industries, and governments. [Limitations] The reviewed literature contains more classic papers and important reports than studies published in the last two years. [Conclusions] China should strengthen theoretical research on TRM and optimize its formulation process. The government should also promote TRM research and applications, as well as the training of TRM professionals.

Developments of Named Entity Disambiguation

Wen Pingmei, Ye Zhiwei, Ding Wenjian, Liu Ying, Xu Jian

Data Analysis and Knowledge Discovery. 2020, 4(9): 15-25. https://doi.org/10.11925/infotech.2096-3467.2020.0382

[Objective] This paper reviews research and resources in the field of named entity disambiguation (NED), with a focus on NED methods. [Coverage] We retrieved 57 representative papers and electronic resources from CNKI, the Wanfang Data Knowledge Service Platform, and EBSCO. [Methods] First, we summarized NED principles and methods from the perspectives of entity prominence, context similarity, entity relationships, deep learning, and special identification resources. Then, we explored useful knowledge bases, open source tools, and international conferences on NED evaluation. [Results] Traditional and classic methods were easy to use, while newer ones (e.g., deep learning) significantly improved NED results. Effective models often integrated several methods to obtain the best results. [Limitations] Comparing different methods on the basis of the literature involves some subjectivity. [Conclusions] NED methods are still developing and could be further improved with artificial intelligence and domain resources.

Analysis Framework Based on Multi-Source Data for US Export Control: An Empirical Study

Li Guangjian, Wang Kai, Zhang Qingzhi

Data Analysis and Knowledge Discovery. 2020, 4(9): 26-40. https://doi.org/10.11925/infotech.2096-3467.2020.0645

[Objective] This paper proposes a fine-grained, multi-dimensional analysis framework based on multi-source data and in-depth semantic content, aiming to address deficiencies in the analysis of U.S. export controls. [Methods] We constructed the framework on the concept of multi-source data fusion, integrating data from the CCL for items, the EAR for regulations, the blacklist for entities, and the Federal Register for policies. First, we identified the technical terms, the exact technical indicator values, and the relationships between the controlled items from the multi-source data. Second, we built an index using the semantic dictionary and model. Third, we used named entity recognition to establish correlations between the controlled items and entities. The framework contains four analysis modes: the status quo, specific items, time sequences, and countries. [Results] We examined the effectiveness of the framework with an empirical study on lithography. The recall for recognizing controlled items with the same tail ECCN number reached 97.3%. The precision of recognizing the entity domains of the Chinese mainland was up to 83.8%. [Limitations] We selected only lithography for the empirical study, and the framework could be improved. [Conclusions] The proposed framework provides an effective method for analyzing the texts of U.S. export control documents.

Data Governance and Domain Ontology of Regional Public Security

Zeng Zhen, Li Gang, Mao Jin, Chen Jinghao

Data Analysis and Knowledge Discovery. 2020, 4(9): 41-55. https://doi.org/10.11925/infotech.2096-3467.2020.0145

[Objective] This paper constructs a data governance model and domain ontology for regional public security, aiming to improve the application of data governance. [Methods] First, we constructed our model based on the theory of linked data and used public ontologies (e.g., DACT and ODRL 2.2) to manage public security data assets. Second, we extended the EventKG ontology to cover the process logic of public security. Third, we modified the PROV ontology to describe the source relationships among data assets and models. Fourth, we identified the relationship between data governance and process based on concepts and organizations. Finally, we constructed the ontology for the whole process of data governance. [Results] Our domain ontology was built from six scalable and reusable public ontologies. The model's relationship richness reached 0.773, which indicates good inter-class ties. The proposed model described the complex relations and processes of data governance for public security. Based on the ontology, we created a knowledge graph and applications for one prefecture-level city. [Limitations] More research is needed to extend the model to cyber public security. [Conclusions] The proposed model could improve data governance in public security research and practice.
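
The abstract does not say how relationship richness was computed. As a hedged illustration only, the sketch below assumes the common OntoQA-style definition, in which the metric is the share of non-inheritance relations among all relations between classes. The function name `relationship_richness` and the counts are hypothetical and are not taken from the paper.

```python
def relationship_richness(non_inheritance: int, inheritance: int) -> float:
    """Share of non-inheritance relations among all class-level relations.

    Assumed definition (OntoQA style): RR = |P| / (|SC| + |P|), where |P| is
    the number of non-inheritance relations and |SC| the number of subclass
    (inheritance) relations. The source abstract does not confirm this formula.
    """
    total = non_inheritance + inheritance
    if total == 0:
        return 0.0  # an ontology with no relations has no richness to report
    return non_inheritance / total


# Made-up counts for illustration: 30 object relations, 10 subclass relations.
example_rr = relationship_richness(non_inheritance=30, inheritance=10)
```

Under this assumed definition, a value close to 1 means that most links between classes carry domain meaning beyond a plain class hierarchy.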

Predicting Citations Based on Graph Convolution Embedding and Feature Cross: Case Study of Transportation Research

Zhang Sifan, Niu Zhendong, Lu Hao, Zhu Yifan, Wang Rongrong

Data Analysis and Knowledge Discovery. 2020, 4(9): 56-67. https://doi.org/10.11925/infotech.2096-3467.2020.0531

[Objective] This paper proposes a citation prediction model for scholarly articles, which could help identify potential research hot spots and optimize journal editing. [Methods] First, we used graph convolution to extract literature features, including keywords, authors, institutions, countries, and citations. Then, we used a recurrent neural network and an attention model to examine the time-series information of citations and other features. [Results] We evaluated the proposed model with transportation articles from core journals indexed by the Web of Science. Compared with the benchmark model, the new method's maximum improvements in RMSE and MAE were 15.23% and 16.91%, respectively. [Limitations] At the pre-training stage, our model adopted multiple graph convolutions, which is very time-consuming. [Conclusions] The proposed model, which fully integrates literature features, could effectively predict article citations.
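
The two error measures named in the results can be illustrated with a minimal sketch. The citation counts below are made-up values, not data from the paper, and `relative_improvement` is a hypothetical helper that assumes improvement is reported as the percentage reduction in error relative to the benchmark.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def relative_improvement(baseline_error, model_error):
    """Percentage reduction in error relative to the baseline (assumed convention)."""
    return (baseline_error - model_error) / baseline_error * 100.0


# Illustrative citation counts only.
actual = [10, 4, 25, 7]
baseline_pred = [14, 8, 21, 11]   # every prediction is off by 4
proposed_pred = [12, 5, 22, 9]    # absolute errors of 2, 1, 3, 2

mae_gain = relative_improvement(mae(actual, baseline_pred), mae(actual, proposed_pred))
rmse_gain = relative_improvement(rmse(actual, baseline_pred), rmse(actual, proposed_pred))
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE, which is one reason the two improvements can differ.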

Identifying Subjects of Online Opinion from Public Health Emergencies

Shao Qi, Mu Dongmei, Wang Ping, Jin Chunyan

Data Analysis and Knowledge Discovery. 2020, 4(9): 68-80. https://doi.org/10.11925/infotech.2096-3467.2020.0117

[Objective] This paper proposes a framework for identifying the subjects of online opinion on public health emergencies, aiming to utilize the advantages of semantic recognition. [Methods] First, we constructed RDF triples with dependency parsing and semantic role annotations from the perspectives of grammar, semantics, and pragmatics. Then, we identified the core nodes based on node degrees in the semantic graph and PageRank values. Finally, we conducted an empirical study to discover the subjects of public opinion. [Results] We successfully constructed a semantic graph for public opinion topics and discovered core nodes focusing on events and governments. [Limitations] The depth of semantic recognition needs to be improved. [Conclusions] The proposed model could help identify public opinion topics.
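
The step of ranking nodes in a triple-based semantic graph by degree and by PageRank can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triples are invented, the helper names are hypothetical, PageRank is a plain power iteration with a damping factor of 0.85, and the abstract does not say how the two scores were combined, so both are simply computed side by side.

```python
from collections import defaultdict


def build_graph(triples):
    """Build a directed graph (subject -> set of objects) from (s, p, o) triples."""
    graph = defaultdict(set)
    for subject, _predicate, obj in triples:
        graph[subject].add(obj)
        graph[obj]  # touch the key so that sink nodes are registered too
    return graph


def total_degree(graph):
    """In-degree plus out-degree for every node."""
    degree = {node: len(targets) for node, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree


def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank; rank held by sink nodes is spread uniformly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        sink_mass = sum(rank[v] for v in nodes if not graph[v])
        updated = {v: (1.0 - damping) / n + damping * sink_mass / n for v in nodes}
        for v in nodes:
            for target in graph[v]:
                updated[target] += damping * rank[v] / len(graph[v])
        rank = updated
    return rank


# Invented triples for illustration only.
triples = [
    ("public", "discusses", "government"),
    ("media", "reports", "government"),
    ("media", "reports", "event"),
    ("government", "handles", "event"),
]
graph = build_graph(triples)
degree = total_degree(graph)
rank = pagerank(graph)
top_by_degree = max(degree, key=degree.get)
top_by_pagerank = max(rank, key=rank.get)
```

In this toy graph the two measures pick different nodes, which shows why a study might consult both: degree rewards nodes with many direct links, while PageRank rewards nodes that are pointed to by other well-connected nodes.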

Spatial Distribution and Socio-economic Driving Forces of Residential Changes: Case Study of Zhejiang Province

Zhou Heng, Chen Zhangjian, Li Aiqin, Cheng Xiaoqiang, Wu Huayi

Data Analysis and Knowledge Discovery. 2020, 4(9): 81-90. https://doi.org/10.11925/infotech.2096-3467.2020.0156

[Objective] This paper aims to identify changes in geographic elements in surveying and mapping, as well as their driving mechanisms. [Methods] We collected data on changes in residential areas in Zhejiang Province. Using GIS overlay and correlation analysis, we analyzed the socio-economic driving forces behind these changes. [Results] The changes were concentrated in the northern, central, and southeastern parts of Zhejiang Province. Industrial development was the main positive driving force (correlation coefficient: 0.336). The development of the service or retail sectors and government public investment were negative driving forces (correlation coefficients: -0.054 and -0.100, respectively). [Limitations] The accuracy of the statistical data needs to be further improved to reduce the “false changes” caused by cartographic generalization. [Conclusions] Changes in residential areas varied, as did their economic driving factors.

Identifying Emergency Elements Based on BiGRU-AM Model with Extended Semantic Dimension

Yin Haoran, Cao Jinxuan, Cao Luzhe, Wang Guodong

Data Analysis and Knowledge Discovery. 2020, 4(9): 91-99. https://doi.org/10.11925/infotech.2096-3467.2020.0022

[Objective] This paper proposes a new method for recognizing emergency elements based on a modified BiGRU-AM model, aiming to address the poor ability of recurrent neural networks to distinguish information features of differing importance. [Methods] First, we trained word vectors on the text corpus and concatenated them with semantic features such as syntactic dependency relations. Then, we extracted contextual features with a BiGRU. We also introduced an attention mechanism into the BiGRU network to extract diversified features. Finally, we passed the learned features through a softmax function to generate the required elements. [Results] We examined the modified BiGRU-AM model on the CEC dataset and found that its F-value was 2%-21% higher than those of shallow machine learning algorithms. [Limitations] The proposed model's ability to determine semantic relations, the accuracy of the word segmentation tool, and the hyperparameters need to be improved. [Conclusions] The BiGRU-AM model with an extended semantic dimension could effectively extract emergency elements.

Dynamic City Profile Based on Evolutionary Analysis

Ye Guanghui, Xu Tong

Data Analysis and Knowledge Discovery. 2020, 4(9): 100-110. https://doi.org/10.11925/infotech.2096-3467.2020.0104

[Objective] This paper examines changes in urban characteristics, aiming to describe the evolution of urban profiles. [Methods] We collected tourism data from social media and used network analysis to explore the evolution and driving forces of urban profiles. [Results] The tourism functions of scenic places became more multi-polarized, while the community structure of the tourism network became more stable. The themes of scenic places had a positive impact on the dynamic evolution of the urban profile, while the geographical distance between places of interest had a negative impact. [Limitations] More research is needed to extract further insights from the collected data. [Conclusions] Our study effectively reveals the dynamic evolution of urban profiles, which benefits the urban planning process.

Text Representation Learning Model Based on Attention Mechanism with Task-specific Information

Huang Lu, Zhou Enguo, Li Daifeng

Data Analysis and Knowledge Discovery. 2020, 4(9): 111-122. https://doi.org/10.11925/infotech.2096-3467.2020.0204

[Objective] This study uses the Label Embedding technique to modify the attention mechanism. The model learns task-specific information and generates task-related attention weights, aiming to improve the quality of text representation vectors. [Methods] First, we adopted a multi-level LSTM to extract latent semantic representations of texts. Then, through Label Embedding, we identified the words that attracted the most attention under different labels and generated attention weights. Finally, we calculated a text representation vector carrying task-specific information, which was used for text classification. [Results] Compared with the TextCNN, BiGRU, TLSTM, LSTMAtt, and SelfAtt models, the proposed model improved performance on multiple datasets by 0.60% to 11.95% (with an overall average of 5.27%). It also converged quickly and had low complexity. [Limitations] The experimental datasets and task types need to be expanded. [Conclusions] The proposed model can effectively improve text classification results and has considerable practical value.

Chinese-English Sentence Alignment of Ancient Literature Based on Multi-feature Fusion

Liang Jiwen, Jiang Chuan, Wang Dongbo

Data Analysis and Knowledge Discovery. 2020, 4(9): 123-132. https://doi.org/10.11925/infotech.2096-3467.2019.0268

[Objective] This paper proposes a method for automatically aligning Chinese sentences from Pre-Qin literature with their English translations, aiming to construct a bilingual sentence-level parallel corpus and support cross-language retrieval. [Methods] First, we modified a classification method for parallel sentence pairs to align bilingual sentences from historical literature. Then, based on the characteristics of the bilingual corpus, we extracted features of bilingual sentence pairs. Finally, using “sequence labeling” and “overall classification”, we identified aligned pairs among the candidate sentences. [Results] In the sequence labeling experiment, the LSTM-CRF model yielded the best performance, with an F value of 92.67%. In the overall classification experiment, the SVM had the best results, with an F value of 90.63%. In the experiment combining all four features, the F value was 91.01%. [Limitations] The corpus size needs to be expanded. [Conclusions] The LSTM-CRF model with four features could effectively align ancient Chinese sentences with their English translations.

Automatic Expression of Co-occurrence Clustering Based on Indexing Rules of Medical Subject Headings

Wu Jinming, Hou Yuefang, Cui Lei

Data Analysis and Knowledge Discovery. 2020, 4(9): 133-144. https://doi.org/10.11925/infotech.2096-3467.2020.0192

[Objective] This study proposes an automatic procedure for presenting clustering results, aiming to promote the development of co-word clustering analysis. [Methods] First, we examined the indexing rules for neoplastic diagnosis and chose 10 common neoplasms as sample sets for co-occurrence clustering analysis. Second, we reviewed the results and combined them with the indexing rules to identify the semantic type / subheading combination patterns of high-frequency subject headings. Third, we developed a Python application to automatically interpret the clustering results for four groups of neoplasms. Finally, we invited 12 experts to evaluate the accuracy, comprehensiveness, practicality, comprehensibility, and simplicity of the presentation. [Results] We found 30 indexing patterns for neoplastic diagnosis as well as 98 combined semantic patterns. The scores for accuracy, comprehensiveness, practicality, comprehensibility, and simplicity were 4.282, 4.435, 4.209, 4.457, and 4.206 out of 5, respectively. [Limitations] It was difficult to reveal the “hidden relations” among the subject headings with the proposed method. [Conclusions] Our new method could effectively present the results of co-occurrence clustering analysis for medical records.

25 September 2020, Volume 4 Issue 9
