Table of Contents

25 October 2019, Volume 3 Issue 10
    

  • Hui Zhu,Hao Wang,Chengzhi Zhang
    Data Analysis and Knowledge Discovery. 2019, 3(10): 2-11. https://doi.org/10.11925/infotech.2096-3467.2019.0028

    [Objective] This paper explores large-scale information science literature, aiming to better examine research methods and technologies in this field and organize them from the “process-problem” perspective. [Methods] Firstly, we analyzed the information lifecycles and related research questions. Secondly, we grouped and labeled the literature by research question. Thirdly, we extracted terms of research methods and technologies based on a dictionary and templates. Finally, we organized the terms from the “process-problem” perspective. [Results] The F1 value of the proposed method reached 90.91%. [Limitations] We collected experimental data only from the CNKI database, and the templates for extracting terms need improvement. [Conclusions] The proposed model could extract terms of research methods and technologies simultaneously and effectively.
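
    As a rough illustration of the dictionary- and template-based extraction step described above, the Python sketch below matches a sentence against a small method dictionary and one regex template; the dictionary entries, the template pattern, and the example sentence are illustrative assumptions rather than the authors' actual resources.

        import re

        # Toy dictionary of known method/technology terms (assumed, not the authors' dictionary).
        METHOD_DICT = {"support vector machine", "latent dirichlet allocation", "bilstm-crf"}

        # Toy template: "... using/based on/with <phrase> method/model/algorithm".
        TEMPLATE = re.compile(
            r"(?:using|based on|with)\s+([A-Za-z][\w-]*(?:\s+[\w-]+)*?\s+(?:method|model|algorithm))",
            re.IGNORECASE,
        )

        def extract_method_terms(sentence):
            terms = set()
            lowered = sentence.lower()
            for entry in METHOD_DICT:                    # 1) dictionary lookup
                if entry in lowered:
                    terms.add(entry)
            for match in TEMPLATE.finditer(sentence):    # 2) template (rule) matching
                terms.add(match.group(1).lower())
            return terms

        print(extract_method_terms(
            "We identified research topics using the latent Dirichlet allocation model."))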

  • Chengzhi Zhang,Zheng Li
    Data Analysis and Knowledge Discovery. 2019, 3(10): 12-18. https://doi.org/10.11925/infotech.2096-3467.2019.0055

    [Objective] This paper analyzes the full texts of academic articles, aiming to extract sentences describing research originality and to explore their characteristics. [Methods] We used full-text journal papers in the field of library, information and archives as experimental data. Then, we chose mark words and created extraction rules for sentences of research originality. Finally, we analyzed the distribution of these sentences by mark words, types, and locations. [Results] The extracted sentences fell into six categories, and most of them appeared in the top 24.8% of each article. [Limitations] The proposed sentence extraction method needs to be optimized. [Conclusions] Sentences of research originality in the field of library, information and archives focus on concepts and theories. The categories and distributions of these sentences vary among journals.
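
    A minimal sketch of the kind of mark-word rule described above (the mark words and the position calculation are assumptions for illustration, not the paper's actual rules):

        MARK_WORDS = ["for the first time", "we propose", "novel", "originality"]

        def originality_sentences(sentences):
            """Return (index, relative position, sentence) for sentences containing a mark word."""
            hits = []
            for i, sent in enumerate(sentences):
                if any(mark in sent.lower() for mark in MARK_WORDS):
                    hits.append((i, round(i / len(sentences), 3), sent))
            return hits

        article = [
            "Prior studies ignore full-text evidence.",
            "We propose a novel rule-based extraction framework.",
            "Experiments confirm its effectiveness.",
        ]
        print(originality_sentences(article))    # [(1, 0.333, 'We propose a novel ...')]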

  • Lianjie Xiao,Tao Meng,Wei Wang,Zhixiang Wu
    Data Analysis and Knowledge Discovery. 2019, 3(10): 20-28. https://doi.org/10.11925/infotech.2096-3467.2018.1199

    [Objective] This paper provides directions for a new scholarly system, aiming to identify and summarize intelligence analysis methods for security intelligence. [Methods] Firstly, we retrieved full-text security intelligence literature and tagged it using a character-level method. Then, we constructed a corpus for the extraction of intelligence analysis methods. Finally, we compared the performance of two deep learning models on the experimental data. [Results] For the BiLSTM model, the precision, recall and F1 values were 81.71%, 77.26%, and 79.36% respectively. For the BiLSTM-CRF model, the precision, recall and F1 values were 84.71%, 79.25%, and 81.83%. [Limitations] Pronouns that refer to intelligence analysis methods are not taken into consideration. [Conclusions] We could use deep learning models to extract intelligence analysis methods for security intelligence.
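
    For readers unfamiliar with the models compared above, the PyTorch sketch below shows the BiLSTM tagging component only (the stronger BiLSTM-CRF variant adds a CRF layer on top of the per-token scores); vocabulary size, tag set size, and dimensions are placeholder assumptions.

        import torch
        import torch.nn as nn

        class BiLSTMTagger(nn.Module):
            def __init__(self, vocab_size=5000, tag_size=5, emb_dim=100, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
                self.fc = nn.Linear(hidden * 2, tag_size)   # BIO-style tag scores per character

            def forward(self, token_ids):                   # (batch, seq_len)
                out, _ = self.lstm(self.embed(token_ids))   # (batch, seq_len, 2 * hidden)
                return self.fc(out)                         # per-token tag logits

        model = BiLSTMTagger()
        logits = model(torch.randint(0, 5000, (2, 30)))     # two sentences of 30 characters
        print(logits.shape)                                 # torch.Size([2, 30, 5])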

  • Hao Xu,Xuefang Zhu,Chengzhi Zhang,Chuan Jiang
    Data Analysis and Knowledge Discovery. 2019, 3(10): 29-36. https://doi.org/10.11925/infotech.2096-3467.2019.0069

    [Objective] This paper proposes a new system to extract methodological entities from the full texts of academic literature, aiming to identify their indexing features and usages. [Methods] Firstly, we extracted feature sentences and methodological entities based on dictionaries, rules, and manual annotations. Then, we implemented a methodology knowledge extraction module with Microsoft Visual Studio 2012 and SQL Server 2012. [Results] The precision of extracting methodological features was 76%, while the recall rate was greater than 42%. Each feature sentence contained 1.42 method entities on average. The formal indexing ratio for methodological entities was less than 27%, while the ratio for feature sentences was less than 35%. We also found a low formal indexing rate for subject-specific methodological entities. [Limitations] The system’s recall and precision rates were not very satisfactory. The manual workload for entity extraction was intensive, and semantic features were not included. [Conclusions] The proposed method has inter-disciplinary versatility and helps us explore the dissemination routes of interdisciplinary knowledge.

  • Zihao Zhao,Zhihong Shen
    Data Analysis and Knowledge Discovery. 2019, 3(10): 37-46. https://doi.org/10.11925/infotech.2096-3467.2019.0252

    [Objective] An open and scalable interactive analysis framework is proposed to shield the differences between multivariate graph data models, management systems, interfaces and protocols, and to provide online interactive analysis services for graph data. [Methods] By abstracting the multiple analysis requirements and heterogeneous service interfaces, an open, scalable and interactive protocol is designed. Based on this protocol, an interactive framework is designed to implement the interactive module. [Results] The interactive analysis framework is well abstracted, effectively shields the heterogeneity of graph management systems such as Neo4j and Jena, and provides a good foundation for front-end applications. [Limitations] The framework needs to be optimized and adjusted for large-scale data. [Conclusions] The interactive analysis framework for heterogeneous knowledge graphs has practical significance and deserves promotion.
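
    One common way to realize the kind of shielding described above is an adapter behind a single protocol; the Python sketch below is only a schematic illustration with hypothetical class and method names, not the framework's actual interface.

        from abc import ABC, abstractmethod

        class GraphBackend(ABC):
            @abstractmethod
            def neighbors(self, node_id):
                """Return neighbor node ids, whatever the underlying query language is."""

        class Neo4jBackend(GraphBackend):
            def neighbors(self, node_id):
                cypher = "MATCH (n {id: $id})--(m) RETURN m.id"
                return self._run(cypher, id=node_id)

            def _run(self, query, **params):        # stub; a real Cypher driver call goes here
                return []

        class JenaBackend(GraphBackend):
            def neighbors(self, node_id):
                sparql = "SELECT ?m WHERE { <%s> ?p ?m }" % node_id
                return self._run(sparql)

            def _run(self, query):                  # stub; a real SPARQL HTTP call goes here
                return []

        def interactive_expand(backend, node_id):
            # The interactive module never needs to know which system answers the query.
            return backend.neighbors(node_id)

        print(interactive_expand(Neo4jBackend(), "n42"))    # [] with the stub runner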

  • Kan Liu,Haochen Du
    Data Analysis and Knowledge Discovery. 2019, 3(10): 47-55. https://doi.org/10.11925/infotech.2096-3467.2018.1250

    [Objective] This paper proposes a new model to address the issue of insufficient data in network rumor detection. [Methods] We proposed a deep transfer network based on a Multi-BiLSTM network and the MMD statistic for measuring differences between domain distributions. Then, we trained the model to jointly learn the source-domain loss and the distribution difference among domains. Finally, we realized the effective migration of label information across domains. [Results] Compared with two traditional rumor detection methods, the proposed model’s F1 index increased by 10.3% and 8.5% respectively. [Limitations] The effect of transfer was not obvious with skewed data distributions or multiple domains. [Conclusions] The proposed method could improve rumor detection results. The deep transfer network could achieve positive outcomes across domains, and provides new directions for Internet rumor recognition.
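
    The MMD statistic mentioned above has a standard empirical form; a minimal Gaussian-kernel estimate of squared MMD between source- and target-domain feature batches might look like the sketch below (the bandwidth and the random feature batches are placeholder assumptions).

        import torch

        def gaussian_kernel(a, b, sigma=1.0):
            dist = torch.cdist(a, b) ** 2                   # pairwise squared distances
            return torch.exp(-dist / (2 * sigma ** 2))

        def mmd2(source, target, sigma=1.0):
            k_ss = gaussian_kernel(source, source, sigma).mean()
            k_tt = gaussian_kernel(target, target, sigma).mean()
            k_st = gaussian_kernel(source, target, sigma).mean()
            return k_ss + k_tt - 2 * k_st                   # larger value = larger domain gap

        src = torch.randn(64, 128)                          # e.g. BiLSTM features, source domain
        tgt = torch.randn(64, 128) + 0.5                    # target domain, shifted distribution
        print(mmd2(src, tgt).item())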

  • Junzhi Jia,Zhuangzhuang Ye
    Data Analysis and Knowledge Discovery. 2019, 3(10): 56-65. https://doi.org/10.11925/infotech.2096-3467.2018.1368

    [Objective] This paper proposes a model to classify institutions in Wikidata’s category trees, aiming to better organize these entities. [Methods] We used an unsupervised hierarchical clustering algorithm to automatically cluster institutional instances without proper tags. To eliminate the influence of co-occurring feature words, we introduced the relevant attributes of the organizational entities in Wikidata. Since the clustering algorithm is sensitive to data dimensionality, we used Latent Semantic Indexing to represent the texts and mapped the high-dimensional data to a low-dimensional latent semantic space through singular value decomposition. [Results] The accuracy rate of the proposed clustering method on the experimental dataset reached 87.3%. [Limitations] The sample data sets need to be expanded. [Conclusions] The proposed model could effectively aggregate names of similar institutions and address the clustering issues of high-dimensional texts.
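
    A minimal scikit-learn sketch of the pipeline just described, with toy institution names and placeholder parameter values: TF-IDF vectors are reduced to a low-dimensional latent semantic space via truncated SVD (LSI) and then grouped by agglomerative (hierarchical) clustering.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.cluster import AgglomerativeClustering

        docs = [
            "National Library of Medicine",
            "University of Tokyo Library",
            "Bank of America corporate office",
            "Deutsche Bank headquarters",
        ]

        tfidf = TfidfVectorizer().fit_transform(docs)              # high-dimensional sparse vectors
        lsi = TruncatedSVD(n_components=2).fit_transform(tfidf)    # low-dimensional semantic space
        labels = AgglomerativeClustering(n_clusters=2).fit_predict(lsi)
        print(labels)                                              # e.g. [0 0 1 1]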

  • Huiying Gao,Tian Wei,Jiawei Liu
    Data Analysis and Knowledge Discovery. 2019, 3(10): 66-77. https://doi.org/10.11925/infotech.2096-3467.2019.0043

    [Objective] This study proposes a method for friend recommendation based on user information and social network topology. [Methods] Firstly, we built a feature vector model with user information. To improve the accuracy and interpretability of the clustering results, we modified the distance calculation formula for categorical variables in the K-prototypes algorithm, which helped us pre-cluster potential friends. Secondly, we recommended friends for the target users in each cluster based on the trust relationships of the topological social network, which were measured from the global and interactive perspectives and adjusted with dynamic trust factors. Finally, we calculated the dynamic comprehensive trust with the global trust degree and the dynamic interactive trust of each cluster, and generated a Top-N friend recommendation list for the target user. [Results] Compared with traditional friend recommendation methods, the proposed method has better precision, recall and F1 values. [Limitations] The proposed model only addressed group trust as many-to-one and one-to-one relationships. [Conclusions] The new method based on user clustering and dynamic interactive trust relationships is an effective way to recommend online friends.
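
    For context, the standard K-prototypes distance that the paper modifies combines a numeric part and a weighted categorical mismatch part; the sketch below shows this baseline form with made-up user profile fields and weight (it is not the paper's modified formula).

        def kprototypes_distance(user_a, user_b, numeric_keys, categorical_keys, gamma=0.5):
            numeric = sum((user_a[k] - user_b[k]) ** 2 for k in numeric_keys)
            mismatches = sum(user_a[k] != user_b[k] for k in categorical_keys)
            return numeric + gamma * mismatches

        u1 = {"age": 25, "posts_per_week": 3.0, "city": "Beijing", "interest": "music"}
        u2 = {"age": 30, "posts_per_week": 1.0, "city": "Beijing", "interest": "sports"}
        print(kprototypes_distance(u1, u2, ["age", "posts_per_week"], ["city", "interest"]))
        # (25 + 4) + 0.5 * 1 = 29.5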

  • Qinghong Zhong,Xiaodong Qiao,Yunliang Zhang,Mengjuan Weng
    Data Analysis and Knowledge Discovery. 2019, 3(10): 78-88. https://doi.org/10.11925/infotech.2096-3467.2019.0052

    [Objective] This paper optimizes feature extraction based on the theory of cross-media fusion mechanisms, aiming to reduce the semantic gaps between heterogeneous data. [Methods] With the help of the LDA2Vec and ResNet V2 models, we extracted features from texts and images. Then, we used semantic association matching to map the heterogeneous text/image features into a consistent representation space. [Results] Compared with the LDA and SIFT algorithms, the proposed method increased the MAP value of text/image mutual retrieval to 0.454. [Limitations] The training sets need to be expanded, and optimizing feature extraction has limited impact on cross-media fusion. [Conclusions] The proposed method is effective and provides new directions for cross-media studies.
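
    The MAP figure reported above can be computed as the mean of per-query average precision over ranked retrieval lists; a minimal sketch with two toy queries (binary relevance assumed) follows.

        import numpy as np

        def average_precision(ranked_relevance):
            """ranked_relevance: 1/0 relevance flags of one query's ranked retrieval list."""
            hits, precisions = 0, []
            for rank, rel in enumerate(ranked_relevance, start=1):
                if rel:
                    hits += 1
                    precisions.append(hits / rank)
            return float(np.mean(precisions)) if precisions else 0.0

        def mean_average_precision(all_queries):
            return float(np.mean([average_precision(q) for q in all_queries]))

        print(mean_average_precision([[1, 0, 1, 0], [0, 1, 1, 0]]))    # 0.708...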

  • Yuman Li,Zhibo Chen,Fu Xu
    Data Analysis and Knowledge Discovery. 2019, 3(10): 89-97. https://doi.org/10.11925/infotech.2096-3467.2019.0081

    [Objective] This paper tries to improve the quality of text representation and correlate contents with text label vectors, aiming to improve classification results. [Methods] Firstly, we modified the keyword extraction method (KE). We used keyword vectors to represent the texts, and adopted a category label representation algorithm (CLR) to create the text vectors. Then, we employed the attention-based capsule network (Attention-Capsnet) as the classifier, constructing the KACC (KE-Attention-Capsnet-CLR) model. Finally, we compared our classification results with other methods. [Results] The KACC model effectively improved the data quality, which led to better Precision, Recall and F-Measure than existing models. The classification precision reached 97.4%. [Limitations] The experimental data size needs to be expanded, and more research is needed to examine the category discrimination rules with other corpora. [Conclusions] The KACC model is an effective text classification model.
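
    As a simplified illustration of the attention idea in the classifier above (not the full Attention-Capsnet), the sketch below scores keyword vectors against a learned query and pools them into one text vector; all dimensions and tensors are placeholders.

        import torch
        import torch.nn.functional as F

        keywords = torch.randn(6, 100)            # 6 keyword vectors for one document
        query = torch.randn(100)                  # learned attention query (assumed)

        scores = keywords @ query / 100 ** 0.5    # scaled dot-product attention scores
        weights = F.softmax(scores, dim=0)        # one weight per keyword
        text_vector = weights @ keywords          # attention-weighted document vector
        print(text_vector.shape)                  # torch.Size([100])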

  • Wuxuan Jiang,Huixiang Xiong,Jiaxin Ye,Ning An
    Data Analysis and Knowledge Discovery. 2019, 3(10): 98-109. https://doi.org/10.11925/infotech.2096-3467.2018.1108

    [Objective] This paper proposes a method to generate dynamic labels for the characteristics of online communities and their short-term interests. [Methods] Firstly, we used the BTM model to extract discussion topics from short texts posted by online community members. Then, we explored the members’ actual interests based on personal labels. Finally, we combined these results to create dynamic tags for the communities. [Results] We examined the proposed model empirically with data from two types of “Douban groups”. Tags of discussion topics and characteristics of the communities showed a strong and stable correlation. The tags for personal interests could accurately represent a community’s dynamic interests. [Limitations] More online communities should be included in future studies. [Conclusions] The proposed model accurately identifies the characteristics of online communities and their members’ short-term concerns, which also benefits information acquisition.
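
    BTM models topics over “biterms”, i.e. unordered word pairs co-occurring in a short post; a minimal sketch of that construction step follows (the topic inference itself is omitted, and the example posts are invented).

        from itertools import combinations

        def biterms(posts):
            pairs = []
            for post in posts:
                words = sorted(set(post.lower().split()))
                pairs.extend(combinations(words, 2))    # every unordered word pair in the post
            return pairs

        posts = ["new sci-fi movie tonight", "movie soundtrack recommendations"]
        print(biterms(posts)[:4])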

  • Jun Dai,Shixin Guo,Hui Wang,Yingchi Liao
    Data Analysis and Knowledge Discovery. 2019, 3(10): 110-117. https://doi.org/10.11925/infotech.2096-3467.2018.0830

    [Objective] This study investigates the relationship between the success of open source projects and collaborative development behaviors. [Methods] Firstly, we retrieved Apache project data from GitHub to quantify project success and collaborative development behaviors. Then, we examined the correlations between behavioral characteristics and success with regression analysis. [Results] The impacts, or Exp(B) values, of “proportion of core members”, “frequency of code submission”, and “average number of file modifications” on technical project success were 0.037, 1.427 and 0.327 respectively. For the impacts of the same characteristics on commercial project success, the standardized coefficients were -0.426, 0.221, and 0.195. [Limitations] The distribution of samples and the influencing factors need some revisions. [Conclusions] This paper provides new directions for the management of successful open source software projects.
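
    Exp(B) values like those reported above come from exponentiating logistic regression coefficients; the statsmodels sketch below shows the computation on random placeholder data (not the GitHub sample used in the paper).

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 3))       # core-member ratio, commit frequency, file edits (toy)
        y = rng.integers(0, 2, size=200)    # 1 = technically successful project (toy labels)

        result = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
        print(np.exp(result.params))        # Exp(B): multiplicative change in the odds per unit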

  • Wenxiu Hu,Li Ma,Jianfeng Zhang
    Data Analysis and Knowledge Discovery. 2019, 3(10): 118-126. https://doi.org/10.11925/infotech.2096-3467.2019.0192

    [Objective] This paper proposes a weighted network for stock intraday trading and selects the main network-feature parameters, aiming to identify ultra-short-term market manipulation. [Methods] We constructed the weighted network for stock intraday trading from tick data, using order IDs as nodes. The edges of the network were the deals between orders, and the edge weights were the actual trading volumes. Pajek 5.03 and Ucinet 6 were used to obtain the statistical parameters of the complex networks for the proposed model. [Results] Nine network parameters, such as weighted average degree and network density, can be used as the main parameters to detect stock manipulation. The overall accuracy values of our model on internal and external samples were 93.58% and 87.73%. [Limitations] We only retrieved bull market data from 2015, while bear market data were not collected. [Conclusions] This study helps authorities identify and crack down on stock trading manipulation.
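
    Two of the parameters named above (weighted average degree and network density) can be computed from such a trading network with NetworkX instead of Pajek/Ucinet; the three matched deals below are an invented example.

        import networkx as nx

        G = nx.Graph()
        G.add_edge("buy_001", "sell_014", weight=500)   # 500 shares matched between two orders
        G.add_edge("buy_001", "sell_027", weight=200)
        G.add_edge("buy_033", "sell_014", weight=800)

        weighted_degree = dict(G.degree(weight="weight"))
        print(sum(weighted_degree.values()) / G.number_of_nodes())   # weighted average degree
        print(nx.density(G))                                         # network density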