[Objective] This paper conducts a comprehensive empirical analysis of e-government service capabilities in China, aiming to address the issues facing traditional evaluation methods. [Methods] We used a multi-dimensional evaluation index system to analyze the websites, WeChat profiles, micro-blogs, and mobile apps run by city governments at the prefecture level or above. Then, we ran statistical analyses by geographical region, government channel, and service dimension. [Results] We found that the e-government services of Chinese cities need significant improvements, which affect local residents in various ways. We also identified seven development stages of e-government services. [Limitations] We did not use time-series data, nor did we evaluate e-government services at the county or town levels. [Conclusions] This study offers practical suggestions for improving e-government services in China.
[Objective] This paper studies the impact of brand reputation on the online sales volume of commodities. [Methods] First, we retrieved the sales data of mobile phones from Jingdong Online Mall. Then, we used conjoint analysis to calculate the online reputation of commodities with the help of natural language processing and machine learning technologies. Third, we built a model to explore the impacts of brand competitiveness and country-of-origin on sales. [Results] We found that brand competitiveness was an important factor influencing the sales of commodities. Online reputation strengthened the impact of brand competitiveness, and brand awareness weakened the impact of brand country-of-origin. [Limitations] The paper only analyzed search products and did not cover experience products. [Conclusions] The online reputation calculated by the proposed method enhances the impact of brand competitiveness on sales. This study could help e-commerce platforms improve their online reputation management systems.
[Objective] This paper proposes a product selection method based on the Feature Bidirectional Gated Recurrent Unit model (F-BiGRU), aiming to improve the efficiency of customers’ product selection and help them make better shopping decisions. [Methods] First, we retrieved online reviews for related products. Then, we categorized these online reviews according to product attributes. Third, we trained the F-BiGRU model using positive and negative reviews. Fourth, we quantified the sentiment of reviews on different attributes with the F-BiGRU model. Finally, we obtained the degrees of satisfaction with product attributes and ranked the products using the TOPSIS method. [Results] We retrieved review texts on cars to conduct an empirical analysis. We found that the F-BiGRU method improved the accuracy of sentiment analysis and was more appropriate for short review texts than traditional methods. [Limitations] The proposed deep learning model requires a large dataset, which limits its performance on smaller ones. [Conclusions] The product selection method based on F-BiGRU helps consumers choose needed products more efficiently.
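The final step of the abstract above, sorting products by attribute-level satisfaction with TOPSIS, can be sketched in pure Python. The satisfaction scores and attribute weights below are hypothetical, and all attributes are treated as benefit-type:

```python
import math

def topsis(matrix, weights):
    """Rank alternatives by relative closeness to the ideal solution.
    matrix: rows = products, columns = attribute satisfaction scores
    (all attributes treated as benefit-type for simplicity)."""
    n_cols = len(matrix[0])
    # Vector-normalize each column, then apply the attribute weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    best = [max(col) for col in zip(*v)]   # positive ideal solution
    worst = [min(col) for col in zip(*v)]  # negative ideal solution
    scores = []
    for row in v:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores

# Hypothetical satisfaction scores of three cars on three attributes.
scores = topsis([[0.8, 0.6, 0.9], [0.5, 0.9, 0.4], [0.7, 0.7, 0.7]],
                weights=[0.5, 0.3, 0.2])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

A product scoring high on heavily weighted attributes ends up first in `ranking`; cost-type attributes would only require flipping `best`/`worst` for those columns.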
[Objective] This study analyzes the evolution of news topics, aiming to identify public opinion and media coverage of certain events. [Methods] We proposed a distributed word representation method based on Topic2Vec to improve the semantic distance between topics. Then, we introduced a convolutional neural network model to learn the topic vectors and cluster similar ones. Finally, we obtained the topics’ evolution trends, focus events, and related key sub-topics. [Results] We collected news reports on China from the CNN website between 2015 and 2017 as the dataset to examine the proposed method, which effectively revealed the evolution of topics and sentiments. [Limitations] We did not explore the impact of the time window length. [Conclusions] Compared with previous models, the proposed method improves the accuracy of topic clustering by 10% and helps us explore the topic evolution of news.
[Objective] This research analyzes the social behaviors of Zhihu (https://www.zhihu.com/) users, aiming to recommend relevant contents more effectively. [Methods] First, we proposed a content recommendation method based on the association rule-LDA topic model. Then, we constructed a network of shared sub-topics for specific topics and extracted keywords of the sub-topics with the LDA model. Finally, we pushed contents of the relevant topics to the users. [Results] We found many sub-topics with high degrees of co-occurrence under the topic of logistics, and their confidence levels were above 65%. [Limitations] More comprehensive data is needed in future studies. [Conclusions] The association rule-LDA model provides new directions for content recommendation.
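The association-rule side of the method, computing support and confidence for co-occurring sub-topics, can be sketched as follows. The baskets of sub-topic tags are invented, and `min_confidence` is set to the 65% level mentioned in the results:

```python
from collections import Counter
from itertools import combinations

def association_rules(baskets, min_support=0.4, min_confidence=0.65):
    """Pairwise rules X -> Y with support and confidence.
    baskets: each element is the set of sub-topics tagged on one post."""
    n = len(baskets)
    item_count = Counter(i for b in baskets for i in b)
    pair_count = Counter(frozenset(p) for b in baskets
                         for p in combinations(sorted(b), 2))
    rules = []
    for pair, c in pair_count.items():
        if c / n < min_support:
            continue  # the pair itself co-occurs too rarely
        a, b = tuple(pair)
        for x, y in ((a, b), (b, a)):
            conf = c / item_count[x]  # estimate of P(y | x)
            if conf >= min_confidence:
                rules.append((x, y, c / n, conf))
    return rules

# Invented co-occurrence data for sub-topics under "logistics".
rules = association_rules([{"express", "warehouse"},
                           {"express", "warehouse", "cold-chain"},
                           {"express", "cold-chain"},
                           {"express", "warehouse"}])
```

Each rule is a tuple `(antecedent, consequent, support, confidence)`; rules surviving both thresholds are the candidates for pushing related content.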
[Objective] This study tries to identify the growth law of online news comments, aiming to explore their aging rules and potential values. [Methods] We proposed growth measurement indices for online news comments, including the growth cycle, growth peak, absolute concentration value, peak concentration index, and growth half-life. Then, we used online news and comments from sina.com to conduct an empirical study. [Results] We found that most online news comments had short growth cycles, low growth peaks, and early peak concentration positions. There were four leading growth patterns: negative exponential, flat, uni-modal, and multi-band. The growth of online news comments was affected by the aging of news, the time of news release, and the occurrence of relevant or follow-up events. [Limitations] The sample data came from a single website. [Conclusions] This paper analyzes the growth law of online news comments and identifies four types of growth patterns.
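Several of the indices named above (growth peak, growth cycle, growth half-life) admit a simple operationalization. The sketch below is one plausible reading of those indices, not the paper's exact definitions, applied to a hypothetical hourly comment series:

```python
def growth_metrics(counts):
    """Simple indicators for one news item's hourly comment counts:
    growth peak and its hour, growth cycle (last active hour), and
    growth half-life (hours until half of all comments have arrived).
    Assumes at least one nonzero count; one plausible operationalization."""
    total = sum(counts)
    peak = max(counts)
    peak_hour = counts.index(peak)
    # Growth cycle: index of the last hour that still receives comments.
    cycle = max(i for i, c in enumerate(counts) if c > 0) + 1
    # Half-life: first hour where the cumulative share reaches 50%.
    cum = 0
    for i, c in enumerate(counts):
        cum += c
        if 2 * cum >= total:
            half_life = i + 1
            break
    return {"peak": peak, "peak_hour": peak_hour,
            "cycle": cycle, "half_life": half_life}

# Hypothetical negative-exponential pattern: comments concentrate early.
m = growth_metrics([120, 60, 30, 15, 8, 4, 0, 0])
```

The negative-exponential pattern shows up as an immediate peak and a half-life of one hour; a uni-modal or multi-band series would push both indicators later.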
[Objective] This research aims to improve the effectiveness of clustering policy texts with the help of the LDA topic model. [Methods] First, we pre-processed the policy texts with the LDA model to generate the training data set. Then, we used a weighted algorithm to determine the optimal number of topics and clustered the policy texts. [Results] We found that the G value of the weighted clustering results peaked when the k value was 4. Our results, which were consistent with those of manual classification, also obtained higher purity and F values. Therefore, the proposed method is effective. [Limitations] The result of each step in our pipeline influences the accuracy of the final policy text clustering. [Conclusions] The proposed method could inform the making of new policies, the evaluation of current policies, and the mechanism of two-way interactions.
[Objective] This paper aims to address the semantic differences among documents caused by file types and writing styles. [Methods] First, we chose domain-independent features appearing in both document sets and domain-dependent features appearing in only one set. Then, we used the domain-independent features to construct a bipartite graph and performed spectral clustering on the domain-dependent features. Finally, we correlated the domain-dependent features and generated a common semantic space defined by the clustered features. [Results] We found that the proposed model improved the classification results by 3.0% to 6.9% compared with traditional methods. [Limitations] The proposed model requires a large number of documents from the same field to build the common semantic space. [Conclusions] The common semantic space could help us effectively organize digital resources of different file types.
[Objective] This study integrates the position and distance attributes of words into the TextRank model, aiming to extract keywords from single documents more effectively. [Methods] First, we constructed the word graph of candidates based on the TextRank method. Then, we merged in the position information of the words and calculated their probability transfer matrix. Finally, we obtained the scores of candidate words by iterative calculation and retrieved the top-K keywords with the highest scores. [Results] We found that the weighted TextRank method yielded better results than the traditional algorithms. When K was 3, 5, 7, and 10, the increments of the F value were 1.29%, 3.14%, 5.43%, and 5.88% respectively. [Limitations] This study did not include a knowledge base and did not fully utilize external lexical relationship information. [Conclusions] The position and distribution of words can help us extract keywords more effectively.
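A position- and distance-weighted TextRank can be sketched as below. Here, co-occurrence distance enters through the edge weights and first-occurrence position through the teleport term; this is one plausible realization of such a scheme, not necessarily the paper's exact formulation:

```python
def weighted_textrank(words, window=2, d=0.85, iters=50):
    """TextRank over a word co-occurrence graph. Edge weights decay with
    the distance between co-occurring words; each word's first-occurrence
    position biases the (1 - d) teleport term (assumed weighting scheme)."""
    first = {}
    for i, w in enumerate(words):
        first.setdefault(w, i)
    pos_w = {w: 1.0 / (1 + i) for w, i in first.items()}  # earlier = heavier
    edges = {w: {} for w in first}
    for i, u in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if u == v:
                continue
            wt = 1.0 / (j - i)  # distance attribute: nearer pairs weigh more
            edges[u][v] = edges[u].get(v, 0.0) + wt
            edges[v][u] = edges[v].get(u, 0.0) + wt
    scores = {w: 1.0 for w in first}
    for _ in range(iters):
        # Row-normalizing each word's outgoing weights gives the
        # probability transfer matrix used in the iteration.
        scores = {w: (1 - d) * pos_w[w] + d * sum(
                      scores[u] * edges[u][w] / sum(edges[u].values())
                      for u in edges if w in edges[u])
                  for w in scores}
    return sorted(scores, key=scores.get, reverse=True)

# Toy token sequence; real input would be POS-filtered candidate words.
top = weighted_textrank(["graph", "model", "ranks", "keyword",
                         "graph", "keyword", "graph"])
```

Taking the first K entries of `top` yields the top-K keywords for the document.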
[Objective] This paper tries to automatically generate writing samples for the Chinese Proficiency Test (HSK), aiming to help Chinese teachers and learners prepare for the test. [Methods] First, we used the “HSK Dynamic Corpus” as the basic corpus and trained the LDA model on it. Then, we adopted a cross-entropy strategy to select sentences containing the required keywords. Finally, we manually scored the generated texts against the evaluation criteria. [Results] The generated essays contained all the needed keywords and were relevant to the topics of the writing tasks. [Limitations] Some of the training corpus consisted of modified HSK essays written by non-native Chinese speakers. [Conclusions] The proposed method could effectively generate passages of good quality containing the required keywords.
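One way to read the cross-entropy selection step: sentences containing a required keyword are ranked by their average cross-entropy against a topic's word distribution. The topic probabilities, keywords, and sentences below are toy placeholders, not drawn from the HSK corpus:

```python
import math

def select_sentences(sentences, topic_probs, keywords, k=2):
    """Keep sentences containing a required keyword, ranked by average
    cross-entropy against a topic's word distribution (lower = closer)."""
    floor = 1e-6  # smoothing probability for words the topic never saw

    def cross_entropy(sent):
        words = sent.lower().split()
        return -sum(math.log(topic_probs.get(w, floor)) for w in words) / len(words)

    pool = [s for s in sentences if any(kw in s.lower() for kw in keywords)]
    return sorted(pool, key=cross_entropy)[:k]

# Toy topic distribution and keyword set (illustrative English stand-ins).
picked = select_sentences(
    ["travel by bus in the city", "i love travel", "no match here"],
    topic_probs={"city": 0.3, "travel": 0.2, "bus": 0.1},
    keywords={"travel"})
```

Sentences whose vocabulary sits mostly inside the topic distribution score a low cross-entropy and are picked first, which is the intended filter for topically coherent candidate sentences.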
[Objective] This paper aims to address the issue facing multi-layer social network link classification algorithms, which cannot effectively correlate information among sub-networks to improve classification. [Methods] First, we defined common features reflecting the correlation between sub-networks. Then, we defined individuality features reflecting the characteristics of each sub-network’s own attributes. Third, we proposed an algorithm to classify multi-layer social network links based on transfer component analysis. This algorithm collects characteristics of the correlation between layers, which lets the sub-networks learn from each other. [Results] We compared the proposed model with a benchmark classification algorithm, a feature-selection-based classification algorithm, and a benchmark transfer-based classification algorithm on two real multi-layer datasets from YouTube and QueryLog. Our algorithm significantly improved the AUC and ROC curve metrics, with gains ranging from 1.57% to 33.2%. [Limitations] We did not examine very large-scale network data with the proposed model. The relationship between the layers and the performance of feature definition needs more discussion. [Conclusions] The proposed method effectively applies transfer learning to the classification of multi-layer social network links and offers new directions for future studies.
[Objective] This paper investigates users’ information needs, searching behaviors, and preferences, aiming to identify their expectations accurately. [Methods] First, we took perceived usefulness and perceived ease of use from the technology acceptance model (TAM) as the theoretical framework. Then, we used surveys, server log analysis, and the think-aloud method to study the expectations of information demands, searching behaviors, and acceptance preferences of users in different scenarios. Finally, we conducted expert interviews to construct a user portrait model based on the vector space model (VSM). [Results] The proposed method helped us recommend scenarios for different users effectively with the collaborative filtering algorithm and the Tagul tool. [Limitations] The experimental sample size is small, which might affect the accuracy of recommendation. [Conclusions] The proposed model clusters users’ expectations of information and recommends scenario-based services for mobile library users.
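Matching a VSM user portrait against candidate scenarios reduces to cosine similarity between sparse term-weight vectors. The portrait and scenario vectors below are invented for illustration:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(profile, scenarios):
    """Order candidate scenarios by similarity to the user portrait."""
    return sorted(scenarios, key=lambda s: cosine(profile, scenarios[s]),
                  reverse=True)

# Invented portrait and scenario vectors (term -> weight).
user = {"exam": 0.8, "quiet": 0.5, "ebook": 0.2}
ranked = recommend(user, {"exam-prep": {"exam": 0.9, "quiet": 0.4},
                          "leisure": {"novel": 0.7, "ebook": 0.6}})
```

In a collaborative-filtering setting the same similarity can instead be computed between users, with scenarios then recommended from the most similar users' histories.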