Data Analysis and Knowledge Discovery

Survey on Social Question and Answer

Li Lei, He Daqing, Zhang Chengzhi

Data Analysis and Knowledge Discovery. 2018, 2(7): 1-12. https://doi.org/10.11925/infotech.2096-3467.2018.0074

[Objective] This paper explores the development of social question and answer (Q&A) studies. [Coverage] We used Google Scholar and CNKI to search the literature with the keyword “Social Q&A”. Through topic screening, intensive reading and the retrospective method, we obtained a total of 77 representative publications on social Q&A. [Methods] First, we introduced the development of, and early research on, social Q&A. Then, we surveyed the latest social Q&A studies. [Results] Current research on social Q&A focuses on four aspects: questions, answers, users and platforms. [Limitations] Each research topic needs to be discussed more thoroughly. [Conclusions] Based on current research, we offer suggestions for future social Q&A studies from the perspectives of questions, answers, users, platforms, fields and applications.

Web-based Crowd-funding: Financing Models, Influencing Factors and Behaviour Patterns

Wang Wei, Guo Lihuan, Wang Hongwei, Kevin Zhu, He Ling

Data Analysis and Knowledge Discovery. 2018, 2(7): 13-25. https://doi.org/10.11925/infotech.2096-3467.2018.0121

[Objective] Web-based crowdfunding has become a new channel for fund-raising and has attracted growing attention from governments and investors. However, research on crowdfunding remains limited. This paper reviews the latest studies on crowdfunding and discusses its trends. [Coverage] We retrieved 157 Chinese and English papers from Web of Science and CNKI using the keywords “Crowdfunding”, “Crowdfinancing”, “Crowdinvesting” or “P2P Lending”. [Methods] Using bibliometric and data analysis methods, we introduce the definitions and classifications of crowdfunding. Then, we study the factors influencing campaign success from the following aspects: the crowdfunding platform, project descriptions, the founders’ social relationships, geographical factors, and project quality signals. [Results] The outcomes of crowdfunding campaigns were influenced by many factors, especially non-quality ones. There was a significant difference between investors and fundraisers, which determined the prospect of each campaign. [Limitations] More research is needed to investigate crowdfunding models. [Conclusions] Much remains to be explored in crowdfunding models, for example from the perspectives of psychology, behavioral science and finance.

Assessing Trust-Based Users’ Influence in Social Media

Jing Dong, Zhang Dayong

Data Analysis and Knowledge Discovery. 2018, 2(7): 26-33. https://doi.org/10.11925/infotech.2096-3467.2017.1067

[Objective] This paper studies the impact of trust on social media users’ influence, aiming to identify factors affecting information dissemination and to benefit the development of social media. [Methods] We proposed a comprehensive evaluation index based on the direct and indirect trust, as well as the local and global influence, of each individual social media user. [Results] Simulations based on the SIR model showed that original messages from individuals with the highest comprehensive index values reached the largest number of users. [Limitations] The collected data were not comprehensive, which might bias the results. [Conclusions] The proposed index could effectively measure the trust level of each individual in social media.
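
The SIR-based evaluation described above can be sketched as follows: seed a message at one user, let it spread over the follower graph, and count how many users it reaches; averaging over many runs ranks candidate seeds. This is a minimal illustration under stated assumptions: the toy graph, the per-step infection probability `beta` and the recovery probability `gamma` are hypothetical, and the paper's trust-based comprehensive index itself is not reproduced. In the paper's setting, one would compare the average reach of the users ranked highest by the index against that of other users.

```python
import random

# Hypothetical follower graph (undirected adjacency lists); not data from the paper.
EXAMPLE_GRAPH = {
    "A": ["B", "C", "D", "E"],
    "B": ["A"],
    "C": ["A"],
    "D": ["A"],
    "E": ["A", "F"],
    "F": ["E", "G"],
    "G": ["F"],
}


def sir_reach(graph, seed, beta=0.3, gamma=1.0, rng=None):
    """Run one discrete-time SIR cascade from `seed`; return how many users were ever infected.

    beta:  probability that an infected user passes the message to each susceptible neighbour per step
    gamma: probability (must be > 0) that an infected user stops spreading at the end of a step
    """
    rng = rng or random.Random()
    infected, recovered = {seed}, set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in graph.get(u, ()):
                susceptible = v not in infected and v not in recovered and v not in newly_infected
                if susceptible and rng.random() < beta:
                    newly_infected.add(v)
        # Each currently infected user recovers (stops spreading) with probability gamma.
        still_infected = set()
        for u in infected:
            if rng.random() < gamma:
                recovered.add(u)
            else:
                still_infected.add(u)
        infected = still_infected | newly_infected
    # When the loop ends, every user who was ever infected has recovered.
    return len(recovered)


def average_reach(graph, seed, runs=500, beta=0.3, gamma=1.0, rng_seed=42):
    """Average cascade size over many runs, so that seeds can be ranked by expected reach."""
    rng = random.Random(rng_seed)
    return sum(sir_reach(graph, seed, beta, gamma, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    ranking = sorted(EXAMPLE_GRAPH, key=lambda s: average_reach(EXAMPLE_GRAPH, s), reverse=True)
    print(ranking)
```

On this toy graph the hub `A` reaches more users on average than the peripheral user `G`; with `gamma` set to 0 a cascade would never end, hence the requirement that it be positive.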

Evolution and Regional Differences of E-commerce Policies for Rural Poverty Reduction Based on Topic over Time Model

Yu Chuanming, Guo Yajing, Gong Yutian, Huang Manyu, Peng Hufeng

Data Analysis and Knowledge Discovery. 2018, 2(7): 34-45. https://doi.org/10.11925/infotech.2096-3467.2018.0075

[Objective] This paper reveals the evolution and regional differences of e-commerce policies for rural poverty reduction from 2008 to 2017. [Methods] First, we used the ToT (Topic over Time) model to obtain the time-topic and topic-word probability distributions of e-commerce policies for rural poverty reduction. Second, we analyzed the evolution of policy contents by calculating the average intensity of each topic per year and extracting the top n topic words with the highest probabilities. Third, we grouped the data from each province into the eastern, central and western regions, and analyzed the regional differences of policies according to the probability distributions of topics and words. [Results] E-commerce policies for rural poverty reduction went through three stages: starting, exploring and developing. The eastern, central and western regions focus differently on logistics, platforms and personnel training. [Limitations] The regional differences of e-commerce policies need more fine-grained analysis. [Conclusions] Compared with the traditional word frequency counting method, the ToT model effectively reveals policy evolution and regional differences.
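
The two aggregation steps in the Methods (average topic intensity per year, and the top n words per topic) can be sketched as below, assuming the per-document topic distributions and the topic-word distributions have already been obtained from a fitted ToT model; fitting ToT itself is not shown. Passing region labels instead of years to the same function gives the per-region comparison. The example numbers and words are hypothetical.

```python
def average_topic_intensity(doc_topics, groups):
    """Average each topic's weight over the documents in each group (a year, or a region).

    doc_topics: per-document topic distributions (e.g. theta_d from a fitted ToT model)
    groups:     one label per document, such as its year or "east"/"central"/"west"
    """
    totals, counts = {}, {}
    for theta, g in zip(doc_topics, groups):
        if g not in totals:
            totals[g] = [0.0] * len(theta)
            counts[g] = 0
        for k, p in enumerate(theta):
            totals[g][k] += p
        counts[g] += 1
    return {g: [s / counts[g] for s in totals[g]] for g in sorted(totals)}


def top_n_words(topic_word, vocab, n):
    """Return the n highest-probability words of each topic (each row of topic_word is one topic)."""
    return [
        [vocab[i] for i in sorted(range(len(vocab)), key=lambda i: phi[i], reverse=True)[:n]]
        for phi in topic_word
    ]


if __name__ == "__main__":
    doc_topics = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]  # hypothetical theta_d values
    years = [2015, 2015, 2017]
    print(average_topic_intensity(doc_topics, years))  # 2015 roughly [0.7, 0.3], 2017 [0.1, 0.9]
    vocab = ["logistics", "platform", "training"]
    phi = [[0.5, 0.3, 0.2], [0.1, 0.2, 0.7]]
    print(top_n_words(phi, vocab, 2))  # [['logistics', 'platform'], ['training', 'platform']]
```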

Identifying Crowd Participants with Modified Random Forests Algorithm

Zhou Cheng, Wei Hongqin

Data Analysis and Knowledge Discovery. 2018, 2(7): 46-54. https://doi.org/10.11925/infotech.2096-3467.2017.1193

[Objective] This study addresses classic issues in crowd participant identification tasks. [Methods] We proposed a recursive heuristic attribute reduction method, aiming to establish a new ability-based crowd participant identification system. Then, we built a model to identify crowd participants using the random forests algorithm and the proposed system. [Results] Our new method reduced the data dimension from 18 to 8 and yielded better recognition rates. [Limitations] The proposed model is simple and needs to be expanded. The data of this study were retrieved from crowdsourcing contest websites and might have integrity issues. [Conclusions] The modified machine learning method could effectively identify crowdsourcing participants.
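
The abstract does not spell out the recursive heuristic, so the sketch below swaps in a standard, related technique: recursive feature elimination (scikit-learn's `RFE`) wrapped around a random forest, reducing 18 attributes to 8 to mirror the dimensions reported above. The data are synthetic (`make_classification`) and the forest settings are assumptions; this is not the authors' modified algorithm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for 18 participant attributes; the paper's crowdsourcing data is not used here.
X, y = make_classification(
    n_samples=400, n_features=18, n_informative=6, n_redundant=4, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Recursive feature elimination: fit the forest, drop the attribute with the lowest
# impurity-based importance, and repeat until 8 attributes remain.
selector = RFE(forest, n_features_to_select=8, step=1)
selector.fit(X, y)
kept = [i for i, keep in enumerate(selector.support_) if keep]

# Compare cross-validated accuracy with all 18 attributes against the 8 retained ones.
full_acc = cross_val_score(forest, X, y, cv=5).mean()
reduced_acc = cross_val_score(forest, X[:, selector.support_], y, cv=5).mean()
print("kept attributes:", kept)
print(f"accuracy with 18 attributes: {full_acc:.3f}, with 8: {reduced_acc:.3f}")
```

For brevity the attributes are selected on the full data before cross-validation, which makes the reduced score optimistic; a fair comparison would nest `RFE` and the classifier inside one `Pipeline` evaluated by `cross_val_score`.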

Classifying Topics of Internet Public Opinion from College Students: Case Study of Sina Weibo

Jia Longjia, Zhang Bangzuo

Data Analysis and Knowledge Discovery. 2018, 2(7): 55-62. https://doi.org/10.11925/infotech.2096-3467.2018.0003

[Objective] This paper introduces a term weighting method to classify topics of Sina Weibo posts by college students, aiming to solve the issues of high dimensionality and sparsity. [Methods] First, we calculated the probability of each term falling into specific categories and then predicted the probability of each document’s category. Then, we converted the word-based features into a class-based matrix, which was classified by a support vector machine. [Results] Compared with the traditional tf, tf×idf and tf×rf methods, our method increased the MicroF₁/MacroF₁ values by 7.2%/7.8%, 7.5%/7.9% and 6.4%/5.7%, respectively. [Limitations] More research is needed to explore topic classification methods other than term weighting. [Conclusions] The proposed method could effectively reduce the dimension of the feature matrix and improve classification efficiency for Internet public opinion studies.
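
One possible reading of the Methods, sketched under assumptions: estimate P(c | t) for each term from document frequencies (add-alpha smoothing is our choice), then represent each post by the tf-weighted average of P(c | t) over its terms. This turns a vocabulary-sized word vector into a vector with one entry per topic class, which is the dimension reduction the abstract describes. The exact weighting formula is not given in the abstract, and the toy posts and labels are hypothetical; the resulting vectors could be fed to any SVM implementation.

```python
from collections import Counter, defaultdict


def term_class_probs(docs, labels, classes, alpha=1.0):
    """Estimate P(c | t) for every term t from document frequencies, with add-alpha smoothing."""
    df = defaultdict(Counter)  # term -> Counter{class: number of docs of that class containing the term}
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            df[t][c] += 1
    probs = {}
    for t, by_class in df.items():
        total = sum(by_class.values()) + alpha * len(classes)
        probs[t] = {c: (by_class[c] + alpha) / total for c in classes}
    return probs


def class_features(tokens, probs, classes):
    """Map a post to a K-dimensional vector: the tf-weighted mean of P(c | t) over its known terms."""
    tf = Counter(t for t in tokens if t in probs)
    n = sum(tf.values())
    if n == 0:
        return [0.0] * len(classes)  # no known terms: an all-zero class vector
    return [sum(cnt * probs[t][c] for t, cnt in tf.items()) / n for c in classes]


if __name__ == "__main__":
    # Toy, hypothetical tokenized posts and topic labels.
    docs = [["exam", "library"], ["exam", "grade"], ["job", "salary"], ["job", "interview"]]
    labels = ["study", "study", "career", "career"]
    classes = ["study", "career"]
    probs = term_class_probs(docs, labels, classes)
    print(class_features(["exam"], probs, classes))  # [0.75, 0.25]
```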

Research on Collaborative Filtering Traveling Products Recommendation Algorithm Based on IUNCF

Zhao Ya’nan, Wang Yuqing

Data Analysis and Knowledge Discovery. 2018, 2(7): 63-71. https://doi.org/10.11925/infotech.2096-3467.2018.0179

[Objective] This paper uses collaborative recommendation technology to address challenges facing the smart tourism industry, such as data sparsity and cold start. [Methods] First, we clustered users with the K-means algorithm, and then dynamically filtered and classified them using a combination of collaborative recommendation techniques. Then, we assigned weights to the recommendation types and proposed a new algorithm based on Improved Uncertain Neighbors Collaborative Filtering (IUNCF). Finally, we evaluated the proposed algorithm on real-world tourism data under different similarity thresholds and numbers of recommendations. [Results] The MAE and F-measure were 0.243 and 0.764, respectively, showing the accuracy and reliability of IUNCF. [Limitations] The IUNCF algorithm needs further optimization to deal with low-frequency consumption, and its applications could be extended. [Conclusions] The proposed IUNCF algorithm could precisely recommend smart tourism products to consumers.
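
A simplified version of the cluster-then-filter idea: neighbours are restricted to the user's K-means cluster and to a similarity threshold, and predictions are scored with MAE as in the Results. The cluster labels are assumed to come from an earlier K-means step, the ratings are invented, and the "uncertain neighbours" weighting that defines IUNCF is not reproduced, so read this as a plain threshold-based user collaborative filtering baseline.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts, computed over their full vectors."""
    common = set(u) & set(v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if not common or norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(u[i] * v[i] for i in common) / (norm_u * norm_v)


def predict(ratings, clusters, user, item, threshold=0.5):
    """Predict `user`'s rating of `item` from same-cluster neighbours with similarity >= threshold.

    Returns None when no neighbour qualifies (a cold-start case the caller must handle).
    """
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r or clusters[other] != clusters[user]:
            continue
        s = cosine(ratings[user], r)
        if s >= threshold:
            num += s * r[item]
            den += s
    return num / den if den > 0 else None


def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs, skipping pairs with no prediction."""
    errors = [abs(p - a) for p, a in pairs if p is not None]
    return sum(errors) / len(errors) if errors else float("nan")


if __name__ == "__main__":
    # Hypothetical ratings (1-5) of tourism products and K-means cluster labels.
    ratings = {
        "u1": {"hotel": 5, "tour": 4},
        "u2": {"hotel": 4, "tour": 4, "museum": 5},
        "u3": {"hotel": 5, "museum": 3},
        "u4": {"hotel": 1, "museum": 1},
    }
    clusters = {"u1": 0, "u2": 0, "u3": 0, "u4": 1}
    print(predict(ratings, clusters, "u1", "museum"))
```

Raising `threshold` trades coverage for precision: fewer neighbours qualify, so more predictions come back as `None`, which is one reason the paper evaluates several similarity thresholds.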

A Personalized Recommendation Algorithm with Temporal Dynamics and Sequential Patterns

Li Jie, Yang Fang, Xu Chenxi

Data Analysis and Knowledge Discovery. 2018, 2(7): 72-80. https://doi.org/10.11925/infotech.2096-3467.2017.0857

[Objective] This study aims to improve the effectiveness of merchandise recommendation based on the temporal dynamics and sequential patterns of sales. [Methods] We developed an improved personalized recommendation algorithm for e-commerce. First, we introduced a new similarity function with time and hotness coefficients. Then, we proposed an algorithm based on two-item sequential patterns, which modifies the recommendation list according to these patterns. [Results] We evaluated the new method with Amazon.com book review data from 2004 to 2005, and found that its precision and F values were 1.89% and 0.73% higher, respectively, than those of the collaborative filtering algorithm with adjusted cosine similarity. [Limitations] The proposed model did not examine variations in consumers’ review scores. [Conclusions] Both the similarity function and the sequential patterns can improve the effectiveness of personalized recommendation algorithms for e-commerce.
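
The two ingredients named above can be sketched separately: the adjusted cosine similarity used by the baseline, here with a hypothetical exponential time coefficient, and a counter for frequent two-item sequential patterns. The decay form `exp(-lam * |t_i - t_j|)`, the minimum support, and the example data are assumptions; the paper's actual time and hotness coefficients are not given in the abstract, and the hotness coefficient is not modelled here.

```python
import math
from collections import Counter


def adjusted_cosine_time(ratings, i, j, lam=0.0):
    """Adjusted cosine similarity between items i and j with a hypothetical time coefficient.

    ratings: {user: {item: (score, timestamp)}}
    Each co-rating user's terms are weighted by exp(-lam * |t_i - t_j|), so ratings given far
    apart in time count less; lam=0 gives the ordinary adjusted cosine.
    """
    num = den_i = den_j = 0.0
    for items in ratings.values():
        if i not in items or j not in items:
            continue
        mean = sum(score for score, _ in items.values()) / len(items)  # this user's mean rating
        (si, ti), (sj, tj) = items[i], items[j]
        w = math.exp(-lam * abs(ti - tj))
        di, dj = si - mean, sj - mean
        num += w * di * dj
        den_i += w * di * di
        den_j += w * dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / math.sqrt(den_i * den_j)


def frequent_two_item_sequences(histories, min_support=2):
    """Count ordered pairs (a, b) with a bought before b, once per user, and keep the frequent ones.

    histories: {user: [items in purchase order]}
    """
    counts = Counter()
    for seq in histories.values():
        pairs = set()
        for x in range(len(seq)):
            for y in range(x + 1, len(seq)):
                if seq[x] != seq[y]:
                    pairs.add((seq[x], seq[y]))
        counts.update(pairs)
    return {pair: c for pair, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    histories = {"u1": ["A", "B", "C"], "u2": ["A", "C"], "u3": ["B", "A"]}
    print(frequent_two_item_sequences(histories))  # {('A', 'C'): 2}
```

A frequent pattern such as (A, C) could then be used to promote C in the list of a user who has just bought A, which is the kind of list modification the Methods describe.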

A Fuzzy C-Means Algorithm Based on Huffman Tree

Xiao Mansheng, Zhou Lijuan, Wen Zhicheng

Data Analysis and Knowledge Discovery. 2018, 2(7): 81-88. https://doi.org/10.11925/infotech.2096-3467.2017.1333

[Objective] This paper tries to solve issues facing the traditional FCM algorithm, such as random selection of initial cluster centers, sensitivity to noise, and the ability to cluster only evenly distributed samples. [Methods] We proposed a new FCM clustering algorithm based on a Huffman tree built from the dissimilarity matrix of high-density sample sets. The new algorithm obtains the initial cluster centers and then generates the membership function of the samples under a non-normalized constraint. [Results] We examined the proposed algorithm with synthetic samples, images and UCI datasets. Its clustering accuracy and computation time were better than those of the Gaussian kernel-based algorithm and the traditional FCM. [Limitations] The sample density adjustment factor $\beta$ was determined by experiments or experience without theoretical support. [Conclusions] The proposed algorithm could cluster data sets with high noise levels and uneven distributions.
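
For reference, the standard FCM iteration that the proposed algorithm modifies alternates two updates with fuzzifier $m > 1$: memberships $u_{ik} = 1 / \sum_{j} (d_{ik}/d_{ij})^{2/(m-1)}$, where $d_{ik}$ is the distance from sample $i$ to center $k$, and centers $c_k = \sum_i u_{ik}^m x_i / \sum_i u_{ik}^m$. The NumPy sketch below implements only that baseline; the Huffman-tree initialization and the non-normalized membership are not reproduced. Instead the caller passes the initial centers, which is where a density-based initialization such as the paper's would replace random choice. The toy data are hypothetical.

```python
import numpy as np


def fcm(X, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means starting from caller-supplied centers.

    Returns (centers, memberships) where memberships[i, k] is sample i's degree in cluster k.
    """
    X = np.asarray(X, dtype=float)
    C = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Distances, shape (n_samples, n_clusters); the epsilon avoids division by zero.
        D = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2) + 1e-12
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Center update: membership^m weighted mean of the samples.
        Um = U ** m
        new_C = (Um.T @ X) / Um.sum(axis=0)[:, None]
        converged = np.linalg.norm(new_C - C) < tol
        C = new_C
        if converged:
            break
    return C, U


if __name__ == "__main__":
    X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    centers, memberships = fcm(X, init_centers=[[0, 0], [10, 10]])
    print(np.round(centers, 3))
```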

Extracting Names of Historical Events Based on Chinese Character Tags

Tang Huihui, Wang Hao, Zhang Zixuan, Wang Xueying

Data Analysis and Knowledge Discovery. 2018, 2(7): 89-100. https://doi.org/10.11925/infotech.2096-3467.2018.0057

[Objective] This paper proposes a model to extract the names of Chinese historical events, aiming to reorganize knowledge from texts and construct an ontology for these events. [Methods] We built the proposed model with conditional random fields (CRFs) and automatic tagging technology, based on historical texts of the Wei, Jin, Northern and Southern Dynasties. Then, we explored the influence of different Chinese character tags and features on recognizing event names. [Results] The best model was built on character and surname features, and its F1 value was as high as 98.74%. The model was also tested in two open scenarios and achieved good results. [Limitations] The training corpus needs to be expanded. More research is needed to compare the results based on single Chinese character tags with those based on phrases. [Conclusions] The CRFs model could effectively identify the names of Chinese historical events under appropriate working conditions.
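
The character-level setup above can be sketched as the data-preparation step for a CRF: each character gets a tag and a feature dict that includes the character itself, its neighbours, and a surname flag. The BMES tag scheme, the feature names, the tiny surname list, and the example sentence are assumptions for illustration; the abstract does not state the paper's tag set. Sequences of feature dicts like these are the input shape that CRF toolkits such as sklearn-crfsuite accept, but no CRF is trained here.

```python
# Hypothetical, deliberately tiny surname lexicon; a real system would use a full list.
SURNAMES = {"王", "谢", "桓", "刘", "曹"}


def bmes_tags(text, spans, label="EVT"):
    """Character-level B/M/E/S tags for event-name spans given as (start, end), end exclusive."""
    tags = ["O"] * len(text)
    for start, end in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"  # single-character name
            continue
        tags[start] = f"B-{label}"
        for k in range(start + 1, end - 1):
            tags[k] = f"M-{label}"
        tags[end - 1] = f"E-{label}"
    return tags


def char_features(text, i, surnames=SURNAMES):
    """Features for the i-th character: the character, its neighbours, and a surname flag."""
    ch = text[i]
    return {
        "char": ch,
        "prev_char": text[i - 1] if i > 0 else "<BOS>",
        "next_char": text[i + 1] if i + 1 < len(text) else "<EOS>",
        "is_surname": ch in surnames,
    }


def sentence_to_xy(text, spans):
    """One training sequence: a feature dict per character plus the matching tag sequence."""
    return [char_features(text, i) for i in range(len(text))], bmes_tags(text, spans)


if __name__ == "__main__":
    # Made-up example sentence: "Xie Xuan commanded the Battle of Fei River".
    sentence = "谢玄指挥淝水之战"
    X, y = sentence_to_xy(sentence, [(4, 8)])
    print(list(zip(sentence, y)))
```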

Multidimensional Information Acceptance Contexts of Mobile Library

Bi Datian, Wang Fu

Data Analysis and Knowledge Discovery. 2018, 2(7): 101-111. https://doi.org/10.11925/infotech.2096-3467.2017.1160

[Objective] This paper aims to help library mobile apps provide content based on users’ actual locations and expectations, a key issue in mobile service innovation. [Methods] First, we proposed the concept of information acceptance entropy based on the theories of information entropy and context entropy. Then, we constructed a generalized component distribution probability model for information acceptance entropy with the help of entropy energy distribution theory. Finally, we examined our model with academic libraries in China’s Liaoning, Jilin and Henan provinces. [Results] We wrote the new algorithms in MATLAB and used a Likert scale to evaluate users’ perception and experience. The new model successfully calculated and simulated the information acceptance entropy of different scenes. We found that timely scene switching and more relevant content improve user experience. [Limitations] The sample size needs to be expanded to improve the accuracy of simulation. [Conclusions] The proposed model could compare and predict the multidimensional information acceptance entropy of different locations.
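
The abstract builds "information acceptance entropy" on Shannon's information entropy but does not give its formula. As a hedged illustration of only that underlying quantity, $H = -\sum_k p_k \log_2 p_k$, the sketch below computes it from non-negative weights such as response counts. The per-scene Likert counts are hypothetical, and this is not the authors' generalized component distribution model.

```python
import math


def shannon_entropy(weights, base=2.0):
    """Shannon entropy H = -sum(p * log p) of a distribution given as non-negative weights.

    Weights are normalised first, so raw counts can be passed; zero weights contribute nothing.
    """
    total = sum(weights)
    return -sum((w / total) * math.log(w / total, base) for w in weights if w > 0)


if __name__ == "__main__":
    # Hypothetical Likert-scale (1-5) response counts for two library scenes.
    scenes = {"reading room": [0, 2, 5, 20, 13], "stacks": [6, 8, 9, 9, 8]}
    for name, counts in scenes.items():
        print(name, round(shannon_entropy(counts), 3))
```

Lower entropy means the responses concentrate on a few scale points (users agree), while values near $\log_2 5 \approx 2.32$ bits mean the responses are spread almost evenly.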

25 July 2018, Volume 2 Issue 7
