[Objective] This paper explores the spatio-temporal statistical characteristics of users’ visits to a Web Map Tile Service (WMTS). [Methods] First, we identified the WMTS sessions and extracted their targets with an efficient algorithm. Then, we studied the temporal features of user access sessions through daily session numbers, the requests and duration of each session, and the access speed per tile. For spatial characteristics, we described the relationship between users’ locations and their access targets, such as provinces, cities, and distances. [Results] The users’ WMTS sessions followed a power-law distribution, and most of them were brief and efficient with clear objectives. Users from provinces with better information infrastructure tended to have more centralized and deeper WMTS sessions. Most of the WMTS sessions searched for targets within the same province or city, while 30% of the targets were within 43 km of the users’ city centers. [Limitations] The data was collected from users who access the WMTS frequently, and needs to be expanded. [Conclusions] Describing users’ access characteristics at the session granularity helps us understand their geographical information needs.
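The within-43 km finding above presupposes a distance measure between a user’s city center and the accessed tile. A minimal sketch of such a computation; the paper does not specify its distance formula, so the great-circle haversine below and the Beijing coordinates in the example are illustrative assumptions:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS-84 points."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical example: from central Beijing to a nearby tile target
d = haversine_km(39.9042, 116.4074, 40.0, 116.3)
```

A session’s targets could then be flagged as “local” whenever this distance falls under the 43 km radius reported above.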
[Objective] This paper proposes a new method to rank the quality of answers from a popular Q&A community in China. [Methods] First, based on the information acceptance model, we established initial quality indicators for the answers’ perceived values. Then, we discretized these indicators with the K-Medoids clustering algorithm. Third, we reduced and weighted the indicators with the help of rough set theory. Finally, we generated the final rankings with weighted grey correlation analysis. [Results] We evaluated the proposed method with 2,297 answers to six different types of questions from the Q&A website “Zhihu”. We found that the answers ranked higher generally combined text with images. These answers were also more informative than others and came from active members of the Q&A community. [Limitations] The size of our dataset needs to be expanded, and the evaluation method of the proposed model could be optimized. [Conclusions] The proposed method is an effective way to rank the quality of answers from Q&A communities.
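The final ranking step names weighted grey correlation (grey relational) analysis. A minimal pure-Python sketch of that step, assuming benefit-type indicators normalized to [0, 1] and an ideal reference answer; the indicator values, weights, and the distinguishing coefficient rho = 0.5 are conventional illustrative assumptions, not values from the paper:

```python
def grey_relational_rank(matrix, weights, rho=0.5):
    """Rank alternatives (rows) by weighted grey relational grade.

    matrix  : rows = answers, columns = benefit-type quality indicators
    weights : per-indicator weights, summing to 1
    """
    # Normalise each column to [0, 1] (larger is better)
    cols = list(zip(*matrix))
    norm = [[(v - min(c)) / (max(c) - min(c)) if max(c) > min(c) else 1.0
             for v, c in zip(row, cols)] for row in matrix]
    # Reference sequence: the ideal answer scoring 1.0 on every indicator
    deltas = [[abs(1.0 - v) for v in row] for row in norm]
    dmax = max(max(r) for r in deltas)
    dmin = min(min(r) for r in deltas)
    grades = []
    for row in deltas:
        xi = [(dmin + rho * dmax) / (d + rho * dmax) for d in row]
        grades.append(sum(w * x for w, x in zip(weights, xi)))
    # Indices of answers, best first
    return sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)

order = grey_relational_rank([[1, 1], [3, 5], [2, 3]], [0.5, 0.5])
```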
[Objective] This paper explores a semantic representation method for design process knowledge (DPK), aiming to effectively reuse dynamic DPK. [Methods] First, we introduced the idea of modular design after reviewing existing research. Then, we analyzed the contents and characteristics of DPK, and semantically modeled DPK based on double-layer modular packaging technology. Finally, we represented the semantic model with an ontology representation method. [Results] We took the conceptual design of a recoil system as an example and created a semantic representation of its DPK with OWL. [Limitations] We only examined the proposed method with a single case. [Conclusions] The proposed method could semantically represent and reuse knowledge of dynamic design processes.
[Objective] This paper visualizes the text mining process through a multi-view collaborative technique, aiming to identify patterns and insights more effectively. [Methods] Based on the textual word vector matrix, we processed texts from multiple policy subjects with data cleaning, TF-IDF calculation, the vector space model, singular value decomposition and other methods. [Results] We examined the effectiveness of the proposed model with governmental information from Zhongguancun, Beijing, from January 2016 to August 2017. [Limitations] The framework cannot visualize the individual data points of large-scale texts. [Conclusions] Multi-view collaborative visualization is an effective way to interpret texts.
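The TF-IDF step in the pipeline above can be sketched in a few lines of plain Python; the smoothed IDF variant below is one common convention and an assumption, since the abstract does not give the exact formula used:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for tokenised documents (list of token lists)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    # Smoothed IDF, as in common text-mining toolkits (an assumption here)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return out

weights = tfidf([["policy", "park"], ["policy", "fund"]])
```

The resulting weight vectors would then feed the vector space model and the singular value decomposition mentioned above.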
[Objective] This paper tries to categorize consumers by the elasticity of their response to electronic discount vouchers, using a heterogeneous mixture model. [Methods] We built the proposed model upon a heterogeneous mixture model and the shopping data of 22,234 members of a large online retail portal in China. We obtained the model parameters by maximum likelihood estimation. [Results] Compared with the control group, the treatment group’s usage of electronic vouchers increased by 18.6%, their average spending increased by 43 yuan, and the overall contribution margin rose by 359 thousand yuan. [Limitations] We only adopted one explanatory variable (membership level) in the probability function. [Conclusions] The proposed model could help companies optimize the effects of their promotions and increase voucher usage, sales and gross profits.
[Objective] This paper tries to identify records semantically similar to the novelty points from preliminary search results, aiming to retrieve the needed journal articles or patents automatically. [Methods] First, we designed a deep multi-task hierarchical classification model based on Bi-GRU-ATT. Then, we trained several hierarchical classification models using International Patent Classification (IPC) categories and patents. Third, we used a small set of paper data to fine-tune the model for papers and patents. Finally, we compared the semantic categories of novelty points and candidate records to collect the matching ones. [Results] With the two-level classification of patents under IPC (E21B), the new model’s precisions were 82.37% and 73.55% respectively, which were better than those of the benchmark models. For real novelty search point data, the precision of semantic matching was 88.13%, which was 15.16% higher than that of TF-IDF. [Limitations] We only examined our model with a small number of IPC categories. [Conclusions] The proposed method improves the semantic matching of novelty search points.
[Objective] This paper identifies the basic vocabularies of a specific domain from academic papers, aiming to grasp the domain’s knowledge structure and development context. [Methods] We combined the citation network and co-word analysis to construct a citation co-word network. Then, we used the PageRank algorithm to evaluate the importance of the candidate words. We examined the proposed method with 110,360 articles in computer science. [Results] We compared the new method with the word frequency method and co-word analysis both qualitatively and quantitatively. We found that the proposed method performed well, and the average precision in a blind selection experiment reached 72.6%. [Limitations] The proposed method was only examined with computer science articles. [Conclusions] The new strategies could improve the performance of basic vocabulary discovery in a specific domain.
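The candidate-word scoring step uses PageRank over the citation co-word network. A minimal iterative sketch; the damping factor 0.85, the iteration count, and the dangling-node handling are conventional assumptions, not details from the paper:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Iterative PageRank over an adjacency dict {node: [out-neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node: spread its rank uniformly
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Toy co-word network with three candidate words
scores = pagerank({"kernel": ["cache"], "cache": ["kernel", "thread"], "thread": ["kernel"]})
```

Words with the highest scores would be kept as the domain’s basic vocabulary candidates.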
[Objective] This paper tries to identify legal terminologies automatically from large-scale legal texts, aiming to structure legal big data. [Methods] We used the Conditional Random Field model as the classifier of the Active Learning algorithm to identify legal terms. After clustering the corpus with K-means, we extracted, by stratified sampling, the initial sample list used to initiate the Active Learning algorithm. Entropy was used as the basis of sample selection for Active Learning. The learning and sample selection processes of Active Learning were carried out iteratively until the model’s F value (the harmonic mean of precision and recall) stabilized. Finally, the legal domain entity recognition model (AL-CRF) was generated. [Results] We ran the proposed model on Chinese judgment documents and found that the precision and recall of the AL-CRF model both reached more than 90%, and its F value was 4.85% higher than that of a CRF model trained with an equal labeling workload. [Limitations] The K-means clustering method is sensitive to noise and outliers, which may affect the performance of the model. [Conclusions] Conditional Random Fields combined with Active Learning could reduce the labeling workload spent on low-quality samples and ensure the recognition quality.
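The entropy-based sample selection at the heart of the Active Learning loop can be sketched as follows; `predict_proba` stands in for the CRF’s per-sample label distribution and is a hypothetical interface, not the paper’s actual code:

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_uncertain(unlabeled, predict_proba, k):
    """Pick the k samples whose predicted label distribution is most uncertain.

    These are the samples worth sending to human annotators next.
    """
    scored = sorted(unlabeled, key=lambda x: entropy(predict_proba(x)), reverse=True)
    return scored[:k]
```

Each Active Learning round would label the selected samples, retrain the CRF, and repeat until the F value stabilizes.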
[Objective] This paper tries to discover users’ important locations, aiming to provide good data support for user behavior studies. [Methods] We presented a model for predicting important locations based on user representation. First, we proposed a vectorized representation method to predict user behaviors based on Word2Vec. Then, we constructed a user relationship network based on the similarity of user vectors to extract core users. Finally, we predicted the important locations from the behaviors of the core users. [Results] The precision of important location classification was 7% higher than that of the existing methods. Moreover, the residential and commercial areas were shown on the labeled map. [Limitations] Our method can only identify residential and business areas. [Conclusions] The proposed method could effectively find important locations and provide more support for studying user behaviors.
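The core-user extraction step links users whose Word2Vec vectors are similar and treats the best-connected user as a core. A minimal sketch, assuming cosine similarity and a fixed threshold (both illustrative assumptions; the paper’s exact criterion may differ):

```python
import math

def cosine(u, v):
    """Cosine similarity between two user vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def core_user(vectors, threshold=0.8):
    """Link users with similarity >= threshold; return the best-connected one."""
    ids = list(vectors)
    degree = {i: 0 for i in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            if cosine(vectors[ids[a]], vectors[ids[b]]) >= threshold:
                degree[ids[a]] += 1
                degree[ids[b]] += 1
    return max(degree, key=degree.get)
```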
[Objective] This paper tries to extract topics from book reviews with the help of natural language semantics. [Methods] We proposed a method to retrieve the explicit and implicit topic keywords using global semantic information from a common sense knowledge base. [Results] The sentence coverage rate and the lexical diversity of the proposed knowledge-base method were 30.8% and 0.36% higher, respectively, than those of the Double-Propagation algorithm. Then, based on the extracted topic words, we created a cluster map and identified the topic keywords by the nodes’ cluster centrality. [Limitations] There is no domain knowledge base in the field of book reviews. [Conclusions] The proposed knowledge-base method improves the sentence coverage and lexical diversity of topics extracted from book reviews.
[Objective] This paper tries to automatically disambiguate author names in institutional repositories, and to provide a human intervention mechanism at the right time. [Methods] First, we analyzed the unique features of the author name disambiguation task. Then, we constructed a general disambiguation framework for the institutional repository. [Results] Our framework achieved good results in practice, with more than 99% precision. [Limitations] We did not examine author names without affiliation addresses, and there may be exceptions in the aliases of authors and institutions. [Conclusions] This framework could effectively disambiguate author names in institutional repositories, which helps us provide more value-added services.
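One simple building block of such a framework is grouping records by normalized name plus affiliation, which is why names lacking affiliation addresses are a stated limitation. The toy rule below is an illustrative assumption, not the paper’s actual, more elaborate framework:

```python
def disambiguate(records):
    """Group repository records that share a normalised (name, affiliation) key.

    Two records are attributed to the same author only if both the name and
    the affiliation string match after whitespace/case normalisation.
    """
    def norm(s):
        return " ".join(s.lower().split())

    clusters = {}
    for rec in records:
        key = (norm(rec["name"]), norm(rec["affiliation"]))
        clusters.setdefault(key, []).append(rec["id"])
    return list(clusters.values())
```

Ambiguous clusters (e.g., common names with differing aliases) would be the cases escalated to human intervention.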
[Objective] This paper proposes an algorithm to accurately assign specialists for outpatients based on their chief complaints and medical histories. [Methods] We applied a convolutional neural network model to classify the medical short texts and learn the correlations between medical terms, as pre-training tasks. Then, we examined the structure, parameters and weights of the pre-trained model with actual texts of chief complaints and medical histories. Finally, we modified the network to obtain the final learning outcome. [Results] The F-score of the proposed approach reached 88% on the sample dataset, which was 6% higher than that of the best baseline model. The pre-trained model significantly improved the training efficiency. [Limitations] We did not work directly with patients’ actual complaints at the triage desk; we only used their electronic medical records, which might yield inaccurate results. [Conclusions] The proposed triage model improves the efficiency of medical triage and promotes precision medical treatment for patients.
[Objective] This study constructs a knowledge graph of academic relationships among scholars of China’s Song Dynasty, aiming to provide new techniques for knowledge exploration in humanities research. [Context] Our study addresses the usability and visualization issues facing digital collections (i.e., the China Biographical Database Project), and establishes a knowledge portal for history researchers and amateurs. [Methods] First, we built an ontology of ancient Chinese scholars. Then, we transformed their relationships into RDF data for the knowledge graph. Finally, we created an online platform to demonstrate the visualization results. [Results] We created the knowledge graph with five classes and 39 relationships based on 48,018 person records and 6,599 geographic records. The Song’s Academic Inheritance Platform integrates the RelFinder visualization tool to display the entities’ relationships in the knowledge graph. [Conclusions] This study offers practical solutions for semantic research on the China Biographical Database Project and related fields in history.
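Transforming a scholar relationship into RDF can be sketched as emitting N-Triples statements; the namespace, property name, and person identifiers below are illustrative placeholders, not the project’s actual vocabulary:

```python
def to_ntriple(person_id, relation, target_id, base="http://example.org/song/"):
    """Serialise one academic relationship as an RDF N-Triples statement.

    Every N-Triples statement is subject, predicate, object IRIs followed
    by a terminating " ." on one line.
    """
    return f"<{base}person/{person_id}> <{base}rel/{relation}> <{base}person/{target_id}> ."

# Hypothetical master-disciple relationship between two Song scholars
triple = to_ntriple("ZhuXi", "teacherOf", "CaiYuanding")
```

Bulk-converting all 39 relationship types this way would produce the RDF data loaded into the knowledge graph.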
[Objective] This paper classifies Baidu encyclopedia entries based on users’ information behaviors, aiming to identify entries with high potential value. [Methods] We chose usage and recognition levels as indicators, and proposed a new entry classification model based on the Boston matrix and a BP neural network. [Results] We classified the Baidu encyclopedia entries automatically with usage indicators and created development strategies for each category. Our new model correctly identified each entry’s category. [Limitations] More research is needed on newly generated entries and on features that are difficult to quantify. [Conclusions] This research proposed an effective method to automatically classify online encyclopedia entries.
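The Boston-matrix step places each entry in a quadrant by its usage and recognition levels. A minimal sketch, where the thresholds and the mapping of quadrants onto the classic BCG labels are illustrative assumptions (the paper’s own quadrant definitions may differ):

```python
def boston_quadrant(usage, recognition, usage_mid, recog_mid):
    """Assign an entry to a Boston-matrix quadrant.

    usage_mid / recog_mid would normally be sample means or medians;
    the labels are the classic BCG names, used here illustratively.
    """
    if usage >= usage_mid and recognition >= recog_mid:
        return "star"           # high usage, high recognition
    if usage >= usage_mid:
        return "cash cow"       # high usage, low recognition
    if recognition >= recog_mid:
        return "question mark"  # low usage, high recognition
    return "dog"                # low usage, low recognition
```

Each quadrant would then receive its own development strategy, with the BP neural network learning to assign new entries automatically.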