Data Analysis and Knowledge Discovery, 2020, Vol. 4, Issue 11: 112-120. DOI: 10.11925/infotech.2096-3467.2020.0214
Building Phrase Dictionary for Defective Products with Convolutional Neural Network
Peng Chen1, Lv Xueqiang1, Sun Ning2, Zang Le1, Jiang Zhaocai2, Song Li2
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2 China National Institute of Standardization, Beijing 100191, China
Abstract  

[Objective] This paper builds a dictionary for defective products, aiming to help users better understand the latest developments in specific domains. [Methods] First, we extracted domain-related phrases from the corpus using word frequency features. Then, we used the TF-IDF algorithm to reduce the manual labeling workload. Finally, we proposed a Convolutional Neural Network (CNN) model that uses semantic and position information to generate the domain dictionary. [Results] Compared with statistical learning methods, our model improved precision, recall, and F1 by 6% to 9%. [Limitations] Further research is needed to test the method in other domains. [Conclusions] The proposed CNN-based method can effectively construct a dictionary for defective products.

Key words: Domain Dictionary; Word Frequency Feature; TF-IDF; CNN
Received: 19 March 2020; Published: 04 December 2020
CLC Number: TP391
Corresponding Author: Sun Ning; E-mail: sunninglx@163.com

Cite this article:

Peng Chen,Lv Xueqiang,Sun Ning,Zang Le,Jiang Zhaocai,Song Li. Building Phrase Dictionary for Defective Products with Convolutional Neural Network. Data Analysis and Knowledge Discovery, 2020, 4(11): 112-120.

URL: http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2020.0214 or http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I11/112

Figure: The Procedure for Building the Domain Phrase Dictionary
Figure: The Phrase Generation Method
Figure: The CNN-PD Model
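The abstract describes CNN-PD only as a CNN that uses semantic and position information. The sketch below shows what a forward pass of such a model might look like, in pure Python. It assumes (this is not taken from the paper) that each token is a word embedding concatenated with one clipped relative-distance feature to the candidate phrase, followed by a single convolution layer with ReLU, max-pooling over time, and a sigmoid output. The names `position_feature`, `conv1d_maxpool`, `score_candidate`, and `init_params`, the dimensions, and the random weights are all hypothetical; a real model would learn its weights from the labeled corpus.

```python
import math
import random


def position_feature(i, start, end, max_dist=5):
    """Hypothetical position feature: signed distance from token i to the
    candidate span [start, end), clipped to +/-max_dist and scaled to [-1, 1].
    Tokens inside the span get 0."""
    if i < start:
        d = i - start
    elif i >= end:
        d = i - end + 1
    else:
        d = 0
    d = max(-max_dist, min(max_dist, d))
    return d / max_dist


def conv1d_maxpool(rows, filters, biases, width):
    """For each filter, slide over every window of `width` consecutive rows,
    apply ReLU, and keep the largest activation (max-pooling over time)."""
    pooled = []
    for f, b in zip(filters, biases):
        best = 0.0  # ReLU outputs are never negative, so 0.0 is a valid floor
        for s in range(len(rows) - width + 1):
            window = [x for row in rows[s:s + width] for x in row]
            act = max(0.0, sum(w * x for w, x in zip(f, window)) + b)
            best = max(best, act)
        pooled.append(best)
    return pooled


def score_candidate(tokens, span, emb, params, width=3):
    """Probability that tokens[span[0]:span[1]] is a domain phrase."""
    start, end = span
    dim = len(next(iter(emb.values())))
    # Semantic part (embedding) plus position part (one distance feature).
    rows = [emb.get(t, [0.0] * dim) + [position_feature(i, start, end)]
            for i, t in enumerate(tokens)]
    # Zero-pad short inputs so at least one convolution window exists.
    while len(rows) < width:
        rows.append([0.0] * (dim + 1))
    pooled = conv1d_maxpool(rows, params["filters"], params["biases"], width)
    z = sum(w * p for w, p in zip(params["out_w"], pooled)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-z))


def init_params(dim, n_filters=8, width=3, seed=0):
    """Random, untrained weights sized for rows of length dim + 1."""
    rng = random.Random(seed)
    fan_in = width * (dim + 1)
    return {
        "filters": [[rng.uniform(-0.5, 0.5) for _ in range(fan_in)]
                    for _ in range(n_filters)],
        "biases": [0.0] * n_filters,
        "out_w": [rng.uniform(-0.5, 0.5) for _ in range(n_filters)],
        "out_b": 0.0,
    }


if __name__ == "__main__":
    tokens = ["手机", "无故", "自动", "重启", "找", "售后"]
    rng = random.Random(1)
    emb = {t: [rng.uniform(-1, 1) for _ in range(4)] for t in tokens}
    params = init_params(dim=4)
    # Score the candidate phrase 自动重启 (tokens 2 and 3).
    print(score_candidate(tokens, (2, 4), emb, params))
```

With untrained weights the printed probability is meaningless; the point is the data flow. Concatenating a distance feature to each token vector is one common way to give a CNN position information, since convolution and max-pooling alone discard where in the sentence a pattern occurred.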
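Table: Corpus Samples and Manually Labeled Domain Words
Data sample | Dataset | Domain-related words | Domain-unrelated words
The phone restarts by itself for no reason. After-sales service said the only fix is to replace the motherboard. A quick online search showed that my case is not isolated: many consumers have similar problems. Although the "three guarantees" warranty period has passed, this is a factory defect, and the manufacturer should take full responsibility! | A | 手机 (mobile phone), 自动重启 (automatic restart), 主板 (motherboard) | 售后 (after-sales service), 消费者 (consumer), 三包日期 (three-guarantee period), 负责到底 (take full responsibility)
This product is an aircraft remotely controlled by a mobile phone. After carefully reading the manual and watching the tutorial videos, I flew it normally as instructed. Shortly after takeoff the drone went out of control and could not be controlled from the phone at all; in the end it flew up into the high-altitude mist on its own and disappeared. This product poses a serious public safety hazard. | A | 手机 (mobile phone), 飞行器 (aircraft), 无人机失控 (drone out of control) | 说明书 (manual), 教学视频 (tutorial video), 公共安全隐患 (public safety hazard)
The humidifier base was faulty. The seller replaced the base and promised a new one, but it was actually an old refurbished base. On the evening of February 25, 2018, a dry-burning accident occurred and nearly caused a fire, seriously endangering personal and household property safety. The seller refused to honor the warranty. | A | 加湿器 (humidifier), 故障 (fault), 失火 (fire) | 商家 (seller), 严重影响 (seriously affect), 家庭财产安全 (household property safety)
The worst purchase since I started shopping online. False advertising: how can a newly bought machine leak water? Can a substandard product leave the factory and be sold? The leak soaked the cabinets in my home. | B | 漏水 (water leakage) | 虚假宣传 (false advertising), 产品不合格 (substandard product)
I eagerly bought a washing machine I liked. It was installed on the 18th, and when I did laundry on the 19th it turned out to be leaking electricity. Fortunately, no one in my family took a shower during that time. Who will solve this problem? It is terrifying. Customer service, please reply to me quickly. | B | 洗衣机 (washing machine), 漏电 (electric leakage) | -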
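Table: Results of the Phrase Mining Experiment
Segmentation result | Word or phrase | E
充电/口 | 充电 口 (charging port) | 0.903
质量/问题 | 质量 问题 (quality problem) | 0.897
可能/会 | 可能会 (may) | 0.833
使用/过程/中 | 使用 过程 中 (during use) | 0.725
漏水 | 漏水 (water leakage) | 0.714
产品/缺陷 | 产品 缺陷 (product defect) | 0.545
电池/鼓包 | 电池 鼓包 (battery swelling) | 0.487
质量/检测/总局 | 质量 检测 总局 (General Administration of Quality Inspection) | 0.414
共享/单车 | 共享 单车 (shared bicycle) | 0.322
消费者/权益保护法 | 消费者 权益保护法 (Consumer Rights Protection Law) | 0.312
前置/摄像头 | 前置 摄像头 (front camera) | 0.201
净化器/滤芯 | 净化器 滤芯 (purifier filter) | 0.103
当事人 | 当事人 (party concerned) | 0.007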
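The phrase mining table reports a score E for each segmentation result, but the formula for E is not given here. As one common word-frequency-based way to decide whether adjacent segments such as 充电/口 should be merged into a phrase, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). PMI is an assumed stand-in, not the paper's E; the functions `pmi_scores` and `merge_phrases`, the `min_count` and `threshold` parameters, and the toy segmented corpus are hypothetical.

```python
import math
from collections import Counter


def pmi_scores(segmented_docs, min_count=2):
    """Score each adjacent word pair by PMI = log(p(w1 w2) / (p(w1) * p(w2))).
    Pairs seen fewer than min_count times are skipped as too sparse."""
    unigrams, bigrams = Counter(), Counter()
    for words in segmented_docs:
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        p_xy = c / n_bi
        p_x, p_y = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log(p_xy / (p_x * p_y))
    return scores


def merge_phrases(words, scores, threshold):
    """Greedily join adjacent words whose pair score exceeds threshold."""
    out, i = [], 0
    while i < len(words):
        pair = (words[i], words[i + 1]) if i + 1 < len(words) else None
        if pair and scores.get(pair, float("-inf")) > threshold:
            out.append(words[i] + words[i + 1])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out


if __name__ == "__main__":
    docs = [
        ["手机", "充电", "口", "松动"],
        ["充电", "口", "接触", "不良"],
        ["电池", "鼓包", "导致", "充电", "口", "发热"],
        ["电池", "鼓包", "严重"],
        ["手机", "发热", "严重"],
    ]
    scores = pmi_scores(docs)
    print(merge_phrases(docs[2], scores, threshold=1.0))
    # prints ['电池鼓包', '导致', '充电口', '发热']
```

Only 充电/口 and 电池/鼓包 occur often enough to be scored, and both co-occur far more than chance predicts, so they are merged. PMI is known to overrate rare pairs, which is why a minimum count is applied first.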
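Table: Results of Building the Phrase Corpus
High-scoring word | TF-IDF value | Low-scoring word | TF-IDF value
闪屏 (screen flicker) | 1.000 | 发现 (find) | 0.048
无法 开机 (fails to power on) | 1.000 | 导致 (cause) | 0.054
质量 缺陷 (quality defect) | 1.000 | 无法 解决 (cannot be resolved) | 0.232
自燃 (spontaneous combustion) | 0.942 | 网上 (online) | 0.303
漏水 (water leakage) | 0.913 | 拒绝 处理 (refuse to handle) | 0.317
爆炸 (explosion) | 0.817 | 引起 问题 (cause problems) | 0.326
触控板 (touchpad) | 0.688 | 客服 (customer service) | 0.344
电池 鼓包 (battery swelling) | 0.674 | 充电 口 (charging port) | 0.379
冷凝器 (condenser) | 0.436 | 产生 影响 (have an impact) | 0.399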
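The abstract says TF-IDF was used to reduce manual labeling, and the table's top scores of exactly 1.000 suggest the values are normalized. The sketch below is one plausible way to do this, assuming (not confirmed by the source) that each candidate gets its highest TF-IDF in any document, divided by the overall maximum, and that only a middle band of scores is sent to annotators. The functions `tfidf_scores` and `triage`, the thresholds, and the toy corpus are hypothetical, and the sketch does not reproduce the values in the table.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """docs: list of token lists. Score each term by its highest TF-IDF in any
    document, then divide by the top score so the best term gets 1.0."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    best = {}
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[term])  # 0 for terms found in every doc
            best[term] = max(best.get(term, 0.0), tf * idf)
    top = max(best.values(), default=0.0)
    return {t: s / top for t, s in best.items()} if top > 0 else best


def triage(scores, high=0.6, low=0.3):
    """Hypothetical split: keep high scorers automatically, drop low scorers
    automatically, and send only the middle band to human annotators."""
    keep = sorted((t for t, s in scores.items() if s >= high),
                  key=lambda t: -scores[t])
    review = sorted(t for t, s in scores.items() if low <= s < high)
    drop = sorted(t for t, s in scores.items() if s < low)
    return keep, review, drop


if __name__ == "__main__":
    docs = [["闪屏", "闪屏", "客服"], ["漏水", "客服"],
            ["自燃", "客服", "网上"], ["客服", "网上"]]
    keep, review, drop = triage(tfidf_scores(docs))
    print(keep, review, drop)  # ['闪屏', '漏水'] ['网上', '自燃'] ['客服']
```

In this toy corpus, 客服 (customer service) appears in every document, so its IDF and score are 0. This mirrors why generic complaint vocabulary ends up in the low-scoring column of the table while defect terms such as 闪屏 rise to the top.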
Table: Experimental Results
Method | Dataset | Precision | Recall | F1 | NoOther
LDA | A+B | 0.669 | 0.691 | 0.680 | -
SVM | A+B | 0.684 | 0.712 | 0.698 | -
LSTM | A+B | 0.715 | 0.731 | 0.723 | 0.766
LSTM+WPE | A+B | 0.720 | 0.755 | 0.737 | 0.782
CNN-PD | A | 0.738 | 0.785 | 0.761 | 0.811
CNN-PD | B | 0.735 | 0.790 | 0.762 | 0.797
CNN-PD+WPE | A+B | 0.744 | 0.797 | 0.770 | 0.842
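F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short script below recomputes F1 from the precision and recall in the experimental results table as a consistency check, then computes the gain of CNN-PD+WPE over SVM. It uses only the numbers reported in the table; `ROWS` and `f1` are illustrative names.

```python
# (method, dataset, precision, recall, reported F1), copied from the table.
ROWS = [
    ("LDA", "A+B", 0.669, 0.691, 0.680),
    ("SVM", "A+B", 0.684, 0.712, 0.698),
    ("LSTM", "A+B", 0.715, 0.731, 0.723),
    ("LSTM+WPE", "A+B", 0.720, 0.755, 0.737),
    ("CNN-PD", "A", 0.738, 0.785, 0.761),
    ("CNN-PD", "B", 0.735, 0.790, 0.762),
    ("CNN-PD+WPE", "A+B", 0.744, 0.797, 0.770),
]


def f1(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r)


for method, data, p, r, reported in ROWS:
    print(f"{method:<11} {data:<4} computed F1 = {f1(p, r):.3f}, reported = {reported:.3f}")

# Gain of CNN-PD+WPE over SVM, in percentage points.
svm, best = ROWS[1], ROWS[-1]
for label, i in (("Precision", 2), ("Recall", 3), ("F1", 4)):
    print(f"{label}: +{(best[i] - svm[i]) * 100:.1f} points")
```

Every recomputed F1 matches the reported value to three decimals. Against SVM, the gains are 6.0, 8.5, and 7.2 percentage points, which appears to be the "6% to 9%" range stated in the abstract; measured against LDA instead, the gains would be larger.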