CNN-SM: Identifying Words on Defective Products with Sememe and Multi-features
You Xindong, Yuan Menglong, Zhang Le, Lv Xueqiang
Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Abstract [Objective] This paper proposes a CNN model based on sememes and multiple features to improve the recognition accuracy of domain words describing defective consumer products. [Methods] First, we built the model's input from distributed word vectors fused with sememe information. Then, we added part-of-speech features and randomly initialized word position vectors to the input. Finally, we removed the max pooling layer, which increased the information carried in the deep feature vectors output by the convolution kernels and provided sufficient information for word classification. [Results] Compared with a CNN model that adds only word position vectors, the proposed method improved precision, recall, and F1 by 0.021, 0.002, and 0.012, respectively. [Limitations] The model still needs to better recognize the polarity of the same expression in different scenarios. [Conclusions] Sememes, part-of-speech features, and the removal of the pooling layer can improve the model's performance in domain word recognition.
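
The pipeline in [Methods] can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the averaging used to fuse sememe vectors, the one-hot part-of-speech encoding, the toy dimensions, the English example tokens, and all function names are assumptions made for illustration only. It shows how each token row is assembled from the three feature groups and why skipping max pooling leaves a longer feature vector for the classifier.

```python
import random

def fuse_with_sememes(word_vec, sememe_vecs):
    """Average a word vector with its sememe vectors (an assumed, simple fusion)."""
    vecs = [word_vec] + sememe_vecs
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def one_hot(index, size):
    """Encode a part-of-speech id as a one-hot vector (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def build_input(tokens, word_vecs, sememe_vecs, pos_ids, n_pos, pos_emb):
    """One row per token: [fused word vector | POS one-hot | position vector]."""
    rows = []
    for i, tok in enumerate(tokens):
        fused = fuse_with_sememes(word_vecs[tok], sememe_vecs.get(tok, []))
        rows.append(fused + one_hot(pos_ids[i], n_pos) + pos_emb[i])
    return rows

def conv1d(rows, kernels, width):
    """Valid 1D convolution over the token axis with ReLU, one map per kernel."""
    maps = []
    for kernel in kernels:
        fmap = []
        for start in range(len(rows) - width + 1):
            window = [x for row in rows[start:start + width] for x in row]
            fmap.append(max(0.0, sum(w * x for w, x in zip(kernel, window))))
        maps.append(fmap)
    return maps

def flatten_without_pooling(maps):
    """Keep every activation instead of reducing each map to its maximum."""
    return [v for fmap in maps for v in fmap]

# Toy data: all values below are made up for the illustration.
rng = random.Random(0)
tokens = ["battery", "overheats", "quickly"]
dim, n_pos, pos_dim, width, n_kernels = 4, 3, 2, 2, 5

word_vecs = {t: [rng.uniform(-1, 1) for _ in range(dim)] for t in tokens}
sememe_vecs = {"overheats": [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]}
pos_ids = [0, 1, 2]
# Randomly initialized position vectors, one per token position.
pos_emb = [[rng.uniform(-1, 1) for _ in range(pos_dim)] for _ in tokens]

rows = build_input(tokens, word_vecs, sememe_vecs, pos_ids, n_pos, pos_emb)
row_dim = dim + n_pos + pos_dim
kernels = [[rng.uniform(-1, 1) for _ in range(width * row_dim)] for _ in range(n_kernels)]

maps = conv1d(rows, kernels, width)
features = flatten_without_pooling(maps)   # what a pooling-free model passes on
pooled = [max(fmap) for fmap in maps]      # what max pooling would keep instead

print(len(rows[0]), len(features), len(pooled))
```

With three tokens, a kernel width of 2, and five kernels, the pooling-free path keeps 10 activations while max pooling would keep only 5, one per kernel. In a real model the flattened vector would feed a trainable classification layer, and the kernels and position vectors would be learned rather than left at their random initial values.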
Received: 02 December 2021
Published: 26 October 2022
Fund: Natural Science Foundation of Beijing (4212020); National Natural Science Foundation of China (62171043); President Foundation of China National Institute of Standardization (282020Y-7511)
Corresponding Author: Zhang Le, ORCID: 0000-0002-9620-511X
E-mail: zhangle@bistu.edu.cn