Extracting Entities for Enterprise Risks Based on Stroke ELMo and IDCNN-CRF Model
Yang Meifang1,2, Yang Bo1,2
1 School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013, China
2 Institute of Information Resource Management, Jiangxi University of Finance and Economics, Nanchang 330013, China

Abstract [Objective] This paper proposes a new model that learns text characteristics and contextual semantic relevance in order to extract enterprise risk entities more effectively. [Methods] The entity extraction model embeds stroke ELMo in an IDCNN-CRF architecture. First, we pre-trained a bidirectional language model on large-scale unstructured enterprise risk data and obtained stroke ELMo vectors as input features. Then, we fed these vectors to the IDCNN network for training and used a CRF to process the IDCNN output layer. Finally, we obtained the optimal entity label sequence for enterprise risks. [Results] The F value of the proposed model is 91.9%, which is 2.0% higher than that of the BiLSTM-CRF deep neural network model. The model also runs 2.36 times faster than BiLSTM-CRF. [Limitations] More research is needed to examine this model in other fields. [Conclusions] The proposed model provides a reference for constructing an entity corpus of enterprise risks.
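
The two generic building blocks named in the abstract, dilated convolution (the core of an IDCNN) and CRF decoding, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the single-channel convolution, the dilation schedules, the three-tag O/B/I scheme, and all function names (`dilated_conv1d`, `receptive_field`, `viterbi_decode`) are assumptions chosen for illustration, and the stroke ELMo feature extractor is omitted entirely.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Width-preserving 1D dilated convolution with zero padding (one channel)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of input positions seen by one output after stacking the layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag path for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (e.g. from the IDCNN)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr = []
    for emit in emissions[1:]:
        step_ptr, new_score = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            step_ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptr.append(step_ptr)
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for step_ptr in reversed(backptr):
        best = step_ptr[best]
        path.append(best)
    path.reverse()
    return path


# Hypothetical tag ids: 0 = O, 1 = B, 2 = I. The transition O -> I is penalised.
emissions = [[2.0, 1.0, 0.0], [0.0, 1.0, 2.0], [0.0, 0.0, 3.0]]
transitions = [[0.0, 0.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(viterbi_decode(emissions, transitions))
print(receptive_field(3, [1, 1, 2]))
```

Stacking dilated layers widens the context each token sees without recurrence, so all positions can be computed in parallel; this is the usual explanation for the speed advantage of IDCNN-style taggers over BiLSTM-based ones. The transition scores let the decoder reject label sequences that are locally attractive but globally inconsistent, such as an I tag directly after O.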
Received: 17 November 2021
Published: 26 October 2022
Fund: National Natural Science Foundation of China (72064015); Jiangxi Province Social Science “Thirteenth Five-Year Plan” Project (19TQ01)
Corresponding Author: Yang Meifang, ORCID: 0000-0002-4360-0183
E-mail: yangmeifang@jxufe.edu.cn