Cross-domain Transfer Learning for Recognizing Professional Skills from Chinese Job Postings
Yi Xinhe1, Yang Peng2, Wen Yimin2
1 Library of Guilin University of Electronic Technology, Guilin 541004, China
2 School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
Abstract [Objective] This paper analyzes online job postings to accurately identify employers' skill demands, aiming to narrow the skill gap between supply and demand in the labor market. [Methods] We propose a cross-domain transfer learning model for recognizing professional skill words (CDTL-PSE). CDTL-PSE treats the task as sequence tagging, similar to named entity recognition or term extraction. The SIGHAN corpus is decomposed into three source domains, and a domain adaptation layer is inserted between the Bi-LSTM and CRF layers to transfer knowledge from each source domain to the target domain. Each sub-model is then trained with a parameter transfer approach, and the final label sequence is obtained by majority vote over the sub-models' predictions. [Results] On a self-built online recruitment data set, the proposed model improved the F1 value by 0.91% over the baseline method and reduced the number of labeled samples required by about 50%. [Limitations] The interpretability of CDTL-PSE needs further improvement. [Conclusions] CDTL-PSE can automatically extract professional skill words and effectively increase the labeled samples in the target domain.
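The final voting step described in the abstract can be illustrated with a minimal sketch. It assumes three sub-models that each emit one label per character, a simple B/I/O tag set, and a tie-breaking rule that favors the earliest sub-model; the abstract specifies none of these details, so the function name `majority_vote`, the tag set, and the sample predictions are all hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token label sequences from several sub-models by majority vote."""
    if not predictions:
        raise ValueError("need at least one predicted sequence")
    length = len(predictions[0])
    if any(len(seq) != length for seq in predictions):
        raise ValueError("all predicted sequences must have the same length")

    voted = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Tie-break (an assumption, not stated in the abstract): among the
        # labels with the highest count, keep the one from the earliest sub-model.
        voted.append(next(label for label in labels if counts[label] == best))
    return voted


# Hypothetical outputs of three sub-models for a five-character sentence.
sub_model_outputs = [
    ["B", "I", "O", "O", "B"],
    ["B", "I", "O", "B", "I"],
    ["B", "O", "O", "B", "O"],
]
print(majority_vote(sub_model_outputs))
```

Voting independently at each position can yield a tag sequence that no single sub-model produced, including sequences that violate the tagging scheme, so a real system might vote over whole spans or repair the result afterwards.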
Received: 31 August 2021
Published: 07 January 2022
Fund: Humanities and Social Sciences of Ministry of Education Planning Fund (17JDGC022); Graduate Education Reform Project of Guangxi (JGY2017055); Natural Science Foundation of Guangxi (2018GXNSFDA138006)
Corresponding Author: Wen Yimin, ORCID: 0000-0001-5017-3987, E-mail: ymwen@guet.edu.cn