Data Analysis and Knowledge Discovery  2022, Vol. 6 Issue (2/3): 274-288    DOI: 10.11925/infotech.2096-3467.2021.0963
Cross-domain Transfer Learning for Recognizing Professional Skills from Chinese Job Postings
Yi Xinhe1, Yang Peng2, Wen Yimin2
1Library of Guilin University of Electronic Technology, Guilin 541004, China
2School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
Abstract  

[Objective] This paper analyzes online job postings to identify employers' demands accurately, aiming to address the skill gaps between supply and demand in the labor market. [Methods] We propose CDTL-PSE, a cross-domain transfer learning model for recognizing professional skill words. CDTL-PSE treats the task as sequence tagging, analogous to named entity recognition or term extraction, and decomposes the SIGHAN corpus into three source domains. A domain adaptation layer is inserted between the Bi-LSTM and CRF layers to transfer knowledge from each source domain to the target domain, and each sub-model is trained with a parameter-transfer approach. The final label sequence is obtained by majority vote over the sub-models' predictions. [Results] On a self-built online recruitment dataset, the proposed model improved the F1 score by 0.91 percentage points over the baseline method and reduced the number of labeled samples required by about 50%. [Limitations] The interpretability of CDTL-PSE needs further improvement. [Conclusions] CDTL-PSE can automatically extract professional skill words and effectively augment the labeled samples in the target domain.
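The last step of the pipeline described above is ensembling by majority vote over the label sequences predicted by the three sub-models. Below is a minimal sketch of that voting step; the function name, the BIO-style skill tags, and the tie-breaking behavior inherited from Counter.most_common are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: per-token majority voting over the BIO label sequences
# predicted by the three sub-models (one per source domain).
from collections import Counter

def majority_vote(label_seqs):
    """label_seqs: list of equally long BIO tag sequences, one per sub-model."""
    voted = []
    for position_tags in zip(*label_seqs):
        tag, _ = Counter(position_tags).most_common(1)[0]  # ties: first seen
        voted.append(tag)
    return voted

# Example: three sub-models tagging the same 5-character sentence.
print(majority_vote([
    ["B-SKILL", "I-SKILL", "O", "O", "O"],
    ["B-SKILL", "I-SKILL", "I-SKILL", "O", "O"],
    ["B-SKILL", "I-SKILL", "O", "O", "O"],
]))  # -> ['B-SKILL', 'I-SKILL', 'O', 'O', 'O']
```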

Key words: Professional Skill Words; Cross-Domain Transfer Learning; Domain Adaptation
Received: 31 August 2021      Published: 07 January 2022
ZTFLH:  TP393  
Fund: Humanities and Social Sciences Planning Fund of the Ministry of Education (17JDGC022); Graduate Education Reform Project of Guangxi (JGY2017055); Natural Science Foundation of Guangxi (2018GXNSFDA138006)
Corresponding Author: Wen Yimin, ORCID: 0000-0001-5017-3987, E-mail: ymwen@guet.edu.cn

Cite this article:

Yi Xinhe, Yang Peng, Wen Yimin. Cross-domain Transfer Learning for Recognizing Professional Skills from Chinese Job Postings. Data Analysis and Knowledge Discovery, 2022, 6(2/3): 274-288.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0963     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2022/V6/I2/3/274

Example of the Requirements in an Online Chinese Job Posting (Professional Skill Words Are Underlined)
Framework of Bi-LSTM
Framework of CDTL-PSE
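The framework captioned above stacks, for each source-target pair, character embeddings, a Bi-LSTM encoder, a domain adaptation layer, and a CRF output layer. The following is a hedged sketch of one such sub-model, assuming a PyTorch implementation with the third-party pytorch-crf package; the layer sizes, the tanh-activated linear adaptation layer, and all names are our assumptions rather than the authors' released code.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class CDTLSubModel(nn.Module):
    """Character embeddings -> Bi-LSTM -> domain adaptation layer -> CRF."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Bi-LSTM encoder; its parameters are the part transferred from a
        # source-domain task under the parameter-transfer scheme.
        self.bilstm = nn.LSTM(emb_dim, hidden_dim // 2, batch_first=True,
                              bidirectional=True)
        # Domain adaptation layer inserted between Bi-LSTM and CRF, modeled
        # here as a target-specific linear projection (our assumption).
        self.domain_adapt = nn.Linear(hidden_dim, hidden_dim)
        self.emission = nn.Linear(hidden_dim, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, char_ids, tags=None, mask=None):
        h, _ = self.bilstm(self.embedding(char_ids))
        emissions = self.emission(torch.tanh(self.domain_adapt(h)))
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask, reduction='mean')
        # Inference: Viterbi-decoded tag sequence per sentence.
        return self.crf.decode(emissions, mask=mask)
```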
| Entity type | Number of entities | After a verb | After an adjective/adjective phrase | After a noun/noun phrase | In coordinate structures ("和", "或", "、", etc.) | Other positions |
|---|---|---|---|---|---|---|
| Skill words | 3 930 | 1 309 | 764 | 2 | 1 512 | 343 |
| Person names | 884 | 136 | 27 | 204 | 73 | 444 |
| Location names | 1 951 | 649 | 106 | 57 | 87 | 1 052 |
| Organization names | 984 | 278 | 54 | 36 | 128 | 488 |
Statistics of the Occurrences of Entities in Different Positions in the Corresponding Corpus
| Dataset | Data type | Entity type | Number of skill words | Number of sentences |
|---|---|---|---|---|
| Target domain | Training set | Skill words | [5836, 6586] | 725 |
| Target domain | Validation set | Skill words | [593, 817] | 80 |
| Target domain | Test set | Skill words | [1577, 1903] | 201 |
Statistics of the Target Domain Dataset
| Dataset | Data type | Entity type | Number of entities | Number of sentences |
|---|---|---|---|---|
| Source domain | Training set | Person/Location/Organization names | 8144/16571/9277 | 20 864 |
| Source domain | Validation set | Person/Location/Organization names | 884/1951/984 | 2 318 |
| Source domain | Test set | Person/Location/Organization names | 1864/3658/2185 | 4 636 |
Statistics of the Source Domain Dataset
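The abstract states that the SIGHAN corpus is decomposed into three source domains, one per entity type (person, location, and organization names), which matches the source-domain statistics above. A plausible decomposition, sketched below under the assumption of character-level BIO tags, simply keeps one entity type's tags per copy of the corpus and relabels everything else as O; the helper and variable names are ours, not the authors'.

```python
# Hedged sketch: split a multi-entity NER corpus into three single-entity
# source domains by keeping only one entity type's BIO tags per copy.
def keep_only(tagged_sentence, entity_type):
    """Keep BIO tags of one entity type; relabel all other tokens as 'O'."""
    return [(char, tag if tag.endswith(entity_type) else "O")
            for char, tag in tagged_sentence]

sentence = [("王", "B-PER"), ("某", "I-PER"), ("在", "O"),
            ("北", "B-LOC"), ("京", "I-LOC"), ("工", "O"), ("作", "O")]

per_domain = keep_only(sentence, "PER")   # person-name source domain
loc_domain = keep_only(sentence, "LOC")   # location-name source domain
org_domain = keep_only(sentence, "ORG")   # organization-name source domain
```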
Network Model for Each Method
| Method | Precision/% | Recall/% | F1/% | Improvement/% |
|---|---|---|---|---|
| No-Transfer | 80.04(3.12) | 82.11(2.63) | 81.03(2.41) | - |
| Per_Alter | 79.78(3.43) | 82.59(2.64) | 81.15(2.93) | +0.12 |
| Loc_Alter | 79.76(2.84) | 82.00(2.76) | 80.84(2.50) | -0.19 |
| Org_Alter | 80.07(3.33) | 82.28(2.26) | 81.13(2.50) | +0.10 |
| Diff_Alter | 79.81(3.22) | 81.42(2.58) | 80.58(2.57) | -0.45 |
| Same_Alter | 79.26(3.54) | 81.72(3.29) | 80.44(3.04) | -0.59 |
| Alter_Ensemble | 80.67(2.77) | 82.38(2.70) | 81.51(2.61) | +0.48 |
| Per_Tune | 80.13(2.37) | 82.09(2.42) | 81.09(2.18) | +0.06 |
| Loc_Tune | 79.85(2.30) | 81.58(3.02) | 80.68(2.19) | -0.35 |
| Org_Tune | 80.00(2.42) | 82.02(2.30) | 80.96(2.00) | -0.07 |
| Diff_Tune | 79.27(3.43) | 81.79(1.77) | 80.49(2.44) | -0.54 |
| Same_Tune | 79.38(2.24) | 81.73(2.12) | 80.52(1.96) | -0.51 |
| Tune_Ensemble | 80.37(2.23) | 82.13(2.55) | 81.22(1.97) | +0.19 |
| Per_Domain | 80.30(2.43) | 82.35(2.57) | 81.29(2.07) | +0.26 |
| Loc_Domain | 79.71(2.88) | 82.35(2.33) | 80.99(2.26) | -0.04 |
| Org_Domain | 80.25(2.73) | 82.26(2.06) | 81.22(2.14) | +0.19 |
| Diff_Domain | 79.71(4.17) | 81.76(3.31) | 80.69(3.46) | -0.34 |
| Same_Domain | 79.61(3.04) | 81.88(2.36) | 80.70(2.32) | -0.33 |
| Domain_Ensemble | 80.82(3.34) | 82.41(3.19) | 81.58(2.96) | +0.55 |
Results of Transferring Different Source Tasks to the Target Task
| Transfer learning method | DiffLabel | SameLabel | Per | Loc | Org |
|---|---|---|---|---|---|
| Method B | 80.58●■★ | 80.44●■★ | 81.15 | 80.84 | 81.13 |
| Method C | 80.49●■★ | 80.52●■★ | 81.09 | 80.68 | 80.96 |
| Method D | 80.69●■★ | 80.70●■★ | 81.29 | 80.99 | 81.22 |
The F1-Score of Different Transfer Learning Methods on the Target Dataset
| Transfer learning method | Per | Loc | Org | Ensemble |
|---|---|---|---|---|
| Method B | 81.15● | 80.84● | 81.13● | 81.51 |
| Method C | 81.09● | 80.68● | 80.96● | 81.22 |
| Method D | 81.29● | 80.99● | 81.22● | 81.58 |
The F1-Score of Different Transfer Learning Methods and an Ensemble Learning Method on the Target Dataset
| Method | Precision/% | Recall/% | F1/% | Improvement/% |
|---|---|---|---|---|
| No-Transfer | 80.04(3.12) | 82.11(2.63) | 81.03(2.41) | - |
| Alter_Ensemble | 80.67(2.77) | 82.38(2.70) | 81.51(2.61) | +0.48 |
| CDTL-PSE: Alter (feature augmentation) | 80.71(2.73) | 82.51(2.45) | 81.59(2.41) | +0.56 |
| CDTL-PSE: Alter (Bi-LSTM) | 81.39(3.61) | 82.58(2.29) | 81.94(2.49) | +0.91 |
| Tune_Ensemble | 80.37(2.23) | 82.13(2.55) | 81.22(1.97) | +0.19 |
| CDTL-PSE: Tune (feature augmentation) | 80.26(2.42) | 82.41(2.64) | 81.30(2.09) | +0.27 |
| CDTL-PSE: Tune (Bi-LSTM) | 81.06(2.53) | 82.19(1.88) | 81.61(1.93) | +0.58 |
| Domain_Ensemble | 80.82(3.34) | 82.41(3.19) | 81.58(2.96) | +0.55 |
| BERT-Bi-LSTM-CRF | 80.03(3.09) | 83.84(2.12) | 81.86(2.14) | +0.83 |
Results of Different Transfer Methods
| Transfer learning method | Only Ensemble | Feature augmentation | Add Bi-LSTM |
|---|---|---|---|
| Method B | 81.51●■ | 81.59 | 81.94 |
| Method C | 81.22●■ | 81.30 | 81.61 |
The F1-Score of Different Methods on the Same Dataset
The F1-scores of CDTL-PSE and the Baseline Method Under Different Scales of the Target Domain Training Data
The Top 50 PSEs in Mechanical Manufacturing Field Recognized by CDTL-PSE
The Top 50 PSEs in Fashion Design Field Recognized by CDTL-PSE