Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (7): 81-90    DOI: 10.11925/infotech.2096-3467.2021.0145
Constructing Knowledge Graph with Public Resumes
Shen Kejie, Huang Huanting, Hua Bolin
Department of Information Management, Peking University, Beijing 100871, China
Abstract  

[Objective] This paper constructs a knowledge graph from public resume data using natural language processing techniques, providing a new tool to complement traditional data analysis. [Context] The proposed method automatically extracts professional backgrounds and job information from resumes, then derives work-experience and colleague relationships within organizations. The visualized knowledge graph can support decisions on talent selection and on personnel appointment and removal in enterprises and institutions. [Methods] First, we used a web crawler to collect the resume data and a BERT-BiLSTM-CRF model to recognize entities. Then, we established relationships between entities by defining rules and integrating external domain knowledge. Finally, we used the Neo4j graph database to store and visualize the data. [Results] The BERT-BiLSTM-CRF model achieved a precision of 84.85% on the entity recognition task. The constructed knowledge graph, which covers the resumes of 561 people, 8,174 entities in 3 categories, and 20,162 relationships in 5 categories, supports multi-angle queries and data mining. [Conclusions] The proposed model explores the internal relationships among resumes and provides a novel way to analyze them. However, the work still lacks precise entity alignment and relationships among institution entities.

Key words: Resume Analysis; Knowledge Graph; NER; Characters Knowledge Graph
Received: 11 February 2021      Published: 11 August 2021
ZTFLH:  TP391  
Fund: National Social Science Fund of China (17BTQ066)
Corresponding Author: Hua Bolin, ORCID: 0000-0001-9248-6455, E-mail: huabolin@pku.edu.cn

Cite this article:

Shen Kejie, Huang Huanting, Hua Bolin. Constructing Knowledge Graph with Public Resumes. Data Analysis and Knowledge Discovery, 2021, 5(7): 81-90.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0145     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I7/81

The Framework of Resumes Knowledge Graph Construction Based on Open Resource Data
The Architecture of BERT-BiLSTM-CRF Model
Schematic Diagram of Relation Extraction Rules
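| No. | Rule | Explanation |
|---|---|---|
| 1 | A position tag embedded inside an organization tag is changed to an organization tag | For example, "常委" in "北京市人大常委会" (Standing Committee of the Beijing Municipal People's Congress) would be recognized as a position; the tags of the two characters "常委" are corrected to organization |
| 2 | Coreference resolution for organization locations | For example, "市" (municipal) in "市财政局" (Municipal Finance Bureau) refers to "北京市" (Beijing) in "北京市人大常委会" in the first half of the sentence |
| 3 | One-to-many relationship between organization and positions | When there are fewer organizations than positions, each position is matched to the nearest organization on its left |
| 4 | Organization granularity handling | (1) In appointment information, the text after "、" is generally a position, and that position belongs to the organization in the preceding clause; (2) the text after "," is generally appointment information for a new organization, distinct from the preceding clause; (3) the first organization in each clause delimited by "," is extracted, other organizations in that clause are defined as sub-organizations, and sub-organizations are merged with the position; (4) if a sub-organization is recognized and contains location information, such as "中国银行\|辽宁省分行" (Bank of China \| Liaoning Provincial Branch), the two recognized organizations are not split |
| 5 | Organization name change | For example, "电子工业部、机械电子工业部" involves one organization name containing another (a name change); the position is paired with the organization that appears first |
| 6 | Concurrent posts | If an organization is still recognized after the character "兼" (concurrently), the sentence is split at "兼" and the organization of the concurrent post is extracted |

Six Rules for Relationship Extraction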
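Rule 3 above (matching each position to the nearest organization on its left) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ORG` and `POS` label names, the `(text, label)` span format, and the sample spans are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Span = Tuple[str, str]  # (text, label); the labels "ORG" and "POS" are assumed names


def pair_positions_with_orgs(spans: List[Span]) -> List[Tuple[str, str]]:
    """Pair every position with the closest organization that precedes it."""
    pairs: List[Tuple[str, str]] = []
    current_org: Optional[str] = None
    for text, label in spans:
        if label == "ORG":
            # A new organization becomes the nearest one to the left
            current_org = text
        elif label == "POS" and current_org is not None:
            pairs.append((current_org, text))
    return pairs


# Hypothetical tagged spans for one clause: one organization, two positions
example = [
    ("河北省委", "ORG"),
    ("常委", "POS"),
    ("秘书长", "POS"),
]
print(pair_positions_with_orgs(example))
```

A position that appears before any organization is skipped in this sketch; how the original system handles that case is not stated in the source.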
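| Type | Example | Operation |
|---|---|---|
| Abbreviated expression | Some organization entities appear under both abbreviated and full names; for example, "中国石油化工集团公司" is shortened to "中国石化总公司" in one resume | If one organization name is a substring of another and, after stop words such as "集团" (group) are removed, the edit distance is within 2, the two are marked as the same entity and unified to the full name |
| Name change | The same organization uses different names in different historical periods; for example, "中国长江三峡集团公司" and "中国长江三峡工程开发总公司" are names of the same company in different periods | If the characters of one organization name appear in order in the other and, after stop words such as "集团" are removed, the edit distance is within 5, the names are converted to character vectors [20] and their cosine similarity is computed; if it exceeds the 0.9 threshold, the two are marked as the same entity and unified to the more recent name |
| Organizational change | Dissolution, restructuring, and adjustment of organizations over time; for example, "国土资源部" (Ministry of Land and Resources) and other departments were restructured into "自然资源部" (Ministry of Natural Resources) | Such cases are few but hard to identify automatically; the names must be corrected manually with the help of external knowledge |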
Types of Coreference Resolution
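The first two alignment operations in the table can be sketched in Python as follows. This is an illustrative reading of the table, not the authors' code: the stop-word list contains only "集团" because that is the only stop word the source names, and `similarity` is a hypothetical callable standing in for the cosine similarity of character vectors [20], which is not reproduced here.

```python
from typing import Callable

STOP_WORDS = ("集团",)  # assumption: the source lists only "集团" and implies others


def strip_stop_words(name: str) -> str:
    for word in STOP_WORDS:
        name = name.replace(word, "")
    return name


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def is_subsequence(short: str, long: str) -> bool:
    """True if the characters of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def abbreviation_candidate(a: str, b: str, max_dist: int = 2) -> bool:
    """'Abbreviated expression' row: substring test plus edit distance <= 2."""
    a, b = strip_stop_words(a), strip_stop_words(b)
    shorter, longer = sorted((a, b), key=len)
    return shorter in longer and levenshtein(a, b) <= max_dist


def rename_candidate(a: str, b: str, similarity: Callable[[str, str], float],
                     max_dist: int = 5, threshold: float = 0.9) -> bool:
    """'Name change' row: ordered characters, edit distance <= 5, similarity > 0.9."""
    a, b = strip_stop_words(a), strip_stop_words(b)
    shorter, longer = sorted((a, b), key=len)
    return (is_subsequence(shorter, longer)
            and levenshtein(a, b) <= max_dist
            and similarity(a, b) > threshold)
```

For the name-change pair quoted in the table, "中国长江三峡集团公司" becomes "中国长江三峡公司" after stop-word removal, which is an ordered subsequence of "中国长江三峡工程开发总公司" at edit distance 5, so the pair reaches the similarity step. The abbreviation pair quoted in the table has edit distance 2 after stop-word removal but is not a literal substring match, so the authors' substring test may be looser than the strict one sketched here.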
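| Entity | Attribute | Attribute value | Count |
|---|---|---|---|
| Name | | Cadre name | 561 |
| Location | Place name | Name of the place, e.g. "河北" (Hebei) | 3,317 |
| | Level | Administrative level of the place, e.g. "省级" (provincial) | |
| Organization | Organization name | Title of the organization, e.g. "河北省委" (Hebei Provincial Party Committee) | 4,296 |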
Entities and Their Attribute Descriptions
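| Relationship | Semantics | Head entity | Tail entity | Attribute | Attribute value | Count |
|---|---|---|---|---|---|---|
| Born in | A person was born in a place, e.g. someone was born in 五峰县 (Wufeng County) | Name | Location | - | - | 548 |
| Graduated from | A person graduated from a school, e.g. someone graduated from 北京大学 (Peking University) | Name | Organization | - | - | 515 |
| Works at | A person holds a post in an organization, e.g. someone works at 河北省委 (Hebei Provincial Party Committee) | Name | Organization | Start time | Start of the term of office | 12,241 |
| | | | | End time | End of the term of office | |
| Located in | An organization is located in a place, e.g. 北京大学 (Peking University) is located in 北京 (Beijing) | Organization | Location | - | - | 3,544 |
| Belongs to | A place belongs to another place, e.g. 石家庄 (Shijiazhuang) belongs to 河北 (Hebei) | Location | Location | - | - | 3,314 |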
Description of Relationships and Their Attributes
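As a rough illustration of how the five relationship types could be written to a graph database, the Python sketch below builds parameterized Cypher `MERGE` statements. The node labels, relationship type names, and the `name`, `start_time`, and `end_time` property keys are assumptions chosen for this example; the source does not state the schema names used in the authors' Neo4j database.

```python
from typing import Dict, Tuple

# relation key -> (head label, relationship type, tail label); all names are assumed
SCHEMA: Dict[str, Tuple[str, str, str]] = {
    "born_in": ("Person", "BORN_IN", "Location"),
    "graduated_from": ("Person", "GRADUATED_FROM", "Organization"),
    "works_at": ("Person", "WORKS_AT", "Organization"),
    "located_in": ("Organization", "LOCATED_IN", "Location"),
    "belongs_to": ("Location", "BELONGS_TO", "Location"),
}


def build_statement(relation: str, head: str, tail: str, **props: str) -> Tuple[str, dict]:
    """Return a Cypher statement and its parameters for one relationship."""
    head_label, rel_type, tail_label = SCHEMA[relation]
    prop_clause = ""
    if props:
        # Relationship attributes, e.g. start and end of a term of office
        fields = ", ".join(f"{key}: ${key}" for key in sorted(props))
        prop_clause = " {" + fields + "}"
    query = (
        f"MERGE (h:{head_label} {{name: $head}}) "
        f"MERGE (t:{tail_label} {{name: $tail}}) "
        f"MERGE (h)-[:{rel_type}{prop_clause}]->(t)"
    )
    params = {"head": head, "tail": tail, **props}
    return query, params


# Hypothetical record: the person placeholder and the dates are made up for illustration
query, params = build_statement(
    "works_at", "某某", "河北省委", start_time="2010-01", end_time="2015-06"
)
print(query)
print(params)
```

Using `MERGE` keeps nodes unique by name, so a person and an organization mentioned in several resumes map to a single node each; passing values as parameters avoids building queries by string concatenation of resume text.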
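| Entity type | Precision/% | Recall/% | F1/% |
|---|---|---|---|
| Location | 81.93 | 78.29 | 80.07 |
| Organization | 78.84 | 81.80 | 80.29 |
| Position | 90.74 | 87.53 | 89.11 |
| Name | 90.55 | 94.24 | 92.36 |

Evaluation of Various Entity Recognition Results of BERT-BiLSTM-CRF Model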
| Model | Precision/% | Recall/% | F1/% |
|---|---|---|---|
| IDCNN-CRF | 77.29 | 76.76 | 77.02 |
| BiLSTM-CRF | 78.86 | 76.91 | 77.87 |
| BERT-BiLSTM-CRF | 84.85 | 84.51 | 84.68 |
Model Performance
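The F1 values in the two evaluation tables are consistent with the standard harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python check below recomputes them from the reported precision and recall; the dictionary name and the English row labels are chosen for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from the tables above
REPORTED = {
    "Location": (81.93, 78.29, 80.07),
    "Organization": (78.84, 81.80, 80.29),
    "Position": (90.74, 87.53, 89.11),
    "Name": (90.55, 94.24, 92.36),
    "IDCNN-CRF": (77.29, 76.76, 77.02),
    "BiLSTM-CRF": (78.86, 76.91, 77.87),
    "BERT-BiLSTM-CRF": (84.85, 84.51, 84.68),
}

for label, (p, r, reported_f1) in REPORTED.items():
    print(f"{label}: computed {f1_score(p, r):.2f}, reported {reported_f1:.2f}")
```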
Application Examples of Knowledge Graph
[1] Tian Ruiqiang, Yao Changqing, Pan Yuntao, et al. Using the Curriculum Vitae for Career Mobility Research of Chinese Overseas Highly-Talent: From the Perspective of Social Network Analysis[J]. Library and Information Service, 2014, 58(19):92-99. (in Chinese)
[2] Ma Xiuling, Rao Shuai. On Influence Factor of Promotion of Basic Unit Public Servants in Ethnic Area: Case Study of CVs of County-level Principals[J]. Journal of Northwest Minzu University (Philosophy and Social Sciences), 2016(4):53-63. (in Chinese)
[3] Hamman J A. Career Experience and Performing Effectively as Governor[J]. American Review of Public Administration, 2004, 34(2):151-163. doi: 10.1177/0275074004263758
[4] Sun J J, Cole M, Huang Z Y, et al. Chinese Leadership: Provincial Perspectives on Promotion and Performance[J]. Environment and Planning C: Politics and Space, 2018, 37(4):750-772. doi: 10.1177/2399654418791580
[5] Ren Ning. Personal Position and Title Information Extraction in Large-Scale Real Texts[D]. Beijing: Beijing Language and Culture University, 2008. (in Chinese)
[6] Gu Nannan, Feng Yun, Sun Xia, et al. Chinese Resume Information Automatic Extraction and Recommendation Algorithm[J]. Computer Engineering and Applications, 2017, 53(18):141-148, 270. (in Chinese)
[7] Dong F, Wang J N. Personal Information Extraction of the Teaching Staff Based on CRFs[C]// Proceedings of 2015 International Conference on Network & Information Systems for Computers. 2015: 615-617.
[8] Zu Shicheng, Wang Xiulai, Cao Yang, et al. Resume Parsing Based on Novel Text Block Segmentation Methodology[J]. Computer Science, 2020, 47(S1):95-101. (in Chinese)
[9] Gaur B, Saluja G S, Sivakumar H B, et al. Semi-supervised Deep Learning Based Named Entity Recognition Model to Parse Education Section of Resumes[J]. Neural Computing and Applications, 2021, 33:5705-5718. doi: 10.1007/s00521-020-05351-2
[10] Cao Ting. Analysis on the Co-author Status of the Sports Scientific Research Thesis: A Study Based on the Knowledge Map of CSSCI Literature Metrological Analysis[J]. Journal of Beijing Sport University, 2012, 35(9):49-54. (in Chinese)
[11] Yang Haici, Wang Jun. Visualizing Knowledge Graph of Academic Inheritance in Song Dynasty[J]. Data Analysis and Knowledge Discovery, 2019, 3(6):109-116. (in Chinese)
[12] Wang Xiaoping, Guo Mengjie, Yue Jingwen. Research on Person-Position Relationship Based on Relation Graph[J]. Big Data Research, 2020, 6(6):129-139. (in Chinese)
[13] He Y, Yun H Y, Lin L. The Character Relationship Mining Based on Knowledge Graph and Deep Learning[C]// Proceedings of the 5th International Conference on Big Data Computing and Communications (BIGCOM). 2019: 22-27.
[14] Huang Z H, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging[OL]. arXiv Preprint, arXiv:1508.01991.
[15] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding[OL]. arXiv Preprint, arXiv:1810.04805.
[16] Wang Ziniu, Jiang Meng, Gao Jianling, et al. Chinese Named Entity Recognition Method Based on BERT[J]. Computer Science, 2019, 46(S2):138-142. (in Chinese)
[17] Database of Chinese Politicians[EB/OL]. [2021-01-30]. http://cpc.people.com.cn/GB/64162/394696/index.html. (in Chinese)
[18] Database of Local Party and Government Leaders[EB/OL]. [2021-01-30]. http://district.ce.cn/zt/rwk/index.shtml. (in Chinese)
[19] Jiao Z Y, Sun S Q, Ke S. Chinese Lexical Analysis with Deep Bi-GRU-CRF Network[OL]. arXiv Preprint, arXiv:1807.01882.
[20] Li S, Zhao Z, Hu R F, et al. Analogical Reasoning on Chinese Morphological and Semantic Relations[C]// Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. 2018: 138-143.