Data Analysis and Knowledge Discovery, 2023, Vol. 7 Issue (11): 158-171     https://doi.org/10.11925/infotech.2096-3467.2022.1026
Research Paper
Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN
Li Yingqun1,2,Li Yafei1,2(),Pei Lei1,2,Hu Zhiwei1,2,Song Ningyuan1,2
1School of Information Management, Nanjing University, Nanjing 210023, China
2Laboratory of Data Intelligence and Cross Innovation, Nanjing University, Nanjing 210023, China
Abstract

[Objective] This paper aims to accurately identify and extract figures and tables from academic journal literature, promoting the dissemination and exchange of academic figures and tables. [Methods] First, we introduced the ECA channel attention module into the YOLOv5 algorithm and replaced its PAN module with BiFPN. Then, we randomly sampled 1,300 academic journal articles from 13 subject categories as experimental data and converted them into high-quality images with poppler-0.68.0. Finally, we evaluated the new algorithm on this dataset. [Results] Compared with the suboptimal baseline, the F1 score of the new model improved by 1.99 percentage points, reaching 99.88%. [Limitations] The scope and quantity of data annotation need to be expanded to cover more scenarios. [Conclusions] The proposed YOLOv5-ECA-BiFPN method effectively improves the identification and extraction of figures and tables from academic journal literature, particularly in special scenarios.
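The paper states only that poppler-0.68.0 was used to render each PDF page as a high-quality image and does not show the invocation. The following is a minimal sketch of that step using pdf2image, a thin Python wrapper around poppler's pdftoppm; the pdf2image dependency, the 300 dpi setting, and the file paths are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of the PDF-to-image step: render each page of a journal PDF
# as a PNG using pdf2image, which calls poppler's pdftoppm under the hood.
# Assumptions not stated in the paper: pdf2image is installed, poppler
# binaries (e.g. poppler-0.68.0) are on PATH, and 300 dpi is used.
from pathlib import Path
from pdf2image import convert_from_path

def pdf_to_images(pdf_path, out_dir, dpi=300):
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL.Image per page
    out_files = []
    for i, page in enumerate(pages, start=1):
        out_file = Path(out_dir) / f"{Path(pdf_path).stem}_page{i:03d}.png"
        page.save(out_file, "PNG")
        out_files.append(str(out_file))
    return out_files

if __name__ == "__main__":
    # Hypothetical input file; replace with an actual journal article PDF.
    print(pdf_to_images("sample_article.pdf", "page_images"))
```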

Key words: Academic Journal Literature    YOLOv5-ECA-BiFPN    Academic Figures and Tables
Received: 2022-09-08      Published: 2023-03-22
CLC Number: TP391 G256
Funding: One of the research outcomes of a 2022 project of the Laboratory of Data Intelligence and Cross Innovation, Nanjing University
Corresponding author: Li Yafei, ORCID: 0000-0003-1754-2300, E-mail: dg20140013@smail.nju.edu.cn.
Cite this article:
Li Yingqun, Li Yafei, Pei Lei, Hu Zhiwei, Song Ningyuan. Identifying and Extracting Figures and Tables from Academic Literature Based on YOLOv5-ECA-BiFPN. Data Analysis and Knowledge Discovery, 2023, 7(11): 158-171.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.2096-3467.2022.1026      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2023/V7/I11/158
Fig.1  Technical framework
Fig.2  Network structure of the YOLOv5 algorithm[23]
Fig.3  Slice operation of the Focus module
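As a reading aid for Fig.3, here is a minimal PyTorch sketch of the slice operation in YOLOv5's Focus module, based on the public YOLOv5 design[23] rather than code released with this paper: the input is split into four interleaved spatial slices that are concatenated along the channel dimension before a convolution.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Sketch of YOLOv5's Focus slice operation: (B, C, H, W) -> (B, 4C, H/2, W/2),
    followed by a convolution. Based on the public YOLOv5 implementation."""
    def __init__(self, in_channels, out_channels, k=3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * in_channels, out_channels, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.SiLU(),
        )

    def forward(self, x):
        # Four interleaved spatial slices, concatenated on the channel axis.
        patches = torch.cat(
            [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1
        )
        return self.conv(patches)

# Example: a 640x640 RGB page image becomes a (1, 32, 320, 320) feature map.
# print(Focus(3, 32)(torch.randn(1, 3, 640, 640)).shape)
```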
Fig.4  ECA module[26]
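Fig.4 refers to the ECA channel attention module of Wang et al.[26]. Below is a minimal PyTorch sketch of that mechanism (global average pooling, a 1D convolution across channels, and a sigmoid gate); the adaptive kernel-size rule follows the ECA-Net paper, but the code is an illustrative reimplementation, not the authors' released code.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Sketch of Efficient Channel Attention (Wang et al., CVPR 2020[26]).
    Channel weights come from a 1D convolution over the pooled channel descriptor."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        # Adaptive kernel size: nearest odd value derived from the channel count.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # (B, C, H, W) -> (B, C, 1, 1) channel descriptor.
        y = self.pool(x)
        # 1D convolution across channels models local cross-channel interaction.
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        return x * self.sigmoid(y)  # re-weight each channel of the input
```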
Fig.5  Structures of FPN, PAN, and BiFPN[27]
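Fig.5 contrasts FPN, PAN, and BiFPN[27]. The distinctive BiFPN ingredient is fast normalized (weighted) feature fusion; the sketch below shows only that fusion step, as an illustration of what replacing PAN with BiFPN adds, and is not the paper's released code.

```python
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    """Sketch of BiFPN's fast normalized fusion (Tan et al., EfficientDet[27]):
    inputs are combined with learnable, ReLU-clamped weights normalized to sum
    to one, so the network learns how much each feature level contributes."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, inputs):
        # All inputs are assumed to share the same shape (after resizing / 1x1 conv).
        w = torch.relu(self.weights)
        w = w / (w.sum() + self.eps)
        return sum(wi * xi for wi, xi in zip(w, inputs))

# Example: fuse a top-down feature with the same-level lateral feature.
# fused = FastNormalizedFusion(2)([p4_td, p4_in])
```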
Fig.6  Dataset for academic figure and table identification and extraction
Fig.7  LabelImage annotation example
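Fig.7 shows an annotation example from the labeling tool. Assuming the boxes were exported in the YOLO txt format that YOLOv5 consumes (one "class x_center y_center width height" line per object, normalized to [0, 1]), the sketch below converts such a line back to pixel coordinates; the class mapping and example values are hypothetical, not taken from the paper's dataset.

```python
# Minimal sketch: decode one YOLO-format label line back to pixel coordinates.
# Assumption (not shown in the paper): annotations are stored in YOLO txt format,
# i.e. "class x_center y_center width height" with values normalized to [0, 1].
CLASS_NAMES = {0: "figure", 1: "table"}  # hypothetical class mapping

def yolo_line_to_box(line, img_w, img_h):
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    x1, y1 = xc - w / 2, yc - h / 2
    return CLASS_NAMES[int(cls)], (round(x1), round(y1), round(x1 + w), round(y1 + h))

# Hypothetical label line for a page image rendered at 1654x2339 pixels:
print(yolo_line_to_box("0 0.5 0.3 0.6 0.25", 1654, 2339))
```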
Fig.8  Information distribution of the labeled dataset
Category  Parameter
Operating system  Windows 10
CPU  Intel Core i7 9700K
GPU  NVIDIA GeForce RTX2080Ti
SSD  500GB
Python  Python 3.8
CUDA  CUDA 11.3
PyTorch  PyTorch 1.11
OpenCV  OpenCV 4.3.2
Table 1  Experimental environment configuration
Fig.9  Loss function curves
Fig.10  Model training results
Algorithm mAP/% F1/% Precision/% Recall/%
Faster R-CNN 92.35 93.24 92.83 93.65
SSD 95.28 97.89 96.58 99.23
YOLOv3 93.04 93.70 93.16 94.25
YOLOv4 93.25 94.29 93.87 94.72
YOLOv5-ECA-BiFPN 99.47 99.88 99.84 99.93
Table 3  Performance comparison with baseline algorithms
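The F1 values in Table 3 follow from the reported precision and recall via the harmonic mean F1 = 2PR/(P+R); the short check below reproduces the 99.88% figure for YOLOv5-ECA-BiFPN and the 1.99-percentage-point gap to the next best baseline (SSD).

```python
# Sanity check of Table 3: F1 is the harmonic mean of precision and recall.
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

print(round(f1(99.84, 99.93), 2))          # ~99.88 (YOLOv5-ECA-BiFPN)
print(round(f1(99.84, 99.93) - 97.89, 2))  # ~1.99 points above SSD's F1
```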
Fig.11  Recognition results under noisy image interference
Fig.12  Recognition results on low-resolution images
Fig.13  Recognition results on unevenly distributed images
Fig.14  Recognition results on diverse images
Fig.15  Extraction results for multiple types of images
Fig.16  Extraction results for figures and tables with semantic deviation
Fig.17  Comparison of recognition results on vector images
Fig.18  Table recognition results
[1] Ding Pei. The Technical Framework and Research Progress of Knowledge Discovery in Academic Figures and Tables[J]. Library and Information Service, 2021, 65(23): 136-148.
doi: 10.13266/j.issn.0252-3116.2021.23.015
[2] Liu Y L, Si C K, Jin K, et al. FCENet: An Instance Segmentation Model for Extracting Figures and Captions From Material Documents[J]. IEEE Access, 2020, 9: 551-564.
doi: 10.1109/Access.6287639
[3] Clark C, Divvala S. PDFFigures 2.0: Mining Figures from Research Papers[C]// Proceedings of 2016 IEEE/ACM Joint Conference on Digital Libraries. 2016: 143-152.
[4] Yu Fengchang, Lu Wei. Constructing Data Set for Location Annotations of Academic Literature Figures and Tables[J]. Data Analysis and Knowledge Discovery, 2020, 4(6): 35-42.
[5] Glyph & Cog. Xpdf[EB/OL]. [2022-09-13]. http://www.xpdfreader.com.
[6] Choudhury S R, Giles C L. An Architecture for Information Extraction from Figures in Digital Libraries[C]// Proceedings of the 24th International Conference on World Wide Web. 2015: 667-672.
[7] Simon A, Pret J C, Johnson A P. A Fast Algorithm for Bottom-Up Document Layout Analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(3): 273-277.
doi: 10.1109/34.584106
[8] Apache Software Foundation. Apache PDFBox[EB/OL]. [2022-05-13]. https://pdfbox.apache.org.
[9] Yusuke S. PDFMiner[EB/OL]. [2022-09-13]. https://github.com/euske/pdfminer.
[10] Hassan T. Object-Level Document Analysis of PDF Files[C]// Proceedings of the 9th ACM Symposium on Document Engineering. 2009: 47-55.
[11] Yu Fengchang, Cheng Qikai, Lu Wei. Locating Academic Literature Figures and Tables with Geometric Object Clustering[J]. Data Analysis and Knowledge Discovery, 2021, 5(1): 140-149.
[12] Praczyk P A, Nogueras-Iso J. Automatic Extraction of Figures from Scientific Publications in High-Energy Physics[J]. Information Technology and Libraries, 2013, 32(4): 25-52.
doi: 10.6017/ital.v32i4.3670
[13] Li P Y, Jiang X Y, Shatkay H. Figure and Caption Extraction from Biomedical Documents[J]. Bioinformatics, 2019, 35(21): 4381-4388.
doi: 10.1093/bioinformatics/btz228 pmid: 30949681
[14] Siegel N, Lourie N, Power R, et al. Extracting Scientific Figures with Distantly Supervised Neural Networks[C]// Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. 2018: 223-232.
[15] Li P Y, Jiang X Y, Shatkay H. Figure and Caption Extraction from Biomedical Documents[J]. Bioinformatics, 2019, 35(21): 4381-4388.
doi: 10.1093/bioinformatics/btz228 pmid: 30949681
[16] Chen K, Seuret M, Liwicki M, et al. Page Segmentation of Historical Document Images with Convolutional Autoencoders[C]// Proceedings of the 13th International Conference on Document Analysis and Recognition. 2015: 1011-1015.
[17] Amin A, Shiu R. Page Segmentation and Classification Utilizing Bottom-Up Approach[J]. International Journal of Image and Graphics, 2001, 1(2): 345-361.
doi: 10.1142/S0219467801000219
[18] Mehri M, Héroux P, Gomez-Krämer P, et al. Texture Feature Benchmarking and Evaluation for Historical Document Image Analysis[J]. International Journal on Document Analysis and Recognition, 2017, 20(1): 1-35.
doi: 10.1007/s10032-016-0278-y
[19] Ha J, Haralick R M, Phillips I T. Recursive X-Y Cut Using Bounding Boxes of Connected Components[C]// Proceedings of the 3rd International Conference on Document Analysis and Recognition. 1995: 952-955.
[20] Zhang Jiandong, Chen Shiji, Xu Xiaoting, et al. Extracting PDF Tables Based on Word Vectors[J]. Data Analysis and Knowledge Discovery, 2021, 5(8): 34-44.
[21] Hassan T, Baumgartner R. Table Recognition and Understanding from PDF Files[C]// Proceedings of the 9th International Conference on Document Analysis and Recognition. 2007: 1143-1147.
[22] Tang Rui, Deng Jianxin, Ye Zhixing, et al. Survey of Table Extraction in PDF Documents[J]. Computer Applications and Software, 2021, 38(7): 1-7.
[23] Ultralytics. YOLOv5[OL].[2022-11-12]. https://github.com/ultralytics/yolov5.
[24] Lin T Y, Dollár P, Girshick R, et al. Feature Pyramid Networks for Object Detection[C]// Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. 2017: 936-944.
[25] Liu S, Qi L, Qin H F, et al. Path Aggregation Network for Instance Segmentation[C]// Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018: 8759-8768.
[26] Wang Q L, Wu B G, Zhu P F, et al. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11531-11539.
[27] Tan M X, Pang R M, Le Q V. EfficientDet: Scalable and Efficient Object Detection[C]// Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 10778-10787.