%A Yu Fengchang,Cheng Qikai,Lu Wei %T Locating Academic Literature Figures and Tables with Geometric Object Clustering %0 Journal Article %D 2021 %J Data Analysis and Knowledge Discovery %R 10.11925/infotech.2096-3467.2020.0630 %P 140-149 %V 5 %N 1 %U {https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/abstract/article_4923.shtml} %8 2021-01-25 %X

[Objective] This paper aims to improve the recall of figure/table extraction from academic literature. [Methods] First, we extracted geometric objects from the PDF files of the literature. Second, we obtained a priori information on the extent of figures/tables from two perspectives: analysis of the underlying PDF encoding and image understanding. Third, we merged the geometric objects using K-means clustering. Finally, we reconstructed the text content with a heuristic algorithm to determine the locations of figures/tables. [Results] On the experimental dataset, the proposed algorithm achieved a precision of 0.915 and a recall of 0.918. Its precision is close to that of state-of-the-art algorithms, while its recall is 0.193 higher (a 26.6% relative improvement over existing methods). [Limitations] Documents with complex layouts or irregular use of symbols may cause errors. The choice of the number of clusters k and the text-filtering algorithm could be improved. [Conclusions] The proposed algorithm effectively increases the recall of figures/tables located in academic literature.
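To make the K-means merging step concrete, here is a minimal sketch, not the authors' implementation. It assumes that each extracted geometric object (line, rectangle, curve) is reduced to an axis-aligned bounding box in PDF coordinates, that objects are clustered on their box centers, and that k is supplied from outside (the abstract lists choosing k as an open limitation). The names `Box`, `kmeans`, and `merge_into_regions`, along with the farthest-first seeding, are hypothetical choices made for illustration. The a priori scope information and the heuristic text reconstruction are not modeled here.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box of one geometric object, in PDF points.
    (x0, y0) is the lower-left corner and (x1, y1) is the upper-right corner."""
    x0: float
    y0: float
    x1: float
    y1: float

    def center(self) -> tuple[float, float]:
        return ((self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2)


def kmeans(points, k, iters=100):
    """Lloyd's K-means on 2-D points with deterministic farthest-first seeding."""
    # Seeding (an assumption): start from the first point, then repeatedly add
    # the point that is farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                centroids[j] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
    return labels


def merge_into_regions(boxes, k):
    """Cluster boxes by center, then return each cluster's union bounding box
    as a candidate figure/table region, sorted left-to-right."""
    labels = kmeans([b.center() for b in boxes], k)
    regions = []
    for j in range(k):
        members = [b for b, lab in zip(boxes, labels) if lab == j]
        if members:
            regions.append(Box(min(b.x0 for b in members), min(b.y0 for b in members),
                               max(b.x1 for b in members), max(b.y1 for b in members)))
    return sorted(regions, key=lambda r: (r.x0, r.y0))


# Usage: two well-separated groups of drawing primitives on one page.
boxes = [
    Box(50, 500, 60, 510), Box(140, 590, 150, 600), Box(90, 540, 110, 560),
    Box(300, 100, 310, 110), Box(390, 190, 400, 200), Box(340, 140, 360, 160),
]
print(merge_into_regions(boxes, k=2))
# → [Box(x0=50, y0=500, x1=150, y1=600), Box(x0=300, y0=100, x1=400, y1=200)]
```

Taking the union of each cluster's boxes yields one candidate region per cluster. In a full pipeline these regions would still need to be checked against the a priori scope information and the surrounding text before being accepted as figures/tables.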