Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (1): 131-138    DOI: 10.11925/infotech.2096-3467.2019.0943
Retrieving Scientific Documents with Formula Description Structure and Word Embedding
Xinyu Zai, Xuedong Tian
School of Cyber Security and Computer, Hebei University, Baoding 071002, China
[Objective] This study proposes a scientific document retrieval method that combines formula matching with text ranking to address the challenges posed by mathematical expressions. [Methods] First, we parsed the mathematical expressions with an analysis algorithm for formula description structure. Then, we extracted the formula structure information and retrieved scientific documents based on the mathematical expressions. Meanwhile, we obtained word vectors for the query keywords and for the documents using a word embedding model. Finally, we ranked the documents by the similarity between the two sets of word vectors. [Results] The recall and precision of the new model were 0.77 and 0.63, respectively, which were 24.2% and 23.5% higher than those of traditional scientific document retrieval methods. [Limitations] Our method handles only expressions in LaTeX format. [Conclusions] The proposed model, which combines formulas and document keywords, improves the performance of scientific document retrieval.
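The text-ranking step described in [Methods] can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how word vectors are aggregated or which similarity measure is used, so averaging the word vectors and comparing them with cosine similarity are assumptions, and the names `rank_documents`, `mean_vector`, `TOY_EMBEDDINGS`, and `DOCS` are hypothetical. The formula-matching stage is not shown.

```python
import math

# Hypothetical 2-dimensional embeddings, for illustration only.
TOY_EMBEDDINGS = {
    "integral": [1.0, 0.0],
    "derivative": [0.9, 0.1],
    "matrix": [0.0, 1.0],
    "vector": [0.1, 0.9],
}

# Hypothetical documents, each reduced to a list of keywords.
DOCS = {
    "doc_calculus": ["integral", "derivative"],
    "doc_algebra": ["matrix", "vector"],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(words, embeddings, dim):
    """Average the vectors of the known words (assumed aggregation scheme)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def rank_documents(query_words, docs, embeddings, dim):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    query_vec = mean_vector(query_words, embeddings, dim)
    scored = [
        (doc_id, cosine(query_vec, mean_vector(words, embeddings, dim)))
        for doc_id, words in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in rank_documents(["integral"], DOCS, TOY_EMBEDDINGS, dim=2):
        print(doc_id, round(score, 3))
```

In a full pipeline of the kind the abstract outlines, `DOCS` would presumably hold only the documents already retrieved by formula matching, and the embeddings would come from a trained word embedding model rather than a hand-written table.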

Received: 13 August 2019      Published: 14 March 2020
 ZTFLH: TP311
Corresponding Authors: Xuedong Tian     E-mail: xuedong_tian@126.com