Data Analysis and Knowledge Discovery  2020, Vol. 4 Issue (7): 118-126    DOI: 10.11925/infotech.2096-3467.2019.1294
Retrieving Mathematical Expressions Based on Hesitant Fuzzy Weight
Xu Yicong, Tian Xuedong, Li Xinfu, Yang Fang, Shi Qingxuan
School of Cyber Security and Computer, Hebei University, Baoding 071002, China
Abstract  

[Objective] This paper proposes a retrieval method for mathematical expressions that finds the expressions matching a query in a large collection of mathematical expressions. [Methods] First, we extracted the characteristic subformulas of each mathematical expression and introduced the theory of hesitant fuzzy sets (HFSs) to compute their weights. Second, for each indexed expression, we summed the weights of its matched subformulas to obtain the similarity score between that expression and the query. Finally, we ranked the retrieved results by similarity score. [Results] The proposed method achieved higher retrieval efficiency and better retrieval results than traditional methods, with a highest NDCG value of 0.88. [Limitations] Our method does not fully address the semantics of mathematical expressions. [Conclusions] The proposed method retrieves the required mathematical expressions more accurately.
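The scoring and ranking steps in the abstract (sum the weights of the subformulas an indexed expression shares with the query, then sort by that sum) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' C# implementation: the function name `rank_expressions`, the `postings` layout (subformula hash mapped to a list of `(expr_id, weight)` pairs), and the example weights are all hypothetical.

```python
from collections import defaultdict


def rank_expressions(query_hashes, postings):
    """Score indexed expressions by summed subformula weights, highest first.

    query_hashes: hashes of the subformulas extracted from the query.
    postings: HYPOTHETICAL inverted index, hash -> list of (expr_id, weight).
    """
    scores = defaultdict(float)
    for h in set(query_hashes):              # count each distinct query subformula once
        for expr_id, weight in postings.get(h, []):
            scores[expr_id] += weight        # accumulate the weight per expression
    # Descending score; ties broken by expression id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    index = {
        "A": [("e1", 0.5), ("e2", 0.25)],
        "B": [("e1", 0.25)],
        "C": [("e3", 1.0)],
    }
    print(rank_expressions(["A", "B"], index))
```

Here `e3` never appears in the result because none of its subformulas occur in the query; only expressions reached through the index are scored.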

Key words: Mathematical Expression Retrieval; HFSs; Weight of Subformula; Similarity Score
Received: 02 December 2019      Published: 25 July 2020
Chinese Library Classification (ZTFLH): TP393; G250
Corresponding Author: Tian Xuedong     E-mail: xuedong_tian@126.com

Cite this article:

Xu Yicong, Tian Xuedong, Li Xinfu, Yang Fang, Shi Qingxuan. Retrieving Mathematical Expressions Based on Hesitant Fuzzy Weight. Data Analysis and Knowledge Discovery, 2020, 4(7): 118-126.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2019.1294     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2020/V4/I7/118

The Process of Mathematical Expression Retrieval
| Subformula | MD5 of subformula | $\{u_{l}, u_{n}, u_{level}\}$ |
|---|---|---|
| $x=\frac{-b \pm \sqrt{b^{2}-4 a c}}{2 a}$ | B91DA5EC3DE8F0E3 | {1.000, 1.000, 1.000} |
| $\frac{-b \pm \sqrt{b^{2}-4 a c}}{2 a}$ | 82E1ED17885C94AD | {0.941, 0.857, 1.000} |
| $-b \pm \sqrt{b^{2}-4 a c}$ | EE207D7D9882D3AB | {0.588, 0.714, 0.796} |
| $\sqrt{b^{2}-4 a c}$ | 9099FE6F46BF69A5 | {0.441, 0.600, 0.796} |
| $b^{2}-4 a c$ | 965B906F4467622C | {0.265, 0.286, 0.693} |
| $b^{2}$ | F9DD6D7A16C781A2 | {0.147, 0.143, 0.693} |
| $-b$ | A55F0819B2F990F6 | {0.059, 0.143, 0.796} |
Expression Information Description
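Each subformula in the table above carries three membership values, which in HFS terms form a hesitant fuzzy element. The paper's exact formula for turning these into a subformula weight is not given in this excerpt, so the sketch below ASSUMES the standard HFE score function of Xia and Xu (the arithmetic mean of the membership values) as a stand-in. `HFE_TABLE`, `hfe_score`, and `weights` are hypothetical names; the numbers are copied from the table.

```python
# Membership values copied from the table: subformula -> (u_l, u_n, u_level).
HFE_TABLE = {
    r"x=\frac{-b \pm \sqrt{b^{2}-4 a c}}{2 a}": (1.000, 1.000, 1.000),
    r"\frac{-b \pm \sqrt{b^{2}-4 a c}}{2 a}": (0.941, 0.857, 1.000),
    r"-b \pm \sqrt{b^{2}-4 a c}": (0.588, 0.714, 0.796),
    r"\sqrt{b^{2}-4 a c}": (0.441, 0.600, 0.796),
    r"b^{2}-4 a c": (0.265, 0.286, 0.693),
    r"b^{2}": (0.147, 0.143, 0.693),
    r"-b": (0.059, 0.143, 0.796),
}


def hfe_score(memberships):
    """Score of a hesitant fuzzy element: the mean of its membership values.

    ASSUMPTION: used here in place of the paper's own weight formula.
    """
    if not memberships:
        raise ValueError("a hesitant fuzzy element needs at least one value")
    return sum(memberships) / len(memberships)


# Hypothetical subformula weights derived from the table rows.
weights = {subf: hfe_score(hfe) for subf, hfe in HFE_TABLE.items()}

if __name__ == "__main__":
    for subf, w in sorted(weights.items(), key=lambda kv: -kv[1]):
        print(f"{w:.3f}  {subf}")
```

Under this assumed score, $-b$ (about 0.333) ranks slightly above $b^{2}$ (about 0.328) because its $u_{level}$ value is higher, even though $b^{2}$ is longer.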
Characteristic Subformula Inverted Index Structure
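The subformula table lists a 16-hex-digit MD5 code for every subformula, which suggests the inverted index is keyed by these codes. A minimal sketch of such an index follows. ASSUMPTIONS: the 16-digit code is taken as the middle 16 characters of the 32-character MD5 hex digest (a common "16-digit MD5" convention), and the LaTeX string is hashed without further normalization; because the paper's exact preprocessing is not stated here, this need not reproduce the codes in the table. The names `subformula_key` and `build_inverted_index` are hypothetical.

```python
import hashlib
from collections import defaultdict


def subformula_key(latex: str) -> str:
    """16-hex-digit key for a subformula string.

    ASSUMPTION: middle 16 characters of the upper-case MD5 hex digest.
    """
    digest = hashlib.md5(latex.encode("utf-8")).hexdigest().upper()
    return digest[8:24]


def build_inverted_index(expressions):
    """Map each subformula key to the expressions that contain it.

    expressions: expr_id -> list of (subformula_latex, weight).
    Returns key -> list of (expr_id, subformula_latex, weight); the LaTeX is
    kept in each posting so that a hash collision can be detected on lookup.
    """
    index = defaultdict(list)
    for expr_id, subformulas in expressions.items():
        for latex, weight in subformulas:
            index[subformula_key(latex)].append((expr_id, latex, weight))
    return dict(index)


if __name__ == "__main__":
    idx = build_inverted_index({
        "e1": [("b^{2}", 0.33), ("-b", 0.33)],
        "e2": [("b^{2}", 0.50)],
    })
    for key, postings in idx.items():
        print(key, postings)
```

Shared subformulas such as `b^{2}` collapse onto one key, so a query only touches the posting lists of its own subformulas instead of scanning every expression.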
Mathematical Expression Retrieval and Matching Process
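| Experimental environment | Configuration |
|---|---|
| CPU model | Intel(R) Core(TM) i7-7700, 3.6 GHz |
| Memory | 8 GB |
| Operating system | Microsoft Windows 10 |
| Main development tools | Visual Studio 2017, C# |
| Architecture | C/S (client/server) |
Development Environment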
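| Number of documents | Number of mathematical expressions | Total index size (MB) | Index build time (ms) |
|---|---|---|---|
| 1,024 | 1,908 | 0.47 | 0.501 |
| 10,240 | 115,797 | 24.92 | 15.411 |
| 20,480 | 251,018 | 55.13 | 32.603 |
| 31,741 | 391,955 | 82.96 | 79.957 |
Index File Size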
| Mathematical expressions in index | Index file size (MB) | Index build time (ms) |
|---|---|---|
| 1,000 | 0.13 | 0.070 |
| 10,000 | 3.78 | 1.641 |
| 100,000 | 48.21 | 26.540 |
| 138,539 | 74.99 | 33.909 |
Index File Size in Literature [6]
Retrieval Time of Different Methods
NDCG Value Comparison
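The abstract reports a highest NDCG value of 0.88. For readers unfamiliar with the metric, a minimal NDCG@k sketch follows. ASSUMPTIONS: graded relevance judgments, the linear gain $rel_i/\log_2(i+1)$ with ranks starting at 1, and an ideal ranking built by sorting the same judged list; the paper may use a different gain (for example $2^{rel_i}-1$) or build the ideal ranking over all relevant expressions. `dcg` and `ndcg` are hypothetical names.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain of the top-k graded relevances.

    Uses the linear gain rel_i / log2(i + 1), with rank i starting at 1.
    """
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances, k):
    """DCG@k divided by the DCG@k of the same grades in ideal (sorted) order."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance grades of a hypothetical result list, in ranked order.
    print(round(ndcg([1, 3], 2), 4))
```

A list already in ideal order scores exactly 1.0, and placing the most relevant result lower lowers the score, which is why NDCG rewards methods that rank the best-matching expressions first.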
[1] Lin X Y, Gao L C, Hu X, et al. A Mathematics Retrieval System for Formulae in Layout Presentations[C]// Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. 2014: 697-706.
[2] Mišutka J, Galamboš L. System Description: EgoMath2 as a Tool for Mathematical Searching on Wikipedia.org[C]// Proceedings of the 10th International Conference on Intelligent Computer Mathematics. 2011: 307-309.
[3] Sojka P, Líška M. Indexing and Searching Mathematics in Digital Libraries[C]// Proceedings of the 10th International Conference on Intelligent Computer Mathematics. 2011: 228-243.
[4] Hambasan R, Kohlhase M, Prodescu C C. MathWebSearch at NTCIR-11[C]// Proceedings of the 11th NTCIR Conference. 2014: 114-119.
[5] Zhou Nan, Tian Xuedong. Analyzing and Indexing Method on LaTeX Formulae[J]. Journal of Computer Applications, 2016, 36(3): 833-836, 842. (in Chinese)
[6] Zhou Nan. A Retrieval Model of Mathematical Expressions Based on Hierarchical Structures of Formulae[D]. Baoding: Hebei University, 2016. (in Chinese)
[7] Hu X, Gao L C, Lin X Y, et al. WikiMirs: A Mathematical Information Retrieval System for Wikipedia[C]// Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. 2013: 11-20.
[8] Wang Y H, Gao L C, Wang S M, et al. WikiMirs 3.0: A Hybrid MIR System Based on the Context, Structure and Importance of Formulae in a Document[C]// Proceedings of the 15th ACM/IEEE-CS Joint Conference on Digital Libraries. 2015: 173-182.
[9] Stalnaker D, Zanibbi R. Math Expression Retrieval Using an Inverted Index over Symbol Pairs[C]// Proceedings of SPIE-IS&T Electronic Imaging. 2015, 9402: 940207.
[10] Xu Y X, Su W, Cheng M, et al. N-gram Index Structure Study for Semantic Based Mathematical Formula[C]// Proceedings of the 10th International Conference on Computational Intelligence and Security. 2014: 293-298.
[11] Wang Xiaolong. Research on Ontology-Based Mathematical Expression Retrieval Technologies[D]. Chongqing: Chongqing University, 2014. (in Chinese)
[12] Yang S Q, Tian X D. A Maintenance Algorithm of FDS Based Mathematical Expression Index[C]// Proceedings of the 2014 International Conference on Machine Learning and Cybernetics. 2014: 888-892.
[13] Xu Jianmin, Xu Caiyun. Computing Similarity of Sci-Tech Documents Based on Texts and Formulas[J]. Data Analysis and Knowledge Discovery, 2018, 2(10): 103-109. (in Chinese)
[14] Li Xiameng, Pan Guangzhen. Message-digest Algorithm 5-IDEA Based Hybrid Encryption Algorithm[J]. Science Technology and Engineering, 2017, 17(9): 233-238. (in Chinese)
[15] Torra V. Hesitant Fuzzy Sets[J]. International Journal of Intelligent Systems, 2010, 25(6): 529-539.
[16] Torra V, Narukawa Y. On Hesitant Fuzzy Sets and Decision[C]// Proceedings of the 2009 IEEE International Conference on Fuzzy Systems. 2009: 1378-1382.
[17] Xu Z S, Xia M M. Distance and Similarity Measures for Hesitant Fuzzy Sets[J]. Information Sciences, 2011, 181(11): 2128-2138.
[18] Zhang Kaige. Research on the Ranking of Mathematical Retrieval Results Based on Hesitant Fuzzy Sets[D]. Baoding: Hebei University, 2017. (in Chinese)
[19] Jing Ke. Research on Math Query Language and Index in Web-based Math Search[D]. Lanzhou: Lanzhou University, 2009. (in Chinese)
[20] Xu Yuexia. N-gram Index Structure for Semantic Based Mathematical Formulas[D]. Lanzhou: Lanzhou University, 2015. (in Chinese)
[21] Jin X B, Geng G G, Xie G S, et al. Approximately Optimizing NDCG Using Pair-wise Loss[J]. Information Sciences, 2018, 453: 50-65. DOI: 10.1016/j.ins.2018.04.033.