New Technology of Library and Information Service, 2016, Vol. 32, Issue 9: 88-94     https://doi.org/10.11925/infotech.1003-3513.2016.09.11
Application Paper
Using Bidirectional Pattern Matching Model to Pre-Process Yearbook Data
Shi Liting1, Zhang Qian2, Zhong Yongheng1, Hu Sisi1, Li Zhenzhen1
1 Wuhan Library, Chinese Academy of Sciences, Wuhan 430071, China
2 The Ninth General Design Department, China Aerospace Science and Industry Corporation, Wuhan 430040, China
Abstract

[Objective] To store yearbook indicator data in a structured form and to support updating and entering new yearbook data. [Context] The yearbook pre-processing platform is a client/server (C/S) tool for collating, reviewing, and uploading yearbook data. It is written mainly in VC++ and supplies the data on which the yearbook database is built. [Methods] Bidirectional pattern matching is an improvement on the Wu-Manber (WM) multi-pattern matching algorithm. Word segmentation extracts information elements from each entered indicator, a stored procedure prunes the pattern set, and the remaining candidates are matched in both directions to keep matching accurate and efficient. [Results] An analysis of the matching results for experimental data entry shows that bidirectional pattern matching achieves a high indicator matching rate and a high accuracy rate. [Conclusions] The bidirectional matching algorithm meets the needs of yearbook data entry and improves the efficiency of yearbook data pre-processing.
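
The three steps named under [Methods] can be illustrated with a minimal Python sketch. It is not the platform's VC++ implementation, and it leaves out the Wu-Manber shift tables entirely. Several things are assumptions made only for illustration: whitespace splitting stands in for the word segmenter, an in-memory inverted index stands in for the stored procedure that prunes the pattern set, "bidirectional" is read as requiring the entered indicator and the stored pattern to cover each other's elements, and the indicator names are hypothetical.

```python
from collections import defaultdict


def extract_elements(text):
    """Stand-in for word segmentation: split on whitespace and lowercase."""
    return [tok.lower() for tok in text.split()]


def build_index(patterns):
    """Map each information element to the ids of the patterns containing it."""
    index = defaultdict(set)
    for pid, pattern in enumerate(patterns):
        for elem in extract_elements(pattern):
            index[elem].add(pid)
    return index


def prune(index, elements):
    """Keep only patterns that share at least one element with the input.

    Assumption: this plays the role of the stored procedure in the abstract.
    """
    candidates = set()
    for elem in elements:
        candidates |= index.get(elem, set())
    return candidates


def bidirectional_match(patterns, index, entered):
    """Return patterns whose elements cover, and are covered by, the input."""
    elements = set(extract_elements(entered))
    matches = []
    for pid in sorted(prune(index, elements)):
        pattern_elements = set(extract_elements(patterns[pid]))
        forward = elements <= pattern_elements   # every input element is in the pattern
        backward = pattern_elements <= elements  # every pattern element is in the input
        if forward and backward:
            matches.append(patterns[pid])
    return matches


# Hypothetical indicator names, used only to exercise the sketch.
patterns = [
    "total population year-end",
    "urban population year-end",
    "gross regional product",
]
index = build_index(patterns)
print(bidirectional_match(patterns, index, "Year-end total population"))
# prints ['total population year-end']
```

Pruning first means the two-way check runs only on patterns that share at least one element with the entered indicator, which is the efficiency argument the abstract makes for reducing the pattern set before matching.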

Key words: Bidirectional pattern matching; Yearbook data; WM algorithm
Received: 2016-03-09      Published: 2016-10-19
Cite this article:
Shi Liting, Zhang Qian, Zhong Yongheng, Hu Sisi, Li Zhenzhen. Using Bidirectional Pattern Matching Model to Pre-Process Yearbook Data[J]. New Technology of Library and Information Service, 2016, 32(9): 88-94.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.09.11      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I9/88
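Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing  Postal code: 100190
Tel/Fax: (010) 82626611-6626, 82624938
E-mail: jishu@mail.las.ac.cn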