Using Bidirectional Pattern Matching Model to Pre-Process Yearbook Data
Shi Liting1, Zhang Qian2, Zhong Yongheng1, Hu Sisi1, Li Zhenzhen1
1 Wuhan Library, Chinese Academy of Sciences, Wuhan 430071, China
2 The 9th Designing of China Aerospace Science Industry Corporation, Wuhan 430040, China
[Objective] This study aims to store yearbook records as structured data that can be updated regularly. [Context] The yearbook data pre-processing system is a client/server (C/S) tool platform for collecting, auditing, and uploading data. It was developed in VC++ and generates content for the yearbook database. [Methods] We first modified the classic Wu-Manber (WM) algorithm to build a new bidirectional pattern matching model. With the help of word segmentation, the new model extracts metadata from the original records. We then reduced the number of pattern sets through the data storage procedure and matched the records bidirectionally to ensure the effectiveness and efficiency of the system. [Results] The proposed algorithm achieved a high matching rate and high accuracy. [Conclusions] The bidirectional matching algorithm meets the needs of yearbook data entry and improves the efficiency of the data pre-processing system.
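For orientation, the classic WM multi-pattern matching idea that the model starts from can be sketched roughly as follows. This is a simplified illustration of the standard shift-table technique only, not the authors' modified bidirectional model, which the abstract does not specify. The function names `build_tables` and `wm_search`, the block size of 2, and the sample text and patterns are all assumptions made for the example.

```python
from collections import defaultdict


def build_tables(patterns, block=2):
    """Build simplified Wu-Manber SHIFT and HASH tables (illustrative only)."""
    m = min(len(p) for p in patterns)  # only the first m chars of each pattern are indexed
    if m < block:
        raise ValueError("every pattern must have at least `block` characters")
    default_shift = m - block + 1      # safe skip when a block occurs in no pattern prefix
    shift = {}
    hash_table = defaultdict(list)
    for p in patterns:
        for j in range(block, m + 1):  # j is the end offset of the block inside p[:m]
            blk = p[j - block:j]
            shift[blk] = min(shift.get(blk, default_shift), m - j)
        hash_table[p[m - block:m]].append(p)  # candidates keyed by the prefix's last block
    return m, default_shift, shift, hash_table


def wm_search(text, patterns, block=2):
    """Return sorted (start_index, pattern) pairs for every occurrence in text."""
    m, default_shift, shift, hash_table = build_tables(patterns, block)
    hits = []
    pos = m  # exclusive end of the current window of length m
    while pos <= len(text):
        blk = text[pos - block:pos]
        s = shift.get(blk, default_shift)
        if s > 0:
            pos += s                   # block cannot end a match here, so skip ahead
            continue
        start = pos - m
        for p in hash_table.get(blk, []):
            if text.startswith(p, start):  # verify the full candidate pattern
                hits.append((start, p))
        pos += 1
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical yearbook-style record and pattern set.
    print(wm_search("GDP of Wuhan in 2015", ["GDP", "Wuhan", "2015"]))
```

The shift table lets the scan skip over text whose trailing block cannot end any pattern prefix, which is why a smaller pattern set (as the Methods describe) tends to give larger skips and fewer candidate checks.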
Shi Liting, Zhang Qian, Zhong Yongheng, Hu Sisi, Li Zhenzhen. Using Bidirectional Pattern Matching Model to Pre-Process Yearbook Data[J]. New Technology of Library and Information Service, 2016, 32(9): 88-94.