A Retrieval Method Incorporating Syntactic Information for Text Corpora
Zhang Yongwei, Liu Ting, Liu Chang, Wu Bingxin, Yu Jingsong
(School of Chinese Language and Literature, University of Chinese Academy of Social Sciences, Beijing 102488, China)
(Center for Corpus and Computational Linguistics Research, Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100732, China)
(School of Software and Microelectronics, Peking University, Beijing 102600, China)
Abstract
[Objective] This study aims to explore an efficient method for retrieving syntactic information in large text corpora.
[Methods] Linearized indices are built for syntactic information according to its characteristics. These indices directly supply the information required for conditional matching during retrieval, which improves retrieval efficiency.
[Results] Query speed was tested on the People's Daily Corpus, which contains 28.51 million sentences. The average time across 26 queries was 802.6 milliseconds, which meets the efficiency requirements of retrieval systems for large corpora.
[Limitations] More research is needed to evaluate the proposed method with a larger set of queries.
[Conclusions] The proposed method enables fast retrieval of lexical, dependency-syntactic, and constituency-syntactic information in large text corpora.
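
The abstract does not spell out the index layout, so the following is only a minimal sketch of what a linearized index over dependency annotations might look like. It assumes token-aligned parallel arrays (word, part of speech, head position, relation label) plus an inverted index over word forms, so that a condition such as "word X attached by relation R to a head with part of speech P" can be checked by direct array lookups. The class name `LinearIndex`, its fields, and the tag labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LinearIndex:
    """Illustrative token-aligned arrays plus an inverted index over word forms."""
    words: list = field(default_factory=list)
    pos_tags: list = field(default_factory=list)
    heads: list = field(default_factory=list)     # global position of the head, -1 for root
    deprels: list = field(default_factory=list)
    sent_ids: list = field(default_factory=list)
    postings: dict = field(default_factory=lambda: defaultdict(list))

    def add_sentence(self, sent_id, tokens):
        """tokens: list of (word, pos, head, deprel); head is the 0-based index
        of the head token within the sentence, or -1 for the root."""
        base = len(self.words)  # global position of the sentence's first token
        for word, pos, head, deprel in tokens:
            self.postings[word].append(len(self.words))
            self.words.append(word)
            self.pos_tags.append(pos)
            self.heads.append(base + head if head >= 0 else -1)
            self.deprels.append(deprel)
            self.sent_ids.append(sent_id)

    def find(self, word, deprel=None, head_pos=None):
        """Return (sentence id, global position) for each occurrence of `word`
        that satisfies the optional relation and head part-of-speech conditions."""
        hits = []
        for i in self.postings.get(word, []):
            if deprel is not None and self.deprels[i] != deprel:
                continue
            h = self.heads[i]
            if head_pos is not None and (h < 0 or self.pos_tags[h] != head_pos):
                continue
            hits.append((self.sent_ids[i], i))
        return hits


# Toy data, invented for illustration only.
index = LinearIndex()
index.add_sentence(0, [("dogs", "NOUN", 1, "nsubj"), ("bark", "VERB", -1, "root")])
index.add_sentence(1, [("feed", "VERB", -1, "root"), ("dogs", "NOUN", 0, "obj")])
subject_hits = index.find("dogs", deprel="nsubj", head_pos="VERB")
```

In this sketch every condition is resolved by indexing into flat arrays at positions taken from the posting list, with no tree traversal at query time. That is one plausible reading of indices that "directly supply the information required for conditional matching"; the paper's actual design may differ.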
Published: 01 July 2022