Data Analysis and Knowledge Discovery
A Retrieval Method Incorporating Syntactic Information for Text Corpora
Zhang Yongwei, Liu Ting, Liu Chang, Wu Bingxin, Yu Jingsong
(School of Chinese Language and Literature, University of Chinese Academy of Social Sciences, Beijing 102488, China) (Center for Corpus and Computational Linguistics Research, Institute of Linguistics, Chinese Academy of Social Sciences, Beijing 100732, China) (School of Software and Microelectronics, Peking University, Beijing 102600, China)
Abstract  

[Objective] This study aims to explore an efficient method for retrieving syntactic information in large text corpora.

[Methods] Linearized indices are built for syntactic information, designed around the features of that information. The indices directly supply the information needed for conditional matching during retrieval, which improves retrieval efficiency.
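
The abstract does not describe the index layout, so the following is only a minimal sketch of what a linearized index over dependency parses might look like. The `Token` record, the field names (`word`, `pos`, `deprel`, `head_word`), and the `build_index` and `search` helpers are all hypothetical and are not taken from the paper. The idea illustrated is that each token's posting already carries its head's word, so a condition on the head can be matched by set intersection without walking a tree.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    word: str
    pos: str
    head: int      # 0-based position of the head token, -1 for the root
    deprel: str    # dependency relation to the head


def build_index(sentences):
    """Map each (field, value) pair to the set of (sentence id, token position) hits."""
    index = defaultdict(set)
    for sid, sent in enumerate(sentences):
        for i, tok in enumerate(sent):
            index[("word", tok.word)].add((sid, i))
            index[("pos", tok.pos)].add((sid, i))
            index[("deprel", tok.deprel)].add((sid, i))
            if tok.head >= 0:
                # Assumption: the head's word is stored on the dependent's
                # posting, so head conditions need no tree traversal.
                index[("head_word", sent[tok.head].word)].add((sid, i))
    return index


def search(index, **conditions):
    """Return the sorted hits that satisfy every field=value condition."""
    postings = [index.get((field, value), set()) for field, value in conditions.items()]
    return sorted(set.intersection(*postings)) if postings else []


sentences = [
    [Token("She", "PRON", 1, "nsubj"), Token("reads", "VERB", -1, "root"), Token("books", "NOUN", 1, "obj")],
    [Token("He", "PRON", 1, "nsubj"), Token("writes", "VERB", -1, "root"), Token("books", "NOUN", 1, "obj")],
]
index = build_index(sentences)
hits = search(index, deprel="obj", head_word="reads")
print(hits)  # [(0, 2)]
```

In this toy layout a query such as "objects of the verb reads" reduces to intersecting two posting sets. A production index for a corpus of tens of millions of sentences would presumably use compressed, disk-backed postings rather than in-memory sets.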

[Results] Query speed was tested on the People's Daily Corpus, which contains 28.51 million sentences. The average time over 26 queries was 802.6 milliseconds, which meets the efficiency requirements of retrieval systems for large corpora.

[Limitations] More research is needed to evaluate the proposed method with a larger set of queries.

[Conclusions] The proposed method enables fast retrieval of lexical, dependency-syntactic, and constituency-syntactic information in large text corpora.


Key words: Dependency Syntax; Constituency Syntax; Corpus; Index; Retrieval
Published: 01 July 2022
ZTFLH:  TP393,G250  

Cite this article:

Zhang Yongwei, Liu Ting, Liu Chang, Wu Bingxin, Yu Jingsong. A Retrieval Method Incorporating Syntactic Information for Text Corpora. Data Analysis and Knowledge Discovery, 0, (): 1-.

URL:

https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2022-0093     OR     https://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y0/V/I/1
