Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 95-103    DOI: 10.11925/infotech.2096-3467.2021.0023
Generating AND-OR Logical Expressions for Semantic Features of Categorical Documents
Xu Zheng,Le Xiaoqiu
Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China

[Objective] This paper represents each category unit of a categorical document as an AND-OR logical expression over its semantic features, providing data for category-level semantic matching and retrieval. [Methods] Using category unit descriptions annotated with AND-OR logical semantics, we built a seq2seq generation model based on UniLM. The model incorporates part-of-speech features and explicit AND-OR logical text features, and the ranking strategy of Beam Search is improved, so that the method generates an AND-OR logical expression of the semantic features within each category unit. We then extended the external semantics of each category unit by integrating context-level semantics. [Results] On manually annotated International Patent Classification data, the method scored 87.2 points, 11.5 points higher than the benchmark model (BiLSTM-Attention). [Limitations] The model's performance on other datasets remains to be examined. [Conclusions] The proposed semantic representation method effectively generates AND-OR logical expressions for patent data, integrating the internal semantic features of each category unit with semantic features at the contextual level.

Key words: Semantic Representation; Semantic Parsing; AND-OR Logic; Categorical Document
Received: 10 January 2021      Published: 27 May 2021
Xu Zheng,Le Xiaoqiu. Generating AND-OR Logical Expressions for Semantic Features of Categorical Documents. Data Analysis and Knowledge Discovery, 2021, 5(5): 95-103.

A Collection of Logical Expressions for the Semantic Features of an Entry Unit
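Category | Note | AND-OR logical combination feature
E01C 21/02 | 现场熔化、煅烧或焙烧土壤 [melting, calcining, or roasting soil in situ] | 现场 AND (熔化 OR 煅烧 OR 焙烧) AND 土壤 [in situ AND (melting OR calcining OR roasting) AND soil]
E02B 7/16 | 固定堰;其上部结构或闸板 [fixed weirs; their superstructures or flashboards] | 固定堰 OR (固定堰 AND (上部结构 OR 闸板)) [fixed weir OR (fixed weir AND (superstructure OR flashboard))]
AND-OR Logical Combination Features within an Entry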
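To make the use of such expressions for category matching concrete, the following is a minimal sketch that parses an AND-OR expression into a tree and tests whether a set of terms satisfies it. The function names (`tokenize`, `parse`, `matches`), the rule that AND binds tighter than OR, and the term-set matching semantics are illustrative assumptions, not the authors' implementation. The example uses an English rendering of the E01C 21/02 expression with space-separated operators.

```python
import re

def tokenize(expr):
    # Split on parentheses and the AND/OR keywords; multi-word terms stay intact.
    parts = re.split(r"(\(|\)|\bAND\b|\bOR\b)", expr)
    return [p.strip() for p in parts if p.strip()]

def parse(expr):
    """Parse an AND-OR expression into nested tuples such as ("AND", left, right)."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        node = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        # Assumption: AND binds tighter than OR when parentheses are absent.
        nonlocal pos
        node = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            node = ("AND", node, parse_atom())
        return node

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            node = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return node
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok  # a plain semantic feature term

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree

def matches(tree, terms):
    """Return True if the set of terms satisfies the expression tree."""
    if isinstance(tree, str):
        return tree in terms
    op, left, right = tree
    if op == "AND":
        return matches(left, terms) and matches(right, terms)
    return matches(left, terms) or matches(right, terms)

tree = parse("in situ AND (melting OR calcining OR roasting) AND soil")
print(matches(tree, {"in situ", "calcining", "soil"}))  # True
print(matches(tree, {"melting", "soil"}))               # False
```

Because the parser rejects unbalanced or dangling operators, the same sketch could also serve as a well-formedness check on generated expressions.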
Technology Route
The Training Mechanism of Seq2Seq Mask of UniLM Model[20]
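As a rough illustration of the seq2seq mask used in UniLM-style training, the sketch below builds the self-attention mask for one source/target pair: source tokens attend to the whole source segment, while target tokens attend to the source and to target tokens up to and including themselves. The function name `seq2seq_mask`, and the reading that the source is the category note and the target is the AND-OR expression, are assumptions for illustration.

```python
def seq2seq_mask(src_len, tgt_len):
    """Return an n x n matrix where mask[i][j] == 1 means position i may attend to j."""
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j < src_len:
                # Every position (source or target) can see the full source segment.
                mask[i][j] = 1
            elif i >= src_len and j <= i:
                # Target positions see themselves and earlier target positions only.
                mask[i][j] = 1
    return mask

# Hypothetical example: 3 source tokens (note) and 2 target tokens (expression).
for row in seq2seq_mask(3, 2):
    print(row)
# [1, 1, 1, 0, 0]
# [1, 1, 1, 0, 0]
# [1, 1, 1, 0, 0]
# [1, 1, 1, 1, 0]
# [1, 1, 1, 1, 1]
```

The zero block in the upper right is what keeps the source encoding independent of the target, so the same network can be trained for generation without a separate decoder.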
Feature Combination: $E_f = W_c \cdot \mathrm{concat}(E_{token}, E_{pos}, E_{es})$ (1)
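The feature combination in Equation (1) can be sketched as a concatenation followed by a linear projection. This is a toy-sized illustration in plain Python: the function name `combine_features` is hypothetical, and the output width of W_c (taken here to equal the token embedding width) is an assumption, since the source only lists the three input dimensions (768, 768/2, 768/2).

```python
import random

def combine_features(e_token, e_pos, e_es, w_c):
    """Compute W_c . concat(e_token, e_pos, e_es) for one token position."""
    concat = e_token + e_pos + e_es  # list concatenation, not element-wise addition
    for row in w_c:
        if len(row) != len(concat):
            raise ValueError("each row of w_c must match the concatenated width")
    return [sum(w * x for w, x in zip(row, concat)) for row in w_c]

# Toy dimensions mirroring the 768 : 768/2 : 768/2 ratio (here 4 : 2 : 2).
random.seed(0)
d = 4
e_token = [random.random() for _ in range(d)]       # character embedding
e_pos = [random.random() for _ in range(d // 2)]    # part-of-speech embedding
e_es = [random.random() for _ in range(d // 2)]     # explicit AND-OR logic feature embedding
w_c = [[random.random() for _ in range(2 * d)] for _ in range(d)]  # assumed d x 2d projection

e_f = combine_features(e_token, e_pos, e_es, w_c)
print(len(e_f))  # 4
```

With the listed dimensions, the concatenated vector would have width 1536, and under the assumption above W_c would map it back to 768.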
The Semantic Construction Results of Hierarchical Relationships
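Parameter | Value
Batch Size | 8
Learning Rate | 10^-5
hidden_act | GELU
Hidden layer units | 768
hidden_dropout_prob | 0.1
Text truncation length | 128
Character embedding dimension | 768
Part-of-speech embedding dimension | 768/2
Explicit syntactic-logic feature embedding dimension | 768/2
Beam Search | 3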
Experiment Parameters Configuration
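For readers unfamiliar with the Beam Search setting above, here is a minimal sketch of standard beam search with a beam width of 3. It ranks hypotheses by cumulative log-probability only; the paper's improved ranking strategy is not detailed on this page and is not reproduced here. The `step_fn` interface and the toy probability table are hypothetical stand-ins for the trained model.

```python
import math

def beam_search(step_fn, start, end, beam_size=3, max_len=10):
    """Return finished hypotheses as (tokens, log_prob), best first."""
    beams = [([start], 0.0)]  # (token sequence, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            # step_fn returns {next_token: probability} for the current prefix.
            for token, prob in step_fn(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((seq, score))
            else:
                beams.append((seq, score))
        if not beams:
            break
    return sorted(finished, key=lambda c: c[1], reverse=True)

# Hypothetical next-token distribution, conditioned only on the last token.
table = {
    "<s>": {"edging": 0.9, "trim": 0.1},
    "edging": {"OR": 0.6, "AND": 0.3, "</s>": 0.1},
    "OR": {"trim": 0.8, "edging": 0.2},
    "AND": {"trim": 1.0},
    "trim": {"</s>": 1.0},
}

results = beam_search(lambda seq: table[seq[-1]], "<s>", "</s>", beam_size=3)
best_tokens, best_score = results[0]
print(" ".join(best_tokens[1:-1]))      # edging OR trim
print(round(math.exp(best_score), 3))   # 0.432
```

A modified ranking strategy would replace the sort key, for example by adding a term that rewards structurally valid expressions; that extension is left out because its form is not given in the source.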
Model | Score
BiLSTM+Attention | 75.7
BERT-Seq2Seq | 83.4
Proposed model | 87.2
Model Score Results
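Category note: 缘饰;装修条 [edgings; trim strips]
BiLSTM+Attention: 缘饰 OR 装修条 [edging OR trim strip]
BiLSTM+CNN: 缘饰 OR 装修条 [edging OR trim strip]
BERT-Seq2Seq: 缘 饰 OR 装 修 条 [edging OR trim strip]
Proposed model: ( 缘 饰 OR 装 修 条 ) [(edging OR trim strip)]

Category note: 装纳公用管线用的 [for accommodating utility lines]
BiLSTM+Attention: 装纳 AND 公用 AND 管道 [accommodating AND utility AND pipe]
BiLSTM+CNN: 装纳 AND 公用 AND 管道 [accommodating AND utility AND pipe]
BERT-Seq2Seq: 装 卸 AND 公 用 AND 管 道 [loading/unloading AND utility AND pipe]
Proposed model: 装 纳 AND 公 用 AND 管 道 [accommodating AND utility AND pipe]

Category note: 清除道碴;所用设备 [removing ballast; equipment therefor]
BiLSTM+Attention: ( 清除 OR 道碴 ) AND ( ( 清除 OR 测量 ) AND ( AND [(removing OR ballast) AND ((removing OR measuring) AND (AND]
BiLSTM+CNN: ( 清除 AND 道碴) OR ( ( 清除 AND 道碴 ) AND ) ) [(removing AND ballast) OR ((removing AND ballast) AND ))]
BERT-Seq2Seq: ( 清 除 AND 道碴) OR ( ( 清 除 AND 道 碴 ) AND 设 备 ) ) [(removing AND ballast) OR ((removing AND ballast) AND equipment))]
Proposed model: ( 清 除 AND 道碴) OR ( ( 清 除 AND 道 碴 ) AND 设 备 ) ) [(removing AND ballast) OR ((removing AND ballast) AND equipment))]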
Instances of the Model Generation Result
