Data Analysis and Knowledge Discovery  2021, Vol. 5 Issue (5): 95-103    DOI: 10.11925/infotech.2096-3467.2021.0023
Generating AND-OR Logical Expressions for Semantic Features of Categorical Documents
Xu Zheng, Le Xiaoqiu
Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
Abstract  

[Objective] This paper represents each category unit of a categorical document as an AND-OR logical expression over its semantic features, providing data for category-level semantic matching and retrieval. [Methods] Based on AND-OR logical semantic annotations of category unit descriptions, we built a seq2seq generation model on UniLM. The model learns part-of-speech features and explicit AND-OR logical text features and improves the ranking strategy of Beam Search, so that it can generate the AND-OR logical expression of the semantic features within a category unit. We then extended the external semantics of each category unit by integrating context-level semantics. [Results] We evaluated the method on manually annotated International Patent Classification data. It scored 87.2 points, 11.5 points higher than the baseline model (BiLSTM-Attention). [Limitations] The model's performance on other datasets remains to be examined. [Conclusions] The proposed semantic representation method effectively generates AND-OR logical expressions for patent classification data, integrating the internal semantic features of category units with context-level semantic features.

Key words: Semantic Representation; Semantic Parsing; AND-OR Logic; Categorical Document
Received: 10 January 2021      Published: 27 May 2021
ZTFLH:  TP391  

Cite this article:

Xu Zheng, Le Xiaoqiu. Generating AND-OR Logical Expressions for Semantic Features of Categorical Documents. Data Analysis and Knowledge Discovery, 2021, 5(5): 95-103.

URL:

http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/10.11925/infotech.2096-3467.2021.0023     OR     http://manu44.magtech.com.cn/Jwk_infotech_wk3/EN/Y2021/V5/I5/95

A Collection of Logical Expressions for the Semantic Features of an Entry Unit
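Category | Annotation | AND-OR logical combination feature
E01C 21/02 | 现场熔化、煅烧或焙烧土壤 (melting, calcining or roasting soil in situ) | 现场 AND (熔化 OR 煅烧 OR 焙烧) AND 土壤 (in situ AND (melting OR calcining OR roasting) AND soil)
E02B 7/16 | 固定堰;其上部结构或闸板 (fixed weirs; their superstructures or flashboards) | 固定堰 OR (固定堰 AND (上部结构 OR 闸板)) (fixed weir OR (fixed weir AND (superstructure OR flashboard)))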
AND-OR Logical Combination Features within a Category Entry
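To illustrate how such an expression could feed category matching, the following minimal sketch expands an AND-OR expression into a list of alternative term sets (disjunctive normal form) and checks a set of query terms against it. It is not the authors' implementation. The function names `to_dnf` and `matches`, the rule that AND binds tighter than OR, and the assumption that operators are always written in uppercase are all illustrative choices.

```python
import re
from itertools import product


def to_dnf(expr):
    """Expand an AND-OR expression into a list of term sets (one set per alternative)."""
    # Assumption: operators are uppercase AND/OR and never occur inside a term.
    tokens = [t.strip() for t in re.split(r"(\(|\)|AND|OR)", expr)]
    tokens = [t for t in tokens if t]
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "OR":
            pos += 1
            result = result + parse_and()  # OR: keep both lists of alternatives
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos] == "AND":
            pos += 1
            right = parse_atom()
            # AND: every left alternative is merged with every right alternative
            result = [a | b for a, b in product(result, right)]
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return [frozenset([tok])]

    dnf = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return dnf


def matches(dnf, terms):
    """True if at least one alternative is fully contained in the given terms."""
    return any(conj <= set(terms) for conj in dnf)


dnf = to_dnf("现场AND(熔化OR煅烧OR焙烧)AND土壤")
print(len(dnf))
print(matches(dnf, {"现场", "煅烧", "土壤"}))
```

Expanding to alternative term sets makes matching a simple subset test, at the cost of a possibly large expansion for deeply nested expressions.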
Technology Route
The Training Mechanism of Seq2Seq Mask of UniLM Model[20]
Feature Combination: $E_f = W_c \cdot \mathrm{concat}(E_{token}, E_{pos}, E_{es})$ (1)
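The feature combination in Equation (1) can be sketched in plain Python as a concatenation followed by a linear projection. This is a minimal illustration under stated assumptions, not the authors' code. The function name `combine_features` is hypothetical, the toy vector sizes stand in for the 768, 768/2 and 768/2 dimensions listed in the parameter table, and projecting back to the token dimension is an assumption about the shape of W_c.

```python
def combine_features(e_token, e_pos, e_es, w_c):
    """Compute W_c · concat(E_token, E_pos, E_es) for a single token position."""
    concat = list(e_token) + list(e_pos) + list(e_es)
    for row in w_c:
        if len(row) != len(concat):
            raise ValueError("each row of w_c must match the concatenated length")
    # Matrix-vector product: one dot product per output dimension.
    return [sum(w * x for w, x in zip(row, concat)) for row in w_c]


# Toy sizes standing in for 768 (token), 768/2 (part of speech), 768/2 (logic feature).
e_token = [1.0, 2.0, 3.0, 4.0]
e_pos = [0.5, -0.5]
e_es = [1.0, 0.0]

# Hypothetical 4 x 8 projection: identity on the token part, zeros elsewhere.
w_c = [[1.0 if j == i else 0.0 for j in range(8)] for i in range(4)]
w_c[0][4] = 2.0  # let the first output also read the first part-of-speech feature

e_f = combine_features(e_token, e_pos, e_es, w_c)
print(e_f)
```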
The Semantic Construction Results of Hierarchical Relationships
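Parameter | Value
Batch Size | 8
Learning Rate | 10^-5
hidden_act | GELU
Hidden layer units | 768
hidden_dropout_prob | 0.1
Text truncation length | 128
Character vector dimension | 768
Part-of-speech vector dimension | 768/2
Explicit syntactic-logic feature vector dimension | 768/2
Beam Search | 3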
Experiment Parameters Configuration
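For reuse in code, the configuration above could be collected in a small dataclass. The field names are hypothetical. The learning rate is read as 10 to the power of -5 and the Beam Search value as the beam width, both of which are assumptions about the flattened table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    batch_size: int = 8
    learning_rate: float = 1e-5       # assumption: "10-5" in the table means 10^-5
    hidden_act: str = "gelu"
    hidden_size: int = 768            # hidden layer units
    hidden_dropout_prob: float = 0.1
    max_seq_len: int = 128            # text truncation length
    token_dim: int = 768              # character vector dimension
    pos_dim: int = 768 // 2           # part-of-speech vector dimension
    logic_dim: int = 768 // 2         # explicit syntactic-logic feature dimension
    beam_size: int = 3                # assumption: "Beam Search 3" is the beam width


config = ExperimentConfig()
print(config.token_dim + config.pos_dim + config.logic_dim)
```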
Model | Score
BiLSTM+Attention | 75.7
BiLSTM+CNN | 76.1
BERT-Seq2Seq | 83.4
Proposed model | 87.2
Model Score Results
Category annotation | BiLSTM+Attention | BiLSTM+CNN | BERT-Seq2Seq | Proposed model
缘饰;装修条 (edgings; finishing strips) | 缘饰 OR 装修条 | 缘饰 OR 装修条 | 缘 饰 OR 装 修 条 | ( 缘 饰 OR 装 修 条 )
装纳公用管线用的 (for accommodating public utility lines) | 装纳 AND 公用 AND 管道 | 装纳 AND 公用 AND 管道 | 装 卸 AND 公 用 AND 管 道 | 装 纳 AND 公 用 AND 管 道
清除道碴;所用设备 (removing ballast; equipment used therefor) | ( 清除 OR 道碴 ) AND ( ( 清除 OR 测量 ) AND ( AND | ( 清除 AND 道碴) OR ( ( 清除 AND 道碴 ) AND ) ) | ( 清 除 AND 道碴) OR ( ( 清 除 AND 道 碴 ) AND 设 备 ) ) | ( 清 除 AND 道碴) OR ( ( 清 除 AND 道 碴 ) AND 设 备 ) )
Instances of the Model Generation Result
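Generated expressions like those above can be screened for syntactic well-formedness before they are used for matching. The sketch below is one possible check, not part of the published method. The function name `is_well_formed` is hypothetical, and the two malformed inputs at the end are made-up illustrations rather than outputs reported in the paper.

```python
import re


def is_well_formed(expr):
    """Check balanced parentheses and strict operand/operator alternation."""
    # Assumption: operators are uppercase AND/OR; anything else is a term.
    tokens = [t.strip() for t in re.split(r"(\(|\)|AND|OR)", expr)]
    tokens = [t for t in tokens if t]
    depth = 0
    expect_operand = True  # True at the start, after "(" and after an operator
    for tok in tokens:
        if tok == "(":
            if not expect_operand:
                return False
            depth += 1
        elif tok == ")":
            if expect_operand or depth == 0:
                return False
            depth -= 1
        elif tok in ("AND", "OR"):
            if expect_operand:
                return False
            expect_operand = True
        else:  # a term, possibly character-spaced such as "缘 饰"
            if not expect_operand:
                return False
            expect_operand = False
    return depth == 0 and not expect_operand


print(is_well_formed("装纳 AND 公用 AND 管道"))
print(is_well_formed("( 缘 饰 OR 装 修 条 )"))
print(is_well_formed("( 清除 AND 道碴 ) AND )"))  # made-up: operator without right operand
print(is_well_formed("清除 AND ( 道碴"))           # made-up: unclosed parenthesis
```

A check of this kind only validates syntax. It says nothing about whether the terms or operators are the right ones for the annotation.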
[1] Wang Lijie. Research on Chinese Semantic Dependency Analysis[D]. Harbin: Harbin Institute of Technology, 2010.
[2] Qiao Xiuming. Research on Transfer of Dependency Parsing Based on Lexical-level Knowledge[D]. Harbin: Harbin Institute of Technology, 2020.
[3] Robertson S. Understanding Inverse Document Frequency: On Theoretical Arguments for IDF[J]. Journal of Documentation, 2004,60(5):503-520.
[4] Gao Jieyun, Zhao Fengyu, Liu Ya. Text Classification of Modified Hybrid Feature Selection Based on Semantic Enhancement[J]. Computer Technology and Development, 2021,31(1):24-29.
[5] Zhang Aimin, Jia Junzhi, Hao Qianqian. The Study on Automatic Mapping of Category Between Chinese Library Classification and DDC[J]. New Technology of Library and Information Service, 2014(7):17-23.
[6] Cheng Jinxiang, Zhang Zhongyue, Cao Miao, et al. Taxonomy Construction and Machine Indexing Strategies of Fishery Patent Literature[J]. Journal of Library and Information Science in Agriculture, 2020,32(7):63-72.
[7] Yuan Man, Ouyang Yuanxin, Xiong Zhang, et al. Short Text Feature Extension Method Based on Frequent Term Sets[J]. Journal of Southeast University (Natural Science Edition), 2014,44(2):256-260.
[8] Jiang Dapeng. Research on Short Text Classification Based on Word Distributed Representation[D]. Hangzhou: Zhejiang University, 2015.
[9] Fang Donghao. Study and Implementation of Microblog's Short Text Classification Based on LDA[D]. Shenyang: Northeastern University, 2011.
[10] Tian Chuang, Zhao Yajuan. A Similarity-based Model for Mapping Between Patent and Industrial Classifications: Mapping Between the International Patent Classification and the Industrial Classification for National Economic Activities[J]. Library and Information Service, 2016,60(20):123-131.
[11] Ma Xiaomeng, Xu Feng, Liu Qingmin, et al. Doc2vec-based Study on Mapping Between Patented and Industrial Categories[J]. Information Research, 2020(6):67-74.
[12] Pang B, Lee L, Vaithyanathan S. Thumbs up? Sentiment Classification Using Machine Learning Techniques[C]// Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing. 2002: 79-86.
[13] Sundararaman D, Subramanian V, Wang G Y, et al. Syntax-Infused Transformer and BERT Models for Machine Translation and Natural Language Understanding[OL]. arXiv Preprint, arXiv: 1911.06156.
[14] Shang Hai, Luo Senlin, Han Lei, et al. Research on Short Text Representation Based on Sentential Semantic Components[J]. Netinfo Security, 2016(5):64-70.
[15] Mnih V, Heess N, Graves A, et al. Recurrent Models of Visual Attention[OL]. arXiv Preprint, arXiv: 1406.6247.
[16] Yue Yongzheng. Research on Classification Method on Chinese Short Texts with Few Words Based on Feature Representation[D]. Hefei: Hefei University of Technology, 2020.
[17] Zhang Hongke, Fu Zhenxin, Ren Qianping, et al. Automated ICD Coding Based on Word Embedding with Entry Embedding and Attention Mechanism[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2020,56(1):1-8.
[18] Dong L, Lapata M. Language to Logical Form with Neural Attention[OL]. arXiv Preprint, arXiv: 1601.01280.
[19] Zhang Qiang. Chinese Semantic Parsing Based on Machine Translation[D]. Nanjing: Southeast University, 2015.
[20] Dong L, Yang N, Wang W H, et al. Unified Language Model Pre-training for Natural Language Understanding and Generation[OL]. arXiv Preprint, arXiv: 1905.03197.
[21] Sundararaman D, Subramanian V, Wang G Y, et al. Syntax-Infused Transformer and BERT Models for Machine Translation and Natural Language Understanding[OL]. arXiv Preprint, arXiv: 1911.06156.
[22] Papineni K, Roukos S, Ward T, et al. BLEU: A Method for Automatic Evaluation of Machine Translation[C]// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2002: 311-318.