New Technology of Library and Information Service  2014, Vol. 30, Issue 3: 57-64     https://doi.org/10.11925/infotech.1003-3513.2014.03.09
Knowledge Organization and Knowledge Management
ManGO (汉构): Grammar Engineering for Deep Linguistic Processing
Yang Chunlei1, Dan Flickinger2
1 College of English Language and Literature, Shanghai International Studies University, Shanghai 201600, China;
2 The Center for the Study of Language and Information (CSLI), Stanford University, Stanford 94305, USA
Abstract

[Objective] To develop ManGO (Mandarin Grammar Online, 汉构), an online grammar of Mandarin Chinese for deep linguistic processing. [Context] ManGO is a computational grammar of Chinese developed within the DELPH-IN (Deep Linguistic Processing with HPSG Initiative) environment, built from the Grammar Matrix on the LKB (Linguistic Knowledge Builder) platform. Its syntactic and semantic analyses are framed in HPSG (Head-driven Phrase Structure Grammar) and MRS (Minimal Recursion Semantics) respectively. ManGO lays a solid foundation for further resource grammar development and commercial application. [Methods] Linguistic knowledge is first formalized on the basis of systematic core linguistic (ontological) studies. The computational implementation then proceeds through grammar customization, creation of a Chinese MRS test suite, lexicon building, definition of grammar rules, and MRS representation. [Results] ManGO covers the basic Chinese word classes and the major linguistic phenomena, and fully covers the Chinese MRS test suite. [Conclusions] ManGO is one of the earliest medium-sized computational grammars of Chinese. It serves as a bridge and an effective vehicle for collaborative research between formal grammar theory and computational linguistics.

Keywords: Mandarin Grammar Online (ManGO)    Grammar engineering    Head-driven Phrase Structure Grammar (HPSG)    Natural Language Processing (NLP)
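
The abstract names MRS as the framework for semantic analysis. As a rough illustration only, and not ManGO's actual output or file format, the sketch below models an MRS as a top handle, a list of elementary predications (EPs), and a list of "qeq" handle constraints, using the textbook English sentence "every dog barks". The class names (`EP`, `Mrs`), the helper methods, the predicate spellings, and the handle and variable names are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class EP:
    """Elementary predication: a labelled predicate with named arguments."""
    label: str   # handle that labels this EP, e.g. "h4"
    pred: str    # predicate name (illustrative spelling, not taken from ManGO)
    args: dict   # role name -> variable ("x1", "e2") or handle ("h2")


@dataclass
class Mrs:
    """A minimal MRS: top handle, bag of EPs, and qeq handle constraints."""
    top: str
    eps: list
    hcons: list  # list of (hi, lo) pairs, read as "hi =q lo"

    def labels(self):
        """Set of handles that label some EP."""
        return {ep.label for ep in self.eps}

    def dangling_qeqs(self):
        """qeq constraints whose low handle does not label any EP."""
        known = self.labels()
        return [(hi, lo) for hi, lo in self.hcons if lo not in known]

    def unresolved_holes(self):
        """Handle-valued arguments left open: not an EP label, not the high side of a qeq."""
        known = self.labels()
        constrained = {hi for hi, _ in self.hcons}
        holes = []
        for ep in self.eps:
            for value in ep.args.values():
                if value.startswith("h") and value not in known and value not in constrained:
                    holes.append(value)
        return holes


# "every dog barks": the quantifier's body (h3) is deliberately left underspecified.
every_dog_barks = Mrs(
    top="h0",
    eps=[
        EP("h1", "_every_q", {"ARG0": "x1", "RSTR": "h2", "BODY": "h3"}),
        EP("h4", "_dog_n", {"ARG0": "x1"}),
        EP("h5", "_bark_v", {"ARG0": "e2", "ARG1": "x1"}),
    ],
    hcons=[("h0", "h5"), ("h2", "h4")],
)

print(every_dog_barks.dangling_qeqs())
print(every_dog_barks.unresolved_holes())
```

The flat list of EPs plus handle constraints is what lets MRS leave quantifier scope underspecified: here the quantifier's restriction is tied to the noun by a qeq, while its body remains an open hole until scope is resolved.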
Received: 2013-11-22      Published: 2014-04-15
CLC number:  H087  
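Funding:

This work is one of the research outcomes of the Ministry of Education Humanities and Social Sciences Research Planning Fund project "Chinese Phrase Structure Grammar for Deep Linguistic Processing" (Project No. 13YJC740118) and the Shanghai International Studies University Planning Fund project "A Multi-dimensional Study of Linguistic Quantification" (Project No. 2013XJGH023).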

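Corresponding author: Yang Chunlei     E-mail: yangchunlei@shisu.edu.cn
Author contributions: Yang Chunlei: responsible for the core linguistic research, building the test suite and the lexicon, and the formal description of some of the grammar rules; drafted the paper and revised the final version. Dan Flickinger: proposed the technical approach; responsible for grammar customization and the formal description of some of the grammar rules.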
Cite this article:   
Yang Chunlei, Dan Flickinger. ManGO: Grammar Engineering for Deep Linguistic Processing [J]. New Technology of Library and Information Service, 2014, 30(3): 57-64.
Link to this article:  
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2014.03.09      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2014/V30/I3/57

