Review of Chinese Word Segmentation Studies
Tang Lin, Guo Chonghui, Chen Jingfeng
Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
Abstract
[Objective] This paper summarizes the key issues, algorithms, and models in Chinese word segmentation, aiming to provide a theoretical basis and practical guidance for future research.
[Coverage] We reviewed a total of 109 papers retrieved from CNKI, the Wanfang Data Knowledge Service Platform, and the DBLP Computer Science Bibliography.
[Methods] First, we discussed the development of Chinese word segmentation and the critical issues it faces. Then, we explored the algorithms and models used for Chinese word segmentation. Finally, we identified popular research topics and trends.
[Results] The main challenge facing researchers is building multi-criteria learning models for Chinese word segmentation that can learn from multiple datasets annotated under different segmentation criteria. The most popular research topic is building multi-task joint models that perform Chinese word segmentation together with other natural language processing tasks.
[Limitations] Studies on unsupervised learning approaches to Chinese word segmentation need to be reviewed in more depth.
[Conclusions] Existing Chinese word segmentation methods still face challenges in building joint models that combine multi-perspective, multi-task, and multi-criteria features.
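To make the terms in the abstract more concrete, the following is a minimal illustrative sketch, not the surveyed authors' method. It shows (1) how a segmented sentence can be turned into one label per character for character-based tagging, (2) how the same sentence yields different label sequences under two annotation criteria, which is the situation multi-criteria learning has to handle, and (3) how joint segmentation and POS tagging can be cast as a single tagging task by crossing boundary labels with POS labels. Assumptions: the four-tag B/M/E/S scheme, the example sentence and its two segmentations, and the CTB-style POS tags (NR, VV, NN) are chosen for illustration only, and the function names are hypothetical.

```python
from typing import List


def words_to_bmes(words: List[str]) -> List[str]:
    """Assign one boundary tag per character (assumed B/M/E/S scheme):
    B = begin, M = middle, E = end of a multi-character word,
    S = single-character word."""
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))  # interior characters
            tags.append("E")
    return tags


def bmes_to_words(chars: str, tags: List[str]) -> List[str]:
    """Inverse mapping: rebuild the word sequence from characters and tags."""
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word boundary falls after this character
            words.append(current)
            current = ""
    if current:  # keep a dangling B/M fragment instead of silently dropping it
        words.append(current)
    return words


def joint_tags(words: List[str], pos: List[str]) -> List[str]:
    """Joint segmentation + POS tagging as one tagging task:
    each character receives a combined label such as 'B-NR'."""
    labels: List[str] = []
    for word, p in zip(words, pos):
        labels.extend(f"{t}-{p}" for t in words_to_bmes([word]))
    return labels


if __name__ == "__main__":
    sentence = "姚明进入总决赛"  # hypothetical example sentence
    # Two hypothetical annotation criteria for the same sentence.
    criterion_a = ["姚明", "进入", "总决赛"]
    criterion_b = ["姚", "明", "进入", "总", "决赛"]

    for name, words in (("A", criterion_a), ("B", criterion_b)):
        tags = words_to_bmes(words)
        assert "".join(words) == sentence
        assert bmes_to_words(sentence, tags) == words  # round trip
        print(name, " ".join(tags))

    # Hypothetical CTB-style POS tags for criterion A.
    print(" ".join(joint_tags(criterion_a, ["NR", "VV", "NN"])))
```

Running the script prints `A B E B E B M E` and `B S S B E S B E`, followed by `B-NR E-NR B-VV E-VV B-NN M-NN E-NN`. The two criteria disagree on four of the seven character labels even though neither segmentation is wrong; multi-criteria approaches aim to exploit such heterogeneous corpora together rather than treat the disagreement as noise.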
Received: 23 September 2019
Published: 26 April 2020
Corresponding author: Chonghui Guo
E-mail: dlutguo@dlut.edu.cn
References
[1] GB/T 13715-1992, Contemporary Chinese Language Word Segmentation Specification for Information Processing[S]. Beijing: Standards Press of China, 1993.
[2] Liang Nanyuan. An Introduction to Automatic Distinguishing of Written Chinese Words[J]. Computer Applications and Software, 1987(3):44-50.
[3] Liu Kaiying. Research on Automatic Word Segmentation Assessment Technology in Modern Chinese[J]. Applied Linguistics, 1997(1):103-108.
[4] Sun Maosong. Some Recent Advances in the Study of Chinese Automatic Word Segmentation: A Brief Introduction to the Work of Tsinghua University[C]// Proceedings of the 20th Anniversary Academic Conference of Chinese Information Processing Society of China, Beijing. Beijing: Tsinghua University Press, 2001: 44-50.
[5] Huang Changning, Zhao Hai. Chinese Word Segmentation: A Decade Review[J]. Journal of Chinese Information Processing, 2007,21(3):8-19.
[6] He Xin, Wang Wanwu. Research and Application of Chinese Word Segmentation Technology Based on Natural Language Information Retrieval[J]. Information Science, 2008,26(5):787-791.
[7] Feng Guohe, Zheng Wei. Review of Chinese Automatic Word Segmentation[J]. Library and Information Service, 2011,55(2):41-45.
[8] Zhao Fangfang, Jiang Zhipeng, Guan Yi. The Review on the Joint Model of Chinese Word Segmentation and Part-of-speech Tagging[J]. Intelligent Computer and Applications, 2014,4(3):77-80.
[9] Liang Xitao, Gu Lei. Study on Word Segmentation and Part-of-speech Tagging[J]. Computer Technology and Development, 2015,25(2):175-180.
[10] Zhao Hai, Cai Deng, Huang Changning. Chinese Word Segmentation: Another Decade Review (2007-2017)[A]// Jie Chunyu, Liu Meijun. Frontiers of Empirical and Corpus Linguistics[M]. Beijing: China Social Sciences Press, 2017.
[11] Emerson T. The Second International Chinese Word Segmentation Bakeoff[C]// Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, Jeju Island, Korea. New York, USA: ACL, 2005: 123-133.
[12] Zhang Q, Liu X, Fu J. Neural Networks Incorporating Dictionaries for Chinese Word Segmentation[C]// Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA. California, USA: AAAI, 2018.
[13] Cai D, Zhao H, Zhang Z, et al. Fast and Accurate Neural Word Segmentation for Chinese[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada. USA: ACL, 2017: 608-615.
[14] Chen X, Qiu X, Zhu C, et al. Long Short-Term Memory Neural Networks for Chinese Word Segmentation[C]// Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal. New York, USA: ACL, 2015: 1197-1206.
[15] Sun X, Wang H, Li W. Fast Online Training with Frequency-adaptive Learning Rates for Chinese Word Segmentation and New Word Detection[C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea. USA: ACL, 2012: 253-262.
[16] Zhao H, Huang C N, Li M, et al. A Unified Character-based Tagging Framework for Chinese Word Segmentation[J]. ACM Transactions on Asian Language Information Processing (TALIP), 2010,9(2):Article No. 5.
[17] Zhao H, Kit C. Unsupervised Segmentation Helps Supervised Learning of Character Tagging for Word Segmentation and Named Entity Recognition[C]// Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing, Hyderabad, India. New York, USA: ACL, 2008: 106-111.
[18] Zhang Y, Clark S. Chinese Segmentation with a Word-based Perceptron Algorithm[C]// Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic. USA: ACL, 2007: 840-847.
[19] Sproat R, Emerson T. The First International Chinese Word Segmentation Bakeoff[C]// Proceedings of the 2nd SIGHAN Workshop on Chinese Language Processing, Sapporo, Japan. New York, USA: ACL, 2003: 133-143.
[20] Wang Huanhuan. Indication Similarity of Drugs Based on Chinese Word Segmentation Technology[D]. Huainan: Anhui University of Science & Technology, 2015.
[21] Zhao Haoxin, Yu Jingsong, Lin Jie. Design and Research on Chinese Word Embedding Model Based on Strokes[J]. Journal of Chinese Information Processing, 2019,33(5):17-23.
[22] Zhang Tao. Design and Implementation of Chinese Text Automatic Proofreading System[D]. Chengdu: Southwest Jiaotong University, 2017.
[23] Sproat R, Shih C, Gale W, et al. A Stochastic Finite-State Word-Segmentation Algorithm for Chinese[C]// Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, New Mexico, USA. New York, USA: ACL, 1994: 66-73.
[24] Gong J, Chen X, Gui T, et al. Switch-LSTMs for Multi-Criteria Chinese Word Segmentation[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA. California, USA: AAAI, 2019: 6457-6464.
[25] Liu Jian, Zhang Weiming. Fast Crossing Ambiguity Detection Method[J]. Application Research of Computers, 2008,25(11):3259-3261.
[26] Qin Ying, Wang Xiaojie, Zhang Suxiang. Research on Combinational Ambiguity in Chinese Word Segmentation[J]. Journal of Chinese Information Processing, 2007,21(1):3-8.
[27] Zheng Jiaheng, Zhang Jianfeng, Tan Hongye. Segmentation Strategies on Ambiguity String in Chinese Word Segmentation[J]. Journal of Shanxi University: Natural Science Edition, 2007,30(2):163-167.
[28] Humphreys K, Gaizauskas R, Azzam S, et al. University of Sheffield: Description of the LaSIE-II System as Used for MUC-7[C]// Proceedings of the 7th Message Understanding Conference, Virginia, USA. New York, USA: ACL, 1998.
[29] Sun Maosong, Zuo Zhengping, Huang Changning. An Experimental Study on Dictionary Mechanism for Chinese Word Segmentation[J]. Journal of Chinese Information Processing, 2000,14(1):1-6.
[30] Sproat R, Shih C. A Statistical Method for Finding Word Boundaries in Chinese Text[J]. Computer Processing of Chinese and Oriental Languages, 1990,4(4):336-351.
[31] Huang C N, Zhao H. Which is Essential for Chinese Word Segmentation: Character Versus Word[C]// Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation, Wuhan, China. Beijing, China: Tsinghua University Press, 2006: 1-12.
[32] Xue N. Chinese Word Segmentation as Character Tagging[J]. Computational Linguistics & Chinese Language Processing, 2003,8(1):29-47.
[33] Xue N, Converse S P. Combining Classifiers for Chinese Word Segmentation[C]// Proceedings of the 1st SIGHAN Workshop on Chinese Language Processing, Taipei, China. New York, USA: ACL, 2002.
[34] Low J K, Ng H T, Guo W. A Maximum Entropy Approach to Chinese Word Segmentation[C]// Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, Jeju Island, Korea. New York, USA: ACL, 2005.
[35] Berger A L, Pietra V J D, Pietra S A D. A Maximum Entropy Approach to Natural Language Processing[J]. Computational Linguistics, 1996,22(1):39-71.
[36] Rabiner L R. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition[J]. Proceedings of the IEEE, 1989,77(2):257-286.
[37] McCallum A, Freitag D, Pereira F C N. Maximum Entropy Markov Models for Information Extraction and Segmentation[C]// Proceedings of the 17th International Conference on Machine Learning, CA, USA. CA, USA: ICMS, 2000.
[38] Peng F, Feng F, McCallum A. Chinese Segmentation and New Word Detection Using Conditional Random Fields[C]// Proceedings of the 20th International Conference on Computational Linguistics, Geneva, Switzerland. New York, USA: ACL, 2004.
[39] Tseng H, Chang P, Andrew G, et al. A Conditional Random Field Word Segmenter for SIGHAN Bakeoff 2005[C]// Proceedings of the 4th SIGHAN Workshop on Chinese Language Processing, Jeju Island, Korea. New York, USA: ACL, 2005.
[40] Lafferty J, McCallum A, Pereira F C N. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data[C]// Proceedings of the 18th International Conference on Machine Learning, MA, USA. CA, USA: ICMS, 2001: 282-289.
[41] Xiu Chi. The Research and Implementation of Method for Domain Chinese Word Segmentation[D]. Beijing: Beijing University of Technology, 2013.
[42] Lü X, Zhang L, Hu J. Statistical Substring Reduction in Linear Time[C]// Proceedings of the 2004 International Conference on Natural Language Processing, Hainan, China. 2004.
[43] Kit C, Wilks Y. Unsupervised Learning of Word Boundary with Description Length Gain[C]// Proceedings of the 3rd SIGNLL Conference on Computational Natural Language Learning, Bergen, Norway. New York, USA: SIGNLL, 1999.
[44] Feng H, Chen K, Deng X, et al. Accessor Variety Criteria for Chinese Word Extraction[J]. Computational Linguistics, 2004,30(1):75-93.
[45] Huang J H, Powers D. Chinese Word Segmentation Based on Contextual Entropy[C]// Proceedings of the 17th Pacific Asia Conference on Language, Information and Computation, Sentosa, Singapore. New York, USA: ACL, 2003: 152-158.
[46] Chang J S, Lin T. Unsupervised Word Segmentation Without Dictionary[C]// Proceedings of the 15th Annual Conference on Computational Linguistics and Speech Processing. 2003.
[47] Chen S, Xu Y, Chang H. A Simple and Effective Unsupervised Word Segmentation Approach[C]// Proceedings of the 25th AAAI Conference on Artificial Intelligence, San Francisco, USA. California, USA: AAAI, 2011.
[48] Magistry P, Sagot B. Unsupervized Word Segmentation: The Case for Mandarin Chinese[C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea. New York, USA: ACL, 2012: 383-387.
[49] Magistry P, Sagot B. Can MDL Improve Unsupervised Chinese Word Segmentation?[C]// Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing, Nagoya, Japan. New York, USA: ACL, 2013: 1-10.
[50] Chen M, Chang B, Pei W. A Joint Model for Unsupervised Chinese Word Segmentation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar. New York, USA: ACL, 2014: 854-863.
[51] Goldwater S, Griffiths T L, Johnson M. A Bayesian Framework for Word Segmentation: Exploring the Effects of Context[J]. Cognition, 2009,112(1):21-54.
[52] Jiao F, Wang S, Lee C H, et al. Semi-supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling[C]// Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia. New York, USA: ACL, 2006: 209-216.
[53] Zhao H, Kit C. Integrating Unsupervised and Supervised Word Segmentation: The Role of Goodness Measures[J]. Information Sciences, 2011,181(1):163-183.
[54] Zeng X, Wong D F, Chao L S, et al. Co-regularizing Character-based and Word-based Models for Semi-supervised Chinese Word Segmentation[C]// Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria. New York, USA: ACL, 2013: 171-176.
[55] Yang T, Jiang T J, Kuo C, et al. Unsupervised Overlapping Feature Selection for Conditional Random Fields Learning in Chinese Word Segmentation[C]// Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing. 2011.
[56] Collobert R, Weston J, Bottou L, et al. Natural Language Processing (Almost) from Scratch[J]. Journal of Machine Learning Research, 2011,12:2493-2537.
[57] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based Learning Applied to Document Recognition[J]. Proceedings of the IEEE, 1998,86(11):2278-2324.
[58] Vincent P, Larochelle H, Lajoie I, et al. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion[J]. Journal of Machine Learning Research, 2010,11:3371-3408.
[59] Chen X, Qiu X, Zhu C, et al. Gated Recursive Neural Network for Chinese Word Segmentation[C]// Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China. New York, USA: ACL, 2015: 1744-1753.
[60] Cai D, Zhao H. Neural Word Segmentation Learning for Chinese[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany. New York, USA: ACL, 2016.
[61] Graves A. Long Short-Term Memory[A]// Graves A. Supervised Sequence Labelling with Recurrent Neural Networks[M]. Berlin: Springer, 2012: 37-45.
[62] Schuster M, Paliwal K K. Bidirectional Recurrent Neural Networks[J]. IEEE Transactions on Signal Processing, 1997,45(11):2673-2681.
[63] Pei W, Ge T, Chang B. Max-margin Tensor Neural Network for Chinese Word Segmentation[C]// Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, USA. New York, USA: ACL, 2014: 293-303.
[64] Zhang Honggang, Li Huan. Chinese Word Segmentation Method on the Basis of Bidirectional Long-Short Term Memory Model[J]. Journal of South China University of Technology: Natural Science Edition, 2017,45(3):61-67.
[65] Ma J, Ganchev K, Weiss D. State-of-the-art Chinese Word Segmentation with Bi-LSTMs[C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium. New York, USA: ACL, 2018: 4902-4908.
[66] Mikolov T, Chen K, Corrado G, et al. Efficient Estimation of Word Representations in Vector Space[C]// Proceedings of the 1st International Conference on Learning Representations, Arizona, USA. New York, USA: ACL, 2013.
[67] Pennington J, Socher R, Manning C. GloVe: Global Vectors for Word Representation[C]// Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar. New York, USA: ACL, 2014: 1532-1543.
[68] Peters M E, Neumann M, Iyyer M, et al. Deep Contextualized Word Representations[C]// Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, New Orleans, USA. New York, USA: ACL, 2018: 2227-2237.
[69] Vaswani A, Shazeer N, Parmar N, et al. Attention is All You Need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA. San Diego, CA: NIPS, 2017: 5998-6008.
[70] Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding[OL]. arXiv Preprint, arXiv:1906.08237.
[71] Wang J, Zhou J, Zhou J, et al. Multiple Character Embeddings for Chinese Word Segmentation[C]// Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy. New York, USA: ACL, 2019: 210-216.
[72] Xue N, Xia F, Chiou F D, et al. The Penn Chinese TreeBank: Phrase Structure Annotation of a Large Corpus[J]. Natural Language Engineering, 2005,11(2):207-238.
[73] Liu J, Wu F, Wu C, et al. Neural Chinese Word Segmentation with Dictionary[J]. Neurocomputing, 2019,338:46-54.
[74] Zhao H, Liu Q. The CIPS-SIGHAN CLP2010 Chinese Word Segmentation Bakeoff[C]// Proceedings of the 2010 CIPS-SIGHAN Joint Conference on Chinese Language Processing, Beijing, China. New York, USA: ACL, 2010.
[75] Zhang R, Kikui G, Sumita E. Subword-based Tagging by Conditional Random Fields for Chinese Word Segmentation[C]// Proceedings of the 2006 Human Language Technology Conference of the NAACL, New York, USA. New York, USA: ACL, 2006: 193-196.
[76] Ng H T, Low J K. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?[C]// Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain. New York, USA: ACL, 2004: 277-284.
[77] Zhang Meishan, Deng Zhilong, Che Wanxiang, et al. Combining Statistical Model and Dictionary for Domain Adaption of Chinese Word Segmentation[J]. Journal of Chinese Information Processing, 2012,26(2):8-12.
[78] Huang Z, Xu W, Yu K. Bidirectional LSTM-CRF Models for Sequence Tagging[OL]. arXiv Preprint, arXiv:1508.01991.
[79] Ma X, Hovy E. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF[C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany. USA: ACL, 2016: 1064-1074.
[80] Yao Y, Huang Z. Bi-directional LSTM Recurrent Neural Network for Chinese Word Segmentation[C]// Proceedings of the 23rd International Conference on Neural Information Processing, Kyoto, Japan. Illinois, USA: INNS, 2016: 345-353.
[81] Feng Guoming, Zhang Xiaodong, Liu Suhui. DBLC Model for Word Segmentation Based on Autonomous Learning[J]. Data Analysis and Knowledge Discovery, 2018,2(5):40-47.
[82] Zhang Wenjing, Zhang Huimeng, Yang Liner, et al. Multi-grained Chinese Word Segmentation with Lattice-LSTM[J]. Journal of Chinese Information Processing, 2019,33(1):18-24.
[83] Gong C, Li Z, Zhang M, et al. Multi-grained Chinese Word Segmentation[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. New York, USA: ACL, 2017: 692-703.
[84] Jin G, Chen X. The Fourth International Chinese Language Processing Bakeoff: Chinese Word Segmentation, Named Entity Recognition and Chinese POS Tagging[C]// Proceedings of the 6th SIGHAN Workshop on Chinese Language Processing, Hyderabad, India. New York, USA: ACL, 2008: 69-81.
[85] Huang W, Cheng X, Chen K, et al. Toward Fast and Accurate Neural Chinese Word Segmentation with Multi-Criteria Learning[OL]. arXiv Preprint, arXiv:1903.04190.
[86] Zeman D, Popel M, Straka M, et al. CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies[C]// Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, Vancouver, Canada. New York, USA: ACL, 2017: 1-19.
[87] Qiu X, Pei H, Yan H, et al. Multi-Criteria Chinese Word Segmentation with Transformer[OL]. arXiv Preprint, arXiv:1906.12035.
[88] He H, Wu L, Yan H, et al. Effective Neural Solution for Multi-Criteria Word Segmentation[A]// Satapathy S C, Bhateja V, Das S. Smart Intelligent Computing and Applications[M]. Springer, 2019: 133-142.
[89] Huang Changning, Li Yumei, Zhu Xiaodan. Tokenization Guidelines of Chinese Text (V5.0)[Z]. Microsoft Research Asia, 2006.
[90] Yu S. Specification for Corpus Processing at Peking University: Word Segmentation, POS Tagging and Phonetic Notation[J]. Chinese Language and Computing, 2003,13:121-158.
[91] Chen X, Shi Z, Qiu X, et al. Adversarial Multi-Criteria Learning for Chinese Word Segmentation[C]// Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark. New York, USA: ACL, 2017: 1193-1203.
[92] Kipf T N, Welling M. Semi-supervised Classification with Graph Convolutional Networks[C]// Proceedings of the 5th International Conference on Learning Representations, Toulon, France. New York, USA: ACL, 2017.
[93] Collobert R, Weston J. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning[C]// Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland. New York, USA: ACM, 2008: 160-167.
[94] Zhang Y, Clark S. A Fast Decoder for Joint Word Segmentation and POS-Tagging Using a Single Discriminative Model[C]// Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, Massachusetts, USA. New York, USA: ACL, 2010: 843-852.
[95] Zeng X, Wong D F, Chao L S, et al. Graph-based Semi-supervised Model for Joint Chinese Word Segmentation and Part-of-speech Tagging[C]// Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria. New York, USA: ACL, 2013: 770-779.
[96] Qiu X, Zhao J, Huang X. Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, USA. New York, USA: ACL, 2013: 658-668.
[97] Zheng X, Chen H, Xu T. Deep Learning for Chinese Word Segmentation and POS Tagging[C]// Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, USA. New York, USA: ACL, 2013: 647-657.
[98] Wang H J, Si N W, Chen C. An Effective Joint Model for Chinese Word Segmentation and POS Tagging[C]// Proceedings of the 2016 International Conference on Intelligent Information Processing, Wuhan, China. New York, USA: ACM, 2016.
[99] Chen X, Qiu X, Huang X. A Long Dependency Aware Deep Architecture for Joint Chinese Word Segmentation and POS Tagging[OL]. arXiv Preprint, arXiv:1611.05384.
[100] Chen X, Qiu X, Huang X. A Feature-enriched Neural Model for Joint Chinese Word Segmentation and Part-of-speech Tagging[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia. California, USA: IJCAI, 2017: 3960-3966.
[101] Hatori J, Matsuzaki T, Miyao Y, et al. Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese[C]// Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Jeju Island, Korea. New York, USA: ACL, 2012: 1045-1053.
[102] Wang Z, Zong C, Xue N. A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing[C]// Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria. New York, USA: ACL, 2013: 623-627.
[103] Guo Z, Zhang Y, Su C, et al. Character-level Dependency Model for Joint Word Segmentation, POS Tagging, and Dependency Parsing in Chinese[J]. IEICE Transactions on Information and Systems, 2016,99(1):257-264.
[104] Shen M, Li W, Choe H J, et al. Consistent Word Segmentation, Part-of-speech Tagging and Dependency Labelling Annotation for Chinese Language[C]// Proceedings of the 26th International Conference on Computational Linguistics, Osaka, Japan. New York, USA: COLING, 2016: 298-308.
[105] Yan H, Qiu X, Huang X. A Unified Model for Joint Chinese Word Segmentation and Dependency Parsing[OL]. arXiv Preprint, arXiv:1904.04697.
[106] Li X, Zong C, Su K. A Unified Model for Solving the OOV Problem of Chinese Word Segmentation[J]. ACM Transactions on Asian and Low-Resource Language Information Processing, 2015,14(3):12-29.
[107] Zhang M, Fu G, Yu N. Segmenting Chinese Microtext: Joint Informal-Word Detection and Segmentation with Neural Networks[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia. California, USA: IJCAI, 2017: 4228-4234.
[108] Shi X, Huang H, Jian P, et al. Neural Chinese Word Segmentation as Sequence to Sequence Translation[C]// Proceedings of the Chinese National Conference on Social Media Processing, Beijing, China. Berlin, Germany: Springer, 2017: 91-103.
[109] Wu F, Liu J, Wu C, et al. Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation[C]// Proceedings of the 2019 World Wide Web Conference, CA, USA. New York, USA: ACM, 2019: 3342-3348.