Generating HSK Writing Essays with LDA Model
Xu Yanhua1, Miao Yujie2, Miao Lin2, Lv Xueqiang2
1 School of Chinese Language and Literature, Ludong University, Yantai 264025, China
2 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
Abstract [Objective] This paper aims to automatically generate writing samples for the Chinese Proficiency Test (HSK) to help Chinese teachers and learners prepare for the test. [Methods] First, we trained an LDA model on the “HSK Dynamic Corpus”, which served as the base corpus. Then, we used a cross-entropy strategy to select sentences containing the required keywords. Finally, we manually scored the generated texts against the evaluation criteria. [Results] The generated essays contained all required keywords and were relevant to the topics of the writing tasks. [Limitations] Part of the training corpus consisted of modified HSK essays written by non-native Chinese speakers. [Conclusions] The proposed method can effectively generate good-quality passages that contain the required keywords.
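The abstract does not spell out how the cross-entropy strategy is applied, so the following is only a minimal sketch under stated assumptions: each candidate sentence is already segmented into tokens, a topic is represented as a word-probability table (such as one row of a trained LDA topic-word matrix), and among the sentences that contain a required keyword the one with the lowest cross-entropy against the topic is kept. The names `cross_entropy`, `select_sentence`, `toy_topic`, and `toy_sentences`, the `floor` smoothing value, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cross_entropy(sentence_tokens, topic_word_probs, floor=1e-6):
    """Cross-entropy H(p, q) = -sum_w p(w) * log q(w).

    p is the empirical word distribution of the sentence and q is the
    topic-word distribution. Words missing from q get the small
    probability `floor` (an assumed smoothing choice).
    """
    counts = Counter(sentence_tokens)
    total = sum(counts.values())
    return -sum(
        (n / total) * math.log(topic_word_probs.get(word, floor))
        for word, n in counts.items()
    )


def select_sentence(keyword, sentences, topic_word_probs):
    """Return the keyword-bearing sentence closest to the topic.

    Returns None when no candidate sentence contains the keyword.
    """
    candidates = [s for s in sentences if keyword in s]
    if not candidates:
        return None
    return min(candidates, key=lambda s: cross_entropy(s, topic_word_probs))


# Hypothetical toy data standing in for an LDA topic and a segmented corpus.
toy_topic = {"travel": 0.4, "train": 0.3, "ticket": 0.2, "city": 0.1}
toy_sentences = [
    ["travel", "train", "ticket"],
    ["train", "banana", "piano"],
    ["city", "travel"],
]

best = select_sentence("train", toy_sentences, toy_topic)
print(best)
```

In this sketch, a sentence whose words are all likely under the topic scores a low cross-entropy, while off-topic words are penalized heavily through the `floor` probability, so the keyword constraint and topical relevance are enforced together.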
Received: 26 February 2018
Published: 25 October 2018