1 School of Chinese Language and Literature, Ludong University, Yantai 264025, China
2 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
[Objective] This paper proposes a method for automatically generating writing samples for the Chinese Proficiency Test (HSK), with the aim of helping Chinese teachers and learners prepare for the test. [Methods] First, we used the “HSK Dynamic Corpus” as the base corpus and trained an LDA topic model on it. Then, we adopted a cross-entropy strategy to select sentences containing the required keywords. Finally, we manually scored the generated texts against the evaluation criteria. [Results] The generated essays contained all the required keywords and were relevant to the topics of the writing tasks. [Limitations] Part of the training corpus consisted of revised HSK essays written by non-native speakers of Chinese. [Conclusions] The proposed method can effectively generate good-quality passages that contain the required keywords.
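The abstract does not specify how the cross-entropy criterion is applied. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that each candidate sentence's topic distribution has already been inferred with the trained LDA model (the numbers below are made up). It further assumes that the writing task is represented by its own topic distribution p, and that for every required keyword the sentence containing that keyword with the lowest cross-entropy H(p, q) = −Σ_k p_k log q_k against the task distribution is kept. The names `cross_entropy` and `select_sentences`, as well as the greedy one-sentence-per-keyword rule, are hypothetical.

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_k p_k * log(q_k) over topics k; eps guards against log(0)."""
    return -sum(pk * math.log(max(qk, eps)) for pk, qk in zip(p, q))

def select_sentences(task_topics, sentences, keywords):
    """Hypothetical greedy selection: for each required keyword not yet covered,
    keep the candidate sentence containing it whose (assumed, precomputed)
    LDA topic distribution has the lowest cross-entropy with the task's."""
    chosen = []
    for kw in keywords:
        if any(kw in s["text"] for s in chosen):
            continue  # keyword already covered by an earlier pick
        candidates = [s for s in sentences if kw in s["text"]]
        if not candidates:
            continue  # the corpus contains no sentence with this keyword
        best = min(candidates, key=lambda s: cross_entropy(task_topics, s["topics"]))
        chosen.append(best)
    return [s["text"] for s in chosen]

# Made-up example: 3 topics, task distribution p, four candidate sentences.
task_topics = [0.7, 0.2, 0.1]
sentences = [
    {"text": "我每天早上去公园锻炼身体。", "topics": [0.6, 0.3, 0.1]},
    {"text": "锻炼身体对健康很重要。", "topics": [0.8, 0.1, 0.1]},
    {"text": "这家公司的经济效益很好。", "topics": [0.1, 0.2, 0.7]},
    {"text": "坚持运动让我更有信心。", "topics": [0.5, 0.4, 0.1]},
]
essay = select_sentences(task_topics, sentences, ["锻炼", "坚持"])
print(essay)
```

Here the first sentence wins for 锻炼 (H ≈ 0.829 versus ≈ 0.847), because its distribution is closer to p overall. The greedy per-keyword choice guarantees keyword coverage whenever the corpus contains each keyword. Ordering the selected sentences into a coherent passage would be a separate step and is not shown.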