[1] Koehn P. Europarl: A Parallel Corpus for Statistical Machine Translation [C]. In: Proceedings of the 10th Machine Translation Summit, Phuket, Thailand. 2005: 79-86.
[2] 吴琳, 魏星, 霍翠婷. 基于 Web 的专利双语语料自动获取研究及实现——以esp@cenet数据库为例 [J]. 现代图书情报技术, 2009(9): 57-63. (Wu Lin, Wei Xing, Huo Cuiting. Research and Implement of Automatic Patent Bilingual Corpus Extraction from Web——Taking esp@cenet as an Example [J]. New Technology of Library and Information Service, 2009(9): 57-63.)
[3] Resnik P, Smith N A. The Web as a Parallel Corpus [J]. Computational Linguistics, 2003, 29(3): 349-380.
[4] Ma X, Liberman M Y. BITS: A Method for Bilingual Text Search over the Web [C]. In: Proceedings of Machine Translation Summit VII, Singapore. 1999.
[5] Chen J, Nie J. Automatic Construction of Parallel English-Chinese Corpus for Cross-Language Information Retrieval [C]. In: Proceedings of the 6th Applied Natural Language Processing Conference, Seattle, Washington, USA. 2000: 21-28.
[6] Zhang Y, Wu K, Gao J, et al. Automatic Acquisition of Chinese-English Parallel Corpus from the Web [C]. In: Proceedings of the 28th European Conference on IR Research, London, UK. Springer Berlin Heidelberg, 2006: 420-431.
[7] Zhang C Z, Yao X C, Kit C. Finding More Bilingual Web Pages with High Credibility via Link Analysis [C]. In: Proceedings of the 6th Workshop on Building and Using Comparable Corpora, Sofia, Bulgaria. 2013.
[8] 刘奇, 刘洋, 孙茂松. URL 模式与 HTML 结构相结合的平行网页获取方法 [J]. 中文信息学报, 2013, 27(3): 91-99. (Liu Qi, Liu Yang, Sun Maosong. A Parallel Pages Mining Approach: Combining URL Patterns and HTML Structures [J]. Journal of Chinese Information Processing, 2013, 27(3): 91-99.)
[9] Gale W A, Church K W. A Program for Aligning Sentences in Bilingual Corpora [J]. Computational Linguistics, 1993, 19(1): 75-102.
[10] Brown P F, Lai J C, Mercer R L. Aligning Sentences in Parallel Corpora [C]. In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics, 1991: 169-176.
[11] Kay M, Röscheisen M. Text-translation Alignment [J]. Computational Linguistics, 1993, 19(1): 121-142.
[12] Chen S F. Aligning Sentences in Bilingual Corpora Using Lexical Information [C]. In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993.
[13] Church K W. Char_align: A Program for Aligning Parallel Texts at the Character Level [C]. In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993.
[14] Wu D. Aligning a Parallel English-Chinese Corpus Statistically with Lexical Criteria [C]. In: Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994: 80-87.
[15] Moore R C. Fast and Accurate Sentence Alignment of Bilingual Corpora [C]. In: Proceedings of the 5th Conference of the Association for Machine Translation in the Americas, Tiburon, CA, USA. 2002: 135-144.
[16] Fattah M A, Bracewell D B, Ren F, et al. Sentence Alignment Using P-NNT and GMM [J]. Computer Speech & Language, 2007, 21(4): 594-608.
[17] Sennrich R, Volk M. MT-based Sentence Alignment for OCR-generated Parallel Texts [C]. In: Proceedings of the 9th Conference of the Association for Machine Translation in the Americas (AMTA 2010), Denver, Colorado, USA. 2010.
[18] Trieu H L, Nguyen P T, Nguyen K A. Improving Moore's Sentence Alignment Method Using Bilingual Word Clustering [C]. In: Proceedings of the 5th International Conference on Knowledge and Systems Engineering. Springer International Publishing, 2014: 149-160.
[19] 熊文新. 英汉环保领域平行语料的句对齐与再对齐 [J]. 现代图书情报技术, 2013(6): 36-41. (Xiong Wenxin. Sentence Alignment and Re-Alignment for Environmental Protection Texts in English-Chinese Parallel Corpus [J]. New Technology of Library and Information Service, 2013(6): 36-41.)
[20] Vapnik V N. The Nature of Statistical Learning Theory [M]. Springer New York, 2000.
[21] Forman G. An Extensive Empirical Study of Feature Selection Metrics for Text Classification [J]. Journal of Machine Learning Research, 2003, 3: 1289-1305.
[22] Mesleh A M A. Chi Square Feature Extraction Based SVMs Arabic Language Text Categorization System [J]. Journal of Computer Science, 2007, 3(6): 430-435.
[23] Peng H, Long F, Ding C. Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2005, 27(8): 1226-1238.
[24] Robertson S. Understanding Inverse Document Frequency: On Theoretical Arguments for IDF [J]. Journal of Documentation, 2004, 60(5): 503-520.
[25] Varga D, Halácsy P, Kornai A, et al. Parallel Corpora for Medium Density Languages [A].// Recent Advances in Natural Language Processing IV [M]. John Benjamins Publishing Company, 2007: 247-258.
[26] Ma X. Champollion: A Robust Parallel Text Sentence Aligner [C]. In: Proceedings of the 5th International Conference on Language Resources and Evaluation, 2006.
[27] Chang C C, Lin C J. LIBSVM: A Library for Support Vector Machines [J]. ACM Transactions on Intelligent Systems and Technology (TIST), 2011, 2(3): Article No.27.
[28] Stolcke A. SRILM - An Extensible Language Modeling Toolkit [C]. In: Proceedings of the 7th International Conference on Spoken Language Processing, Denver, Colorado, USA. 2002.
[29] Xiao T, Zhu J, Zhang H, et al. NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation [C]. In: Proceedings of the ACL 2012 System Demonstrations, 2012.
[30] Przybocki M A, Peterson K, Bronsart S. Translation Adequacy and Preference Evaluation Tool (TAP-ET) [C]. In: Proceedings of the International Conference on Language Resources and Evaluation, LREC, Marrakech, Morocco. 2008.