Wang Hao, Li Sishu, Deng Sanhong
A language recognition program which is used to auto recognize the textures of the most popular languages on Internet including Chinese-simple, Chinese-traditional, English, French, German, Russian and Korean, is realized in this paper based on the N-Gram language module. The speech recognition experiments are divided into two stages of training of multilingual corpus and testing of language recognition, the texts of training and testing come from the Open Directory Project. The program is used to participate in the language recognition test, as well as to make contrast tests to another language recognition program based on N-Gram named TextCat. The result of the language recognition experiment proves that the program has a fine performance on recognizing Chinese-simple, Chinese-traditional and German, and the accuracy of recognition on Russian, French and English in the next place, the Korean is always interfered with Chinese in these experiments.