New Technology of Library and Information Service  2011, Vol. 27 Issue (12): 31-38    DOI: 10.11925/infotech.1003-3513.2011.12.05
Chinese and Bengali Proper Noun Recognition Based on String Frequency Statistics Model
Kishore Biswas1, Wang Huilin2, Yu Wei2
1. Department of Information Management, Peking University, Beijing 100871, China;
2. Institute of Science & Technology Information of China, Beijing 100038, China
Abstract  This paper implements the String Frequency Statistics Algorithm proposed by Nagao to build a Proper Noun Recognition (PNR) system for the Chinese and Bengali languages. First, n-grams are extracted from an untagged input corpus; they are then filtered with the SSR algorithm to remove redundant substrings. Finally, the multilingual PNR system assigns each n-gram a probability of being a proper noun, based on information about its neighboring words, and outputs the results according to their probability scores. The test results show that the system can effectively recognize the names of people, places, and organizations or institutions in the input text.
Key words: Proper noun recognition; String statistics; Nagao algorithm; SSR algorithm
Received: 03 November 2011      Published: 02 February 2012
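
The first two stages described in the abstract (n-gram frequency counting followed by removal of redundant substrings) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that "SSR" denotes statistical substring reduction, that n-grams are character n-grams, and that a substring is redundant when a one-character-longer n-gram containing it has the same frequency. The function names `count_ngrams` and `reduce_substrings` and the parameters `max_n` and `min_freq` are hypothetical, and the brute-force counting stands in for Nagao's sorted-suffix approach. The neighbor-based probability scoring step is not covered.

```python
from collections import Counter


def count_ngrams(text, max_n=4, min_freq=2):
    """Count all character n-grams of length 1..max_n.

    Brute-force stand-in for Nagao-style string frequency statistics;
    only n-grams seen at least min_freq times are kept.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


def reduce_substrings(counts):
    """Drop n-grams that are redundant under the assumed SSR rule.

    An n-gram is treated as redundant when an n-gram one character
    longer contains it (as prefix or suffix) with the same frequency.
    """
    redundant = set()
    for gram, freq in counts.items():
        if len(gram) < 2:
            continue
        for sub in (gram[:-1], gram[1:]):
            if counts.get(sub) == freq:
                redundant.add(sub)
    return {gram: c for gram, c in counts.items() if gram not in redundant}


if __name__ == "__main__":
    sample = "北京大学在北京。北京大学很大。"
    candidates = reduce_substrings(count_ngrams(sample))
    for gram, freq in sorted(candidates.items(), key=lambda kv: -len(kv[0])):
        print(gram, freq)
```

On the sample string, "北京大" and "京大学" are dropped because they occur exactly as often as "北京大学", while "北京" survives because it also occurs outside the longer string. Surviving candidates such as punctuation or single common characters would still need the scoring stage to be ruled out.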



