Automatic Identification of Term Citation Object with Feature Fusion*

Na Ma, Zhixiong Zhang, Pengmin Wu

Table 3 Examples of Differences Between Labeled Results and Predicted Results

Prediction model	Predicted result
BiLSTM-CNN-CRF (POS+REF+DIS)	We have adopted the Conditional Maximum Entropy (MaxEnt) modeling paradigm as outlined in REF3 and REF19
	To quickly (and approximately) evaluate this phenomenon, we trained the statistical IBM word-alignment model 4 REF7, using the GIZA ++ software REF11 for the following language pairs: Chinese-English, Italian-English, and Dutch-English, using the IWSLT-2006 corpus REF23 for the first two language pairs, and the Europarl corpus REF9 for the last one.
	In computational linguistic literature, much effort has been devoted to phonetic transliteration, such as English-Arabic, English-Chinese REF5, English-Japanese REF6 and English-Korean.
	Tokenisation, species word identification and chunking were implemented in-house using the LTXML2 tools REF4, whilst abbreviation extraction used the Schwartz and Hearst abbreviation extractor REF9 and lemmatisation used morpha REF12.
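Table 3 contrasts labeled and predicted term citation objects. The sketch below shows one way such differences could be surfaced automatically. It assumes, since this section does not say so, that the BiLSTM-CNN-CRF model emits BIO tags with a single hypothetical `TERM` label; the function names and the toy sentence and tags are invented for illustration and are not taken from the paper's annotations.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect term spans from a BIO sequence (assumed scheme: B-TERM / I-TERM / O)."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            if start is not None:  # a new B- closes the previous span
                spans.add((start, i))
            start = i
        elif tag.startswith("I-"):
            if start is None:  # tolerate an I- with no preceding B- by opening a span
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:  # span running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def diff_spans(tokens: List[str], gold_tags: List[str],
               pred_tags: List[str]) -> Tuple[List[str], List[str]]:
    """Return (missed, spurious): gold spans not predicted, predicted spans not in gold."""
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)

    def to_text(span: Span) -> str:
        return " ".join(tokens[span[0]:span[1]])

    missed = [to_text(s) for s in sorted(gold - pred)]
    spurious = [to_text(s) for s in sorted(pred - gold)]
    return missed, spurious


if __name__ == "__main__":
    # Toy example (invented): the prediction drops the first word of the term,
    # a boundary error of the kind a table like Table 3 would list.
    tokens = ["We", "trained", "word", "alignment", "model", "4", "REF7"]
    gold = ["O", "O", "B-TERM", "I-TERM", "I-TERM", "I-TERM", "O"]
    pred = ["O", "O", "O", "B-TERM", "I-TERM", "I-TERM", "O"]
    print(diff_spans(tokens, gold, pred))
```

Comparing exact (start, end) spans counts a boundary error as both a miss and a spurious prediction; a looser overlap-based match would be a different, equally valid design choice.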