Study and Implementation of Automatic Mapping Rules Between Knowledge Organization Systems: The Case of the Dewey Decimal Classification and the Chinese Library Classification
Qu Jianfeng, Li Fang, Zhang Yihua, JLi Bao
Shanghai Jiaotong University Library, Shanghai 200240, China
Abstract

This paper applies mathematical statistics and corpus linguistics to generate mapping rules between the DDC (Dewey Decimal Classification) and the CLC (Chinese Library Classification). A test system is then built, and the DDC-CLC mapping table is produced with it. The mapping table is checked against bibliographic records that carry both DDC and CLC data, so that the mapping rules and tables can be improved continuously.

Keywords: Dewey Decimal Classification; Chinese Library Classification; Automatic mapping; Knowledge organization systems; Mapping rules
1 Introduction

2 Mapping Rules Based on Data Statistics

2.1 Mapping rules based on data statistics

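$$P(X=x)=p^{x}(1-p)^{1-x},\quad x=0,1$$

$$L(X_1,X_2,\dots,X_n;p)=\prod_{i=1}^{n}p^{x_i}(1-p)^{1-x_i}$$

$$\ln\left[L(X_1,X_2,\dots,X_n;p)\right]=\ln p\sum_{i=1}^{n}x_i+\ln(1-p)\left(n-\sum_{i=1}^{n}x_i\right)$$

$$\frac{d\ln L}{dp}=\frac{\sum_{i=1}^{n}x_i}{p}-\frac{n-\sum_{i=1}^{n}x_i}{1-p}=\frac{\sum_{i=1}^{n}x_i-np}{p(1-p)}=0$$

$$\hat{p}=\frac{1}{n}\sum_{i=1}^{n}x_i$$

$$P(X=x_i)\geq 80\%$$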
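The maximum-likelihood estimate of p for a 0/1 variable is the sample proportion, so the 80% rule can be checked directly on record counts. The following is a minimal sketch, not the authors' implementation. It assumes that each bibliographic record for a given DDC class contributes one CLC number and that x_i = 1 when that number equals the candidate CLC class. The function names `estimate_p` and `accept_mapping` and the sample data are hypothetical.

```python
from collections import Counter


def estimate_p(clc_numbers, candidate):
    """MLE of the Bernoulli parameter p: the share of records
    whose CLC number equals the candidate class."""
    if not clc_numbers:
        raise ValueError("need at least one record")
    hits = sum(1 for c in clc_numbers if c == candidate)
    return hits / len(clc_numbers)


def accept_mapping(clc_numbers, threshold=0.8):
    """Return (clc, p_hat) for the most frequent CLC number when
    p_hat reaches the threshold, otherwise None."""
    candidate, _ = Counter(clc_numbers).most_common(1)[0]
    p_hat = estimate_p(clc_numbers, candidate)
    return (candidate, p_hat) if p_hat >= threshold else None


# Hypothetical records for one DDC class: 9 of 10 carry the same CLC number.
sample = ["TP311"] * 9 + ["TP312"]
result = accept_mapping(sample)
```

With this sample the estimate is 9/10, which passes the 80% threshold, so the candidate class is accepted. A class whose most frequent CLC number covers only 60% of its records would return `None` and be left for further processing.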

2.2 Extension of the mapping table

3 Application of the DDC-CLC Mapping Rule Test System

Figure 1 Application framework of the DDC-CLC mapping rule test system

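(1) Presentation layer: the user enters the test parameters and submits the corresponding data to the DDC-CLC mapping rule test system. The system computes the DDC-CLC mapping table from the statistical model, and the user can retrieve part of the intermediate result sets produced while the mapping table was computed.

(2) Application layer: the DDC-CLC mapping rule test system applies the statistical mapping rule model to compute the DDC-CLC mapping table and provides an interface through which users can change the parameters they need, so that the mapping rules can be refined continuously.

(3) Data storage layer: sample data of all kinds are collected, including sample data from identified sources and data gathered from cloud resources on the web, and are stored in a defined format. When the DDC-CLC mapping table is generated, some classes will have no derivable mapping; manual classification must then be added to keep the DDC-CLC mapping table complete.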
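The way the three layers cooperate can be illustrated with a short sketch. It is an assumption-laden illustration, not the system described here. Records are taken to be (DDC, CLC) pairs, the adjustable parameter is a single proportion threshold, and manual classifications are supplied as a dictionary that only fills gaps. The function name `build_mapping_table`, its parameters, and the sample class numbers are all hypothetical.

```python
from collections import Counter, defaultdict


def build_mapping_table(records, all_ddc, threshold=0.8, manual=None):
    """records: iterable of (ddc, clc) pairs from stored sample data.
    all_ddc: DDC classes that should end up with a mapping.
    Returns (table, unmapped)."""
    # Data storage layer (assumed shape): count CLC numbers per DDC class.
    by_ddc = defaultdict(Counter)
    for ddc, clc in records:
        by_ddc[ddc][clc] += 1

    # Application layer: keep the most frequent CLC number
    # only when its share reaches the user-supplied threshold.
    table = {}
    for ddc, counts in by_ddc.items():
        clc, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= threshold:
            table[ddc] = clc

    # Manual classification fills gaps; it never overrides a statistical mapping.
    for ddc, clc in (manual or {}).items():
        table.setdefault(ddc, clc)

    # Classes still without a mapping are reported for manual follow-up.
    unmapped = sorted(d for d in all_ddc if d not in table)
    return table, unmapped


# Hypothetical sample data.
records = [("004", "TP3")] * 4 + [("004", "TP31"), ("510", "O1"), ("510", "O2")]
table, unmapped = build_mapping_table(
    records, all_ddc=["004", "510", "020"], manual={"020": "G25"}
)
```

In this run class 004 is mapped statistically, 020 is filled from the manual entry, and 510 is reported as unmapped because neither CLC number reaches 80%. Lowering `threshold` corresponds to the parameter-change interface mentioned in the application layer.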

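4 Implementation Plan
4.1 Workflow of the DDC-CLC mapping rule test system

Figure 2 DDC-CLC mapping rule algorithm test

4.2 Implementation of the DDC-CLC mapping rule algorithm test system

(1) Summary statistics module

(2) Data processing module

Figure 3 One-to-many data in the DDC-CLC correspondence data

(3) Broader-class merging module

(4) Result display

5 Conclusion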

