Extracting Key-phrases from Chinese Scholarly Papers
(Chinese title: 面向中文学术文本的单文档关键短语抽取, "Single-Document Key Phrase Extraction for Chinese Academic Text")
Xia Tian (夏天)
Table 1. Key Phrase Statistics in the Dataset
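| Words per phrase | Avg. length (characters) | Occurrences | Share | Cumulative share |
|---|---|---|---|---|
| 1 | 3.34 | 20,303 | 28.07% | 28.07% |
| 2 | 4.33 | 39,028 | 53.95% | 82.02% |
| 3 | 5.95 | 10,005 | 13.83% | 95.85% |
| 4 | 7.46 | 2,142 | 2.96% | 98.81% |
| 5 | 9.48 | 476 | 0.66% | 99.47% |
| 6 | 10.55 | 218 | 0.30% | 99.77% |
| 7 | 12.65 | 79 | 0.11% | 99.88% |
| 8 | 15.59 | 37 | 0.05% | 99.93% |
| 9 | 17.07 | 14 | 0.02% | 99.95% |
| 10 | 16.18 | 22 | 0.03% | 99.98% |
| Other | - | 13 | 0.02% | 100.00% |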
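The share and cumulative-share columns of Table 1 can be re-derived from the occurrence counts alone. The following is a minimal sketch, assuming that each share is the row's occurrence count divided by the total over all rows and that the cumulative share is taken over running counts. The function name `share_table` and the dictionary layout are illustrative and do not come from the paper.

```python
# Occurrence counts copied from Table 1, keyed by the number of
# constituent words in a key phrase ("other" is the table's last row).
counts = {
    "1": 20303, "2": 39028, "3": 10005, "4": 2142, "5": 476,
    "6": 218, "7": 79, "8": 37, "9": 14, "10": 22, "other": 13,
}


def share_table(counts):
    """Return (label, share %, cumulative share %) rows, rounded to 2 decimals.

    Assumption: share = row count / total count over all rows.
    """
    total = sum(counts.values())
    rows = []
    running = 0  # running occurrence count, used for the cumulative share
    for label, n in counts.items():
        running += n
        rows.append((
            label,
            round(100 * n / total, 2),
            round(100 * running / total, 2),
        ))
    return rows


rows = share_table(counts)
for label, share, cumulative in rows:
    print(f"{label:>5}  {share:6.2f}%  {cumulative:6.2f}%")
```

Under this assumption the counts sum to 72,337 occurrences, and key phrases of one or two words account for 82.02% of them, matching the cumulative share reported in the table.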