Finding Geographic Locations of Popular Online Topics
Liu Yuwen, Wang Kai
Table 2 Recognition results for topic feature words and topic locations
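No. | Topic feature words and generation probabilities | Topic locations and generation probabilities
1 | danger (危险) 0.00095; waste (废物) 0.00086; garbage (垃圾) 0.00083; fine (罚款) 0.00075; Chengdu (成都) 0.00072; 100,000 (10万) 0.00071; sorting (分类) 0.00069; regulation (规定) 0.00069; collection point (收集点) 0.00066; mixed in (混入) 0.00065; daily life (生活) 0.00063; new rule (新规) 0.00063; organization (单位) 0.00061; individual (个人) 0.00061; May (5月) 0.00059 | r134 0.032; r141 0.030; r136 0.030; r137 0.029; r143 0.029; r156 0.027; r144 0.025; r139 0.025; r152 0.023; r146 0.022; r138 0.022; r141 0.021; r145 0.020; r135 0.020; r142 0.019; r146 0.017; r149 0.017; r148 0.015
2 | motor vehicle (机动车) 0.00103; traffic (交通) 0.00098; violation (违法) 0.00096; behavior (行为) 0.00091; Tianjin (天津) 0.00086; item (项) 0.00086; report (举报) 0.00085; reward (奖励) 0.00077; impact (影响) 0.00073; 200,000 (20万) 0.00069; driving (行驶) 0.00068; accident (事故) 0.00062; road (道路) 0.00062; per case (每起) 0.00057; safety (安全) 0.00055 | r196 0.027; r200 0.025; r201 0.025; r205 0.023; r195 0.020; r203 0.020; r194 0.017; r210 0.017; r216 0.016; r212 0.016; r208 0.016; r197 0.015; r199 0.015; r215 0.013; r221 0.012; r190 0.012; r207 0.011; r211 0.011
3 | ride-hailing (网约车) 0.00062; traffic (交通) 0.00061; safety (安全) 0.00057; road (道路) 0.00056; ordinance (条例) 0.00052; platform (平台) 0.00052; penalty (处罚) 0.00051; order dispatch (派单) 0.00049; Nanjing (南京) 0.00049; face (面临) 0.00046; company (公司) 0.00044; governance (治理) 0.00043; passenger (乘客) 0.00043; legal (合法) 0.00041; supervision (监管) 0.00041 | r108 0.031; r103 0.028; r112 0.028; r105 0.027; r115 0.025; r116 0.023; r101 0.022; r120 0.022; r117 0.020; r113 0.019; r100 0.019; r120 0.018; r98 0.015; r108 0.015; r102 0.012; r122 0.012; r111 0.010; r106 0.010
4 | hospital (医院) 0.00051; Grade-A tertiary (三甲) 0.00051; order (顺序) 0.00049; acute illness (急症) 0.00049; first come, first served (先来后到) 0.00046; emergency department (急诊) 0.00046; grading (分级) 0.00045; Beijing (北京) 0.00044; professional (专业) 0.00039; seek treatment (就诊) 0.00038; priority (优先) 0.00038; critically ill (危重) 0.00036; patient (患者) 0.00033; medical staff (医护) 0.00033; change (改变) 0.00032 | r220 0.022; r219 0.022; r217 0.018; r218 0.018; r225 0.017; r223 0.017; r230 0.016; r237 0.015; r231 0.015; r229 0.014; r222 0.014; r225 0.012; r232 0.012; r226 0.011; r228 0.011; r235 0.011; r233 0.011; r227 0.010
5 | primary school (小学) 0.00151; Shangrao (上饶) 0.00144; killing (杀人) 0.00136; knife (刀) 0.00128; head teacher (班主任) 0.00119; Liu Shuai (刘帅) 0.00111; blood (血) 0.00104; He Chen (何琛) 0.00102; teacher (老师) 0.00101; Wang Moujian (王某建) 0.00101; fifth (第五) 0.00098; Chinese language class (语文) 0.00096; restroom (卫生间) 0.00085; doctor (医生) 0.00077; principal (校长) 0.00068 | r88 0.019; r87 0.019; r85 0.018; r92 0.018; r83 0.018; r77 0.018; r134 0.017; r219 0.016; r75 0.016; r70 0.016; r8 0.015; r97 0.015; r152 0.015; r146 0.014; r160 0.014; r141 0.014; r2 0.013; r179 0.013
6 | insurance (保险) 0.00085; pension (养老) 0.00085; urban (城镇) 0.00083; employee (职工) 0.00083; Ministry of Human Resources and Social Security (人社部) 0.0081; rate (比例) 0.0080; contribution (缴费) 0.00080; medical expenses (医疗费) 0.00077; employer (单位) 0.00075; reduce (降低) 0.00072; social security (社保) 0.00068; unemployment (失业) 0.00067; adjustment (调整) 0.00061; work injury (工伤) 0.00058; policy (政策) 0.00057 | r220 0.020; r134 0.018; r196 0.018; r108 0.017; r223 0.017; r231 0.016; r146 0.016; r70 0.016; r77 0.016; r219 0.015; r205 0.015; r108 0.015; r37 0.015; r6 0.015; r194 0.015; r207 0.014; r69 0.014; r118 0.014
7 | La Liga (西甲) 0.00078; Wu Lei (武磊) 0.00077; Spain (西班牙) 0.00073; off-the-ball movement (跑位) 0.00071; hype (吹) 0.0071; hope (希望) 0.00068; starting lineup (首发) 0.00066; football (足球) 0.00065; king of football (球王) 0.00065; one-on-one (单刀) 0.0063; China (中国) 0.00060; European competition (欧战) 0.00060; isolated (孤立) 0.00059; speed (速度) 0.00059; substitution (替换) 0.0056 | r2 0.025; r8 0.025; r219 0.025; r223 0.023; r141 0.023; r71 0.023; r78 0.023; r169 0.022; r38 0.022; r227 0.022; r188 0.022; r192 0.022; r49 0.021; r201 0.021; r105 0.021; r83 0.012; r152 0.020; r78 0.019
8 | May Day (五一) 0.00131; fully booked (爆满) 0.00130; tourism (旅游) 0.00127; hotel (酒店) 0.00126; West Lake (西湖) 0.00122; Beijing (北京) 0.00121; passenger flow (客流) 0.00117; airplane (飞机) 0.00112; Ctrip (携程) 0.00111; Huangshan (黄山) 0.00108; peak (高峰) 0.00108; outbound travel (出境) 0.00099; scenic area (景区) 0.00092; tourist (游客) 0.00087; crowded (人多) 0.00085 | r86 0.023; r219 0.023; r16 0.022; r25 0.022; r133 0.022; r217 0.022; r156 0.021; r193 0.021; r158 0.021; r112 0.021; r104 0.021; r51 0.020; r28 0.020; r163 0.020; r179 0.020; r199 0.019; r46 0.019; r229 0.017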
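
The location column of Table 2 can be read programmatically. The following is a minimal sketch, not the authors' implementation: it parses one location cell into (region, probability) pairs and reports the lowest and highest region index among them. The helper names `parse_locations` and `region_span` are hypothetical, and treating numerically close region indices as geographically close is an assumption that the table itself does not confirm.

```python
import re

def parse_locations(cell):
    """Parse 'r134 0.032; r141 0.030' into [('r134', 0.032), ('r141', 0.030)]."""
    # Each entry is a region ID (r + digits), whitespace, then a probability.
    pairs = re.findall(r"(r\d+)\s+([0-9.]+)", cell)
    return [(region, float(prob)) for region, prob in pairs]

def region_span(pairs):
    """Return (lowest, highest) numeric region index found in pairs."""
    # Assumption: the digits after 'r' are an integer region index.
    ids = [int(region[1:]) for region, _ in pairs]
    return min(ids), max(ids)

# First six location entries of topics 1 and 7, copied from Table 2.
topic1 = "r134 0.032; r141 0.030; r136 0.030; r137 0.029; r143 0.029; r156 0.027"
topic7 = "r2 0.025; r8 0.025; r219 0.025; r223 0.023; r141 0.023; r71 0.023"

span1 = region_span(parse_locations(topic1))
span7 = region_span(parse_locations(topic7))
print(span1, span7)
```

On these excerpts, the region indices of topic 1 run from r134 to r156, while those of topic 7 run from r2 to r223.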