New Technology of Library and Information Service, 2016, Vol. 32, Issue 3: 58-66     https://doi.org/10.11925/infotech.1003-3513.2016.03.08
Research Paper
Retrieving Geographic Information for Micro-blog’s City Complaints
Sun He1,2, Li Shuqin2, Lv Xueqiang1,2, Liu Kehui3,4
1 Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China
2 College of Computer, Beijing Information Science and Technology University, Beijing 100101, China
3 School of Management and Economics, Beijing Institute of Technology, Beijing 100081, China
4 Beijing Research Center of Urban Systems Engineering, Beijing 100035, China

Abstract

[Objective] This study exploits two strengths of the community question answering site Baidu Zhidao, namely its shared knowledge and timely updates, to avoid the high cost of maintaining a large-scale resource of geographic affiliation relationships, and uses Baidu Zhidao to automatically complete defective (incomplete) geographic location entities. [Methods] Each defective location entity is recast as a question asking which area it belongs to, and the question is searched on Baidu Zhidao. Features extracted from the search results are used to score how likely the entity belongs to each candidate area, and these scores form an area feature vector for the entity. Rules are then applied to complete the defective entity, yielding a complete representation of its location. [Results] When completing defective location entities in micro-blog city complaint texts, the method achieves an overall precision of 92.51%. [Limitations] The method cannot complete texts that contain no location entity at all. [Conclusions] The method is an effective and feasible way to complete defective location entities.

Key words: City complaints of micro-blog; Defective location entity; Question answering community (QAC); Feature value calculation; Completeness representation
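The Methods step (question, retrieval, per-area scoring, feature vector, rule-based completion) can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate area list, the question template, the frequency-based scoring, the extra weight for recommended answers, and the 0.5 dominance threshold in the completion rule are all hypothetical, and the retrieved Baidu Zhidao answers are replaced by hard-coded strings (no real search API is called). Place names are romanized for readability; a real system would operate on Chinese text.

```python
# Minimal sketch (hypothetical): score candidate areas for a defective location
# entity from Q&A answers, build an area feature vector, and complete the entity
# with a simple rule. Search results are stubbed with hard-coded strings.

# Hypothetical candidate areas (romanized Beijing districts).
CANDIDATE_AREAS = ["Haidian District", "Chaoyang District", "Fengtai District", "Xicheng District"]

# Hypothetical weight: an answer marked as recommended counts double.
RECOMMENDED_WEIGHT = 2.0


def build_question(entity: str) -> str:
    """Recast a defective location entity as a 'which area?' question (assumed template)."""
    return f"Which district of Beijing is {entity} in?"


def area_feature_vector(answers: list[tuple[str, bool]], areas: list[str]) -> list[float]:
    """Count weighted mentions of each area in the answers; normalize so scores sum to 1.

    Each answer is a (text, is_recommended) pair standing in for a retrieved result.
    """
    scores = dict.fromkeys(areas, 0.0)
    for text, is_recommended in answers:
        weight = RECOMMENDED_WEIGHT if is_recommended else 1.0
        for area in areas:
            scores[area] += weight * text.count(area)
    total = sum(scores.values())
    return [scores[area] / total if total else 0.0 for area in areas]


def complete_entity(entity: str, answers: list[tuple[str, bool]], areas: list[str],
                    min_score: float = 0.5) -> tuple[str, list[float]]:
    """Rule sketch: prefix the top-scoring area if it is dominant and not already named."""
    vector = area_feature_vector(answers, areas)
    best = max(range(len(areas)), key=lambda i: vector[i])
    if vector[best] >= min_score and areas[best] not in entity:
        return f"{areas[best]} {entity}", vector
    return entity, vector


if __name__ == "__main__":
    entity = "Zhongguancun Street"
    print(build_question(entity))
    answers = [
        ("Zhongguancun Street is in Haidian District.", True),
        ("It belongs to Haidian District, near Peking University.", False),
        ("Not sure, maybe Chaoyang District?", False),
    ]
    completed, vector = complete_entity(entity, answers, CANDIDATE_AREAS)
    print(completed)  # Haidian District Zhongguancun Street
    print(vector)     # [0.75, 0.25, 0.0, 0.0]
```

Normalizing the scores keeps the vector comparable across entities that retrieve different numbers of answers, and the threshold leaves an entity unchanged when the evidence for any single area is weak or absent.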
Received: 2015-09-22      Published: 2016-04-12
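Funding: This work is one of the outcomes of the 2013 Beijing Municipal University Innovation Team Building and Teacher Career Development Program project "Theoretical Foundations and Intelligent Processing Technologies for Big Data Content Understanding" (No. IDHT20130519), the Beijing Academy of Science and Technology Innovation Engineering project "Key Technologies for Collaborative Management of Public Facilities in Smart Cities" (No. PXM2014_17825_000002), and the Beijing Key Laboratory of Internet Culture and Digital Dissemination Research open project "Research on Board-Game Big Data Processing and Key Technologies for Computer Game Playing" (No. ICDD201507).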
Cite this article:
Sun He, Li Shuqin, Lv Xueqiang, Liu Kehui. Retrieving Geographic Information for Micro-blog’s City Complaints. New Technology of Library and Information Service, 2016, 32(3): 58-66.
Link to this article:
https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/10.11925/infotech.1003-3513.2016.03.08      or      https://manu44.magtech.com.cn/Jwk_infotech_wk3/CN/Y2016/V32/I3/58
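Copyright © 2015 Editorial Office of Data Analysis and Knowledge Discovery
Address: No. 33 Beisihuan Xilu, Zhongguancun, Haidian District, Beijing; Postcode: 100190
Tel/Fax: (010) 82626611-6626, 82624938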
E-mail:jishu@mail.las.ac.cn