[1] Newcombe H B, Kennedy J M, Axford S J, et al.Automatic Linkage of Vital Records [J]. Science, 1959, 130(3381): 954-959.
[2] Fellegi I P, Sunter A B.A Theory for Record Linkage [J]. Journal of the American Statistical Association, 1969, 64(328): 1183-1210.
[3] Newcombe H B, Kennedy J M.Record Linkage: Making Maximum Use of the Discriminating Power of Identifying Information [J]. Communications of the ACM, 1962, 5(11): 563-566.
[4] Hernandez M A, Stolfo S J. The Merge/Purge Problem for Large Databases[C]. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD'95), San Jose, California, USA. New York: ACM, 1995: 127-138.
[5] Sarawagi S, Bhamidipaty A. Interactive Deduplication Using Active Learning [C]. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'02), Edmonton, Alberta, Canada. New York: ACM, 2002: 269-278.
[6] Dong X, Halevy A, Madhavan J. Reference Reconciliation in Complex Information Spaces [C].In: Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA. New York: ACM, 2005: 85-96.
[7] Tejada S, Knoblock C A, Minton S.Learning Object Identification Rules for Information Integration [J]. Information Systems, 2001, 26(8): 607-633.
[8] Christen P. Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection [M]. Springer Berlin Heidelberg, 2012.
[9] Elmagarmid A K, Ipeirotis P G, Verykios V S.Duplicate Record Detection: A Survey [J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(1): 1-16.
[10] Winkler W E. Overview of Record Linkage and Current Research Directions [R]. Washington, D C: U.S. Census Brueau, 2006.
[11] Benjelloun O, Garcia-Molina H, Menestrina D, et al.Swoosh: A Generic Approach to Entity Resolution[C]. In: Proceedings of the 35th International Conference on Very Large Data Bases, Lyon, France.2009: 255-276.
[12] Bhattacharya I, Getoor L.Collective Entity Resolution in Relational Data [J]. ACM Transactions on Knowledge Discovery from Data, 2007, 1(1): Article No. 5.
[13] Manning C D, Raghavan P, Schütze H, et al. Introduction to Information Retrieval [M]. Cambridge University Press, 2008: 496.
[14] Arasu A, Gotz M, Kaushik R. On Active Learning of Record Matching Packages [C]. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA. New York: ACM, 2010: 783-794.
[15] 刘骏豪, 孙晶莹.2011 年德国人口普查中的新技术——记录连接[J]. 中国统计, 2011(11): 38-39. (Liu Junhao, Sun Jingying. The New Technology in 2011 German Population Census——Record Connection [J]. China Statistics, 2011(11): 38-39.)
[16] 谭明超, 刁兴春, 曹建军.实体分辨研究综述[J]. 计算机科学, 2014, 41(4): 9-12, 20. (Tan Mingchao, Diao Xingchun, Cao Jianjun. Survey on Entity Resolution [J]. Computer Science, 2014, 41(4): 9-12, 20.)
[17] Müller H, Freytag J-C. Problems, Methods, and Challenges in Comprehensive Data Cleansing [M]. Humboldt University Berlin, 2003.
[18] Record Linkage in Large Data Sets [EB/OL]. [2014-12-02]. http://www.dani-sola.com/record-linkage-in-large-data-sets/.
[19] Herzog T N, Scheuren F J, Winkler W E. Data Quality and Record Linkage Techniques [M]. Springer-Verlag, 2007.
[20] Winkler W E. Methods for Record Linkage and Bayesian Networks [R]. Statistical Research Division, US Census Bureau, Washington, DC, 2002.
[21] Whang S E, Garcia-Molina H. Entity Resolution with Evolving Rules [C]. In: Proceedings of the 36th International Conference on Very Large Data Bases, Singapore. 2010: 1326-1337.
[22] Whang S E, Garcia-Molina H.Incremental Entity Resolution on Rules and Data [J]. The VLDB Journal, 2014, 23(1): 77-102.
[23] Whang S E, Garcia-Molina H.Developments in Generic Entity Resolution [J]. IEEE Data Engineering Bulletin, 2011, 13(11): 24-30.
[24] Whang S E, Menestrina D, Koutrika G, et al. Entity Resolution with Iterative Blocking [C]. In: Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, Rhode Island, USA. New York: ACM, 2009: 219-232.
[25] Gruenheid A, Dong X L, Srivastava D. Incremental Record Linkage [C]. In: Proceedings of the 40th International Conference on Very Large Data Bases, Hangzhou, China, 2014: 697-708.
[26] Sarawagi S, Deshpande V, Kasliwal S. Efficient Top-k Count Queries over Imprecise Duplicates [C]. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, Saint Petersburg, Russia. New York: ACM, 2009: 450-461.
[27] Hernández M A, Stolfo S J.Real-world Data is Dirty: Data Cleansing and the Merge/Purge Problem [J]. Data Mining and Knowledge Discovery, 1998, 2(1): 9-37.
[28] Mathieu C, Sankur O, Schudy W.Online Correlation Clustering [OL]. ArXiv Preprint arXiv: 10010920.
[29] Charikar M, Chekuri C, Feder T, et al. Incremental Clustering and Dynamic Information Retrieval [C]. In: Proceedings of the 29th Annual ACM Symposium on Theory of Computing (STOC'97). New York: ACM, 1997: 626-635.
[30] Aggarwal C C, Han J, Wang J, et al. A Framework for Clustering Evolving Data Streams [C].In: Proceedings of the 29th International Conference on Very Large Data Bases, Berlin, Germany.2003: 81-92.
[31] Singla P, Domingos P. Collective Object Identification [C]. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, Edinburgh, Scotland. San Francisco: Morgan Kaufmann Publishers Inc., 2005: 1636-1637.
[32] Christen P. Automatic Record Linkage Using Seeded Nearest Neighbour and Support Vector Machine Classification [C]. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA. New York: ACM, 2008: 151-159.
[33] 楼俊杰, 徐从富, 郝春亮.基于马尔科夫逻辑网络的实体解析改进算法[J]. 计算机科学, 2010, 37(8): 243-247. (Lou Junjie, Xu Congfu, Hao Chunliang. Improvement of Entity Resolution Based on Markov Logic Networks [J]. Computer Science, 2010, 37(8): 243-247.)
[34] Chaudhuri S, Ganti V, Xin D.Mining Document Collections to Facilitate Accurate Approximate Entity Matching [C]. In: Proceedings of the 35th International Conference on Very Large Data Bases, Lyon, France.2009: 395-406.
[35] Shu L, Bo L, Meng W. A Latent Topic Model for Complete Entity Resolution [C]. In: Proceedings of IEEE 25th International Conference on Data Engineering (ICDE'09). IEEE, 2009: 880-891.
[36] Rastogi V, Dalvi N, Garofalakis M. Large-scale Collective Entity Matching [C]. In: Proceedings of the 37th International Conference on Very Large Data Bases, Seattle, Washington, USA.2011: 208-218.
[37] Getoor L, Machanavajjhala A.Entity Resolution: Theory, Practice & Open Challenges [C]. In: Proceedings the 38th International Conference on Very Large Data Bases, Istanbul, Turkey. 2012: 2018-2019.
[38] McCallum A, Nigam K, Ungar L H. Efficient Clustering of High-dimensional Data Sets with Application to Reference Matching [C]. In: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, Massachusetts, USA. New York: ACM, 2000: 169-178.
[39] 甄灵敏, 杨晓春, 王斌, 等.基于属性权重的实体解析技术 [J]. 计算机研究与发展, 2013, 50(S1): 281-289. (Zhen Lingmin, Yang Xiaochun, Wang Bin, et al. An Entity Resolution Approach Based on Attributes Weights [J]. Journal of Computer Research and Development, 2013, 50(S1): 281-289.)
[40] Kim H S, Lee D. HARRA: Fast Iterative Hashed Record Linkage for Large-scale Data Collections [C]. In: Proceedings of the 13th International Conference on Extending Database Technology, Lausanne, Switzerland. New York: ACM, 2010: 525-536.
[41] Vernica R, Carey M J, Li C. Efficient Parallel Set-similarity Joins Using MapReduce [C]. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA. New York: ACM, 2010: 495-506.
[42] Bilenko M, Kamath B, Mooney R J. Adaptive Blocking: Learning to Scale up Record Linkage [C]. In: Proceedings of the 6th International Conference on Data Mining (ICDM'06), Hong Kong, China.IEEE, 2006: 87-96.
[43] Baxter R, Christen P, Churches T. A Comparison of Fast Blocking Methods for Record Linkage [C]. In: Proceedings of the 1st Workshop on Data Cleaning, Record Linkage and Object Consolidation (KDD'03), Washington, DC, USA. 2003: 25-27.
[44] Kirsten T, Kolb L, Hartung M, et al.Data Partitioning for Parallel Entity Matching [OL]. arXiv Preprint arXiv: 10065309.
[45] Koudas N, Marathe A, Srivastava D. Flexible String Matching Against Large Databases in Practice [C]. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04), Toronto, Canada. 2004: 1078-1086.
[46] Chaudhuri S, Ganti V, Kaushik R. A Primitive Operator for Similarity Joins in Data Cleaning [C]. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE' 06). Washington DC: IEEE Computer Society, 2006: 5.
[47] Xiao C, Wang W, Lin X, et al. Efficient Similarity Joins for Near Duplicate Detection [C]. In: Proceedings of the 17th International Conference on World Wide Web, Beijing, China.New York: ACM, 2008: 131-140.
[48] Papapetrou P, Athitsos V, Kollios G, et al.Reference-based Alignment in Large Sequence Databases [C]. In: Proceedings of the 35th International Conference on Very Large Data Bases, Lyon, France.2009: 205-216.
[49] Li C, Lu J, Lu Y. Efficient Merging and Filtering Algorithms for Approximate String Searches [C]. In: Proceedings of the IEEE 24th International Conference on Data Engineering, Cancun, Mexico.IEEE Computer Society, 2008: 257-266.
[50] Li C, Wang B, Yang X. VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-length Grams [C]. In: Proceedings of the 33rd International Conference on Very Large Data Bases, Vienna, Austria.2007: 303-314.
[51] Yang X, Wang B, Li C. Cost-based Variable-length-gram Selection for String Collections to Support Approximate Queries Efficiently [C]. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, Vancouver, Canada.New York: ACM, 2008: 353-364.
[52] Behm A, Shengyue J, Chen L, et al. Space-Constrained Gram-Based Indexing for Efficient Approximate String Search [C]. In: Proceedings of IEEE 25th International Conference on Data Engineering (ICDE'09), Shanghai, China.IEEE, 2009: 604-615.
[53] 邱越峰, 田增平, 季文赟, 等.一种高效的检测相似重复记录的方法 [J]. 计算机学报, 2001, 24(1): 69-77. (Qiu Yuefeng, Tian Zengping, Ji Wenyun, et al. An Efficient Approach for Detecting Approximately Duplicate Database
Records [J]. Chinese Journal of Computers, 2001, 24(1): 69-77.)
[54] Lieberman M D, Sankaranarayanan J, Samet H. A Fast Similarity Join Algorithm Using Graphics Processing Units [C]. In: Proceedings of the IEEE 24th International Conference on Data Engineering.Washington DC: IEEE Computer Society, 2008: 1111-1120.
[55] 燕彩蓉, 万永权.并行实体解析与记录聚合模型 [J]. 小型微型计算机系统, 2013, 34(8): 1843-1847. (Yan Cairong, Wan Yongquan. Parallel Entity Resolution and Record Aggregation Model [J]. Journal of Chinese Computer Systems, 2013, 34(8): 1843-1847.)
[56] 燕彩蓉, 张洋舜, 徐光伟.支持隐私保护的众包实体解析 [J]. 计算机科学与探索, 2014, 8(7): 802-811. (Yan Cairong, Zhang Yangshun, Xu Guangwei. Crowdsourcing Entity Resolution with Privacy Protection [J]. Journal of Frontiers of Computer Science and Technology, 2014, 8(7): 802-811.)
[57] 王宁, 李杰.大数据环境下用于实体解析的两层相关性聚类方法 [J]. 计算机研究与发展, 2014, 51(9): 2108-2116. (Wang Ning, Li Jie. Two-Tiered Correlation Clustering Method for Entity Resolution in Big Data [J]. Journal of Computer Research and Development, 2014, 51(9): 2108-2116.)
[58] 杨丹, 申德荣, 于戈, 等.数据空间中时间为中心的集合实体识别策略[J]. 计算机科学与探索, 2012, 6(11): 974-984. (Yang Dan, Shen Derong, Yu Ge, et al. Time-centered Collective Entity Resolution Strategy in Dataspace [J]. Journal of Frontiers of Computer Science and Technology, 2012, 6(11): 974-984.) |