Image Annotation Tags by Deep Learning and Real Users: A Comparative Study
Lu Wei, Luo Mengqi, Ding Heng, Li Xin
School of Information Management, Wuhan University, Wuhan 430072, China; Information Retrieval and Knowledge Mining Laboratory, Wuhan University, Wuhan 430072, China
[Objective] This paper proposes a user tagging framework and examines the limitations of tagging images with deep learning techniques, aiming to improve the performance of automatic annotation services. [Methods] We analyzed the user-added tags of one million images from flickr.com and extracted the high-frequency ones. We then mapped these tags onto the proposed framework and compared them with the tags in the ImageNet database. Finally, we analyzed the images carrying high-frequency tags with the deep learning framework MXNet. [Results] Automatic image annotation techniques based on deep learning could not effectively understand either the background knowledge behind an image or descriptions of the image from the human perspective. [Limitations] Our dataset needs to be expanded, and the analysis should be repeated with other deep learning algorithms. [Conclusions] Advancing automatic image annotation requires establishing associations among image information, background knowledge, and description, as well as developing deductive reasoning and context-aware abilities.
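The tag-extraction and comparison steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy tag lists, and the small label vocabulary are hypothetical, and a real run would read Flickr tags and the ImageNet label list instead.

```python
from collections import Counter


def high_frequency_tags(image_tags, top_k):
    """Return the top_k most common tags across all images, with counts."""
    counts = Counter()
    for tags in image_tags:
        # Count each tag once per image, ignoring case and stray whitespace.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return counts.most_common(top_k)


def split_by_vocabulary(tags, label_vocabulary):
    """Split tags into those found in a label vocabulary and those missing."""
    vocab = {label.lower() for label in label_vocabulary}
    covered = [t for t in tags if t in vocab]
    missing = [t for t in tags if t not in vocab]
    return covered, missing


# Hypothetical toy data standing in for user tags and classifier labels.
image_tags = [
    ["Dog", "park", "2016"],
    ["dog", "vacation", "beach"],
    ["beach", "sunset", "vacation"],
]
label_vocabulary = ["dog", "beach", "seashore"]

top = high_frequency_tags(image_tags, top_k=3)
covered, missing = split_by_vocabulary([t for t, _ in top], label_vocabulary)
print("high-frequency tags:", top)
print("in label vocabulary:", covered)
print("not in label vocabulary:", missing)
```

In this toy case, a tag such as "vacation" is frequent among users but has no counterpart in the label vocabulary, which is the kind of gap between user descriptions and classifier output that the study examines.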
Lu Wei, Luo Mengqi, Ding Heng, Li Xin. Image Annotation Tags by Deep Learning and Real Users: A Comparative Study[J]. Data Analysis and Knowledge Discovery, 2018, 2(5): 1-10.