Food Visual Recognition



Description

We present a system to assist users with dietary logging, which performs food recognition from pictures snapped on their phone in two different scenarios. In the first scenario, called Food in context, we exploit a user's GPS information to determine which restaurant they are having a meal at, thereby restricting the categories to recognize to the set of items on the menu. This context also allows us to report precise calorie information to the user about their meal, since restaurant chains tend to standardize portions and provide the dietary information of each dish. In the second scenario, called Food “in the wild”, we try to recognize a cooked meal from a picture that could be snapped anywhere. We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random walk based smoothing procedure, which further exploits the rich semantic information. We perform extensive experiments on food recognition in both scenarios, demonstrating the feasibility of our approach at scale.
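The random walk refinement can be pictured as label propagation over a class-affinity graph: CNN scores diffuse toward semantically related classes while staying anchored to the original prediction. The sketch below is a generic illustration of this idea, not the paper's exact update rule; the function name, the affinity matrix `S`, and the damping factor `alpha` are assumptions for illustration.

```python
import numpy as np

def random_walk_smoothing(p0, S, alpha=0.5, n_iter=20):
    """Refine CNN class scores with a random walk over a class-affinity graph.

    p0    : initial softmax scores over C classes, shape (C,)
    S     : nonnegative semantic affinity between classes, shape (C, C)
    alpha : fraction of mass that follows the walk vs. stays at p0

    Generic label-propagation sketch (assumed form, not the paper's exact one).
    """
    # Row-normalize the affinity matrix into a transition matrix.
    T = S / S.sum(axis=1, keepdims=True)
    p = p0.copy()
    for _ in range(n_iter):
        # Mix the propagated scores with the original CNN prediction.
        p = alpha * (T.T @ p) + (1 - alpha) * p0
    return p / p.sum()

# Toy example: classes 0 and 1 are semantically close (say, two pasta dishes),
# class 2 is unrelated. Smoothing shifts mass toward the related class.
p0 = np.array([0.7, 0.2, 0.1])
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])
smoothed = random_walk_smoothing(p0, S)
```

In this toy run the top prediction is unchanged, but the semantically related class gains score at the expense of the unrelated one, which is the "better mistakes" behavior the smoothing is meant to encourage.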


Publications

Michele Merler, Hui Wu, Rosario Uceda-Sosa, Quoc-Bao Nguyen, John R. Smith. Snap, Eat, RepEat: a food recognition engine for dietary logging. 2nd International Workshop on Multimedia Assisted Dietary Management @ACM Multimedia (MADIMA) 2016. PDF BibTeX Slides Poster


Hui Wu, Michele Merler, Rosario Uceda-Sosa, John R. Smith. Learning to make better mistakes: Semantics-aware visual food recognition. ACM Multimedia (MM) 2016. PDF BibTeX