Fast Local Spatial Verification for Feature-Agnostic Large-Scale Image Retrieval


Images from social media can reflect diverse viewpoints, heated arguments, and expressions of creativity, adding new complexity to retrieval tasks. Researchers working on Content-Based Image Retrieval (CBIR) have traditionally tuned their algorithms to match filtered results with user search intent. However, we are now bombarded with composite images of unknown origin, authenticity, and even meaning. With such uncertainty, users may not have an initial idea of what the search query results should look like. For instance, hidden people, spliced objects, and subtly altered scenes can be difficult for a user to detect initially in a meme image, but may contribute significantly to its composition. It is pertinent to design systems that retrieve images with these nuanced relationships in addition to providing more traditional results, such as duplicates and near-duplicates — and to do so with enough efficiency at large scale. We propose a new approach for spatial verification that aims at modeling object-level regions using image keypoints retrieved from an image index, which is then used to accurately weight small contributing objects within the results, without the need for costly object detection steps. We call this method the Objects in Scene to Objects in Scene (OS2OS) score, and it is optimized for fast matrix operations, which can run quickly on either CPUs or GPUs. It performs comparably to state-of-the-art methods on classic CBIR problems (Oxford 5K, Paris 6K, and Google-Landmarks), and outperforms them in emerging retrieval tasks such as image composite matching in the NIST MFC2018 dataset and meme-style imagery from Reddit.

IEEE Transactions on Image Processing