
Imagine you have two pictures of the same scene taken from different angles. Most of the objects in both images are the same; you’re just seeing them from different viewpoints. In computer vision, objects are assumed to have distinctive features such as edges and corners. Matching these features correctly is critical for some applications. But what would it take to match features between two images?
Finding correspondences between images is a prerequisite for estimating 3D structure and camera poses in computer vision tasks such as simultaneous localization and mapping (SLAM) and structure from motion (SfM). This is done by matching local features, which is difficult because of changes in lighting conditions, occlusion, blur, and similar effects.
Traditionally, feature matching is done using a two-step approach. First, the front-end step extracts visual features from the images. Second, the back-end step applies bundle adjustment and pose estimation to help match the extracted visual features. Once this is done, the features are ready, and feature matching is modeled as a linear assignment problem.
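To make the linear assignment formulation concrete, here is a minimal sketch in plain Python. The cost matrix is made-up descriptor distances, and `best_assignment` is a hypothetical helper that tries every one-to-one pairing by brute force. Real pipelines would use a polynomial-time solver such as the Hungarian algorithm (for example SciPy's `linear_sum_assignment`) instead.

```python
from itertools import permutations

def best_assignment(cost):
    """Return the one-to-one assignment with the lowest total cost.

    cost[i][j] is the cost of matching feature i in image A to
    feature j in image B. Brute force: only viable for tiny inputs.
    """
    n = len(cost)
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_cost:
            best_perm, best_cost = perm, total
    return list(best_perm), best_cost

# Hypothetical descriptor distances (lower means more similar).
# Features 0 and 1 of image A both prefer feature 0 of image B,
# so a per-row nearest-neighbour search would produce a conflict.
cost = [
    [0.10, 0.20, 0.90],
    [0.15, 0.80, 0.85],
    [0.70, 0.60, 0.30],
]

assignment, total = best_assignment(cost)
print(assignment, round(total, 2))
```

The one-to-one constraint is what resolves the conflict: feature 0 gives up its nearest neighbour so that the total cost over all pairs is minimal.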
As in many other areas of computer vision, deep neural networks have played a crucial role in feature matching in recent years. They have been used to learn better sparse detectors and local descriptors from data using convolutional neural networks (CNNs).
However, these networks usually addressed only part of the feature matching problem rather than offering a complete solution. What if a single neural network could perform context aggregation, matching, and filtering in a single architecture? Time to introduce SuperGlue.
SuperGlue approaches the feature matching problem in a different way. It learns the matching process from pre-existing local features using a graph neural network structure. This replaces existing approaches that first learn task-agnostic features and then match them using heuristics and simple methods. Being an end-to-end approach gives SuperGlue a strong advantage over existing methods. SuperGlue is a learnable middle-end that can be used to improve existing pipelines.
So how does SuperGlue accomplish this? It takes a fresh perspective and treats feature matching as a partial assignment between two sets of local features. Instead of solving a linear assignment problem to match the features, it treats matching as an optimal transport problem. SuperGlue uses a graph neural network (GNN) that predicts the cost function of this transport optimization.
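The optimal transport view can be sketched in a few lines of plain Python. The SuperGlue paper solves the problem with Sinkhorn iterations and adds a "dustbin" row and column that absorb keypoints with no match; the sketch below follows that idea under simplifying assumptions. The score matrix is made up (in SuperGlue it comes from the GNN), the dustbin score is a fixed hypothetical value rather than a learned parameter, and the function names are illustrative only.

```python
import math

def add_dustbins(scores, dustbin_score):
    """Append one extra row and column that absorb unmatched keypoints."""
    n_cols = len(scores[0])
    augmented = [row + [dustbin_score] for row in scores]
    augmented.append([dustbin_score] * (n_cols + 1))
    return augmented

def sinkhorn(scores, row_mass, col_mass, n_iters=100):
    """Alternately rescale rows and columns until both marginals are met.

    Works directly on exp(scores); a production version would stay in
    the log domain to avoid overflow with large scores.
    """
    kernel = [[math.exp(s) for s in row] for row in scores]
    n_rows, n_cols = len(row_mass), len(col_mass)
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iters):
        u = [row_mass[i] / sum(kernel[i][j] * v[j] for j in range(n_cols))
             for i in range(n_rows)]
        v = [col_mass[j] / sum(kernel[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(n_cols)]
            for i in range(n_rows)]

# Hypothetical similarity scores: 2 keypoints in image A, 3 in image B.
# Keypoint 1 of image B has no good partner in image A.
scores = [
    [5.0, 0.0, 0.0],
    [0.0, 0.0, 5.0],
]
m, n = len(scores), len(scores[0])

# Each real keypoint carries mass 1; each dustbin can absorb every
# keypoint of the other image.
row_mass = [1.0] * m + [float(n)]
col_mass = [1.0] * n + [float(m)]
plan = sinkhorn(add_dustbins(scores, 1.0), row_mass, col_mass)

# Keep a match only if a real keypoint, not the dustbin, wins the row.
matches = {}
for i in range(m):
    j = max(range(n + 1), key=lambda c: plan[i][c])
    if j < n:
        matches[i] = j
print(matches)
```

Because the dustbins soak up leftover mass, the assignment is partial: the unmatched keypoint in image B ends up assigned to the dustbin row instead of being forced onto a poor partner.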
We all know how transformers have made huge strides in natural language processing and, more recently, in computer vision tasks. SuperGlue borrows the Transformer's attention mechanism to take advantage of both the spatial relationships between keypoints and their visual appearance.
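As a rough illustration of how attention can combine position and appearance, the sketch below runs single-head scaled dot-product attention over toy keypoints in plain Python. It is not SuperGlue's actual network: the keypoints are made up, position and descriptor are simply concatenated (the paper encodes positions with a small network first), there are no learned projections, and `encode` and `attend` are hypothetical helper names.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention.

    Each query receives a weighted average of the values, weighted by
    how similar the query is to the corresponding key.
    """
    dim = len(keys[0])
    out_dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(out_dim)])
    return outputs

def encode(keypoints):
    """Assumption: concatenate (x, y) position with the descriptor."""
    return [list(xy) + list(desc) for xy, desc in keypoints]

# Toy keypoints: ((x, y), descriptor) with made-up values.
image_a = [((0.1, 0.2), (1.0, 0.0)), ((0.8, 0.9), (0.0, 1.0))]
image_b = [((0.2, 0.2), (0.9, 0.1)), ((0.7, 0.8), (0.1, 0.9))]

feats_a, feats_b = encode(image_a), encode(image_b)

# Self-attention: keypoints in image A look at each other.
self_context = attend(feats_a, feats_a, feats_a)
# Cross-attention: keypoints in image A look at keypoints in image B.
cross_context = attend(feats_a, feats_b, feats_b)
print(len(self_context), len(cross_context[0]))
```

Self-attention lets a keypoint gather context from its own image, while cross-attention lets it look for similar keypoints in the other image; alternating the two is the pattern the paper describes.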
SuperGlue is trained end-to-end, using image pairs as training data. Priors for pose estimation are learned from a large annotated dataset, which lets SuperGlue reason about the underlying 3D scene.
SuperGlue can be applied to a range of multi-view geometry problems that require high-quality feature matching. It runs in real time on commodity hardware and can be used with both classical and learned features. You can find more information about SuperGlue at the links below.
Check out the paper, project, and code. All credit for this research goes to the researchers of this project. Also, don’t forget to join our Reddit page and Discord channel, where we share the latest AI research news, great AI projects, and more.
Ekrem Çetinkaya received his B.A. in 2018 and M.Sc. in 2019 from Ozyegin University in Istanbul, Turkey. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. degree at the University of Klagenfurt in Austria and works as a researcher in the ATHENA project. His research interests include deep learning, computer vision, and multimedia networks.