STHN: Deep Homography Estimation for UAV Thermal Geo-localization with Satellite Imagery

IEEE Robotics and Automation Letters (RA-L) 2024

¹New York University, ²Technology Innovation Institute
*Equal Contribution

TL;DR

STHN introduces a coarse-to-fine deep homography estimation approach for UAV thermal geo-localization, enabling accurate alignment between thermal and satellite imagery even with significant appearance differences and geometric noise.

Abstract

Accurate geo-localization of Unmanned Aerial Vehicles (UAVs) is crucial for outdoor applications including search and rescue operations, power line inspections, and environmental monitoring. The vulnerability of Global Navigation Satellite Systems (GNSS) signals to interference and spoofing necessitates the development of additional robust localization methods for autonomous navigation. Visual Geo-localization (VG), leveraging onboard cameras and reference satellite maps, offers a promising solution for absolute localization. Specifically, Thermal Geo-localization (TG), which relies on image-based matching between thermal imagery and satellite databases, stands out by utilizing infrared cameras for effective nighttime localization. However, the efficiency and effectiveness of current TG approaches are hindered by dense sampling on satellite maps and geometric noise in thermal query images. To overcome these challenges, we introduce STHN, a novel UAV thermal geo-localization approach that employs a coarse-to-fine deep homography estimation method. This method attains reliable thermal geo-localization within a 512-meter radius of the UAV's last known location even with a challenging 11% size ratio between thermal and satellite images, despite the presence of indistinct textures and self-similar patterns. We further show how our research significantly enhances UAV thermal geo-localization performance and robustness against geometric noise under low-visibility conditions in the wild. The code is made publicly available.

Method Overview

STHN uses a coarse-to-fine deep homography estimation pipeline with three main components: a Thermal Generative Module (TGM), a coarse alignment module, and a refinement module. The coarse alignment module first estimates a rough homography between resized satellite and thermal images. The refinement module then crops the aligned region and performs fine-grained alignment. A two-stage training strategy with bounding box augmentation ensures robust refinement.
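The coordinate bookkeeping behind a coarse-to-fine pipeline of this kind can be sketched as follows. This is a minimal illustration, not the released STHN code: the image sizes, the noise added to the coarse estimate, and the helper names (`homography_from_points`, `crop_transform`) are all assumptions, and the refinement network is replaced by an oracle so that only the geometry is shown.

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct Linear Transform: 3x3 H such that dst ~ H @ src for four point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)  # null vector of the 8x9 system
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map an (N, 2) array of points through the homography h."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homo[:, :2] / homo[:, 2:3]

def crop_transform(corners, out_size, margin=1.25):
    """Scale-and-shift mapping a square box around `corners` onto an out_size crop."""
    corners = np.asarray(corners, dtype=float)
    lo, hi = corners.min(axis=0), corners.max(axis=0)
    side = margin * (hi - lo).max()
    x0, y0 = (lo + hi) / 2.0 - side / 2.0
    s = out_size / side
    return np.array([[s, 0, -s * x0], [0, s, -s * y0], [0, 0, 1.0]])

# Hypothetical sizes in pixels: thermal width, satellite width, resized width.
W_T, W_S, W_R = 512, 1536, 256
thermal_corners = np.array([[0, 0], [W_T, 0], [W_T, W_T], [0, W_T]], dtype=float)

# Made-up ground-truth footprint of the thermal image in the satellite image.
gt_corners = np.array([[600, 500], [1090, 520], [1070, 1010], [580, 990]], dtype=float)

# Stage 1 (coarse): corners estimated on the resized satellite image,
# simulated here as ground truth plus a few pixels of error, then scaled back.
coarse_resized = gt_corners * (W_R / W_S) + np.array([[3, -2], [-2, 2], [2, 3], [-3, -1]])
coarse_corners = coarse_resized * (W_S / W_R)
h_coarse = homography_from_points(thermal_corners, coarse_corners)

# Stage 2 (refinement): crop around the coarse box and estimate corners in
# crop coordinates. An oracle stands in for the refinement network.
t_crop = crop_transform(coarse_corners, out_size=W_R)
refined_in_crop = apply_homography(t_crop, gt_corners)
h_refine = homography_from_points(thermal_corners, refined_in_crop)

# Undo the crop transform to express the result in satellite coordinates.
h_final = np.linalg.inv(t_crop) @ h_refine
final_corners = apply_homography(h_final, thermal_corners)

coarse_error = np.linalg.norm(coarse_corners - gt_corners, axis=1).mean()
final_error = np.linalg.norm(final_corners - gt_corners, axis=1).mean()
print(f"coarse corner error: {coarse_error:.2f} px, final: {final_error:.2e} px")
```

The point of the sketch is that a small error on the resized image grows when scaled back to full resolution, which is why a second pass on a crop at higher effective resolution can help.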

Figure: Framework overview. The coarse alignment module estimates a rough homography, then the refinement module crops and refines the aligned region for precise localization.

Quantitative Results

Test Mean Average Corner Error (MACE, in meters) of different homography estimation methods across search distances DC. WS is the satellite image size. Lower is better.

| Method | DC=50m | DC=64m | DC=128m | DC=256m | DC=512m | Failure Rate |
|---|---|---|---|---|---|---|
| SIFT + RANSAC | 442.20 | 654.77 | 547.29 | 529.63 | 1650.46 | 99.6% |
| SIFT + MAGSAC++ | 512.60 | 438.54 | 529.46 | 561.64 | 693.03 | 99.7% |
| ORB + RANSAC | 720.80 | 733.69 | 733.94 | 4614.84 | 975.83 | 82.6% |
| LoFTR + RANSAC | 1123.74 | 1697.33 | 1317.69 | 1269.71 | 2564.65 | 0% |
| DHN | 16.78 | 20.43 | 77.68 | 197.27 | 457.23 | 0% |
| IHN | 5.91 | 7.81 | 51.74 | 190.93 | 367.24 | 0% |
| Ours (WS=512) | 4.24 | 4.93 | 14.97 | 142.71 | 347.50 | 0% |
| Ours (WS=1024) | 4.92 | 5.31 | 6.03 | 9.22 | 86.74 | 0% |
| Ours (WS=1536) | 6.50 | 7.04 | 7.27 | 16.78 | 16.42 | 0% |
| Ours (two-stage) | 7.51 | 7.20 | 7.51 | 14.99 | 12.70 | 0% |

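Best result per column: 4.24 (DC=50m) and 4.93 (DC=64m) with WS=512; 6.03 (DC=128m) and 9.22 (DC=256m) with WS=1024; 12.70 (DC=512m) with the two-stage model. Second best: 4.92 and 5.31 (WS=1024), 7.27 (WS=1536), 14.99 (two-stage), and 16.42 (WS=1536), respectively. Rows labeled "Ours" are our method.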
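For readers unfamiliar with the metric, MACE can be computed as sketched below. The function name `mace` and the toy corner values are illustrative assumptions, and the corners are assumed to be already expressed in meters, so this is not the evaluation script behind the numbers above.

```python
import numpy as np

def mace(pred_corners, gt_corners):
    """Mean Average Corner Error.

    Both inputs have shape (N, 4, 2): N samples, four corners, (x, y).
    The Euclidean error is averaged over the four corners of each sample,
    then over all samples.
    """
    pred = np.asarray(pred_corners, dtype=float)
    gt = np.asarray(gt_corners, dtype=float)
    per_corner = np.linalg.norm(pred - gt, axis=-1)  # shape (N, 4)
    return float(per_corner.mean(axis=-1).mean())

# Toy data: a 512 m square footprint, assumed to be in meters.
square = np.array([[0, 0], [512, 0], [512, 512], [0, 512]], dtype=float)
gt = np.stack([square, square])
pred = np.stack([square + np.array([3.0, 4.0]),  # every corner off by 5 m
                 square])                        # perfect prediction
print(mace(pred, gt))  # 2.5
```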

Comparison with image-based matching methods at DC=512m.

| Method | Test CE (m) | Latency (ms) |
|---|---|---|
| AnyLoc-VLAD-DINOv2 | 258.21 | 352,404 |
| STGL-NetVLAD-ResNet50 | 89.31 | 7,180 |
| STGL-GeM-ResNet50 | 13.52 | 4,919 |
| Ours (one-stage) | 15.90 | 35.2 |
| Ours (two-stage) | 12.12 | 63.9 |
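The accuracy and latency trade-off in this comparison is easier to read as ratios. The short script below only rearranges the numbers from the table; the helper names `speedup` and `ce_change` are illustrative.

```python
# Rows copied from the comparison table: method -> (test CE in m, latency in ms).
results = {
    "AnyLoc-VLAD-DINOv2": (258.21, 352404.0),
    "STGL-NetVLAD-ResNet50": (89.31, 7180.0),
    "STGL-GeM-ResNet50": (13.52, 4919.0),
    "Ours (one-stage)": (15.90, 35.2),
    "Ours (two-stage)": (12.12, 63.9),
}

def speedup(baseline, ours):
    """How many times lower the latency of `ours` is than that of `baseline`."""
    return results[baseline][1] / results[ours][1]

def ce_change(baseline, ours):
    """CE of `ours` minus CE of `baseline`, in meters (negative means better)."""
    return results[ours][0] - results[baseline][0]

best_baseline = "STGL-GeM-ResNet50"
for ours in ("Ours (one-stage)", "Ours (two-stage)"):
    print(f"{ours} vs {best_baseline}: "
          f"{speedup(best_baseline, ours):.1f}x faster, "
          f"CE change {ce_change(best_baseline, ours):+.2f} m")
```

Against the strongest image-based baseline, the two-stage model is roughly 77 times faster with a 1.40 m lower CE, while the one-stage model is roughly 140 times faster at the cost of a 2.38 m higher CE.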

Ablation Study

We investigate the effects of the Thermal Generative Module (TGM) and the relationship between satellite image size WS and search distance DC on alignment accuracy.

Figure: Effectiveness of TGM. Validation MACE across different DC with and without TGM when WS=512.
Figure: Coarse alignment analysis. Validation MACE vs. WS across various DC.
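One way to relate WS to the 11% size ratio quoted in the abstract is to read it as an area ratio between a square thermal image and a square satellite image. The thermal width of 512 px used below is an assumption (this page does not state it), chosen because it reproduces the 11% figure at WS=1536.

```python
def area_ratio(thermal_width, satellite_width):
    """Fraction of the satellite image area covered by the thermal image."""
    return (thermal_width / satellite_width) ** 2

W_T = 512  # assumed thermal image width in pixels; not stated on this page
for w_s in (512, 1024, 1536):
    print(f"WS={w_s}: thermal/satellite area ratio = {area_ratio(W_T, w_s):.1%}")
```

Under this assumption, larger WS covers a wider search area but leaves the thermal image occupying a smaller fraction of the satellite image, which is one plausible reason the best WS in the results table shifts upward as DC grows.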

Robustness to Geometric Noise

Our two-stage method maintains accurate localization under rotation, resizing, and perspective transformation noise. Green = Ground Truth, Blue = Coarse Alignment, Red = Final Prediction.
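The three noise types can each be written as a 3x3 homography acting about the image center. The sketch below is a generic illustration of such transforms, with made-up function names and parameter values; it is not the augmentation code used in the experiments.

```python
import numpy as np

def about_center(m, width):
    """Conjugate a 3x3 transform so it acts about the image center."""
    c = width / 2.0
    to_origin = np.array([[1, 0, -c], [0, 1, -c], [0, 0, 1.0]])
    back = np.array([[1, 0, c], [0, 1, c], [0, 0, 1.0]])
    return back @ m @ to_origin

def rotation_noise(deg, width):
    """In-plane rotation by `deg` degrees about the image center."""
    t = np.deg2rad(deg)
    r = np.array([[np.cos(t), -np.sin(t), 0],
                  [np.sin(t), np.cos(t), 0],
                  [0, 0, 1.0]])
    return about_center(r, width)

def resize_noise(scale, width):
    """Uniform scaling by `scale` about the image center."""
    return about_center(np.diag([scale, scale, 1.0]), width)

def perspective_noise(px, py, width):
    """Projective distortion; px and py fill the last row of the homography."""
    p = np.array([[1, 0, 0], [0, 1, 0], [px, py, 1.0]])
    return about_center(p, width)

def apply_homography(h, pts):
    """Map an (N, 2) array of points through the homography h."""
    pts = np.asarray(pts, dtype=float)
    homo = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return homo[:, :2] / homo[:, 2:3]

W = 512  # illustrative image width in pixels
corners = np.array([[0, 0], [W, 0], [W, W], [0, W]], dtype=float)
center = np.array([[W / 2.0, W / 2.0]])

rotated = apply_homography(rotation_noise(90, W), corners)
resized = apply_homography(resize_noise(0.5, W), corners)
skewed = apply_homography(perspective_noise(2e-4, -1e-4, W), corners)
print(np.round(rotated, 3))
print(np.round(resized, 3))
print(np.round(skewed, 3))
```

Rotation and resizing keep the footprint square, while the perspective term turns it into a general quadrilateral, so a four-corner homography parameterization can represent all three cases.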

Figure: Rotation noise. One-stage and two-stage results.
Figure: Resizing noise. One-stage and two-stage results.
Figure: Perspective noise. One-stage and two-stage results.

BibTeX


@ARTICLE{xiao2024sthn,
  author={Xiao, Jiuhong and Zhang, Ning and Tortei, Daniel and Loianno, Giuseppe},
  journal={IEEE Robotics and Automation Letters},
  title={STHN: Deep Homography Estimation for UAV Thermal Geo-Localization With Satellite Imagery},
  year={2024},
  volume={9},
  number={10},
  pages={8754-8761},
  doi={10.1109/LRA.2024.3448129}
}