This post provides a summary of some of the most important overhead imagery datasets for object detection. The aim of this post is to be a living document where I continue to add new datasets as they are released.

Table of Contents

Overhead Imagery Datasets Overview

Dataset Name Total Number of Objects Number of Images Number of Categories Image Size Resolution Annotation Type Source Year Released Restrictions
DIOR 192,472 23,463 20 large TBD Horizontal Bounding Boxes TBD 2020 None
DOTA 188,282 2,806 15 387 X 455 - 4096 X 7168 mostly 20-40cm (see below) Rotated and Horizontal Bounding Boxes Google Earth (mostly) and satellites 2018 Academic purposes only; any commercial use is prohibited
XVIEW 1,000,000+ TBD 60 large 30 cm Horizontal Bounding Boxes WorldView-3 satellites 2018 Non-commercial use
NWPU VHR-10 3,651 800 10 large 8-200cm Horizontal Bounding Boxes Google Earth and Vaihingen dataset 2016 Research purposes only
COWC 32,716 TBD 1 (cars) large 15 cm Center Points TBD 2016 None



DIOR is a huge dataset with ten times the number of images as DOTA, although a similar number of objects. It is the most recent dataset on the list.

Academic paper

Object detection in optical remote sensing images: A survey and a new benchmark


Airplane, Airport, Baseball field, Basketball court, Bridge, Chimney, Dam, Expressway service area, Expressway toll station, Harbor, Golf course, Ground track field, Overpass, Ship, Stadium, Storage tank, Tennis court, Train station, Vehicle, and Windmill


DOTA is a large dataset that combines aerial and satellite imagery. It combines different sensors and platforms.

Academic paper

DOTA: A Large-scale Dataset for Object Detection in Aerial Images


(abbreviations used on leaderboard are shown in parentheses)

Plane, Ship, Storage Tank (ST), Baseball Diamond (BD), Tennis Court (TC), Basketball Court (BC), Ground Track Field (GTF), Harbor, Bridge, Large Vehicle (LV), Small Vehicle (SV), Helicopter (HC), Roundabout (RA), Soccer Ball Field (SBF), Basketball Court


DOTA Leaderboard

GSD Distribution

Here are histograms and boxplots of the ground sample distance for images in the dataset (when provided). Outliers have been excluded from the box plot for clarity.






The xView dataset contains over 1 million objects across 60 classes covering over 1,400 km^2. Objects in xView vary in size from 3 meters (10 pixels) to greater than 3,000 meters (10,000 pixels).

Academic paper

xView: Objects in Context in Overhead Imagery


They use ontological labels, which I like. Although for datasets that don’t, this same idea could be done after the fact.

Aircraft Hangar, Barge, Building, Bus, Cargo Truck, Cargo/container Car, Cement Mixer, Construction Site, Container Crane, Container Ship, Crane Truck, Damaged/demolished Building, Dump Truck, Engineering Vehicle, Excavator, Facility, Ferry, Fishing Vessel, Fixed-wing Aircraft, Flat Car, Front Loader/bulldozer, Ground Grader, Haul Truck, Helicopter, Helipad, Hut/tent, Locomotive, Maritime Vessel, Mobile Crane, Motorboat, Oil Tanker, Passenger Vehicle, Passenger Car, Passenger/cargo Plane, Pickup Truck, Pylon, Railway Vehicle, Reach Stacker, Sailboat, Shed, Shipping Container, Shipping Container Lot, Small Aircraft, Small Car, Storage Tank, Straddle Carrier, Tank Car, Tower, Tower Crane, Tractor, Trailer, Truck, Truck Tractor, Truck Tractor W/ Box Trailer, Truck Tractor W/ Flatbed Trailer, Truck Tractor W/ Liquid Tank, Tugboat, Utility Truck, Vehicle Lot, Yacht



Label Noise

They do three stages of quality control, including a mix of manual and automated checks and a comparison with a gold standard (hand-annotated by experts) dataset. In order to pass expert quality control, the batch was required to have a precision of 0.75 and recall of 0.95 at 0.5 intersection over union (IoU) when compared to the gold standard.

However, despite these efforts the xView dataset still has considerable noise. I think that part of the problem is that a 0.75 precision requirement for ground truth data isn’t very high. The winning solution on the xView dataset challenge noted that using focal loss became problematic because “these exponentially higher weights lead to an extreme effect of hard and mislabeled samples”. It seemed like part of their solution was of a loss function that worked well for messy imagery. Other researchers have noted that the mislabeled data affected their model performance as well.


This dataset does well for geographic diversity.


The images in this dataset, like most satellite images, were preprocessed by performing orthorectification, pan-sharpening, and atmospheric correction.

This dataset was released under a noncommercial license. See the xView dataset rules for more information.


Northwestern Polytechnical University Very High Resolution-10

Academic papers

Multi-class geospatial object detection and geographic image classification based on collection of part detectors (Paywall)

A survey on object detection in optical remote sensing images (Paywall)

Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images (Paywall)


Airplane, Ship, Storage Tank, Baseball Diamond, Tennis Court, Basketball Court, Ground Track Field, Harbor, Bridge, Vehicle

Label Noise

According to the website this dataset was manually annotated by experts, so the noise should be low.


150 of the 800 images are background only (no objects).

These images are from Google Earth and Vaihingen data set. The Vaihingen data was provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).


Cars Overhead With Context

Academic paper

A Large Contextual Dataset for Classification, Detection and Counting of Cars with Deep Learning




Data is from six different locations:

  • Toronto, Canada
  • Selwyn, New Zealand
  • Potsdam, Germany
  • Vaihingen, Germany
  • Columbus, Ohio
  • Utah

The imagery from Vaihingen, Germany and Columbus, Ohio is in grayscale.