This post summarizes some of the most important overhead imagery datasets for object detection. I intend it to be a living document and will add new datasets as they are released.
Table of contents
- Overhead Imagery Datasets Overview
Overhead Imagery Datasets Overview
|Dataset Name|Total Number of Objects|Number of Images|Number of Categories|Image Size|Resolution|Annotation Type|Source|Year Released|Restrictions|
|---|---|---|---|---|---|---|---|---|---|
|DIOR|192,472|23,463|20|large|TBD|Horizontal Bounding Boxes|TBD|2020|None|
|DOTA|188,282|2,806|15|387 X 455 - 4096 X 7168|mostly 20-40cm (see below)|Rotated and Horizontal Bounding Boxes|Google Earth (mostly) and satellites|2018|Academic purposes only; any commercial use is prohibited|
|XVIEW|1,000,000+|TBD|60|large|30 cm|Horizontal Bounding Boxes|WorldView-3 satellites|2018|Non-commercial use|
|NWPU VHR-10|3,651|800|10|large|8-200cm|Horizontal Bounding Boxes|Google Earth and Vaihingen dataset|2016|Research purposes only|
|COWC|32,716|TBD|1 (cars)|large|15 cm|Center Points|TBD|2016|None|
DIOR is a huge dataset with roughly eight times as many images as DOTA (23,463 versus 2,806) but a similar number of objects. It is the most recent dataset on the list.
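To make the DIOR and DOTA comparison concrete, here is a small sketch that works out the image ratio and the average number of objects per image. The counts are copied from the overview table above; the variable and function names are my own.

```python
# Counts copied from the overview table in this post.
datasets = {
    "DIOR": {"objects": 192_472, "images": 23_463},
    "DOTA": {"objects": 188_282, "images": 2_806},
}


def objects_per_image(name):
    """Average number of annotated objects per image for one dataset."""
    d = datasets[name]
    return d["objects"] / d["images"]


image_ratio = datasets["DIOR"]["images"] / datasets["DOTA"]["images"]
dior_density = objects_per_image("DIOR")
dota_density = objects_per_image("DOTA")

print(f"DIOR has {image_ratio:.1f}x as many images as DOTA")
print(f"DIOR: {dior_density:.1f} objects per image")
print(f"DOTA: {dota_density:.1f} objects per image")
```

With these counts, DOTA images are far more densely annotated on average (about 67 objects per image versus about 8 for DIOR), which is worth keeping in mind when choosing tile sizes or batch sizes.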
Airplane, Airport, Baseball field, Basketball court, Bridge, Chimney, Dam, Expressway service area, Expressway toll station, Harbor, Golf course, Ground track field, Overpass, Ship, Stadium, Storage tank, Tennis court, Train station, Vehicle, and Windmill
DOTA is a large dataset that combines aerial and satellite imagery from a variety of sensors and platforms.
(abbreviations used on leaderboard are shown in parentheses)
Plane, Ship, Storage Tank (ST), Baseball Diamond (BD), Tennis Court (TC), Basketball Court (BC), Ground Track Field (GTF), Harbor, Bridge, Large Vehicle (LV), Small Vehicle (SV), Helicopter (HC), Roundabout (RA), Soccer Ball Field (SBF), Swimming Pool (SP)
Here are histograms and boxplots of the ground sample distance for images in the dataset (when provided). Outliers have been excluded from the box plot for clarity.
The xView dataset contains over 1 million objects across 60 classes covering over 1,400 km^2. Objects in xView vary in size from 3 meters (10 pixels) to greater than 3,000 meters (10,000 pixels).
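The pixel sizes quoted above follow directly from the ground sample distance. As a quick sanity check, here is a minimal sketch of the conversion, assuming the 30 cm resolution listed for xView in the table; `meters_to_pixels` is a helper name I made up for illustration.

```python
def meters_to_pixels(size_m, gsd_cm):
    """Convert a ground distance in meters to pixels for a given
    ground sample distance (GSD) in centimeters per pixel."""
    return size_m * 100 / gsd_cm


XVIEW_GSD_CM = 30  # resolution listed for xView in the overview table

smallest = meters_to_pixels(3, XVIEW_GSD_CM)
largest = meters_to_pixels(3000, XVIEW_GSD_CM)

print(f"3 m object     -> {smallest:.0f} pixels")
print(f"3,000 m object -> {largest:.0f} pixels")
```

The same helper can be used to compare object sizes across datasets with different resolutions, for example the 15 cm imagery in COWC.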
They use ontological (hierarchical) labels, which I like. For datasets that don’t, the same idea could be applied after the fact by grouping the existing classes into parent categories.
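Here is a rough sketch of what adding a hierarchy after the fact could look like. The parent groups below are my own illustrative grouping of a few DOTA categories, not an official ontology from any of these datasets.

```python
from collections import Counter

# Hypothetical parent groups for a few DOTA categories (illustration only).
PARENT = {
    "Large Vehicle": "Vehicle",
    "Small Vehicle": "Vehicle",
    "Plane": "Aircraft",
    "Helicopter": "Aircraft",
}


def to_parent(label):
    """Map a fine-grained label to its parent; labels without a parent
    in the mapping are returned unchanged."""
    return PARENT.get(label, label)


labels = ["Small Vehicle", "Large Vehicle", "Plane", "Ship", "Small Vehicle"]
coarse_counts = Counter(to_parent(label) for label in labels)

print(coarse_counts)
```

Keeping the mapping in a plain dictionary means the original fine-grained labels stay untouched, so a model can be evaluated at either level.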
Aircraft Hangar, Barge, Building, Bus, Cargo Truck, Cargo/container Car, Cement Mixer, Construction Site, Container Crane, Container Ship, Crane Truck, Damaged/demolished Building, Dump Truck, Engineering Vehicle, Excavator, Facility, Ferry, Fishing Vessel, Fixed-wing Aircraft, Flat Car, Front Loader/bulldozer, Ground Grader, Haul Truck, Helicopter, Helipad, Hut/tent, Locomotive, Maritime Vessel, Mobile Crane, Motorboat, Oil Tanker, Passenger Vehicle, Passenger Car, Passenger/cargo Plane, Pickup Truck, Pylon, Railway Vehicle, Reach Stacker, Sailboat, Shed, Shipping Container, Shipping Container Lot, Small Aircraft, Small Car, Storage Tank, Straddle Carrier, Tank Car, Tower, Tower Crane, Tractor, Trailer, Truck, Truck Tractor, Truck Tractor W/ Box Trailer, Truck Tractor W/ Flatbed Trailer, Truck Tractor W/ Liquid Tank, Tugboat, Utility Truck, Vehicle Lot, Yacht
They used three stages of quality control, including a mix of manual and automated checks and a comparison with a gold standard (hand-annotated by experts) dataset. To pass expert quality control, a batch was required to have a precision of 0.75 and a recall of 0.95 at 0.5 intersection over union (IoU) when compared to the gold standard.
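To give a feel for what that check involves, here is a minimal sketch of computing precision and recall at an IoU threshold for horizontal boxes. This is not xView's actual quality control code. The greedy one-to-one matching, the decision to ignore class labels, and the reading of 0.75 and 0.95 as minimum values are all my assumptions.

```python
def iou(a, b):
    """IoU of two horizontal boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(candidate, gold, iou_threshold=0.5):
    """Greedily match each candidate box to its best unmatched gold box.
    A match with IoU at or above the threshold counts as a true positive."""
    unmatched = list(gold)
    true_positives = 0
    for box in candidate:
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(candidate) if candidate else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy batch: two good annotations and one spurious box.
gold = [(0, 0, 10, 10), (20, 20, 30, 30)]
candidate = [(1, 1, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]

precision, recall = precision_recall(candidate, gold)
passes = precision >= 0.75 and recall >= 0.95
print(f"precision={precision:.2f} recall={recall:.2f} passes={passes}")
```

In this toy batch every gold object is found, so recall is perfect, but one spurious box out of three is enough to push precision below 0.75.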
However, despite these efforts, the xView dataset still has considerable noise. I think part of the problem is that a precision requirement of 0.75 isn’t very high for ground truth data. The winning solution on the xView dataset challenge noted that using focal loss became problematic because “these exponentially higher weights lead to an extreme effect of hard and mislabeled samples”. It seemed like part of their solution was finding a loss function that worked well with noisy data. Other researchers have noted that the mislabeled data affected their model performance as well.
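To illustrate why focal loss can struggle with label noise, here is a small sketch using the usual focal loss formulation, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the labeled class. The value gamma = 2 and the example probabilities are my assumptions, and this is not the winning team's code.

```python
import math


def cross_entropy(p_t):
    """Standard cross-entropy for the probability given to the labeled class."""
    return -math.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma, which
    down-weights easy samples and keeps hard ones near full weight."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)


easy = 0.9         # correctly labeled sample the model is confident about
mislabeled = 0.01  # wrong label, so the model gives it a very low probability

for name, p_t in [("easy", easy), ("mislabeled", mislabeled)]:
    print(f"{name}: CE={cross_entropy(p_t):.4f} FL={focal_loss(p_t):.4f}")

ce_ratio = cross_entropy(mislabeled) / cross_entropy(easy)
fl_ratio = focal_loss(mislabeled) / focal_loss(easy)
print(f"mislabeled/easy loss ratio: CE={ce_ratio:.0f}x, FL={fl_ratio:.0f}x")
```

A mislabeled sample looks exactly like a very hard sample to the loss function, so focal loss gives it far more relative weight than plain cross-entropy does.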
This dataset has good geographic diversity.
The images in this dataset, like most satellite images, were preprocessed by performing orthorectification, pan-sharpening, and atmospheric correction.
This dataset was released under a noncommercial license. See the xView dataset rules for more information.
Northwestern Polytechnical University Very High Resolution-10
Airplane, Ship, Storage Tank, Baseball Diamond, Tennis Court, Basketball Court, Ground Track Field, Harbor, Bridge, Vehicle
According to the website, this dataset was manually annotated by experts, so the label noise should be low.
150 of the 800 images are background only (no objects).
These images are from Google Earth and the Vaihingen dataset. The Vaihingen data was provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF).
Cars Overhead With Context
Data is from six different locations:
- Toronto, Canada
- Selwyn, New Zealand
- Potsdam, Germany
- Vaihingen, Germany
- Columbus, Ohio
- Utah
The imagery from Vaihingen, Germany and Columbus, Ohio is in grayscale.