SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution

Community Article · Published December 1, 2025

Authors: Solène Debuysère1, Nicolas Trouvé1, Nathan Letheule1, Elise Colin1, Georgia Channing2

Affiliations:
1 ONERA – The French Aerospace Lab
2 Hugging Face

Satellite imagery has transformed the way we observe our planet. Most of the time, these images come from optical sensors, which capture the world in visible light, just like our eyes. But there is another way to observe the planet: Synthetic Aperture Radar (SAR). SAR uses microwaves instead of visible light and can capture images at any time of day, even through clouds or bad weather.

We curated raw Umbra SAR acquisitions to create the SARLO-80 (Slant SAR Language Optic, 80 cm) dataset, a structured, high-resolution multimodal resource optimized for AI and machine learning applications. By pairing SAR imagery with geometrically aligned optical data and natural-language descriptions, it bridges radar and vision–language domains.

Before outlining the processing steps, it’s helpful to briefly recall how SAR differs from conventional optical sensing.

Dataset repo: ONERA/SARLO-80

Optics vs Radar: Two Different Views of Earth

Optical and radar imaging provide two fundamentally different ways of observing the Earth’s surface. While optical imagery resembles natural photographs formed by visible light, Synthetic Aperture Radar (SAR) imagery is constructed from microwave echoes that interact with the physical and electromagnetic properties of the terrain. This difference affects every aspect of image acquisition, resolution, geometry, and interpretation.

1. Active and Passive Sensing

Unlike optical sensors that depend on sunlight and clear skies, SAR actively emits microwaves and can image the Earth even through clouds — a key advantage when over 60% of the planet is covered by clouds at any given time.

Figure 1: Example image from Capella's Sequoia satellite over Brazil, showing how high-resolution SAR (left) provides a clear view of deforestation even when clouds obscure the optical image (right).

2. Image Formation Principles

An optical image is a direct projection of light through a lens onto a sensor array. Radar imagery, by contrast, is reconstructed computationally from a sequence of radar echoes collected as the satellite moves along its orbit. By combining measurements over time, the system synthesizes a large “virtual” antenna — the synthetic aperture — which enables fine spatial resolution (see Figure 3).

In optical systems, spatial resolution depends primarily on the aperture size of the lens. In radar systems, it depends instead on signal frequency, bandwidth, and the distance traveled by the sensor during data acquisition. This distinction allows SAR satellites to achieve high resolution even with relatively compact antennas. In the image, this resolution manifests as the size of the bright points: each point spans roughly the smallest feature the radar can distinguish.
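As a back-of-the-envelope illustration (generic formulas with illustrative numbers, not Umbra's actual acquisition parameters):

$$
\delta_r = \frac{c}{2B}, \qquad \delta_{az} \approx \frac{\lambda R}{2L},
$$

where the slant-range resolution $\delta_r$ is set by the transmitted bandwidth $B$, and the azimuth resolution $\delta_{az}$ by the wavelength $\lambda$, the range $R$, and the synthetic aperture length $L$. A bandwidth of about 190 MHz, for instance, gives $\delta_r = (3\times 10^8)/(2 \times 1.9\times 10^8) \approx 0.79$ m, on the order of the 80 cm grid used in this dataset.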

3. Radar Geometry and Distortions

Optical and radar sensors observe the Earth from fundamentally different geometries. Optical systems capture images in a ground-projected plane (green plane in Figure 2), where each pixel corresponds directly to a point on the surface. In contrast, Synthetic Aperture Radar (SAR) acquires data in slant-range geometry (orange plane in Figure 2), measuring distances along the radar's line of sight. To make SAR and optical images geometrically comparable, one must be reprojected into the geometry of the other, or both into a common reference geometry; even then the superposition is only approximate, because the two viewing geometries can never be made strictly identical.

Figure 2: SAR geometry acquisition with slant-range and ground-range planes.

Furthermore, this oblique acquisition causes elevated terrain and tall structures to appear displaced toward the sensor, introducing geometric distortions such as the following (a small numerical sketch of the layover effect follows this list):

  • Layover – Tall structures, such as mountains or buildings, appear to lean toward the radar because their upper parts return signals before their bases.
  • Foreshortening – Slopes facing the radar appear compressed because echoes from their top and bottom arrive almost simultaneously.
  • Shadowing – Areas hidden from the radar beam appear dark or unmeasured.
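
To get a feel for the magnitude of the layover effect, here is a minimal numerical sketch (flat-terrain approximation; `layover_shift` is an illustrative helper, not part of any SAR library):

```python
import math

def layover_shift(height_m: float, incidence_deg: float) -> tuple[float, float]:
    """Approximate displacement of a vertical structure's top relative to
    its base, for an incidence angle measured from vertical. Returns
    (slant-range shift, ground-range shift) in metres; positive values
    mean the top appears displaced toward the sensor."""
    theta = math.radians(incidence_deg)
    slant_shift = height_m * math.cos(theta)      # delta_R ~ h * cos(theta)
    ground_shift = slant_shift / math.sin(theta)  # projected onto the ground plane
    return slant_shift, ground_shift

# A 50 m building imaged at 30 degrees incidence "leans" about 87 m toward
# the radar once projected on the ground; the lean shrinks as the incidence
# angle grows toward grazing.
print(layover_shift(50.0, 30.0))  # -> (~43.3, ~86.6)
```

Across the 10° to 70° incidence range of the Umbra scenes described below, the same building's apparent lean varies by more than an order of magnitude.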

Figure 3: Comparison of optical vs. SAR image formation and distortions.

These effects are inherent to radar imaging and carry useful information about surface topography and orientation.

Figure 4: Example of layover in Copenhagen.

Figure 5: Example of volcano foreshortening.

4. Coherence and Speckle Characteristics

SAR sensors record not only the amplitude of the backscattered signal but also its phase — the precise timing of the returned wave. This property makes radar data coherent, enabling advanced techniques such as polarimetry and interferometry (InSAR).

Coherence also produces a characteristic speckle pattern, visible as granular texture in SAR images. Speckle results from the constructive and destructive interference of radar signals scattered by multiple small targets within a single resolution cell. Although it may resemble noise, speckle is a deterministic phenomenon that contains information about the surface’s physical structure and scattering behavior.
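
This statistical behaviour is easy to reproduce: coherently summing many unit-amplitude scatterers with random phases per resolution cell yields a Rayleigh-distributed amplitude, i.e. an exponentially distributed intensity whose standard deviation equals its mean. A minimal toy simulation (illustrative only, no real SAR data involved):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_speckle(n_cells: int = 100_000, scatterers: int = 50) -> np.ndarray:
    """Coherent sum of many unit-amplitude scatterers with random phases
    in each resolution cell; returns the single-look amplitude per cell."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_cells, scatterers))
    field = np.exp(1j * phases).sum(axis=1)  # complex field of each cell
    return np.abs(field)

intensity = simulate_speckle() ** 2
# Fully developed speckle: intensity is exponentially distributed, so its
# standard deviation over its mean tends to 1 (a 100% "noise" level).
print(intensity.std() / intensity.mean())  # ~1.0
```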

5. Interpretation and Applications

Interpreting SAR imagery requires understanding that brightness corresponds to backscattering intensity rather than optical brightness or color. Highly reflective surfaces (e.g., rough terrain or metallic structures) appear bright, while smooth surfaces (e.g., calm water or flat soil) appear dark (a minimal display-scaling sketch follows the list below). Despite its more abstract appearance, SAR provides unique observational capabilities that complement optical data:

  • Surface deformation monitoring using interferometry
  • Mapping of soil moisture, vegetation, and ice dynamics
  • Detection of infrastructure, ships, and flood extents
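
In practice, the first step toward visual interpretation is usually a logarithmic (dB) rescaling, since backscattered intensity spans several orders of magnitude. A minimal display-scaling sketch, assuming a complex-valued (single-look complex) patch; the clip values are illustrative and conventions vary by product:

```python
import numpy as np

def sar_to_display(slc: np.ndarray, clip_db: tuple[float, float] = (-25.0, 5.0)) -> np.ndarray:
    """Map a complex SAR patch to a [0, 1] image for display:
    intensity -> dB -> median-centred -> clipped and rescaled."""
    db = 10.0 * np.log10(np.abs(slc) ** 2 + 1e-12)  # epsilon avoids log(0)
    db -= np.median(db)                             # crude brightness normalization
    lo, hi = clip_db
    return np.clip((db - lo) / (hi - lo), 0.0, 1.0)
```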

Together, optical and radar observations form a comprehensive view of the Earth — optical systems providing intuitive visual context, and radar systems revealing structural, dynamic, and geophysical properties invisible to the human eye.

Creating the Umbra Dataset

Figure 6: Worldwide map of Umbra data.

Open data source: Umbra Open Data

Although radar offers remarkable sensing capabilities, it remains challenging to process. To make this data more accessible, we curated and transformed the open-source radar imagery collected by the Umbra satellite constellation into a machine-learning–ready format.

We started from around 2,500 Umbra SICD images acquired across the globe. These SAR scenes, captured in complex format and VV or HH polarization, span resolutions from 20 cm to 2 m and incidence angles between 10° and 70°. To standardize them, we refocused the spectrum and resampled all data to 80 cm × 80 cm in slant-range geometry, then split each large scene into overlapping 1,024 × 1,024 pixel patches.
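
A minimal sketch of the tiling step (the 256-pixel overlap and the helper below are assumptions for illustration; the article only states that the patches overlap):

```python
import numpy as np

def split_into_patches(scene: np.ndarray, patch: int = 1024, overlap: int = 256):
    """Yield (row, col, tile) for overlapping square tiles covering a 2-D
    scene; edge remainders are dropped for brevity."""
    step = patch - overlap
    rows, cols = scene.shape[:2]
    for r in range(0, rows - patch + 1, step):
        for c in range(0, cols - patch + 1, step):
            yield r, c, scene[r:r + patch, c:c + patch]

# On a dummy 4096 x 4096 scene this yields a 5 x 5 grid of tiles:
scene = np.zeros((4096, 4096), dtype=np.complex64)
print(sum(1 for _ in split_into_patches(scene)))  # -> 25
```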

To make the dataset multimodal, each SAR patch was paired with a high-resolution optical image projected into the radar’s slant-range geometry. This ensures pixel-level alignment between radar and optical imagery, even though the optical projection may show geometric distortions.
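
Under a flat-terrain, constant-incidence simplification, projecting a ground-projected image into slant range amounts to compressing the range axis by sin(θ). The sketch below (a strong simplification of the real projection, for a single-band image; the actual pipeline must also handle topography and the exact sensor geometry) illustrates the idea:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def ground_to_slant(optical: np.ndarray, incidence_deg: float) -> np.ndarray:
    """Resample a ground-projected, single-band image onto a slant-range
    grid, assuming flat terrain and constant incidence, where locally
    slant-range spacing = ground-range spacing * sin(theta)."""
    theta = np.radians(incidence_deg)
    rows, cols = optical.shape
    slant_cols = int(cols * np.sin(theta))              # range axis compresses
    ground_col = np.arange(slant_cols) / np.sin(theta)  # slant col -> ground col
    cc, rr = np.meshgrid(ground_col, np.arange(rows))
    return map_coordinates(optical, [rr, cc], order=1)  # bilinear resampling
```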

Figure 7: Example of optical and SAR pair (both in slant-range plane).

Finally, to extend the dataset to vision–language research, we generated three natural-language captions for each optical image (SHORT, MID, and LONG) using CogVLM2, then refined and cleaned them with a Qwen LLM. For example, the captions for Figure 7 are:

  • SHORT:

    A satellite image of a dense forest with a winding road, multiple water bodies, and several buildings.

  • MID:

    A satellite image of a dense forested area featuring a winding road, multiple water bodies, and several structures, likely docks or industrial facilities.

  • LONG:

    A satellite image of a dense forested landscape with a winding road, numerous water bodies including a long canal with locks, and several buildings or facilities adjacent to the waterway.

The resulting collection contains 119,566 triplets, each composed of a SAR crop, a co-registered optical crop, and text descriptions, forming a foundation for training multimodal models that jointly understand radar, optical, and language data.

The dataset is available on Hugging Face under:
ONERA/SARLO-80
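
A minimal loading sketch with the `datasets` library (the split name and the exact per-record fields are assumptions; check the dataset card for the actual schema):

```python
from datasets import load_dataset

# Stream records from the Hub instead of downloading the full collection.
ds = load_dataset("ONERA/SARLO-80", split="train", streaming=True)

sample = next(iter(ds))
# Each record should expose the SAR patch, the co-registered optical patch,
# and the SHORT/MID/LONG captions (field names are assumptions).
print(sample.keys())
```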

Applications of SAR and AI

The Umbra SAR Dataset brings together SAR, optical, and textual data in a standardized, multimodal format, opening new possibilities for AI applications such as:

  • Classification
  • Segmentation
  • Change detection
  • Generative modeling

By combining radar’s all-weather, structural insights with optical imagery’s intuitive visual information, the dataset supports research across diverse domains — from monitoring crop health and soil moisture in agriculture, to rapid disaster assessment, urban growth tracking, and environmental studies like deforestation and glacier movement. This complementary approach enables AI models to learn richer, more resilient representations of the Earth, demonstrating how radar and optical imagery together provide a deeper understanding of our planet.

Conclusion

The Umbra SAR Dataset was built with one goal: to make radar more accessible for AI. By aligning high-resolution SAR with optical imagery and natural-language descriptions, it provides a foundation for new models that can interpret radar’s unique perspective and connect it to human-understandable concepts.

Acknowledgments

This work was carried out as part of the PhD of Solène Debuysère at DEMR-ONERA – Université Paris-Saclay, under the supervision of Nicolas Trouvé, Nathan Letheule, and Elise Colin. We gratefully acknowledge ONERA, and especially DEMR-SEM and Olivier Lévêque's team, for providing computational and research resources; Umbra, for the SAR data collections and open-access initiatives enabling research use (https://umbra.space/open-data/); and Hugging Face, in particular Georgia Channing, for her help on this project.

Contacts

If you have any questions or would like to contribute, don't hesitate to contact us.
