..
How To Maximally Compress Geotiffs?
Large satellite images can be difficult to transfer or visualize depending on internet speed and specs. This may result in slow data transfer/inspection/visualization when conducting data processing from a VM or attempting to share the imagery.
As we iterate on the data processing pipeline, we can minimize the resulting GeoTIFF
using the following simple techniques:
- Select one band.
- Coarsen the resolution
N
times. - Use JPEG 2000 compression.
- Use
uint8
data type.
I share a function that takes an input image path, applies the above steps, and saves the resulting image:
from pathlib import Path
import rasterio
from rasterio.enums import Resampling
def compress_image(in_path, out_path, res_factor=10, bands=[1]):
"""
Compresses and resamples a raster image, saving the result in JPEG 2000 format.
This function takes an input raster file, resamples it by a specified factor, and compresses it using JPEG 2000. It is designed to work on the first band of the input raster. If the original data is not uint8, the function performs min-max normalization and converts it to uint8 before saving.
Parameters:
- in_path (str or Path): Path to the input raster file.
- out_path (str or Path): Path where the output raster file will be saved.
- res_factor (int): The factor by which to coarsen the resolution of the image. For example, a `res_factor` of 5 will reduce both the width and height of the image by a factor of 5. Default is 5.
Returns:
- None. The result is saved to the `out_path` as a JPEG 2000 compressed image.
"""
# Convert to pathlib.Path
in_path = Path(in_path)
out_path = Path(out_path)
# Verify that the input file exists
assert in_path.exists(), "The input file does not exist."
# Open the source image
with rasterio.open(in_path) as src:
# Calculate the new resolution
new_width = src.width // res_factor
new_height = src.height // res_factor
# Read the data from the first band and resample it
data = src.read(
bands,
out_shape=(
new_height,
new_width
),
resampling=Resampling.nearest
)
# Update the transformation for the new resolution
transform = src.transform * src.transform.scale(
(src.width / data.shape[-1]),
(src.height / data.shape[-2])
)
# Prepare new metadata for the output file
out_meta = src.meta.copy()
out_meta.update(
driver="JP2OpenJPEG",
height=new_height,
width=new_width,
transform=transform,
compress="JPEG 2000",
dtype="uint8",
count=len(bands),
crs=src.crs,
predictor=2
)
# If the original data is not uint8, conduct min-max normalization and convert to uint8
if src.dtypes[0] != 'uint8':
data = (data - data.min()) / (data.max() - data.min()) * 255
data = data.astype("uint8")
# Write the resampled data to the output file
with rasterio.open(out_path, "w", **out_meta) as dest:
if len(bands) > 1: dest.write(data)
else: dest.write(data[bands[0]], 1)
Notes:
- If the input raster’s data type is not
uint8
, min-max normalization is applied to convert the data. This may lead to a loss of information, especially if the original data contains a wide range of values. - The function does not explicitly handle
nodata
values during resampling and normalization. - If
nodata
values are present, they may affect the normalization calculation and result in alterednodata
representations in the output.