Introduction to Raster Data
- Raster data is pixelated data where each pixel is associated with a specific location.
- Raster data always has an extent and a resolution.
- The extent is the geographical area covered by a raster.
- The resolution is the area covered by each pixel of a raster.
Introduction to Vector Data
- Vector data structures represent specific features on the Earth’s surface along with attributes of those features.
- Vector data is often interpreted data and collected for a different purpose than you would want to use it for.
- Vector objects are either points, lines, or polygons.
Coordinate Reference Systems
- All geospatial datasets (raster and vector) are associated with a specific coordinate reference system.
- A coordinate reference system includes datum, projection, and additional parameters specific to the dataset.
- All maps are distored because of the projection.
The Geospatial Landscape
- Many software packages exist for working with geospatial data.
- Command-line programs allow you to automate and reproduce your work.
- JupyterLab provides a user-friendly interface for working with Python.
Access satellite imagery using Python
- Accessing satellite images via the providers’ API enables a more reliable and scalable data retrieval.
- STAC catalogs can be browsed and searched using the same tools and scripts.
-
rioxarray
allows you to open and download remote raster files.
Read and visualize raster dataResampling the raster image
-
rioxarray
andxarray
are for working with multidimensional arrays like pandas is for working with tabular data. -
rioxarray
stores CRS information as a CRS object that can be converted to an EPSG code or PROJ4 string. - Missing raster data are filled with nodata values, which should be handled with care for statistics and visualization.
Vector data in Python
- Load spatial objects into Python with
geopandas.read_file()
function. - Spatial objects can be plotted directly with
GeoDataFrame
’s.plot()
method. - Convert CRS of spatial objects with
.to_crs()
. Note that this generates aGeoSeries
object. - Create a buffer of spatial objects with
.buffer()
. - Merge spatial objects with
pd.concat()
.
Crop raster data with rioxarray and geopandas
- Use
clip_box
to crop a raster with a bounding box. - Use
clip
to crop a raster with a given polygon. - Use
reproject_match
to match two raster datasets.
Raster Calculations in Python
- Python’s built-in math operators are fast and simple options for raster math.
Calculating Zonal Statistics on Rasters
- Zones can be extracted by attribute columns of a vector dataset
- Zones can be rasterized using
rasterio.features.rasterize
- Calculate zonal statistics with
xrspatial.zonal_stats
over the rasterized zones.
Parallel raster computations using Dask
- The
%%time
Jupyter magic command can be used to profile calculations. - Data ‘chunks’ are the unit of parallelization in raster calculations.
- (
rio
)xarray
can open raster files as chunked arrays. - The chunk shape and size can significantly affect the calculation performance.
- Cloud-optimized GeoTIFFs have an internal structure that enables performant parallel read.