Introduction to Raster Data


  • Raster data is pixelated data where each pixel is associated with a specific location.
  • Raster data always has an extent and a resolution.
  • The extent is the geographical area covered by a raster.
  • The resolution is the area covered by each pixel of a raster.
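As a minimal sketch of how extent and resolution relate, assume a hypothetical north-up raster described by the coordinates of its top-left corner, its pixel size, and its number of rows and columns (all values are made up for illustration). Resolution is commonly reported as a pixel width and height, so the area covered by one pixel is their product:

```python
# Hypothetical north-up raster in a projected CRS with units of metres.
origin_x, origin_y = 500000.0, 5700000.0  # coordinates of the top-left corner
pixel_width, pixel_height = 10.0, 10.0    # resolution: 10 m x 10 m pixels
n_cols, n_rows = 300, 200                 # raster shape (columns, rows)

# The extent is the bounding box covered by all pixels together.
xmin = origin_x
xmax = origin_x + n_cols * pixel_width
ymax = origin_y
ymin = origin_y - n_rows * pixel_height   # rows run from north to south
extent = (xmin, ymin, xmax, ymax)

# The area covered by a single pixel follows from the resolution.
pixel_area = pixel_width * pixel_height

print("extent:", extent)                # (500000.0, 5698000.0, 503000.0, 5700000.0)
print("pixel area (m^2):", pixel_area)  # 100.0
```

Halving the pixel size in this sketch would keep the extent but quadruple the number of pixels needed to cover it.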

Introduction to Vector Data


  • Vector data structures represent specific features on the Earth’s surface along with attributes of those features.
  • Vector data is often interpreted data, and it may have been collected for a different purpose than the one you want to use it for.
  • Vector objects are either points, lines, or polygons.
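To make the three geometry types concrete, here is a minimal sketch using the shapely library (an assumption: shapely is not named above, but it is the geometry engine underneath geopandas). Each hypothetical feature pairs a geometry with a plain attribute, much like a row in a vector dataset:

```python
from shapely.geometry import LineString, Point, Polygon

# Hypothetical features in a projected CRS whose units are metres.
features = [
    {"name": "well", "geometry": Point(10, 20)},                                     # a single location
    {"name": "road", "geometry": LineString([(0, 0), (30, 40)])},                    # ordered vertices
    {"name": "field", "geometry": Polygon([(0, 0), (100, 0), (100, 50), (0, 50)])},  # closed ring
]

for feature in features:
    geom = feature["geometry"]
    # Points have zero length and area, lines have a length, and polygons have
    # an area (their perimeter is reported as .length).
    print(feature["name"], geom.geom_type, geom.length, geom.area)
```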

Coordinate Reference Systems


  • All geospatial datasets (raster and vector) are associated with a specific coordinate reference system.
  • A coordinate reference system includes datum, projection, and additional parameters specific to the dataset.
  • All maps are distorted because of the projection.
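One way to see projection distortion is to compare distances measured in a projected CRS with distances measured on the ellipsoid. The sketch below assumes the pyproj library and uses Web Mercator (EPSG:3857) purely as an example projection: one degree of longitude spans the same projected width at every latitude, while its true ground length shrinks towards the poles.

```python
from pyproj import CRS, Geod, Transformer

wgs84 = CRS.from_epsg(4326)         # geographic coordinates (longitude/latitude)
web_mercator = CRS.from_epsg(3857)  # a common projected CRS used by web maps
to_mercator = Transformer.from_crs(wgs84, web_mercator, always_xy=True)
geod = Geod(ellps="WGS84")          # geodesic calculations on the WGS84 ellipsoid


def projected_width(lat):
    """Width (projected metres) of one degree of longitude at latitude `lat`."""
    x0, _ = to_mercator.transform(0.0, lat)
    x1, _ = to_mercator.transform(1.0, lat)
    return x1 - x0


def ground_width(lat):
    """Geodesic length (metres) of the same one-degree segment on the ellipsoid."""
    _, _, distance = geod.inv(0.0, lat, 1.0, lat)
    return distance


for lat in (0, 30, 60):
    ratio = projected_width(lat) / ground_width(lat)
    print(f"lat {lat:>2}: projected / ground = {ratio:.3f}")
```

At the equator the ratio is close to 1; at 60 degrees latitude the projected width is roughly twice the ground distance.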

The Geospatial Landscape


  • Many software packages exist for working with geospatial data.
  • Command-line programs allow you to automate and reproduce your work.
  • JupyterLab provides a user-friendly interface for working with Python.

Access satellite imagery using Python


  • Accessing satellite images via the providers’ API enables more reliable and scalable data retrieval.
  • STAC catalogs can be browsed and searched using the same tools and scripts.
  • rioxarray allows you to open and download remote raster files.

Read and visualize raster data


  • rioxarray and xarray are for multidimensional arrays what pandas is for tabular data.
  • rioxarray stores CRS information as a CRS object that can be converted to an EPSG code or PROJ4 string.
  • Missing raster data are filled with nodata values, which should be handled with care for statistics and visualization.
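A minimal sketch of why nodata values need care, using a small in-memory xarray DataArray with a hypothetical nodata value of -9999 (no file is read here; when reading from disk, rioxarray.open_rasterio(..., masked=True) performs a similar masking of nodata to NaN). Masking the nodata pixels before computing statistics changes the result dramatically:

```python
import numpy as np
import xarray as xr

NODATA = -9999.0  # hypothetical nodata value for this example

raster = xr.DataArray(
    np.array([[1.0, 2.0], [NODATA, 3.0]]),
    dims=("y", "x"),
)

# Naive statistic: the nodata value is treated as a real measurement.
naive_mean = float(raster.mean())

# Mask nodata pixels as NaN; xarray skips NaN in float reductions by default.
masked = raster.where(raster != NODATA)
masked_mean = float(masked.mean())

print(naive_mean)   # -2498.25
print(masked_mean)  # 2.0
```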

Vector data in Python


  • Load spatial objects into Python with the geopandas.read_file() function.
  • Spatial objects can be plotted directly with GeoDataFrame’s .plot() method.
  • Convert the CRS of spatial objects with .to_crs(). Called on a GeoDataFrame, this returns a new GeoDataFrame.
  • Create a buffer around spatial objects with .buffer(). Note that this generates a GeoSeries object.
  • Merge (concatenate) spatial objects with pd.concat().
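The following sketch strings several of these steps together on two small in-memory layers instead of files read with geopandas.read_file(). The point coordinates, attribute values, and the choice of UTM zone 31N (EPSG:32631) are assumptions for illustration; the comments note the return type of each call.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

# Two small, hypothetical point layers in longitude/latitude (EPSG:4326).
layer_a = gpd.GeoDataFrame(
    {"name": ["a1", "a2"]},
    geometry=[Point(4.90, 52.37), Point(4.48, 51.92)],
    crs="EPSG:4326",
)
layer_b = gpd.GeoDataFrame(
    {"name": ["b1"]},
    geometry=[Point(5.12, 52.09)],
    crs="EPSG:4326",
)

# Reproject to a metric CRS (UTM zone 31N) so buffer distances are in metres.
# Called on a GeoDataFrame, .to_crs() returns a new GeoDataFrame.
layer_a_utm = layer_a.to_crs("EPSG:32631")
layer_b_utm = layer_b.to_crs(layer_a_utm.crs)

# .buffer() returns a GeoSeries of polygons (here: 1 km around each point).
buffers = layer_a_utm.buffer(1000)

# pd.concat stacks the rows of GeoDataFrames that share the same CRS.
combined = pd.concat([layer_a_utm, layer_b_utm], ignore_index=True)

print(type(layer_a_utm).__name__, type(buffers).__name__, len(combined))
# GeoDataFrame GeoSeries 3
```

Reprojecting before buffering matters: in EPSG:4326 the buffer distance would be interpreted in degrees.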

Crop raster data with rioxarray and geopandas


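  • Use .rio.clip_box() to crop a raster with a bounding box.
  • Use .rio.clip() to crop a raster with one or more polygons.
  • Use .rio.reproject_match() to reproject a raster onto the grid (CRS, extent, and resolution) of another raster so the two datasets match.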

Raster Calculations in Python


  • Python’s built-in math operators are fast and simple options for raster math.
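For example, a normalized difference index such as NDVI (an illustrative choice; the band values below are made up) can be written with plain arithmetic operators on xarray DataArrays, which apply element-wise to every pixel:

```python
import numpy as np
import xarray as xr

# Two hypothetical, co-registered bands (same shape and grid) as DataArrays.
red = xr.DataArray(np.array([[0.1, 0.2], [0.3, 0.1]]), dims=("y", "x"))
nir = xr.DataArray(np.array([[0.5, 0.4], [0.3, 0.3]]), dims=("y", "x"))

# Ordinary operators work pixel by pixel, like NumPy array arithmetic.
ndvi = (nir - red) / (nir + red)

print(ndvi.values)
```

Because both bands share the same grid, no explicit loop over pixels is needed; rasters on different grids would first have to be aligned, for example with reproject_match.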

Calculating Zonal Statistics on Rasters


  • Zones can be extracted from the attribute columns of a vector dataset.
  • Zones can be rasterized using rasterio.features.rasterize.
  • Calculate zonal statistics with xrspatial.zonal_stats over the rasterized zones.

Parallel raster computations using Dask


  • The %%time Jupyter magic command can be used to profile calculations.
  • Data ‘chunks’ are the unit of parallelization in raster calculations.
  • (rio)xarray can open raster files as chunked arrays.
  • The chunk shape and size can significantly affect the calculation performance.
  • Cloud-optimized GeoTIFFs have an internal structure that enables performant parallel reads.
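A minimal sketch of chunked, lazy computation, assuming xarray with Dask installed. Here an in-memory array is split into chunks with .chunk(); rioxarray.open_rasterio() accepts a similar chunks argument to return a chunked array directly from a file. Outside Jupyter, where the %%time magic is unavailable, time.perf_counter() from the standard library serves as a rough stand-in for profiling.

```python
import time

import numpy as np
import xarray as xr

# A hypothetical 2000 x 2000 raster held in memory...
raster = xr.DataArray(np.ones((2000, 2000)), dims=("y", "x"))

# ...split into 500 x 500 chunks; each chunk becomes a separate Dask task.
chunked = raster.chunk({"y": 500, "x": 500})
print(chunked.chunks)  # ((500, 500, 500, 500), (500, 500, 500, 500))

# Operations on the chunked array are lazy: this only builds a task graph.
lazy_mean = (chunked * 2).mean()

# .compute() executes the graph, processing chunks in parallel.
start = time.perf_counter()
result = float(lazy_mean.compute())
print(f"mean = {result}, computed in {time.perf_counter() - start:.3f} s")
```

Changing the chunk sizes in .chunk() and re-timing the computation is a simple way to explore how chunk shape affects performance.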

Data cubes with ODC-STAC