Summary and Schedule
Data Carpentries
Data Carpentry’s teaching is hands-on. Participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.
Geospatial Raster and Vector Data with Python
In this lesson you will learn how to work with geospatial data and how to process it with Python. Python is one of the most popular programming languages for data science and analytics, with a large and steadily growing community in the field of Earth and Space Sciences. The lesson is meant for participants with a basic working knowledge of Python and allows them to become familiar with the world of geospatial raster and vector data. (If you are unfamiliar with Python, we recommend that you follow this course or have a look here.) In the Introduction to Geospatial Raster and Vector Data with Python lesson you will be introduced to a set of tools from the Python ecosystem and learn how these can be used to carry out geospatial data analysis tasks. In particular, you will learn to work with satellite images (from the Copernicus Sentinel-2 mission) and open topographical geo-datasets (from OpenStreetMap). You will learn how these spatial datasets can be accessed, explored, manipulated and visualized using Python.
Case study - Wildfires
As a case study for this lesson we will focus on wildfires. According to the IPCC assessment report, wildfire seasons are lengthening as a result of changes in temperature and increasing drought conditions (IPCC). To analyse the impact of these wildfires, we will focus on the wildfire that occurred on the Greek island of Rhodes in the summer of 2023, which had a devastating effect and led to the evacuation of 19,000 people. In this lesson we are going to analyse the effect of this disaster by estimating which built-up areas were affected by the wildfire. Furthermore, we will analyse which vegetation and land-use types were affected the most, in order to understand which areas are more vulnerable to wildfires. Finally, we are going to estimate which locations would be most suitable for placing watchtowers in the region. The analysis that we set up provides insight into the effect of the wildfire and generates input for wildfire mitigation strategies.
Note that the analyses presented in this lesson were developed for educational purposes. Therefore, on some occasions the analysis steps have been simplified and assumptions have been made.
The data used in this lesson includes optical satellite images from the Copernicus Sentinel-2 mission and topographical data from OpenStreetMap (OSM). These datasets are real-world open data sets that entail sufficient complexity to teach many aspects of data analysis and management. The datasets have been selected to allow participants to focus on the core ideas and skills being taught while offering the chance to encounter common challenges with geospatial data. Furthermore, we have selected datasets which are available anywhere on earth.
During this lesson we will set up an analysis pipeline that identifies scorched areas based on bands of satellite images collected after the disaster in July 2023. Next, we will analyse the vegetation that was present in these areas before the wildfire by calculating the NDVI from satellite images taken before the disaster and comparing it with the scorched areas. To identify the affected built-up areas and main roads, we will use OSM vector data and compare it with the scorched areas identified above. Finally, we will use elevation data to perform viewshed analyses in order to find the best locations for (hypothetical) watchtowers, which in theory would allow a wildfire to be detected earlier and thus responded to more quickly.
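The NDVI mentioned above is conventionally defined per pixel as (NIR - Red) / (NIR + Red). The sketch below illustrates that formula on tiny hand-written grids in plain Python. The reflectance values and the function name ndvi are made up for illustration; the lesson itself works on Sentinel-2 rasters rather than nested lists.

```python
def ndvi(nir, red):
    """Return the per-pixel NDVI for two equally shaped 2D grids."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # NDVI is undefined where both bands are zero.
            row.append((n - r) / total if total != 0 else None)
        result.append(row)
    return result


# Hypothetical reflectance values, not taken from the lesson data.
nir_band = [[0.50, 0.40], [0.10, 0.00]]
red_band = [[0.10, 0.10], [0.10, 0.00]]

ndvi_grid = ndvi(nir_band, red_band)
print(ndvi_grid)
```

Values close to 1 indicate dense green vegetation, while values near or below 0 typically correspond to bare soil, water, or burnt surfaces.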
To use these materials most effectively, make sure to download the data and install everything before working through this lesson (this applies especially to learners who follow this lesson in a workshop).
Python libraries used in this lesson
The main Python libraries that are used in this lesson are:
Setup Instructions | Download files required for the lesson
Duration: 00h 00m | 1. Introduction to Raster Data | What format should I use to represent my data? What are the main data types used for representing geospatial data? What are the main attributes of raster data?
Duration: 00h 20m | 2. Introduction to Vector Data | What are the main attributes of vector data?
Duration: 00h 35m | 3. Coordinate Reference Systems | What is a coordinate reference system and how do I interpret one?
Duration: 01h 00m | 4. The Geospatial Landscape | What programs and applications are available for working with geospatial data?
Duration: 01h 10m | 5. Access satellite imagery using Python | Where can I find open-access satellite data? How do I search for satellite imagery with the STAC API? How do I fetch remote raster datasets using Python?
Duration: 01h 55m | 6. Read and visualize raster data | How is a raster represented by rioxarray? How do I read and plot raster data in Python? How can I handle missing data?
Duration: 03h 35m | 7. Vector data in Python | How can I read, inspect, and process spatial objects, such as points, lines, and polygons?
Duration: 04h 25m | 8. Crop raster data with rioxarray and geopandas | How can I crop my raster data to the area of interest?
Duration: 05h 05m | 9. Raster Calculations in Python | How do I perform calculations on rasters and extract pixel values for defined locations?
Duration: 06h 20m | 10. Calculating Zonal Statistics on Rasters | How to compute raster statistics on different zones delineated by vector data?
Duration: 07h 00m | 11. Parallel raster computations using Dask | How can I parallelize computations on rasters with Dask? How can I determine if parallelization improves calculation speed? What are good practices in applying parallelization to my raster calculations?
Duration: 07h 55m | 12. Data cubes with ODC-STAC |
Duration: 08h 40m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Data Sets
- Create a new directory on your Desktop called geospatial-python.
- Within geospatial-python, create a directory called data.
- Download the data required for this lesson via this link (612MB).
- Unzip the downloaded files and save them to the newly created data directory.
Now you should have the following files in the data directory:
- sentinel-2: This is a directory containing multiple bands of Sentinel-2 raster images over the island of Rhodes, on Aug 27, 2023.
- dem/rhodes_dem.tif: This is the Digital Elevation Model (DEM) of the island of Rhodes, retrieved from the Copernicus Digital Elevation Model (GLO-30 instance) and modified for this course.
- gadm/ADM_ADM_3.gpkg: These are the administrative boundaries of Rhodes, downloaded from GADM and modified for this course.
- osm/osm_landuse.gpkg and osm/osm_roads.gpkg: These are the land-use polygons and road polylines of Rhodes, downloaded from OpenStreetMap via Geofabrik and modified for this course.
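As a rough check of the layout described above, the following sketch lists which of the expected entries are absent from a given data directory. The function name missing_entries is hypothetical and not part of the lesson; the paths are the ones listed in this section.

```python
from pathlib import Path

# Paths listed in this section, relative to the data directory.
EXPECTED = [
    "sentinel-2",
    "dem/rhodes_dem.tif",
    "gadm/ADM_ADM_3.gpkg",
    "osm/osm_landuse.gpkg",
    "osm/osm_roads.gpkg",
]


def missing_entries(data_dir):
    """Return the expected paths that do not exist under data_dir."""
    data_dir = Path(data_dir)
    return [name for name in EXPECTED if not (data_dir / name).exists()]
```

For example, assuming the directory was created on the Desktop as described, missing_entries(Path.home() / "Desktop" / "geospatial-python" / "data") should return an empty list once everything is unzipped in place.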
Software Setup
Installing Python Using Anaconda
Python is a popular language for scientific computing, and great for general-purpose programming as well. Installing all of its scientific packages individually can be a bit difficult, however, so we recommend the all-in-one installer Anaconda.
Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.9 is fine). Also, please set up your Python environment at least a day in advance of the workshop. If you encounter problems with the installation procedure, ask your workshop organizers via e-mail for assistance so you are ready to go as soon as the workshop begins.
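If you are unsure which Python version an existing installation provides, a minimal check such as the one below can be run with that interpreter. The helper name is_python3 is hypothetical and only used for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the interpreter's major version is 3."""
    return version_info[0] == 3


print(f"Running Python {sys.version_info[0]}.{sys.version_info[1]}")
print("OK" if is_python3() else "Please install Python 3.x")
```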
Installing Anaconda
Windows
Open https://www.anaconda.com/download with your web browser.
Download the Anaconda for Windows installer with Python 3.
Install Python 3 by running the Anaconda installer, using all of the defaults for installation, except make sure that:
- the Register Anaconda as my default Python 3.x option is checked (it should be in the latest version of Anaconda);
- the Add Anaconda to my PATH environment variable option is selected.
macOS
Open https://www.anaconda.com/download with your web browser.
Download the Anaconda installer with Python 3 for macOS. These instructions assume that you use the “Graphical Installer” .pkg file.
Follow the Python 3 installation instructions. Make sure that the install location is set to Install only for me so Anaconda will install its files locally, relative to your home directory. Installing the software for all users tends to create problems in the long run and should be avoided.
Linux
Note that the following installation steps require you to work from the shell. If you run into any difficulties, please request help before the workshop begins.
Open https://www.anaconda.com/download with your web browser.
Download the Anaconda installer with Python 3 for Linux.
Open a terminal window and navigate to the directory where the executable is downloaded (e.g., cd ~/Downloads).
Type bash Anaconda3- and press “Tab” to autocomplete the full file name. The name of the file you just downloaded should appear.
Press “Enter” (or “Return” depending on your keyboard).
Follow the text-only prompts. When the license agreement appears (a colon will be present at the bottom of the screen) press “Spacebar” until you see the bottom of the text. Type yes and press “Enter” to approve the license. Press “Enter” again to approve the default location for the files. Type yes and press “Enter” to prepend Anaconda to your PATH (this makes the Anaconda distribution your user’s default Python).
Close the terminal window.
Setting up the workshop environment
If Anaconda was properly installed, you should have access to the conda command in your terminal (use the Anaconda prompt on Windows).
- Test that conda is correctly installed by typing:
BASH
conda --version
which should print the version of conda that is currently installed, e.g.:
OUTPUT
conda 22.9.0
- Run the following command:
IMPORTANT: If your terminal responds to the above command with conda: command not found, see the Troubleshooting section.
- Create the Python environment for the workshop by running:
The main Python libraries that are used in this lesson are:
To avoid installing all these libraries and their dependencies separately, a .yaml configuration file has been created that installs all of them automatically. (If you still want to install the packages separately, you could use pip.) The easiest way, however, is to create the environment from the .yaml file by running the following command:
BASH
mamba env create -n geospatial -f https://raw.githubusercontent.com/carpentries-incubator/geospatial-python/main/files/environment.yaml
Note that this step can take several minutes.
- When installation has finished you should see the following message in the terminal:
OUTPUT
# To activate this environment, use
#
#     $ conda activate geospatial
#
# To deactivate an active environment, use
#
#     $ conda deactivate
- Now activate the geospatial environment by running:
BASH
conda activate geospatial
If successful, the text (base) in your terminal prompt will now read (geospatial), indicating that you are now in the Anaconda virtual environment named geospatial. The command which python should confirm that we’re using the Python installation in the geospatial virtual environment.
For example:
OUTPUT
/Users/your-username/anaconda3/envs/geospatial/bin/python
IMPORTANT: If you close the terminal, you will need to reactivate this environment with conda activate geospatial to use the Python libraries required for the lesson and to start JupyterLab, which is also installed in the geospatial environment.
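The same confirmation can be made from within Python by inspecting the interpreter path and the CONDA_DEFAULT_ENV variable, which conda sets when an environment is activated. This is a minimal sketch; the helper names are hypothetical and not part of the lesson.

```python
import os
import sys


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return environ.get("CONDA_DEFAULT_ENV")


def in_geospatial_env(environ=os.environ):
    """Return True when the active environment is named 'geospatial'."""
    return active_conda_env(environ) == "geospatial"


print("Interpreter:", sys.executable)
print("Active conda environment:", active_conda_env())
```

If the printed environment is base or None, the geospatial environment has not been activated in the current terminal session.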
Starting JupyterLab
In order to follow the lesson, you should launch JupyterLab. After activating the geospatial conda environment, enter the following command in your terminal (use the Anaconda prompt on Windows):
BASH
jupyter lab
Once you have launched JupyterLab, create a new Python 3 notebook, type the following code snippet in a cell and press the “Play” button:
If all the steps above completed successfully you are ready to follow along with the lesson!
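One way to confirm from a notebook cell that the environment is usable is to check whether a few libraries can be found by the import system. The sketch below assumes the top-level import names rioxarray, geopandas and dask, which correspond to libraries named in the schedule; the list and the helper name missing_packages are illustrative, not the lesson's official check.

```python
from importlib.util import find_spec

# Top-level import names of libraries mentioned in the schedule.
# This list is an assumption, not the lesson's official check.
PACKAGES = ["rioxarray", "geopandas", "dask"]


def missing_packages(names):
    """Return the names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


print("Missing packages:", missing_packages(PACKAGES))
```

An empty list suggests that the notebook is running in the geospatial environment; any names printed point to packages that are not visible to the current interpreter.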
Troubleshooting conda: command not found
- Mac OS and Linux users:
  - First, find out where Anaconda is installed. The typical install location is in your $HOME directory (e.g., /Users/your-username/), so use ls ~ to check whether an anaconda3 directory is present in your home directory:
    OUTPUT
    Applications Downloads Pictures anaconda3 Library Public Desktop Movies Documents Music
    If, like above, you see a directory called anaconda3 in the output, we’re in good shape. If not, contact the instructor for help.
  - Activate the conda command-line program by entering the following command:
    If all goes well, nothing will print to the terminal and your prompt will now have (base) floating around somewhere on the left. This is an indication that you are in the base Anaconda environment.
    Continue from the beginning of step 3 (creating the Python environment) to complete the creation of the geospatial virtual environment.
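The first troubleshooting check can also be sketched in Python. The helpers below are hypothetical names for illustration: one looks for an anaconda3 directory in a given home directory, the other reports whether a conda executable is currently on the PATH.

```python
import shutil
from pathlib import Path


def has_anaconda_dir(home):
    """Return True when home contains an anaconda3 directory."""
    return (Path(home) / "anaconda3").is_dir()


def conda_on_path():
    """Return True when a conda executable is found on the PATH."""
    return shutil.which("conda") is not None
```

Calling has_anaconda_dir(Path.home()) mirrors the ls ~ check above. If it returns True while conda_on_path() returns False, Anaconda is most likely installed but not yet activated in the current shell.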