Spatial Analysis
Last updated on 2025-12-05
Estimated time: 101 minutes
Overview
Questions
- What is PySAL and what can it do for spatial analysis?
- How do we compute spatial weights and perform spatial autocorrelation?
- How do we interpret results like Moran’s I?
Objectives
- Understand the purpose of PySAL in spatial data science
- Learn how to load spatial data using GeoPandas
- Construct spatial weight matrices
- Compute Global Moran’s I using PySAL
- Visualize spatial clustering and spatial autocorrelation
Why is PySAL Important?
PySAL (Python Spatial Analysis Library) is one of the most widely used toolkits for working with spatial data in Python. Unlike traditional statistical libraries, PySAL is designed specifically for datasets where location matters — where observations influence nearby observations, and spatial patterns may not be random.
Geographers, urban planners, environmental scientists, epidemiologists, and data analysts use PySAL to identify spatial relationships, detect clustering, and build models that incorporate proximity and geography.
What PySAL Helps Us Understand
- Where events cluster or disperse across space
- Whether high or low values form hotspots or coldspots
- How neighborhoods influence one another (spatial dependency)
- Spatial inequality patterns in income, population, crime, disease, etc.
- Geographic diffusion (wildfire spread, disease transmission, migration flows)
- Environmental change and land-use impacts
Why Researchers Use PySAL
- Built for spatial statistics: tools that general libraries lack
- Easy integration with GeoPandas, raster data, and shapefiles
- Provides standard spatial methods such as:
  - Spatial weights (Queen, Rook, KNN, Distance-based)
  - Global & Local Moran’s I (LISA)
  - Spatial clustering & hotspot detection
  - Spatial regression models
- Enables data-driven decision making in geography
- Scales from local studies to large regional/global analyses
- Helps test spatial hypotheses scientifically instead of visually
Spatial Analysis in Context
Spatial analysis answers questions like:
| Question | PySAL Method |
|---|---|
| Do areas with high values cluster together? | Moran’s I |
| Where are hotspots located? | Local Moran / LISA maps |
| What counts as a neighbor? | Spatial weights matrices |
| Are patterns random or significant? | Monte Carlo permutation tests |
| How do variables influence each other across space? | Spatial regression |
PySAL makes these methods accessible in Python, allowing analysts to move from maps to statistical evidence — revealing underlying spatial patterns that are not visible from visualization alone.
Introduction
PySAL is the Python Spatial Analysis Library — a powerful, open-source toolkit for working with spatial data. It provides tools for:
- spatial weights
- spatial autocorrelation
- clustering
- spatial regression
- neighborhood analysis
This tutorial introduces the core PySAL workflow.
We will cover:
- Loading polygon or point data
- Building spatial weights
- Running Global Moran’s I
- Visualizing results
This tutorial assumes basic familiarity with pandas, geopandas, and Python.
Learners may struggle initially with spatial weights (rook, queen, k-nearest). Spend extra time walking through simple diagrams before showing code.
1. Loading Spatial Data
PySAL works seamlessly with GeoPandas.
Here’s a simple example using a polygon shapefile:
Plot the boundaries:
A quick plot is a visual check that the file loaded and the geometries look as expected. It does not by itself guarantee that every geometry is valid.
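A minimal sketch of the loading and plotting steps described above, assuming geopandas and shapely are installed. No dataset ships with this page, so the example builds a tiny stand-in GeoDataFrame of four unit squares in memory. The column name `value` and the commented-out path `path/to/polygons.shp` are placeholders, not names from the lesson data.

```python
import geopandas as gpd
from shapely.geometry import box

# With a real file you would load it directly (hypothetical path):
# gdf = gpd.read_file("path/to/polygons.shp")

# Stand-in data: a 2 x 2 block of unit squares with one numeric column.
squares = [box(col, row, col + 1, row + 1) for row in range(2) for col in range(2)]
gdf = gpd.GeoDataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, geometry=squares)

print(gdf.head())                   # attribute table plus the geometry column
print(gdf.geometry.is_valid.all())  # explicit validity check for every polygon

# Plot the boundaries (requires matplotlib):
# gdf.plot(edgecolor="black", facecolor="none")
```

The explicit `is_valid` check complements the visual check from the plot, which can miss problems such as self-intersecting rings.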
2. Building Spatial Weights
Spatial weights define who is a neighbor of whom.
PySAL includes:
- Rook contiguity
- Queen contiguity
- K-nearest neighbors
- Distance-based neighbors
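The two point-based schemes in the list above can be illustrated without any library. This is a plain-Python sketch of the ideas, not PySAL's implementation. The point ids and coordinates are made up, and `knn_neighbors` and `distance_band_neighbors` are hypothetical helper names.

```python
import math

# Hypothetical point coordinates (for example, polygon centroids).
points = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (5.0, 5.0)}

def knn_neighbors(points, k):
    """Each unit's k nearest other units (ties broken by id order)."""
    result = {}
    for i, pi in points.items():
        others = sorted((math.dist(pi, pj), j) for j, pj in points.items() if j != i)
        result[i] = [j for _, j in others[:k]]
    return result

def distance_band_neighbors(points, threshold):
    """All other units within `threshold` distance of each unit."""
    return {
        i: [j for j, pj in points.items() if j != i and math.dist(pi, pj) <= threshold]
        for i, pi in points.items()
    }

knn = knn_neighbors(points, k=2)
band = distance_band_neighbors(points, threshold=1.5)

print(knn["d"])   # d still gets 2 neighbors, however far away they are
print(band["d"])  # d has no neighbor within the threshold (an "island")
```

Two things are worth noticing. K-nearest-neighbor relations are not necessarily symmetric: `d` lists `b` as a neighbor, but `b` does not list `d`. Distance bands are symmetric but can leave remote units with no neighbors at all.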
3. Global Moran’s I
Moran’s I measures global spatial autocorrelation:
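- Positive values: clustering (similar values tend to be near each other)
- Negative values: dispersion (neighboring values tend to differ)
- Near zero: a spatially random pattern (strictly, the expected value under randomness is −1/(N−1), which approaches zero as N grows)

Assume the dataset has a numeric column named `value`: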
View the results:
Plot the Moran scatterplot:
4. Local Moran’s I (Outlier Analysis)
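Local Moran’s I computes one statistic per observation. It identifies hotspots (high values surrounded by high values), coldspots (low values surrounded by low values), and spatial outliers (a high value surrounded by low values, or the reverse).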
Add LISA quadrant labels to the GeoDataFrame:
Map the clusters:
This creates a basic LISA cluster map.
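To make the quadrant labels concrete, here is a library-free sketch of how each unit can be classified from its own deviation from the mean and the average deviation of its neighbors. It is an illustration of the idea only: it skips the permutation-based significance filtering that a real LISA map applies, and `rook_neighbors` and `lisa_quadrants` are hypothetical helper names. In PySAL, the corresponding quadrant codes are exposed by `esda.Moran_Local` through its `q` attribute.

```python
def rook_neighbors(nrows, ncols):
    """Rook contiguity on a grid: cells sharing an edge. Ids are (row, col)."""
    nbrs = {}
    for r in range(nrows):
        for c in range(ncols):
            cand = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            nbrs[(r, c)] = [(i, j) for i, j in cand if 0 <= i < nrows and 0 <= j < ncols]
    return nbrs

def lisa_quadrants(values, nbrs):
    """Label each unit HH, LH, LL or HL (own value first, neighbors second)."""
    mean = sum(values.values()) / len(values)
    z = {k: v - mean for k, v in values.items()}
    labels = {}
    for k, ns in nbrs.items():
        lag = sum(z[j] for j in ns) / len(ns)  # row-standardized spatial lag
        own = "H" if z[k] > 0 else "L"
        around = "H" if lag > 0 else "L"
        labels[k] = own + around
    return labels

# Made-up data: a high-value block in the top-left corner, one isolated
# high value in the bottom-right corner, low values everywhere else.
values = {(r, c): 1.0 for r in range(4) for c in range(4)}
for cell in [(0, 0), (0, 1), (1, 0), (1, 1), (3, 3)]:
    values[cell] = 10.0

labels = lisa_quadrants(values, rook_neighbors(4, 4))
print(labels[(0, 0)])  # HH: part of the high-value block (hotspot)
print(labels[(3, 3)])  # HL: high value surrounded by low values (outlier)
print(labels[(2, 2)])  # LL: low value surrounded by low values (coldspot)
```

Storing labels like these in a new column of the GeoDataFrame and plotting that column as a categorical variable gives the basic cluster map.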
Queen contiguity counts polygons that share an edge or only a corner (a single vertex). Rook contiguity requires a shared edge. Each unit therefore has at most as many rook neighbors as queen neighbors, and usually fewer.
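The difference is easy to verify on a regular grid, where diagonal cells touch only at a corner. The sketch below is plain Python rather than PySAL code, and `grid_neighbors` is a hypothetical helper.

```python
def grid_neighbors(nrows, ncols, queen=False):
    """Contiguity neighbors on a regular grid of square cells."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]         # shared edges
    if queen:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # shared corners
    nbrs = {}
    for r in range(nrows):
        for c in range(ncols):
            nbrs[(r, c)] = [
                (r + dr, c + dc) for dr, dc in steps
                if 0 <= r + dr < nrows and 0 <= c + dc < ncols
            ]
    return nbrs

rook = grid_neighbors(3, 3)
queen = grid_neighbors(3, 3, queen=True)

center = (1, 1)
print(len(rook[center]), len(queen[center]))  # prints: 4 8
```

Every rook neighbor is also a queen neighbor, so the rook set is always a subset of the queen set.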
Challenge (continued)
Challenge 2: Compute Moran’s I on a New Variable
Choose any numeric variable in your dataset:
- Extract the variable
- Compute Moran’s I
- Interpret whether clustering exists
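A positive Moran’s I with a low p-value (for example, below 0.05) indicates statistically significant clustering; the size of I, not the p-value, indicates how strong the clustering is. A value near zero suggests spatial randomness, and a negative value suggests spatial dispersion.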
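The p-value in this interpretation usually comes from a permutation test: the values are shuffled across locations many times, and the observed statistic is compared with the shuffled ones. The following is a library-free sketch of that idea under simple assumptions (binary rook weights on a grid, a one-sided test for positive autocorrelation). The helper names are hypothetical and the procedure is a simplification of what PySAL does.

```python
import random

def rook_pairs(nrows, ncols):
    """Ordered (i, j) index pairs of edge-sharing cells, row-major ids."""
    pairs = []
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:
                pairs += [(i, i + 1), (i + 1, i)]
            if r + 1 < nrows:
                pairs += [(i, i + ncols), (i + ncols, i)]
    return pairs

def morans_i(x, pairs):
    """Global Moran's I with binary weights (w_ij = 1 for each listed pair)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    cross = sum(z[i] * z[j] for i, j in pairs)
    return (n / len(pairs)) * cross / sum(v * v for v in z)

def pseudo_p_value(x, pairs, permutations=999, seed=0):
    """Observed I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(x, pairs)
    shuffled = list(x)
    at_least = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, pairs) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (permutations + 1)

# Made-up variable on a 6 x 6 grid: left half 1, right half 0.
pairs = rook_pairs(6, 6)
clustered = [1.0 if c < 3 else 0.0 for r in range(6) for c in range(6)]
observed, p = pseudo_p_value(clustered, pairs)
print(f"I = {observed:.2f}, pseudo p-value = {p:.3f}")
```

With 999 permutations the smallest possible pseudo p-value is 0.001, which is why reported values never reach exactly zero.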
Math
Global Moran’s I is defined as:
$$I = \frac{N}{W} \cdot \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$
Where:
- $N$ = number of observations
- $W$ = sum of all spatial weights, $W = \sum_i \sum_j w_{ij}$
- $w_{ij}$ = weight between units $i$ and $j$
- $x_i$ = value of the variable of interest at unit $i$, with mean $\bar{x}$
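The formula can be checked by hand on a very small example. The sketch below translates it term by term into plain Python, using a dense weight matrix for four units arranged in a row. The data are made up, and the function is an illustration rather than PySAL's implementation.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a dense weight matrix w (list of lists)."""
    n = len(x)                                 # N
    x_bar = sum(x) / n
    z = [xi - x_bar for xi in x]               # deviations from the mean
    total_w = sum(sum(row) for row in w)       # W
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / total_w) * cross / sum(zi ** 2 for zi in z)

# Four units in a row; each unit neighbors the unit(s) next to it.
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

gradient = morans_i([1, 2, 3, 4], w)     # smooth gradient: I = 1/3
alternating = morans_i([1, 4, 1, 4], w)  # alternating values: I = -1
print(gradient, alternating)
```

For the gradient, N = 4, W = 6, the weighted cross-product sum is 2.5, and the sum of squared deviations is 5, so I = (4/6)(2.5/5) = 1/3. The alternating pattern gives a cross-product sum of −13.5 and a sum of squared deviations of 9, so I = (4/6)(−13.5/9) = −1.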
Key Points
- PySAL provides tools for weights, autocorrelation, clustering, and modeling
- Queen and rook weights define spatial neighbors differently
- Moran’s I measures global autocorrelation
- Local Moran (LISA) identifies hotspots and coldspots
- GeoPandas and PySAL together form a powerful spatial analysis workflow