Content from Census Geocoding
Last updated on 2026-01-14
Overview
Questions
- What is geocoding and why is it essential for census analysis?
- How can we convert addresses into spatial coordinates?
- How do we combine census data with OpenStreetMap features?
- How can spatial context improve demographic analysis?
Objectives
- Understand what geocoding is and how it works
- Convert address-based census data into geographic coordinates
- Query OpenStreetMap (OSM) features using Python
- Combine census points with OSM layers for spatial analysis
- Visualize geocoded census data alongside urban infrastructure
Introduction
Census and demographic datasets are often non-spatial — they exist as tables containing addresses, place names, or administrative units. To analyze these data geographically, we must first geocode them: converting text-based locations into latitude and longitude coordinates.
Once census data are geocoded, they can be enriched with contextual information from OpenStreetMap (OSM), such as roads, buildings, parks, schools, or hospitals. This enables deeper spatial insights into population distribution, accessibility, and urban structure.
In this lesson, you will learn how to:
- Geocode address-based census data
- Convert results into spatial objects
- Query OpenStreetMap features
- Visualize census data in its geographic context
Why Census Geocoding Matters
Census data becomes far more powerful when location is explicitly included. Geocoding allows researchers to move from spreadsheets to spatial insight.
What Census Geocoding Helps Us Understand
- Population distribution and density patterns
- Access to services (schools, hospitals, transit, parks)
- Spatial inequality and environmental justice
- Urban growth and land-use change
- Neighborhood-level demographic trends
- Relationships between people and infrastructure
Why Researchers Combine Census Data with OSM
- Census data provides who and what
- OpenStreetMap provides where and how
- Together, they enable:
- Accessibility studies
- Urban planning analysis
- Public health assessments
- Infrastructure equity evaluations
- Place-based policy analysis
Geocoding transforms census data from static tables into spatial evidence.
1. Installing Required Libraries
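A minimal setup sketch, assuming a Jupyter-style notebook (package names are the standard PyPI ones; versions are not pinned here):
PYTHON
# install the libraries used in this lesson (run once)
!pip install pandas geopandas geopy osmnx matplotlib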
2. Load Census Data or Address Data
This dataset should contain an address column (e.g., street, city, state).
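A minimal loading sketch; the file name census_addresses.csv is a hypothetical placeholder for your own data:
PYTHON
import pandas as pd

# the table must contain an "address" column for the geocoding step below
df = pd.read_csv("census_addresses.csv")
df.head()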
3. Geocode Addresses Using Nominatim
PYTHON
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="census_geocoding_tutorial")
# Nominatim's usage policy requires throttling requests (about 1 per second)
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

def geocode_address(address):
    location = geocode(address)
    if location is None:  # incomplete or ambiguous address
        return None, None
    return location.latitude, location.longitude

df["lat"], df["lon"] = zip(*df["address"].apply(geocode_address))
Note: Geocoding services may return None for incomplete or ambiguous addresses.
4. Convert to a GeoDataFrame
PYTHON
import geopandas as gpd

gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.lon, df.lat),
    crs="EPSG:4326",
)
gdf.head()
Plot the geocoded points:
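A quick plotting sketch using the GeoDataFrame's built-in Matplotlib interface:
PYTHON
import matplotlib.pyplot as plt

ax = gdf.plot(markersize=10, color="red", figsize=(8, 8))
ax.set_title("Geocoded Census Points")
plt.show()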
5. Query OpenStreetMap Features
OpenStreetMap provides free, global geographic data.
Example: download buildings in a city.
PYTHON
import osmnx as ox

place = "Lafayette, Indiana, USA"
# OSMnx >= 1.3 uses features_from_place (older releases used geometries_from_place)
buildings = ox.features_from_place(place, tags={"building": True})
Plot buildings with census points:
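A sketch overlaying the two layers on one set of axes:
PYTHON
fig, ax = plt.subplots(figsize=(8, 8))
buildings.plot(ax=ax, color="lightgray")
gdf.plot(ax=ax, color="red", markersize=10)
ax.set_title("Census Points over OSM Buildings")
plt.show()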
6. Adding Spatial Context to Census Data
You can buffer census points to analyze nearby features.
PYTHON
# buffering in meters requires a projected CRS, e.g. EPSG:3857
gdf_buffer = gdf.to_crs(epsg=3857)
gdf_buffer["geometry"] = gdf_buffer.geometry.buffer(200)  # 200 m buffer
Spatial join example:
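A sketch, assuming buildings is reprojected to the same metric CRS as the buffered points:
PYTHON
buildings_3857 = buildings.to_crs(epsg=3857)

# attach each building to any census buffer it intersects
joined = gpd.sjoin(buildings_3857, gdf_buffer, predicate="intersects")
joined.head()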
This links buildings to nearby census locations.
Challenge
Challenge 1 — Query a Different OSM Feature
Choose one:
- Roads → {"highway": True}
- Schools → {"amenity": "school"}
- Parks → {"leisure": "park"}
Plot the feature with census points.
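One possible solution sketch for the schools option (swap in the other tag dictionaries for roads or parks):
PYTHON
schools = ox.features_from_place(place, tags={"amenity": "school"})

fig, ax = plt.subplots(figsize=(8, 8))
schools.plot(ax=ax, color="blue")
gdf.plot(ax=ax, color="red", markersize=10)
ax.set_title("Census Points and OSM Schools")
plt.show()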
Challenge (continued)
Challenge 2 — Accessibility Analysis
For each census point:
- Create a buffer
- Count how many buildings fall inside
- Interpret spatial differences
Higher counts suggest higher accessibility or density.
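A minimal counting sketch, assuming the projected gdf_buffer and buildings_3857 layers from earlier:
PYTHON
# count how many buildings intersect each buffered census point
joined = gpd.sjoin(gdf_buffer, buildings_3857, predicate="intersects")
counts = joined.groupby(joined.index).size()
gdf_buffer["building_count"] = counts.reindex(gdf_buffer.index, fill_value=0)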
Math
Geocoding transforms a text location $L$ into coordinates $(x, y)$.
Spatial joins evaluate relationships between geometries:
- within
- intersects
- contains
These operations allow census attributes to be analyzed spatially.
Key Points
- Geocoding converts census addresses into spatial coordinates
- GeoPandas enables spatial operations on tabular data
- OpenStreetMap provides rich contextual geographic layers
- Combining census + OSM reveals spatial patterns and inequality
- Spatial context transforms demographic data into actionable insight
Module Overview
| Lesson | Overview |
|---|---|
| Beginner | Introduction to Address Geocoding |
| Intermediate | Introduction to OSM Overpass API |
| Advanced | Introduction to Advanced Batch-Geocoding |
Content from Network Analysis
Last updated on 2025-12-05
Overview
Questions
- How do we download and visualize road networks with OSM data?
- What is a graph network and how is it represented in Python?
- How can we compute shortest paths and network distances?
Objectives
- Learn how to retrieve OpenStreetMap road data using OSMnx
- Convert road networks into graphs for routing and analysis
- Visualize networks and shortest paths on a map
- Compute route distances and travel time across a network
Overview
This tutorial provides a practical introduction to performing road network analysis using Python, focusing on road networks in a specified area (e.g., West Lafayette, Indiana) to study food deserts. It uses libraries such as networkx, osmnx, folium, pandas, geopandas, and matplotlib to fetch, visualize, and analyze road networks, compute centroid nodes, and calculate the shortest path based on travel time. The tutorial also applies this analysis to food accessibility data in Indiana.
The coordinates of grocery stores in Indiana were fetched using OpenStreetMap (OSM). Network analysis is used to calculate distances and times from grocery store locations to the center of areas of interest, considering factors such as the number of supermarkets, income, and vehicle accessibility. Distances are classified as accessible (within about 1 mile) or low-access (over 5 miles), depending on whether the setting is urban or rural. Low-Income and Low-Access maps were created for each census tract in Indiana and compared to the USDA food desert dataset.
Why Network Analysis for Food Deserts?
- Mapping Accessibility: Models connections between grocery stores and transportation systems to identify areas with limited healthy food access due to distance or lack of transportation.
- Area Development: Helps improve accessibility and quality of life in underserved regions.
- Promotes Equity: Highlights disparities to create solutions for equitable access to nutritious food.
- Optimization of Resources: Ensures equal distribution of resources for all individuals.
Environment Setup
Libraries imported for this tutorial:
- osmnx: Fetches and processes OpenStreetMap road network data.
- networkx: Performs graph-based computations, such as shortest path calculations.
- folium: Enables interactive map visualizations.
- geopandas and shapely: Handle geospatial data and geometry operations.
- matplotlib: Generates static plots, including network visualization.
- geopy: Calculates geodesic distances for spatial analysis.
Data Acquisition
The road network for West Lafayette, Indiana, is fetched using ox.graph_from_place("West Lafayette, Indiana", network_type="drive"), retrieving the drivable road network from OpenStreetMap as a graph (nodes as intersections, edges as road segments). The graph can be saved as a GraphML file (e.g., westlafayette_indiana_network.graphml) using ox.save_graphml to avoid redundant downloads. This can be adapted to any U.S. area with a single line of code.
Applications
- Urban Planning: Analyzing road connectivity and accessibility in cities.
- Transportation Studies: Optimizing routes based on travel time or distance.
- Geospatial Analysis: Studying spatial relationships in infrastructure networks.
- Emergency Response: Identifying the fastest routes for first responders.
Introduction
Network analysis allows us to study movement, connectivity, and accessibility across geographic space. Roads, sidewalks, rivers, power lines, and transit systems can be modeled as graphs, where intersections are nodes and paths are edges.
This lesson demonstrates how to:
- Download a road network using OSMnx
- Convert it into a graph using NetworkX
- Visualize the network
- Run shortest path routing between two locations
1. Install Required Libraries
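A minimal setup sketch, assuming a Jupyter-style notebook:
PYTHON
!pip install osmnx networkx folium geopandas matplotlib geopy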
2. Import Libraries
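The imports used in the rest of this lesson:
PYTHON
import osmnx as ox
import networkx as nx
import geopandas as gpd
import matplotlib.pyplot as plt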
3. Download a Road Network from OpenStreetMap
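A download sketch matching the Data Acquisition section above:
PYTHON
place = "West Lafayette, Indiana"
G = ox.graph_from_place(place, network_type="drive")

# optionally cache the graph to avoid re-downloading
ox.save_graphml(G, "westlafayette_indiana_network.graphml")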
Visualize network:
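A static plot via OSMnx's built-in helper:
PYTHON
fig, ax = ox.plot_graph(G, node_size=5, edge_linewidth=0.5)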
4. Convert the Graph to Nodes and Edges GeoDataFrames
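OSMnx can split the graph into two GeoDataFrames:
PYTHON
nodes, edges = ox.graph_to_gdfs(G)
nodes.head()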
Plot edges alone:
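A sketch plotting only the edge geometries:
PYTHON
ax = edges.plot(linewidth=0.5, color="gray", figsize=(8, 8))
ax.set_title("Road Network Edges")
plt.show()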
5. Find Shortest Route Between Two Points
Choose two coordinates manually or by clicking on a map.
PYTHON
orig = ox.distance.nearest_nodes(G, -86.9145, 40.4253) # lon, lat
dest = ox.distance.nearest_nodes(G, -86.9079, 40.4268)
Calculate shortest path:
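A sketch weighting edges by length; travel-time routing would first require ox.add_edge_speeds and ox.add_edge_travel_times:
PYTHON
route = nx.shortest_path(G, orig, dest, weight="length")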
Plot route:
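OSMnx draws the route over the network:
PYTHON
fig, ax = ox.plot_graph_route(G, route, route_linewidth=3, node_size=0)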
Challenge
Challenge 1 — Try Your Own Route
- Pick any two points in a city of your choice.
- Compute and visualize the shortest path between them.
Math
A network is represented as a graph:
$G = (V, E)$
Where:
- $V$ = set of nodes (intersections)
- $E$ = set of edges (road segments)
The shortest path is the minimum-weight path across $E$.
Key Points
- OSMnx simplifies downloading and converting OSM road networks
- Graphs model movement and connectivity in space
- NetworkX allows shortest path and routing analysis
- Visualization helps interpret accessibility patterns
Content from Spatial Analysis
Last updated on 2025-12-05
Overview
Questions
- What is PySAL and what can it do for spatial analysis?
- How do we compute spatial weights and perform spatial autocorrelation?
- How do we interpret results like Moran’s I?
Objectives
- Understand the purpose of PySAL in spatial data science
- Learn how to load spatial data using GeoPandas
- Construct spatial weight matrices
- Compute Global Moran’s I using PySAL
- Visualize spatial clustering and spatial autocorrelation
Why is PySAL Important?
PySAL (Python Spatial Analysis Library) is one of the most widely used toolkits for working with spatial data in Python. Unlike traditional statistical libraries, PySAL is designed specifically for datasets where location matters — where observations influence nearby observations, and spatial patterns may not be random.
Geographers, urban planners, environmental scientists, epidemiologists, and data analysts use PySAL to identify spatial relationships, detect clustering, and build models that incorporate proximity and geography.
What PySAL Helps Us Understand
- Where events cluster or disperse across space
- Whether high or low values form hotspots or coldspots
- How neighborhoods influence one another (spatial dependency)
- Spatial inequality patterns in income, population, crime, disease, etc.
- Geographic diffusion (wildfire spread, disease transmission, migration flows)
- Environmental change and land-use impacts
Why Researchers Use PySAL
- Built for spatial statistics — tools that general libraries lack
- Easy integration with GeoPandas, raster data, and shapefiles
- Provides standard spatial methods such as:
  - Spatial weights (Queen, Rook, KNN, distance-based)
  - Global & Local Moran’s I (LISA)
  - Spatial clustering & hotspot detection
  - Spatial regression models
- Enables data-driven decision making in geography
- Scales from local studies to large regional/global analyses
- Helps test spatial hypotheses scientifically instead of visually
Spatial Analysis in Context
Spatial analysis answers questions like:
| Question | PySAL Method |
|---|---|
| Do areas with high values cluster together? | Moran’s I |
| Where are hotspots located? | Local Moran / LISA maps |
| What counts as a neighbor? | Spatial weights matrices |
| Are patterns random or significant? | Monte Carlo permutation tests |
| How do variables influence each other across space? | Spatial regression |
PySAL makes these methods accessible in Python, allowing analysts to move from maps to statistical evidence — revealing underlying spatial patterns that are not visible from visualization alone.
Introduction
PySAL is the Python Spatial Analysis Library — a powerful, open-source toolkit for working with spatial data. It provides tools for:
- spatial weights
- spatial autocorrelation
- clustering
- spatial regression
- neighborhood analysis
This tutorial introduces the core PySAL workflow.
We will cover:
- Loading polygon or point data
- Building spatial weights
- Running Global Moran’s I
- Visualizing results
This tutorial assumes basic familiarity with pandas, geopandas, and Python.
1. Loading Spatial Data
PySAL works seamlessly with GeoPandas.
Here’s a simple example using a polygon shapefile:
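A loading sketch; the file name regions.shp is a hypothetical placeholder:
PYTHON
import geopandas as gpd

gdf = gpd.read_file("regions.shp")
gdf.head()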
Plot the boundaries:
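Plotting only the polygon outlines:
PYTHON
gdf.boundary.plot(figsize=(8, 8))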
This ensures the geometry is valid and loads correctly.
2. Building Spatial Weights
Spatial weights define who is a neighbor of whom.
PySAL includes:
- Rook contiguity
- Queen contiguity
- K-nearest neighbors
- Distance-based neighbors
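A construction sketch using libpysal, assuming the polygon GeoDataFrame gdf from above (newer libpysal releases also offer a graph-based API):
PYTHON
from libpysal.weights import Queen, Rook

w_queen = Queen.from_dataframe(gdf)
w_rook = Rook.from_dataframe(gdf)
w_queen.transform = "r"  # row-standardize the weights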
3. Global Moran’s I
Moran’s I measures global spatial autocorrelation:
- Positive values → clustering
- Negative values → dispersion
- Near zero → random pattern
Assume the dataset has a numeric column value:
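A computation sketch using esda, assuming the weights w_queen from above:
PYTHON
from esda.moran import Moran

y = gdf["value"]
moran = Moran(y, w_queen)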
View the results:
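The statistic and its permutation-based p-value:
PYTHON
print("Moran's I:", moran.I)
print("p-value:", moran.p_sim)  # pseudo p-value from random permutations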
Plot the Moran scatterplot:
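A sketch assuming the optional splot package is installed:
PYTHON
import matplotlib.pyplot as plt
from splot.esda import moran_scatterplot

fig, ax = moran_scatterplot(moran)
plt.show()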
4. Local Moran’s I (Outlier Analysis)
Local Moran’s I finds hotspots and coldspots.
Add LISA quadrant labels to the GeoDataFrame:
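A sketch using esda's Moran_Local; the q attribute encodes quadrants (1 = HH, 2 = LH, 3 = LL, 4 = HL):
PYTHON
from esda.moran import Moran_Local

lisa = Moran_Local(y, w_queen)
gdf["quadrant"] = lisa.q
gdf["significant"] = lisa.p_sim < 0.05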
Map the clusters:
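A basic categorical map of the quadrant labels:
PYTHON
ax = gdf.plot(column="quadrant", categorical=True, legend=True, figsize=(8, 8))
ax.set_title("LISA Cluster Map")
plt.show()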
This creates a basic LISA cluster map.
Queen neighbors may include diagonal touches. Rook neighbors require shared edges only. You should see fewer rook neighbors than queen neighbors.
Challenge (continued)
Challenge 2: Compute Moran’s I on a New Variable
Choose any numeric variable in your dataset:
- Extract the variable
- Compute Moran’s I
- Interpret whether clustering exists
A positive Moran’s I with a low p-value → strong clustering. Near zero → randomness. Negative → spatial dispersion.
Math
Global Moran’s I is defined as:
$ I = \frac{N}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} $
Where:
- $N$ = number of observations
- $W$ = sum of all spatial weights
- $w_{ij}$ = spatial weight between units $i$ and $j$
- $x$ = variable of interest
Key Points
- PySAL provides tools for weights, autocorrelation, clustering, and modeling
- Queen and rook weights define spatial neighbors differently
- Moran’s I measures global autocorrelation
- Local Moran (LISA) identifies hotspots and coldspots
- GeoPandas and PySAL together form a powerful spatial analysis workflow
Content from Spatial Clustering
Last updated on 2025-12-05
Overview
Questions
- What is spatial clustering and why do we use it?
- How can we perform basic clustering on geographic point data?
- How do algorithms like K-Means, Hierarchical Clustering, and DBSCAN differ?
Objectives
- Understand the concept of spatial clustering
- Learn how to prepare point data for clustering
- Apply K-Means, Hierarchical Clustering, and DBSCAN in Python
- Visualize clustering results on simple scatterplots and maps
Why is Spatial Clustering Important?
Spatial clustering is a core method in geospatial analysis for identifying how points, people, places, or events are distributed across space. Instead of treating data as isolated observations, clustering helps us detect patterns, revealing where concentrations or groupings occur — and just as importantly, where they do not.
Clustering allows us to transform large sets of point data into meaningful spatial insights that can guide research, decision-making, and planning.
What Spatial Clustering Helps Us Understand
- Where events or features form geographic hotspots
- How points group based on proximity or similarity
- Regions of high vs. low density
- Patterns of distribution — clustered, dispersed, or random?
- Spatial relationships in social, environmental, or urban data
- Location-based trends that maps alone may not reveal
Why Analysts Use Spatial Clustering
- Reduces complex spatial datasets into interpretable groups
- Helps detect clusters in public health (disease outbreaks), crime, ecology, and more
- Identifies emerging hotspots for management or intervention
- Useful for urban planning, environmental monitoring, and archaeology
- Works well as a first step for further spatial statistics (PySAL, regression, AI)
- Enables classification, prediction, and pattern recognition in large datasets
Clustering at a Glance
| Method | Strength | Best For |
|---|---|---|
| K-Means | Simple, fast | Well-separated, circular clusters |
| Hierarchical | Dendrogram visualization | Multi-scale grouping, unknown k values |
| DBSCAN | Finds irregular shapes + noise | Spatial hotspots and natural patterns |
Spatial clustering is often the first analytical step when exploring point distribution. It moves the analysis beyond visual mapping — showing not only where points are located, but how spatial processes shape them.
Introduction
Spatial clustering is a core method used in geography, archaeology, ecology, and urban studies. It helps identify patterns in the spatial distribution of points—such as hotspots of crime, clusters of archaeological artifacts, or regions with similar environmental characteristics.
This beginner tutorial walks you through the fundamentals of spatial clustering using a simple dataset of geographic coordinates. The workflow is entirely in Python.
We will cover:
- Loading and exploring point data
- Preparing coordinates for clustering
- Running three clustering algorithms
- Visualizing the results
All examples use standard Python libraries: pandas, geopandas, matplotlib, sklearn, and scipy.
1. Loading Spatial Point Data
Spatial clustering typically starts with a set of point locations. A minimal example:
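A synthetic sketch; the coordinates below are made-up points around three arbitrary centers, for illustration only:
PYTHON
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
# three artificial point clusters around different centers
centers = [(-86.91, 40.42), (-86.88, 40.45), (-86.95, 40.40)]
points = np.vstack([rng.normal(c, 0.005, size=(50, 2)) for c in centers])
df = pd.DataFrame(points, columns=["lon", "lat"])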
Visualize the raw points:
PYTHON
import matplotlib.pyplot as plt
plt.scatter(df.lon, df.lat, s=10)
plt.title("Raw Spatial Points")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
This simple scatterplot helps identify whether your data already looks clustered.
2. K-Means Clustering
K-Means is the simplest clustering algorithm. It works best when:
- You know the number of clusters you want
- Clusters are roughly circular
- Points are evenly distributed
PYTHON
from sklearn.cluster import KMeans
coords = df[['lon', 'lat']]
kmeans = KMeans(n_clusters=4, random_state=42)
df['kmeans_label'] = kmeans.fit_predict(coords)
Visualize results:
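Coloring points by their cluster label:
PYTHON
plt.scatter(df.lon, df.lat, c=df.kmeans_label, cmap='Accent', s=10)
plt.title("K-Means Spatial Clusters")
plt.show()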
3. Hierarchical Clustering
Hierarchical clustering builds clusters step-by-step. It is useful when:
- You want a dendrogram
- You don’t know the number of clusters beforehand
- Clusters may have irregular shapes
Example:
PYTHON
from sklearn.cluster import AgglomerativeClustering
agg = AgglomerativeClustering(n_clusters=4)
df['hier_label'] = agg.fit_predict(coords)
Plot:
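The same scatter approach, colored by the hierarchical labels:
PYTHON
plt.scatter(df.lon, df.lat, c=df.hier_label, cmap='Accent', s=10)
plt.title("Hierarchical Spatial Clusters")
plt.show()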
4. DBSCAN: Density-Based Clustering
DBSCAN is ideal for spatial datasets because:
- It finds clusters of any shape
- It identifies noise points
- It does not require the number of clusters in advance
Example:
PYTHON
from sklearn.cluster import DBSCAN

epsilon = 0.01  # distance threshold in degrees (~1 km at mid-latitudes)
db = DBSCAN(eps=epsilon, min_samples=5).fit(coords)
df['dbscan_label'] = db.labels_
Points labeled -1 are noise (outliers).
Plot:
PYTHON
plt.scatter(df.lon, df.lat, c=df.dbscan_label, cmap='Accent')
plt.title("DBSCAN Spatial Clusters")
plt.show()
Challenge
Challenge 1: Exploring Your Own Dataset
Using the examples above:
- Load your own set of spatial coordinates
- Apply K-Means and DBSCAN
- Compare the results
Which method performs better, and why?
Example Interpretation
- K-Means finds evenly divided clusters
- DBSCAN finds natural geographic groups and labels outliers
- For irregular spatial patterns, DBSCAN usually performs better
Larger eps creates larger clusters. Smaller eps creates more clusters and more noise points.
Math
DBSCAN uses density to define clusters. Its key idea:
A point belongs to a cluster if it has at least min_samples neighbors within a distance $\varepsilon$:
$\text{density} = \frac{\text{neighbors}}{\pi \varepsilon^2}$
Higher density regions form clusters; low density points become noise.
Key Points
- Spatial clustering groups geographic points into meaningful patterns
- K-Means is simple but assumes circular clusters
- Hierarchical clustering builds clusters step-wise
- DBSCAN is best for irregular shapes and detecting noise
- Always visualize your clusters to interpret them correctly
Content from NDVI Analysis
Last updated on 2025-12-05
Overview
Questions
- What is NDVI and why is it useful?
- How do we calculate NDVI from Landsat imagery?
- How do we load and visualize raster data in Python?
- How can we classify and map greenness using NDVI?
Objectives
- Understand NDVI and the spectral bands needed to compute it
- Learn to read geospatial raster files using rasterio
- Calculate NDVI using Red & NIR bands from Landsat
- Visualize NDVI as a map with color gradients
- Create a simple vegetation classification from NDVI values
Why is NDVI Important?
NDVI is one of the most widely used vegetation indices in remote sensing because it provides a simple yet powerful way to assess plant health and landscape greenness over large areas. Healthy vegetation strongly reflects Near-Infrared (NIR) light and absorbs Red light for photosynthesis — NDVI takes advantage of this behavior to quantify vegetation vigor.
What NDVI Helps Us Understand
- Crop health and agricultural productivity
- Drought severity and water stress
- Forest cover and vegetation density
- Urban expansion and land use change
- Seasonal phenology (spring green-up, fall senescence)
- Disaster monitoring (wildfire burn severity, storm damage)
Why Researchers Use NDVI
- It is easy to compute from satellite imagery
- Works across multiple sensors (Landsat, Sentinel-2, MODIS, etc.)
- Allows temporal comparison (year-to-year vegetation trends)
- Useful for ecosystem monitoring & climate change studies
- Enables land cover classification and biomass estimation
- Supports decision-making in agriculture and forestry
NDVI Interpretation at a Glance
| NDVI Range | Interpretation | Example Areas |
|---|---|---|
| -1 to 0 | Water, snow, clouds, barren | Lakes, rivers |
| 0–0.2 | Bare soil, built-up land | Urban areas, deserts |
| 0.2–0.5 | Moderate vegetation | Grasslands, shrubs |
| > 0.5 | Dense, healthy vegetation | Forests, croplands |
NDVI is therefore a foundation metric in environmental science — enabling researchers, planners, and ecologists to visualize vegetation patterns, track change through time, and make data-driven decisions about land and resources.
In this lesson, we will compute NDVI for Indiana using Landsat bands and generate maps with Python.
1. Installing Required Libraries
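A minimal setup sketch, assuming a Jupyter-style notebook:
PYTHON
!pip install rasterio numpy matplotlib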
2. Import Dependencies
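The imports used below:
PYTHON
import rasterio
import numpy as np
import matplotlib.pyplot as plt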
3. Load Landsat RED and NIR Bands
Make sure your directory contains Landsat .TIF files (Band 4 = Red, Band 5 = NIR).
PYTHON
# open the Red (Band 4) and NIR (Band 5) rasters; file names are examples
red = rasterio.open("LC08_L1TP_red.tif")
nir = rasterio.open("LC08_L1TP_nir.tif")

# read the first band as float32 for NDVI arithmetic
red_band = red.read(1).astype('float32')
nir_band = nir.read(1).astype('float32')
Plot a band to inspect:
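A quick look at the Red band:
PYTHON
plt.imshow(red_band, cmap="gray")
plt.title("Landsat Red Band")
plt.colorbar()
plt.show()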
4. Calculate NDVI
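A computation sketch that guards against division by zero where both bands are 0:
PYTHON
# NDVI = (NIR - Red) / (NIR + Red)
denominator = nir_band + red_band
denominator[denominator == 0] = np.nan  # avoid division by zero
ndvi = (nir_band - red_band) / denominator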
Visualize NDVI:
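Mapping NDVI with a red-to-green gradient:
PYTHON
plt.imshow(ndvi, cmap="RdYlGn", vmin=-1, vmax=1)
plt.title("NDVI")
plt.colorbar(label="NDVI")
plt.show()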
5. Classify NDVI into Vegetation Categories
PYTHON
from matplotlib.colors import ListedColormap

# 0 = water/barren (<0.2), 1 = moderate vegetation (0.2-0.5), 2 = dense vegetation (>0.5)
# NaN (no-data) pixels fall into the last bin; mask them first if needed
ndvi_class = np.digitize(ndvi, bins=[0.2, 0.5])

colors = ['blue', 'yellow', 'green']
plt.imshow(ndvi_class, cmap=ListedColormap(colors))
plt.title("NDVI Vegetation Classification")
plt.show()
Challenge 1 — Try It Yourself
- Change the NDVI color map (cmap)
- Classify NDVI into four categories instead of three
- Add labels or legends to your final map
Hint: water regions can be masked so they become transparent/ignored in the plot.
NDVI is affected by seasonality, cloud cover, and atmospheric effects. Always check metadata to ensure you’re comparing compatible scenes.
Math
NDVI uses the reflectance difference between two bands:
$\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}$
NIR reflectance increases with vegetation health, so higher NDVI means greener land.
Key Points
- NDVI uses Red & NIR reflectance from satellite imagery
- Landsat Band 4 = Red, Band 5 = NIR for NDVI
- NDVI ranges from -1 (water) to +1 (healthy vegetation)
- Python tools: rasterio, numpy, matplotlib
- NDVI maps reveal vegetation patterns visually and quantitatively