Content from Census Geocoding
Last updated on 2026-01-14
Overview
Questions
- What is geocoding and why is it essential for census analysis?
- How can we convert addresses into spatial coordinates?
- How do we combine census data with OpenStreetMap features?
- How can spatial context improve demographic analysis?
Objectives
- Understand what geocoding is and how it works
- Convert address-based census data into geographic coordinates
- Query OpenStreetMap (OSM) features using Python
- Combine census points with OSM layers for spatial analysis
- Visualize geocoded census data alongside urban infrastructure
Introduction
Census and demographic datasets are often non-spatial — they exist as tables containing addresses, place names, or administrative units. To analyze these data geographically, we must first geocode them: converting text-based locations into latitude and longitude coordinates.
Once census data are geocoded, they can be enriched with contextual information from OpenStreetMap (OSM), such as roads, buildings, parks, schools, or hospitals. This enables deeper spatial insights into population distribution, accessibility, and urban structure.
In this lesson, you will learn how to:
- Geocode address-based census data
- Convert results into spatial objects
- Query OpenStreetMap features
- Visualize census data in its geographic context
Why Census Geocoding Matters
Census data becomes far more powerful when location is explicitly included. Geocoding allows researchers to move from spreadsheets to spatial insight.
What Census Geocoding Helps Us Understand
- Population distribution and density patterns
- Access to services (schools, hospitals, transit, parks)
- Spatial inequality and environmental justice
- Urban growth and land-use change
- Neighborhood-level demographic trends
- Relationships between people and infrastructure
Why Researchers Combine Census Data with OSM
- Census data provides who and what
- OpenStreetMap provides where and how
- Together, they enable:
- Accessibility studies
- Urban planning analysis
- Public health assessments
- Infrastructure equity evaluations
- Place-based policy analysis
Geocoding transforms census data from static tables into spatial evidence.
1. Installing Required Libraries
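A minimal setup sketch, assuming a Jupyter-style notebook (package names are the standard PyPI ones; versions are not pinned here):
PYTHON
# install the libraries used in this lesson (run once)
!pip install pandas geopandas geopy osmnx matplotlib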
2. Load Census Data or Address Data
This dataset should contain an address column (e.g., street, city, state).
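A minimal loading sketch; the file name census_addresses.csv is a hypothetical placeholder for your own data:
PYTHON
import pandas as pd

# the table must contain an "address" column for the geocoding step below
df = pd.read_csv("census_addresses.csv")
df.head()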
3. Geocode Addresses Using Nominatim
PYTHON
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="census_geocoding_tutorial")
# Nominatim's usage policy requires throttling requests (about 1 per second)
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

def geocode_address(address):
    location = geocode(address)
    if location is None:  # incomplete or ambiguous address
        return None, None
    return location.latitude, location.longitude

df["lat"], df["lon"] = zip(*df["address"].apply(geocode_address))
Note: Geocoding services may return None for incomplete or ambiguous addresses.
4. Convert to a GeoDataFrame
PYTHON
import geopandas as gpd

gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.lon, df.lat),
    crs="EPSG:4326",
)
gdf.head()
Plot the geocoded points:
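A quick plotting sketch using the GeoDataFrame's built-in Matplotlib interface:
PYTHON
import matplotlib.pyplot as plt

ax = gdf.plot(markersize=10, color="red", figsize=(8, 8))
ax.set_title("Geocoded Census Points")
plt.show()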
5. Query OpenStreetMap Features
OpenStreetMap provides free, global geographic data.
Example: download buildings in a city.
PYTHON
import osmnx as ox

place = "Lafayette, Indiana, USA"
# OSMnx >= 1.3 uses features_from_place (older releases used geometries_from_place)
buildings = ox.features_from_place(place, tags={"building": True})
Plot buildings with census points:
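A sketch overlaying the two layers on one set of axes:
PYTHON
fig, ax = plt.subplots(figsize=(8, 8))
buildings.plot(ax=ax, color="lightgray")
gdf.plot(ax=ax, color="red", markersize=10)
ax.set_title("Census Points over OSM Buildings")
plt.show()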
6. Adding Spatial Context to Census Data
You can buffer census points to analyze nearby features.
PYTHON
# buffering in meters requires a projected CRS, e.g. EPSG:3857
gdf_buffer = gdf.to_crs(epsg=3857)
gdf_buffer["geometry"] = gdf_buffer.geometry.buffer(200)  # 200 m buffer
Spatial join example:
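A sketch, assuming buildings is reprojected to the same metric CRS as the buffered points:
PYTHON
buildings_3857 = buildings.to_crs(epsg=3857)

# attach each building to any census buffer it intersects
joined = gpd.sjoin(buildings_3857, gdf_buffer, predicate="intersects")
joined.head()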
This links buildings to nearby census locations.
Challenge
Challenge 1 — Query a Different OSM Feature
Choose one:
- Roads → {"highway": True}
- Schools → {"amenity": "school"}
- Parks → {"leisure": "park"}
Plot the feature with census points.
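One possible solution sketch for the schools option (swap in the other tag dictionaries for roads or parks):
PYTHON
schools = ox.features_from_place(place, tags={"amenity": "school"})

fig, ax = plt.subplots(figsize=(8, 8))
schools.plot(ax=ax, color="blue")
gdf.plot(ax=ax, color="red", markersize=10)
ax.set_title("Census Points and OSM Schools")
plt.show()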
Challenge (continued)
Challenge 2 — Accessibility Analysis
For each census point:
- Create a buffer
- Count how many buildings fall inside
- Interpret spatial differences
Higher counts suggest higher accessibility or density.
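A minimal counting sketch, assuming the projected gdf_buffer and buildings_3857 layers from earlier:
PYTHON
# count how many buildings intersect each buffered census point
joined = gpd.sjoin(gdf_buffer, buildings_3857, predicate="intersects")
counts = joined.groupby(joined.index).size()
gdf_buffer["building_count"] = counts.reindex(gdf_buffer.index, fill_value=0)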
Math
Geocoding transforms a text location $L$ into coordinates $(x, y)$.
Spatial joins evaluate relationships between geometries:
- within
- intersects
- contains
These operations allow census attributes to be analyzed spatially.
Key Points
- Geocoding converts census addresses into spatial coordinates
- GeoPandas enables spatial operations on tabular data
- OpenStreetMap provides rich contextual geographic layers
- Combining census + OSM reveals spatial patterns and inequality
- Spatial context transforms demographic data into actionable insight
Module Overview
| Lesson | Overview |
|---|---|
| Beginner | Introduction to Address Geocoding |
| Intermediate | Introduction to OSM Overpass API |
| Advanced | Introduction to Advanced Batch-Geocoding |
Content from Network Analysis
Last updated on 2025-12-05
Overview
Questions
- How do we download and visualize road networks with OSM data?
- What is a graph network and how is it represented in Python?
- How can we compute shortest paths and network distances?
Objectives
- Learn how to retrieve OpenStreetMap road data using OSMnx
- Convert road networks into graphs for routing and analysis
- Visualize networks and shortest paths on a map
- Compute route distances and travel time across a network
Overview
This tutorial provides a practical introduction to performing road network analysis using Python, focusing on road networks in a specified area (e.g., West Lafayette, Indiana) to study food deserts. It uses libraries such as networkx, osmnx, folium, pandas, geopandas, and matplotlib to fetch, visualize, and analyze road networks, compute centroid nodes, and calculate the shortest path based on travel time. The tutorial also applies this analysis to food accessibility data in Indiana.
The coordinates of grocery stores in Indiana were fetched using OpenStreetMap (OSM). Network analysis is used to calculate distances and times from grocery store locations to the center of areas of interest, considering factors such as the number of supermarkets, income, and vehicle accessibility. Distances are classified as accessible (within about 1 mile) or low-access (over 5 miles), depending on whether the setting is urban or rural. Low-Income and Low-Access maps were created for each census tract in Indiana and compared to the USDA food desert dataset.
Why Network Analysis for Food Deserts?
- Mapping Accessibility: Models connections between grocery stores and transportation systems to identify areas with limited healthy food access due to distance or lack of transportation.
- Area Development: Helps improve accessibility and quality of life in underserved regions.
- Promotes Equity: Highlights disparities to create solutions for equitable access to nutritious food.
- Optimization of Resources: Ensures equal distribution of resources for all individuals.
Environment Setup
Libraries imported for this tutorial:
- osmnx: Fetches and processes OpenStreetMap road network data.
- networkx: Performs graph-based computations, such as shortest path calculations.
- folium: Enables interactive map visualizations.
- geopandas and shapely: Handle geospatial data and geometry operations.
- matplotlib: Generates static plots, including network visualization.
- geopy: Calculates geodesic distances for spatial analysis.
Data Acquisition
The road network for West Lafayette, Indiana, is fetched using ox.graph_from_place("West Lafayette, Indiana", network_type="drive"), retrieving the drivable road network from OpenStreetMap as a graph (nodes as intersections, edges as road segments). The graph can be saved as a GraphML file (e.g., westlafayette_indiana_network.graphml) using ox.save_graphml to avoid redundant downloads. This can be adapted to any U.S. area with a single line of code.
Applications
- Urban Planning: Analyzing road connectivity and accessibility in cities.
- Transportation Studies: Optimizing routes based on travel time or distance.
- Geospatial Analysis: Studying spatial relationships in infrastructure networks.
- Emergency Response: Identifying the fastest routes for first responders.
Introduction
Network analysis allows us to study movement, connectivity, and accessibility across geographic space. Roads, sidewalks, rivers, power lines, and transit systems can be modeled as graphs, where intersections are nodes and paths are edges.
This lesson demonstrates how to:
- Download a road network using OSMnx
- Convert it into a graph using NetworkX
- Visualize the network
- Run shortest path routing between two locations
1. Install Required Libraries
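A minimal setup sketch, assuming a Jupyter-style notebook:
PYTHON
!pip install osmnx networkx folium geopandas matplotlib geopy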
2. Import Libraries
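The imports used in the rest of this lesson:
PYTHON
import osmnx as ox
import networkx as nx
import geopandas as gpd
import matplotlib.pyplot as plt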
3. Download a Road Network from OpenStreetMap
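A download sketch matching the Data Acquisition section above:
PYTHON
place = "West Lafayette, Indiana"
G = ox.graph_from_place(place, network_type="drive")

# optionally cache the graph to avoid re-downloading
ox.save_graphml(G, "westlafayette_indiana_network.graphml")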
Visualize network:
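A static plot via OSMnx's built-in helper:
PYTHON
fig, ax = ox.plot_graph(G, node_size=5, edge_linewidth=0.5)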
4. Convert the Graph to Nodes and Edges GeoDataFrames
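OSMnx can split the graph into two GeoDataFrames:
PYTHON
nodes, edges = ox.graph_to_gdfs(G)
nodes.head()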
Plot edges alone:
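A sketch plotting only the edge geometries:
PYTHON
ax = edges.plot(linewidth=0.5, color="gray", figsize=(8, 8))
ax.set_title("Road Network Edges")
plt.show()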
5. Find Shortest Route Between Two Points
Choose two coordinates manually or by clicking on a map.
PYTHON
orig = ox.distance.nearest_nodes(G, -86.9145, 40.4253) # lon, lat
dest = ox.distance.nearest_nodes(G, -86.9079, 40.4268)
Calculate shortest path:
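A sketch weighting edges by length; travel-time routing would first require ox.add_edge_speeds and ox.add_edge_travel_times:
PYTHON
route = nx.shortest_path(G, orig, dest, weight="length")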
Plot route:
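OSMnx draws the route over the network:
PYTHON
fig, ax = ox.plot_graph_route(G, route, route_linewidth=3, node_size=0)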
Challenge
Challenge 1 — Try Your Own Route
- Pick any two points in a city of your choice.
- Compute and visualize the shortest path between them.
Math
A network is represented as a graph:
$G = (V, E)$
Where:
- $V$ = set of nodes (intersections)
- $E$ = set of edges (road segments)
The shortest path is the minimum-weight path across $E$.
Key Points
- OSMnx simplifies downloading and converting OSM road networks
- Graphs model movement and connectivity in space
- NetworkX allows shortest path and routing analysis
- Visualization helps interpret accessibility patterns
Content from Spatial Analysis
Last updated on 2025-12-05
Overview
Questions
- What is PySAL and what can it do for spatial analysis?
- How do we compute spatial weights and perform spatial autocorrelation?
- How do we interpret results like Moran’s I?
Objectives
- Understand the purpose of PySAL in spatial data science
- Learn how to load spatial data using GeoPandas
- Construct spatial weight matrices
- Compute Global Moran’s I using PySAL
- Visualize spatial clustering and spatial autocorrelation
Why is PySAL Important?
PySAL (Python Spatial Analysis Library) is one of the most widely used toolkits for working with spatial data in Python. Unlike traditional statistical libraries, PySAL is designed specifically for datasets where location matters — where observations influence nearby observations, and spatial patterns may not be random.
Geographers, urban planners, environmental scientists, epidemiologists, and data analysts use PySAL to identify spatial relationships, detect clustering, and build models that incorporate proximity and geography.
What PySAL Helps Us Understand
- Where events cluster or disperse across space
- Whether high or low values form hotspots or coldspots
- How neighborhoods influence one another (spatial dependency)
- Spatial inequality patterns in income, population, crime, disease, etc.
- Geographic diffusion (wildfire spread, disease transmission, migration flows)
- Environmental change and land-use impacts
Why Researchers Use PySAL
- Built for spatial statistics — tools that general libraries lack
- Easy integration with GeoPandas, raster data, and shapefiles
- Provides standard spatial methods such as:
  - Spatial weights (Queen, Rook, KNN, distance-based)
  - Global & Local Moran’s I (LISA)
  - Spatial clustering & hotspot detection
  - Spatial regression models
- Enables data-driven decision making in geography
- Scales from local studies to large regional/global analyses
- Helps test spatial hypotheses scientifically instead of visually
Spatial Analysis in Context
Spatial analysis answers questions like:
| Question | PySAL Method |
|---|---|
| Do areas with high values cluster together? | Moran’s I |
| Where are hotspots located? | Local Moran / LISA maps |
| What counts as a neighbor? | Spatial weights matrices |
| Are patterns random or significant? | Monte Carlo permutation tests |
| How do variables influence each other across space? | Spatial regression |
PySAL makes these methods accessible in Python, allowing analysts to move from maps to statistical evidence — revealing underlying spatial patterns that are not visible from visualization alone.
Introduction
PySAL is the Python Spatial Analysis Library — a powerful, open-source toolkit for working with spatial data. It provides tools for:
- spatial weights
- spatial autocorrelation
- clustering
- spatial regression
- neighborhood analysis
This tutorial introduces the core PySAL workflow.
We will cover:
- Loading polygon or point data
- Building spatial weights
- Running Global Moran’s I
- Visualizing results
This tutorial assumes basic familiarity with pandas, geopandas, and Python.
1. Loading Spatial Data
PySAL works seamlessly with GeoPandas.
Here’s a simple example using a polygon shapefile:
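A loading sketch; the file name regions.shp is a hypothetical placeholder:
PYTHON
import geopandas as gpd

gdf = gpd.read_file("regions.shp")
gdf.head()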
Plot the boundaries:
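Plotting only the polygon outlines:
PYTHON
gdf.boundary.plot(figsize=(8, 8))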
This ensures the geometry is valid and loads correctly.
2. Building Spatial Weights
Spatial weights define who is a neighbor of whom.
PySAL includes:
- Rook contiguity
- Queen contiguity
- K-nearest neighbors
- Distance-based neighbors
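A construction sketch using libpysal, assuming the polygon GeoDataFrame gdf from above (newer libpysal releases also offer a graph-based API):
PYTHON
from libpysal.weights import Queen, Rook

w_queen = Queen.from_dataframe(gdf)
w_rook = Rook.from_dataframe(gdf)
w_queen.transform = "r"  # row-standardize the weights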
3. Global Moran’s I
Moran’s I measures global spatial autocorrelation:
- Positive values → clustering
- Negative values → dispersion
- Near zero → random pattern
Assume the dataset has a numeric column value:
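A computation sketch using esda, assuming the weights w_queen from above:
PYTHON
from esda.moran import Moran

y = gdf["value"]
moran = Moran(y, w_queen)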
View the results:
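The statistic and its permutation-based p-value:
PYTHON
print("Moran's I:", moran.I)
print("p-value:", moran.p_sim)  # pseudo p-value from random permutations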
Plot the Moran scatterplot:
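A sketch assuming the optional splot package is installed:
PYTHON
import matplotlib.pyplot as plt
from splot.esda import moran_scatterplot

fig, ax = moran_scatterplot(moran)
plt.show()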
4. Local Moran’s I (Outlier Analysis)
Local Moran’s I finds hotspots and coldspots.
Add LISA quadrant labels to the GeoDataFrame:
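A sketch using esda's Moran_Local; the q attribute encodes quadrants (1 = HH, 2 = LH, 3 = LL, 4 = HL):
PYTHON
from esda.moran import Moran_Local

lisa = Moran_Local(y, w_queen)
gdf["quadrant"] = lisa.q
gdf["significant"] = lisa.p_sim < 0.05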
Map the clusters:
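A basic categorical map of the quadrant labels:
PYTHON
ax = gdf.plot(column="quadrant", categorical=True, legend=True, figsize=(8, 8))
ax.set_title("LISA Cluster Map")
plt.show()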
This creates a basic LISA cluster map.
Queen neighbors may include diagonal touches. Rook neighbors require shared edges only. You should see fewer rook neighbors than queen neighbors.
Challenge (continued)
Challenge 2: Compute Moran’s I on a New Variable
Choose any numeric variable in your dataset:
- Extract the variable
- Compute Moran’s I
- Interpret whether clustering exists
A positive Moran’s I with a low p-value → strong clustering. Near zero → randomness. Negative → spatial dispersion.
Math
Global Moran’s I is defined as:
$ I = \frac{N}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} $
Where:
- $N$ = number of observations
- $W$ = sum of all spatial weights
- $w_{ij}$ = spatial weight between units $i$ and $j$
- $x$ = variable of interest
Key Points
- PySAL provides tools for weights, autocorrelation, clustering, and modeling
- Queen and rook weights define spatial neighbors differently
- Moran’s I measures global autocorrelation
- Local Moran (LISA) identifies hotspots and coldspots
- GeoPandas and PySAL together form a powerful spatial analysis workflow
Content from Spatial Clustering
Last updated on 2025-12-05
Overview
Questions
- What is spatial clustering and why do we use it?
- How can we perform basic clustering on geographic point data?
- How do algorithms like K-Means, Hierarchical Clustering, and DBSCAN differ?
Objectives
- Understand the concept of spatial clustering
- Learn how to prepare point data for clustering
- Apply K-Means, Hierarchical Clustering, and DBSCAN in Python
- Visualize clustering results on simple scatterplots and maps
Why is Spatial Clustering Important?
Spatial clustering is a core method in geospatial analysis for identifying how points, people, places, or events are distributed across space. Instead of treating data as isolated observations, clustering helps us detect patterns, revealing where concentrations or groupings occur — and just as importantly, where they do not.
Clustering allows us to transform large sets of point data into meaningful spatial insights that can guide research, decision-making, and planning.
What Spatial Clustering Helps Us Understand
- Where events or features form geographic hotspots
- How points group based on proximity or similarity
- Regions of high vs. low density
- Patterns of distribution — clustered, dispersed, or random?
- Spatial relationships in social, environmental, or urban data
- Location-based trends that maps alone may not reveal
Why Analysts Use Spatial Clustering
- Reduces complex spatial datasets into interpretable groups
- Helps detect clusters in public health (disease outbreaks), crime, ecology, and more
- Identifies emerging hotspots for management or intervention
- Useful for urban planning, environmental monitoring, and archaeology
- Works well as a first step for further spatial statistics (PySAL, regression, AI)
- Enables classification, prediction, and pattern recognition in large datasets
Clustering at a Glance
| Method | Strength | Best For |
|---|---|---|
| K-Means | Simple, fast | Well-separated, circular clusters |
| Hierarchical | Dendrogram visualization | Multi-scale grouping, unknown k values |
| DBSCAN | Finds irregular shapes + noise | Spatial hotspots and natural patterns |
Spatial clustering is often the first analytical step when exploring point distribution. It moves the analysis beyond visual mapping — showing not only where points are located, but how spatial processes shape them.
Introduction
Spatial clustering is a core method used in geography, archaeology, ecology, and urban studies. It helps identify patterns in the spatial distribution of points—such as hotspots of crime, clusters of archaeological artifacts, or regions with similar environmental characteristics.
This beginner tutorial walks you through the fundamentals of spatial clustering using a simple dataset of geographic coordinates. The workflow is entirely in Python.
We will cover:
- Loading and exploring point data
- Preparing coordinates for clustering
- Running three clustering algorithms
- Visualizing the results
All examples use standard Python libraries: pandas, geopandas, matplotlib, sklearn, and scipy.
1. Loading Spatial Point Data
Spatial clustering typically starts with a set of point locations. A minimal example:
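A synthetic sketch; the coordinates below are made-up points around three arbitrary centers, for illustration only:
PYTHON
import pandas as pd
import numpy as np

rng = np.random.default_rng(42)
# three artificial point clusters around different centers
centers = [(-86.91, 40.42), (-86.88, 40.45), (-86.95, 40.40)]
points = np.vstack([rng.normal(c, 0.005, size=(50, 2)) for c in centers])
df = pd.DataFrame(points, columns=["lon", "lat"])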
Visualize the raw points:
PYTHON
import matplotlib.pyplot as plt
plt.scatter(df.lon, df.lat, s=10)
plt.title("Raw Spatial Points")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()
This simple scatterplot helps identify whether your data already looks clustered.
2. K-Means Clustering
K-Means is the simplest clustering algorithm. It works best when:
- You know the number of clusters you want
- Clusters are roughly circular
- Points are evenly distributed
PYTHON
from sklearn.cluster import KMeans
coords = df[['lon', 'lat']]
kmeans = KMeans(n_clusters=4, random_state=42)
df['kmeans_label'] = kmeans.fit_predict(coords)
Visualize results:
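Coloring points by their cluster label:
PYTHON
plt.scatter(df.lon, df.lat, c=df.kmeans_label, cmap='Accent', s=10)
plt.title("K-Means Spatial Clusters")
plt.show()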
3. Hierarchical Clustering
Hierarchical clustering builds clusters step-by-step. It is useful when:
- You want a dendrogram
- You don’t know the number of clusters beforehand
- Clusters may have irregular shapes
Example:
PYTHON
from sklearn.cluster import AgglomerativeClustering
agg = AgglomerativeClustering(n_clusters=4)
df['hier_label'] = agg.fit_predict(coords)
Plot:
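The same scatter approach, colored by the hierarchical labels:
PYTHON
plt.scatter(df.lon, df.lat, c=df.hier_label, cmap='Accent', s=10)
plt.title("Hierarchical Spatial Clusters")
plt.show()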
4. DBSCAN: Density-Based Clustering
DBSCAN is ideal for spatial datasets because:
- It finds clusters of any shape
- It identifies noise points
- It does not require the number of clusters in advance
Example:
PYTHON
from sklearn.cluster import DBSCAN

epsilon = 0.01  # distance threshold in degrees (~1 km at mid-latitudes)
db = DBSCAN(eps=epsilon, min_samples=5).fit(coords)
df['dbscan_label'] = db.labels_
Points labeled -1 are noise (outliers).
Plot:
PYTHON
plt.scatter(df.lon, df.lat, c=df.dbscan_label, cmap='Accent')
plt.title("DBSCAN Spatial Clusters")
plt.show()
Challenge
Challenge 1: Exploring Your Own Dataset
Using the examples above:
- Load your own set of spatial coordinates
- Apply K-Means and DBSCAN
- Compare the results
Which method performs better, and why?
Example Interpretation
- K-Means finds evenly divided clusters
- DBSCAN finds natural geographic groups and labels outliers
- For irregular spatial patterns, DBSCAN usually performs better
Larger eps creates larger clusters. Smaller eps creates more clusters and more noise points.
Math
DBSCAN uses density to define clusters. Its key idea:
A point belongs to a cluster if it has at least min_samples neighbors within a distance $\varepsilon$:
$\text{density} = \frac{\text{neighbors}}{\pi \varepsilon^2}$
Higher density regions form clusters; low density points become noise.
Key Points
- Spatial clustering groups geographic points into meaningful patterns
- K-Means is simple but assumes circular clusters
- Hierarchical clustering builds clusters step-wise
- DBSCAN is best for irregular shapes and detecting noise
- Always visualize your clusters to interpret them correctly
Content from NDVI Analysis
Last updated on 2025-12-05
Overview
Questions
- What is NDVI and why is it useful?
- How do we calculate NDVI from Landsat imagery?
- How do we load and visualize raster data in Python?
- How can we classify and map greenness using NDVI?
Objectives
- Understand NDVI and the spectral bands needed to compute it
- Learn to read geospatial raster files using rasterio
- Calculate NDVI using Red & NIR bands from Landsat
- Visualize NDVI as a map with color gradients
- Create a simple vegetation classification from NDVI values
Why is NDVI Important?
NDVI is one of the most widely used vegetation indices in remote sensing because it provides a simple yet powerful way to assess plant health and landscape greenness over large areas. Healthy vegetation strongly reflects Near-Infrared (NIR) light and absorbs Red light for photosynthesis — NDVI takes advantage of this behavior to quantify vegetation vigor.
What NDVI Helps Us Understand
- Crop health and agricultural productivity
- Drought severity and water stress
- Forest cover and vegetation density
- Urban expansion and land use change
- Seasonal phenology (spring green-up, fall senescence)
- Disaster monitoring (wildfire burn severity, storm damage)
Why Researchers Use NDVI
- It is easy to compute from satellite imagery
- Works across multiple sensors (Landsat, Sentinel-2, MODIS, etc.)
- Allows temporal comparison (year-to-year vegetation trends)
- Useful for ecosystem monitoring & climate change studies
- Enables land cover classification and biomass estimation
- Supports decision-making in agriculture and forestry
NDVI Interpretation at a Glance
| NDVI Range | Interpretation | Example Areas |
|---|---|---|
| -1 to 0 | Water, snow, clouds, barren | Lakes, rivers |
| 0–0.2 | Bare soil, built-up land | Urban areas, deserts |
| 0.2–0.5 | Moderate vegetation | Grasslands, shrubs |
| > 0.5 | Dense, healthy vegetation | Forests, croplands |
NDVI is therefore a foundation metric in environmental science — enabling researchers, planners, and ecologists to visualize vegetation patterns, track change through time, and make data-driven decisions about land and resources.
In this lesson, we will compute NDVI for Indiana using Landsat bands and generate maps with Python.
1. Installing Required Libraries
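A minimal setup sketch, assuming a Jupyter-style notebook:
PYTHON
!pip install rasterio numpy matplotlib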
2. Import Dependencies
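The imports used below:
PYTHON
import rasterio
import numpy as np
import matplotlib.pyplot as plt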
3. Load Landsat RED and NIR Bands
Make sure your directory contains Landsat .TIF files (Band 4 = Red, Band 5 = NIR).
PYTHON
# open the Red (Band 4) and NIR (Band 5) rasters; file names are examples
red = rasterio.open("LC08_L1TP_red.tif")
nir = rasterio.open("LC08_L1TP_nir.tif")

# read the first band as float32 for NDVI arithmetic
red_band = red.read(1).astype('float32')
nir_band = nir.read(1).astype('float32')
Plot a band to inspect:
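A quick look at the Red band:
PYTHON
plt.imshow(red_band, cmap="gray")
plt.title("Landsat Red Band")
plt.colorbar()
plt.show()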
4. Calculate NDVI
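A computation sketch that guards against division by zero where both bands are 0:
PYTHON
# NDVI = (NIR - Red) / (NIR + Red)
denominator = nir_band + red_band
denominator[denominator == 0] = np.nan  # avoid division by zero
ndvi = (nir_band - red_band) / denominator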
Visualize NDVI:
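Mapping NDVI with a red-to-green gradient:
PYTHON
plt.imshow(ndvi, cmap="RdYlGn", vmin=-1, vmax=1)
plt.title("NDVI")
plt.colorbar(label="NDVI")
plt.show()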
5. Classify NDVI into Vegetation Categories
PYTHON
from matplotlib.colors import ListedColormap

# 0 = water/barren (<0.2), 1 = moderate vegetation (0.2-0.5), 2 = dense vegetation (>0.5)
# NaN (no-data) pixels fall into the last bin; mask them first if needed
ndvi_class = np.digitize(ndvi, bins=[0.2, 0.5])

colors = ['blue', 'yellow', 'green']
plt.imshow(ndvi_class, cmap=ListedColormap(colors))
plt.title("NDVI Vegetation Classification")
plt.show()
Challenge 1 — Try It Yourself
- Change the NDVI color map (cmap)
- Classify NDVI into four categories instead of three
- Add labels or legends to your final map
Hint: water regions can be masked so they become transparent/ignored in the plot.
NDVI is affected by seasonality, cloud cover, and atmospheric effects. Always check metadata to ensure you’re comparing compatible scenes.
Math
NDVI uses the reflectance difference between two bands:
$\text{NDVI} = \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}$
NIR reflectance increases with vegetation health, so higher NDVI means greener land.
Key Points
- NDVI uses Red & NIR reflectance from satellite imagery
- Landsat Band 4 = Red, Band 5 = NIR for NDVI
- NDVI ranges from -1 (water) to +1 (healthy vegetation)
- Python tools: rasterio, numpy, matplotlib
- NDVI maps reveal vegetation patterns visually and quantitatively