Content from Census Geocoding


Last updated on 2026-01-14

Overview

Questions

  • What is geocoding and why is it essential for census analysis?
  • How can we convert addresses into spatial coordinates?
  • How do we combine census data with OpenStreetMap features?
  • How can spatial context improve demographic analysis?

Objectives

  • Understand what geocoding is and how it works
  • Convert address-based census data into geographic coordinates
  • Query OpenStreetMap (OSM) features using Python
  • Combine census points with OSM layers for spatial analysis
  • Visualize geocoded census data alongside urban infrastructure

Introduction


Census and demographic datasets are often non-spatial — they exist as tables containing addresses, place names, or administrative units. To analyze these data geographically, we must first geocode them: converting text-based locations into latitude and longitude coordinates.

Once census data are geocoded, they can be enriched with contextual information from OpenStreetMap (OSM), such as roads, buildings, parks, schools, or hospitals. This enables deeper spatial insights into population distribution, accessibility, and urban structure.

In this lesson, you will learn how to:

  1. Geocode address-based census data
  2. Convert results into spatial objects
  3. Query OpenStreetMap features
  4. Visualize census data in its geographic context

Why Does Census Geocoding Matter?


Census data becomes far more powerful when location is explicitly included. Geocoding allows researchers to move from spreadsheets to spatial insight.

What Census Geocoding Helps Us Understand

  • Population distribution and density patterns
  • Access to services (schools, hospitals, transit, parks)
  • Spatial inequality and environmental justice
  • Urban growth and land-use change
  • Neighborhood-level demographic trends
  • Relationships between people and infrastructure

Why Researchers Combine Census Data with OSM

  • Census data provides who and what
  • OpenStreetMap provides where and how
  • Together, they enable:
    • Accessibility studies
    • Urban planning analysis
    • Public health assessments
    • Infrastructure equity evaluations
    • Place-based policy analysis

Geocoding transforms census data from static tables into spatial evidence.

1. Installing Required Libraries


PYTHON

!pip install geopandas geopy osmnx matplotlib

2. Load Census Data or Address Data


PYTHON

import pandas as pd

df = pd.read_csv("census_addresses.csv")
df.head()

This dataset should contain an address column (e.g., street, city, state).

3. Geocode Addresses Using Nominatim


PYTHON

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="census_geocoding_tutorial")

def geocode_address(address):
    try:
        location = geolocator.geocode(address)
        if location is None:          # unmatched or ambiguous address
            return None, None
        return location.latitude, location.longitude
    except Exception:                 # network errors, timeouts, etc.
        return None, None

df["lat"], df["lon"] = zip(*df["address"].apply(geocode_address))

Note: Geocoding services may return None for incomplete or ambiguous addresses. Nominatim's usage policy also limits requests to about one per second; for larger batches, consider geopy's RateLimiter.

4. Convert to a GeoDataFrame


PYTHON

import geopandas as gpd

gdf = gpd.GeoDataFrame(
    df,
    geometry=gpd.points_from_xy(df.lon, df.lat),
    crs="EPSG:4326"
)

gdf.head()

Plot the geocoded points:

PYTHON

gdf.plot(figsize=(6,6), color="red")

5. Query OpenStreetMap Features


OpenStreetMap provides free, global geographic data.

Example: download buildings in a city.

PYTHON

import osmnx as ox

place = "Lafayette, Indiana, USA"

buildings = ox.features_from_place(   # named geometries_from_place in OSMnx < 1.3
    place,
    tags={"building": True}
)

Plot buildings with census points:

PYTHON

ax = buildings.plot(color="lightgray", figsize=(8,6))
gdf.plot(ax=ax, color="red", markersize=10)

6. Adding Spatial Context to Census Data


You can buffer census points to analyze nearby features.

PYTHON

# Buffer distances use CRS units, so project to a metric CRS first
gdf_buffer = gdf.to_crs(epsg=32616)   # UTM zone 16N (covers Indiana); units are meters
gdf_buffer["geometry"] = gdf_buffer.geometry.buffer(200)   # 200 m buffer

Spatial join example:

PYTHON

join = gpd.sjoin(buildings.to_crs(gdf_buffer.crs), gdf_buffer, predicate="within")
join.head()

This links buildings to nearby census locations.

Challenge

Challenge 1 — Query a Different OSM Feature

Choose one:

  • Roads → {"highway": True}

  • Schools → {"amenity": "school"}

  • Parks → {"leisure": "park"}

Plot the feature with census points.

PYTHON

parks = ox.features_from_place(place, tags={"leisure": "park"})  # geometries_from_place in OSMnx < 1.3
parks.plot()
Challenge (continued)

Challenge 2 — Accessibility Analysis

For each census point:

  • Create a buffer

  • Count how many buildings fall inside

  • Interpret spatial differences

Higher counts suggest higher accessibility or density.
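The buffer-and-count logic can be sketched with plain Shapely geometries. The coordinates below are hypothetical and assume a projected, meter-based CRS:

```python
from shapely.geometry import Point

# Hypothetical census points and building centroids (projected coordinates, meters)
census_points = [Point(0, 0), Point(1000, 0)]
buildings = [Point(50, 50), Point(120, -80), Point(980, 30)]

# Buffer each census point by 200 m and count the buildings that fall inside
for i, pt in enumerate(census_points):
    zone = pt.buffer(200)
    count = sum(b.within(zone) for b in buildings)
    print(f"Census point {i}: {count} buildings within 200 m")
```

With GeoDataFrames, the same count is usually done via `gpd.sjoin` followed by a group-by on the census point index.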

Math


Geocoding transforms a text location L into coordinates (x, y).

Spatial joins evaluate relationships between geometries:

  • within
  • intersects
  • contains

These operations allow census attributes to be analyzed spatially.
Key Points
  • Geocoding converts census addresses into spatial coordinates

  • GeoPandas enables spatial operations on tabular data

  • OpenStreetMap provides rich contextual geographic layers

  • Combining census + OSM reveals spatial patterns and inequality

  • Spatial context transforms demographic data into actionable insight

Module Overview

Lesson Overview
Beginner Introduction to Address Geocoding
Intermediate Introduction to OSM Overpass API
Advanced Introduction to Advanced Batch-Geocoding

Content from Network Analysis


Last updated on 2025-12-05

Overview

Questions

  • How do we download and visualize road networks with OSM data?
  • What is a graph network and how is it represented in Python?
  • How can we compute shortest paths and network distances?

Objectives

  • Learn how to retrieve OpenStreetMap road data using OSMnx
  • Convert road networks into graphs for routing and analysis
  • Visualize networks and shortest paths on a map
  • Compute route distances and travel time across a network

Overview


This tutorial provides a practical introduction to performing road network analysis using Python, focusing on analyzing road networks in a specified area (e.g., West Lafayette, Indiana) to study food deserts. It uses libraries such as networkx, osmnx, folium, pandas, geopandas, and matplotlib to fetch, visualize, and analyze road networks, compute centroid nodes, and calculate the shortest path based on travel time. The tutorial also applies this analysis to food accessibility data in Indiana.

The coordinates of grocery stores in Indiana were fetched from OpenStreetMap (OSM). Network analysis is used to calculate distances and travel times from grocery store locations to the centers of areas of interest, considering factors such as the number of supermarkets, income, and vehicle access. Distances are classified as accessible (around 1 mile) or low-access (over 5 miles), with the threshold depending on urban versus rural setting. Low-Income and Low-Access maps were created for each census tract in Indiana and compared to the USDA food desert dataset.

Why Network Analysis for Food Deserts?

  1. Mapping Accessibility: Models connections between grocery stores and transportation systems to identify areas with limited healthy food access due to distance or lack of transportation.
  2. Area Development: Helps improve accessibility and quality of life in underserved regions.
  3. Promotes Equity: Highlights disparities to create solutions for equitable access to nutritious food.
  4. Optimization of Resources: Ensures equal distribution of resources for all individuals.

Environment Setup

Libraries imported for this tutorial:

  • osmnx: Fetches and processes OpenStreetMap road network data.
  • networkx: Performs graph-based computations, such as shortest path calculations.
  • folium: Enables interactive map visualizations.
  • geopandas and shapely: Handle geospatial data and geometry operations.
  • matplotlib: Generates static plots, including network visualization.
  • geopy: Calculates geodesic distances for spatial analysis.

Data Acquisition

The road network for West Lafayette, Indiana, is fetched using ox.graph_from_place("West Lafayette, Indiana", network_type="drive"), retrieving the drivable road network from OpenStreetMap as a graph (nodes as intersections, edges as road segments). The graph can be saved as a GraphML file (e.g., westlafayette_indiana_network.graphml) using ox.save_graphml to avoid redundant downloads. This can be adapted to any U.S. area by changing a single line of code.

Applications

  • Urban Planning: Analyzing road connectivity and accessibility in cities.
  • Transportation Studies: Optimizing routes based on travel time or distance.
  • Geospatial Analysis: Studying spatial relationships in infrastructure networks.
  • Emergency Response: Identifying the fastest routes for first responders.

Visualization

A folium map example shows the route from the centroid of West Lafayette (blue star) to a randomly chosen point (green flag), with the shortest path drawn as a red line.

Limitations

  • Data Dependency: Relies on OpenStreetMap data, which may vary in quality or availability by region.
  • Performance: Large networks may require significant computational resources for fetching and processing.

Introduction


Network analysis allows us to study movement, connectivity, and accessibility across geographic space. Roads, sidewalks, rivers, power lines, and transit systems can be modeled as graphs, where intersections are nodes and paths are edges.

This lesson demonstrates how to:

  1. Download a road network using OSMnx
  2. Convert it into a graph using NetworkX
  3. Visualize the network
  4. Run shortest path routing between two locations

1. Install Required Libraries


PYTHON

!pip install osmnx networkx matplotlib

2. Import Libraries


PYTHON

import osmnx as ox
import networkx as nx
import matplotlib.pyplot as plt

3. Download a Road Network from OpenStreetMap


PYTHON

place = "West Lafayette, Indiana, USA"

G = ox.graph_from_place(place, network_type="drive")

Visualize network:

PYTHON

fig, ax = ox.plot_graph(G, node_size=5, edge_color="gray")

4. Convert the Graph to Nodes and Edges GeoDataFrames


PYTHON

nodes, edges = ox.graph_to_gdfs(G)
nodes.head(), edges.head()

Plot edges alone:

PYTHON

edges.plot(figsize=(8,6), linewidth=0.8)
plt.title("Road Network")
plt.show()

5. Find Shortest Route Between Two Points


Choose two coordinates manually or by clicking on a map.

PYTHON

orig = ox.distance.nearest_nodes(G, -86.9145, 40.4253)  # lon, lat
dest = ox.distance.nearest_nodes(G, -86.9079, 40.4268)

Calculate shortest path:

PYTHON

route = nx.shortest_path(G, orig, dest, weight="length")

Plot route:

PYTHON

ox.plot_graph_route(G, route, route_color="red")
Challenge

Challenge 1 — Try Your Own Route

  • Pick any two points in a city of your choice.
  • Compute and visualize the shortest path between them.

PYTHON

orig = ox.distance.nearest_nodes(G, lon1, lat1)
dest = ox.distance.nearest_nodes(G, lon2, lat2)
route = nx.shortest_path(G, orig, dest, weight="length")
ox.plot_graph_route(G, route)
Challenge (continued)

Challenge 2 — Estimate Travel Distance

Using your computed route, calculate the total path length:

PYTHON

# OSMnx >= 2.0: sum edge lengths via the route's edge GeoDataFrame
total_length = ox.routing.route_to_gdf(G, route)["length"].sum()
# older OSMnx: sum(ox.utils_graph.get_route_edge_attributes(G, route, "length"))
print("Route length (meters):", total_length)

Convert meters → km:

PYTHON

print(total_length/1000, "km")

Math


A network is represented as a graph:

G = (V,E)

Where:

  • V = set of nodes (intersections)
  • E = edges (roads)

Shortest path = minimum weighted path across E.
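The definition above can be illustrated with a toy graph. This is a minimal sketch with made-up edge lengths, not the West Lafayette network:

```python
import networkx as nx

# Toy road network: nodes are intersections, edge weights are segment lengths (m)
G = nx.Graph()
G.add_edge("A", "B", length=100)
G.add_edge("B", "C", length=150)
G.add_edge("A", "C", length=300)

# A->B->C totals 250 m, beating the direct A->C edge of 300 m
route = nx.shortest_path(G, "A", "C", weight="length")
dist = nx.shortest_path_length(G, "A", "C", weight="length")
print(route, dist)  # ['A', 'B', 'C'] 250
```

The same `weight="length"` mechanism is what OSMnx relies on when routing over real road graphs.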

Key Points
  • OSMnx simplifies downloading and converting OSM road networks

  • Graphs model movement and connectivity in space

  • NetworkX allows shortest path and routing analysis

  • Visualization helps interpret accessibility patterns

Module Overview

Lesson Overview
Beginner Introduction to Network Analysis and color-coding distances.
Advanced Obtains coordinates from OSM and uses centroid analysis to calculate distances and travel times for multiple points of interest.

Content from Spatial Analysis


Last updated on 2025-12-05

Overview

Questions

  • What is PySAL and what can it do for spatial analysis?
  • How do we compute spatial weights and perform spatial autocorrelation?
  • How do we interpret results like Moran’s I?

Objectives

  • Understand the purpose of PySAL in spatial data science
  • Learn how to load spatial data using GeoPandas
  • Construct spatial weight matrices
  • Compute Global Moran’s I using PySAL
  • Visualize spatial clustering and spatial autocorrelation

Why is PySAL Important?


PySAL (Python Spatial Analysis Library) is one of the most widely used toolkits for working with spatial data in Python. Unlike traditional statistical libraries, PySAL is designed specifically for datasets where location matters — where observations influence nearby observations, and spatial patterns may not be random.

Geographers, urban planners, environmental scientists, epidemiologists, and data analysts use PySAL to identify spatial relationships, detect clustering, and build models that incorporate proximity and geography.

What PySAL Helps Us Understand

  • Where events cluster or disperse across space
  • Whether high or low values form hotspots or coldspots
  • How neighborhoods influence one another (spatial dependency)
  • Spatial inequality patterns in income, population, crime, disease, etc.
  • Geographic diffusion (wildfire spread, disease transmission, migration flows)
  • Environmental change and land-use impacts

Why Researchers Use PySAL

  • Built for spatial statistics — tools that general libraries lack
  • Easy integration with GeoPandas, raster data, and shapefiles
  • Provides standard spatial methods such as:
    • Spatial weights (Queen, Rook, KNN, Distance-based)
    • Global & Local Moran’s I (LISA)
    • Spatial clustering & hotspot detection
    • Spatial regression models
  • Enables data-driven decision making in geography
  • Scales from local studies to large regional/global analyses
  • Helps test spatial hypotheses scientifically instead of visually

Spatial Analysis in Context

Spatial analysis answers questions like:

Question                                              PySAL Method
Do areas with high values cluster together?           Moran’s I
Where are hotspots located?                           Local Moran / LISA maps
What counts as a neighbor?                            Spatial weights matrices
Are patterns random or significant?                   Monte Carlo permutation tests
How do variables influence each other across space?   Spatial regression

PySAL makes these methods accessible in Python, allowing analysts to move from maps to statistical evidence — revealing underlying spatial patterns that are not visible from visualization alone.

Introduction


PySAL is the Python Spatial Analysis Library — a powerful, open-source toolkit for working with spatial data. It provides tools for:

  • spatial weights
  • spatial autocorrelation
  • clustering
  • spatial regression
  • neighborhood analysis

This tutorial introduces the core PySAL workflow.

We will cover:

  1. Loading polygon or point data
  2. Building spatial weights
  3. Running Global Moran’s I
  4. Visualizing results

This tutorial assumes basic familiarity with pandas, geopandas, and Python.


1. Loading Spatial Data


PySAL works seamlessly with GeoPandas.
Here’s a simple example using a polygon shapefile:

PYTHON

import geopandas as gpd

gdf = gpd.read_file("data/shapes.shp")
gdf.head()

Plot the boundaries:

PYTHON

gdf.plot(edgecolor="black", figsize=(6,6))

This ensures the geometry is valid and loads correctly.

2. Building Spatial Weights


Spatial weights define who is a neighbor of whom.

PySAL includes:

  • Rook contiguity

  • Queen contiguity

  • K-nearest neighbors

  • Distance-based neighbors

Example: Queen Contiguity

PYTHON

from libpysal.weights import Queen

w = Queen.from_dataframe(gdf)
w.transform = "R"  # row-standardization

Check neighbors of the first polygon:

PYTHON

w.neighbors[0]

3. Global Moran’s I


Moran’s I measures global spatial autocorrelation:

  • Positive values → clustering

  • Negative values → dispersion

  • Near zero → random pattern

Assume the dataset has a numeric column named value:

PYTHON

import esda
import numpy as np

y = gdf['value']
mi = esda.Moran(y, w)

View the results:

PYTHON

mi.I, mi.p_sim

Plot the Moran scatterplot:

PYTHON

import splot.esda as esdaplot

esdaplot.moran_scatterplot(mi)

4. Local Moran’s I (Outlier Analysis)


Local Moran’s I finds hotspots and coldspots.

PYTHON

lisa = esda.Moran_Local(y, w)

Add LISA quadrant labels to the GeoDataFrame:

PYTHON

gdf["lisa_cluster"] = lisa.q  # quadrants 1-4; use lisa.p_sim to filter for significance

Map the clusters:

PYTHON

gdf.plot(column="lisa_cluster", cmap="Set1", figsize=(8,6), legend=True)

This creates a basic LISA cluster map.

Challenge

Challenge 1: Create Your Own Spatial Weights

Using the GeoDataFrame loaded above:

  • Create rook contiguity weights

  • Print the neighbor list for observation 10

  • Compare how rook vs queen differ

PYTHON

from libpysal.weights import Rook
w_rook = Rook.from_dataframe(gdf)
w_rook.neighbors[10]

Queen neighbors may include diagonal touches. Rook neighbors require shared edges only. You should see fewer rook neighbors than queen neighbors.

Challenge (continued)

Challenge 2: Compute Moran’s I on a New Variable

Choose any numeric variable in your dataset:

  • Extract the variable

  • Compute Moran’s I

  • Interpret whether clustering exists

A positive Moran’s I with low p-value → strong clustering. Near zero → randomness. Negative → spatial dispersion.

Math


Global Moran’s I is defined as:

$I = \frac{N}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}$

Where:

N = number of observations

W = sum of all spatial weights

w_ij = weight between units i and j

x = variable of interest
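As a sanity check, the formula can be computed directly with NumPy. This is a synthetic four-unit example (polygons in a line with binary rook adjacency), not the lesson data:

```python
import numpy as np

# Variable of interest on four units arranged in a line
x = np.array([1.0, 2.0, 3.0, 4.0])

# Binary adjacency: each unit neighbors the next one in the line
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

n = len(x)
W = w.sum()            # sum of all spatial weights
z = x - x.mean()       # deviations from the mean
I = (n / W) * (z @ w @ z) / (z @ z)
print(round(I, 3))     # 0.333: positive, as expected for a smooth spatial trend
```

The same value would come out of `esda.Moran` with unstandardized binary weights; row-standardization (w.transform = "R") rescales W and typically changes I slightly.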

Key Points

  • PySAL provides tools for weights, autocorrelation, clustering, and modeling

  • Queen and rook weights define spatial neighbors differently

  • Moran’s I measures global autocorrelation

  • Local Moran (LISA) identifies hotspots and coldspots

  • GeoPandas and PySAL together form a powerful spatial analysis workflow

Module Overview

Lesson Overview
Beginner Introduction to Spatial Analysis using PySAL package.
Advanced (to be added)

Content from Spatial Clustering


Last updated on 2025-12-05

Overview

Questions

  • What is spatial clustering and why do we use it?
  • How can we perform basic clustering on geographic point data?
  • How do algorithms like K-Means, Hierarchical Clustering, and DBSCAN differ?

Objectives

  • Understand the concept of spatial clustering
  • Learn how to prepare point data for clustering
  • Apply K-Means, Hierarchical Clustering, and DBSCAN in Python
  • Visualize clustering results on simple scatterplots and maps

Why is Spatial Clustering Important?


Spatial clustering is a core method in geospatial analysis for identifying how points, people, places, or events are distributed across space. Instead of treating data as isolated observations, clustering helps us detect patterns, revealing where concentrations or groupings occur — and just as importantly, where they do not.

Clustering allows us to transform large sets of point data into meaningful spatial insights that can guide research, decision-making, and planning.

What Spatial Clustering Helps Us Understand

  • Where events or features form geographic hotspots
  • How points group based on proximity or similarity
  • Regions of high vs. low density
  • Patterns of distribution — clustered, dispersed, or random?
  • Spatial relationships in social, environmental, or urban data
  • Location-based trends that maps alone may not reveal

Why Analysts Use Spatial Clustering

  • Reduces complex spatial datasets into interpretable groups
  • Helps detect clusters in public health (disease outbreaks), crime, ecology, and more
  • Identifies emerging hotspots for management or intervention
  • Useful for urban planning, environmental monitoring, and archaeology
  • Works well as a first step for further spatial statistics (PySAL, regression, AI)
  • Enables classification, prediction, and pattern recognition in large datasets

Clustering at a Glance

Method         Strength                         Best For
K-Means        Simple, fast                     Well-separated, circular clusters
Hierarchical   Dendrogram visualization         Multi-scale grouping, unknown k values
DBSCAN         Finds irregular shapes + noise   Spatial hotspots and natural patterns

Spatial clustering is often the first analytical step when exploring point distribution. It moves the analysis beyond visual mapping — showing not only where points are located, but how spatial processes shape them.

Introduction


Spatial clustering is a core method used in geography, archaeology, ecology, and urban studies. It helps identify patterns in the spatial distribution of points—such as hotspots of crime, clusters of archaeological artifacts, or regions with similar environmental characteristics.

This beginner tutorial walks you through the fundamentals of spatial clustering using a simple dataset of geographic coordinates, with the workflow entirely in Python.

We will cover:

  • Loading and exploring point data
  • Preparing coordinates for clustering
  • Running three clustering algorithms
  • Visualizing the results

All examples use standard Python libraries:
pandas, geopandas, matplotlib, sklearn, and scipy.

1. Loading Spatial Point Data


Spatial clustering typically starts with a set of point locations. A minimal example:

PYTHON

import pandas as pd

df = pd.read_csv("points.csv")   # contains lon, lat
df.head()

Visualize the raw points:

PYTHON

import matplotlib.pyplot as plt

plt.scatter(df.lon, df.lat, s=10)
plt.title("Raw Spatial Points")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

This simple scatterplot helps identify whether your data already looks clustered.

2. K-Means Clustering


K-Means is the simplest clustering algorithm. It works best when:

  • You know the number of clusters you want

  • Clusters are roughly circular

  • Points are evenly distributed

PYTHON

from sklearn.cluster import KMeans

coords = df[['lon', 'lat']]
kmeans = KMeans(n_clusters=4, random_state=42)
df['kmeans_label'] = kmeans.fit_predict(coords)

Visualize results:

PYTHON

plt.scatter(df.lon, df.lat, c=df.kmeans_label, cmap='tab10')
plt.title("K-Means Clustering")
plt.show()

3. Hierarchical Clustering


Hierarchical clustering builds clusters step-by-step. It is useful when:

  • You want a dendrogram

  • You don’t know the number of clusters beforehand

  • Clusters may have irregular shapes

Example:

PYTHON

from sklearn.cluster import AgglomerativeClustering

agg = AgglomerativeClustering(n_clusters=4)
df['hier_label'] = agg.fit_predict(coords)

Plot:

PYTHON

plt.scatter(df.lon, df.lat, c=df.hier_label, cmap='viridis')
plt.title("Hierarchical Clustering")
plt.show()

4. DBSCAN: Density-Based Clustering


DBSCAN is ideal for spatial datasets because:

  • It finds clusters of any shape

  • It identifies noise points

  • It does not require the number of clusters in advance

Example:

PYTHON

from sklearn.cluster import DBSCAN
import numpy as np

epsilon = 0.01   # distance threshold in coordinate units (about 1 km in degrees of latitude)
db = DBSCAN(eps=epsilon, min_samples=5).fit(coords)

df['dbscan_label'] = db.labels_

Points labeled -1 are noise (outliers).

Plot:

PYTHON

plt.scatter(df.lon, df.lat, c=df.dbscan_label, cmap='Accent')
plt.title("DBSCAN Spatial Clusters")
plt.show()
Challenge

Challenge 1: Exploring Your Own Dataset

Using the examples above:

  1. Load your own set of spatial coordinates

  2. Apply K-Means and DBSCAN

  3. Compare the results

Which method performs better, and why?

Example Interpretation

  • K-Means finds evenly divided clusters

  • DBSCAN finds natural geographic groups and labels outliers

  • For irregular spatial patterns, DBSCAN usually performs better
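The contrast can be demonstrated on synthetic data. The sketch below uses scikit-learn's make_blobs (not the lesson's crime dataset) to create two compact groups plus one distant outlier:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Two compact synthetic "neighborhoods" plus one distant outlier point
coords, _ = make_blobs(n_samples=60, centers=[(0, 0), (5, 5)],
                       cluster_std=0.3, random_state=42)
coords = np.vstack([coords, [[20, 20]]])

km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(coords)
db = DBSCAN(eps=1.0, min_samples=5).fit(coords)

# K-Means is forced to assign the outlier to one of its two clusters,
# while DBSCAN flags it as noise (label -1)
print("K-Means label for outlier:", km.labels_[-1])
print("DBSCAN label for outlier:", db.labels_[-1])
```

Running this, DBSCAN reports the outlier as -1 while K-Means folds it into a cluster, which is why DBSCAN is usually preferred for messy point distributions.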

Challenge (continued)

Challenge 2: Adjusting DBSCAN Sensitivity

Try changing the eps parameter. Note that eps is measured in the units of your coordinates: degrees for raw lon/lat data, and meters only after projecting to a metric CRS.

PYTHON

DBSCAN(eps=500, min_samples=5)    # meters, only if coordinates are projected
DBSCAN(eps=1000, min_samples=5)

Larger eps creates larger clusters. Smaller eps creates more clusters and more noise points.
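A small sweep over eps on synthetic data illustrates this behavior. The values below are illustrative, assuming projected coordinates:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic point data: three compact groups
coords, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.5, random_state=0)

# As eps grows, noise points are absorbed and clusters merge
for eps in [0.2, 0.5, 2.0]:
    labels = DBSCAN(eps=eps, min_samples=5).fit(coords).labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int((labels == -1).sum())
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```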

Math


DBSCAN uses density to define clusters. Its key idea:

A point belongs to a cluster if it has at least min_samples neighbors within a distance ε:

$\text{density} = \frac{\text{neighbors}}{\pi \varepsilon^2}$

Higher density regions form clusters; low density points become noise.

Key Points
  • Spatial clustering groups geographic points into meaningful patterns

  • K-Means is simple but assumes circular clusters

  • Hierarchical clustering builds clusters step-wise

  • DBSCAN is best for irregular shapes and detecting noise

  • Always visualize your clusters to interpret them correctly

Module Overview

Lesson Overview
Beginner Introduction to Spatial Clustering using Crime Datasets.
Advanced (to be added)

Content from NDVI Analysis


Last updated on 2025-12-05

Overview

Questions

  • What is NDVI and why is it useful?
  • How do we calculate NDVI from Landsat imagery?
  • How do we load and visualize raster data in Python?
  • How can we classify and map greenness using NDVI?

Objectives

  • Understand NDVI and the spectral bands needed to compute it
  • Learn to read geospatial raster files using rasterio
  • Calculate NDVI using Red & NIR bands from Landsat
  • Visualize NDVI as a map with color gradients
  • Create a simple vegetation classification from NDVI values

Why is NDVI Important?


NDVI is one of the most widely used vegetation indices in remote sensing because it provides a simple yet powerful way to assess plant health and landscape greenness over large areas. Healthy vegetation strongly reflects Near-Infrared (NIR) light and absorbs Red light for photosynthesis — NDVI takes advantage of this behavior to quantify vegetation vigor.

What NDVI Helps Us Understand

  • Crop health and agricultural productivity
  • Drought severity and water stress
  • Forest cover and vegetation density
  • Urban expansion and land use change
  • Seasonal phenology (spring green-up, fall senescence)
  • Disaster monitoring (wildfire burn severity, storm damage)

Why Researchers Use NDVI

  • It is easy to compute from satellite imagery
  • Works across multiple sensors (Landsat, Sentinel-2, MODIS, etc.)
  • Allows temporal comparison (year-to-year vegetation trends)
  • Useful for ecosystem monitoring & climate change studies
  • Enables land cover classification and biomass estimation
  • Supports decision-making in agriculture and forestry

NDVI Interpretation at a Glance

NDVI Range   Interpretation                Example Areas
-1 – 0       Water, snow, clouds, barren   Lakes, rivers
0 – 0.2      Bare soil, built-up land      Urban areas, deserts
0.2 – 0.5    Moderate vegetation           Grasslands, shrubs
> 0.5        Dense, healthy vegetation     Forests, croplands

NDVI is therefore a foundation metric in environmental science — enabling researchers, planners, and ecologists to visualize vegetation patterns, track change through time, and make data-driven decisions about land and resources.

In this lesson, we will compute NDVI for Indiana using Landsat bands and generate maps with Python.

1. Installing Required Libraries


PYTHON

!pip install rasterio matplotlib numpy

2. Import Dependencies


PYTHON

import rasterio
import numpy as np
import matplotlib.pyplot as plt

3. Load Landsat RED and NIR Bands


Make sure your directory contains Landsat .TIF files (Band 4 = Red, Band 5 = NIR).

PYTHON

red = rasterio.open("LC08_L1TP_red.tif")
nir = rasterio.open("LC08_L1TP_nir.tif")

red_band = red.read(1).astype('float32')
nir_band = nir.read(1).astype('float32')

Plot a band to inspect:

PYTHON

plt.imshow(red_band, cmap='Reds')
plt.title("Red Band")
plt.colorbar()
plt.show()

4. Calculate NDVI


PYTHON

# Guard against division by zero in pixels where both bands are 0 (no data)
with np.errstate(divide='ignore', invalid='ignore'):
    ndvi = (nir_band - red_band) / (nir_band + red_band)

Visualize NDVI:

PYTHON

plt.figure(figsize=(7,6))
plt.imshow(ndvi, cmap='YlGn')
plt.colorbar(label="NDVI Value")
plt.title("NDVI Map of Indiana")
plt.show()

5. Classify NDVI into Vegetation Categories


PYTHON

from matplotlib.colors import ListedColormap

ndvi_class = np.digitize(ndvi, bins=[0, 0.2, 0.5])

# 0 = water (<0), 1 = bare/built-up, 2 = moderate vegetation, 3 = dense vegetation
colors = ['blue', 'tan', 'yellowgreen', 'green']
plt.imshow(ndvi_class, cmap=ListedColormap(colors))
plt.title("NDVI Vegetation Classification")
plt.show()
Challenge

Challenge 1 — Try It Yourself

  • Change the NDVI color map (cmap)

  • Classify NDVI into four categories instead of three

  • Add labels or legends to your final map

PYTHON

bins = [0, 0.2, 0.4, 0.6]
ndvi_class = np.digitize(ndvi, bins=bins)
Challenge

Challenge 2 — Mask Water Pixels

Use NDVI to mask water (<0):

PYTHON

water_mask = ndvi < 0
ndvi_water_removed = np.where(water_mask, np.nan, ndvi)

plt.imshow(ndvi_water_removed, cmap='YlGn')
plt.title("NDVI with Water Masked")
plt.show()

Water regions become transparent/ignored in the plot.

Callout

NDVI is affected by seasonality, cloud cover, and atmospheric effects. Always check metadata to ensure you’re comparing compatible scenes.

Math


NDVI uses reflectance difference between two bands:

NDVI = (NIR - RED)/(NIR + RED)

NIR increases with vegetation health — higher NDVI = greener land.
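A tiny worked example of the formula on synthetic 2×2 reflectance arrays, with a guard for pixels where both bands are zero:

```python
import numpy as np

# Synthetic 2x2 reflectance values for NIR and Red
nir = np.array([[0.5, 0.4], [0.0, 0.6]])
red = np.array([[0.1, 0.4], [0.0, 0.2]])

# NDVI = (NIR - RED) / (NIR + RED); suppress warnings where both bands are 0
with np.errstate(divide='ignore', invalid='ignore'):
    ndvi = (nir - red) / (nir + red)

print(ndvi)  # roughly [[0.667, 0.0], [nan, 0.5]]
```

The first pixel (NIR well above Red) scores as dense vegetation, the equal-band pixel scores 0 (bare), and the all-zero pixel is undefined (nan), mirroring the interpretation table earlier in the lesson.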

Key Points
  • NDVI uses Red & NIR reflectance from satellite imagery

  • Landsat 8/9 Band 4 = Red, Band 5 = NIR for NDVI

  • NDVI ranges from -1 (water) to +1 (healthy vegetation)

  • Python tools: rasterio, numpy, matplotlib

  • NDVI maps reveal vegetation patterns visually and quantitatively

Module Overview

Lesson Overview
Beginner Introduction to NDVI using LANDSAT Dataset.
Advanced (to be added)