Spatial Clustering

Last updated on 2025-12-05

Estimated time: 101 minutes

Overview

Questions

What is spatial clustering and why do we use it?
How can we perform basic clustering on geographic point data?
How do algorithms like K-Means, Hierarchical Clustering, and DBSCAN differ?

Objectives

Understand the concept of spatial clustering
Learn how to prepare point data for clustering
Apply K-Means, Hierarchical Clustering, and DBSCAN in Python
Visualize clustering results on simple scatterplots and maps

Why is Spatial Clustering Important?

Spatial clustering is a core method in geospatial analysis for identifying how points, people, places, or events are distributed across space. Instead of treating data as isolated observations, clustering helps us detect patterns, revealing where concentrations or groupings occur — and just as importantly, where they do not.

Clustering allows us to transform large sets of point data into meaningful spatial insights that can guide research, decision-making, and planning.

What Spatial Clustering Helps Us Understand

Where events or features form geographic hotspots
How points group based on proximity or similarity
Regions of high vs. low density
Patterns of distribution — clustered, dispersed, or random?
Spatial relationships in social, environmental, or urban data
Location-based trends that maps alone may not reveal

Why Analysts Use Spatial Clustering

Reduces complex spatial datasets into interpretable groups
Helps detect clusters in public health (disease outbreaks), crime, ecology, and more
Identifies emerging hotspots for management or intervention
Useful for urban planning, environmental monitoring, and archaeology
Works well as a first step before further spatial analysis (for example with PySAL, regression, or machine learning)
Enables classification, prediction, and pattern recognition in large datasets

Clustering at a Glance

Method	Strength	Best For
K-Means	Simple, fast	Well-separated, circular clusters
Hierarchical	Dendrogram visualization	Multi-scale grouping, unknown k values
DBSCAN	Finds irregular shapes + noise	Spatial hotspots and natural patterns

Spatial clustering is often the first analytical step when exploring point distribution. It moves the analysis beyond visual mapping — showing not only where points are located, but how spatial processes shape them.

Introduction

Spatial clustering is a core method used in geography, archaeology, ecology, and urban studies. It helps identify patterns in the spatial distribution of points—such as hotspots of crime, clusters of archaeological artifacts, or regions with similar environmental characteristics.

This beginner tutorial walks you through the fundamentals of spatial clustering using a simple dataset of geographic coordinates. The workflow is entirely in Python.

We will cover:

Loading and exploring point data
Preparing coordinates for clustering
Running three clustering algorithms
Visualizing the results

All examples use widely used Python libraries:
pandas, geopandas, matplotlib, scikit-learn (imported as sklearn), and scipy.

Instructor Note

Learners may need extra time understanding the differences between clustering algorithms. Consider pausing after each method and showing multiple visualizations.

1. Loading Spatial Point Data

Spatial clustering typically starts with a set of point locations. A minimal example:

PYTHON

import pandas as pd

df = pd.read_csv("points.csv")   # contains lon, lat
df.head()

Visualize the raw points:

PYTHON

import matplotlib.pyplot as plt

plt.scatter(df.lon, df.lat, s=10)
plt.title("Raw Spatial Points")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

This simple scatterplot helps identify whether your data already looks clustered.
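
The coordinates in this lesson are longitude and latitude in degrees, so distances computed directly on them are in degrees, and one degree of longitude covers less ground toward the poles. One common preparation step is to convert the points to approximate meters before clustering. The sketch below is a minimal illustration, assuming a small study area where a local equirectangular approximation is acceptable. The function name `to_local_meters`, the constant `EARTH_RADIUS_M`, and the sample coordinates are hypothetical and not part of the lesson's dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def to_local_meters(lons, lats):
    """Convert lon/lat in degrees to approximate x/y meters.

    Uses an equirectangular approximation centred on the mean
    coordinate, which is only reasonable for small study areas.
    """
    lon0 = sum(lons) / len(lons)
    lat0 = sum(lats) / len(lats)
    cos_lat0 = math.cos(math.radians(lat0))
    xs = [EARTH_RADIUS_M * math.radians(lon - lon0) * cos_lat0 for lon in lons]
    ys = [EARTH_RADIUS_M * math.radians(lat - lat0) for lat in lats]
    return xs, ys


# Two hypothetical points one degree of latitude apart
xs, ys = to_local_meters([10.0, 10.0], [50.0, 51.0])
north_south_m = ys[1] - ys[0]
print(round(north_south_m))  # roughly 111 km
```

With a DataFrame like the one above, the same function could be called as `to_local_meters(df.lon.tolist(), df.lat.tolist())`. For larger areas, a proper map projection (for example geopandas `to_crs`) is usually the better choice.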

2. K-Means Clustering

K-Means is one of the simplest clustering algorithms. It works best when:

You know the number of clusters you want
Clusters are roughly circular
Clusters are of similar size and density

PYTHON

from sklearn.cluster import KMeans

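# Euclidean distance on raw lon/lat is only a rough approximation of ground distance
coords = df[['lon', 'lat']]
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df['kmeans_label'] = kmeans.fit_predict(coords)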

Visualize results:

PYTHON

plt.scatter(df.lon, df.lat, c=df.kmeans_label, cmap='tab10')
plt.title("K-Means Clustering")
plt.show()
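
K-Means needs the number of clusters up front. When that number is not known, one common heuristic is the "elbow" approach: fit the model for several values of k and look for the point where the within-cluster sum of squares (`inertia_` in scikit-learn) stops dropping sharply. The sketch below is only an illustration under assumptions: it uses synthetic points with three obvious groups as a stand-in for `df[['lon', 'lat']]`, and the range of k values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for df[['lon', 'lat']]: three tight groups of points
rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
coords = np.vstack([c + rng.normal(scale=0.3, size=(50, 2)) for c in centers])

# Fit K-Means for several k and record the within-cluster sum of squares
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(coords)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

On data like this, the inertia falls steeply up to k=3 and only slightly afterwards, which suggests three clusters. Real spatial data rarely shows such a clean elbow, so the result is a hint rather than a rule.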

3. Hierarchical Clustering

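Hierarchical (agglomerative) clustering builds clusters step by step: each point starts as its own cluster, and the two closest clusters are merged repeatedly. It is useful when:

You want a dendrogram
You don’t know the number of clusters beforehand and want to choose it after inspecting the merge tree
Clusters may have irregular shapes (this depends on the linkage; scikit-learn's default Ward linkage favors compact clusters)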

Example:

PYTHON

from sklearn.cluster import AgglomerativeClustering

agg = AgglomerativeClustering(n_clusters=4)
df['hier_label'] = agg.fit_predict(coords)

Plot:

PYTHON

plt.scatter(df.lon, df.lat, c=df.hier_label, cmap='viridis')
plt.title("Hierarchical Clustering")
plt.show()
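
The scikit-learn example above fixes the number of clusters and does not produce a dendrogram. A dendrogram can be built with scipy instead, as in the minimal sketch below. It assumes synthetic points with three obvious groups in place of the lesson's `coords`, and it uses Ward linkage to mirror the scikit-learn default; both are illustrative choices.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for the lesson's coords: three tight groups
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
coords = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

# Build the full merge tree with Ward linkage
Z = linkage(coords, method="ward")

# Cut the tree into at most three flat clusters
labels = fcluster(Z, t=3, criterion="maxclust")

# no_plot=True returns the tree layout without drawing it;
# drop it (with matplotlib available) to draw the dendrogram
tree = dendrogram(Z, no_plot=True)
print(len(set(labels)), len(tree["leaves"]))
```

Because the whole merge tree is stored in `Z`, the number of clusters can be changed afterwards by calling `fcluster` again with a different `t`, without refitting.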

4. DBSCAN: Density-Based Clustering

DBSCAN is ideal for spatial datasets because:

It finds clusters of any shape
It identifies noise points
It does not require the number of clusters in advance

Example:

PYTHON

from sklearn.cluster import DBSCAN

# eps is measured in the units of coords; with lon/lat that is degrees, not meters
epsilon = 0.01
db = DBSCAN(eps=epsilon, min_samples=5).fit(coords)

# labels_ holds one cluster id per point
df['dbscan_label'] = db.labels_

Points labeled -1 are noise (outliers).
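
Because noise shares the label column with the real clusters, it helps to count the two separately before plotting. The helper below is a small sketch; the name `summarize_labels` and the example label list are hypothetical, standing in for something like `db.labels_`.

```python
from collections import Counter


def summarize_labels(labels):
    """Return (number of clusters, number of noise points) for DBSCAN-style labels."""
    counts = Counter(labels)
    n_noise = counts.get(-1, 0)
    # Every distinct label except -1 is a cluster
    n_clusters = len(counts) - (1 if -1 in counts else 0)
    return n_clusters, n_noise


# Hypothetical label list such as db.labels_ might contain
example_labels = [0, 0, 1, -1, 1, 0, -1, 2]
n_clusters, n_noise = summarize_labels(example_labels)
print(n_clusters, n_noise)  # 3 clusters, 2 noise points
```

With the DataFrame from this lesson, the call would look like `summarize_labels(df['dbscan_label'].tolist())`.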

Plot:

PYTHON

plt.scatter(df.lon, df.lat, c=df.dbscan_label, cmap='Accent')
plt.title("DBSCAN Spatial Clusters")
plt.show()
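
When the input is longitude and latitude, a distance threshold in degrees is hard to interpret. One option scikit-learn offers is the haversine metric, which measures great-circle distance on coordinates given as [lat, lon] in radians. The sketch below is a minimal illustration, assuming a spherical Earth with a mean radius of 6,371 km; the points, the 500 m threshold, and the constant name `EARTH_RADIUS_M` are hypothetical.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation

# Hypothetical points as (lat, lon) in degrees: two tight groups and one stray
offsets = [(0, 0), (0.001, 0), (0, 0.001), (-0.001, 0), (0, -0.001), (0.001, 0.001)]
group_a = [(40.0 + dlat, -75.0 + dlon) for dlat, dlon in offsets]
group_b = [(41.0 + dlat, -75.0 + dlon) for dlat, dlon in offsets]
stray = [(40.5, -74.0)]
latlon_deg = np.array(group_a + group_b + stray)

# The haversine metric expects [lat, lon] in radians and returns
# distances in radians, so eps in meters is divided by the Earth radius
eps_m = 500
db = DBSCAN(
    eps=eps_m / EARTH_RADIUS_M,
    min_samples=5,
    metric="haversine",
    algorithm="ball_tree",
).fit(np.radians(latlon_deg))

labels = db.labels_
print(labels)
```

Note the column order: the haversine metric takes latitude first, whereas the lesson's `coords` holds longitude first, so the columns would need to be swapped before using this approach.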

Challenge

Challenge 1: Exploring Your Own Dataset

Using the examples above:

Load your own set of spatial coordinates
Apply K-Means and DBSCAN
Compare the results

Which method performs better, and why?

Solution

Example Interpretation

K-Means assigns every point, including outliers, to one of k compact clusters
DBSCAN groups points by density and labels isolated points as noise
For irregular spatial patterns, DBSCAN usually gives more natural groupings
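
"Performs better" can be made concrete when the true grouping is known. The sketch below is a toy comparison under assumptions: it uses scikit-learn's synthetic two-moons dataset (not geographic data) as a stand-in for an irregular spatial pattern, and scores each method with the adjusted Rand index against the known groups. The `eps` and `min_samples` values are illustrative. Real point datasets usually have no ground truth, so this kind of score is often unavailable in practice.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Synthetic, non-geographic toy data with known group membership
X, true_labels = make_moons(n_samples=300, noise=0.05, random_state=0)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the known groups
ari_kmeans = adjusted_rand_score(true_labels, kmeans_labels)
ari_dbscan = adjusted_rand_score(true_labels, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.2f}, DBSCAN ARI: {ari_dbscan:.2f}")
```

K-Means splits the two interleaved crescents with a straight boundary, while DBSCAN can follow each crescent along its dense band, which is why the density-based result tends to agree more closely with the known groups here.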

Challenge

Challenge 2: Adjusting DBSCAN Sensitivity

Try changing the eps parameter:

PYTHON

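# eps is in the units of the input coordinates; these values are meters
# only if the points have been projected to a metric coordinate system
DBSCAN(eps=500, min_samples=5)
DBSCAN(eps=1000, min_samples=5)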

Solution

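A larger eps merges nearby points into fewer, larger clusters and leaves fewer noise points. A smaller eps splits the data into smaller clusters and more noise points; if eps is too small, most points become noise.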
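
The effect of eps can be checked by running DBSCAN several times and counting clusters and noise points for each value. The sketch below assumes hypothetical points in projected units (for example meters) laid out so the outcome is easy to verify by hand: two small grids, a gap between them, and one distant stray point. The eps values and `min_samples=3` are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical points in projected units (e.g. meters): two 3x3 grids
# with 1-unit spacing, separated by a 5-unit gap, plus one distant stray
grid_a = [(x, y) for x in range(0, 3) for y in range(0, 3)]
grid_b = [(x, y) for x in range(7, 10) for y in range(0, 3)]
stray = [(4.5, 20.0)]
points = np.array(grid_a + grid_b + stray, dtype=float)

results = {}
for eps in (0.5, 1.5, 5.5):
    labels = DBSCAN(eps=eps, min_samples=3).fit(points).labels_
    n_clusters = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    results[eps] = (n_clusters, n_noise)
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

With the smallest value every point is noise, the middle value recovers the two grids, and the largest value bridges the gap and merges them into one cluster.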

Math

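DBSCAN uses density to define clusters. Its key idea:

A point is a core point if at least min_samples points (in scikit-learn, counting the point itself) lie within a distance ε of it. Core points within ε of each other belong to the same cluster, and a non-core point within ε of a core point is attached to that cluster as a border point. The local density around a point can be written as:

$\text{density} = \frac{\text{neighbors}}{\pi \varepsilon^2}$

Higher-density regions form clusters; points that are neither core points nor border points become noise.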
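
The neighbor count and the density formula above can be sketched in plain Python. This is an illustration of the idea only, not scikit-learn's implementation; the function names and the sample coordinates are hypothetical, and the point itself is counted as its own neighbor, following scikit-learn's documented convention for min_samples.

```python
import math


def neighbors_within(points, index, eps):
    """Count points within eps of points[index], including the point itself."""
    px, py = points[index]
    return sum(1 for qx, qy in points if math.hypot(qx - px, qy - py) <= eps)


def local_density(points, index, eps):
    """Neighbors per unit area of the eps-disc, as in the formula above."""
    return neighbors_within(points, index, eps) / (math.pi * eps ** 2)


def is_core_point(points, index, eps, min_samples):
    """True if the eps-neighborhood holds at least min_samples points."""
    return neighbors_within(points, index, eps) >= min_samples


# Hypothetical coordinates: four points close together and one far away
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(is_core_point(pts, 0, eps=0.2, min_samples=4))  # True
print(is_core_point(pts, 4, eps=0.2, min_samples=4))  # False
```

This brute-force count compares every pair of points, which is fine for a handful of points but slow for large datasets; library implementations use spatial indexes to find neighbors more efficiently.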

Key Points

Spatial clustering groups geographic points into meaningful patterns
K-Means is simple but assumes circular clusters
Hierarchical clustering builds clusters step-wise
DBSCAN is best for irregular shapes and detecting noise
Always visualize your clusters to interpret them correctly

Module Overview

Lesson	Overview
Beginner	Introduction to Spatial Clustering using Crime Datasets.
Advanced	(to be added)