Spatial Clustering

Last updated on 2025-12-05

Estimated time: 101 minutes

Overview

Questions

  • What is spatial clustering and why do we use it?
  • How can we perform basic clustering on geographic point data?
  • How do algorithms like K-Means, Hierarchical Clustering, and DBSCAN differ?

Objectives

  • Understand the concept of spatial clustering
  • Learn how to prepare point data for clustering
  • Apply K-Means, Hierarchical Clustering, and DBSCAN in Python
  • Visualize clustering results on simple scatterplots and maps

Why is Spatial Clustering Important?


Spatial clustering is a core method in geospatial analysis for identifying how points, people, places, or events are distributed across space. Instead of treating data as isolated observations, clustering helps us detect patterns, revealing where concentrations or groupings occur — and just as importantly, where they do not.

Clustering allows us to transform large sets of point data into meaningful spatial insights that can guide research, decision-making, and planning.

What Spatial Clustering Helps Us Understand

  • Where events or features form geographic hotspots
  • How points group based on proximity or similarity
  • Regions of high vs. low density
  • Patterns of distribution — clustered, dispersed, or random?
  • Spatial relationships in social, environmental, or urban data
  • Location-based trends that maps alone may not reveal

Why Analysts Use Spatial Clustering

  • Reduces complex spatial datasets into interpretable groups
  • Helps detect clusters in public health (disease outbreaks), crime, ecology, and more
  • Identifies emerging hotspots for management or intervention
  • Useful for urban planning, environmental monitoring, and archaeology
  • Works well as a first step for further spatial statistics (PySAL, regression, AI)
  • Enables classification, prediction, and pattern recognition in large datasets

Clustering at a Glance

Method       | Strength                       | Best For
K-Means      | Simple, fast                   | Well-separated, circular clusters
Hierarchical | Dendrogram visualization       | Multi-scale grouping, unknown k values
DBSCAN       | Finds irregular shapes + noise | Spatial hotspots and natural patterns

Spatial clustering is often the first analytical step when exploring point distribution. It moves the analysis beyond visual mapping — showing not only where points are located, but how spatial processes shape them.

Introduction


Spatial clustering is a core method used in geography, archaeology, ecology, and urban studies. It helps identify patterns in the spatial distribution of points—such as hotspots of crime, clusters of archaeological artifacts, or regions with similar environmental characteristics.

This beginner tutorial walks you through the fundamentals of spatial clustering using a simple dataset of geographic coordinates. The workflow is entirely in Python.

We will cover:

  • Loading and exploring point data
  • Preparing coordinates for clustering
  • Running three clustering algorithms
  • Visualizing the results

All examples use standard Python libraries:
pandas, geopandas, matplotlib, sklearn, and scipy.

Learners may need extra time understanding the differences between clustering algorithms. Consider pausing after each method and showing multiple visualizations.

1. Loading Spatial Point Data


Spatial clustering typically starts with a set of point locations. A minimal example:

PYTHON

import pandas as pd

df = pd.read_csv("points.csv")   # contains lon, lat
df.head()

Visualize the raw points:

PYTHON

import matplotlib.pyplot as plt

plt.scatter(df.lon, df.lat, s=10)
plt.title("Raw Spatial Points")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.show()

This simple scatterplot helps identify whether your data already looks clustered.
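
Before clustering, it usually helps to drop unusable rows and, if distances should be read in meters, to convert degrees into approximate planar coordinates. The sketch below shows one possible way to do that. The helper name `lonlat_to_local_meters`, the small inline table standing in for points.csv, and the equirectangular approximation are all assumptions made for illustration; for larger study areas a proper map projection is generally the safer choice.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (approximation)


def lonlat_to_local_meters(lon, lat):
    """Approximate planar x/y offsets in meters (hypothetical helper).

    Uses an equirectangular approximation around the mean location,
    which is only reasonable for small study areas.
    """
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    lon0, lat0 = lon.mean(), lat.mean()
    x = np.radians(lon - lon0) * np.cos(np.radians(lat0)) * EARTH_RADIUS_M
    y = np.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


# Made-up rows standing in for points.csv, including bad values
demo = pd.DataFrame({
    "lon": [-74.00, -74.01, None, -73.99, 200.0],
    "lat": [40.70, 40.71, 40.72, None, 40.70],
})

# Drop rows with missing or out-of-range coordinates
clean = demo.dropna(subset=["lon", "lat"])
clean = clean[clean.lon.between(-180, 180) & clean.lat.between(-90, 90)].copy()

# Add approximate metric coordinates next to the original degrees
clean["x_m"], clean["y_m"] = lonlat_to_local_meters(clean.lon, clean.lat)
print(clean)
```

With metric columns such as `x_m` and `y_m`, distance-based parameters can be expressed in meters rather than degrees.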

2. K-Means Clustering


K-Means is one of the simplest clustering algorithms. It works best when:

  • You know the number of clusters you want

  • Clusters are roughly circular

  • Clusters are of similar size and density

PYTHON

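from sklearn.cluster import KMeans

# Use the coordinate columns as features (distances are in degrees here)
coords = df[['lon', 'lat']]
# n_init is set explicitly so results do not depend on the library default
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
df['kmeans_label'] = kmeans.fit_predict(coords)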

Visualize results:

PYTHON

plt.scatter(df.lon, df.lat, c=df.kmeans_label, cmap='tab10')
plt.title("K-Means Clustering")
plt.show()
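
K-Means needs the number of clusters up front. When that number is not known, one common heuristic is to fit several values of k and compare silhouette scores. The following is a minimal sketch on synthetic points (the array `demo_coords` and its four made-up centers are assumptions, not the lesson's dataset); it suggests a candidate k rather than proving one.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for coords: four tight, well-separated groups
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
demo_coords = np.vstack(
    [c + rng.normal(scale=0.5, size=(50, 2)) for c in centers]
)

# Fit K-Means for a range of k and record the mean silhouette score
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(demo_coords)
    scores[k] = silhouette_score(demo_coords, labels)

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("Highest silhouette at k =", best_k)
```

On real point data the scores are often much closer together, so the result is best read alongside a map of the clusters.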

3. Hierarchical Clustering


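Hierarchical (agglomerative) clustering builds clusters step by step by repeatedly merging the closest groups. It is useful when:

  • You want a dendrogram

  • You don’t know the number of clusters beforehand and want to compare several cut levels

  • Clusters may have irregular shapes (with single linkage; the default Ward linkage favors compact clusters)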

Example:

PYTHON

from sklearn.cluster import AgglomerativeClustering

agg = AgglomerativeClustering(n_clusters=4)
df['hier_label'] = agg.fit_predict(coords)

Plot:

PYTHON

plt.scatter(df.lon, df.lat, c=df.hier_label, cmap='viridis')
plt.title("Hierarchical Clustering")
plt.show()
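
AgglomerativeClustering returns labels but does not draw a dendrogram. A possible way to get one is through scipy's hierarchy module, sketched below on synthetic points (`demo_coords` is an assumption standing in for `coords`). Ward linkage is used here because it is also the default in AgglomerativeClustering.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic stand-in for coords: four small, well-separated groups
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
demo_coords = np.vstack(
    [c + rng.normal(scale=0.5, size=(15, 2)) for c in centers]
)

# Build the full merge tree with Ward linkage
Z = linkage(demo_coords, method="ward")

# no_plot=True returns the tree layout without drawing anything;
# remove it (with matplotlib installed) to display the dendrogram
tree = dendrogram(Z, no_plot=True)

# Cut the tree into at most 4 flat clusters
hier_labels = fcluster(Z, t=4, criterion="maxclust")
print(np.unique(hier_labels, return_counts=True))
```

Because the whole tree is stored in `Z`, different values of `t` can be tried without refitting.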

4. DBSCAN: Density-Based Clustering


DBSCAN is ideal for spatial datasets because:

  • It finds clusters of any shape

  • It identifies noise points

  • It does not require the number of clusters in advance

Example:

PYTHON

from sklearn.cluster import DBSCAN

epsilon = 0.01   # distance threshold, in the units of coords (degrees for lon/lat)
db = DBSCAN(eps=epsilon, min_samples=5).fit(coords)

df['dbscan_label'] = db.labels_

Points labeled -1 are noise (outliers).
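
A quick summary of the labels shows how many clusters were found and how many points were treated as noise. The sketch below uses a small hand-written label array (an assumption for illustration) shaped like `db.labels_`.

```python
import numpy as np

# Hypothetical labels, shaped like db.labels_ from a fitted DBSCAN
labels = np.array([0, 0, 1, -1, 1, 0, -1, 2, 2, 2])

# Noise points carry the label -1 and are not a cluster
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels.tolist()) - {-1})
sizes = {int(k): int(np.sum(labels == k)) for k in np.unique(labels) if k != -1}

print("clusters:", n_clusters)
print("noise points:", n_noise)
print("cluster sizes:", sizes)
```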

Plot:

PYTHON

plt.scatter(df.lon, df.lat, c=df.dbscan_label, cmap='Accent')
plt.title("DBSCAN Spatial Clusters")
plt.show()
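
The eps value is often chosen by trial and error. One widely used heuristic is the k-distance curve: sort every point's distance to its k-th nearest neighbor and look for the bend where distances start rising sharply. The sketch below computes that curve on synthetic points (`demo_coords` is an assumption, not the lesson's data); picking the bend remains a judgment call.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
# Two dense groups plus scattered background points
dense = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(100, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(100, 2)),
])
background = rng.uniform(low=-3, high=8, size=(20, 2))
demo_coords = np.vstack([dense, background])

min_samples = 5
# Querying the training points returns each point as its own first
# neighbor, which matches DBSCAN counting the point itself
nn = NearestNeighbors(n_neighbors=min_samples).fit(demo_coords)
distances, _ = nn.kneighbors(demo_coords)

# Sorted distance to the last of the min_samples neighbors
k_dist = np.sort(distances[:, -1])
for q in (50, 90, 95, 99):
    print(f"{q}th percentile k-distance: {np.percentile(k_dist, q):.3f}")
```

Plotting `k_dist` with matplotlib makes the bend easier to see than the percentiles alone.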

Challenge 1: Exploring Your Own Dataset

Using the examples above:

  1. Load your own set of spatial coordinates

  2. Apply K-Means and DBSCAN

  3. Compare the results

Which method performs better, and why?

Example Interpretation

  • K-Means assigns every point to one of k compact clusters, including points that are really outliers

  • DBSCAN finds natural geographic groups and labels outliers

  • For irregular spatial patterns, DBSCAN usually performs better
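
The contrast described above can be illustrated on a standard synthetic dataset with two interleaved crescents. This is a sketch, not geographic data: `make_moons`, the eps value, and the use of the adjusted Rand index as a score are choices made here for illustration.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two crescent-shaped groups with known membership
X, true_labels = make_moons(n_samples=300, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)
db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means the grouping matches the known one
km_ari = adjusted_rand_score(true_labels, km_labels)
db_ari = adjusted_rand_score(true_labels, db_labels)
print(f"K-Means ARI: {km_ari:.2f}")
print(f"DBSCAN  ARI: {db_ari:.2f}")
```

On compact, roughly circular groups the two methods tend to agree much more closely, so "better" depends on the shape of the pattern.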


Challenge 2: Adjusting DBSCAN Sensitivity

Try changing the eps parameter:

PYTHON

# eps is in the units of the coordinates (meters only for projected data)
DBSCAN(eps=500, min_samples=5)
DBSCAN(eps=1000, min_samples=5)

Larger eps merges nearby points into fewer, larger clusters. Smaller eps produces smaller, tighter clusters and more noise points.
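
When the points are stored as longitude and latitude in degrees, an eps given in meters needs either projected coordinates or a great-circle distance. The sketch below takes the second route using scikit-learn's haversine metric, which expects latitude and longitude in radians (in that order). The coordinates are made up, and the Earth radius is an approximation.

```python
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (approximation)

rng = np.random.default_rng(3)
# Made-up points: two tight groups about 5.5 km apart plus one isolated point
group_a = np.column_stack([40.00 + rng.normal(scale=0.0003, size=20),
                           -74.00 + rng.normal(scale=0.0003, size=20)])
group_b = np.column_stack([40.05 + rng.normal(scale=0.0003, size=20),
                           -74.00 + rng.normal(scale=0.0003, size=20)])
isolated = np.array([[40.20, -74.20]])
latlon_deg = np.vstack([group_a, group_b, isolated])  # columns: lat, lon

eps_m = 500  # neighborhood radius in meters
db = DBSCAN(eps=eps_m / EARTH_RADIUS_M,  # haversine distances are in radians
            min_samples=5, metric="haversine", algorithm="ball_tree")
labels = db.fit_predict(np.radians(latlon_deg))
print(labels)
```

Changing `eps_m` to 1000 or more then has a direct interpretation as a distance on the ground.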

Math


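DBSCAN uses density to define clusters. Its key idea:

A point $p$ is a core point if its ε-neighborhood contains at least min_samples points (in scikit-learn, the count includes $p$ itself):

$|\{q : d(p, q) \le \varepsilon\}| \ge \text{min\_samples}$

The corresponding local density is the neighbor count divided by the area of that neighborhood:

$\text{density} = \frac{\text{neighbors}}{\pi \varepsilon^2}$

Core points that lie within ε of one another form a cluster, together with any non-core points within ε of a core point (border points). All remaining points become noise.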
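
The neighbor-count rule can be checked by hand. The sketch below counts neighbors within eps for each point of a small synthetic set (`demo_coords`, `eps`, and `min_samples` are illustrative assumptions), computes the density expression above, and compares the resulting core points with those reported by scikit-learn.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(4)
# One dense group plus a few scattered points
demo_coords = np.vstack([
    rng.normal(loc=(0, 0), scale=0.2, size=(40, 2)),
    rng.uniform(low=-3, high=3, size=(10, 2)),
])
eps, min_samples = 0.3, 5

# Pairwise Euclidean distances between all points
diff = demo_coords[:, None, :] - demo_coords[None, :, :]
D = np.linalg.norm(diff, axis=-1)

# Neighbors within eps, counting the point itself
neighbor_counts = (D <= eps).sum(axis=1)
is_core = neighbor_counts >= min_samples
density = neighbor_counts / (np.pi * eps**2)

# Core points according to scikit-learn
db = DBSCAN(eps=eps, min_samples=min_samples).fit(demo_coords)
core_from_sklearn = np.zeros(len(demo_coords), dtype=bool)
core_from_sklearn[db.core_sample_indices_] = True

print("core points (manual): ", int(is_core.sum()))
print("core points (sklearn):", int(core_from_sklearn.sum()))
```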

Key Points
  • Spatial clustering groups geographic points into meaningful patterns

  • K-Means is simple but assumes circular clusters

  • Hierarchical clustering builds clusters step-wise

  • DBSCAN is best for irregular shapes and detecting noise

  • Always visualize your clusters to interpret them correctly

Module Overview

Lesson   | Overview
Beginner | Introduction to Spatial Clustering using Crime Datasets.
Advanced | (to be added)