Google's AlphaEarth Foundation embeddings turned land use classification into a simple clustering problem. A non-expert produced granular land use maps in 2 weeks, including building all necessary tools from scratch. Traditional approaches require months of expert analysis, manual labeling, and ground truth validation.

From Complex Remote Sensing to Simple Clustering

Traditional satellite analysis requires spectral expertise, atmospheric correction, and temporal compositing. AlphaEarth Foundation eliminates this complexity by providing pre-computed 64-dimensional embeddings that encode a full year of satellite observations, spatial context, and multi-scale features.

Our study area covers 49 square kilometers of central Auroville and surrounding Tamil villages. The 7km × 7km region contains urban cores, agricultural experiments, forest restoration, and traditional settlements. At 10-meter resolution, this creates 490,000 pixels for analysis. The AlphaEarth Foundation embeddings represent the complete 2024 calendar year, providing current land use patterns.
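The pixel count follows directly from the grid geometry. A quick check, assuming square 10-meter pixels aligned with the 7 km × 7 km extent; the variable names and the float32 storage assumption are illustrative, not taken from the project code:

```python
# Grid arithmetic for the study area (illustrative variable names).
side_m = 7_000        # 7 km edge of the square study area
pixel_m = 10          # embedding resolution in meters
embedding_dims = 64   # values stored per pixel

pixels_per_side = side_m // pixel_m      # 700
total_pixels = pixels_per_side ** 2      # 490,000
area_km2 = (side_m / 1_000) ** 2         # 49.0

# Assumption: the embedding matrix is held in memory as float32 (4 bytes per value).
matrix_mib = total_pixels * embedding_dims * 4 / 2**20

print(pixels_per_side, total_pixels, area_km2, round(matrix_mib, 1))
```

Under that float32 assumption the full matrix is roughly 120 MiB, small enough to hold in memory on an ordinary laptop.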

Development Velocity: 2 Weeks from Zero to Maps

This project demonstrates AI-enhanced rapid prototyping in action. We built alpha-bhu for ML processing: a K-means clustering implementation, a multi-scale analysis workflow, and geospatial data processing. We built geo-darshan for interactive web visualization, with real-time cluster display, color mapping, and a responsive interface.

The total effort was one non-expert working for 2 weeks, a figure that includes the failed approaches as well as development of the complete end-to-end solution. Roughly half of that time went to dead ends: alternative clustering algorithms, validation approaches, and visualization strategies that didn't work.

Traditional ML approaches would have required months of expert remote sensing analysis, extensive ground truth collection, supervised classification training, and validation workflows.

Both projects are in early alpha. We don't normally open-source work at this maturity level, but we wanted it available to anyone interested, and we wanted to build in public.

Three-Step Workflow

The workflow starts by downloading embeddings through Google Earth Engine, which exports AlphaEarth data to Google Drive in minutes. Next, FAISS K-means clusters the 490,000 pixels across 64 dimensions efficiently on standard hardware. No preprocessing, feature engineering, or hyperparameter tuning is required beyond choosing k. Finally, visual evaluation and manual assignment of labels to clusters complete the process.
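The clustering step can be sketched as follows. This is a minimal illustration, not the alpha-bhu code: the function name `kmeans_labels`, the `niter` and `seed` values, and the synthetic input are assumptions. It uses `faiss.Kmeans` when FAISS is installed and otherwise falls back to a plain NumPy Lloyd's iteration, which is only practical for small arrays.

```python
import numpy as np

def kmeans_labels(embeddings, k, niter=20, seed=0):
    """Cluster an (n_pixels, n_dims) array and return one cluster id per pixel."""
    x = np.ascontiguousarray(embeddings, dtype=np.float32)
    try:
        import faiss  # optional dependency
    except ImportError:
        faiss = None

    if faiss is not None:
        km = faiss.Kmeans(x.shape[1], k, niter=niter, seed=seed)
        km.train(x)
        # Nearest-centroid lookup gives the cluster id of every pixel.
        _, idx = km.index.search(x, 1)
        return idx.ravel()

    # Fallback (assumption: small inputs only). The distance array below is
    # n_pixels x k x n_dims, far too large for the full 490,000-pixel grid.
    def assign(centroids):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(niter):
        labels = assign(centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    return assign(centroids)

# Synthetic stand-in for real embeddings: 2,000 "pixels" with 64 dimensions.
demo = np.random.default_rng(0).normal(size=(2000, 64)).astype(np.float32)
labels = kmeans_labels(demo, k=4)
print(labels.shape, np.bincount(labels, minlength=4))
```

With real data, the input would be the exported embedding raster flattened to one row per pixel, and the only parameter to vary would be `k`.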

Mathematical cluster metrics like silhouette score, Calinski-Harabasz index, or inertia correlated poorly with meaningful land use patterns, making visual inspection and human interpretation essential for creating useful land use categories.

Multi-Scale Clustering Strategy

Rather than searching for optimal k, we use three scales. Broad categories like agriculture versus forest versus built areas emerge at k=22. Intermediate distinctions including crop types, building density, and vegetation structure appear at k=44. Fine-grained patterns showing specific growth stages and infrastructure details become visible at k=88. This approach provides analysis flexibility without separate classification workflows.
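One way to organize the three scales is to cluster the same flattened pixel array once per k and reshape each result back into a label raster. The sketch below illustrates that plumbing only. The names `simple_kmeans` and `multi_scale_label_maps` are hypothetical, the small NumPy K-means is a stand-in for FAISS, and the 40 × 40 random cube replaces the real 700 × 700 embedding grid.

```python
import numpy as np

def simple_kmeans(flat, k, niter=5, seed=0):
    """Stand-in clusterer (a few Lloyd's iterations); swap in FAISS for real use."""
    rng = np.random.default_rng(seed)
    centroids = flat[rng.choice(len(flat), size=k, replace=False)].astype(np.float64)

    def assign(c):
        # The ||x||^2 term is omitted: it is constant per pixel, so it does
        # not change which centroid is nearest.
        d2 = (c ** 2).sum(axis=1)[None, :] - 2.0 * (flat @ c.T)
        return d2.argmin(axis=1)

    for _ in range(niter):
        labels = assign(centroids)
        for j in np.unique(labels):
            centroids[j] = flat[labels == j].mean(axis=0)
    return assign(centroids)

def multi_scale_label_maps(cube, ks=(22, 44, 88), cluster_fn=simple_kmeans):
    """Cluster a (rows, cols, dims) embedding cube at each k; return {k: label raster}."""
    rows, cols, dims = cube.shape
    flat = cube.reshape(rows * cols, dims)  # one row per pixel
    return {k: cluster_fn(flat, k).reshape(rows, cols) for k in ks}

# Synthetic 40 x 40 cube with 64 dimensions, for illustration only.
cube = np.random.default_rng(1).normal(size=(40, 40, 64)).astype(np.float32)
maps = multi_scale_label_maps(cube)
for k, label_map in maps.items():
    print(k, label_map.shape, len(np.unique(label_map)))
```

Each scale is clustered independently here, so the k=44 clusters are not guaranteed to nest inside the k=22 clusters.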

Practical Results

The clustering produced an up-to-date land use map identifying distinct landscape features. Fallow fields, established orchards containing cashew, casuarina, and coconut, and dense built environments transitioning to sparse settlements all emerged clearly. The analysis revealed planted forests from restoration efforts, barren areas, major roads, and water bodies. The clustering successfully distinguished between different orchard types, revealing agricultural diversity that would have required extensive field surveys and months of traditional remote sensing analysis to achieve.
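Once a reviewer has named the clusters, turning the result into per-class areas is a lookup and a count. The sketch below is illustrative: the cluster ids, the class assignments, and the sample pixel list are all hypothetical, and only the 10-meter pixel size comes from the text above.

```python
from collections import Counter

PIXEL_AREA_HA = 10 * 10 / 10_000  # one 10 m x 10 m pixel = 0.01 hectare

# Hypothetical cluster-to-class assignments made by a human reviewer.
cluster_names = {
    0: "fallow field",
    1: "cashew orchard",
    2: "cashew orchard",    # two clusters merged into one class
    3: "dense built area",
    4: "water body",
}

def area_by_class(cluster_ids, names, unlabeled="unlabeled"):
    """Sum pixel areas per named class; ids without a name go to `unlabeled`."""
    counts = Counter(names.get(c, unlabeled) for c in cluster_ids)
    return {name: n * PIXEL_AREA_HA for name, n in counts.items()}

# Ten sample pixels; id 7 has no name yet.
sample_ids = [0, 0, 1, 2, 2, 2, 3, 4, 4, 7]
areas = area_by_class(sample_ids, cluster_names)
print(areas)
```

Keeping the mapping as plain data makes it easy to revise class names as local knowledge improves, without re-running the clustering.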

Why This Approach Works

Computational efficiency shifts the bottleneck from image processing to downloading embeddings. Clustering 490,000 pixels runs on standard hardware, enabling rapid iteration across different k values. The expertise barrier disappears because the workflow requires no knowledge of spectral bands, atmospheric correction, or classification training. Entry requirements drop from graduate-level remote sensing to basic Python, enabled by AlphaEarth's pre-computed embeddings and AI-assisted development.

Rapid prototyping becomes possible through AI collaboration: the ML pipeline and the web interface were developed in parallel, with continuous iteration and immediate visual feedback. This fail-fast approach meant that alternative clustering methods and validation techniques that didn't work were abandoned quickly, which kept the focus on what actually delivered results. AlphaEarth embeddings also provide up-to-date land use information without the lag typical of traditional mapping approaches.

Community Applications

This workflow enables communities to map their areas for environmental monitoring, planning support, change detection, and participatory mapping. The approach combines automated clustering with local knowledge for comprehensive area understanding.

Current Limitations

The approach currently relies on visual evaluation only with no quantitative validation. Analysis covers a single time period, and cluster interpretation remains manual.

Credits

Vision and planning: @restlessronin with @grok-4 and @claude-4-sonnet.

Technical implementation: @claude-4-sonnet handled all ML code for alpha_bhu and web interface development for geo_darshan.

Auroville GIS expertise: The Pitchandikulam Forest team, especially Azhagappan Mani.

Copy: @claude-4-sonnet

Showrunner: @restlessronin