ÌìÃÀ´«Ã½¹ÙÍø

>

DeepAndes: A Self-Supervised Vision Foundation Model for Multispectral Remote Sensing Imagery of the Andes

Guo, Junlin., Zimmer-Dauphinee, James R., Nieusma, Jordan M., Lu, Siqi., Liu, Quan., Deng, Ruining., Cui, Can., Yue, Jialin., Lin, Yizhe., Yao, Tianyuan., Xiong, Juming., Zhu, Junchao., Qu, Chongyu., Yang, Yuechen., Wilkes, Mitchell., Wang, Xiao., VanValkenburgh, Parker., Wernke, Steven A., & Huo, Yuankai. (2025).Ìý.ÌýIEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,Ìý18, 26983-26999.Ìý

Archaeologists often useÌýremote sensing, which involves studying landscapes through satellite imagery, to understand how past societies grew, interacted, and adapted over long periods of time. These large-scale surveys can reveal patterns that ground-based fieldwork alone cannot. Their power increases even more when combined withÌýdeep learningÌý²¹²Ô»åÌýcomputer vision, which help detect archaeological features automatically. However, traditional supervised deep learning methods struggle because they require huge amounts of detailed annotations, which are difficult and time-consuming to create for subtle archaeological features.

At the same time, newÌývision foundation models—large, general-purpose computer vision systems—have shown impressive performance using minimal annotations. But most of these models are designed for standardÌýRGB images, not theÌýmultispectral satellite imageryÌý(including eight different spectral bands) that archaeologists rely on for detecting subtle, buried, or eroded features.

To address this gap, the researchers createdÌýDeepAndes, a transformer-based vision foundation model specifically built forÌýAndean archaeology. It was trained onÌýthree million multispectral satellite imagesÌýand uses a customized version of theÌýDINOv2 self-supervised learning algorithm, adapted to handle eight-band data. This makes DeepAndes the first foundation model tailored to the Andean region and its archaeological detection challenges.

The team tested DeepAndes on tasks such as classifying difficult, imbalanced image datasets, retrieving specific types of images, and performing pixel-levelÌýsemantic segmentation. Across all areas, the model outperformed systems trained from scratch or on smaller datasets, achieving higherÌýF1 scores,Ìýmean average precision, andÌýDice scores, especially inÌýfew-shot learningÌýsituations where only a small number of labeled examples are available.

Overall, these results show that large-scaleÌýself-supervised pretrainingÌýcan greatly improve archaeological remote sensing, helping researchers identify ancient sites and landscapes more accurately and efficiently.

Fig. 1.Ìý

Overview of DeepAndes. This figure shows the training dataset (a)–(d) and three domain-specific downstream tasks (e) using DeepAndes—a vision foundation model designed for multispectral satellite imagery in the Andes region. Particularly, (a) shows a large-scale map of the imagery used to train DeepAndes, highlighting various land cover types, with their area distribution shown in (c). (b) presents the unit sample patch [red box in (a), (b), (d)] with eight spectral bands. (d) illustrates image patching for DINOv2 training, with geospatial sampling densely covering different archaeological sites.