DeepAndes: A Self-Supervised Vision Foundation Model for Multispectral Remote Sensing Imagery of the Andes

Nov 23, 2025, 4:53 PM

Guo, Junlin., Zimmer-Dauphinee, James R., Nieusma, Jordan M., Lu, Siqi., Liu, Quan., Deng, Ruining., Cui, Can., Yue, Jialin., Lin, Yizhe., Yao, Tianyuan., Xiong, Juming., Zhu, Junchao., Qu, Chongyu., Yang, Yuechen., Wilkes, Mitchell., Wang, Xiao., VanValkenburgh, Parker., Wernke, Steven A., & Huo, Yuankai. (2025).听.听IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing,听18, 26983-26999.听

Archaeologists often use听remote sensing, which involves studying landscapes through satellite imagery, to understand how past societies grew, interacted, and adapted over long periods of time. These large-scale surveys can reveal patterns that ground-based fieldwork alone cannot. Their power increases even more when combined with听deep learning听补苍诲听computer vision, which help detect archaeological features automatically. However, traditional supervised deep learning methods struggle because they require huge amounts of detailed annotations, which are difficult and time-consuming to create for subtle archaeological features.

At the same time, new听vision foundation models鈥攍arge, general-purpose computer vision systems鈥攈ave shown impressive performance using minimal annotations. But most of these models are designed for standard听RGB images, not the听multispectral satellite imagery听(including eight different spectral bands) that archaeologists rely on for detecting subtle, buried, or eroded features.

To address this gap, the researchers created听DeepAndes, a transformer-based vision foundation model specifically built for听Andean archaeology. It was trained on听three million multispectral satellite images听and uses a customized version of the听DINOv2 self-supervised learning algorithm, adapted to handle eight-band data. This makes DeepAndes the first foundation model tailored to the Andean region and its archaeological detection challenges.

The team tested DeepAndes on tasks such as classifying difficult, imbalanced image datasets, retrieving specific types of images, and performing pixel-level听semantic segmentation. Across all areas, the model outperformed systems trained from scratch or on smaller datasets, achieving higher听F1 scores,听mean average precision, and听Dice scores, especially in听few-shot learning听situations where only a small number of labeled examples are available.

Overall, these results show that large-scale听self-supervised pretraining听can greatly improve archaeological remote sensing, helping researchers identify ancient sites and landscapes more accurately and efficiently.

Fig. 1.听

Overview of DeepAndes. This figure shows the training dataset (a)鈥�(d) and three domain-specific downstream tasks (e) using DeepAndes鈥攁 vision foundation model designed for multispectral satellite imagery in the Andes region. Particularly, (a) shows a large-scale map of the imagery used to train DeepAndes, highlighting various land cover types, with their area distribution shown in (c). (b) presents the unit sample patch [red box in (a), (b), (d)] with eight spectral bands. (d) illustrates image patching for DINOv2 training, with geospatial sampling densely covering different archaeological sites.

天美传媒官网