Vanderbilt Advanced Lab for Immersive AI Translation (VALIANT)

Loneliness, Anxiety Symptoms, Depressive Symptoms, and Suicidal Ideation in the All of Us Dataset (posted Thu, 26 Mar 2026)
Katherine Musacchio Schafer; Jacob Franklin; Peter J. Embí; Colin G. Walsh (2026). JAMA Network Open, 9(3), e260596.

This study examined how feelings of loneliness may help explain the link between anxiety, depression, and suicidal ideation (thinking about suicide). Using survey data from over 62,000 adults in the U.S., researchers measured anxiety symptoms, depressive symptoms, loneliness, and suicidal thoughts using standard mental health questionnaires. They found that all three—anxiety, depression, and loneliness—were positively related to suicidal ideation, meaning higher levels of each were associated with more frequent suicidal thoughts.

Importantly, the study showed that loneliness acts as a mediator, meaning it partly explains how anxiety and depression are connected to suicidal ideation. In other words, people with anxiety or depression may be more likely to feel lonely, and that loneliness, in turn, increases the likelihood of suicidal thoughts. While anxiety and depression still had direct effects on suicidal ideation, loneliness accounted for a meaningful portion of this relationship.
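A mediation decomposition of this kind can be sketched with ordinary least squares: the indirect effect is the product of the path from the predictor to the mediator and the path from the mediator to the outcome. The simulated scores and path coefficients below are purely illustrative, not the study's data or its statistical model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical simulated scores: depression (x), loneliness (m), ideation (y)
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)            # mediator partly driven by x
y = 0.3 * x + 0.4 * m + rng.normal(size=n)  # outcome driven by both

def ols(cols, target):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(target))] + list(cols))
    return np.linalg.lstsq(X, target, rcond=None)[0]

a = ols([x], m)[1]                  # path: x -> m
coefs = ols([x, m], y)
c_prime, b = coefs[1], coefs[2]     # direct effect of x; path m -> y
c_total = ols([x], y)[1]            # total effect of x on y
indirect = a * b                    # effect transmitted through the mediator

# For linear OLS, total effect = direct effect + indirect effect exactly
print(f"total={c_total:.3f} direct={c_prime:.3f} indirect={indirect:.3f}")
```

A nonzero indirect effect with a still-nonzero direct effect corresponds to the partial mediation the study reports.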

Overall, the findings suggest that addressing loneliness could be a key strategy for reducing suicide risk, alongside treating anxiety and depression. By targeting loneliness—through social support, community engagement, or other interventions—it may be possible to interrupt the pathway from mental health symptoms to suicidal thinking.

Figure 1. Participant Flowchart From the All of Us Research Program

Microenvironment-aware spatial modeling for accurate inference of cell identity (posted Wed, 28 Jan 2026)
Liu, Qi; Wang, Yu; Hsu, Chihyuan; Wanjalla, Celestine N.; Lau, Ken S.; & Shyr, Yu. (2026). Nucleic Acids Research, 54(1).

Spatial omics technologies make it possible to measure many molecular features in cells while also preserving information about where those cells are located in a tissue. This spatial context provides valuable insight into how cells are organized and how tissues are structured. New platforms that work at single-cell resolution have further improved our ability to detect cell states that depend on the surrounding microenvironment. However, most existing computational tools for analyzing spatial omics data focus on identifying broad spatial regions rather than determining the identities of individual cells.

Traditional single-cell clustering methods define cell identities using only molecular features inside the cell, such as gene expression, and do not account for how nearby cells and the local tissue environment influence cell behavior. To address this limitation, we introduce MEcell, a method that directly incorporates spatial information and automatically determines how much influence the surrounding environment should have when identifying cell types. MEcell does not require users to tune parameters, making it easier to apply across datasets.
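One simple way to picture spatially informed clustering is to augment each cell's intrinsic expression with the average profile of its spatial neighbors before clustering. This is a much-simplified stand-in for MEcell itself, which determines the environment's influence automatically; here the toy data, neighbor count `k`, and blending weight `alpha` are all hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical toy data: 200 cells, 10 genes, 2D tissue coordinates
rng = np.random.default_rng(1)
expr = rng.poisson(2.0, size=(200, 10)).astype(float)
coords = rng.uniform(0, 100, size=(200, 2))

k = 6          # number of spatial neighbors (assumed)
alpha = 0.3    # weight on the microenvironment (MEcell learns this itself)

# Mean expression of each cell's k nearest spatial neighbors
tree = cKDTree(coords)
_, idx = tree.query(coords, k=k + 1)   # first neighbor is the cell itself
micro = expr[idx[:, 1:]].mean(axis=1)  # drop self, average the rest

# Augmented feature: intrinsic expression blended with its microenvironment,
# which any standard clustering method could then consume
features = np.hstack([(1 - alpha) * expr, alpha * micro])
print(features.shape)
```

Cells with identical expression but different surroundings now receive different feature vectors, which is the intuition behind microenvironment-aware identity calls.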

We tested MEcell on 90 simulated datasets and 7 real-world datasets from multiple spatial transcriptomics platforms and tissue types, including Vizgen MERFISH, Xenium, CosMx, Visium HD, Slide-seq V2, and Open-ST. Across all tests, MEcell consistently performed better than existing methods at accurately identifying cell identities. These results show that the local microenvironment plays a crucial role in defining cell identity and demonstrate that MEcell is a powerful tool for capturing the full diversity of cells in spatial omics data.

Figure 1.

The rationale of MEcell. (A) A toy example where a cell’s transcriptionally similar neighbors are located within the same microenvironment, indicating that the microenvironment will play a minimal role in shaping the nearest-neighbor graph. (B) A toy example where a cell’s transcriptionally similar neighbors are located across distinct microenvironments, suggesting that the microenvironment will exert a significant influence in shaping the nearest-neighbor graph.

Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step (posted Fri, 19 Dec 2025)
Clayton, E. W., Rose, S., Nebecker, C., Novak, L., Bensoussan, Y. E., Chen, Y., Collins, B. X., Cordes, A., Evans, B. J., Ferryman, K. S., Hurst, S., Jiang, X., Lee, A. Y., McWeeney, S., Parker, J., Bélisle-Pipon, J.-C., Rosenthal, E. S., Yin, Z., Yracheta, J. M., & Malin, B. A. (2025). JAMIA Open, 8(6), ooaf134.

This article examines the experience of the NIH’s Bridge2AI Program, which funded four large biomedical and behavioral datasets designed to be well documented and ready for use with artificial intelligence (AI) and machine learning (ML). The goal of these datasets is to encourage responsible and effective use of AI in research, but building them raised many ethical, legal, social, and practical challenges. The authors describe the key steps involved in creating and managing these AI-ready datasets, including deciding which data to collect and why, responding to public concerns, handling participant consent based on how the data were obtained, ensuring responsible future use, determining where and how data are stored, clarifying how much control participants have over data sharing, and setting rules for data access and downloading.

Across these steps, the projects faced important questions about long-term data storage, future uses of the data, and how to balance openness with privacy and participant protection. The authors highlight the different choices made by the four projects, such as how they gathered public input, selected data storage solutions, and defined criteria for who can access and download the data. Although the governance approaches varied, common themes emerged, suggesting shared best practices.

Overall, the article summarizes key lessons learned from the Bridge2AI Program about how to collect, manage, and govern large datasets intended for AI and ML. These insights can guide future initiatives in designing datasets that are not only technically useful for AI, but also ethically sound, socially responsible, and trustworthy.

Figure 1.

Steps in governance of data collection and decision-making and responsible use for the development of AI with greater attention to public concerns throughout. The first 2 steps—promoting responsible selection—address the primary work of the DGPs, while the remaining 4 steps—promoting responsible use—are crucial factors the DGPs must consider.

Fine-grained multiclass nuclei segmentation with molecular empowered all-in-SAM model (posted Sun, 23 Nov 2025)
Li, Xueyuan; Cui, Can; Deng, Ruining; Tang, Yucheng; Liu, Quan; Yao, Tianyuan; Bao, Shunxing; Chowdhury, Naweed Iffat; Yang, Haichun; & Huo, Yuankai. (2025). Journal of Medical Imaging, 12(5), 57501.

Recent advances in computational pathology—the use of computers to analyze tissue images—have been driven by Vision Foundation Models (VFMs), particularly the Segment Anything Model (SAM). SAM can segment, or outline, cell nuclei using either prompts (zero-shot segmentation) or specialized cell-focused models, allowing it to work across many types of cells. However, general VFMs often struggle with fine-grained tasks, such as identifying specific nuclei subtypes or particular cells.

To address this, we developed the molecular empowered all-in-SAM model, which enhances SAM and VFMs for more precise pathology analysis. Our approach has three key components: (1) annotation, where molecular-informed guidance allows even non-experts to label images without detailed pixel-level work; (2) learning, where SAM is adapted with a SAM adapter to focus on specific cell types and biological features; and (3) refinement, which improves segmentation accuracy through molecular-oriented corrective learning.
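The "learning" step relies on an adapter: a small trainable block inserted into an otherwise frozen model. A generic bottleneck adapter (project down, nonlinearity, project up, residual connection) captures the pattern; this is not the paper's exact SAM adapter, and the dimensions and zero initialization below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def adapter(x, w_down, w_up):
    """Bottleneck adapter: project down, ReLU, project up, add residual.
    Only w_down/w_up would be trained; the host model stays frozen."""
    h = np.maximum(w_down @ x, 0.0)   # ReLU in the low-rank bottleneck
    return x + w_up @ h               # residual keeps the original signal

d, r = 256, 16                        # feature dim, bottleneck dim (assumed)
x = rng.normal(size=d)
w_down = rng.normal(scale=0.01, size=(r, d))
w_up = np.zeros((d, r))               # zero-init: adapter starts as identity

print(np.allclose(adapter(x, w_down, w_up), x))
```

Zero-initializing the up-projection makes the adapted model behave exactly like the frozen base model at the start of training, a common choice so fine-tuning can depart from it gradually.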

Testing on both in-house and public datasets showed that all-in-SAM greatly improves cell classification, even when annotation quality varies. This approach reduces the workload for human annotators and makes precise biomedical image analysis more accessible, especially in resource-limited settings, supporting advances in automated pathology and medical diagnostics using VFMs.

Fig.1

Overall idea of our work: this diagram illustrates the distinctions between our approach (bottom panel) and existing methods. (1) Traditional: expert annotators manually label cells using only PAS images. (2) MOCL: lay annotators provide pixel-level labels under the guidance of IF molecular images, followed by the application of deep learning for segmentation. (3) SAM-L: the SAM technique is utilized to expedite the annotation process, requiring only minimal (box) annotations. (4) All-in-SAM (our method): we integrate SAM in the annotation phase and adaptively fine-tune it during model training.

Multimodal state-dependent connectivity analysis of arousal and autonomic centers in the brainstem and basal forebrain (posted Mon, 25 Aug 2025)
Pourmotabbed, Haatef; Martin, Caroline G.; Goodale, Sarah E.; Doss, Derek J.; Wang, Shiyu; Bayrak, Roza G.; Kang, Hakmook; Morgan, Victoria L.; Englot, Dario J.; & Chang, Catie E. (2025). Imaging Neuroscience, 3, IMAG.a.91.

Vigilance, or how alert and awake we are, constantly changes and affects our thinking and behavior. This state can be disrupted in many brain disorders. Certain areas deep in the brain, called neuromodulatory nuclei in the brainstem and basal forebrain, help regulate alertness and drive widespread brain activity and communication. However, it is not well understood how the brain’s large-scale networks change when we shift between being alert and drowsy.

In this study, we used simultaneous EEG (which measures brain electrical activity) and advanced fMRI scans to explore how these arousal centers connect with other parts of the brain depending on vigilance. We found that when people are drowsy, most of these nuclei show stronger global connections, especially to regions like the thalamus, precuneus, and sensory and motor areas. When people are more alert, the nuclei connect most strongly to networks involved in attention, internal thought, and hearing. These patterns remained consistent even after controlling for blood flow effects.
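The basic shape of a state-dependent connectivity analysis is to split the scan's time points by an EEG-derived vigilance index and correlate a seed nucleus with a target region separately within each state. The sketch below uses simulated time series and a median split; it is a generic illustration, not the paper's actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 600                                   # fMRI time points (hypothetical)
seed = rng.normal(size=T)                 # arousal-nucleus time series
region = 0.5 * seed + rng.normal(size=T)  # cortical region time series
vigilance = rng.normal(size=T)            # EEG-derived vigilance index

def corr(a, b):
    """Pearson correlation between two 1-D time series."""
    return float(np.corrcoef(a, b)[0, 1])

# Compare seed-region coupling in low- vs. high-vigilance time points
low = vigilance < np.median(vigilance)
fc_low = corr(seed[low], region[low])     # "drowsy" connectivity
fc_high = corr(seed[~low], region[~low])  # "alert" connectivity
print(f"drowsy FC={fc_low:.2f}  alert FC={fc_high:.2f}")
```

With real data, systematic differences between the two correlations (here none, since the simulation has no state dependence) are what indicate vigilance-dependent connectivity.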

To confirm our findings, we analyzed two large brain imaging datasets and showed that these connectivity patterns are reproducible across different types of fMRI scans. Overall, this study provides new insights into how brain regions that regulate arousal influence large-scale brain activity depending on our level of alertness.

Fig 1 – Reproducible static connectivity profiles of neuromodulatory arousal centers. (a) Static functional connectivity (FC) t-maps of the locus coeruleus (LC), cuneiform/subcuneiform nucleus (CSC), and nucleus basalis of Meynert (NBM) in the VU 3T-ME, HCP 3T, and HCP 7T datasets for the mCSF/WM preprocessing pipeline. The FC t-maps were thresholded at 40% of the top t-values in the gray matter and at p < 0.05 (voxel-wise false discovery rate [FDR]-corrected over the entire gray matter volume). AFNI was used for visualization of the t-maps (@chauffeur_afni function; upper functional range set to the 98th percentile). (b) Spatial overlap of the thresholded static FC t-maps of the subcortical arousal regions with 16 canonical brain network templates from the FINDLAB and Melbourne atlases (Shirer et al., 2012; Tian et al., 2020). A positive value for the spatial overlap corresponds to mostly positive correlations within the brain network template while a negative value corresponds to mostly negative correlations. (c) Spatial reproducibility (Dice similarity coefficient) of the thresholded static FC t-maps between the three fMRI datasets.
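The Dice similarity coefficient used in panel (c) has a compact form: twice the overlap of two binary masks divided by their combined size. A minimal sketch, with toy masks standing in for thresholded FC maps:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    2|A ∩ B| / (|A| + |B|), defined as 1.0 when both masks are empty."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Hypothetical thresholded maps from two datasets (1 = suprathreshold voxel)
m1 = np.array([1, 1, 0, 1, 0, 0], bool)
m2 = np.array([1, 0, 0, 1, 1, 0], bool)
print(dice(m1, m2))   # 2 suprathreshold voxels overlap out of 3 + 3
```

A value of 1 means the two thresholded maps coincide exactly; 0 means no overlap.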

Scalable quality control on processing of large diffusion-weighted and structural magnetic resonance imaging datasets (posted Mon, 25 Aug 2025)
Kim, Michael E.; Gao, Chenyu; Newlin, Nancy R.; Rudravaram, Gaurav; Krishnan, Aravind R.; Ramadass, Karthik; Kanakaraj, Praitayini; Schilling, Kurt G.; Dewey, Blake E.; & Bennett, David Alan. (2025). PLOS ONE, 20(8), e0327388.

Careful quality control (QC) is essential when working with large medical imaging datasets, because poor-quality data can lead to wrong conclusions or poorly trained machine learning models. However, QC can be very time consuming. Most existing methods try to save time using automated tools that detect unusual data points, but these tools cannot catch every mistake. This means researchers still need to visually check the results of data processing in a reliable and scalable way.

In this study, we designed a QC pipeline for a large collection of brain scans, including diffusion-weighted and structural MRI. Our method was built to: (1) provide a consistent way for teams of researchers to perform and manage QC, (2) allow fast visualization of preprocessed data so the process is quicker without sacrificing quality, and (3) make it easy to combine and share QC results across datasets and pipelines.

We tested our method by comparing it to an automated QC approach on a set of 1,560 brain scans, and by measuring how much agreement there was between different researchers performing QC. The results showed mostly high agreement among researchers and only small differences compared to the automated method. Overall, while visual QC still takes time, our approach makes the process more streamlined and efficient.
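Agreement between raters on pass/fail QC labels is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch with hypothetical ratings (the paper's actual agreement statistic may differ):

```python
import numpy as np

def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters' categorical labels:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    po = np.mean(r1 == r2)                     # observed agreement
    labels = np.union1d(r1, r2)
    pe = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in labels)
    return (po - pe) / (1 - pe)

# Hypothetical QC decisions (1 = pass, 0 = fail) from two raters
a = [1, 1, 1, 0, 1, 0, 1, 1, 0, 1]
b = [1, 1, 0, 0, 1, 0, 1, 1, 1, 1]
print(round(cohens_kappa(a, b), 3))
```

Kappa of 1 is perfect agreement and 0 is chance-level; the study's "mostly high agreement" corresponds to values toward the upper end of that range.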

Fig 1. Issues with automatic and team-based QC.

When maintaining large neuroimaging datasets with multiple processing pipelines, shallow quality control processes that rely on derived metrics can fail to catch instances of algorithmic failures. However, deep QC processes quickly become unscalable and inefficient as the amount of data available increases due to the required time for mass visualization of outputs. For example, opening 50,000 T1w images separately in an image viewer for deep QC can take over 60 hours if it takes five seconds to load images in and out of the viewer. Team driven efforts to alleviate such large time costs come with additional challenges due to inconsistencies in reporting and methods of performing QC.

Edge Classification on Graphs: New Directions in Topological Imbalance (posted Wed, 23 Apr 2025)
Cheng, Xueqi; Wang, Yu; Liu, Yunchao; Zhao, Yuying; Aggarwal, Charu C.; & Derr, Tyler. (2025). WSDM 2025 – Proceedings of the 18th ACM International Conference on Web Search and Data Mining, 392-400.

Recent years have seen great success in using Graph Machine Learning (GML) for tasks like node and graph classification, and predicting links between nodes. However, edge classification—which has many real-world uses, such as analyzing social networks and improving cybersecurity—has not progressed as much, even with the rise of GML methods.

To address this gap, our study presents a comprehensive approach to edge classification. We identify a novel problem called the Topological Imbalance Issue, which happens when edges are unevenly distributed across classes. This imbalance affects the structure around each edge and reduces classification performance.

Inspired by recent work showing how node classification accuracy can vary with local graph patterns, we explore whether similar local structure differences affect edge classification. To do this, we introduce Topological Entropy (TE)—a new metric that measures how imbalanced the local edge class distribution is. Our results show that TE accurately reflects this local imbalance and that focusing on edges with high TE can improve edge classification.
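As rough intuition (the paper's exact TE definition differs), one can score an edge by the Shannon entropy of the class labels of the edges it touches: a uniform mix of classes around an edge gives high entropy, while a homogeneous neighborhood gives zero. The toy graph and labels below are hypothetical.

```python
import numpy as np
from collections import Counter

def local_entropy(edge, edges, labels):
    """Shannon entropy of class labels among edges sharing a node with
    `edge` — an illustrative stand-in for the paper's Topological Entropy."""
    u, v = edge
    nbr = [labels[e] for e in edges if e != edge and (u in e or v in e)]
    if not nbr:
        return 0.0
    p = np.array(list(Counter(nbr).values()), float)
    p /= p.sum()                            # class distribution of neighbors
    return float(-(p * np.log2(p)).sum())   # entropy in bits

# Hypothetical toy graph: five labeled edges over four nodes
edges = [(0, 1), (1, 2), (2, 3), (1, 3), (0, 2)]
labels = {(0, 1): "A", (1, 2): "A", (2, 3): "B", (1, 3): "B", (0, 2): "A"}
print(local_entropy((1, 2), edges, labels))
```

Edge (1, 2) touches two class-A and two class-B edges, a maximally mixed neighborhood, so its score is the maximum entropy of 1 bit for two classes.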

Based on this insight, we propose two strategies:

  1. Topological Reweighting, which adjusts training weights based on TE, and
  2. TE Wedge-based Mixup, which creates synthetic edges between highly imbalanced areas to improve training.
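The first strategy can be pictured as scaling each edge's contribution to the training loss by its TE score, so that topologically imbalanced edges count for more. The normalization below is a hypothetical simplification, not the paper's exact scheme.

```python
import numpy as np

def te_weights(te, eps=1e-8):
    """Hypothetical reweighting: give topologically imbalanced (high-TE)
    edges proportionally more weight in the training loss."""
    te = np.asarray(te, float)
    w = te + eps                  # small floor so no edge gets zero weight
    return w * len(w) / w.sum()   # normalize so weights average to 1

# Hypothetical per-edge TE scores
te = np.array([0.2, 1.0, 0.5, 1.5])
w = te_weights(te)
print(np.round(w, 3))
```

These per-edge weights would multiply the per-edge classification loss, pushing training to focus on the edges whose local class structure is most imbalanced.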

We combine these into a new edge classification strategy called TopoEdge, designed specifically to address topological imbalance. Experiments on real-world datasets show that our methods significantly improve performance. Our code and data are publicly available, and we also provide curated datasets and testing setups to serve as a new benchmark for future edge classification research.
