Vanderbilt Advanced Lab for Immersive AI Translation (VALIANT)

Aspiring to clinical significance: Insights from developing and evaluating a machine learning model to predict emergency department return visit admissions
Thu, 21 Nov 2024 17:49:19 +0000
Zhang, Y.; Huang, Y.; Rosen, A.; Jiang, L.G.; McCarty, M.; RoyChoudhury, A.; Han, J.H.; Wright, A.; Ancker, J.S.; Steel, P.A.D. PLOS Digital Health, Volume 3, Issue 9, September 2024, Article e0000606.

This study developed and validated a machine learning model to predict 72-hour return visit admissions (RVA) to the hospital after an emergency department (ED) discharge. Using data from over 135,000 patients across three urban EDs, researchers tested various algorithms, including advanced methods like DICE and XGBoost, comparing them to an existing clinical risk score. The best model combined DICE and logistic regression, achieving strong predictive accuracy (AUC 0.87 in development data and 0.75 in validation data).
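AUC values like the 0.87 and 0.75 reported above summarize how well predicted risks rank true outcomes: the probability that a randomly chosen patient who returned and was admitted received a higher risk score than one who did not. A minimal sketch of the rank-based (Mann-Whitney) computation, using made-up risk scores rather than study data:

```python
def auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) view:
    the fraction of positive/negative pairs where the positive case gets
    the higher predicted risk (ties count as half)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical 72-hour RVA risk scores for six discharged ED patients.
y_true  = [1, 1, 1, 0, 0, 0]            # 1 = returned within 72h and was admitted
y_score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
print(auc(y_true, y_score))             # prints 0.8888888888888888 (= 8/9)
```

Here 8 of the 9 positive/negative pairs are ranked correctly, giving an AUC of 8/9.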

Clinicians reviewed cases identified by the model to understand its strengths and weaknesses. While the model performed well overall, its accuracy varied by diagnosis and some patient groups, limiting immediate clinical usefulness. Insights from this study suggest ways to refine predictions and improve the model’s practical application in enhancing care quality and preventing adverse outcomes.

Fig 1. Inclusion Criteria (Development Site). WCM: Weill Cornell Medicine. LMH: Lower Manhattan Hospital.

Insufficient evidence for interactive or animated graphics for communicating probability
Thu, 21 Nov 2024 17:40:21 +0000
Ancker, J.S.; Benda, N.C.; Zikmund-Fisher, B.J. Journal of the American Medical Informatics Association, Volume 31, Issue 11, 2024, pp. 2760-2765.

This study examined whether interactive or animated visualizations are better than static graphics or numerical formats for explaining health probabilities, such as disease risks or side effects. Researchers reviewed data from a large systematic study and focused on four types of visualizations: simulations of probabilistic events, displays of randomness, tools that simplify information by focusing attention, and those encouraging deeper thinking.

The results showed no strong evidence that interactive or animated visuals improve understanding, risk perception, or health decisions compared to static graphics. The study suggests two possibilities: either the most effective designs haven’t been tested, or these formats are not inherently better. Future research should directly compare novel interactive visuals with traditional graphics to guide health communication strategies, ensuring accessibility for all audiences.

Figure 1. Interactive animation with original explanatory caption. Clicking on the image caused the pointer to spin around and come to rest at different points around the circle. Reproduced with permission from Arthritis Care and Research (Fraenkel et al. [7]).

A multimodal approach to support teacher, researcher and AI collaboration in STEM+C learning environments
Thu, 21 Nov 2024 17:06:19 +0000
Cohn, C.; Snyder, C.; Fonteles, J.H.; Ashwin, T.S.; Montenegro, J.; Biswas, G. British Journal of Educational Technology, 2024.

Advances in generative AI and multimodal learning analytics (MMLA) are creating innovative ways to support K-12 students’ collaborative learning in STEM+C fields. However, AI systems alone often struggle to interpret students’ emotions, understand social interactions, or grasp domain-specific challenges, especially in open-ended learning environments. To address these limitations, this study explores a hybrid human-AI approach that combines AI efficiency with human insight.

A collaboration framework is introduced where teachers and researchers use an AI-generated multimodal timeline to guide feedback for students tackling STEM+C problems, such as building computational models. Through discussions with a high school teacher, key moments when students face challenges were identified: the “difficulty threshold” (when problems arise) and the “intervention point” (when teacher feedback should occur). The teacher emphasized the need for a delay between these moments to allow students to work through challenges independently—a feature often missing in AI-driven systems.

Findings show that the multimodal timeline helps teachers provide better feedback while researchers refine the timeline based on teacher input. This iterative process highlights how human-AI collaboration can improve learning outcomes. Future applications could include developing tools for teacher interventions, creating pedagogical agents, and aiding curriculum design.

FIGURE 1. C2STEM Truck Task (Snyder et al.).

MARVEL: Bringing Multi-Agent Reinforcement-Learning Based Variable Speed Limit Controllers Closer to Deployment
Thu, 21 Nov 2024 16:41:20 +0000
Zhang, Y.; Quinones-Grueiro, M.; Zhang, Z.; Wang, Y.; Barbour, W.; Biswas, G.; Work, D. IEEE Access, 2024.

Variable Speed Limits (VSL) are used worldwide to help manage traffic flow on highways. Most current systems use fixed rules, which can limit their effectiveness in handling different traffic situations. Recent research has explored using advanced machine learning techniques, specifically multi-agent reinforcement learning (MARL), to improve VSL systems. However, existing MARL approaches don’t meet the real-world requirements set by U.S. traffic agencies.

This study introduces a new MARL framework called MARVEL, designed to control VSL on large highway networks while meeting practical deployment needs. MARVEL only uses data from sensors that are commonly available on highways and learns to manage speed limits based on three key traffic goals to ensure it adapts well to different conditions. It shares learned strategies among multiple VSL control points, allowing it to scale across long stretches of road.

The framework was first tested in a detailed traffic simulation with 8 VSL control points over a 7-mile section. Then, it was applied to a larger 17-mile section of Interstate 24 (I-24) near Nashville, Tennessee, involving 34 control points. MARVEL showed significant improvements, increasing traffic safety by 63.4% compared to no VSL control and improving traffic flow by 58.6% compared to the current system used on I-24. The model was also tested using real-world traffic data from I-24, demonstrating its potential for real-world application.

FIGURE 1. We consider a large-scale VSL control problem with multiple gantries evenly distributed along the freeway, where the posted speed limit is identical across lanes for each gantry. Note that there is a traffic sensor collocated with each gantry to provide state input information. We order the VSL agents starting from the most downstream one, i.e., agent 1 manages the most downstream VSL gantry (controller), and agent n manages the most upstream VSL gantry (controller).

A Tale of Two Comprehensions? Analyzing Student Programmer Attention during Code Summarization
Thu, 21 Nov 2024 16:37:21 +0000
Karas, Z.; Bansal, A.; Zhang, Y.; Li, T.; McMillan, C.; Huang, Y. ACM Transactions on Software Engineering and Methodology, Volume 33, Issue 7, 2024, Article 193.

Code summarization involves creating short, natural language descriptions of source code to help people understand it better. While previous research has looked at how programmers focus on different parts of the code when writing their own summaries, there hasn’t been much study on how they read and understand code with existing summaries. We don’t yet know how these two activities—reading and writing code summaries—compare, or how programmers pay attention to the meaning of the code during these tasks.

To explore this, we conducted an eye-tracking study with 27 participants to see where they focus when reading versus writing code summaries. We analyzed their gaze patterns, finding some differences in attention between the two tasks, as well as similarities in how they read code. We also noticed that factors like experience can influence these patterns. Additionally, we compared their gaze data to a structured representation of the code (Abstract Syntax Tree) and found that their visual focus doesn’t always match up with the actual code structure. These insights can help improve code comprehension in programming education and guide the development of automated tools for summarizing code.
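One way to relate gaze to code structure, in the spirit of the AST comparison above, is to map fixation coordinates onto AST node spans. A simplified sketch using Python's built-in ast module; the fixations here are hypothetical line/column pairs, and the study's actual alignment method is more involved:

```python
import ast

def node_spans(source):
    """Collect (line, col_start, col_end, node_type) spans from the AST.
    Python records positions (lineno/col_offset/end_col_offset) on most nodes.
    For simplicity, multiline nodes are matched only on their first line."""
    spans = []
    for node in ast.walk(ast.parse(source)):
        if hasattr(node, "lineno") and getattr(node, "end_col_offset", None) is not None:
            spans.append((node.lineno, node.col_offset, node.end_col_offset,
                          type(node).__name__))
    return spans

def fixation_hits(fixations, spans):
    """Map each gaze fixation (line, col) to the set of AST node types it lands on."""
    return [{t for (ln, c0, c1, t) in spans if ln == line and c0 <= col < c1}
            for (line, col) in fixations]

src = "def add(a, b):\n    return a + b\n"
# Hypothetical fixations: one on the function name, one on the '+' expression.
hits = fixation_hits([(1, 4), (2, 11)], node_spans(src))
```

A fixation that never intersects a node span would yield an empty set, which is one concrete way the paper's observation that visual focus "doesn't always match up with the actual code structure" can be quantified.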

Fig 1. Example stimuli used in the task. In both conditions, the code was displayed on the left, and the summaries, pre-written or participant-generated, were located in the top right. In the Reading condition, Likert scale questions for assessing summary quality were presented on the right below the pre-written summary.

Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn’s disease
Sun, 22 Sep 2024 15:41:06 +0000
Li, Jia, Simmons, Alan J., Hawkins, Caroline V., Chiron, Sophie, Ramirez-Solano, Marisol A., Tasneem, Naila, Kaur, Harsimran, Xu, Yanwen, Revetta, Frank, Vega, Paige N., Bao, Shunxing, Cui, Can, Tyree, Regina N., Raber, Larry W., Conner, Anna N., Pilat, Jennifer M., Jacobse, Justin, McNamara, Kara M., Allaman, Margaret M., Raffa, Gabriella A., Gobert, Alain P., Asim, Mohammad, Goettel, Jeremy A., Choksi, Yash A., Beaulieu, Dawn B., Dalal, Robin L., Horst, Sara N., Pabla, Baldeep S., Huo, Yuankai, Landman, Bennett A., Roland, Joseph T., Scoville, Elizabeth A., Schwartz, David A., Washington, M. Kay, Shyr, Yu, Wilson, Keith T., Coburn, Lori A., Lau, Ken S., & Liu, Qi. (2024). Identification and multimodal characterization of a specialized epithelial cell type associated with Crohn’s disease. Nature Communications, 15(1), 7204.
This study investigates Crohn’s disease (CD), a chronic inflammatory condition affecting both the gastrointestinal system and other parts of the body due to immune system dysregulation. By analyzing over 202,000 cells from 170 tissue samples across 83 patients, the researchers identified a specific epithelial cell type, termed ‘LND,’ present in both the terminal ileum and ascending colon. These LND cells, which show high expression of genes related to antimicrobial response and immune regulation (such as LCN2, NOS2, and DUOX2), were found to be rare in individuals without inflammatory bowel disease (IBD) but significantly expanded in patients with active CD.

Further in-situ RNA and protein imaging confirmed the presence of LND cells, which interact closely with immune cells and express genes linked to CD susceptibility, suggesting their involvement in the disease’s immune dysfunction. Additionally, the study identified early and late subpopulations of LND cells, each with distinct developmental trajectories. Interestingly, patients with a higher ratio of late-to-early LND cells were more likely to respond positively to anti-TNF treatment, a common therapy for CD. These findings highlight a potentially pathogenic role for LND cells in CD and provide new insights into disease mechanisms and treatment responses.

Single-cell landscape in Crohn’s disease and non-IBD controls. (A) Schematic for processing endoscopic and surgical samples from TI and AC for non-IBD controls, inactive and active CD patients. (B) Summary of the number of samples in each group. (C) UMAP of 155,093 cells from endoscopy samples colored by cell clusters. (D) Dot plot showing markers for each cell type. (E) UMAP of 155,093 cells colored by tissue origin, TI (brown) or AC (blue). (F) Proportion of each cell cluster in TI (brown) and AC (blue) samples. (G) UMAP of 155,093 cells colored by disease status: controls (tan), inactive (green), or active CD (purple). (H) MDS plot of cell compositional differences across all endoscopy specimens.
Integration of estimated regional gene expression with neuroimaging and clinical phenotypes at biobank scale
Sun, 22 Sep 2024 15:32:32 +0000
Hoang, Nhung, Sardaripour, Neda, Ramey, Grace D., Schilling, Kurt, Liao, Emily, Chen, Yiting, Park, Jee Hyun, Bledsoe, Xavier, Landman, Bennett A., Gamazon, Eric R., Benton, Mary Lauren, Capra, John A., & Rubinov, Mikail. (2024). Integration of estimated regional gene expression with neuroimaging and clinical phenotypes at biobank scale. PLoS Biology, 22(9), e3002782.

This study aims to deepen our understanding of human brain individuality by integrating various large-scale data sets, including genomic, transcriptomic, neuroimaging, and electronic health records. The researchers used computational genomics methods to estimate genetically regulated gene expression (gr-expression) for 18,647 genes across 10 brain regions in over 45,000 people from the UK Biobank. Their analysis revealed that gr-expression patterns align with known genetic ancestry relationships, brain region identities, and gene expression correlations across different regions.

Through transcriptome-wide association studies (TWAS), they discovered 1,065 associations between gr-expression and individual differences in gray matter volumes across people and brain regions. These findings were compared to genome-wide association studies (GWAS) in the same sample, revealing hundreds of novel associations. The study also linked gr-expression to clinical phenotypes by integrating results from biobank-scale clinical data.

Further analysis involved the Human Connectome Project (HCP), where they identified associations between polygenic gr-expression and MRI-based structural and functional brain phenotypes. The results were highly replicable, strengthening the reliability of their findings. Overall, this work offers a valuable new resource for connecting genetically regulated gene expression to brain organization and diseases, advancing our understanding of brain individuality and its clinical relevance.

Estimation of genetically regulated gene expression from genetic data.
(A) Pipeline for estimation of gr-expression with Joint-Tissue Imputation. Left: Joint-Tissue Imputation models are trained on genetic sequences and directly assayed gene expression from postmortem brain samples in the GTEx and PsychEncode projects. Center: The models are trained to estimate gr-expression as a weighted sum of SNPs that are close to the gene of interest along the linear genome. The estimation includes elastic-net regularization because the number of these SNPs typically exceeds the number of samples in the training data. Right: The trained models were used to estimate gr-expression from genetic sequences of neuroimaging-genomic samples in the UK Biobank and the HCP. (B) An illustration of the 10 cortical and subcortical regions with available models of gr-expression. Numbers in parentheses refer to all models that passed baseline performance thresholds for the prediction of observed gene expression on held-out data (r2 > 0.01 and pFDR < 0.05). (C, D) Predictive performance of gr-expression models on held-out data from the GTEx data set. (C) Histograms of r2, the variance of directly assayed gene expression explained by estimated gr-expression. (D) Histograms of p-values (−log10 pFDR) on these r2 values. Regions are colored as in panel B. FDR, false discovery rate; GTEx, Genotype-Tissue Expression Project; HCP, Human Connectome Project; SNP, single-nucleotide polymorphism.
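The "weighted sum of SNPs" in panel A reduces to a linear predictor per gene. A toy sketch with invented weights and dosages; the real JTI weights come from elastic-net models trained on GTEx/PsychEncode reference data, not from values like these:

```python
def gr_expression(dosages, weights, intercept=0.0):
    """Estimate genetically regulated expression of one gene as a weighted
    sum of nearby SNP allele dosages (each dosage is 0, 1, or 2 copies of
    the effect allele). Weights would come from a trained elastic-net model;
    here they are hypothetical."""
    return intercept + sum(w * d for w, d in zip(weights, dosages))

# Toy example: three cis-SNPs for one gene in one individual.
weights = [0.8, -0.3, 0.05]   # hypothetical trained effect sizes
dosages = [2, 1, 0]           # the individual's allele counts at those SNPs
print(gr_expression(dosages, weights))
```

Applied across ~18,647 genes, 10 brain regions, and 45,000+ individuals, predictors of this form produce the gr-expression matrix that the TWAS analyses then correlate with gray matter volumes.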
Benchmarking clustering, alignment, and integration methods for spatial transcriptomics
Thu, 22 Aug 2024 16:36:44 +0000
Hu, Yunfei; Xie, Manfei; Li, Yikang; Rao, Mingxing; Shen, Wenjun; Luo, Can; Qin, Haoran; Baek, Jihoon; Zhou, Xin Maizie. Genome Biology, Volume 25, Article 212 (2024). Published: 09 August 2024.

Understanding the complexities of tissues and organisms is no small feat. However, scientists are making great strides with a cutting-edge technique called spatial transcriptomics (ST). This method allows us to study tissues at a microscopic level, revealing valuable information about their structure and function. But here’s the catch: analyzing and integrating data from multiple tissue slices and finding meaningful patterns within a single slice can be quite challenging. To overcome this hurdle, researchers have developed several algorithms specifically tailored for ST data analysis. These algorithms help identify distinct spatial regions within a tissue slice and align data from different sources for further analysis.

To guide researchers in choosing the right methods and paving the way for future advancements, a team of scientists conducted a comprehensive benchmarking study. They evaluated various state-of-the-art algorithms by analyzing real and simulated datasets with different sizes, technologies, species, and complexities. The researchers assessed each algorithm using a range of quantitative and qualitative metrics. These metrics included measures of clustering accuracy, visualization techniques to understand spatial relationships, alignment accuracy, and even 3D reconstruction. By considering both method performance and data quality, they provided a holistic evaluation to aid researchers in selecting the best tools for their specific needs.

The team has made all their evaluation code available on GitHub, along with online notebooks and documentation. This ensures transparency and reproducibility, allowing other researchers to validate the benchmarking results and explore new methods using different datasets. In conclusion, this groundbreaking study provides comprehensive recommendations to researchers, offering guidance in choosing optimal tools and inspiring future developments. With these advanced techniques, we are unlocking new possibilities and gaining deeper insights into the fascinating world of complex tissues.

Benchmarking framework for clustering, alignment, and integration methods on different real and simulated datasets. Top, illustration of the set of methods benchmarked, which includes 16 clustering methods, five alignment methods, and five integration methods. Bottom, overview of the benchmarking analysis, in terms of different metrics (1–7). Different experimental metrics and analyses, Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), Homogeneity (HOM), Average Silhouette Width (ASW), CHAOS, Percentage of Abnormal Spots (PAS), Spatial Coherence Score (SCS), uniform manifold approximation and projection (UMAP) visualization, layer-wise and spot-to-spot alignment accuracy, 3D reconstruction, and runtime, are designed to quantitatively and qualitatively assess method performance as well as data quality. Additional details are provided in the “Results” section.
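Among the clustering metrics listed in the caption, the Adjusted Rand Index (ARI) is a workhorse: it scores agreement between a method's spatial clusters and the ground-truth regions, corrected for chance. A compact standard-library sketch of the standard formula:

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two partitions: 1.0 for identical clusterings (up to
    relabeling), around 0 for random assignments. Computed from the
    contingency table of co-assigned pairs."""
    n = len(labels_true)

    def pairs(counts):
        # number of same-cluster pairs implied by a list of cluster sizes
        return sum(comb(c, 2) for c in counts)

    index = pairs(Counter(zip(labels_true, labels_pred)).values())
    a = pairs(Counter(labels_true).values())   # pairs together in truth
    b = pairs(Counter(labels_pred).values())   # pairs together in prediction
    expected = a * b / comb(n, 2)              # chance-level agreement
    max_index = (a + b) / 2
    if max_index == expected:                  # degenerate partitions
        return 1.0
    return (index - expected) / (max_index - expected)
```

For example, a clustering that merely relabels the true regions scores 1.0, while splitting one true region in half is penalized. NMI and AMI in the caption play a similar role on an information-theoretic footing.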
SCCNAInfer: a robust and accurate tool to infer the absolute copy number on scDNA-seq data
Thu, 22 Aug 2024 16:31:43 +0000
Zhang, Liting; Zhou, Xin Maizie; Mallory, Xian. Bioinformatics, Volume 40, Issue 7, July 2024, btae454.

In diseases like cancer, changes in our cells called copy number alterations (CNAs) are important to understand. These changes can tell us a lot about how diseases progress. Single-cell DNA sequencing (scDNA-seq) helps researchers detect CNAs in individual cells, but current tools can make mistakes across the entire genome due to wrong estimates of cell chromosome numbers, or “ploidy.”

SCCNAInfer is a new tool designed to improve this process. It uses information from inside tumor cells to more accurately estimate each cell’s ploidy and CNAs. SCCNAInfer works alongside existing CNA detection methods by grouping cells, calculating ploidy for each group, refining the data, and accurately identifying CNAs for each cell.

Tests show that SCCNAInfer does a better job compared to other tools like Aneufinder, Ginkgo, SCOPE, and SeCNV. This new tool can help researchers get clearer insights into cell changes, aiding in the study of cancer and other diseases.

SCCNAInfer is freely available online.

Overview of SCCNAInfer. Raw read count and optionally the segmentation of each cell from an existing tool are the input to SCCNAInfer. If the segmentation result is not provided, SCCNAInfer allows the users to select a state-of-the-art method to produce the segmentation result. Step 1 identifies the normal cells if any, and normalizes the raw read count. Step 2 calculates the pairwise distance among each pair of cells based on the normalized read count and the segmentation result from an existing tool. Given the pairwise distance among the cells, Step 3 clusters the cells by a hierarchical clustering approach which automatically selects the optimal cluster number. Here, K refers to the number of clusters, and E refers to the cost function. Whichever K minimizes E is selected. Step 4 searches the optimal subclonal ploidy (P) for each cluster. For each cluster, whichever P minimizes a cost function F is selected. Step 5 refines the read count by clustering the bins inside each cell cluster. Finally, based on the corrected read count from Step 5 and the optimal subclonal ploidy from Step 4, the absolute copy number for each cell is calculated as the output of SCCNAInfer.
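The ploidy search in Step 4 can be illustrated with a simple grid search: scale a cluster's normalized read counts by each candidate ploidy and score how close the scaled values land to integer copy numbers. The cost function below (sum of distances to the nearest integer) is a common heuristic for this problem, not SCCNAInfer's actual F:

```python
def best_ploidy(norm_counts, candidates=None):
    """Grid-search a subclonal ploidy P: rescale normalized per-bin read
    counts so their mean equals P, and pick the P whose scaled values sit
    closest to whole copy numbers. Illustrates the idea of Step 4 only;
    the published cost function differs."""
    if candidates is None:
        candidates = [p / 20 for p in range(30, 121)]  # 1.5 .. 6.0 in 0.05 steps
    mean = sum(norm_counts) / len(norm_counts)

    def cost(p):
        scaled = [x * p / mean for x in norm_counts]
        return sum(abs(s - round(s)) for s in scaled)

    return min(candidates, key=cost)

# Toy cluster whose bins sit at copy numbers 2, 2, 3, 4 under the right scaling.
counts = [1.0, 1.0, 1.5, 2.0]   # normalized read depth per genomic bin
p = best_ploidy(counts)         # recovers a ploidy that makes all bins integral
```

Getting this scaling wrong shifts every inferred copy number genome-wide, which is exactly the failure mode of existing tools that SCCNAInfer targets.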
Deep Learning-Based Open Source Toolkit for Eosinophil Detection in Pediatric Eosinophilic Esophagitis
Thu, 20 Jun 2024 17:14:02 +0000
Xiong, Juming; Liu, Yilin; Deng, Ruining; Tyree, Regina N.; Correa, Hernan; Hiremath, Girish; Wang, Yaohong; Huo, Yuankai. Proceedings of SPIE Medical Imaging 2024: Digital and Computational Pathology, Vol. 12933, 129330X, 2024, San Diego, California.

Eosinophilic Esophagitis (EoE) is a chronic, immune/antigen-mediated esophageal disease characterized by symptoms related to esophageal dysfunction and histological evidence of eosinophil-dominant inflammation. Due to the complex microscopic representation of EoE in imaging, current manual identification methods are labor-intensive and prone to inaccuracies.

This study introduces an open-source toolkit, named Open-EoE, designed for end-to-end whole slide image (WSI) level eosinophil (Eos) detection with a single line of command via Docker. The toolkit supports three state-of-the-art deep learning-based object detection models and optimizes performance through an ensemble learning strategy, enhancing precision and reliability.

Experimental results demonstrate that Open-EoE can efficiently detect Eos on a testing set of 289 WSIs. At the widely accepted diagnostic threshold of ≥15 Eos per high power field (HPF) for EoE, Open-EoE achieved an accuracy of 91%, showing good consistency with pathologist evaluations. This suggests a promising avenue for integrating machine learning methodologies into the diagnostic process for EoE.
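The slide-level decision rule above is a peak threshold over per-HPF counts: a slide is called EoE-positive when any high power field reaches the cutoff. A minimal sketch with hypothetical counts (the toolkit's actual pipeline arrives at these counts via object detection and ensembling):

```python
EOE_THRESHOLD = 15  # widely accepted cutoff: >= 15 eosinophils in any HPF

def eoe_positive(eos_per_hpf):
    """Slide-level call from per-HPF eosinophil counts: positive if the
    peak count across all high power fields reaches the threshold, matching
    how Open-EoE reports the maximum Eos count per HPF for each WSI."""
    return max(eos_per_hpf) >= EOE_THRESHOLD

# Hypothetical per-HPF detections aggregated over one whole slide image.
print(eoe_positive([3, 7, 12, 18, 9]))   # peak is 18 -> positive
print(eoe_positive([2, 5, 8]))           # peak is 8  -> negative
```

The reported 91% accuracy is the agreement of this thresholded call with pathologist evaluations over the 289-WSI test set.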

Figure 1: Overview of the Open-EoE toolkit. The inputs are original WSIs at 40× magnification, while the outputs are the maximum Eos count per HPF and the detected Eos bounding boxes that can overlay on the original WSIs. (Pipeline stages shown: sliding window, object detection, ensemble, and aggregation.)