cancer | VALIANT

Mixed-model and transcriptome-wide association analyses identify transcription factors and genes associated with colorectal cancer susceptibility

waddelma — Wed, 25 Feb 2026 02:27:24 +0000

Chen, Zhishan; Song, Wenqiang; Li, Qing; Li, Chao; Wen, Wanqing; Huyghe, Jeroen R.; Law, Philip J.; Fernandez-Rozadilla, Ceres; Timofeeva, M. N.; Thomas, Minta; Schmit, Stephanie L.; Martin, Vicente; Devall, Matthew A. M.; Dampier, Christopher Heaton; Moratalla-Navarro, Ferran; Cai, Qiuyin; Wang, Jifeng; Shi, Jiajun; Kweon, Sun-seog; Tanikawa, Chizu; Jia, Weihua; Shu, Xiang; Long, Jirong; Gao, Jing; Kim, Jeongseon; Shin, Aesun; Matsuo, Keitaro; Jee, Sun-ha; Jung, Keum-ji; Wang, Nan; Kim, Dong-hyun; Ping, Jie; Yang, Gong; Shin, Minho; Ren, Zefang; Oh, Jae-hwan; Oze, Isao; Ahn, Yoon-ok; Gao, Yutang; Pan, Zhizhong; Kamatani, Yoichiro; van Kaer, Luc; Wu, Lan; Li, Bingshan; Matsuda, Koichi; Shu, Xiaoou; Hsu, Li; Dunlop, Malcolm G.; Gruber, Stephen Bernard; Houlston, Richard S.; Tomlinson, Ian P. M.; Li, Li; Lau, Ken S.; Moreno, Victor R.; Casey, Graham R.; Peters, Ulrike; Zheng, Wei; & Guo, Xingyi. (2026). Nature Communications, 17(1), 1377.

Transcription factors (TFs) are proteins that bind to DNA and turn genes on or off. Some genetic variants linked to colorectal cancer (CRC) may change how these transcription factors bind to DNA, but the specific TFs involved are not well understood. In this study, researchers analyzed 218 TF ChIP-Seq datasets, which map where transcription factors bind across the genome, together with large genome-wide association study (GWAS) data from more than 100,000 people with CRC and over 150,000 people without CRC from East Asian and European populations. They identified 51 transcription factors and TF–cofactor interactions, including cofactors of the vitamin D receptor (VDR), as important regulators of CRC risk.

To better understand how these regulatory changes affect genes, the researchers combined their findings with transcriptome-wide association studies (TWAS), which estimate how genetically predicted gene expression relates to disease risk. They also examined alternative splicing (different ways RNA messages are assembled) and alternative polyadenylation (differences in how RNA molecules are finalized) using RNA sequencing data from individuals of Asian and European ancestry. This multi-ancestry TWAS identified 222 genes associated with CRC risk, including 95 newly discovered genes and 48 genes that may be possible drug targets. Single-cell RNA sequencing provided additional biological support for about 45 percent of these genes, and laboratory experiments confirmed that three genes—RHPN2, IRS2, and TXN—have cancer-promoting, or oncogenic, roles. Overall, this study maps important transcription factor–gene regulatory networks and uncovers new genes that contribute to colorectal cancer risk.

Fig. 1: Associations between TFs with CRC risk using generalized linear mixed models.

(A) A flow chart to illustrate the integrative analysis of ChIP-seq data (n = 218) for 84 TFs and CRC GWAS summary statistics from 100,204 cases and 154,587 controls of European and East Asian ancestry. (B) A total of 51 identified TFs with genetic variation of TF-DNA bindings significantly associated with CRC risk. P-values were determined by a two-sided Wald Z test. The dashed line represents a Bonferroni-corrected P < 0.05. (C) The host motifs of identified TFs were enriched in their ChIP-seq peaks. (D) Analysis of co-occupied binding regions of the top 10 CRC risk-associated TFs. Venn diagrams in the upper-right triangle show the number of genetic variants (multiplied by 1000) that are occupied by specific TFs or co-occupied by two TFs in each TF pair. Bar plots in the lower-left triangle show the association strengths (regression coefficients) for the genetic variants occupied by two TFs (only the first TF and only the second TF, respectively) as indicated from left to right. Two TFs with significant interactions at the Bonferroni threshold of P < 3.92 × 10⁻⁵ (0.05/1,275 TF pairs from 51 TFs) are highlighted in red. P-values were determined by a two-sided Wald Z test.
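
The Bonferroni threshold quoted in the caption follows from simple arithmetic: 51 TFs give 1,275 unordered pairs, and 0.05 divided by 1,275 is about 3.92 × 10⁻⁵. The sketch below reproduces that number and shows how a two-sided Wald Z test turns an estimate and its standard error into a P-value. The function name and the example estimate (`beta=0.12`, `se=0.025`) are made up for illustration and are not taken from the study.

```python
import math


def wald_two_sided_p(beta: float, se: float) -> float:
    """Two-sided P-value for a Wald Z test of the null hypothesis beta = 0."""
    z = beta / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


n_tfs = 51
n_pairs = math.comb(n_tfs, 2)           # number of unordered TF pairs
bonferroni_threshold = 0.05 / n_pairs   # per-test significance threshold

# Hypothetical interaction estimate, used only to exercise the function.
p_example = wald_two_sided_p(beta=0.12, se=0.025)
is_significant = p_example < bonferroni_threshold

print(n_pairs, f"{bonferroni_threshold:.3g}", f"{p_example:.3g}", is_significant)
```

Dividing the significance level by the number of tests is the standard Bonferroni correction; it keeps the chance of any false positive across all 1,275 pair tests at or below 5 percent.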

Mutant GNAS drives a pyloric metaplasia with tumor suppressive glycans in intraductal papillary mucinous neoplasia

waddelma — Wed, 28 Jan 2026 16:19:01 +0000

Quoc-Huy Trinh, Vincent; Ankenbauer, Katherine E.; Torbit, Sabrina M.; Taranto, Christopher P.; Liu, Jiayue; Batardiere, Maelle; Kumar, Bhoj; Carlo Maurer, Hans; Revetta, Frank L.; Chen, Zhengyi; Kruse, Angela R.S.; Judd, Audra M.; Copeland, Celina; Wong, Jahg; Ben-Levy, Olivia; Jarvis, Brenda; Brown, Monica E.; Brown, Jeffrey W.; Das, Koushik K.; Makino, Yuki; Spraggins, Jeffrey M.; Lau, Ken S.; Azadi, Parastoo; Maitra, Anirban; Tan, Marcus Chuan Beng; & DelGiorno, Kathleen E. (2025). Cell Reports, 44(12), 116684.

Intraductal papillary mucinous neoplasms (IPMNs) are cystic lesions of the pancreas and are established precursors to pancreatic ductal adenocarcinoma (PDAC), one of the most lethal solid cancers. Although about 90 percent of IPMNs are detected before PDAC develops, there are no reliable biomarkers to distinguish benign from malignant lesions, which leads to many unnecessary surgical resections. Recent work has shown that pancreatic precancer often adopts a pyloric phenotype, meaning the cells resemble those found in the pylorus of the stomach.

To identify the regulators of this cellular plasticity, the study analyzed cell lines, organoids, mouse models of IPMNs, and human patient samples using multiplex immunostaining, RNA sequencing, glycosylation profiling, and computational analyses. These approaches revealed that the GNAS R201C mutation promotes an indolent, or slow-growing, phenotype in IPMNs by enhancing a differentiated pyloric program through the transcription factors SPDEF and CREB3L1. This pyloric state is associated with specific patterns of glycosylation, or sugar modifications on proteins.

GNAS R201C acts as a glycan rheostat, meaning it shifts the balance of cell surface sugars by increasing LacdiNAc structures while reducing pro-tumorigenic acidic Lewis epitopes. This glycan switch suppresses cancer cell invasion and slows disease progression. Importantly, LacdiNAcs and 3′-sulfo-LeA/C glycans are mutually exclusive, and their presence may serve as biomarkers to stratify IPMN patients by cancer risk and guide surgical decision-making.

Figure 1. Human IPMN are characterized by pyloric metaplasia

(A) Bar plots comparing the expression of MUC5AC, AQP5, or CD44 in the epithelium and stroma in IPMNs (yellow, n = 19), PanIN (green, n = 26), and PDAC (purple, n = 197 epithelium, 124 stroma).

(B) Hematoxylin staining and pseudo-colored immunohistochemical staining for MUC5AC (red), AQP5 (yellow), or CD44v9 (green), top row, and automated detection of signal (bottom row) by QuPath to merge MxIHC data. Scale bars, 20 μm.

(D) Quantification of staining in (B)–(C) for 40 IPMN patients, including normal ducts (ND; n = 95–109), acinar-to-ductal metaplasia (ADM; n = 109), low-grade IPMN (LG; n = 110), high-grade IPMN (HG; n = 70), and invasive IPMN (Inv; n = 22).

∗p < 0.05; ∗∗p < 0.01; ∗∗∗p < 0.005; and ∗∗∗∗p < 0.001.

SCGclust: Single-Cell Graph Clustering Using Graph Autoencoders That Integrate SNVs and CNAs

waddelma — Wed, 28 Jan 2026 15:36:09 +0000

Potu, Teja; Hu, Yunfei; Wang, Judy; Chi, Hongmei; Khan, Rituparna; Dharani, Srinija; Ni, Jingchao; Zhang, Liting; Zhou, Xin Maizie; & Mallory, Xian Fan. (2026). Mathematics, 14(1), 46.

Intra-tumor heterogeneity, or ITH, refers to the differences between cells within a single tumor, which can affect cancer outcomes and treatment responses. Single-cell DNA sequencing (scDNA-seq) allows researchers to study these differences at the level of individual cells. Low-coverage scDNA-seq can analyze many cells, but accurately grouping or clustering cells is essential to understand the tumor’s complexity. Most existing methods use either single-nucleotide variations (SNVs) or copy number alterations (CNAs) alone to cluster cells, even though both types of signals reflect subpopulations within the tumor. To address this, we developed a new cell-clustering tool that combines SNV and CNA information using a graph autoencoder. This model is trained alongside a graph convolutional network to ensure meaningful clusters and prevent all cells from being grouped together. The resulting low-dimensional cell representations are then clustered using a Gaussian mixture model. When tested on eight simulated datasets and one real cancer sample, our method outperformed existing SNV-only and CNA-only approaches, showing that integrating both types of genetic information improves the accuracy of identifying distinct cell populations within tumors.

Figure 1. Overview of SCGclust. (A). There are two inputs to SCGclust, the cell by SNV matrix (top) and the cell by genomic region matrix (bottom). The cell by SNV matrix has entries “1”, “0”, and “3”. The “1” and “0” entries represent that the SNV is present or absent in the cell, respectively. The “3” entries represent that there is no read covering the site, and thus, the signal is missing. The cell by genomic region matrix has the read count for each genomic region at each cell. Six cells (C1–C6) are shown as an illustration. It can be observed that C1–C3 and C4–C6 have relatively similar SNV and CNA profiles, respectively. (B). The two matrices are then used as the edge weight and the node feature for the graph autoencoder. The graph autoencoder has six nodes, representing the six cells. On top of each node is a vector of the node feature, which uses the cosine similarity vector of the read count that reflects the CNA signal. Between every two nodes is an edge weight, represented by the Euclidean distance of the SNV profiles between the two cells. Here, C1, C2, and C3 have larger edge weights (thicker edges) because their SNV profiles are closer to each other. Similarly, C4, C5, and C6 have larger edge weights (thicker edges). (C). The built graph is the input for the graph autoencoder, which reduces the dimensions of the node features in the encoder and recovers the original node features in the decoder. The dimension reduction process also considers the edge weight such that two cells with similar SNV profiles will have more similar embedding in the low dimension. The graph autoencoder has four layers in total; layers 1 and 2 are the encoder, and layers 3 and 4 are the decoder. (D). A graph convolutional network (GCN) is co-trained with the graph autoencoder, with the objective function composed of three terms: the reconstruction mean squared error (MSE) term, the modularity term, and the collapse regularization term. (E). Finally, we performed the cell clustering based on each cell’s embedded low dimension from the graph autoencoder using the Gaussian mixture model.
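
To make panels (A) and (B) more concrete, here is a minimal sketch of how the two input matrices could be turned into node features and edge weights. It is not the SCGclust implementation. Two choices are assumptions made for illustration: sites coded “3” (missing) in either cell are skipped when comparing SNV profiles, and the Euclidean distance is converted to a weight with `1 / (1 + distance)` so that closer profiles get larger weights, as the caption describes. The toy matrices and function names are hypothetical.

```python
import math

MISSING = 3  # caption's code for "no read covers this site"


def cosine_similarity(u, v):
    """Cosine similarity between two read-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def snv_distance(x, y):
    """Euclidean distance over sites observed in both cells (assumption)."""
    observed = [(a, b) for a, b in zip(x, y) if a != MISSING and b != MISSING]
    return math.sqrt(sum((a - b) ** 2 for a, b in observed))


def build_graph(snv, counts):
    """Return (node_features, edge_weights) for a fully connected cell graph."""
    n = len(snv)
    # Node feature of cell i: cosine similarity of its read counts to every cell.
    features = [[cosine_similarity(counts[i], counts[j]) for j in range(n)]
                for i in range(n)]
    # Edge weight: larger when SNV profiles are closer (hypothetical mapping).
    weights = [[0.0 if i == j else 1.0 / (1.0 + snv_distance(snv[i], snv[j]))
                for j in range(n)]
               for i in range(n)]
    return features, weights


# Toy data: cells 0-1 and cells 2-3 form two subclones.
snv = [[1, 1, 0, 0], [1, 3, 0, 0], [0, 0, 1, 1], [0, 0, 3, 1]]
counts = [[10, 10, 20, 20], [11, 9, 21, 19], [20, 20, 10, 10], [19, 21, 9, 11]]
features, weights = build_graph(snv, counts)
print(weights[0][1], weights[0][2])
```

In this toy example, cells from the same subclone end up with both a heavier edge and more similar feature vectors, which is the signal the graph autoencoder is meant to compress before Gaussian mixture clustering.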

Large-scale integration of omics and electronic health records to identify potential risk protein biomarkers and therapeutic drugs for cancer prevention

waddelma — Wed, 28 Jan 2026 15:23:57 +0000

Li, Qing; Song, Qingyuan; Chen, Zhishan; Choi, Jung-yoon; Moreno, Victor R.; Ping, Jie; Wen, Wanqing; Li, Chao; Shu, Xiang; Yan, Jun; Shu, Xiaoou; Cai, Qiuyin; Long, Jirong; Huyghe, Jeroen R.; Pai, Rish K.; Gruber, Stephen Bernard; Yang, Yaohua; Casey, Graham R.; Wang, Xusheng; Toriola, Adetunji T.; Li, Li; Singh, Bhuminder; Lau, Ken S.; Zhou, Li; Zhang, Zichen; Wu, Chong; Peters, Ulrike; Zheng, Wei; Long, Quan; Yin, Zhijun; & Guo, Xingyi. (2026). American Journal of Human Genetics, 113(1), 41–56.

Finding the right proteins to target and the drugs that act on them is essential for preventing cancer. In this study, we combined and closely analyzed data from large genome-wide association studies covering six common cancers: breast, colorectal, lung, ovarian, pancreatic, and prostate cancer. We identified 710 genetic variants that are independently linked to cancer risk. By connecting these variants to protein quantitative trait loci using blood-based proteomics data from more than 75,000 people, we found 365 proteins associated with cancer risk.

Further analysis showed that 101 of these proteins are very likely to play a direct role in cancer development, including 74 that have not been reported before. Among them, 36 proteins appear to be potentially druggable, meaning they could be targeted by existing or future medications. To explore real-world effects, we analyzed more than 3.5 million electronic health records and carried out emulated clinical trials comparing 11 commonly used drugs across 290 scenarios. We identified three drugs that were associated with a lower risk of colorectal cancer: caffeine (compared with paroxetine), haloperidol (compared with prochlorperazine), and trazodone hydrochloride (compared with paroxetine). In contrast, caffeine was linked to a higher risk of colorectal cancer when compared with finasteride, and to a higher risk of breast cancer when compared with fluoxetine.

A combined analysis across studies identified six drugs that were significantly associated with cancer risk. One of these, acetazolamide, was associated with a reduced risk of colorectal cancer. Overall, this study uncovers previously unknown protein biomarkers and potential drug targets across six major cancer types and highlights several already approved drugs that may have promise for cancer prevention.

Figure 1. Overview of the analytical framework

(A) An illustration depicting the identification of proteins associated with the risk of the six major cancers: breast, lung, colorectal, ovarian, pancreatic, and prostate. Population-based proteomics data (for pQTLs) and GWAS data resources (for identifying lead variants) utilized in this study are shown on the left. Meta-analyses of cis-pQTLs from ARIC and deCODE, conducted through the SOMAscan platform, were combined with pQTL results from the UKB-PPP to identify potential risk proteins, as depicted in the middle images. Colocalization analyses between GWAS summary statistics and cis-pQTLs were performed to identify cancer risk proteins with high confidence, as illustrated on the right.

(B) The proteins with evidence of colocalization were annotated based on drug-protein information from four databases: DrugBank, ChEMBL, TTD, and Open Targets.

(C) The framework for evaluating the effects of drugs approved for indications on cancer risk. The inverse probability of treatment weighting (IPTW) framework was utilized to construct emulations of treated-control drug trials based on millions of patients’ electronic health records stored at VUMC SD (left). In these emulations, a Cox proportional hazards model was fitted for each trial to assess the hazard ratio (HR) of cancer risk between the treated focal drug and the control drug (right).
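
For readers unfamiliar with IPTW, the sketch below shows the textbook weight calculation: a patient on the focal drug is weighted by one over their propensity score, and a patient on the control drug by one over one minus that score. This is a generic illustration, not the study's pipeline. The propensity scores, ages, and function names are hypothetical, and in practice the scores would come from a fitted model of treatment given patient covariates.

```python
def iptw_weights(treated, propensity):
    """Unstabilized IPTW weights: 1/e for treated patients, 1/(1 - e) for controls."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        weights.append(1.0 / e if t else 1.0 / (1.0 - e))
    return weights


def weighted_mean(values, weights):
    """Weighted average, useful for checking covariate balance after weighting."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical cohort: 1 = focal drug, 0 = control drug.
treated = [1, 1, 0, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2, 0.2]   # assumed P(focal drug | covariates)
age = [70, 50, 50, 40, 40]

w = iptw_weights(treated, propensity)
mean_age_treated = weighted_mean(
    [a for a, t in zip(age, treated) if t], [x for x, t in zip(w, treated) if t])
mean_age_control = weighted_mean(
    [a for a, t in zip(age, treated) if not t], [x for x, t in zip(w, treated) if not t])
print(w, mean_age_treated, mean_age_control)
```

The weights would then be passed to a weighted Cox proportional hazards model to estimate the hazard ratio; that step needs a survival analysis library and is not shown here.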

Embedding Sustainability into the Imaging and Care of Patients with Cancer

waddelma — Thu, 23 Oct 2025 19:21:12 +0000

Northrup, Benjamin E.; Hanneman, Kate A.; Lichter, Katie E.; Rockall, Andrea G.; Zigmund, Beth; D’Anna, Gennaro; Zhang, Zhuoli; Osborne, Joseph R.; Silva, Genevieve S.; Waeldner, Kathleen; & Omary, Reed A. (2025). Radiology: Imaging Cancer, 7(6), e250054.

As climate change becomes more serious, it’s important to consider how healthcare, including cancer care, affects the environment. Imaging tests and image-guided procedures—such as CT scans, MRIs, and targeted treatments—play a major role in diagnosing and managing cancer. These technologies have improved patient outcomes, but they also produce carbon emissions and medical waste. This review looks at how cancer imaging and related treatments impact the environment, discusses what sustainable cancer care could look like, and suggests practical ways to reduce the environmental footprint of cancer care while continuing to provide high-quality treatment.

Figure 1: Sources of greenhouse gas (GHG) emissions in cancer imaging categorized by scope. Scope 1 includes direct emissions, those from sources that an organization owns or controls directly. Scope 2 includes indirect emissions, those that come from where an organization’s energy is produced. Scope 3 includes all sources not covered in scope 1 or 2, including those created by an organization’s value chain.

3D collagen high-throughput screen identifies drugs that induce epithelial polarity and enhance chemotherapy response in colorectal cancer

waddelma — Fri, 26 Sep 2025 19:50:02 +0000

Harmych, Sarah J., Hasaka, Thomas P., Sievers, Chelsie K., Kang, Seung-woo, Ramirez, Marisol Adelina, Jones, Vivian Truong, Zhao, Zhiguo, Kovtun, Oleg, Wahoski, Claudia C., & Liu, Qi. (2025). Communications Biology, 8(1), 1261.

One of the hallmarks of cancer is the loss of cell polarity—the way cells normally organize themselves. In colorectal cancer (CRC), this loss is tied to a process called epithelial-to-mesenchymal transition (EMT), which affects how aggressive the cancer is and how well treatments work. But the mechanisms behind these EMT-related changes, and the drugs that might reverse them, are not well studied. This is partly because traditional lab tests either cannot capture changes in cell shape (in 2D cultures) or are not reliable enough in 3D cultures.

To address this, we created a high-throughput screening method using 3D collagen cultures of CRC cells to study changes in colony shape. With this approach, we identified several FDA-approved drugs that helped CRC cells regain more normal, epithelial-like features. One of these drugs, azithromycin, made the cancer cell colonies rounder, improved the placement of key proteins (E-cadherin and ZO-1) that keep cells connected, triggered gene expression changes consistent with reversing EMT, and made the cancer cells more sensitive to the chemotherapy drug irinotecan.

Looking at patient data, we also found that CRC patients who received azithromycin while being treated with irinotecan had better 5-year survival rates compared to those who received chemotherapy alone. These findings show the value of studying cancer cell shape in 3D screens and point to new possibilities for drug repurposing and combination therapies in colorectal cancer.

Fig. 1: High-throughput drug screen using 3D type I collagen cultures identifies morphological clusters.

(A) Schematic of plating, treatment, and analysis of drug screen. Chilled 384-well plates were stamped with a 5 µL bottom layer of type I collagen and allowed to solidify. 1000 SC cells in 10 µL collagen were then added and allowed to solidify. Next, 1059 compounds from an FDA-approved drug library were added at three concentrations and incubated for 8 days. Wells were then stained with Calcein AM, imaged, and analyzed to assess colony morphology. Created in BioRender. Harmych, S. (2025). (B) Pie graph depicting functional targets of drugs tested during screen. (C) Principal component analysis (PCA) plot of morphological characteristics of colonies from high-throughput drug screen. Morphological characteristics were obtained using the InCarta software. Each dot represents a compound-treated well from the screen. Distinct Clusters (A−E) separating from the central cluster were visually assessed and shown on the PCA plot. (D) Heatmap of PC loadings of each variable on PC1 and PC2. Brightfield and Calcein AM indicate which image was used by InCarta software to generate the value. (E) Representative whole well images of morphological clusters identified in the screen. Boxes correspond to images shared in Fig. 1F. Brightfield and fluorescent (Calcein AM) images taken with ImageXpress confocal HT.ai automated high-content imaging system at 4x magnification. (Scale bars: 500 µm). (F) Insets of representative wells shown in Fig. 1E. (Scale bars: 100 µm).

Curating retrospective multimodal and longitudinal data for community cohorts at risk for lung cancer

waddelma — Mon, 28 Jul 2025 15:01:15 +0000

Li, Thomas Z., Xu, Kaiwen, Chada, Neil C., Chen, Heidi, Knight, Michael, Antic, Sanja, Sandler, Kim L., Maldonado, Fabien, Landman, Bennett A., & Lasko, Thomas A. (2025). *Cancer Biomarkers: Section A of Disease Markers, 42*(1).

Large community health studies are valuable tools for understanding lung cancer, helping researchers explore risk factors and build models to predict who might develop the disease. To make the most of this data, a reliable method is needed to identify cases of lung cancer and lung spots known as pulmonary nodules, and to link various types of health information collected over time from electronic health records (EHRs). In this study, researchers used medical coding systems, including SNOMED and ICD codes, to create rules for identifying patients with lung cancer or pulmonary nodules in EHR data. They also applied clinical expertise to determine appropriate timeframes for gathering related health and imaging data. Using this approach, they curated three patient groups, or cohorts, with pulmonary nodules and repeated imaging records from Vanderbilt University Medical Center.

The method proved highly accurate, correctly identifying lung cancer in 93% of cases (sensitivity) and correctly identifying those without lung cancer in 99.6% of cases (specificity). Its predictions were also reliable: patients it flagged as having lung cancer usually did have it, and patients it flagged as cancer-free usually were (high positive and negative predictive values). This study presents an effective and scalable strategy for organizing long-term, multi-type health data about individuals at risk for lung cancer, using routinely collected information from medical records.
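
The four measures mentioned above all come from the same two-by-two table of rule-based labels against chart-review truth. The sketch below shows the arithmetic. The counts are hypothetical, chosen only so that sensitivity and specificity match the reported 93% and 99.6%; the resulting predictive values are therefore illustrative and are not the study's results.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # share of true cancers that were flagged
        "specificity": tn / (tn + fp),   # share of non-cancers correctly cleared
        "ppv": tp / (tp + fp),           # share of flagged patients who had cancer
        "npv": tn / (tn + fn),           # share of cleared patients without cancer
    }


# Hypothetical counts (not from the paper): 100 cancers, 1,000 non-cancers.
metrics = diagnostic_metrics(tp=93, fp=4, tn=996, fn=7)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on how common lung cancer is in the cohort, so they can shift when the same rules are applied to a different population.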

Figure 1. Archives linking EHRs to imaging allowed for the selection of subjects via ICD rules. Scans that were low quality and data that did not fall within observation windows were excluded. VU-SPN: subjects with no cancer history prior to an SPN code. VU-LI-SPN: subjects in VU-SPN with imaging. VU-LI-Incidence: subjects with imaging.

Thymomas and Thymic Carcinomas, Version 2.2025

waddelma — Mon, 28 Jul 2025 14:05:25 +0000

Riely, Gregory J., Wood, Douglas E., Loo, Billy W., Aisner, Dara L., Akerley, Wallace, Bauman, Jessica R., Bharat, Ankit, Chang, Joe Y., Chirieac, Lucian R., DeCamp, Malcolm, Desai, Aakash, Dilling, Thomas J., Dowell, Jonathan, Durm, Gregory A., Gettinger, Scott, Grotz, Travis E., Gubens, Matthew A., Juloori, Aditya, Lackner, Rudy P., Lanuti, Michael, Lin, Jules, Lovly, Christine M., Maldonado, Fabien, Morgensztern, Daniel, Mullikin, Trey C., Ng, Thomas, Owen, Dawn, Owen, Dwight H., Patel, Sandip P., Patil, Tejas, Polanco, Patricio M., Riess, Jonathan, Shapiro, Theresa A., Singh, Aditi P., Stevenson, James, Tam, Alda, Tanvetyanon, Tawee, Yanagawa, Jane, Yang, Stephen C., Yau, Edwin, Gregory, Kristina, & Hang, Lisa. (2025). *JNCCN Journal of the National Comprehensive Cancer Network, 23*(6), 255-269.

Thymoma and thymic carcinoma are rare cancers that start in the cells of the thymus, a small organ in the chest. Among the unusual tumors that can grow in the front part of the chest (called the anterior mediastinum), thymomas are the most common, with about 2 cases per million people each year in the U.S. Thymic carcinomas are even rarer, with about 0.48 cases per million annually. Thymomas usually stay in the area where they start, although they can sometimes spread. Thymic carcinomas, on the other hand, are more aggressive and are often found at a more advanced stage, sometimes having already spread to other parts of the body at the time of diagnosis. The outlook for these cancers differs: about 90% of people with thymoma are still alive five years after diagnosis, while about 60% of people with thymic carcinoma survive that long. These NCCN Clinical Practice Guidelines in Oncology give doctors expert recommendations for how to evaluate and treat people with thymoma or thymic carcinoma. First published in 2007, these guidelines are updated every year by a team of experts, including members from the NCCN Guidelines Panel for Non–Small Cell Lung Cancer.

Figure 1.

THYM-1. NCCN Clinical Practice Guidelines in Oncology for Thymomas and Thymic Carcinomas, Version 2.2025.

Citation: Journal of the National Comprehensive Cancer Network 23, 6

Radiomic ‘Stress Test’: exploration of a deep learning radiomic model in a high-risk prospective lung nodule cohort

waddelma — Mon, 28 Jul 2025 13:48:27 +0000

Xiao, David, Forero, Yency, Kammer, Michael N., Chen, Heidi, Paez, Rafael, Heideman, Brent E., Owoseeni, Oreoluwa, Johnson, Ian, Deppen, Stephen A., Grogan, Eric L., & Maldonado, Fabien. (2025). *BMJ Open Respiratory Research, 12*(1), e002687.

Lung nodules—small spots that appear on lung scans—are often biopsied to check for cancer. However, many of these nodules turn out to be harmless. The Lung Cancer Prediction (LCP) score is a deep learning tool that analyzes CT scans and has been shown to work well in identifying whether a nodule might be cancerous when it’s found by chance. But it hasn’t yet been tested in situations where doctors have already recommended a biopsy.

In this study, researchers looked at lung nodules that had already been biopsied at a large medical center. They used the Mayo Clinic’s traditional prediction model to estimate how likely each nodule was to be cancerous, dividing them into low, medium, or high risk using guidelines from the British Thoracic Society. Then, they compared how well three different models could predict cancer: the Mayo model, the LCP radiomic model, and a new integrated model that combined the LCP score with key clinical details like the patient’s age, whether the nodule had spiky edges (spiculation), and whether it was located in the upper part of the lung.

The study included 321 nodules in total: 196 cancerous and 125 benign (non-cancerous). The Mayo model had an area under the ROC curve (AUC, a measure of how well a model separates cancerous from benign nodules) of 0.69 and the LCP model had a similar AUC of 0.67, but the integrated model performed best with an AUC of 0.75. It also had a better F1 score, which balances how well the model correctly identifies cancer and avoids false alarms. Importantly, the integrated model correctly reclassified 8 benign nodules from medium to low risk, meaning those patients might have avoided a biopsy, and no cancer cases were mistakenly downgraded.
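
As a rough illustration of the two metrics in this paragraph, the sketch below computes an AUC and an F1 score on a tiny made-up set of nodules. The AUC is calculated as the fraction of cancer/benign pairs in which the cancerous nodule receives the higher score, counting ties as half, which is equivalent to the area under the ROC curve. The labels, scores, and 0.5 cut-off are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical nodules: 1 = cancer, 0 = benign, with model risk scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed cut-off

auc = roc_auc(labels, scores)
f1 = f1_score(labels, predictions)
print(round(auc, 3), round(f1, 3))
```

An AUC of 0.5 corresponds to guessing and 1.0 to perfect separation, which puts the reported values of 0.67 to 0.75 in context. AUC does not depend on a cut-off, whereas F1 does.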

In summary, combining the LCP deep learning score with a few key patient details improved the ability to predict whether lung nodules were cancerous. This approach may help reduce the number of people who undergo unnecessary, invasive lung biopsies.

Figure 1

Receiver operating characteristic (ROC) curves for all models. AUC, area under the receiver operating characteristic curve; LCP, Lung Cancer Prediction score; Mayo, Mayo model; Mayo Select, Mayo model excluding all radiographic variables; ROC, receiver operating characteristic.

Optimizing Biomarker Models for Biologically Heterogeneous Cancers: A Nested Model Approach for Lung Cancer

waddelma — Wed, 21 May 2025 15:37:55 +0000

Woodhouse, Palina; Jackson, Laurel; Kammer, Michael N.; Godfrey, Caroline M.; Antic, Sanja; Zou, Yong; Meyers, Patrick; Gawel, Susan H.; Maldonado, Fabien; Grogan, Eric L.; Davis, Gerard J.; Deppen, Stephen A. Cancer Epidemiology, Biomarkers & Prevention 34, no. 5 (2025): 788–794.

Lung cancer comes in different forms, each with its own unique biology. This makes it hard to create reliable blood tests (called biomarkers) that can detect cancer early. Traditional methods for building these tests often don’t work well across all the different subtypes of lung cancer. This study tested a new approach, called a “nested biomarker model,” to see if it could better handle this complexity and improve early detection.

The study looked at 337 patients from two medical centers. Blood samples were collected and analyzed, and the researchers used advanced statistical methods to create the nested model. This model was designed to recognize the differences among various lung cancer subtypes and was compared to more traditional models.

The patients had a mix of cancerous and non-cancerous lung nodules, covering a range of lung cancer types. The new nested model performed similarly to well-known models like the one used at the Mayo Clinic. It was especially good at identifying small cell lung cancer, one of the more aggressive subtypes.

The study shows that the variety of lung cancer types makes it difficult to create a one-size-fits-all blood test. However, the nested model offers a promising new way to improve early cancer detection by taking this variety into account. More research with larger groups of patients is needed to confirm these results, but this approach could help build better tests for detecting different types of cancer early.

Figure 1.

A, AUCs of evaluated models in training data. AUCs with 95% CIs for all models evaluated in the training dataset, reported for all malignancy types and each subtype, are shown. B, AUCs of evaluated models in testing data. AUCs with 95% CIs for all models evaluated in the testing dataset, reported for all malignancy types and each subtype, are shown.