Cancer evolution lays the groundwork for predictive oncology. Testing evolutionary metrics requires quantitative measurements in controlled clinical trials. We mapped genomic intratumor heterogeneity in locally advanced prostate cancer using 642 samples from 114 individuals enrolled in clinical trials with a 12-year median follow-up. We concomitantly assessed morphological heterogeneity using deep learning in 1,923 histological sections from 250 individuals. Genetic and morphological (Gleason) diversity were independent predictors of recurrence (genetic: hazard ratio (HR) = 3.12, 95% confidence interval (CI) 1.34–7.3; morphological: HR = 2.24, 95% CI 1.28–3.92). Combined, they identified a group with half the median time to recurrence. Spatial segregation of clones was also an independent marker of recurrence (HR = 2.3, 95% CI 1.11–4.8). We identified copy number changes associated with Gleason grade and found that chromosome 6p loss correlated with reduced immune infiltration. Matched profiling of relapses occurring decades after diagnosis confirmed that genomic instability is a driving force in prostate cancer progression. This study shows that combining genomics with artificial intelligence-aided histopathology can identify clinical biomarkers of evolution.
Reconstructing temporal cellular dynamics from static single-cell transcriptomics remains a major challenge. Methods based on RNA velocity are useful, but interpreting their results to learn new biology remains difficult, and their predictive power is limited. Here we propose NeuroVelo, a method that couples learning of an optimal linear projection with non-linear Neural Ordinary Differential Equations. Unlike current methods, it uses dynamical systems theory to model biological processes over time, so NeuroVelo can identify the genes and mechanisms that drive temporal cellular dynamics. We benchmark NeuroVelo against several state-of-the-art methods on single-cell datasets, demonstrating that it not only has high predictive power but is also superior to competing methods in identifying the mechanisms that drive cellular dynamics over time. We also show how the method can be used to infer, directly from the data, gene regulatory networks that drive cell fate.
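The pairing of a linear projection with a latent neural ODE can be illustrated in a few lines. This is a conceptual sketch, not NeuroVelo. The weight matrices W, A and b are hypothetical. The single tanh layer standing in for the neural vector field is an assumption, and so is the forward-Euler integrator. The point is that with a linear encoder each latent dimension is an explicit weighted sum of genes. Latent velocities can therefore be mapped back onto individual genes.

```python
import math

def project(W, x):
    """Linear encoder: latent z = W x (one row of W per latent dimension)."""
    return [sum(w_g * x_g for w_g, x_g in zip(row, x)) for row in W]

def vector_field(z, A, b):
    """Hypothetical one-layer neural vector field: dz/dt = tanh(A z + b)."""
    return [math.tanh(sum(a_ij * z_j for a_ij, z_j in zip(row, z)) + b_i)
            for row, b_i in zip(A, b)]

def integrate(z0, A, b, dt=0.01, n_steps=100):
    """Forward-Euler integration of the latent ODE; returns every visited state."""
    z = list(z0)
    trajectory = [list(z)]
    for _ in range(n_steps):
        dz = vector_field(z, A, b)
        z = [z_i + dt * dz_i for z_i, dz_i in zip(z, dz)]
        trajectory.append(z)
    return trajectory

def gene_velocity(W, dz):
    """Gene-space velocity of the reconstruction x_hat = W^T z.

    Assumes the decoder is W^T (e.g. W with orthonormal rows).
    """
    return [sum(W[k][g] * dz[k] for k in range(len(W))) for g in range(len(W[0]))]

# Illustrative example: 3 genes, 2 latent dimensions; gene 3 has zero loading.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
A = [[0.0, -1.0], [1.0, 0.0]]  # rotational latent dynamics (arbitrary choice)
b = [0.0, 0.0]
z0 = project(W, [0.5, 0.0, 2.0])
trajectory = integrate(z0, A, b)
velocity = gene_velocity(W, vector_field(trajectory[-1], A, b))
```

In this toy example, a gene with zero loading in W receives zero velocity whatever the latent dynamics do. This is the kind of traceability a linear projection provides.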
Immune system control is a major hurdle that cancer evolution must circumvent. The relative timing and evolutionary dynamics of subclones that have escaped immune control remain incompletely characterised, and how immune-mediated selection shapes the epigenome has received little attention. Here, we infer the genome- and epigenome-driven evolutionary dynamics of tumour-immune coevolution within primary colorectal cancers (CRCs). We use our existing multi-region, multi-omic CRC dataset, supplemented with high-resolution spatially resolved neoantigen sequencing data and highly multiplexed imaging of the tumour microenvironment (TME). Analysis of somatic chromatin accessibility alterations (SCAAs) reveals frequent somatic loss of accessibility at antigen-presenting genes and shows that SCAAs contribute to the silencing of neoantigens. We observe that strong immune escape and exclusion occur at the outset of CRC formation. Within tumours, including at the microscopic level of individual tumour glands, additional immune escape alterations have negligible consequences for the immunophenotype of cancer cells. Further minor immuno-editing occurs during local invasion and is associated with TME reorganisation, but this evolutionary bottleneck is relatively weak. Collectively, we show that immune evasion in CRC follows a “Big Bang” evolutionary pattern, whereby genetic, epigenetic and TME-driven immune evasion acquired by the time of transformation defines subsequent cancer-immune evolution.
High-throughput multi-omic molecular profiling allows biological systems to be probed at unprecedented resolution. However, the integration and interpretation of high-dimensional, sparse, and noisy multimodal datasets remain challenging. Deriving new biology with current methods is particularly difficult because they are not based on biological principles and instead focus exclusively on dimensionality reduction. Here we introduce MIDAA (Multiomic Integration with Deep Archetypal Analysis), a framework that combines archetypal analysis, an approach grounded in biological principles, with deep learning. Using the concept of archetypes, which are based on evolutionary trade-offs and Pareto optimality, MIDAA finds extreme data points that define the geometry of the latent space, preserving the complexity of biological interactions while retaining an interpretable output. We demonstrate that these extreme points indeed represent cellular programmes reflecting the underlying biology. Using real and simulated multi-omic data, we show that MIDAA outperforms state-of-the-art methods in identifying parsimonious, interpretable, and biologically relevant patterns.
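In classical archetypal analysis, each data point is approximated as a convex combination of archetypes. Its mixture weights are non-negative and sum to one, so every point sits inside the shape spanned by the extreme points. MIDAA learns archetypes with a deep network. The sketch below does something much narrower. It fixes a set of hypothetical archetypes and solves only for one point's weights, using projected gradient descent onto the probability simplex. It illustrates the convex-combination geometry, not MIDAA's training procedure.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumsum += u_i
        t = (cumsum - 1.0) / i
        if u_i - t > 0:
            theta = t  # the last index satisfying this condition fixes the threshold
    return [max(v_i - theta, 0.0) for v_i in v]

def archetype_weights(x, archetypes, n_iter=2000):
    """Simplex weights w minimising ||x - sum_k w_k * archetype_k||^2 (projected gradient)."""
    k, d = len(archetypes), len(x)
    # Conservative Lipschitz bound for the gradient: 2 * squared Frobenius norm.
    lip = 2.0 * sum(a_j * a_j for a in archetypes for a_j in a)
    step = 1.0 / lip if lip > 0 else 0.0
    w = [1.0 / k] * k
    for _ in range(n_iter):
        recon = [sum(w[i] * archetypes[i][j] for i in range(k)) for j in range(d)]
        resid = [recon[j] - x[j] for j in range(d)]
        grad = [2.0 * sum(resid[j] * archetypes[i][j] for j in range(d)) for i in range(k)]
        w = project_to_simplex([w_i - step * g_i for w_i, g_i in zip(w, grad)])
    return w

# Hypothetical archetypes: the corners of a triangle in a 2-D latent space.
corners = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = archetype_weights([0.3, 0.5], corners)
```

The point (0.3, 0.5) lies inside the triangle, so its weights converge to roughly 0.2, 0.3 and 0.5 on the three corners. Each weight can be read as how strongly that point expresses each extreme programme.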
In cancer, evolutionary forces select for clones that evade the immune system. Here we analyzed >10,000 primary tumors and 356 immune-checkpoint-treated metastases using immune dN/dS, the ratio of nonsynonymous to synonymous mutation rates in the immunopeptidome, to measure immune selection in cohorts and individuals. We classified tumors as immune edited when antigenic mutations were removed by negative selection, and as immune escaped when antigenicity was masked by aberrant immune modulation. Only in immune-edited tumors was immune predation linked to CD8 T cell infiltration. Immune-escaped metastases showed the best response to immunotherapy, whereas immune-edited patients did not benefit, suggesting a preexisting resistance mechanism. Similarly, in a longitudinal cohort, nivolumab treatment removed neoantigens exclusively in the immunopeptidome of non-immune-edited patients, the group with the best overall survival response. Our work uses dN/dS to differentiate between immune-edited and immune-escaped tumors, measuring potential antigenicity and ultimately helping to predict response to treatment.
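dN/dS compares two rates. The first is nonsynonymous mutations per nonsynonymous site, the second synonymous mutations per synonymous site. A value below 1 means nonsynonymous changes are depleted, the signature of negative selection. In this study that depletion, measured within the immunopeptidome, is what marks a tumour as immune edited. The sketch below is deliberately simplified. Published dN/dS tools also correct for the nucleotide context of mutations, whereas here the site counts are simply taken as given. All counts are hypothetical. The tolerance band in `interpret` is an arbitrary illustrative choice, not a threshold from the study.

```python
def dnds(n_nonsyn, sites_nonsyn, n_syn, sites_syn):
    """Site-normalised dN/dS: nonsynonymous mutations per nonsynonymous site
    divided by synonymous mutations per synonymous site (no context correction)."""
    if n_syn == 0 or sites_nonsyn == 0 or sites_syn == 0:
        raise ValueError("dN/dS is undefined without synonymous mutations and sites")
    return (n_nonsyn / sites_nonsyn) / (n_syn / sites_syn)

def interpret(ratio, tol=0.1):
    """Hypothetical reading of a ratio; the tol band is arbitrary, not from the study."""
    if ratio < 1.0 - tol:
        return "negative selection (consistent with immune editing)"
    if ratio > 1.0 + tol:
        return "positive selection"
    return "approximately neutral"

# Hypothetical immunopeptidome counts for one tumour.
ratio = dnds(n_nonsyn=30, sites_nonsyn=100, n_syn=20, sites_syn=50)
```

Here 0.3 nonsynonymous mutations per site against 0.4 synonymous mutations per site gives a ratio of about 0.75. That value would fall on the negatively selected side.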
Drug resistance is a largely unsolved problem in oncology. Despite the explanatory power of the genetic model of cancer initiation, most treatment resistance is not explained by genetics alone. Even when known resistance mutations are present, they are often found in only a small proportion of the cells in the tumour. So where is the cellular memory that leads to treatment failure? New evidence suggests that resistance is multi-factorial, resulting not only from heritable genetic and epigenetic changes but also from non-heritable phenotypic plasticity. However, cell plasticity has proven hard to study. It changes dynamically over time and must be distinguished from clonal evolution, in which cell phenotypes change because of Darwinian selective bottlenecks. Here we dissected the contribution of different evolutionary processes to drug resistance by perturbing patient-derived organoids with multiple drugs in sequence. We combined dense longitudinal tracking, single-cell multi-omics, evolutionary modelling, and machine learning archetypal analysis. We found that different drugs select for distinct subclones, an essential requirement for evolutionary therapy with sequential drug treatment. The data support a model in which cellular memory is encoded as a heritable configuration of the epigenome that nevertheless produces multiple transcriptional programmes. These programmes emerge in different proportions depending on the environment, giving rise to cellular plasticity. Epigenetically encoded programmes include reactivation of developmental genes and cell regeneration. A one-to-many (epi)genotype→phenotype map explains how clonal expansions and non-heritable phenotypic plasticity manifest together, including drug-tolerant states. This ensures the robustness of drug-resistant subclones, which can exhibit distinct phenotypes in changing environments while still preserving the cellular memory encoding their selective advantage.
Colorectal malignancies are a leading cause of cancer-related death and have been studied extensively at the genomic level. However, DNA mutations alone do not fully explain malignant transformation. Here we investigate the co-evolution of the genome and epigenome of colorectal tumours at single-clone resolution using spatial multi-omic profiling of individual glands. We collected 1,370 samples from 30 primary cancers and 8 concomitant adenomas and generated 1,207 chromatin accessibility profiles, 527 whole genomes and 297 whole transcriptomes. We found positive selection for DNA mutations in chromatin modifier genes and recurrent somatic chromatin accessibility alterations, including in regulatory regions of cancer driver genes that were otherwise devoid of genetic mutations. Genome-wide changes in accessibility at transcription factor binding sites involved CTCF, downregulation of interferon, and increased accessibility for the SOX and HOX transcription factor families, suggesting the involvement of developmental genes during tumourigenesis. Somatic chromatin accessibility alterations were heritable and distinguished adenomas from cancers. Mutational signature analysis showed that the epigenome in turn influences the accumulation of DNA mutations. This study provides a map of genetic and epigenetic tumour heterogeneity, with fundamental implications for understanding colorectal cancer biology.
Most cancer genomic data are generated from bulk samples composed of mixtures of cancer subpopulations and normal cells. Subclonal reconstruction methods based on machine learning aim to separate these subpopulations within a sample and infer their evolutionary history. However, current approaches are entirely data-driven and agnostic to evolutionary theory. We demonstrate that systematic errors occur in the analysis if evolution is not accounted for. These errors are exacerbated when the same tumor is sampled multiple times. We present a novel approach for model-based tumor subclonal reconstruction, called MOBSTER, which combines machine learning with theoretical population genetics. We validated it on public whole-genome sequencing data from 2,606 samples from different cohorts, on new data and on synthetic data. In single-sample, multiregion and longitudinal data alike, this method is more robust and accurate than current techniques. This approach minimizes the confounding factors of nonevolutionary methods, leading to more accurate recovery of the evolutionary history of human cancers.
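A model-based reconstruction in this spirit treats the variant allele frequency (VAF) spectrum as a mixture. It has a power-law (Pareto) tail for neutral passenger mutations, as population genetics predicts, plus Beta-distributed clusters for clonal and subclonal mutations. The sketch below shows only the E-step of such a mixture for a single VAF. The mixing weights, the tail scale and shape, and the Beta parameters are all hypothetical. This is not MOBSTER's implementation, and a full method would also have to fit the parameters and choose the number of clusters.

```python
import math

def pareto_pdf(x, scale, shape):
    """Type-I Pareto density; zero below the scale (e.g. a VAF detection threshold)."""
    if x < scale:
        return 0.0
    return shape * scale ** shape / x ** (shape + 1)

def beta_pdf(x, a, b):
    """Beta(a, b) density computed through log-gamma for numerical stability."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def responsibilities(vaf, weights, tail, clones):
    """E-step for one VAF: posterior membership of the neutral tail, then each Beta clone.

    weights: mixing proportions [tail, clone_1, ...]; tail: (scale, shape);
    clones: list of (a, b) Beta parameters.
    """
    dens = [weights[0] * pareto_pdf(vaf, *tail)]
    dens += [w * beta_pdf(vaf, a, b) for w, (a, b) in zip(weights[1:], clones)]
    total = sum(dens)
    if total == 0.0:
        raise ValueError("VAF has zero density under every component")
    return [d / total for d in dens]

# Hypothetical fit: half the mutations in a neutral tail, half in a clonal peak at VAF 0.5.
weights, tail, clones = [0.5, 0.5], (0.05, 1.0), [(50, 50)]
low = responsibilities(0.06, weights, tail, clones)
high = responsibilities(0.5, weights, tail, clones)
```

A low-frequency variant is assigned almost entirely to the neutral tail, while a variant at VAF 0.5 goes to the clonal cluster. A purely data-driven clustering can instead split the neutral tail into spurious subclones, the kind of systematic error the abstract describes.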
Aneuploidy, the gain or loss of whole or partial chromosomes, is a near-ubiquitous feature of cancer genomes, is prognostic, and is likely an important determinant of cancer cell biology. In colorectal cancer (CRC), aneuploidy is found in virtually all tumours, including precursor adenomas. However, the temporal evolutionary dynamics that select for aneuploidy remain largely uncharacterised. Here we perform genomic analysis of 755 samples from 167 patients with colorectal-derived neoplastic lesions. These lesions cross-sectionally represent the distinct stages of tumour evolution, and we also longitudinally track individual tumours through metastasis and treatment. Precancer lesions (adenomas) exhibited low levels of aneuploidy but high intra-tumour heterogeneity, whereas cancers had high aneuploidy but low heterogeneity. This indicates that progression passes through a genetic bottleneck that suppresses diversity. Individual CRC glands from the same tumour have similar karyotypes, despite prior evidence of ongoing instability at the cell level. Pseudo-stable aneuploid genomes were observed in three settings. These were metastatic lesions sampled from the liver and other organs, tumours sampled after chemotherapy or targeted therapy, and late recurrences detected many years after diagnosis of the primary tumour. Modelling indicates that these data are consistent with stabilising selection that ‘traps’ cancer cell genomes on a fitness peak defined by the specific pattern of aneuploidy. These data show that the initial progression of CRC requires traversal of a rugged fitness landscape. Subsequent genomic evolution, including metastatic dissemination and therapeutic resistance, is then constrained by stabilising selection.
Drug resistance mediated by clonal evolution is arguably the biggest problem in cancer therapy today. However, evolving resistance to one drug may come at the cost of decreased fecundity or increased sensitivity to another drug. These evolutionary trade-offs can be exploited using ‘evolutionary steering’ to control the tumour population and delay resistance. Yet recapitulating cancer evolutionary dynamics experimentally remains challenging. Here, we present an approach for evolutionary steering that combines four elements. These are single-cell barcoding, large populations of 10^8 to 10^9 cells grown without re-plating, longitudinal non-destructive monitoring of cancer clones, and mathematical modelling of tumour evolution. We demonstrate evolutionary steering in a lung cancer model, showing that it shifts the clonal composition of the tumour in our favour, leading to collateral sensitivity and proliferative costs. Genomic profiling revealed some of the mechanisms that drive evolved sensitivity. This approach allows the modelling of evolutionary steering strategies that can potentially control treatment resistance.
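The trade-off logic behind evolutionary steering can be illustrated with a toy deterministic model of two subpopulations. Everything here is a hypothetical assumption, including the drugs "A" and "B", their growth rates, the block length and the starting mix. Resistance to A is assumed to bring collateral sensitivity to B. The study itself relied on barcoded lineage tracking and fitted models, not on these numbers.

```python
import math

# Hypothetical net growth rates (per unit time) of two subpopulations under two drugs.
RATES = {
    "A": {"sensitive": -0.5, "resistant_A": 0.3},
    "B": {"sensitive": -0.1, "resistant_A": -0.6},  # collateral sensitivity to drug B
}

def simulate(schedule, population, block_length=5.0):
    """Deterministic exponential growth of each subpopulation through a drug schedule."""
    population = dict(population)  # copy so the caller's starting state is untouched
    for drug in schedule:
        for clone, rate in RATES[drug].items():
            population[clone] *= math.exp(rate * block_length)
    return population

start = {"sensitive": 0.99, "resistant_A": 0.01}  # relative population sizes
continuous = simulate(["A", "A", "A", "A"], start)
steered = simulate(["A", "B", "A", "B"], start)
```

Under continuous drug A, the A-resistant subpopulation takes over and the total population grows. Alternating with B exploits the assumed collateral sensitivity, so with these made-up rates the steered schedule ends with a much smaller population.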
Cancers accumulate mutations that generate neoantigens, novel peptides that elicit an immune response, and these mutations consequently undergo evolutionary selection. Here we establish how negative selection shapes the clonality of neoantigens in a growing cancer by constructing a mathematical model of neoantigen evolution. The model predicts that, without immune escape, tumor neoantigens are either clonal or at low frequency, and that hypermutated tumors can only become established after the evolution of immune escape. Moreover, the site frequency spectrum of somatic variants under negative selection appears more neutral as the strength of negative selection increases, which is consistent with classical neutral theory. These predictions are corroborated by analysis of neoantigen frequencies and immune escape in exome and RNA sequencing data from 879 colon, stomach and endometrial cancers.
Subclonal architectures are prevalent across cancer types. However, the temporal evolutionary dynamics that produce tumor subclones remain unknown. Here we measure clone dynamics in human cancers by applying computational modeling of subclonal selection and theoretical population genetics to high-throughput sequencing data. Our method determines the detectable subclonal architecture of tumor samples and simultaneously measures the selective advantage and time of appearance of each subclone. We demonstrate the accuracy of our approach and the extent to which evolutionary dynamics are recorded in the genome. We applied the method to high-depth sequencing data from breast, gastric, blood, colon and lung cancer samples, as well as metastatic deposits. When detectable subclones under selection were present, they consistently emerged early during tumor growth and had a large fitness advantage (>20%). Our quantitative framework provides new insight into the evolutionary trajectories of human cancers and enables predictive measurements in individual tumors from widely available sequencing data.
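A deliberately simplified deterministic model shows why a subclone's frequency, together with its time of appearance, pins down its selective advantage. The assumptions, labelled here and in the code, are as follows. The ancestral population grows from one cell at rate λ (`lam`). A subclone founded by one cell at time t1 grows at λ(1 + s). F is the subclone's share of cells at time T. Then ln(F/(1 − F)) = λs(T − t1) − λt1, which can be solved for s. The framework described above is stochastic and fitted to sequencing data; this toy only illustrates the underlying relationship.

```python
import math

def subclone_fraction(s, t1, T, lam=1.0):
    """Toy deterministic model: the ancestral population grows from one cell at rate lam;
    a subclone founded by one cell at time t1 grows at rate lam * (1 + s).
    Returns the subclone's share of all cells at time T."""
    subclone = math.exp(lam * (1 + s) * (T - t1))
    ancestral = math.exp(lam * T)
    return subclone / (subclone + ancestral)

def selective_advantage(F, t1, T, lam=1.0):
    """Invert subclone_fraction: ln(F / (1 - F)) = lam*s*(T - t1) - lam*t1, solved for s."""
    return (math.log(F / (1 - F)) + lam * t1) / (lam * (T - t1))

F = subclone_fraction(s=0.5, t1=5.0, T=20.0)
s_hat = selective_advantage(F, t1=5.0, T=20.0)
```

The round trip recovers s = 0.5, a 50% fitness advantage in this toy parametrisation.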
Sequential profiling of plasma cell-free DNA (cfDNA) holds immense promise for early detection of disease progression. However, how to exploit the predictive power of cfDNA as a liquid biopsy in the clinic remains unclear. RAS pathway aberrations can be tracked in cfDNA to monitor resistance to anti-EGFR monoclonal antibodies in patients with metastatic colorectal cancer. In this prospective phase II clinical trial of single-agent cetuximab in RAS wild-type patients, we combine genomic profiling of serial cfDNA and matched sequential tissue biopsies with imaging and mathematical modeling of cancer evolution. We show that a significant proportion of patients defined as RAS wild-type by diagnostic tissue analysis harbor RAS pathway aberrations in pretreatment cfDNA and do not, in fact, benefit from EGFR inhibition. We demonstrate that primary and acquired resistance to cetuximab are often polyclonal, and that these dynamics can be observed in both tissue and plasma. Furthermore, evolutionary modeling combined with frequent serial sampling of cfDNA allows prediction of the expected time to treatment failure in individual patients. This study demonstrates how integrating frequently sampled longitudinal liquid biopsies with a mathematical framework of tumor evolution allows individualized quantitative forecasting of progression, providing novel opportunities for adaptive personalized therapies.
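One simple way to turn serial cfDNA measurements into a forecast is to fit a log-linear trend to a resistance-associated mutant signal and extrapolate it to a failure threshold. This assumes purely exponential growth of a resistant subpopulation. The threshold, the sampling days and the exponential assumption are all hypothetical. The trial's evolutionary model was fitted to cfDNA and imaging and is not reproduced by this sketch.

```python
import math

def fit_log_linear(times, values):
    """Least-squares fit of log(value) = log_v0 + r * t; returns (log_v0, r)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    r = sxy / sxx
    return y_mean - r * t_mean, r

def time_to_threshold(times, values, threshold):
    """Extrapolate when an exponentially growing signal reaches threshold."""
    log_v0, r = fit_log_linear(times, values)
    if r <= 0:
        return math.inf  # the signal is not growing, so no crossing is predicted
    return (math.log(threshold) - log_v0) / r

# Hypothetical serial samples (days) of a resistance-associated mutant fraction in cfDNA.
days = [0.0, 14.0, 28.0, 42.0]
fractions = [0.01 * math.exp(0.05 * d) for d in days]
predicted_day = time_to_threshold(days, fractions, threshold=0.2)
```

With noiseless data growing at 5% per day from 0.01, the predicted crossing of 0.2 falls at ln(20)/0.05, about day 60. Frequent sampling matters because the growth-rate estimate, and hence the forecast, depends on having several points along the trend.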
Recurrent successions of genomic changes, both within and between patients, reflect repeated evolutionary processes that are valuable for anticipating cancer progression. Multi-region sequencing allows the temporal order of some genomic changes in a tumor to be inferred, but robustly identifying repeated evolution across patients remains a challenge. We developed a machine-learning method based on transfer learning that overcomes the stochastic effects of cancer evolution and noise in the data, and that identifies hidden evolutionary patterns in cancer cohorts. When applied to multi-region sequencing datasets from lung, breast, renal, and colorectal cancer (768 samples from 178 patients), our method detected repeated evolutionary trajectories in subgroups of patients, which were reproduced in single-sample cohorts (n = 2,935). Our method provides a means of classifying patients on the basis of how their tumor evolved, with implications for anticipating disease progression.
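Repeated evolution can be pictured as the same ordered pairs of driver events recurring across patients. The sketch below simply counts such ordered pairs, given per-patient orderings that are already known. It does not implement the transfer-learning step that the method above uses to infer those orderings from noisy multi-region data. The patient IDs and driver orderings are hypothetical.

```python
from collections import Counter

def recurrent_edges(trajectories, min_patients=2):
    """Count ordered driver pairs (earlier, later) that recur in >= min_patients patients."""
    counts = Counter()
    for edges in trajectories.values():
        counts.update(set(edges))  # each edge counts at most once per patient
    return {edge: n for edge, n in counts.items() if n >= min_patients}

# Hypothetical per-patient orderings of driver events (earlier -> later).
trajectories = {
    "P1": [("APC", "KRAS"), ("KRAS", "TP53")],
    "P2": [("APC", "KRAS"), ("APC", "TP53")],
    "P3": [("APC", "KRAS"), ("KRAS", "TP53")],
}
shared = recurrent_edges(trajectories)
```

Here the ordering APC before KRAS recurs in all three hypothetical patients, and KRAS before TP53 in two. Patients sharing such edges would be candidates for the same evolutionary subgroup.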
Despite extraordinary efforts to profile cancer genomes, interpreting the vast amount of genomic data in the light of cancer evolution remains challenging. Here we demonstrate that neutral tumor evolution results in a power-law distribution of the mutant allele frequencies reported by next-generation sequencing of bulk tumor samples. We find that the neutral power law fits 323 of 904 cancers, from 14 cancer types and different cohorts, with high precision. In malignancies identified as evolving neutrally, all clonal selection seemingly occurred before the onset of cancer growth rather than in later-arising subclones. This resulted in numerous passenger mutations that are responsible for intratumoral heterogeneity. Reanalyzing cancer sequencing data within the neutral framework allowed the measurement, in each patient, of both the in vivo mutation rate and the order and timing of mutations. This result provides a new way to interpret existing cancer genomic data and to discriminate between functional and non-functional intratumoral heterogeneity.
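Under neutral growth, the cumulative number of subclonal mutations M(f) with allele frequency at least f (up to fmax) follows M(f) = (μ/β)(1/f − 1/fmax). In our reading of the model, μ is the mutation rate and β the fraction of effective cell divisions. The power law can therefore be tested as a linear regression of M(f) on 1/f − 1/fmax. The sketch below is a minimal version of that test. The 0.12 to 0.24 frequency window, the grid size and the use of R² as the goodness-of-fit statistic are assumptions chosen for illustration, not a restatement of the published pipeline.

```python
def neutral_fit(vafs, fmin=0.12, fmax=0.24, n_grid=50):
    """Regress the cumulative mutation count M(f) on x = 1/f - 1/fmax through the origin.

    Under neutral growth M(f) = (mu / beta) * x, so the slope estimates mu / beta
    and R^2 measures how well the neutral power law fits. Returns (slope, r_squared).
    """
    grid = [fmin + (fmax - fmin) * j / (n_grid - 1) for j in range(n_grid)]
    xs = [1.0 / f - 1.0 / fmax for f in grid]
    # M(f): number of mutations with allele frequency between f and fmax.
    ys = [sum(1 for v in vafs if f <= v <= fmax) for f in grid]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return slope, 1.0 - ss_res / ss_tot

# Synthetic VAFs placed exactly on a neutral spectrum with mu / beta = 50:
# the i-th highest subclonal frequency satisfies i = 50 * (1/f_i - 1/fmax).
vafs = [1.0 / (i / 50.0 + 1.0 / 0.24) for i in range(1, 209)]
slope, r_squared = neutral_fit(vafs)
```

On these synthetic data the slope comes out close to 50 and R² close to 1. A spectrum distorted by a selected subclone would show a poorer fit.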
What happens in early, still undetectable human malignancies is unknown because direct observation is impractical. Here we present and validate a ‘Big Bang’ model. In this model, tumors grow predominantly as a single expansion that produces numerous intermixed subclones not subject to stringent selection. Both public (clonal) and most detectable private (subclonal) alterations arise early during growth. Genomic profiling of 349 individual glands from 15 colorectal tumors showed an absence of selective sweeps, uniformly high intratumoral heterogeneity (ITH) and subclone mixing in distant regions, as postulated by our model. We also verified the prediction that most detectable ITH originates from early private alterations rather than from later clonal expansions, thus exposing the profile of the primordial tumor. Moreover, some tumors appear ‘born to be bad’, with subclone mixing indicative of early malignant potential. This new model provides a quantitative framework for interpreting tumor growth dynamics and the origins of ITH, with important clinical implications.