Recent advances in barcoding technologies have made it possible to reconstruct the lineage tree of a population of cells while simultaneously capturing each cell's transcriptomic profile. To fully leverage the resolution of these lineage-resolved single-cell RNA sequencing (scRNA-seq) datasets, however, new computational approaches are needed. A key challenge is moving gene expression analysis beyond pairwise comparisons between stages so that it captures the full hierarchical structure of the lineage tree, making it possible to detect expression patterns that follow, or deviate from, lineage relationships.
In a new study published in Nature Communications, researchers at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard introduce PORCELAN, a statistical framework that automatically detects gene expression patterns linked to lineage progression. This method provides a systematic way to study how gene expression and cell state memory evolve through cell divisions, offering new insights into processes such as cancer progression.
Decoding Gene Expression Through Lineage Trees
PORCELAN – short for Permutation, Optimization, and Representation learning-based single Cell gene Expression and Lineage ANalysis – combines representation learning with permutations of cells among the leaves of the lineage tree. Within this statistical framework, PORCELAN addresses three questions: How can lineage and gene expression information be captured jointly in cell representations? Which genes best reflect lineage relationships, and in which subtrees is this connection strongest? And to what extent does gene expression preserve the structure of the lineage tree at different resolutions?
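To make the permutation idea concrete, here is a minimal, illustrative sketch. It is not PORCELAN's implementation: it assumes a simple stand-in score, Moran's I autocorrelation with pair weights of 1 / (number of tree edges between two leaves), and a null model that shuffles one gene's expression values among the leaves while the tree stays fixed. The tree format (nested tuples) and all function names, including `lineage_autocorrelation_test`, are hypothetical. A small p-value suggests that closely related cells have more similar expression than chance would predict.

```python
import random
from itertools import combinations


def leaf_paths(tree, path=()):
    """Yield (leaf_name, path) pairs for a tree written as nested tuples.

    Each path is the sequence of child indices from the root to the leaf.
    """
    if isinstance(tree, tuple):
        for index, child in enumerate(tree):
            yield from leaf_paths(child, path + (index,))
    else:
        yield tree, path


def tree_distance(path_a, path_b):
    """Number of edges between two leaves, via their lowest common ancestor."""
    shared = 0
    for step_a, step_b in zip(path_a, path_b):
        if step_a != step_b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def morans_i(values, weights):
    """Moran's I over unordered leaf pairs {(i, j): w_ij}; 0.0 if undefined."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance_sum = sum(d * d for d in dev)
    total_weight = sum(weights.values())
    if variance_sum == 0 or total_weight == 0:
        return 0.0
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    return (n / total_weight) * cross / variance_sum


def lineage_autocorrelation_test(tree, expression, n_perm=999, seed=0):
    """Hypothetical helper: permutation test for one gene on one lineage tree.

    Returns (observed Moran's I, one-sided permutation p-value).
    """
    leaves = list(leaf_paths(tree))
    # Closer relatives get larger weights: w_ij = 1 / (edges between i and j).
    weights = {
        (i, j): 1.0 / tree_distance(leaves[i][1], leaves[j][1])
        for i, j in combinations(range(len(leaves)), 2)
    }
    values = [expression[name] for name, _ in leaves]
    observed = morans_i(values, weights)

    # Null model: expression values shuffled across leaves, tree held fixed.
    rng = random.Random(seed)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
    # Toy data: high expression in one clade, low in its sister clade.
    gene = {"a": 5.0, "b": 5.2, "c": 4.8, "d": 5.1,
            "e": 1.0, "f": 0.9, "g": 1.2, "h": 1.1}
    score, p_value = lineage_autocorrelation_test(tree, gene)
    print(f"Moran's I = {score:.3f}, permutation p = {p_value:.3f}")
```

With the toy data, the clade-restricted gene yields a positive score and a small p-value, whereas a pattern that alternates between sister cells would yield a negative score. Weighting pairs by inverse tree distance is one simple way to emphasize local structure in the tree; a real analysis would also need to handle many genes, multiple-testing correction, and subtree-level tests.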
The researchers validated PORCELAN using synthetic datasets and applied it to three biological systems with lineage-traced scRNA-seq data: lung cancer progression, mouse embryogenesis, and C. elegans development. In lung cancer, PORCELAN identified tumor cell subpopulations that contributed to metastases and pinpointed key genes associated with these transitions – many of which align with known cancer biomarkers and pathways. In developmental systems, the framework uncovered differences in how gene expression memory is maintained across cell divisions, highlighting contrasts between normal development and cancerous progression. These findings underscore the importance of lineage-resolved approaches in understanding fundamental biological processes.
A Flexible Tool for the Future
The study was led by Hannah Schlueter, a Schmidt Center graduate student and PhD student at MIT’s Laboratory for Information & Decision Systems (LIDS), in collaboration with corresponding author Caroline Uhler, Director of the Schmidt Center and Andrew (1956) and Erna Viterbi Professor of Engineering at MIT in the Department of Electrical Engineering and Computer Science (EECS) and the Institute for Data, Systems, and Society (IDSS).
“Our goal was to develop a method that is both rigorous and adaptable,” says Schlueter. “Because PORCELAN is modular, it can be applied to different data modalities, including lineage-resolved imaging data, by replacing the simpler tree-likeness score based on local autocorrelation, used for transcriptomic data, with a representation learning-based tree-likeness score. This flexibility makes it a powerful tool for studying how cellular identity is maintained and altered over time.”
As lineage tracing technologies continue to evolve, methods like PORCELAN highlight the critical role of rigorous statistical techniques in biological research. Bridging computational models with biological questions is central to the work of the Schmidt Center, which aims to drive discoveries that deepen our understanding of cellular biology, disease mechanisms, and potential therapeutic strategies.