Working Groups

What are Working Groups?

A working group is a collaborative assembly of experts from diverse and complementary fields who unite to investigate specific scientific questions or challenges. At the National Synthesis Center for Emergence in the Molecular and Cellular Sciences (NCEMS), these groups are pivotal to the center’s mission, with plans to support 34 working groups over five years. Each group embarks on a two-year synthesis project, defining their research objectives and receiving comprehensive support from NCEMS. This support includes access to staff scientists, advanced cyberinfrastructure, travel funding, and administrative assistance, all designed to facilitate cross-disciplinary collaboration. Leaders of these groups participate in specialized training to enhance team science and leadership skills. While most interactions occur virtually, an initial in-person meeting is convened to establish a strong foundation. The composition of these groups spans various disciplines, such as bioinformatics, systems biology, physics, and data science, fostering a rich environment for innovative synthesis. Committed to the principles of open and reproducible science, NCEMS ensures that all research outputs are transparent and publicly accessible, reflecting its dedication to advancing collaborative and impactful scientific endeavors.

2026 Working Groups

Predicting Cell Fate and State from RNA and Protein Spatial Maps

Project Leads:

Jean Fan

Tai-Yen Chen

Steve Pressé

Working Group Members: Kalen Clifton, Ayush Saurabh, Guangjie Yan

Cell states and fates are defined not only by molecular abundance but also by their subcellular organization. Advancements in high-resolution spatial omics technologies that profile high-plex RNA (Xenium/CosMx) and protein (CODEX) provide quantifications of molecular abundances at subcellular resolution but remain difficult to integrate into unified models for predicting cell fate or state. We will build an open synthesis framework that registers adjacent-section spatial RNA and protein images at near-cellular resolution using diffeomorphic alignment (STalign), harmonizes RNA features into protein-scale rasters (SEraster) to create one-to-one patch correspondence, and quantifies spatial molecular features including membrane and nuclear proximity, polarity, and local density. Using these paired measurements, we will train AI/ML RNA-to-protein predictors and learn interpretable, low-dimensional biophysical fingerprints of cell state that capture cross-modal spatial structure beyond abundance-only baselines. We will validate on a paired Xenium/CosMx-CODEX cancer atlas and curated public cohorts using rigorous controls, and release code, models, and harmonized datasets under permissive licenses with containerized, reproducible workflows and public snapshots to enable community-scale discovery. Starting in August 2026.

A Unified Framework for Analyses of Proteome Turnover Dynamics

Project Leads:

Sina Ghaemmaghami

Edward Lau

Christine Vogel

Working Group Members: Anushka Jain, Calvin K. Voong

Proteins vary widely in cellular lifetime, and this variation is fundamental to biological regulation and homeostasis, yet the relative contributions of known turnover mechanisms within and across species remain unclear. Recent advances in mass spectrometry have produced global protein stability datasets, but they are fragmented across heterogeneous studies with inconsistent metadata. This Working Group will harmonize and integrate public protein turnover datasets across four representative species spanning two kingdoms (E. coli, S. cerevisiae, M. musculus, and H. sapiens), using consistent reanalysis workflows and unified metadata standards. The resulting Protein Stability Database will enable comparative discovery of molecular and evolutionary principles of stability, support analyses across cell types and organisms, and establish best practices and shared terminology for future turnover studies. The database will also serve as a foundational resource for AI/ML model training. Starting in April 2026.

Mapping the non-canonical human translatome and proteome

Project Leads:

Thomas Martínez

Marie Brunet

John Prensner

Working Group Members: Francis Bourassa, Jim Clauwaert, Miranda Kelly

The human genome was long thought to encode roughly 20,000 protein-coding genes, but emerging evidence suggests small open reading frames (smORFs) under 100–150 codons may add thousands of functional microproteins encoded in regions previously labeled non-coding. Despite growing examples of microproteins with important roles in processes like DNA repair, mitochondrial translation, and ER stress, their annotation remains incomplete because ribosome profiling detects many more smORF translation events than mass spectrometry detects stable microproteins. We will resolve this gap by performing a comprehensive, multi-modal analysis of all public human RNA-seq, Ribo-seq, and proteomics datasets to quantify which microproteins are identifiable by each method and to distinguish protein-coding translation from regulatory signals and other sample-dependent effects. This community-scale synthesis will refine the true size of the human proteome and map context-specific alternative protein expression relevant to human physiology and disease. Starting in June 2026.

Emergent Order Parameters of Allostery: An AI Framework for Predicting and Engineering Protein Regulation

Project Leads:

Banu Ozkan

Denise Okafor

Vincent Voelz

Working Group Members: Xingyu Chen, Sabab Hasan Khan, Nikkil Ramesh

Allostery powers catalysis, signaling, and gene regulation, but we still lack a quantitative definition for cross-protein comparison and prediction of how sequence changes rewire long-range control. We will build an AI-ready, physics-informed data resource by integrating molecular dynamics simulations, deep mutational scanning, and evolutionary information to compute dynamic coupling features that track how perturbations propagate through proteins. From these harmonized data we will derive interpretable allosteric order parameters including communication strength, directionality of information flow, and propagation timescale, and combine them into an “allostery score” for classification across proteins and species. Using this standardized representation, we will train machine-learning models to predict how substitutions shift global allosteric wiring and then design and experimentally test mutations that reprogram transcription-factor regulation. The result is a generalizable, predictive framework mapping sequence to allosteric control. Starting in August 2026.

Learning Sequence Conservation from Functional Constraints in Intrinsically Disordered Regions

Project Leads:

Jeetain Mittal

Andrea Soranno

Wenwei Zhang

Working Group Members: Jasmine Cubuk, Shiv Rekhi, Wangfei Yang

Intrinsically disordered regions (IDRs) are central to regulation and signaling, yet their evolutionary conservation is difficult to interpret because they evolve rapidly and lack stable structure, limiting functional annotation. Emerging evidence suggests that subsets of IDRs preserve conserved sequence features such as short linear motifs, charge patterning, and aromatic clustering that encode specificity through transient structure and dynamics. This Working Group will systematically identify the most conserved IDRs by integrating large-scale sequence alignments, molecular dynamics simulations, and public functional annotations from resources including UniProt, ELM, IDRome, and MaveDB. The outcome will be a curated, open-access database that links evolutionary, structural, and functional signatures in IDRs, enabling AI/ML models of how disordered regions maintain specific and evolvable regulatory functions. Starting in December 2026.

Unraveling the Evolutionary Origin of Complex Topologies

Project Leads:

Ellinor Haglund

Sophie Jackson

Jason Parsley

Eric Rawdon

Joanna Sulkowska

Antonio Trovato

Ryan R. Cheng

Working Group Members: David Budean, Mateusz Fortunka, Virangi Hewage, Grace Orellana, Davide Revignas, Julia Sikorska

Proteins can adopt topologically entangled architectures such as knots, slipknots, and lasso-like motifs, but the evolutionary origins and functional consequences of these structures remain poorly characterized. This project will identify when topological changes first arose within protein families by integrating sequence, structure, and phylogenetic data from UniProt, PDB, AlphaFold, AlphaKnot, and AlphaLasso to reconstruct annotated trees that map the emergence of entanglement across evolutionary time. We will test whether specific sequence motifs, disulfide patterns, or intrinsically disordered regions are associated with the onset of entanglement, and whether entangled proteins are enriched in pathogenic or stress-adapted organisms or concentrate functional hotspots such as catalytic and allosteric sites. The outcome will be an open, data-driven framework to trace, predict, and ultimately design protein topologies, advancing understanding of how topology shapes protein evolution, stability, and function. Starting in March 2026.

2025 Working Groups

Transposable Elements and the Emergence of Genomic Innovation

Project Leads:

Shaun Mahony

Miriam Konkel

Working Group Members: Anne-Ruxandra Carvunis, Ed Chuong, Ross Hardison, David Ray, Ayshwarya Subramanian, Ting Wang, Justin Whalley, Ishika Verma

Transposable elements (TEs)—mobile DNA sequences—make up nearly half of the human genome and are key drivers of genetic innovation. However, their repetitive nature has made them difficult to study with traditional sequencing methods. Our project will harness thousands of publicly available genomic datasets, applying cutting-edge computational tools to analyze TEs with unprecedented accuracy. By synthesizing these data, we aim to uncover how TEs shape gene regulation, create new enhancers, and contribute to species diversity. This work will transform our understanding of genome evolution, providing both a public resource for TE analysis and insights into how genetic elements generate novel regulatory networks. Our findings could reshape how we think about genomic innovation and evolutionary adaptation. Started in April of 2025.

Energetic Origins of Connectivity Within Protein Interaction Networks

Project Leads:

Jonathan Schlebach

Shahid Mukhtar

Adrian Serohijos

Working Group Members: SK Ashif Akram, Xavier Catellanos-Girouard, Muskan Goel, Charles Kuntz, Eugene Shakhnovich, Yiqing Wang

Cells constantly manage unstable and disordered proteins, which are prone to misfolding or require specific partners to function. We propose that these proteins shape protein interaction networks by forming highly dynamic or essential connections. To test this hypothesis, we will analyze large-scale interactome datasets to uncover patterns in how unstable proteins contribute to network structure. By extending this analysis across diverse organisms, we aim to reveal general principles linking protein biophysics to network architecture. Using genetic algorithms, we will explore how protein stability constrains network evolution. This synthesis of existing data will provide new insights into molecular evolution, helping us understand how cells organize, adapt, and evolve through protein interactions—critical for fields from bioengineering to synthetic biology. Started in April of 2025.

Intelligent Metadata Compilation to Enhance Synthesis of Mass Spectrometry-Based Proteomics

Project Leads:

Wout Bittremieux

Iddo Friedberg

Shomir Wilson

Working Group Members: Tine Claeys, Eric Deutsch, Janne Heirman, Fatemeh Mirzadehsarcheshmeh, Harikrishnan Ramadasan, Yasset Perez-Riverol, Juan Antonio Vizcaino

Mass spectrometry-based proteomics generates vast datasets, yet inconsistent metadata makes reuse difficult. Metadata (details about experiments, samples, and data processing) is essential for ensuring the data are findable and reusable. Our project will develop automated workflows using bioinformatics and natural language processing to extract, standardize, and enrich metadata from raw files and publications. This new method will enhance PRIDE, the largest public proteomics repository, making datasets easier to search and reuse. A machine-learning community challenge will drive further innovation. These advancements will transform proteomics research by enabling large-scale data integration, AI-driven discoveries, and more transparent, reusable datasets. Our work will unlock the full potential of existing data, accelerating breakthroughs in systems biology, biomarker discovery, and precision medicine. Started in April of 2025.

Discovering New Protein-Protein Interactions Within Crosslinking Mass Spectrometry Data

Project Leads:

Stephen Fried

Yasset Perez-Riverol

Henning Hermjakob

Working Group Members: Josh Beale

Crosslinking mass spectrometry (XL-MS) is a cutting-edge technique that maps protein interactions in 3D, revealing how proteins assemble and function in living cells. However, valuable XL-MS data remains scattered across different studies with incomplete metadata, limiting its reuse. Our project will standardize, integrate, and enhance public XL-MS datasets from PRIDE, making them more accessible for hybrid structural modeling. By cross-validating these findings with AlphaFold3’s predicted structures, we will provide stronger, experimentally supported models of protein complexes. These insights will be integrated into the EBI Complex Portal, creating a powerful resource for biologists, structural researchers, and biophysicists to explore protein interactions with unprecedented accuracy. Started in December of 2025.

Identifying the Molecular Origins of Heat Resistance in Plants

Project Leads:

Andrei Smertenko

Carolyn Rasmussen

Dawn Nagel

Georgia Drakakaki

Stephen Ficklin

Working Group Members: Bilal Ahmas, Toshisangba Chuba, Kris Rapeta, Rachel Strout, Angel Zarobinksi

Unlike animals, plants cannot move to escape extreme heat, making them vulnerable to heat waves. Survival depends on their ability to complete their life cycle and produce seeds, a process requiring cell division. The final stage, cytokinesis, is particularly sensitive to heat stress in some species but resilient in others. However, the mechanisms behind this difference remain unknown. Our project will analyze publicly available omics datasets to compare cytokinesis under normal and high-temperature conditions across species. By identifying key genetic and molecular factors that enhance heat tolerance, we can predict cytokinetic bottlenecks and uncover evolutionary strategies for heat adaptation. These insights could guide breeding strategies for heat-resilient crops, improving agricultural sustainability in a warming climate. Started in December of 2025.

Oceans of Disorder: Elucidating the Role of Disordered Proteins in Cellular Adaptation

Project Leads:

Keren Lasker

Jerelle Joseph

Alex Holehouse

Working Group Members: Jordan Barrows, Olivia Carmo, Ananya Chakravarti, Kalli Kappel, Katherine Xue, Yumeng Zhang

Intrinsically disordered protein regions (IDRs) help organisms sense and adapt to environmental changes, yet their role across extreme habitats remains largely unexplored. The deep sea, with its high pressure, variable temperatures, and salinity extremes, provides a unique natural laboratory to explore these molecular sensors. Recently, advances in shotgun metagenomics have generated vast datasets capturing the genetic diversity of deep-sea life, offering an unprecedented opportunity to study these molecular adaptations. Our project will harness these datasets to systematically analyze how IDRs have evolved to support survival in extreme environments. By integrating bioinformatics, machine learning, and molecular simulations, we will create the first comprehensive map of IDR adaptations. These findings will advance our understanding of stress resilience and evolutionary biology as well as inspire biotechnological applications for extreme conditions. Started in March of 2025.

Mapping Bacterial Cell States Across Environments and Evolution

Project Leads:

Jeffrey Barrick

Jeremy Schmit

Valérie de Crécy-Lagard

Working Group Members: Evrim Fer, Nkrumah Grant, Betul Kacar, Pranesh Rao, Karl Thompson

Bacteria continuously adapt their molecular makeup to survive, yet our understanding of their cell states remains fragmented. Researchers typically study only a few strains under limited conditions, making it hard to compare findings across studies. Our project will use machine learning to synthesize bacterial gene expression data, creating the first atlas of bacterial cell states. Starting with E. coli, we will expand to wild strains and other bacteria, identifying key states like growth, starvation, biofilm formation, and dormancy. By analyzing gene expression patterns, we aim to uncover evolutionary conservation of cell states and develop a classifier to predict bacterial behavior from metatranscriptomic data. These insights could help reprogram bacteria for environmental remediation and prevent environmental harm. Started in April of 2025.

The Epigenetic Drivers of Neurodegenerative Processes in Human Brain Cells

Project Leads:

Bin Zhang

Longzhi Tan

Tamar Schlick

Dave Thirumalai

Justin Whalley

Working Group Members: Bill Noble, Stephanie Portillo, Guang Shi, Zilong Li, Sheng Wang, Ghulam Murtaza

Single-cell analyses have transformed neuroscience by revealing gene activity in brain cells, but studying gene expression and epigenetics separately provides only a partial picture. To bridge this gap, we will develop a multimodal AI model that integrates single-cell genomic datasets, uncovering how 3D genome architecture shapes gene expression. This approach will reveal hidden patterns in gene regulation and how different cell types control their functions. By applying this model to Alzheimer’s patient data, we aim to identify epigenetic drivers of neurodegenerative progression. This work will enhance our understanding of brain function, offering insights into fundamental biology that spans from the structure and function of the genome to the brain. Started in September of 2025.

Data-Driven Discovery of Regulatory Mechanisms and Cellular Resource Allocation via Multi-Modal Data Integration

Project Leads:

Elizabeth Brunk

Ferhat Ay

Vasant Honavar

William Noble

Working Group Members: Dante Bolzan, Ramana V Davuluri, Anupam Gautam, Natalia Kravtsova, Francesco Morandini, Shahid Mukhtar, Stephanie Portillo, Kriti Shukla, Pallavi Surana, Bishoy Wadie

Massive datasets from consortia like the Dependency Map and Cancer Cell Line Encyclopedia have profiled thousands of cell lines, yet these datasets remain fragmented, making it difficult to connect DNA, RNA, and proteins within the same cell. Our project will integrate multi-omics data from single-cell sequencing and imaging, focusing on well-characterized model cell lines to eliminate variability from different donors and tissues. By aligning gene regulation data across multiple molecular layers, we will build a comprehensive framework for understanding how cells regulate function and respond to perturbations. This benchmarking resource will improve machine learning models, network biology studies, and multi-omics research, accelerating discoveries in gene regulation and cellular adaptation. Started in December of 2025.

Protein Misfolding, Mutations and the Emergence of Disease Phenotypes

Project Leads:

Hyebin Song

James Stephenson

Working Group Member: Maria Fernanda Anglero Mendez

Proteins are essential for life, but when they misfold, they often fail to carry out their proper function. This loss of function has the potential to give rise to disease phenotypes. Our project explores whether proteins with entanglements in their native structure, which are more prone to misfolding, are linked to disease. By analyzing and synthesizing existing structural, sequence, and gene-disease association data, we aim to uncover hypothesized relationships between entangled proteins and disease. These insights could reveal previously unrecognized causes of disease and provide a new perspective on their molecular origins. This project will advance our fundamental understanding of the interplay between structure, function, and sequence – and how, in turn, this can lead to the emergence of disease. Started in July of 2024.

Protein Misfolding, Proteostasis, and Aging

Project Leads:

Ed O’Brien

Yang Jiang

Working Group Members: Sina Ghaemmaghami, Anushka Jain, Quyen Vu, Ian Sitarik

Entanglement-based protein misfolding is predicted to be widespread, yet its consequences for protein homeostasis – and its potential implications for aging – remain largely unquantified and unexplored. This working group will use high-throughput mass spectrometry data, including limited proteolysis and ubiquitin mass spectrometry, to investigate how prevalent this form of misfolding is in E. coli, the ability of chaperones to correct it, whether these misfolded proteins are more likely to be targeted for degradation by the proteasome in human cells, and whether misfolding-prone proteins are associated with age-related structural changes in yeast. The group is also focusing on developing mathematical and data science methodologies to algorithmically identify these misfolded states. This synthesis approach has the potential to alter our understanding of fundamental aspects of subcellular and organismal phenotypes. As a pilot Working Group, it started in June 2024.