Transformative Science Seminar Series
Talk Seven: Computational infrastructures for consolidating our knowledge regarding the human genome and human diseases
Date: April 14th, 2026: 11:30AM ET – 12:30PM ET
Institution: University of Michigan
Talk Abstract: Our knowledge regarding the human genome and human diseases has been increasing exponentially. This knowledge comes in different formats, including direct measurements of biomedical entities with ever-evolving biotechnologies, annotations by groups of experts from different consortia, discoveries from individual studies published as free text in the biomedical literature, and insights learned from computational models trained on large-scale datasets. However, we currently do not have an infrastructure to consolidate these heterogeneous knowledge sources. As a result, biomedical researchers spend increasingly more time searching for relevant datasets and literature for scientific discoveries, annotations and conclusions. Unfortunately, they do not have AI-powered tools to navigate existing knowledge and prioritize their hypotheses and research activities. In this talk, I will describe two types of computational infrastructure from my lab for consolidating our knowledge regarding the human genome and human diseases. The first type is knowledge graphs, and I will describe the knowledge graphs built in my lab, including GenomicKB, GLKB and PanKgraph. The second type is foundational AI models, and I will describe the genomic foundation model (i.e., EPCOT) built in my lab. Our work not only has an enormous positive impact on biomedical knowledge sharing and discovery, but would also help to promote open science, inclusivity and fairness in the areas of computational genomics and biomedical data science.
Transformative Science Seminar Series
Talk Six: Generative AI to Predict and Engineer Human Tissues and Cells
Date: March 31st, 2026: 11:30AM ET – 12:30PM ET
Institution: Wellcome Sanger Institute
Talk Abstract: In this talk, I will present our lab’s latest research on generative AI models designed to simulate cellular and tissue-level perturbations with unprecedented resolution. These models enable us to ask fundamental questions such as: Which interventions can revert a disease phenotype back to a healthy tissue state? and What perturbations can reprogram cells from state A to state B? By learning causal structure from high-dimensional multi-omics and spatial data, our frameworks can propose actionable interventions, predict patient-specific responses to treatment, and identify the most promising therapeutic targets. I will highlight how these models support target discovery, guide experimental design, and accelerate the development of personalized and precision medicine. Overall, this work demonstrates how generative AI can transform our ability to understand, predict, and engineer complex biological systems.
Transformative Science Seminar Series
Talk Five: Resolving the complexity of plant gene regulation with MPRAs and long reads
Date: March 19th, 2026: 11:30AM ET – 12:30PM ET
Institution: University of Washington
Talk Abstract: Greater scale and genomic context are needed to resolve the complexity of plant gene regulation. To add scale, we have pioneered Plant STARR-seq, a reductionist, highly versatile MPRA, to test the activity of hundreds of thousands of regulatory elements in a dicot and a monocot system. I will present our current efforts to characterize enhancers and insulator-like elements in four plant genomes. The scale of the resulting data allows for machine learning and in silico evolution of regulatory elements, yielding species- and condition-specific enhancers and strong insulator-like elements.
To explore genomic context, we have adapted Fiber-seq, a long-read single molecule method, for use in plants. Fiber-seq of maize protoplasts faithfully detects regulatory elements found in matched ATAC-seq samples and doubles the maize regulatory landscape. Moreover, Fiber-seq identifies active regulatory elements in individual LTR retrotransposons. Fiber-seq of maize hybrids reveals the vast divergence of regulatory elements between the well-characterized parental inbreds and many instances of hybrid-specific regulatory activity, including increased regulatory activity of LTR retrotransposons as McClintock predicted.
American Society for Biochemistry and Molecular Biology Conference 2026
NCEMS presented two workshops at the ASBMB Conference 2026 (March 7th – 10th, 2026).
Workshops:
Beginner Hands-On Training: Four Data Science Techniques to Immediately Accelerate your Research
(Sunday, March 8, 11:00 AM–12:30 PM)
Intermediate Hands-On Training: Data Science Tools for Confidence, Complexity, and LLMs in Biophysical Research
(Monday, March 9, 11:00 AM–12:30 PM)
The 2026 ASBMB Annual Meeting is where connections spark breakthroughs. With a program that highlights the broad scope of the molecular life sciences, you are invited to dive deeper into your specific area of research or explore beyond. Surround yourself with cutting-edge science, explore emerging trends and unlock insights that will propel your research forward.
Biophysical Society Annual Meeting 2026
NCEMS will be presenting two professional development workshops, a scientific workshop, a symposium, and a catalyst meeting at the BPS Annual Meeting 2026 (February 21st – 25th, 2026).
Professional Development Workshops:
NSF-NCEMS Beginner Hands-On Training: Four Data Science Techniques to Immediately Accelerate your Research
(Monday, February 23, 1:30 PM–3:30 PM)
NSF-NCEMS Intermediate Hands-On Training: Data Science Tools for Confidence, Complexity, and LLMs in Biophysical Research
(Tuesday, February 24, 1:30 PM–3:30 PM)
Scientific Workshop:
The Dawn of Synthesis Research in Biophysics: Making use of Petabytes of Biological Data
(Tuesday, February 24, 7:30 PM–9:30 PM)
Symposium:
Molecular Chaperones: Basic Mechanisms and Pathological Consequences
(Tuesday, February 24, 4:00 PM–6:00 PM)
The BPS2026 Annual Meeting in San Francisco will showcase Biophysics by the Bay, highlighting the exciting advances in science and technology brought forth by big data and AI. This year’s program offers a strikingly diverse and forward-looking slate of Symposia that captures the dynamic, multi-scale nature of our field. From the controlled chaos of intrinsically disordered proteins to the emergent properties of life’s assemblies, our sessions illuminate the physical organizing principles underlying biology. Symposia revisit new perspectives in classics like membrane transport and calcium signaling, while also spotlighting new frontiers such as the biophysics of immunity, cancer, and protein design. Workshops will explore emerging technologies for handling the giant datasets of modern biology and how to use AI to understand and engineer nature. As in previous years, we seek to balance foundational insights and high-risk innovation, highlighting long-standing luminaries, emerging leaders, and exciting discoveries selected from abstract submissions. With our continued commitment to inclusive formats—Flash Talks, Symp/Workshop Select, and integrated poster-platform options—BPS2026 invites every attendee to shape and share in the discovery.
Gordon Research Conference
Linking Protein Dynamics to Structure, Function, and Evolution
NCEMS presented at the Gordon Research Conference (January 4th – 9th, 2026)
The Protein Folding Dynamics GRC is a premier, international scientific conference focused on advancing the frontiers of science through the presentation of cutting-edge and unpublished research, prioritizing time for discussion after each talk and fostering informal interactions among scientists of all career stages. The conference program includes an array of speakers and discussion leaders from institutions and organizations worldwide, concentrating on the latest developments in the field. The conference is five days long and held in a remote location to increase the sense of camaraderie and create scientific communities, with lasting collaborations and friendships. In addition to premier talks, the conference has designated time for poster sessions from individuals of all career stages, and afternoon free time and communal meals allow for informal networking opportunities with leaders in the field.
Biological life depends on motion, spanning femtosecond vibrations of atoms in enzyme transition states to deleterious protein aggregation that can span days or even weeks. Our current understanding of in vitro protein folding is due to decades of experimental and computational research that provided high-resolution characterization of protein structure, identification of folding principles, and development of folding algorithms. Outstanding challenges surround linking protein dynamics with their function in crowded cellular environments, interaction with chaperones and other players in the cellular ecosystem, evolution of protein sequences, and biomedical applications. In this conference, we will bring together leading experts in protein science who study protein dynamics in the context of these complicated topics. We will leverage the development of AI-based methods and high-throughput experiments to scale up our understanding of these dynamic processes from single folding pathways to the collective dynamics of proteins critical to homeostasis.
Gordon Research Seminar
Artificial Intelligence, Design, Evolution, and Emergent Phenomena at the Mesoscale
NCEMS presented at the Gordon Research Seminar (January 3rd – 4th, 2026)
This seminar provides a unique forum for young doctoral and post-doctoral researchers to present their work, discuss new methods, cutting edge ideas, and pre-published data, as well as to build collaborative relationships with their peers. Experienced mentors and trainee moderators will facilitate active participation in scientific discussion to allow all attendees to be engaged participants rather than spectators.
The 2026 Gordon Research Seminar on Protein Folding Dynamics is the premier forum for graduate students, postdocs, and early-career researchers from diverse scientific backgrounds (including biology, biophysics, chemistry, computer science, and mathematics) to engage in cutting-edge discussions and share work and ideas, creating a unique atmosphere for ideation, connection, and collaboration. The Protein Folding Dynamics GRS will highlight exciting advances in artificial intelligence and machine learning, the interplay between evolution, folding, and design, and its critical role in shaping the emergence of mesoscale phenomena. Whether you are unraveling the mysteries of protein dynamics or developing new computational tools, this seminar offers an unparalleled opportunity to engage, inspire, and innovate in a welcoming, trainee-focused setting.
Transformative Science Seminar Series
Talk Four: Tackling Big Data Challenges in Metabolomics - Computational Advances to Accelerate Structure Annotation
Date: December 4th, 2025: 11:30AM ET – 12:30PM ET
Institution: University of California, Riverside
Talk Abstract: High-throughput mass spectrometry has enabled unprecedented depth and versatility to observe the molecules in the world around us. Traditionally, a handful of molecules were detected in a typical measurement. Today, this has grown to thousands of molecules in a few minutes. The growth in data presents new opportunities for discovery but also challenges in data analysis. The development of new computational approaches for mass spectrometry data has already accelerated drug discovery, revealed the chemical dialog of the microbiome, and characterized the molecular dynamics of our oceans due to human activity. We’re going to explore new advances in big data algorithmic and machine learning approaches, tools, and infrastructure that my lab has developed to transform mass spectrometry data analysis from a solitary activity to a community-wide collaborative effort – crowd-sourcing mass spectrometry knowledge, computationally amplifying knowledge to make new discoveries, and new data mining techniques that enable large-scale data reuse. I will discuss how these tools have transformed the community and how future computational work can help accelerate annotation.
ABRCMS Workshop
NCEMS presented a workshop called:
NSF-NCEMS Beginner Hands-On Training: Four Data Science Techniques to Immediately Accelerate Your Research
The workshop was held on Saturday (November 22nd) from 2PM to 3:30PM.
Were you eager to accelerate your work through data science and interpretable machine learning, but unsure where to begin? This beginner-friendly training session served as your gateway to utilizing these powerful tools without needing any prior coding experience or software setup. Hosted by the newly established NSF National Center for Emergence in Molecular and Cellular Sciences (NCEMS), this hands-on session equipped participants with the skills to harness four widely used data science techniques. Attendees learned how to calculate the association between a feature and a phenomenon; identify key features driving biological behaviors; control for confounding factors; and avoid common pitfalls like data overinterpretation. In addition, they learned how to state their results in plain English.
Through interactive Jupyter notebook exercises, we explored the theory and scope of these methods and guided participants in interpreting their outputs. By the end of the session, attendees had the opportunity to apply these techniques directly to their own datasets, empowering them to make more informed, impactful discoveries in their research. This training was designed to offer practical, easy-to-use methods that could be broadly applied across fields.
Transformative Science Seminar Series
Talk Three: Algorithmic Innovations for Decoding Transposable Elements
Date: November 18th, 2025: 11:30AM ET – 12:30PM ET
Institution: University of Arizona
Talk Abstract: Accurate characterization of transposable elements (TEs) in genomic data requires specialized algorithms to interpret their evolutionary roles. Associate Professor Travis Wheeler will introduce two new algorithms that enhance TE annotation: AURORA, which adjudicates between competing alignments in RepeatMasker, annotates uncertainty, infers recombination and insertion events, and delineates boundaries between adjacent elements; and SCULU, which clusters redundant or collision-prone subfamilies to improve annotation reliability. Assistant Research Professor Clément Goubert will then present GraffiTE, a pangenomic pipeline for accurately mapping and genotyping TE insertion polymorphisms at the population level, and discuss how these data illuminate genomic innovation from the cell to the ecosystem.
Transformative Science Seminar Series
Talk Two: Unsupervised learning extracts transcriptional regulatory mechanisms from gene expression compendia
Date: November 4th, 2025: 11:30AM ET – 12:30PM ET
Institution: University of California San Diego
Talk Abstract: Gene expression databases continue to grow for various organisms. Various module detection methods have been developed to infer the transcriptional regulatory network (TRN) from these databases. Independent component analysis (ICA) has been particularly successful at recovering experimentally-measured regulons. In this talk, I will discuss, using progress in E. coli as an example, 1) the ICA workflow for gene expression analysis, 2) principles of the TRN underlying the success of ICA, and 3) applications of ICA toward developing fundamental understanding of the TRN and enabling strain design applications.
NSF MCB Virtual Office Hour
Topic: New research and training opportunities at the National Synthesis Center for Emergence in the Molecular and Cellular Sciences (NCEMS)
Date: October 8th, 2025 (Wednesday)
Time: 2:00PM – 3:00PM ET
Presenter: Director Ed O’Brien
Learn about the NCEMS call for applications to create working groups for synthesis research with existing large-scale molecular and cellular data to solve complex biological questions. The deadline for the 5-page working group application is just around the corner (Due Date: November 4, 2025).
The recording can be viewed below:
Transformative Science Seminar Series
Talk One: Structure, Function and Discovery Potential of Microbiomes in the Ocean and Coral Reefs
Date: September 25th, 2025: 11:30AM ET – 12:30PM ET
Institution: ETH Zurich
Talk Abstract: Microbes are remarkably diverse, both phylogenetically and metabolically. Over the past two decades, DNA sequencing of microbial communities (metagenomics) has transformed how we explore this diversity. In my seminar, I will begin by showing how this field has opened the door to the discovery of new microbial taxa, enzymes, and biosynthetic products. Focusing on the reconstruction of ocean metagenomes at a global scale, we uncovered a previously unknown bacterial family with extraordinary biosynthetic potential and characterized an enzyme with unexpected biochemical properties. I will then turn to coral reefs as an example beyond the open ocean and make the case for capturing the microbial diversity closely associated with reef organisms. Adopting this microbial perspective not only reshapes how we understand the temperature-driven loss of coral reef biodiversity but also points to new opportunities for biotechnological and biomedical innovation.
Annual Summit 2025
The second NCEMS Annual Summit was held at Pennsylvania State University, University Park on August 5-8, 2025. This meeting brought together over 120 scientists of diverse backgrounds and career stages to define a set of research questions primed for community-scale synthesis efforts across twelve scientific themes within the broad area of emergent properties at the mesoscale. Below are the presentations for the twelve scientific themes (Focus Area Sessions):
Focus Area Sessions:
Participants at the Summit each took part in three Focus Area Sessions. The purpose of each of these brainstorming sessions was to identify the set of key open questions in each area that are primed for community-scale synthesis efforts. Below you can find the introductory slides used to prime conversations in each of the twelve Focus Area Session topic areas.
Additional Documents and Resources from Summit:
Below are the additional documents and resources that were provided to attendees. If you need access, please reach out to ncems@psu.edu.
American Society for Biochemistry and Molecular Biology 2025 (Chicago)
NCEMS was truly excited and honored to be a part of the ASBMB 2025 Annual Meeting.
From April 12th – 15th, NCEMS attended the ASBMB 2025 Annual Meeting, partaking in the Education and Career Table on Monday, April 14th. The team interacted with participants of all career stages and shared opportunities and programs offered by NCEMS.
NCEMS also provided hands-on training for over 75 participants on four interpretable machine learning techniques. (More information below)
Hands-on Training: Four Interpretable Machine Learning Techniques to Immediately Accelerate Your Research
Sunday, April 13
Are you eager to accelerate your work through data science and interpretable machine learning, but unsure where to begin? This beginner-friendly training session is your gateway to utilizing these powerful tools without needing any prior coding experience or software setup. Hosted by the newly established NSF National Center for Emergence in Molecular and Cellular Sciences (NCEMS), this hands-on session will equip you with the skills to harness the four most widely used data science techniques. You will learn how to calculate the association between a feature and a phenomenon; identify key features driving biological behaviors; control for confounding factors; and avoid common pitfalls like data overinterpretation. In addition, you will learn how to state your results in plain English. Through interactive Jupyter notebook exercises, we will dive into the theory and scope of these methods and guide you in interpreting their outputs. By the end of the session, you’ll have the chance to apply these techniques directly to your own datasets, empowering you to make more informed, impactful discoveries in your research. This training is designed specifically to benefit biophysicists, offering practical, easy-to-use methods that can be broadly applied across the field.
Trainers
Justin Petucci, Penn State, USA; Dan Nissley, Penn State, USA; Maowei Dong, Penn State, USA; Ian Sitarik, Penn State, USA
Biophysical Society 2025 (Los Angeles)
NCEMS was truly excited and honored to be a part of the BPS 2025 Annual Meeting.
From February 15th – 19th, NCEMS attended the BPS 2025 Annual Meeting, partaking in the Education and Career Table on Sunday, February 16th from 12:30PM PST – 3:30PM PST. The team interacted with participants of all career stages and shared opportunities and programs offered by NCEMS.
NCEMS also provided hands-on training for over 70 participants on four interpretable machine learning techniques. (More information below)
Hands-on Training: Four Interpretable Machine Learning Techniques to Immediately Accelerate Your Research
Monday, February 17, 1:30 PM–3:30 PM
Are you eager to accelerate your work through data science and interpretable machine learning, but unsure where to begin? This beginner-friendly training session is your gateway to utilizing these powerful tools without needing any prior coding experience or software setup. Hosted by the newly established NSF National Center for Emergence in Molecular and Cellular Sciences (NCEMS), this hands-on session will equip you with the skills to harness the four most widely used data science techniques. You will learn how to calculate the association between a feature and a phenomenon; identify key features driving biological behaviors; control for confounding factors; and avoid common pitfalls like data overinterpretation. In addition, you will learn how to state your results in plain English. Through interactive Jupyter notebook exercises, we will dive into the theory and scope of these methods and guide you in interpreting their outputs. By the end of the session, you’ll have the chance to apply these techniques directly to your own datasets, empowering you to make more informed, impactful discoveries in your research. This training is designed specifically to benefit biophysicists, offering practical, easy-to-use methods that can be broadly applied across the field.
Trainers
Ed O’Brien, Penn State, USA; Justin Petucci, Penn State, USA; Dan Nissley, Penn State, USA; Maowei Dong, Penn State, USA; Ian Sitarik, Penn State, USA; Yang Jiang, Penn State, USA
Annual Summit 2024
The inaugural NCEMS Annual Summit was held in Chicago on October 6-9, 2024. This meeting brought together over 100 scientists of diverse backgrounds and career stages to define a set of research questions primed for community-scale synthesis efforts across fifteen scientific themes within the broad area of emergent properties at the mesoscale. Below are the presentations for the fifteen scientific themes (Posing the Problem Sessions), along with the six Strategic Implementation Plan feedback presentations:
Posing the Problem Sessions:
Participants at the Summit each took part in three Posing the Problem sessions. The purpose of each of these brainstorming sessions was to identify the set of key open questions in each area that are primed for community-scale synthesis efforts. Below you can find the introductory slides used to prime conversations in each of the fifteen Posing the Problem session topic areas.
Strategic Implementation Plan Feedback Session:
Participants at the Summit also participated in three Strategic Implementation Plan feedback sessions. In each of these sessions, participants were first given a background presentation by NCEMS leadership before engaging in a directed discussion. The background presentations for each of these feedback sessions are available below.
NSF Office Hour 2024
NCEMS presented at the NSF MCB Virtual Office Hour regarding the Center’s mission and value along with opportunities offered by NCEMS.
The recording can be accessed by the button below: