Our Approach
NCEMS supports and collaborates with multidisciplinary teams to integrate diverse publicly available data and gain deeper and broader insights.
NCEMS accelerates discovery in the molecular and cellular sciences by enabling teams of scientists ranging from physicists to cell biologists and bioengineers to use the latest AI, machine learning, and data science methods.
Our scientific vision is to understand the unexpected appearance of new biological system properties at different scales of composition, space, time, energy, information, and motion. The Center’s initial focus is on emergent properties at the mesoscale, the scale between biomolecules and organelles, and their influence on higher subcellular and cellular outcomes. The complexity and poor understanding of mesoscale phenomena, together with the vast amounts of public data spanning molecular to phenotypic properties, make this an area of synthesis research that is ripe for novel discoveries.
AI, Data, and Community-Powered Discovery
The Center is catalyzing research by creating a synthesis community of interdisciplinary scientists, postdoctoral scholars, graduate students, and undergraduate researchers, and by providing cyberinfrastructure and advanced support for AI applications, machine learning, and statistical and systems modeling. NCEMS removes barriers to large-scale synthesis research by providing a range of in-kind support to working groups, Center postdoctoral scholars, and the broader community.
NCEMS is creating and catalyzing transdisciplinary teams that address scientific questions on emergent properties in molecular and cellular sciences.
To accelerate synthesis research, the Center facilitates the creation of interdisciplinary teams of scientists and lowers the barriers to entry for those teams to reuse diverse public data and apply AI, machine learning, and the best data science techniques. For sustained innovation, the Center supports Gold Standard Science practices and best practices in team science, putting the tools and derivative data sets in the hands of scientific practitioners, thereby democratizing research, as well as training students and the community in multidisciplinary approaches.
Some Scientific Themes and Compelling Questions
Subcellular interactions and processes can exist near phase transitions and experience competing forward and backward kinetic processes to maintain steady state. For example, many proteins are near their solubility limit in vivo (i.e., near a liquid-to-solid phase transition), and biomolecules often experience chemical damage alongside cellular processes that try to reverse that damage. Biomolecular and cellular behavior can be balanced on a knife’s edge in these situations, where changes at small length scales and short time scales can influence what happens at much larger scales.
What is unknown is which of the cell’s numerous components and processes are delicately balanced in this way and how they interconnect across scales. This knowledge would allow the community to dissect how perturbations at smaller scales can propagate to influence processes at larger scales, and vice versa. There is now an opportunity to synthesize next-generation sequencing data, data on cellular dynamics, and phenotypic data with advanced statistics and interpretable machine learning to rapidly identify these signals across scales. The results can guide researchers to interesting cases that shed light on general principles, discovered through physical models that link molecular details to larger-scale phenomena.
This theme will advance our understanding of multi-scale coupling and identify physical rules and scenarios connecting molecular, mesoscale, and cellular properties.
Chemistry and physics can reveal universal principles. It follows that it should be possible to learn the physical basis for emergent properties within one well-studied species and predict the properties of other, less-well-studied species. With some 10 million species estimated to exist on Earth, scientists will only ever be able to study a small fraction of them in detail. By demonstrating what knowledge can be extrapolated between species, general principles can be revealed and the need for a comprehensive cataloging of all species’ properties would be diminished. Community synthesis can address this challenge.
The physical sciences develop new theories by devising minimal models for a specific system or phenomenon, testing the extent to which each model makes accurate predictions for other systems, and then adjusting assumptions and levels of detail to expand the scope of the model and theory as needed. Data-driven synthesis research allows scientists to reverse the order of this process to accelerate discovery. Using machine learning approaches, it is possible to ask what subcellular processes and physical features can be predicted for a test species based on a machine learning model trained on another species. This approach is cheap and fast. By quickly enumerating those transferable and predictable phenomena, working group researchers will be guided to the emergent properties most likely to be amenable to general physical models.
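As a rough illustration of the train-on-one-species, test-on-another idea, the sketch below fits a simple linear model to synthetic data for a hypothetical "species A" and scores its predictions on a hypothetical "species B". Everything here is an assumption made for illustration: the data are simulated, the property names ("conserved" and "divergent") are invented, and a one-variable least-squares fit stands in for whatever machine learning model a working group would actually use.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of a fixed line on held-out data."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def simulate(rng, slope, intercept, n=200, noise=0.1):
    """Synthetic (feature, property) pairs; purely illustrative data."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def transfer_scores(seed=0):
    """Train on species A, score on species B, one score per property."""
    rng = random.Random(seed)
    # Hypothetical properties: (slope, intercept) in species A and in species B.
    properties = {
        "conserved": ((2.0, 0.5), (2.0, 0.5)),   # same rule in both species
        "divergent": ((2.0, 0.5), (-2.0, 2.5)),  # rule differs in species B
    }
    scores = {}
    for name, (params_a, params_b) in properties.items():
        train_x, train_y = simulate(rng, *params_a)
        test_x, test_y = simulate(rng, *params_b)
        slope, intercept = fit_line(train_x, train_y)
        scores[name] = r_squared(test_x, test_y, slope, intercept)
    return scores

if __name__ == "__main__":
    # Rank properties from most to least transferable across species.
    for name, score in sorted(transfer_scores().items(), key=lambda kv: -kv[1]):
        print(f"{name}: cross-species R^2 = {score:.2f}")
```

Properties with a high cross-species score are the transferable ones; a low or negative score flags a property whose underlying rule appears to differ between the two species.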
The complexity of the biological problems, which require integrating diverse data together with physical models to solve, makes the challenge larger than the capabilities of individual labs. A community-scale synthesis effort is needed.
Understanding and predicting how protein and nucleic acid structures, complexes, and functions combine to form higher levels of mesoscale organization would open new avenues for studying subcellular properties. Cells are more than the sum of the properties of their dilute parts. At the high concentrations within cells, proteins self-organize, through weak interactions that can demix cellular components, into what are known as quinary structures. These quinary networks, which exist at the interface between structural and cellular biology, are environmentally adaptive. Many quinary components, structures, and functions are unknown. Community-scale synthesis will advance this field by identifying new quinary networks in vivo, revealing new functions and their underlying biophysical basis, and providing key ingredients for current and future efforts to perform whole-cell simulations.
Community-scale synthesis is likely to establish a new paradigm in systems-level mesoscale biology and in the evolution of complex networks of weak protein-protein and other biomolecular interactions. It should allow the driving forces for these complexes to be deduced and their consequences for metabolic dynamics and other cellular processes to be explored.
It is not an accident that many grand challenges in cellular biology, such as the molecular origins of aging, are characterized by dysfunction at the mesoscale. Despite extensive research in these areas, the resulting data presents a situation of paralyzing complexity. This complexity is a symptom of the scientific community’s incomplete grasp of emergent properties at the mesoscale, as demonstrated by the lack of analytic models that identify emergent concepts and reductionist frameworks that govern mesoscale function. How can synthesis research be used to overcome these large gaps in our knowledge?
Individual mutation and knockout experiments are a common means of investigating the functional dependencies of molecules and regulatory pathways that operate on the mesoscale. However, mesoscale properties are continuous, and binary “either-or” experiments do not reveal whether functional dependencies are linear, exponential, non-monotonic, or saturating. Absent knowledge of trends, it is difficult to develop theories.
Community-scale synthesis offers an alternative approach. We posit that it is possible to identify new continuous variables and observables to assess mesoscale functionality. To achieve this goal, we need massive amounts of dense mesoscale data, which is now becoming available. In this approach, gene and protein levels are treated as independent variables that control phenotypic outcomes like growth rates and differentiation status. The data sets resulting from the synthesis will be used to develop analytic models to identify emergent concepts and reductionist frameworks that govern mesoscale function. Through this synthesis research, entire classes of grand challenges in biology may become tractable to the tools of theoretical modeling.
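To make the contrast between binary and continuous measurements concrete, the sketch below compares two candidate functional forms, a linear one and a saturating one, on simulated data. It is a minimal illustration under stated assumptions: the data are synthetic, the saturating form y = vmax * x / (k + x) and the grid search over k are choices made here for simplicity, and the function names are hypothetical rather than part of any NCEMS tool.

```python
import random

def fit_linear(xs, ys):
    """Least-squares fit of y = a * x; returns (a, sum of squared errors)."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    sse = sum((y - a * x) ** 2 for x, y in zip(xs, ys))
    return a, sse

def fit_saturating(xs, ys, k_grid):
    """Fit y = vmax * x / (k + x): scan k, solve vmax in closed form."""
    best = None
    for k in k_grid:
        fs = [x / (k + x) for x in xs]
        vmax = sum(f * y for f, y in zip(fs, ys)) / sum(f * f for f in fs)
        sse = sum((y - vmax * f) ** 2 for f, y in zip(fs, ys))
        if best is None or sse < best[2]:
            best = (vmax, k, sse)
    return best  # (vmax, k, sum of squared errors)

def compare_designs(seed=0):
    """Return model errors for a binary design and a dense design."""
    rng = random.Random(seed)
    k_grid = [0.1 * i for i in range(1, 101)]  # candidate k values, 0.1 to 10

    def truth(x):
        # Assumed ground truth for the simulation: a saturating response.
        return x / (1.0 + x)

    # Binary "either-or" experiment: level absent (0) or fully present (10).
    bin_x = [0.0, 10.0]
    bin_y = [truth(x) for x in bin_x]

    # Dense experiment: 20 intermediate levels with measurement noise.
    dense_x = [0.5 * i for i in range(1, 21)]
    dense_y = [truth(x) + rng.gauss(0.0, 0.02) for x in dense_x]

    return {
        "binary": (fit_linear(bin_x, bin_y)[1],
                   fit_saturating(bin_x, bin_y, k_grid)[2]),
        "dense": (fit_linear(dense_x, dense_y)[1],
                  fit_saturating(dense_x, dense_y, k_grid)[2]),
    }

if __name__ == "__main__":
    for design, (lin_sse, sat_sse) in compare_designs().items():
        print(f"{design}: linear SSE = {lin_sse:.4f}, saturating SSE = {sat_sse:.4f}")
```

With only the two binary points, both forms fit essentially perfectly, so the experiment cannot tell them apart. With dense data, the saturating form fits far better than the linear one, which is the kind of trend information a theory needs.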