Our Approach
NCEMS supports and collaborates with multidisciplinary teams to integrate diverse publicly available data and gain deeper and broader insights.
To have the greatest impact, NCEMS is creating and catalyzing communities that address scientific questions on emergent properties in molecular and cellular sciences.
To accelerate synthesis research, the center facilitates the creation of diverse teams of scientists and lowers the barriers to entry for those teams to reuse diverse public data and apply the best data science techniques. For sustained innovation, the center supports best practices in open and team science, puts tools and derived data sets in the hands of scientific practitioners to democratize research, and trains students and the community in multidisciplinary approaches.
Some Scientific Themes and Compelling Questions
Subcellular interactions and processes can exist near phase transitions and can be subject to competing forward and backward kinetic processes that maintain a steady state. For example, many proteins are near their solubility limit in vivo (i.e., near a liquid-to-solid phase transition), and biomolecules are often subject both to chemical damage and to cellular processes that reverse that damage. In these situations, biomolecular and cellular behavior can be balanced on a knife’s edge, where changes at small length scales and short time scales can influence what happens at much larger scales.
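A minimal two-state model shows how competing forward and backward processes set a steady state. Assume, purely for illustration, that a biomolecule is damaged with a first-order rate constant k_d and repaired with a first-order rate constant k_r, and let D and N be the damaged and undamaged amounts with a fixed total C = N + D:

```latex
\frac{dD}{dt} = k_d N - k_r D = k_d\,(C - D) - k_r D,
\qquad
\frac{dD}{dt} = 0 \;\Longrightarrow\; \frac{D^{*}}{C} = \frac{k_d}{k_d + k_r} = \frac{1}{1 + k_r/k_d}
```

The steady-state damaged fraction depends only on the ratio k_r/k_d, so a perturbation that shifts either rate moves the steady state even though neither process is switched off. This two-state model is an assumed simplification and does not describe any specific biomolecule.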
What is unknown is which of the cell’s numerous components and processes are delicately balanced in this way, and how they interconnect across scales. This knowledge would allow the community to dissect how perturbations at smaller scales can propagate to influence processes at larger scales, and vice versa. There is now an opportunity to synthesize next-generation sequencing data, data on cellular dynamics, and phenotypic data using advanced statistics and interpretable machine learning to rapidly identify these signals across scales. Such analyses can guide researchers to the most informative cases, where physical models that link molecular details to larger-scale phenomena can reveal general principles.
This theme will advance our understanding of multi-scale coupling and identify physical rules and scenarios connecting molecular, mesoscale, and cellular properties.
Chemistry and physics can reveal universal principles. It follows that it should be possible to learn the physical basis for emergent properties in one well-studied species and predict the properties of other, less-well-studied species. With some 10 million species estimated to exist on Earth, scientists will only ever be able to study a small fraction of them in detail. By demonstrating which knowledge can be extrapolated between species, synthesis can reveal general principles and diminish the need to catalog every species’ properties comprehensively. Community synthesis can address this challenge.
The physical sciences develop new theories by devising a minimal model for a specific system or phenomenon, testing how accurately that model predicts the behavior of other systems, and then adjusting its assumptions and level of detail to expand the scope of the model and theory as needed. Data-driven synthesis research lets scientists reverse the order of this process to accelerate discovery. Using machine learning, it is possible to ask which subcellular processes and physical features of a test species can be predicted by a model trained on another species. This approach is cheap and fast. Quickly enumerating the transferable and predictable phenomena will guide working group researchers to the emergent properties most likely to be amenable to general physical models.
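A minimal sketch of such a transferability test, assuming a single numeric feature, a straight-line model, and made-up data for three hypothetical species (A, B, and C): the model is fit on species A and scored on another species with the coefficient of determination R². The function names fit_line, r_squared, and transfer_score are illustrative only and are not part of any NCEMS tool.

```python
"""Hypothetical sketch: score how well a model trained on one species
transfers to another. All data are synthetic placeholders."""

from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(ys, preds):
    """Coefficient of determination; negative when predictions are worse than the mean."""
    my = fmean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def transfer_score(train, test):
    """Fit on one species' (feature, property) pairs, then score on another's."""
    slope, intercept = fit_line(*zip(*train))
    xs, ys = zip(*test)
    return r_squared(ys, [slope * x + intercept for x in xs])


xs = [0, 1, 2, 3, 4]
species_a = [(x, 2 * x + 1) for x in xs]    # hypothetical "well-studied" species
species_b = [(x, 2 * x + 1.2) for x in xs]  # same underlying rule, small offset
species_c = [(x, -x + 5) for x in xs]       # a different underlying rule

score_b = transfer_score(species_a, species_b)
score_c = transfer_score(species_a, species_c)
print(f"A -> B: R^2 = {score_b:.3f}")  # close to 1: the rule transfers
print(f"A -> C: R^2 = {score_c:.3f}")  # negative: the rule does not transfer
```

A score near 1 suggests the learned relationship carries over; a negative score means the transferred model predicts worse than simply using the test species' mean. A real study would use richer features, nonlinear models, and many species.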
Because solving these biological problems requires integrating diverse data with physical models, their complexity exceeds the capabilities of individual labs. A community-scale synthesis effort is needed.
Understanding and predicting how protein and nucleic acid structures, complexes, and functions combine to form higher levels of mesoscale organization would open new avenues for studying subcellular properties. Cells are more than the sum of the properties of their dilute parts. At the high concentrations found within cells, proteins self-organize through weak interactions into networks, known as quinary structures, that can demix cellular components. These quinary networks, which sit at the interface between structural and cellular biology, adapt to environmental conditions. Many quinary components, structures, and functions remain unknown. Community-scale synthesis will advance this field by identifying new quinary networks in vivo, revealing new functions and their underlying biophysical basis, and providing key ingredients for current and future efforts to perform whole-cell simulations.
Community-scale synthesis is likely to establish a new paradigm in systems-level mesoscale biology and in the evolution of complex networks of weak protein-protein and other biomolecular interactions. It would allow the driving forces behind these complexes to be deduced and their consequences for metabolic dynamics and other cellular processes to be explored.
It is not an accident that many grand challenges in cellular biology, such as the molecular origins of aging, are characterized by dysfunction at the mesoscale. Despite extensive research in these areas, the resulting data present a situation of paralyzing complexity. This complexity is a symptom of the scientific community’s incomplete grasp of emergent properties at the mesoscale, as demonstrated by the lack of analytic models that identify the emergent concepts and reductionist frameworks governing mesoscale function. How can synthesis research be used to close these large gaps in our knowledge?
Individual mutation and knockout experiments are a common means of investigating the functional dependencies of molecules and regulatory pathways that operate at the mesoscale. However, mesoscale properties are continuous, and binary “either-or” experiments do not reveal whether functional dependencies are linear, exponential, non-monotonic, or saturating. Without knowledge of these trends, it is difficult to develop theories.
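To make this concrete, the sketch below assumes four hypothetical dose-response curves, each normalized so that a knockout (expression level 0) gives phenotype 0 and wild type (level 1) gives phenotype 1. A binary knockout-versus-wild-type comparison returns identical results for all four; only intermediate expression levels tell them apart. The functional forms and the constants K_EXP and K_SAT are illustrative assumptions, not fitted to any data.

```python
import math

# Hypothetical constants, chosen only for illustration.
K_EXP = 3.0  # steepness of the exponential curve
K_SAT = 0.2  # half-saturation level of the saturating curve

# Each curve maps a normalized expression level e (0 = knockout, 1 = wild type)
# to a normalized phenotype; every curve passes through (0, 0) and (1, 1).
MODELS = {
    "linear": lambda e: e,
    "exponential": lambda e: (math.exp(K_EXP * e) - 1) / (math.exp(K_EXP) - 1),
    "saturating": lambda e: e * (1 + K_SAT) / (e + K_SAT),
    "non-monotonic": lambda e: 4 * e * (1 - e) + e,  # peaks at e = 0.625
}


def predict(levels):
    """Phenotype predicted by each model at each expression level."""
    return {name: [f(e) for e in levels] for name, f in MODELS.items()}


binary = predict([0.0, 1.0])                  # knockout vs. wild type only
dense = predict([0.0, 0.25, 0.5, 0.75, 1.0])  # graded titration of the level

for name in MODELS:
    print(f"{name:>13}: binary {[round(v, 2) for v in binary[name]]}, "
          f"dense {[round(v, 2) for v in dense[name]]}")
```

The binary columns are identical for every model, while the dense columns diverge at intermediate levels, which is the information a theory of the functional dependence needs.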
Community-scale synthesis offers an alternative approach. We posit that it is possible to identify new continuous variables and observables to assess mesoscale functionality. Achieving this goal requires massive amounts of data, specifically the dense mesoscale data that are now becoming available. In this approach, gene and protein levels are treated as independent variables that control phenotypic outcomes such as growth rates and differentiation status. The data sets resulting from the synthesis will be used to develop analytic models that identify the emergent concepts and reductionist frameworks governing mesoscale function. Through this synthesis research, entire classes of grand challenges in biology may become tractable with the tools of theoretical modeling.
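As a sketch of treating a protein level as a continuous independent variable, the example below assumes a hypothetical growth-rate response that follows a saturating law (vmax = 1, K = 0.25) and fits two candidate functional forms, a straight line and y = vmax * x / (x + K), to densely sampled, noise-free synthetic data. The data, grid, and function names are illustrative assumptions only.

```python
from statistics import fmean


def sse(ys, preds):
    """Sum of squared residuals."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds))


def fit_linear(xs, ys):
    """Ordinary least-squares line; returns its sum of squared residuals."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return sse(ys, [slope * x + intercept for x in xs])


def fit_saturating(xs, ys, k_grid):
    """Fit y = vmax * x / (x + K) by scanning K over a grid.

    For each fixed K the model is linear in vmax, so vmax has a closed-form
    least-squares solution. Returns (sse, vmax, K) for the best grid point.
    """
    best = None
    for k in k_grid:
        basis = [x / (x + k) for x in xs]
        vmax = sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)
        err = sse(ys, [vmax * b for b in basis])
        if best is None or err < best[0]:
            best = (err, vmax, k)
    return best


# Synthetic, noise-free data from the hypothetical saturating law.
levels = [0.0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.5, 2.0]
growth = [1.0 * e / (e + 0.25) for e in levels]

lin_err = fit_linear(levels, growth)
k_grid = [i / 100 for i in range(1, 201)]  # K from 0.01 to 2.00
sat_err, best_vmax, best_k = fit_saturating(levels, growth, k_grid)
print(f"linear SSE = {lin_err:.4f}")
print(f"saturating SSE = {sat_err:.2e} (vmax = {best_vmax:.2f}, K = {best_k:.2f})")
```

The grid search keeps the example dependency-free; a real analysis would more likely use a nonlinear least-squares routine, account for measurement noise, and compare models with a criterion that penalizes extra parameters.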