Hunger for new technologies, metrics, and spatiotemporal models in
functional genomics.
Functional genomics, as a field, is applying genomic self-improvement
protocols (cost-effective, comprehensive, precise, accurate, and useful)
to the kinetics of complex cellular systems. Radical surgery in functional
biology aims to mimic the success of structural biology along all five of
those axes. Technologies of recombinant DNA and automation have brought
costs down exponentially (100-fold in ten years) in structural studies.
That, combined with definitions of completeness, pushes the second axis
(to better than 99.99%). Those two then conspire to reduce random errors of
the third axis by the beautiful, brute force of repetition. To reduce
systematic errors requires more finesse. Models allow integration of
wildly different experimental methods (e.g. models based on the genetic
code plus phylogeny provide quite independent checks of models based on
DNA electrophoretic images). Model interchange specifications and metrics
for model comparison mutually reinforce one another and provide one path
along the fifth axis, that of utility, via killer-applications such as
homology searches. This combination of modeling and searching provides
serendipity and "functional hypothesis generation" in abundance. It
instantly connects previously separately studied processes and organisms.
Statistical assessment of agreement between experiment and calculation can
lead to improvement of the types of model parameter as well as parameter
values. What are the analogous metrics and models for functional
genomics? How can we estimate possible lower limits to costs? How do we
define completion and accuracy? Finally, how do we create and assess
searches (not just on data but on models) and the utility of applications
in general? How do these feed back to experimental design and feed
forward to bioengineering?
The functional genomics measures that are now thought to be prime for
automation, miniaturization, and multiplexing include electrophoresis,
molecular microarrays, mass spectrometry, and microscopy. Microscopy is
well suited for non-destructive time series and for measures concerning
spatial effects and the stochastic kinetics of systems with one or a few
copies of any critical molecule. The other methods currently offer richer
signatures for multiplexing (measuring many molecules from the same source
at once). Such
extensive multiplexing can reduce errors due to misalignment of the
(unmultiplexed) measures in space and/or time. These misalignments are
dramatic in, but by no means limited to, unplanned (meta) comparisons between
literature values. In the spirit of eliminating systematic errors, we see
a major role for models in integrating as disparate a set of measures as
possible. The dynamic and spatial biomodels of yore, thought by some to be
doomed by lack of data, will soon promote fresh study in the glaring light
of overdetermination, i.e. more data points than adjustable parameters, and
will feed back to the experiments a justification for even more data for
even more accuracy.
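As a minimal sketch of what overdetermination means in practice, the
following fits a two-parameter straight line to six data points by ordinary
least squares, leaving four residual degrees of freedom with which to judge
the model. The function name fit_line and the numbers are hypothetical
illustrations, not data or methods from this work; a straight line in time
is assumed only because it is the simplest case (for example, log
concentration under first-order decay).

```python
import math


def fit_line(ts, ys):
    """Least-squares fit of y = a + b*t.

    Returns (a, b, dof, rms), where dof is the number of data points
    minus the 2 adjustable parameters and rms is the residual scatter.
    """
    n = len(ts)
    if n != len(ys):
        raise ValueError("ts and ys must have the same length")
    if n <= 2:
        # With n <= 2 the fit is exact and says nothing about model quality.
        raise ValueError("need more data points than adjustable parameters")
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    a = y_mean - b * t_mean
    residuals = [y - (a + b * t) for t, y in zip(ts, ys)]
    dof = n - 2
    rms = math.sqrt(sum(r * r for r in residuals) / dof)
    return a, b, dof, rms


# Hypothetical time points and log-concentrations (illustration only).
times = [0, 1, 2, 3, 4, 5]
log_conc = [2.0, 1.52, 0.98, 0.51, 0.01, -0.49]
a, b, dof, rms = fit_line(times, log_conc)
print("intercept", a, "slope", b, "degrees of freedom", dof, "rms", rms)
```

Each additional data point adds a degree of freedom, so the residual scatter
becomes an increasingly sharp test of whether the model form itself is
adequate, not just of its parameter values.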
We illustrate the above themes in the context of stress responses in
wildtype and mutant human erythrocytes, E. coli and yeast time series. We
assess measures of up to 19 metabolites, 400 proteins, and over 7000 RNAs.
These measures touch most of the critical 34 metabolites in erythrocytes
but only a tiny fraction of the over 1200 in E. coli. They so far
quantitate fewer than 10% of the proteins per experiment (and even these
often have unknown covalent structure). For the RNAs (assayed with a
dense set of oligonucleotides) we see a rich, probably comprehensive set,
including many unpredicted transcripts. So what are the next steps?
Spatial effects seen for DNA-motifs at a few bp, hundreds, and thousands
of bp (for three separate reasons) can be found by automatable methods.
Time-series of molecular concentration data can be aligned by discrete
and/or interpolative dynamic programming. Components of regulatory
networks evident in time-series can be assessed by these independent
models. The components of decay as well as steady-state levels have been
modeled for complete RNA sets. These time series benefit from the sharp
specific transitions that can be achieved through conditional mutants and
drugs (chemical biology in general). Overarching questions remain as to
how we will systematize (automate) kinetic modeling and applications to a
point analogous with structural data modeling, all the while connecting with
issues of global quality of life.
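The alignment of time series by discrete dynamic programming mentioned
above can be sketched as follows. This is one common discrete form (often
called dynamic time warping) and is not necessarily the procedure used in
this work; the function name dtw_align, the absolute-difference cost, and
the example series are assumptions made for illustration, and the
interpolative variant is not shown.

```python
def dtw_align(a, b):
    """Align two numeric series by dynamic programming.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching a[i] to b[j], monotone in both indices.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both series
                                 cost[i - 1][j],      # advance a only
                                 cost[i][j - 1])      # advance b only
    # Trace back from the end to recover which points were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


# Hypothetical series: the second is the first delayed by one sample.
series_a = [0, 1, 2, 3, 2, 1]
series_b = [0, 0, 1, 2, 3, 2, 1]
total, pairs = dtw_align(series_a, series_b)
print(total, pairs)
```

The time and memory cost grows with the product of the two series lengths,
which is modest for time series of tens of points but would call for a
banded or windowed variant on much longer traces.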