With the advent of DNA microarrays, data
about the transcription rates of genes can be acquired far more
efficiently than ever before. A single
array experiment can measure the levels of thousands of mRNAs.
By measuring these levels under different experimental conditions,
one can observe the effects of different external conditions or
gene knockouts and inductions on the functioning of cells. By
measuring transcription in different tissue samples, one can discover
diagnostic tests for distinguishing normal tissue from neoplastic
tissue.

The results of m array experiments on
a set of n genes can be represented by an m × n matrix of numbers.
The (i, j) entry of the matrix
gives the transcription level of the jth gene in the ith experiment.
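As a minimal sketch of this representation, the matrix below stores experiments as rows and genes as columns, so a gene's profile across experiments is a column and an experiment's profile across genes is a row. The toy values, the helper names (shape, experiment_profile, gene_profile) and the plain-list storage are assumptions for illustration; note also that Python indexes from 0, whereas the text counts the ith experiment and jth gene from 1.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical toy data: m = 3 experiments (rows) x n = 4 genes (columns).
# The values are made up for illustration only.
A: Matrix = [
    [1.0, 2.0, 0.5, 3.0],  # experiment 0
    [1.2, 2.1, 0.4, 2.8],  # experiment 1
    [0.9, 4.0, 0.6, 1.0],  # experiment 2
]

def shape(a: Matrix) -> Tuple[int, int]:
    """Return (m, n): the number of experiments and the number of genes."""
    return len(a), len(a[0])

def experiment_profile(a: Matrix, i: int) -> List[float]:
    """Row i: transcription levels of all n genes in experiment i."""
    return list(a[i])

def gene_profile(a: Matrix, j: int) -> List[float]:
    """Column j: transcription levels of gene j across all m experiments."""
    return [row[j] for row in a]

m, n = shape(A)
print(m, n)               # 3 4
print(A[2][1])            # entry (2, 1): gene 1 in experiment 2 -> 4.0
print(gene_profile(A, 1)) # [2.0, 2.1, 4.0]
```

Clustering genes then amounts to grouping columns, and clustering experiments to grouping rows.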
The experiments may be performed on different tissue samples,
or on the same tissue sample or cell colony under different conditions,
such as changes in temperature, time, growth conditions, drug treatments,
or gene knockouts and inductions. A fundamental tool for mining
this data is clustering, which partitions the genes into
sets of coregulated genes or partitions the experiments into
sets of conditions with similar patterns of gene transcription.
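To make the clustering step concrete, here is a hedged sketch of one generic approach: genes whose profiles (columns) have Pearson correlation at least a threshold are linked, and clusters are the connected groups under single linkage. The choice of Pearson correlation, the 0.9 threshold, the function names and the toy data are all assumptions; this is not necessarily any of the approaches described in the talk.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cluster_genes(a: Matrix, threshold: float = 0.9) -> List[List[int]]:
    """Single-linkage clustering of genes (columns of a): two genes share a
    cluster if a chain of pairs with correlation >= threshold links them."""
    n = len(a[0])
    profiles = [[row[j] for row in a] for j in range(n)]
    parent = list(range(n))  # union-find forest over gene indices

    def find(u: int) -> int:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for j in range(n):
        for k in range(j + 1, n):
            if pearson(profiles[j], profiles[k]) >= threshold:
                parent[find(j)] = find(k)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for j in range(n):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())

# Toy data (made up): rows are 4 experiments, columns are 4 genes.
# Genes 0 and 1 rise together; genes 2 and 3 fall together.
A = [
    [1.0, 2.0, 4.0, 8.0],
    [2.0, 4.0, 3.0, 6.0],
    [3.0, 6.0, 2.0, 4.0],
    [4.0, 8.0, 1.0, 2.0],
]
print(cluster_genes(A))  # [[0, 1], [2, 3]]
```

Applying the same function to the transposed matrix would cluster experiments instead of genes.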
We will describe several different approaches to these clustering
problems. One can also go beyond clustering to look for more refined
patterns in the data; for example, certain sets of genes may behave
similarly under certain experimental conditions, even though they
are not coregulated under all conditions. We will describe some
approaches to discovering such patterns of conditional coregulation.
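One common way to score such a pattern, offered here only as an illustration and not as the method of the talk, is the mean squared residue used in some biclustering methods: for a chosen set of experiments (rows) and genes (columns), it measures how far the submatrix departs from a purely additive "row effect plus column effect" pattern. A score near 0 means the chosen genes move in parallel across the chosen experiments. The function name and toy data are assumptions.

```python
from typing import List, Sequence

Matrix = List[List[float]]

def mean_squared_residue(a: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue of the submatrix of a with the given rows
    (experiments) and columns (genes). It is 0 exactly when every entry
    equals a row effect plus a column effect."""
    sub = [[a[i][j] for j in cols] for i in rows]
    r, c = len(rows), len(cols)
    row_means = [sum(row) / c for row in sub]
    col_means = [sum(sub[p][q] for p in range(r)) / r for q in range(c)]
    overall = sum(row_means) / r
    total = 0.0
    for p in range(r):
        for q in range(c):
            residue = sub[p][q] - row_means[p] - col_means[q] + overall
            total += residue * residue
    return total / (r * c)

# Toy data (made up): rows are experiments, columns are genes.
# Gene 1 is exactly 2 units above gene 0 in experiments 0 and 1,
# but the two genes diverge in experiment 2.
A = [
    [1.0, 3.0, 5.0],  # experiment 0
    [2.0, 4.0, 0.0],  # experiment 1
    [7.0, 1.0, 2.0],  # experiment 2
]
print(mean_squared_residue(A, [0, 1], [0, 1]))     # 0.0: coherent on experiments 0, 1
print(mean_squared_residue(A, [0, 1, 2], [0, 1]))  # about 3.56 (= 32/9)
```

A search procedure would then look for large sets of genes and experiments whose score stays below some chosen threshold.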

One would like to use DNA microarrays
to discover the structure of the pathways that regulate gene expression
in cells. A pathway can be
regarded as a dynamical system whose state includes the abundances
of certain mRNAs and proteins, and whose inputs include the experimental
conditions described above. A variety of mathematical models have
been proposed for such pathways: the state variables can be treated
as either discrete or continuous, the dynamics can be deterministic,
nondeterministic or stochastic, and one can be interested either
in transient behavior or in steady-state behavior. We shall describe
some initial work on the design of efficient experiments for inferring
or verifying the structure of such pathways.
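As a hedged illustration of one of the model classes mentioned above (discrete state, deterministic dynamics, steady-state behavior), the sketch below simulates a hypothetical three-gene Boolean network with synchronous updates. The wiring is an assumption chosen for illustration: A is always on, A activates B, C represses B, and B activates C. A knockout, treated as an experimental input, forces a gene off. The wild-type network falls into a four-state oscillation, while knocking out C yields a single steady state, so comparing attractors with and without a perturbation is one way an experiment could help discriminate between candidate pathway structures.

```python
from typing import Dict, FrozenSet, List, Tuple

State = Tuple[bool, ...]  # (A, B, C): True means the gene is expressed
GENES = ("A", "B", "C")

def step(state: State, knockouts: FrozenSet[str] = frozenset()) -> State:
    """One synchronous update of a hypothetical 3-gene Boolean network.
    Knocked-out genes are forced to False regardless of their inputs."""
    a, b, c = state
    nxt: Dict[str, bool] = {
        "A": True,         # A is constitutively expressed (an assumption)
        "B": a and not c,  # A activates B, C represses B
        "C": b,            # B activates C
    }
    return tuple(False if g in knockouts else nxt[g] for g in GENES)

def attractor(start: State, knockouts: FrozenSet[str] = frozenset()) -> List[State]:
    """Iterate from start until a state repeats and return the cycle reached.
    A cycle of length 1 is a steady state (fixed point)."""
    first_seen: Dict[State, int] = {}
    trajectory: List[State] = []
    state = start
    while state not in first_seen:  # terminates: only 2**3 states exist
        first_seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, knockouts)
    return trajectory[first_seen[state]:]

start = (False, False, False)
print(attractor(start))                    # a 4-state oscillation
print(attractor(start, frozenset({"C"})))  # [(True, True, False)]: a fixed point
```

Synchronous updating is itself a modeling choice; asynchronous or stochastic update rules can give different attractors for the same wiring.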
This talk represents joint work with many colleagues at the University of Washington and other institutions in the Seattle area.