Cell 86: 521-529, 1996
The Role of the Genome Project in Determining Gene Function:
Insights From Model Organisms
George L. Gabor Miklos1
Gerald M. Rubin2
1 The Neurosciences Institute, 10640 John Jay
Hopkins Drive, San Diego, California 92121
2 Department of Molecular and Cell Biology, Howard Hughes Medical
Institute, University of California at Berkeley, Berkeley,
California 94720-3200
Introduction
Large amounts of data, from DNA sequences to on-line brain atlases,
are rapidly accumulating in public databases, and there is a
heightened expectation that the increasingly powerful computer
analyses of integrated databases will be sufficient to take us from
DNA sequence to biological function. To what extent is this likely
to be the case? We have examined this question by considering: gene
number in different evolutionary lineages; data derived from
mutagenesis and gene knockouts in Drosophila melanogaster,
Caenorhabditis elegans, Danio rerio, Mus musculus, Arabidopsis
thaliana and Saccharomyces cerevisiae; gene regulatory dynamics in
different systems; the utility of gene transfer methods that allow
precisely controlled misexpression of genes; and the extent to
which various processes are conserved between organisms in
different lineages.
Our analysis suggests that the information in databases will not,
by itself, be sufficient to determine biological function, but will
provide an important foundation for the design of appropriate
experiments. The application of transgenesis and other genetic
methods, in conjunction with total genome sequence and database
information on gene expression patterns, morphological changes
during development, and mutant phenotypes, should significantly
enhance our ability to unravel the
multilayered networks that control gene expression and
differentiation. This knowledge, which will only be rapidly
obtainable in the model organisms, will allow the reduction of most
of the approximately 70,000 individual genes encoded by the human
genome into a much smaller number of multicomponent, core processes
of known biochemical function.
Bacterial Gene Numbers Vary from Approximately 500 to 8000
and Overlap Those of Single-Celled Eukaryotes
The bacterial genome projects already provide excellent estimates
for the number and types of protein and RNA molecules made by free
living prokaryotes (Table 1). Their gene densities of approximately
1 gene per 1.1 kb suggest that bacterial gene numbers will vary
from the 473 identified genes in Mycoplasma genitalium (Fraser et
al., 1995) to an estimated 8000 or so in Myxococcus xanthus (Table
1).
Estimates from the S. cerevisiae genome project indicate that there
are roughly 5800 protein coding genes in the genome of this fungus
(Dujon, 1996). In the tiny free living alga Cyanidioschyzon merolae
(Maleszka, 1993), we have estimated that there will be
approximately 5000 genes if the gene density in this single celled
alga is similar to that in yeast. In the single celled protozoan
Oxytricha similis, there are about 12,000 genes (John and Miklos,
1988).
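The density-based estimates above amount to dividing genome size by the average amount of DNA per gene. The following Python sketch reproduces that arithmetic with the figures quoted in the text and in Table 1; the function name is hypothetical and chosen only for illustration.

```python
def genes_from_density(genome_size_mb: float, kb_per_gene: float) -> float:
    """Estimate gene number as genome size divided by average kb per gene."""
    if kb_per_gene <= 0:
        raise ValueError("kb_per_gene must be positive")
    return genome_size_mb * 1000.0 / kb_per_gene


# Bacterial density quoted in the text: about 1 gene per 1.1 kb.
m_xanthus = genes_from_density(9.45, 1.1)

# Yeast density implied by roughly 5800 genes in a 13.5 Mb genome.
yeast_kb_per_gene = 13.5 * 1000.0 / 5800
c_merolae = genes_from_density(11.7, yeast_kb_per_gene)

print(f"M. xanthus: about {m_xanthus:.0f} genes")
print(f"C. merolae: about {c_merolae:.0f} genes")
```

Both results land close to the rounded figures given in the text (8000 or so, and approximately 5000), which is as much agreement as a density extrapolation of this kind can be expected to give.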
Eukaryotes of Very Different Organizational Complexity,
such as Protozoa, Caenorhabditis and Drosophila, Have Similar Gene
Numbers in the 12,000 to 14,000 Range
In D. melanogaster, previous estimates of gene number range from
8,000 to 20,000 (Lewin, 1994; Nusslein-Volhard, 1994). We have
examined the available data and conclude that gene number in this
fly is closer to 12,000, a figure comparable to that in Oxytricha
and Caenorhabditis.
TABLE 1. Current Predictions of Approximate Gene Number and Genome Size in Organisms in Different Evolutionary Lineages.
| Lineage | Organism | Genes | Genome Size in Mb |
| Prokaryota | Mycoplasma genitalium | 473 | 0.58 |
| | Haemophilus influenzae | 1,760 | 1.83 |
| | Bacillus subtilis | 3,700 | 4.2 |
| | Escherichia coli | 4,100 | 4.7 |
| | Myxococcus xanthus | 8,000 | 9.45 |
| Fungi | Saccharomyces cerevisiae | 6,300 | 13.5 |
| Protoctista | Cyanidioschyzon merolae | 5,400 | 11.7 |
| | Oxytricha similis | 12,000 | 600 |
| Arthropoda | Drosophila melanogaster | 12,000 | 165 |
| Nematoda | Caenorhabditis elegans | 14,000 | 100 |
| Mollusca | Loligo pealii | > 35,000 | 2,700 |
| Chordata | Ciona intestinalis | N | 165 |
| | Fugu rubripes | 70,000 | 400 |
| | Danio rerio | N | 1,900 |
| | Mus musculus | 70,000 | 3,300 |
| | Homo sapiens | 70,000 | 400 |
| Plantae | Nicotiana tabacum | 43,000 | 4,500 |
| | Arabidopsis thaliana | 16,000-33,000 | 70-145 |
N, not available. Data from Kamalay and Goldberg, 1980; Capano et
al., 1986; John and Miklos, 1988; Miklos, 1993a; Brenner et
al., 1993; Gibson and Somerville, 1993; Fleischmann et
al., 1995; Fraser et al., 1995; Collins, 1995; Waterston and
Sulston, 1995; Dujon, 1996.
The haploid genome of D. melanogaster consists of two compartments,
a heterochromatic gene-poor 50 Mb and a euchromatic gene-rich 115
Mb. The 50 Mb houses no more than 25 essential loci and consists
largely of satellite DNA sequences, ribosomal genes, and
transposable elements (John and Miklos, 1988). We have estimated
the coding capacity of the 115 Mb compartment in three ways. First,
we determined the lengths of transcription units by analyzing cDNAs
from the literature using 278 cases where the cDNA could be aligned
with genomic DNA. These transcription units come from nearly every
division of the genome and have been isolated in chemical and
ionizing radiation mutagenesis screens, by insertion of
transposons, in chromosomal walks, by molecular sequence
similarity, and in mutagenesis screens designed to isolate
behavioral mutants as well as mutants with altered brain anatomy. A
transcribed genomic sequence that gives rise to one or more
proteins with shared exons was scored as one transcription unit,
and its length was taken from the position of RNA initiation to
that of the site of polyadenylation, as measured on the underlying
genomic sequence. Multiple transcripts arising from alternative
initiation or polyadenylation sites, or alternative RNA splicing at
a single locus were not considered as multiple genes but as
variants of a single transcription unit.
When placed end to end, the 278 transcription units used for
analysis occupied 2.4 Mb of genomic DNA. Assuming this ratio
applies generally, the 115 Mb euchromatic genome could accommodate
13,200 transcription units. This is an overestimate, since the 115
Mb portion contains at least 15 Mb of mobile elements, and since we
have not allowed for any regulatory DNA sequences between
transcription units or included transcription units in excess of
100 kb. Our second estimate utilizes only those examples in which a
minimum of two transcription units are available in any contiguous
stretch of genomic DNA and hence includes the DNA between
transcription units. This yields 158 transcription units embedded
in 1.7 Mb of DNA, approximately 11,000 transcription units per
genome. Our third estimate is a reevaluation of polysomal mRNA
hybridization data which was originally based on an average mRNA
size of 1250 nt (Levy and Manning, 1981). The appropriate mRNA
length estimated from current molecular data is 2100 nt (Maroni,
1994, 1996), leading to a revised estimate of 10,000 transcription
units. Since the two most reliable estimates based on cloned
material vary from 11,000 to about 13,000 transcription units, we
take 12,000 as a working figure for the number of protein coding
genes in D. melanogaster.
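The first two estimates are linear extrapolations from a sampled stretch of DNA to the 115 Mb euchromatic compartment. A minimal Python sketch of that calculation, using only the counts given above, is shown here; the function and variable names are hypothetical.

```python
EUCHROMATIN_MB = 115.0  # gene-rich compartment, as stated in the text


def extrapolate_units(units_sampled: int, sampled_mb: float,
                      target_mb: float = EUCHROMATIN_MB) -> float:
    """Scale the transcription units found in a sample up to target_mb."""
    if sampled_mb <= 0:
        raise ValueError("sampled_mb must be positive")
    return units_sampled / sampled_mb * target_mb


# Estimate 1: 278 transcription units placed end to end occupy 2.4 Mb.
first_estimate = extrapolate_units(278, 2.4)

# Estimate 2: 158 units, including intergenic DNA, embedded in 1.7 Mb.
second_estimate = extrapolate_units(158, 1.7)

print(f"end-to-end estimate:    {first_estimate:.0f}")
print(f"with intergenic DNA:    {second_estimate:.0f}")
print(f"midpoint of the two:    {(first_estimate + second_estimate) / 2:.0f}")
```

With the rounded inputs printed here, the first figure comes out slightly above the 13,200 quoted in the text, presumably because 2.4 Mb is itself a rounded value; the second rounds to the quoted 11,000, and the midpoint sits near the working figure of 12,000.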
A comparison with other organisms reveals that a unicellular
protozoan, a nematode worm, and a fly develop and function with
12,000-14,000 genes (Table 1). These three examples illustrate that
there can be large differences in morphological complexity among
different organisms that have similar numbers of genes. Gene number
per se is not likely to provide a useful measure of biological
complexity. The increase in the average amount of DNA occupied by a
genetic unit from 1 kb in bacteria, to 2 kb in yeast, to 10 kb in
flies is likely to reflect an increased requirement for
cis-acting regulatory elements in metazoan organisms.
The Number of Core Biochemical Pathways and Mechanisms is
Likely to be Similar in All Metazoa
Polysomal mRNA data indicate that the squid Loligo pealii has at
least 35,000 genes (Capano et al., 1986), and our re-evaluation of
data from tetraploid tobacco (Kamalay and Goldberg, 1980), in the
light of cloned mRNA lengths (Maroni, 1996), shows that this plant
has approximately 43,000 genes. Thus, excluding vertebrates, the
variation in gene number in multicellular eukaryotes
currently ranges from approximately 12,000 to about 43,000, assuming
the squid and tobacco estimates, which are based solely on a single
method, are accurate.
Human and mouse genomes are thought to have approximately 70,000
genes (Antequera and Bird, 1993; Collins, 1995). Although over
270,000 human expressed sequence tags (ESTs) were available in
public databases as of October, 1995, it is still unclear how many
genes have been identified by this methodology (Jordan, 1996). On
the basis of presently available data, the human genome could have
fewer than 50,000 or more than 100,000 genes. This uncertainty
is unlikely to be resolved until a large sample of the genome has
been sequenced so that the fraction of genes represented in the EST
databases can be assessed.
Why are mammals likely to have four to six times as many genes as
Caenorhabditis and Drosophila? One possibility is that a
significant component of the mammalian increase has occurred by
polyploidization, a common evolutionary feature in most unicellular
and metazoan lineages (John and Miklos, 1988). The evolution of
mammalian genomes is thought to include at least two whole genome
duplications of an ancestral genome (Holland et al., 1994), as well
as duplication of sub-chromosomal segments together with extensive
gene duplication that has given rise to many large multigene
families (Lundin, 1993). If the genome projects verify the
underlying octoploid nature of the human and mouse genomes, then
the basic vertebrate gene number may be similar to that of the fly
and worm, about 12,000 to 14,000 genes. Interestingly, the
urochordate Ciona intestinalis has a genome size and a repetitive
DNA content similar to that of D. melanogaster (John and
Miklos, 1988). If this were to be indicative of a basic chordate
genome, then the number of core biochemical pathways and mechanisms
is unlikely to be greatly different in flies, nematode worms, early
chordates and humans. The duplicated pathways in mammals are,
however, likely to have adopted specialized expression patterns and
biological functions.
How widespread is duplication at the genomic level? Analysis of
Haemophilus influenzae reveals that 30% of its 1760 genes are
essentially identical duplication products (Brenner et al., 1995).
Estimates from Escherichia coli indicate that 46% of its 4100 genes
are recognizable as gene duplicates (Koonin et al., 1995). In
yeast, the published genomic sequences show that at least 14% of
its 5800 genes are clear duplicates. In worms, flies, mice and
humans, there are insufficient data as yet to determine what
proportion of genes are duplication products. The majority of genes
in the mouse and human genomes exist as multigene families, some of
whose memberships are in the hundreds to thousands. It is estimated
that mammals have 2000 or so protein kinases and perhaps as many as
1000 phosphatases (Hunter, 1995). This compares with an estimated
350 protein kinases and 80 phosphatases in the worm (Hodgkin et
al., 1995). However, if mammalian genomes are minimally octoploid,
then a substantial proportion of the mouse and human genomes will,
initially at least, have consisted of duplication products.
Anecdotal data on an increasing number of genes support this view:
Drosophila has one copy each of the Ras, Raf and Notch genes, as
well as of the genes of the Hox cluster, while vertebrates have
three or more of each of these genes.
In multicellular organisms, functional duplicate copies of a gene
can exist in a genome, but if their expression patterns do not
overlap, their products are unable to compensate for each other if
either gene is mutated. The information gathered in databases will
provide an essential guide to analyzing the extent of
potential compensation during the life cycle of an organism by
providing detailed information on the sites of expression of each
gene.
In Yeast, Worms, Flies, and Mice, Only About 1 in 3 Genes
is Essential for Viability
The consequences of some genomic perturbations cannot be
compensated for by normal epigenetic processes and result in the
death of the organism prior to adulthood. To determine the extent
of compensation, we first summarize data on Drosophila genes whose
inactivation leads to lethality, and then compare the fly data with
those from other organisms.
The number of lethal loci in the Drosophila genome is thought to be
about 5000 (Nusslein-Volhard, 1994; Lewin, 1994), but new data
allow us to refine this figure downwards. We have evaluated the
published data from 27 different chromosomal regions that have been
subjected to extensive mutagenesis. From this large sample
comprising a quarter of the fly genome we estimate that there are
approximately 3600 lethal loci in a Drosophila genome of 12,000
genes (Table 2).
TABLE 2. Frequencies of Lethal Loci in Different Regions of the Drosophila Genome Expressed in Terms of the Number of Polytene Bands in the Mutagenized Interval
| Chromosome | Number of Bands Analyzed | Number of Lethal Loci | Lethal Loci per Band | Extrapolation of Lethal Loci per Genome |
| X | 450 | 298 | 0.66 | 3350 |
| 2 | 415 | 267 | 0.64 | 3260 |
| 3 | 343 | 235 | 0.69 | 3470 |
| 4 | 50 | 34 | 0.68 | 3440 |
| Total | 1253 | 836 | 0.67 | 3380 |
Most Intensively Studied Regions on the X Chromosome
The 27 individual chromosomal intervals analyzed, and the references
on which these estimates are based, are available from G.L.G.M. or
G.M.R.
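The per-genome figures in Table 2 follow from dividing the lethal loci recovered by the polytene bands analyzed and scaling the result to the whole genome. The Python sketch below illustrates this under one loudly stated assumption: the genome-wide band count of 5060 is not given in the text and has been back-calculated here from the table's own rows. The names are hypothetical.

```python
# ASSUMPTION: total polytene band count is not stated in the text; 5060 is
# back-calculated from Table 2 (extrapolated loci divided by loci per band).
ASSUMED_TOTAL_BANDS = 5060

# chromosome -> (bands analyzed, lethal loci recovered), from Table 2
TABLE_2 = {
    "X": (450, 298),
    "2": (415, 267),
    "3": (343, 235),
    "4": (50, 34),
}


def lethal_loci_per_genome(bands: int, lethal_loci: int,
                           total_bands: int = ASSUMED_TOTAL_BANDS) -> float:
    """Scale the lethal loci found per band up to the whole genome."""
    if bands <= 0:
        raise ValueError("bands must be positive")
    return lethal_loci / bands * total_bands


for chromosome, (bands, loci) in TABLE_2.items():
    per_band = loci / bands
    estimate = lethal_loci_per_genome(bands, loci)
    print(f"{chromosome}: {per_band:.2f} loci per band, about {estimate:.0f} per genome")
```

Under that assumed band count, each row agrees with the tabulated extrapolation to within rounding to the nearest ten.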
The estimates for Caenorhabditis range from 2,900 to 3,500 lethal
loci in a genome of approximately 14,000 genes (Table 3). These
estimates are based on extrapolations from three regions of the
worm genome that together constitute about 8% of the genetic map
(Clark et al., 1988; Howell and Rose, 1990; Johnsen and Baillie,
1991).
TABLE 3. Estimation of Lethal Loci in Different Regions of the C. elegans Genome.
| Chromosome Region | Length in Map Units | Loci Found | Predicted Number of Loci | Extrapolated Number per Genome |
| unc-22(sDf2) Chromosome 4 | 2.2 | 31 | 48 | 3500 |
| hDf6 Chromosome 1 | 1.5 | 19 | 25 | 3300 |
| eT1(III;IV) Chromosome 5 | 23.0 | 101 | 120 | 2850 |
In S. cerevisiae, approximately 900 genes out of 5800 are cell
lethals, and an additional 900 act to stop cell cycle processes or
cause impairment of growth on specific media (Burns et al., 1994).
Hence about 1800 genes in toto are equivalent to the lethal class
of multicellular organisms (Table 4).
TABLE 4. The Estimated Number of Transcription Units and Lethal Loci in Different Organisms.
| Organism | Transcription Units | Lethal Loci |
| S. cerevisiae | 6,300 | 1,900 |
| C. elegans | 14,000 | 2,700-3,500 |
| D. melanogaster | 12,000 | 3,600 |
| A. thaliana | 25,000 | 500 |
| D. rerio | N | 5,000 |
| F. rubripes | 70,000 | N |
| M. musculus | 70,000 | 5,000-26,000 |
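The "about 1 in 3" figure in the section heading can be checked directly against Table 4 by dividing lethal loci by transcription units. The short Python sketch below does this for the organisms where both numbers are available; the data structure and function name are hypothetical, and the values are copied from the table.

```python
# organism -> (transcription units, (low, high) estimate of lethal loci)
TABLE_4 = {
    "S. cerevisiae": (6300, (1900, 1900)),
    "C. elegans": (14000, (2700, 3500)),
    "D. melanogaster": (12000, (3600, 3600)),
    "A. thaliana": (25000, (500, 500)),
    "M. musculus": (70000, (5000, 26000)),
}


def lethal_fraction_range(units: int, lethal_range: tuple) -> tuple:
    """Return the (low, high) fraction of transcription units that are lethal loci."""
    low, high = lethal_range
    return low / units, high / units


for organism, (units, lethal_range) in TABLE_4.items():
    low, high = lethal_fraction_range(units, lethal_range)
    print(f"{organism}: {low:.0%} to {high:.0%} of genes are lethal loci")
```

Yeast and the fly come out near 30%, the worm somewhat lower, the mouse range is too wide to be informative, and Arabidopsis is the clear outlier discussed in the next paragraph.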
In Arabidopsis there are approximately 500 lethal genes (Meinke,
1994) in a genome that is reported to house about 25,000 genes
(Goodman et al., 1995). Whether this finding reflects a peculiarity of
plant reproductive processes and embryonic development, unreliable
current estimates of the number of lethal genes or of total gene
number, or both, awaits future analysis.
In the zebrafish it is estimated that there are roughly 5,000
lethal genes (Haffter et al., submitted), although the total number
of genes in the genome is not known. The only estimate of gene
number in a teleost is from the pufferfish Fugu rubripes, which is
claimed to have as many genes as humans (Brenner et al., 1993),
although this estimate is based on a sample of only 0.1% of the
genome.
In Mus, the available data on lethal loci largely stem from three
sources: from the 263 gene knockouts summarized by Brandon et al.
(1995); from small promoter trap analyses, such as that of
Friedrich and Soriano (1991); and from a mutagenesis analysis of
the t region of chromosome 17 (Dove, 1987). Of the 263 knockouts,
approximately 25 percent are embryonic lethals. Taken at face
value, these figures indicate that there would be approximately
18,000 lethal loci if the mouse genome houses 70,000 genes.
However, this is a highly selected sample of genes and the extent
to which it is a reliable guide to the whole genome is not known.
In the promoter trap study, 9 out of 24 knockout strains yield
homozygous embryonic lethals, indicating that there would be
approximately 26,000 lethal loci if these figures are used. In the
genetic analysis of the t region, 17 lethal loci were recovered and
it is on this minute sample that the figure of 5,000 to 10,000
lethal loci in the mouse genome is based (Dove 1987). It is clear
that an estimate of the number of lethal loci in the mouse genome
is uncertain, and presently ranges from 5,000 to 26,000.
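The two larger mouse figures are simple proportional extrapolations from a sample to an assumed 70,000 genes. A minimal Python sketch of that arithmetic follows; the names are hypothetical, and the 70,000 gene total is the working figure used in the text rather than a measured value.

```python
MOUSE_GENES = 70_000  # working gene number used in the text


def extrapolate_lethals(lethal_fraction: float,
                        total_genes: int = MOUSE_GENES) -> float:
    """Scale the lethal fraction seen in a sample up to the whole genome."""
    if not 0.0 <= lethal_fraction <= 1.0:
        raise ValueError("lethal_fraction must lie between 0 and 1")
    return lethal_fraction * total_genes


# Targeted knockouts: approximately 25 percent are embryonic lethals.
from_knockouts = extrapolate_lethals(0.25)

# Promoter trap study: 9 of 24 strains are homozygous embryonic lethals.
from_promoter_traps = extrapolate_lethals(9 / 24)

print(f"knockout sample:      {from_knockouts:.0f}")
print(f"promoter trap sample: {from_promoter_traps:.0f}")
```

These give 17,500 and 26,250, which the text rounds to approximately 18,000 and 26,000. The spread between them reflects the small and selected samples rather than any refinement of method.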
The Phenotypic Consequences of Gene Inactivation Depend on
Genetic Background
The interpretation of gene inactivation, deletion, or knockout data
needs to be treated with caution (Erickson, 1993; Weintraub, 1993;
Thomas, 1993; Crossin, 1994; Pickett and Meeks-Wagner, 1995), since
detecting subtle phenotypic alterations under laboratory conditions
is difficult. In addition, the current methods used in evaluating
function are often inadequate, and small reductions in fitness are
usually not measured. In yeast, for example, the total deletion of
a gene encoding a membrane protein that is a probable acetic acid exit pump
usually has little phenotypic effect. However, the cells die when
grown on glucose at low pH and when perturbed with acetic acid
(Oliver, 1996). In multicellular organisms, it is not always
possible to comprehend fully the phenotypic consequences of a
knockout or gene perturbation. In Drosophila, for example,
second-site mutations often partially suppress the phenotype of a
gene perturbation, and these modifiers accumulate in cultures of
Drosophila maintained as homozygous stocks (Ashburner, 1989). In
humans it is clear that simple single gene diseases are rare (van
Heyningen, 1994; Mulvihill, 1995; Brandon et al., 1995). As
described below, to fully understand the phenotypic changes caused
by mutation of a gene requires knowledge of the different cell
types, developmental stages and cellular processes in which it
functions as well as of the compensatory changes that may occur to
allow that function to be accomplished in a different way.
A gene knockout can result in different phenotypes when it is
placed in different genetic backgrounds. The mouse epidermal
growth factor receptor knockout causes preimplantation death on a
CF-1 background and mid-gestation death on a 129/Sv
background, whereas on a CD-1 background the mutant mice live for 3 weeks
or so (Threadgill et al., 1995). Similarly, the mouse activin/inhibin
βB subunit knockout has an eye
defect that is not seen in a 129Sv background, but is penetrant in
both 129Sv/C57BL/6 and 129Sv/BALB/c backgrounds (Vassalli
et al., 1994). Different genetic backgrounds can allow or eliminate
intestinal tumors in mice, and in humans there is variation between
different members of the same family inheriting the APC mutation,
which predisposes them to colon cancer (Dietrich et al., 1993). The
human phenotypic spectrum can differ from that of the mouse for the
same gene, perturbations of the ret receptor tyrosine
kinase being a good example (van Heyningen, 1994). All of these
data draw attention to the compensatory resiliency that is known to
occur in developmental networks in different organisms (Crossin,
1994; Pickett and Meeks-Wagner, 1995). One of the challenging
future research avenues is to examine single and multiple gene
inactivations in different genetic backgrounds and to map, isolate
and characterize the major contributors to the variation (Lander
and Schork, 1994).
Nearly All Gene Products are Expressed and Utilized at
Multiple Places and Times during Development
The classical genetic studies in Drosophila and Mus revealed that
certain genes affected many aspects of the phenotype, and these
were termed pleiotropic. Indeed Gruneberg (1952) first pointed out
for the mouse that all genes that had been studied with any care
had pleiotropic effects. In Drosophila, pleiotropy is the rule
rather than the exception (Greenspan et al., 1996). In molecular
terms, pleiotropy can arise if a protein (or RNA) is functionally
required in different places, at different times, or both. The
expression of the Notch transmembrane protein of Drosophila is one
example. It is involved with different ligands in cell-cell
interactions in different tissues in a variety of regulative events
(Artavanis-Tsakonas et al., 1995).
A large scale analysis of functional requirements has been
undertaken in the Drosophila germ line and the compound eye
(Perrimon et al., 1989; Thaker and Kankel, 1992). The data suggest
that 75% of the 3600 lethal loci in the fly genome are functionally
required during oogenesis, since the absence of their products
results either in cell death or in abnormal oogenesis. An analysis
of the assembly and neural connectivity of the developing eye
yields a similar result: 70% of the 3600 lethal loci are predicted
to be functionally required for the development of the eye (Thaker
and Kankel, 1992). If the pleiotropy of lethal loci is not
substantially different from that of non-lethal loci, then in
excess of 70% of the genes in the genome would be used in the
construction of each of these organ systems.
A further indication of potential pleiotropy emerges from studies
of gene expression that almost always reveal expression of a gene
in more than one place or at more than one time. In a study of
nearly 600 randomly selected enhancer trap lines found to be
expressed in the Drosophila larval brain, only two lines gave
staining exclusively in the nervous system. Most lines revealed
expression outside of the central nervous system with little tissue
or organ specificity (Datta et al., 1993). In a similar study of
nearly 20,000 enhancer trap lines, over 15% were expressed early
during development of the retina, but only one was found to be
limited to the visual system (U. Gaul, L. Higgins and G. Rubin,
unpublished data). Furthermore, in studies of reporter gene
expression in over 3700 enhancer trap lines during embryogenesis,
there was extensive expression at different times and at different
places (Bier et al., 1989). These are large samples of localized
genomic activity and it is clear that nearly all Drosophila genes
are expressed in at least two different places or times during
development. However, it is not safe to assume that whenever a
protein is expressed in a cell, it is expressed there for
functional reasons; aspects of an expression pattern may simply
reflect the default outcome of the regulatory networks in which
that gene happens to be embedded.
Databases of Gene Structure and Expression Patterns will be
Critical but Insufficient to Decipher Gene-Regulatory
Networks
One approach to the functional evaluation of regulatory elements is
to identify evolutionarily conserved regulatory regions by
interspecies comparisons, in combination with transgenic analyses.
For example, DNA sequence comparisons of the promoter regions of
four different rhodopsin genes from D. melanogaster and D. virilis
reveal an interchangeable conserved set of core sequences with
additional upstream sequences conferring cell type specificity.
Detailed mutagenesis studies of 31 regulatory regions reveal that 7
of the 8 conserved sequences are compromised in their functions
when mutagenized, whereas none of the 23 nonconserved regions
perturb normal function when altered (Fortini and Rubin, 1990). It
is likely that computer analyses between different species will
reveal a proportion of conserved core regulatory sequences for
genes. The extent to which this holds within and between phyla
awaits experimental analyses. It is already clear that such
comparisons between Mus and Homo will be a preferred method for
defining the control regions of mammalian genes (Ravetch et al.,
1980) and provide a strong argument for syntenic sequencing of the
human and mouse genomes.
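The rhodopsin promoter counts quoted above (7 of 8 conserved sequences compromised, 0 of 23 nonconserved regions) can be put through a one-sided Fisher exact test to show how unlikely such a split would be by chance. This test is not part of the cited study; it is offered here only as an illustrative sketch, and the function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k: int, population: int, successes: int, draws: int) -> float:
    """Probability of exactly k successes in draws taken without replacement."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def one_sided_fisher(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for a 2x2 table.

    a: conserved and compromised      b: conserved and unaffected
    c: nonconserved and compromised   d: nonconserved and unaffected
    """
    population = a + b + c + d
    conserved = a + b
    compromised = a + c
    k_max = min(conserved, compromised)
    # Sum the probabilities of tables at least as extreme as the observed one.
    return sum(hypergeom_pmf(k, population, conserved, compromised)
               for k in range(a, k_max + 1))


p_value = one_sided_fisher(7, 1, 0, 23)
print(f"one-sided p = {p_value:.2e}")
```

The resulting probability is on the order of a few in a million, consistent with the text's reading that sequence conservation is a strong predictor of regulatory function in this data set.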
To what extent will the knowledge of all the regulatory components
during development provide information on the strengths of
molecular interactions and the thresholds which determine normal
developmental or physiological responses? We turn to this issue,
which relates to networks, thresholds, and non-linear responses in
biological systems (Edelman, 1987, 1988; Weintraub, 1993).
Many biological systems function synergistically rather than as
on/off switches. Protein tyrosine phosphatases, for example, act
synergistically with protein kinases to produce particular
physiological responses (Fischer, 1993; Cool and Fischer, 1993). In
addition, there are threshold effects when transcription factors
bind combinatorially to other proteins, as well as to high and low
affinity DNA sites, or when the spacing between DNA binding sites
is altered (Gray et al., 1995). For example, high levels of the
Dorsal protein activate the twist and snail
genes, whereas low levels repress zerknüllt and
decapentaplegic (Jiang and Levine, 1993). Multiple
protein-protein interactions also have significant effects on
target affinities (Struhl, 1996). In general, synergistic
interactions can lead to large responses following small changes in
the concentrations of transcriptional components, an effect that is
also produced by phosphorylation of transcription factors.
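The claim that synergy turns small concentration changes into large responses can be illustrated with a Hill function, a standard textbook model of cooperative binding. The choice of this model, the parameter values, and the function name are all assumptions of this sketch; none is taken from the studies cited above.

```python
def hill_response(concentration: float, half_max: float = 1.0,
                  hill_coefficient: float = 1.0) -> float:
    """Fractional response (0 to 1) of a Hill-type cooperative system."""
    if concentration < 0 or half_max <= 0:
        raise ValueError("concentration must be >= 0 and half_max > 0")
    c_n = concentration ** hill_coefficient
    return c_n / (half_max ** hill_coefficient + c_n)


# Compare a non-cooperative system (n = 1) with a cooperative one (n = 4)
# as the factor concentration rises from 0.8 to 1.25 times the half-maximum.
for n in (1, 4):
    low = hill_response(0.8, hill_coefficient=n)
    high = hill_response(1.25, hill_coefficient=n)
    print(f"n = {n}: response rises from {low:.2f} to {high:.2f} "
          f"({high / low:.2f}-fold)")
```

For the same modest change in concentration, the cooperative case shifts from a minority to a majority response, whereas the non-cooperative case barely moves. Real transcription complexes are far more intricate, which is the point the following paragraph makes about what databases alone cannot reveal.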
The order in which proteins are assembled into a multisubunit
transcription complex is important, as are the rate-limiting steps
in assembly and the physiologically relevant protein-protein
interactions (Tjian and Maniatis, 1994; Struhl, 1996; Goodrich et
al., 1996). However, neither the order nor the rate of assembly can
be derived by computer analysis from the knowledge of the number
and type of protein components active in a particular cell. The
outputs of multisubunit protein complexes, be they transcription
complexes or phosphorylated receptor-docking protein complexes, are
nonlinear. Insights into their nature cannot be extracted directly
from any combination of databases because they are not an explicit
property of the information itself, but of time-dependent
combinatorial interactions that must be analyzed across many
levels. To obtain insights into these time-dependent interactions,
thresholds and networks, particularly during development, will
require transgenic organisms in which precise molecular alterations
have been engineered.
Core Cellular Processes and Pathways are Largely Conserved
among the Model Organisms
The problem of understanding developmental processes in different
organisms is compounded by the finding that, on the one hand, there
are highly conserved genes and gene networks in distantly related
organisms, yet on the other hand some genes and gene networks occur
in one lineage but are absent from another. For example, the
bacterial genome projects reveal that H. influenzae has 68 genes
for amino acid biosynthesis whereas Mycoplasma genitalium has only
1. In addition, the majority of genes in the archaebacterium
Methanococcus jannaschii are claimed to have no equivalents in
other organisms (Holden, 1996). In S. cerevisiae, over 30% of the
genes as yet have no relatives in any other organism (Dujon, 1996).
Furthermore, we still have little idea how many of the genes that
are present in vertebrate, invertebrate, fungal, plant and
Protoctistan genomes are unique to a lineage. Some major classes of
genes, however, are clear signatures for particular lineages. The
immunoglobulin genes of the vertebrate immune system are not found
in the yeast, fly or worm genomes. Collagens are not found in
unicellular eukaryotes, and receptor tyrosine kinases appear to be
a metazoan invention (Hunter, 1994).
On the other hand, many thousands of proteins, with varying degrees
of sequence similarity, are common to many lineages, and these
proteins make up much of the cellular machinery. Conservation of
function also occurs at higher levels. In many cases not only
individual protein domains and proteins, but entire multisubunit
complexes and biochemical pathways are conserved. In some cases,
the way in which these complexes and pathways are utilized in the
development and physiology of the organism is also conserved. For
example, it is known that intracellular protein transport in yeast
and synaptic vesicle release in neurons have conserved protein
components (Rothman, 1994). In signaling pathways such as those
involving the Ras and Notch cascades, many of the protein
components are conserved between yeasts, flies, worms and humans
(Wassarman et al., 1995; Artavanis-Tsakonas et al., 1995). The CREB
transcription factor has been implicated in the cAMP-PKA pathway
involved in synaptic plasticity and the formation of long term
memory processes in the Mollusca, Arthropoda and Vertebrata
(Greenspan, 1995; Deisseroth et al., 1996). The use of similar
cell adhesion molecules by Drosophila, Caenorhabditis, and Gallus
gallus provides evidence for phylogenetically conserved mechanisms
of growth cone guidance of neurons (Goodman, 1996). Examples of
apparent conservation even extend to processes that had not been
thought to have a shared ancestor, such as vertebrate and
invertebrate limb formation (Shubin et al., 1996).
Assessing conservation of function is much more difficult than
assessing structural conservation. For structure, the different
genome projects will provide the absolute basis on which core
components such as protein domains, proteins, and multisubunit
complexes, can be compared in different evolutionary lineages.
However, to assess functional conservation one must determine the
function of a protein or pathway in more than one organism. As we
argue, obtaining the requisite knowledge of gene networks and
regulatory elements will require sophisticated genetic and
transgenic experimentation that is now only possible in a few
organisms. One important task will be to determine the extent to
which the novel use of conserved core processes in a given lineage,
as opposed to the invention of new molecular processes, has
contributed to producing morphological and biochemical novelties
(for further discussion see Miklos, 1993a, 1993b; Miklos et al.,
1994; Miklos and Campbell, 1994). In the Chordate lineage alone,
these novelties include the immune system, the presence of myelin
sheaths, electroreception and infra-red vision. The genome project
and transgenic data will not only help to determine the extent to
which functional interchangeability at the gene level is possible
among different organisms, but will also allow better choices to be
made about what genes to use for such interspecies transfers.
Analysis of Loss-of-Function Mutations Needs to Be
Complemented with Studies of the Effects of Gene
Misexpression
Much of the knowledge of developmental processes in the fly, worm,
mouse and zebrafish, and of the cell biology of yeast, has been
obtained via loss-of-function perturbations (Nusslein-Volhard,
1994; Mullins et al., 1994; Burns et al., 1994; Spradling et al.,
1995; Brandon et al., 1995). The information gained from careful
analysis of loss-of-function phenotypes has proven to be valuable
in elucidating complex genetic pathways such as the yeast cell
cycle (Hartwell, 1991) and early pattern formation in the
Drosophila embryo (Nusslein-Volhard and Wieschaus, 1980).
Nevertheless, the loss-of-function approach quickly reaches a
pragmatic limit for several reasons. First, the majority of genes
have no easily assayable loss-of-function phenotype. Second, even
when a phenotype is observed, it only reflects that part of the
function of a gene that cannot be compensated by other genes and
pathways. In many cases this will represent only a small fraction
of the function of a gene in the organism. Third, pleiotropy of
gene function complicates analysis. For example, it is difficult to
examine the role of a gene in a cellular process if its mutation
arrests cell proliferation and thereby prevents the generation of a
population of homozygous mutant cells. If a mutation results in
embryonic lethality, it is difficult to study the role of that gene
in the formation of an adult organ, although it is sometimes
possible to use temperature-sensitive mutations to surmount these
difficulties. The use of site-specific recombination systems in
transgenic animals offers a general approach to generating
lineage-specific mutations. These approaches are being exploited in
the mouse by the use of the bacteriophage P1 cre-loxP system (Gu et
al., 1994) and in the fly by the yeast FLP-FRT system (Xu and Rubin,
1993).
Spatially and temporally targeted misexpression of individual genes
provides an alternative way to perturb gene-regulatory networks. In
Drosophila, this can be achieved using the GAL4-UAS system, in which
an enhancer trap vector that expresses the yeast transcriptional
activator GAL4 has been mobilized to generate hundreds of lines that
drive GAL4 expression from a large number of genomic enhancers (Brand
and Perrimon, 1993). Each of these can be used to activate
specifically a target gene of choice. The GAL4 line whose individuals
exhibit the experimentally desired expression pattern is then crossed
to UAS target gene-bearing individuals, and the gene is activated only
in those cells where GAL4 is expressed. The target gene can come from
any organism, can be a synthetic combination of domains, can encode a
protein with either unregulated or dominant negative (Herskowitz,
1987) function, or can code for a site-specific recombinase. In this
way, and without interfering
with the developmental processes leading to adult structures, gene
products from any organism or synthetic source can be expressed in
specific parts of the fly, such as the nervous system, as well as
targeted to different subcellular locations such as synapses
(Callahan and Thomas, 1994). As with all such modifications, the
difficult task is to make sense of the organismal behaviors
following genomic changes (Ferveur et al., 1995).
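As a purely illustrative aid, the logic of a GAL4-UAS cross can be
sketched as a toy model in Python. The tissue names, the driver
pattern, and the function name below are hypothetical and are not
drawn from any of the cited studies. The sketch captures only the
rule that a UAS-linked target gene is activated in those progeny
cells where GAL4 is expressed; it ignores expression levels, timing,
and all other biology.

```python
# Toy model of a GAL4-UAS cross (illustrative assumption, not a published method).
# All tissue and gene names here are hypothetical placeholders.

def cross_gal4_to_uas(gal4_tissues, uas_target_gene, all_tissues):
    """Map each tissue to the target gene if GAL4 is expressed there, else None."""
    return {
        tissue: (uas_target_gene if tissue in gal4_tissues else None)
        for tissue in all_tissues
    }

# Hypothetical tissues of the progeny and a hypothetical driver line
# whose enhancer trap expresses GAL4 only in the nervous system.
tissues = ["nervous system", "wing disc", "leg disc", "gut"]
driver_pattern = {"nervous system"}

expression = cross_gal4_to_uas(driver_pattern, "target-gene", tissues)

for tissue, product in expression.items():
    print(f"{tissue}: {product or 'not expressed'}")
```

Swapping in a different driver pattern while keeping the same target
gene mirrors the experimental design choice described above: one UAS
line can be combined with many GAL4 lines.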
Another approach to generating controlled misexpression relies on
site-specific recombination systems to remove a transcriptional
terminator that separates a gene from its promoter (Struhl and
Basler, 1993). For example, misexpressing the decapentaplegic (dpp)
gene in the region of the developing fly leg where the wingless (wg)
product is made produces a secondary proximal-distal axis
(Diaz-Benjumea et al., 1994). While the expression patterns of dpp and
wg suggested that interactions between dpp-expressing and
wg-expressing cells might induce the proximal-distal axis, this was
difficult to confirm by analysis of loss-of-function mutations. Both
dpp and wg have early and essential roles in development and also
affect cell proliferation. The ability to create a new proximal-distal
axis at an ectopic site of contact between wg-expressing and
dpp-expressing cells validates the hypothesis of the induction of a
new proximal-distal axis, in a way that was not possible using
loss-of-function mutations.
It is clear from such studies that future work will be driven
increasingly by powerful transgenic technologies which will allow
finer and finer orchestrations of multiple developmental networks
in vivo. Saccharomyces, Drosophila, and Mus are the only organisms
in which these types of manipulation are now possible. While yeast,
fly, worm, zebrafish, pufferfish, Xenopus, chicken and mouse each
continue to contribute heavily to
solving common problems, they do have limitations in serving as
models for each other or for humans.
Perspectives
We believe that the data to which we have drawn attention provide
reasonable indicators of the diversity of information that is
likely to be available in the not too distant future. We also think
that, in terms of experimental challenges, the next period will
need to be one of expanded transgenic biology in which: multiple
modifications are made within a genome; an increasing number of
genes and regulatory sequences are shuttled between different
organisms; and natural variation within and between species is more
extensively used to understand parts of biological networks.
Even in their integrated form, the databases have significant
limitations. They do not hold information on non-linear responses
and thresholds, both of which underpin development and each of
which can only be analyzed by in vivo experimentation. Furthermore,
the fundamental issues of comparative morphogenesis (Edelman and
Jones, 1995; Bard, 1990; García-Bellido, 1994) and comparative
brain function (Edelman, 1987; Miklos, 1993a) will depend on a
deeper understanding of place-dependence, complexity, and
degeneracy in biological systems (Kampis and Csanyi, 1987; Edelman,
1993; Tononi et al., 1994, 1996).
What will be the major near-term contribution of model organisms to
the understanding of human biology, and how will the information
from the genome projects help? While the human genome may contain
approximately 70,000 genes, these genes will encode the components
of perhaps only a few hundred biological processes - for example,
amino acid biosynthesis, protein synthesis, protein secretion, cell
cycle regulation, signal transduction pathways, and cell-cell and
cell-substrate adhesion. Of all the invertebrate model organisms
whose genomes are currently being sequenced, the gene systems in
Drosophila show the highest degree of structural conservation to
those of humans (see, for example, Sidow and Thomas, 1994;
Artavanis-Tsakonas et al., 1995), and it seems reasonable to expect
that most of the components of these biological processes, and the
way in which they interact with each other, will be conserved
between flies and humans. Perhaps more surprising is the extent to
which the developmental and physiological functions of these core
processes between fly and human appear to be conserved. As we have
discussed, the experimental tools exist in the model organisms, but
not in humans, for assembling genes into pathways. The genome
projects in each of the model organisms will greatly facilitate
this experimental work and, together with the sequence analysis of
the human genome, will allow for the transfer of this information to
human biology. Thus, the principal contribution of the model
organisms to human biology over the next 5 years will be the
reduction of most of the approximately 70,000 individual components
encoded by the human genome into a much smaller number of
multicomponent core processes of known biochemical function.
Knowledge of the precise ways in which each of the evolutionarily
conserved core processes is used in humans, and the many ways in
which their perturbations can lead to disease, will only come from
the study of humans themselves, with some contribution from
vertebrate models such as the mouse.
In the Post-Sequence Era, we may eventually be able to move beyond
what evolutionary processes have actually produced and ask what can
be produced. The gene transfer approach may ultimately be
superseded by an even more radical way of tackling development,
namely, by making novel combinations of protein domains and
regulatory motifs, and building novel gene networks and
morphogenetic pathways. That is, we may be able not only to discern
how organisms were built and how they evolved but, more
importantly, to estimate the potential for the kinds of organisms that
can still be built.
ACKNOWLEDGEMENTS
This work has been supported by the Neurosciences Research
Foundation and the Howard Hughes Medical Institute. We would also
like to thank our many colleagues, especially D. Botstein, C.
Coyle-Thompson, K. Crossin, G.M. Edelman, R. Greenspan, I.
Herskowitz, F. Jones, M. Levine, and A. Spradling for help in
different aspects of this work.
REFERENCES
Antequera, F., and Bird, A. (1993). Number of CpG islands and genes
in human and mouse. Proc. Natl. Acad. Sci. USA 90, 11995-11999.
Artavanis-Tsakonas, S., Matsuno, K., and Fortini, M.E. (1995).
Notch signaling. Science 268, 225-232.
Ashburner, M. (1989). Drosophila, a laboratory handbook (New York:
Cold Spring Harbor Laboratory Press).
Bard, J. (1990). Morphogenesis: The Cellular and Molecular
Processes of Developmental Anatomy (Cambridge: Cambridge University
Press).
Bier, E., Vaessin, H., Shepherd, S., Lee, K., McCall, K., Barbel,
S., Ackerman, L., Carretto, R., Uemura, T., Grell, E., Jan, L.Y.,
and Jan, Y.N. (1989). Searching for pattern and mutation in the
Drosophila genome with a P-lacZ vector. Genes and Development 3,
1273-1287.
Brand, A.H., and Perrimon, N. (1993). Targeted gene expression as a
means of altering cell fates and generating dominant phenotypes.
Development 118, 401-415.
Brandon, E.P., Idzerda, R.L., and McKnight, G.S. (1995). Targeting
the mouse genome: a compendium of knockouts. Current Biology 5,
1-27.
Brenner, S., Elgar, G., Sandford, R., Macrae, A., Venkatesh, B.,
and Aparicio, S. (1993). Characterization of the pufferfish (Fugu)
genome as a compact model vertebrate genome. Nature 366, 265-268.
Brenner, S.E., Hubbard, T., Murzin, A., and Chothia, C. (1995).
Gene duplications in H. influenzae. Nature 378, 140.
Burns, N., Grimwade, B., Ross-Macdonald, P.B., Choi, E-Y., Finberg,
K., Roeder, G.S., and Snyder, M. (1994). Large-scale analysis of
gene expression, protein localization, and gene disruption in
Saccharomyces cerevisiae. Genes and Development 8, 1087-1105.
Callahan, C.A., and Thomas, J.B. (1994). Tau-β-galactosidase, an
axon-targeted fusion protein. Proc. Natl. Acad. Sci. USA 91,
5972-5976.
Capano, C.P., Gioio, A.E., Giuditta, A., and Kaplan, B.B. (1986).
Complexity of nuclear and polysomal RNA from squid optic lobe and
gill. Journal of Neurochemistry 46, 1517-1521.
Clark, D.V., Rogalski, T.M., Donati, L.M., and Baillie, D.L.
(1988). The unc-22 (IV) region of Caenorhabditis elegans: genetic
analysis of lethal mutations. Genetics 119, 345-353.
Collins, F.S. (1995). Ahead of schedule and under budget: the
genome project passes its fifth birthday. Proc. Natl. Acad. Sci.
USA 92, 10821-10823.
Cool, D.E., and Fischer, E.H. (1993). Protein tyrosine phosphatases
in cell transformation. Cell Biology 4, 443-453.
Crossin, K.L. (1994). Functional role of cytotactin/tenascin in
morphogenesis: a modest proposal. Perspectives on Developmental
Neurobiology 2, 21-32.
Datta, S., Stark, K., and Kankel, D.R. (1993). Enhancer detector
analysis of the extent of genomic involvement in nervous system
development in Drosophila melanogaster. J. Neurobiology 24,
824-841.
Deisseroth, K., Bito, H., and Tsien, R.W. (1996). Signaling from
synapse to nucleus: postsynaptic CREB phosphorylation during
multiple forms of hippocampal synaptic plasticity. Neuron 16,
89-101.
Diaz-Benjumea, F.J., Cohen, B., and Cohen, S.M. (1994). Cell
interaction between compartments establishes the proximal-distal
axis of Drosophila legs. Nature 372, 175-179.
Dietrich, W.F., Lander, E.S., Smith, J.S., Moser, A.R., Gould,
K.A., Luongo, C., Borenstein, N., and Dove, W. (1993). Genetic
identification of Mom-1, a major modifier locus affecting
Min-induced intestinal neoplasia in the mouse. Cell 75, 631-639.
Dove, W.F. (1987). Molecular genetics of Mus musculus: point
mutagenesis and millimorgans. Genetics 116, 5-8.
Dujon, B. (1996). The yeast genome project: what did we learn?
Trends Genet. 12
Edelman, G.M. (1987). Neural Darwinism: The Theory of Neuronal
Group Selection (New York: Basic Books).
Edelman, G.M. (1988). Topobiology: An Introduction to Molecular
Embryology (New York: Basic Books).
Edelman, G.M. (1993). A golden age for adhesion. Cell Adhesion and
Communication 1, 1-7.
Edelman, G.M., and Jones, F.S. (1995). Developmental control of
N-CAM expression by Hox and Pax gene products. Phil. Trans. R. Soc.
Lond. B 349, 305-312.
Erickson, H.P. (1993). Gene knockouts of c-src, transforming growth
factor β1, and tenascin suggest superfluous, nonfunctional
expression of proteins. The Journal of Cell Biology 120, 1079-1081.
Ferveur, J-F., Störtkuhl, K.F., Stocker, R.F., and Greenspan, R.J.
(1995). Genetic feminization of brain structures and changed
sexual orientation in male Drosophila. Science 267, 902-905.
Fischer, E.H. (1993). Protein phosphorylation and cellular
regulation II (Nobel Lecture). Angew. Chem. Int. Ed. Engl. 32,
1130-1137.
Fleischmann, R.D., Adams, M.D., White, O., Clayton, R.A., Kirkness,
E.F., Kerlavage, A.R., Bult, C.J., Tomb, J.-F., Dougherty, B.A.,
Merrick, J.M., et al. (1995). Whole-genome random sequencing and
assembly of Haemophilus influenzae Rd. Science 269, 496-512.
Fortini, M.E., and Rubin, G.M. (1990). Analysis of cis-acting
requirements of the Rh3 and Rh4 genes reveals a bipartite
organization to rhodopsin promoters in Drosophila melanogaster.
Genes and Development 4, 444-463.
Fraser, C.M., Gocayne, J.D., White, O., Adams, M.D., Clayton, R.A.,
Fleischmann, R.D., Bult, C.J., Kerlavage, A.R., Sutton, G., Kelley,
J.M., et al. (1995). The minimal gene complement of Mycoplasma
genitalium. Science 270, 397-403.
Friedrich, G., and Soriano, P. (1991). Promoter traps in embryonic
stem cells: a genetic screen to identify and mutate developmental
genes in mice. Genes and Development 5, 1513-1523.
García-Bellido, A. (1994). How organisms are put together.
European Review 2, 15-21.
Gibson, S., and Somerville, C. (1993). Isolating plant genes.
TIBTECH 11, 306-312.
Goodman, C.S. (1996). Mechanisms and molecules that control growth
cone guidance. Annu. Rev. Neurosci. 19, 341-377.
Goodman, H.M., Ecker, J.R., and Dean, C. (1995). The genome of
Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 92, 10831-10835.
Goodrich, J.A., Cutler, G., and Tjian, R. (1996). Contacts in
context: promoter specificity and macromolecular interactions in
transcription. Cell 84, 825-830.
Gray, S., Cai, H., Barolo, S., and Levine, M. (1995).
Transcriptional repression in the Drosophila embryo. Phil. Trans.
R. Soc. Lond. B 349, 257-262.
Greenspan, R.J. (1995). Flies, genes, learning and memory. Neuron
15, 747-750.
Gruneberg, H. (1952). The Genetics of the Mouse (The Hague,
Netherlands: Martinus Nijhoff).
Gu, H., Marth, J.D., Orban, P.C., Mossmann, H., and Rajewsky, K.
(1994). Deletion of a DNA polymerase β gene segment in T cells
using cell type-specific gene targeting. Science 265, 103-106.
Hartwell, L.H. (1991). Twenty-five years of cell cycle genetics.
Genetics 129, 975-980.
Herskowitz, I. (1987). Functional inactivation of genes by dominant
negative mutations. Nature 329, 219-222.
Hodgkin, J., Plasterk, R.H.A., and Waterston, R.H. (1995). The
Nematode Caenorhabditis elegans and Its Genome. Science 270,
410-414.
Holden, C. (1996). Genes confirm Archaea's uniqueness. Science 271,
1061.
Holland, P.W.H., Garcia-Fernandez, J., Williams, N.A., and Sidow,
A. (1994). Gene duplications and the origins of vertebrate
development. Development 1994 Supplement, 125-133.
Howell, A.M., and Rose, A.M. (1990). Essential genes in the hDf6
region of chromosome I in Caenorhabditis elegans. Genetics 126,
583-592.
Hunter, T. (1994). 1001 protein kinases redux - towards 2000.
Seminars in Cell Biology 5, 367-376.
Hunter, T. (1995). Protein kinases and phosphatases: the yin and
yang of protein phosphorylation and signaling. Cell 80, 225-236.
Jiang, J., and Levine, M. (1993). Binding affinities and
cooperative interactions with bHLH activators delimit threshold
responses to the dorsal gradient morphogen. Cell 72, 741-752.
John, B., and Miklos, G.L.G. (1988). The Eukaryote Genome in
Development and Evolution (London: Allen and Unwin).
Johnsen, R.C., and Baillie, D.L. (1991). Genetic analysis of a
major segment [LGV(left)] of the genome of Caenorhabditis elegans.
Genetics 129, 735-752.
Jordan, B.R. (1996). Putting ESTs on the map. Genome Digest 3, 11.
Kamalay, J.C., and Goldberg, R.B. (1980). Regulation of structural
gene expression in tobacco. Cell 19, 935-946.
Kampis, G., and Csányi, V. (1987). Notes on order and
complexity. J. theor. Biol. 124, 111-121.
Koonin, E.V., Tatusov, R.L., and Rudd, K.E. (1995). Sequence
similarity analysis of Escherichia coli proteins: Functional and
evolutionary implications. Proc. Natl. Acad. Sci. USA 92,
11921-11925.
Lander, E.S., and Schork, N.J. (1994). Genetic dissection of
complex traits. Science 265, 2037-2048.
Levy, L.S., and Manning, J.E. (1981). Messenger RNA sequence
complexity and homology in developmental stages of Drosophila.
Developmental Biology 85, 141-149.
Lewin, B. (1994). Genes V (New York: Oxford University Press).
Lundin, L.G. (1993). Evolution of the vertebrate genome as
reflected in paralogous chromosomal regions in man and the house
mouse. Genomics 16, 1-19.
Maleszka, R. (1993). Electrophoretic analysis of the nuclear and
organellar genomes in the ultra-small alga Cyanidioschyzon merolae.
Current Genetics 24, 548-550.
Maroni, G. (1994). The organization of Drosophila genes. DNA
Sequence 4, 347-354.
Maroni, G. (1996). The organization of eukaryotic genes.
Evolutionary Biology 29, 1-19.
Meinke, D.W. (1994). Seed development in Arabidopsis thaliana.
(Cold Spring Harbor, New York: Cold Spring Harbor Laboratory
Press). 253-295.
Miklos, G.L.G. (1993a). Molecules and Cognition: The latterday
lessons of levels, language, and lac. Journal of Neurobiology 24,
842-890.
Miklos, G.L.G. (1993b). Emergence of organizational complexities
during metazoan evolution: perspectives from molecular biology,
palaeontology and neo-Darwinism. Memoirs Australasian Assn.
Palaeontologists 15, 7-41.
Miklos, G.L.G., and Campbell, K.S.W. (1994). From protein domains
to extinct phyla: reverse-engineering approaches to the evolution
of biological complexities. In: Early Life on Earth, Nobel
Symposium 84, S. Bengtson, ed., Columbia U.P., New York, pp.
501-516.
Miklos, G.L.G., Campbell, K.S.W., and Kankel, D.R. (1994). The
rapid emergence of bio-electronic novelty, neuronal architectures,
and organismal performance. In: Flexibility and Constraint in
Behavioral Systems, R.J. Greenspan and C.P. Kyriacou, eds. John
Wiley and Sons Ltd., 269-293.
Mullins, M.C., Hammerschmidt, M., Haffter, P., and
Nüsslein-Volhard, C. (1994). Large-scale mutagenesis in the
zebrafish: in search of genes controlling development in a
vertebrate. Current Biology 4, 189-202.
Mulvihill, J.J. (1995). Craniofacial syndromes: no such thing as a
single gene disease. Nature Genetics 9, 101-103.
Nüsslein-Volhard, C. (1994). Of flies and fishes. Science 266,
572-574.
Nüsslein-Volhard, C., and Wieschaus, E. (1980). Mutations affecting
segment number and polarity in Drosophila. Nature 287, 795-801.
Oliver, S.G. (1996). From DNA sequence to biological function.
Nature 379, 597-600.
Perrimon, N., Engstrom, L., and Mahowald, A.P. (1989). Zygotic
lethals with specific maternal effect phenotypes in Drosophila
melanogaster. I. Loci on the X Chromosome. Genetics 121, 333-352.
Pickett, F.B., and Meeks-Wagner, D.R. (1995). Seeing Double:
appreciating genetic redundancy. The Plant Cell 7, 1347-1356.
Ravetch, J.V., Kirsch, I.R., and Leder, P. (1980). Evolutionary
approach to the question of immunoglobulin heavy chain switching:
evidence from cloned human and mouse genes. Proc. Natl. Acad. Sci.
USA 77, 6734-6738.
Rothman, J.E. (1994). Mechanisms of intracellular protein
transport. Nature 372, 55-63.
Shubin, N., Carroll, S., and Tabin, C. (1996). Fossils, Genes and
the Evolution of Limbs. Nature, in press.
Sidow, A., and Thomas, W.K. (1994). A molecular evolutionary
framework for eukaryotic model organisms. Curr. Biol. 4, 596-603.
Spradling, A.C., Stern, D.M., Kiss, I., Roote, J., Laverty, T., and
Rubin, G.M. (1995). Gene disruptions using P transposable elements:
An integral component of the Drosophila genome project. Proc. Natl.
Acad. Sci. USA 92, 10824-10830.
Struhl, G., and Basler, K. (1993). Organizing activity of wingless
protein in Drosophila. Cell 72, 527-540.
Struhl, K. (1996). Chromatin structure and RNA polymerase II
connection: Implications for transcription. Cell 84, 179-182.
Thaker, H.M., and Kankel, D.R. (1992). Mosaic analysis gives an
estimate of the extent of genomic involvement in the visual system
in Drosophila melanogaster. Genetics 131, 883-894.
Thomas, J.H. (1993). Thinking about genetic redundancy. TIG 9,
395-399.
Threadgill, D.W., Dlugosz, A.A., Hansen, L.A., Tennenbaum, T.,
Lichti, U., Yee, D., LaMantia, C., Mourton, T., Herrup, K., Harris,
R.C., Barnard, J.A., Yuspa, S.H., Coffey, R.J., and Magnuson, T.
(1995). Targeted disruption of mouse EGF receptor: effect of
genetic background on mutant phenotype. Science 269, 230-238.
Tononi, G., Sporns, O., and Edelman, G.M. (1994). A measure for
brain complexity: relating functional segregation and integration
in the nervous system. Proc. Natl. Acad. Sci. USA 91, 5033-5037.
Tononi, G., Sporns, O., and Edelman, G.M. (1996). A complexity
measure for the selective matching of signals by the brain. Proc.
Natl. Acad. Sci. USA 93, 3422-3427.
van Heyningen, V. (1994). One gene - four syndromes. Nature 367,
319-320.
Vassalli, A., Matzuk, M.M., Gardner, H.A.R., Lee, K-F., and
Jaenisch, R. (1994). Activin/inhibin βB subunit gene disruption
leads to defects in eyelid development and female reproduction.
Genes and Development 8, 414-427.
Wassarman, D.A., Therrien, M., and Rubin, G.M. (1995). The Ras
signaling pathway in Drosophila. Current Opinion in Genetics and
Development 5, 44-50.
Waterston, R., and Sulston, J. (1995). The genome of Caenorhabditis
elegans. Proc. Natl. Acad. Sci. USA 92, 10836-10840.
Weintraub, H. (1993). The MyoD family and myogenesis: redundancy,
networks, and thresholds. Cell 75, 1241-1244.
Xu, T., and Rubin, G.M. (1993). Analysis of genetic mosaics in
developing and adult Drosophila tissues. Development 117,
1223-1237.