COS 323 - Computing for the Physical and Social Sciences
Fall 2010
Assignment 5: Simulating Population Genetics
Assignment by Aniket Kittur, adapted from Dannie Durand, modified by Ken Steiglitz
Due Friday, Dec. 17
Some introductory genetics:
Genes are DNA sequences that encode proteins, and they are grouped together on chromosomes. Many organisms, including mice and humans, have two copies of each chromosome, one inherited from the father and one from the mother. Such organisms are called diploid. Each individual therefore carries two, possibly different, copies of each gene, and these alternative versions of a gene are called alleles.
To reproduce, diploid organisms produce sex cells (sperm or eggs) through a special cell division called meiosis. Each sex cell is haploid: it contains only one set of chromosomes from the parent instead of two. Consider a diploid organism, say a mouse, with two possible alleles, $a$ and $A$, of some gene. Typically, fifty percent of the sperm from a male $Aa$ mouse will contain the $a$ allele and the other fifty percent will contain the $A$ allele. Because each parent contributes one allele independently, the offspring of two $Aa$ mice have a 25% chance of being $aa$, a 50% chance of being $Aa$ ($A$ from either parent and $a$ from the other), and a 25% chance of being $AA$. The combination of alleles in a particular mouse is referred to as its genotype.
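You can check the 25%/50%/25% split by enumerating the four equally likely ways of combining one allele from each parent. The C sketch below is an illustration of our own, not part of the assignment. It assumes a genotype is stored as the number of $A$ alleles it carries (0 = $aa$, 1 = $Aa$, 2 = $AA$). The names `allele_of` and `punnett` are hypothetical.

```c
#include <assert.h>

/* Assumed encoding: genotype = number of A alleles (0 = aa, 1 = Aa, 2 = AA). */

/* Allele in slot i (0 or 1) of a parent with genotype g: 1 means A, 0 means a.
   A parent with g copies of A keeps them in slots 0..g-1. */
static int allele_of(int g, int i) {
    assert(g >= 0 && g <= 2 && (i == 0 || i == 1));
    return i < g ? 1 : 0;
}

/* Fill counts[0..2] with how many of the four equally likely
   (mother allele, father allele) combinations give aa, Aa and AA. */
void punnett(int mother, int father, int counts[3]) {
    counts[0] = counts[1] = counts[2] = 0;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            counts[allele_of(mother, i) + allele_of(father, j)]++;
}
```

For two $Aa$ parents, `punnett(1, 1, counts)` yields 1, 2 and 1 out of 4 combinations, which is 25%, 50% and 25%. Storing genotypes as small integers lets a whole population live in a plain `int` array.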
If we know the initial distribution of alleles in a population, we can calculate a number of useful probabilities. Let the frequencies of the two alleles in the population be $P(A) = p$ and $P(a) = q = 1 - p$. From these we can calculate the probability of finding a mouse in the population with a particular genotype ($AA$, $Aa$, or $aa$), how these probabilities vary over time, and what the steady-state probabilities will be. G. H. Hardy and W. Weinberg independently solved this problem in 1908 under a set of idealized assumptions:
- The population is infinite.
- All male, female pairs are equally likely to mate.
- Alleles do not spontaneously appear or disappear from the population (i.e., no migration or mutation).
- All alleles are equally fit.
Under these conditions, the following steady-state genotype frequencies are
established:
$$
\begin{array}{ccc}
P(AA) & P(Aa) & P(aa) \\
p^2 & 2pq & q^2
\end{array}
$$
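A short check shows why these frequencies are stable. Under random mating, each offspring draws its two alleles independently with $P(A) = p$. The three genotype frequencies therefore sum to one, and the frequency $p'$ of $A$ in the next generation equals its frequency $p$ among the parents:

$$
p^2 + 2pq + q^2 = (p + q)^2 = 1, \qquad
p' = P(AA) + \tfrac{1}{2}P(Aa) = p^2 + pq = p(p + q) = p
$$

For example, an initial ratio of 90% $A$ to 10% $a$ ($p = 0.9$, $q = 0.1$) gives $P(AA) = 0.81$, $P(Aa) = 0.18$, and $P(aa) = 0.01$.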
Your assignment:
In practice, the first two conditions never hold, and the remaining two hold only some of the time, at best. Through a series of simulations, we will examine these assumptions and see whether, and how, relaxing them affects the model's predictions.
Implement the following in Java, C, or C++:
- Simulate a finite population of mice with two alleles ($A$ and $a$) at a single locus. Assume that mice mate randomly and that the population size is fixed. By chance, not every mouse will necessarily mate, and some may mate more than once. Also assume that generations are synchronized: all the mice in generation $i$ are offspring of mice in generation $i-1$. There is no need for a fancy implementation (e.g., don't use linked lists when arrays will suffice). Track how the frequency of each genotype changes over time.
- How do the allele frequencies vary over time? Is an equilibrium reached? If so, how do you recognize that it has been reached, and how quickly does that happen? Does it differ from the steady-state frequencies predicted by Hardy and Weinberg?
- How does population size affect the answers to these questions? (For concreteness, try populations of 50 and 5000.) What about the allele frequencies in the initial population? (Try A:a ratios of 100%/0%, 90%/10%, and 50%/50%.) How many times should you run each simulation (i.e., each set of conditions) to have confidence in your results?
- Relax the assumption that all alleles are equally fit. Let the $a$ allele be lethal recessive; that is, $aa$ mice die at birth, but $Aa$ and $AA$ mice do not. How does this change the equilibrium? Can any starting conditions change the final equilibrium? Some inherited lethal diseases, such as Huntington's chorea, persist in populations at frequencies consistent with the Hardy-Weinberg predictions even though the disease is fatal. (Note that Huntington's is inherited as a dominant, not a recessive, trait.) How is this possible?
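One way to implement the first item above (a finite population with random mating and synchronized generations) uses two plain arrays. The C sketch below rests on several assumptions that are not in the assignment. Genotypes are stored as the number of $A$ alleles (0 = $aa$, 1 = $Aa$, 2 = $AA$). The first half of the array holds females and the second half holds males, so the sex ratio is fixed. Each child's mother and father are drawn with replacement using the C library `rand()`. The names `rand_below`, `gamete`, `next_generation`, and `count_genotypes` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: genotype = number of A alleles (0 = aa, 1 = Aa, 2 = AA);
   slots [0, n/2) are females and slots [n/2, n) are males. */

/* Uniform integer in [0, k) built on the C library rand(). */
static int rand_below(int k) {
    return (int)(k * (rand() / (RAND_MAX + 1.0)));
}

/* Allele passed on by a parent of genotype g (1 = A, 0 = a):
   AA always passes A, aa always passes a, Aa passes A half the time. */
static int gamete(int g) {
    if (g == 1)
        return rand_below(2);
    return g / 2;
}

/* Build generation i from generation i-1.  Every child draws its mother and
   father independently and with replacement, so some mice never mate and
   others mate several times. */
void next_generation(const int *parents, int *children, int n) {
    assert(n >= 2);
    int females = n / 2;
    int males = n - females;
    for (int k = 0; k < n; k++) {
        int mom = parents[rand_below(females)];
        int dad = parents[females + rand_below(males)];
        children[k] = gamete(mom) + gamete(dad);
    }
}

/* Tally genotypes: counts[g] = number of mice carrying g copies of A. */
void count_genotypes(const int *pop, int n, int counts[3]) {
    counts[0] = counts[1] = counts[2] = 0;
    for (int k = 0; k < n; k++)
        counts[pop[k]]++;
}
```

A driver would fill one array with the initial population and call `srand` once. It would then alternate `next_generation(a, b, n)` and `next_generation(b, a, n)`, calling `count_genotypes` after each step to record the genotype frequencies for that generation.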
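For the equilibrium questions, compute the allele frequency from the genotype counts. Also note that without mutation or migration, once one allele has disappeared the population can never change again. The minimal sketch below assumes genotype counts indexed by the number of $A$ alleles (`counts[g]` = mice carrying `g` copies of $A$). The function names are hypothetical. `het_excess` compares the observed heterozygote frequency with the Hardy-Weinberg value $2pq$ for the current $p$.

```c
#include <assert.h>

/* Frequency of allele A: every mouse carries two alleles, AA contributes
   two copies of A and Aa contributes one. */
double allele_freq_A(const int counts[3]) {
    int n = counts[0] + counts[1] + counts[2];
    assert(n > 0);
    return (2.0 * counts[2] + counts[1]) / (2.0 * n);
}

/* With no mutation or migration, p = 0 and p = 1 are absorbing states:
   once an allele is lost it can never come back. */
int is_fixed(const int counts[3]) {
    int n = counts[0] + counts[1] + counts[2];
    return counts[2] == n || counts[0] == n;
}

/* Observed heterozygote frequency minus the Hardy-Weinberg value 2pq. */
double het_excess(const int counts[3]) {
    int n = counts[0] + counts[1] + counts[2];
    double p = allele_freq_A(counts);
    return (double)counts[1] / n - 2.0 * p * (1.0 - p);
}
```

A run can be stopped as soon as `is_fixed` returns true. The generation at which that happens is one measure of how quickly the population settles.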
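To judge how many runs are enough, treat each run as an independent trial and look at the spread of the resulting estimate. For speed, the sketch below swaps in the simpler Wright-Fisher model instead of the individual-mouse simulation the assignment asks for. Wright-Fisher tracks only the count of $A$ alleles among $2n$ gene copies and resamples every copy each generation, so it is an approximation. `run_to_fixation`, `replicate`, and `runs_needed` are hypothetical names. `runs_needed` relies on the binomial standard error: a proportion $f$ estimated from $R$ independent runs has standard error $\sqrt{f(1-f)/R}$, so reaching a target standard error se requires $R \ge f(1-f)/\mathrm{se}^2$.

```c
#include <assert.h>
#include <stdlib.h>

/* One Wright-Fisher run: n diploid mice carry 2n gene copies, a0 of them A.
   Each generation every copy is redrawn independently from the previous
   generation's frequency of A.  Returns the number of generations until A
   is lost or fixed, and sets *a_fixed to 1 if A was fixed. */
int run_to_fixation(int n, int a0, int *a_fixed) {
    assert(n > 0 && a0 >= 0 && a0 <= 2 * n);
    int a = a0, gen = 0;
    while (a != 0 && a != 2 * n) {
        double p = (double)a / (2 * n);
        int next = 0;
        for (int k = 0; k < 2 * n; k++)
            if (rand() / (RAND_MAX + 1.0) < p)
                next++;
        a = next;
        gen++;
    }
    *a_fixed = (a == 2 * n);
    return gen;
}

/* Repeat `runs` independent runs; return the fraction in which A was fixed
   and store the mean number of generations until fixation or loss. */
double replicate(int n, int a0, int runs, double *mean_gen) {
    assert(runs > 0);
    long total_gen = 0;
    int a_wins = 0;
    for (int r = 0; r < runs; r++) {
        int won;
        total_gen += run_to_fixation(n, a0, &won);
        a_wins += won;
    }
    *mean_gen = (double)total_gen / runs;
    return (double)a_wins / runs;
}

/* Smallest R with f(1 - f) / R <= se^2, i.e. binomial standard error <= se. */
int runs_needed(double f, double se) {
    assert(se > 0.0);
    double r = f * (1.0 - f) / (se * se);
    int R = (int)r;
    if (R < r)
        R++;
    return R;
}
```

For a neutral allele, theory predicts that $A$ is eventually fixed with probability equal to its starting frequency. Comparing that prediction with the output of `replicate` is a quick sanity check. In the worst case, $f = 0.5$, a standard error of 0.01 needs about 2500 runs. With $n = 5000$, a single run can last many thousands of generations, so expect those runs to be slow.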
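For the lethal-recessive item above, an infinite population gives a simple baseline to compare your simulation against. After random mating the genotype frequencies are $p^2$, $2pq$ and $q^2$. Discarding the $q^2$ fraction of $aa$ newborns leaves $q' = pq/(1 - q^2) = q/(1 + q)$, which solves to $q_t = q_0/(1 + t\,q_0)$. Once the lethal allele is rare, it therefore declines only slowly. The finite-population step in the sketch below rests on an assumption of our own: when a mating produces an $aa$ pup, the pup is discarded and a fresh pair is drawn, which keeps the population size fixed. The step also assumes the parent array holds no $aa$ mice, females occupy the first half of the array, and males occupy the second half. All names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed encoding: genotype = number of A alleles (0 = aa, 1 = Aa, 2 = AA). */

/* Infinite-population baseline: iterate q' = q / (1 + q) for t generations. */
double lethal_q_after(double q0, int t) {
    assert(q0 >= 0.0 && q0 <= 1.0 && t >= 0);
    double q = q0;
    for (int i = 0; i < t; i++)
        q = q / (1.0 + q);
    return q;
}

/* Uniform integer in [0, k) built on the C library rand(). */
static int rand_below(int k) {
    return (int)(k * (rand() / (RAND_MAX + 1.0)));
}

/* Allele passed on by genotype g (1 = A, 0 = a); Aa passes A half the time. */
static int gamete(int g) {
    if (g == 1)
        return rand_below(2);
    return g / 2;
}

/* Finite-population step with a lethal recessive.  Females are in
   [0, n/2) and males in [n/2, n).  An aa pup dies at birth and its slot is
   refilled by a new random mating.  With no aa parents, every mating yields
   a viable pup with probability at least 3/4, so the loop terminates. */
void next_generation_lethal(const int *parents, int *children, int n) {
    assert(n >= 2);
    int females = n / 2, males = n - females;
    for (int k = 0; k < n; k++) {
        int child;
        do {
            int mom = parents[rand_below(females)];
            int dad = parents[females + rand_below(males)];
            child = gamete(mom) + gamete(dad);
        } while (child == 0);   /* aa: dies at birth, draw a new pair */
        children[k] = child;
    }
}
```

Plotting the simulated frequency of $a$ against `lethal_q_after` shows how much of the decline comes from selection and how much from chance in small populations.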
Extra Credit:
- Now we will model a real-life example of the forces at work in evolution, using the t-haplotype condition found in mice. Mice that have two copies of a mutant t-haplotype gene (call it $t$) die at birth. Mice with one normal gene and one mutant gene ($+/t$) seem almost exactly the same as mice with two normal genes ($+/+$), with one major difference: a $+/t$ male passes $t$ to more than 90% of his offspring, and thus passes the normal allele to less than 10%. The t-haplotype has no such effect in $+/t$ females. Can a stable equilibrium be reached under these conditions?
- Current studies suggest that another force might be involved in the t-haplotype condition. Some researchers believe that sexual selection is at work as well: females slightly prefer males with two normal genes over those with one normal and one mutant gene (males with two mutant genes, of course, never get to mate). This difference is small but detectable. This type of selection is evolutionarily plausible, since it would lead to a greater number of viable offspring. Modify your t-haplotype simulator to determine the effects of sexual selection on t-haplotype frequencies.
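For the t-haplotype item, the only change to the mating step is the transmission rule for $+/t$ males. The minimal sketch below assumes genotypes are stored as the number of $t$ alleles (0 = $+/+$, 1 = $+/t$). Because $t/t$ pups die, they are never stored. The transmission ratio is a parameter `drive`, since the text only says "more than 90%". The names `t_gamete` and `t_child` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed encoding: genotype = number of t alleles (0 = +/+, 1 = +/t).
   t/t pups die at birth, so they never appear as parents. */

static double uniform01(void) {
    return rand() / (RAND_MAX + 1.0);
}

/* Allele passed to one offspring (1 = t, 0 = +).  Females follow the usual
   50/50 rule; +/t males pass t with probability `drive`. */
int t_gamete(int genotype, int is_male, double drive) {
    assert(genotype == 0 || genotype == 1);
    if (genotype == 0)
        return 0;
    return uniform01() < (is_male ? drive : 0.5);
}

/* Genotype of a pup, or -1 if it is t/t and dies at birth. */
int t_child(int mother, int father, double drive) {
    int g = t_gamete(mother, 0, drive) + t_gamete(father, 1, drive);
    return g == 2 ? -1 : g;
}
```

A generation step would call `t_child` for a randomly drawn mother and father and redraw the pair whenever it returns -1. Trying several values of `drive` between 0.9 and 1.0 shows how sensitive the outcome is to the exact ratio.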
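Sexual selection can be added by replacing the uniform choice of a father with a weighted choice. The sketch below uses roulette-wheel (fitness-proportionate) selection. Each $+/+$ male gets weight 1 and each $+/t$ male gets weight $1 - s$, where the preference strength `s` is a hypothetical parameter. Because the text describes the preference as slight, a small `s` is the natural starting point. Genotypes use the count-of-$t$ encoding (0 = $+/+$, 1 = $+/t$), and `male_weight` and `choose_father` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Mating weight of a male: +/+ (0) gets 1, +/t (1) gets 1 - s. */
static double male_weight(int genotype, double s) {
    return genotype == 0 ? 1.0 : 1.0 - s;
}

/* Return the index of a father drawn from males[0..m-1] with probability
   proportional to male_weight, or -1 if every weight is zero.
   s = 0 recovers random mating. */
int choose_father(const int *males, int m, double s) {
    assert(m > 0 && s >= 0.0 && s <= 1.0);
    double total = 0.0;
    int last = -1;                       /* last male with positive weight */
    for (int k = 0; k < m; k++) {
        if (male_weight(males[k], s) > 0.0)
            last = k;
        total += male_weight(males[k], s);
    }
    if (last < 0)
        return -1;
    double r = total * (rand() / (RAND_MAX + 1.0));   /* r in [0, total) */
    for (int k = 0; k < m; k++) {
        r -= male_weight(males[k], s);
        if (r < 0.0)
            return k;
    }
    return last;   /* reached only through floating-point round-off */
}
```

Each mother would call `choose_father` on the array of living males. Since the weights are recomputed on every call, each mating costs O(m), which is acceptable for populations of a few thousand.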
Submitting
This assignment is due Friday, December 17, 2010 at 11:59 PM.
Please see the general notes on submitting your assignments, as well as the late policy and the collaboration policy.
Please submit:
- One or more well-commented source files containing all the code written for the assignment. Please include instructions for running the code.
- Answers to the questions posed above.
- Answers to the questions in the README.txt file.
The Dropbox link to submit your assignment is here.
Last update
7-Dec-2010 13:41:14
smr at princeton edu