Worked and annotated sample of question for take-home exam
(to be handed out 21-Apr-06, due Fr. 28-Apr-06)
Output from Mathematica program FstCalc.nb by David B. McDonald, Dept. Zool., U. of Wyoming, Laramie, WY 82071 dbmcd@uwyo.edu.
My "comments" are in red.
The input matrix of allele-by-allele genotypes is:
A1 A2 A3
55 30 100
0 45 90
0 0 48
35
85 160
0 40 50
0 0 80
120 45 25
0 80 35
0 0 75
40 130 70
0 60 120
0 0 75
First, calculate the population sizes by summing the elements/cells in each subpopulation (remember, the number of individuals is half the number of alleles) :
e.g., N1 = 55 + 30 + 100 + 45 + 90 + 48 = 368Repeat for the three remaining subpopulations to get the values below:
Population sizes are:
N1 N2 N3 N4Next, allele frequencies (pi, for i = 1 to k, where k is the total number of possible alleles; here k is three)368 450 380 495
E.g., p1 = freq(A1) = (2*55 + 30 + 100) / 2*368 = 0.326
Twice the number of A1 homozygotes + one times the number of each of the types of hets (which gives us the total number of A1 alleles) divided by twice the population size (2*N1 is the total number of alleles in subpop. 1).
p2 = freq(A2) = (30 + 2*45 + 90) / 2*N1 = 0.285
Repeat for the remaining alleles and subpopulations to get the values below:
p1 p2 p3
Subpop. 1 0.326 0.285 0.389
Subpop. 2 0.350 0.239 0.411
Subpop. 3 0.408 0.316 0.276
Subpop. 4 0.283 0.374 0.343
Next, the genotypic counts. For homozygotes this will be the gene frequency squared times the population size, for heterozygotes that will be twice the gene frequencies squared times the population size.
ROUNDING: I don’t mind rounding to integers here, because we aren’t going to use these numbers in subsequent calculations. Note, though, that I am keeping my allele frequencies to three decimal places for the calculations where I use them. If you round re-used numbers too much, too early, you will get off course.
Subscript note: I am not subscripting for subpopulation, just assuming we can keep track.
s=1, A1A1 = p12*N1 = 0.3262*368 = 39.11 (rounds to 39).
A1A2 = 2*p1*p2*N1 = 2*0.326*0.285*368 = 68.38 (rounds to 68).
One more example, picked haphazardly: (third subpopulation, A2A3 heterozygote expected count)
s=3, A2A3 = 2*p2*p3*N3 = 2*0.316*0.276*380 = 66.28 rounds to 66
Repeat for all the 24 genotypes (six per pop.)
Expected genotypic counts for the populations are:
A1 A2 A3
A1 39
68
93
A2 0
30 82
A3
0
0 56
A1 55
75
130
A2 0
26 88
A3
0
0 76
A1 63
98
86
A2 0
38 66
A3
0
0 29
A1 40
105
96
A2 0
69 127
A3
0
0 58
Subtract the expected from the observed to get excess or deficiency in the Observed:
s = 1, A1A1 = 55 - 39 = 16
A1A2 = 30 - 68 = -38
Excesses (+ numbers) or deficiencies (- numbers) of observed relative to Hardy-Weinberg genotypic counts:
A1 A2 A3
16 -38 7
0 15 8
0 0 -8
-20 10 30
0 14 -38
0 0 4
57 -53 -61
0 42 -31
0 0 46
0 25 -26
0 -9 -7
0 0 17
For observed - expected homozygotes in subpopulation 1 we have + 16, +15, and -8. Their sum is +23, so we observe an excess of homozygotes (possible evidence for inbreeding). Note that the homozygote excess of 23 is balanced by a heterozygote deficiency of 23 (for the first matrix above, total of -38, 7, and 8 above the diagonal is -23).
Repeat this logic to get the following set of values:
Observed homozygote counts relative to HWE expectation (- means homozygote deficiency/heterozygote excess, outbreeding or Wahlund effect):
Pop. 1 Pop. 2 Pop. 3 Pop. 4
23 -2 145 8
Observed heterozygosities. What proportion of each subpopulation are heterozygotes? The sum of the three kinds of heterozygotes from our original input matrix, all divided by the subpopulation size.
Subpop. 1 = (30 A1A2 + 90 A1A3 + 100 A2A3) / 368 = 220/368 = 0.598
Repeat for remaining subpops.
Observed heterozygosities are:
Pop. 1 Pop. 2 Pop. 3 Pop. 4
0.598 0.656 0.276 0.646
Expected heterozygosities. We use the 1 minus sum of the allele frequencies squared approach to calculate this quantity for each subpopulation.
E.g., subpop. 1 is 1 - SUM[0.3262 + 0.2852 + 0.3892] = 1-[0.1063 + 0.0812 + 0.1513] = 1-0.3388 = 0.6612
Repeat for remaining subpops.
Expected heterozygosities (Gene diversities) are:
Pop. 1 Pop. 2 Pop. 3 Pop. 4
0.661 0.651 0.658 0.662
Things are starting to get easier (or at least they will involve fewer terms). F = (Hexp-Hobs) / Hexp
Subpop. 1 F = (0.661 - 0.598) / 0.661 = 0.096
Local inbreeding coefficients (F) are:
Pop. 1 Pop. 2 Pop. 3 Pop. 4
0.096 -0.006 0.580 0.024
Next, we calculate the pi-bar
(across all the subpopulations).
We need to weight by population size.
p1-bar = (0.326*368 + 0.350*450 + 0.408*380 + 0.283*495) / (368 + 450 + 380 + 495) = 0.338
Repeat for the other two alleles to get:
Global allele frequencies (pi-bar) are:
p1-bar p2-bar p3-bar
0.338 0.306 0.356
HI is the weighted average of the observed heterozygosities.
(0.598*368 + 0.656*450 + 0.276*380 + 0.646*495) / (368 + 450 + 380 + 495) = 0.555
HI is: 0.555
HS is the weighted average of the expected heterozygosities across subpopulations.
(0.661*368 + 0.651*450 + 0.658*380 + 0.662*495) / (368 + 450 + 380 + 495) = 0.658
HS is: 0.658
HT is the expected heterozygosity based on the global gene frequencies.
1 - SUM[pi-bar2] = 1 - [0.3382 + 0.3062 + 0.3562] = 1 - 0.3346 = 0.665
HT is: 0.665
FIS = (HS - HI) / HS = (0.658 - 0.555) / 0.658 = 0.103/0.658 = 0.156
FIS is: 0.156
FST = (HT - HS) / HT = (0.665 - 0.658) / 0.658 = 0.007/0.658 = 0.011
FST is: 0.011
FIT is: 0.166
Conclusions:
The allele frequencies differ among the four subpopulations. Subpops. 1 and 2 have the A3 allele as the most common, subpop. 3 has A1 as the most common allele, and subpop. 4 has the A2 allele most frequent. Evidence
Only subpop. 2 has higher observed heterozygosity than expected. Evidence
Subpops. 2 and 4 are neither inbred nor outbred (F near zero), while Pop. 1 is mildly inbred and Pop. 3 is highly inbred. Evidence
Overall the population shows moderate inbreeding (moderately positive FIS). This is largely driven by the high local F of Pop. 3.
The populations are only slightly differentiated genetically (fairly low value of FST). Generally, when FST exceeds 0.15 or so we conclude that differentiation is substantial. For a more rigorous answer we can compute confidence intervals on FST and see whether they overlap zero -- if they do we conclude that we have no real evidence for any structure).