Lecture 36 (24-Apr-06)
Population Genetics (continued)
Go to worked FST calculation web page.
Last time I introduced the topic of hierarchical F-statistics. We build those up by calculating three kinds of heterozygosities, HI, HS, and HT. Let's look at the general formulae for these heterozygosities, and how they contribute to the calculation of hierarchical F-statistics, and then I will work through an example.
We begin with HI, the observed heterozygosity in individuals, calculated as a weighted average across the subpopulations:
$$H_I = \frac{\sum_{s=1}^{n} H_{obs,s} \, N_s}{\sum_{s=1}^{n} N_s}$$ (Eqn 36.1)
where the subscript s refers to the sth of n subpopulations, Hobs,s is the observed heterozygosity in that subpopulation, and Ns is its population size. That is, first we multiply each subpopulation's observed heterozygosity by its population size. Then we sum those weighted heterozygosities. Finally, we divide by the sum of all the subpopulation sizes. See the FST example page for a worked calculation of a specific case.
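The three steps just described (weight, sum, divide) can be sketched in a few lines of Python. This is only an illustration: the function name `weighted_heterozygosity` and the numbers are hypothetical and are not taken from the FST example page.

```python
def weighted_heterozygosity(h_values, sizes):
    """Size-weighted mean of per-subpopulation heterozygosities (Eqn 36.1)."""
    if len(h_values) != len(sizes) or not sizes:
        raise ValueError("need one heterozygosity and one size per subpopulation")
    # Multiply each heterozygosity by its subpopulation size, sum the products,
    # then divide by the sum of all subpopulation sizes.
    weighted_sum = sum(h * n for h, n in zip(h_values, sizes))
    return weighted_sum / sum(sizes)


# Hypothetical data for three subpopulations (made up for illustration).
h_obs = [0.30, 0.50, 0.40]  # observed heterozygosity in each subpopulation
sizes = [20, 30, 50]        # number of individuals in each subpopulation

h_i = weighted_heterozygosity(h_obs, sizes)
print(round(h_i, 4))  # (6 + 15 + 20) / 100 = 0.41
```

Note that the largest subpopulation pulls the average toward its own value; an unweighted mean of the same three heterozygosities would be 0.40 rather than 0.41.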
Next we calculate HS as the global weighted average of the expected heterozygosities across all the subpopulations:
$$H_S = \frac{\sum_{s=1}^{n} H_{exp,s} \, N_s}{\sum_{s=1}^{n} N_s}$$ (Eqn 36.2)
The formula differs from that of Eqn 36.1 only in that we are now using Hexp (calculated from each subpopulation's gene frequencies by Eqn 37.1) instead of Hobs.
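As a rough sketch of Eqn 36.2, the code below first computes each subpopulation's expected heterozygosity and then takes the size-weighted average. It assumes that Hexp is one minus the sum of squared allele frequencies; the function names and the allele frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """H_exp for one subpopulation: 1 minus the sum of squared allele frequencies.

    Assumption: this is the expected-heterozygosity formula the notes refer to.
    """
    return 1.0 - sum(p * p for p in freqs)


def subpop_expected_heterozygosity(freqs_by_subpop, sizes):
    """H_S: size-weighted mean of the subpopulation H_exp values (Eqn 36.2)."""
    h_exp = [expected_heterozygosity(f) for f in freqs_by_subpop]
    return sum(h * n for h, n in zip(h_exp, sizes)) / sum(sizes)


# Hypothetical two-allele frequencies in three subpopulations.
freqs_by_subpop = [[0.5, 0.5], [0.8, 0.2], [0.2, 0.8]]
sizes = [20, 40, 40]

hs_value = subpop_expected_heterozygosity(freqs_by_subpop, sizes)
print(round(hs_value, 4))  # (0.5*20 + 0.32*40 + 0.32*40) / 100 = 0.356
```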
Finally we use the global mean gene frequencies to calculate HT, the global expected heterozygosity. This will not, in general, give us the same answer as the weighted average of the separate subpopulation values for expected heterozygosity. The formula is:
$$H_T = 1 - \sum_{i} \bar{p}_i^{\,2}$$ (Eqn 36.3)
The only difference between this formula and that of Eqn 37.1 is that here we specify the global mean (pi-bar) for the frequency of each allele i over all the subpopulations, rather than the subpopulation-by-subpopulation values.
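The sketch below illustrates why HT and the weighted average of the subpopulation values differ. It assumes the global mean frequency of each allele is weighted by subpopulation size (the notes say only "global mean"); all names and numbers are hypothetical.

```python
def global_mean_frequencies(freqs_by_subpop, sizes):
    """Size-weighted mean frequency of each allele across subpopulations.

    Assumption: subpopulations are weighted by size, as in Eqns 36.1 and 36.2.
    """
    total = sum(sizes)
    n_alleles = len(freqs_by_subpop[0])
    return [
        sum(f[i] * n for f, n in zip(freqs_by_subpop, sizes)) / total
        for i in range(n_alleles)
    ]


def total_expected_heterozygosity(freqs_by_subpop, sizes):
    """H_T: expected heterozygosity from the global mean frequencies (Eqn 36.3)."""
    p_bar = global_mean_frequencies(freqs_by_subpop, sizes)
    return 1.0 - sum(p * p for p in p_bar)


# Hypothetical two-allele frequencies in three subpopulations.
freqs_by_subpop = [[0.5, 0.5], [0.8, 0.2], [0.2, 0.8]]
sizes = [20, 40, 40]

ht_value = total_expected_heterozygosity(freqs_by_subpop, sizes)

# For comparison: size-weighted mean of the per-subpopulation values.
mean_h_exp = sum(
    (1.0 - sum(p * p for p in f)) * n for f, n in zip(freqs_by_subpop, sizes)
) / sum(sizes)

print(round(ht_value, 4))    # global mean frequencies are 0.5 and 0.5, so 0.5
print(round(mean_h_exp, 4))  # 0.356, smaller than H_T
```

In this made-up case the two divergent subpopulations cancel out in the global mean, so the pooled frequencies look maximally variable even though each of those subpopulations is not.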
With HI, HS, and HT in hand we are ready to calculate our hierarchical F-statistics. First, FIS:
$$F_{IS} = \frac{H_S - H_I}{H_S}$$ (Eqn 36.4)
You will often see this written in the mathematically equivalent form:
$$F_{IS} = 1 - \frac{H_I}{H_S}$$ (Eqn 36.5)
This first "global" F-statistic is the ratio of (HS - HI), the difference between the global-average expected and observed heterozygosities in subpopulations, to HS, the global-average expected heterozygosity. It gives us a view of the average inbreeding over the entire set of subpopulations (that is, it very closely resembles the local F, or Fs, of Step 5 in the FST example page).
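A minimal sketch of the FIS calculation, written both as the ratio described above and as one minus HI / HS, to show that the two forms agree. The function names and the heterozygosity values are hypothetical.

```python
import math


def f_is_ratio(h_i, h_s):
    """F_IS as the difference (H_S - H_I) divided by H_S."""
    if h_s == 0:
        raise ValueError("F_IS is undefined when H_S is zero")
    return (h_s - h_i) / h_s


def f_is_one_minus(h_i, h_s):
    """F_IS in the equivalent form 1 - H_I / H_S."""
    if h_s == 0:
        raise ValueError("F_IS is undefined when H_S is zero")
    return 1.0 - h_i / h_s


# Hypothetical global heterozygosities (made up for illustration).
h_i_value = 0.30   # observed, averaged over subpopulations
h_s_value = 0.356  # expected, averaged over subpopulations

f_is = f_is_ratio(h_i_value, h_s_value)
print(round(f_is, 4))  # 0.056 / 0.356, about 0.1573
print(math.isclose(f_is, f_is_one_minus(h_i_value, h_s_value)))  # True
```

A positive value, as here, means fewer heterozygotes were observed than expected within subpopulations; if HI exceeded HS the statistic would be negative.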
Next we calculate the F-statistic that tells us the most about the degree of genetic difference among the subpopulations -- FST. It is calculated as
$$F_{ST} = \frac{H_T - H_S}{H_T}$$ (Eqn 36.6)
Here we assess the difference between the expected heterozygosities in the subpopulations and the expected heterozygosity based on the global gene frequencies.
Let's consider two extreme examples that will illustrate how FST can vary between zero and one. Consider a system with three alleles where we have three subpopulations.
Case 1: Maximal FST. If each subpopulation is fixed for a different allele, then HS will be zero (with only one allele in a subpopulation, we don't expect any heterozygotes). In that case, Eqn 36.6 simplifies to HT / HT = 1.
Case 2: Minimal FST. If the gene frequencies are the same in each of the subpopulations, every subpopulation will have the same expected heterozygosity, so HS will be the same as HT. In that case, the numerator of Eqn 36.6 goes to zero and FST is zero. Why can't FST be negative? You cannot arrange a set of populations to have HS > HT.
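The two extreme cases can be checked numerically. The sketch below assumes that expected heterozygosity is one minus the sum of squared allele frequencies, that FST is (HT - HS) / HT, and that global mean frequencies are weighted by subpopulation size; the function name `f_st` and the frequencies are hypothetical.

```python
def f_st(freqs_by_subpop, sizes):
    """F_ST = (H_T - H_S) / H_T from per-subpopulation allele frequencies."""
    total = sum(sizes)
    # H_S: size-weighted mean of each subpopulation's expected heterozygosity.
    h_s = sum(
        (1.0 - sum(p * p for p in f)) * n for f, n in zip(freqs_by_subpop, sizes)
    ) / total
    # H_T: expected heterozygosity from the size-weighted mean frequencies.
    n_alleles = len(freqs_by_subpop[0])
    p_bar = [
        sum(f[i] * n for f, n in zip(freqs_by_subpop, sizes)) / total
        for i in range(n_alleles)
    ]
    h_t = 1.0 - sum(p * p for p in p_bar)
    if h_t == 0:
        raise ValueError("F_ST is undefined when H_T is zero")
    return (h_t - h_s) / h_t


sizes = [10, 10, 10]  # hypothetical equal-sized subpopulations

# Case 1: each subpopulation fixed for a different one of three alleles.
fixed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
case1 = f_st(fixed, sizes)

# Case 2: identical allele frequencies in every subpopulation.
same = [[0.5, 0.3, 0.2], [0.5, 0.3, 0.2], [0.5, 0.3, 0.2]]
case2 = f_st(same, sizes)

print(case1)                    # 1.0, because H_S is exactly zero
print(abs(case2) < 1e-12)       # True: zero up to floating-point rounding
```

Any intermediate arrangement of allele frequencies among the subpopulations gives a value between these two extremes.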
The final (and least often used) global F-statistic is FIT, given by the formula:
$$F_{IT} = \frac{H_T - H_I}{H_T}$$ (Eqn 36.7)
FIT is relatively little used for two reasons. First, it is often quite similar to FIS, thereby providing little new information. Second, when it does differ from FIS it may even be somewhat misleading: it is possible to construct scenarios in which FIT produces an "overall" picture that differs from the picture in any particular subpopulation. The reasonable context for an individual's observed heterozygosity is its own subpopulation. Juxtaposing the individual against the total population is less intuitively meaningful.
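To see the three statistics side by side, here is a small sketch that computes them from one set of heterozygosities, taking FIT as (HT - HI) / HT. The function name `f_statistics` and the values are hypothetical. The final check uses the identity (1 - FIT) = (1 - FIS)(1 - FST), which follows algebraically from the three ratio definitions.

```python
import math


def f_statistics(h_i, h_s, h_t):
    """Return F_IS, F_ST and F_IT from the three global heterozygosities."""
    if h_s == 0 or h_t == 0:
        raise ValueError("H_S and H_T must be nonzero")
    return {
        "F_IS": (h_s - h_i) / h_s,  # individuals relative to subpopulations
        "F_ST": (h_t - h_s) / h_t,  # subpopulations relative to the total
        "F_IT": (h_t - h_i) / h_t,  # individuals relative to the total
    }


# Hypothetical heterozygosities (made up for illustration).
stats = f_statistics(h_i=0.30, h_s=0.356, h_t=0.50)

for name, value in stats.items():
    print(name, round(value, 4))  # F_IS 0.1573, F_ST 0.288, F_IT 0.4

# (1 - F_IT) equals (1 - F_IS) * (1 - F_ST) by algebra on the definitions.
lhs = 1.0 - stats["F_IT"]
rhs = (1.0 - stats["F_IS"]) * (1.0 - stats["F_ST"])
print(math.isclose(lhs, rhs))  # True
```

With these made-up numbers FIT is well above FIS because the subpopulations differ substantially; when FST is near zero the identity shows that FIT and FIS nearly coincide, which is the first reason given above for FIT adding little.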