Line transect and mark-recapture methods
Fig. 3.2. Diagram of layout for line transect
sampling scheme. Transect lines are laid out in some systematic fashion
across an area large enough to detect a reasonable sample (>= 40, preferably
more) of "objects" (animals, plants). Observer walks the transect lines
noting distance and angle to objects detected.
Fig. 3.3. Diagram of detection events. Objects detected have a perpendicular line connecting them to the transect line. Objects without lines went undetected. Note that objects on the line are always detected, while those at greater distances have lower probability of detection. In this example n (number of objects detected) = 13, including the two objects detected on the transect line itself.
Data that must be collected for line transect sampling:
Number of objects detected, n.
Number of individuals within each object (e.g., birds per covey).
Perpendicular distance, x_i, from the transect line to object i (i = 1 to n).
It is usually best also to measure angle and distance from detection point
to object (since many objects will be detected before they are directly
perpendicular to the transect line).
Maximum distance of detection will be w*
(the one-way "width" of the transect detection area). Total "detection"
area will then be 2Lw*, where L is the total length of the transect lines.
Sample size (n) of number of objects detected should be no less than 40.
Our task is to come up with a good estimator
that will allow us to "fill in" the gap between the density (Dw)
of objects we detected (n/2Lw*) and a good estimate of the
actual density. That "fill-in estimator" will be a function we will call
"f-hat(0)", where the hat stands for "estimator".
Our end result will be a function to estimate
density given as
$$\hat{D} = \frac{n\,\hat{f}(0)}{2L}$$ (Eqn 3.1)
In order to get this "fill-in estimator" we need
to make (and meet) some assumptions:
1) Objects directly on the transect line are always detected.
2) Objects are fixed at the initial sighting position. That is, they do not avoid the observer before the observer detects them, and they don't move away in such a way that the observer risks counting them twice.
3) Distances and angles are measured without error.
4) Sightings are independent events. If we see one animal, it does not affect (positively or negatively) the probability that we will see other animals.
If we were doing a complete count in a strip transect
of half-width w, density would simply be number counted/area covered:
$$D = \frac{N}{2Lw}$$ (Eqn 3.2)
When we estimate it from our own count, this becomes
$$\hat{D} = \frac{n}{2Lw}$$ (Eqn 3.3)
where the "hat" over the D means we are
estimating D,
the small n means it is our count (not the full population total
as before), L is the total length of our transects, and w
is the one-sided width of the transect.
Develop a "detection function", which we will call
g(x),
where x is the perpendicular detection distance to objects we detect.
The function g(x) tells us the probability that we will detect
an object, given that it lies a distance x from the centerline of
our transect. What we will do is integrate over the distance from x = 0
to x = ∞. This gives us, essentially, the sum of the detection
probabilities:
$$a = \int_0^{\infty} g(x)\,dx$$ (Eqn 3.4)
Now if we divide the original function g(x)
by the integral, a, we will have a function that integrates (sums) to one (we
have "normalized" it). This function is formally called a probability
density function or pdf:
$$f(x) = \frac{g(x)}{a}$$ (Eqn 3.5)
We have simply rescaled g(x) by
the integral a, to give a function, f(x), that integrates
to one [f(x) has exactly the same shape as g(x)!!].
It tells us what proportion of the total detection probability occurs
at a given distance, x. Say x = 3.5 meters and the original g(x=3.5) was 0.43. How much
of the total probability (a number that might be much greater than one)
lies at x = 3.5? The answer is f(3.5) = 0.43/a. Again, though the values are different, this new
f(x) will have exactly the same shape as our original function
g(x).
So, from where we stand on the centerline, we are
asking: what proportion of all the objects out there are we seeing over
the range of distances? At the centerline we assume that we actually see
every object that is there. Given that we see every object on the line (which seems
reasonable), we can say
$$g(0) = 1$$ (Eqn 3.6)
Substitute Eqn 3.6 into Eqn 3.5 to get
$$f(0) = \frac{g(0)}{a} = \frac{1}{a}$$ (Eqn 3.7)
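Eqns 3.4 to 3.7 can be checked numerically with the minimal Python sketch below. It assumes a hypothetical half-normal detection function, g(x) = exp(-x²/(2σ²)). That is not a model from these notes, only a convenient shape with g(0) = 1. The sketch approximates the integral a with the trapezoid rule, treating 10σ as "infinity", and then rescales g(x) into f(x).

```python
import math


def half_normal_g(x: float, sigma: float) -> float:
    """Hypothetical detection function g(x) = exp(-x^2 / (2 sigma^2)); note g(0) = 1."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


def integrate_trapezoid(func, upper: float, steps: int = 100_000) -> float:
    """Approximate the integral of func from 0 to `upper` with the trapezoid rule."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for i in range(1, steps):
        total += func(i * h)
    return total * h


SIGMA = 20.0  # hypothetical scale parameter, in meters


def g(x: float) -> float:
    return half_normal_g(x, SIGMA)


# a = integral of g(x) from 0 to infinity; 10 * SIGMA stands in for infinity here.
a = integrate_trapezoid(g, upper=10.0 * SIGMA)


def f(x: float) -> float:
    """Eqn 3.5: f(x) = g(x) / a has the same shape as g(x) but integrates to one."""
    return g(x) / a


# Eqn 3.7: because g(0) = 1 (Eqn 3.6), f(0) = 1 / a.
print(f"a = {a:.3f} m, f(0) = {f(0.0):.5f} per m, f(3.5) = {f(3.5):.5f} per m")
```

For this half-normal shape the integral has the closed form σ√(π/2), about 25.07 m when σ = 20 m, so the numerical value of a can be checked against it.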
Now let's revisit Eqn 3.3.
For a finite w, the probability of detecting an object within the
(half-)strip will be
$$P_w = \frac{a}{w}$$ (Eqn 3.8)
That is, the total probability, a, divided
by the half-width of the strip, w. If a total of N_w objects
actually occur, the expected number of objects we will see will be
$$E(n) = P_w N_w$$ (Eqn 3.9)
where E(n) stands for the "expectation
of n". This is simply the probability of detecting an object
times the total number of objects there. We will know n, the number
of objects we saw, so we can develop an estimator of N_w
as
$$\hat{N}_w = \frac{n}{P_w} = \frac{n\,w}{a}$$ (Eqn 3.10)
where we have used the right-hand side of Eqn
3.8 as a substitute for P_w in the last form of the equation.
Now substitute nw/a for the N of Eqn 3.2
to get:
$$\hat{D} = \frac{n w / a}{2 L w} = \frac{n}{2 L a}$$ (Eqn 3.11)
NOTE that
the w's cancel out. This means that we can use an "infinite" w
(as far as we can see, or the furthest object detected, called distance
w*).
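The cancellation of w can be checked directly. This sketch uses hypothetical numbers (n = 40, L = 5000 m, a = 25 m). It computes N-hat_w = n/P_w with P_w = a/w, puts that in place of N in Eqn 3.2, and shows that the resulting density stays the same as w grows.

```python
def abundance_in_half_strip(n: int, w: float, a: float) -> float:
    """N_w-hat = n / P_w, using P_w = a / w from Eqn 3.8."""
    if not 0 < a <= w:
        raise ValueError("need 0 < a <= w so that P_w = a / w is a probability")
    p_w = a / w
    return n / p_w


def density_via_strip(n: int, L: float, w: float, a: float) -> float:
    """Put N_w-hat in place of N in Eqn 3.2: (n w / a) / (2 L w), which is n / (2 L a)."""
    return abundance_in_half_strip(n, w, a) / (2.0 * L * w)


# Hypothetical inputs: n = 40 detections, L = 5000 m of line, a = 25 m.
for w in (50.0, 100.0, 1000.0):
    d = density_via_strip(40, 5000.0, w, 25.0)
    print(f"w = {w:6.0f} m -> D-hat = {d:.6f} per m^2")
```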
One more step and we will have our handle on estimating density. Substitute
the left-hand side of Eqn 3.7 for the 1/a of Eqn 3.11 to get:
$$\hat{D} = \frac{n\,\hat{f}(0)}{2L}$$ (Eqn 3.12)
Great! If we can just get some estimator f-hat(0),
we can estimate density from our line transect data.
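Here is a quick arithmetic check of Eqn 3.12. The survey values are hypothetical, and the naive n/(2Lw*) density, which counts only the objects actually detected, is shown alongside for comparison. Keep units consistent. With L in meters and f-hat(0) in 1/m, D-hat comes out per square meter; multiply by 10,000 to get a figure per hectare.

```python
def line_transect_density(n: int, total_length: float, f0_hat: float) -> float:
    """Eqn 3.12: D-hat = n * f0_hat / (2 L), in objects per squared distance unit."""
    return n * f0_hat / (2.0 * total_length)


def naive_strip_density(n: int, total_length: float, w_star: float) -> float:
    """n / (2 L w*): density of detected objects only, ignoring those we missed."""
    return n / (2.0 * total_length * w_star)


# Hypothetical survey: 40 detections along 5000 m of line, f-hat(0) = 0.04 per m, w* = 50 m.
d_hat = line_transect_density(40, 5000.0, 0.04)
d_naive = naive_strip_density(40, 5000.0, 50.0)
print(f"D-hat = {d_hat * 10_000:.2f} per ha; naive n/(2Lw*) = {d_naive * 10_000:.2f} per ha")
```

With these made-up inputs, D-hat is 1.6 per ha and the naive figure is 0.8 per ha. The difference is what f-hat(0) "fills in".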
I began by reviewing the derivation involved in going from Eqn 3.1 back to itself as Eqn 3.12.
Here is a summary of the real-world issues:
Animal response behavior and our ability to detect animals may change during the survey (because of changing wind levels and other factors).
Individual animals may respond differently.
Here are the criteria that Burnham and Anderson set for a good model of f(x):
Robust to pooling (variation in detection probability factors, some of them unknown)
Shoulder near zero: near the centerline we expect to see almost all the animals (the "shoulder"), but then the detection probability drops off from some moderate distance.
Efficient estimator (low variance, low bias)
Deriving the best estimator goes beyond the scope
of this course. Suffice it to say that sharp minds have shown that a Fourier
series estimator is robust (works under a variety of conditions)
and has reasonable precision and no bias (at least in the limit).
It boils down to an expansion several terms long
that is based on a cosine function of the observed detection distances:
$$\hat{f}(0) = \frac{1}{w^*} + \sum_{k=1}^{m} \hat{a}_k$$ (Eqn 3.13)
where w* is just the furthest distance
we actually saw an object at (after throwing out any outliers), and m
is the number of expansion terms (indexed by k). The quantity
a-hat-sub-k is estimated as follows:
$$\hat{a}_k = \frac{2}{n\,w^*} \sum_{i=1}^{n} \cos\left(\frac{k \pi x_i}{w^*}\right)$$ (Eqn 3.14)
So... let's say we wanted to estimate the term
for k = 1. Start inside the rounded brackets toward the right-hand
end. We have 1 (= k) times π,
times x_1 (our first observed distance; in general x_i, with i = 1 to n,
the total number of objects seen), all over our maximum distance, w*;
take the cosine of that. Now comes our summation term. Work through our
full set of distances (i = 1 to n), computing the same cosine term for each,
and add all those n terms together. Multiply that sum of n terms by 2/(nw*). That's
a-hat-sub-k.
Even better, a computer program can do it all for us.
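The same steps can be written as a Python sketch for ungrouped (individually measured) distances. The choose_m helper is our own assumption. It implements one commonly cited form of the Burnham, Anderson and Laake stopping rule: take the smallest m for which (1/w*)·sqrt(2/(n+1)) ≥ |a-hat-sub-(m+1)|. The cap of 6 terms is an arbitrary choice. Check both against the stopping rule used in the course spreadsheet.

```python
import math
from typing import Sequence


def a_hat(k: int, distances: Sequence[float], w_star: float) -> float:
    """Eqn 3.14: a-hat-sub-k = (2 / (n w*)) * sum over i of cos(k * pi * x_i / w*)."""
    n = len(distances)
    if n == 0:
        raise ValueError("need at least one detection distance")
    if any(x < 0 or x > w_star for x in distances):
        raise ValueError("distances must lie in [0, w*]; drop outliers first")
    total = sum(math.cos(k * math.pi * x / w_star) for x in distances)
    return 2.0 / (n * w_star) * total


def f0_hat(distances: Sequence[float], w_star: float, m: int) -> float:
    """Eqn 3.13: f-hat(0) = 1/w* + sum of a-hat-sub-k for k = 1..m."""
    return 1.0 / w_star + sum(a_hat(k, distances, w_star) for k in range(1, m + 1))


def choose_m(distances: Sequence[float], w_star: float, max_terms: int = 6) -> int:
    """Assumed stopping rule: smallest m with (1/w*) * sqrt(2/(n+1)) >= |a-hat-sub-(m+1)|."""
    threshold = (1.0 / w_star) * math.sqrt(2.0 / (len(distances) + 1))
    for m in range(max_terms):
        if threshold >= abs(a_hat(m + 1, distances, w_star)):
            return m
    return max_terms


# Hypothetical perpendicular distances (m), already truncated at w* = 10 m.
# A real survey should have n >= 40; eight values only keep the example short.
distances = [0.5, 1.2, 2.0, 3.1, 4.4, 5.0, 7.3, 9.8]
w_star = 10.0
m = choose_m(distances, w_star)
print("m =", m, "f-hat(0) =", f0_hat(distances, w_star, m))
```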
Note for calculations from "heaped" data (like the homework problem and LineTransect.XL). The above formula works best when each of the sighting distances is recorded separately (the actual measurement) rather than when working with midpoint distances for lumped data.
For lumped data we can rewrite Eqn 3.14 as follows so that the calculation method is more explicit:
$$\hat{a}_k = \frac{2}{N_{tot}\,w^*} \sum_{i=1}^{\text{Dists}} n_i \cos\left(\frac{k \pi \,\text{Midpoint}_i}{w^*}\right)$$ (Eqn 3.14a)
Here we see explicitly the term Ntot for ALL the objects detected, versus n_i, the number of objects detected in distance category i. Here i goes from 1 to Dists, the number of "lumps" or distance categories (5 in the example in LineTransects.XL and 6 in your homework problems). Rather than going one by one through all the objects, we do the cosine operation on each of the Midpoint_i distances and multiply it by the number of objects, n_i, in that lump (at that midpoint distance).
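Eqn 3.14a for lumped data can be sketched the same way. The five categories and counts in the usage lines are hypothetical. They are not the LineTransects.XL data, and the choice of m = 2 terms is arbitrary.

```python
import math
from typing import Sequence


def a_hat_lumped(k: int, midpoints: Sequence[float], counts: Sequence[int],
                 w_star: float) -> float:
    """Eqn 3.14a: (2 / (Ntot w*)) * sum over categories of n_i * cos(k * pi * Midpoint_i / w*)."""
    if len(midpoints) != len(counts):
        raise ValueError("need one count per distance category")
    n_tot = sum(counts)
    if n_tot == 0:
        raise ValueError("no objects detected")
    total = sum(n_i * math.cos(k * math.pi * mid / w_star)
                for mid, n_i in zip(midpoints, counts))
    return 2.0 / (n_tot * w_star) * total


# Hypothetical lumped data: five 10 m categories out to w* = 50 m.
midpoints = [5.0, 15.0, 25.0, 35.0, 45.0]
counts = [20, 14, 9, 5, 2]
w_star = 50.0
m = 2  # hypothetical number of expansion terms
f0 = 1.0 / w_star + sum(a_hat_lumped(k, midpoints, counts, w_star) for k in range(1, m + 1))
print("f-hat(0) =", f0)
```

Each lump contributes n_i copies of the same cosine term. This gives the same result as Eqn 3.14 applied to a list in which every object is recorded at its category midpoint.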
CALCULATORS: RADIANS vs. DEGREES.
Excel and some calculators assume the input is in radians, as do the formulas
above. However, your calculator may use degrees instead. In
that case, you would want to use the term (angle in degrees, since π radians = 180 degrees):
$$\cos\left(\frac{180\,k\,x_i}{w^*}\right)$$
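Python's math.cos, like Excel's COS, expects radians. This sketch shows that the degree form of the term, with 180 in place of π, gives the same value as long as the angle is read as degrees. The function names are hypothetical.

```python
import math


def cosine_term_radians(k: int, x: float, w_star: float) -> float:
    """Cosine term as written in Eqn 3.14: the angle k * pi * x / w* is in radians."""
    return math.cos(k * math.pi * x / w_star)


def cosine_term_degrees(k: int, x: float, w_star: float) -> float:
    """Same term with the angle expressed in degrees (180 degrees = pi radians)."""
    angle_deg = 180.0 * k * x / w_star
    # math.cos expects radians, so convert first; a calculator in degree mode
    # would take angle_deg directly.
    return math.cos(math.radians(angle_deg))


for k in (1, 2, 3):
    print(k, cosine_term_radians(k, 3.5, 20.0), cosine_term_degrees(k, 3.5, 20.0))
```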
I then said that if we could just estimate f(0),
we could turn that back into an estimate of density, given the observed
number of animals, n, and our transect length, L. The estimator
function we will use is based on a Fourier series:
$$\hat{f}(0) = \frac{1}{w^*} + \sum_{k=1}^{m} \hat{a}_k$$
The real-world issues, and the criteria that Burnham and Anderson
set for a good model of f(x), are summarized above.
The ROBUST estimator that meets those criteria is the Fourier series function
given in Eqns 3.13 and 3.15.
We've now covered some of the theory of line transect sampling and obtained an idea of how we can turn field data into an actual density estimate. We'll now turn to a brief consideration of possible layouts for the transects. You may remember that I mentioned that one will usually want multiple transect lines that sum to L. Mathematically,
$$L = \sum_{j} l_j$$
where l_j is the length of transect line j.
Here are two possible schemes. Which is the more
practical? Which is the "ideal"?
We could use any of a series of different possible layouts. These could include connected lines with "kinks" that change angle, systematic designs that cover the study area in a regular design, or subdivided plots that partition the study area.
Somehow, though, we should try hard to incorporate some randomization. This could involve a randomly selected starting point on the perimeter, a randomly chosen angle for the parallel transect lines, or some other way of avoiding systematic bias (again, at all costs we want to avoid some obvious source of bias such as paralleling ridge tops).
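One way to combine a systematic design with randomization is a random start: parallel lines a fixed distance apart, with the first line placed at a random offset. This minimal sketch assumes a rectangular study area with the lines running parallel to one side. A randomly chosen orientation would also require clipping each line to the study-area boundary, which the sketch does not attempt. The function name and dimensions are hypothetical, and the total of the line lengths is L.

```python
import random
from typing import List, Tuple


def systematic_parallel_lines(width: float, height: float, spacing: float,
                              rng: random.Random) -> Tuple[List[float], float]:
    """
    Hypothetical layout helper for a rectangular study area (width x height).
    Lines run parallel to the height axis, `spacing` apart, starting at a random
    offset in [0, spacing). Returns the line positions and the total length L.
    """
    if spacing <= 0 or width <= 0 or height <= 0:
        raise ValueError("dimensions and spacing must be positive")
    offset = rng.uniform(0.0, spacing)  # the randomized start
    positions = []
    i = 0
    while offset + i * spacing < width:
        positions.append(offset + i * spacing)
        i += 1
    total_length = len(positions) * height  # L = sum of the individual line lengths
    return positions, total_length


rng = random.Random(2024)  # fixed seed only so the example is reproducible
lines, total_length = systematic_parallel_lines(1000.0, 500.0, 100.0, rng)
print(len(lines), "lines, total length L =", total_length, "m")
```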
SUMMARY of considerations in designing the layout:
Delimit the population (figure out the logical and logistically feasible boundaries)
Devise a sampling scheme that truly samples the population of interest
Conduct a survey that gathers the required data
Goal of the design: Sample the population of interest in a way that yields an adequate representation of reality
Must have adequate replication (adequate sample size)
Sample must have adequate spatial dispersion (and possibly temporal dispersion)
Must avoid correlation between transect orientation and particular features of the landscape (such as ridges or valleys). It's acceptable to have one line (of many) that (by chance) aligns with an environmental feature (road, river), just not for them ALL to do so, since that may produce bias.
See the Excel
spreadsheet (link below) for a fully worked example of applying the Fourier
series estimator and the stopping rule to the set of data given in Fig.
3.5.
Here is a website for the current documentation
on line transect sampling using the program DISTANCE
End of material on line transect
sampling
Our next topic will be mark-recapture.
M = initial sample that we mark
n = number recaptured
m = number in recaps that are already marked.
Last time we began on mark-recapture schemes. Remember to look back at where it lies on our dichotomous key. Unlike line transect sampling, which is largely restricted to obtaining a density estimate, mark-recapture can actually do several things for us. It can help us estimate one or more of the following:
Those that lost both bands
Those that were never marked.
To use mark-recapture for estimating density and the other Type B uses, we are interested in both the caught and the uncaught. In that case we have one overriding assumption that we should meet: every individual in the population has the same probability of being captured.
Individuals may learn to seek or avoid traps or marking stations
Spatial dispersion may affect opportunity for capture (we can't catch an animal if the trap is not in its home range)
The last assumption can be addressed by proper randomization. Even with randomization, sexes may differ in catchability (home range size) and may need to be analyzed separately.
The second assumption can be tested by analysis of the pattern of captures and recaptures.
M = initial sample that we mark
n = number recaptured
m = number of recaptured individuals (n) that are already marked.
Last time I gave formulae (Eqns 3.22 to 3.24) for reduced bias estimation of the population size, its variance and a confidence interval. We can use Eqn 3.24/3.20 to calculate a confidence interval on our estimate. The attached Excel spreadsheet provides a concrete example of all the necessary calculations.
Go to Mark-recapture Excel spreadsheet. Say we caught and marked 301 animals on a first round and captured 236 on a second round. If 146 of the animals were previously marked, our N-hat would be 485, the variance would be 604.3, and the 95% CI would range from 437 to 534 (= ± 10% of N-hat).
[That CI corresponds to an alpha level of 0.05, and we use Student's t-distribution with the requisite number of degrees of freedom to calculate the interval.]
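Eqns 3.22 to 3.24 are not reproduced in this section. Bailey's binomial modification of the Lincoln-Petersen estimator reproduces the worked numbers above, so this sketch assumes those are the formulas intended. In that form, N-hat = M(n+1)/(m+1) and Var(N-hat) = M²(n+1)(n−m)/[(m+1)²(m+2)]. For simplicity the sketch uses a normal critical value of 1.96 rather than the Student's t value mentioned above.

```python
import math
from typing import Tuple


def bailey_estimate(M: int, n: int, m: int) -> Tuple[float, float]:
    """
    Assumed form of the reduced-bias estimator (Bailey's binomial modification).
    M: marked in the first round; n: caught in the second round;
    m: previously marked animals among the n.
    """
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    if m > M:
        raise ValueError("cannot recapture more marked animals than were marked")
    n_hat = M * (n + 1) / (m + 1)
    var = M ** 2 * (n + 1) * (n - m) / ((m + 1) ** 2 * (m + 2))
    return n_hat, var


def confidence_interval(n_hat: float, var: float, crit: float = 1.96) -> Tuple[float, float]:
    """Symmetric interval n_hat +/- crit * SE; pass a Student's t value as crit if preferred."""
    half_width = crit * math.sqrt(var)
    return n_hat - half_width, n_hat + half_width


n_hat, var = bailey_estimate(M=301, n=236, m=146)
low, high = confidence_interval(n_hat, var)
print(f"N-hat = {n_hat:.1f}, Var = {var:.1f}, 95% CI about {low:.0f} to {high:.0f}")
```

With M = 301, n = 236 and m = 146, this gives N-hat ≈ 485.3 and Var ≈ 604.3. The z-based interval runs from about 437 to 533, slightly narrower than the t-based 437 to 534.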
The estimator above is still somewhat biased. We can come up with a completely unbiased estimator by deciding beforehand on a number of recaptures (m). We continue trapping until we catch the required number of already-marked individuals. In that case our estimate is:
m { 55, 80, 90}
N-hat = 491, Var(N-hat) = 612.9, 95% CI = 442 to 540 (= ± 9.96%).
Notice that in the two previous examples we caught a fairly large percentage of the total population (in fact 40-60%). It may not always be feasible to catch that high a proportion of the population. In that case we can use the slightly more complex multi-capture method, drawing on methods developed by Schnabel in the 1930s and modified by Schumacher and Eschmeyer in the 1940s. These methods were developed largely for estimating fish populations. (Method 4 of our dichotomous key.)
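The Schnabel and Schumacher and Eschmeyer formulas are not given in this section. The sketch below uses their standard textbook forms as we understand them. Schnabel's estimate is N-hat = Σ C_t M_t / Σ R_t. Schumacher and Eschmeyer's is N-hat = Σ C_t M_t² / Σ R_t M_t. Here C_t is the catch in sample t, R_t is the number of marked animals in that catch, and M_t is the number of marked animals at large just before sample t. The sketch assumes a closed population, that every unmarked animal caught is marked and released, and that no marks are lost. The catch data are hypothetical.

```python
from typing import List, Sequence


def marked_at_large(catches: Sequence[int], recaptures: Sequence[int]) -> List[int]:
    """M_t before each sample, assuming all unmarked captures are marked and released."""
    if len(catches) != len(recaptures):
        raise ValueError("need one recapture count per sample")
    marked, total = [], 0
    for c, r in zip(catches, recaptures):
        if not 0 <= r <= c:
            raise ValueError("recaptures must be between 0 and the catch")
        if r > total:
            raise ValueError("more recaptures than marked animals at large")
        marked.append(total)
        total += c - r  # newly marked animals join the marked pool
    return marked


def schnabel(catches: Sequence[int], recaptures: Sequence[int]) -> float:
    """Schnabel: N-hat = sum(C_t * M_t) / sum(R_t)."""
    m_t = marked_at_large(catches, recaptures)
    total_r = sum(recaptures)
    if total_r == 0:
        raise ValueError("no recaptures; the estimate is undefined")
    return sum(c * m for c, m in zip(catches, m_t)) / total_r


def schumacher_eschmeyer(catches: Sequence[int], recaptures: Sequence[int]) -> float:
    """Schumacher and Eschmeyer: N-hat = sum(C_t * M_t**2) / sum(R_t * M_t)."""
    m_t = marked_at_large(catches, recaptures)
    denom = sum(r * m for r, m in zip(recaptures, m_t))
    if denom == 0:
        raise ValueError("no recaptures; the estimate is undefined")
    return sum(c * m * m for c, m in zip(catches, m_t)) / denom


# Hypothetical three-sample census: 10 caught each time, with 0, 2 and 4 already marked.
catches = [10, 10, 10]
recaptures = [0, 2, 4]
print("Schnabel:", schnabel(catches, recaptures))
print("Schumacher and Eschmeyer:", schumacher_eschmeyer(catches, recaptures))
```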
References for the interested:
Jolly, G.M. 1965. Explicit estimates from capture-recapture data with both death and immigration -- stochastic model. Biometrika 52: 225-247.
Schnabel, Z.E. 1938. The estimation of total fish in a lake. Am. Math. Mon. 348-352.
Schumacher, F.X., and R.W. Eschmeyer. 1943. The estimation of fish populations in lakes and ponds. J. Tenn. Acad. Sci. 18: 228-249.
Seber, G.A.F. 1965. A note on the multiple capture-recapture census. Biometrika 52: 249-259.
http://canuck.dnr.cornell.edu/misc/cmr/
http://www.cnr.colostate.edu/~gwhite/mark/mark.htm
We will start next time with an overview of patterns of mortality.
END of line transect and mark-recapture
§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§§