I am currently working on pedigree reconstruction in a marble trout population (Lipovscek) that was affected by two big, destructive flash floods in 2007 and 2009, focusing in particular on the processes that helped the population bounce back to pre-flood density.
After SNP discovery, we successfully genotyped all the samples (~800) collected from 2006 until September 2014. The first step of the analysis is to merge data from samples that carry different IDs but in reality refer to the same individual.
My colleague, field biologist Alain Crivelli, uses Carlin tags (a metal tag attached with a piece of wire under the dorsal fin of the trout) only on fish longer than 110 mm; if the fish is shorter than 110 mm, a piece of the adipose fin is collected and the tube with the tissue sample is given its own ID. Every time a fish receives a Carlin tag, a piece of adipose fin is also collected, and the tube with the tissue sample gets the same ID as the Carlin tag. Thus, the genotypes from the tissue samples of a fish first sampled when shorter than 110 mm and later sampled when longer than 110 mm (and therefore tagged) should be the same, except for genotyping errors. Another instance of genotype matching occurs when a fish has lost its Carlin tag and is later sampled and retagged with another Carlin tag, since it is not possible to establish the previous ID of the fish (several fish lose their tags between sampling occasions). When two genotypes match, however, it is possible to merge the demographic histories recorded under different IDs that in reality refer to the same fish.
It may happen that four different IDs refer to the same fish. For instance, a fish might have been sampled at age 0 (i.e. before its first winter) when shorter than 110 mm (first ID), sampled again the following year as an age-1 fish still shorter than 110 mm (second ID), sampled the following year as an age-2 fish longer than 110 mm and tagged (third ID), and then sampled the following year as a fish that had already been tagged but had lost its tag (the scar left by a lost tag is visible) and re-tagged (fourth ID).
Merging together fish histories and genotypes referring to the same fish is important for multiple reasons:
- if the fish is a potential parent, merging prevents offspring from going unassigned because two candidate parents (in reality the same fish) have the same probability of being the true parent;
- it avoids overestimating the production of young;
- it allows more accurate estimates of growth (longer fish histories help) and of survival probabilities (tag loss is a tricky problem, since a fish that has lost its tag is "apparently" dead).
Parts of the "matching genotypes" analysis can be automated (e.g., given ~160 alleles per fish and allowing up to 2 or 3 mismatches, it is quite easy to write a script that extracts fish IDs with the same genotype), but the demographic histories of the fish with the same genotype should then be checked one by one. This check was necessary because some demographic histories of fish with the same genotype made little sense (the IDs belonged to two different cohorts, or referred to fish sampled in the same year but with vastly different length and weight), and thus I had to find out whether the mistake occurred in the lab or in the field.
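To give an idea of what the automated part can look like, here is a minimal Python sketch, not the script I actually used. It assumes each genotype is stored as a list of allele calls in the same locus order, with `None` for a missing call (an assumption about the data format), and it uses toy IDs, genotypes and sampling years made up for illustration. The function names (`find_matches`, `group_ids`, `same_year_conflicts`) are hypothetical. It finds pairs of IDs within a mismatch tolerance, groups them so that three or four IDs can end up together, and flags pairs of IDs within a group that were sampled in the same year as candidates for manual checking.

```python
from itertools import combinations

MISSING = None  # assumed encoding for a missing allele call


def count_mismatches(g1, g2):
    """Count positions where both allele calls are present and differ."""
    return sum(
        1 for a, b in zip(g1, g2)
        if a is not MISSING and b is not MISSING and a != b
    )


def find_matches(genotypes, max_mismatches=3):
    """Return pairs of IDs whose genotypes differ at no more than max_mismatches alleles."""
    return [
        (id1, id2)
        for (id1, g1), (id2, g2) in combinations(sorted(genotypes.items()), 2)
        if count_mismatches(g1, g2) <= max_mismatches
    ]


def group_ids(pairs):
    """Merge matching pairs into groups of IDs (union-find), so A~B and B~C give {A, B, C}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())


def same_year_conflicts(group, sampling_years):
    """Return pairs of IDs in a group that share a sampling year (to be checked by hand)."""
    return [
        (a, b) for a, b in combinations(group, 2)
        if sampling_years.get(a, set()) & sampling_years.get(b, set())
    ]


# Toy data: 8 alleles per fish instead of ~160, invented IDs and years.
genotypes = {
    "T001": [1, 2, 1, 1, 2, 2, 1, 2],
    "C117": [1, 2, 1, 1, 2, 2, 1, 1],     # one mismatch with T001
    "C245": [1, 2, 1, 1, 2, 2, None, 2],  # one missing call
    "T090": [2, 1, 2, 2, 1, 1, 2, 1],     # a different fish
}
sampling_years = {"T001": {2010}, "C117": {2011, 2012}, "C245": {2012}, "T090": {2010}}

pairs = find_matches(genotypes, max_mismatches=1)
groups = group_ids(pairs)
for group in groups:
    print(group, same_year_conflicts(group, sampling_years))
```

One caveat with this design: because grouping is transitive, a loose mismatch tolerance can chain together IDs that do not all match each other directly, which is one more reason to check the merged histories one by one.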
I am getting closer to having a final, semi-clean dataset that will allow me to proceed with pedigree reconstruction.