Category Archives: Uncategorized

Manuscript submitted

I recently submitted a new manuscript on vital rates and life histories in marble trout. Dense paper, lots of models, lots of results. Currently under review. The title and abstract are below.

Title

Within- and among-population variation in vital rates and population dynamics in a variable environment --- Vincenzi, Mangel, Jesensek, Garza, Crivelli.

Abstract

Understanding the causes of within- and among-population differences in vital rates, life histories, and population dynamics is a central topic in ecology. In order to understand how within- and among-population variation emerges, we need long-term studies that include episodic events and contrasting environmental conditions, tag-recapture data for the estimation and characterization of individual and shared variation, and statistical models that can tease apart population, shared, and individual contributions to the observed variation.

We used long-term tag-recapture data and novel statistical and modeling techniques to investigate and estimate within- and among-population differences in vital rates, life histories and population dynamics of marble trout Salmo marmoratus, a narrow endemic freshwater salmonid. Only ten populations of pure marble trout still persist in Western Slovenian headwaters. Marble trout populations are also threatened by floods and landslides, which have already caused the extinction of two populations in recent years.

In particular, we estimated and determined causes of variation and trade-offs within and among populations in growth, survival, and recruitment in response to variation in water temperature, density, sex, early conditions, and extreme events.

In all ten populations, we found that the effects of population density on traits were mostly limited to the early stages of life and that individual growth trajectories were established early in life. We found no clear effects of water temperature on survival and recruitment. Population density was variable over time in all populations, with flash floods and debris flows causing massive mortalities and threatening population persistence. Apart from flood events, variation in population density within streams was largely determined by variation in recruitment, with survival of older fish being relatively constant over time within populations, but substantially different among populations. A fast-to-slow continuum of life histories in marble trout populations seemed to emerge, with slow growth associated with higher survival at the population level, possibly determined by food conditions and age at maturity.

Our work provides unprecedented insight into the causes of variation in vital rates, life histories, and population dynamics in an endemic species that is teetering on the edge of extinction.

Some reflections on my science

Since I had to change the links to my publications due to some obscure passage of pdfs from one folder to another, I had a chance to have a look at all the papers I have published so far. 40 total, 28 as first author, 4 under review (2 as first author, 2 as co-author). Surprisingly (or not, upon further reflection) I barely remember the content of most of my papers and I have little idea of how they were originally conceived, how they developed, what the co-authors contributed, or why I used certain methods and not others. I saw big tables I did not remember I had prepared. I saw a Figure in which fish are one year older than they should be (I also thought I sent the correct Figure during the revision process, apparently not). I read long Introductions and longer Discussions (I write a lot, no doubt). I remember long struggles to get papers accepted even if I currently do not remember the major contentious points.

Just to be clear, I do not have any memory disorder. However, I have published in many different areas, in part because I prefer to zig-zag rather than follow a straight-ish line, in part because I have been supported by soft money throughout my career and I haven't been too rigid in my research/grant choices. I also tried to use novel methods (for me or in general), since I like to challenge myself and expand my research tools. I tend to go very deep and very fast in my research and this - like cramming for a test - is not conducive to long-term retention of information.

This "discovery" made me think about my research trajectory, what kind of tools and skills I have acquired, and whether production of science is like the production of eggs in fish: you give your contribution and you let it find its way.

Akin's Laws of Spacecraft Design

A great read with wider application http://spacecraft.ssl.umd.edu/akins_laws.html

Some of my favorites:

1. Engineering is done with numbers. Analysis without numbers is only an opinion.

4. Your best design efforts will inevitably wind up being useless in the final design. Learn to live with the disappointment.

6. (Mar's Law) Everything is linear if plotted log-log with a fat magic marker.

9. Not having all the information you need is never a satisfactory excuse for not starting the analysis.

16. The previous people who did a similar analysis did not have a direct pipeline to the wisdom of the ages. There is therefore no reason to believe their analysis over yours. There is especially no reason to present their analysis as yours.

17. The fact that an analysis appears in print has no relationship to the likelihood of its being correct.

19. The odds are greatly against you being immensely smarter than everyone else in the field. If your analysis says your terminal velocity is twice the speed of light, you may have invented warp drive, but the chances are a lot better that you've screwed up.

21. (Larrabee's Law) Half of everything you hear in a classroom is crap. Education is figuring out which half is which.

24. It's called a "Work Breakdown Structure" because the Work remaining will grow until you have a Breakdown, unless you enforce some Structure on it.

29. (von Tiesenhausen's Law of Program Management) To get an accurate estimate of final program requirements, multiply the initial time estimates by pi, and slide the decimal point on the cost estimates one place to the right.

32. (Atkin's Law of Demonstrations) When the hardware is working perfectly, the really important visitors don't show up.

34. (Roosevelt's Law of Task Planning) Do what you can, where you are, with what you have.

37. (Henshaw's Law) One key to success in a mission is establishing clear lines of blame.

41. Space is a completely unforgiving environment. If you screw up the engineering, somebody dies (and there's no partial credit because most of the analysis was right...)

Links to R-related stuff

Git and GitHub http://r-pkgs.had.co.nz/git.html

Predicting Baseball Game Attendance with R https://r-dir.com/blog/2015/02/predicting-baseball-game-attendance-with-r.html

Data wrangling process as seen by Hadley Wickham http://blog.ouseful.info/2015/02/11/code-as-magic-and-the-vernacular-of-data-wrangling-verbs/

Bayesian Rugby http://springcoil.github.io/Bayesian_Model.html

Practical Data Science by Sebastian Raschka http://www.slideshare.net/SebastianRaschka/nextgen-talk-022015

10 things statistics taught us about big data analysis http://www.kdnuggets.com/2015/02/10-things-statistics-big-data-analysis.html

Writing Scientific Papers Using Markdown https://danieljhocking.wordpress.com/2014/12/09/writing-scientific-papers-using-markdown/

Data Processing with dplyr & tidyr http://rpubs.com/bradleyboehmke/data_wrangling

Hierarchical Bayesian Survival Analysis for CVD Risk Prediction in Diabetic Individuals http://becs.aalto.fi/en/research/bayes/diabcvd/

RStudio & git/github Demonstration (Video) https://vimeo.com/119403806

Using generalized linear models to compare group means in R http://stackoverflow.com/questions/28614798/using-generalized-linear-models-to-compare-group-means-in-r

R graph catalog http://shinyapps.stat.ubc.ca/r-graph-catalog/

Paper submitted

With Scott Hatch, Thomas Merkling, Sasha Kitaysky

Title: Food supplementation early in life delays viability selection in a long‑lived animal

Abstract

Supplementation of food to wild animals is extensively applied as a conservation tool to increase local production of young. However, the effects of food supplementation on the subsequent recruitment as breeders of long-lived migratory animals into natal populations and their lifetime reproductive success are largely unknown. We examine how experimental food supplementation affects (a) recruitment as breeders of kittiwakes Rissa tridactyla born in a colony on Middleton Island (Alaska) between 1996 and 2006 (n = 1629) that bred in the same colony through 2013 (n = 235); and (b) breeding success of individuals that have completed their life cycle at the colony (n = 56). Birds were raised in nests that were either supplemented with food (Fed) or unsupplemented (Unfed). Fledging success was higher in Fed compared to Unfed nests. After accounting for hatching rank, growth, and oceanic conditions at fledging, Fed fledglings had a lower probability of recruiting as breeders in the Middleton colony than Unfed birds, but the per-nest contribution of breeders was still significantly higher for Fed nests. Lifetime reproductive success of a subset of breeders that completed their life cycle was not affected by the food supplementation during development. Our results cast light on the interaction between intrinsic quality and early food conditions in determining fitness of long-lived animals.

Keywords: Individual quality; supplemental feeding; long-lived animals; viability selection.

Paper submitted to Axios Review

A few months ago, my colleagues and I submitted to Fish and Fisheries a manuscript on the trade-offs between complexity and accuracy in random-effects models of body growth.

The paper was rejected mostly on the basis of lack of fit (i.e. the topic was only marginally interesting for the journal's readership). One Reviewer found the paper interesting and valuable, and recommended the submission of the manuscript to a more general journal, such as Ecology or Oikos. The other Reviewer commented on some unclear technical aspects of the work (the review was quite detailed and the recommendations/suggestions/critiques were valuable, thanks anonymous Reviewer).

I believe the paper should be of interest to a large audience of biologists, ecologists, and computational scientists/statisticians. The main motivation of the paper is quite simple and very general: "We often face trade-offs between model complexity, biological interpretability of parameters, and goodness of fit." Then, with reference to models of growth: "Depending on formulation, parameters of some growth models may or may not be biologically interpretable. For instance, the parameters of the widely used von Bertalanffy growth function (von Bertalanffy 1957) to model growth of fish may be considered either curve fitting parameters with no biological interpretation (i.e. providing phenomenological description of growth) or parameters that describe how anabolic and catabolic processes govern the growth of the organism (i.e. mechanistic description); see Mangel (2006). The classic von Bertalanffy growth function has 3 parameters: asymptotic size, growth coefficient, and theoretical age at which size is equal to 0. In the original mechanistic formulation of von Bertalanffy, asymptotic size results from the relationship between environmental conditions and behavioral traits and the growth coefficient is closely related to metabolic rates and behavioral traits (i.e. the same physiological processes affect both growth and asymptotic size). However, in the literature asymptotic size and growth rate are commonly treated as independent parameters with no connection to physiological functions, thus offering just a phenomenological description of growth."
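To make the three parameters concrete, here is a minimal Python sketch of the classic von Bertalanffy growth function, L(t) = L_inf * (1 - exp(-k * (t - t0))). The function name and the parameter values are hypothetical and chosen only for illustration; they are not estimates from the manuscript.

```python
import math


def von_bertalanffy(age, l_inf, k, t0):
    """Expected length at a given age under the classic von Bertalanffy growth function.

    l_inf: asymptotic size
    k:     growth coefficient
    t0:    theoretical age at which size is equal to 0
    """
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


# Hypothetical parameter values, for illustration only (not estimates from any data set).
L_INF = 300.0  # asymptotic size, mm
K = 0.3        # growth coefficient, 1/year
T0 = -0.5      # theoretical age at size 0, years

# Expected length at ages 0 to 10 years.
lengths = [von_bertalanffy(age, L_INF, K, T0) for age in range(11)]

for age, length in enumerate(lengths):
    print(f"age {age:2d}: {length:6.1f} mm")
```

In this phenomenological form, l_inf and k are simply passed in as independent numbers; a mechanistic formulation would instead derive both from the same underlying physiological quantities.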

However, I understand Editors may not fully grasp the relevance of the paper for their journal. For instance, the manuscript was previously submitted to another journal, but the Editor wrote: "I feel that the work is too specialised, as relatively few researcher work on growth curves". I disagree with the claim that few researchers work on growth curves, since I am sure that lots of scientists use growth models in their work. I might agree, though, that few people work on the development of growth models or on methods for the estimation of growth model parameters (it is also quite hard).

My colleagues and I (my idea, my colleagues agreed) decided to submit the manuscript to Axios Review, a new service that should help authors publish their papers in higher profile journals. This is how it works: "Axios Review solves this problem by putting papers through rigorous external peer review and then referring them to the appropriate journal. When a journal asks the authors to revise and submit, the journal has effectively said that: i) the paper is within their scope, ii) that it is not fatally flawed, and iii) that it could be published in their journal. The Axios Review process effectively eliminates rejections on the grounds of novelty and significantly reduces the chances of rejection on quality. It’s similar to getting a ‘reject, encourage resubmission’ decision from the journal itself; for comparison, about 75% of resubmissions to top tier evolution journals get accepted. Authors submitting to Axios Review can have the reviewers comment on the suitability of their paper for any journal they choose, allowing them to aim for a high profile journal without the effort of formally submitting."

I submitted the manuscript to Axios Review a couple of days ago (target journals following an order that may or may not be the one I chose: Oikos, Ecology, Journal of Theoretical Biology, Ecological Applications). So far, communication with the Editorial staff has been excellent.

I did not upload the manuscript to arXiv or bioRxiv (I don't know where the manuscript will end up, and thus which pre-print policy I should follow). Please send an email if you'd like to read a pre-print.

R documents/packages

Some R documents/packages I came across recently:

Some extra geoms, scales, and themes for ggplot -- https://github.com/jrnold/ggthemes

A document on R Markdown -- http://arxiv.org/pdf/1501.01613v1.pdf

Intro to dplyr -- http://seananderson.ca/2014/09/13/dplyr-intro.html

How to re-create Tufte's weather graphic in R -- https://rpubs.com/bradleyboehmke/weather_graphic

19th Dec 2014 - Update on research

I am currently working on pedigree reconstruction in a marble trout population (Lipovscek) that was affected by two big, destructive flash floods in 2007 and 2009, focusing in particular on the processes that helped the population bounce back to pre-flood density.

After SNP discovery, we successfully genotyped all the samples (~800) collected from 2006 until September 2014. The first step of the analysis is to merge data from samples that have different IDs but in reality refer to the same individual. My colleague, field biologist Alain Crivelli, uses Carlin tags (a metal tag attached with a piece of wire under the dorsal fin of the trout) only on fish longer than 110 mm; if the fish is shorter than 110 mm, a piece of the adipose fin is collected and the tube with the tissue sample is given an ID. Every time a fish receives a Carlin tag, a piece of adipose fin is collected and the tube with the tissue sample gets the same ID as the Carlin tag. Thus, the genotypes from the tissue samples of a fish that was first sampled when shorter than 110 mm and later sampled when longer than 110 mm (and thus given a Carlin tag) should be the same, except for genotyping errors. Another instance of genotype matching occurs when a fish has lost its Carlin tag and is later sampled and retagged with another Carlin tag, since it is not possible to establish the previous ID of the fish (several fish lose their tags between sampling occasions). However, when two genotypes match, it is possible to merge the demographic histories of fish that have different IDs but are in reality the same fish.

Even four different IDs may refer to the same fish. For instance, a fish might have been sampled when aged 0 (i.e. before the first winter, first ID) and shorter than 110 mm, then sampled again the following year as an age-1 fish shorter than 110 mm (second ID), then sampled the following year as an age-2 fish longer than 110 mm and tagged (third ID), then sampled the following year as a fish that had already been tagged (when fish lose the tag, the scar is visible) and re-tagged (fourth ID).

Merging together fish histories and genotypes referring to the same fish is important for multiple reasons:

- if the fish is a potential parent, it avoids failing to assign offspring to a parent because 2 apparent candidates (in reality the same fish) have the same probability of being the true parent;
- it avoids overestimating the production of young;
- it helps estimate growth more accurately (thanks to longer fish histories) as well as survival probabilities (tag loss is a tricky problem, since the fish that lost the tag is "apparently" dead).

Parts of the "matching genotypes" analysis can be automated (e.g., given ~160 alleles per fish and allowing up to 2 or 3 mismatches, it is quite easy to write a script that extracts fish IDs with the same genotype), but then the demographic histories of the fish with the same genotype should be checked one by one. This was necessary as some demographic histories of fish with the same genotype made little sense (two different cohorts or IDs referred to fish sampled in the same year, but with vastly different length and weight), and thus I had to find out whether the mistake occurred in the lab or in the field.
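The automated part can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the script I actually used: genotypes are stored as lists of allele calls, missing calls are coded as None and skipped when counting mismatches, and the fish IDs, the toy genotypes, and the function names are all hypothetical. Matching IDs are grouped transitively with a small union-find, so that three or four IDs referring to the same fish end up in one group.

```python
from itertools import combinations


def count_mismatches(g1, g2):
    """Count positions where both allele calls are present and differ.

    Missing calls (None) are skipped; this is an assumption of the sketch.
    """
    return sum(
        1 for a, b in zip(g1, g2)
        if a is not None and b is not None and a != b
    )


def matching_pairs(genotypes, max_mismatches=2):
    """Return all pairs of IDs whose genotypes differ at no more than max_mismatches positions."""
    pairs = []
    for id1, id2 in combinations(sorted(genotypes), 2):
        if count_mismatches(genotypes[id1], genotypes[id2]) <= max_mismatches:
            pairs.append((id1, id2))
    return pairs


def group_ids(pairs):
    """Merge matched pairs into groups of IDs that presumably refer to the same fish."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())


# Toy data with 8 allele calls per fish (the real data have ~160 per fish).
genotypes = {
    "T001": ["A", "A", "C", "T", "G", "G", "C", "C"],
    "S017": ["A", "A", "C", "T", "G", "G", "C", "T"],   # one mismatch with T001
    "T042": ["A", "A", "C", "T", None, "G", "C", "C"],  # one missing call
    "T100": ["G", "G", "T", "T", "A", "A", "T", "T"],   # a different fish
}

pairs = matching_pairs(genotypes, max_mismatches=1)
groups = group_ids(pairs)
print(pairs)
print(groups)
```

Each group is only a candidate: as described above, the demographic histories of the IDs in a group still have to be checked by hand. With real data it would also be sensible to require a minimum number of positions called in both samples before accepting a match, since skipping missing calls makes poorly genotyped samples match too easily.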

I am getting closer to having a final, semi-clean, dataset allowing me to proceed with pedigree reconstruction.

Updating the blog and present/future research

I decided to start providing regular updates on my Marie Curie research on this blog.

These are my plans for the next 6 months; the content comes mostly from email exchanges with my collaborators.

- Pedigree reconstruction in the marble trout population of Lipovscek. We have recently genotyped the individuals that were sampled in 2013 (June and September) and 2014 (June and September). All samples have been sexed and I am ready to carry out the pedigree reconstruction, with particular focus on the post-2009 generation. In 2009, a huge flood hit Lipovscek and just a handful of marble trout survived. The main goal is to understand the recovery process after the flood: who reproduced, and whether certain traits were related to higher chances of post-flood reproduction. Then, I'd like to test whether my predictions on post-flood recovery (younger age at reproduction due to the relaxation of density-dependent processes after an episode of massive mortality) were right. If they were, we should expect at least some individuals of the 2011 cohort to have reproduced in 2013, that is, one year earlier than under normal conditions (age 3 or 4).

- Phylogeny of marble trout living in Western Slovenia. We are now sequencing additional fish from the populations of Idrijca, Svenica, and Studenc. For Idrijca, the sequencing of additional individuals was motivated by the lack of a sufficient number of SNPs for pedigree reconstruction (96 SNPs recommended, ~80 should be enough). Fish from Svenica and Studenc have never been sequenced. In order to save some money, we tried to sequence fish from only some of the populations that are part of the cluster identified by Fumagalli et al. 2002 (14 microsatellite loci were used). However, since almost none of the SNPs found in the populations of Trebuscica and Idrijca were also variable in Svenica and Studenc, we now suspect that Svenica and Studenc are not as genetically close to Trebuscica and Idrijca as reported by Fumagalli et al. After this sequencing run, we should have all the elements for studying the phylogeny of marble trout, inbreeding, loci under selection etc.

- Writing a technical paper on SNP discovery for marble trout. While we are still discovering and characterizing SNPs for the populations of Idrijca, Svenica and Studenc, I am confident we will discover the population-specific panel of SNPs soon. The only real problem is the population of Huda, for which we found very little to almost non-existent polymorphism. If funding allows, we might try to sequence some fish from Huda using a HiSeq machine (we are currently using a MiSeq with size selection at 500 base pairs).

- Differences in life histories (growth, survival, morphology) in marble trout, including density-dependent patterns, with the main focus on the relationship between growth and survival as possibly mediated by cannibalism. Most of the analyses have been done.

New paper published in PLOS Comp Bio on random-effects models for individual growth

Here below is the abstract and here is the code for all the analysis.

Vincenzi S, Mangel M, Crivelli AJ, Munch S, Skaug HJ (2014) Determining individual variation in growth and its implication for life-history and population processes using the Empirical Bayes method. PLOS Computational Biology 10(9): e1003828. doi:10.1371/journal.pcbi.1003828 [pdf]

Abstract

The differences in demographic and life-history processes between organisms living in the same population have important consequences for ecological and evolutionary dynamics. Modern statistical and computational methods allow the investigation of individual and shared (among homogeneous groups) determinants of the observed variation in growth. We use an Empirical Bayes approach to estimate individual and shared variation in somatic growth using a von Bertalanffy growth model with random effects. To illustrate the power and generality of the method, we consider two populations of marble trout Salmo marmoratus living in Slovenian streams, where individually tagged fish have been sampled for more than 15 years. We use year-of-birth cohort, population density during the first year of life, and individual random effects as potential predictors of the von Bertalanffy growth function’s parameters k (rate of growth) and L_inf (asymptotic size). Our results showed that size ranks were largely maintained throughout marble trout lifetime in both populations. According to the Akaike Information Criterion (AIC), the best models showed different growth patterns for year-of-birth cohorts as well as the existence of substantial individual variation in growth trajectories after accounting for the cohort effect. For both populations, models including density during the first year of life showed that growth tended to decrease with increasing population density early in life. Model validation showed that predictions of individual growth trajectories using the random-effects model were more accurate than predictions based on mean size-at-age of fish.

Simone Vincenzi

Technology/Machine Learning/Mathematical Biology
