
What I am reading (journal papers)

Allan, G. J., S. M. Shuster, S. Woolbright, F. Walker, N. Meneses, A. Keith, and J. K. Bailey. 2012. Perspective: interspecific indirect genetic effects (IIGEs). Linking genetics and genomics to community ecology and ecosystem processes. Trait-Mediated Indirect Interactions - Ecological and Evolutionary Perspectives.

Ancona, S., and H. Drummond. 2013. Life history plasticity of a tropical seabird in response to El Niño anomalies during early life. PLoS One 8:e72665.

Andrade-Domínguez, A., E. Salazar, M. Del Carmen Vargas-Lagunas, R. Kolter, and S. Encarnación. 2013. Eco-evolutionary feedbacks drive species interactions. The ISME journal 1–14.

Andreev, E. M., D. Jdanov, V. M. Shkolnikov, and D. A. Leon. 2011. Long-term trends in the longevity of scientific elites: evidence from the British and the Russian academies of science. Population Studies 65:319–34.

Cameron, T. C., D. O’Sullivan, A. Reynolds, S. B. Piertney, and T. G. Benton. 2013. Eco-evolutionary dynamics in response to selection on life-history. Ecology Letters 16:754–63.

Civelek, M., and A. J. Lusis. 2013. Systems genetics approaches to understand complex traits. Nature Reviews Genetics.

Davidson, W. S. 2013. Understanding salmonid biology from the Atlantic salmon genome. Canadian Journal of Fisheries and Aquatic Sciences 56:548–50.

Gelman, A. 2013. P values and statistical practice. Epidemiology 24:69–72.

Grossman, G. D. 2013. Not all drift feeders are trout: a short review of fitness-based habitat selection models for fishes. Environmental Biology of Fishes.

Hein, C. L., G. Öhlund, and G. Englund. 2014. Fish introductions reveal the temperature dependence of species interactions. Proceedings of the Royal Society B.

Heras, J. De, D. Moya, J. A. Vega, E. Daskalakou, R. Vallejo, N. Grigoriadis, T. Tsitsoni, et al. 2012. Post-Fire Management and Restoration of Southern European Forests. (F. Moreira, M. Arianoutsou, P. Corona, & J. De las Heras, eds.) (Vol. 24). Springer Netherlands, Dordrecht.

Jones, O. R., A. Scheuerlein, R. Salguero-Gómez, C. G. Camarda, R. Schaible, B. B. Casper, J. P. Dahlgren, et al. 2013. Diversity of ageing across the tree of life. Nature.

Kass, R. E. 2011. Statistical Inference: The Big Picture. Statistical Science 26:1–9.

Kopp, M., and S. Matuszewski. 2013. Rapid evolution of quantitative traits: theoretical perspectives. Evolutionary Applications.

Mackay, T. F. C. 2013. Epistasis and quantitative traits: using model organisms to study gene-gene interactions. Nature Reviews Genetics.

Moreno-Estrada, A., S. Gravel, F. Zakharia, J. L. McCauley, J. K. Byrnes, C. R. Gignoux, P. A. Ortiz-Tello, et al. 2013. Reconstructing the Population Genetic History of the Caribbean. PLoS Genetics 9:e1003925.

Penney, Z. L., and C. M. Moffitt. 2013. Histological assessment of organs in sexually mature and post-spawning steelhead trout and insights into iteroparity. Reviews in Fish Biology and Fisheries.

Vincent, B., M. Dionne, M. P. Kent, S. Lien, and L. Bernatchez. 2013. Landscape Genomics in Atlantic Salmon (Salmo salar): Searching for Gene-Environment Interactions Driving Local Adaptation. Evolution 67:3469–3487.

Yvert, G. 2013. “Particle genetics”: treating every cell as unique. Trends in Genetics 1–8.

Math, prediction and the decreasing value of old-fashioned experts

This article was published in August in Business Insider, a website which - I am sorry to disappoint the snobbish folks out there - I quite enjoy reading from time to time. Anyway, the article popped up on Twitter today and I found a few interesting things in it.

For instance:

So there was no small amount of irony surrounding the recent high profile negotiations between ESPN and The New York Times over the services of Nate Silver. The stakes were enormous, as Mr. Silver accounted for as much as 20% of the Grey Lady’s web traffic.

I did not know that Nate Silver's blog accounted for such a large fraction of the NYT's web traffic, which is quite surprising. I wonder how many blogs are cumulatively responsible for the remaining 80% of the traffic.

Persad compares her work to how algorithmic trading has replaced intuition on Wall Street  and says, “The program—not a human—does the pattern recognition, the bias and error adjustments of models, the tracking, and makes the “judgment call” based upon the most dynamically optimal combination of models.”

I certainly agree, but do not forget that whoever sets up the model(s) is an expert herself, just not the hand-waving, gut-feeling, neck-scratching kind of expert. She is an expert in developing models, collecting data, asking questions, testing results, and improving predictions. Now every other guy is taking shots (figuratively) at the stereotypical pundit, and I agree that political and sports pundits are one boring, inaccurate, talkative group of hand wavers, but at least we can make fun of someone, right?

An interesting read overall.

Update on research

Quite busy before Christmas. A couple of days ago I submitted a manuscript on the joint effects of a climate trend and an increased probability of occurrence of extreme events on the risk of population extinction and on other demographic and genetic traits of a population of moderate size. Why moderate size? Large size is kinda boring, honestly; what happens in small populations is both more relevant (given the intrinsically higher risk of extinction of smallish populations) and more interesting (since much of the ecological and genetic research has focused on large or very large populations, if not populations of infinite size). I submitted the manuscript to the Journal of the Royal Society Interface, since a) it is a good journal, b) the manuscript should be a good fit (or the other way around), and c) it allows long papers (8000 words is the maximum and I was scratching the ceiling; well, actually I had to reduce the paper by one third, so a lot of details about the model went straight into the Electronic Supplementary Material).

Below is the abstract/summary of the paper; all analyses are reported here:
http://dx.doi.org/10.6084/m9.figshare.706347. Please ask me for a preprint if interested (due to the purposefully unclear policies of many journals, I am quite reluctant to post preprints online).

One of the most dramatic consequences of climate change will be the intensification and increased frequency of extreme events. I used numerical simulations to understand and predict the consequences of directional trend and increased variability of a climate variable (e.g. temperature), increased probability of occurrence of point extreme events (e.g. floods), selection pressure, and amplitude of mutations on a quantitative trait determining individual fitness, as well as the consequent effects on the population and genetic dynamics of a population of moderate size (i.e. 500 individuals). The interaction among climate trend, variability and probability of point extremes had a minor effect on risk of extinction, time to extinction and distribution of the trait after accounting for their independent effects. The survival chances of a population strongly and linearly decreased with increasing selection pressure, as well as with increasing directional climate trend and climate variability. Mutation amplitude had no effect on extinction risk, time to extinction or the shift of the mean phenotype. Directional trend and strength of selection largely determined the shift of the mean phenotype in the population. The extinction or persistence of the populations in an “extinction window” of 10 years was well predicted by a simple model including mean population size and mean genetic variance over a 10‑year time frame preceding the “extinction window”, along with probability of occurrence of point extremes and strength of selection.
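To make the three climate ingredients in the summary concrete (directional trend, year-to-year variability, point extremes), here is a minimal sketch of how such a yearly scenario could be generated. It is not the code used for the manuscript (that is on figshare); the function name `simulate_climate`, the Normal noise and the independent yearly draw for extremes are all assumptions made for illustration.

```python
import random

def simulate_climate(years, mean0, trend, sd, p_extreme, seed=None):
    """Return (temperatures, extremes) for a hypothetical climate scenario.

    Assumed form: temperatures[t] = mean0 + trend * t + Normal(0, sd).
    extremes[t] is True when a point extreme (e.g. a flood) hits in year t,
    drawn independently each year with probability p_extreme.
    """
    rng = random.Random(seed)
    temperatures = []
    extremes = []
    for t in range(years):
        temperatures.append(mean0 + trend * t + rng.gauss(0.0, sd))
        extremes.append(rng.random() < p_extreme)
    return temperatures, extremes

# Example scenario with made-up values: 0.02 degrees/year trend, SD of 1 degree,
# and a 5% yearly chance of a flood.
temps, floods = simulate_climate(years=100, mean0=15.0, trend=0.02,
                                 sd=1.0, p_extreme=0.05, seed=1)
print(len(temps), sum(floods))
```

Each scenario is then just one combination of `trend`, `sd` and `p_extreme`, which is what makes the number of combinations (and the reporting problem discussed further down) grow so quickly.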

There were two challenging points in this work, and both of them come up pretty often in my research.

Unfortunately, I still have not found an optimal solution for either of them. The first is how to summarize the results of large simulations. In this specific work I was interested in multiple parameters (basically, in how variation in these parameters affected simulation results): probability of occurrence of point extreme events (i.e. floods), climate trend (increase/decrease in mean summer temperature over 10 yrs), climate variability (year-to-year change in mean summer temperature), amplitude and rate of genetic mutations, and strength of selection for the polygenic trait determining relative fitness. Others could have been studied/varied as well, but I fixed them (e.g. number of genes coding for the trait under selection, number of alleles for each gene at the start of the simulation, heritability of the trait at the start of the simulation, population size, age at sexual maturity, mean/variance of offspring produced by a mating pair etc.).

For each combination of parameters, I ran 50 replicates and then, after some graphical reporting of results (that is, figures), I carried out statistical analyses using simulation results as pseudo-empirical data after standardization of predictors (for comparison purposes). Could I have analyzed the results differently? With a global sensitivity analysis, maybe (a topic/technique I was interested in some time ago, before getting engulfed by other things)? Probably. Anyway, from the statistical analyses on the pseudo-empirical data I got the insights I was looking for, i.e. the relative importance of the predictors described above (climate trend, variability etc.) in determining population extinction/persistence, the shift of the phenotypic trait under selection, and changes in additive genetic variance for the trait under selection.

I am working right now on including phenotypic plasticity in the model of population and genetic dynamics and checking whether phenotypic plasticity (which might also be cost-free, that is, modeled without taking into account the cost of maintaining the "machinery" allowing plasticity in the trait) increases survival probability (and, scaling up, probability of population persistence) in a highly stochastic world.
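Going back to the analysis of simulation output as pseudo-empirical data: the reason for standardizing predictors is that coefficients only become comparable once every predictor is on the same scale. Here is a small standard-library sketch of the idea, assuming a logistic regression of extinction on two predictors. The data, the true effects and the helper names (`standardize`, `fit_logistic`) are invented for illustration, and the plain gradient-ascent fit is a stand-in for whatever GLM routine one would normally use.

```python
import math
import random

def standardize(column):
    """Z-score one predictor: subtract its mean, divide by its sample SD."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (len(column) - 1))
    return [(v - mean) / sd for v in column]

def fit_logistic(predictors, y, steps=500, lr=1.0):
    """Fit a logistic regression by gradient ascent on the mean log-likelihood.

    predictors is a list of columns; returns [intercept, b1, b2, ...].
    """
    n = len(y)
    beta = [0.0] * (len(predictors) + 1)
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for i in range(n):
            eta = beta[0] + sum(b * col[i] for b, col in zip(beta[1:], predictors))
            err = y[i] - 1.0 / (1.0 + math.exp(-eta))
            grad[0] += err
            for j, col in enumerate(predictors):
                grad[j + 1] += err * col[i]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Hypothetical "simulation output": one row per replicate.
# Climate trend varies on a 0-0.1 scale, selection strength on a 0-10 scale.
rng = random.Random(42)
n = 600
trend = [rng.uniform(0.0, 0.1) for _ in range(n)]
selection = [rng.uniform(0.0, 10.0) for _ in range(n)]
extinct = []
for tr, se in zip(trend, selection):
    eta = -4.0 + 20.0 * tr + 0.6 * se  # made-up true effects on the raw scale
    extinct.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

z_trend = standardize(trend)
z_selection = standardize(selection)
intercept, b_trend, b_selection = fit_logistic([z_trend, z_selection], extinct)
print("standardized effects: trend", round(b_trend, 2),
      "selection", round(b_selection, 2))
```

On the raw scale the made-up coefficients (20 for trend, 0.6 for selection) would suggest that trend matters more, simply because trend is measured in small units. After standardization the coefficients are per standard deviation of each predictor, and selection comes out as the stronger effect in this toy dataset.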

The second struggle was related to the trade-off between conciseness and clarity in the preparation of the manuscript. It is a complex paper, it is a dense paper, and to have any impact at all it has to be carefully introduced, explained and discussed. The manuscript is thus quite long, the code is rich, and the number of figures is near the maximum normally allowed. In short, the paper is a monster that, if published in its present form, would be around 16-18 printed pages, with a 30-page supplementary material and some thousand lines of code available online. And I had to move one third of it online. Go figure.

Getting back to the scientific work itself, I was very intrigued by this result:

The extinction or persistence of the populations in an “extinction window” of 10 years was well predicted by a simple model including mean population size and mean genetic variance over a 10‑year time frame preceding the “extinction window”, along with probability of occurrence of point extremes and strength of selection.

It is very clear that smaller populations, all things being equal, are at higher risk of extinction than larger populations, especially (I write especially because demographic stochasticity per se increases the risk of extinction of small populations) when extreme events (or a variable environment more in general) can rapidly reduce population size. However, the surprising part was that a simple logistic model had the following predictive abilities (abridged from the manuscript):

Prediction of population extinction

Population size in the “sampling window” was the most important predictor of extinction in the “extinction window”. Higher values of additive genetic variance had a positive effect on population probability of persistence, although the importance of population size was substantially greater than that of additive genetic variance. Both stronger selection and higher probability of occurrence of point extremes increased the risk of population extinction, although probability of occurrence of point extremes had a minor role.

The model predicted an 8.0% false positive rate (model predicted extinction, but the populations persisted) and a 7.3% false negative rate (model predicted persistence, but the populations went extinct) on the calibration dataset.

The validation on an independent dataset not used for the calibration of the logistic model gave similar results. As I said, it is an intriguing result and I am probably going to delve further into it in a follow-up work.
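For anyone wanting to compute the same kind of error rates from a logistic model's predicted probabilities, here is a minimal sketch. It assumes a 0.5 probability cutoff and the conventional definitions (false positives over all populations that persisted, false negatives over all populations that went extinct); the manuscript may define the rates differently, and the probabilities below are made up.

```python
def error_rates(p_extinct, went_extinct, cutoff=0.5):
    """Return (false_positive_rate, false_negative_rate).

    p_extinct: predicted probabilities of extinction, one per population.
    went_extinct: observed outcomes (True if the population went extinct).
    A population is predicted extinct when its probability is >= cutoff.
    """
    tp = fp = tn = fn = 0
    for p, observed in zip(p_extinct, went_extinct):
        predicted = p >= cutoff
        if predicted and observed:
            tp += 1
        elif predicted and not observed:
            fp += 1  # predicted extinction, but the population persisted
        elif not predicted and observed:
            fn += 1  # predicted persistence, but the population went extinct
        else:
            tn += 1
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    fnr = fn / (fn + tp) if (fn + tp) else float("nan")
    return fpr, fnr

# Made-up predictions for eight populations.
probs = [0.9, 0.8, 0.3, 0.6, 0.2, 0.1, 0.7, 0.4]
outcome = [True, True, True, False, False, False, True, False]
fpr, fnr = error_rates(probs, outcome)
print(fpr, fnr)  # 0.25 0.25
```

Running the same function on a dataset that was not used to fit the model is the validation step mentioned above.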

Trees work pretty well

From Kaggle newsletter:

Kaggle has a new #1 ranked data scientist. Congratulations José Guerrero! He's worked in the health sector in Spain for more than 25 years, and is currently chomping at big databases at the region's main hospital. He has a BSc & MSc in Mathematics, Statistic and Operations Research and did his postgraduate work in Scientific Programming. Perhaps that's helped out on Kaggle ... José says, “My first option with a dataset is almost always tree-based (boosted or bagged). Trees are robust, manage unknown data well, and have ability for interaction modeling.” José mainly uses R and more recently Python with scikit-learn. And what is his view getting there at the top? “I learn in every challenge, and the community interaction is really amazing.”

When I have a big database or I want to know if the variables I have are able to explain variation in the response variable, one of the things I might do first is to apply a random forest. Brief data exploration and then I move to the actual modeling. See here.
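A quick sketch of what that first pass can look like, here in Python with scikit-learn (the same could be done in R). The data are simulated and the variable names are placeholders, so this only illustrates the workflow: fit a forest, look at the out-of-bag R² to see whether the variables explain anything at all, then look at which variables carry the signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Simulated stand-in for a real dataset: three candidate variables,
# of which only the first two actually affect the response.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=n)

forest = RandomForestRegressor(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

# Out-of-bag R^2: how much variation in the response the variables explain,
# estimated on observations each tree did not see during fitting.
print("OOB R^2:", round(forest.oob_score_, 2))

# Impurity-based importances (they sum to 1 across variables).
importances = forest.feature_importances_
for name, value in zip(["x0", "x1", "x2"], importances):
    print(name, round(value, 2))
```

A low out-of-bag R² at this stage is a cheap warning that the variables at hand may not explain much, whatever model comes next. Impurity-based importances are only a rough guide, so I would treat them as a starting point for the actual modeling and not as a result.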