Grand Challenges in Phylogenomics
Venue: Edificio Povo 2, via Sommarive nr. 9, Povo (Tn) - Room B105
At 2:00 p.m.
- Tandy Warnow - Institute for Genomic Biology, University of Illinois at Urbana-Champaign, USA
Estimating the Tree of Life will likely involve a two-step procedure: first, trees are estimated on many genes; then, the gene trees are combined into a tree on all the taxa. However, the true gene trees may not agree with the species tree due to biological processes such as deep coalescence, gene duplication and loss, and horizontal gene transfer. Statistically consistent methods based on the multi-species coalescent model have been developed to estimate species trees in the presence of incomplete lineage sorting (ILS); however, the accuracy of these methods relative to the usual "concatenation" approach is a matter of substantial debate within the research community.
I will present results showing that coalescent-based estimation methods are impacted by gene tree estimation error, so that they can be less accurate than concatenation in many cases. I will also present two new methods (ASTRAL and statistical binning) for estimating species trees in the presence of gene tree conflict due to ILS. Statistical binning and weighted statistical binning are used to improve gene tree estimation, while ASTRAL is a provably statistically consistent coalescent-based method that can construct species trees on 1000 species. Key to these methods is addressing gene tree estimation error more effectively. Finally, I will present theoretical results investigating whether statistically consistent, accurate species tree estimation is possible when gene trees have estimation error.