Phylo2Vec: A Vector Representation for Binary Trees.

Penn MJ., Scheidwasser N., Khurana MP., Duchêne DA., Donnelly CA., Bhatt S.

Binary phylogenetic trees inferred from biological data are central to understanding the shared history among evolutionary units. However, inferring the placement of latent nodes in a tree is computationally expensive. State-of-the-art methods rely on carefully designed heuristics for tree search, using different data structures for easy manipulation (e.g., classes in object-oriented programming languages) and readable representation of trees (e.g., Newick-format strings). Here, we present Phylo2Vec, a parsimonious encoding for phylogenetic trees that serves as a unified approach for both manipulating and representing phylogenetic trees. Phylo2Vec maps any binary tree with n leaves to a unique integer vector of length n-1. The advantages of Phylo2Vec are 4-fold: (i) fast tree sampling, (ii) compressed tree representation compared to a Newick string, (iii) quick and unambiguous verification if 2 binary trees are identical topologically, and (iv) systematic ability to traverse tree space in very large or small jumps. As a proof of concept, we use Phylo2Vec for ML inference on 5 real-world datasets and show that a simple hill-climbing-based optimization scheme can efficiently traverse the vastness of tree space from a random to an optimal tree.

More information Original publication

DOI

10.1093/sysbio/syae030

Type

Journal article

Publication Date

2025-04-01T00:00:00+00:00

Volume

Pages

250 - 266

Total pages

Keywords

Binary trees, optimization, phylogenetics, representation, Phylogeny, Classification, Software, Algorithms

Cookies on this website