Friday, March 06, 2009

Photosynth for trees - supertrees revisited

It's Friday, so time for some random, half-baked ideas. Imagine that we have a database of evolutionary trees, and these overlap for a set of taxa that we are interested in. How do we summarise these trees? One approach is to make a supertree. It would be useful to display the subtrees that went into making this supertree, if only to give an idea of how much they agree with the supertree. How to do this?

One idea I've been toying with is inspired by Photosynth, from Microsoft labs (it only runs on Windows, sigh). Photosynth takes a series of photographs shot from different angles and stitches them together into a 3D model of the object being photographed:


One thing I like about Photosynth is that you can see the original pictures, so when you move around the view you get a sense of how they have contributed to the overall view. This is easier to see than explain:



Now, imagine if we did this with trees. We could create a supertree as a summary of the individual trees, then have the original trees layered on top. Perhaps we could do this in 3D, so that each individual tree is in a plane that is tilted with respect to the supertree in proportion to how much it disagrees with the supertree:

I think this could be a fun way to explore a set of trees, and it would let one quickly grasp how well the source trees agree with the supertree. Note that I'm not (necessarily) arguing that the supertree represents the true phylogeny. Think of it as a convenient way to summarise the individual trees.
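One way to turn "tilted in proportion to how much it disagrees" into a number is sketched below. This is a minimal sketch, not a worked-out method: it assumes rooted trees written as nested Python tuples of leaf names, measures disagreement as a normalised Robinson-Foulds (symmetric difference) distance computed only on the taxa a source tree shares with the supertree, and maps that distance linearly onto a tilt angle. The tuple format, the function names and the 90-degree ceiling are all hypothetical choices for illustration.

```python
# Sketch: tilt each source tree's plane in proportion to its disagreement with
# the supertree. Trees are rooted and written as nested tuples of leaf names
# (a hypothetical format chosen for brevity).

def leaves(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return every clade of the tree as a frozenset of leaf labels."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def restricted_clusters(tree, taxa):
    """Clusters of the tree pruned to `taxa`, minus singletons and the full set."""
    out = set()
    for clade in clusters(tree):
        pruned = clade & taxa
        if 1 < len(pruned) < len(taxa):
            out.add(pruned)
    return out


def rf_distance(source, supertree):
    """Normalised Robinson-Foulds distance on shared taxa (0 = same clades, 1 = none in common)."""
    shared = frozenset(leaves(source) & leaves(supertree))
    a = restricted_clusters(source, shared)
    b = restricted_clusters(supertree, shared)
    if not a and not b:
        return 0.0  # too few shared taxa to disagree about anything
    return len(a ^ b) / (len(a) + len(b))


def tilt_angle(source, supertree, max_degrees=90.0):
    """Tilt in degrees; max_degrees is an arbitrary illustrative ceiling."""
    return max_degrees * rf_distance(source, supertree)


supertree = ((("A", "B"), "C"), ("D", "E"))
print(tilt_angle((("A", "B"), "C"), supertree))         # 0.0: full agreement
print(tilt_angle((("A", "B"), ("C", "D")), supertree))  # 45.0: partial conflict
print(tilt_angle((("A", "C"), "B"), supertree))         # 90.0: complete conflict
```

Restricting to shared taxa matters for supertrees, because most source trees cover only part of the taxon set; without it, missing taxa would be counted as disagreement.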

Part of what attracts me to this approach is that I think most, if not all, 3D phylogeny viewers (such as Paloverde and the Wellcome tree of life) don't make any real use of 3D, beyond the rather gimmicky (and, I find, ultimately confusing) ability to fly around a 2D tree. Is there a better way to exploit the possibilities of 3D?

5 comments:

tjv said...

I like this idea, but I'd also like to defend Paloverde. The clever use of 3D in that application is that, by viewing the tree from off-center and at a glancing angle, one can effectively foreground tips of interest, and the magnification of the rest of the tree becomes proportional to how close it is to the focal tips.

Anonymous said...

Hi Roderic, that sounds cool! I would like to see that running automatically in free phylogeny software tools!

Roderic Page said...

Todd, I guess I'd argue that the effect you describe doesn't require 3D; one could distort a 2D tree and get much the same effect (see, for example, Dendroscope). So my question is: what does 3D bring that can't be done as well in 2D?
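The kind of 2D distortion meant here can be illustrated with a one-dimensional fisheye applied to the vertical positions of the leaves, magnifying the rows near a focal tip and compressing the rest. The sketch below uses the Sarkar-Brown fisheye function g(x) = (d + 1)x / (dx + 1); this is a generic, swapped-in technique, not necessarily what Dendroscope or Paloverde actually implement, and the function names and the distortion factor `d` are illustrative assumptions.

```python
def fisheye(x, d):
    """Sarkar-Brown fisheye: maps normalised distance x in [0, 1] to [0, 1].

    d = 0 leaves positions unchanged; larger d magnifies the region near x = 0.
    """
    return (d + 1) * x / (d * x + 1)


def distorted_leaf_positions(n_leaves, focus, d=3.0):
    """Vertical positions of n evenly spaced leaves, magnified around leaf `focus`."""
    last = n_leaves - 1
    positions = []
    for y in range(n_leaves):
        if y == focus:
            positions.append(float(y))
        elif y > focus:
            span = last - focus  # distance from the focus to the bottom edge
            positions.append(focus + span * fisheye((y - focus) / span, d))
        else:
            span = focus  # distance from the focus to the top edge (row 0)
            positions.append(focus - span * fisheye((focus - y) / span, d))
    return positions


# Approximately [0.0, 0.4, 2.0, 3.6, 4.0]: the neighbours of leaf 2 move away
# from it (gaps widen), while the outermost leaves crowd together.
print(distorted_leaf_positions(5, focus=2))
```

Because the transform fixes both edges of the drawing, the whole tree stays on screen while the focal region gets more room, which is the "focus plus context" effect described above, obtained without a third dimension.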

Greg said...

One of the fundamental principles behind visualizing phylogenetic trees is that there should be some correspondence between the "distance" of nodes and their evolutionary divergence. It should be noted that even this basic requirement is often NOT met, for example in the traditional square-ish cladograms where the length of vertical lines has no meaning, or in diagonal cladograms where all lengths are meaningless. But beyond this constraint, the layout of nodes is scientifically entirely arbitrary, and thus controlled merely by the desire to have a usable and aesthetic visualization.
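The split described here, with one axis that carries evolutionary meaning and another that is free for layout, can be made concrete with a minimal rectangular phylogram layout. This is a sketch, assuming a rooted tree with branch lengths and unique node names; the `Node` class and `rectangular_layout` function are hypothetical. The x coordinate is the root-to-node path length (divergence), while y is simply the order in which leaves are drawn, with each internal node centred on its outermost children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent
    children: list = field(default_factory=list)


def rectangular_layout(root):
    """Map node name -> (x, y): x = path length from the root, y = leaf order."""
    coords = {}
    next_row = 0

    def place(node, parent_x):
        nonlocal next_row
        x = parent_x + node.length  # meaningful: accumulated divergence
        if not node.children:
            y = float(next_row)  # arbitrary: just the next free row
            next_row += 1
        else:
            ys = [place(child, x) for child in node.children]
            y = (ys[0] + ys[-1]) / 2  # centre on first and last child
        coords[node.name] = (x, y)
        return y

    place(root, 0.0)
    return coords


tree = Node("root", 0.0, [
    Node("A", 1.0),
    Node("n1", 0.5, [Node("B", 0.25), Node("C", 0.75)]),
])
for name, (x, y) in rectangular_layout(tree).items():
    print(name, x, y)
```

Here leaf C gets x = 0.5 + 0.75 = 1.25, its root-to-tip divergence, while its y of 2.0 only reflects that it is the third leaf drawn; any other rule for y would be equally valid scientifically.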

(Note the keyword USABLE here -- that in itself can make the fundamental difference between identifying an interesting scientific phenomenon in a dataset, or missing it entirely. See Ben Shneiderman's interesting work on information visualization, such as the Hierarchical Clustering Explorer, for examples of highly usable scientific tools.)

The extra dimension available in 3D layouts potentially affords you more "wiggle room" to enhance the usability or aesthetic of a tree, but using the extra space in a novel way often just leads to confusion.

TJV makes a good point about the practicality of the "3D proportional effect," making it easier to look at nodes of interest while maintaining the context of the tree. This is why hyperbolic layouts are attractive as well (although the hyperbolic layouts lose the sense of relative distance by distorting branch lengths).

I have also long wondered what might be possible to get out of the third dimension, but that question could be just as appropriately asked of the second: why do we traditionally lay out the taxon labels in an evenly-spaced, vertical line? They could just as easily be spaced according to relative biodiversity (of taxonomic groups) or any other biological variable.
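Spacing taxon labels by relative biodiversity amounts to giving each label a vertical band whose height is proportional to some value, and placing the label in the middle of its band. A minimal sketch, assuming the per-taxon values (for example species counts) are already known and supplied in drawing order; the function name and the numbers in the example are made up for illustration.

```python
def label_positions(values, height=1.0):
    """Place each label at the centre of a band proportional to its value.

    `values` maps taxon label -> a positive number (e.g. species richness),
    in the order the labels are drawn; `height` is the total drawing height.
    """
    total = sum(values.values())
    positions = {}
    cursor = 0.0
    for label, value in values.items():
        band = height * value / total
        positions[label] = cursor + band / 2  # centre of this label's band
        cursor += band
    return positions


# Hypothetical values: taxon A is six times as diverse as taxon C.
print(label_positions({"A": 6, "B": 3, "C": 1}, height=10.0))
# {'A': 3.0, 'B': 7.5, 'C': 9.5}
```

With equal values this reduces to the usual evenly spaced column of labels, so the conventional layout is just one special case.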

But I'm getting pretty seriously off-topic here. To bring it back to the Phylogenetic Photosynth idea, I think any useful implementation must stem from the fact that a single tree is a complete set of phylogenetic hypotheses (the exact number of hypotheses being made depends on how you approach the problem). Here's my thought on how best to provide a "support summary" of a supertree:

1) Have two side-by-side displays, one showing the supertree itself, and the other somehow summarizing the location of each component tree in "tree space." Two ways of doing this would be to plot them along the first two PCA axes, or to create a "tree of trees" dendrogram, calculated from a distance matrix of the component trees.

2) For a dynamic analysis of the tree's support, when the user hovers over one of the nodes in the tree (which represents a single evolutionary hypothesis that says "All the taxa beneath this node are related in this way"), you could highlight the trees in the "overview" display according to the degree to which they agree with that specific hypothesis.

3) For a more static view of support, one could use the dendrogram of trees in the "overview" window to create a coloring scheme for the component trees, where more similar trees have more similar colors. If you then color the nodes/branches/labels of the supertree according to the varying levels of support, this could give a quick look at which trees the support for a given clade is coming from.
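Items 1 and 2 above both come down to comparing clades across trees. The following is a minimal sketch under stated assumptions: source trees are rooted and written as nested tuples of leaf names, a clade is only compared on the taxa a source tree actually contains, and a normalised Robinson-Foulds distance stands in for whatever tree-space distance one would really use. All names are hypothetical. `distance_matrix` produces the input a PCA/PCoA plot or a "tree of trees" dendrogram would need; `clade_status` gives the per-tree state one might use to highlight trees when the user hovers over a supertree node.

```python
from itertools import combinations

# Trees are rooted and written as nested tuples of leaf names (hypothetical format).


def leaves(tree):
    """Return the set of leaf labels in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Return every clade of the tree as a frozenset of leaf labels."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def restrict(clades, taxa):
    """Prune clades to `taxa`, dropping singletons and the full taxon set."""
    return {c & taxa for c in clades if 1 < len(c & taxa) < len(taxa)}


def clade_status(clade, source):
    """How one source tree relates to one supertree clade (for hover highlighting)."""
    taxa = frozenset(leaves(source))
    x = frozenset(clade) & taxa
    if not 1 < len(x) < len(taxa):
        return "uninformative"  # the source tree cannot test this clade
    source_clades = restrict(clusters(source), taxa)
    if x in source_clades:
        return "supports"
    if any(c & x and not (c <= x or x <= c) for c in source_clades):
        return "conflicts"  # some source clade overlaps x without nesting
    return "neutral"  # e.g. x falls inside an unresolved polytomy


def rf(t1, t2):
    """Normalised Robinson-Foulds distance on the taxa two trees share."""
    taxa = frozenset(leaves(t1) & leaves(t2))
    a, b = restrict(clusters(t1), taxa), restrict(clusters(t2), taxa)
    return len(a ^ b) / (len(a) + len(b)) if a or b else 0.0


def distance_matrix(sources):
    """Pairwise distances between named source trees."""
    d = {(name, name): 0.0 for name in sources}
    for p, q in combinations(sources, 2):
        d[p, q] = d[q, p] = rf(sources[p], sources[q])
    return d


sources = {
    "t1": (("A", "B"), ("C", "D")),
    "t2": (("A", "C"), ("B", "D")),
    "t3": ((("A", "B"), "C"), "E"),
}
# {'t1': 'supports', 't2': 'conflicts', 't3': 'supports'}
print({name: clade_status({"A", "B"}, t) for name, t in sources.items()})
matrix = distance_matrix(sources)
print(matrix["t1", "t2"], matrix["t1", "t3"])  # 1.0 0.0
```

One caveat of this choice: two trees that share fewer than three taxa get a distance of 0.0 simply because there is nothing to compare, so a real "tree space" plot would need to treat low overlap separately.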

Obviously, the devil's in the details there, but I think it could be an interesting way of looking at large trees or large databases of trees.

Anonymous said...

I have no idea how I got linked here from my lectures blog when I was talking about coding, but I think this is a great idea.

Imagine looking at the trees and thinking 'This is just a big mess of lines', which, to be honest, it is. And looking at lines gets boring fast. Your idea would allow much greater detail to be put into these trees and make it easier for the person viewing them to understand what they are looking at, as well as giving much better functionality and quicker searches through the tree when gathering data.

Colour, angled tree lines, pictures and tags could be used to give an even better understanding and to show connectivity between species (of course, most of this could also be done in a 2D picture).

Granted, the person above me went into much better detail than I did, but input is input. If possible, when highlighting the tree, a full 3D version of each species could be brought up beside it. Again, this idea is only meant to enhance the visuals, not the actual practicality.

Toggling on and off chains that are completely or partially unrelated would be useful, as would the ability to move/fold/bend trees to your liking to compare them with other tree lines.

There are so many ideas floating around in my head that could be incorporated into a 3D structure. Most of them just focus on making it much easier to understand. I guess it would be like looking at a table of numbers, then looking at the same numbers after they have been put into a graph.

I hope something there inspired an idea and I didn't just waste 5 minutes of your life.