Why Mesquite was made

We give two answers, the practical and the poetic, and a comment on the relationship between MacClade and Mesquite.

The practical answer

Mesquite represents a distinct approach to computing for evolutionary biology. In recent years there has been a proliferation of computer programs for phylogenetic analysis, each designed for some particular analysis (e.g., see Felsenstein's compilation of programs). As these often involve unique file formats and user interfaces, it is difficult for users to move from one to another. Users tend to become constrained to a few familiar analyses, since any given program can't do everything, and each program has costs in learning. As a programmer one would like to respond by making a program that does everything, but there are now too many analyses available or conceivable for a single programmer or programming team to keep up. We have seen the impact of these constraints with MacClade: some users perform particular analyses in MacClade not because they are the most appropriate analyses for their questions, but simply because they are available in a familiar program. We would like to add more flexibility to MacClade, but in a monolithic program this can be difficult to do, and even if easy, there are more proposed methods than we could maintain in MacClade.

Hence, our goal was to design a general system for phylogenetic computing to which different programmers could contribute modules. Bringing different analytical tools into a common system increases the number of possible analyses more than additively, because each new tool can be combined with those already present. In the end, the system has grown beyond being strictly phylogenetic, including capabilities for calculations involving characteristics of many organisms (e.g., population genetics and morphometrics) that need not involve phylogeny.

A second goal of Mesquite is to provide a graphical user interface that will operate, more or less without modification, under different operating systems.

Modularity and Flexibility

"Modularity" in computer progamming might follow different models. It could follow the "Mr. Potato Head" model, in which there is a central program to which different peripheral calculations can be attached in specific places. This allows useful, but limited, flexibility. Or, modularity could follow the "Lego" model, in which building blocks are attached to other building blocks, and so on indefinitely. This allows nearly unlimited flexibility. Mesquite's modularity is somewhat of a hybrid between these: there is a (small) central starting point to which modules attach, but from there modules can be attached to modules attached to modules, indefinitely, leading to considerable flexibility in the analyses that can be constructed.

To give an idea of the flexibility, consider the calculation of the parsimony score of a tree, the treelength. A treelength-calculating module takes a tree as input and responds by returning its length. Such a module belongs to the general class of modules that return a number when passed a tree. Other modules belonging to this class ("NumberForTree") could return the likelihood of the tree, or a measure of the asymmetry of the tree's branching, or a measure of the tree's discordance with a containing species tree. A Tree Legend module can be written (and has been) that displays the treelength in a legend in the tree window, but the Legend module is designed so that the user can choose to display any other number for the tree, such as its likelihood, asymmetry, or discordance. If a programmer creates a new module to calculate a number for a tree, such as the longest branch-length path from root to tip, and a user installs the module, then the longest path measurement would automatically become another option for the tree legend.
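The NumberForTree idea can be sketched in a few lines of Java. What follows is a minimal illustration under stated assumptions, not Mesquite's actual API: the `Node` class, the `NumberForTree` interface as written here, the `TotalLength` module, and the `legend` method are hypothetical names chosen for this example. It shows a legend that accepts any module returning a number for a tree, so that a newly written module becomes usable without any change to the legend.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these types are assumptions, not Mesquite classes.
final class LegendSketch {

    /** Toy tree node: the length of the branch leading to it, plus its children. */
    static final class Node {
        final double branchLength;
        final List<Node> children = new ArrayList<>();

        Node(double branchLength, Node... kids) {
            this.branchLength = branchLength;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Simplified stand-in for the class of modules that return a number for a tree. */
    interface NumberForTree {
        String name();
        double calculate(Node root);
    }

    /** Longest branch-length path from root to tip (the example from the text). */
    static final class LongestPath implements NumberForTree {
        public String name() { return "Longest path"; }
        public double calculate(Node n) {
            double deepestChild = 0;
            for (Node c : n.children) deepestChild = Math.max(deepestChild, calculate(c));
            return n.branchLength + deepestChild;
        }
    }

    /** Sum of all branch lengths (a second, purely illustrative module). */
    static final class TotalLength implements NumberForTree {
        public String name() { return "Total branch length"; }
        public double calculate(Node n) {
            double sum = n.branchLength;
            for (Node c : n.children) sum += calculate(c);
            return sum;
        }
    }

    /** The "legend" knows nothing about which number it displays. */
    static String legend(Node root, NumberForTree number) {
        return number.name() + ": " + number.calculate(root);
    }

    /** Root with tip A (1.0) and a clade (0.5) holding tips B (2.0) and C (1.5). */
    static Node exampleTree() {
        return new Node(0.0, new Node(1.0), new Node(0.5, new Node(2.0), new Node(1.5)));
    }

    public static void main(String[] args) {
        Node tree = exampleTree();
        List<NumberForTree> numbers = Arrays.asList(new LongestPath(), new TotalLength());
        for (NumberForTree number : numbers) {
            System.out.println(legend(tree, number));
        }
    }
}
```

Running `main` prints `Longest path: 2.5` and `Total branch length: 5.0`. Adding a third number means writing one more class that implements the interface; the `legend` method stays untouched.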

The Tree Legend is not the only place where analyses could use numbers for trees. A charting module could display the numbers calculated for a whole series of trees, or a tree search module could use the numbers to find a tree with minimum or maximum values for the number. When such modules are made, they can automatically have access to whatever NumberForTree modules are available. Thus, the chart could show treelength, or likelihood, or asymmetry, or discordance, or longest path. Likewise, the tree search module could seek to optimize any of those. If a programmer makes a new module to analyze numbers for trees, then suddenly all existing NumberForTree modules have a new context in which they can be analyzed. If a new NumberForTree module is made, it will appear as a new option under each of the modules making use of NumberForTree. Hence the number of alternative analyses rises as the product of numbers of modules of different interacting types.
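The multiplicative effect can be made concrete with a small Java sketch. Everything in it is an assumption made for illustration rather than Mesquite code: the toy `Tree` class (reduced to a list of root-to-tip path lengths), the two numbers `LONGEST_PATH` and `DEPTH_RANGE`, and the two consumers `chart` and `findMinimum`. Two consumers combined with two numbers yield four analyses, and neither side needs to know which partner it has been given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; these types are assumptions, not Mesquite classes.
final class CombinationSketch {

    /** Toy tree, reduced to a name and its root-to-tip path lengths. */
    static final class Tree {
        final String name;
        final double[] tipDepths;

        Tree(String name, double... tipDepths) {
            this.name = name;
            this.tipDepths = tipDepths;
        }
    }

    /** Simplified stand-in for the class of modules that return a number for a tree. */
    interface NumberForTree {
        String name();
        double calculate(Tree tree);
    }

    static final NumberForTree LONGEST_PATH = new NumberForTree() {
        public String name() { return "longest path"; }
        public double calculate(Tree t) {
            double max = 0;
            for (double d : t.tipDepths) max = Math.max(max, d);
            return max;
        }
    };

    /** Longest minus shortest root-to-tip path; a crude, made-up evenness measure. */
    static final NumberForTree DEPTH_RANGE = new NumberForTree() {
        public String name() { return "depth range"; }
        public double calculate(Tree t) {
            double min = Double.POSITIVE_INFINITY;
            double max = 0;
            for (double d : t.tipDepths) {
                min = Math.min(min, d);
                max = Math.max(max, d);
            }
            return max - min;
        }
    };

    /** "Chart" consumer: one value per tree, for whichever number is chosen. */
    static List<Double> chart(List<Tree> trees, NumberForTree number) {
        List<Double> values = new ArrayList<>();
        for (Tree t : trees) values.add(number.calculate(t));
        return values;
    }

    /** "Search" consumer: the tree that minimizes whichever number is chosen. */
    static Tree findMinimum(List<Tree> trees, NumberForTree number) {
        Tree best = null;
        for (Tree t : trees) {
            if (best == null || number.calculate(t) < number.calculate(best)) best = t;
        }
        return best;
    }

    static List<Tree> exampleTrees() {
        return Arrays.asList(new Tree("even", 4.0, 4.0, 4.0), new Tree("ladder", 1.0, 2.0, 3.0));
    }

    public static void main(String[] args) {
        List<Tree> trees = exampleTrees();
        List<NumberForTree> numbers = Arrays.asList(LONGEST_PATH, DEPTH_RANGE);
        // Two consumers times two numbers gives four distinct analyses.
        for (NumberForTree number : numbers) {
            System.out.println("chart of " + number.name() + ": " + chart(trees, number));
            System.out.println("minimum " + number.name() + ": " + findMinimum(trees, number).name);
        }
    }
}
```

In this toy data set the search picks "ladder" when minimizing the longest path and "even" when minimizing the depth range, which shows the same search code answering two different questions.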

Of course, the trees used had to come from somewhere. One module might supply the trees stored in a file, another might simulate trees using a simple Markovian model of speciation and extinction, and another might simulate trees as gene trees coalescing within a species tree. Characters likewise might come from a stored matrix, or might be simulated by a stochastic model of evolution, or might represent reshufflings of existing characters. This means that any calculations using trees or characters can either do their calculations on observed data and reconstructed trees, or can derive null distributions under stochastic models. The calculations don't have to do anything special to achieve this flexibility; they simply let the user choose the sources of trees and characters.
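The interchangeability of tree sources might look like the following Java sketch. It is a simplified illustration, not Mesquite's implementation: `TreeSource`, `StoredTrees`, `SimulatedTrees`, and `maxDepth` are hypothetical names, and the simulation is a bare pure-birth process (a random tip speciates until the requested number of tips exists) with no extinction, which is simpler than the models mentioned above. The point is that the `depths` calculation is written once and runs unchanged on observed trees or on a simulated null distribution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Illustrative sketch only; these types are assumptions, not Mesquite classes.
final class TreeSourceSketch {

    /** Toy binary tree node; a node without children is a tip. */
    static final class Node {
        Node left, right;
        boolean isTip() { return left == null; }
    }

    /** Anything that can hand out trees on request. */
    interface TreeSource {
        Node nextTree();
    }

    /** Source 1: trees already in hand (for example, read from a file), served in a cycle. */
    static final class StoredTrees implements TreeSource {
        private final List<Node> trees;
        private int next = 0;
        StoredTrees(List<Node> trees) { this.trees = trees; }
        public Node nextTree() { return trees.get(next++ % trees.size()); }
    }

    /** Source 2: pure-birth simulation; a random tip splits until numTips exist. */
    static final class SimulatedTrees implements TreeSource {
        private final int numTips;
        private final Random rng;
        SimulatedTrees(int numTips, long seed) {
            this.numTips = numTips;
            this.rng = new Random(seed); // fixed seed makes the sequence repeatable
        }
        public Node nextTree() {
            Node root = new Node();
            List<Node> tips = new ArrayList<>();
            tips.add(root);
            while (tips.size() < numTips) {
                Node parent = tips.remove(rng.nextInt(tips.size()));
                parent.left = new Node();
                parent.right = new Node();
                tips.add(parent.left);
                tips.add(parent.right);
            }
            return root;
        }
    }

    /** The calculation: number of branches between the root and the deepest tip. */
    static int maxDepth(Node n) {
        return n.isTip() ? 0 : 1 + Math.max(maxDepth(n.left), maxDepth(n.right));
    }

    static int countTips(Node n) {
        return n.isTip() ? 1 : countTips(n.left) + countTips(n.right);
    }

    /** Runs the same calculation on the next `count` trees from any source. */
    static int[] depths(TreeSource source, int count) {
        int[] result = new int[count];
        for (int i = 0; i < count; i++) result[i] = maxDepth(source.nextTree());
        return result;
    }

    static Node join(Node left, Node right) {
        Node n = new Node();
        n.left = left;
        n.right = right;
        return n;
    }

    /** A fully unbalanced ("ladder") tree with four tips, standing in for observed data. */
    static Node ladder4() {
        return join(join(join(new Node(), new Node()), new Node()), new Node());
    }

    public static void main(String[] args) {
        int observed = depths(new StoredTrees(Arrays.asList(ladder4())), 1)[0];
        int[] nullDepths = depths(new SimulatedTrees(4, 42L), 1000);
        int atLeastAsDeep = 0;
        for (int d : nullDepths) if (d >= observed) atLeastAsDeep++;
        System.out.println("Observed depth: " + observed);
        System.out.println("Simulated trees at least as deep: " + atLeastAsDeep + " of " + nullDepths.length);
    }
}
```

Swapping `SimulatedTrees` for another implementation of `TreeSource`, such as a coalescent simulator, would change the null model without touching `depths`.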

(For more details about modularity, see How Mesquite works.)

A community of programmers

Our hope is that the building-block style of the Mesquite system will encourage programmers to write modules for their own favorite analyses. Another attraction of the Mesquite system is that many of the details of reading and writing files, the user interface, and graphical display are already taken care of, so the programmer need worry only about a single calculation. The system is built in Java and is therefore platform-independent. It is also possible for programmers to link in code written in C, C++, or some other language.

We have attempted to design the system so that a programmer's efforts can be recognized as an independent, citable contribution. Modules or suites of modules can have their own names and manuals, and can be distributed and cited separately. They simply run within the Mesquite system.

Mesquite source code is available for download. This allows other programmers to modify existing source to create new modules.



The poetic answer

The goals of Mesquite are these:
To change the economics of imagination in evolutionary biology — There are three ways we envisage Mesquite stimulating imaginative ideas and their successful spread:
  • Stimulating the creation of ideas: analyses. With multiple alternative modules available for various parts of an analysis, and with modules specializing in questions from various branches of evolutionary biology (e.g., phylogenetics, molecular evolution, population genetics, geometric morphometrics) the diversity and scope of analyses that can be constructed by combining different modules is great. Individual users can carry their imaginations through to an analysis that no one has tried previously. Indeed, Mesquite, by offering the alternatives to be combined, doesn't merely provide analytical tools for questions that have existed: it suggests and provokes new questions.
  • Stimulating the creation of ideas: biology. As does MacClade, Mesquite has an emphasis on visualization and exploration. An idea — whether a particular hypothesis about the evolutionary history of a group, or a stochastic model of a process — can be followed through to its consequences, and visualized. A biologist can ask "What if this were the phylogenetic tree?" and a character's evolution can be reconstructed or simulated on this tree, and the results visualized. A biologist can ask "What if the population had population sizes fluctuating in this way?", and coalescence can be simulated, and the results visualized. In providing users with the tool to ask "What if?" questions, Mesquite provides an extension of the imagination. Such tools are vital in a field whose ideas have consequences that are difficult to predict or grasp without the aid of a computer.
  • Enhancing the efficient distribution of ideas: programs. The imagination of theoreticians and programmers has produced many valuable ideas for approaches and methods, and many valuable programs to implement them. However, some of the ideas haven't been translated into programs, and many of the programs haven't been explored and used as much as they deserve. We don't know, as a field, how many important ideas will lie unused for decades until they are rediscovered. By allowing the programmer to focus on the precise idea proposed (Mesquite providing much of the housekeeping code for the programmer), Mesquite may allow some ideas that might never have been implemented to be realized as tools. By providing a fairly user-friendly context in which modules can operate, Mesquite may encourage some programs to be used more broadly and more easily than otherwise.

To continue to promote a phylogenetic perspective in evolutionary biology — The last few decades have seen the realization of the importance of viewing organismal diversity and evolution in the light of phylogeny. This revolution is analogous to and as fundamental to its field as the revolution in cosmology from a Newtonian view of space to an Einsteinian view of space (Maddison and Pérez, 2000). Just as mass curves space, phylogeny has curved the space of biological diversity, providing a distortion on the distribution of traits of organisms we see around us. MacClade and Mesquite are both designed to provide a corrective lens, to help us to see organisms and their traits in their natural orientation within this curved space along the phylogeny. Mesquite's modularity allows this perspective to be extended to fields such as morphometrics, in which a phylogenetic perspective has relatively recently begun to suffuse the field.



Which to use, Mesquite or MacClade?

Version 4 of MacClade (Maddison & Maddison, 2000) was released in October 2000, and the Mac OS X-compatible version 4.04 in July 2002. The reader might wonder why we have been working on two different programming efforts, and whether they are intended for different uses. Although Mesquite's extensibility means that it has many more features than MacClade, MacClade has some features that Mesquite lacks. Calculations and functions of MacClade's tree window not currently available in Mesquite include particular charts (e.g., Changes and Stasis), the concentrated changes test, some of the parsimony options (irreversible, stratigraphic, Dollo), a detailed Trace All Changes mode, and some options for tree printing (e.g., saving a tree to the clipboard). Before version 2.5 of Mesquite, some of the most significant advantages of MacClade 4 over Mesquite were in the data editor's tools for managing molecular sequence data; these have, however, been added to Mesquite 2.5. MacClade also has a simpler interface, which some users prefer. However, MacClade is not compatible with modern operating systems (it will not work on Mac OS X 10.7 or later), and so for most people MacClade is not a viable option. We are making an effort to continue to streamline Mesquite to make it easier to use. We also hope that those who are so interested and able will work on adding some of the remaining MacClade-only features to Mesquite.

References

Maddison, D.R. and W.P. Maddison. 2000. MacClade version 4: Analysis of phylogeny and character evolution. Sinauer Associates, Sunderland, Massachusetts.

Maddison, W. and T. Pérez. 2000. Biodiversidad y lecciones de la historia. In: Enfoques contemporáneos para el estudio de la biodiversidad [Hernández, H.M., A. García Aldrete, F. Álvarez and M. Ulloa, editors]. Instituto de Biología, UNAM, Mexico. Pp. 201-220.