Friday, August 07, 2009

Wikispecies is not a database


This post was prompted by Stephen Thorpe's post on TAXACOM about Wikispecies in which he wrote (in a thread discussing Roger Hyam's recent blog post) that
[i]f it [Wikispecies] isn't a true database, then it is BETTER than a database. It can do anything a database can do, and more, if you know how it works properly.
I beg to differ. Wikispecies runs on a database (the Mediawiki software uses a database to store the wiki), and Mediawiki can be thought of as a database of semi-structured text, but it lacks a lot of the functionality database users would expect. For example, in Wikispecies there's no way to perform basic queries such as how many descendants a given taxon has, what names a particular author has published, or to find out in which geographic region most new names are being described from. Much of this information is in Wikispecies, it just isn't in a form that we can usefully use.

These limitations are mostly due to the underlying software (Mediawiki), which fortunately can be extended to address these issues using Semantic Mediawiki. I've explored these ideas earlier. With some restructuring, Wikispecies could become a database, but it would require some serious work.

But this raises the real issue with Wikispecies, namely what is it for? Wikipedia is much more informative for many taxa, and the two wikis are very poorly linked (surely we'd want Wikipedia pages linked to the corresponding Wikispecies pages?). Given that Wikipedia is the basis for some core efforts in linked data (e.g., DBPedia), it seems a no brainer that we would want our information stored in Wikipedia, rather than Wikispecies.

It seems to me that the split between Wikipedia and Wikispecies parallels that between "taxonomic concepts" and "taxonomic names". Wikipedia provides the former, in that it provides one (consensus) view of what a taxon is. Wikispecies would be ideally placed to be a nomenclatural database (and a great place to put all the synonyms that we've accumulated over time, but which would swamp Wikipedia). But Wikispecies seems also to want to provide a classification as well, which strikes me as unnecessary (and raises the issue of how this relates to the classification in Wikipedia).

I don't wish to denigrate the efforts of Wikispecies contributors (they are doing some neat things, such as harvesting new names from Zookeys), and by clever use of templates they avoid some of the serious problems with classification in Wikipedia, but it's not a taxonomic database, at least, not yet.