Maddison’s Law of Phylogenetic Analysis

Sigh.

I was painfully reminded yesterday of Maddison’s Law of Phylogenetic Analysis. It wasn’t the first time over the last few months I have been reminded of that law.

As I documented in my March 2023 blog post Prior and Current Ignorance: Struggles with Bayesian analyses, I have been attempting to do a Bayesian analysis of some beetle genomic data that would yield estimates of the divergence times of various branches in the evolutionary tree of Bembidion ground beetles. We (primarily James Pflug) had already completed numerous other analyses, including maximum likelihood analysis of various combinations of data, SVD Quartets, etc., etc. Only the Bayesian analysis remained.

After I last updated that post, I started the final analysis in the program BEAST following the options outlined in the post. I set the BEAST 2.7 analysis going on 17 April 2023 on a relatively fast Mac Studio (with an M1 Ultra chip with 16 fast cores and 128GB RAM). It happily did the whole MCMC sampling routine until about two months later, 16 June, when I stopped the analysis as the ESS values were all above 200. Here’s what the window in Tracer v1.7.2 looked like, showing on the right the trace for likelihood:

I was extremely pleased that this was done – my final analysis for the paper was done! But then I looked at the resulting trees and realized that the estimated age of Bembidiini was way too old at 175 million years, in the Jurassic. That is way older than makes sense based upon other dating analyses of beetles and the fossil record.

Searching online revealed the problem: the way the taxa were sampled violated the core assumption of randomness in the BEAST Fossilized Birth Death analysis. It turns out that if you sample taxa to maximize sampling of deep phylogenetic lineages, as I did, then using BEAST’s FBD random sampling model will cause overestimation of the ages (Matschiner et al. 2017). In the advice I received about the sampling model, which led to the exclusion of outgroups as they were sampled much less densely, there was no mention of this bias.

So where to go from here? There is a CladeAge package for BEAST (Matschiner et al. 2017) that is specifically built to avoid this bias in FBD analyses in BEAST. However, I didn’t get far with that, as CladeAge requires presumption of diversification rates (if I remember correctly), which might be fine for extremely well-studied groups like vertebrates with dense fossil records, but for my little beetles there simply isn’t the background knowledge to presume diversification rates that I felt comfortable with.

More exploration revealed that, thankfully, MrBayes could do the FBD analysis with the assumption that taxa were sampled across the deepest nodes, rather than randomly. This was great news, not only for me, but for the field, as most people who do phylogenetic studies make a point of sampling one or two species from each group; they do not randomly sample. So off into the MrBayes world I went.

The key commands for MrBayes were:

prset brlenspr=clock:fossilization;  
prset samplestrat=diversity; 

These told MrBayes to do an FBD analysis, and assume a sampling strategy that was meant to maximize diversity of deep lineages. Perfect! I did an initial test of how long it would take analyzing all 500,000 amino acids, and it seemed to be too slow, so I asked instead for it to analyze the data at the nucleotide level, but this time excluding 3rd positions (so about one million nucleotides). I also asked it to do 4 MCMC runs with 4 chains each. I began my final Bayesian analysis on 29 July 2023.
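As a quick check on the site counts mentioned above, here is a small Python sketch. It assumes (my assumption, not stated in the post) that the nucleotide alignment consists of exactly the codons behind the roughly 500,000 amino acids, so that each amino acid corresponds to three nucleotide positions.

```python
# Back-of-the-envelope site counts for the two data codings described above.
AMINO_ACID_SITES = 500_000  # approximate alignment length given in the post


def nucleotide_sites(amino_acid_sites: int, drop_third_positions: bool) -> int:
    """Each codon has 3 nucleotide positions; optionally drop the third."""
    positions_kept_per_codon = 2 if drop_third_positions else 3
    return amino_acid_sites * positions_kept_per_codon


all_nt = nucleotide_sites(AMINO_ACID_SITES, drop_third_positions=False)
no_third = nucleotide_sites(AMINO_ACID_SITES, drop_third_positions=True)
print(f"all codon positions:      {all_nt:,}")
print(f"1st + 2nd positions only: {no_third:,}")
```

Under that assumption, dropping third positions leaves two thirds of the nucleotide sites, which matches the "about one million" figure.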

However… things did not turn out as hoped. I just couldn’t get MrBayes to sample sufficiently. Two and a half months later, by 14 October 2023, it had completed nearly 22 million generations on the fast 16-core computer, but the ESS of ln L was only 6. Here’s what the Tracer view looks like of ln L:

As you can see, it looks pretty miserable. All four chains had eventually reached the same level for ln L but it seemed as if it would take an extraordinarily long time to sample sufficiently to complete the analysis. I also needed that computer for something else, so I decided to stop the run. That should be no problem, however, as MrBayes has a checkpointing system that allows a run to be restarted partway through.

Alas, I couldn’t get the checkpointing system to work. Tried multiple times, asked questions, got nowhere.
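For readers unfamiliar with the ESS values mentioned above, here is a minimal Python sketch of one simple way an effective sample size can be estimated from a trace. It is not Tracer's algorithm, whose details may differ. It uses the textbook formula N / (1 + 2 × sum of autocorrelations), truncating the sum at the first non-positive autocorrelation. The `ar1_trace` function is a hypothetical stand-in for a logged ln L trace, not real output from any of these runs.

```python
import random


def effective_sample_size(samples):
    """Rough ESS: N / (1 + 2 * sum of autocorrelations at positive lags).

    The sum is truncated at the first non-positive autocorrelation, a simple
    common rule; Tracer's own estimator may differ in its details.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # constant trace: autocorrelation is undefined
    autocorr_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / variance
        if rho <= 0:
            break
        autocorr_sum += rho
    return n / (1 + 2 * autocorr_sum)


def ar1_trace(n, phi, seed):
    """Hypothetical stand-in for a logged trace: an AR(1) series."""
    rng = random.Random(seed)
    x, trace = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        trace.append(x)
    return trace


well_mixed = ar1_trace(2000, phi=0.0, seed=1)  # successive samples independent
sticky = ar1_trace(2000, phi=0.99, seed=1)     # successive samples nearly identical
print(f"ESS, well-mixed trace: {effective_sample_size(well_mixed):.0f}")
print(f"ESS, sticky trace:     {effective_sample_size(sticky):.0f}")
```

Both traces contain the same number of samples, but the sticky one carries far less independent information. That is the situation of a run with millions of generations and an ESS in the single digits.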

Because I suspected I could improve the sampling by altering the options for the MCMC process, I tried again with the MrBayes analysis, this time hoping that different values of the chain temperatures and swap frequency would be better. With hope in hand, I started my final Bayesian analysis on 16 November 2023. However, after 3.5 weeks of sampling, this is what the ln L trace looked like (on 11 December):

As it didn’t seem as if it was doing all that much better than the earlier run, and after much thought, and discussion with my colleague Katie Everson, I decided to abandon that run.
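To give a sense of what the chain temperature setting does, here is a rough Python sketch. It assumes the incremental heating scheme as I understand it from the MrBayes manual, in which chain i (counting the cold chain as 0) samples the posterior raised to the power 1 / (1 + T·i). The temperature values shown are illustrative only; they are not the values used in the runs described here.

```python
def chain_heats(n_chains: int, temp: float) -> list[float]:
    """Power applied to the posterior for each chain, cold chain first.

    Assumed scheme: heat_i = 1 / (1 + temp * i) for i = 0 .. n_chains - 1.
    """
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]


# Illustrative temperatures only; not the settings used in the actual runs.
for temp in (0.1, 0.05):
    heats = chain_heats(4, temp)
    print(f"temp={temp}: " + ", ".join(f"{h:.3f}" for h in heats))
```

Under this scheme a lower temperature keeps the heated chains closer to the cold chain, which tends to make swaps between chains more likely to be accepted, at the cost of the heated chains exploring less boldly.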

But it turns out I had another computer that was available during that time, and so I had already started to explore dating the branches not with the better method of the Fossilized Birth Death process, but instead a classic method in which nodes were calibrated with the fossils. So while that failed FBD analysis in MrBayes was going on, I was also running a calibrated node analysis in BEAST. This included all three fossil Bembidion, and the run was started on 23 October 2023. It finished, with good ESS values, on 6 December 2023. So when I decided to scratch the MrBayes run on 11 December, all was good, as I already had completed what was now my final Bayesian analysis five days before. Yay!

Except… as I was writing up the methods section, I realized that there was a problem with one of the fossils used to calibrate the minimum age of one clade. The placement of the fossil in that clade was less certain than I had originally thought. And so, with some reluctance, I decided to scrap that analysis, and start again, this time dropping that fossil and its calibration from the analysis. OK, whatever.

So now, at last, with some confidence I could begin my final Bayesian analysis: a calibrated node analysis in BEAST, with only two calibrated nodes (one for each of the two fossils), not three. I began the analysis on 6 December 2023, and one month later, on 6 January 2024, it had sampled enough that I felt comfortable stopping it and harvesting the trees. At last I was done!

Except… a couple of weeks later, while helping a colleague (Kip Will) with setting up an analysis for a joint project, I realized that we had mistakenly set one of the priors for the Optimized Relaxed Clock to something we shouldn’t have used. This meant that I had to start again.

I corrected that prior, and on 23 January 2024 I started my final Bayesian analysis, just like the previous one, but with the better prior. It chugged along for a few weeks, and on 12 February 2024, all ESS values were above 200, and so I stopped the analysis and breathed a huge sigh of relief that it was finally done.

Except… yesterday the author of the Optimized Relaxed Clock package in BEAST that we were using reported to the Google Groups community that there was a bug in the version of ORC that we used, and that it was strongly recommended that users redo any analyses.

And, so. Here we are again. Almost one year later. Yesterday night I started the run again. The seventh final Bayesian analysis. I figure it will be done in three weeks or so.

Will this truly be the final Bayesian analysis? Seventh time’s a charm? Who knows. Somehow I suspect not, unless I am just so sick of the whole thing that I abandon the analysis entirely. If, miraculously, I really complete the final analysis for this paper, I will update this post!

Now you can see why I was reminded yesterday of this law:

Maddison’s Law of Phylogenetic Analysis: You will conduct your final phylogenetic analysis at least three times.


References

Matschiner, M., Musilová, Z., Barth, J.I.M., Starostová, Z., Salzburger, W., Steel, M., Bouckaert, R., 2017. Bayesian Phylogenetic Estimation of Clade Ages Supports Trans-Atlantic Dispersal of Cichlid Fishes. Systematic Biology, 66:3–22. https://doi.org/10.1093/sysbio/syw076


Truth versus firehoses of falsehoods

As a result of one of my more traumatic experiences as a graduate student, I became familiar with Knoll’s Law of Media Accuracy, “Everything you read in the newspaper is absolutely true, except for that rare story of which you happen to have firsthand knowledge.”

It was my second year in graduate school at the University of Alberta in Edmonton, and I was spending my time examining adults and larvae of Bembidion subgenus Bracteon in an attempt to infer the phylogeny of the group. I had adults and larvae of most species, but I had no larvae of Bembidion balli, and I really wanted to get them. To do that I needed to capture adults during the breeding season. The best place I knew to find Bembidion balli was a beautiful spot called Paynton Ferry, on the North Saskatchewan River in Saskatchewan, and so on 19 May 1983 Jim O’Hara and I got in his van and drove east from Edmonton. As we were driving, rain clouds were approaching behind us; this was not good, as these beetles like sunshine and warm conditions. Nonetheless, we drove on, and once we arrived I set out walking along the river shore to the sand bank about a kilometer away where I knew B. balli to be relatively common.

I never managed to get a female, as two events drastically altered the day: the rain arrived, and I found a body on the beach. It was the body of a trucker whose semi truck had fallen off a bridge in Edmonton the previous November, more than 400 km upstream; he was swept under the ice, and his body was eventually deposited at Paynton Ferry, where I found it in May. There are many more details, of course, but suffice to say that it was a day I will never forget. It was newsworthy in Edmonton, and two days later the papers there carried an article about my discovery. In one paper I was described as “a man walking along the river”, which was accurate. In the other paper, however, I was “two men in a canoe”. I don’t know how that falsehood arose, but one can imagine that it sounded a bit more poetic or perhaps more believable than “a man walking along the river”. And thus I gained a healthy distrust for the media.

Bembidion has been in the news recently, because of a paper we published this week. In 2021 Kip Will (UC Berkeley) found a specimen in Jerry and Anne Brown’s ranch in California. He didn’t recognize the species, and so sent it to me; I examined it and sequenced its DNA, and I compared it to other Bembidion, and it became evident that it was a new species. John Sproul provided some DNA sequence data of related species. Our paper in ZooKeys (Maddison, Sproul, and Will 2023) describing it as Bembidion brownorum came out earlier this week.

Bembidion brownorum adult male. Image copyright David R. Maddison

In part because the species is named after a former Governor and First Lady of California, and in part because the species has likely declined in abundance, the news services picked it up and the discovery has been covered around the world. The first article, by Kara Manke, is extensive and quite accurate. Most other articles were subsets of this one. Another took a different tack, with more political discussion and humor; that one is in general pretty good, although it states the research was done by “researchers” at UC Berkeley, and there is no mention of me or John or anyone outside of California, which is unfortunate as the bulk of the research for the paper was done at Oregon State University. But after these initial stories, some of the articles started going off the rails.

One published in the Hindustan News Hub is titled “Rare cricket species found in California named after former governor”. I don’t know how the beetle became a cricket. My name is also misspelled as “Madison”, but even better, the article states that we published the description in the journal Nature! Woohoo! My first paper in Nature!

The most amazing one I have found is an article in Sasatimes which uses my picture of Bembidion brownorum, and, given the timing, surely is intended to present the same news story, but which states that the new species is a tenebrionid (darkling beetle) that was named “Eleodes jerrybrowni”, and that it was “discovered by a team of entomologists from the University of California at Davis” and that it was found “at the Oakland Zoo”. As far as I can tell, there is no species called Eleodes jerrybrowni, and I don’t know of any entomologists at UC Davis who would describe a tenebrionid beetle. Where could this fictitious account have come from?

One of Kip’s students suggested it was an AI system that wrote the bogus articles; that seems a very likely possibility. With the rise of ChatGPT and similar AI systems in the last few months, it wouldn’t be surprising if they are being used to generate news articles.

Christopher Marshall decided to test one of them out, and put the prompt “Tell me about Bembidion brownorum” into iAsk.ai. Here’s what he got:

Bembidion brownorum is a species of ground beetle that belongs to the family Carabidae. This species is endemic to the United States, specifically found in the state of California. The beetle was first described by Erwin and Kavanaugh in 1981 and was named after the American entomologist Arthur V. Evans Brown.

Bembidion brownorum has a black body with a metallic green or blue sheen on its head and thorax. The elytra, or wing covers, are also black but have a reddish-brown border. The beetle has a length of about 4-5 mm and can be found in sandy areas near streams or rivers.

There is not much information available about the behavior or ecology of Bembidion brownorum as it is a relatively understudied species. However, like other ground beetles, it is likely an important predator in its ecosystem.

Top 3 Authoritative Reference Publications or URLs:

1. “Bembidion brownorum Erwin & Kavanaugh, 1981.” Carabidae of the World. https://carabidae.org/taxa/brownorumerwin-kavanaugh-1981

2. Erwin, T.L., Kavanaugh, D.H. “New species and records of Bembidion from western North America {Coleoptera: Carabidae}.” Pan-Pacific Entomologist, vol. 57, no. 2, 1981, pp. 145-158.

3. “Bembidion {Peryphus} brownorum Erwin & Kavanaugh 1981.” Integrated Taxonomic Information System {ITIS}. https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=932656#null

This is so bogus it is hard to know where to start. It’s disturbing that it made up the fact that it was named after “American entomologist Arthur V. Evans Brown” (presumably the AI just jammed “Brown” on to the end of Art’s name), but the fact that it created what looks like three valid references is incredibly deceptive. The middle one is especially disturbing. Yes, Erwin and Kavanaugh published a paper on Bembidion in 1981, but it was about a different group of Bembidion (a group now in Lionepha), the paper was called “Systematics and zoogeography of Bembidion Latreille: 1. The carlhi and erasum groups of western North America (Coleoptera: Carabidae, Bembidiini)”, and it was published in the journal Entomologica Scandinavica not Pan-Pacific Entomologist. In Pan-Pacific Entomologist, vol. 57, pages 145-158 are parts of two articles about Hydroporus water beetles by other authors, not ground beetles. Where in the world did the AI come up with that fake reference, and how could it possibly present it as an Authoritative Reference Publication?

So I tried the same prompt in iAsk.ai, and got something equally fake and equally disturbing:

Bembidion brownorum is a species of ground beetle that belongs to the family Carabidae. This species was first discovered in 2008 and is endemic to the state of California in the United States. The beetle is named after Dr. Michael L. Brown, who was a renowned entomologist and professor at the University of California, Riverside.

The Bembidion brownorum species is relatively small, measuring between 4.5 and 5 millimeters in length. It has a dark brown coloration with a metallic sheen and is covered in fine hairs. The beetle’s head is narrower than its thorax, and it has large eyes and long antennae. Like most ground beetles, Bembidion brownorum is a predatory insect that feeds on other small insects.

This species of beetle is known to inhabit sandy beaches along the coast of California, particularly those in the San Francisco Bay Area. It prefers areas with high levels of moisture, such as those near streams or wetlands. Bembidion brownorum is considered a rare species, and its population size and distribution are not well-known.

Conservation efforts for Bembidion brownorum are currently underway to protect its habitat from human encroachment and development. The beetle’s limited range and habitat requirements make it vulnerable to habitat loss and fragmentation, which can lead to declines in population size.

Top 3 Authoritative Reference Publications or URLs:

1. “Bembidion brownorum” – Encyclopedia of Life https://eol.org/pages/3286766

2. “New Species of Bembidion from California (Coleoptera: Carabidae)” – ZooKeys Journal https://zookeys.pensoft.net/article/381/

3. “Bembidion brownorum” – Integrated Taxonomic Information System https://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=934618

Now it is named after Michael Brown, and it lives on sandy beaches along the coast. Again, all “Authoritative Reference Publications” are bogus. #1 links to a leaf beetle, #3 links to a lacewing. #2 is closer, but there is no article 381 in ZooKeys.

So I then tried OpenAI’s ChatGPT. When I typed in “Tell me about Bembidion brownorum”, it said it didn’t know anything about it. (This is good, as ignorance is vastly better than making things up out of thin air.) It asked for help, and I told it that it was in the news recently. With that added information, and after an apology (“I apologize for the confusion in my earlier response”), it said it had “since researched and found” the following:

Bembidion brownorum is a species of ground beetle in the family Carabidae. It was first described in 2019 by a team of researchers led by David Maddison, a biologist at Oregon State University. The species is named after Robert and Dee Brown, avid collectors of beetles who discovered the first specimen of Bembidion brownorum in Idaho in 1979.

Bembidion brownorum is a small beetle, measuring only around 5mm in length. It has a brownish-black coloration and a flattened body, which allows it to navigate easily through small crevices and soil. This species is found in mountainous regions of the western United States, such as Idaho, Montana, and Wyoming.

Interestingly, the discovery of Bembidion brownorum sheds light on the history of glaciation in North America. Maddison and his team found that the species has a very narrow distribution range and is only found in areas that were not covered by ice during the Pleistocene glaciations. This suggests that Bembidion brownorum has been isolated in its current habitat for thousands of years.

Oh, wait. Now it is in unglaciated areas of Idaho and named after Robert and Dee Brown! Some other things said are true, others fictitious. The made-up details such as “a flattened body, which allows it to navigate easily through small crevices and soil” lend an air of veracity. What sort of “research” did it do? Throw some dice?

It turns out ChatGPT has a “Regenerate Response” button. I pressed it and got this:

Upon checking my sources, I can confirm that Bembidion brownorum is a newly discovered species of ground beetle that was recently described by a team of entomologists in Germany. Bembidion brownorum was named after Robert Brown, a collector who contributed to the discovery of many species of insects, including beetles. The species is about 5 millimeters long and has a brownish coloration. It is found in the alpine regions of Austria, Italy, and Switzerland, and is adapted to the cold, harsh conditions of its habitat. This discovery highlights the importance of continued exploration and study of biodiversity, especially in remote or poorly explored areas. The identification of new species can provide insights into the evolution and ecology of different groups of organisms and can aid in conservation efforts to protect threatened or endangered species and their habitats.

Germany! Named after Robert Brown! And it is an alpine species! And then again:

Bembidion brownorum is a newly discovered species of ground beetle that was reported in the news in March 2022. It was found in the Bering Sea region of Alaska and was named after the late entomologist Arthur V. Evans, who was known for his contributions to the study of beetles. Bembidion brownorum is about 4 millimeters in length and has a shiny brownish-black coloration. It is believed to be adapted to the harsh Arctic environment, and its discovery may provide insights into how species adapt to changing climates.

In arctic Alaska! Named after Art Evans! In addition to still being alive (I hope), Art’s name isn’t Brown, and so why we would use Bembidion brownorum to honor him is beyond me. And then again:

Bembidion brownorum is a newly discovered species of ground beetle that was named in 2021. It was discovered in Montana, USA, by a team of researchers from the University of Montana and the University of Kansas. The species was named in honor of Dr. Neal Wood Brown, a prominent entomologist who contributed greatly to the study of carabid beetles. Bembidion brownorum is a small, black beetle that belongs to the family Carabidae, and it is likely to have a limited range in Montana.

Montana! Named after Neal Wood Brown!

After repeating this process a few times, and learning that Bembidion brownorum was discovered in the UK by researchers from the National Museum of Wales and Cardiff University and named after Professor David Brown, and that it was found in the Appalachian Mountains and named after Dr. Adam Brown, I realized that ChatGPT’s ability to make up entire stories out of thin air and present them as facts vastly outstripped my patience.

If the AI systems had expressed doubt, I would have been a bit less despondent, but ChatGPT’s “I can confirm” is not an appropriate lead-in to flapping at the gums (or sparking in the silicon) with whatever irrelevant words it pulls from its data stores and jams together. That each fictitious story is internally consistent and coherent, peppered with interesting details, makes it worse, as each one sounds plausible and believable.

I fear for the ability of humans to be able to sort truth from bald-faced lies if more and more of the “information” we receive was synthesized by AI systems that, at least now, show no evidence to the reader of being able to judge the trustworthiness of their own output. Hopefully an ability to self-judge is a major focus of AI research, and will be integrated into future systems.

That said, Bembidion brownorum is a very nice beetle. And I did eventually raise larvae of Bembidion balli.

Update: Pam Triplett pointed out that this article will now be grist for the mill of AI, which may lead it to spit out even wilder stories – or perhaps it will realize it shouldn’t believe everything it reads and will express greater skepticism.


Prior and Current Ignorance: Struggles with Bayesian analyses

Many years ago I dressed up for Halloween with a piece of paper taped to my rear with “Pr” on it, in one hand holding a small crowbar, and in the other an engagement ring in a small box (see below). The “Pr” was a posterior probability, the crowbar a prior (a “pryer”), and the engagement ring a proposal mechanism – that is, I was dressed up as a Bayesian analysis.

You might think that would suggest I love Bayesian analyses, but whenever I think of setting up a Bayesian analysis to infer aspects of phylogeny, my heart sinks. I hate wandering into the gauntlet of choices one has to make – it’s as bad as filling out tax forms, or perhaps even worse, as for tax forms there is sufficient documentation available to help me, and I have pieces of paper that record what values I should use.

Among the model-based methods for inferring phylogenetic trees, the two most popular are Maximum Likelihood Estimation (MLE) and Bayesian approaches. The former use what is sometimes called a “frequentist” approach to probabilities, and the latter a Bayesian approach. Bayesian analyses tend to include a richer and more complete model of the evolutionary process, and they require a specification of our prior beliefs about it. I am someone used to the simplicity of MLE approaches, and the relative lack of choices to be made. Diving into Bayesian analysis requires me to face my ignorance about the evolutionary processes that generated the beetles’ diversity, and the processes we have used to sample the beetles.

I am currently attempting to conduct a Bayesian analysis of some genomic data in order to estimate when some lineages of beetles diverged from one another. This is my first attempt to do such an analysis to date phylogenetic splits, and my goodness I am finding it challenging. I decided to expose my confusions and queries to the world, and hope that a kind soul who knows much more about Bayesian analysis than I do answers my call. I’ve already received a bit of feedback from Sean Harrington on an earlier set of questions (thank you, Sean!). I hope that in addition to answering my questions, this current effort might help someone else in the same predicament. My goal is to update this post with notes as I get feedback about how best to make the choices needed.

For me the most difficult part of setting up a Bayesian analysis is specifying the priors used. A prior is a statement of the relative probabilities of the potential values of various components of the model, with those probabilities based upon our prior knowledge (rather than the data at hand). However, so little is known about the beetles and the evolutionary process that to specify these priors seems foolhardy; at times I feel as if I need a magic 8-ball or tea leaves or access to those rarefied few who know the secret chants. But Bayesian analyses require such choices to be made, and so I have to make them, and I would rather not trust magic 8-balls or tasseography.
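For readers who like to see it written out, the role of the prior can be summarized with Bayes' theorem. In this sketch the symbols are my own shorthand rather than notation from any particular program: θ stands for everything being estimated (the tree, the divergence times, and the parameters of the substitution, clock and tree models) and D for the sequence data.

```latex
P(\theta \mid D) \;=\; \frac{P(D \mid \theta)\, P(\theta)}{P(D)}
\;\propto\; \underbrace{P(D \mid \theta)}_{\text{likelihood}} \;
\underbrace{P(\theta)}_{\text{prior}}
```

The likelihood is what a maximum likelihood analysis also uses; the prior P(θ) is the extra ingredient that has to be specified before the analysis can run, and it is the part that requires all those choices.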

Bayesian analyses can take a very long time (especially if the model used is complex), and that in itself leads to a desire to make good decisions about these choices from the start, as it is time-costly to start an analysis, only to find out several weeks later that mistakes were made.

There are several programs available for conducting Bayesian analyses of phylogenies, including the venerable MrBayes and its descendant RevBayes, BEAST2, and PhyloBayes. My initial explorations suggest that RevBayes and PhyloBayes might require more computer processing power than I have easily available, and so I have decided to use the popular BEAST2.

For a long time there was very little guidance to the user regarding how the many choices involved in setting up a Bayesian analysis should be made, except for the occasional post here or there on a discussion forum. There is one extremely useful document by Tracy Heath, called Divergence Time Estimation using BEAST v2.* Dating Species Divergences with the Fossilized Birth-Death Process. However, the ever-evolving suite of models used in BEAST2 often requires a new set of choices when a new version comes out, and that document is now out of date. More recently, the BEAST2 developers have added an extremely useful “Help Me Choose” site. However, many of the choices I need to make aren’t discussed there yet, especially for some of the newer additions to BEAST2.

Before I go through what choices I made (and where my confusions are), I’ll give some background about the data set I happen to be dealing with at the moment.

The taxonomic group and the data

The data matrix consists of about 500,000 amino acids from many genes for 46 species of beetles (see gratuitous beetle picture below). Of those 46 species, four are outgroups (that is, they do not belong to the group I am focusing on, and are included to help root my study group). The age of the clade represented by the entire 46 species is at least 99 million years, based upon some undescribed fossils; the full clade has about 5,500 described species, and likely at least that many again undescribed ones. The ingroup (the primary study group), represented here by 42 species, has about 1,300 described species, but there are likely at least 2,000 species currently living. In addition, there are three well-documented fossils described in the ingroup, all of which are between 34 and 48 million years old (the fossils haven’t been more accurately dated than that). There are no estimates of the age of the ingroup beyond the realization that the group is at least 34 million years old. I did not partition the 500,000 sites, as initial tests suggested the analysis would take much too long if I did.

Setting up an analysis for BEAST2 in BEAUTi

The following analysis was set up for BEAST2 version 2.7.3. In addition to the core BEAST2 packages, I also installed bModelTest and OBAMA. Those packages allowed me to do Bayesian amino-acid model averaging. Here’s what I did:

  • Open BEAUTi
  • Choose File > Manage Packages
  • In the window that appears, select bModelTest, and press Install/Upgrade
  • Select OBAMA, and press Install/Upgrade
  • Quit BEAUTi, and restart it

I prepared my NEXUS file by adding three new taxa in Mesquite, one for each of the fossils. Thus, in total the matrix has 49 taxa in it (four outgroups, 42 ingroups, and three ingroup fossils containing no DNA sequence data).

I then created three taxon sets in Mesquite; each taxon set included one of the fossils, and the remaining terminal taxa that form the smallest clade to which the fossil definitively belongs. We know that each of those three clades is at least 34 million years old. After saving the file, I opened up the file in a text editor, stripped out extra NEXUS file commands, and replaced the numbers in the TAXSET commands with the full taxon names. (I’ve written a new Mesquite module to Export files for BEAUTi to take care of these things automatically, and that will come out in the next release of Mesquite.)
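For anyone who wants to script the TAXSET step rather than do it by hand, here is a minimal Python sketch of the idea. It is not the Mesquite module mentioned above. It assumes that each taxon set sits on one line of the form `TAXSET name = 1 3 5-7;` with 1-based indices into the taxon list, and the exact spacing Mesquite writes is an assumption on my part. The function names, the set name and all taxon names except the fossil are hypothetical.

```python
import re


def expand_taxset_indices(spec: str, taxon_names: list[str]) -> list[str]:
    """Turn a 1-based index spec such as '1 3 5-7' into taxon names."""
    spec = re.sub(r"\s*-\s*", "-", spec)  # tolerate '5 - 7' as well as '5-7'
    names = []
    for token in spec.split():
        match = re.fullmatch(r"(\d+)(?:-(\d+))?", token)
        if match is None:
            raise ValueError(f"unrecognized token in taxset: {token!r}")
        start = int(match.group(1))
        end = int(match.group(2) or start)
        if not 1 <= start <= end <= len(taxon_names):
            raise ValueError(f"taxon index out of range: {token!r}")
        names.extend(taxon_names[i - 1] for i in range(start, end + 1))
    return names


def rewrite_taxset_line(line: str, taxon_names: list[str]) -> str:
    """Rewrite one numeric 'TAXSET name = ...;' command to use taxon names."""
    match = re.fullmatch(
        r"\s*TAXSET\s+(\S+)\s*=\s*([\d\s-]+);\s*", line, re.IGNORECASE
    )
    if match is None:
        return line  # leave anything that is not a numeric TAXSET untouched
    set_name, spec = match.groups()
    members = " ".join(expand_taxset_indices(spec, taxon_names))
    return f"TAXSET {set_name} = {members};"


# Hypothetical five-taxon example; only the fossil name comes from the post.
taxa = ["Outgroup_A", "Ingroup_B", "Ingroup_C", "Ingroup_D", "Bembidion_christelae"]
print(rewrite_taxset_line("TAXSET fossil_clade_1 = 2-3 5;", taxa))
```

Lines that are not numeric TAXSET commands are passed through unchanged, so the function can be applied to every line of the SETS block.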

I then loaded my NEXUS file containing the data into BEAUTi by choosing File > Import Alignment.

Here are the options I chose in BEAUTi, panel by panel. My questions/confusions are shown in blue.

Update: the team at the Centre for Computational Evolution at the University of Auckland, keepers of the BEAST, were kind enough to prepare a response. The response was a joint effort by Kylie Chen, Alexei Drummond, Remco Bouckaert, and Walter Xie. Their full response is in their comment, below, but I have also inserted responses to individual questions within the blog post, for easy reading. Where I felt a reply of my own was appropriate, I have added it in green. These replies include additional information about what I chose in response to their answers.


Tip Dates Panel

I feel pretty comfortable about my choices for this panel. In particular, I

  • Checked “Use Tip Dates”
  • Switched to Dates specified numerically as year Before the present
  • For the three fossils (Bembidion_christelae, Bembidion_alekseevi, and Bembidion_bukeisi), changed their “Date (raw value)” to 41. Each of these fossils is from Baltic amber, which is 34–48 million years before the present. I chose 41 as that is the center of that range.

Here’s what the lower part of the Tip Dates panel then looked like:

(1) Bayesian gurus: are these good choices for the Tip Dates panel?

Response: Yes. Please note that if you sample uncertainty in the age of the fossil taxa then the 41 million you specify here is just a starting value and nothing more.


Site Model Panel

In this panel I:

  • Chose OBAMA Bayesian Aminoacid Model Averaging
  • Deselected the options that appeared to be specific to other types of sequences, and not to metazoan nuclear protein coding genes

This is what the Site Model Panel then looked like:

(2) Is it OK to leave Mutation Rate as it is?

Response: Yes. There is only one partition and therefore the partition-specific mutation rate should not be estimated or it will be non-identifiable with the clock rate.


(3) Are the seven models selected reasonable ones to include for an analysis of nuclear protein-coding genes in an animal?

Response: Yes, these are reasonable selections. The unselected models are specific for virus/mitochondria/reverse transcriptase. We note that keeping all models selected would also be acceptable as the data should converge on the better fitting model.


Clock Model Panel

In the Clock Model Panel I selected Optimized Relaxed Clock, and didn’t change anything else, so this is what the panel looked like:

(4) Should I leave Clock rate as it is?

Response: Yes. Estimating the clock rate is appropriate because you have calibration information in the form of fossil taxa.


Priors Panel

And now, to the tough part, the priors. Here’s what the panel looked like at the start:

For the Tree.t prior, I chose Fossilized Birth Death Model. I then opened up that prior, and changed only one value, that for Rho. According to Tracy’s tutorial, rho is the probability of sampling a tip in the present. Presuming that there are about 10,000 living species in this whole group, and that we have sampled 46 of them, then rho should be 0.0046. I used the value 0.005. I set Origin to 120, as that is older than the oldest fossil. The Tree.t prior information thus looked like this:

(5) Is this reasonable? Should I change anything else, especially in the face of my almost complete ignorance of the evolutionary process in this group?

Response: The numbers entered as initial values do not affect the meaning of the model, however tweaking them may be needed to ensure the initial state is valid and can improve the time taken to achieve convergence (get through burn-in) during MCMC. You should choose a prior for your parameters centred around your known value. For example, a (Beta) prior on sampling proportion centred around 5E-4. The starting value can then be any value chosen from the prior distribution.

My response: Because I am now sampling only the ingroup (see the next question), I’ve reduced the initial values to reflect this smaller group, with 42 species of a possible 2,000 species sampled, and a younger age for the origin. Here are my revised choices:

(6) One thing I really struggle with here. Is it reasonable to use an FBD model when we know the sampling within the whole clade is not uniform? I sampled the ingroup much more heavily than the outgroup. I sampled only 4 species in the outgroups (of over 4,200 known species), but I sampled 42 ingroup species (of 1,300 known species). How can that differential sampling intensity be considered?

Response: The currently available FBD model does not account for variation in sampling rates among lineages. To minimise the effect of variation in sampling intensity we suggest two strategies: (a) remove the outgroup and run analyses using only the ingroup, (b) sample the outgroup as heavily as the ingroup.
In future we anticipate that extending the multistate birth death package MSBD to handle fossil data would provide a good solution to this problem: https://taming-the-beast.org/tutorials/MSBD-tutorial/.

My response: I can’t sample the outgroup as heavily as the ingroup (that would require a huge amount of time and money to do the genomic sequencing), which means I will choose (a), removing the outgroup. That’s OK as I don’t really care about the dating of the outgroup, and all of my fossils are within the ingroup anyway. I also have a good idea as to where the ingroup is rooted based upon other analyses. Until the models can cope with differential sampling, this seems like the best option.
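To put numbers on the sampling fractions discussed in questions 5 and 6, here is a small Python sketch using only the species counts quoted in the text. The function name `sampling_fraction` is made up for illustration.

```python
def sampling_fraction(sampled, total):
    """Fraction of the species in a group that appear in the matrix."""
    return sampled / total

# Counts taken from the text above.
rho_all = sampling_fraction(46, 10_000)     # original rho, whole group
rho_ingroup = sampling_fraction(42, 2_000)  # revised rho, ingroup only
outgroup = sampling_fraction(4, 4_200)      # known outgroup species
ingroup = sampling_fraction(42, 1_300)      # known ingroup species

print(f"rho (whole group):  {rho_all:.4f}")
print(f"rho (ingroup only): {rho_ingroup:.4f}")
print(f"ingroup sampled about {ingroup / outgroup:.0f}x more densely than outgroup")
```

The last line is the point of question 6: on these counts the ingroup is sampled roughly thirty times more densely than the outgroup, which a single rho cannot express.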

I wonder whether or not the FBD model is really the one I should be using anyway. The sampling of the ingroup was not done randomly; I very specifically chose to sample one or two species from each of the major lineages, so it was much more dispersed than equiprobable sampling.

As I could find no guidance and no reason to change the OBAMA priors, I left them the way they were:

(7) Are these reasonable OBAMA priors?

Response: Yes, these defaults are justified in the OBAMA paper:

  • OBAMA proportion invariant: Beta(1,4) has a mean of 0.2, and favours lower proportion invariable sites, but still allows large proportions.
  • OBAMA gamma shape: any shape value <0.1 leads to one or more categories effectively being zero, which is what the proportion invariable category already models if your data contains a significant number of invariable sites. The OBAMA paper has more details/graphs on why the 0.1 cutoff is a practical choice.
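The statement about Beta(1,4) can be checked by hand. A minimal sketch, using the standard facts that a Beta(a, b) distribution has mean a/(a+b) and that Beta(1, b) has CDF 1 - (1 - x)^b; the function names are illustrative only.

```python
def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def beta_1_b_upper_tail(x, b):
    """P(X > x) for Beta(1, b), whose CDF is 1 - (1 - x)**b."""
    return (1 - x) ** b

print(beta_mean(1, 4))               # prior mean of the proportion invariant
print(beta_1_b_upper_tail(0.5, 4))   # prior probability that more than half the sites are invariant
```

The upper tail is small but not zero, which matches the remark that the prior favours low proportions while still allowing large ones.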

Similarly, I could find no guidance and no reason to change the Optimized Relaxed Clock priors, and so I left them the way they were:

(8) Are these reasonable ORC priors?

Response:

  • ORCRates: Yes. This is the prior distribution of the branch rates under the relaxed clock model, which are assumed to be drawn from a lognormal distribution with a mean of 1 (in real space) and a standard deviation of ORCsigma, below.
  • ORCsigma: Yes. This default prior is reasonable for the general case. A standard deviation of 0.1 – 0.6 allows the branch rates to vary somewhat, but if this term exceeds 0.8, then this indicates the data are non-clock like in which case many vastly different trees may explain the data. This prior is centred around a realistic range of values.
  • ORCucldMean: No, this is not a good prior choice. This is the clock rate prior. It should be informed by known related systems, and in the case of this beetle dataset it will be expressed in units of substitutions per site per million years. Since you have a calibrated analysis, you could assume a relatively uninformative broad log-normal prior on this parameter quite safely.
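To get a feel for what different ORCsigma values mean, the sketch below computes the central 95% range of branch-rate multipliers. It assumes the usual parameterisation in which the log of the rate is normal with standard deviation sigma and the mean in real space is fixed at 1; that assumption, and the function name `rate_interval`, are not taken from the BEAST documentation.

```python
import math
from statistics import NormalDist

def rate_interval(sigma, coverage=0.95):
    """Central interval of a lognormal rate multiplier whose mean is 1."""
    mu = -sigma ** 2 / 2  # makes exp(mu + sigma^2 / 2) equal to 1
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    return math.exp(mu - z * sigma), math.exp(mu + z * sigma)

for sigma in (0.1, 0.6, 0.8):
    low, high = rate_interval(sigma)
    print(f"sigma={sigma}: rates mostly between {low:.2f}x and {high:.2f}x the mean")
```

Under these assumptions, a sigma of 0.1 keeps branch rates within roughly 20% of the mean, whereas 0.8 spans more than a twenty-fold range.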

For the diversificationRateFBD.t prior, Tracy says in her FBD tutorial “Generally, we think that this value is fairly small, particularly since we have few extant species and many fossils. Therefore, an exponential distribution is a reasonable prior for this parameter as it places the highest probability on zero”. In my case there are lots of extant species, and few fossils. I have no idea what to choose, but I was advised “even in large clades, this is probably a relatively small value and an exponential distribution should be fine. If the actual rate is higher, an exponential doesn’t truncate higher values, just puts less weight on them.” I thus chose the default exponential:

(9) For a group like this with many species and few fossils, is a default exponential a reasonable choice for this prior?

Response: To assess whether the choice of prior is appropriate, you can look at how the posterior changes compared to the prior, or conduct a sensitivity analysis using different priors for the diversification rate.

For the originFBD prior, I was advised that a lognormal with a wide variance, with its peak where I guess it should be, would be reasonable. So I chose that, setting the lower bound to 99 (as that is the age of the oldest fossils within the entire group) and the upper bound to 250, as, based on the dating that has been done on Coleoptera diversification, 250 is surely beyond the maximum age of the entire clade (outgroup+ingroup); I set the initial age to 110.

I then set the offset to 99, the M to 120, and the S to 1. That gave a curve that feels reasonable for the origin time of the whole clade. That’s not based on clear evidence, but hopefully the curve is flat enough to be acceptable.

(10) Should I really be using a lognormal here? A uniform would also be possible, set between 99 and 250, but it does seem better to put more of the prior distribution at lower values. But how do I choose details of the lognormal? Does it matter?

Response: Hard boundaries reflect 100% certainty the value will not exceed those boundaries. We suggest using a smoother left skewed distribution with (99, 250) in the 95% interval rather than hard boundaries.
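One way to read this suggestion is to solve for the lognormal whose central 95% interval runs from 99 to 250. The sketch below does that under stated assumptions: no offset, and M and S taken to be the mean and standard deviation in log space. BEAUTi's own parameterisation (offset, "mean in real space") may differ, so the numbers are illustrative rather than values to paste in.

```python
import math
from statistics import NormalDist

def lognormal_from_interval(low, high, coverage=0.95):
    """Log-space mean and sd of a lognormal whose central interval is (low, high)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * z)
    return mu, sigma

mu, sigma = lognormal_from_interval(99, 250)
print(f"M (log space) = {mu:.3f}, S = {sigma:.3f}, median = {math.exp(mu):.1f}")
```

The same function can be reused with a younger interval once the analysis is restricted to the ingroup.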

My response: I modified the log normal so that it doesn’t have hard boundaries. Also, because I am now including only the ingroup, the values have changed: the peak of the distribution is now much lower. Here’s what I now have:


I’m rather lost about what to choose for the samplingProportionFBD. Tracy’s tutorial says “The sampling proportion is the probability of observing a lineage as a fossil before that lineage goes extinct.” There are about 10,000 living species, and we have only three fossils. So, as the sampling proportion is very small, I chose an exponential distribution, with an initial value of 0.0005, and a mean of 0.01:

(11) Well? Reasonable choices?

Response: Sampling proportion FBD = the probability of sampling prior to death = sampling rate/(sampling rate + death rate). An uninformative prior on this parameter would be a uniform prior between 0 and 1. An informative prior would be some form of Beta prior that was biassed towards the 0 end, since you only have 3 fossils in your analysis compared to many more unsampled fossil species.
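The formula in this response, and the idea of a Beta prior pushed towards zero, can be sketched in a few lines of Python. The rates passed to `sampling_proportion` are hypothetical, and fixing the first Beta shape parameter at 1 is just one convenient choice; the 5E-4 centre comes from the response to question 5.

```python
def sampling_proportion(sampling_rate, death_rate):
    """Probability that a lineage is sampled as a fossil before it dies."""
    return sampling_rate / (sampling_rate + death_rate)

def beta_b_for_mean(mean, a=1.0):
    """Second Beta shape parameter giving the requested mean for a fixed a."""
    return a * (1 - mean) / mean

# Hypothetical rates per lineage per million years, for illustration only.
print(sampling_proportion(0.001, 0.999))

b = beta_b_for_mean(5e-4)
print(f"Beta(1, {b:.0f}) has mean {1 / (1 + b):.4f}")
```

A Beta with a = 1 is monotonically decreasing, so it puts most weight near zero, as the response suggests; other shape pairs with the same mean would give a prior with an interior peak.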

My response: The sampling proportion for the ingroup is 3 fossils in perhaps 2,000 species, so about 0.0015; I used that as my initial value and the center for the prior distribution. In the response to question 5, the Bayesian gurus suggested “a (Beta) prior on sampling proportion centred around 5E-4”. Here’s what I now have (but with a different center, around 0.0015, because of the removal of outgroups):

For turnoverFBD, I used a uniform 0 to 1 distribution, as in Tracy’s tutorial:

Each of the three taxon sets (one for each fossil) contains the set of species that form the smallest clade that we confidently believe contains that fossil (the fossil is also included in the set). Following Tracy’s tutorial, I assigned the prior for each of these taxon sets to follow a uniform distribution with the age range of the fossil contained in that taxon set. So all three taxon set priors then looked like this:

I think we are done with the priors! Here’s an overview of the BEAUTi Prior panel at the end:

At this point I feel about the same as I feel when I have got to the last page of my tax forms…

The MCMC Panel is more in my comfort zone, and I changed only two things: (1) chain length (which I increased to 100M, just in case), and (2) number of initialization attempts (to 100).


So I saved the file in BEAUTi, and tried to execute it in BEAST 2.7.3 with the following command:

[path to beast] -seed 22 -working -threads 8 -instances 8 [path to xml file]

Everything started up fine, but then I got a notice about a failure to find an initial starting point:

===============================================================================
Start likelihood: -Infinity after 100 initialisation attempts
P(posterior) = -Infinity (was -Infinity)
  P(prior) = -Infinity (was -Infinity)
    P(FBD.t:46Taxa_Occ66_AA_AllLoci) = -297.30215228670386 (was -297.30215228670386)
    P(OBAMA_PropInvariablePrior.s:46Taxa_Occ66_AA_AllLoci) = 1.0702128141464131 (was 1.0702128141464131)
    P(OBAMA_freqsPrior.s:46Taxa_Occ66_AA_AllLoci) = 53.71197185321927 (was 53.71197185321927)
    P(OBAMA_GammaShapePrior.s:46Taxa_Occ66_AA_AllLoci) = -1.0 (was -1.0)
    P(ORCRatePriorDistribution.c:46Taxa_Occ66_AA_AllLoci) = -410.92248229518384 (was -410.92248229518384)
    P(ORCsigmaPrior.c:46Taxa_Occ66_AA_AllLoci) = 1.3628558876856076 (was 1.3628558876856076)
    P(ORCucldMeanPrior.c:46Taxa_Occ66_AA_AllLoci) = -2.7232296703330143 (was -2.7232296703330143)
    P(diversificationRatePriorFBD.t:46Taxa_Occ66_AA_AllLoci) = -1.0 (was -1.0)
    P(originPriorFBD.t:46Taxa_Occ66_AA_AllLoci) = -5.102121215690426 (was -5.102121215690426)
    P(samplingProportionPriorFBD.t:46Taxa_Occ66_AA_AllLoci) = 4.555170185988091 (was 4.555170185988091)
    P(turnoverPriorFBD.t:46Taxa_Occ66_AA_AllLoci) = 0.0 (was 0.0)
    P(Eupetedromus.prior) = -Infinity (was -Infinity)
    P(Ocydromus_SuperSeries.prior) = NaN (was NaN)  **
    P(Philochthus.prior) = NaN (was NaN)  **
  P(likelihood) = NaN (was NaN)  **
    P(treeLikelihood.46Taxa_Occ66_AA_AllLoci) = NaN (was NaN)  **

java.lang.RuntimeException: Could not find a proper state to initialise. Perhaps try another seed.
See http://www.beast2.org/2018/07/04/fatal-errors.html for other possible solutions.
  at beast.base.inference.MCMC.run(Unknown Source)
  at beastfx.app.beast.BeastMCMC.run(Unknown Source)
  at beastfx.app.beast.BeastMain.main(Unknown Source)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
  at java.base/java.lang.reflect.Method.invoke(Unknown Source)
  at beast.pkgmgmt.launcher.BeastLauncher.run(Unknown Source)
  at beast.pkgmgmt.launcher.BeastLauncher.main(Unknown Source)

ARGGGHHHH!

I presume this is because there is some contradiction in the priors I chose. I’ll work on that issue (but if anyone has suggestions, they would be welcome!).
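When the start-up report is long, it can help to pull out just the terms that are not finite. Below is a minimal Python sketch that scans text in the format shown above. The helper name `non_finite_terms` is made up, and the sample is a four-line excerpt of the log; this is not a BEAST tool.

```python
import math
import re

# Matches lines such as "P(name) = value (was value)".
LINE = re.compile(r"P\((.+?)\) = (\S+)")

def non_finite_terms(log_text):
    """Names of the P(...) terms whose value is -Infinity, Infinity or NaN."""
    bad = []
    for line in log_text.splitlines():
        match = LINE.search(line)
        if match and not math.isfinite(float(match.group(2))):
            bad.append(match.group(1))
    return bad

sample = """\
    P(turnoverPriorFBD.t:46Taxa_Occ66_AA_AllLoci) = 0.0 (was 0.0)
    P(Eupetedromus.prior) = -Infinity (was -Infinity)
    P(Ocydromus_SuperSeries.prior) = NaN (was NaN)  **
  P(likelihood) = NaN (was NaN)  **
"""
print(non_finite_terms(sample))
```

In the full log above, the first non-finite terms among the individual priors are the three taxon-set priors, which points at the fossil calibrations rather than at the FBD or clock priors.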

Response: Finally, the error message comes from your set up of the fossil calibrations. There are two logically distinct things that you want to specify for each fossil:

  • Age uncertainty: specify the geological age uncertainty of the fossil.
  • Topological constraint: specify the extant taxa that this fossil should be grouped with.

These two things need to be done with two different priors in BEAUti. So for three fossils you will specify 6 priors. For each of the fossils you will need to specify two things:

  • A ‘Sampled ancestor MRCA Prior’ containing only the fossil taxon (with tipsonly checked) and with a uniform distribution describing the geological age range of the fossil. This prior option is available in BEAUti with the latest version of the sampled-ancestors package when adding priors in the priors panel.
  • An ‘MRCA Prior’ containing the fossil and the related extant taxa, with tipsonly unchecked and monophyletic checked. There is no age distribution associated with this prior, it is just used to maintain the topological constraint that the fossil must stay within the extant group it is associated with.

My response: At first I was a bit confused by this. It appears that the MRCA Prior is the one that I get by default in the BEAUTi interface when I defined the taxon sets in the NEXUS file that I loaded into BEAUTi to begin with. That is, the three priors that appear in BEAUTi (see below) based upon my pre-defined taxon sets are MRCA Priors; I can tell this not because it is indicated in the BEAUTi interface, but because that is what they are called in the XML file that BEAUTi saves.

To add the Sampled Ancestors MRCA Prior, I clicked the +Add Priors button at the bottom of the Priors panel, and chose Sampled Ancestors MRCA Prior from the dialog box that appears. (I get this only with the latest version of the SA package.) I then defined the taxon set to include only the fossil for that group, and set the prior to have a uniform distribution with the lower and upper bounds indicating the uncertainty in the fossil age.

For example, for one of the fossils, the following two priors were present, one for delimiting age uncertainty, and one for defining the clade:

The first one is the Sampled Ancestors MRCA Prior; the second one is the MRCA Prior.

There’s one more change I had to make. Because I removed all of the outgroups, I needed to enforce the rooting of the ingroup. Fortunately, from likelihood analyses, I am pretty confident that I know where the root of the ingroup goes: there is one species included among the 42 that is the sister group to all the rest. So I made a taxon set that includes all of the other 41 taxa, and made an MRCA Prior for it:

I made these changes, and ran it again. And now it works!

Any additional suggestions for improving the choices I made in priors and elsewhere would be most appreciated. You can leave a comment here, or you can email me at david.maddison@science.oregonstate.edu. Thanks!

Thanks so much, Kylie Chen, Alexei Drummond, Remco Bouckaert, and Walter Xie, for responding with answers to my queries!


Who are those people?

In 2019, I went to Vancouver, Canada, to spend some time with family. While there, I went into a funky little store called Urban Source. It’s a great place that has all sorts of odds and ends that might be useful in various creative enterprises, including crafts. They had a bin there of old photos – a random collection of various vintages, from film cameras of the past. I presume they came from some estate sale, garage sale, or something of the sort. Many of them were pictures of people on vacation, or at home with friends, or of scenery, or of buildings that meant something to someone.

They were exactly the sort of photograph that someone might put on their fridge. Which got me to thinking… imagine looking at your fridge and seeing photographs of people you didn’t recognize. Wouldn’t that be weird? And maybe a little disturbing? And perhaps… amusing?

So, I purchased a stack of them.

A short while later, I found myself in a friend’s house in Edmonton, Alberta, staying for a few days. For the sake of convenience, let’s call this friend “John”. One evening, when everyone but me was occupied elsewhere, I snuck one of the photographs onto the side of the fridge. Here it is (with a yellow arrow pointing at it), in situ:

“John” also has a music room in his house, with a little corkboard full of pictures, which meant I just had to sneak one on there, too:

(The one I added is the bottom one, with the bells.)

I like to think that one day, several months later, “John”, or someone else in the family, looked at the fridge, or at the corkboard, and scratched their head and thought, “who is that?” Maybe it was a mystery for weeks in the family. I got quite the chuckle out of imagining their confusion.

Alas, I didn’t really plan what to do next. How would I ever find out what happened? What if they never noticed those pictures? Should I spill the beans to “John”? Or should I stay silent, and hope that someday he would notice?

So here I am, nearly three years later, and there is silence. No word at all from Edmonton about the mysterious photos. Have they not noticed? Or did they notice but just never thought it might have been me? I now realize that to some extent the joke is on me, for they might be living their lives normally, without mystery, and instead I am the one who has had to live with the mystery about the fate of those photos, without closure. Or maybe “John” did notice, and knows it was me, and is amused about the fact that I am living without closure. Ha!

I might note, in ending, that I still have many more photographs from Urban Source. Who knows where they might end up?


Pointing and Pinning Block v6

I’ve finally revised the 3D design of the pointing and pinning block mentioned in my previous post. This new version of the block has the standard holes of a pinning block (in the center of each level) as well as holes around the outside.

Pointing and pinning block, printed by Shapeways.

The holes near the edge of each level are for insects glued onto standard points (small paper triangles). By having the holes near the edge, the end of the point with the insect on it can hang over the edge.

This version was designed by me and Julia Amerongen Maddison. I used TinkerCAD for making the 3D file; TinkerCAD was a joy to use. Some test versions of this were printed with the help of Dan Roach – thank you, Dan! And thanks as well to Pamela Triplett for connecting me with Dan.

The 3D files are available at:


Permanent male genitalic mini-preps

For many insects, examination of the genital structures is vital to understand species diversity. The genitalia of each species are often distinct, and species can be easier to distinguish using genitalia than other morphological traits. In carabid beetles, male genitalia tend to have more variation than female genitalia, and so we more often study the former. I’ve shown examples of the differences in male genitalia in my posts about “Morphological subtleties and the value of n>1”, Bembidion subgenus Lindrochthus, and Bembidion kuprianovii.

When one dissects male genitalia out of a small beetle, one needs a place to store the very small genitalia. A common means of storage is to place it in a drop of glycerol within a very small glass or plastic microvial with a cork or rubber stopper. One can then pin the microvial (through its stopper) on the same pin as the beetle. This has the great advantage that the genitalia is stored with the rest of the beetle, and they are less likely to be disconnected in the future. (One can store the genitalia separately, on microscope slides for example, but that risks their being irretrievably disconnected.)

A specimen with male genitalia stored in glycerol in a microvial pinned beneath the specimen.

Storing the genitalia in a microvial under the specimen works well, and allows a great deal of flexibility, but it has several problems. First, it’s a pain to get the very small genitalia of something like a Bembidion out of the microvial, and the very process of taking the genitalia out or putting it back in can lead to damage or loss. Second, every time one wants to compare two specimens, one needs to go through the lengthy process of getting the specimens placed onto microscope slides or the like. Third, given that the specimens are in a fluid, one has to handle them more delicately, and it is harder to compare a large number of specimens at once. Finally, the refractive index of glycerol is far enough away from that of insect cuticle that it is harder to see relevant structures than if the genitalia were in a medium that has a refractive index closer to cuticle.

The genitalia of the carabid beetles I study are relatively flat. We typically study them by looking at their left or right sides, which are the flat sides. Because of this, there’s an alternative to storage in microvials that works quite well: mounting the genitalia in a semi-permanent mounting medium such as Euparal on cards that are pinned with the specimens, as shown in the following picture.

A specimen with male genitalia mounted in Euparal, and pinned beneath the specimen.

Here’s what the card looks like on its own:

Before I outline how these are made, I should point out that there are other ways to store the genitalia, including embedding them in a drop of water-soluble glue on the same point that the beetle is glued to (this was Carl Lindroth’s approach) or by embedding them in a drop of Euparal on a clear sheet of acetate pinned along with the specimen. The approach Carl Lindroth used was not all that much different from placing them in a vial, in that the genitalia need to be extracted from the glue before they can be properly studied. I find placing them in a drop of Euparal on an acetate card also less than ideal, in part as it is more difficult to write a voucher code on the card (which means you can’t easily compare many at once under the scope as you might get them mixed up). There are surely improvements that could be made to the way I make the preps, but I find the method works well enough.

Making genitalic mini-preps

The following instructions presume that you already have dissected out the genitalia, and have cleared it appropriately, and that it is now stored in 100% or 95% ethanol.

Materials you will need

  • heavy archival paper
  • a printer (if you are going to print a design)
  • 12 mm circular cover slips. I purchase item number CLS-1760-012 from Chemglass. I typically store them in a clean plastic box, for easy access – it will be important to be able to access them quickly, sometimes with forceps that have a bit of Euparal on them.
  • Euparal. Euparal can be very fluid and not very viscous when brand new. This is not ideal, as that means it contains a lot of solvent. The ideal Euparal should be much more viscous. I have not fully explored how to make the Euparal more viscous quickly.
  • 100% ethanol (ideally), or 95-96% ethanol
  • very sharp, clean forceps
  • Kimwipes or other tissue
  • a slide warmer with a cover
  • dissecting microscope with light
  • flat, long “trays” for holding cards as you process them. You can use microscope slides for this, but I prefer longer, less slippery trays. The ones I use are shown below, and are about 12 cm x 2.5 cm. I made mine out of discarded plastics.
12 mm circular cover slips in clean box
12 cm x 2.5 cm plastic trays used to carry cards during processing.

Preparing the cards

If you wish to use a card design, print it on archival, heavy paper. Here’s part of one of the designs I use:

I use Arches hot-pressed, 300 g/m2 (140 lb), watercolor paper. Below is a link to a template that you can use to print on 23 cm x 31 cm (9×12) sheets of the watercolor paper. (Note that when I print this template, I scale the printing to 103%.) I print on a Canon inkjet printer that has archival ink.

Cut the cards out. Then use a standard hole punch to punch out the circles.

Now take each individual card, and turn it over such that the printed side is upmost. This will be the side that will have any writing on it, and it is the side that shows the primary view of the genitalia; it will also be the side that is eventually topmost. Using your sharp forceps, dip into the thick Euparal, and get a drop in between the tips; use that to apply a ring of Euparal around the hole. Use viscous Euparal; thin Euparal will contract too much, forming bubbles, and eventually no longer stick to the cover slip. Even with fairly viscous Euparal, it is a good idea to let the ring of Euparal dry a bit before putting on the cover slip.

Ring of Euparal on the card that is drying before cover slip is placed on it.

Once the Euparal has dried to be very thick, place the cover slip on the card so that it is glued over the hole by the Euparal.

Here are some things to watch out for:

  • Make sure the cover slip doesn’t stick out over one edge of the paper – that will make it more likely the cover slip will eventually come off or break
  • Don’t put so much Euparal on that it covers the regions of the paper you plan to write on. If you do, the ink you write with may not absorb into the paper, and will be more likely to rub off
  • Again, use viscous Euparal

Now place the card with cover slip onto your slide warmer. Don’t worry if there is a bit of Euparal that leaks down onto the slide warmer. Leave it there for at least 3 or 4 days, then flip over the card so that the cover slip is against the slide warmer. At this point it will be important that the slide warmer is covered, so that dust doesn’t get into the well of the card. Leave the card on the slide warmer for at least a week (ideally longer) to ensure the Euparal is dry.

Cards with one cover slip drying on slide warmer
Slide warmer with cover

In the following figure, you can see the difference between cards whose cover slip was attached with different thicknesses of Euparal. On the left is a card for which thin Euparal was used to attach the cover slip; if you look at it at an angle, with light reflecting off the glass, you can see that most of the space under the cover slip is air, not Euparal; the cover slip will surely just fall right off if any force is applied to it. The middle card has mostly Euparal connecting the cover slip to the card; that one should be OK. The one on the right should definitely be good.

With the card completed, if you place it upside down (so that the cover slip is on the bottom), then the card is like a little well slide, with the cover slip forming the bottom of the well, and the sides of the paper hole being the sides of the well. It is into this well that you will place the genitalia.

Adding the genitalia to the card

Once the Euparal on the card is dry, you can add the genitalia to it.

Add the genitalia in standard places and in standard orientations; this is especially important for the parts you will regularly examine. Because the left side of carabid aedeagi is the side most often viewed, I put that side down against the glass. That way, when the preparation is complete, the part we want to view the most will be right up against the glass.

Under the microscope, I extract the genitalia from the ring sclerite, and remove the parameres from the aedeagus in a separate little dish containing 100% ethanol. Once the pieces are all ready, I dip the sharp forceps into a vial of Euparal, and put the drop in the well of the card (which, of course, is now upside down so that the cover slip is against the stage of the microscope). Without cleaning the forceps, I then grab the genitalic parts, ideally all at once, in the forceps (the Euparal on the forceps also helps pick things up), and then put them into the drop of Euparal in the card. I then position all the parts in the well. As you are doing this, consider the following:

  • Don’t use too big of a drop of Euparal. If you use a lot, then you will not be able to easily position the aedeagus, as it will float around. Use as little as possible so that there is a layer of Euparal over the parts. The parts can stick up; that’s OK.
  • If you move each piece individually, ethanol will be transferred each time, making the Euparal very liquidy, and potentially dissolving the Euparal that is holding the coverslip onto the paper. This should be avoided
  • Make sure you use clean forceps.

Filling it with Euparal

Add layers of Euparal, slowly building up the embedding until the well is just full. This may take ten to twenty or more layers. Here are some important things to consider:

  • As you move slides to and from the slide warmer, clean the sharp forceps. Many surfaces (including those of some slide warmers) will flake off slightly, and those particles can then get into your Euparal and make your preparations dirty.
  • Make sure you clean off any dust on the surface of the dried Euparal before you add a new layer. You can do this with a clean, small paint brush
  • Add thin layers. If you add thick layers, then the thicker layers will contain enough solvent to make it more likely the solvent will soften the dried layers the genitalia are in, causing them to float and move.
  • If the Euparal you are adding is thin, and not viscous (i.e., it contains more solvent), then the layers should be thinner.
  • If the genitalia are large they may be less likely to float around, and so you may be able to get away with thicker layers.
  • If the side of the genitalia that is against the glass cover slip is flat, then it is a bit less likely to move around. If the side of the genitalia is rounded, then it can be more likely to move around, and you then need to be even more careful with your layering – make them very thin.
  • Keep the temperature on the slide warmer low – ideally about 35°C. If it is much hotter than that, the Euparal will become softer, and the genitalia will be more likely to float around.
  • Place a note next to the cards on the slide warmer saying when the last layer was added.
  • You want the Euparal to dry between layers, so you will likely want to add new layers no more often than once a week.
  • Don’t overfill the well. The Euparal in the end should just be flat – not convex. (But note below that sometimes I use much less Euparal – see the next section.)

After adding a layer, put it back onto the slide warmer. Once the well is full (if that is your goal), leave it on the slide warmer for at least a month (I usually leave it for at least 3 months), to ensure the Euparal is very dry.

Note that you can photograph the genitalia well before the preparation is complete.

Adding the second cover slip – or not

Typically I add a second cover slip once the well is full to the brim of dried Euparal. The main point of that second cover slip is both to protect the specimen and to make the preparation easier to clean. If dust gets on an open Euparal surface, it can be hard to clean. A clean dry brush can work, but a brush dipped in ethanol usually makes the surface uneven and cloudy, which can be resolved with another layer of Euparal. If the genitalia are really small (e.g., from a 1.5 mm beetle) or very tubular, the danger of their shifting orientation as you add layers is much higher. You may not want to complete the process, and instead simply add enough layers to have a flat surface above the beetle bits, and let it harden, without ever filling it full enough to add the second cover slip.

If you are going to add the second cover slip, you will need to fill up the depression, and you will first need to prepare the cards. Examine the areas around the edges of the hole under the microscope, and see if there are bumps of dried Euparal or other protuberances that will force the coverslip to be raised off the flat surface. If so, scrape them off as best as you can (this can be done with your sharp forceps). You will then need to brush off the surface of the Euparal with a clean, dry brush, to get rid of any dust and lint that has settled there.

Once the surface is clean, you are ready to add the second cover slip. Place a large drop of viscous Euparal onto the preparation (again, I use clean, sharp forceps to do this). You can then pick up the cover slip with forceps, and gently lower it onto the drop (do this at an angle by first having one edge of the cover slip make contact with the paper). If the Euparal is very viscous, enough to cause bubbles to get trapped as you lower the cover slip onto it, then you may wish to dip your forceps into 95-100% ethanol, and touch them to the surface of the Euparal. This will spread a thin layer of ethanol onto the surface, and allow the coverslip to be placed without bubbles. If the Euparal is quite viscous, you may need to push the coverslip down a bit.

You should then look at the preparation from an angle, with light reflecting off the glass (as you did above when you first made the card), and see if Euparal fills the contact area between the paper and the cover slip. If not, add Euparal drops to the edges of the cover slip, and encourage the Euparal to seep under the coverslip. This should also be relatively viscous Euparal.

Check the preparation every few hours for the first day, and add Euparal to the edges of the cover slip as needed. Then check it once a day for at least three days. Once it seems to stabilize, leave it somewhere at room temperature for at least three months before pinning it beneath the specimen.

What to do if it all goes wrong

Sometimes it goes wrong. The most common problem is that the Euparal the genitalia are in softens after a new layer is added, and the genitalia float around and are no longer in the correct orientation. This is frustratingly common, and seems to be usually caused by adding too much Euparal at one time for one or more of the layers. If so, you can simply start again. Place the card in a little dish with 95-100% ethanol, and after a couple of hours, the Euparal will have dissolved, and it will all come apart, and you can do it all again.

Posted in Morphological Techniques, Revising Bembidiina | 1 Comment

Development of the Dicaelus purpuratus drawing

In 2006, for George Ball’s 80th birthday, I presented him with a drawing of Dicaelus purpuratus. This was the species that captivated his attention and eventually led to his doing a PhD on the carabid beetle tribe Licinini, of which Dicaelus is a member.

Here’s the final drawing:

Dicaelus purpuratus

Here’s where I did the drawing

I’ve written a post about the drawing method I use, which has varied slightly over the years. For the Dicaelus, I began by enlarging a photograph of the beetle, and printing it on paper.

I then used a soft graphite pencil, and rubbed the back of the paper. That turned it into carbon paper that I could use to transfer some aspects of the photograph onto the Arches Hotpress watercolor paper I use.

I did the transfer by drawing over the photograph using a sharpened 6H pencil. Here’s what was transferred. Note that I have also put a mask over much of the paper.

I then refined the pencil sketch.

The inking started, as well as the colored pencil.

17 May 2006

More ink outline, and more colored pencil:

21 May 2006 1 pm
21 May 2006 5pm
21 May 2006 6 pm

Eventually, once the colored pencil layer was refined, I started adding inks on top of it:

27 May 2006
27 May 2006
28 May 2006
29 May 2006 12 pm
29 May 2006 5pm
30 May 2006 1pm
31 May 2006 8am. Masking removed.
31 May 2006. Completed

Posted in Scientific Illustrations | Leave a comment

Ten years later: the Lionepha paper is out

In September of 2009, I arrived in Oregon, excited to begin my new position at Oregon State University. I was also excited to live near Marys Peak, as the top of Marys Peak was the locality of capture of the only known specimen of what was then called Bembidion chintimini. (I’ve since moved that and related species out of the genus Bembidion, and into the genus Lionepha.) A few days after I arrived in Corvallis, I drove up to the top of that mountain to see if I could find the species again. I found some little beetles that might or might not be Bembidion chintimini, but I was a novice with that group of bembidiines, and so I wasn’t sure. I also found a larger, related specimen lower down the slopes of Marys Peak, near Alder Creek Falls, which confused me as well. My efforts to discover the identity of those specimens took a long time, and the threads I was tugging on as I explored caused an unravelling in my understanding of that group of beetles, which I would eventually re-weave into a new tapestry. In the end, this path led to the discovery of a total of four new species, as well as a merging of three others.

A specimen from Marys Peak of the same species I found in September 2009

In many ways this project was the heart and soul of my first decade in Oregon. I spent many hundreds of hours on this project, traveling and collecting specimens, extracting and sequencing their DNA, making genitalic preparations, studying their structures, photographing them, examining their chromosomes, doing phylogenetic analyses, and, eventually, creating the figures for the paper and writing the text. Perhaps more emotionally compelling was the discovery and solving of many puzzles along the way, especially that revolving around “Bembidion chintimini”. That some of these puzzles were solved by a combination of old-fashioned morphological studies as well as next-generation genomic sequencing of old type specimens, including a 159-year-old LeConte lectotype, made the tale all the more compelling to me.

John Sproul, a former graduate student of mine, helped by sequencing the DNA I extracted from some pinned type specimens (including that LeConte lectotype), as well as by doing important collecting in the Sierra Nevadas of California; for these efforts, he is a co-author of the paper that has finally resulted.

This paper, titled “Species delimitation, classical taxonomy and genome skimming: a review of the ground beetle genus Lionepha (Coleoptera: Carabidae)”, came out recently in the Zoological Journal of the Linnean Society. The paper can be found at https://doi.org/10.1093/zoolinnean/zlz167; if you would like a PDF, email me.

It turns out that I did find a female of “Bembidion chintimini” on Marys Peak that first trip in 2009. It took until the following summer for me to realize that. It took even longer to eventually come to the realization that this species was widespread, and had a much older name (Lionepha erasa LeConte). Here’s a condensed version of the story, as told in the paper:

Investigation of the rarer species, the one here called Lionepha erasa, began in 2010. Dissection of the first recognized males from Marys Peak, Oregon (type locality of Bembidion chintimini) revealed an aedeagus indistinguishable from those from San Juan Island, Washington (type locality of Bembidion lummi). The female holotype of B. chintimini is wingless and has slightly rounded shoulders. However, the Marys Peak population is wing-dimorphic, and winged individuals are in body form no different from the type series of Bembidion lummi. The elytral microsculpture of the holotype of B. chintimini is perfectly isodiametric (against Erwin & Kavanaugh, 1981), thus matching that of B. lummi. Other characters mentioned by Erwin & Kavanaugh as distinguishing the two populations are not consistent with available specimens. The lack of evident morphological differences, combined with effectively identical DNA sequences in specimens from Oregon, British Columbia and Alaska suggested that the Marys Peak populations are the same species as populations further north, and for this reason, Bembidion chintimini and B. lummi were synonymized by Maddison in Kanda et al. (2015).


This left in question the specimens considered to be Bembidion lindrothellus by Erwin & Kavanaugh, which are at first glance similar to the Marys Peak and other populations of ‘Bembidion chintimini’. Specimens classified as Bembidion lindrothellus are reported to be paler, but all specimens mentioned in Erwin & Kavanaugh (1981) are teneral. The unsclerotized aedeagus of the holotype of Bembidion lindrothellus made comparison of internal sac sclerites difficult. However, the internal sac membrane that rests in the left-most position has a species-specific microsculpture in Lionepha, and the microsculpture scales of the holotype of Bembidion lindrothellus from Alaska match those of Marys Peak specimens. A non-teneral male was also collected by Lindroth at the type locality of B. lindrothellus, but was not included in the type series, perhaps as the specimen was housed in Lindroth’s collection in Lund, Sweden. This specimen is presumably the one whose genitalia Lindroth figured as Bembidion brumale (1963: fig. 127f). We have examined that specimen, and it is indistinguishable from specimens of ‘Bembidion chintimini’ from Alaska, British Columbia, Washington and Oregon, including details of the internal sac. Most critically, DNA sequences of the holotype of Bembidion lindrothellus are identical in eight studied genes to those of other specimens from throughout the range (Figs 5–7). It is thus evident that the holotypes of Bembidion chintimini, B. lindrothellus and B. lummi belong to a single species.


However, there is an older name. The type series of Bembidium erasum consists of four females. These specimens have traditionally been considered to belong to the common, widespread species here called Lionepha probata. Females of these two isodiametrically microsculptured species are difficult to tell apart, especially those with less-extreme prothoracic proportions (neither wide nor narrow). Although there are distinctions in the lobe of the female bursa of fully sclerotized individuals, interpretation of tenerals is more tenuous. Specimens in the type series of Bembidium erasum are all teneral, with prothoraces of moderate width, and thus there is no clear morphological evidence to place them to species. The type series was provided by George Suckley (LeConte, 1859), presumably captured during his travels as naturalist for the governor of Washington Territory during 1853–57 (Cooper & Suckley, 1859). The type series is from ‘Oregon’, which at the time encompassed the current area of Oregon, the southern half of what is now Idaho and some parts of Wyoming and Montana (Barry, 1932). Suckley’s travels in Oregon included areas within the range of both species (Cooper & Suckley, 1859), and thus geography provides no clues about species membership. However, DNA data from the lectotype (and two of the paralectotypes; Sproul & Maddison, 2017) makes it clear that these specimens belong to the current species (Figs 5–7; Supporting Information, Fig. S1). Thus, the valid name of this species is Lionepha erasa, with Bembidion chintimini, B. lindrothellus and B. lummi as junior synonyms.

Maddison and Sproul (2020)

That description of the history does not adequately capture all the many mysteries, proposed and rejected explanations, and confirmed hypotheses along my multi-year path, and the eventual pleasure as the hypotheses became confirmed through the emergence of consistent, repeated patterns. It was one of the more pleasing journeys of discovery I have been on. I’ve previously mentioned a few of the turns and twists in the story, in my posts on Rainy-season beetles and surprises in Lionepha.

In addition to solving the mystery of the small beetles on top of Marys Peak, four new species were discovered along the way and described in the paper:

  • The larger specimen I discovered on my first trip up Marys Peak belongs to a species that lives along the sides of Alder Creek Falls and on rock seeps in the area, as well as along a creek west of Eugene, Oregon, and in the Trinity Alps of California; it is now officially named Lionepha tuulukwa Maddison. I’ve talked about the naming of this beetle here and here.
  • A species I first became aware of through a single male that John Sproul found along Bishop Creek is now called Lionepha lindrothi Maddison & Sproul.
  • The species I found in the central Sierra Nevada of California, and which provided another surprise in the group, is now called Lionepha australerasa Maddison.
  • A species from western Montana and Wyoming, as well as eastern Oregon and Washington, came to light from specimens collected by my good friend David Kavanaugh. This species is now called Lionepha kavanaughi Maddison.

This paper on Lionepha implicitly tells a love story between me and my beetles. When I look at the final product, I see it as a celebration of the process and fruits of discovery. I also see it as a history of my embracing the Pacific Northwest, and the abundant and diverse life that lives here.

Posted in Academia, Fieldwork, Revising Bembidiina, Taxonomic Process | 1 Comment

The Tree of Life Web Project: 25 years online

On 16 November 1994, we first made public the prototype version of the Tree of Life Web Project. At that time, it was a series of static web pages, with trees made out of text characters, created by a special version of MacClade.

This was more than three years before Google existed, and more than a year before the research project that led to Google began. In November 1994, Wikipedia was still more than six years away, Facebook was nine years in the future, and YouTube more than 10 years away.

In the years that followed, the Tree of Life Web Project was nurtured by a host of people on the home team (both deeper in the past and more recent), along with hundreds of biologists from around the world who contributed content.

Although the project is still alive (it still gets over a million separate visitors a year, from many countries), it has been relatively dormant, awaiting someone with a passion to take it over and reinvigorate it.

It’s been a good 25 years; it’s hard to believe it’s been that long.

Posted in Tree of Life | 4 Comments

In Memory of George

As the ship rolls gently upon the waves,
I look east toward the Tasman Sea,
The relentless surges swallowing time,
As they have for millions of years,
Through turn after turn of the Milky Way,

Beneath and beyond the waves,
The tree of life has grown,
Both struggling for life
Against harsh forces,
And exploding in exuberant replication.

And within our small leaf,
The courage and strength of the principled
Inspires and uplifts,
Giving us strength to seek a brighter future,
And explore this world which has given us birth,
And to which we will return.

Posted in Uncategorized | Leave a comment