Tales from the GALS

Sequencing the earthworms of Wytham Woods

By Keiron Derek Brown

In 2020, Liam Crowley contacted me from the University of Oxford about sequencing earthworms in Britain. Liam’s work on the Darwin Tree of Life project focuses on invertebrates at Wytham Woods, a 1,000-acre semi-natural woodland owned and maintained by the University of Oxford. Currently, he is working with others on the first phase of the project – to sequence the full genome of 2,000 species from as many different taxonomic families as possible. They are also focusing in greater depth on certain groups of particular ecological or evolutionary interest. Later phases will aim to ramp up this sequencing to eventually sequence every species in Britain and Ireland!

A beech woodland sample site for earthworms within Wytham Woods (Image: Liam Crowley, University of Oxford)

Recording earthworm genetic material

In the British Isles we have just 31 species of earthworm that occur in natural environments. This makes earthworms an easy target group for getting a good head start during phase 1 of the project. Twenty-nine of those species belong to a single family, Lumbricidae. The remaining two are each the only UK species in their respective families (Acanthodrilidae and Sparganophilidae), and both are very rare and difficult to find; I’ve never personally come across either.

As live specimens are required to obtain the genetic material, my task was to find as many different species as I could over a two-day period while working on the project with Liam at Wytham Woods. I collected live specimens from my garden early in the morning of May 27, before heading to the Woods to undertake further sampling throughout that afternoon and the next morning. On the second day we were joined by Michael Tansley, an Oxford PhD student studying earthworms.

In addition to my garden, we explored ancient woodland, calcareous grassland (at Wytham this derives from limestone) and fen wetland habitats, looking within soil, in and under deadwood and in the leaf litter layer. Identifying live earthworms in the field is extremely difficult and rarely even possible, so it was hard to know how many species we had collected. Any juveniles (earthworms with no clitellum or “saddle”) were released as it is not possible to identify their species — though this may be possible in the future using DNA barcoding.

Stages involved in surveying, identifying and sequencing the earthworm specimens collected (Image: Keiron Derek Brown)

Extracting genetic material from earthworms

The DToL project is using an exciting and relatively new sequencing technology known as ‘long-read’ sequencing, which reads and works out the order of nucleotides in much longer fragments of DNA than other methods. In much the same way as a jigsaw with fewer, larger pieces is easier to assemble than one with many small pieces, longer DNA fragments allow for a more accurate genome assembly.
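The jigsaw analogy can be made concrete with a little arithmetic. The sketch below is an illustration only: the genome size, coverage depth and read lengths are assumed round numbers rather than figures from the DToL project, and `reads_needed` is a hypothetical helper.

```python
import math


def reads_needed(genome_size_bp: int, read_length_bp: int, coverage: int) -> int:
    """Reads needed so that each base is read `coverage` times on average."""
    return math.ceil(genome_size_bp * coverage / read_length_bp)


# Assumed, illustrative values (not taken from the project).
GENOME_SIZE = 1_000_000_000  # a 1 Gb genome
COVERAGE = 30                # 30x average depth

short_pieces = reads_needed(GENOME_SIZE, 150, COVERAGE)     # assumed short-read length
long_pieces = reads_needed(GENOME_SIZE, 15_000, COVERAGE)   # assumed long-read length

print(f"150 bp reads:    {short_pieces:,} pieces")
print(f"15,000 bp reads: {long_pieces:,} pieces")
print(f"Long reads need {short_pieces // long_pieces}x fewer pieces")
```

Fewer pieces is only part of the story: longer reads can also span repetitive stretches of DNA that shorter reads cannot place unambiguously.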

The catch, however, is that DNA is a very unstable molecule, and starts breaking down into smaller fragments very quickly in dead tissue. To allow successful long-read sequencing, therefore, living tissue needs to be flash frozen to preserve long chunks of DNA. We achieved this for our earthworm samples by removing and immediately flash freezing a small section of the tail from each specimen at -80°C, before the specimens were euthanised and preserved in 80% ethanol.

A further small piece of the tail was preserved in 70% ethanol, to be submitted to the Natural History Museum, London, for DNA barcoding. The purpose of also barcoding specimens is three-fold: it allows us to populate the species barcode reference databases; allows matching of barcode ID against the identification made by collectors (preventing unnecessary expensive sequencing of the same species multiple times); and finally, allows a sense check that samples have not got mixed up during the sequencing process and each genome is matched to the correct specimen.
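The sense check described above amounts to comparing two lists of identifications, specimen by specimen. The following is a minimal sketch of that comparison, not the project's actual pipeline: the specimen labels and barcode results are invented for the example, and `check_identifications` is a hypothetical helper.

```python
def check_identifications(field_ids: dict, barcode_ids: dict) -> dict:
    """Compare the collector's identification with the barcode result per specimen."""
    report = {"match": [], "mismatch": [], "no_barcode": []}
    for specimen, species in sorted(field_ids.items()):
        barcode = barcode_ids.get(specimen)
        if barcode is None:
            report["no_barcode"].append(specimen)   # barcode not yet available
        elif barcode == species:
            report["match"].append(specimen)        # identifications agree
        else:
            report["mismatch"].append(specimen)     # needs a second look
    return report


# Hypothetical specimens identified under the microscope.
field_ids = {
    "specimen-A": "Lumbricus terrestris",
    "specimen-B": "Aporrectodea caliginosa",
    "specimen-C": "Lumbricus castaneus",
}
# Hypothetical barcode results, invented purely for illustration.
barcode_ids = {
    "specimen-A": "Lumbricus terrestris",
    "specimen-B": "Aporrectodea rosea",
}

report = check_identifications(field_ids, barcode_ids)
print(report)
```

A mismatch does not say which identification is wrong; it only flags a specimen, or a possible sample mix-up, for re-examination.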

Earthworm specimen 001 was the Lob Worm (Lumbricus terrestris), collected from my garden in Harrow (London). The image shows the live specimen, which is unusually pale in colour for this species. (Image: Liam Crowley, University of Oxford)

Once the relevant material was preserved for each molecular method, the remainder of the specimens (everything but a small piece of the tail) were identified under a stereomicroscope. I identified the specimens using the Key to the Earthworms of the UK & Ireland (2nd Edition) by Emma Sherlock. This key uses external morphological features such as the type of head, location of the male pore (reproductive organs), location and shape of the Tubercula Pubertatis or TP (a glandular thickening of the clitellum, thought to be used in mating), and the spatial distance between the setae (or “bristles”).

Ideally, we’d like to sequence every specimen to build up a more complete library of earthworm genomes. However, there is a cost to sequencing each sample, so we prioritised up to three specimens per species for sequencing. Those specimens that will not be sequenced were still identified and contribute important records to the National Earthworm Recording Scheme.

004 Aporrectodea caliginosa; 007 Lumbricus castaneus; 009 Aporrectodea rosea; 010 Satchellius mammalis; 012 Bimastos rubidus; 018 Bimastos eiseni; 019 Eisenia fetida; 022 Murchieona muldali; 026 Allolobophora chlorotica; 027 Octolasion cyaneum; 029 Eiseniella tetraedra; 031 Lumbricus rubellus
(Image: Liam Crowley, University of Oxford)

Survey outputs

The outcomes for our survey were as follows:

Earthworm species records

All specimens were identified and records submitted to the National Earthworm Recording Scheme. In total this contributed 43 species records across 14 species of earthworm, including 13 new species records for Wytham Woods, a previously unrecorded site. These records will be publicly available through the National Earthworm Recording Scheme (UK) dataset on the NBN Atlas.

Genome sequencing 

Thirty-four specimens, covering 14 different earthworm species (see list below), were sent for genome sequencing. This will enable different DNA extraction methods to be tried and tested, which is necessary because earthworm biochemistry is quite different from that of groups such as insects, for which suitable methods are best understood. Any successfully sequenced genomes will be published with open access, making all the data publicly available.

  1. Allolobophora chlorotica
  2. Aporrectodea caliginosa
  3. Aporrectodea rosea
  4. Bimastos eiseni
  5. Bimastos rubidus
  6. Eisenia fetida
  7. Eiseniella tetraedra
  8. Lumbricus castaneus
  9. Lumbricus rubellus
  10. Lumbricus terrestris
  11. Murchieona muldali
  12. Octolasion cyaneum
  13. Octolasion lacteum
  14. Satchellius mammalis

Preserved specimens 

Thirty-four preserved specimens were submitted to the Natural History Museum, London, earthworm collection, where they will be curated to a high standard and available for further inspection and research.


This article was originally published on Keiron Derek Brown’s website Biological Recording. Liam Crowley also contributed to and reviewed the article.

Keiron Derek Brown is an earthworm expert, chair of the Ecology & Entomology section of the London Natural History Society (LNHS) and project manager for the Field Studies Council’s BioLinks project.

Tales from the GALS

A Living Treasure of Protist Diversity

The Culture Collection of Algae and Protozoa (CCAP)

Most of the algae and protozoa cultures that the Culture Collection of Algae and Protozoa (CCAP) are preparing and sending to the Sanger Institute for the Darwin Tree of Life project were collected many years ago and have been kept in continuous culture ever since. While CCAP staff love to collect and isolate new strains when we can, maintaining the 3000-plus strains kept in the public collection takes up most of our time!

One of our curators in CCAP’s 20°C culture room

CCAP is a Biological Resource Centre located within the Scottish Association for Marine Science (SAMS) near Oban on the scenic west coast of Scotland, with foundations stretching back to the cultures isolated by Professor Ernst Georg Pringsheim and added to his small collection in Prague in the 1920s (our oldest strain is a Chlorella vulgaris isolated in 1892!). Supported by the Natural Environment Research Council (NERC), part of UKRI, CCAP holds a growing collection of microalgae, cyanobacteria, protozoa, small seaweeds and algal pathogens, which are supplied to scientists, educators and businesses worldwide.

Three of the algae we are sending to Sanger: CCAP 1013/11 Coscinodiscus radiatus (top left); CCAP 176/7 Sphaerocystis schroeteri (bottom left); CCAP 291/2 Pseudopediastrum boryanum (right)

Microalgae and protozoa make up the vast bulk of eukaryote diversity, with lineages across most of the eukaryotic tree of life, yet we still know very little about them. Algae and cyanobacteria produce a range of molecules, including fatty acids, pigments, proteins, antioxidants and polysaccharides, that can be important for biotechnology, aquaculture, biofuels and the pharmaceutical and nutraceutical industries. Equally, microalgae are the main primary producers in aquatic environments and play a huge role in sustaining aquatic ecosystems and their biodiversity.

CCAP 236/1A Hydrodictyon reticulatum: this alga forms a net-like structure

CCAP holds strains collected from all continents, but the largest proportion (25%) were collected from the UK or in UK coastal waters. This comprises 259 genera and 364 individual species and varieties, including green algae, desmids, diatoms, dinoflagellates, cyanobacteria, euglenophytes, amoebae, ciliates, flagellates, small red and brown seaweeds and a couple of oomycete pathogens of brown seaweeds! Samples were collected from ditches and puddles, rock pools and salt marsh, a mole hill and a cow’s stomach.

Cabinet with Barcoding tubes and aquaculture flasks getting ready to harvest!

We have sent 31 strains to Sanger as of June 2021, with another 9 almost ready to go. Each strain must be grown up in volume before it can be harvested (most of our cultures are kept in test tubes to save on space and resources), which could take anything from a few weeks to a few months depending on the species. Even within the same taxonomic group, different species can have different growth rates and environmental preferences.

CCAP 11/34 Dunaliella primolecta growing in a 1 litre flask

Our beautiful bright green marine microalga Dunaliella primolecta (CCAP 11/34) is being used as a research and development strain for the Tree of Life project. It grows very rapidly and reaches a high density within a very short space of time. Here you can see our 1L continuous culture with clean air bubbling through it. The air system allows for much better gas exchange, which increases growth rates. We grow this strain in a specially adapted growth medium called F/2Quad, meaning we provide the algae with four times the amount of nitrate and phosphate compared to the standard F/2 recipe. This results in a very happy culture for much longer periods of time, as the cells thrive on the extra nutrients. In the video you can see the individual cells of Dunaliella primolecta buzzing about; they are roughly 10-15 μm in length and rarely stay still for very long. This strain was originally isolated from the coastal waters off Plymouth in Devon in 1936 and has been in the culture collection ever since.
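As a rough illustration of what "four times the nitrate and phosphate" means for a recipe, the sketch below scales only those two components. It works in relative units (1.0 = the standard F/2 amount) rather than real concentrations, and it assumes the other components stay unchanged, which the text above does not state. The component names and the `enrich` helper are hypothetical.

```python
# Relative amounts: 1.0 means "as in the standard F/2 recipe".
# The component list is a simplified assumption, not CCAP's actual recipe sheet.
STANDARD_F2 = {
    "nitrate": 1.0,
    "phosphate": 1.0,
    "trace_metals": 1.0,
    "vitamins": 1.0,
}


def enrich(recipe: dict, factor: float, nutrients: set) -> dict:
    """Return a copy of the recipe with the named nutrients multiplied by `factor`."""
    return {
        name: (amount * factor if name in nutrients else amount)
        for name, amount in recipe.items()
    }


f2_quad = enrich(STANDARD_F2, 4.0, {"nitrate", "phosphate"})

for name, amount in f2_quad.items():
    print(f"{name}: {amount}x standard F/2")
```

To turn this into a working recipe, the relative amounts would need to be multiplied by the concentrations in whichever F/2 protocol is being followed.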

Another of our extraordinary organisms that has been sent for analysis is CCAP 979/9 Rhodomonas baltica, a beautiful reddish pink marine cryptophyte which was isolated back in 1961 from a rock pool in Bordeaux Harbour on the Isle of Guernsey. Some cryptophyte (cryptomonad) species are known to produce blooms in ideal conditions but are not known to be toxic. Their chloroplasts contain chlorophylls a and c and phycobiliproteins, which in different concentrations and combinations can cause a variety of colours from the red you can see in this species to browns, greens and sometimes even blue!

CCAP 979/9 Rhodomonas baltica

Once we have grown and harvested the strains, we are ready to preserve them and send them to the Sanger Institute for analysis. We are very excited to be part of this project, helping to understand protistan diversity and unveiling some of the mysteries hidden within this extraordinary group of weird and wonderful organisms!

By the CCAP team

Tales from the GALS

Lichens have a certain reputation…

I recall a book when I first started out with lichens, where the publisher had printed the cover photo upside down, and this image has travelled with me, reminding me over the years of the reputation that lichens have for being ‘difficult’. In fact, I prefer the term fascinating. It is true that many seemingly basic questions about them are still unanswered, and every year when I teach, I get to say “I don’t know!” and try to tempt another student into a realm of open questions. The Darwin Tree of Life (DToL) – which ambitiously aims to sequence the genome of all 70-thousand kinds of life across Britain & Ireland – has offered a window into this realm, by proposing to sequence the genomes of a trial set of lichens, with their tangle of genomes. Lichen bodies, minimally, are built of a dominant lichen fungus, which builds the structural body of the hold-in-your-hand-ecosystem, plus cells from a separate kingdom of life formed of algal or bacterial cells providing its internal food source. That is two genomes to untangle, but we also expect an additional potentially complex community of other unseen microbial life including other co-habiting fungi and bacteria. DToL must build and test the analytical pipelines required for these complex samples.

Lobaria pulmonaria, the iconic Tree Lungwort, a rare species in most of Europe & abundant in Scottish rainforests. Photo: R.Yahr

Let’s consider the epiphytes that decorate the trees in our most iconic and internationally important habitat: Scottish Rainforest. These woodlands are a temperate version of rainforest, not too warm, not too cold, but almost always moist, and they are defined by their decorations, not by their trees: the handful of tree species present are adorned by hundreds of species of lichens, mosses and liverworts, and many of them are internationally imperilled but locally abundant in Scotland. And in a lucky twist of fate, many of the rainforest lichens are also physically large, making them good test subjects to work with at the start of the project.

Looking across Loch Creran from Glasdrum. Photo: R.Yahr

With Covid-era risk assessments complete and the blessing of NatureScot and permission to sample lichens for the project from the beautiful Glasdrum National Nature Reserve, I drove out north of Oban, to the heart of Scotland’s rainforest zone. Glasdrum has that magical quality of open gladed woodland, with copses of hazel tucked along tiny streams, surrounded by squelchy soil, and with a history of having been wooded for centuries – perfect for rich development of the iconic lichen communities so rare in the rest of Europe. As is usual for fieldwork, I keep to myself (I get to eat all the emergency rations in case of a problem), fieldbook, GPS and paper packets in hand. This time, I also brought along a microscope to process samples in the field straight into special preservatives: the enormous case containing liquid nitrogen and vials of special solutions to protect those precious, living cellular instructions we hope to unlock.

Portable lab in the back of the hire car! Photo: R.Yahr

True to form, it rained. All the way out, and all the way back – but the first tranche of lichens are safely preserved, ready to ship down to the Sanger Institute, where the sequences will be unravelled and those data pipelines will be tested to their limits! Not difficult, but fascinating.

Hypotrachyna taylorensis, on the base of an old oak in gladed woodland. Photo: R.Yahr

Author: Rebecca Yahr, Royal Botanic Garden Edinburgh

DToL News

Darwin Tree of Life: looking back on 2020

Despite restrictions, 2020 has been a busy year for the Darwin Tree of Life Project. We take a look at some of this year’s achievements and highlights.

The Darwin Tree of Life (DToL) Project kicked off in late 2019 with the ambitious task of sequencing, assembling, and annotating the genomes of around 60,000 British and Irish species over a ten-year period.

But when the COVID-19 pandemic hit in early 2020, many of the project’s plans were put on hold. Field work, sampling, and processing of new specimens in the lab were hit most by restrictions put in place to control the spread of the SARS-CoV-2 virus. Despite all this, many significant advances and discoveries were made as part of the DToL project throughout 2020. New species were recorded, DNA extraction methods were refined, and genome annotation became faster than ever before. A parcel of 30 completed genomes was delivered to the public databases at the end of the year.

We take a look back at the work carried out within the DToL project over the last year and shine a light on a few of the biggest highlights of 2020.

Macropis europaea: the Yellow Loosestrife Bee

University of Oxford

This year many species were collected from the Wytham Woods ecological observatory, including new records and rare species. The biggest highlight of the year was the discovery and collection of the Yellow Loosestrife Bee, Macropis europaea. This species was recorded at Wytham for the first time this summer. It is a rare bee in the UK, restricted to mainly wetland sites in southern England. Furthermore, this species is currently the only representative of the Melittidae collected for the project, one of just six families of bees in the UK.

Yellow Loosestrife Bee. Credit: Liam Crowley

“I was so thrilled to find a population of Macropis thriving at Wytham,” says Liam Crowley, a post-doctoral researcher on the DToL project. “Not only is it the first melittid bee to be sequenced for the project, but it was also a species I had never encountered before despite wanting to see it for a long time!”

M. europaea was also the first monolectic bee species collected for the project. This means it collects pollen from just a single species of flower – yellow loosestrife, Lysimachia vulgaris – which is a relatively unusual trait across UK bee species. Even more exceptionally, it collects floral oils from the yellow loosestrife flowers, to produce an oily wax with which it lines its underground nest cells.

This behaviour is unique amongst British bees, and is believed to assist in waterproofing the cells in order to protect the developing larvae from drowning in the saturated soils of wetland habitats.

The challenges of bryophytes

Royal Botanic Garden Edinburgh

The Royal Botanic Garden Edinburgh grows thousands of species of plants in its four garden sites. While COVID-19 restrictions limited work at wild locations, the Royal Botanic Garden Edinburgh team has benefited from access to the rich Living Collection of species held in care across these four sites.

“There have been opportunities to collect from bryophyte-rich woodland and moorland sites in the Scottish Borders. We have worked closely with the University of Edinburgh, Kew and the British Bryological Society to finalise species lists for the UK and Ireland,” said David Bell, Sample Co-ordinator for the DToL, Royal Botanic Garden Edinburgh.

Sample collection on Raven Craig. Credit: Shauna Hay

Bryophytes (mosses, liverworts, and hornworts) bring their own challenges. Their diminutive size, combined with their tendency to grow in mixed populations with other bryophytes, fungi, algae and invertebrates, means that collecting enough relatively clean material is difficult.

They must be processed under a microscope to isolate the freshest material of the target species for genome sequencing, with additional samples prepared for DNA barcoding, genome sizing by flow cytometry and voucher herbarium specimens. Sampling sufficient material and targeting larger bryophyte species is particularly important during the early stages of the DToL project while protocols are still being developed.

Sampling sea life: seaweed, sea sponges and sea snails

The Marine Biological Association (MBA)

This year the MBA processed samples for 132 species and set up standard procedures for Macroalgae (seaweed), Porifera (sea sponges), Cnidaria (corals and anemones), Bryozoa (mat animals), Mollusca (sea snails and slugs), Echinodermata (starfish and sea cucumbers), and simple filter feeders such as Tunicata (sea squirts). The first shipment of 568 samples from 53 species was sent to the Wellcome Sanger Institute for genome sequencing in November 2020.

The MBA has also optimised DNA extraction and PCR protocols for many different species of seaweed. To date, they have collected 34 common species. They are also starting to collect protists, very simple eukaryotic organisms that are not considered animals, plants or fungi. Sixteen protist strains are currently being cultivated, while nine have been harvested for DNA extraction.

“Barcoding protocols are currently being developed at MBA by Helen Jenkins and Joanna Harley, and a wider conversation about cross-institutional protocols is occurring with the DToL project collaborators,” says Nova Mieszkowska, MBA Research Fellow. “The methods at MBA aim to firstly confirm identification to species level where possible, and secondly provide ‘deep’ phylogenetic information by methods such as building multigene trees.”

Data collection on the go

The Natural History Museum

In spite of the pandemic, the Natural History Museum (NHM) DToL team have had many highlights this year, including the successful development of a sample collection-to-barcode pipeline. The sampling team has completed the arthropod species list and, once lockdown was lifted, fieldwork trips took place. The team also undertook ad hoc collecting locally when possible. A total of 1034 samples have been collected and are now stored in the NHM Molecular Collection Facility.

The data management team worked hard to get a sample data pipeline in place, setting up the Epicollect mobile app for in-field sample data entry. This app helps to ensure that sample data can be exported to the DToL sample tracking system (based on COPO) and stored on the NHM collections management system.

A barcoding pipeline was put in place: collected samples were successfully sequenced, the barcodes were validated against the BOLD database, and the analysed data was then sent over to the Sanger Institute. The NHM team is now fully trained to use their new PacBio Sequel machine, and they will be validating this system to increase barcoding throughput going forward.

COPO: a big data broker for the DToL

Earlham Institute (EI)

“COPO is something quite special and unique that the science community has long been missing,” says Dr Seanna McTaggart, the Earlham Institute’s (EI) DToL Programme Manager. “For too long, data has been locked away in lab notebooks, or in files on a computer.”

COPO – Collaborative Open Omics – changes that.

COPO is a big data broker for life science. Developed by the Davey Group at EI, COPO takes care of uploading the metadata that are essential for contextualising genomic data. It’s as simple as uploading a spreadsheet, and COPO then does the rest, making sure that data is referred to the correct public repository. In the case of DToL, that is EMBL-EBI’s European Nucleotide Archive (ENA).

“COPO ensures that metadata is validated,” said EI Research Software Engineer Alice Minotto in a recent interview. “This could be metadata such as taxonomy, which can be tricky as identifying organisms is not a fixed process. Names and species identification can change over time, and even within specific communities.

“Instead of having to check and submit this information manually, which would take a very long time, COPO automates the process. This makes it far less time consuming, easier, and eliminates errors.”
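To give a flavour of what automated metadata checking can look like, here is a minimal sketch that validates a small spreadsheet exported as CSV. It is not COPO's code or API: the column names, the rules and the `validate_rows` helper are all assumptions made for illustration, and the sample rows are invented.

```python
import csv
import io

# Hypothetical required columns; a real submission manifest will differ.
REQUIRED_FIELDS = ("specimen_id", "scientific_name", "collector", "collection_date")


def validate_rows(csv_text: str) -> list:
    """Return a list of human-readable problems found in the spreadsheet."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_FIELDS if name not in header]
    if missing:
        return [f"missing column: {name}" for name in missing]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for field in REQUIRED_FIELDS:
            if not (row[field] or "").strip():
                problems.append(f"row {line_no}: empty {field}")
        name = (row["scientific_name"] or "").strip()
        if name and len(name.split()) < 2:
            # Assumed rule: expect at least "Genus species".
            problems.append(f"row {line_no}: '{name}' is not a binomial name")
    return problems


# Invented sample data with placeholder collectors and dates.
sheet = """specimen_id,scientific_name,collector,collection_date
S1,Lumbricus terrestris,A. Collector,2000-01-01
S2,Lumbricus,,2000-01-01
"""

for problem in validate_rows(sheet):
    print(problem)
```

Real taxonomy checks are harder than this, for the reason given in the quote above: names change over time, so a production system would look names up against a maintained taxonomy rather than rely on a simple pattern.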

To find out more about COPO, contact Dr Felix Shaw and Alice Minotto via the COPO website.

Large-scale sampling and tricky, slimy species

Wellcome Sanger Institute

It has been a tumultuous year for Sanger’s DToL team as they started to set up large-scale DNA sampling and sequencing pipelines from scratch, only for coronavirus to shut down scientific operations for several months. Caroline Howard, Scientific Manager for Sanger’s Tree of Life Programme, says the team have done an outstanding job.

“I think one of our biggest achievements has to be that we’re now properly up and running, despite the disruption of coronavirus. The support from our colleagues in sequencing operations has been amazing, particularly Elizabeth Cook, Craig Corton, Karen Oliver and Mike Quail.”

Sanger now has a fully-functioning tracking system where samples from the same specimen are submitted for the various sequencing techniques required, at a rate of 20-30 species per week. People may think extracting and sequencing DNA is the same for all families and species, but in fact different taxa pose different challenges that have to be solved each time.

“We’ve had a lot of success processing butterfly and moth samples this year, but slimy species such as molluscs continue to be tricky. But we’ve come a long way. A great example of how far our pipelines have come is Patella pellucida, the blue-rayed limpet. This sample was collected by Sanger faculty at Millport, Scotland at the end of August. Within five weeks, it had been received in the lab, gone through sample management, validated using COPO, put through our protocols for DNA extraction and sub-sampling, and submitted for sequencing.”

The blue-rayed limpet (Patella pellucida) was one of the species sequenced using Sanger’s new DNA pipeline. Credit: Mark Blaxter

“We’re now assembling all of the data to reference genome standard. I think this represents an impressive turnaround time from collection to reference genome, and stands us in good stead to scale up in the year ahead.”

At the end of the year, the Sanger teams celebrated the formal release of the first 30 DToL species’ genome sequences to the European Nucleotide Archive. These assemblies are of uniformly high quality, with all the sequences assigned to chromosomes. Hundreds more are now in the sequencing, assembly and curation pipeline.

Illuminating nature’s dark matter: protists and single cell genomics

EI and University of Oxford

Protists make up the overwhelming majority of eukaryotic life but until now have remained relatively understudied. Researchers in the Hall group at EI and the Tom Richards lab at the University of Oxford are changing that, aiming to sample and decode the breadth of protist diversity across the British Isles.

That’s no easy task. ‘Protist’ is a word that describes a staggering range of lifeforms, some with genomes as small as a bacterium’s while others boast far greater complexity than that of the human genome. At EI, Dr Sally Warring has been working with the Single Cell Genomics team to coax the genetic information from this mysterious myriad of lifeforms.

Green algae colonies from an agar plate. Credit: Sally Warring

“Protists are so variable,” Warring explained to us in a recent interview. “Some have thick cell walls, some have glass cell walls, some have silica scales on them, some have starch – all these different things going on with their cell chemistry. This all makes DNA extraction, or the ability of an enzyme to work, highly variable.

“What I’m doing now is culturing protists to use Hi-C [a chromosome conformation capture technique], which looks at the proximity of DNA sequences to each other to get a better idea about the structure of genomic sequences. We’re trying to establish this in our single cell pipeline, possibly from metagenomic samples, to get better single cell genomes.”

Rapid access to the DToL genomes

EMBL’s European Bioinformatics Institute (EMBL-EBI)

One important goal of the DToL project is to make all of the newly sequenced genomes fully accessible to all researchers. Every genome sequence from the DToL project will be freely available through the European Nucleotide Archive (ENA), a database hosted by EMBL’s European Bioinformatics Institute (EMBL-EBI). Each of the genome sequences collected will also be annotated, stored and made available through the Ensembl genome browser. Both the ENA and Ensembl have made significant changes to their underlying processes to be as efficient as possible and keep up with the enormous scale of the DToL project.

These changes, driven by a need for rapid access to genome annotations at scale, led to the launch of Ensembl Rapid Release. Rapid Release is a lightweight, scalable version of the Ensembl genome browser designed to house annotations for species from DToL and other sequencing efforts.

Unlike the main Ensembl website, which updates every three months, Rapid Release is updated every two weeks with new species and annotations. As a result, downstream research can begin within weeks of the annotation being finalised – a huge benefit to the DToL project as the number of genomes begins to ramp up.

“Five months after the launch of Ensembl Rapid Release, we already have over 170 genomes from DToL and other projects,” says Fergal Martin, Vertebrate Annotation Coordinator at EMBL-EBI. “As we get more genomic and transcriptomic data from DToL we can now roll out the annotations on Rapid Release.”

These are just some of the amazing achievements made by the DToL project this year and this is just the beginning. Thousands of new genomes will be sequenced in the coming years as the DToL project gears up to sequence entire ecosystems.

As the DToL project expands to collect and sequence more species, researchers can expect to see more new genomes released and made freely accessible. In the near future, the DToL project will also provide a great opportunity to bring people closer to nature and give us a better understanding of how we can protect our planet.

Members of the Sanger Tree of Life team on a sample collecting visit to Millport, Scotland. Credit: Mara Lawniczak

Tales from the GALS

The Weird and Wonderful World of Protists: an interview…

Dr Sally Warring tells us that studying protists could make us rethink what we know about biology, genetics, and the complexities of life on Earth.
(This article was originally posted on the Earlham Institute website on August 23rd 2020 and is reposted here with the generous permission of Sally Warring and Peter Bickerton)

Dr Sally Warring’s first few months at Earlham Institute have been a little out of the ordinary – especially after arriving in the UK from New York in the midst of an accelerating global pandemic. But for someone who studies an unusual group of organisms called protists, extraordinary is the norm.

As a postdoc in the Neil Hall Group, Dr Warring will be working on the Darwin Tree of Life project to sequence the DNA of every eukaryotic species in the British Isles. Had coronavirus not intervened, Warring would have spent the summer months traversing the country in search of novel protists – the mostly single-celled, mostly microscopic, always fascinatingly diverse creatures that science, so far, has paid scant attention to in comparison to plants and animals.

“Protists are awesome,” Warring enthuses while bubbling up a broth of nutritious wheat bran – the preferred diet of some ciliates she is culturing for an experiment. “They make up the vast majority of eukaryotic diversity, yet we have relatively few described species and even fewer genomes available.” 

Indeed, from the little that is already known about them, it’s clear that protists are unfairly grouped together under one title, when really they comprise vast, interlinked branches of the tree of life that dwarf the small twigs of plants, animals and fungi.

“Protists do so many different things,” explains Warring. “Some of them have really complex behaviours. They hunt, they mate, they build structures, they can live in complex communities and colonies. They provide lots in every ecosystem. They’re major primary producers, they’re degraders. Some of them are symbionts in many different ways.

“And there are millions of [species of] them – a small cup of sea water would have many and most would be undescribed. We also don’t know many of them very well. They probably do weird and wonderful things – odd ways of arranging their genomes, or doing just about anything. There’s so much diversity, we don’t know much about that diversity, and it’s all related, really, to our understanding of the evolution of life.”

Diatom Sampler Pack from Connecticut River. Credit: Dr Sally Warring

For the love of protists

Warring discovered her passion for protists at university, where she first used a microscope to delve into what makes ponds murky and seas bloom. She has built a career on that passion, working on protists from parasites to free-living creatures, as well as photographing and filming obscure microbes to help bring the world of protists to a wide audience through her website and a hugely popular Instagram channel (Pondlife_Pondlife), which has almost 50,000 followers.

Warring uses her popular Instagram account to highlight pondlife. Credit: Dr Sally Warring (Instagram @pondlife_pondlife)

“It was something I started during my PhD, after a conversation with my husband. I was telling him about protists and he was saying that it might be something that would be cool on TV. I thought, why am I not doing something about it?

“I had a fair bit of experience with microscopes for research purposes, but imaging for a general audience is different. You want to prioritise different things. So I took my iPhone, which I learned very quickly could quite easily be attached to a microscope, and it’s a really easy way to generate photography. I know scientists now who use their iPhones to generate their research images. It’s a good, affordable way to do it.

“I started posting them to instagram and it went from there.”

This engagement online has led to some really exciting public engagement projects, including some online education, and a collaboration with the American Museum of Natural History. Together with the museum, Warring made a series of short films called “Pondlife”, a “safari to explore the microbial wildernesses all around us”, which you can see on YouTube.

Pond Scum Under the Microscope

For Warring, this sort of public engagement with science is really important.

“This is most of life’s diversity which people never see. There’s so much natural history content on TV, which is fantastic, but it’s exclusively about animals and sometimes plants. But I think that it’s important, when understanding evolution and our place in the ecosystem – to understand biology – it’s useful to have an understanding of cells, in particular, and also microorganisms. They are the foundations of all of our food chains, of biotic cycles, and also our own evolution. So, I think it’s a good thing to have people more aware of microorganisms. And also they’re really cool.

“It’s also pretty hard to engage with scientific content. You can only really read a scientific paper if you’ve got a PhD. There are lots of good things going on around that, but it’s a problem. The only people who can engage with research are other researchers.”

Green Algal colonies at 100x magnification. Credit: Dr Sally Warring (Instagram @pondlife_pondlife)

The nitty gritty of protist genomics (biology knowledge required)

As part of EI’s contribution to the groundbreaking Darwin Tree of Life Project, Warring is working to establish a new way of sorting and documenting protists from environmental samples. This work is only possible due to EI’s unique and cutting edge pipeline for the analysis of single cells, which Warring is adapting to the study of protists. 

“What I’m doing now is culturing protists to use Hi-C [a chromosome conformation capture technique], which looks at the proximity of DNA sequences to each other to get a better idea about the structure of genomic sequences. We’re trying to establish this in our single cell pipeline, possibly from metagenomic samples, to get better single cell genomes.

“What you get from that is hopefully a more accurate picture of the genome. You get information about where the genes are in relation to each other, but also about telomeres and centromeres – which is all really important information about genome structure. This is especially true for microorganisms when we don’t know much about species boundaries, for example. Having knowledge about gene synteny or genomic structure can be potentially really useful for determining whether two single cells are the same species or not.”

However, Warring explains, that’s a complex task.

“Protists are so variable. Some have thick cell walls, some have glass cell walls, some have silica scales on them, some have starch – all these different things going on with their cell chemistry. This all makes DNA extraction, or the ability of an enzyme to work, highly variable. So there hasn’t been a whole lot of Hi-C done on these organisms.

“Hi-C relies on looking at DNA proximity on chromosomes. But then you have these organisms called ciliates which have two nuclei. One of them is somewhat normal and then the other is just a bunch of short fragments of DNA – many of them, with genes on them – so I don’t know how Hi-C will behave under those conditions. And there’s a lot of ciliates.”

Biology’s dark matter

When it comes to discovering novelties about the biology of life on Earth, Warring says that it’s among protists that we’re likely to find many of the breakthroughs in our understanding.

“I think there’ll be different ways of doing things that we’ve only really studied in model organisms. Much of our knowledge on how genomes work comes from yeast, but protists have lots of different ways of doing things – and through them we can explore just how diverse biology can be. We have these dogmas and axioms of biology that might not be as common as we think. Then there’s evolution – protists are most similar to the organisms from which we evolved – and we still have lots of missing links we don’t understand in that process.”

Despite their vibrant diversity, studying the extraordinary world of protists could teach us more about our relatedness to other organisms. 

“We’re just scratching the surface of what we know – I can’t even imagine what sort of things we’re going to come across.”

A golden-yellow coral tooth fungus growing on tree bark

Tales from the GALS

All Things Fun-GAL

Fungi are some of the least known and most mysterious organisms on Earth. With a kingdom of their own, and more closely related to animals than to plants, they are the unsung heroes of all terrestrial ecosystems, recycling nutrients, enabling water uptake by plants and contributing to carbon sequestration. They have countless medical, industrial, agricultural, and sustainable applications, but can also have devastating impacts on our health and food security. Given their relevance and the impact they can have on our lives, it is surprising how little we know about them. The fungal kingdom is now thought to encompass 2.2 to 3.8 million species, an estimate that has been improved by recent developments in DNA sequencing technology, but just 145,000 have been properly described globally, with around 2,000 new species being described each year.
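
To put those figures in perspective, here is a minimal arithmetic sketch in Python. It uses only the numbers quoted above and assumes, purely for illustration, that the rate of 2,000 new descriptions per year stays constant; the function names are ours.

```python
# Figures quoted in the paragraph above.
ESTIMATED_TOTAL_LOW = 2_200_000   # lower estimate of fungal species
ESTIMATED_TOTAL_HIGH = 3_800_000  # upper estimate of fungal species
DESCRIBED = 145_000               # species properly described so far
NEW_PER_YEAR = 2_000              # new species described each year


def fraction_described(total):
    """Share of the estimated total that has already been described."""
    return DESCRIBED / total


def years_to_finish(total):
    """Years needed to describe the rest, ASSUMING a constant yearly rate."""
    return (total - DESCRIBED) / NEW_PER_YEAR


for total in (ESTIMATED_TOTAL_LOW, ESTIMATED_TOTAL_HIGH):
    print(f"{total:,} species: {fraction_described(total):.1%} described, "
          f"about {years_to_finish(total):,.0f} years left at the current rate")
```

On either estimate, fewer than 7% of fungal species have been described, and at the current pace the remainder would take more than a thousand years.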

The Royal Botanic Gardens, Kew (RBGK) has been a leading light in fungal taxonomy for over 140 years, hosting the world’s largest fungarium, with over 1.25 million fungal specimens. Kew’s fungarium is an extremely valuable reference collection as it includes many important type specimens (the physical samples originally used to describe new species) and historical samples such as subcultures of Alexander Fleming’s original Penicillium, or specimens collected by Charles Darwin whilst on the Beagle. Nowadays, mycologists at Kew continue working on unravelling fungal diversity, globally and in the British Isles, and try to understand how fungi have evolved through time and how they interact with their environment. Projects are diverse and include Malagasy and Colombian fungi, the Fungal Tree of Life, and the study of plant-fungal interactions in alpine ecosystems, amongst others.

The work on British fungi has been specially supported by the Lost and Found Fungi (LAFF) community science project, which has engaged with amateur field mycology groups across the country, increasing conservation engagement and developing skills within the recording community. This initiative combines taxonomic work, distribution mapping, molecular data, and checklists, which contribute towards global and regional red-listing assessments and help to increase the presence of fungi in conservation assessments.

The rare coral tooth fungus (Hericium coralloides)

The fungal component of the Darwin Tree of Life (DToL) will greatly benefit from this network of expert field mycologists, as we are aiming to eventually collect, DNA barcode, and generate high quality genomic data for all the known fungal species in the British Isles (ca. 17,000). This is a big undertaking that will require the joint efforts of field and lab mycologists from different institutions across the country.

As one of the Genome Acquisition Labs (GALs) for DToL, our work will include obtaining fresh, high quality specimens from a widely diverse range of habitats. Every geographical area, plant and micro-habitat will support different species of fungi, so our searches will take us far and wide. To achieve this, we are designing a community science engagement project that will seek to enlist the help of local experts and amateur groups around the country to make collections of our target taxa.

When designing collection strategies for fungi we have a very different set of considerations compared to many other taxonomic groups. Firstly, and perhaps most importantly, nearly all fungi are ephemeral. Their sporocarps (spore-bearing structures) are temporary structures that are only produced when certain stages of the organism’s life cycle have been reached and environmental conditions (mostly temperature and humidity) are adequate. This means that planning field collecting trips can be treacherous: they are reliant on weather conditions, and there is no guarantee of finding the same species on a known site even when the timing has proven fortunate. This is one of the exciting things about studying fungi: you never know what you’re going to get! It is also one of the key reasons for engaging the mycological community with our DToL work. Individuals and local groups are out regularly and know their areas exceptionally well. They are often able to revisit sites numerous times in a season, greatly increasing the chances of finding the sought species. As already demonstrated by the LAFF project and the historical collections in Kew’s fungarium, we are very lucky to have a great working relationship with the amateur community, especially through British Mycological Society (BMS) groups, which have assisted with records and specimens over the years. In exchange for this support, and to continue developing mycological knowledge throughout the UK, we will be developing additional training opportunities and supporting small taxonomic projects.

Students at a microscopy training event supported by RBG, Kew and Forever Fungi.

The fungi we will be collecting come in an astounding array of shapes, colours, and ecological roles. From the luridly coloured, minute (<1mm), clustered apothecia of some ascomycete fungi, through to the large pom-pom like fruitbodies of our rare and majestic Hericium species, we will be looking high and low to find our targets. In the woods and in the meadows, in peat bogs and on mountain tops, on and in the trees, below the surface of the soil, on bone and sprouting from the carcasses of unfortunate invertebrates, fungi can be found anywhere if you know how to look for them.

Weird and wonderful fungi (from top left: Cobalt crust (Terana caerulea), Cordyceps militaris on buried moth larvae, Ruby Elfcups (Sarcoscypha austriaca) and the parasitic bolete (Pseudoboletus parasiticus) on a common earthball (Scleroderma citrinum)).

Alongside the field community, we are working together with mycologists from RBGE, MBA, Aberystwyth University, Cardiff University, and Oxford University, who are contributing their expertise on particular groups and their extensive knowledge of the biology of fungi. Together we are developing the first lists of target taxa and priority species for a successful first phase of the DToL project.

Fungi can establish very sophisticated symbiotic interactions with other organisms, and some of them, like lichenised fungi, can host an incredibly diverse microbiome inside their bodies. These associations can pose an additional challenge to our sampling process.  

Where these hosted organisms are also fungi, this can lead to problems in isolating the right DNA, adding another layer of complexity to the task. In these cases, cultures of fresh fungi can be made from spore or tissue samples isolated on to nutrient agar in petri dishes, although some fungi have proven difficult to culture and many have never been attempted. As part of our field trips, we will be making cultures of our fresh finds once they are back in our field workroom. We will also be taking additional tissue samples to store in DNA preservatives as backups, should the cultures fail. In all collections, we’ll make sure a good representative is dried properly (usually in a portable food drier at less than 40 °C) for storage in our fungarium for later reference.

To facilitate collections provided by our community contributors, we have developed an online collections portal and DNA preservation kits for sampling throughout the year. When a target specimen has been found, contributors can quickly upload detailed information about the location, substrate, images, and other important features of their finds, either in the field on the mobile app or from a desktop computer. This portal will greatly streamline the whole process and will allow us to work with contributors from Land’s End to John O’Groats, whenever something of interest is found. The DNA preservation packs will allow these contributors to take a number of tissue samples from the specimens, which are then carefully packaged and posted to our mycology labs at RBG, Kew, where further work will then be carried out on them.

Once in our labs at Kew, samples will enter a strict processing pipeline: the dried voucher specimen will be databased and curated in our collection; the preserved tissue samples will be prepared to be sent to the Wellcome Sanger Institute for genome sequencing; and, in the case of cultures, a piece of mycelium will be lyophilised and prepared for shipping while the rest of the culture is cryopreserved at very low temperature and stored at Kew. These cryopreserved cultures can be revived at any time and new fungal subcultures produced. In addition to our own collection, we also have access to an extensive fungal culture collection at CABI, so some of the species we may not find in the field might be hidden in CABI’s rich collection.

All tissue samples, vouchers and cultures will be accurately coded to allow for further tracking down the line. Fed by the online field forms, a complex database will gather all the information associated with every specimen, and just before the sample is sent to the Wellcome Sanger Institute, a DNA barcode will be generated to confirm its identity.

The looks of fungi can be very misleading: two almost identical specimens may represent completely different species, and absolutely disparate shapes can have the same genetic identity. Mycologists didn’t realise the extent of this conundrum until we entered the molecular era and started regularly sequencing our field collections. Fungal taxonomy has undergone a revolution in recent years, with families, genera and species being constantly re-arranged and renamed in the light of new discoveries. Complicating the matter, fungi can often be invisible to the naked eye and devoid of features that would help us identify them morphologically, leaving us with DNA barcoding as the only tool to categorise them.

DNA barcodes are obtained through DNA extraction and sequencing of short standardised portions of DNA, which are then compared with online DNA repositories, allowing us to assign each sample to a known species (or, excitingly, discover a new one). In many cases an integrated approach combining genetic tools with morphological and ecological traits is the best option to refine our identifications. Often microscopic structures or habitat preferences give us a hint of what a specimen might be. For instance, the tiny rare ascomycete Poronia punctata in the UK thrives on pony dung, while the morphologically similar, but genetically distinct, Poronia erici is mostly associated with cow and rabbit dung.
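
The comparison step can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the reference names and sequences are invented, the sequences are treated as already aligned and of equal length, and the 97% identity threshold is an arbitrary choice. Real barcode identification relies on proper alignment searches against curated repositories, not this simple position-by-position count.

```python
def percent_identity(a, b):
    """Fraction of matching positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def best_match(query, references, threshold=0.97):
    """Return (name, identity) for the closest reference.

    The name is None when even the best hit falls below the threshold,
    which would flag the sample for closer inspection.
    """
    name, ref = max(references.items(),
                    key=lambda item: percent_identity(query, item[1]))
    identity = percent_identity(query, ref)
    return (name if identity >= threshold else None), identity


# Invented reference barcodes (hypothetical, 30 bases each).
REFERENCES = {
    "reference_A": "ACGTTGCAAGCTTAGGCTAACGTTAGCCTA",
    "reference_B": "ACGTTGCTAGCTAAGGCTTACGATAGCCTA",
}

print(best_match("ACGTTGCAAGCTTAGGCTAACGTTAGCCTA", REFERENCES))
print(best_match("T" * 30, REFERENCES))
```

A query that matches nothing closely enough returns no name, which mirrors the situation where morphology and ecology have to be brought in, or where a find might be new.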

A fungus with an ‘unpalatable taste’: the rare nail fungus Poronia punctata growing on pony dung from the New Forest

By Richard Wright, Elena Arrigoni, Ester Gaya, Royal Botanic Gardens, Kew

Tales from the GALS

Being a Bryophyte GAL

Being a part of the Darwin Tree of Life project, genome sequencing the multicellular organisms of an entire island archipelago, has involved a major shift in the way we think and talk about the plants that we work on: The sizes of plants, and what constitutes an individual plant, have immediate practical implications for the project.

At the Royal Botanic Garden in Edinburgh (RBGE), as well as working on entire floras like the plants of Nepal, we focus on some taxonomic groups, including the biodiverse genera Begonia and Rhododendron. For the Darwin Tree of Life (DToL) project, our focus is on the bryophytes (mosses, liverworts and hornworts), another group of plants on which the RBGE holds considerable expertise. The island archipelago of the British Isles and Ireland contains approximately 1060 native species, with about 755 mosses, 300 liverworts and only four hornworts. There are also a few introduced species, including the rapidly-spreading southern hemisphere liverwort Lophocolea semiteres.

We are one of the Genome Acquisition Labs (GALs) for the DToL; as such, our job is to obtain fresh high-quality bryophyte specimens, as well as a few pivotal flowers, ferns and lichens. We will be collecting living plants, both from natural habitats and from within our own gardens, taking them into a lab to clean away any other organisms that have stuck to them, popping them into labelled vials that are then flash-frozen in liquid nitrogen, and shipping them down to the Wellcome Sanger Institute just outside Cambridge, where large-scale DNA sequencing will happen.

Bryophytes are small, measuring in the order of millimetres to centimetres, and working with small things can be challenging. The keen field bryologist is frequently to be found on their hands and knees, bottom raised like Bishop Brennan in an infamous episode of Father Ted, peering through their hand lens at some tiny smudge of green; these are plants that you’re more likely to step on than over, plants that often have to be magnified just to be identified.

 A field bryologist hard at work, British Bryological Society spring meeting, Worcestershire 2004; credit Tessa Carrick

A single clump of a liverwort or moss can contain several intertwined bryophyte species, but also the fungi or algae that live on or within the plants, and the tiny creatures that call them home – one of the common names for tardigrades is moss piglets, after all. While this mixture might seem like a serious problem, dealing with DNA sequences from things that are very different is actually less of a challenge than separating things that are closely related. If you have sequences from a moss and a tardigrade, they can be sorted by the make-up of their DNA; with mixtures of sequences from different individuals of a single species, the problem is far harder because the sequences are very similar. And this is where one of the bigger challenges of working with bryophytes comes in. With an oak tree it’s simple enough to pick a few leaves from one individual plant. With bryophytes, we have a very incomplete understanding of exactly what an “individual” is when it comes to clumps or cushions of plants, where different stems can represent clones of one individual, siblings, or unrelated individuals. Without DNA sequencing the plants, the only way to be sure that two stems are from the same genetic individual is if they are physically connected, and often these plants grow from the tips with the older parts decaying, making connections between stems very difficult to trace.
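
As a rough illustration of sorting sequences by the make-up of their DNA, the Python sketch below splits reads by GC content (the proportion of G and C bases). This is only one possible measure of composition, chosen here as an assumption; the cutoff value and the example reads are invented, and real decontamination tools combine several lines of evidence.

```python
def gc_fraction(seq):
    """Proportion of bases in the sequence that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for base in seq if base in "GC") / len(seq)


def split_by_gc(reads, cutoff):
    """Split reads into (below cutoff, at or above cutoff) by GC fraction."""
    low, high = [], []
    for read in reads:
        (high if gc_fraction(read) >= cutoff else low).append(read)
    return low, high


# Invented example reads; the 0.5 cutoff is arbitrary.
low_gc, high_gc = split_by_gc(["AATT", "GGCC", "ATGC"], cutoff=0.5)
print(low_gc, high_gc)
```

Two individuals of the same species would have nearly identical composition, which is why this kind of sorting does not help with the harder problem described above.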

Mixed clump of liverworts – Featherwort Plagiochila carringtonii and Spoonwort Pleurozia purpurea – on Ben Lui, October 2017, credit Dr Neil Bell

The RBGE GAL will be obtaining as much living material as possible for our bryophyte species, as there are several techniques being used in the project, each with different requirements. For each bryophyte species we want some plant tissue for genome and transcriptome sequencing at the Wellcome Sanger Institute, some plant tissue for DNA barcoding here at RBGE, some plant tissue for genome sizing by flow cytometry at RBG Kew, and we also need pieces of the plant to form a dried voucher specimen that will be digitized then stored in the RBGE herbarium. That’s rather a lot more pieces than a typical bryophyte stem can realistically be split into, and this will mean that the bryophyte samples will often have to be treated a bit differently than some of the larger plant species.

The main goal of the project is to obtain and sequence high molecular weight DNA for all species. It is possible to get enough high molecular weight DNA to sequence whole genomes from a single mosquito. However, when working with plants we prefer to start with rather a lot more material – in another project we used 10-20 grams of leaves from the African violet Streptocarpus in the DNA extractions, equivalent to 4,000-8,000 average-sized mosquitoes. Unfortunately bryophytes don’t grow as big as Streptocarpus, so we will either have to use horticultural methods to obtain lots of clonal growth, or our laboratory techniques will have to improve. While most of the molecular lab work will take place at the Wellcome Sanger Institute, using flash-frozen plant material that we send down from Edinburgh, there will also be some protocol development and testing work carried out by the Scientific and Technical services team at RBGE.
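
The mosquito comparison can be checked with a couple of lines of Python. The sketch uses only the figures quoted above (10-20 grams of leaf, equivalent to 4,000-8,000 mosquitoes); the function name is ours, and the per-mosquito mass is simply what those figures imply, not a measured value.

```python
def implied_mosquito_mass_mg(leaf_mass_g, mosquitoes):
    """Mass per mosquito, in milligrams, implied by the stated equivalence."""
    return leaf_mass_g * 1000 / mosquitoes


# Lower and upper ends of the range quoted in the text.
print(implied_mosquito_mass_mg(10, 4_000))
print(implied_mosquito_mass_mg(20, 8_000))
```

Both ends of the range imply the same figure of 2.5 milligrams per mosquito, so the comparison is internally consistent.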

Once the genomes have been produced, they have to be annotated, marking on where different genes are found. To do this, the transcriptome – or the expressed RNA from the genes – is captured, sequenced and mapped back onto the DNA genome. This will all be done at the Wellcome Sanger Institute, using RNA extracted from frozen plant material. In this case, the transcriptome can be generated from a different individual plant than the one that was used for the genome; for most bryophytes we will send down samples from several individuals that can be used to generate some of this supplementary information. 

In order to check that plants have been identified correctly, and also to allow sample tubes to be identity-checked in the molecular laboratory, small bits of sequence data called DNA barcodes will be generated. The plant DNA barcoding work will be done at RBGE, using plant tissue that’s been dried using a desiccant and can then be stored at room temperature. Most of the bryophyte DNA barcoding work that we do at RBGE uses DNA that has been extracted from multiple bryophyte stems; this does not usually cause problems as DNA barcoding uses data from a standard set of genes that are conserved within species, including a gene for the essential photosynthesis enzyme RuBisCO. For bryophytes, in most cases we will not be barcoding the exact individual that has its genome sequenced, but we will usually work with material from the same patch of plants.

Plants with large genomes can be more difficult to work with and can need more data. So that resources can be targeted efficiently, the DToL project uses a technique called flow cytometry that measures the sizes of nuclei. This will be carried out at the Royal Botanic Gardens, Kew. At the RBGE GAL we will prepare parcels of living plants (packed in damp tissue paper) that will be posted down to Kew, providing them with enough material for replicate measurements to be taken. For our tiny bryophytes, again this will not be from the exact individual that has been sent for sequencing, but will be plants that grew in close proximity, so the data will represent genome size estimates for populations.

So that any mistakes in plant identification that might occur can be corrected at a later date, we always voucher our work by preserving a part of each sample so that it can be re-examined and re-identified. For plants, the most common way of doing this is using a herbarium specimen, created by rapidly drying a bit of the plant. Larger plants are squashed between sheets of absorbent paper, forming a brittle 2D structure that can, with careful treatment, last for hundreds of years. Our bryophyte specimens are conceptually rather different, in that the herbarium collection will usually be a clump of the plant species, dried and preserved loose in an envelope. The specimen usually contains multiple individuals of the bryophyte, and frequently also includes bits of all sorts of other living things (seedlings, leaf litter, other bryophytes, tardigrades, beetles, worms…) as well: Bryophyte specimens can be rather more like community snapshots. This means that lots of different individuals can be vouchered by a single bryophyte herbarium packet, even though the individuals that were sampled are not in the packet, and the individuals in the packet have not been sampled.

A community in a packet – herbarium specimen of Aneura mirabilis, or Ghostwort, collected in England by Clifford Townsend (1963) and digitized by David Bell

Of course there are some phenomenal up-sides to working with bryophytes – for one, through organisations like the British Bryological Society we are close to having a complete list of the species that occur here, as well as comprehensive records of where they can be found. For another, the bryophyte life-cycle, unlike that of the ferns, conifers and flowering plants, is dominated by a haploid stage where only a single copy of the genome is present, simplifying some of the bioinformatic processes. And in addition, over the coming years the genome data from this project will spotlight some of our exceptional British and Irish bryological diversity, as researchers start finding amazing things hidden in the genomes of these lineages of diminutive plants that are so often overlooked in their natural habitats. 

By Dr Laura Forrest, Royal Botanic Garden Edinburgh

The hair-cap moss Polytrichum formosum photographed in the UK; credit Dr David Long

Tales from the GALS

A Moth in the Tree of Life at Sanger

Peach Blossom Thyatira batis and barcoded tube at Wytham (see last month’s blog) and tubes safely in the Tree of Life -80 freezer at Sanger. Images from Liam Crowley (left) and Mark Blaxter (right).

The life of a sample at the Tree of Life labs at the Wellcome Sanger Institute starts with an email forewarning us, for example, of the imminent arrival of carefully identified moth specimens from Wytham Woods in barcoded freezer vials. On the day, an email from stores summons Nancy from her desk to collect the freezer parcel, and she scans the vials, checks them against the detailed sample manifest and places them in the -80°C freezer. Most samples are then passed on to the Sanger Samples Management Facility, a carefully backed-up rank of freezers that holds not just the Tree of Life samples but thousands upon thousands of samples from other Sanger programmes in human genetics, cancer, cellular genetics, pathogens and microbes.

There the moth sample waits in the freezers for a short time while Nancy compiles the instructions for sequencing: Is the moth especially rare? What DNA extraction method should be used? How big is the genome likely to be and thus how much data do we need to generate? The sample is then processed to retrieve very long DNA, either by the Tree of Life lab team, or our colleagues in Sanger’s Scientific Operations. For example, Radka (from the Tree of Life lab team of Radka, Michelle, Clare, Robin and Harriet) might take the moth sample and pulverise it before digesting the protein and extracting the DNA. She will check the quality of the DNA samples using a FemtoPulse instrument, which uses very little sample (a blessing when the sample is very small) to accurately quantify and size fragments up to 165 kilobases (kb). We have extraction methods that work well for moths and beetles and mammals and flies, and we are improving the quality of extractions from plants and fungi.

Size analysis of a long DNA sample. The FemtoPulse instrument (left) estimates, with possibly spurious accuracy, that the size of the extracted DNA peaks at 148,446 bases (spectrogram on the right), and thus is excellent for making a long read library. Images from Mark Blaxter and Radka Platte.

Good quality DNA then moves into library production. Making a large-insert library for the Pacific Biosciences SEQUEL II instrument or the Oxford Nanopore PromethION instrument is part art and part routine. As with extractions, we currently share the load of library production between the Tree of Life team and Scientific Operations. For the moth, Radka will take some of the DNA, shear it to just the right length (usually between 13 and 18 kb) and perform the molecular biology steps that are needed to prepare it for sequencing.

The library is handed over to the Scientific Operations Long Read team to load onto the big machines, the SEQUEL II and PromethION sequencers. These technologies have changed what is possible in genomics, and are the basis of the confidence that we can generate genomes from our thousands of target species. The machines take from 24 hours to 3 days to run, producing tens of gigabases of raw data from each library. For the moth, we will need only one run of one of the sequencers to generate enough data for primary assembly.
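
Why one run is enough can be sketched with the usual depth-of-coverage arithmetic: total bases sequenced divided by genome size. In the Python sketch below, the 600 Mb genome size is the figure this article gives for the moth, while the 30 gigabase yield is an illustrative assumption standing in for “tens of gigabases”; the function name is ours.

```python
def sequencing_depth(yield_bases, genome_size_bases):
    """Average number of times each base of the genome is covered."""
    return yield_bases / genome_size_bases


MOTH_GENOME_BASES = 600_000_000       # 600 Mb, as stated for the moth
ASSUMED_RUN_YIELD = 30_000_000_000    # 30 Gb, an illustrative assumption

print(sequencing_depth(ASSUMED_RUN_YIELD, MOTH_GENOME_BASES))
```

Under that assumed yield, every position in the genome would be read about 50 times over on average. The article does not state what depth the team actually targets.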

Meanwhile, Mike and Matt in Scientific Operations prepare some special long-range sequencing libraries from unsheared DNA and remaining sample. 10X Genomics linked read cloud libraries generate data that allow us to jump over and resolve complicated repeats in the moth genome. Hi-C libraries capture the three dimensional arrangement of chromosomes in each nucleus of the moth, sampling DNA fragments that are close to each other in 3D space, but far apart on the linear, stretched-out chromosome. These 10X and Hi-C libraries generate data sets that are used to link long-read data into chromosomes. 10X and Hi-C data are generated on the fleet of Illumina sequencing instruments in Scientific Operations.
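
The way Hi-C data link pieces of sequence together can be illustrated with a small Python sketch. It assumes each Hi-C read pair has already been reduced to the names of the two stretches of sequence (contigs) its ends map to; the contig names and pairs are invented, and real scaffolding software also weighs distance, orientation and sequence length.

```python
from collections import Counter


def contact_counts(pairs):
    """Count Hi-C read pairs linking each unordered pair of contigs.

    Pairs whose two ends land on the same contig carry no linking
    information between contigs, so they are skipped.
    """
    counts = Counter()
    for contig_a, contig_b in pairs:
        if contig_a != contig_b:
            counts[tuple(sorted((contig_a, contig_b)))] += 1
    return counts


# Invented read pairs: each tuple names the contigs hit by the two ends.
PAIRS = [("c1", "c2"), ("c2", "c1"), ("c1", "c2"),
         ("c2", "c3"), ("c1", "c1"), ("c1", "c3")]

counts = contact_counts(PAIRS)
print(counts.most_common(1))
```

Contigs that share many contacts are likely to sit on the same chromosome, which is the signal used to group them.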

Pacific Biosciences SEQUEL II instruments (left) and the PromethION instrument (right) at Sanger Scientific Operations, running DToL samples night and day. Images from Mark Blaxter.

The SciOps team checks the data are of good quality, parks them on the Sanger’s (very) large hard drive system, and sends an email announcing the availability of another species’-worth of data.

Shane’s email inbox fills with messages about completed sequencing runs, and when all the moth’s data are ready he and his Tree of Life Assembly team (Marcela and Ksenia) kick off the process of assembly on the Sanger’s compute farm. This uses cutting-edge software to identify overlapping long reads, disentangle confusions that result from repeats and errors, and finally stitch everything together, first into contigs (stretches of contiguous AGCT sequence) and then into scaffolds (contigs that are ordered and oriented using long-range data). Only five years ago we would have struggled to generate assemblies with mean contig lengths over 50 kb. With the long-read Pacific Biosciences and Oxford Nanopore data we now get assemblies with mean contig lengths over 1 Megabase (Mb), frequently over 5 Mb and sometimes over 10 Mb. For species like our moth, which has a genome of 600 Mb, once Shane adds the 10X and Hi-C data, these assemblies fall into chromosomes.
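For readers curious about how contig lengths are summarised, here is a minimal sketch. The contig lengths are invented for illustration and are not from any real assembly. Alongside the mean contig length mentioned above, it also computes N50, a widely used companion statistic that this article does not itself quote: the contig length at which half of the assembled bases sit in contigs that long or longer.

```python
def mean_contig_length(lengths):
    """Arithmetic mean of contig lengths, in base pairs."""
    if not lengths:
        raise ValueError("need at least one contig")
    return sum(lengths) / len(lengths)


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths in base pairs (illustrative only).
contigs = [10_000_000, 8_000_000, 5_000_000, 3_000_000, 2_000_000, 2_000_000]

print(f"Mean contig length: {mean_contig_length(contigs) / 1e6:.1f} Mb")
print(f"N50: {n50(contigs) / 1e6:.1f} Mb")
```

The mean is pulled down by many small contigs, which is one reason assemblers often report N50 as well.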

From sequence to contig to scaffold to chromosomes: the genome of a moth comes together using Hi-C data. The denser colours on the plots show the links between the contigs from the genome inferred from Hi-C data – before Hi-C scaffolding on the left, and after on the right, which has 30 large scaffolds and a few smaller ones waiting to be linked together by the GRIT informaticians. We expect a moth to have ~30 chromosomes. Image from Shane McCarthy.

The assembly team then hands the newly-minted moth genome assembly over to Kerstin’s Genome Reference Informatics Team (GRIT: Kerstin, Joanna, Sarah, Ying, James, William, Jonathan, Alan, Damon). For the moth, Ying stress-tests the assembly with a battery of analyses, basically asking “Is this the best we can do?”. The results get handed over to Sarah, who blesses the unproblematic majority of the assembly, affirms some correct guesses, fixes the few errors and exports a quality assured assembly. James, the gatekeeper in GRIT, brokers submission of the genome assembly to the European Nucleotide Archive, part of the International Nucleotide Sequence Database Consortium, and presses the “release” button. 

The new moth genome emerges into the light of a new digital day, one of 1000 species of all kinds we will extract, sequence and assemble this year. To publish the genome and announce its availability to the community to use and analyse, we write a brief Genome Note for rapid publication in Wellcome Open Research (2). Nancy marks the genome “complete”.

Now for the next one.

Mark Blaxter

Tales from the GALS

Wytham Woods: the genomics of ecology and evolution

Ancient woodlands are the most biodiverse and complex terrestrial habitat in the UK. Home to thousands of iconic and specialist animals, plants and fungi, our ancient forests and woodlands are also deeply entwined with our cultural heritage. In recent decades, however, woodland cover has been eroded by land use change, and today just 2.4% of the UK is covered by ancient woodland: sites where forest cover has persisted for over 400 years, usually with management to some degree.

Wytham Woods cloaks a prominent hill above a sweeping bend in the River Thames. The 400 hectare (1000 acre) site is a mosaic of ancient semi-natural woodland, forest plantations, limestone grassland and other species-rich habitats. It has been owned and maintained by the University of Oxford since 1942, and is the site of some of the longest running ecological experiments and observations in the world. Wytham Woods has a rich fauna and flora, with over 500 species of plants and around 1000 recorded species of butterflies and moths, and teems with a diversity of birds and mammals.

As the Darwin Tree of Life project was being conceived, Wytham Woods rapidly emerged as a site for focussed and intensive sampling of terrestrial species for complete genome sequencing. In the earliest phase of the project, we concentrated our attention on sampling arthropods, especially a wide taxonomic spread of moths and a carefully chosen selection of hoverflies, dung beetles and spiders. Our core team (Liam Crowley, Peter Holland and Owen Lewis) has been crawling through vegetation, picking through dung and peering into light traps: identifying, photographing, cataloguing, freezing in barcoded cryovials and shipping specimens to the Tree of Life labs at the Wellcome Sanger Institute for DNA extraction. It has not been a solitary endeavour: we have benefitted enormously from the moth-trapping expertise of Douglas Boyes, and visits from hoverfly, dung fauna and spider specialists (Will Hawkes, František Sládeček, Lauren Sumner-Rooney and Alistair McGregor). Involvement of taxon experts is something we really want to encourage in the project, with forthcoming visits planned by specialist groups including the Dipterists’ Forum and the Earthworm Society of Britain. We have a rustic chalet in the middle of the woods, with accommodation for small groups of visitors and volunteers, a kitchen and labs: perfect for early morning or nocturnal work.

Black Arches Lymantria monacha

By January 2020, just a few months into the Darwin Tree of Life project, we had sent specimens of 221 arthropod species to the Sanger Institute. Not all will be turned into genome sequence, but a close look at the first few genome sequences assembled reveals the data quality to be astonishingly good. So what could we learn from Wytham Woods genome sequence data? And more generally, why focus part of a major sequencing project on ancient woodland? We think there are several reasons. First, it is incredibly efficient to focus sampling at a few sites. Second, the sequences will become key reference genomes for ecological and environmental studies through the 21st century. Our woodland fauna and flora are under threat due to land use change, invasive species, climate change and pathogen outbreaks. Understanding and predicting these changes, and possibly mitigating some of them, will require us to understand how each species responds to challenges at a cellular and molecular level. Such studies, including transcriptomic and proteomic analyses, will be greatly aided by reference genomes. Populations could also become fragmented or merged, and detecting this requires comparisons between individuals, something that reference genomes will make far easier. The third reason centres on evolution. Natural selection has adapted organisms to their environment through fixation of genetic change, and so hidden in the genome sequences will be clues to how evolution has shaped physiology, anatomy, life history, behaviour and other traits. There will surely be new genes, divergent sequences, genome duplications, horizontal gene transfers and much more: a deeper understanding of biodiversity is waiting to be discovered in Wytham Woods.

Peach Blossom Thyatira batis

Peter Holland, Owen Lewis, Liam Crowley