Blog

Uncategorised

International Day of Biodiversity

Biodiversity is at the core of the Darwin Tree of Life (DToL) project, and today (May 22nd) marks the Convention on Biological Diversity’s International Day of Biodiversity. While we celebrate the variety of life on Earth every day, today seems like a great time to explain some of the benefits our project hopes to have for the broader environment. Our aim is to obtain high-quality genome sequences for each of the more than 60,000 species of eukaryotic organisms in Britain and Ireland, a goal which includes all protists, plants, fungi, invertebrates and vertebrates. Our consortium includes a number of different Genome Acquisition Laboratories (or GALs): two botanic gardens (RBG Edinburgh and Kew), the Natural History Museum (who have authored this page on biodiversity), and the Marine Biological Association, as well as several universities and research institutions (the Universities of Cambridge, Edinburgh and Oxford, the Earlham Institute, the Wellcome Sanger Institute and EMBL-EBI).

DToL is a UK partner of the Earth Biogenome Project (EBP), a worldwide project to sequence all life on Earth. The EBP has three stated goals for its research: benefiting human welfare, protecting biodiversity and understanding ecosystems. These three goals apply equally to the aims of the Darwin Tree of Life. Though the goal of sequencing every species may seem like a fishing expedition, or an attempt to fill a Pokédex by “catching them all”, this type of genomic information can be invaluable, both for better understanding the species that share our islands, and for informing and aiding their conservation and protection.

The Darwin Tree of Life project has already released genome notes for three mammal species: the red squirrel (Sciurus vulgaris), the grey squirrel (Sciurus carolinensis) and the Eurasian otter (Lutra lutra). All three of these species give great examples of the value of genomic information. The squirrel genomes (reported in the Washington Post) revealed the genetic sequences of two closely related and competing species. We hope that these genomes will provide valuable information about the genetic basis of immunity to squirrelpox in grey squirrels and in a minority of reds. Understanding this means we can make the best conservation decisions, such as choosing the most genetically suitable individuals for breeding and reintroduction programmes to preserve red squirrel populations. Such information is valuable for any species subject to reintroduction; through knowledge of the genetics of a species, researchers can identify which individuals are likely to be most resilient in their environment. These individuals are the strongest candidates for reintroduction, as their better chance of survival gives the population the greatest possible chance to increase and thrive.

Understanding the genetic sequence of species can also help us to understand the effects of environmental change and adaptation, be it naturally occurring or the result of human activity. The otter sample sequenced as part of DToL came from collaborators at Cardiff University, where the Otter Project undertakes a variety of studies focusing on the effects of pollution and disease on the UK otter population. The otter genome will provide further data on how pollution by chemicals found in pesticides affects otters. A further (non-DToL) example of genome sequencing that increased understanding of adaptation to extreme environments can be seen in the recently published Antarctic blackfin icefish genome. These icefish are one of only a handful of vertebrate species which lack red blood cells, and also possess a number of other adaptations to extreme cold (such as genes to prevent ice damage, a natural internal anti-freeze!). Through sequencing the genome of a species, we can come to a greater understanding of the mechanisms that allow it to survive in its environment.

Though the examples in this article have been limited to vertebrates, DToL will soon have exciting stories to tell about many other species of flora and fauna, which make up the majority of the biodiversity of the UK (and indeed the Earth). Our namesake Darwin himself was noted for being incredibly fond of earthworms, beetles and barnacles! We hope to release the genomes for the complete list of UK Lepidoptera (moths and butterflies) later this year, which will provide a fascinating comparative dataset for scientists who study these beautiful creatures. Through our project, we aim to provide researchers and naturalists with vital insights into their species of interest, allowing a deeper understanding of their adaptations to their environment, and also hopefully helping to provide the tools for their preservation. Within DToL we have expert groups which have worked to create lists of all the species we aim to sample, prioritising those with particular scientific interest. We look forward to bringing you more stories from a greater range of species as our work continues.

Black Arches Lymantria monacha – Collected by our team at Wytham Woods

By Sophie Potter, Wellcome Sanger Institute

Tales from the GALS

Being a Bryophyte GAL

Being a part of the Darwin Tree of Life project, genome sequencing the multicellular organisms of an entire island archipelago, has involved a major shift in the way we think and talk about the plants that we work on: the sizes of plants, and what constitutes an individual plant, have immediate practical implications for the project.

At the Royal Botanic Garden in Edinburgh (RBGE), as well as working on entire floras like the plants of Nepal, we focus on some taxonomic groups, including the biodiverse genera Begonia and Rhododendron. For the Darwin Tree of Life (DToL) project, our focus is on the bryophytes (mosses, liverworts and hornworts), another group of plants on which the RBGE holds considerable expertise. The island archipelago of the British Isles and Ireland contains approximately 1060 native species, with about 755 mosses, 300 liverworts and only four hornworts. There are also a few introduced species, including the rapidly-spreading southern hemisphere liverwort Lophocolea semiteres.

We are one of the Genome Acquisition Labs (GALs) for the DToL; as such, our job is to obtain fresh high-quality bryophyte specimens, as well as a few pivotal flowers, ferns and lichens. We will be collecting living plants, both from natural habitats and from within our own gardens, taking them into a lab to clean away any other organisms that have stuck to them, popping them into labelled vials that are then flash-frozen in liquid nitrogen, and shipping them down to the Wellcome Sanger Institute just outside Cambridge, where large-scale DNA sequencing will happen.

Bryophytes are small, measuring in the order of millimetres to centimetres, and working with small things can be challenging. The keen field bryologist is frequently to be found on their hands and knees, bottom raised like Bishop Brennan in an infamous episode of Father Ted, peering through their hand lens at some tiny smudge of green; these are plants that you’re more likely to step on than over, plants that often have to be magnified just to be identified.

 A field bryologist hard at work, British Bryological Society spring meeting, Worcestershire 2004; credit Tessa Carrick

A single clump of a liverwort or moss can contain several intertwined bryophyte species, but also the fungi or algae that live on or within the plants, and the tiny creatures that call them home – one of the common names for tardigrades is moss piglets, after all. While this mixture might seem like a serious problem, dealing with DNA sequences from things that are very different is actually less of a challenge than separating things that are closely related. If you have sequences from a moss and a tardigrade, they can be sorted by the make-up of their DNA; with mixtures of sequences from different individuals of a single species, the problem is far harder because the sequences are very similar. And this is where one of the bigger challenges of working with bryophytes comes in. With an oak tree it’s simple enough to pick a few leaves from one individual plant. With bryophytes, we have a very incomplete understanding of exactly what an “individual” is when it comes to clumps or cushions of plants, where different stems can represent clones of one individual, siblings, or unrelated individuals. Without DNA sequencing the plants, the only way to be sure that two stems are from the same genetic individual is if they are physically connected, and often these plants grow from the tips with the older parts decaying, making connections between stems very difficult to trace.
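To give a flavour of what sorting sequences “by the make-up of their DNA” could look like, here is a minimal Python sketch that bins reads by their GC fraction. This is only an illustration under stated assumptions: the post does not say which property is used, the reads and the 0.5 threshold are invented, and real decontamination work typically draws on more evidence than base composition alone.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among the A, C, G, T bases in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def split_by_gc(reads: list[str], threshold: float) -> tuple[list[str], list[str]]:
    """Split reads into (low-GC, high-GC) bins around a hypothetical threshold."""
    low = [r for r in reads if gc_fraction(r) < threshold]
    high = [r for r in reads if gc_fraction(r) >= threshold]
    return low, high


# Invented toy reads: two AT-rich, two GC-rich.
reads = ["ATATTTAAGA", "GCGCGGCCTA", "AATTATATAC", "GGCCGCGCAT"]
low_gc, high_gc = split_by_gc(reads, threshold=0.5)
print("low GC:", low_gc)
print("high GC:", high_gc)
```

Two individuals of the same species would have near-identical base composition, so a split like this could not separate them, which is the harder problem described above.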

Mixed clump of liverworts – Featherwort Plagiochila carringtonii and Spoonwort Pleurozia purpurea – on Ben Lui, October 2017, credit Dr Neil Bell

The RBGE GAL will be obtaining as much living material as possible for our bryophyte species, as there are several techniques being used in the project, each with different requirements. For each bryophyte species we want some plant tissue for genome and transcriptome sequencing at the Wellcome Sanger Institute, some plant tissue for DNA barcoding here at RBGE, some plant tissue for genome sizing by flow cytometry at RBG Kew, and we also need pieces of the plant to form a dried voucher specimen that will be digitized then stored in the RBGE herbarium. That’s rather a lot more pieces than a typical bryophyte stem can realistically be split into, and this will mean that the bryophyte samples will often have to be treated a bit differently than some of the larger plant species.

The main goal of the project is to obtain and sequence high molecular weight DNA for all species. It is possible to get enough high molecular weight DNA to sequence whole genomes from a single mosquito. However, when working with plants we prefer to start with rather a lot more material – in another project we used 10-20 grams of leaves from the African violet Streptocarpus in the DNA extractions, equivalent to 4,000-8,000 average-sized mosquitoes. Unfortunately bryophytes don’t grow as big as Streptocarpus, so we will either have to use horticultural methods to obtain lots of clonal growth, or our laboratory techniques will have to improve. While most of the molecular lab work will take place at the Wellcome Sanger Institute, using flash-frozen plant material that we send down from Edinburgh, there will also be some protocol development and testing work carried out by the Scientific and Technical services team at RBGE.
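The mosquito comparison can be checked with a couple of lines of arithmetic. The sketch below takes the two ends of the leaf-mass range and the corresponding mosquito counts from the paragraph above and works out the per-mosquito mass they imply; the function name is just for illustration.

```python
def implied_mass_mg(total_grams: float, mosquito_count: int) -> float:
    """Mass per mosquito, in milligrams, implied by a total mass and a count."""
    return total_grams * 1000 / mosquito_count


# Lower and upper ends of the ranges quoted in the text.
for grams, count in [(10, 4_000), (20, 8_000)]:
    print(f"{grams} g / {count} mosquitoes = {implied_mass_mg(grams, count)} mg each")
```

Both ends of the range imply the same figure, so the comparison assumes an average mosquito of about 2.5 milligrams.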

Once the genomes have been produced, they have to be annotated, marking where the different genes are found. To do this, the transcriptome – or the expressed RNA from the genes – is captured, sequenced and mapped back onto the DNA genome. This will all be done at the Wellcome Sanger Institute, using RNA extracted from frozen plant material. In this case, the transcriptome can be generated from a different individual plant than the one that was used for the genome; for most bryophytes we will send down samples from several individuals that can be used to generate some of this supplementary information.

In order to check that plants have been identified correctly, and also to allow sample tubes to be identity-checked in the molecular laboratory, small bits of sequence data called DNA barcodes will be generated. The plant DNA barcoding work will be done at RBGE, using plant tissue that’s been dried using a desiccant and can then be stored at room temperature. Most of the bryophyte DNA barcoding work that we do at RBGE uses DNA that has been extracted from multiple bryophyte stems; this does not usually cause problems as DNA barcoding uses data from a standard set of genes that are conserved within species, including a gene for the essential photosynthesis enzyme RuBisCO. For bryophytes, in most cases we will not be barcoding the exact individual that has its genome sequenced, but we will usually work with material from the same patch of plants.
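As a rough picture of how a barcode can be used for an identity check, the Python sketch below compares a barcode read from a sample tube against a reference barcode for the expected species. It is a simplification under several assumptions: the sequences are already aligned and of equal length, the sequences themselves are invented, and the 99% cut-off is a hypothetical value rather than a threshold used by the project.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


def matches_reference(tube: str, reference: str, min_identity: float = 99.0) -> bool:
    """True if the tube barcode is at least min_identity percent identical to the reference."""
    return percent_identity(tube, reference) >= min_identity


reference = "ACGTACGTAC"       # invented reference barcode
tube_ok = "ACGTACGTAC"         # identical to the reference
tube_suspect = "ACGTACGTAA"    # differs at one position in ten

print(percent_identity(tube_ok, reference), matches_reference(tube_ok, reference))
print(percent_identity(tube_suspect, reference), matches_reference(tube_suspect, reference))
```

Because the barcode regions are conserved within a species, pooling stems from the same patch should give the same answer as a single stem, which is why mixed-stem extractions rarely cause problems here.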

It can be more difficult to work with plants that have large genomes, and more data can be needed for them. So that resources can be targeted efficiently, the DToL project uses a technique called flow cytometry, which measures the sizes of nuclei. This will be carried out at the Royal Botanic Gardens, Kew. At the RBGE GAL we will prepare parcels of living plants (packed in damp tissue paper) that will be posted down to Kew, providing them with enough material for replicate measurements to be taken. For our tiny bryophytes, again this will not be from the exact individual that has been sent for sequencing, but will be plants that grew in close proximity, so the data will represent genome size estimates for populations.
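The post does not describe how a flow cytometry reading becomes a genome size, but a common general approach is to run the sample alongside a standard of known genome size and scale by the ratio of their fluorescence peaks. The Python sketch below illustrates that ratio and averages a few replicates; all numbers are invented and this should not be read as the protocol used at Kew.

```python
def estimate_genome_size(sample_peak: float, standard_peak: float,
                         standard_size: float) -> float:
    """Scale a known standard's genome size by the sample/standard peak ratio."""
    if standard_peak <= 0:
        raise ValueError("standard_peak must be positive")
    return standard_size * sample_peak / standard_peak


def mean_estimate(replicates: list[tuple[float, float]], standard_size: float) -> float:
    """Average the estimates from several (sample_peak, standard_peak) replicates."""
    estimates = [estimate_genome_size(s, ref, standard_size) for s, ref in replicates]
    return sum(estimates) / len(estimates)


# Invented replicate readings: (sample peak, standard peak), arbitrary fluorescence units.
replicates = [(150.0, 100.0), (300.0, 200.0), (75.0, 50.0)]
print(mean_estimate(replicates, standard_size=2.0))
```

The result is in whatever units the standard's size is given in, and for bryophytes it would describe the sampled patch rather than the sequenced individual.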

So that any mistakes in plant identification that might occur can be corrected at a later date, we always voucher our work by preserving a part of each sample so that it can be re-examined and re-identified. For plants, the most common way of doing this is using a herbarium specimen, created by rapidly drying a bit of the plant. Larger plants are squashed between sheets of absorbent paper, forming a brittle 2D structure that can, with careful treatment, last for hundreds of years. Our bryophyte specimens are conceptually rather different, in that the herbarium collection will usually be a clump of the plant species, dried and preserved loose in an envelope. The specimen usually contains multiple individuals of the bryophyte, and frequently also includes bits of all sorts of other living things (seedlings, leaf litter, other bryophytes, tardigrades, beetles, worms…) as well: bryophyte specimens can be rather more like community snapshots. This means that lots of different individuals can be vouchered by a single bryophyte herbarium packet, even though the individuals that were sampled are not in the packet, and the individuals in the packet have not been sampled.

A community in a packet – herbarium specimen of Aneura mirabilis, or Ghostwort, collected in England by Clifford Townsend (1963) and digitized by David Bell

Of course there are some phenomenal up-sides to working with bryophytes – for one, through organisations like the British Bryological Society we are close to having a complete list of the species that occur here, as well as comprehensive records of where they can be found. For another, the bryophyte life-cycle, unlike that of the ferns, conifers and flowering plants, is dominated by a haploid stage where only a single copy of the genome is present, simplifying some of the bioinformatic processes. And in addition, over the coming years the genome data from this project will spotlight some of our exceptional British and Irish bryological diversity, as researchers start finding amazing things hidden in the genomes of these lineages of diminutive plants that are so often overlooked in their natural habitats. 

By Dr Laura Forrest, Royal Botanic Garden Edinburgh

The hair-cap moss Polytrichum formosum photographed in the UK; credit Dr David Long
Uncategorised

The Darwin Tree of Life Project and the COVID-19…

To all partners and collaborators,

The COVID-19 pandemic and associated public health measures mean that all of the institutions that are partners in the Darwin Tree of Life project have closed their physical doors, with staff working from home. This necessarily means that essentially all sample collection activities have ceased, and that no samples already in hand will be submitted for sequencing in the near future.

Despite this halt to collection and data generation activity the Darwin Tree of Life project is still running. We will be carrying out a series of research, documentation and bioinformatic tasks throughout the period of physical closure. We intend to return to full activity as soon as it is safe to do so, with improved data systems, more accurate species lists, streamlined analytic pipelines and a redoubled enthusiasm for sequencing the biota of Britain and Ireland.

The list of projects we will be approaching while working from home through the shutdown is being finalised but will include:

  • Work on the species inventory for Britain and Ireland: working on the checklists and delivering a much improved overview of the diversity of our environment.
  • Defining the full list of “first” target species (aiming to identify one species and one backup species to be sequenced to generate the reference genome for each taxonomic Family).
  • Work on detailed per-taxon sampling procedures, with specific standard operating procedures developed for each of the major taxa.
  • Work on the collection, handling and display of sample metadata for all of the different groups of organisms we will be collecting.
  • Work on the improvement of assembly algorithms and the development of bioinformatic analysis pipelines for long read and long range data.
  • Delivering high quality assemblies for all species for which we currently have sufficient data.
  • Releasing our first annotated genomes on Ensembl and, once these are ready, a landing page for the Darwin Tree of Life at https://projects.ensembl.org

For all of these projects we welcome and encourage both cross-partner collaboration, and also collaboration with colleagues in the wider community who would like to take part. Please contact contact@darwintreeoflife.org if you would like to be involved.

Please cascade this message through your staff and to collaborators.

Stay safe and well.

Mark Blaxter
Tree of Life, Wellcome Sanger Institute
30th March 2020

Tales from the GALS

A Moth in the Tree of Life at Sanger

Peach Blossom Thyatira batis and barcoded tube at Wytham (see last month’s blog) and tubes safely in the Tree of Life -80 freezer at Sanger. Images from Liam Crowley (left) and Mark Blaxter (right).

The life of a sample at the Tree of Life labs at the Wellcome Sanger Institute starts with an email forewarning us, for example, of the imminent arrival of carefully identified moth specimens from Wytham Woods in barcoded freezer vials. On the day, an email from stores summons Nancy from her desk to collect the freezer parcel, and she scans the vials, checks them against the detailed sample manifest and places them in the -80°C freezer. Most samples are then passed on to the Sanger Samples Management Facility, a carefully backed-up rank of freezers that holds not just the Tree of Life samples but thousands upon thousands of samples from other Sanger programmes in human genetics, cancer, cellular genetics, pathogens and microbes.
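The check of scanned vials against the manifest is, at heart, a comparison of two sets of barcodes. The Python sketch below shows that idea only; the barcode format and the function name are made up for illustration and do not describe the actual Sanger sample-tracking system.

```python
def check_against_manifest(scanned: list[str], manifest: list[str]) -> dict[str, list[str]]:
    """Report vials listed in the manifest but not scanned, and vice versa."""
    scanned_set, manifest_set = set(scanned), set(manifest)
    return {
        "missing": sorted(manifest_set - scanned_set),
        "unexpected": sorted(scanned_set - manifest_set),
    }


# Hypothetical vial barcodes.
manifest = ["VIAL0001", "VIAL0002", "VIAL0003"]
scanned = ["VIAL0001", "VIAL0003", "VIAL0009"]

report = check_against_manifest(scanned, manifest)
print(report)
```

An empty report in both categories would mean the parcel matches its paperwork and the vials can go straight into the freezer.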

There the moth sample waits in the freezers for a short time while Nancy compiles the instructions for sequencing: Is the moth especially rare? What DNA extraction method should be used? How big is the genome likely to be and thus how much data do we need to generate? The sample is then processed to retrieve very long DNA, either by the Tree of Life lab team, or our colleagues in Sanger’s Scientific Operations. For example, Radka (from the Tree of Life lab team of Radka, Michelle, Clare, Robin and Harriet) might take the moth sample and pulverise it before digesting the protein and extracting the DNA. She will check the quality of the DNA samples using a FemtoPulse instrument, which uses very little sample (a blessing when the sample is very small) to accurately quantify and size fragments up to 165 kilobases (kb). We have extraction methods that work well for moths and beetles and mammals and flies, and we are improving the quality of extractions from plants and fungi.

Size analysis of a long DNA sample. The FemtoPulse instrument (left) estimates, with possibly spurious accuracy, that the size of the extracted DNA peaks at 148,446 bases (spectrogram on the right), and thus is excellent for making a long read library. Images from Mark Blaxter and Radka Platte.

Good quality DNA then moves into library production. Making a large-insert library for the Pacific Biosciences SEQUEL II instrument or the Oxford Nanopore PromethION instrument is part art and part routine. As with extractions, we currently share the load of library production between the Tree of Life team and Scientific Operations. For the moth, Radka will take some of the DNA, shear it to just the right length (usually between 13 and 18 kb) and perform the molecular biology steps that are needed to prepare it for sequencing.

The library is handed over to the Scientific Operations Long Read team to load onto the big machines, the SEQUEL II and PromethION sequencers. These technologies have changed what is possible in genomics, and are the basis of the confidence that we can generate genomes from our thousands of target species. The machines take from 24 hours to 3 days to run, producing tens of gigabases of raw data from each library. For the moth, we will need only one run of one of the sequencers to generate enough data for primary assembly.
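The link between genome size and the amount of data needed comes down to simple multiplication: bases required equals genome size times the depth of coverage wanted. The Python sketch below applies that to the 600 Mb moth genome quoted later in this post. The 30-fold coverage target and the 20 gigabase yield per run are assumptions chosen for illustration (the text says only “tens of gigabases”), not project settings.

```python
def required_bases(genome_size_bp: int, coverage: int) -> int:
    """Total bases of sequence needed to cover the genome `coverage` times over."""
    return genome_size_bp * coverage


def runs_needed(genome_size_bp: int, coverage: int, yield_per_run_bp: int) -> int:
    """Number of sequencer runs needed, rounding up to a whole run."""
    needed = required_bases(genome_size_bp, coverage)
    return -(-needed // yield_per_run_bp)  # ceiling division on integers


moth_genome_bp = 600_000_000       # 600 Mb, from the text
target_coverage = 30               # assumed depth, for illustration only
yield_per_run_bp = 20_000_000_000  # assumed 20 Gb per run, for illustration only

print(required_bases(moth_genome_bp, target_coverage))
print(runs_needed(moth_genome_bp, target_coverage, yield_per_run_bp))
```

Under these assumed numbers a single run is enough, which fits with the statement above; a species with a much larger genome would need proportionally more.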

Meanwhile, Mike and Matt in Scientific Operations prepare some special long-range sequencing libraries from unsheared DNA and remaining sample. 10X Genomics linked read cloud libraries generate data that allow us to jump over and resolve complicated repeats in the moth genome. Hi-C libraries capture the three dimensional arrangement of chromosomes in each nucleus of the moth, sampling DNA fragments that are close to each other in 3D space, but far apart on the linear, stretched-out chromosome. These 10X and Hi-C libraries generate data sets that are used to link long-read data into chromosomes. 10X and Hi-C data are generated on the fleet of Illumina sequencing instruments in Scientific Operations.

Pacific Biosciences SEQUEL II instruments (left) and the PromethION instrument (right) at Sanger Scientific Operations, running DToL samples night and day. Images from Mark Blaxter.

The SciOps team checks the data are of good quality, parks them on the Sanger’s (very) large hard drive system, and sends an email announcing the availability of another species’-worth of data.

Shane’s email inbox fills with messages about completed sequencing runs, and when all the moth’s data are ready he and his Tree of Life Assembly team (Marcela and Ksenia) kick off the process of assembly on the Sanger’s compute farm. This uses cutting edge software to identify overlapping long reads, disentangle confusions that result from repeats and errors, and finally stitch everything together first of all into contigs (stretches of contiguous AGCT sequence) and then into scaffolds (contigs that are ordered and oriented using long-range data). Only five years ago we would have struggled to generate assemblies with mean contig lengths over 50 kb. With the long read Pacific Biosciences and Oxford Nanopore data we now get assemblies with mean contig lengths over 1 Megabase (Mb), frequently over 5 Mb and sometimes over 10 Mb. For species like our moth, which has a genome of 600 Mb, once Shane adds the 10X and Hi-C data, these assemblies fall into chromosomes. 
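For readers who want to see how contig-length figures like these are calculated, here is a small Python sketch that computes the mean contig length for an assembly. It also computes N50, a statistic commonly reported for assemblies but not mentioned in the text above, so treat it as an extra. The contig lengths are invented.

```python
def mean_length(lengths: list[int]) -> float:
    """Arithmetic mean of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    return sum(lengths) / len(lengths)


def n50(lengths: list[int]) -> int:
    """Length L such that contigs of length >= L contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs")


# Invented contig lengths in bases (10 Mb in total).
contigs = [5_000_000, 3_000_000, 1_000_000, 500_000, 500_000]
print(mean_length(contigs))
print(n50(contigs))
```

The mean is pulled down by many short contigs, whereas N50 reflects where most of the sequence actually sits, which is one reason both kinds of number tend to be quoted.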

From sequence to contig to scaffold to chromosomes: the genome of a moth comes together using Hi-C data. The denser colours on the plots show the links between the contigs from the genome inferred from Hi-C data – before Hi-C scaffolding on the left, and after on the right, which has 30 large scaffolds and a few smaller ones waiting to be linked together by the GRIT informaticians. We expect a moth to have ~30 chromosomes. Image from Shane McCarthy.

The assembly team then hands the newly-minted moth genome assembly over to Kerstin’s Genome Reference Informatics Team (GRIT: Kerstin, Joanna, Sarah, Ying, James, William, Jonathan, Alan, Damon). For the moth, Ying stress-tests the assembly with a battery of analyses, basically asking “Is this the best we can do?”. The results get handed over to Sarah, who blesses the unproblematic majority of the assembly, affirms some correct guesses, fixes the few errors and exports a quality assured assembly. James, the gatekeeper in GRIT, brokers submission of the genome assembly to the European Nucleotide Archive, part of the International Nucleotide Sequence Database Consortium, and presses the “release” button. 

The new moth genome emerges into the light of a new digital day, one of 1000 species of all kinds we will extract, sequence and assemble this year. To publish the genome and announce its availability to the community to use and analyse, we write a brief Genome Note for rapid publication in Wellcome Open Research (2). Nancy marks the genome “complete”.

Now for the next one.

Mark Blaxter

Genome Note

Grey Squirrel Genome Note

The genome assembly for the Eastern Grey Squirrel (Sciurus carolinensis) has now been released on Wellcome Open Research: https://wellcomeopenresearch.org/articles/5-27

This follows the recent release of the red squirrel genome assembly, and together these genomes may have implications for squirrel conservation both in the UK and elsewhere: https://www.sanger.ac.uk/news/view/red-and-grey-squirrel-genomes-could-hold-key-survival-reds-britain-and-ireland

Tales from the GALS

Wytham Woods: the genomics of ecology and evolution

Ancient woodlands are the most biodiverse and complex terrestrial habitat in the UK. Home to thousands of iconic and specialist animals, plants and fungi, our ancient forests and woodlands are also deeply entwined with our cultural heritage. In recent decades, however, woodland cover has been eroded by land use change, and today just 2.4% of the UK is covered by ancient woodland: sites where forest cover has persisted for over 400 years, usually with management to some degree.

Wytham Woods cloaks a prominent hill above a sweeping bend in the River Thames. The 400 hectare (1000 acre) site is a mosaic of ancient semi-natural woodland, forest plantations, limestone grassland and other species-rich habitats. It has been owned and maintained by the University of Oxford since 1942, and is the site of some of the longest running ecological experiments and observations in the world. Wytham Woods has a rich fauna and flora, with over 500 species of plants and around 1000 recorded species of butterflies and moths, and teems with a diversity of birds and mammals.

As the Darwin Tree of Life project was being conceived, Wytham Woods rapidly emerged as a site for focussed and intensive sampling of terrestrial species for complete genome sequencing. In the earliest phase of the project, we concentrated our attention on sampling arthropods, especially a wide taxonomic spread of moths and a carefully chosen selection of hoverflies, dung beetles and spiders. Our core team (Liam Crowley, Peter Holland and Owen Lewis) has been crawling through vegetation, picking through dung and peering into light traps: identifying, photographing, cataloguing, freezing in barcoded cryovials and shipping specimens to the Tree of Life labs at the Wellcome Sanger Institute for DNA extraction. It has not been a solitary endeavour: we have benefitted enormously from the moth-trapping expertise of Douglas Boyes, and visits from hoverfly, dung fauna and spider specialists (Will Hawkes, František Sládeček, Lauren Sumner-Rooney and Alistair McGregor). Involvement of taxon experts is something we really want to encourage in the project, with forthcoming visits planned by specialist groups including the Dipterists’ Forum and the Earthworm Society of Britain. We have a rustic chalet in the middle of the woods, with accommodation for small groups of visitors and volunteers, a kitchen and labs – perfect for early morning or nocturnal work.

Black Arches Lymantria monacha

By January 2020, just a few months into the Darwin Tree of Life project, we had sent specimens of 221 arthropod species to the Sanger Institute. Not all will be turned into genome sequence, but a close look at the first few genome sequences assembled reveals the data quality to be astonishingly good. So what could we learn from Wytham Woods genome sequence data? And more generally, why focus part of a major sequencing project on ancient woodland? We think there are several reasons. First, it is incredibly efficient to focus sampling at a few sites. Second, the sequences will become key reference genomes for ecological and environmental studies through the 21st century. Our woodland fauna and flora are under threat due to land use change, invasive species, climate change and pathogen outbreaks. Understanding and predicting these changes, and possibly mitigating some of them, will require us to understand how each species responds to challenges at a cellular and molecular level. Such studies, including transcriptomic and proteomic analyses, will be greatly aided by reference genomes. Populations could also become fragmented or merged, and to detect this, comparisons need to be made between individuals, something that will be facilitated by reference genomes. The third reason centres on evolution. Natural selection has adapted organisms to their environment through fixation of genetic change, and so hidden in the genome sequences will be clues to how evolution has shaped physiology, anatomy, life history, behaviour and other traits. There will surely be new genes, divergent sequences, genome duplications, horizontal gene transfers and much more: a deeper understanding of biodiversity is waiting to be discovered in Wytham Woods.

Peach Blossom Thyatira batis

Peter Holland, Owen Lewis, Liam Crowley

Genome Note

Red Squirrel Genome Note

The genome note for the Red Squirrel (Sciurus vulgaris) has now been released on Wellcome Open Research: https://wellcomeopenresearch.org/articles/5-18

The release of this note, and the upcoming Grey Squirrel genome, was covered on the Wellcome Sanger news page, highlighting the potential conservation importance of this work: https://www.sanger.ac.uk/news/view/red-and-grey-squirrel-genomes-could-hold-key-survival-reds-britain-and-ireland