DToL News

New Tree of Life Gateway

We’re excited to announce that the Tree of Life Gateway has now been launched on Wellcome Open Research. Jonathan Threlfall and Mark Blaxter wrote an article on the Wellcome Open Research blog to mark the occasion; its text is reproduced below:

When Carl Linnaeus began to formalise the cataloguing of all life on earth over 250 years ago, he developed a ranked taxonomy to organise our understanding of every species. As knowledge of diversity, biology and evolutionary theory has grown, the Linnaean classification system has evolved alongside it. This Linnaean catalogue is one of the crowning achievements of science: an openly accessible, globally treasured and communally owned record of all of the books in the great library of life. The next step in comprehending the interconnectedness of all life is to read and understand the material in all these Linnaean books: decoding the genomes of all living organisms.

In 1977, Frederick Sanger and colleagues decoded the complete DNA sequence of a genome for the first time. The genome of the ΦX174 bacteriophage – a virus that infects E. coli – is made up of just 5,386 nucleotides. This ground-breaking work laid the foundations for dramatic advances in genomics, culminating in the human genome – 3.1 billion DNA base pairs deciphered over 13 years – which was completed in 2003. Forty-four years after Sanger revealed the first genome sequence, the Tree of Life programme at the Wellcome Sanger Institute, the research institute that bears his name, aims to advance the field of genomics yet further. In our programme, the genome sequences of thousands of diverse species will be rapidly determined to the highest quality to generate references of unparalleled accuracy.

The Tree of Life programme uses DNA sequencing to examine the diversity of complex organisms all around us, assembling high-quality genome sequences that can be used to understand the evolution of life and to aid the conservation of biodiversity. The programme delivers its goals through major projects, most notably the Darwin Tree of Life project. The Darwin project is a partnership with biodiversity and analysis colleagues throughout the UK, and aims to decipher the genome sequences of all of the approximately 70,000 eukaryotic species (plants, animals, fungi and single-celled species) that live on and around Britain and Ireland.

We are also generating genomes for other projects. These include the Aquatic Symbiosis Genomics Project, which will sequence the 1,000 participants in 500 symbioses between eukaryotic hosts and their symbiotic microbial ‘cobionts’ to understand their interconnected biology, and the Vertebrate Genomes Project, for which we produce reference vertebrate genomes. Tree of Life is part of the Earth BioGenome Project, which has the modest goal of attempting to catalogue and sequence all complex life on earth.

We announce the completion of each Tree of Life genome assembly with a Wellcome Open Research micropublication we call a Genome Note. Genome Notes summarise the origin of the specimen that was used for sequencing, the methods used to extract and sequence the genetic material, and the bioinformatic processes that were used to assemble and fine-tune the genome sequence to high quality. They assess the genome sequence using descriptive statistics and figures that demonstrate its quality and accuracy. However, they do not provide any formal analysis of the genome.

An example of a Genome Note graphical display showing the quality of the genome of Sciurus vulgaris, the red squirrel (https://doi.org/10.12688/wellcomeopenres.15679.1). The “snail plot” summarises metrics that describe the genome and, in the publication, links to an interactive version (see https://blobtoolkit.genomehubs.org/view/mSciVul1_1/dataset/mSciVul1_1/snail).

In addition to announcing to the world the compilation and availability of the genome, these notes provide citable credit for everyone involved in generating these sequences, from field collectors to infrastructure coordinators, ensuring that their vital contributions are recognised. Brief publications of this type are now common for reporting the genomes of bacteria, viruses and organelles (for example, the American Society for Microbiology’s “Microbiology Resource Announcements”), but the sequences of more complex species have usually been published as more in-depth, analytical research articles.

The sheer number of genomes that are being generated by Tree of Life means that writing a traditional research paper for each species is implausible and would only serve to delay the dissemination of these data. We at Tree of Life want to make these sequences public immediately so that they can be used by researchers at once, rather than embargoing them for years, restricting their utility while we complete analyses and then wait for the editorial boards of traditional journals to decide whether (and then when) they want to publish our work.

Publishing Genome Notes with Wellcome Open Research ensures that the data are available for reuse by all immediately without restriction, maximising their impact. The Wellcome Sanger Institute and Wellcome Open Research have iron-clad Open Data policies and stipulate adherence to the FAIR Data Principles. These policies make Wellcome Open Research a natural fit for us at Tree of Life, where we seek to publish all data, underlying and incidental, immediately and entirely openly. For example, we openly expose quality assessments and other metrics for our raw data and interim progress through ToLQC, and the Darwin Tree of Life Data Portal tracks species as they progress through the Tree of Life pipeline and links to all of the data as they become available in the European Nucleotide Archive and in the Ensembl genome database. The rapid publication model of Wellcome Open Research allows the genome sequences to be formally published and shared as quickly as possible, ensuring that researchers are aware they can make use of these data and replicate the methods used in their generation almost instantly.

One of the interesting aspects of this project with Wellcome Open Research is the effort to increase the efficiency with which these Genome Notes can be prepared and published. F1000, which powers the Wellcome Open Research publishing platform, is developing a new technology to provide a quicker, automated route to publication. This means that Genome Notes will ultimately be generated ‘straight from the sequencer’ and supported by a combination of machine and community review. More on this very soon!

The native range of the red squirrel, Sciurus vulgaris. The arrow shows the source location of the sequenced individual. Image by NordNordWest, Public domain, via Wikimedia Commons. https://commons.wikimedia.org/wiki/File:Sciurus_vulgaris_habitat.png

We hope that the Tree of Life Gateway will serve one additional purpose. As well as announcing the availability of practices and data associated with the programme and providing citable credit for all participants, we also want this Gateway to demonstrate to others that immediate and open data sharing is not only possible but essential if projects of this scale are to have the greatest possible impact.

The species we are sequencing in the Darwin Tree of Life and other projects are not “ours” to own, but part of a global patrimony. They have wide distributions and are often found in many countries or bioregions. By sequencing one individual of a species from a single location (for example, the Lancastrian red squirrel presented in one of our first Genome Notes), we are providing a reference for the whole of that species across its range (the red squirrel is found from Ireland and Portugal to Russia’s Kamchatka Peninsula, Korea and Japan).

We believe that it is through openness and transparency that the worldwide community of researchers and the lay public can be best engaged with projects such as this, and it is only through engagement that we can maximise the impact of the Tree of Life programme and its output.

The gateway launches with seven published Genome Notes. Among these is the majestic golden eagle (Aquila chrysaetos): measures are being taken to secure its populations and help increase its numbers, but the species remains threatened by illegal shooting and poisoning. Another is the native red squirrel (Sciurus vulgaris), which has been largely displaced by the American grey squirrel and is declining because of the fatal squirrelpox virus that grey squirrels carry. Make sure to browse through the other genomes, including the bat, otter and rat.


Release of full bee genome sequences creates a buzz

As we revealed in Liam Crowley’s blog yesterday, the Darwin Tree of Life (DToL) project is pleased to announce that we have released full genome sequences for three of the bumblebee species found in Britain and Ireland, with more coming soon. The DToL project ultimately aims to sequence all of the 70,000 species that make their homes here, on and around these islands. We will release these data openly to build the foundations for a new biology based on reference genome sequences; basic science, conservation, ecology, evolution and biotechnology will all benefit from our project.

Bumblebees are of particular interest and concern in our environment. DToL project lead Prof. Mark Blaxter says:

“Bumblebees are an iconic and important part of our ecosystems, and through their pollination services are essential to the productivity of agricultural crops. However, bumblebee populations are threatened globally, especially in temperate ecosystems. They are also fascinating organisms with social behaviour, complex immune systems, venoms of possible medical application, and many other features.

“The Darwin Tree of Life Project is proud to have been able to sequence three bumblebees in the first set of species we are analysing, and we look forward to seeing how these new data will be used by conservationists, ecologists and biologists in understanding and conserving these beautiful animals.”

We have already received a lot of positive feedback on the release from the wider community, many of whom are keen to use these sequences in their research. DToL partners at the Earlham Institute said:

“The Earlham Institute is excited to be working alongside Oxford University and the Wellcome Sanger Institute to research UK bumblebee species as part of the Darwin Tree of Life project. The groundbreaking work they have done to collect this core sample of UK bumblebee biodiversity, and to generate high-quality genome sequences, will further enable our ongoing work on bumblebee population biodiversity. We are studying the population structure and history of several of the species included in this release. The unprecedented quality and completeness of the genomes will allow us to ask these questions with an accuracy never thought possible before. As bumblebees play such an important role in our native pollinator assemblage, we are interested in understanding why some species are struggling in the UK countryside, whilst others are thriving. Our work will allow us to isolate parts of the genome that are particularly important to the health of contemporary populations, and hopefully this understanding can contribute to future conservation efforts to protect bumblebees across the UK.

“The release of the first DToL bumblebee genomes also presents an amazing opportunity to understand the unique biology of our native bees. Our lead researcher, Dr Calum Raine, is asking questions about the relationship between the way bees determine their sex and the way they evolve. His work, described in an article here, will benefit greatly from the availability of the genome sequences of more UK bumblebees. In comparative genomics, more genomes mean more explanatory power. This release will offer such an increase in power, and hopefully speed exciting insights into some of our most loved pollinators.”

Earlham Institute group leader Dr Wilfried Haerty echoed this sentiment, saying:

“We are extremely excited by the release of high-quality bumblebee genomes by the DToL, as it will allow a great step forward in our ongoing work investigating bumblebee population genetics and dynamics across the UK, and how their unique biology shapes the evolution of their genomes.”

There has also been excitement from the Bumblebee Conservation Trust (BBCT), the UK charity whose aims are to enhance the understanding of bumblebee ecology and conservation, increase the quality and quantity of bumblebee habitat, and inspire and enable a diverse range of people to take action for bumblebees. Dr Amy Plowman (Head of Conservation and Science at BBCT) said:

“We are really excited to see the first Bombus genomes added to the Darwin Tree of Life. Bumblebee researchers around the world will be able to use them to understand more about these wonderful species.”

To commemorate the release of these first bee genomes, we have collaborated with award-winning young artist Leon Jarman to distribute a print of his bumblebee painting to project partners. Leon, who donates a portion of his profits to the BBCT, says:

“I painted the bumble bee in the summer of 2020 using coloured inks and water.  I am a very big fan of bees and wasps so my friends and I started a Bee Society at school to help inform others about why we must look after our fluffy, flying friends and not be afraid of them.  My painting won first prize in an art competition in August 2020 and because of all the positive comments about my painting and requests for prints I decided to make some to sell.  I chose to donate 25% of all my profits to the Bumblebee Conservation Trust to help raise awareness and help protect them from dangers like harmful pesticides. 

“I am very grateful that my painting has been chosen for the bee section of the Tree of Life project.”

Bumblebee by Leon Jarman

(If you would like a copy of his picture, prints can be obtained from Leon’s Facebook page, Bee cooperative.)


Successful full genome sequencing of three bumblebee species

The Darwin Tree of Life team are delighted to announce the release of three complete bumblebee genomes this week. These high-quality, chromosomally complete reference genomes were produced from specimens collected at Wytham Woods, near Oxford. The genomes are among the first to be produced by the Darwin Tree of Life project and represent an important milestone in our mission to sequence the full genomes of all 70,000 species of eukaryotic organisms in Britain and Ireland. This was a highly collaborative endeavour, involving researchers from several institutions, including the University of Oxford, the Natural History Museum, the Sanger Institute and the European Nucleotide Archive.

Why sequence bumblebee genomes?

Bumblebees are a charismatic genus of large, furry, colourful bees which are globally important pollinators in both agricultural and wild ecosystems. They are particularly diverse across temperate regions, and many species are able to live at higher altitudes or fly in cooler conditions than other pollinator groups. Many species of bumblebee are, however, in decline, with at least two (arguably three) species already lost from Britain. In addition to their importance as pollinators, true bumblebees are social insects which exhibit complex behaviours, making them interesting and important model species for evolutionary studies. Bumblebees are also used as environmental indicators in toxicology studies.

Reference genomes not only allow the immediate investigation of the evolutionary history of a species, but are also a fundamental prerequisite for subsequent analysis of a wide range of biological questions. For example, a reference genome is required in order to allow the selection of target loci for resequencing of large numbers of individuals for a population genetics approach to conservation. Bumblebee genomes will provide insights into behaviour, diet, metabolism, kleptoparasitism, immunity and detoxification across the group (e.g. Sun et al., 2020).

Twenty-four of the ~250 bumblebee species found worldwide occur in the UK, including six species of ‘cuckoo bumblebees’ in the subgenus Psithyrus, which are social parasites. Fourteen of these 24 UK species have been recorded across the diverse habitats of Wytham Woods, making this site an ideal location to begin sequencing. Sampling focussed on males and workers to limit the impact of collection on bumblebee populations: in the bumblebee life cycle, only newly mated queens (or females of cuckoo species) overwinter, so limited removal of males and workers is unlikely to affect the overall population. Furthermore, Hymenoptera (the order to which all bees belong) are haplodiploid, meaning that males have half as many chromosomes as females (a single copy of each chromosome, inherited from the mother, rather than a pair with one copy from each parent). Because there is no second parental copy whose differences must be resolved during assembly, males are more straightforward to sequence and assemble.

Which species were sequenced?

Sequencing and assembly are complete for three of the Bombus species that occur at Wytham Woods: B. campestris (the field cuckoo bee), B. hortorum (the garden bumblebee) and B. pascuorum (the common carder bee). In addition to the three released this week, a further nine bumblebee species are already being sequenced by DToL: watch this space!

Bombus campestris – The field cuckoo bee

Field cuckoo bee. Photo by Liam Crowley

This is a cuckoo bumblebee, which takes over the nests of B. pascuorum (and probably also other carder bees). The public genome data for this species can be found here.

Bombus hortorum – The garden bumblebee

Garden bumblebee. Photo by Liam Crowley

This species is common in gardens. It is a fairly large bee with a very long proboscis (tongue), and therefore favours flowers with a deep corolla. The public genome data for this species can be found here.

Bombus pascuorum – The common carder bee

Common carder bee. Photo by Liam Crowley

This species is the most common and widespread of the UK carder bees, a group of bumblebees which nest on or just under the ground and cover the nest with moss (hence the ‘carder bee’ name). The public genome data for this species can be found here.

References/further reading

Edwards, M. and Jenner, M., 2005. Field Guide to the Bumblebees of Great Britain and Ireland (Ocelli).

Sun, C., Huang, J., Wang, Y., Zhao, X., Su, L., Thomas, G.W., Zhao, M., Zhang, X., Jungreis, I., Kellis, M. and Vicario, S., 2020. Genus-wide characterization of bumblebee genomes provides insights into their evolution and variation in ecological and behavioral traits. Molecular Biology and Evolution, 38(2). https://doi.org/10.1093/molbev/msaa240

By Dr Liam Crowley, postdoctoral field biologist at the University of Oxford who collected the specimens from Wytham Woods.


Darwin Tree of Life: looking back on 2020

Despite restrictions, 2020 has been a busy year for the Darwin Tree of Life Project. We take a look at some of this year’s achievements and highlights.

The Darwin Tree of Life (DToL) Project kicked off in late 2019 with the ambitious task of sequencing, assembling and annotating the genomes of around 60,000 British and Irish species over a ten-year period.

But when the COVID-19 pandemic hit in early 2020, many of the project’s plans were put on hold. Fieldwork, sampling and the processing of new specimens in the lab were hit hardest by the restrictions put in place to control the spread of the SARS-CoV-2 virus. Despite all this, many significant advances and discoveries were made as part of the DToL project throughout 2020. New species were recorded, DNA extraction methods were refined, and genome annotation became faster than ever before. A batch of 30 completed genomes was delivered to the public databases at the end of the year.

We take a look back at the work carried out within the DToL project over the last year and shine a light on a few of the biggest highlights of 2020.

Macropis europaea: the Yellow Loosestrife Bee

University of Oxford

This year many species were collected from the Wytham Woods ecological observatory, including new records and rare species. The biggest highlight of the year was the discovery and collection of the Yellow Loosestrife Bee, Macropis europaea. This species was recorded at Wytham for the first time this summer. It is a rare bee in the UK, restricted mainly to wetland sites in southern England. Furthermore, this species is currently the only representative collected for the project of the Melittidae, one of just six families of bees in the UK.

Yellow Loosestrife Bee. Credit: Liam Crowley

“I was so thrilled to find a population of Macropis thriving at Wytham,” says Liam Crowley, a post-doctoral researcher on the DToL project. “Not only is it the first melittid bee to be sequenced for the project, but it was also a species I had never encountered before, despite wanting to see it for a long time!”

M. europaea was also the first monolectic bee species collected for the project. This means it collects pollen from just a single species of flower – yellow loosestrife, Lysimachia vulgaris – which is a relatively unusual trait across UK bee species. Even more exceptionally, it collects floral oils from the yellow loosestrife flowers, to produce an oily wax with which it lines its underground nest cells.

This behaviour is unique amongst British bees, and is believed to assist in waterproofing the cells in order to protect the developing larvae from drowning in the saturated soils of wetland habitats.

The challenges of bryophytes

Royal Botanic Garden Edinburgh

The Royal Botanic Garden Edinburgh grows thousands of species of plants in its four garden sites. While COVID-19 restrictions limited work at wild locations, the Royal Botanic Garden Edinburgh team has benefited from access to the rich Living Collection of species held in care across these four sites.

“There have been opportunities to collect from bryophyte-rich woodland and moorland sites in the Scottish Borders. We have worked closely with the University of Edinburgh, Kew and the British Bryological Society to finalise species lists for the UK and Ireland,” said David Bell, Sample Co-ordinator for the DToL, Royal Botanic Garden Edinburgh.

Sample collection on Raven Craig. Credit: Shauna Hay

Bryophytes (mosses, liverworts and hornworts) bring their own challenges. Because of their diminutive size and their tendency to grow in mixed populations with other bryophytes, fungi, algae and invertebrates, sampling must yield enough relatively clean material of the target species.

They must be processed under a microscope to isolate the freshest material of the target species for genome sequencing, with additional samples prepared for DNA barcoding, genome sizing by flow cytometry and voucher herbarium specimens. Sampling sufficient material and targeting larger bryophyte species is particularly important during the early stages of the DToL project while protocols are still being developed.

Sampling sea life: seaweed, sea sponges and sea snails

The Marine Biological Association (MBA)

This year the MBA processed samples for 132 species and set up standard procedures for Macroalgae (seaweed), Porifera (sea sponges), Cnidaria (corals and anemones), Bryozoa (moss animals), Mollusca (sea snails and slugs), Echinodermata (starfish and sea cucumbers), and simple filter feeders such as Tunicata (sea squirts). The first shipment of 568 samples from 53 species was sent to the Wellcome Sanger Institute for genome sequencing in November 2020.

The MBA has also optimised DNA extraction and PCR protocols for many different species of seaweed. To date, they have collected 34 common species. They are also starting to collect protists, mostly single-celled eukaryotic organisms that are not considered animals, plants or fungi. Sixteen protist strains are currently being cultivated, while nine have been harvested for DNA extraction.

“Barcoding protocols are currently being developed at MBA by Helen Jenkins and Joanna Harley, and a wider conversation about cross-institutional protocols is occurring with the DToL project collaborators,” says Nova Mieszkowska, MBA Research Fellow. “The methods at MBA aim, firstly, to confirm identification to species level where possible and, secondly, to provide ‘deep’ phylogenetic information by methods such as building multigene trees.”

Data collection on the go

The Natural History Museum

In spite of the pandemic, the Natural History Museum (NHM) DToL team have had many highlights this year, including the successful development of a sample collection-to-barcode pipeline. The sampling team has completed the arthropod species list, and once lockdown was lifted, fieldwork trips took place. The team also undertook ad hoc collecting locally when possible. A total of 1,034 samples have been collected and are now stored in the NHM Molecular Collection Facility.

The data management team worked hard to put a sample data pipeline in place, setting up the Epicollect mobile app for in-field sample data entry. This app helps to ensure that sample data can be exported to the DToL sample tracking system (based on COPO) and stored in the NHM collections management system.

A barcoding pipeline was put in place: collected samples were successfully sequenced, barcodes were validated against the BOLD database, and the analysed data were then sent to the Sanger Institute. The NHM team is now fully trained to use their new PacBio Sequel machine, and they will be validating this system to increase barcoding throughput going forward.

COPO: a big data broker for the DToL

Earlham Institute (EI)

“COPO is something quite special and unique that the science community has long been missing,” says Dr Seanna McTaggart, the Earlham Institute’s (EI) DToL Programme Manager. “For too long, data has been locked away in lab notebooks, or in files on a computer.”

COPO – Collaborative Open Omics – changes that.

COPO is a big data broker for life science. Developed by the Davey Group at EI, COPO takes care of uploading the metadata that are essential for contextualising genomic data. It’s as simple as uploading a spreadsheet, and COPO then does the rest, making sure that the data are submitted to the correct public repository. In the case of DToL, that is EMBL-EBI’s European Nucleotide Archive (ENA).

“COPO ensures that metadata is validated,” said EI Research Software Engineer Alice Minotto in a recent interview. “This could be metadata such as taxonomy, which can be tricky as identifying organisms is not a fixed process. Names and species identification can change over time, and even within specific communities.

“Instead of having to check and submit this information manually, which would take a very long time, COPO automates the process. This makes it far less time consuming, easier, and eliminates errors.”

To find out more about COPO, contact Dr Felix Shaw and Alice Minotto via the COPO website.

Large-scale sampling and tricky, slimy species

Wellcome Sanger Institute

It has been a tumultuous year for Sanger’s DToL team as they started to set up large-scale DNA sampling and sequencing pipelines from scratch, only for coronavirus to shut down scientific operations for several months. Caroline Howard, Scientific Manager for Sanger’s Tree of Life Programme, says the team have done an outstanding job.

“I think one of our biggest achievements has to be that we’re now properly up and running, despite the disruption of coronavirus. The support from our colleagues in sequencing operations has been amazing, particularly Elizabeth Cook, Craig Corton, Karen Oliver and Mike Quail.”

Sanger now has a fully-functioning tracking system where samples from the same specimen are submitted for the various sequencing techniques required, at a rate of 20-30 species per week. People may think extracting and sequencing DNA is the same for all families and species, but in fact different taxa pose different challenges that have to be solved each time.

“We’ve had a lot of success processing butterfly and moth samples this year, but slimy species such as molluscs continue to be tricky. But we’ve come a long way. A great example of how far our pipelines have come is Patella pellucida, the blue-rayed limpet. This sample was collected by Sanger faculty at Millport, Scotland at the end of August. Within five weeks, it had been received in the lab, gone through sample management, validated using COPO, put through our protocols for DNA extraction and sub-sampling, and submitted for sequencing.”

The blue-rayed limpet (Patella pellucida) was one of the species sequenced using Sanger’s new DNA pipeline. Credit: Mark Blaxter

“We’re now assembling all of the data to reference genome standard. I think this represents an impressive turnaround time from collection to reference genome, and stands us in good stead to scale up in the year ahead.”

At the end of the year, the Sanger teams celebrated the formal release of the first 30 DToL species’ genome sequences to the European Nucleotide Archive. These assemblies are of uniformly high quality, with all the sequences assigned to chromosomes. Hundreds more are now in the sequencing, assembly and curation pipeline.

Illuminating nature’s dark matter: protists and single cell genomics

EI and University of Oxford

Protists make up the overwhelming majority of eukaryotic life but until now have remained relatively understudied. Researchers in the Hall group at EI and the Tom Richards lab at the University of Oxford are changing that, aiming to sample and decode the breadth of protist diversity across the British Isles.

That’s no easy task. ‘Protist’ is a word that describes a staggering range of lifeforms, some with genomes as small as a bacterium while others boast far greater complexity than that of the human genome. At EI, Dr Sally Warring has been working with the Single Cell Genomics team to coax the genetic information from this mysterious myriad of lifeforms.

Green algae colonies from an agar plate. Credit: Sally Warring

“Protists are so variable,” Warring explained to us in a recent interview. “Some have thick cell walls, some have glass cell walls, some have silica scales on them, some have starch – all these different things going on with their cell chemistry. This all makes DNA extraction, or the ability of an enzyme to work, highly variable.

“What I’m doing now is culturing protists to use Hi-C [a chromosome conformation capture technique], which looks at the proximity of DNA sequences to each other to get a better idea about the structure of genomic sequences. We’re trying to establish this in our single cell pipeline, possibly from metagenomic samples, to get better single cell genomes.”

Rapid access to the DToL genomes

EMBL’s European Bioinformatics Institute (EMBL-EBI)

One important goal of the DToL project is to make all of the newly sequenced genomes fully accessible to all researchers. Every genome sequence from the DToL project will be freely available through the European Nucleotide Archive (ENA), the database run by EMBL’s European Bioinformatics Institute (EMBL-EBI). Each genome sequence will also be annotated, stored and made available through the Ensembl genome browser. Both the ENA and Ensembl have made significant changes to their underlying processes to be as efficient as possible and keep up with the enormous scale of the DToL project.

These changes, driven by a need for rapid access to genome annotations at scale, led to the launch of Ensembl Rapid Release. Rapid Release is a lightweight, scalable version of the Ensembl genome browser designed to house annotations for species from DToL and other sequencing efforts.

Unlike the main Ensembl website, which updates every three months, Rapid Release is updated every two weeks with new species and annotations. As a result, downstream research can begin within weeks of the annotation being finalised – a huge benefit to the DToL project as the number of genomes begins to ramp up.

“Five months after the launch of Ensembl Rapid Release, we already have over 170 genomes from DToL and other projects,” says Fergal Martin, Vertebrate Annotation Coordinator at EMBL-EBI. “As we get more genomic and transcriptomic data from DToL we can now roll out the annotations on Rapid Release.”

These are just some of the DToL project’s achievements this year, and they are only the beginning. Thousands of new genomes will be sequenced in the coming years as the DToL project gears up to sequence entire ecosystems.

As the DToL project expands to collect and sequence more species, researchers can expect to see more new genomes released and made freely accessible. In the near future, the DToL project will also provide a great opportunity to bring people closer to nature and give us a better understanding of how we can protect our planet.

Members of the Sanger Tree of Life team on a sample collecting visit to Millport, Scotland. Credit: Mara Lawniczak


International Day of Biodiversity

Biodiversity is at the core of the Darwin Tree of Life (DToL) project, and today (May 22nd) marks the Convention on Biological Diversity’s International Day of Biodiversity. While we celebrate the variety of life on Earth every day, today seems like a great time to explain some of the benefits our project hopes to have for the broader environment. Our aim is to obtain high-quality genome sequences for each of the over 60,000 species of eukaryotic organisms in Britain and Ireland, a goal which includes all protists, plants, fungi, invertebrates and vertebrates. Our consortium brings together a number of Genome Acquisition Laboratories (GALs), including two botanic gardens (the Royal Botanic Gardens, Edinburgh and Kew), the Natural History Museum (who have written their own page on biodiversity) and the Marine Biological Association, as well as several universities and research institutions (the Universities of Cambridge, Edinburgh and Oxford, the Earlham Institute, the Wellcome Sanger Institute and EMBL-EBI).

DToL is a UK partner of the Earth BioGenome Project (EBP), a worldwide project to sequence all life on Earth. The EBP has three stated goals for its research: benefiting human welfare, protecting biodiversity and understanding ecosystems. These three goals apply equally to the aims of the Darwin Tree of Life. Though the goal of sequencing every species may seem like a fishing expedition, or an attempt to fill a Pokédex by “catching them all”, this type of genomic information can be invaluable, both for better understanding the species that share our islands and for informing and aiding their conservation and protection.

The Darwin Tree of Life project has already released genome notes for three mammal species: the red squirrel (Sciurus vulgaris), the grey squirrel (Sciurus carolinensis) and the Eurasian otter (Lutra lutra). All three illustrate the value of genomic information. The squirrel genomes (reported in the Washington Post) revealed the genetic sequences of two closely related and competing species. We hope that these genomes will provide valuable information about the genetic basis of immunity to squirrelpox in grey squirrels and in a minority of red squirrels. Understanding this means we can make the best conservation decisions, such as choosing the most genetically suitable individuals for breeding and reintroduction programmes to preserve red squirrel populations. Such information is valuable for any species subject to reintroduction: through knowledge of a species’ genetics, researchers can identify which individuals are likely to be most resilient in their environment. These individuals are the strongest candidates for reintroduction, because their better chance of survival gives the population the greatest possible chance to grow and thrive.

Understanding the genetic sequence of a species can also help us to understand the effects of environmental change and adaptation, whether naturally occurring or the result of human activity. The otter sample sequenced as part of DToL came from collaborators at Cardiff University, where the Otter Project undertakes a variety of studies focusing on the effects of pollution and disease on the UK otter population. The otter genome will provide further data on how pollution by chemicals found in pesticides affects otters. A further (non-DToL) example of genome sequencing that increased understanding of adaptation to extreme environments is the recently published Antarctic blackfin icefish genome. These icefish are one of only a handful of vertebrate species that lack red blood cells, and they also possess a number of other adaptations to extreme cold (such as genes that prevent ice damage, a natural internal antifreeze!). By sequencing the genome of a species, we can better understand the mechanisms that allow it to survive in its environment.

Though the examples in this article have been limited to vertebrates, DToL will soon have exciting stories to tell about many other species of flora and fauna, which make up the majority of the biodiversity of the UK (and indeed the Earth). Our namesake Darwin himself was noted for being incredibly fond of earthworms, beetles and barnacles! We hope to release genomes for the complete list of UK Lepidoptera (moths and butterflies) later this year, which will provide a fascinating comparative dataset for scientists who study these beautiful creatures. Through our project, we aim to provide researchers and naturalists with vital insights into their species of interest, allowing a deeper understanding of their adaptations to their environment and, we hope, providing the tools for their preservation. Within DToL, expert groups have worked to create lists of all the species we aim to sample, prioritising those of particular scientific interest. We look forward to bringing you more stories from a greater range of species as our work continues.

Black arches (Lymantria monacha), collected by our team at Wytham Woods

By Sophie Potter, Wellcome Sanger Institute


The Darwin Tree of Life Project and the COVID-19…

To all partners and collaborators,

The COVID-19 pandemic and associated public health measures mean that all of the institutions that are partners in the Darwin Tree of Life project have closed their physical doors, with staff working from home. This necessarily means that essentially all sample collection activities have ceased, and that no samples already in hand will be submitted for sequencing in the near future.

Despite this halt to collection and data generation activity the Darwin Tree of Life project is still running. We will be carrying out a series of research, documentation and bioinformatic tasks throughout the period of physical closure. We intend to return to full activity as soon as it is safe to do so, with improved data systems, more accurate species lists, streamlined analytic pipelines and a redoubled enthusiasm for sequencing the biota of Britain and Ireland.

The list of projects we will be tackling while working from home through the shutdown is being finalised but will include:

  • Work on the species inventory for Britain and Ireland: working on the checklists and delivering a much improved overview of the diversity of our environment.
  • Defining the full list of “first” target species (aiming to identify one species, and one backup species, to be sequenced to generate the reference genome for each taxonomic family).
  • Work on detailed per-taxon sampling procedures, with specific standard operating procedures developed for each of the major taxa.
  • Work on the collection, handling and display of sample metadata for all of the different groups of organisms we will be collecting.
  • Work on the improvement of assembly algorithms and the development of bioinformatic analysis pipelines for long-read and long-range data.
  • Delivering high-quality assemblies for all species for which we currently have sufficient data.
  • Releasing our first annotated genomes on Ensembl and, once these are ready, a landing page for the Darwin Tree of Life at https://projects.ensembl.org.

For all of these projects we welcome and encourage both cross-partner collaboration and collaboration with colleagues in the wider community who would like to take part. Please contact contact@darwintreeoflife.org if you would like to be involved.

Please cascade this message through your staff and to collaborators.

Stay safe and well.

Mark Blaxter
Tree of Life, Wellcome Sanger Institute
30th March 2020