Description
Lineage B.1.1.7 (Rambaut et al.), also known as 20B/501Y.V1 (Nextstrain) and
Variant of Concern (VOC) 202012/01 (Public Health England), spread rapidly in England
in November and December 2020 (Volz et al.).
It carries a large number of mutations, including non-synonymous substitutions and deletions.
As of Jan. 4, 2021, over 7,000 sequences from 29 countries had been submitted to GISAID.
The first confirmed B.1.1.7 sequence in the United States was announced on Dec. 29, 2020.
This track shows single-nucleotide substitutions and deletions relative to the SARS-CoV-2
reference genome in the B.1.1.7 consensus sequence and in the first nine genome sequences
from the United States that were assigned to B.1.1.7.
The track was generated using hgPhyloPlace, the Genome Browser's web front end to
UShER (Turakhia et al.; see Methods below).
UShER places uploaded sequences in a global phylogenetic tree and also extracts subtrees
showing each sample's local phylogenetic context. hgPhyloPlace generates a JSON file for
each subtree, which can be displayed using nextstrain.org. The first nine U.S.
B.1.1.7 sequences were placed in five clusters that correlate with geographic location.
Display Conventions
In "dense" mode, a vertical line is drawn at each position where there is a mutation.
In "squish" and "pack" modes, the display shows a plot of all samples' mutations, with
samples ordered by the phylogenetic tree to highlight patterns of linkage.
In "full" mode, each mutation is shown on its own row, ordered by position rather than by lineage.
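Ordering samples by a phylogenetic tree amounts to listing its leaves in depth-first order. For a tree written in Newick format, that is simply the left-to-right order of the leaf names in the string. Below is a minimal sketch. It assumes the tree is available as a Newick string with unquoted labels. `newick_leaf_order` is a hypothetical helper, not part of the Genome Browser code.

```python
import re
from typing import List

# A token is one Newick punctuation character or a run of other text
# (a node label or a branch length).
TOKEN_RE = re.compile(r"[(),:;]|[^(),:;]+")
PUNCTUATION = set("(),:;")


def newick_leaf_order(newick: str) -> List[str]:
    """Return leaf names in depth-first, left-to-right order.

    Assumes unquoted labels. Text that directly follows "(" or "," is a leaf
    label; text after ")" is an internal-node label and text after ":" is a
    branch length, so both are skipped.
    """
    leaves: List[str] = []
    prev = "("  # treat the start of the string like "(" so "A;" yields ["A"]
    for token in TOKEN_RE.findall(newick):
        if not token.strip():
            continue  # whitespace between tokens carries no information
        if token not in PUNCTUATION and prev in ("(", ","):
            leaves.append(token.strip())
        prev = token
    return leaves
```

For example, `newick_leaf_order("((A:0.1,B:0.2)n1:0.3,(C,D)n2,E);")` returns `["A", "B", "C", "D", "E"]`. Samples that share a recent common ancestor therefore end up in adjacent rows.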
Each sample is placed in a horizontal row of pixels; when the number of
samples exceeds the number of vertical pixels for the track, multiple
samples fall in the same pixel row and pixels are averaged across samples.
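The pixel-row averaging described above can be sketched as follows. The sketch assumes each sample's allele at one position is encoded as 0 (reference) or 1 (non-reference), and that samples are already in tree order. The function name and the even split of samples across rows are illustrative assumptions, not the Genome Browser's actual drawing code.

```python
from typing import List, Sequence


def pixel_row_averages(alleles: Sequence[int], track_height: int) -> List[float]:
    """Average per-sample allele values (0 = reference, 1 = non-reference)
    into at most track_height pixel rows, preserving sample order."""
    n = len(alleles)
    if n == 0 or track_height <= 0:
        return []
    # One row per sample when they fit; otherwise several samples share a row.
    rows = min(n, track_height)
    sums = [0.0] * rows
    counts = [0] * rows
    for i, value in enumerate(alleles):
        row = i * rows // n  # consecutive samples map to the same row
        sums[row] += value
        counts[row] += 1
    # Every row receives at least one sample because rows <= n.
    return [s / c for s, c in zip(sums, counts)]
```

Each returned value is the fraction of samples in that pixel row carrying the non-reference allele. A renderer could use it to scale the intensity of the allele color.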
Each mutation is a vertical bar at its position in the SARS-CoV-2 genome,
with white (invisible) representing the reference allele;
the non-reference allele is shown in red if it changes the protein sequence of a gene,
green if it falls within a gene but does not change the protein,
and black if it does not fall within a gene.
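The red/green/black coloring depends on whether a substitution changes the amino acid encoded by the affected codon. Below is a minimal sketch under these assumptions:

- Genes are given as hypothetical plus-strand `Gene` intervals (1-based, inclusive) that start at a codon boundary.
- The standard genetic code applies.
- There is no ribosomal frameshift, so the ORF1ab frameshift is not handled.

It is not the Genome Browser's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Standard genetic code (NCBI table 1), indexed with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


@dataclass
class Gene:
    """Hypothetical plus-strand gene: 1-based, inclusive, starting at a codon boundary."""
    name: str
    start: int
    end: int


def mutation_color(reference: str, pos: int, alt: str, genes: List[Gene]) -> str:
    """Color a single-base substitution at 1-based `pos` in `reference`.

    red:   changes the amino acid in at least one overlapping gene
    green: inside a gene but synonymous in every overlapping gene
    black: outside all genes
    """
    in_gene = False
    for gene in genes:
        if not gene.start <= pos <= gene.end:
            continue
        in_gene = True
        # First base (1-based) of the codon containing pos in this gene's frame.
        codon_start = gene.start + ((pos - gene.start) // 3) * 3
        codon = reference[codon_start - 1 : codon_start + 2].upper()
        if len(codon) < 3:
            continue  # incomplete final codon: cannot translate
        offset = pos - codon_start
        mutant = codon[:offset] + alt.upper() + codon[offset + 1 :]
        if CODON_TABLE.get(codon) != CODON_TABLE.get(mutant):
            return "red"
    return "green" if in_gene else "black"
```

Checking every overlapping gene matters for SARS-CoV-2. For example, ORF9b lies within N in a different reading frame, so a change that is synonymous in one frame can still alter the protein in the other.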
Tick marks are drawn at the top and bottom of each mutation's vertical bar
to make the bar more visible when most alleles are reference alleles.
The phylogenetic tree showing inferred relationships between the samples is drawn
in the left column of the display.
Mousing over the tree shows the sample identifiers.
With the default font size (or smaller), the leaves of the tree are labeled by sample
identifiers. For larger font sizes, the track height will need to be increased in order
for the labels to fit.
The track height can be adjusted in the track controls, which can be reached by
clicking on the gray button to the left of the tree or by right-clicking on the image.
Methods
The B.1.1.7 consensus sequence was determined from COG-UK sequences assigned to B.1.1.7
that had early sample collection dates.
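The description does not say exactly how the consensus was computed. One common approach is a simple majority-rule consensus over aligned sequences. The sketch below assumes the sequences are already aligned to the same length and treats N as missing data. Its `min_fraction` threshold is a hypothetical parameter.

```python
from collections import Counter
from typing import Sequence


def majority_consensus(aligned: Sequence[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # Count informative calls only; N means the base could not be called.
        # Gap characters ("-") are counted like bases, so a majority deletion is kept.
        counts = Counter(base.upper() for base in column if base.upper() != "N")
        if not counts:
            consensus.append("N")
            continue
        base, count = counts.most_common(1)[0]
        # Require more than min_fraction of informative calls; otherwise emit N.
        consensus.append(base if count / sum(counts.values()) > min_fraction else "N")
    return "".join(consensus)
```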
The nine U.S. B.1.1.7 genome sequences available as of Jan. 2, 2021 were downloaded from
GenBank and GISAID and uploaded to hgPhyloPlace, which uses UShER (Turakhia et al.)
to place uploaded SARS-CoV-2 genome sequences in a global phylogenetic tree
and generates custom tracks for the Genome Browser showing single-nucleotide substitutions
in the uploaded sequences.
hgPhyloPlace ignores insertions and deletions and works only with substitutions,
because substitutions are adequate for inferring phylogeny. However, B.1.1.7 has four
deletions, three of which remove amino acids from the encoded proteins. Therefore
minimap2 (Li) was used to align B.1.1.7 to the reference genome
so that deletions could be displayed in addition to substitutions.
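minimap2 reports each alignment with a CIGAR string. The string appears in SAM output (`-a`), or in the `cg:Z:` tag of PAF output when minimap2 is run with `-c`. The sketch below shows how substitutions and deletions can be read off such an alignment. It assumes the reference sequence, the query sequence and the 1-based reference start are already known. `variants_from_cigar` is a hypothetical helper, not part of minimap2 or hgPhyloPlace.

```python
import re
from typing import List, Tuple

CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def variants_from_cigar(
    reference: str, query: str, ref_start: int, cigar: str, query_start: int = 0
) -> Tuple[List[Tuple[int, str, str]], List[Tuple[int, str]]]:
    """Return (substitutions, deletions) implied by one alignment.

    ref_start is the 1-based reference position of the first aligned base;
    query_start is the 0-based index in `query` where the CIGAR begins
    (0 for SAM, where soft clips are part of the CIGAR).
    Substitutions are (pos, ref_base, alt_base); deletions are
    (pos, deleted_bases); positions are 1-based on the reference.
    """
    substitutions: List[Tuple[int, str, str]] = []
    deletions: List[Tuple[int, str]] = []
    r = ref_start - 1  # 0-based index into reference
    q = query_start    # 0-based index into query
    for length_text, op in CIGAR_OP_RE.findall(cigar):
        n = int(length_text)
        if op in "M=X":
            # M does not distinguish match from mismatch, so compare each base.
            for k in range(n):
                if reference[r + k] != query[q + k]:
                    substitutions.append((r + k + 1, reference[r + k], query[q + k]))
            r += n
            q += n
        elif op == "D":
            deletions.append((r + 1, reference[r : r + n]))
            r += n
        elif op == "N":
            r += n  # skipped reference region (not expected for a viral genome)
        elif op in "IS":
            q += n  # insertions and soft clips consume query bases only
        # H and P consume neither sequence.
    return substitutions, deletions
```

For example, aligning `"ACTTACAC"` to `"ACGTACGTAC"` with CIGAR `6M2D2M` gives a G-to-T substitution at position 3 and a deletion of `GT` at position 7.

The VCF specification requires a padding base when such a deletion is written to VCF. The record is placed at the preceding position, with REF set to that base plus the deleted bases and ALT set to that base alone.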
Data Access
The first sequences from California, Colorado, Florida and New York are available from GenBank.
All nine sequences are available from GISAID.
GISAID data displayed in the Genome Browser are subject to GISAID's
Terms and Conditions.
SARS-CoV-2 genome sequences and metadata are available for download from
GISAID EpiCoV™.
COG-UK releases daily updates of sequences and metadata; scroll down to the
"Latest Sequence Data" section of the
Data page for links.
The mutations in the B.1.1.7 consensus sequence and the sequences available from GenBank
may be downloaded in Variant Call Format (VCF):
lineageB_1_1_7_US_first7.vcf.gz
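The downloaded file is a gzip-compressed VCF, which can be read with the Python standard library alone. Below is a minimal sketch that counts, for each sample column, how many records carry a non-reference allele. It relies on the VCF requirement that GT is the first FORMAT sub-field when present. The helper names are hypothetical.

```python
import gzip
import re
from typing import Dict, Iterable, List, Tuple


def parse_vcf(lines: Iterable[str]) -> Tuple[List[str], List[dict]]:
    """Parse VCF text lines into sample names and per-record genotype calls."""
    samples: List[str] = []
    records: List[dict] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        # GT is the first colon-separated sub-field of each sample column.
        genotypes = {name: value.split(":")[0] for name, value in zip(samples, fields[9:])}
        records.append({
            "chrom": fields[0],
            "pos": int(fields[1]),
            "ref": fields[3],
            "alts": fields[4].split(","),
            "genotypes": genotypes,
        })
    return samples, records


def read_vcf_gz(path: str) -> Tuple[List[str], List[dict]]:
    """Open a gzip-compressed VCF in text mode and parse it."""
    with gzip.open(path, "rt") as handle:
        return parse_vcf(handle)


def nonref_counts(samples: List[str], records: List[dict]) -> Dict[str, int]:
    """Count records in which each sample carries a non-reference allele."""
    counts = dict.fromkeys(samples, 0)
    for record in records:
        for name, gt in record["genotypes"].items():
            # Haploid ("1") or diploid ("0/1", "1|1") calls; "." means missing.
            if any(allele not in (".", "0") for allele in re.split(r"[/|]", gt)):
                counts[name] += 1
    return counts
```

After downloading the file, `samples, records = read_vcf_gz("lineageB_1_1_7_US_first7.vcf.gz")` followed by `nonref_counts(samples, records)` gives the per-sample mutation counts.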
The mutation-annotated phylogenetic tree file used by UShER to place the sequences
may be downloaded in order to run UShER locally:
public-2020-12-08.all.plus.cogUk.12-30.masked.pb.
Credits
This work is made possible by the open sharing of genetic data by research
groups from all over the world.
We gratefully acknowledge the authors, the originating laboratories where the clinical
specimen or virus isolate was first obtained, and the submitting laboratories where sequence
data were generated and submitted to public databases, on which this research is based.
References
Rambaut A, Holmes EC, O'Toole Á, Hill V, McCrone JT, Ruis C, du Plessis L, Pybus OG.
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology.
Nat Microbiol. 2020 Nov;5(11):1403-1407.
PMID: 32669681
Rambaut A, Loman N, Pybus O, Barclay W, Barrett J,
Carabelli A, Connor T, Peacock T, Robertson DL, Volz E, et al.
Preliminary genomic characterisation of an emergent SARS-CoV-2 lineage in the UK
defined by a novel set of spike mutations.
Virological. 2020 Dec 18.
Volz E, Mishra S, Chand M, Barrett JC, Johnson E,
Geidelberg L, Hinsley WR, Laydon DJ, Dabrera G, O'Toole Á, et al.
Transmission of SARS-CoV-2 Lineage B.1.1.7 in England: Insights from linking
epidemiological and genetic data.
Virological. 2020 Dec 31.
Turakhia Y, Thornlow B, Hinrichs AS, De Maio M, Gozashti L, Lanfear R, Haussler D, Corbett-Detig R.
Ultrafast Sample Placement on Existing Trees (UShER) Empowers Real-Time
Phylogenetics for the SARS-CoV-2 Pandemic.
bioRxiv. 2020 Sep 28.
Li H.
Minimap2: pairwise alignment for nucleotide sequences.
Bioinformatics. 2018 Sep 15;34(18):3094-3100.
PMID: 29750242; PMC: PMC6137996