Schema for Vaccines - COVID Vaccines BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273
  Database: wuhCor1    Primary Table: vaccines    Data last updated: 2021-04-21
Big Bed File Download: /gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb
Item Count: 3
The data is stored in the binary BigBed format.

Format description: bigPsl pairwise alignment
field         example          description
chrom         NC_045512v2      Reference sequence chromosome or scaffold
chromStart    21559            Start position in chromosome
chromEnd      25384            End position in chromosome
name          ModernaMrna1273  Name or ID of item, ideally both human readable and unique
score         1000             Score (0-1000)
strand        +                + or - indicates whether the query aligns to the + or - strand on the reference
thickStart    21559            Start of where display should be thick (start codon)
thickEnd      25384            End of where display should be thick (stop codon)
reserved      0                RGB value (use R,G,B string in input file)
blockCount    1                Number of blocks
blockSizes    3825,            Comma separated list of block sizes
chromStarts   0,               Start positions relative to chromStart
oChromStart   54               Start position in other chromosome
oChromEnd     3879             End position in other chromosome
oStrand       +                + or -, - means that psl was reversed into BED-compatible coordinates
oChromSize    4004             Size of other chromosome.
oChromStarts  54,              Start positions relative to oChromStart or from oChromStart+oChromSize depending on strand
oSequencegggaaataagagagaaaagaagagtaagaagaaatataagaccccggcgccgccaccatgttcgtgttcctggtgctgctgcccctggtgagcagccagtgcgtgaacctgaccacccggacccagctgccaccagcctacaccaacagcttcacccggggcgtctactaccccgacaaggtgttccggagcagcgtcctgcacagcacccaggacctgttcctgcccttcttcagcaacgtgacctggttccacgccatccacgtgagcggcaccaacggcaccaagcggttcgacaaccccgtgctgcccttcaacgacggcgtgtacttcgccagcaccgagaagagcaacatcatccggggctggatcttcggcaccaccctggacagcaagacccagagcctgctgatcgtgaataacgccaccaacgtggtgatcaaggtgtgcgagttccagttctgcaacgaccccttcctgggcgtgtactaccacaagaacaacaagagctggatggagagcgagttccgggtgtacagcagcgccaacaactgcaccttcgagtacgtgagccagcccttcctgatggacctggagggcaagcagggcaacttcaagaacctgcgggagttcgtgttcaagaacatcgacggctacttcaagatctacagcaagcacaccccaatcaacctggtgcgggatctgccccagggcttctcagccctggagcccctggtggacctgcccatcggcatcaacatcacccggttccagaccctgctggccctgcaccggagctacctgaccccaggcgacagcagcagcgggtggacagcaggcgcggctgcttactacgtgggctacctgcagccccggaccttcctgctgaagtacaacgagaacggcaccatcaccgacgccgtggactgcgccctggaccctctgagcgagaccaagtgcaccctgaagagcttcaccgtggagaagggcatctaccagaccagcaacttccgggtgcagcccaccgagagcatcgtgcggttccccaacatcaccaacctgtgccccttcggcgaggtgttcaacgccacccggttcgccagcgtgtacgcctggaaccggaagcggatcagcaactgcgtggccgactacagcgtgctgtacaacagcgccagcttcagcaccttcaagtgctacggcgtgagccccaccaagctgaacgacctgtgcttcaccaacgtgtacgccgacagcttcgtgatccgtggcgacgaggtgcggcagatcgcacccggccagacaggcaagatcgccgactacaactacaagctgcccgacgacttcaccggctgcgtgatcgcctggaacagcaacaacctcgacagcaaggtgggcggcaactacaactacctgtaccggctgttccggaagagcaacctgaagcccttcgagcgggacatcagcaccgagatctaccaagccggctccaccccttgcaacggcgtggagggcttcaactgctacttccctctgcagagctacggcttccagcccaccaacggcgtgggctaccagccctaccgggtggtggtgctgagcttcgagctgctgcacgccccagccaccgtgtgtggccccaagaagagcaccaacctggtgaagaacaagtgcgtgaacttcaacttcaacggccttaccggcaccggcgtgctgaccgagagcaacaagaaattcctgccctttcagcagttcggccgggacatcgccgacaccaccgacgctgtgcgggatccccagaccctggagatcctggacatcaccccttgcagcttcggcggcgtgagcgtgatcaccccaggcaccaacaccagcaaccaggtggccgtgctgtaccaggacgtgaactgcaccgaggtgcccgtggccatccacgccgaccagctgacacccacctggcgggtctacagcaccggcagcaacgtgttccagac
ccgggccggttgcctgatcggcgccgagcacgtgaacaacagctacgagtgcgacatccccatcggcgccggcatctgtgccagctaccagacccagaccaattcaccccggagggcaaggagcgtggccagccagagcatcatcgcctacaccatgagcctgggcgccgagaacagcgtggcctacagcaacaacagcatcgccatccccaccaacttcaccatcagcgtgaccaccgagattctgcccgtgagcatgaccaagaccagcgtggactgcaccatgtacatctgcggcgacagcaccgagtgcagcaacctgctgctgcagtacggcagcttctgcacccagctgaaccgggccctgaccggcatcgccgtggagcaggacaagaacacccaggaggtgttcgcccaggtgaagcagatctacaagacccctcccatcaaggacttcggcggcttcaacttcagccagatcctgcccgaccccagcaagcccagcaagcggagcttcatcgaggacctgctgttcaacaaggtgaccctagccgacgccggcttcatcaagcagtacggcgactgcctcggcgacatagccgcccgggacctgatctgcgcccagaagttcaacggcctgaccgtgctgcctcccctgctgaccgacgagatgatcgcccagtacaccagcgccctgttagccggaaccatcaccagcggctggactttcggcgctggagccgctctgcagatccccttcgccatgcagatggcctaccggttcaacggcatcggcgtgacccagaacgtgctgtacgagaaccagaagctgatcgccaaccagttcaacagcgccatcggcaagatccaggacagcctgagcagcaccgctagcgccctgggcaagctgcaggacgtggtgaaccagaacgcccaggccctgaacaccctggtgaagcagctgagcagcaacttcggcgccatcagcagcgtgctgaacgacatcctgagccggctggaccctcccgaggccgaggtgcagatcgaccggctgatcactggccggctgcagagcctgcagacctacgtgacccagcagctgatccgggccgccgagattcgggccagcgccaacctggccgccaccaagatgagcgagtgcgtgctgggccagagcaagcgggtggacttctgcggcaagggctaccacctgatgagctttccccagagcgcaccccacggagtggtgttcctgcacgtgacctacgtgcccgcccaggagaagaacttcaccaccgccccagccatctgccacgacggcaaggcccactttccccgggagggcgtgttcgtgagcaacggcacccactggttcgtgacccagcggaacttctacgagccccagatcatcaccaccgacaacaccttcgtgagcggcaactgcgacgtggtgatcggcatcgtgaacaacaccgtgtacgatcccctgcagcccgagctggacagcttcaaggaggagctggacaagtacttcaagaatcacaccagccccgacgtggacctgggcgacatcagcggcatcaacgccagcgtggtgaacatccagaaggagatcgatcggctgaacgaggtggccaagaacctgaacgagagcctgatcgacctgcaggagctgggcaagtacgagcagtacatcaagtggccctggtacatctggctgggcttcatcgccggcctgatcgccatcgtgatggtgaccatcatgctgtgctgcatgaccagctgctgcagctgcctgaagggctgttgcagctgcggcagctgctgcaagttcgacgaggacgacagcgagcccgtgctgaagggcgtgaagctgcactacacctgataataggctggagcctcggtggcctagcttcttgccccttgggcctccccccagcccctcctccccttcctgcacccgtacccccgtggtctttgaataaagtctgagtggg
cggcaaaaaaaaaSequence on other chrom (or edit list, or empty)
oCDS          1..4005          CDS in NCBI format
chromSize     29903            Size of target chromosome
match         2622             Number of bases matched.
misMatch      1203             Number of bases that don't match
repMatch      0                Number of bases that match but are part of repeats
nCount        0                Number of 'N' bases
seqType       1                0=empty, 1=nucleotide, 2=amino_acid

Sample Rows
 
chrom	chromStart	chromEnd	name	score	strand	thickStart	thickEnd	reserved	blockCount	blockSizes	chromStarts	oChromStart	oChromEnd	oStrand	oChromSize	oChromStarts	oSequence	oCDS	chromSize	match	misMatch	repMatch	nCount	seqType
NC_045512v2	21559	25384	ModernaMrna1273	1000	+	21559	25384	0	1	3825,	0,	54	3879	+	4004	54,	gggaaataagagagaaaagaagagtaagaagaaatataagaccccggcgccgccaccatgttcgtgttcctggtgctgctgcccctggtgagcagccagtgcgtgaacctgaccacccggacccagct ...	1..4005	29903	2622	1203	0	0	1
NC_045512v2	21559	25384	ReconstructedBNT162b2	1000	+	21559	25384	0	1	3825,	0,	51	3876	+	4175	51,	gagaataaactagtattcttctggtccccacagactcagagagaacccgccaccatgttcgtgttcctggtgctgctgcctctggtgtccagccagtgtgtgaacctgaccaccagaacacagctgcc ...	1..4176	29903	2763	1062	0	0	1
NC_045512v2	21559	25384	WHO_BNT162b2	1000	+	21559	25384	0	1	3825,	0,	51	3876	+	4284	51,	gagaataaactagtattcttctggtccccacagactcagagagaacccgccaccatgttcgtgttcctggtgctgctgcctctggtgtccagccagtgtgtgaacctgaccaccagaacacagctgcc ...	1..4285	29903	2763	1062	0	0	1
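
As an illustration of how the 25 bigPsl columns line up with a sample row, the sketch below rebuilds the ModernaMrna1273 row as tab-separated text and prints a few fields by column number. The function names sample_row and describe_row are hypothetical, and the long oSequence value is replaced by the placeholder SEQ; all other values are taken from the sample row above.

```shell
# Hypothetical helper: emit the ModernaMrna1273 sample row as 25 tab-separated
# fields. "SEQ" is a placeholder standing in for the long oSequence value.
sample_row() {
  printf '%s\t' NC_045512v2 21559 25384 ModernaMrna1273 1000 + 21559 25384 0 1 \
    3825, 0, 54 3879 + 4004 54, SEQ 1..4005 29903 2622 1203 0 0
  printf '%s\n' 1
}

# Hypothetical helper: label selected columns of a bigPsl row.
# Column numbers: 1=chrom 2=chromStart 3=chromEnd 4=name 13=oChromStart
# 14=oChromEnd 16=oChromSize 21=match 22=misMatch
describe_row() {
  awk -F'\t' '{
    printf "%s: target %s:%s-%s, query %s-%s of %s, match=%s misMatch=%s\n",
           $4, $1, $2, $3, $13, $14, $16, $21, $22
  }'
}

sample_row | describe_row
```

The same describe_row filter could be applied to any tab-separated bigPsl text with this column order.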

Vaccines (vaccines) Track Description
 

Description

This track shows the alignment of three different mRNA vaccine sequences to the SARS-CoV-2 genome:

  1. The BioNTech/Pfizer BNT-162b2 sequence as published by the World Health Organization
  2. The reconstructed BioNTech/Pfizer BNT-162b2 RNA as sequenced by the Andrew Fire lab, Stanford University School of Medicine
  3. The Moderna mRNA-1273 sequence as sequenced by the Andrew Fire lab, Stanford University School of Medicine

Note that the actual vaccines are synthesized with N1-methyl-pseudouridine (Ψ) in place of uridine. See the article by Bert Hubert in References for a discussion.

Display Conventions and Configuration

The PSL output from BLAT was converted to a bigPsl file for display in this track. When a large section of the genome is in view, the track draws black where the vaccine and SARS-CoV-2 nucleotides are identical, and red lines where they differ. When a smaller section of the genome is in view, setting the "Color track by codons or bases" option to "different mRNA bases" shows the nucleotides in the vaccine that differ from the SARS-CoV-2 sequence.

Methods

The mRNA sequences were obtained from the MS Word documents cited in the References below, and the Andrew Fire lab GitHub repository supplied the FASTA sequencing results for the BioNTech/Pfizer BNT-162b2 and Moderna mRNA-1273 samples.

The PSL alignment file was obtained via the UCSC Genome Browser BLAT service with parameters -t=dnax -q=rnax, then filtered with a minimum score of 1000 to remove the polyA match:

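  # Align the vaccine sequences to the genome (translated DNA target,
  # translated RNA query) and drop alignments below the minimum score.
  gfClient -maxIntron=10 -t=dnax -q=rnax <host> <port> \
     /gbdb/wuhCor1 threeVaccines.fa stdout \
        | pslFilter -minScore=1000 stdin wuhCor1.vaccines.psl

  # Report score and percent identity for the retained alignments.
  pslScore wuhCor1.vaccines.psl

  #tName          tStart  tEnd    qName:qStart-qEnd       score   percentIdent
  NC_045512v2     21559   25384   ModernaMrna1273:54-3879  1419    68.60
  NC_045512v2     21559   25384   ReconstructedBNT162b2:51-3876 1701    72.30
  NC_045512v2     21559   25384   WHO_BNT162b2:51-3876     1701    72.30

  # Write one CDS range per sequence, in the form: name, 1..(length+1).
  faCount threeVaccines.fa | tawk '{print $1,"1.."$2+1}' \
     | head -4 | tail -3 > threeVaccines.cds
  # Convert the PSL to bigPsl text, sorted by chromosome and start position.
  pslToBigPsl -cds=threeVaccines.cds -fa=threeVaccines.fa wuhCor1.vaccines.psl stdout \
     | sort -k1,1 -k2,2n > wuhCor1.vaccines.bigPsl

  # Build the bigBed file from the sorted bigPsl text.
  bedToBigBed -type=bed12+13 -tab -as=$HOME/kent/src/hg/lib/bigPsl.as \
    wuhCor1.vaccines.bigPsl wuhCor1.chrom.sizes wuhCor1.vaccines.bb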
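
The score and percentIdent values in the pslScore output above are consistent with the match and misMatch counts stored in the track. The sketch below reproduces them with a simplified formula that assumes a single gapless block and no repeat matches, as in these three alignments. It is not claimed to be the full pslScore calculation, and the function name psl_stats is hypothetical.

```shell
# Hypothetical helper: score and percent identity for a gapless alignment.
# Assumption: no inserts and no repeat matches, so
#   score        = match - misMatch
#   percentIdent = 100 - 0.1 * floor(1000 * misMatch / (match + misMatch))
psl_stats() {
  awk -v match_n="$1" -v mismatch_n="$2" 'BEGIN {
    score = match_n - mismatch_n
    milli_bad = int(1000 * mismatch_n / (match_n + mismatch_n))
    printf "%d %.2f\n", score, 100 - milli_bad * 0.1
  }'
}

psl_stats 2622 1203   # ModernaMrna1273 counts
psl_stats 2763 1062   # counts shared by both BNT162b2 sequences
```

With these inputs the two calls print 1419 68.60 and 1701 72.30, matching the table.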

Data Access

The fasta file sequences and psl alignment file can be obtained from our download server at: https://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/vaccines/.

The bigPsl alignment file used to display this track in the Genome Browser can be accessed at https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb. It can be read with the kent command-line tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here.
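
A minimal sketch of reading the file with bigBedToBed, assuming the tool is installed and on the PATH and that the download server is reachable. The function name dump_vaccines and the choice of columns are illustrative only.

```shell
# URL of the bigPsl file, as given in the text above.
VACCINES_BB=https://hgdownload.soe.ucsc.edu/gbdb/wuhCor1/bbi/wuhCor1.vaccines.bb

# Hypothetical helper: print position, name, query coordinates and match
# counts for each item, skipping the long oSequence column.
dump_vaccines() {
  if ! command -v bigBedToBed >/dev/null 2>&1; then
    echo "bigBedToBed not found on PATH" >&2
    return 1
  fi
  bigBedToBed "$VACCINES_BB" stdout | cut -f1-4,13-16,21-22
}

dump_vaccines || echo "skipped: bigBedToBed unavailable or download failed"
```

Writing to a file instead of stdout keeps the full rows, including the vaccine sequences.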

The protein encoded by all three sequences differs from the SARS-CoV-2 S glycoprotein by two amino acid substitutions: S:K986P and S:V987P. See also the article by Ryan Cross in References, The tiny tweak behind COVID-19 vaccines.

>BNT162b2
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFD
NPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVY
SSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQT
LLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRV
QPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF
VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPC
NGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL
PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGS
NVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTI
SVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGF
NFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAG
TITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALN
TLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRV
DFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT
FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL
QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYTZZ
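
The two proline substitutions can be located in the BNT162b2 record above. The sketch below assumes the record is wrapped at 80 residues per line, so the 13th sequence line covers residues 961 to 1040. That line is copied from the record, and the function name residues_at is hypothetical.

```shell
# 13th sequence line of the >BNT162b2 record (residues 961-1040, assuming
# 80 residues per line).
line13=TLVKQLSSNFGAISSVLNDILSRLDPPEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRV

# Hypothetical helper: print n residues starting at protein position pos,
# given a line of sequence and the position of that line's first residue.
residues_at() {
  awk -v s="$1" -v first="$2" -v pos="$3" -v n="$4" \
    'BEGIN { print substr(s, pos - first + 1, n) }'
}

residues_at "$line13" 961 986 2
```

The call prints PP, the residues at positions 986 and 987.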

References

Dae Eun Jeong, Matthew McCoy, Karen Artiles, Orkan Ilbay, Andrew Fire, Kari Nadeau, Helen Park, Brooke Betts, Scott Boyd, Ramona Hoh, and Massa Shoura. Assemblies of putative SARS-CoV2-spike-encoding mRNA sequences for vaccines BNT-162b2 and mRNA-1273. Obtained from GitHub.

Bert Hubert. Reverse Engineering the source code of the BioNTech/Pfizer SARS-CoV-2 Vaccine. 25 Dec 2020.

Wikipedia. Pfizer-BioNTech COVID-19 vaccine.

World Health Organization MedNet. Messenger RNA encoding the full-length SARS-CoV-2 spike glycoprotein. Sept. 2020, document 11889.

Cyril Le Nouën, Peter L. Collins, and Ursula J. Buchholz. Attenuation of Human Respiratory Viruses by Synonymous Genome Recoding. Frontiers in Immunology 2019; 10: 1250. PMID: 31231383

Ryan Cross. The tiny tweak behind COVID-19 vaccines. Chemical & Engineering News, 29 September 2020, Vol. 98, Issue 38.

Credits

Thank you to the Andrew Fire lab, Stanford University School of Medicine for providing the sequencing data of these vaccines.

The presentation of this track was prepared by Hiram Clawson (hclawson@ucsc.edu).