Schema for Microdeletions - Microdeletions in GISAID sequences
  Database: wuhCor1    Primary Table: microdel    Row Count: 142   Data last updated: 2020-05-20
Format description: Browser extensible data
On download server: MariaDB table dump directory
field       example       SQL type               info     description
bin         585           smallint(5) unsigned   range    Indexing field to speed chromosome range queries.
chrom       NC_045512v2   varchar(255)           values   Reference sequence chromosome or scaffold
chromStart  183           int(10) unsigned       range    Start position in chromosome
chromEnd    184           int(10) unsigned       range    End position in chromosome
name        1b            varchar(255)           values   Name of item
score       300           int(10) unsigned       range    Optional score, nominal range 0-1000
strand      +             char(1)                values   + or -
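
The bin field can be reproduced with a short sketch of the standard UCSC binning scheme. This is an illustration only, not the code used to build the table: the function and constant names are chosen here, and the constants (smallest bins of 128 kb, each level 8 times larger) are the commonly documented values for the standard scheme.

```python
# Sketch of the standard UCSC binning scheme (names are illustrative).
# Offsets of the first bin at each level, from the smallest bins upward.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 = 131,072 bases
BIN_NEXT_SHIFT = 3    # each level up covers 2**3 = 8 times more


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains a 0-based, half-open range."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} is out of bounds for this scheme")
```

Under this scheme, any range that lies within the first 131,072 bases maps to bin 585, which is why a genome of roughly 30 kb yields the same bin value in every row.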

Sample Rows
 
bin  chrom        chromStart  chromEnd  name  score  strand
585  NC_045512v2  183         184       1b    300    +
585  NC_045512v2  221         222       1b    300    +
585  NC_045512v2  223         324       101b  300    +
585  NC_045512v2  231         234       3b    300    +
585  NC_045512v2  262         263       1b    300    +
585  NC_045512v2  279         280       1b    300    +
585  NC_045512v2  466         606       140b  300    +
585  NC_045512v2  481         482       1b    300    +
585  NC_045512v2  507         522       15b   1000   +
585  NC_045512v2  507         520       13b   300    +

Note: all start coordinates in our database are 0-based, not 1-based.
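
To relate these coordinates to 1-based positions, here is a minimal sketch. It assumes the usual UCSC convention that chromEnd is exclusive (a half-open interval); the sample rows are consistent with that, since a name such as 101b matches chromEnd - chromStart = 324 - 223. The helper names are hypothetical.

```python
from typing import Tuple


def to_one_based(chrom_start: int, chrom_end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open interval to a 1-based, closed interval."""
    # Only the start shifts; the exclusive 0-based end equals the inclusive 1-based end.
    return chrom_start + 1, chrom_end


def deletion_length(chrom_start: int, chrom_end: int) -> int:
    """Number of bases covered by a 0-based, half-open interval."""
    return chrom_end - chrom_start
```

For example, the first sample row (chromStart 183, chromEnd 184) describes a single deleted base at 1-based position 184.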

Microdeletions (microdel) Track Description
 

Description

This track shows deletions that have been found in the sequences uploaded to the GISAID database as of June 6, 2020. Three confidence levels of deletion calls are shown:

  • deletions found in at least 1 GISAID sequence
  • deletions found in at least 2 GISAID sequences
  • deletions found in at least 2 GISAID sequences that could be validated with raw reads

Methods

We accessed all GISAID SARS-CoV-2 sequences on June 6, 2020. We kept only high-coverage sequences spanning the entire SARS-CoV-2 genome (>= 29,000 bp), leaving 12,403 sequences, and aligned them using MAFFT.
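
The length filter can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the sequences are available as FASTA text, it applies only the 29,000 bp threshold (the text does not define the high-coverage criterion, so that step is omitted), and the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

MIN_LENGTH = 29000  # threshold stated in the Methods text


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def keep_full_length(
    records: Iterable[Tuple[str, str]], min_length: int = MIN_LENGTH
) -> List[Tuple[str, str]]:
    """Keep only records whose sequence has at least min_length characters."""
    return [(h, s) for h, s in records if len(s) >= min_length]
```

The retained records would then be written back to FASTA and passed to an aligner such as MAFFT; that step is not shown here.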

Verification

We validated several deletions with the raw reads from NCBI's SRA Run browser. Additionally, NYU Langone Health provided us with the aligned reads for many of their sequences.

Data Access

The raw data can be explored interactively with the Table Browser, combined with other datasets in the Data Integrator tool, or downloaded directly as "microdel.txt.gz" from the download server. Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
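
A downloaded copy of the table can be read with a short script. The sketch below assumes that "microdel.txt.gz" is a gzip-compressed, tab-separated dump with no header row and with columns in the schema order listed above; the function names and the local file path are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterable, List, Union

# Column order taken from the schema; assumed to match the dump.
FIELDS = ("bin", "chrom", "chromStart", "chromEnd", "name", "score", "strand")
INT_FIELDS = ("bin", "chromStart", "chromEnd", "score")

Row = Dict[str, Union[int, str]]


def parse_rows(lines: Iterable[str]) -> List[Row]:
    """Parse tab-separated lines into dicts keyed by schema field name."""
    rows: List[Row] = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for values in reader:
        if not values:
            continue  # skip blank lines
        row: Row = dict(zip(FIELDS, values))
        for key in INT_FIELDS:
            row[key] = int(row[key])
        rows.append(row)
    return rows


def load_microdel(path: str) -> List[Row]:
    """Read a local gzip-compressed copy of the table dump."""
    with gzip.open(path, "rt", newline="") as handle:
        return parse_rows(handle)
```

With a local copy in place, load_microdel("microdel.txt.gz") returns one dict per row, and a filter such as [r for r in rows if r["score"] == 1000] selects rows by score.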

Credits

We thank all of the labs that submitted their sequences to the GISAID database. The full acknowledgement table can be found at https://github.com/briannachrisman/SARS-CoV-2_Microdeletions/blob/master/acknowledgments.pdf. We thank the public health laboratories VIDRL and MDU-PHL at The Peter Doherty Institute for Infection and Immunity for providing over 1000 high-quality raw reads to NCBI. We also thank Matthew T Maurano, Matija Snuderl, and Adriana Heguy of the NYU Langone SARS-CoV2 Sequencing Team for providing many of their raw reads.

References

Chrisman, Brianna Sierra, Kelley Paskov, Nate Stockham, Kevin Tabatabaei, Jae-Yoon Jung, Peter Washington, Maya Varma, Min Woo Sun, Sepideh Maleki, and Dennis P. Wall. "Indels in SARS-CoV-2 occur at template-switching hotspots." BioData Mining 14, no. 1 (2021): 1-16. https://doi.org/10.1186/s13040-021-00251-0