Schema for gnomAD Structural Variants - Genome Aggregation Database (gnomAD) - Structural Variants
Database: hg19    Primary Table: gnomadSvControls    Data last updated: 2021-04-13
Big Bed File Download: /gbdb/hg19/gnomAD/structuralVariants/gnomad_v2.1_sv.controls_only.sites.bb
Item Count: 273,723
The data is stored in the binary BigBed format.

Format description: bed9+19 for displaying gnomAD structural variants
field | example | description
chrom | chr1 | Reference sequence chromosome or scaffold
chromStart | 161463313 | Start position in chromosome
chromEnd | 166487497 | End position in chromosome
name | INV_1_49 | Name of item
score | 0 | Loss of Function oe ratio
strand | . | Always .
thickStart | 161463313 | Start of where display is thick
thickEnd | 166487497 | End of where display is thick
itemRgb | 192,0,192 | Color of item, based on variant class
svlen | 5024183 | Size of variant in base pairs
svtype | INV | Variant class (BND, INS, etc.)
ac | 1 | Allele Count
an | 10384 | Allele Number
af | 9.6e-05 | Allele Frequency
nhet | 1 | Number of heterozygous variant carriers
nhomalt | 0 | Number of homozygous alternate variant carriers
PROTEIN_CODING__COPY_GAIN | NA | Copy gain genes
PROTEIN_CODING__DUP_LOF | NA | Intragenic exon duplication
PROTEIN_CODING__DUP_PARTIAL | NA | Predicted partial duplication
PROTEIN_CODING__INTERGENIC | False | Intergenic variant
PROTEIN_CODING__INTRONIC | NA | Intronic variant
PROTEIN_CODING__INV_SPAN | AL626787.1, ALDH9A1, ATF6, C1orf110, C1orf111, C1orf226, DDR2, DUSP12, FAM78B, FCGR2A, FCGR2B, FCGR3A, FCGR3B, FCRLA, FCRLB, HSD17B7, HSPA6, LMX1A, LRRC52, MGST3, NOS1AP, NUF2, OLFML2B, PBX1, RGS4, RGS5, RP11-565P22.6, RXRG, SH2D1B, TMCO1, UAP1, UCK2, UHMK1 | Inversion span (inversion and another consequence)
PROTEIN_CODING__LOF | NA | Expected protein-truncating variant
PROTEIN_CODING__MSV_EXON_OVR | NA | Multi-allelic CNV coding exon overlap
PROTEIN_CODING__NEAREST_TSS | NA | Nearest transcription start site
PROTEIN_CODING__PROMOTER | NA | Variant overlaps promoter
PROTEIN_CODING__UTR | NA | Variant overlaps UTR
_mouseOver | Gene(s) affected: Too many genes affected, click on item for full list., Position: chr1:161463314-166487497, Size: 5024183, Class: INV, Allele Count: 1, Allele Number: 10384, Allele Frequency: 9.6e-05 | Mouseover text
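
The example row shows an off-by-one that is easy to miss: chromStart is 161463313, while the mouseover reports chr1:161463314-166487497. This is consistent with BED's 0-based, half-open coordinates being displayed as 1-based, inclusive positions. The minimal Python sketch below reproduces that conversion and contrasts the reference span of an interval with svlen for the INS_1_5644 sample row; the function names are illustrative only.

```python
def mouseover_position(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a BED interval as a 1-based, inclusive "chrom:start-end" string.

    BED chromStart is 0-based and chromEnd is exclusive, so the first base
    is chromStart + 1 and the last base is chromEnd.
    """
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def reference_span(chrom_start: int, chrom_end: int) -> int:
    """Number of reference bases covered by a BED interval."""
    return chrom_end - chrom_start


# INV_1_49, the example row in the schema table.
print(mouseover_position("chr1", 161463313, 166487497))  # chr1:161463314-166487497

# INS_1_5644 from the sample rows: the insertion covers 2 reference bases,
# while svlen reports 280.
print(reference_span(166189431, 166189433))  # 2
```

In the sample rows, chromEnd - chromStart is svlen + 1 for the DEL, DUP and INV items (for example 5,501 versus 5,500 for DEL_1_8890), and differs far more for insertions, so svlen is probably the better measure of variant size.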

Sample Rows
 
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | itemRgb | svlen | svtype | ac | an | af | nhet | nhomalt | PROTEIN_CODING__COPY_GAIN | PROTEIN_CODING__DUP_LOF | PROTEIN_CODING__DUP_PARTIAL | PROTEIN_CODING__INTERGENIC | PROTEIN_CODING__INTRONIC | PROTEIN_CODING__INV_SPAN | PROTEIN_CODING__LOF | PROTEIN_CODING__MSV_EXON_OVR | PROTEIN_CODING__NEAREST_TSS | PROTEIN_CODING__PROMOTER | PROTEIN_CODING__UTR | _mouseOver
chr1 | 161463313 | 166487497 | INV_1_49 | 0 | . | 161463313 | 166487497 | 192,0,192 | 5024183 | INV | 1 | 10384 | 9.6e-05 | 1 | 0 | NA | NA | NA | False | NA | AL626787.1, ALDH9A1, ATF6, C1orf110, C1orf111, C1orf226, DDR2, DUSP12, FAM78B, FCGR2A, FCGR2B, FCGR3A, FCGR3B, FCRLA, FCRLB, HSD ... | NA | NA | NA | NA | NA | Gene(s) affected: Too many genes affected, click on item for full list., Position: chr1:161463314-166487497, Size: 5024183, Clas ...
chr1 | 166181226 | 166195463 | DUP_1_2554 | 0 | . | 166181226 | 166195463 | 0,0,255 | 14236 | DUP | 539 | 10364 | 0.052 | 299 | 120 | NA | NA | NA | True | NA | NA | NA | NA | FAM78B | NA | NA | Gene(s) affected: NA, Position: chr1:166181227-166195463, Size: 14236, Class: DUP, Allele Count: 539, Allele Number: 10364, Alle ...
chr1 | 166181234 | 166181235 | BND_1_4089 | 0 | . | 166181234 | 166181235 | 128,128,128 | 14191 | BND | 40 | 10154 | 0.0039 | 34 | 3 | NA | NA | NA | False | NA | NA | NA | NA | NA | NA | NA | Gene(s) affected: NA, Position: chr1:166181235-166181235, Size: 14191, Class: BND, Allele Count: 40, Allele Number: 10154, Allel ...
chr1 | 166189431 | 166189433 | INS_1_5644 | 0 | . | 166189431 | 166189433 | 255,165,0 | 280 | INS | 553 | 10330 | 0.054 | 311 | 121 | NA | NA | NA | True | NA | NA | NA | NA | FAM78B | NA | NA | Gene(s) affected: NA, Position: chr1:166189432-166189433, Size: 280, Class: INS, Allele Count: 553, Allele Number: 10330, Allele ...
chr1 | 166198776 | 166198778 | INS_1_5645 | 0 | . | 166198776 | 166198778 | 255,165,0 | 280 | INS | 1 | 10384 | 9.6e-05 | 1 | 0 | NA | NA | NA | True | NA | NA | NA | NA | FAM78B | NA | NA | Gene(s) affected: NA, Position: chr1:166198777-166198778, Size: 280, Class: INS, Allele Count: 1, Allele Number: 10384, Allele F ...
chr1 | 166200096 | 166200097 | BND_1_4090 | 0 | . | 166200096 | 166200097 | 128,128,128 | 50313022 | BND | 3 | 10218 | 0.00029 | 3 | 0 | NA | NA | NA | False | NA | NA | NA | NA | NA | NA | NA | Gene(s) affected: NA, Position: chr1:166200097-166200097, Size: 50313022, Class: BND, Allele Count: 3, Allele Number: 10218, All ...
chr1 | 166209499 | 166215000 | DEL_1_8890 | 0 | . | 166209499 | 166215000 | 255,0,0 | 5500 | DEL | 603 | 10322 | 0.058 | 527 | 38 | NA | NA | NA | True | NA | NA | NA | NA | FAM78B | NA | NA | Gene(s) affected: NA, Position: chr1:166209500-166215000, Size: 5500, Class: DEL, Allele Count: 603, Allele Number: 10322, Allel ...
chr1 | 166211999 | 166218100 | DUP_1_2555 | 0 | . | 166211999 | 166218100 | 0,0,255 | 6100 | DUP | 1 | 7398 | 0.00013 | 1 | 0 | NA | NA | NA | True | NA | NA | NA | NA | FAM78B | NA | NA | Gene(s) affected: NA, Position: chr1:166212000-166218100, Size: 6100, Class: DUP, Allele Count: 1, Allele Number: 7398, Allele F ...
chr1 | 166212301 | 166212302 | BND_1_4091 | 0 | . | 166212301 | 166212302 | 128,128,128 | NA | BND | 2 | 6490 | 0.00031 | 2 | 0 | NA | NA | NA | False | NA | NA | NA | NA | NA | NA | NA | Gene(s) affected: NA, Position: chr1:166212302-166212302, Size: NA, Class: BND, Allele Count: 2, Allele Number: 6490, Allele Fre ...
chr1 | 166216030 | 166236163 | DEL_1_8891 | 0 | . | 166216030 | 166236163 | 255,0,0 | 20132 | DEL | 1 | 10244 | 9.8e-05 | 1 | 0 | NA | NA | NA | True | NA | NA | NA | NA | FAM78B | NA | NA | Gene(s) affected: NA, Position: chr1:166216031-166236163, Size: 20132, Class: DEL, Allele Count: 1, Allele Number: 10244, Allele ...
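
The sample rows can be used to sanity-check how the count fields relate. Assuming diploid genotypes (all rows here are on the autosome chr1), each heterozygous carrier contributes one alternate allele and each homozygous-alternate carrier two, so ac should equal nhet + 2 * nhomalt; for DUP_1_2554, 299 + 2 * 120 = 539. The af value is close to ac / an but appears to be rounded (DUP_1_2555 lists 0.00013 for 1/7398 ≈ 0.000135), so this sketch compares it with a relative tolerance; the 5% tolerance and the function name are arbitrary choices for illustration.

```python
import math

# (ac, an, af, nhet, nhomalt) copied from the sample rows above.
SAMPLE_COUNTS = {
    "DUP_1_2554": (539, 10364, 0.052, 299, 120),
    "BND_1_4089": (40, 10154, 0.0039, 34, 3),
    "INS_1_5644": (553, 10330, 0.054, 311, 121),
    "DEL_1_8890": (603, 10322, 0.058, 527, 38),
    "DUP_1_2555": (1, 7398, 0.00013, 1, 0),
}


def counts_consistent(ac: int, an: int, af: float, nhet: int, nhomalt: int,
                      rel_tol: float = 0.05) -> bool:
    """Check ac against the genotype counts and af against ac / an.

    Assumes diploid genotypes: a heterozygous carrier has one alternate
    allele and a homozygous-alternate carrier has two. The af comparison
    uses a relative tolerance because the stored value is rounded.
    """
    alleles_match = ac == nhet + 2 * nhomalt
    af_matches = math.isclose(af, ac / an, rel_tol=rel_tol)
    return alleles_match and af_matches


for name, counts in SAMPLE_COUNTS.items():
    print(name, counts_consistent(*counts))  # True for every row listed
```

On sex chromosomes, where males carry a single X, the allele arithmetic would need adjusting.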

gnomAD Structural Variants (gnomadStructuralVariants) Track Description
 

Description

The Genome Aggregation Database (gnomAD) - Structural Variants track set shows structural variant calls (at least 50 bp) from the gnomAD v2.1 release on 10,847 unrelated genomes. This genome set mostly, but not entirely, overlaps with the one used for the gnomAD short variant release. For more information, see the blog post "Structural variants in gnomAD".

There are three subtracks in this track set:

  1. All SVs: The full set of variant annotations from all 10,847 samples.
  2. Control Only SVs: Only samples from individuals not selected as a case in a case/control study of common disease (5,192 samples).
  3. Non-neuro SVs: Only samples from individuals not selected as having a neurological condition in a case/control study (8,342 samples).
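
Because the Allele Number counts the alleles that received a genotype call, on autosomes it cannot exceed twice the number of diploid samples in a subtrack. The sketch below works through that bound for the three subtracks. For the controls subtrack (the primary table on this page) it gives 2 × 5,192 = 10,384, exactly the an value of INV_1_49 and INS_1_5645 in the sample rows; smaller values such as 7,398 for DUP_1_2555 presumably reflect samples without a call at that site. The helper name is hypothetical.

```python
# Number of samples per subtrack, from the list above.
SUBTRACK_SAMPLES = {"All": 10847, "Controls": 5192, "Non-neuro": 8342}


def max_allele_number(n_samples: int, ploidy: int = 2) -> int:
    """Largest possible AN: every sample called, `ploidy` alleles each."""
    return ploidy * n_samples


for name, n in SUBTRACK_SAMPLES.items():
    print(name, max_allele_number(n))
# All 21694
# Controls 10384
# Non-neuro 16684
```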

Display Conventions and Configuration

Items in all subtracks follow the same conventions: items are colored according to variant type, and the mouseover text lists the affected protein-coding genes, the size of the variant (which may differ from the span of the chromosomal coordinates, for example for insertions), the variant type (insertion, duplication, etc.), the Allele Count, the Allele Number, and the Allele Frequency. When more than 2 genes are affected by a variant, the full list can be obtained by clicking on the item and reading the details page. The table below summarizes the number of variants of each type in the three datasets:

Variant Type | All | Controls | Non-neuro
Breakend (BND) | 52604 | 37891 | 44952
Complex (CPX) | 4778 | 3129 | 4167
Translocation (CTX) | 8 | 4 | 5
Deletion (DEL) | 169635 | 116401 | 145978
Duplication (DUP) | 49571 | 36223 | 43916
Insertion (INS) | 109025 | 78475 | 95658
Inversion (INV) | 748 | 492 | 667
Multi-allelic CNV (MCNV) | 1108 | 1108 | 1108
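
Summing each column gives the number of items in each subtrack. The Controls total is 273,723, which matches the Item Count reported at the top of this page for the gnomadSvControls bigBed file, so the table and the file appear consistent. A short sketch of the arithmetic:

```python
# Per-class variant counts (All, Controls, Non-neuro), copied from the table above.
COUNTS = {
    "BND": (52604, 37891, 44952),
    "CPX": (4778, 3129, 4167),
    "CTX": (8, 4, 5),
    "DEL": (169635, 116401, 145978),
    "DUP": (49571, 36223, 43916),
    "INS": (109025, 78475, 95658),
    "INV": (748, 492, 667),
    "MCNV": (1108, 1108, 1108),
}
DATASETS = ("All", "Controls", "Non-neuro")

# Column totals: number of items in each subtrack.
totals = {name: sum(row[i] for row in COUNTS.values())
          for i, name in enumerate(DATASETS)}

print(totals)  # {'All': 387477, 'Controls': 273723, 'Non-neuro': 336451}
```

The same arithmetic gives 387,477 items for All and 336,451 for Non-neuro.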

The CNV color code is described in detail here. All subtracks can be filtered by variant size and variant type using the track Configure options.
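
In the sample rows, deletions are drawn in 255,0,0, duplications in 0,0,255, insertions in 255,165,0, inversions in 192,0,192 and breakends in 128,128,128. For offline analysis of downloaded data, a size-and-type filter similar to the Configure options can be sketched as below. This is an assumption-laden local analogue, not the browser's implementation: how the browser treats items whose svlen is NA (as for BND_1_4091) is not described here, and this sketch simply drops them whenever a size bound is set.

```python
from typing import List, Optional, Sequence, Set, Tuple

Row = Tuple[str, str, Optional[int]]  # (name, svtype, svlen); svlen None means NA

# A subset of the sample rows above.
ROWS: List[Row] = [
    ("INV_1_49", "INV", 5024183),
    ("DUP_1_2554", "DUP", 14236),
    ("INS_1_5644", "INS", 280),
    ("DEL_1_8890", "DEL", 5500),
    ("BND_1_4091", "BND", None),
    ("DEL_1_8891", "DEL", 20132),
]


def filter_svs(rows: Sequence[Row], types: Set[str], min_size: int = 0,
               max_size: Optional[int] = None) -> List[str]:
    """Return names of rows whose svtype is in `types` and svlen is in range.

    Rows with an NA svlen are kept only when no size bound is requested.
    """
    size_filter = min_size > 0 or max_size is not None
    kept = []
    for name, svtype, svlen in rows:
        if svtype not in types:
            continue
        if svlen is None:
            if size_filter:
                continue
        elif svlen < min_size or (max_size is not None and svlen > max_size):
            continue
        kept.append(name)
    return kept


print(filter_svs(ROWS, {"DEL", "DUP"}, min_size=1000, max_size=10000))  # ['DEL_1_8890']
```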

Methods

BED files were obtained from the gnomAD Google Storage bucket:

gsutil cp gs://gnomad-public/papers/2019-sv/gnomad_v2.1_sv.*.bed* .
These data were then transformed into bigBed tracks. For the full list of commands used to make this track please see the "gnomAD Structural Variants v2.1" section of the makedoc.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/gnomAD/structuralVariants/gnomad_v2.1_sv.sites.bb -chrom=chr6 -start=0 -end=1000000 stdout
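
For scripted use, the same call can be wrapped so that each output line is split into the 28 named bed9+19 fields from the schema above. This is a minimal Python sketch, assuming bigBedToBed is installed on PATH and that the controls-only file is served under the same host as the command above combined with the /gbdb path from the "Big Bed File Download" line; CONTROLS_URL and the function names are hypothetical. Values are returned as strings, with "NA" mapped to None.

```python
import subprocess
from typing import Dict, List, Optional

# Column names in schema order (bed9+19 = 28 columns).
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "itemRgb",
    "svlen", "svtype", "ac", "an", "af", "nhet", "nhomalt",
    "PROTEIN_CODING__COPY_GAIN", "PROTEIN_CODING__DUP_LOF",
    "PROTEIN_CODING__DUP_PARTIAL", "PROTEIN_CODING__INTERGENIC",
    "PROTEIN_CODING__INTRONIC", "PROTEIN_CODING__INV_SPAN",
    "PROTEIN_CODING__LOF", "PROTEIN_CODING__MSV_EXON_OVR",
    "PROTEIN_CODING__NEAREST_TSS", "PROTEIN_CODING__PROMOTER",
    "PROTEIN_CODING__UTR", "_mouseOver",
]

# Assumption: download host from the example above plus the /gbdb path given
# in the "Big Bed File Download" line for the controls-only file.
CONTROLS_URL = ("http://hgdownload.soe.ucsc.edu/gbdb/hg19/gnomAD/"
                "structuralVariants/gnomad_v2.1_sv.controls_only.sites.bb")


def bigbedtobed_command(url: str, chrom: str, start: int, end: int) -> List[str]:
    """Build a bigBedToBed call like the one above, writing BED lines to stdout."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", "stdout"]


def parse_sv_line(line: str) -> Dict[str, Optional[str]]:
    """Map one tab-separated bed9+19 line to {field: value}; "NA" becomes None."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(values)}")
    return {f: (None if v == "NA" else v) for f, v in zip(FIELDS, values)}


def fetch_region(url: str, chrom: str, start: int,
                 end: int) -> List[Dict[str, Optional[str]]]:
    """Run bigBedToBed (which must be on PATH) and parse every output line."""
    result = subprocess.run(bigbedtobed_command(url, chrom, start, end),
                            capture_output=True, text=True, check=True)
    return [parse_sv_line(line) for line in result.stdout.splitlines() if line]
```

If the assumptions hold, fetch_region(CONTROLS_URL, "chr1", 166181226, 166236163) should return one dictionary per overlapping item, including items like the sample rows above.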

Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.

More information about using and understanding the gnomAD data can be found in the gnomAD FAQ site.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL) as described here.

References

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 18;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-443. PMID: 32461654; PMC: PMC7334197

Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H et al. A structural variation reference for medical and population genetics. Nature. 2020 May;581(7809):444-451. PMID: 32461652; PMC: PMC7334194

Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom FK, O'Donnell-Luria AH et al. Transcript expression-aware annotation improves rare variant interpretation. Nature. 2020 May;581(7809):452-458. PMID: 32461655; PMC: PMC7334198