Schema for DGV Struct Var - Database of Genomic Variants: Structural Variation (CNV, Inversion, In/del)
  Database: hg19    Primary Table: dgvGold    Data last updated: 2019-05-17
Big Bed File Download: /gbdb/hg19/dgv/dgvGold.bb
Item Count: 38,185
Format description: Database of Genomic Variants Gold Standard Curated Variants
field | example | description
chrom | chr1 | Chromosome name
chromStart | 166179412 | Maximum boundary of CNV
chromEnd | 166196987 | Maximum boundary of CNV
name | gssvG2660 | Name from gff
score | 0 | Not used
strand | . | Not used
thickStart | 166196987 | Same as chromEnd
thickEnd | 166196987 | Same as chromEnd
reserved | 0,0,200 | Color of item. Blue for gain and red for loss
blockCount | 3 | Number of blocks
blockSizes | 1,13137,1 | Size of each block
chromStarts | 0,3828,17574 | Start position of each block relative to chromStart
dgvID | gssvG2660 | Name of CNV from DGV
variant_type | CNV | Variant Type
variant_sub_type | Gain | Gain or Loss
inner_rank | 7 | Rank of the CNV used to assign the blocks
num_variants | 207 | Number of supporting variants
variants | essv46774, essv10180470, essv10180471, essv10180472, essv10180473, essv10180474, essv10180475, essv10180476, essv10180477, essv10180478, essv10180479, essv10180480, essv10180481, essv10180482, essv10180483, essv10180484, essv10180485, essv10180486, essv10180487, essv10180488, essv10180489, essv10180490, essv10180491, essv10180492, essv10180493, essv10180494, essv10180495, essv10180496, essv10180497, essv10180498, essv10180499, essv10180500, essv10180501, essv10180502, essv10180503, essv10180504, essv10180505, essv10180506, essv10180507, essv10180508, essv10180509, essv10180510, essv10180511, essv10180512, essv10180513, essv10180514, essv10180515, essv10180516, essv10180517, essv10180518, essv10180519, essv10180520, essv10180521, essv10180522, essv10180523, essv10180524, essv10180525, essv10180526, essv10180527, essv10180528, essv10180529, essv10180530, essv10180531, essv10180532, essv10180533, essv10180534, essv10180535, essv10180536, essv10180537, essv10180538, essv10180539, essv10180540, essv10180541, essv10180542, essv10180543, essv10180544, essv10180545, essv10180546, essv10180547, essv10180548, essv10180549, essv10180550, essv10180551, essv10180552, essv10180553, essv10180554, essv10180555, essv10180556, essv10180557, essv10180558, essv10180559, essv10180560, essv10180561, essv10180562, essv10180563, essv10180564, essv10180565, essv10180566, essv10180567, essv10180568, essv10180569, essv10180570, essv10180571, essv10180572, essv10180573, essv10180574, essv10180575, essv10180576, essv10180577, essv10180578, essv10180579, essv10180580, essv10180581, essv10180582, essv10180583, essv10180584, essv10180585, essv10180586, essv10180587, essv10180588, essv10180589, essv10180590, essv10180591, essv10180592, essv10180593, essv10180594, essv10180595, essv10180596, essv10180597, essv10180598, essv10180599, essv10180600, essv10180601, essv10180602, essv10180603, essv10180604, essv10180605, essv10180606, essv10180607, essv10180608, essv10180609, essv10180610, essv10180611, essv10180612, essv10180613, essv10180614, essv10180615, essv10180616, essv10180617, essv10180618, essv10180619, essv10180620, essv10180621, essv10180622, essv10180623, essv10180624, essv10180625, essv10180626, essv10180627, essv10180628, essv10180629, essv10180630, essv10180631, essv10180632, essv10180633, essv10180634, essv10180635, essv10180636, essv10180637, essv10180638, essv10180639, essv10180640, essv10180641, essv10180642, essv10180643, essv10180644, essv10180645, essv67600, essv34051, essv7006632, essv46045, essv7006628, essv7006629, essv7006630, essv7006631, nssv1618961, nssv1618962, nssv1618963, nssv1618964, nssv1618965, nssv1618966, nssv1618967, nssv1618968, nssv1618969, nssv1618970, nssv1618971, nssv1618972, nssv1618973, nssv1618974, nssv1618975, nssv1618976, nssv1618977, nssv1618978, nssv1618979, nssv1618980, nssv1618981, nssv1618982 | Supporting variants
num_studies | 4 | Number of studies
Studies | McCarroll2008, Vogler2010, Conrad2009, 1000GenomesPhase3 | Study names in 'Name Year' format
num_platforms | 3 | Number of platforms
Platforms | Affymetrix6.0, NimbleGen42M, Multiple_NGS_Sequencing | Platform names
number_of_algorithms | 4 | Number of CNV detection algorithms
algorithms | PennCNV+Birdseye, RD+PEM, GADA, BirdsEye | CNV detection algorithms used
num_samples | 188 | Number of samples
samples | HG02282, NA20289, NA18865, HG03378, NA18861, HG01912, HG02536, HG03449, HG03372, RW_0254, NA19116, HG02702, HG02634, HG02839, NA19321, NA19119, NA19448, NA19320, NA19113, HG02938, NA19117, HG02281, HG01989, HG01986, HG03246, NA19707, NA19172, NA19704, NA18872, NA18870, HG02476, NA18876, HG02470, NA18874, HG03457, NA20282, HG02585, HG03518, HG03166, HG02628, HG02629, NA19913, HG02981, NA18488, HG03515, HG02621, HG03517, NA19377, NA19434, NA19916, NA19472, NA19360, NA19917, NA19149, HG03091, NA19920, HG02675, HG03301, HG02620, HG03514, HG02462, HG00641, HG02461, NA19383, HG02756, HG02309, HG02594, HG02595, HG03159, HG03397, NA19120, HG02851, NA19331, HG02012, HG02010, NA19318, HG03024, HG02768, HG03472, HG03473, HG03225, HG03476, HG03388, HG02255, HG02315, HG01924, NA19095, RW_0533, NA18520, HG02009, HG01885, HG03259, HG03565, HG02323, HG02484, HG02485, HG01889, HG01956, NA19351, HG03132, HG02577, NA19908, HG03484, NA19152, NA19711, NA19256, HG02976, HG02318, NA19028, NA19238, NA19235, NA18860, HG02332, HG02330, NA19222, HG01894, HG02667, HG02339, HG01556, HG03045, HG01108, NA19438, HG01551, HG03121, HG03123, HG03410, NA19430, HG03126, HG03520, NA19121, RW_0273, NA19127, NA19129, NA19128, HG02861, HG03212, NA20291, NA19221, HG03193, HG03115, HG02108, HG02554, HG02890, HG02896, HG03547, HG03055, NA19429, HG02716, NA19137, HG02811, HG02810, HG02817, HG02799, RW_0179, NA19346, HG03265, HG03428, NA19984, NA19214, HG02891, NA19982, HG03436, HG02881, HG01124, HG02885, HG02054, HG02646, HG02052, HG02703, NA18858, NA18859, HG03157, HG03538, NA18856, NA18857, NA18502, HG02611, NA18500, HG03124, NA19347, NA18507, RW_0003, HG03548, NA19107, HG02804, HG01461, NA19451, HG02807 | Sample names
Frequency | 5.07% | Overall frequency of variants across all studies
PopulationSummary | African 181, Asian 0, European 0, Mexican 0, MiddleEast 0, NativeAmerican 0, NorthAmerican 0, Oceania 0, SouthAmerican 7, Turkish 0, Admixed 0, Unknown 0 | Populations tested across all studies
num_unique_samples_tested | 3706 | Total number of samples tested
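
The schema above can be turned into a small record parser. The following is a minimal Python sketch, assuming the input is one line of bigBedToBed text output with tab-separated columns in the schema order listed above. The names DGV_GOLD_FIELDS, INT_FIELDS and parse_dgv_gold_line are hypothetical helpers introduced for illustration and are not part of any UCSC tool.

```python
# Field names in schema order, copied from the table above.
DGV_GOLD_FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "dgvID", "variant_type", "variant_sub_type",
    "inner_rank", "num_variants", "variants", "num_studies", "Studies",
    "num_platforms", "Platforms", "number_of_algorithms", "algorithms",
    "num_samples", "samples", "Frequency", "PopulationSummary",
    "num_unique_samples_tested",
]

# Columns whose example values in the schema are plain integers.
INT_FIELDS = {
    "chromStart", "chromEnd", "score", "thickStart", "thickEnd",
    "blockCount", "inner_rank", "num_variants", "num_studies",
    "num_platforms", "number_of_algorithms", "num_samples",
    "num_unique_samples_tested",
}


def parse_dgv_gold_line(line):
    """Split one tab-separated dgvGold line into a dict keyed by field name.

    Assumption: columns are tab-separated and appear in schema order.
    """
    values = line.rstrip("\n").split("\t")
    if len(values) != len(DGV_GOLD_FIELDS):
        raise ValueError(
            f"expected {len(DGV_GOLD_FIELDS)} columns, got {len(values)}"
        )
    record = dict(zip(DGV_GOLD_FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    return record
```

List-valued columns such as variants, Studies and samples are left as comma-separated strings, so callers can split them only when needed.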

Sample Rows
 
chrom | chromStart | chromEnd | name | score | strand | thickStart | thickEnd | reserved | blockCount | blockSizes | chromStarts | dgvID | variant_type | variant_sub_type | inner_rank | num_variants | variants | num_studies | Studies | num_platforms | Platforms | number_of_algorithms | algorithms | num_samples | samples | Frequency | PopulationSummary | num_unique_samples_tested
chr1 | 166179412 | 166196987 | gssvG2660 | 0 | . | 166196987 | 166196987 | 0,0,200 | 3 | 1,13137,1 | 0,3828,17574 | gssvG2660 | CNV | Gain | 7 | 207 | essv46774, essv10180470, essv10180471, essv10180472, essv10180473, essv10180474, essv10180475, essv10180476, essv10180477, essv1 ... | 4 | McCarroll2008, Vogler2010, Conrad2009, 1000GenomesPhase3 | 3 | Affymetrix6.0, NimbleGen42M, Multiple_NGS_Sequencing | 4 | PennCNV+Birdseye, RD+PEM, GADA, BirdsEye | 188 | HG02282, NA20289, NA18865, HG03378, NA18861, HG01912, HG02536, HG03449, HG03372, RW_0254, NA19116, HG02702, HG02634, HG02839, NA ... | 5.07% | African 181, Asian 0, European 0, Mexican 0, MiddleEast 0, NativeAmerican 0, NorthAmerican 0, Oceania 0, SouthAmerican 7, Turkis ... | 3706
chr1 | 166368570 | 166371135 | gssvL5926 | 0 | . | 166371135 | 166371135 | 200,0,0 | 3 | 1,2472,1 | 0,61,2564 | gssvL5926 | CNV | Loss | 8 | 4 | essv6766647, essv10180821, essv10180822, essv28957 | 3 | Wong2012b, Ahn2009, 1000GenomesPhase3 | 2 | Illumina_GA, Multiple_NGS_Sequencing | 2 | PEM, RD+PEM | 4 | SJK, HG01871, SSM064, HG01846 | 0.15% | African 0, Asian 4, European 0, Mexican 0, MiddleEast 0, NativeAmerican 0, NorthAmerican 0, Oceania 0, SouthAmerican 0, Turkish ... | 2601
chr1 | 166372633 | 166384158 | gssvL5928 | 0 | . | 166384158 | 166384158 | 200,0,0 | 3 | 1,7186,1 | 0,305,11524 | gssvL5928 | CNV | Loss | 8 | 53 | essv6372996, essv5428815, essv5939838, essv6069637, essv6243102, essv6322157, essv6351064, essv6375500, essv10180823, essv101808 ... | 2 | 1000GenomesPhase1, 1000GenomesPhase3 | 2 | Illumina_II_IIX_HiSeq, Multiple_NGS_Sequencing | 2 | PEM+ReadDepth+SplitRead, RD+PEM | 36 | NA19017, NA18560, NA19712, NA19324, HG03378, NA19381, HG01485, NA18868, HG01398, NA19404, NA19463, HG02442, HG03135, NA19119, NA ... | 1.41% | African 32, Asian 1, European 1, Mexican 0, MiddleEast 0, NativeAmerican 0, NorthAmerican 1, Oceania 0, SouthAmerican 1, Turkish ... | 2562
chr1 | 166396576 | 166397093 | gssvL5929 | 0 | . | 166397093 | 166397093 | 200,0,0 | 2 | 515,1 | 0,516 | gssvL5929 | CNV | Loss | 8 | 62 | essv10180858, essv10180859, essv10180860, essv10180861, essv10180862, essv10180863, essv10180864, essv10180865, essv10180866, es ... | 2 | 1000GenomesPhase1, 1000GenomesPhase3 | 2 | Illumina_II_IIX_HiSeq, Multiple_NGS_Sequencing | 2 | PEM+ReadDepth+SplitRead, RD+PEM | 49 | NA19017, NA19317, NA19308, NA19711, NA19712, NA19324, HG03378, HG02890, NA19381, HG01485, NA18868, HG01398, NA19403, NA19404, NA ... | 1.91% | African 44, Asian 0, European 2, Mexican 0, MiddleEast 0, NativeAmerican 0, NorthAmerican 2, Oceania 0, SouthAmerican 1, Turkish ... | 2562
chr1 | 166464461 | 166471329 | gssvL5930 | 0 | . | 166471329 | 166471329 | 200,0,0 | 3 | 1,6773,1 | 0,43,6867 | gssvL5930 | CNV | Loss | 8 | 10 | essv10180912, essv10180913, essv10180914, essv10180915, essv5860388, essv5989531, essv6201651, essv6371404, essv6459326, essv658 ... | 2 | 1000GenomesPhase1, 1000GenomesPhase3 | 2 | Illumina_II_IIX_HiSeq, Multiple_NGS_Sequencing | 2 | PEM+ReadDepth+SplitRead, RD+PEM | 9 | NA19664, NA19467, HG02010, NA19397, HG00641, NA19075, HG01066, NA19429, HG01048 | 0.35% | African 4, Asian 1, European 0, Mexican 1, MiddleEast 0, NativeAmerican 0, NorthAmerican 0, Oceania 0, SouthAmerican 3, Turkish ... | 2562
chr1 | 166530308 | 166584653 | gssvL5934 | 0 | . | 166584653 | 166584653 | 200,0,0 | 3 | 1,48858,1 | 0,3702,54344 | gssvL5934 | CNV | Loss | 8 | 8 | essv9794033, essv10180920, essv10180921, essv10180922, essv10180923, essv10180924, essv10180925, essv10180926 | 2 | Uddin2014, 1000GenomesPhase3 | 2 | CytoScanHD_2.7M, Multiple_NGS_Sequencing | 2 | ChAS, RD+PEM | 8 | HG03911, NA20889, HG03672, HG03854, 400532MH, HG04047, NA21144, HG03790 | 0.24% | African 0, Asian 5, European 0, Mexican 0, MiddleEast 0, NativeAmerican 2, NorthAmerican 0, Oceania 0, SouthAmerican 0, Turkish ... | 3377
chr1 | 166551372 | 166553114 | gssvL5936 | 0 | . | 166553114 | 166553114 | 200,0,0 | 3 | 1,1363,1 | 0,99,1741 | gssvL5936 | CNV | Loss | 7 | 279 | essv5293844, essv10180927, essv10180928, essv10180929, essv10180930, essv10180931, essv10180932, essv10180933, essv10180934, ess ... | 5 | Bentley2008, 1000GenomesPhase1, McKernan2009, Conrad2009, 1000GenomesPhase3 | 5 | SOLiD, Illumina_GA, NimbleGen42M, Multiple_NGS_Sequencing, Illumina_II_IIX_HiSeq | 4 | PEM, PEM+ReadDepth+SplitRead, RD+PEM, GADA | 123 | HG02666, NA18867, NA18861, NA19917, HG03445, NA20299, HG03376, NA19444, NA19445, NA19446, NA19117, HG03247, HG03088, HG03241, HG ... | 4.80% | African 114, Asian 1, European 0, Mexican 1, MiddleEast 0, NativeAmerican 1, NorthAmerican 1, Oceania 0, SouthAmerican 5, Turkis ... | 2565
chr1 | 166575223 | 166579960 | gssvL5935 | 0 | . | 166579960 | 166579960 | 200,0,0 | 3 | 1,4595,1 | 0,134,4736 | gssvL5935 | CNV | Loss | 8 | 10 | essv5448307, essv10181170, essv10181171, essv10181172, essv10181173, essv10181174, essv10181175, essv10181176, essv10181177, ess ... | 2 | 1000GenomesPhase1, 1000GenomesPhase3 | 2 | Illumina_II_IIX_HiSeq, Multiple_NGS_Sequencing | 2 | PEM+ReadDepth+SplitRead, RD+PEM | 9 | NA19026, HG03911, NA20889, HG03672, NA19385, HG03854, HG04047, NA21144, HG03790 | 0.35% | African 2, Asian 5, European 0, Mexican 0, MiddleEast 0, NativeAmerican 2, NorthAmerican 0, Oceania 0, SouthAmerican 0, Turkish ... | 2562
chr1 | 167764924 | 167767623 | gssvL5968 | 0 | . | 167767623 | 167767623 | 200,0,0 | 3 | 1,2255,1 | 0,389,2698 | gssvL5968 | CNV | Loss | 7 | 113 | essv9740936, essv10181963, essv10181964, essv10181965, essv10181966, essv10181967, essv10181968, essv10181969, essv10181970, ess ... | 5 | 1000GenomesPhase1, Cooper2011, Conrad2009, Boomsma2014, 1000GenomesPhase3 | 5 | Illumina_II_IIX_HiSeq, NimbleGen42M, Custom_Illumina_1.2M, Multiple_NGS_Sequencing, Illumina_HiSeq | 5 | Custom_HMM, PEM+ReadDepth+SplitRead, RD+PEM, GATK+BreakDancer+Pindel+GenomeSTRiP+CNVnator+123SV+DWAC-seq, GADA | 87 | NA20762, NA20517, sample_nssv727776, sample_nssv727777, sample_nssv727774, sample_nssv727775, sample_nssv727773, sample_nssv7277 ... | 1.80% | African 0, Asian 6, European 74, Mexican 3, MiddleEast 0, NativeAmerican 0, NorthAmerican 0, Oceania 0, SouthAmerican 4, Turkish ... | 4832
chr1 | 168023819 | 168026232 | gssvL5979 | 0 | . | 168026232 | 168026232 | 200,0,0 | 3 | 1,1743,1 | 0,432,2412 | gssvL5979 | CNV | Loss | 8 | 3 | essv5268928, essv4561571, essv9740969 | 3 | Bentley2008, McKernan2009, Boomsma2014 | 3 | SOLiD, Illumina_GA, Illumina_HiSeq | 2 | PEM, GATK+BreakDancer+Pindel+GenomeSTRiP+CNVnator+123SV+DWAC-seq | 2 | NA18507, sample_essv9740969 | 0.26% | African 1, Asian 0, European 1, Mexican 0, MiddleEast 0, NativeAmerican 0, NorthAmerican 0, Oceania 0, SouthAmerican 0, Turkish ... | 768
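
In the sample rows above, the Frequency column is consistent with num_samples divided by num_unique_samples_tested, expressed as a percentage with two decimals. The schema does not state this formula, so the Python sketch below treats it as an observation about these ten rows rather than a documented definition. The function name frequency_percent is hypothetical.

```python
def frequency_percent(num_samples, num_unique_samples_tested):
    """Return num_samples / num_unique_samples_tested as a percent string.

    Assumption: this ratio reproduces the Frequency column; it matches the
    sample rows shown above but is not stated in the schema.
    """
    if num_unique_samples_tested <= 0:
        raise ValueError("num_unique_samples_tested must be positive")
    return f"{100.0 * num_samples / num_unique_samples_tested:.2f}%"


# (num_samples, num_unique_samples_tested, Frequency) taken from the sample rows.
SAMPLE_ROW_VALUES = [
    (188, 3706, "5.07%"),
    (4, 2601, "0.15%"),
    (123, 2565, "4.80%"),
    (2, 768, "0.26%"),
]

for n_samples, n_tested, reported in SAMPLE_ROW_VALUES:
    assert frequency_percent(n_samples, n_tested) == reported
```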

DGV Struct Var (dgvPlus) Track Description
 

Description

This track displays copy number variants (CNVs), insertions/deletions (InDels), inversions and inversion breakpoints annotated by the Database of Genomic Variants (DGV), which contains genomic variations observed in healthy individuals. DGV focuses on structural variation, defined as genomic alterations that involve segments of DNA that are larger than 1000 bp. Insertions/deletions of 50 bp or larger are also included.

Display Conventions

This track contains three subtracks:

  • Structural Variant Regions: annotations that have been generated from one or more reported structural variants at the same location.
  • Supporting Structural Variants: the sample-level reported structural variants.
  • Gold Standard Variants: curated variants from a selected number of studies in DGV.

Color is used in these subtracks to indicate the type of variation:

  • Inversions and inversion breakpoints are purple.
  • CNVs and InDels are blue if there is a gain in size relative to the reference.
  • CNVs and InDels are red if there is a loss in size relative to the reference.
  • CNVs and InDels are brown if there are reports of both a loss and a gain in size relative to the reference.

The DGV Gold Standard subtrack uses a boxplot-like display to represent the merging of records, as explained in the Methods section below. In this subtrack, the middle box (where applicable) represents the high-confidence location of the CNV, while the thin lines and end boxes represent the possible range of the CNV.

Clicking on a variant leads to a page with detailed information about the variant, such as the study reference and PubMed abstract link, the study's method and any genes overlapping the variant. Also listed, if available, are the sequencing or array platform used for the study, a sample cohort description, sample size, sample ID(s) in which the variant was observed, observed gains and observed losses. If the particular variant is a merged variant, links to genome browser views of the supporting variants are listed. If the particular variant is a supporting variant, a link to the genome browser view of its merged variant is displayed. A link to DGV's Variant Details page for each variant is also provided.

For most variants, DGV uses accessions from peer archives of structural variation (dbVar at NCBI or DGVa at EBI). These accessions begin with either "essv", "esv", "nssv", or "nsv", followed by a number. Variant submissions processed by EBI begin with "e" and those processed by NCBI begin with "n".

Accessions with ssv are for variant calls on a particular sample, and if they are copy number variants, they generally indicate whether the change is a gain or loss. In a few studies the ssv represents the variant called by a single algorithm. If multiple algorithms were used, overlapping ssvs from the same individual are combined to generate a sample-level sv.

When a study analyzes many samples and several of them carry the same variant, there will be multiple ssvs with the same start and end coordinates. These sample-level variants are then merged to form a representative variant that highlights the common variant found in that study. The result is called a structural variant (sv) record. Accessions with sv are for regions asserted by submitters to contain structural variants, and often span ssv elements for both losses and gains. dbVar and DGVa do not record numbers of losses and gains encompassed within sv regions.

DGV merges clusters of variants that share at least 70% reciprocal overlap in size/location, and assigns an accession beginning with "dgv", followed by an internal variant serial number, followed by an abbreviated study id. For example, the first merged variant from the Shaikh et al. 2009 study (study accession=nstd21) would be dgv1n21. The second merged variant would be dgv2n21 and so forth. Since in this case there is an additional level of clustering, it is possible for an "sv" variant to be both a merged variant and a supporting variant.
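
The accession conventions described above can be sketched as a small classifier. This is a minimal Python illustration, not DGV's own code. The function name classify_accession and the returned dictionary keys are hypothetical. The merged-variant pattern assumes the abbreviated study id is "e" or "n" followed by digits, as in the dgv1n21 example; the text does not spell out the form for other studies.

```python
import re

# "e"/"n" prefix, then "ssv" (sample-level call) or "sv" (region), then a number.
_ARCHIVE_PATTERN = re.compile(r"^([en])(ssv|sv)(\d+)$")
# "dgv", serial number, then an abbreviated study id such as "n21" (assumed form).
_DGV_MERGED_PATTERN = re.compile(r"^dgv(\d+)([en]\d+)$")


def classify_accession(accession):
    """Return a dict describing the accession, or None if it is not recognized."""
    match = _ARCHIVE_PATTERN.match(accession)
    if match:
        prefix, level, _number = match.groups()
        return {
            "kind": "archive",
            "processed_by": "EBI" if prefix == "e" else "NCBI",
            "level": "sample" if level == "ssv" else "region",
        }
    match = _DGV_MERGED_PATTERN.match(accession)
    if match:
        serial, study = match.groups()
        return {"kind": "dgv_merged", "serial": int(serial), "study": study}
    return None
```

For example, classify_accession("essv46774") reports a sample-level call processed by EBI, and classify_accession("dgv1n21") reports serial 1 for study n21. Identifiers in other forms, such as the Gold Standard name gssvG2660, return None.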

For most sv and dgv variants, DGV displays the total number of sample-level gains and/or losses at the bottom of their variant detail page. Since each ssv variant is for one sample, its total is 1.

Methods

Published structural variants are imported from peer archives dbVar and DGVa. DGV then applies quality filters and merges overlapping variants.

For data sets where the variation calls are reported at a sample-by-sample level, DGV merges calls with similar boundaries across the sample set. Only variants of the same type (CNVs, InDels, or inversions) are merged, and gains and losses are merged separately. Sample-level calls that overlap by ≥ 70% are merged in this process.
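
The 70% overlap criterion can be illustrated with a short Python sketch. It assumes the usual definition of reciprocal overlap (the shared length must be at least the threshold fraction of each of the two intervals) and half-open coordinates; the exact computation used by DGV is not given here. The function names are hypothetical.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the smaller of the two overlap fractions for intervals A and B.

    Coordinates are treated as half-open [start, end). Returns 0.0 when the
    intervals do not overlap.
    """
    if a_end <= a_start or b_end <= b_start:
        raise ValueError("intervals must have positive length")
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def should_merge(a, b, threshold=0.70):
    """True if two (start, end) calls meet the reciprocal-overlap threshold.

    The caller is expected to compare only calls of the same variant type
    and the same direction (gain or loss), as described above.
    """
    return reciprocal_overlap(a[0], a[1], b[0], b[1]) >= threshold
```

With these definitions, should_merge((0, 100), (20, 110)) is true because the 80 bp overlap covers 80% of the first call and about 89% of the second, while should_merge((0, 100), (50, 300)) is false because the overlap covers only 20% of the second call.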

The initial criteria for the Gold Standard set require that a variant be found in at least two different studies and in at least two different samples. After low-quality variants are filtered out, the remaining variants are clustered according to a 50% minimum overlap and then merged into a single record. Gains and losses are merged separately.

The highest-ranking variant in the cluster defines the inner box, while the outer lines define the maximum possible start and stop coordinates of the CNV. In this way, the inner box forms a high-confidence CNV location and the thin connecting lines indicate confidence intervals for the location of the CNV.
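
As a rough illustration of how the inner box and outer range relate to the table columns, the Python sketch below converts chromStart, blockSizes and chromStarts into absolute intervals. It assumes that, for records with three blocks, the middle block is the inner box and the two 1 bp end blocks mark the outer boundaries; this reading fits the example record gssvG2660 but is not stated explicitly in the schema. Records with a different block count (for example gssvL5929, which has two blocks) are returned without an inner interval. The function name gold_intervals is hypothetical.

```python
def gold_intervals(chrom_start, chrom_end, block_sizes, chrom_starts):
    """Return (outer, inner, blocks) as absolute half-open intervals.

    block_sizes and chrom_starts are comma-separated strings as stored in
    the table; chrom_starts are offsets relative to chrom_start.
    """
    sizes = [int(x) for x in block_sizes.strip(",").split(",")]
    offsets = [int(x) for x in chrom_starts.strip(",").split(",")]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    blocks = [
        (chrom_start + offset, chrom_start + offset + size)
        for offset, size in zip(offsets, sizes)
    ]
    outer = (chrom_start, chrom_end)
    # Assumption: only the three-block layout has a well-defined middle box.
    inner = blocks[1] if len(blocks) == 3 else None
    return outer, inner, blocks


outer, inner, blocks = gold_intervals(
    166179412, 166196987, "1,13137,1", "0,3828,17574"
)
# outer is (166179412, 166196987); inner is (166183240, 166196377)
```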

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like all others, is available via our API. For bulk processing, however, it is recommended to download the dataset. The genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/dgv/dgvMerged.bb -chrom=chr6 -start=0 -end=1000000 stdout
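
For scripted use, the same command can be assembled and run from Python. This is a minimal sketch that assumes the bigBedToBed binary is installed and on the PATH; it uses only the options shown in the command above. The function names build_bigbedtobed_command and fetch_region are hypothetical.

```python
import subprocess


def build_bigbedtobed_command(url, chrom, start, end):
    """Build the argument list for a range query, mirroring the command above."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def fetch_region(url, chrom, start, end):
    """Run bigBedToBed and return its output lines.

    Assumption: bigBedToBed is installed locally and the URL is reachable.
    """
    command = build_bigbedtobed_command(url, chrom, start, end)
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()
```

Calling fetch_region("http://hgdownload.soe.ucsc.edu/gbdb/hg19/dgv/dgvMerged.bb", "chr6", 0, 1000000) would issue the same query as the command shown above.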

Credits

Thanks to the Database of Genomic Variants for providing these data. When citing the Database of Genomic Variants, please refer to MacDonald et al.

References

Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004 Sep;36(9):949-51. PMID: 15286789

MacDonald JR, Ziman R, Yuen RK, Feuk L, Scherer SW. The Database of Genomic Variants: a curated collection of structural variation in the human genome. Nucleic Acids Res. 2014 Jan;42(Database issue):D986-92. PMID: 24174537; PMC: PMC3965079

Zhang J, Feuk L, Duggan GE, Khaja R, Scherer SW. Development of bioinformatics resources for display and analysis of copy number and other structural variants in the human genome. Cytogenet Genome Res. 2006;115(3-4):205-14. PMID: 17124402