Schema for Genome In a Bottle - Genome In a Bottle Structural Variants and Trios
Database: hg19
Primary Table: ashkenazimTrio
VCF File Download: /gbdb/hg19/giab/AshkenazimTrio/merged.vcf.gz
Format description: The fields of a Variant Call Format data line
field | description |
chrom | An identifier from the reference genome |
pos | The reference position, with the 1st base having position 1 |
id | Semicolon-separated list of unique identifiers where available |
ref | Reference base(s) |
alt | Comma-separated list of alternate non-reference alleles called on at least one of the samples |
qual | Phred-scaled quality score for the assertion made in ALT, i.e. -10log_10 prob(call in ALT is wrong) |
filter | PASS if this position has passed all filters. Otherwise, a semicolon-separated list of codes for filters that fail |
info | Additional information encoded as a semicolon-separated series of short keys with optional comma-separated values |
format | If genotype columns are specified in header, a colon-separated list of short keys starting with GT |
genotypes | If genotype columns are specified in header, a tab-separated set of genotype column values; each value is a colon-separated list of values corresponding to keys in the format column |
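The field layout above can be illustrated with a minimal Python sketch that splits one tab-separated VCF data line into the ten fields and converts the Phred-scaled QUAL back to an error probability. The names `VcfRecord`, `parse_vcf_line`, `phred_to_error_prob`, and `EXAMPLE_LINE` are hypothetical, and the example line uses a shortened INFO value because the INFO column in the sample rows is truncated. It is not the parser used by the browser.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


@dataclass
class VcfRecord:
    chrom: str
    pos: int                      # 1-based reference position
    ids: List[str]                # empty when the ID column is "."
    ref: str
    alt: List[str]                # comma-separated in the file
    qual: Optional[float]         # None when the QUAL column is "."
    filters: List[str]            # ["PASS"] or a list of failed filter codes
    info: Dict[str, Union[List[str], bool]]
    format_keys: List[str]        # colon-separated keys, starting with GT
    genotypes: List[str]          # one raw colon-separated string per sample


def parse_vcf_line(line: str) -> VcfRecord:
    """Split one VCF data line (not a header line) into its fields."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    info: Dict[str, Union[List[str], bool]] = {}
    if cols[7] != ".":
        for item in cols[7].split(";"):
            key, _, value = item.partition("=")
            # Keys without a value are flags; others hold comma-separated values.
            info[key] = value.split(",") if value else True
    return VcfRecord(
        chrom=cols[0],
        pos=int(cols[1]),
        ids=[] if cols[2] == "." else cols[2].split(";"),
        ref=cols[3],
        alt=cols[4].split(","),
        qual=None if cols[5] == "." else float(cols[5]),
        filters=cols[6].split(";"),
        info=info,
        format_keys=cols[8].split(":") if len(cols) > 8 else [],
        genotypes=cols[9:],
    )


def phred_to_error_prob(qual: float) -> float:
    """Invert QUAL = -10 * log10(p) to recover p."""
    return 10 ** (-qual / 10)


# Hypothetical example line; the INFO value is shortened for illustration.
EXAMPLE_LINE = (
    "1\t568161\t.\tC\tT\t50\tPASS\tplatforms=2;platformnames=Illumina,10X\t"
    "GT:PS:DP:ADALL:AD:GQ\t./.:.:.:.:.:.\t0/1:.:587:124,97:38,24:297"
)
record = parse_vcf_line(EXAMPLE_LINE)
```

Under this formula, a QUAL of 50 corresponds to an estimated error probability of 10^-5 for the call.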
Sample Rows
chrom | pos | id | ref | alt | qual | filter | info | format | genotypes |
1 | 118617 | rs372912307 | T | C | 50 | PASS | platforms=1;platformnames=10X;datasets=1;datasetnames=10XChromium;callsets=1;callsetnames=10XGATKhaplo;datasetsmissingcall=HiSeq ... | GT:DP:ADALL:AD:GQ:IGT:IPS:PS | 1|1:0:0,0:0,0:99:1/1:.:HOMVAR | ./.:.:.:.:.:.:.:. | ... |
1 | 565976 | rs9283151 | C | T | 50 | PASS | platforms=2;platformnames=Illumina,10X;datasets=4;datasetnames=HiSeq250x250,HiSeqPE300x,10XChromium,HiSeqMatePair;callsets=7;cal ... | GT:PS:DP:ADALL:AD:GQ | ./.:.:.:.:.:. | ./.:.:.:.:.:. | ... |
1 | 566048 | rs6421780 | G | A | 50 | PASS | platforms=2;platformnames=Illumina,10X;datasets=4;datasetnames=HiSeq250x250,HiSeqMatePair,HiSeqPE300x,10XChromium;callsets=7;cal ... | GT:PS:DP:ADALL:AD:GQ | ./.:.:.:.:.:. | ./.:.:.:.:.:. | ... |
1 | 567239 | rs78150957 | CG | C | 50 | PASS | platforms=1;platformnames=Illumina;datasets=3;datasetnames=HiSeqPE300x,HiSeq250x250,HiSeqMatePair;callsets=6;callsetnames=HiSeqP ... | GT:DP:ADALL:AD:GQ:IGT:IPS:PS | 1|1:514:0,203:0,131:246:1/1:.:HOMVAR | ./.:.:.:.:.:.:.:. | ... |
1 | 568161 | . | C | T | 50 | PASS | platforms=2;platformnames=Illumina,10X;datasets=3;datasetnames=HiSeq250x250,HiSeqPE300x,10XChromium;callsets=5;callsetnames=HiSe ... | GT:PS:DP:ADALL:AD:GQ | ./.:.:.:.:.:. | 0/1:.:587:124,97:38,24:297 | ... |
1 | 568214 | rs373437560 | C | T | 50 | PASS | platforms=2;platformnames=Illumina,10X;datasets=4;datasetnames=HiSeq250x250,HiSeqPE300x,10XChromium,HiSeqMatePair;callsets=7;cal ... | GT:PS:DP:ADALL:AD:GQ | ./.:.:.:.:.:. | ./.:.:.:.:.:. | ... |
1 | 568412 | rs377573539 | T | C | 50 | PASS | platforms=3;platformnames=Illumina,10X,Solid;datasets=5;datasetnames=HiSeqPE300x,HiSeq250x250,10XChromium,HiSeqMatePair,SolidSE7 ... | GT:DP:ADALL:AD:GQ:IGT:IPS:PS | 0/1:1146:303,141:58,34:545:0/1:.:. | ./.:.:.:.:.:.:.:. | ... |
1 | 568451 | . | C | T | 50 | PASS | platforms=3;platformnames=Illumina,10X,Solid;datasets=5;datasetnames=HiSeq250x250,HiSeqPE300x,10XChromium,HiSeqMatePair,SolidSE7 ... | GT:DP:ADALL:AD:GQ:IGT:IPS:PS | 0/1:1135:287,136:67,33:556:0/1:.:. | ./.:.:.:.:.:.:.:. | ... |
1 | 568463 | rs2153587 | A | G | 50 | PASS | platforms=2;platformnames=Illumina,Solid;datasets=4;datasetnames=HiSeq250x250,HiSeqPE300x,HiSeqMatePair,SolidSE75bp;callsets=7;c ... | GT:DP:ADALL:AD:GQ:IGT:IPS:PS | 0/1:1136:292,130:65,36:385:0/1:.:. | ./.:.:.:.:.:.:.:. | ... |
1 | 568478 | . | C | T | 50 | PASS | platforms=2;platformnames=Illumina,10X;datasets=4;datasetnames=HiSeq250x250,HiSeqPE300x,10XChromium,HiSeqMatePair;callsets=6;cal ... | GT:DP:ADALL:AD:GQ:IGT:IPS:PS | 0/1:1088:247,116:67,31:364:0/1:.:. | ./.:.:.:.:.:.:.:. | ... |
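To read the genotype columns in the sample rows, each sample value is paired position by position with the keys in the format column. The following is a small sketch under that assumption; `decode_genotype` is a hypothetical helper name, and the inputs are copied from the row at position 568412 above.

```python
from typing import Dict, Optional


def decode_genotype(format_str: str, sample_str: str) -> Dict[str, Optional[str]]:
    """Pair colon-separated FORMAT keys with one sample's colon-separated values.

    A lone "." marks a missing value and is returned as None. The GT value
    "./." (a no-call) is kept as a string because it is not a lone ".".
    """
    keys = format_str.split(":")
    values = sample_str.split(":")
    if len(values) > len(keys):
        raise ValueError("sample has more values than FORMAT has keys")
    # Trailing values may be omitted in VCF; pad them as missing.
    values += ["."] * (len(keys) - len(values))
    return {k: (None if v == "." else v) for k, v in zip(keys, values)}


# FORMAT and first genotype column from the sample row at position 568412.
called = decode_genotype(
    "GT:DP:ADALL:AD:GQ:IGT:IPS:PS",
    "0/1:1146:303,141:58,34:545:0/1:.:.",
)
# Second genotype column of the same row: a sample with no call.
no_call = decode_genotype("GT:DP:ADALL:AD:GQ:IGT:IPS:PS", "./.:.:.:.:.:.:.:.")
```

Here `called["GT"]` is `"0/1"` and `called["DP"]` is `"1146"`, while every key of `no_call` other than `GT` maps to `None`.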
Genome In a Bottle (giab) Track Description
Description
The tracks listed here contain data from the Genome in a Bottle Consortium (GIAB), an open,
public consortium hosted by NIST. The priority of GIAB is to develop
reference standards, reference methods, and reference data by authoritative characterization of
human genomes for use in benchmarking, including analytical validation and technology
development that will support translation of whole human genome sequencing to clinical practice. The
sole purpose of this work is to provide validated variants and regions to enable technology and
bioinformatics developers to benchmark and optimize their detection methods.
The Ashkenazim and the Chinese Trio tracks show benchmark SNV calls from two
son/father/mother trios of Ashkenazi Jewish and Han Chinese ancestry from the
Personal Genome Project,
consented for commercial redistribution.
The Genome In a Bottle Structural Variants track shows benchmark SV calls (nssv)
and variant regions (nsv) from the son (HG002/NA24385) of the Ashkenazi Jewish trio:
5,262 insertions and 4,095 deletions, > 50 bp, in 2.51 Gb of the genome.
Samples are disseminated as National Institute of Standards and Technology (NIST)
Reference Materials.
Display Conventions and Configuration
These tracks are multi-view composite tracks that contain multiple data types (views). Each view
within a track has separate display controls, as described
here.
Unlike a regular genome browser track, the Ashkenazim and the Chinese Trio tracks display
the genome variants of each individual as two haplotypes; SNPs and small insertions and
deletions are mapped to each haplotype based on the phasing information in the VCF file.
Haplotype 1 and haplotype 2 are displayed as two separate black lanes across the
browser window region. Each variant is drawn as a vertical dash. Homozygous variants appear
as identical dashes on both haplotype lanes. Phased heterozygous variants are placed on
one of the haplotype lanes, and unphased heterozygous variants are displayed in the area
between the two haplotype lanes.
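The placement rule described above can be sketched in Python as a function from a diploid GT string to the lanes where a dash would appear. This is only an illustration of the stated convention, not the browser's drawing code; `lanes_for_genotype` and the lane labels `"hap1"`, `"hap2"`, and `"between"` are hypothetical names.

```python
import re
from typing import List


def lanes_for_genotype(gt: str) -> List[str]:
    """Return the display lanes for one diploid genotype string.

    "|" separates phased alleles and "/" separates unphased alleles.
    Allele "0" is the reference; any other number is an alternate allele.
    """
    alleles = re.split(r"[|/]", gt)
    if len(alleles) != 2 or "." in alleles:
        return []                      # no-call or non-diploid: nothing drawn
    variant = [a != "0" for a in alleles]
    if not any(variant):
        return []                      # homozygous reference: no variant
    if all(variant) and alleles[0] == alleles[1]:
        return ["hap1", "hap2"]        # homozygous variant: both lanes
    if "|" in gt:
        # Phased heterozygous: draw on the lane(s) carrying an alternate allele.
        return [lane for lane, v in zip(["hap1", "hap2"], variant) if v]
    return ["between"]                 # unphased heterozygous


placements = {gt: lanes_for_genotype(gt) for gt in ["1|1", "0|1", "1|0", "0/1", "./."]}
```

With this rule, `"1|1"` lands on both lanes, `"0|1"` on the second lane only, `"0/1"` between the lanes, and `"./."` is not drawn.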
Predicted de novo variants and variants that are inconsistent with phasing in the trio son can be
colored in red using the track Configuration options.
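The source does not say how de novo variants are predicted. One common approach, shown here purely as an assumption, is a Mendelian consistency check: a child's genotype is consistent when one allele can come from the father and the other from the mother. The function name `is_mendelian_consistent` is hypothetical.

```python
import re
from typing import List, Optional


def _alleles(gt: str) -> Optional[List[str]]:
    """Split a diploid GT string; return None for no-calls or other ploidy."""
    parts = re.split(r"[|/]", gt)
    if len(parts) != 2 or "." in parts:
        return None
    return parts


def is_mendelian_consistent(child: str, father: str, mother: str) -> Optional[bool]:
    """True if the child's alleles can be inherited one from each parent.

    Returns None when any of the three genotypes is missing, since the
    check cannot be decided. A False result marks a candidate de novo
    variant or a genotyping error; it does not distinguish the two.
    """
    c, f, m = _alleles(child), _alleles(father), _alleles(mother)
    if c is None or f is None or m is None:
        return None
    a, b = c
    return (a in f and b in m) or (b in f and a in m)
```

For example, a child genotype of `0/1` with both parents `0/0` fails the check, while the same child with a `0/1` father passes.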
Data Access
The raw data can be explored interactively with the Table Browser or the Data Integrator.
For automated analysis, the data may be queried from our REST API.
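As a rough sketch of an automated query, the snippet below only builds a request URL for the UCSC REST API `getData/track` endpoint and does not send it. The track name is taken from the primary table listed on this page, the region is arbitrary, and whether this particular track is served by the endpoint, and under which chromosome naming, is an assumption to verify against the API documentation. `build_track_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL for one region (0-based start, as in the API)."""
    query = urlencode(
        {"genome": genome, "track": track, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API_BASE}/getData/track?{query}"


# Assumed values: hg19 and ashkenazimTrio come from this page; the region is arbitrary.
url = build_track_url("hg19", "ashkenazimTrio", "chr1", 565000, 569000)
```

The resulting URL can be fetched with any HTTP client; the API returns JSON.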
Benchmark VCF and BED files for small variants are available for GRCh37 and GRCh38 under each
genome at the NCBI FTP site.
Structural variants are available for GRCh37 at dbVar under accession nstd175.
References
Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY
et al.
An open resource for accurately benchmarking small variant and reference calls.
Nat Biotechnol. 2019 May;37(5):561-566.
PMID: 30936564; PMC: PMC6500473
Zook JM, Hansen NF, Olson ND, Chapman L, Mullikin JC, Xiao C, Sherry S, Koren S, Phillippy AM,
Boutros PC et al.
A robust benchmark for detection of germline large deletions and insertions.
Nat Biotechnol. 2020 Jun 15.
PMID: 32541955