Schema for TCGA Pan-Cancer - TCGA Pan-Cancer mutations: 33 TCGA Cancer Projects Summary (Pan-Can 33)
  Database: hg38    Primary Table: BLCA    Data last updated: 2019-05-03
Big Bed File Download: /gbdb/hg38/gdcCancer/BLCA.bb
Item Count: 132,956
The data is stored in the binary BigBed format.

Format description: somatic variants converted from MAF files obtained through the NCI GDC
field                        example                               description
chrom                        chr1                                  Chromosome (or contig, scaffold, etc.)
chromStart                   166070285                             Start position in chromosome
chromEnd                     166070286                             End position in chromosome
name                         C>T                                   Name of item
score                        1                                     Score from 0-1000
strand                       .                                     + or -
thickStart                   166070285                             Start of where display should be thick (start codon)
thickEnd                     166070286                             End of where display should be thick (stop codon)
reserved                     0,0,0                                 Used as itemRgb as of 2004-11-22
blockCount                   1                                     Number of blocks
blockSizes                   1                                     Comma separated list of block sizes
chromStarts                  0                                     Start positions relative to chromStart
sampleCount                  1                                     Number of samples with this variant
freq                         0.00242718446602                      Variant frequency
Hugo_Symbol                  FAM78B                                Hugo symbol
Entrez_Gene_Id               149297                                Entrez Gene Id
Variant_Classification       Nonsense_Mutation                     Class of variant
Variant_Type                 SNP                                   Type of variant
Reference_Allele             C                                     Reference allele
Tumor_Seq_Allele1            C                                     Tumor allele 1
Tumor_Seq_Allele2            T                                     Tumor allele 2
dbSNP_RS                                                           dbSNP RS number
dbSNP_Val_Status                                                   dbSNP validation status
days_to_death                1804.0                                Number of days till death
cigarettes_per_day           --                                    Number of cigarettes per day
weight                       89.2                                  Weight
alcohol_history              --                                    Any alcohol consumption?
alcohol_intensity            --                                    Frequency of alcohol consumption
bmi                          26.9291148412                         Body mass index
years_smoked                 --                                    Number of years smoked
height                       182.0                                 Height
gender                       male                                  Gender
project_id                   TCGA-BLCA                             TCGA Project id
ethnicity                    not hispanic or latino                Ethnicity
Tumor_Sample_Barcode         TCGA-G2-A2EO-01A-11D-A17V-08          Tumor sample barcode
Matched_Norm_Sample_Barcode  TCGA-G2-A2EO-11A-21D-A17V-08          Matched normal sample barcode
case_id                      98b9ad62-76ed-43e3-91bc-b8f065d79673  Case ID number
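
As an illustration of how the schema above maps onto a line of text output, the following is a minimal Python sketch, not part of the UCSC tooling. It assumes the line is tab-separated and carries exactly the 37 fields listed above in that order, as bigBedToBed output for a per-project file such as BLCA.bb would be expected to. The function name parse_variant_line and the choice of which fields to convert to numbers are hypothetical. Clinical fields are left as strings because they may hold "--" or be empty.

```python
# Field names in the order given by the schema (12 BED fields + 25 extra fields).
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "sampleCount", "freq", "Hugo_Symbol", "Entrez_Gene_Id",
    "Variant_Classification", "Variant_Type", "Reference_Allele",
    "Tumor_Seq_Allele1", "Tumor_Seq_Allele2", "dbSNP_RS", "dbSNP_Val_Status",
    "days_to_death", "cigarettes_per_day", "weight", "alcohol_history",
    "alcohol_intensity", "bmi", "years_smoked", "height", "gender",
    "project_id", "ethnicity", "Tumor_Sample_Barcode",
    "Matched_Norm_Sample_Barcode", "case_id",
]

# Assumption: these fields are always plain integers in the per-project files.
INT_FIELDS = ("chromStart", "chromEnd", "score", "thickStart", "thickEnd",
              "blockCount", "sampleCount")


def parse_variant_line(line):
    """Split one tab-separated line into a dict keyed by schema field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(
            "expected %d fields, got %d" % (len(FIELDS), len(values))
        )
    record = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["freq"] = float(record["freq"])
    return record
```

Empty values such as a missing dbSNP_RS come back as empty strings, so callers can tell them apart from the "--" placeholder used in the clinical fields.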

Sample Rows
 
chrom	chromStart	chromEnd	name	score	strand	thickStart	thickEnd	reserved	blockCount	blockSizes	chromStarts	sampleCount	freq	Hugo_Symbol	Entrez_Gene_Id	Variant_Classification	Variant_Type	Reference_Allele	Tumor_Seq_Allele1	Tumor_Seq_Allele2	dbSNP_RS	dbSNP_Val_Status	days_to_death	cigarettes_per_day	weight	alcohol_history	alcohol_intensity	bmi	years_smoked	height	gender	project_id	ethnicity	Tumor_Sample_Barcode	Matched_Norm_Sample_Barcode	case_id
chr1	166070285	166070286	C>T	1	.	166070285	166070286	0,0,0	1	1	0	1	0.00242718446602	FAM78B	149297	Nonsense_Mutation	SNP	C	C	T			1804.0	--	89.2	--	--	26.9291148412	--	182.0	male	TCGA-BLCA	not hispanic or latino	TCGA-G2-A2EO-01A-11D-A17V-08	TCGA-G2-A2EO-11A-21D-A17V-08	98b9ad62-76ed-43e3-91bc-b8f065d79673
chr1	166070482	166070483	C>G	1	.	166070482	166070483	0,0,0	1	1	0	1	0.00242718446602	FAM78B	149297	Missense_Mutation	SNP	C	C	G	novel		--	1.3698630137	107.0	--	--	36.5924557984	--	171.0	male	TCGA-BLCA	not hispanic or latino	TCGA-G2-AA3D-01A-11D-A391-08	TCGA-G2-AA3D-10A-01D-A394-08	e1540865-69c2-4b60-a78e-97d0ce2957dc
chr1	166070695	166070696	T>G	1	.	166070695	166070696	0,0,0	1	1	0	1	0.00242718446602	FAM78B	149297	Missense_Mutation	SNP	T	T	G	novel		--	--	89.0	--	--	26.5758905909	--	183.0	male	TCGA-BLCA	not hispanic or latino	TCGA-GC-A4ZW-01A-11D-A26M-08	TCGA-GC-A4ZW-10A-01D-A26K-08	a9e17a7d-7ff4-44e3-b8c9-b0d2f46b6cd5
chr1	166070696	166070697	C>T	1	.	166070696	166070697	0,0,0	1	1	0	1	0.00242718446602	FAM78B	149297	Silent	SNP	C	C	T	rs748492887	byFrequency	--	--	89.0	--	--	26.5758905909	--	183.0	male	TCGA-BLCA	not hispanic or latino	TCGA-GC-A4ZW-01A-11D-A26M-08	TCGA-GC-A4ZW-10A-01D-A26K-08	a9e17a7d-7ff4-44e3-b8c9-b0d2f46b6cd5
chr1	166166104	166166106	insCATGGATTATTTGAGATA	1	.	166166104	166166106	0,0,0	1	2	0	1	0.00242718446602	FAM78B	149297	In_Frame_Ins	INS	-	-	CATGGATTATTTGAGATA	novel		--	1.53424657534	85.0	--	--	27.131411791	--	177.0	male	TCGA-BLCA	not hispanic or latino	TCGA-ZF-AA4V-01A-11D-A38G-08	TCGA-ZF-AA4V-10A-01D-A38J-08	ccd65bc8-82ef-453e-b4bc-a005cc2262d5
chr1	166840977	166840978	C>T	1	.	166840977	166840978	0,0,0	1	1	0	1	0.00242718446602	POGK	57645	Missense_Mutation	SNP	C	C	T	rs780719034		--	1.72602739726	44.0	--	--	20.9274673008	--	145.0	female	TCGA-BLCA	not hispanic or latino	TCGA-YC-A89H-01A-11D-A364-08	TCGA-YC-A89H-10A-01D-A362-08	95520295-90d3-4b4e-86b6-4bd856723315
chr1	166841028	166841029	G>A	1	.	166841028	166841029	0,0,0	1	1	0	1	0.00242718446602	POGK	57645	Missense_Mutation	SNP	G	G	A			391.0	1.64383561644	87.0	--	--	29.0687961509	--	173.0	male	TCGA-BLCA	not hispanic or latino	TCGA-FD-A3SS-01A-12D-A22Z-08	TCGA-FD-A3SS-10A-01D-A22Z-08	7b98b829-fdc7-4719-bee9-c83f6154019c
chr1	166848942	166848943	G>A	1	.	166848942	166848943	0,0,0	1	1	0	1	0.00242718446602	POGK	57645	Missense_Mutation	SNP	G	G	A	novel		1556.0	--	61.0	--	--	19.6926652893	--	176.0	male	TCGA-BLCA	not reported	TCGA-4Z-AA82-01A-11D-A391-08	TCGA-4Z-AA82-10A-01D-A394-08	32e79b35-a33a-4ca0-bfb7-7ea7e4e26568
chr1	166849302	166849303	C>T	1	.	166849302	166849303	0,0,0	1	1	0	1	0.00242718446602	POGK	57645	Nonsense_Mutation	SNP	C	C	T	novel		--	0.383561643836	103.0	--	--	34.0203461488	--	174.0	male	TCGA-BLCA	not hispanic or latino	TCGA-DK-A6B6-01A-11D-A30E-08	TCGA-DK-A6B6-10A-01D-A30H-08	ab6580e4-5de9-4361-b06a-1d20e5571890
chr1	166849343	166849344	G>A	1	.	166849343	166849344	0,0,0	1	1	0	1	0.00242718446602	POGK	57645	Silent	SNP	G	G	A			272.0	1.3698630137	72.57	--	--	22.9558622383	--	177.8	male	TCGA-BLCA	not hispanic or latino	TCGA-FD-A3B5-01A-11D-A20D-08	TCGA-FD-A3B5-10A-01D-A20D-08	462d5a6b-9f39-4aae-a35b-d06a7811d053

TCGA Pan-Cancer (gdcCancer) Track Description
 

Description

This track shows the genomic positions of somatic variants found through whole genome sequencing of tumors as part of The Cancer Genome Atlas (TCGA) by the National Cancer Institute, made available through the Genomic Data Commons Portal. The data shown here is sometimes called the "Pan-Cancer dataset", a collection of thirty-three TCGA projects processed in a uniform way.

Display Conventions and Configuration

Variants can be filtered by project ID and gender from the track details page. The "All" button lets the user specify whether a variant must match all of the checked values or only one of them to satisfy the filter.
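
The difference between the two filter modes can be sketched in Python as a set comparison. This is only an illustration of the logic described above, not the browser's implementation. It assumes that a variant seen in several projects stores its values as one comma-separated string; that storage format, the function name passes_value_filter and its parameters are assumptions made for the example.

```python
def passes_value_filter(field_value, checked, require_all=False, sep=","):
    """Return True if a variant's field value satisfies the checked values.

    field_value: string from the variant record, possibly a separated list
                 (assumption: multi-valued fields are comma-separated).
    checked:     iterable of values ticked by the user; empty means no filter.
    require_all: True mimics the "All" mode, False the "one of" mode.
    """
    present = {v.strip() for v in field_value.split(sep) if v.strip()}
    wanted = set(checked)
    if not wanted:
        return True  # nothing checked, so nothing is filtered out
    if require_all:
        return wanted <= present  # every checked value must be present
    return bool(wanted & present)  # at least one checked value is present
```

For a variant found only in TCGA-BLCA, checking both TCGA-BLCA and TCGA-LUAD keeps the variant in "one of" mode and hides it in "All" mode.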

The vertical viewing range in full mode can also be used to filter which variants are shown. Variants whose sampleCount is below the minimum or above the maximum value specified in the viewing range are not displayed.
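
The same range rule can be applied to downloaded records. The sketch below is a hypothetical helper, not browser code. It assumes each record is a dict with an integer sampleCount, and it treats both bounds as inclusive, which follows from the wording that only values below the minimum or above the maximum are hidden.

```python
def visible_variants(records, min_value, max_value):
    """Keep records whose sampleCount lies within [min_value, max_value]."""
    return [
        record for record in records
        if min_value <= record["sampleCount"] <= max_value
    ]
```

For example, with a viewing range of 2 to 10, a variant seen in a single sample would be dropped while one seen in five samples would be kept.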

Data access

The raw data can be explored interactively with the Table Browser or the Data Integrator.

For automated download and analysis, the genome annotation for all thirty-three projects is stored in a bigBed file that can be downloaded from our download server. That directory also contains a separate bigBed file for each of the thirty-three projects. Individual regions or the whole genome annotation can be extracted with our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only the features within a given range, e.g.,

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/gdcCancer/gdcCancer.bb -chrom=chr21 -start=0 -end=100000000 stdout
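
For scripted use, the command above can be wrapped in Python. The following is a minimal sketch that assumes the bigBedToBed binary is installed and on the PATH. The helper names build_bigbed_command and fetch_region are hypothetical; only the -chrom, -start and -end options and the stdout output target shown in the command above are used.

```python
import subprocess

# URL of the combined Pan-Cancer bigBed file, as given in the command above.
GDC_CANCER_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/gdcCancer/gdcCancer.bb"


def build_bigbed_command(source, chrom=None, start=None, end=None, output="stdout"):
    """Build the bigBedToBed argument list; omitted options are left out."""
    cmd = ["bigBedToBed", source]
    if chrom is not None:
        cmd.append("-chrom=%s" % chrom)
    if start is not None:
        cmd.append("-start=%d" % start)
    if end is not None:
        cmd.append("-end=%d" % end)
    cmd.append(output)
    return cmd


def fetch_region(source, chrom, start, end):
    """Run bigBedToBed and return its output as a list of BED lines.

    Requires the bigBedToBed binary and, for a URL source, network access.
    """
    cmd = build_bigbed_command(source, chrom, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()
```

Calling fetch_region(GDC_CANCER_URL, "chr21", 0, 100000000) would run the same query as the command line shown above.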

Methods

All MuTect variant calls were downloaded from the GDC portal in January 2019 and converted at UCSC to the bigBed format with a short script, cancerMafToBigBed.
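
The core of such a reformatting step is mapping MAF coordinates and alleles onto BED fields. The sketch below is not the cancerMafToBigBed script and makes no claim about how it works. It is an illustration that assumes the usual conversion from 1-based, closed MAF positions to 0-based, half-open BED positions, and it builds item names only for the two patterns visible in the sample rows (for example "C>T" for an SNP and "ins" plus the inserted sequence for an insertion). The function name maf_to_bed_fields is hypothetical.

```python
def maf_to_bed_fields(maf_row):
    """Derive chrom, chromStart, chromEnd and name from one MAF row (a dict).

    Assumption: MAF Start_Position/End_Position are 1-based and inclusive,
    so only the start needs to be shifted down by one for BED.
    """
    chrom_start = int(maf_row["Start_Position"]) - 1
    chrom_end = int(maf_row["End_Position"])
    ref = maf_row["Reference_Allele"]
    alt = maf_row["Tumor_Seq_Allele2"]
    variant_type = maf_row["Variant_Type"]

    if variant_type == "SNP":
        name = "%s>%s" % (ref, alt)
    elif variant_type == "INS":
        name = "ins" + alt
    else:
        # Other variant types do not appear in the sample rows, so their
        # naming is not guessed here.
        raise NotImplementedError("naming for %s is not covered" % variant_type)

    return {
        "chrom": maf_row["Chromosome"],
        "chromStart": chrom_start,
        "chromEnd": chrom_end,
        "name": name,
    }
```

Under these assumptions, a MAF SNP at position 166070286 on chr1 with reference C and tumor allele T would yield the interval 166070285 to 166070286 and the name C>T.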

Credits

Thanks to GDC for making the TCGA data available on their web site.