Schema for gnomAD Constraint Metrics - Genome Aggregation Database (gnomAD) - Predicted Constraint Metrics (pLI and Z-scores)

Database: hg19
Primary Table: gnomadMissense
Data last updated: 2020-02-24
Big Bed File Download: /gbdb/hg19/gnomAD/missense/missenseConstrained.bb
Item Count: 6,507
The data is stored in the binary BigBed format.

Format description: Parts of transcripts scored according to how well that region tolerates missense variation.

field	example	description
`chrom`	chr1	Chromosome (or contig, scaffold, etc.)
`chromStart`	166888591	Start position in chromosome
`chromEnd`	166890459	End position in chromosome
`name`	ENST00000271417.3	Name of item
`score`	352	Score from 0-1000
`strand`	-	+ or -
`thickStart`	166888591	Start of where display should be thick (start codon)
`thickEnd`	166890459	End of where display should be thick (stop codon)
`reserved`	0,0,0	RGB color of item
`blockCount`	2	Number of blocks
`blockSizes`	36,516	Comma separated list of block sizes
`chromStarts`	0,1352	Start positions relative to chromStart
`geneName`	ILDR2	Name of corresponding gene
`observed`	29	Number of observed missense variants
`expected`	82.312	Number of expected missense variants
`obs_exp`	0.352	Observed/expected score
`chisq`	34.529	Chi-Squared Difference
`_mouseOver`	O/E: 0.352	Mouseover label

Sample Rows

chrom	chromStart	chromEnd	name	score	strand	thickStart	thickEnd	reserved	blockCount	blockSizes	chromStarts	geneName	observed	expected	obs_exp	chisq	_mouseOver
chr1	166888591	166890459	ENST00000271417.3	352	-	166888591	166890459	0,0,0	2	36,516	0,1352	ILDR2	29	82.312	0.352	34.529	O/E: 0.352
chr1	166890459	166944505	ENST00000271417.3	913	-	166890459	166944505	0,0,0	9	157,217,114,177,147,57,120,333,46	0,1370,5844,14078,15368,18291,35510,36546,54000	ILDR2	143	156.589	0.913	1.179	O/E: 0.913
chr1	171810796	171956806	ENST00000358155.4	277	+	171810796	171956806	0,0,0	3	161,74,11	0,80091,145999	DNM3	9	32.380	0.278	23.714	O/E: 0.278
chr1	171956806	172277979	ENST00000358155.4	667	+	171956806	172277979	0,0,0	14	139,204,99,161,143,136,68,139,87,71,52,114,110,112	0,1278,44735,45438,50652,54342,56718,60945,81152,94165,105157,143508,265906,321061	DNM3	102	152.766	0.668	19.130	O/E: 0.668
chr1	172292468	172376981	ENST00000358155.4	912	+	172292468	172376981	0,0,0	5	12,165,227,237,70	0,55689,63804,65244,84443	DNM3	75	82.234	0.912	0.656	O/E: 0.912
chr1	173907858	173947662	ENST00000367696.2	861	-	173907858	173947662	0,0,0	14	151,116,174,133,91,214,153,168,371,215,282,113,119,37	0,2544,4721,7755,8025,8648,13265,22356,23004,25252,26118,31784,33788,39767	RC3H1	213	247.136	0.862	4.715	O/E: 0.862
chr1	173947662	173962123	ENST00000367696.2	441	-	173947662	173962123	0,0,0	6	96,201,176,240,121,231	0,2284,4202,4893,5974,14230	RC3H1	49	110.974	0.442	34.610	O/E: 0.442
chr1	175292492	175323551	ENST00000367674.2	552	-	175292492	175323551	0,0,0	6	120,164,162,97,152,25	0,999,6717,12354,14171,31034	TNR	58	104.886	0.553	20.959	O/E: 0.553
chr1	175323551	175375850	ENST00000367674.2	950	-	175323551	175375850	0,0,0	16	108,131,144,120,147,120,270,264,90,186,270,151,116,264,477,499	0,1087,1903,5196,8247,9292,10594,11459,12792,25136,31616,36872,39364,42128,48724,51800	TNR	388	408.385	0.950	1.018	O/E: 0.950
chr1	176833419	177030375	ENST00000361833.2	923	-	176833419	177030375	0,0,0	22	238,184,101,136,152,134,269,189,105,128,195,167,151,138,75,85,168,150,108,147,394,162	0,4560,12253,18575,20055,23787,30278,69868,71982,79607,81642,84901,93394,94061,100879,150507,159120,160299,165350,166522,168172, ...	ASTN1	414	448.223	0.924	2.613	O/E: 0.924
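
The schema above follows the usual BED12 layout with six extra columns. As a minimal sketch (the helper names `parse_row` and `block_intervals` are made up for illustration and are not part of any UCSC tool), the following Python reads the first sample row, turns `blockSizes` and `chromStarts` into absolute block coordinates, and recomputes the O/E ratio from the `observed` and `expected` columns. It assumes standard BED coordinates, which are 0-based and half-open.

```python
# Column names in the order given by the schema table.
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "strand",
    "thickStart", "thickEnd", "reserved", "blockCount", "blockSizes",
    "chromStarts", "geneName", "observed", "expected", "obs_exp",
    "chisq", "_mouseOver",
]


def parse_row(line):
    """Split one tab-separated row into a dict keyed by field name."""
    return dict(zip(FIELDS, line.rstrip("\n").split("\t")))


def block_intervals(row):
    """Return absolute (start, end) pairs for every block of the item."""
    start = int(row["chromStart"])
    sizes = [int(s) for s in row["blockSizes"].split(",") if s]
    offsets = [int(s) for s in row["chromStarts"].split(",") if s]
    return [(start + o, start + o + s) for o, s in zip(offsets, sizes)]


# First sample row from the table above.
ROW = (
    "chr1\t166888591\t166890459\tENST00000271417.3\t352\t-\t166888591\t"
    "166890459\t0,0,0\t2\t36,516\t0,1352\tILDR2\t29\t82.312\t0.352\t"
    "34.529\tO/E: 0.352"
)

row = parse_row(ROW)
blocks = block_intervals(row)
ratio = int(row["observed"]) / float(row["expected"])

print(blocks)            # [(166888591, 166888627), (166889943, 166890459)]
print(round(ratio, 3))   # 0.352
```

The last block ends at `chromEnd`, and the recomputed ratio agrees with the stored `obs_exp` value for this row.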

gnomAD Constraint Metrics (gnomadPLI) Track Description


Description

The Genome Aggregation Database (gnomAD) - Predicted Constraint Metrics track set contains metrics of pathogenicity per-gene as predicted for gnomAD v2.1.1 and identifies genes subject to strong selection against various classes of mutation. This track includes several subtracks of constraint metrics calculated at gene (canonical transcript), transcript and transcript-region level. For more information see the following blog post. The metrics include:

- Observed and expected variant counts per transcript/gene
- Observed/Expected ratio (O/E)
- Z-scores of the observed counts compared to expected
- Probability of loss of function intolerance (pLI), for predicted loss-of-function (pLoF) variation only
- Chi-Squared difference of observed to expected counts, for the regional missense constraint track only

Display Conventions and Configuration

There are three "groups" of tracks in this set:

- Gene/Transcript LoF Constraint tracks: Predicted constraint metrics at the whole gene level or whole transcript level for three different types of variation: missense, synonymous, and predicted loss of function. The Gene Constraint track displays metrics for a canonical transcript per gene defined as the longest isoform. The Transcript Constraint track displays metrics for all transcript isoforms. Items on both tracks are shaded according to the pLI score, with outlier items shaded in grey.
- Gene/Transcript Missense Constraint tracks: The missense constraint tracks are built similarly to the LoF constraint tracks, however the items displayed are based on missense Z scores. All items are colored black, and individual Z scores can be seen on mouseover.
- Regional Constraint track: Missense-variation constrained regions at the sub-genic level. This track displays metrics for transcripts that have two or more regions with significantly different levels of missense constraint. All items are colored black.

All tracks follow the general configuration settings for bigBed tracks.
Mouseover on the Gene/Transcript Constraint tracks shows the pLI score and the loss of function observed/expected upper bound fraction (LOEUF), while mouseover on the Regional Constraint track shows only the missense O/E ratio. Clicking on items in any track brings up a table of constraint metrics. Clicking the grey box to the left of the track, or right-clicking and choosing the Configure option, brings up the interface for filtering items based on their pLI score, or labeling the items based on their Ensembl identifier and/or Gene Name.

Methods

Please see the gnomAD browser help page and FAQ for further explanation of the topics below.

Observed and Expected Variant Counts

Observed count: The number of unique single-nucleotide variants in each transcript/gene with 123 or fewer alternative alleles (MAF < 0.1%).

Expected count: A depth-corrected probability prediction model that takes into account sequence context, coverage, and methylation was used to predict expected variant counts. For more information please see Lek et al., 2016.

Variants found in exons with a median depth < 1 were removed from both counts.

The O/E constraint score is the ratio of the observed/expected variants in that gene. Each item in this track shows the O/E ratio for three different types of variation: missense, synonymous, and loss-of-function. The O/E ratio is a continuous measurement of how tolerant a gene or transcript is to a certain class of variation. When a gene has a low O/E value, it is under stronger selection against that class of variation than a gene with a higher O/E value. Because counts depend on gene size and sample size, the precision of the values varies considerably from one gene to the next. Therefore, the 90% confidence interval (CI) is displayed along with the O/E ratio to assist interpretation of the scores. When evaluating how constrained a gene is, it is essential to consider the CI when using O/E.
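
To illustrate why the interval matters, the sketch below shows one way an interval around an O/E ratio could be approximated. It is not taken from this page or from the gnomAD code: it assumes the observed count is Poisson-distributed with mean equal to the expected count times the true ratio, weights a grid of candidate ratios by that likelihood, and reads off the 5% and 95% points. The function names, the grid step, and the upper grid limit of 2.0 are all illustrative assumptions, and `expected` must be greater than zero.

```python
import math


def poisson_log_pmf(k, lam):
    """Log probability of observing k events when lam are expected (lam > 0)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def oe_interval(observed, expected, level=0.90, step=0.001, max_ratio=2.0):
    """Approximate a central interval for the O/E ratio on a grid of ratios.

    Assumption: each candidate ratio is weighted only by its Poisson
    likelihood (a flat weighting over the grid).
    """
    tail = (1.0 - level) / 2.0
    n = int(round(max_ratio / step))
    ratios = [i * step for i in range(1, n + 1)]  # start above 0 to avoid log(0)
    weights = [math.exp(poisson_log_pmf(observed, expected * r)) for r in ratios]
    total = sum(weights)

    lower = upper = None
    cumulative = 0.0
    for r, w in zip(ratios, weights):
        cumulative += w / total
        if lower is None and cumulative >= tail:
            lower = r
        if cumulative >= 1.0 - tail:
            upper = r
            break
    return lower, upper


# Two hypothetical genes with nearly the same O/E but different sizes.
small = oe_interval(3, 8.5)
large = oe_interval(29, 82.312)
print("small gene:", small)
print("large gene:", large)
```

Under these assumptions the gene with fewer expected variants gets a much wider interval even though both ratios are close to 0.35, which is the reason an upper bound is more informative than the point estimate alone.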
In research and clinical interpretation of Mendelian cases, pLI > 0.9 has been widely used for filtering. Accordingly, the gnomAD team suggests using the upper bound of the O/E confidence interval, LOEUF < 0.35, as a threshold if needed. Please see the following sections for more information about how the scores were calculated.

pLI and Z-scores

The pLI and Z-scores of the deviation of observed variant counts relative to the expected number are intended to measure how constrained or intolerant a gene or transcript is to a specific type of variation. Genes or transcripts that are particularly depleted of a specific class of variation (as observed in the gnomAD data set) are considered intolerant of that specific type of variation. Z-scores are available for the missense and synonymous categories and pLI scores are available for loss-of-function variation.

NOTE: The Regional Constraint track data reflects regions within transcripts that are intolerant of missense variation within the ExAC dataset and was calculated with the method described by Samocha et al., 2017.

Missense and Synonymous: Positive Z-scores indicate more constraint (fewer observed variants than expected), and negative scores indicate less constraint (more observed variants than expected). A greater Z-score indicates more intolerance to the class of variation. Z-scores were generated by a sequence-context-based mutational model that predicted the number of expected rare (< 1% MAF) variants per transcript. The square root of the chi-squared value of the deviation of observed counts from expected counts was multiplied by -1 if the observed count was greater than the expected count, and left positive if it was lower. For the synonymous score, each Z-score was corrected by dividing by the standard deviation of all synonymous Z-scores between -5 and 5.
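
The signed square root and the synonymous correction described above can be sketched as follows. This is only an illustration under stated assumptions: the chi-squared value is taken to be (observed - expected)^2 / expected, the -5 to 5 range is treated as inclusive, and the population standard deviation is used. The page does not specify any of these details, and the function names are hypothetical.

```python
import math
import statistics


def raw_z(observed, expected):
    """Signed square root of the chi-squared deviation.

    Assumed chi-squared form: (observed - expected)**2 / expected.
    Positive when fewer variants are observed than expected.
    """
    chi_sq = (observed - expected) ** 2 / expected
    z = math.sqrt(chi_sq)
    return -z if observed > expected else z


def correct_synonymous(raw_scores, low=-5.0, high=5.0):
    """Divide every score by the SD of the scores lying within [low, high]."""
    within = [z for z in raw_scores if low <= z <= high]
    sd = statistics.pstdev(within)  # assumption: population SD
    return [z / sd for z in raw_scores]


# Illustrative counts only, not taken from the gnomAD release.
print(raw_z(120, 100))   # -2.0 (more observed than expected)
print(raw_z(80, 100))    # 2.0  (fewer observed than expected)

raws = [-2.0, 1.0, 0.5, 6.0]          # 6.0 lies outside the [-5, 5] window
corrected = correct_synonymous(raws)  # all scores rescaled by the same SD
```

Scores outside the window are still rescaled; they are only excluded from the standard deviation estimate.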
For the missense scores, a mirrored distribution of all Z-scores between -5 and 0 was created, and then all missense Z-scores were corrected by dividing by the standard deviation of the mirrored distribution.

Loss-of-function: pLI closer to 1 indicates that the gene or transcript cannot tolerate protein truncating variation (nonsense, splice acceptor and splice donor variation). The gnomAD team recommends using pLI >= 0.9 to define the set of transcripts extremely intolerant to truncating variants. pLI is based on the idea that transcripts can be classified into three categories:

- null: heterozygous or homozygous protein truncating variation is completely tolerated
- recessive: heterozygous variants are tolerated but homozygous variants are not
- haploinsufficient: heterozygous variants are not tolerated

An expectation-maximization algorithm was then used to assign a probability of belonging in each class to each gene or transcript. pLI is the probability of belonging in the haploinsufficient class.

Please see Samocha et al., 2014 and Lek et al., 2016 for further discussion of these metrics.

Transcripts Included

Transcripts from GENCODE v19 were filtered according to the following criteria:

- Must have methionine at start of coding sequence
- Must have stop codon at end of coding sequence
- Coding sequence length must be divisible by 3
- Must have at least one observed variant when removing exons with median depth < 1
- Must have reasonable number of missense and synonymous variants as determined by a Z-score cutoff

After filtering the transcript set, 18225 transcripts were left.
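
The three sequence-level criteria above (start codon, stop codon, length divisible by 3) can be checked directly on a coding sequence. The sketch below is a hypothetical helper, not the gnomAD filtering code; it assumes a DNA coding sequence and the standard genetic code, and it does not cover the depth or Z-score criteria, which need variant and coverage data.

```python
# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_sequence_checks(cds):
    """Return True if a coding sequence meets the three sequence criteria."""
    cds = cds.upper()
    starts_with_met = cds.startswith("ATG")      # ATG encodes methionine
    ends_with_stop = cds[-3:] in STOP_CODONS
    whole_codons = len(cds) > 0 and len(cds) % 3 == 0
    return starts_with_met and ends_with_stop and whole_codons


# Toy sequences for illustration.
print(passes_sequence_checks("ATGGCCTAA"))   # True
print(passes_sequence_checks("ATGGCCTA"))    # False: length not divisible by 3
print(passes_sequence_checks("CTGGCCTAA"))   # False: no start methionine
print(passes_sequence_checks("ATGGCCGCC"))   # False: no stop codon
```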
UCSC Track Methods

Gene and Transcript Constraint tracks

Per-gene and per-transcript data were downloaded from the gnomAD Google Storage bucket:

gs://gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_gene.txt.bgz
gs://gnomad-public/release/2.1.1/constraint/gnomad.v2.1.1.lof_metrics.by_transcript.txt.bgz

These data were then joined to the Gencode v19 set of genes/transcripts available at the UCSC Genome Browser and then transformed into a bigBed 12+5. For the full list of commands used to make this track please see the "gnomAD 2 pLI and other loss-of-function metrics" section of the makedoc.

Regional Constraint track

Supplementary Table 4 from the associated publication was downloaded and joined to the Gencode v19 set of transcripts available at UCSC and then transformed into a bigBed 12+6. For the full list of commands used to make this track please see the "gnomAD Missense Constraint Scores" section of the makedoc.

Data Access

The raw data can be explored interactively with the Table Browser, or the Data Integrator. For automated access, this track, like all others, is available via our API. However, for bulk processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed file that can be downloaded from the download server. The exact filenames can be found in the track configuration file. Annotations can be converted to ASCII text by our tool `bigBedToBed` which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, for example:

bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/gnomAD/pLI/pliByTranscript.bb -chrom=chr6 -start=0 -end=1000000 stdout

Please refer to our mailing list archives for questions and example queries, or our Data Access FAQ for more information.
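
For scripted use, the same `bigBedToBed` call can be wrapped in Python. The sketch below is an assumption-laden example rather than an official client: `build_command` and `fetch_region` are hypothetical helpers, and the regional missense URL is inferred by joining the download host from the command above with the bigBed path listed at the top of this page. Running `fetch_region` requires `bigBedToBed` on the PATH and network access.

```python
import shlex
import subprocess

# Assumed URL: download host from the example above + the bigBed path
# given in the schema header of this page.
BIGBED_URL = (
    "http://hgdownload.soe.ucsc.edu"
    "/gbdb/hg19/gnomAD/missense/missenseConstrained.bb"
)


def build_command(url, chrom, start, end, out="stdout"):
    """Build the bigBedToBed argument list for one genomic range."""
    return [
        "bigBedToBed", url,
        f"-chrom={chrom}", f"-start={start}", f"-end={end}",
        out,
    ]


def fetch_region(url, chrom, start, end):
    """Run bigBedToBed and return the rows as lists of column strings."""
    result = subprocess.run(
        build_command(url, chrom, start, end),
        check=True, capture_output=True, text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


# Only builds and prints the command; nothing is executed here.
command = build_command(BIGBED_URL, "chr1", 166888591, 166944505)
print(shlex.join(command))
```

Keeping command construction separate from execution makes it easy to inspect or log the exact call before anything is downloaded.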
More information about using and understanding the gnomAD data can be found in the gnomAD FAQ site.

Credits

Thanks to the Genome Aggregation Database Consortium for making these data available. The data are released under the ODC Open Database License (ODbL) as described here.

References

Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, O'Donnell-Luria AH, Ware JS, Hill AJ, Cummings BB et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016 Aug 18;536(7616):285-91. PMID: 27535533; PMC: PMC5018207

Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, Collins RL, Laricchia KM, Ganna A, Birnbaum DP et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020 May;581(7809):434-443. PMID: 32461654; PMC: PMC7334197

Collins RL, Brand H, Karczewski KJ, Zhao X, Alföldi J, Francioli LC, Khera AV, Lowther C, Gauthier LD, Wang H et al. A structural variation reference for medical and population genetics. Nature. 2020 May;581(7809):444-451. PMID: 32461652; PMC: PMC7334194

Cummings BB, Karczewski KJ, Kosmicki JA, Seaby EG, Watts NA, Singer-Berk M, Mudge JM, Karjalainen J, Satterstrom FK, O'Donnell-Luria AH et al. Transcript expression-aware annotation improves rare variant interpretation. Nature. 2020 May;581(7809):452-458. PMID: 32461655; PMC: PMC7334198
