Dosage Sensitivity Track Settings
 
pHaplo and pTriplo dosage sensitivity map from Collins et al 2022

Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This container track represents dosage sensitivity map data from Collins et al 2022. There are two tracks, one corresponding to the probability of haploinsufficiency (pHaplo) and one to the probability of triplosensitivity (pTriplo).

Rare copy-number variants (rCNVs) are deletions and duplications that occur infrequently in the global human population and can confer substantial risk for disease. Collins et al aimed to quantify haploinsufficiency (i.e., deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout the human genome. They analyzed rCNVs from nearly one million individuals to construct a genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage-sensitive segments associated with at least one disorder. These segments were typically gene-dense and often harbored dominant dosage-sensitive driver genes. An ensemble machine learning model was then built to predict dosage sensitivity probabilities (pHaplo and pTriplo) for all autosomal genes. It identified 2,987 haploinsufficient and 1,559 triplosensitive genes, including 648 that were uniquely triplosensitive.

Display Conventions and Configuration

Each track displays one item per gene (bed track), covering the entire gene locus, wherever a score was available. Clicking on an item provides a link to DECIPHER, which contains the sensitivity scores as well as additional information. Mousing over an item displays the gene symbol, the ENSG ID for that gene, and the track's sensitivity score rounded to two decimal places. Filters are also available to set the score thresholds displayed for each track.

Coloring and Interpretation

Each of the tracks is colored based on standardized cutoffs for pHaplo and pTriplo as described by the authors:

pHaplo scores ≥0.86 indicate that the average effect sizes of deletions are as strong as the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7) (Karczewski et al., 2020). pHaplo scores ≥0.55 indicate an odds ratio ≥2.

pTriplo scores ≥0.94 indicate that the average effect sizes of duplications are as strong as the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7) (Karczewski et al., 2020). pTriplo scores ≥0.68 indicate an odds ratio ≥2.

Applying these cutoffs defined 2,987 haploinsufficient (pHaplo≥0.86) and 1,559 triplosensitive (pTriplo≥0.94) genes with rCNV effect sizes comparable to loss-of-function of gold-standard PTV-constrained genes.

See below for a summary of the color scheme:

  • Dark red items - pHaplo ≥ 0.86
  • Bright red items - pHaplo < 0.86
  • Dark blue items - pTriplo ≥ 0.94
  • Bright blue items - pTriplo < 0.94
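The color scheme above amounts to a single threshold comparison per track. The following is a minimal sketch of that rule, not the browser's actual rendering code; the names `CUTOFFS`, `COLORS`, and `item_color` are hypothetical and chosen only for illustration.

```python
# Cutoffs and color names are taken from the list above.
# The dictionary and function names are illustrative assumptions.
CUTOFFS = {"pHaplo": 0.86, "pTriplo": 0.94}
COLORS = {
    "pHaplo": ("dark red", "bright red"),
    "pTriplo": ("dark blue", "bright blue"),
}


def item_color(track: str, score: float) -> str:
    """Return the color described for a score on the given track."""
    at_or_above, below = COLORS[track]
    return at_or_above if score >= CUTOFFS[track] else below
```

Because the displayed scores are rounded to two decimal places, a comparison on a rounded score may differ from one on the unrounded value for genes that sit very close to a cutoff.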

Methods

The data were downloaded from Zenodo as a 3-column file containing gene symbols, pHaplo scores, and pTriplo scores. Because the data were created using GENCODEv19 models, the hg19 data were mapped using those coordinates: for each gene, the item spans from the earliest transcription start site to the furthest transcription end site across all of that gene's transcripts. This leads to some gene boundaries that do not correspond to any single real transcript, but since the data annotate gene loci, this maximum coverage was used. Finally, both scores were rounded to two decimal places for easier interpretation.
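The locus construction described above can be sketched as taking the minimum start and maximum end over a gene's transcripts. This is an illustration under stated assumptions, not the code from the makeDoc: the `Transcript` type and the function names are hypothetical, "earliest" and "furthest" are interpreted as the lowest and highest genomic coordinates, and Python's built-in `round` stands in for whatever rounding the track build actually used.

```python
from typing import Iterable, NamedTuple, Tuple


class Transcript(NamedTuple):
    """Hypothetical record for one transcript of a gene."""
    chrom: str
    tx_start: int  # lowest genomic coordinate of the transcript
    tx_end: int    # highest genomic coordinate of the transcript


def gene_locus(transcripts: Iterable[Transcript]) -> Tuple[str, int, int]:
    """Span from the earliest start to the furthest end of all transcripts."""
    txs = list(transcripts)
    if not txs:
        raise ValueError("gene has no transcripts")
    chroms = {t.chrom for t in txs}
    if len(chroms) != 1:
        raise ValueError("transcripts lie on more than one chromosome")
    start = min(t.tx_start for t in txs)
    end = max(t.tx_end for t in txs)
    return (txs[0].chrom, start, end)


def round_score(score: float) -> float:
    """Round a score to two decimal places (rounding mode is an assumption)."""
    return round(score, 2)
```

The resulting span can be wider than any individual transcript, which is the "maximum coverage" behavior described in the paragraph above.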

For hg38, we attempted to assign updated gene positions from several datasets, since gene symbols have been updated many times since GENCODEv19. The workflow is summarized below; each step was applied only to genes that failed mapping in the previous steps:

  1. Gene symbols were mapped using MANE1.0. Fewer than 2,000 items failed mapping at this step.
  2. Mapping with GENCODEv45 was attempted.
  3. Mapping with GENCODEv20 was attempted. At this point, 448 items were not mapped.
  4. Finally, any missing items were lifted using the hg19 track. Of the 448 items, 19 failed mapping because their regions were split between hg19 and hg38.
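The stepwise workflow above is a fallback cascade: each gene symbol is looked up in the first source, and later sources are consulted only when earlier ones fail. The sketch below shows that pattern under simplifying assumptions. The function name and data layout are hypothetical, and each step, including the final lift from hg19, is represented as a precomputed lookup table from gene symbol to coordinates.

```python
from typing import Dict, List, Sequence, Tuple

Locus = Tuple[str, int, int]  # (chrom, start, end)


def map_with_fallback(
    symbols: Sequence[str],
    sources: Sequence[Tuple[str, Dict[str, Locus]]],
) -> Tuple[Dict[str, Tuple[str, Locus]], List[str]]:
    """Map each symbol using the first source (in order) that contains it.

    Returns the mapped symbols, each with the name of the source that
    supplied its coordinates, and the list of symbols no source could map.
    """
    mapped: Dict[str, Tuple[str, Locus]] = {}
    unmapped: List[str] = []
    for symbol in symbols:
        for source_name, table in sources:
            locus = table.get(symbol)
            if locus is not None:
                mapped[symbol] = (source_name, locus)
                break
        else:
            # No source contained this symbol.
            unmapped.append(symbol)
    return mapped, unmapped
```

Passing the sources in the order MANE1.0, GENCODEv45, GENCODEv20, lifted hg19 would mirror the priority described above. Recording the source name alongside each locus makes it easy to count how many items each step contributed.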

In summary, the hg19 track was mapped using the original GENCODEv19 coordinates, and a series of fallback steps was used to map the hg38 gene symbols to updated coordinates. Of 18,641 items, 19 could not be mapped and are missing from the hg38 tracks.

The complete makeDoc can be found online. This includes all of the track creation steps.

Data Access

The raw data can be explored interactively with the Table Browser or the Data Integrator. For automated access, this track, like all others, is available via our API. For bulk processing, however, we recommend downloading the dataset.

For automated download and analysis, the genome annotation is stored at UCSC in bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tools can also be used to obtain features confined to a given range, e.g.,

bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/dosageSensitivityCollins2022/pHaploDosageSensitivity.bb stdout
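For scripted use, the command above can be assembled and its output parsed from Python. The following is a minimal sketch: the helper names are hypothetical, only the options shown in the command above are used, and the parser relies only on the four standard leading BED columns (chrom, start, end, name). That the name column holds the gene symbol, and what the remaining columns contain, are assumptions to check against the actual file.

```python
from typing import Dict, List, Optional

# URL copied from the example command above.
PHAPLO_URL = (
    "http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/"
    "dosageSensitivityCollins2022/pHaploDosageSensitivity.bb"
)


def bigbed_command(
    url: str,
    chrom: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> List[str]:
    """Build a bigBedToBed argument list that writes BED lines to stdout."""
    cmd = ["bigBedToBed"]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd += [url, "stdout"]
    return cmd


def parse_bed_line(line: str) -> Dict[str, object]:
    """Parse one tab-separated BED line; extra columns are kept as strings."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3] if len(fields) > 3 else None,
        "extra": fields[4:],
    }

# Running the command requires bigBedToBed on the PATH, for example:
#   import subprocess
#   out = subprocess.run(bigbed_command(PHAPLO_URL, "chr1", 100000, 100500),
#                        capture_output=True, text=True, check=True).stdout
#   items = [parse_bed_line(l) for l in out.splitlines() if l]
```

Building the arguments as a list rather than a shell string avoids quoting problems when the command is passed to `subprocess.run`.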

Please refer to our Data Access FAQ for more information.

Credits

Thanks to DECIPHER for their support and assistance with the data. We would also like to thank Anna Benet-Pagès for suggesting and assisting in track development and interpretation.

References

Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM, Ulirsch J et al. A cross-disorder dosage sensitivity map of the human genome. Cell. 2022 Aug 4;185(16):3041-3055.e25. PMID: 35917817; PMC: PMC9742861