Schema for Problematic Regions - Problematic Regions for NGS or Sanger sequencing or very variable regions

| field | example | description |
|---|---|---|
| chrom | chr1 | Chromosome (or contig, scaffold, etc.) |
| chromStart | 196712517 | Start position in chromosome |
| chromEnd | 196712823 | End position in chromosome |
| start | 196712582 | start |
| end | 196712758 | end |
| gene | CFH | gene |
| plusminus65bpexonlength | 307 | plus minus 65bp exon length |
| exonlength | 177 | exon length |
| averagemappabilityl250m5 | 0.540716612 | average mappability l250m5 |
| exonmappbelow1l250m51yes0no | True | exon mapp below 1 l250m5 (1=yes, 0=no) |
| positionsmappbelow1l250m5 | 282 | positions mapp below 1 l250m5 |
| percpositionsmappbelow1l250m5 | 0.9186 | %positions mapp below 1 l250m5 |
| maxcontigousbasesl250m5 | 183 | max contiguous bases l250m5 |

| chrom | chromStart | chromEnd | start | end | gene | plusminus65bpexonlength | exonlength | averagemappabilityl250m5 | exonmappbelow1l250m51yes0no | positionsmappbelow1l250m5 | percpositionsmappbelow1l250m5 | maxcontigousbasesl250m5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 196712517 | 196712823 | 196712582 | 196712758 | CFH | 307 | 177 | 0.540716612 | True | 282 | 0.9186 | 183 |
| chr1 | 196757281 | 196757593 | 196757346 | 196757528 | CFHR3 | 313 | 183 | 0.53514377 | True | 291 | 0.9297 | 291 |
| chr1 | 196759110 | 196759422 | 196759175 | 196759357 | CFHR3 | 313 | 183 | 0.5 | True | 313 |  |  |
| chr1 | 196788910 | 196789097 | 196788975 | 196789032 | CFHR1 | 188 |  | 0.5 | True | 188 |  |  |
| chr1 | 196797135 | 196797441 | 196797200 | 196797376 | CFHR1 | 307 | 177 | 0.542345277 | True | 281 | 0.9153 | 183 |
| chr1 | 196881805 | 196882117 | 196881870 | 196882052 | CFHR4 | 313 | 183 | 0.529286463 | True | 291 | 0.9297 | 291 |
| chr1 | 196884021 | 196884333 | 196884086 | 196884268 | CFHR4 | 313 | 183 | 0.5 | True | 313 |  |  |
| chr1 | 196912946 | 196913133 | 196913011 | 196913068 | CFHR2 | 188 |  | 0.5 | True | 188 |  |  |
| chr1 | 201175146 | 201182814 | 201175211 | 201182749 | IGFN1 | 7669 | 7539 | 0.927652435 | True | 956 | 0.1247 | 760 |
| chr1 | 202527940 | 202528086 | 202528005 | 202528021 | PPP1R12B | 147 |  | 0.453514551 | True | 133 | 0.9048 | 108 |
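The fields listed above can be read into a small typed record. The following is a minimal sketch, not part of any UCSC tool: it assumes rows arrive as tab-separated text with the 13 columns in schema order and that missing values appear as empty cells. The names ExonRegion, parse_region, and is_consistent are hypothetical. The consistency checks encode relationships that hold in the example rows shown here (a 65 bp flank on each side of start/end, and the fraction column equal to positions divided by the flanked length); they are observations from this sample, not documented guarantees.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
FLANK = 65  # assumed from the field name "plusminus65bpexonlength"


@dataclass
class ExonRegion:  # hypothetical record type, not part of any UCSC library
    chrom: str
    chrom_start: int
    chrom_end: int
    start: int
    end: int
    gene: str
    flanked_length: int               # plusminus65bpexonlength
    exon_length: Optional[int]        # exonlength
    avg_mappability: Optional[float]  # averagemappabilityl250m5
    mapp_below_1: bool                # exonmappbelow1l250m51yes0no
    positions_below_1: Optional[int]  # positionsmappbelow1l250m5
    frac_below_1: Optional[float]     # percpositionsmappbelow1l250m5
    max_contiguous: Optional[int]     # maxcontigousbasesl250m5


def _optional(text: str, cast: Callable[[str], T]) -> Optional[T]:
    """Convert a cell, treating an empty string as a missing value."""
    return cast(text) if text else None


def parse_region(line: str) -> ExonRegion:
    """Parse one tab-separated row whose 13 columns follow the schema order."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 columns, got {len(f)}")
    return ExonRegion(
        chrom=f[0],
        chrom_start=int(f[1]),
        chrom_end=int(f[2]),
        start=int(f[3]),
        end=int(f[4]),
        gene=f[5],
        flanked_length=int(f[6]),
        exon_length=_optional(f[7], int),
        avg_mappability=_optional(f[8], float),
        mapp_below_1=f[9] in ("True", "1"),
        positions_below_1=_optional(f[10], int),
        frac_below_1=_optional(f[11], float),
        max_contiguous=_optional(f[12], int),
    )


def is_consistent(r: ExonRegion, tol: float = 1e-4) -> bool:
    """Check the relationships observed in the sample rows (an assumption)."""
    checks = [
        r.start - r.chrom_start == FLANK,
        r.chrom_end - r.end == FLANK,
    ]
    if r.exon_length is not None:
        checks.append(r.flanked_length == r.exon_length + 2 * FLANK)
    if r.positions_below_1 is not None and r.frac_below_1 is not None:
        ratio = r.positions_below_1 / r.flanked_length
        checks.append(abs(ratio - r.frac_below_1) <= tol)
    return all(checks)


# First sample row from the table above.
row = "\t".join([
    "chr1", "196712517", "196712823", "196712582", "196712758", "CFH",
    "307", "177", "0.540716612", "True", "282", "0.9186", "183",
])
region = parse_region(row)
print(region.gene, is_consistent(region))
```

Optional fields are used because some sample rows have empty cells; a stricter parser could reject such rows instead.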

Description

This track highlights sections of the genome that often cause problems for bioinformaticians. The 12 subtracks identify genomic regions known to cause analysis artifacts in common downstream sequencing computations, such as alignment, variant calling, or peak calling. The underlying data was imported from the NCBI GeT-RM, the Genome-in-a-Bottle, and Anshul Kundaje's ENCODE Blacklist projects.

The only exception is the UCSC Unusual Regions subtrack, which contains annotations of a few special gene clusters (IGH, IGL, PAR1/2, TCRA, TCRB, etc.) as well as fixed sequences, alternate haplotypes, unplaced contigs, pseudo-autosomal regions, and mitochondria. These loci can yield alignments with low-quality mapping scores and discordant read pairs. This data set was manually curated, based on the Genome Browser's assembly description, the FAQs about assembly, and the NCBI RefSeq "other" annotations track data.

The ENCODE Blacklist subtrack contains a comprehensive set of regions which are troublesome for high-throughput Next-Generation Sequencing (NGS) aligners. These regions tend to have a very high ratio of multi-mapping to unique mapping reads and high variance in mappability due to repetitive elements such as satellite, centromeric and telomeric repeats.

The Genome-In-A-Bottle (GIAB) track set contains defined regions where it is difficult to make a confident call, due to low coverage, systematic sequencing errors, and local alignment problems. These regions were identified from sequencing data generated by multiple technologies.

The NCBI GeT-RM (Genetic Testing Reference Materials) track set contains highly homologous gene- and exon-level regions that are difficult or impossible to analyze with standard Sanger or short-read NGS approaches and that are relevant to current clinical testing.
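One common way to use region sets like these is to flag calls (variants, peaks) that fall inside them. The sketch below is only an illustration under stated assumptions: regions are treated as BED-style zero-based, half-open intervals, the data are held in memory, and the function names and the call positions are hypothetical. The two regions are taken from the sample rows above.

```python
from collections import defaultdict


def build_index(regions):
    """Group (chrom, start, end) regions by chromosome."""
    index = defaultdict(list)
    for chrom, start, end in regions:
        index[chrom].append((start, end))
    return dict(index)


def in_problematic_region(index, chrom, pos):
    """Return True if pos lies in any region, assuming start <= pos < end."""
    # A linear scan keeps the sketch simple; a large region set would
    # normally use an interval tree or a sorted search instead.
    return any(start <= pos < end for start, end in index.get(chrom, []))


# Regions from the sample rows; the call positions are made up for illustration.
regions = [("chr1", 196712517, 196712823), ("chr1", 196757281, 196757593)]
index = build_index(regions)
calls = [("chr1", 196712600), ("chr1", 150000000)]
flagged = [call for call in calls if in_problematic_region(index, *call)]
print(flagged)
```

Whether a flagged call is discarded or only annotated is an analysis decision; the sketch only reports the overlap.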

Display Conventions and Configuration

Each track contains a set of regions of varying length with no special configuration options. The UCSC Unusual Regions track has a mouse-over description; all other tracks have at most a name field, which can be shown in pack mode. The tracks are usually kept in dense mode.

The Hide empty subtracks control hides subtracks with no data in the browser window. Changing the browser window by zooming or scrolling may result in the display of a different selection of tracks.

Data access

The raw data can be explored interactively with the Table Browser or the Data Integrator.

For automated download and analysis, the genome annotation is stored in bigBed files that can be downloaded from our download server. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain only features within a given range, e.g.

    bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/problematic/deadZone.bb -chrom=chr21 -start=0 -end=100000000 stdout
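For scripted use, the bigBedToBed call can be wrapped in a short helper. This is a minimal sketch, assuming bigBedToBed is installed and on the PATH; the function names build_command and fetch_regions are hypothetical, and only the -chrom, -start, and -end options shown in the command above are used.

```python
import shutil
import subprocess


def build_command(source, chrom=None, start=None, end=None, tool="bigBedToBed"):
    """Build the argument list, mirroring the command shown in the text."""
    cmd = [tool, source]
    if chrom is not None:
        cmd.append(f"-chrom={chrom}")
    if start is not None:
        cmd.append(f"-start={start}")
    if end is not None:
        cmd.append(f"-end={end}")
    cmd.append("stdout")  # write BED lines to standard output
    return cmd


def fetch_regions(source, chrom=None, start=None, end=None):
    """Run bigBedToBed and return each BED line as a list of string fields."""
    if shutil.which("bigBedToBed") is None:
        raise FileNotFoundError("bigBedToBed not found on PATH")
    result = subprocess.run(
        build_command(source, chrom, start, end),
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines() if line]


URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/problematic/deadZone.bb"
command = build_command(URL, chrom="chr21", start=0, end=100000000)
print(" ".join(command))
```

Calling fetch_regions(URL, chrom="chr21", start=0, end=100000000) would run the tool and return the parsed rows; the fields beyond chrom, start, and end depend on the particular subtrack file.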

Methods

Files were downloaded from the respective databases and converted to bigBed format. The procedure is documented in our hg19 makeDoc file (search for "problematic").

Credits

Thanks to Anna Benet-Pages, Max Haeussler, Angie Hinrichs, and Daniel Schmelter at the UCSC Genome Browser for planning, building, and testing these tracks. The underlying data comes from the ENCODE Blacklist, the GeT-RM, and the Genome-in-a-Bottle projects.

References

Amemiya HM, Kundaje A, Boyle AP. The ENCODE Blacklist: Identification of Problematic Regions of the Genome. Sci Rep. 2019 Jun 27;9(1):9354. PMID: 31249361; PMC: PMC6597582

Zook JM, Chapman B, Wang J, Mittelman D, Hofmann O, Hide W, Salit M. Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls. Nat Biotechnol. 2014 Mar;32(3):246-51. PMID: 24531798

Mandelker D, Schmidt RJ, Ankala A, McDonald Gibson K, Bowser M, Sharma H, Duffy E, Hegde M, Santani A, Lebo M et al. Navigating highly homologous genes in a molecular diagnostic setting: a resource for clinical next-generation sequencing. Genet Med. 2016 Dec;18(12):1282-1289. PMID: 27228465