Schema for Centromeres - Centromere Locations
  Database: hg38    Primary Table: centromeres    Row Count: 109   Data last updated: 2014-01-09
Format description: Browser extensible data
On download server: MariaDB table dump directory
field       example      SQL type               info    description
bin         189          smallint(5) unsigned   range   Indexing field to speed chromosome range queries.
chrom       chr1         varchar(255)           values  Reference sequence chromosome or scaffold
chromStart  122026459    int(10) unsigned       range   Start position in chromosome
chromEnd    122224535    int(10) unsigned       range   End position in chromosome
name        GJ211836.1   varchar(255)           values  Name of item
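
The bin field can be reproduced from chromStart and chromEnd. The following is a minimal sketch, assuming this table uses the standard (non-extended) UCSC binning scheme, with five levels from 128 kb bins up to a single 512 Mb bin. The function and constant names are illustrative and are not part of the schema.

```python
# Standard UCSC binning scheme (assumed): bin sizes of 128 kb, 1 Mb, 8 Mb,
# 64 Mb and 512 Mb. Each offset is the first bin number of its level.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # 2**17 bases = 128 kb, the finest bin size
BIN_NEXT_SHIFT = 3    # each level is 8 times larger than the one below


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains the 0-based, half-open range [start, end)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the standard scheme (512 Mb maximum)")


# Example values from the schema table above.
print(bin_from_range(122026459, 122224535))  # 189
```

A range query can then restrict its search to the handful of bins that could overlap the region of interest instead of scanning every row for a chromosome.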

Sample Rows
 
bin   chrom  chromStart  chromEnd   name
189   chr1   122026459   122224535  GJ211836.1
189   chr1   122224635   122503147  GJ211837.1
23    chr1   122503247   124785432  GJ212202.1
1537  chr1   124785532   124849129  GJ211855.1
192   chr1   124849229   124932724  GJ211857.1
13    chr10  39686682    39935900   GJ211930.1
13    chr10  39936000    41497440   GJ211932.1
901   chr10  41497540    41545720   GJ211933.1
112   chr10  41545820    41593521   GJ211936.1
974   chr11  51078348    51090317   GJ211938.1

Note: all start coordinates in our database are 0-based, not 1-based, so each item spans the half-open interval [chromStart, chromEnd).
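
To illustrate the coordinate convention, the sketch below converts a table row to the 1-based, closed position string used for display and computes the item length. It assumes the usual UCSC convention that chromEnd is exclusive. The helper names are hypothetical.

```python
def to_browser_position(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Format a 0-based, half-open table interval as a 1-based, closed position string."""
    # Only the start shifts by one; the exclusive 0-based end equals the inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def item_length(chrom_start: int, chrom_end: int) -> int:
    """Length in bases of a half-open interval."""
    return chrom_end - chrom_start


# First sample row: chr1, chromStart 122026459, chromEnd 122224535.
print(to_browser_position("chr1", 122026459, 122224535))  # chr1:122026460-122224535
print(item_length(122026459, 122224535))                  # 198076
```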

Centromeres (centromeres) Track Description
 

Description

This track indicates the locations of the centromere sequences. Centromeres are specialized chromatin structures that are required for cell division. These genomic regions are normally defined by long tracts of tandem repeats, or satellite DNA, that contain a limited number of sequence differences to distinguish the linear order of repeat copies. Because of their size and repetitive nature, these regions are typically not represented in reference assemblies. Unlike all previous versions of the human reference assembly, in which each centromere region was represented by a multi-megabase gap, GRCh38 incorporates centromere reference models that provide an initial genomic description derived from chromosome-assigned whole genome shotgun (WGS) read libraries of alpha satellite.

Each reference model provides an approximation of the true array sequence organization. Although the long-range repeat ordering is not expected to represent the true organization, the submissions are expected to provide a biologically rich description of array variants and local-monomer organization as observed in the initial WGS read dataset. As a result, these sequences serve as a useful mapping target to extend sequence-based studies to sites previously omitted from the human reference genome.

Methods

The sequences are generated from second-order Markov models of monomer variants and graphical models of larger-scale higher-order repeats. The graphical models are based on an analysis of Sanger reads from the HuRef sequencing project (Assembly GCA_000002125.1; BioProject PRJNA19621), and their local ordering is supported by observed same-read monomer adjacencies. The Markov models are generated by the program linearSat, which was written for this project and also generates a linear representation of monomer order. linearSat extends each second-order Markov chain to the size of the given array, as estimated by sequence coverage normalization. The sequence definitions of transposable element insertions are limited to the sequences directly adjacent to alpha satellite within the read database, and incomplete representations are noted with an adjacent 100 bp gap. In total, these sequences, which were used to assemble the centromeric regions of the human chromosomes, provide a more complete reference of the sequence composition and higher-order repeat variation inherent to a given alpha satellite array.
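
The idea of a second-order Markov chain over monomers can be sketched with a toy example. This is not linearSat and does not reproduce its implementation. The monomer labels, the toy reads, and the function names below are all hypothetical, and n_monomers merely stands in for an array size that would come from coverage normalization.

```python
import random
from collections import defaultdict


def train_second_order(reads):
    """Record which monomer follows each ordered pair of monomers within a read."""
    transitions = defaultdict(list)
    for read in reads:
        for a, b, c in zip(read, read[1:], read[2:]):
            transitions[(a, b)].append(c)  # repeated entries act as transition weights
    return transitions


def generate_array(transitions, seed_pair, n_monomers, rng):
    """Walk the chain from seed_pair until the array holds n_monomers monomers."""
    array = list(seed_pair)
    while len(array) < n_monomers:
        followers = transitions.get((array[-2], array[-1]))
        if not followers:  # no observed continuation for this pair
            break
        array.append(rng.choice(followers))
    return array[:n_monomers]


# Hypothetical monomer labels standing in for same-read monomer adjacencies.
toy_reads = [
    ["m1", "m2", "m3", "m1", "m2", "m3"],
    ["m2", "m3", "m1", "m2", "m4", "m1", "m2"],
    ["m4", "m1", "m2", "m3", "m1"],
]
model = train_second_order(toy_reads)
array = generate_array(model, ("m1", "m2"), n_monomers=20, rng=random.Random(0))
print(len(array), array[:6])
```

Every three-monomer window in the generated array was observed in at least one read, which mirrors the statement that local ordering is supported by same-read adjacencies while long-range ordering is only a model.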

Credits

The data for this track was supplied by Karen Miga.

References

Miga KH, Newton Y, Jain M, Altemose N, Willard HF, Kent WJ. Centromere reference models for human chromosomes X and Y satellite arrays. Genome Res. 2014 Apr;24(4):697-707. PMID: 24501022; PMC: PMC3975068