UCSC Genome Browser assembly ID: hg38
Sequencing/Assembly provider ID: GRCh38 Genome Reference Consortium Human Reference 38 (GCA_000001405.15)
Assembly date: Dec. 2013
GenBank accession ID: GCA_000001305.2
NCBI Genome information: NCBI genome/51 (Homo sapiens)
NCBI Assembly information:
NCBI assembly/883148 (GRCh38/GCA_000001405.15)
BioProject information: NCBI Bioproject: 31257
Search the assembly:
By position or search term: Use the "position or search term"
box to find areas of the genome associated with many different attributes, such
as a specific chromosomal coordinate range; mRNA, EST, or STS marker names; or
keywords from the GenBank description of an mRNA.
More information, including sample
By gene name: Type a gene name into the "search term" box,
choose your gene from the drop-down list, then press "submit" to go
directly to the assembly location associated with that gene.
By track type: Click the "track search" button
to find Genome Browser tracks that match specific selection criteria.
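As an illustration of the coordinate-range form of a position query, the following minimal sketch splits a string such as chrY:10,000-2,781,479 into its parts. The function name `parse_position` and the regular expression are assumptions made for this example; they are not part of the Genome Browser itself, which accepts many other kinds of search terms.

```python
import re

# Hypothetical pattern for "chrom:start-end" with optional thousands separators.
_POSITION = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def parse_position(text):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a coordinate range: {text!r}")
    # Commas are display formatting only, so drop them before converting.
    start = int(match.group("start").replace(",", ""))
    end = int(match.group("end").replace(",", ""))
    if start > end:
        raise ValueError(f"start is greater than end in {text!r}")
    return match.group("chrom"), start, end


print(parse_position("chrY:10,000-2,781,479"))
```

Strings that are not coordinate ranges, such as gene names or keywords, are rejected with a `ValueError` in this sketch rather than interpreted.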
Download sequence and annotation data:
The GRCh38 assembly is the first major revision of the human genome released in more than four
years. As with the previous GRCh37 assembly, the
Genome Reference Consortium (GRC)
is now the primary source for human genome assembly data submitted to GenBank. Beginning with this
release, the UCSC Genome Browser version numbers for the human assemblies now match those of the
GRC to minimize version confusion. Hence, the GRCh38 assembly is referred to as "hg38"
in the Genome Browser datasets and documentation. For a glossary of assembly-related terms, see the
GRC Assembly Terminology page.
Alternate sequences - Several human chromosomal regions exhibit sufficient variability to
prevent adequate representation by a single sequence. To address this, the GRCh38 assembly provides
alternate sequence for selected variant regions through the inclusion of alternate loci
scaffolds (or alt loci). Alt loci are separate accessioned sequences that are aligned
to reference chromosomes. This assembly contains 261 alt loci, many of which are associated with
the LRC/KIR area of chr19 and the MHC region on chr6.
See the sequences page for a complete
list of the reference chromosomes and alternate sequences in GRCh38.
Centromere representation - Debuting in this release, the large megabase-sized gaps that
represented centromeric regions in previous assemblies have been replaced by sequences from
centromere models created by
Karen Miga et al., using centromere databases developed during her work in
the Willard lab at
Duke University and analysis software developed while working in the
Kent lab at UCSC.
The models, which provide the approximate repeat number and order for each centromere, will be
useful for read mapping and variation studies.
Mitochondrial genome - The mitochondrial reference sequence included in the GRCh38 assembly
(termed "chrM" in the UCSC Genome Browser) is the
Revised Cambridge Reference Sequence (rCRS) from
MITOMAP with GenBank accession number
J01415.2 and RefSeq accession number NC_012920.1. This differs from the chrM sequence
(RefSeq accession number NC_001807) provided by the Genome Browser for hg19, which was not updated
when the GRCh37 assembly later transitioned to the new version.
Sequence updates - Several erroneous bases and misassembled regions in GRCh37 have been
corrected in the GRCh38 assembly, and more than 100 gaps have been filled or reduced. Much of the
data used to improve the reference sequence was obtained from other genome sequencing and analysis
projects, such as the 1000 Genomes Project.
Analysis set - The GRCh38 assembly offers an "analysis set" that was created to
accommodate next generation sequencing read alignment pipelines. To avoid false mapping of reads,
duplicate copies of centromeric arrays and WGS on several chromosomes have been hard-masked with
Ns. The two PAR regions on chromosome Y have also been hard-masked, and the Epstein-Barr virus
sequence has been added as a decoy to attract contamination in samples. Two versions of the
analysis set are available on our downloads page:
one without the alternate chromosomes from this assembly, and one that includes them.
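To make the term "hard-masked" concrete, here is a minimal sketch of replacing chosen intervals of a sequence with Ns. The function `hard_mask` and its 0-based, half-open interval convention are assumptions for illustration only; this is not the procedure used to build the analysis set.

```python
def hard_mask(sequence, regions):
    """Return a copy of sequence with each (start, end) region replaced by Ns.

    Regions are 0-based and half-open, an assumption made for this sketch.
    """
    masked = list(sequence)
    for start, end in regions:
        if not 0 <= start <= end <= len(sequence):
            raise ValueError(f"region {(start, end)} is outside the sequence")
        # Overwrite the interval base by base so the length is unchanged.
        masked[start:end] = "N" * (end - start)
    return "".join(masked)


print(hard_mask("ACGTACGTAC", [(2, 5)]))
```

Because masking preserves sequence length, coordinates in a masked sequence remain comparable with those of the unmasked one.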
Chromosome naming conventions
UCSC has introduced some slight changes to the Genome Browser chromosome naming scheme with
this release:
- Haplotype chromosome, unplaced contig and unlocalized contig names now include
their NCBI accession number (e.g., chr6_GL000256v2_alt)
- The "v2" at the end of the accession number indicates the NCBI version number
- Haplotype chromosome names consist of the chromosome number, followed by the NCBI accession
number, followed by "alt"
- Unlocalized contig names (contigs assigned to a chromosome but not placed within it) consist
of the chromosome number, followed by the NCBI accession number, followed by "random"
- Unplaced contig names (contigs whose associated chromosome is not known) consist of
"chrUn" followed by the NCBI accession number
The Y chromosome in this assembly contains two pseudoautosomal regions (PARs)
that were taken from the corresponding regions in the X chromosome and are
exact duplicates of them:
- chrY:10,000-2,781,479 and chrY:56,887,902-57,217,415
- chrX:10,000-2,781,479 and chrX:155,701,382-156,030,895
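Since each chrY PAR corresponds to a chrX region of the same length, a chrY position inside a PAR can be converted to chrX by a fixed offset. The sketch below uses only the coordinates listed above. The names `PARS` and `chry_to_chrx` are hypothetical, and treating both endpoints as inclusive is an assumption.

```python
# PAR coordinates as listed above: (chrY start, chrY end, chrX start, chrX end).
PARS = {
    "PAR1": (10_000, 2_781_479, 10_000, 2_781_479),
    "PAR2": (56_887_902, 57_217_415, 155_701_382, 156_030_895),
}


def chry_to_chrx(pos):
    """Map a chrY position inside a PAR to (PAR name, chrX position).

    Returns None when pos lies outside both PARs. Endpoints are treated as
    inclusive, which is an assumption of this sketch.
    """
    for name, (y_start, y_end, x_start, _x_end) in PARS.items():
        if y_start <= pos <= y_end:
            # Keep the same distance from the start of the region.
            return name, x_start + (pos - y_start)
    return None


print(chry_to_chrx(56_887_902))
```

In PAR1 the chrY and chrX coordinates are identical, so the mapping only changes positions in PAR2.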
For a detailed set of statistics about this assembly, see the
GRCh38 GenBank record.
For more information about the files included in the GRCh38 GenBank submission, see the
Bulk downloads of the sequence and annotation data may be obtained from the Genome Browser
FTP server or the downloads page.
The annotation tracks for this browser were generated by UCSC and collaborators worldwide.
Miga KH, Newton Y, Jain M, Altemose N, Willard HF, Kent WJ.
Centromere reference models for human chromosomes X and Y satellite arrays.
Genome Res. 2014 Apr;24(4):697-707. Epub 2014 Feb 5.