Rearrangements Track Settings
 
Rearrangements including indels, inversions, and duplications

  Insertions: Deletions in hg38 = Insertion in the HPRC assemblies
  Deletions: Insertions in hg38 = Deletion in the HPRC assemblies
  Inversions: Inversions with respect to hg38 in HPRC assemblies
  Duplications: Duplications with respect to hg38 in HPRC assemblies
  Other Rearrangements: Unalignable sequences in both assemblies (inversions, partial transpositions)

Assembly: Human Dec. 2013 (GRCh38/hg38)

Description

This track shows rearrangements in the HPRC assemblies with respect to hg38: indels, duplications, inversions, and other, more complicated rearrangements. The Rearrangements composite track contains five subtracks:

  1. Insertions in hg38 with respect to the HPRC genomes
  2. Deletions in hg38 with respect to the HPRC genomes
  3. Inversions in hg38 with respect to the HPRC genomes
  4. Duplications in the HPRC genomes with respect to hg38
  5. Other Rearrangements: Unalignable sequences in both genomes (inversions, partial transpositions)

Display Conventions

All items are labeled by the number of HPRC assemblies that have the rearrangement. Items in the Insertions, Deletions, and Other Rearrangements tracks carry one or two additional fields that give sizes in base pairs. In the Insertions and Deletions tracks, there is a single number followed by "bp". For insertions, it is the size of the insertion in hg38. For deletions, it is the size of the sequence deleted in hg38. In the Other Rearrangements track, there are two numbers: the number of unaligned bases in hg38 and the number of unaligned bases in the HPRC assemblies.

Methods

All of these tracks are built from the HPRC chains and nets. The actual instructions used to create them are in the files hprcRearrange.txt and hprcInDel.txt. The first step for every track is to find the orthologous sequences in each HPRC assembly for each chromosome in hg38. These sequences are called the query sequences. For each query sequence, we select the longest chain to the hg38 sequence. This is called the orthologous chain. The specific methods for each track follow.
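The selection of the orthologous chain can be sketched as follows. This is only an illustration, not the code in hprcRearrange.txt or hprcInDel.txt: the Chain record, its field names, and the sequence names are hypothetical, and "longest" is assumed here to mean the largest span on the hg38 sequence.

```python
from typing import NamedTuple


class Chain(NamedTuple):
    """Hypothetical, minimal chain record (not the full UCSC chain header)."""
    chain_id: int
    target: str   # hg38 sequence name, e.g. "chr1"
    query: str    # sequence name in one HPRC assembly
    t_start: int  # start on hg38
    t_end: int    # end on hg38


def orthologous_chains(chains):
    """Keep the longest chain for each (hg38 sequence, query sequence) pair.

    Assumption: length is measured as the span covered on hg38.
    """
    best = {}
    for chain in chains:
        key = (chain.target, chain.query)
        span = chain.t_end - chain.t_start
        current = best.get(key)
        if current is None or span > current.t_end - current.t_start:
            best[key] = chain
    return best


# Made-up example: two chains for one query sequence, one for another.
example = [
    Chain(1, "chr1", "asmA.ctg1", 0, 1000),
    Chain(2, "chr1", "asmA.ctg1", 500, 5000),
    Chain(3, "chr1", "asmB.ctg9", 100, 300),
]
picked = orthologous_chains(example)
```

Keying on the pair of sequence names keeps one orthologous chain per query sequence and hg38 chromosome, which is what the later per-chain steps operate on.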

Insertions, Deletions, and Others

In each orthologous chain, we look for gaps in either the reference or the query sequence. There are two basic types of gaps. In the first type, the gap contains no bases in one of the two sequences but one or more unaligned bases in the other. These gaps indicate a standard insertion in one sequence or, equivalently, a deletion in the other. In the second type, there are unaligned bases in both sequences. These may be alignment errors or sites where more than one rearrangement occurred between the two sequences. Gaps of this second type are placed in the "Other Rearrangements" track. Gap identification is done for each of the HPRC assemblies, resulting in a set of indels that are clustered based on the exact boundaries of the gap in both sequences. This kind of clustering often results in indels that "pile up" at the same location with different numbers of inserted or deleted bases.
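The gap classification and exact-boundary clustering described above might look roughly like the sketch below. It is not the pipeline's actual code. It assumes chain blocks are given as (aligned size, unaligned hg38 bases, unaligned query bases) triples, ignores strand, and uses neutral category names rather than the track names. Because query coordinates differ between assemblies, the cluster key used here (hg38 start, hg38 end, number of unaligned query bases) is one possible reading of "exact boundaries", chosen for illustration.

```python
from collections import defaultdict


def classify_gap(ref_gap, query_gap):
    """Classify one gap by which sequence has unaligned bases."""
    if ref_gap > 0 and query_gap == 0:
        return "hg38_only"    # unaligned bases only in hg38
    if ref_gap == 0 and query_gap > 0:
        return "query_only"   # unaligned bases only in the HPRC assembly
    if ref_gap > 0 and query_gap > 0:
        return "both"         # candidate for "Other Rearrangements"
    return None               # no gap after this block


def gaps_in_chain(ref_start, blocks):
    """Yield (kind, hg38_start, hg38_end, query_gap_size) for each gap."""
    pos = ref_start
    for size, ref_gap, query_gap in blocks:
        pos += size  # skip the aligned block
        kind = classify_gap(ref_gap, query_gap)
        if kind is not None:
            yield kind, pos, pos + ref_gap, query_gap
        pos += ref_gap  # skip the unaligned hg38 bases


def cluster_gaps(chains_by_assembly):
    """Count how many assemblies support each exactly identical gap."""
    support = defaultdict(set)
    for assembly, chains in chains_by_assembly.items():
        for ref_start, blocks in chains:
            for gap in gaps_in_chain(ref_start, blocks):
                support[gap].add(assembly)
    return {gap: len(names) for gap, names in support.items()}


# Made-up data: three assemblies, one orthologous chain each.
counts = cluster_gaps({
    "asmA": [(100, [(10, 0, 5), (20, 3, 0), (5, 2, 2), (7, 0, 0)])],
    "asmB": [(100, [(10, 0, 5), (30, 0, 0)])],
    "asmC": [(100, [(10, 0, 8), (30, 0, 0)])],
})
```

In this made-up data, asmA and asmB share an identical gap at hg38 position 110, while asmC has a gap at the same position with a different number of unaligned query bases. The two are kept as separate items, which is the "pile up" effect.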

Inversions and Duplications

For each orthologous chain, we look for any other chain between the same query sequence and the same hg38 sequence that overlaps the orthologous chain. The chainArrange utility classifies each of those overlaps as either an inversion or a local duplication in the HPRC genome. This is done for each of the HPRC assemblies, resulting in a set of inversions and duplications that are then clustered over all the assemblies. The clustering is by simple overlap, such that no cluster overlaps any other, and is done by the chainArrangeCollect utility.
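Clustering by simple overlap, so that no cluster overlaps any other, can be illustrated with a standard interval-merging pass. The sketch below is not the chainArrangeCollect implementation. It assumes half-open hg38 intervals on a single chromosome, each tagged with a made-up assembly name, and treats intervals that merely touch as non-overlapping.

```python
def cluster_by_overlap(intervals):
    """Merge overlapping (start, end, assembly) intervals into clusters.

    Returns a list of dicts with the cluster's start, end, and the set
    of assemblies that contributed at least one interval.
    """
    clusters = []
    for start, end, assembly in sorted(intervals):
        if clusters and start < clusters[-1]["end"]:
            # Overlaps the current cluster: extend it.
            cluster = clusters[-1]
            cluster["end"] = max(cluster["end"], end)
            cluster["assemblies"].add(assembly)
        else:
            # No overlap: start a new cluster.
            clusters.append(
                {"start": start, "end": end, "assemblies": {assembly}}
            )
    return clusters


# Made-up inversion calls from three assemblies.
merged = cluster_by_overlap([
    (100, 200, "asmA"),
    (150, 250, "asmB"),
    (300, 400, "asmA"),
    (250, 260, "asmC"),
])
```

The size of each cluster's assembly set corresponds to the item label described under Display Conventions, the number of HPRC assemblies that have the rearrangement.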

References

Liao W-W, Asri M, Ebler J, ... et al., Li H, Paten B. A draft human pangenome reference. Nature. 2023 May;617(7960):312-324. PMID: 37165242; PMC: PMC1017212; DOI: 10.1038/s41586-023-05896-x

Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y; Human Pangenome Reference Consortium; Marschall T, Li H, Paten B. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nature Biotechnology. 2023 May 10. PMID: 37165083; DOI: 10.1038/s41587-023-01793-w

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649; DOI: 10.1038/s41586-020-2871-y

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. PMID: 21665927; PMC: PMC3166836; DOI: 10.1101/gr.123356.111