Rhesus SNVs Track Settings
 
Annotated SNVs from the Rhesus Macaque Sequencing Consortium (All Variation and Repeats tracks)


Haplotype sorting display

Enable Haplotype sorting display
Haplotype sorting order:
using middle variant in viewing window as anchor.
If this mode is selected and genotypes are phased or homozygous, then each genotype is split into two independent haplotypes. These local haplotypes are clustered by similarity around a central variant. Haplotypes are reordered for display using the clustering tree, which is drawn in the left label area. Local haplotype blocks can often be identified using this display.
To anchor the sorting to a particular variant, click on the variant in the genome browser, and then click on the 'Use this variant' button on the next page.
using the order in which samples appear in the underlying VCF file
Allele coloring scheme:
reference alleles invisible, alternate alleles in black
reference alleles in blue, alternate alleles in red
first base of allele (A = red, C = blue, G = green, T = magenta)
Haplotype sorting display height:

Filters

Exclude variants with Quality/confidence score (QUAL) less than
Exclude variants with these FILTER values:
 
PASS (All filters passed)
LowQual (Low quality)
hardFilter.snp (QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0)
Minimum minor allele frequency (if INFO column includes AF or AC+AN):
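
As an illustration of how a minor allele frequency can be derived from either AF or AC+AN, here is a minimal Python sketch. It is not the browser's own code. The function names, the pre-parsed `info` dictionary, the restriction to biallelic sites, and the choice to let variants without usable frequency data pass the filter are all assumptions made for this example.

```python
def minor_allele_frequency(info):
    """Return the minor allele frequency of a biallelic site, or None.

    `info` is a dict of already-parsed INFO values (an assumption of this
    sketch): AF as a float, or AC and AN as integers.
    """
    if "AF" in info:
        alt_freq = float(info["AF"])
    elif "AC" in info and "AN" in info and info["AN"] > 0:
        # Alternate allele count divided by total called alleles.
        alt_freq = info["AC"] / info["AN"]
    else:
        return None  # neither AF nor AC+AN is available
    # The minor allele is whichever of ref/alt is rarer.
    return min(alt_freq, 1.0 - alt_freq)


def passes_min_maf(info, min_maf):
    """True if the variant meets the threshold.

    Assumption: variants with no frequency information are kept.
    """
    maf = minor_allele_frequency(info)
    return maf is None or maf >= min_maf
```

For example, a site with AC=3 and AN=12 has an alternate allele frequency of 0.25, so it would be excluded by a minimum of 0.3 and kept by a minimum of 0.1.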


Display data as a density graph:

VCF configuration help

Data schema/format description and download
Assembly: Rhesus Feb. 2019 (Mmul_10/rheMac10)
Data last updated at UCSC: 2020-05-19

Description

This track shows single nucleotide variants (SNVs) from the Rhesus Macaque Genome Consortium. The samples were sequenced and the variants identified by Jeff Rogers' lab at BCM-HGSC.

Display Conventions

In "dense" mode, a vertical line is drawn at the position of each variant. In "pack" mode, since these variants have been phased, the display shows a clustering of haplotypes in the viewed range, sorted by similarity of alleles weighted by proximity to a central variant. The clustering view can highlight local patterns of linkage.
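
The idea of "similarity of alleles weighted by proximity to a central variant" can be sketched in Python as follows. This is a simplified illustration, not the browser's implementation: the weight 1/(1 + distance), the use of variant index rather than genomic position as the distance, and the greedy nearest-neighbour ordering (used here in place of a clustering tree) are all assumptions of this sketch.

```python
def weighted_distance(hap_a, hap_b, center):
    """Sum of allele mismatches, down-weighted by distance from `center`.

    `hap_a` and `hap_b` are equal-length lists of allele codes and
    `center` is the index of the anchor variant. The 1/(1 + d) weight
    is a hypothetical choice.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a != b:
            total += 1.0 / (1 + abs(i - center))
    return total


def order_by_similarity(haplotypes, center):
    """Greedy ordering: start at haplotype 0, then repeatedly append
    the remaining haplotype closest to the last one placed."""
    remaining = list(range(len(haplotypes)))
    order = [remaining.pop(0)]
    while remaining:
        last = haplotypes[order[-1]]
        nearest = min(
            remaining,
            key=lambda j: weighted_distance(last, haplotypes[j], center),
        )
        remaining.remove(nearest)
        order.append(nearest)
    return order
```

With this weighting, a mismatch at the central variant counts twice as much as a mismatch at an adjacent variant, so haplotypes that agree near the anchor end up next to each other.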

In the clustering display, each sample's phased diploid genotype is split into two independent haplotypes. Each haplotype is placed in a horizontal row of pixels; when the number of haplotypes exceeds the number of vertical pixels for the track, multiple haplotypes fall in the same pixel row and pixels are averaged across haplotypes.
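
The splitting and pixel-row averaging described above can be sketched in Python like this. It is only an illustration under stated assumptions: the function names, the input layout (sample name mapped to a list of GT strings), the absence of missing calls, and the mapping of haplotypes to rows by integer scaling are all hypothetical and not taken from the browser source.

```python
def split_genotypes(genotypes):
    """Split each sample's diploid genotypes into two haplotypes.

    `genotypes` maps sample name -> list of GT strings, one per variant,
    e.g. "0|1". Unphased calls are accepted only when homozygous.
    Returns a list of (label, alleles) pairs, two per sample.
    """
    haplotypes = []
    for sample, calls in genotypes.items():
        first, second = [], []
        for gt in calls:
            sep = "|" if "|" in gt else "/"
            left, right = gt.split(sep)
            if sep == "/" and left != right:
                raise ValueError("unphased heterozygous genotype: " + gt)
            first.append(int(left))
            second.append(int(right))
        haplotypes.append((sample + "_1", first))
        haplotypes.append((sample + "_2", second))
    return haplotypes


def average_into_rows(haplotypes, n_rows):
    """Average non-reference indicators per pixel row.

    `haplotypes` is a non-empty list of allele lists (0 = reference).
    When there are more haplotypes than rows, several haplotypes share
    a row and their values are averaged.
    """
    n_haps = len(haplotypes)
    n_rows = min(n_rows, n_haps)
    n_vars = len(haplotypes[0])
    sums = [[0.0] * n_vars for _ in range(n_rows)]
    counts = [0] * n_rows
    for idx, alleles in enumerate(haplotypes):
        row = idx * n_rows // n_haps  # scale haplotype index to a row
        counts[row] += 1
        for v, allele in enumerate(alleles):
            sums[row][v] += 1.0 if allele != 0 else 0.0
    return [[s / counts[r] for s in sums[r]] for r in range(n_rows)]
```

In such a scheme, a row value of 0.5 at a variant would mean that half of the haplotypes sharing that pixel row carry a non-reference allele, which would be drawn as an intermediate shade.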

Each variant is a vertical bar with white (invisible) representing the reference allele and black representing the non-reference allele(s). Tick marks are drawn at the top and bottom of each variant's vertical bar to make the bar more visible when most alleles are reference alleles. The vertical bar for the central variant used in clustering is outlined in purple. In order to avoid long compute times, the range of alleles used in clustering may be limited; alleles used in clustering have purple tick marks at the top and bottom.

The clustering tree is displayed to the left of the main image. It does not represent relatedness of individuals; it simply shows the arrangement of local haplotypes by similarity. When a rightmost branch is purple, it means that all haplotypes in that branch are identical, at least within the range of variants used in clustering.

Methods

All SNV calls are relative to the reference rhesus macaque genome (Mmul_10/rheMac10). The functional consequences of the SNVs were predicted using gene models from the Ensembl release 98 merged Ensembl and RefSeq dataset, which also includes annotations based on PacBio iso-seq (available here).

Whole-genome sequencing was performed over an eight-year period. As technology improved during that time, the sequencing platforms used to generate reads for this dataset progressed from Illumina HiSeq 2000 to HiSeq Rapid 2500, HiSeq X, and NovaSeq, generating 2 × 100 bp or 2 × 150 bp paired-end reads, as is typical for each platform. All underlying sequence data have been deposited into the SRA (BioProject ID: PRJNA251548).

Reads were aligned using BWA-MEM 0.7.12-r1039 (Li and Durbin, 2009; Li, 2013) to the reference genome (Mmul_10/rheMac10), which also included the mitochondrial genome (NC_005943.1) and had the pseudoautosomal region of chromosome Y masked. To identify reads potentially originating from a single fragment of DNA and mark them in the BAM files, we used Picard MarkDuplicates version 1.105.

SNVs were called using the Genome Analysis Toolkit (GATK) version 4.1.2.0 (McKenna et al., 2010) and a VCF file was generated. The hard filters suggested by the developers of GATK (https://software.broadinstitute.org/gatk/documentation/article?id=11097) were applied to the SNVs and all failing SNVs were removed. We then used GATK VariantAnnotator to annotate the SNVs with AlleleBalance. SNVs whose allelic balance for heterozygous calls (ABHet = ref/(ref+alt)) was below 0.2 or above 0.8 were removed.
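
The two filtering steps can be sketched in Python as follows. This is an illustrative approximation, not the GATK pipeline itself. The thresholds come from the hardFilter.snp expression listed under Filters and from the ABHet cutoffs above. The function names, the dictionary of annotation values, the decision to skip missing annotations, and the use of a single pair of read counts (with a non-zero total) for ABHet are assumptions of this sketch.

```python
# (annotation, comparison, threshold) triples from the hardFilter.snp expression.
HARD_FILTER_RULES = [
    ("QD", "<", 2.0),
    ("MQ", "<", 40.0),
    ("FS", ">", 60.0),
    ("SOR", ">", 3.0),
    ("MQRankSum", "<", -12.5),
    ("ReadPosRankSum", "<", -8.0),
]


def fails_hard_filter(annotations):
    """True if any rule in the hard-filter expression is triggered.

    Assumption: a missing annotation does not trigger its rule.
    """
    for key, op, threshold in HARD_FILTER_RULES:
        value = annotations.get(key)
        if value is None:
            continue
        if op == "<" and value < threshold:
            return True
        if op == ">" and value > threshold:
            return True
    return False


def ab_het(ref_reads, alt_reads):
    """ABHet = ref / (ref + alt); assumes at least one read."""
    return ref_reads / (ref_reads + alt_reads)


def passes_allele_balance(ref_reads, alt_reads):
    """Keep SNVs with ABHet in the range 0.2 to 0.8 inclusive."""
    return 0.2 <= ab_het(ref_reads, alt_reads) <= 0.8
```

For instance, a heterozygous call supported by 30 reference reads and 10 alternate reads has ABHet = 0.75 and would be kept, while one with 9 reference reads and 1 alternate read has ABHet = 0.9 and would be removed.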

The Variant Effect Predictor software from Ensembl (McLaren et al., 2010) was used to predict the functional consequence of SNVs queried against Ensembl release 98 rhesus macaque gene models based on Ensembl and RefSeq gene predictions and including PacBio iso-seq data.

Definitions of consequence types can be found in the VEP documentation.

Credits

Thanks to the Rhesus Macaque Genome Consortium and Jeff Rogers' lab at BCM-HGSC for supplying the data for this track.

References

Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at http://arxiv.org/pdf/1303.3997v2.pdf. 2013.

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009 Jul 15;25(14):1754-60. PMID: 19451168; PMC: PMC2705234

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010 Sep;20(9):1297-303. PMID: 20644199; PMC: PMC2928508

McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010 Aug 15;26(16):2069-70. PMID: 20562413; PMC: PMC2916720