The bigMaf format stores multiple alignments in a format compatible with MAF files, which is then compressed and indexed as a bigBed. bigMaf files are created with the bedToBigBed program, run with the -as option to pull in a special autoSql (.as) file that defines the fields of the bigMaf.
The bigMaf files are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server. Because of this, bigMaf files have considerably faster display performance than regular MAF files when working with large data sets. The bigMaf file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigMaf files, please see the Hosting section of the Track Hub Help documentation.
The following autoSql definition is used to specify bigMaf multiple alignment files. This definition, contained in the file bigMaf.as, is pulled in when the bedToBigBed utility is run with the -as=bigMaf.as option:
table bedMaf
"Bed3 with MAF block"
    (
    string  chrom;      "Reference sequence chromosome or scaffold"
    uint    chromStart; "Start position in chromosome"
    uint    chromEnd;   "End position in chromosome"
    lstring mafBlock;   "MAF block"
    )
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb
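Before running bedToBigBed, it can help to confirm that the text input matches the four fields declared in bigMaf.as. The sketch below does this with awk; check_bigmaf_txt is a hypothetical helper, not a UCSC program, and it assumes the mafBlock column itself contains no tab characters (with -tab, a tab there would split the block into extra fields).

```shell
# Hypothetical pre-flight check (not a UCSC tool): every line of a bigMaf
# text file should have the four tab-separated fields declared in bigMaf.as
# (chrom, chromStart, chromEnd, mafBlock), with chromStart < chromEnd.
# Returns non-zero and prints the offending line numbers if any check fails.
check_bigmaf_txt() {
    awk -F '\t' '
        NF != 4 { printf "line %d: expected 4 fields, found %d\n", NR, NF; bad = 1; next }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ { printf "line %d: non-integer coordinates\n", NR; bad = 1; next }
        $2 + 0 >= $3 + 0 { printf "line %d: chromStart must be less than chromEnd\n", NR; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

For example: check_bigmaf_txt bigMaf.txt && bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb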
Alongside the bigMaf file, two optional bigBed files, a summary file and a frames file, can also be created. The following autoSql definition is used to create the first (summary) file, which is pointed to online with summary <url> rather than the standard bigDataUrl <url> used with bigMaf. This definition, contained in the file mafSummary.as, is pulled in when the bedToBigBed utility is run with the -as=mafSummary.as option:
table mafSummary
"Positions and scores for alignment blocks"
    (
    string chrom;       "Reference sequence chromosome or scaffold"
    uint   chromStart;  "Start position in chromosome"
    uint   chromEnd;    "End position in chromosome"
    string src;         "Sequence name or database of alignment"
    float  score;       "Floating point score."
    char   leftStatus;  "Gap/break annotation for preceding block"
    char   rightStatus; "Gap/break annotation for following block"
    )
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb
The input file, bigMafSummary.bed, is generated with the hgLoadMafSummary program (see Step 6).
The following autoSql definition is used to create the second (frames) file, which is pointed to online with frames <url>. This definition, contained in the file mafFrames.as, is pulled in when the bedToBigBed utility is run with the -as=mafFrames.as option:
table mafFrames
"codon frame assignment for MAF components"
    (
    string chrom;        "Reference sequence chromosome or scaffold"
    uint   chromStart;   "Start range in chromosome"
    uint   chromEnd;     "End range in chromosome"
    string src;          "Name of sequence source in MAF"
    ubyte  frame;        "frame (0,1,2) for first base(+) or last base(-)"
    char   strand;       "+ or -"
    string name;         "Name of gene used to define frame"
    int    prevFramePos; "target position of the previous base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous"
    int    nextFramePos; "target position of the next base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous"
    ubyte  isExonStart;  "does this start the CDS portion of an exon?"
    ubyte  isExonEnd;    "does this end the CDS portion of an exon?"
    )
bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb
Another tool, genePredToMafFrames, generates the input file, bigMafFrames.txt (see Step 6).
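A similar check can be applied to the frames input, which should carry the 11 tab-separated fields of mafFrames.as (bed3+8). The helper name check_frames_txt is hypothetical, and the rules it tests (frame is 0, 1 or 2; strand is + or -) are taken only from the field descriptions in the definition above.

```shell
# Hypothetical check (not a UCSC tool) for a bigMafFrames.txt-style file:
# 11 tab-separated fields, column 5 (frame) in {0,1,2}, column 6 (strand) in {+,-}.
check_frames_txt() {
    awk -F '\t' '
        NF != 11 { printf "line %d: expected 11 fields (bed3+8), found %d\n", NR, NF; bad = 1; next }
        $5 !~ /^[012]$/ { printf "line %d: frame must be 0, 1 or 2\n", NR; bad = 1 }
        $6 != "+" && $6 != "-" { printf "line %d: strand must be + or -\n", NR; bad = 1 }
        END { exit bad }
    ' "$1"
}
```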
Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file.
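As a rough planning aid, the 25% rule of thumb above can be turned into an estimate from the input file size. This is a back-of-the-envelope sketch (the function name is hypothetical); actual memory use may vary with the data and the bedToBigBed options used.

```shell
# Estimate bedToBigBed memory needs from the rule of thumb above:
# roughly 1.25 x the size of the uncompressed input file, printed in MiB.
estimate_bedtobigbed_ram_mib() {
    local bytes
    bytes=$(wc -c < "$1") || return 1
    # 1 MiB = 1048576 bytes
    awk -v b="$bytes" 'BEGIN { printf "%.1f\n", b * 1.25 / 1048576 }'
}
```

For example, a 4 MiB bigMaf.txt gives an estimate of 5.0 MiB.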
To create a bigMaf track, follow these steps:
Step 1. If you already have a MAF file you would like to convert to a bigMaf, skip to Step 3. Otherwise, download this example MAF file for the human GRCh38 (hg38) assembly.
Step 2. If you would like to include optional reading frame and block summary information, download the chr22_KI270731v1_random.gp genePred file.
Step 3. Download the autoSql file bigMaf.as needed by bedToBigBed. If you have opted to include the optional frame and summary information with your bigMaf file, you must also download the autoSql files mafFrames.as and mafSummary.as.
Here are wget commands to obtain the above files and the hg38.chrom.sizes file mentioned below:
wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.maf
wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.gp
wget https://genome.ucsc.edu/goldenPath/help/examples/bigMaf.as
wget https://genome.ucsc.edu/goldenPath/help/examples/mafSummary.as
wget https://genome.ucsc.edu/goldenPath/help/examples/mafFrames.as
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
Step 4. Download the bedToBigBed and mafToBigMaf programs from the UCSC binary utilities directory. If you have opted to generate the optional frame and summary files for your multiple alignment, you must also download the hgLoadMafSummary, genePredSingleCover and genePredToMafFrames programs from the same directory.
Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database with which you are working (e.g., hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
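bedToBigBed expects every chromosome or scaffold named in its input to appear in the chrom.sizes file. The hypothetical helper below lists names from a bigMaf text file that are missing from a chrom.sizes file, which can catch a mismatched assembly or naming convention before conversion.

```shell
# List chromosome names (column 1) of a bigMaf text file that do not appear
# in column 1 of a chrom.sizes file. Each missing name is printed once.
missing_chroms() {
    local sizes=$1 txt=$2
    awk -F '\t' '
        NR == FNR { known[$1] = 1; next }                 # first file: chrom.sizes
        !($1 in known) && !($1 in reported) { reported[$1] = 1; print $1 }
    ' "$sizes" "$txt"
}
```

For example: missing_chroms hg38.chrom.sizes bigMaf.txt prints nothing when all names are present.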
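Step 5. Create the bigMaf file from your input MAF file using the mafToBigMaf and bedToBigBed utilities:
mafToBigMaf hg38 chr22_KI270731v1_random.maf stdout | sort -k1,1 -k2,2n > bigMaf.txt
bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb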
Note that the hg38 argument in the mafToBigMaf hg38 command specifies the referenceDb. It must match the database prefix of the primary species' sequence names, for instance hg38 in hg38.chr22_KI270731v1_random in the example input file chr22_KI270731v1_random.maf.
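To make the role of the referenceDb prefix concrete, the awk sketch below approximates the conversion: for each MAF block it finds the first s line whose source name starts with the given database prefix, reports that row's chromosome and coordinates, and keeps the whole block as one field. Two details are assumptions, not documented behavior: that the block's lines are joined with semicolons, and that the a line is kept in the block. Use mafToBigMaf itself for real data; maf_to_bigmaf_txt is a hypothetical name for illustration only.

```shell
# Hypothetical approximation of mafToBigMaf output: one line per MAF block with
# chrom, chromStart, chromEnd (from the reference row) and the block text.
maf_to_bigmaf_txt() {
    local db=$1 maf=$2
    awk -v db="$db" '
        # Emit the buffered block if it had a reference row, then reset.
        function flush() {
            if (block != "" && chrom != "")
                printf "%s\t%d\t%d\t%s\n", chrom, start, end, block
            block = ""
            chrom = ""
        }
        /^#/        { next }                           # header and comment lines
        /^a/        { flush(); block = $0 ";"; next }  # an "a" line starts a block
        /^[ \t]*$/  { flush(); next }                  # a blank line ends a block
        block != "" {
            block = block $0 ";"                       # assumption: lines joined by ";"
            # The first "s" row whose source starts with "<db>." is the reference row.
            if ($1 == "s" && chrom == "" && index($2, db ".") == 1) {
                chrom = substr($2, length(db) + 2)     # drop the "<db>." prefix
                # On the "-" strand, MAF starts count from the end of the source.
                start = ($5 == "-") ? $6 - $3 - $4 : $3
                end = start + $4
            }
        }
        END { flush() }
    ' "$maf"
}
```

Usage mirrors Step 5: maf_to_bigmaf_txt hg38 chr22_KI270731v1_random.maf | sort -k1,1 -k2,2n > bigMaf.txt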
Step 6. Follow the below steps to create the binary indexed mafFrames and mafSummary files to accompany your bigMaf file:
genePredSingleCover chr22_KI270731v1_random.gp single.gp
genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp
bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb
hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
cut -f2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb
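Because bedToBigBed expects sorted input, a quick check with sort -c (which only verifies order and writes nothing) can confirm that the text files are sorted with the same keys used above. check_sorted is a hypothetical helper name; run it in the same locale as the sort commands above, since the ordering of names can depend on the locale.

```shell
# Verify that each file is already sorted by chrom, then numerically by start,
# using the same sort keys as the commands above. Stops at the first failure.
check_sorted() {
    local f
    for f in "$@"; do
        # sort -c exits non-zero at the first out-of-order line.
        if ! sort -c -k1,1 -k2,2n "$f" 2> /dev/null; then
            echo "not sorted: $f" >&2
            return 1
        fi
    done
}
```

For example: check_sorted bigMaf.txt bigMafSummary.bed bigMafFrames.txt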
Step 7. Move the newly created bigMaf file (bigMaf.bb) to a web-accessible http, https or ftp location. If you generated the bigMafSummary.bb and/or bigMafFrames.bb files, move those to a web-accessible location as well, typically the same location as the bigMaf.bb file.
Step 8. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the track line will look something like this:
track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb summary=http://myorg.edu/mylab/bigMafSummary.bb frames=http://myorg.edu/mylab/bigMafFrames.bb
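When all three files sit under the same base URL, the track line can be generated rather than typed. This sketch reproduces the example line above; make_bigmaf_track_line is a hypothetical helper, and the description text is fixed to the example's value.

```shell
# Build a bigMaf track line from one base URL and a track name.
# Assumes bigMaf.bb, bigMafSummary.bb and bigMafFrames.bb share that base URL.
make_bigmaf_track_line() {
    local base=${1%/} name=$2   # strip one trailing slash from the base URL
    printf 'track type=bigMaf name="%s" description="A Multiple Alignment" bigDataUrl=%s/bigMaf.bb summary=%s/bigMafSummary.bb frames=%s/bigMafFrames.bb\n' \
        "$name" "$base" "$base" "$base"
}
```

For example: make_bigmaf_track_line http://myorg.edu/mylab "My Big MAF"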
Step 9. Paste the custom track line into the text box on the custom track management page. Navigate to chr22_KI270731v1_random to see the example data for this track.
The bedToBigBed program can be run with several additional options. For a full list of the available options, type bedToBigBed (with no arguments) on the command line to display the usage message.
In this example, you will create a bigMaf custom track using an existing bigMaf file, bigMaf.bb, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly.
To create a custom track using this bigMaf file:
track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb summary=http://genome.ucsc.edu/goldenPath/help/examples/bigMafSummary.bb
Note that additional track line options exist that are specific to the MAF format. For instance, adding the parameter speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5" to the above example specifies the order of sequences by species.
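Rather than typing the species list by hand, the database prefixes present in a MAF file can be collected in order of first appearance. This sketch assumes source names of the form db.chrom (as in hg38.chr22_KI270731v1_random) and excludes the reference database; order of appearance is not necessarily the order you want displayed, so edit the result as needed. species_order_from_maf is a hypothetical name.

```shell
# Print the distinct database prefixes of "s" rows in a MAF file, in order of
# first appearance, excluding the reference database given as the first argument.
species_order_from_maf() {
    local ref=$1 maf=$2
    awk -v ref="$ref" '
        $1 == "s" {
            db = $2
            sub(/\..*/, "", db)   # keep only the text before the first "."
            if (db != ref && !(db in seen)) {
                seen[db] = 1
                order = order (order == "" ? "" : " ") db
            }
        }
        END { print order }
    ' "$maf"
}
```

For example: speciesOrder="$(species_order_from_maf hg38 chr22_KI270731v1_random.maf)"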
Custom tracks can also be loaded via one URL line. This link loads the same bigMaf.bb track and sets additional display parameters in the URL:
After this example bigMaf is loaded in the Genome Browser, click on an alignment in the browser's track display. Note that the details page displays information about the individual alignments, similar to the information available for a standard MAF track.
In this example, you will create a bigMaf file from an existing bigMaf input file, bigMaf.txt, located on the UCSC Genome Browser http server.
Download the bedToBigBed utility (Step 4, above).
Run the bedToBigBed utility to create a binary indexed bigMaf file (Step 5, above):
bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb
If you would like to share your bigMaf data track with a colleague, learn how to create a URL by looking at Example 6 on this page.
Because bigMaf files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.
- bigBedToBed: converts a bigBed file to ASCII BED format.
- bigBedSummary: extracts summary information from a bigBed file.
- bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.
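Text recovered with bigBedToBed keeps each alignment block in the fourth (mafBlock) column. Assuming the block's lines are joined with semicolons in that column (an assumption, not something stated on this page), the sketch below turns that BED text back into MAF. bigmaf_bed_to_maf is a hypothetical helper.

```shell
# Rebuild MAF text from bigBedToBed output of a bigMaf file.
# Assumption: column 4 (mafBlock) holds the block lines joined by ";".
bigmaf_bed_to_maf() {
    printf '##maf version=1\n'
    cut -f4 "$1" | awk '{
        n = split($0, lines, ";")
        for (i = 1; i <= n; i++)
            if (lines[i] != "") print lines[i]
        print ""                  # a blank line terminates each MAF block
    }'
}
```

For example: bigBedToBed bigMaf.bb bigMaf.bed && bigmaf_bed_to_maf bigMaf.bed > recovered.maf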
If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program again.
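As an illustration of the kind of filtering described above (not a replacement for the UCSC tool referred to), the awk sketch below drops rows whose chromEnd exceeds the chromosome length listed in a chrom.sizes file and reports each dropped row on standard error. drop_past_chrom_end is a hypothetical name; rows on chromosomes absent from chrom.sizes are passed through unchanged.

```shell
# Copy a tab-separated BED-style file to stdout, dropping rows whose chromEnd
# (column 3) is greater than the chromosome length in chrom.sizes.
drop_past_chrom_end() {
    local sizes=$1 bed=$2
    awk -F '\t' '
        NR == FNR { size[$1] = $2; next }   # first file: chrom.sizes (name, length)
        ($1 in size) && $3 + 0 > size[$1] + 0 {
            print "dropping line " FNR ": " $1 ":" $2 "-" $3 > "/dev/stderr"
            next
        }
        { print }
    ' "$sizes" "$bed"
}
```

For example: drop_past_chrom_end hg38.chrom.sizes bigMaf.txt > bigMaf.clipped.txt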