Assembly Hubs allow researchers to create Track Data Hubs on assemblies that are not in the UCSC Browser. By including the underlying reference sequence in UCSC twoBit format, as well as data tracks, researchers can browse and annotate any genome. We may have a GenArk Hub of your genome, or you can visit our assembly request page and we can build an assembly hub for you.
STEP 1: In a publicly accessible directory, copy this Arabidopsis thaliana plant assembly hub, which includes an araTha1.2bit file, using the following wget command:
wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/
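The -nH and --cut-dirs=3 options decide where wget writes the files. -nH drops the host-name directory, and --cut-dirs=3 drops the first three remote directories (goldenPath/help/examples), so the hub should land under hubExamples/hubAssembly/plantAraTha1/ in the current directory. The small function below is a sketch that mimics that mapping for a single URL. Its name is hypothetical, and it ignores wget's other naming rules (query strings, character escaping).

```shell
#!/bin/sh
# Sketch: mimic where "wget -nH --cut-dirs=N" saves a file fetched from URL.
wget_local_path() {
  local url="$1" cut="$2" path
  path=${url#*://}    # drop the scheme, e.g. "http://"
  path=${path#*/}     # drop the host name directory (what -nH does)
  while [ "$cut" -gt 0 ]; do
    path=${path#*/}   # drop one leading remote directory per --cut-dirs level
    cut=$((cut - 1))
  done
  printf '%s\n' "$path"
}

wget_local_path http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt 3
# prints hubExamples/hubAssembly/plantAraTha1/hub.txt
```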
Alternatively, if you do not have wget installed, you can download these files individually with curl. Run curl -O from the location where you want to copy the files:
curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt
If you use curl, be sure to recreate the directory structure, with matching araTha1 and araTha1/bbi directories. Double-check that you have all the files by looking here:
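One way to fetch the files individually while recreating the directories is curl's --create-dirs option, which creates any missing parent directories of the -o output path. The sketch below uses a hypothetical helper name and assumes you pass each file's path relative to the plantAraTha1/ directory. Only hub.txt is named in this guide, so take the remaining paths from the file listing. Setting CURL=echo prints the commands instead of downloading.

```shell
#!/bin/sh
# Sketch: download hub files one at a time, recreating subdirectories.
# --create-dirs makes any missing parent directory of the -o path
# (for example araTha1/bbi/). Set CURL=echo to print commands instead.
fetch_hub_files() {
  local base="$1" f
  shift
  for f in "$@"; do
    # -f: fail on HTTP errors; -sS: no progress bar, but still show errors
    "${CURL:-curl}" -fsS --create-dirs -o "$f" "$base/$f" || return 1
  done
}

# Example call. Append the other relative paths (files under araTha1/
# and araTha1/bbi/) taken from the file listing:
# fetch_hub_files http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1 hub.txt
```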
STEP 2: Paste your hub.txt link (http://yourURL/hub.txt) into the My Hubs tab of the Track Data Hubs page, click the "Add Hub" button, and then click the "Genome Browser" link from the top bar to view your assembly hub.
Alternatively, build a URL that will directly load your assembly hub and display it on hgGateway. Then click the "Genome Browser" link from the top bar to view your assembly hub:
This URL should work the same as using the original data just copied:
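As a sketch, assuming the browser's hubUrl CGI parameter (the hub.txt address is a placeholder), such a link can be assembled as shown below. The optional second argument points it at another browser server, for example a GBiB at http://127.0.0.1:1234. The hub URL is passed through without percent-encoding, and the helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: build a link that opens an assembly hub on hgGateway.
# Assumes the hubUrl CGI parameter; the hub URL is inserted as-is.
hub_gateway_url() {
  local hub="$1" server="${2:-https://genome.ucsc.edu}"
  printf '%s/cgi-bin/hgGateway?hubUrl=%s\n' "$server" "$hub"
}

hub_gateway_url http://yourURL/hub.txt
# prints https://genome.ucsc.edu/cgi-bin/hgGateway?hubUrl=http://yourURL/hub.txt
```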
STEP 3: Congratulations! Your assembly hub should display!
If you are having problems, make sure all your files and directories are publicly accessible. You may also wish to reset the browser occasionally to clear all existing data. For hubs to work, your server must also accept byte-range requests. You can check this with the following command; its output should include the line "Accept-Ranges: bytes":
curl -I http://yourURL/hub.txt
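To script this check, a minimal sketch (hypothetical helper name) searches the response headers for an Accept-Ranges: bytes line. Header names are case-insensitive, and curl prints HTTP/2 headers in lower case, hence grep -i.

```shell
#!/bin/sh
# Sketch: succeed if the HTTP headers on stdin advertise byte-range support.
accepts_byte_ranges() {
  grep -qiE '^accept-ranges:[[:space:]]*bytes'
}

# Usage (needs network access):
# curl -sI http://yourURL/hub.txt | accepts_byte_ranges && echo "byte ranges OK"
```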
Now that you have the assembly hub copied from above, you can copy the directory and start to edit some of the documents, such as genomes.txt, groups.txt, and trackDb.txt, to understand how they work. Refer to the Assembly Hub Wiki to learn how to build a twoBit file from your own FASTA files. Read more about trackDb settings in the definition document.
This assembly hub is an abbreviated version of a larger plant assembly Public Hub. You can explore the larger hub structure here.
Please note that the Browser waits 5 minutes before checking for any changes to these files.
When editing hub.txt, genomes.txt, trackDb.txt, and related hub files, you can shorten this delay by adding &udcTimeout=1 to your URL. For more information, please see the Debugging and Updating Track Hubs section of the Track Hub User Guide. Also, for more detailed instructions on setting up a regular hub, please see the Setting Up Your Own Track Hub section of the Track Hub User Guide.
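The parameter is appended like any other query-string parameter: with ? if the URL has no query string yet, otherwise with &. A sketch with a hypothetical helper name; the default of 1 second follows the value suggested above.

```shell
#!/bin/sh
# Sketch: append udcTimeout (in seconds) to a browser URL.
with_udc_timeout() {
  local url="$1" secs="${2:-1}"
  case $url in
    *\?*) printf '%s&udcTimeout=%s\n' "$url" "$secs" ;;  # already has a query string
    *)    printf '%s?udcTimeout=%s\n' "$url" "$secs" ;;  # first parameter
  esac
}

with_udc_timeout 'https://genome.ucsc.edu/cgi-bin/hgGateway?hubUrl=http://yourURL/hub.txt'
# prints https://genome.ucsc.edu/cgi-bin/hgGateway?hubUrl=http://yourURL/hub.txt&udcTimeout=1
```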
By running gfServers from your institution, you can enable blat on your assembly hubs. See Starting Blat and In-Silico PCR for an Assembly Hub for details.
With an operational installation of Genome Browser in a Box (GBiB), you can quickly and easily acquire an example assembly hub and run gfServers locally on the GBiB to enable Blat and In-Silico PCR. See the section Starting a Blat and In-Silico PCR enabled Assembly Hub on GBiB for more information.
From the directory containing yourAssembly.2bit (served at http://yourURL/yourAssembly/yourAssembly.2bit), you can start two gfServers, each on its own port. In this example, port 17777 uses -trans for translated (amino acid) searches, and port 17779 serves DNA searches:
gfServer start localhost 17777 -trans -mask yourAssembly.2bit &
gfServer start localhost 17779 -stepSize=5 yourAssembly.2bit &
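A gfServer loads its whole index before it answers queries, so a large assembly may take a while to become available. Below is a sketch of a wait loop, assuming that gfServer status exits with a non-zero status while no server answers on the given port. The helper name, retry count, and 5-second interval are arbitrary choices; GFSERVER can point at a different gfServer binary.

```shell
#!/bin/sh
# Sketch: poll a gfServer until it answers, or give up after N tries.
# Assumes "gfServer status host port" exits non-zero when no server answers.
wait_for_gfserver() {
  local host="$1" port="$2" tries="${3:-60}"
  while [ "$tries" -gt 0 ]; do
    if "${GFSERVER:-gfServer}" status "$host" "$port" >/dev/null 2>&1; then
      return 0                      # server is up and answering
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep 5   # pause only if another try remains
  done
  return 1
}

# Usage after starting both servers in the background:
# wait_for_gfserver localhost 17777 && wait_for_gfserver localhost 17779 && echo "gfServers ready"
```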
Then edit the genomes.txt file of your assembly hub to add three lines to the stanza for yourAssembly, with port numbers matching the servers above:
transBlat yourLab.yourInstitution.edu 17777
blat yourLab.yourInstitution.edu 17779
isPcr yourLab.yourInstitution.edu 17779
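Because the ports in these lines must match the gfServers you started, it can help to generate the lines from the same values. A sketch with a hypothetical helper name: the translated server gets transBlat (uppercase "B"), and the DNA server is used for both blat and isPcr.

```shell
#!/bin/sh
# Sketch: print the genomes.txt BLAT/PCR lines for one assembly stanza.
blat_stanza_lines() {
  local host="$1" trans_port="$2" dna_port="$3"
  printf 'transBlat %s %s\n' "$host" "$trans_port"  # translated server, uppercase B
  printf 'blat %s %s\n' "$host" "$dna_port"          # DNA server
  printf 'isPcr %s %s\n' "$host" "$dna_port"         # PCR reuses the DNA server
}

blat_stanza_lines yourLab.yourInstitution.edu 17777 17779
```

Appending the output with >> genomes.txt only puts it in the right place if the yourAssembly stanza is the last one in the file; otherwise paste the lines into that stanza by hand.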
The assembly hub can be configured to talk to a dynamic BLAT server that loads a pre-built index when started by an xinetd super-server. This allows genomes to have a BLAT server without keeping it resident in memory at all times. See Running your own gfServer and Adding BLAT servers for details on how to set up dynamic BLAT servers.
See an example genomes.txt with commented-out lines here, and please note the uppercase "B" in transBlat. For more information, see the "Adding BLAT servers" section of the Assembly Hub Wiki. The Downloads page offers pre-compiled binaries of utilities such as gfServer, found in the blat/ directory for your machine type here, along with further blat documentation here. Please note that the -mask option in the -trans gfServer command above prevents matches to any lower-case (soft-masked) sequence, so you may not wish to include it. See the above blat links and the gfServer usage statement for more information.
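Whether -mask matters depends on how much of your assembly is stored in lower case. As a sketch, the awk function below reads FASTA on standard input and prints the number of lower-case bases, the total number of bases, and their ratio. It assumes you first convert the 2bit file back to FASTA, for example with the twoBitToFa utility from the same Downloads page (shown commented out; check its usage statement). The helper name is hypothetical.

```shell
#!/bin/sh
# Sketch: count lower-case (soft-masked) bases in FASTA read from stdin.
# Prints: lower-case bases, total bases, lower-case fraction.
softmask_fraction() {
  awk '
    /^>/ { next }                       # skip FASTA header lines
    {
      total += length($0)
      lower += gsub(/[acgtn]/, "&")     # gsub returns the number of matches
    }
    END {
      if (total > 0) printf "%d %d %.3f\n", lower, total, lower / total
      else print "0 0 0.000"
    }'
}

# Usage, assuming twoBitToFa accepts "stdout" as its output file name:
# twoBitToFa yourAssembly.2bit stdout | softmask_fraction
```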
If you have trouble connecting your blat servers with the browser or if the browser cannot access your files, check if your institution has a firewall that prevents the browser from sending multiple inquiries. If this is the case, ask your systems administrator to add the following IP addresses as exceptions so that access is not limited.
128.114.119.* 220.127.116.11 18.104.22.168 22.214.171.124
This will allow connections with the U.S.-based genome.ucsc.edu site, the Europe-based mirror, the Asia-based mirror, and the UCSC development server.
STEP 2. With your GBiB operational, use your computer's terminal program to ssh into your GBiB, entering browser as the password:
ssh browser@localhost -p 1235
STEP 3. Navigate to the GBiB's /folders directory and use sudo to wget this assembly hub:
cd /folders
sudo wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/
STEP 4. You now have all the required files on your local machine and can load this plant assembly hub by using this URL and selecting it under the "group" category where "Plant araTha1" displays:
STEP 5. To enable blat you must acquire the gfServer utility. The UCSC Genome Browser and Blat software are free for academic, nonprofit, and personal use. Commercial download and installation of the Blat and In-Silico PCR software may be licensed through Kent Informatics.
You can obtain just the gfServer utility on your GBiB with either of the following commands that will create a bin directory and install the tool. The commands use the North American and the European download servers respectively.
mkdir ~/bin -p; rsync -avP hgdownload.soe.ucsc.edu::genome/admin/exe/linux.x86_64/blat/gfServer ~/bin/
mkdir ~/bin -p; rsync -avP hgdownload-euro.soe.ucsc.edu::genome/admin/exe/linux.x86_64/blat/gfServer ~/bin/
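The rsync commands above copy gfServer into ~/bin, while later steps call gfServer by name, so ~/bin needs to be on your PATH (or you can call ~/bin/gfServer directly). A sketch, with a hypothetical helper name, that prepends a directory for the current shell session only when it is not already listed:

```shell
#!/bin/sh
# Sketch: prepend a directory to PATH unless it is already listed.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present: leave PATH unchanged
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

path_prepend "$HOME/bin"
command -v gfServer >/dev/null || echo "gfServer is not on PATH yet" >&2
```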
The GBiB also includes a tool you can run on the command line to download an entire suite of tools.
STEP 6. Navigate to the genomes.txt file of this assembly hub:
Edit the currently commented-out blat lines with sudo vi genomes.txt: move the cursor over the # at the start of each line and press "x" to delete it, then type :w! to save the changes and :q to quit. The uncommented lines should look like this:
blat localhost 17779
transBlat localhost 17777
isPcr localhost 17779
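If you would rather not edit the file in vi, the same change can be made with sed. This sketch assumes the commented-out lines start with # followed by blat, transBlat, or isPcr (optionally after spaces). It keeps a genomes.txt.bak backup and relies on the -E and -i options of GNU or BSD sed. The helper name is hypothetical; because the hub was downloaded with sudo, run the sed command itself with sudo on the GBiB.

```shell
#!/bin/sh
# Sketch: uncomment "#blat", "#transBlat" and "#isPcr" lines in a
# genomes.txt file, keeping a .bak copy of the original.
uncomment_blat_lines() {
  sed -E -i.bak 's/^#[[:space:]]*((blat|transBlat|isPcr)[[:space:]])/\1/' "$1"
}

# On the GBiB the file belongs to root, so run the sed command directly:
# sudo sed -E -i.bak 's/^#[[:space:]]*((blat|transBlat|isPcr)[[:space:]])/\1/' genomes.txt
```

Afterwards, check that the uncommented lines name localhost and the ports you start the gfServers on.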
Please note that if you loaded your hub earlier, it will take five minutes (300 seconds)
for the browser to check for any changes to genomes.txt, and that this delay can be
shortened temporarily by adding
&udcTimeout=10 to the URL. See more information in the
Debugging and Updating section of the
Track Hub User Guide.
STEP 7. Change to the directory containing the 2bit file:
Run the two gfServer commands to start the blat servers:
gfServer start localhost 17777 -trans -mask araTha1.2bit &
gfServer start localhost 17779 -stepSize=5 araTha1.2bit &
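Started with a bare &, the servers remain attached to your ssh session and print their messages to it. As a sketch, assuming you prefer to detach them with nohup and collect their output in log files under $TMPDIR (default /tmp, since the hub directory was created with sudo), the two commands can be wrapped as follows. The helper name is hypothetical; the gfServer arguments are the ones above, and GFSERVER can point at a different binary.

```shell
#!/bin/sh
# Sketch: start the two gfServers from this step with nohup, each
# writing its output to its own log file.
start_gfservers() {
  local twobit="$1" trans_port="${2:-17777}" dna_port="${3:-17779}"
  local logdir="${TMPDIR:-/tmp}" gf="${GFSERVER:-gfServer}"
  # translated (amino acid) server, same options as above
  nohup "$gf" start localhost "$trans_port" -trans -mask "$twobit" \
    > "$logdir/gfServer.$trans_port.log" 2>&1 &
  # DNA server, also used for In-Silico PCR
  nohup "$gf" start localhost "$dna_port" -stepSize=5 "$twobit" \
    > "$logdir/gfServer.$dna_port.log" 2>&1 &
}

# start_gfservers araTha1.2bit
# tail "${TMPDIR:-/tmp}/gfServer.17779.log"
```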
STEP 8. Load this plant assembly hub by using this URL and selecting it under the "group" category where "Plant araTha1" displays:
On the blat page,
http://127.0.0.1:1234/cgi-bin/hgBlat, you can now select the
Arabidopsis thaliana assembly and blat plant amino acid sequences, such as
or DNA sequences, such as
On the PCR page,
http://127.0.0.1:1234/cgi-bin/hgPcr, you can now select the
Arabidopsis thaliana genome and enter a forward primer such as
TAGGTCTGCACCTGTGGTTCAAAATTTT and a reverse primer such as
CAATACAAGTCAACATTTTAGCGCCGAGA, check the "Flip Reverse Primer" box, and then click "Submit" to find matches on the assembly.
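Flipping the reverse primer amounts to taking its reverse complement before the search. A sketch of that transformation (hypothetical helper name), which can be handy if you prepare primers on the command line:

```shell
#!/bin/sh
# Sketch: reverse-complement a DNA sequence given as the first argument.
revcomp() {
  printf '%s\n' "$1" |
    awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
    tr 'ACGTNacgtn' 'TGCANtgcan'    # complement each base after reversing
}

revcomp CAATACAAGTCAACATTTTAGCGCCGAGA
# prints TCTCGGCGCTAAAATGTTGACTTGTATTG
```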