Schema for Pfam in UCSC Gene - Pfam Domains in UCSC Genes
  Database: mm10    Primary Table: ucscGenePfam    Row Count: 60,332   Data last updated: 2019-09-20
Format description: Browser extensible data
On download server: MariaDB table dump directory
field        example           SQL type              info    description
bin          76                smallint(5) unsigned  range   Indexing field to speed chromosome range queries.
chrom        chr1              varchar(255)          values  Reference sequence chromosome or scaffold
chromStart   3216426           int(10) unsigned      range   Start position in chromosome
chromEnd     3671018           int(10) unsigned      range   End position in chromosome
name         XK-related        varchar(255)          values  Name of item
score        0                 int(10) unsigned      range   Optional score, nominal range 0-1000
strand       -                 char(1)               values  + or -
thickStart   0                 int(10) unsigned      range   Start of where display should be thick (start codon)
thickEnd     0                 int(10) unsigned      range   End of where display should be thick (stop codon)
reserved     0                 int(10) unsigned      range   Used as itemRgb as of 2004-11-22
blockCount   3                 int(10) unsigned      range   Number of blocks
blockSizes   542,200,467,      longblob                      Comma separated list of block sizes
chromStarts  0,205275,454125,  longblob                      Start positions relative to chromStart
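
The schema does not say how the bin value is derived. The sketch below assumes the standard five-level UCSC binning scheme (smallest bins of 128 kb, each level eight times larger); the function name bin_from_range and the constant names are illustrative, not taken from this page. Under that assumption it reproduces the bin values shown in the sample rows for this table.

```python
# Constants of the standard UCSC binning scheme (assumed to apply to this table).
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
BIN_FIRST_SHIFT = 17  # smallest bins span 2**17 bases (128 kb)
BIN_NEXT_SHIFT = 3    # each level up covers 8 times more sequence


def bin_from_range(start: int, end: int) -> int:
    """Return the smallest bin that fully contains [start, end) (0-based, half-open)."""
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise ValueError(f"range {start}-{end} does not fit the standard scheme")


# chromStart and chromEnd of the XK-related example item
print(bin_from_range(3216426, 3671018))
```

A range query can then restrict itself to the few bins that could overlap the region of interest before comparing chromStart and chromEnd, which is what makes the indexed field useful.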

Sample Rows
 
bin  chrom  chromStart  chromEnd  name          score  strand  thickStart  thickEnd  reserved  blockCount  blockSizes     chromStarts
76   chr1   3216426     3671018   XK-related    0      -       0           0         0         3           542,200,467,   0,205275,454125,
615  chr1   3999577     4024882   PLAT          0      -       0           0         0         4           40,82,79,147,  0,8078,19492,25158,
76   chr1   4041939     4092769   PLAT          0      -       0           0         0         2           168,153,       0,50677,
616  chr1   4120066     4147955   PLAT          0      -       0           0         0         3           7,155,144,     0,22545,27745,
616  chr1   4148681     4170351   PLAT          0      -       0           0         0         3           63,87,147,     0,15173,21523,
617  chr1   4243594     4267612   PLAT          0      -       0           0         0         4           25,76,79,144,  0,1436,17932,23874,
617  chr1   4284820     4311422   PLAT          0      -       0           0         0         3           78,87,153,     0,8105,26449,
618  chr1   4352015     4352306   DCX           0      -       0           0         0         2           66,105,        0,186,
618  chr1   4352495     4352669   DCX           0      -       0           0         0         1           174,           0,
619  chr1   4492216     4492369   Sox17_18_mid  0      -       0           0         0         1           153,           0,

Note: all start coordinates in our database are 0-based, not 1-based. See explanation here.
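
As an illustration of the coordinate convention, the following sketch expands the blockSizes and chromStarts fields of one item into absolute intervals and converts them to 1-based coordinates. It assumes the usual BED convention that end coordinates are exclusive (half-open intervals); the helper names block_intervals and to_one_based are hypothetical and exist only for this example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def block_intervals(chrom_start: int, block_sizes: str, chrom_starts: str) -> List[Interval]:
    """Expand blockSizes/chromStarts into absolute 0-based, half-open intervals."""
    # Both fields are comma separated and carry a trailing comma, so skip empty pieces.
    sizes = [int(s) for s in block_sizes.split(",") if s]
    offsets = [int(s) for s in chrom_starts.split(",") if s]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(offsets, sizes)]


def to_one_based(interval: Interval) -> Interval:
    """Convert a 0-based, half-open interval to a 1-based, fully closed one."""
    start, end = interval
    return (start + 1, end)


# XK-related example item: chromStart 3216426, three blocks
blocks = block_intervals(3216426, "542,200,467,", "0,205275,454125,")
print(blocks)
print([to_one_based(b) for b in blocks])
```

Only the start shifts by one in the conversion; the end of the last block (3671018) equals chromEnd in both systems.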

Pfam in UCSC Gene (ucscGenePfam) Track Description
 

Description

Most proteins are composed of one or more conserved functional regions called domains. This track shows the high-quality, manually curated Pfam-A domains that the software HMMER3 identified in transcripts from the UCSC Genes track.

Display Conventions and Configuration

This track follows the display conventions for gene tracks.

Methods

The sequences from the knownGenePep table (see the UCSC Genes description page) are submitted to the set of Pfam-A HMMs, which annotate regions within the predicted peptide that are recognizable as Pfam protein domains. These regions are then mapped to the transcripts themselves using the pslMap utility. A complete shell script log for every version of UCSC Genes can be found in our GitHub repository under hg/makeDb/doc/ucscGenes; for example, mm10.knownGenes17.csh is for the database mm10 and version 17 of UCSC Known Genes.

Of the several options for filtering out false positives, the "Trusted cutoff (TC)" threshold method is used in this track to determine significance. For more information regarding thresholds and scores, see the HMMER documentation and results interpretation pages.

Note: There is currently a known but undocumented HMMER problem that reduces sensitivity and may cause some zinc finger domains to be missed. Until a fix is released for the HMMER/Pfam thresholds, please also consult the "UniProt Domains" subtrack of the UniProt track for more comprehensive zinc finger annotations.

Credits

pslMap was written by Mark Diekhans at UCSC.

References

Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K et al. The Pfam protein families database. Nucleic Acids Res. 2010 Jan;38(Database issue):D211-22. PMID: 19920124; PMC: PMC2808889