Schema for Stan/Yale TFBS - Transcription Factor Binding Sites by ChIP-seq from ENCODE/Stanford/Yale

Database: mm9
Primary Table: wgEncodeSydhTfbsCh12Znf384hpa004051IggrabPk
Row Count: 33,721
Data last updated: 2012-07-31
Format description: BED6+4 Peaks of signal enrichment based on pooled, normalized (interpreted) data.
On download server: MariaDB table dump directory

field	example	SQL type	info	description
`bin`	617	`smallint(5) unsigned`	range	Indexing field to speed chromosome range queries.
`chrom`	chr1	`varchar(255)`	values	Reference sequence chromosome or scaffold
`chromStart`	4218215	`int(10) unsigned`	range	Start position in chromosome
`chromEnd`	4218606	`int(10) unsigned`	range	End position in chromosome
`name`	.	`varchar(255)`	values	Name given to a region (preferably unique). Use . if no name is assigned
`score`	1000	`int(10) unsigned`	range	Indicates how dark the peak will be displayed in the browser (0-1000)
`strand`	.	`char(1)`	values	+ or - or . for unknown
`signalValue`	9.88975	`float`	range	Measurement of average enrichment for the region
`pValue`	26.9499	`float`	range	Statistical significance of signal value (-log10). Set to -1 if not used.
`qValue`	25.4226	`float`	range	Statistical significance with multiple-test correction applied (FDR -log10). Set to -1 if not used.
`peak`	196	`int(11)`	range	Point-source called for this peak; 0-based offset from chromStart. Set to -1 if no point-source called.
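
As a rough illustration of how the fields above fit together, the following Python sketch parses one tab-separated row into a record and derives two values from it: the absolute summit position (chromStart + peak) and the raw p-value (10 raised to the negative of pValue). The names PeakRow, parse_row, summit and p_value are hypothetical and are not part of any UCSC tool; the sketch assumes rows arrive in the column order listed in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeakRow:
    """One row of the peaks table (hypothetical container for illustration)."""
    bin_index: int
    chrom: str
    chrom_start: int      # 0-based start
    chrom_end: int
    name: str
    score: int            # 0-1000, controls display shading
    strand: str
    signal_value: float
    p_value_log: float    # -log10(p), or -1 if not used
    q_value_log: float    # -log10(q), or -1 if not used
    peak: int             # 0-based offset from chrom_start, or -1 if none

    def summit(self) -> Optional[int]:
        """Absolute 0-based summit position, or None if no point-source was called."""
        if self.peak < 0:
            return None
        return self.chrom_start + self.peak

    def p_value(self) -> Optional[float]:
        """Undo the -log10 transform; None if the field is unused (-1)."""
        if self.p_value_log < 0:
            return None
        return 10.0 ** (-self.p_value_log)


def parse_row(line: str) -> PeakRow:
    """Parse one tab-separated row in the column order of the schema table."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return PeakRow(
        int(f[0]), f[1], int(f[2]), int(f[3]), f[4], int(f[5]), f[6],
        float(f[7]), float(f[8]), float(f[9]), int(f[10]),
    )


# First sample row from this table.
row = parse_row("617\tchr1\t4218215\t4218606\t.\t1000\t.\t9.88975\t26.9499\t25.4226\t196")
```

For the first sample row, the summit falls at chromStart + 196, which is position 4218411 in 0-based coordinates.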

Sample Rows

bin	chrom	chromStart	chromEnd	name	score	strand	signalValue	pValue	qValue	peak
617	chr1	4218215	4218606	.	1000	.	9.88975	26.9499	25.4226	196
617	chr1	4288310	4288510	.	379	.	5.9131	2.88614	2.26562	55
619	chr1	4524378	4524745	.	498	.	4.62378	6.64143	5.65728	167
620	chr1	4677785	4678095	.	506	.	5.85086	6.89083	5.89128	140
621	chr1	4738022	4738564	.	1000	.	10.6379	36.4578	34.8036	316
621	chr1	4758247	4758805	.	501	.	2.53952	6.75413	5.76131	350
621	chr1	4841646	4841807	.	380	.	9.33648	2.92998	2.29649	78
77	chr1	4848689	4850069	.	598	.	2.52337	9.65296	8.52706	808
622	chr1	4864206	4864839	.	679	.	5.03604	12.0504	10.8387	331
623	chr1	5092678	5092980	.	402	.	3.98356	3.66669	2.93033	155
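
In the sample rows above, one peak has bin 77 while its neighbours have bins 617 to 623. This is consistent with the standard UCSC binning scheme, in which a feature is assigned to the smallest bin that fully contains it: the smallest bins cover 128 kb, and a feature that straddles a 128 kb boundary moves up to a larger bin. The sketch below assumes this table uses that standard scheme (the sample rows agree with it); the function name ucsc_bin is hypothetical.

```python
# Offsets of the first bin at each level, from the smallest bins (128 kb)
# up to the single bin covering 512 Mb.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17   # 2**17 = 128 kb, size of the smallest bins
NEXT_SHIFT = 3     # each level up is 8 times larger


def ucsc_bin(start: int, end: int) -> int:
    """Return the bin for a 0-based, half-open interval [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            # The interval fits entirely inside one bin at this level.
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("interval is outside the range of the standard scheme")


first_row_bin = ucsc_bin(4218215, 4218606)      # row with bin 617
straddling_bin = ucsc_bin(4848689, 4850069)     # row with bin 77
```

The peak at 4848689-4850069 crosses the 128 kb boundary at 4849664, so it is placed in a 1 Mb bin, which explains the value 77.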

Note: all start coordinates in our database are 0-based, not 1-based.
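
To make the coordinate note concrete, here is a minimal sketch of converting a table row to the 1-based position string commonly typed into a browser position box. It assumes the usual UCSC table convention of a 0-based start with an exclusive end (half-open interval); the function names are hypothetical.

```python
def to_one_based(chrom: str, chrom_start: int, chrom_end: int) -> str:
    """Convert a 0-based, half-open interval to a 1-based, inclusive position string."""
    # Only the start shifts by one; the exclusive 0-based end equals
    # the inclusive 1-based end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def interval_length(chrom_start: int, chrom_end: int) -> int:
    """Length in bases of a 0-based, half-open interval."""
    return chrom_end - chrom_start


position = to_one_based("chr1", 4218215, 4218606)
length = interval_length(4218215, 4218606)
```

With half-open intervals the length is simply chromEnd minus chromStart, with no plus-one correction.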

Stan/Yale TFBS (wgEncodeSydhTfbs) Track Description


Description

This track shows probable binding sites of the specified transcription factors (TFs) in the given cell types as determined by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). Each experiment is associated with an input signal, which represents the control condition where immunoprecipitation with non-specific immunoglobulin was performed in the same cell type. For each experiment (cell type vs. antibody) this track shows a graph of enrichment for TF binding (Signal), along with sites that have the greatest evidence of transcription factor binding, as identified by the PeakSeq algorithm (Peaks).

The sequence reads, quality scores, and alignment coordinates from these experiments are available for download.

Display Conventions and Configuration

This track is a multi-view composite track that contains multiple data types (views). For each view, there are multiple subtracks that display individually on the browser. Instructions for configuring multi-view tracks are here. This track contains the following views:

Peaks: Regions of signal enrichment based on processed data (normalized data from pooled replicates). Intensity is represented in grayscale; darker shading shows higher intensity, and a solid vertical line in the peak region marks the point with the highest signal. ENCODE Peaks tables contain fields for statistical significance, including the minimum false discovery rate (FDR) threshold at which the test may be called significant (qValue).

Signal: Density graph (wiggle) of signal enrichment based on processed data.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Methods

Cells were grown according to the approved ENCODE cell culture protocols. For details on the chromatin immunoprecipitation protocol used, see (Euskirchen et al., 2007), (Rozowsky et al., 2009) and (Auerbach et al., 2009).

DNA recovered from the precipitated chromatin was sequenced on the Illumina (Solexa) sequencing platform and mapped to the genome using the Eland alignment program. ChIP-seq data were scored based on sequence reads (length ~30 bp) that align uniquely to the reference genome. From the mapped tags, a signal map of ChIP DNA fragments (average fragment length ~200 bp) was constructed, where the signal height is the number of overlapping fragments at each nucleotide position in the genome. Reads were pooled from all submitted replicates to generate the Peak and Signal files. Per-replicate alignments and sequences are available for download at the downloads page.

For each 1 Mb segment of each chromosome, a peak height threshold was determined by requiring a false discovery rate ≤ 0.01 when comparing the number of peaks above that threshold to the number of peaks obtained from multiple simulations of a random null background with the same number of mapped reads (also accounting for the fraction of mappable bases for sequence tags in that 1 Mb segment). The number of mapped tags in a putative binding region is compared to the normalized number of mapped tags in the same region from an input DNA control (normalized by correlating tag counts in genomic 10 kb windows). Using a binomial test, only regions that have a p-value ≤ 0.01 are considered to be significantly enriched compared to the input DNA control.

Release Notes

This is Release 4 (August 2012). It contains a total of 88 ChIP-seq experiments on transcription factor binding, with the addition of 22 new experiments including 12 new antibodies. Previous versions of files are available for download from the FTP site.

Credits

These data were generated and analyzed by the labs of Michael Snyder at Stanford University and Sherman Weissman at Yale University.

Contact: Philip Cayting.

References

Auerbach RK, Euskirchen G, Rozowsky J, Lamarre-Vincent N, Moqtaderi Z, Lefrançois P, Struhl K, Gerstein M, Snyder M. Mapping accessible chromatin regions using Sono-Seq. Proc Natl Acad Sci U S A. 2009 Sep 1;106(35):14926-31.

Euskirchen GM, Rozowsky JS, Wei CL, Lee WH, Zhang ZD, Hartman S, Emanuelsson O, Stolc V, Weissman S, Gerstein MB et al. Mapping of transcription factor binding regions in mammalian cells by ChIP: comparison of array- and sequencing-based technologies. Genome Res. 2007 Jun;17(6):898-909.

Martone R, Euskirchen G, Bertone P, Hartman S, Royce TE, Luscombe NM, Rinn JL, Nelson FK, Miller P, Gerstein M et al. Distribution of NF-kappaB-binding sites across human chromosome 22. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12247-52.

Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, Euskirchen G, Bernier B, Varhol R, Delaney A et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods. 2007 Aug;4(8):651-7.

Rozowsky J, Euskirchen G, Auerbach RK, Zhang ZD, Gibson T, Bjornson R, Carriero N, Snyder M, Gerstein MB. PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls. Nat Biotechnol. 2009 Jan;27(1):66-75.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column on the track configuration page and the download page. The full data release policy for ENCODE is available here.
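
Two steps from the Methods text can be sketched in a few lines of Python: building a signal map whose height is the number of overlapping fragments at each position, and scoring enrichment over the control with a one-sided binomial test. This is an illustration only, not the PeakSeq implementation. The function names are hypothetical, and the null success probability of 0.5 assumes the control count has already been scaled to the ChIP sample, which the text above does not spell out.

```python
from math import comb
from typing import List, Tuple


def signal_map(fragments: List[Tuple[int, int]], region_length: int) -> List[int]:
    """Count overlapping fragments at each position of a region.

    Fragments are 0-based, half-open (start, end) intervals relative to
    the region start; parts outside the region are clipped.
    """
    diff = [0] * (region_length + 1)
    for start, end in fragments:
        s = max(start, 0)
        e = min(end, region_length)
        if s < e:
            diff[s] += 1   # coverage rises where a fragment begins
            diff[e] -= 1   # and falls just past where it ends
    signal = []
    depth = 0
    for delta in diff[:region_length]:
        depth += delta
        signal.append(depth)
    return signal


def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_enriched(chip_tags: int, scaled_control_tags: int, alpha: float = 0.01) -> bool:
    """Compare ChIP tags to (already normalized) control tags in one region.

    Assumption: under the null, a tag is equally likely to come from
    either sample, so the ChIP count follows Binomial(n, 0.5).
    """
    n = chip_tags + scaled_control_tags
    return binomial_upper_tail(chip_tags, n) <= alpha


toy_signal = signal_map([(0, 3), (2, 5)], 6)
```

The difference-array approach keeps the pileup linear in the number of fragments plus the region length, which matters when a region contains many reads.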
