Schema for Duke Affy Exon - Affymetrix Exon Array from ENCODE/Duke
Database: hg19
Primary Table: wgEncodeDukeAffyExonGlioblaSimpleSignalRep3V2
Data last updated: 2012-03-28
Big Bed File Download: /gbdb/hg19/bbi/wgEncodeDukeAffyExonGlioblaSimpleSignalRep3V2.bigBed
Item Count: 38,378
Format description: BED6 + exon count + constitutive exons
field               example       description
chrom               chr1          Chromosome (or contig, scaffold, etc.)
chromStart          166244884     Start position in chromosome
chromEnd            166245508     End position in chromosome
name                RP11-7G12.2   Name of item
score               581           Score from 0-1000. Capped number of reads
strand              -             + or -
signalValue         5.8194        Measurement of expression value of the gene
exonCount           3             Number of exons used to estimate expression value
constituitiveExons  0             Number of constitutive exons used to estimate the expression value

Sample Rows
 
chrom  chromStart  chromEnd   name           score  strand  signalValue  exonCount  constituitiveExons
chr1   166244884   166245508  RP11-7G12.2    581    -       5.8194       3          0
chr1   166304143   166304911  RP11-479J7.2   365    +       3.3028       3          0
chr1   166445064   166459248  RP11-276E17.2  239    -       0.7809       3          0
chr1   166445406   166450798  FMO7P          356    +       3.1224       3          0
chr1   166535420   166549885  FMO8P          328    +       2.5789       9          0
chr1   166573168   166600610  FMO9P          275    +       1.5197       11         0
chr1   166635152   166651258  FMO10P         329    +       2.5897       11         0
chr1   166717522   166717642  RP11-54B9.2    308    +       2.1648       1          0
chr1   166745970   166761823  FMO11P         200    +       0            5          0
chr1   166765536   166766322  CNN2P10        578    +       5.786        4          0
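
As an illustration of the BED6 + 3 layout, the following minimal Python sketch parses one row into a typed record. It assumes the bigBed file has already been exported to tab- or whitespace-separated text (for example with the UCSC bigBedToBed utility); the class name ExonArrayItem and the function parse_row are hypothetical names chosen for this example and are not part of any UCSC tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExonArrayItem:
    """One gene-level item: six standard BED fields plus three extras."""
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str
    score: int
    strand: str
    signal_value: float
    exon_count: int
    constitutive_exons: int  # listed in the schema as "constituitiveExons"


def parse_row(line: str) -> ExonArrayItem:
    """Parse one whitespace-separated BED6+3 line (hypothetical helper)."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    chrom, start, end, name, score, strand, signal, exons, constitutive = fields
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")
    return ExonArrayItem(
        chrom, int(start), int(end), name, int(score), strand,
        float(signal), int(exons), int(constitutive),
    )


# First sample row from the table above.
item = parse_row("chr1\t166244884\t166245508\tRP11-7G12.2\t581\t-\t5.8194\t3\t0")
print(item.name, item.signal_value, item.chrom_end - item.chrom_start)
```

Using integer and float conversions up front means a malformed row fails at parse time rather than later in an analysis.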

Duke Affy Exon (wgEncodeDukeAffyExon) Track Description
 

Description

This track displays human tissue microarray data using Affymetrix Human Exon 1.0 ST expression arrays. This RNA expression track was produced as part of the ENCODE Project. The RNA was extracted from cells that were also analyzed by DNaseI hypersensitivity (Duke DNaseI HS), FAIRE (UNC FAIRE), and ChIP (UTA TFBS).

Display Conventions and Configuration

In contrast to the hg18 annotation, this track now displays exon array data that has been aggregated to the gene level for those probes that have been linked to genes. Probes not linked to genes are not included. The display for this track shows gene probe location and signal value as grayscale-colored items where higher signal values correspond to darker-colored blocks.

Items with scores between 900 and 1000 have signal values greater than 9, linearly scaled for that particular cell type. Items with scores between 400 and 900 have signal values between 4 and 9; for these, the score is simply the signal value multiplied by 100. Items with scores between 200 and 400 have signal values below 4, linearly scaled to fit that score range.
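
The three score ranges can be sketched as a piecewise function. This is an illustration under stated assumptions, not the code used to build the track: the function name signal_to_score is hypothetical, the lowest signal value is assumed to be 0, scores are assumed to be truncated to integers, and max_signal stands in for the largest signal value observed in the cell type, which the description does not specify.

```python
def signal_to_score(signal: float, max_signal: float) -> int:
    """Map a signal value to a 200-1000 display score (illustrative sketch).

    Assumptions not stated in the track description: the minimum signal
    is 0, results are truncated to integers, and max_signal is the
    largest signal value seen for the cell type.
    """
    if signal < 0:
        raise ValueError("signal must be non-negative")
    if signal < 4:
        # Signals below 4 are scaled linearly onto the 200-400 range.
        return int(200 + (signal / 4.0) * 200)
    if signal <= 9:
        # Signals between 4 and 9 are multiplied by 100 (400-900).
        return int(signal * 100)
    if max_signal <= 9 or signal > max_signal:
        raise ValueError("max_signal must exceed 9 and be >= signal")
    # Signals above 9 are scaled linearly onto the 900-1000 range.
    return int(900 + (signal - 9.0) / (max_signal - 9.0) * 100)


for value in (0.0, 0.7809, 3.3028, 5.8194, 10.5):
    print(value, signal_to_score(value, max_signal=12.0))
```

Under these assumptions the function gives 581 for a signal of 5.8194 and 365 for a signal of 3.3028, matching the scores in the sample rows for those signal values.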

The subtracks within this composite annotation track correspond to data from different cell types and tissues. The configuration options are shown at the top of the track description page, followed by a list of subtracks. To display only selected subtracks, uncheck the boxes next to the tracks you wish to hide.

For information regarding specific microarray probes, turn on the Affy Exon Probes track, which can be found in the Expression track group. See Methods for a description of how probe-level data were processed to produce gene-level annotations.

Metadata for a particular subtrack can be found by clicking the down arrow in the list of subtracks.

Data from these tracks are stored as BED files whose first six fields follow the BED file standard. The three additional fields are as follows:

  • signalValue: The normalized expression value for a gene, calculated as described below.
  • exonCount: The number of exons used in the calculation of the expression value.
  • constitutiveExons: The number of constitutive exons used in the calculation of the expression value.

Methods

Cells were grown according to the approved ENCODE cell culture protocols. Total RNA was isolated from these cells using TRIzol extraction followed by cleanup on an RNeasy column (Qiagen) that included a DNaseI step. The RNA was checked for quality using a NanoDrop and an Agilent Bioanalyzer. RNA (1 µg) deemed to be of good quality was then processed by either 1) the standard Affymetrix Whole Transcript Sense Target labeling protocol, which included a riboreduction step, or 2) the NuGEN labeling system. The fragmented biotin-labeled cDNA was hybridized over 16 h to Affymetrix Exon 1.0 ST arrays and scanned on an Affymetrix Scanner 3000 7G using AGCC software.

Data from all replicates were then normalized together. Probesets flagged as cross-hybridizing were removed from the analysis (Salomonis et al. 2010). Though these arrays provide exon-level resolution, gene-level expression was estimated by grouping probesets by gene for normalization (Bemmo et al. 2008). Probesets were assigned to genes based on the GENCODE v10 annotation (July 2011). An exon was classified as constitutive or non-constitutive based on whether it was present in all protein-coding transcripts. For genes with at least 4 constitutive probes, only constitutive probesets were used to estimate gene expression. For all other genes, including all non-protein-coding genes, all (non-cross-hybridizing) probesets that mapped to an expressed exon in any transcript of the gene were used. Gene-level expression estimates were normalized using Affymetrix Power Tools (APT) (Lockstone 2011) with the chipstream command "rma-bg, med-norm, pm-gcbg, med-polish". This chipstream calls for an RMA normalization with GC-background correction using antigenomic background probes.
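
The probeset selection rule described above can be sketched in Python as follows. The Probeset record and the select_probesets function are hypothetical names introduced for illustration. The sketch assumes that cross-hybridizing probesets are dropped before constitutive probesets are counted, and that every probeset passed in already maps to an expressed exon of the gene; neither detail is spelled out in the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Probeset:
    """Hypothetical record for one probeset assigned to a gene."""
    probeset_id: str
    constitutive: bool       # exon present in all protein-coding transcripts
    cross_hybridizing: bool  # flagged as cross-hybridizing


def select_probesets(probesets: List[Probeset],
                     min_constitutive: int = 4) -> List[Probeset]:
    """Choose the probesets used for a gene-level estimate (sketch).

    Assumption: cross-hybridizing probesets are removed first, and the
    threshold is applied to the constitutive probesets that remain.
    """
    usable = [p for p in probesets if not p.cross_hybridizing]
    constitutive = [p for p in usable if p.constitutive]
    if len(constitutive) >= min_constitutive:
        # Enough constitutive probesets: use only those.
        return constitutive
    # Otherwise (including genes with no protein-coding transcripts,
    # which have no constitutive exons) use every usable probeset.
    return usable


gene = [Probeset(f"ps{i}", constitutive=(i < 4), cross_hybridizing=(i == 5))
        for i in range(6)]
print([p.probeset_id for p in select_probesets(gene)])
```

In this toy gene, four of the six probesets are constitutive, so only those four are returned; with fewer than four, the function would fall back to all probesets that are not cross-hybridizing.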

Although all data were generated on the same microarray platform, two different experimental backgrounds were present because of a change in labeling reagents (Affymetrix vs. NuGEN; see Methods above). Batch effects related to this change caused array data to group by experimental protocol rather than by cell type relatedness. We used an R script (ComBat) to correct for this batch effect (Johnson et al. 2007).

Verification

When biological replicates were available, data were verified by analyzing replicates displaying a Pearson correlation coefficient > 0.9.
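
As an illustration of this check, the sketch below computes a Pearson correlation coefficient between the gene-level signal values of two replicates and compares it with the 0.9 threshold. The function names pearson and replicates_agree are hypothetical, the input lists are assumed to hold signal values for the same genes in the same order, and the toy values are invented purely for demonstration.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def replicates_agree(rep_a: Sequence[float], rep_b: Sequence[float],
                     threshold: float = 0.9) -> bool:
    """True if the replicates correlate above the threshold (sketch)."""
    return pearson(rep_a, rep_b) > threshold


# Toy signal values for the same five genes in two replicates (made up).
rep1 = [5.8, 3.3, 0.8, 3.1, 2.6]
rep2 = [5.6, 3.5, 0.9, 3.0, 2.4]
print(replicates_agree(rep1, rep2))
```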

Release Notes

This is release 3 of this track (April 2012). Several new cell types have been added. The name of cell line Astrocy was changed to NH-A.

Credits

RNA was extracted from each cell type by Greg Crawford's group at Duke University. RNA was purified and hybridized to Affymetrix Exon arrays by Sridar Chittur and Scott Tenenbaum at the University at Albany-SUNY. Data analyses were primarily performed by Nathan Sheffield (Duke University) with assistance from Melissa Cline (UCSC), Zhancheng Zhang (UNC Chapel Hill), and Darin London (Duke University).

Contact: Terry Furey

References

Bemmo A, Benovoy D, Kwan T, Gaffney DJ, Jensen RV, Majewski J. Gene expression and isoform variation analysis using Affymetrix Exon Arrays. BMC Genomics. 2008 Nov 7;9:529.

Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007 Jan;8(1):118-27.

Lockstone HE. Exon array data analysis using Affymetrix power tools and R statistical software. Brief Bioinform. 2011 Nov;12(6):634-44.

Salomonis N, Schlieve CR, Pereira L, Wahlquist C, Colas A, Zambon AC, Vranizan K, Spindler MJ, Pico AR, Cline MS et al. Alternative splicing regulates mouse embryonic stem cell pluripotency and differentiation. Proc Natl Acad Sci U S A. 2010 Jun 8;107(23):10514-9.

Data Release Policy

Data users may freely use ENCODE data, but may not, without prior consent, submit publications that use an unpublished ENCODE dataset until nine months following the release of the dataset. This date is listed in the Restricted Until column, above. The full data release policy for ENCODE is available here.