14 February 2011 - ENCODE data releases: Caltech RNA-seq, Broad Histone, UW Histone & SUNY RIP Tiling

Four tracks of ENCODE data on the GRCh37/hg19 human assembly from the Caltech, Broad/MGH, SUNY Albany and University of Washington ENCODE groups were released:

Caltech RNA-seq: This track shows transcriptome measurements performed on polyA+ RNA using both stranded and unstranded protocols.

Broad Histone, UW Histone: These tracks display maps of histone modifications reflective of chromatin state changes identified by ChIP-seq.

SUNY RIP Tiling: This track displays transcriptional fragments associated with RNA binding proteins in different cell lines, using RIP-Chip (Ribonomic) profiling on Affymetrix GeneChip ENCODE 2.0R Tiling Arrays.

9 February 2011 - First Mouse ENCODE data release: Transcription Factor Binding Sites by ChIP-seq from Stanford/Yale

The first Mouse ENCODE data is now available on the mm9 (NCBI37) genome assembly. The Stan/Yale TFBS track in the 'Regulation' track group shows probable binding sites of the following transcription factors: c-MYB (H-141), CTCF (C-20), Max, NELFE, p300 (N-15), Rad21, and USF2, in the MEL leukemia (K562 analog) cell line as determined by ChIP-seq. Thanks to all who had a hand in generating this data and to the UCSC wrangler, Venkat Malladi, and Q/A staff who made this data release possible.

16 December 2010 - Release of DNA Methylation and DNaseI Sensitivity data

Three tracks of ENCODE DNA Methylation and DNaseI Sensitivity data have been released on the GRCh37/hg19 human assembly. All three tracks are in the browser 'Regulation' track group; one of the tracks is in the new ENC DNA Methyl super-track and the other two are in the new ENC DNase/FAIRE super-track (super-tracks provide additional documentation and organization by data type). The three newly released tracks are:

HAIB Methyl RRBS: This track reports the percentage of DNA molecules that exhibit cytosine methylation at specific CpG dinucleotides. In general, DNA methylation within a gene's promoter is associated with gene silencing, and DNA methylation within the exons and introns of a gene is associated with gene expression.

UW DNaseI DGF: This track contains deep sequencing DNase data that will be used to identify sites where regulatory factors bind to the genome (footprints).

UW DNaseI HS: This track shows DNaseI sensitivity measured genome-wide in different cell lines using the Digital DNaseI methodology, and DNaseI hypersensitive sites.

Some of the data that comprise these tracks were originally released on hg18 and have been remapped to hg19; in such cases, subtracks have 'origAssembly hg18' as part of their metadata.

16 November 2010 - Release of the first ENCODE RNA-seq data on hg19

We are pleased to announce the release of the first ENCODE RNA-seq data on the GRCh37/hg19 human browser. Two tracks have just been released; these are organized in the new ENC RNA-seq super-track within the browser 'Expression' track group. The super-track provides additional documentation and organization by data type. The two tracks released are:

CSHL Sm RNA-seq: This track depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or subcellular compartments from ENCODE cell lines.

GIS RNA-seq: This track shows high throughput sequencing of RNA samples from tissues or subcellular compartments from cell lines included in the ENCODE Transcriptome subproject.

All of the data that comprise these tracks were originally released on hg18 and have been remapped to hg19.

15 Nov 2010 - New ENCODE Tutorial at OpenHelix

OpenHelix, together with the UCSC Genome Bioinformatics group, announces a new online tutorial suite that teaches users how to access the ENCODE data in the UCSC Genome Browser. This tutorial introduces the types of data available under ENCODE and presents methods to access the data via the Genome Browser, Table Browser, and downloads. This tutorial suite is freely available at OpenHelix.

16 September 2010 - First Production ENCODE Data on hg19 has been Released

We are pleased to announce the release of the first sets of production ENCODE data on hg19:

GIS DNA PET: This track shows the starts and ends of DNA fragments from different cell lines determined by paired-end ditag (PET) sequencing using different DNA fragment sizes for analysis of genome structural variation. The data in this track uses the new BAM data format. For more information about SAM/BAM, click here. All of the subtracks that comprise this track were originally released on hg18 and have been remapped to hg19.

Gencode Genes: This track (version 4, May 2010) shows high-quality manual annotations merged with evidence-based automated annotations across the entire human genome generated by the GENCODE project. Previous versions of this data were released on hg18, but this newest version is available solely on hg19.

20 August 2010 - New ENCODE Integrated Regulation Super-track Released

We are pleased to announce the release of the ENCODE Integrated Regulation super-track, a collection of regulatory tracks containing state-of-the-art information about the mechanisms that turn genes on and off at the transcription level. Individual tracks within the set show enrichment of histone modifications suggestive of enhancer and promoter activity, DNase clusters indicating open chromatin, regions of transcription factor binding, and transcription levels. When viewed in combination, the complementary nature of the data within these tracks has the potential to greatly facilitate our understanding of regulatory DNA.

The data comprising these tracks were generated from hundreds of experiments on multiple cell lines conducted by labs participating in the ENCODE project, and were submitted to the UCSC ENCODE Data Coordination Center for display on the Genome Browser.

Faced with the problem of how to display such a large amount of data in a manner facilitating analysis, UCSC has developed new visualization methods that cluster and overlay the data, and then display the resulting tracks on a single screen. Each of the cell lines in a track is associated with a particular color. Light, saturated colors are used to produce the best transparent overlay.

ENCODE Regulatory track screenshot

Currently, the ENCODE Regulation data are available only on the March 2006 (NCBI Build 36, UCSC version hg18) assembly of the human genome.

For a detailed description of the datasets contained in this super-track and a discussion of how the tracks can be used synergistically to examine regions of regulatory functionality within the genome, see the track description page.

6 August 2010 - June and July ENCODE news

Initial Release of the HudsonAlpha RNA-seq track: This track shows short tag sequencing of cDNA obtained from biological replicate samples (different culture plates) of the ENCODE cell lines. The sequences were aligned to the human genome (hg18) and UCSC known-gene splice junctions.

Release 2 of the Caltech RNA-seq track: This track shows alignments, signal density, and splice sites based on 75 bp paired reads and 32 bp strand-specific single reads of polyA+ RNA aligned to the human genome (hg18) and UCSC known-gene splice junctions. Also included with the track as downloadable files are RPKM expression level measurements at the gene-level and exon-level, and candidate novel exons. Release 2 of this track adds five new cell types: H1-hESC, HeLa-S3, HepG2, HUVEC, and NHEK.
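As a rough illustration of the RPKM measure mentioned above, the sketch below uses the commonly cited definition (reads per kilobase of feature length per million mapped reads). The function name and the example numbers are hypothetical, and the exact normalization used to produce the track's downloadable files is not stated in this news item.

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads.

    Assumes the common definition:
        RPKM = read_count * 1e9 / (feature_length_bp * total_mapped_reads)
    """
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene model
# in a library of 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
print(value)
```

Because the measure divides by both feature length and library size, values can be compared across genes of different lengths and across libraries of different sequencing depth.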

Initial Release of the GIS PET Loc track: This track shows starts and ends of full length mRNA transcripts determined by PET sequencing of polyA+ and total RNA from 6 subcellular compartments and whole cell, in 8 cell lines.

Release 2 of the HAIB TFBS track: This track shows signal density and binding sites of selected transcription factors in a variety of cell types. Release 2 of this track adds 73 new experiments covering 13 new cell lines and 27 antibodies. Additionally, DEX and EtOH treatments have been included in the A549 cell line.

Initial Release of the BU ORChID track: This track displays the predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the genome.

Update of the Mapability track: This track displays the level of sequence uniqueness of the reference hg18 genome. The update adds CRG Alignability data, which displays how uniquely k-mer sequences align to a region of the genome.
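To give a feel for what "how uniquely k-mer sequences align" can mean, here is a toy sketch that scores each k-mer start position in a short sequence as 1 divided by the number of exact occurrences of that k-mer. This scoring rule, the function name, and the example sequence are assumptions made for illustration; the sketch considers only the forward strand and exact matches, and is not the method used to build the track.

```python
from collections import Counter


def kmer_uniqueness(sequence, k):
    """Score each k-mer start position as 1 / (exact occurrences of that k-mer).

    Toy model: forward strand only, no mismatches allowed.
    """
    seq = sequence.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


# Hypothetical 10-base sequence; the 4-mer ACGT occurs twice, all others once.
scores = kmer_uniqueness("ACGTACGTTT", 4)
print(scores)
```

A score of 1.0 marks a k-mer that occurs once in the toy sequence, while lower scores mark repeated k-mers, where short reads could not be placed unambiguously.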

19 July 2010 - April and May ENCODE news

Initial Release of the GIS RNA-seq track: This track shows RNA-seq of high-quality PolyA+ RNA in ENCODE Tier1 cell lines and H1 ESC sequenced on the ABI SOLiD platform.

Release 3 of the Yale TFBS track: This track shows probable binding sites of the specified transcription factors (TFs) in the given cell types as determined by chromatin immunoprecipitation followed by high throughput sequencing (ChIP-Seq). Release 3 adds 64 experiments and 26 input/control datasets for a total of 54 factors in 21 cell lines.

Release 2 of the RIKEN CAGE Loc track: This track shows 5' cap analysis gene expression (CAGE) tags and clusters in RNA extracts from different sub-cellular localizations. Release 2 of this track adds data for eight new cell-type/compartment combinations (GM12878 Nucleus, H1-hESC whole cell, HepG2 cytosol/nucleus/nucleolus, HUVEC cytosol, and NHEK cytosol/nucleus).

Initial Release of the Duke Affy Exon track: This track shows gene expression by microarray of RNA extracted from 28 cell lines that were also analyzed by DNaseI hypersensitivity, FAIRE, and ChIP assays (Open Chromatin track).

Initial Release of the UW DNase DGF track: This track shows high-resolution DNase annotations from samples sequenced to depths of 300-fold or greater, in 5 cell lines.

Initial Release of the GIS DNA PET track: This track shows the starts and ends of DNA fragments from different cell lines determined by paired-end ditag (PET) sequencing using different DNA fragment sizes for analysis of genome structural variation.

18 March 2010 - February and March 2010 ENCODE news

Release 3 of the Open Chromatin track: This track displays evidence of open chromatin in multiple cell types from the Duke/UNC/UT-Austin/EBI ENCODE group. Release 3 of this track includes 18 new cell line or cell/treatment experiments. In addition, a number of new experiments were added to existing cell lines. Almost all Peaks have been called anew using improved cut-offs and p-Values. Finally, a second type of peak called using a ZINBA algorithm has been provided for several of the FAIRE-seq experiments.

Release 3 of the Broad Histone track: This track shows maps of chromatin state generated using ChIP-seq. Release 3 of this track adds the HSMM cell line and includes new experiments for H1-hESC and NHLF.

Release 2 of the UW Affy Exon track: This track displays human tissue microarray data using the Affymetrix Human Exon 1.0 GeneChip. This release includes 28 new cell types, and replaces the data for four existing tables (replicate 1 for K562, NB4, and SKMC; replicate 2 for HeLa-S3).

Initial release of the UW Histone track: This track displays maps of histone modifications genome-wide in different cell lines, using ChIP-seq high-throughput sequencing.

Release 2 of the HudsonAlpha Methyl-seq track: Release 2 adds data for five new cell types.

Release 3 of the Gencode Genes track: This track shows high-quality manual annotations in the ENCODE regions generated by the GENCODE project. Version 3 of the Gencode gene set presents a full merge between HAVANA and ENSEMBL, giving priority to the manually curated HAVANA objects and using ENSEMBL objects where they are different or fall into un-annotated regions.

Initial release of the CSHL Small RNA-seq track: This track depicts NextGen sequencing information for RNAs between the sizes of 20-200 nt isolated from RNA samples from tissues or subcellular compartments from ENCODE cell lines.

Release 3 of the UW DNaseI HS track: This track shows DNaseI sensitivity measured genome-wide in different cell lines using the Digital DNaseI methodology, and DNaseI hypersensitive sites. This release includes 19 new cell lines as well as a new version of NB4 replicate 1.

6 January 2010 - December 2009 ENCODE news

"ENCODE whole-genome data in the UCSC Genome Browser": This paper addresses the history of the ENCODE project, summarizes the datasets available as of September 2009, and outlines methods to access the data. See Nucleic Acids Res. 2010 Jan;38(Database issue):D620-5.

Initial release of the Caltech RNA-seq track: This track contains sequence reads and RPKM transcript abundance measures for sequences that map to either the genome or to known RNA splice sites. The results of four different mapping algorithms are provided, enabling comparison between different mapping algorithms. Results are available for polyA+ and total RNA for the two ENCODE Tier 1 cell lines.

Release 2 of the Broad Histone track: This track displays maps of chromatin state generated using ChIP-seq. Release 2 adds data for the ENCODE Tier 2 cell lines H1-hESC and HepG2, plus NHLF (normal human lung fibroblasts) and HMEC (human mammary epithelial) cells. This expands the track data to 9 cell lines, and 11 antibodies plus an input control.

Release 2 of the CSHL Long RNA-seq track: This track depicts sequencing of long RNAs of more than 200 nucleotides in length. Release 2 adds data from strand-specific assays of total RNA for the two ENCODE Tier 1 cell lines.

Release 2 of the ENCODE Open Chromatin track: This track displays evidence of open chromatin as identified by two complementary methods, DNaseI hypersensitivity and FAIRE, combined with ChIP identification methods. Release 2 adds data from eight additional cell types, expanding the track to 41 experiments in 13 cell lines.

7 November 2009 - October ENCODE News

Sep 2009 data freeze complete: The ENCODE Consortium has just completed data submissions for the fourth production data freeze (Sep 09). The first set of data from this freeze to complete quality review is now available on the UCSC public server, in Release 2 of the ENCODE Transcription Factor Binding Sites from Yale/UC-Davis/Harvard track. Release 2 adds 59 ChIP-seq experiments to this track.

Other October track releases: The Affymetrix/CSHL Subcellular RNA Localization by Tiling Array track was expanded to include 4 additional experiments.

encodeproject.org: By request of the ENCODE Consortium, the domain encodeproject.org has been registered by the ENCODE Data Coordination Center, and is redirected to the ENCODE portal at UCSC.

New grants funded: NHGRI has funded 5 new ENCODE grants, as part of the American Recovery and Reinvestment Act. The new grants include expansion of ENCODE to the mouse genome and proteogenomics.

Job openings at UCSC: The UCSC Genome Browser and ENCODE projects are currently accepting applications for Software Developer and Biological Database Testing/User Support Technician positions. We are looking for talented individuals who would like to use their skills in computer science, biology, and bioinformatics on fast-paced projects featuring the work of top genomics scientists worldwide.

24 September 2009 - ENCODE data releases since July 1

During this period a total of 10 new ENCODE tracks were released to the UCSC public server. Functional elements and region characterization in these tracks include:

For track names and file access, see the Release Log and Downloads links listed in the left menu bar.

We would like to thank the contributing ENCODE labs and the DCC team at UCSC for their efforts completing these tracks.

1 July 2009 - ENCODE data releases for the period April - June 2009

The following ENCODE tracks were released to the ENCODE DCC/UCSC public server during this period:

The Release Log listed in the left menu bar shows all released ENCODE tracks.

We would like to thank the HudsonAlpha, Transcriptome, Open Chromatin, Broad, and SUNY ENCODE groups and the DCC team at UCSC for their efforts completing these tracks.

13 March 2009 - ENCODE Data Release: Transcription Factor Binding Sites from Yale/UC-Davis/Harvard

We are pleased to announce the release to the ENCODE DCC/UCSC public server of the ENCODE Transcription Factor Binding Sites by ChIP-seq from Yale/UC-Davis/Harvard (Yale TFBS, in the Regulation group). This track shows probable binding sites of 12 transcription factors and RNA polymerase II in 7 cell types, as determined by chromatin immunoprecipitation followed by high-throughput sequencing. The Genome Browser displays discrete Peaks of enrichment and Signal graphs of enrichment density for these experiments. The sequence reads, quality scores, and sequence alignment coordinates from these experiments are available for download.

We would like to acknowledge the efforts of the Yale/UC-Davis/Harvard ENCODE group and the work of the UCSC data wrangler for this group, Tim Dreszer, for completing this track. We also thank the entire UCSC ENCODE team and the UCSC Quality Assurance group for contributing to this first ENCODE track release.

27 Feb 2009 - ENCODE February 2009 data freeze

The February 2009 ENCODE data freeze supplements the data contributed for the November 2008 freeze, and will be used together with the earlier freeze for the initial analysis effort of the ENCODE Consortium. Data from this freeze is being incorporated into tracks created from the first data freeze, and is being reviewed by the UCSC Quality Assurance team.

9 Dec. 2008 - First ENCODE whole-genome data freeze completed

The ENCODE Consortium has just completed the first freeze (November 2008) of whole-genome experimental data produced for the ENCODE production phase. Data submitted to the DCC for this freeze include:

  • transcription factor binding sites
  • histone modifications
  • DNaseI hypersensitive sites
  • DNA methylation
  • transcription maps and tags, localized to subcellular compartments
  • GENCODE gene annotations

Experiments during this freeze focused on the ENCODE Tier1 cell lines -- K562 leukemia, and GM12878 lymphoblastoid (which is also a 1000 Genomes project sample designated for in-depth analysis of genetic variation). The freeze also includes data from some ENCODE Tier2 and Tier3 cell lines (see Cell Types). The majority of these experiments were assayed by high-throughput sequencing (ChIP-seq, DNase-seq, and RNA-seq).

The UCSC quality team is currently reviewing these data. When the review is complete, the browser tracks and associated downloads will be released to the UCSC public Genome Browser.

Thanks to the many labs who contributed data for the initial phase of this project. We'd also like to acknowledge the UCSC ENCODE team for data wrangling during the freeze, and for the development and maintenance of the ENCODE automated data submission pipeline and associated tools: Kate Rosenbloom, Tim Dreszer, Larry Meyer, Michael Pheasant, Ting Wang, Galt Barber, and Andy Pohl.

4 Oct. 2007 - ENCODE Genome Browser Released for hg18 Assembly

The ENCODE browser for UCSC human genome assembly hg18 (NCBI Build 36) is now available. You can access the browser directly at http://genome.ucsc.edu/ENCODE/encode.hg18.html or by clicking the ENCODE link on the Genome Browser home page, then selecting the Regions (hg18) item in the sidebar menu on the ENCODE portal page.

The hg18 ENCODE browser includes 540 data tables in 59 browser tracks that were migrated from the hg17 browser. The hg17 data coordinates were converted to hg18 coordinates using the UCSC liftOver process.

To improve the accessibility of the data, related ENCODE tracks have been gathered into new configuration groupings ("super-tracks") that can be displayed or hidden using a single visibility control. We have also reduced the number of track groups and have modified some of the group names for clarity.

The following table summarizes the data currently present in the hg18 ENCODE browser:

Group                           Super-tracks  Tracks  Tables
Regions and Genes                          2      12      73
Chromatin Immunoprecipitation              8      28     349
Chromatin Structure                        2       8      51

Note that the Variation and Comparative Genomics data were not lifted during this migration; instead, they will be replaced by new data. The first ENCODE MSA alignment for hg18 (TBA) is currently in progress on the UCSC development server.

During the migration, ENCODE tracks with whole-genome data were moved into the standard browser track groups. These include the GIS PET and UCSD/LI TAF1 tracks. Future submissions of whole-genome ENCODE data will be loaded directly into the standard track groups.

We have expanded the ENCODE downloads site to include original data for all "wiggle" datasets. These data files now have filename extensions indicating the wiggle input format (fixed step, variable step, or bedGraph).
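For readers unfamiliar with how the wiggle input formats relate to each other, the following minimal sketch converts fixedStep wiggle lines into bedGraph-style records. It assumes the standard conventions for these formats (1-based start in the fixedStep declaration, 0-based half-open intervals in bedGraph, span defaulting to 1). The function name and sample data are hypothetical and are not taken from the ENCODE downloads.

```python
def fixed_step_to_bedgraph(lines):
    """Convert fixedStep wiggle lines to (chrom, start0, end0, value) tuples.

    Assumes wiggle positions are 1-based and bedGraph intervals are
    0-based, half-open. Track, browser, and comment lines are skipped.
    """
    records = []
    chrom = None
    pos = step = span = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith("fixedStep"):
            # Declaration line, e.g. "fixedStep chrom=chr7 start=101 step=10 span=5"
            fields = dict(item.split("=", 1) for item in line.split()[1:])
            chrom = fields["chrom"]
            pos = int(fields["start"])
            step = int(fields["step"])
            span = int(fields.get("span", 1))
            continue
        if chrom is None:
            raise ValueError("data line found before a fixedStep declaration")
        start0 = pos - 1  # shift from 1-based to 0-based coordinates
        records.append((chrom, start0, start0 + span, float(line)))
        pos += step
    return records


# Hypothetical sample data.
example = [
    "fixedStep chrom=chr7 start=101 step=10 span=5",
    "1.5",
    "2.0",
    "0.25",
]
rows = fixed_step_to_bedgraph(example)
for chrom, start, end, value in rows:
    print(f"{chrom}\t{start}\t{end}\t{value}")
```

The fixedStep form is the most compact because positions are implied by the declaration line, while bedGraph spells out every interval explicitly; variableStep sits in between, listing a position with each value.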

You can find a description of the migration project and full details of the tables, tracks, and super-tracks available at the UCSC ENCODE portal on the UCSC genomeWiki.

The UCSC team members who contributed to this effort were: Andy Pohl (data conversion and database loading), Ting Wang and Donna Karolchik (super-track documentation), Bob Kuhn (portal updates), Brooke Rhead, Kayla Smith, and Ann Zweig (quality assurance), and Kate Rosenbloom (super-track development, project management).

13 Jun. 2007 - ENCODE Findings Published in Nature and Genome Research

The findings of the ENCODE project have been released to the public today, the culmination of a four-year effort to catalog the biologically functional elements in 1 percent of the human genome. The publications, which include a group paper in the 14 June 2007 issue of Nature and 28 companion papers in the June 2007 issue of Genome Research, were authored by researchers from academic, governmental, and industry organizations located in 11 countries. The Nature issue includes a pull-out poster featuring a screenshot of the UCSC Genome Browser displaying a broad range of the ENCODE data.

In the press release accompanying the publication rollout, NHGRI Director Francis S. Collins is quoted as saying "This impressive effort has uncovered many exciting surprises and blazed the way for future efforts to explore the functional landscape of the entire human genome. Because of the hard work and keen insights of the ENCODE consortium, the scientific community will need to rethink some long-held views about what genes are and what they do, as well as how the genome's functional elements have evolved. This could have significant implications for efforts to identify the DNA sequences involved in many human diseases."

For more information on the ENCODE project, including the consortium's data release and accessibility policies and a list of NHGRI-funded participants, see the NHGRI ENCODE website.

12 Jun. 2007 - Spring 2007 ENCODE News

Between January and May of 2007, several new or upgraded data tracks were released by UCSC:

  • Gencode March 2007 Genes and Gencode RACEfrags -- Reannotation of 69 loci consisting of 132 transcripts based on RACE, array, and sequencing analyses. New features include the addition of PolyA features, polymorphic gene type, and integration of experimental intron validation.
  • Yale ChIP-chip RFBR -- Analysis of ChIP-chip data to identify regulatory factor binding regions, resulting from over 105 experiments representing 29 transcription factors in 9 cell lines, performed in 7 labs using 3 microarray platforms. Data provided by the ENCODE Transcriptional Regulation analysis group.
  • University of Virginia DNA Replication Origins -- Three new datasets representing two new experimental methods for origin mapping (bubble trapping and nascent strand) in HeLa and GM06990 cells.
  • University of Vienna RNAz -- RNA secondary structure prediction as predicted by RNAz, based on evolutionary conservation and thermodynamic stability.
  • Duke DNaseI Hypersensitivity -- DNase-chip results in IMR90 (fibroblast), K562 (leukemia) and H9 (stem) cells.
  • UCSC EvoFold -- Replaces TBA23 EvoFold.
  • UC Davis ChIP Hits -- Updated cMyc and E2F1.

Thanks to the UCSC staff who worked on these tracks: Rachel Harte and Kate Rosenbloom (development), and Ann Zweig, Archana Thakkapallayil, and Kayla Smith (quality review).

9 Jan. 2007 - Winter 2006 ENCODE News

To improve communication, we have posted instructions for our ENCODE ftp site on the ENCODE Wiki and have set up an email alias for notifying UCSC about your ENCODE data submissions: encode@soe.ucsc.edu. The current UCSC recipients are Kate Rosenbloom, Daryl Thomas, Ting Wang, and Rachel Harte.

During November and December, four new/improved data tracks were released:

  • ChIP-PET from the Genome Institute of Singapore - Genome-wide data for c-Myc in P493 B cells was added as a new subtrack (cMyc P493, encodeGisChipPetMycP493) to the existing GIS ChIP-PET track.
  • DNaseI Hs from Duke University - Existing NHGRI data and new data from the Crawford lab at Duke University (raw and p-value data for the HepG2 cell line) were merged into the Duke/NHGRI DNase track. The newer data is based on DNase-chip technology.
  • STAGE tags from University of Texas - Raw tags data for STAT1 in HeLa cells were added as a new subtrack (UT STAT1 HeLa Tags, encodeUtexStageStat1HelaTags) to the existing UT-Austin Stage track.
  • DNaseI Hs from University of Washington - The existing three UW/Regulome DNaseI Sens tracks were replaced with a single new track (UW/Reg QCP DNaseI Sens) based on quantitative chromatin profiling (QCP) methods in 16 cell types.

Thanks to the UCSC staff who worked on these tracks: Rachel Harte, Kate Rosenbloom, and Hiram Clawson (development), Ann Zweig, Brooke Rhead, and Kayla Smith (quality review).

7 Oct. 2006 - Comparative Genomics Data Release

Twelve tracks of data produced by the ENCODE Multi-Species Sequence Analysis group have been released to the UCSC public server. These tracks contain multiple sequence alignments, conservation, and conserved (constrained) elements produced by four conservation methods (phastCons, binCons, GERP, SCONE) applied to three sequence alignments (TBA, MLAGAN, MAVID), and also an assessment of the agreement among the alignment methods. The alignments were based on genomic sequence in the ENCODE regions of 28 vertebrate species, as defined in the MSA September 2005 sequence freeze.

The following tracks can now be found in the ENCODE Comparative Genomics track group on the public ENCODE browser:

  • MSA Consensus Constrained Elements
  • TBA Alignments, TBA Conservation, TBA Conserved Elements
  • MLAGAN Alignments, MLAGAN Conservation, MLAGAN Conserved Elements
  • MAVID Alignments, Conservation, Conserved Elements
  • MSA Alignment Agreement, MSA Alignment Gaps

Thanks to the following providers of this data:

  • Elliott Margulies (NHGRI) - TBA alignments, binCons and phastCons conservation, Consensus elements, sequence freeze
  • Daryl Thomas (Haussler lab, UCSC) - sequence freeze
  • George Asimenos (Batzoglou lab, Stanford University) - MLAGAN alignments
  • Colin Dewey (while at the Pachter lab, UC Berkeley) - MAVID alignments
  • Greg Cooper (while at the Sidow lab, Stanford University) - GERP conservation
  • Saurabh Asthana (Sunyaev lab, Harvard Genetics) - SCONE conservation
  • Ariel Schwartz (Pachter lab, UC Berkeley) - Alignment agreement

Also, thanks to the UCSC team that produced these tracks in the browser: Kate Rosenbloom (track development), Ann Zweig, Kayla Smith, and Archana Thakkapallayil (quality review).

31 Aug. 2006 - Summer ENCODE data activity

Since mid-June, UCSC has released new ENCODE data from three labs (Sanger Institute, Uppsala University, and University of North Carolina) and has a track in progress for newly submitted data from a fourth lab (NHGRI/Duke University):

  • Sanger ChIP-chip (MOLT4 and PTR8 cells) - Eight new datasets were added to the Sanger Chip/chip track. The new data show sites of H3 histone methylation and acetylation in MOLT4 (lymphoblastic leukemia) and also in the chimpanzee PTR8 cell line, used for comparative analysis. Thanks to Rob Andrews at the Ian Dunham lab for providing these data.
  • Uppsala University Chip/chip Butyrate - This track shows the effects of Na-butyrate treatment of HepG2 (liver carcinoma) cells on histone H3 and H4 acetylation, assayed on Sanger microarrays. Thanks to Adam Ameur at the Claes Wadelius lab for providing these data.
  • University of North Carolina FAIRE (Peaks data update) - This track was updated to include a subtrack of peaks generated by an alternate peak-finding algorithm, ChIPOTle. The FAIRE data were generated from 2091 fibroblast cells hybridized to NimbleGen ENCODE arrays. Thanks to Paul Giresi at the Jason Leib lab for providing these data.
  • NHGRI DNaseI-HS Raw (in progress) - DNase-chip raw and p-value data in GM06990, CD4+, and HeLa S3 cells.

Thanks to our UCSC staff who worked on these tracks: Rachel Harte and Hiram Clawson (development), Ann Zweig and Archana Thakkapallayil (quality review), and Donna Karolchik (documentation).

The ENCODE data status page has been updated to reflect the recent activity.

14 June 2006 - New ENCODE data at UCSC

During the build-up to the analysis paper submissions, UCSC received a flurry of ENCODE data submissions (6 during the month of May). We have recently released three data sets to our public server; the remaining tracks are in progress, as indicated below.

Released data:

  • Sanger ChIP-chip (HFL-1 cells) - New data added to the Sanger ChIP track show the location of modified histones in HFL-1 (embryonic lung fibroblast) cells. Thanks to Rob Andrews and Christopher Koch for providing this data.
  • DLESS (Detection of LinEage Specific Selection) - This track shows elements predicted by the DLESS program to be under lineage-specific selection, based on alignments of 17 mammalian species from NHGRI/PSU TBA ENCODE alignments. DLESS is based on a phylo-HMM with states for neutrally evolving and conserved regions, and for gains and losses on each branch of the tree. Thanks to Adam Siepel of Cornell University, who developed the DLESS program, generated the data, and loaded the annotation track.
  • UW/Regulome Dnase/Array - This track displays DNaseI sensitivity in GM06990 cells, using the DNase/Array methodology. Dnase/Array is a novel method for isolating DNA segments corresponding to specific DNaseI cleavage events on individual nuclear chromatin templates. Thanks to Scott Kuehn at the University of Washington for providing these data.

Thanks to our UCSC staff who worked on these tracks: Rachel Harte (development), Ann Zweig, Kayla Smith, and Archana Thakkapallayil (quality review).

In quality review:

  • UNC FAIRE Peaks (Update, 2091 fibroblasts)

In progress:

  • University of Uppsala Chip/Chip (butyrate treatment in HepG2)
  • UW/Regulome QCP Dnase HS (update and new cell lines -- 16 tissues)
  • PECAN Alignments from EBI

The ENCODE data status page has been updated to reflect the recent activity.

13 January 2006 - Release of October freeze data complete

All datasets submitted for the October 2005 ENCODE freeze have now been released. The tracks with their release dates are:

  • 13 Jan. - Yale ChIP-chip (5 new transcription factors)
  • 3 Jan. - University of Uppsala Chip/chip (data correction)
  • 2 Jan. - Yale RNA (update)
  • 29 Dec. - U Virginia DNA Replication Timing and Predicted Origins
  • 28 Dec. - UT Austin STAGE
  • 21 Dec. - Affy RNA and Transfrags (HeLa update)
  • 22 Nov. - Sanger ChIP-chip (5 histone modifications)
  • 19 Nov. - UC Davis ChIP-chip (Pol2, Myc) in HeLa
  • 18 Nov. - U North Carolina FAIRE (Formaldehyde Assisted Isolation of Regulatory Elements)
  • 1 Nov. - Affy RNA and Transfrags (update)
  • 1 Nov. - UT-Austin Ng ChIP-chip (E2F4, c-Myc factors)
  • 1 Nov. - GIS PET RNA (update)
  • 1 Nov. - Gencode Genes and Introns (update)
  • 1 Nov. - NHGRI DNaseI HS (update and new GM06990)
  • 27 Oct. - UCSD/LI Ng ChIP-chip (update and new TFs)
  • 14 Oct. - GIS ChIP/PET (STAT1)
  • 21 Sep. - Boston University ORChID (update)

Posted on 17 Oct. 2005 - ENCODE hg17 Genome Browser Released

The ENCODE browser for UCSC human genome assembly hg17 (NCBI Build 35) is now available at http://genome.ucsc.edu/ENCODE/encode.hg17.html and from the Regions (hg17) item in the left (blue bar) menu of the ENCODE portal.

All datasets originally submitted in hg17 coordinates for the June data freeze were directly loaded; the remaining data were coordinate-converted using the UCSC liftOver process. A total of 351 data tables were loaded into our database. NOTE: Many of these tracks will be updated with new data from the October ENCODE data freeze.

Many thanks to all the ENCODE consortium members who contributed data for this release. We'd also like to thank UCSC team members Kate Rosenbloom for portal and track development and Jennifer Jackson, Kayla Smith, Ann Zweig, and Bob Kuhn for quality assurance.

Posted on 15 July 2005 - Regulome DnaseI HS, Oxford Recombination, and MSA Data Released

The remaining datasets submitted for the June 2005 ENCODE data freeze have now been released as annotation tracks in the hg16 browser. These tracks and their release dates are:

  • 15 Jul. - TBA23 EvoFold RNA Structure Predictions
  • 14 Jul. - Regulome DNaseI HS and Sens
  • 14 Jul. - Regulome Plate Q/A and Amplicons
  • 14 Jul. - MSA Consensus Elements
  • 14 Jul. - TBA alignments with Conservation
  • 14 Jul. - Stanford MLAGAN Alignments and Conservation
  • 13 Jul. - Oxford Recombination

Posted on 12 July 2005 - EGASP genes, RIKEN CAGE tags, and Stanford RTPCR, ChIP-chip and DNA Methylation data released

Several new ENCODE data sets have been released in the UCSC Genome Browser.

EGASP Full, Partial, and Update: These three gene prediction tracks are from the ENCODE Gene Annotation Assessment Project (EGASP) Prediction Workshop 2005. The EGASP Full track shows 20 sets of gene predictions originally submitted for the workshop, covering all 44 ENCODE regions. The EGASP Partial track shows eight sets of gene predictions that were submitted for the workshop, but do not cover all ENCODE regions. The EGASP Update track shows updated versions of some of the submitted predictions. Thanks to Julien Lagarde at IMIM for providing the EGASP Full and EGASP Partial data sets. Thanks to Tyler Alioto of IMIM (GeneID-U12 and SGP2-U12), Deyou Zheng of Yale (Yale Pseudogenes), Sarah Djebali of Ecole Normale Supérieure (Exogean), Jonathan Allen of TIGR/Univ. Maryland (Jigsaw) and Mario Stanke of the University of Gottingen (Augustus) for providing their EGASP Update gene sets.

RIKEN CAGE Predicted Gene Start Sites: This track shows the numbers of 5' cap analysis gene expression (CAGE) tags that map to the genome at specific locations. Areas in which many tags map to the same region may indicate a significant transcription start site. Thanks to Albin Sandelin at RIKEN and the FANTOM (Functional Annotation of Mouse) Consortium for providing these data.
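As a rough illustration of the idea behind this track, the sketch below counts how many tags share the same 5' position and flags positions with many tags. The tag list, the `min_tags` threshold, and the variable names are hypothetical and are not taken from the RIKEN data or pipeline.

```python
from collections import Counter

# Hypothetical mapped CAGE tags: (chromosome, strand, 5' position).
tags = [
    ("chr1", "+", 1000),
    ("chr1", "+", 1000),
    ("chr1", "+", 1002),
    ("chr1", "-", 5000),
    ("chr1", "+", 1000),
]

# Number of tags mapping to each position.
tag_counts = Counter(tags)

# Arbitrary illustrative threshold; a real analysis would choose this
# from the data and would usually cluster nearby positions as well.
min_tags = 3
candidates = [site for site, n in tag_counts.items() if n >= min_tags]

for site in candidates:
    print(site, tag_counts[site])
```

Here only the position with three tags passes the threshold; the single-tag positions are ignored.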

Stanford RTPCR: This track displays absolute transcript copy numbers for 136 genes and 12 negative control intergenic regions, determined by real-time PCR in HCT116 cells.

Stanford ChIP-chip, Stanford ChIP-chip Smoothed Score: These tracks display raw scores and sliding-window mean scores for regions bound by the Sp1 and Sp3 transcription factors in three cell lines (HCT116, Jurkat, K562), assayed by chromatin immunoprecipitation and hybridization to Nimblegen oligo tiling arrays. The Smoothed Score track is provided solely for visualization purposes -- the raw scores should be used for data analysis.
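To show what a sliding-window mean does to a score profile, here is a minimal sketch. The window size, the centered window, and the edge handling are assumptions chosen for illustration; the announcement does not state the parameters used for the Smoothed Score track.

```python
from typing import List


def sliding_window_mean(scores: List[float], window: int) -> List[float]:
    """Centered sliding-window mean; windows are truncated at the edges."""
    if window < 1:
        raise ValueError("window must be >= 1")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        # Average only the probes that actually fall inside the window.
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Hypothetical raw scores for five consecutive probes.
raw = [0.0, 0.0, 3.0, 0.0, 0.0]
smooth = sliding_window_mean(raw, 3)
print(smooth)
```

The single high probe is spread across its neighbors, which is why a smoothed profile is easier to read by eye but should not replace the raw scores in an analysis.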

Stanford Methylation Digest: This track displays experimentally determined regions of unmethylated CpGs in eight cell lines (BE2C, CRL1690, HCT116, HT1080, HepG2, JEG3, SNU182, and U87MG). Genomic DNA was digested with a cocktail of six methyl-sensitive restriction enzymes, size-selected, amplified, labeled, and hybridized to Nimblegen oligo tiling arrays.

Thanks to Nathan Trinklein for providing the Stanford datasets.

We'd like to acknowledge the work of the UCSC Genome Bioinformatics team members who produced these tracks: Kate Rosenbloom, Angie Hinrichs, and Hiram Clawson (development), Ann Zweig, Bob Kuhn, Galt Barber, Rachel Harte, and Ali Sultan-Qurraie (quality review), and Donna Karolchik (documentation).

Posted on 6 July 2005 - NHGRI DIPs, HapMap SNPs, Sanger Assoc annotation tracks released

The first datasets in the ENCODE Variation group are now available in the UCSC browser.

NHGRI Deletion/Insertion Polymorphisms: All human trace data from NCBI's trace archive were aligned to the genome and processed using the programs ssahaSNP and ssahaDIP to detect deletion and insertion polymorphisms. Thanks to Jim Mullikin at NHGRI for performing the analyses and providing these data.

HapMap Allele Frequencies: This track shows allele frequencies for the four HapMap populations in the ten ENCODE regions that have been resequenced for variation (manually selected regions m010, m013, and m014 and randomly selected regions r112, r113, r123, r131, r213, r232, and r321). These data were obtained from HapMap public release #16c.1. Thanks to the International HapMap Project for making this information available.

Sanger Genotype-Expression Association: This track displays associations among gene expression data from the 60 unrelated Centre d'Etude du Polymorphisme Humain (CEPH) individuals of the International HapMap Project with SNPs genotyped by HapMap, in eight ENCODE regions (m010, m013, m014 and r123, r131, r213, r232, and r321). The CEPH population is composed of Utah residents with ancestry from northern and western Europe. The expression data were generated with the Illumina platform at the Wellcome Trust Sanger Institute. Thanks to Manolis Dermitzakis at the Sanger Institute for providing these data.

We'd also like to acknowledge the UCSC ENCODE team members who worked on these tracks: Heather Trumbower, Daryl Thomas, and Angie Hinrichs (development), Galt Barber and Ali Sultan-Qurraie (quality assurance), and Donna Karolchik (documentation).

Posted on 23 June 2005 - Yale and Affymetrix ChIP-chip and transcription data release; New ENCODE track groupings

To aid ENCODE analysis and reduce visual clutter, we have split the Genome Browser ENCODE track group into six new groups:

  • ENCODE Regions and Genes
  • ENCODE Transcript Levels
  • ENCODE Chromatin Immunoprecipitation
  • ENCODE Chromosome, Chromatin and DNA Structure
  • ENCODE Variation
  • ENCODE Comparative Genomics
All of these track groups are visible on the UCSC test browser. The last two groups, Variation and Comparative Genomics, do not yet have published tracks on the public server and therefore are not visible on that server.

We have also released a set of new Yale data and an extensive update of Affymetrix data. The track controls for these datasets can be found in the track groups ENCODE Transcript Levels and ENCODE Chromatin Immunoprecipitation.

Yale ChIP-chip and RNA: Three tracks of ChIP-chip data from Yale, evaluating microarray platforms, have been released: Yale ChIP pVal, Yale ChIP Sig, and Yale ChIP Sites. These tracks show results of ChIP experiments using STAT1 antibody in HeLa cells on four different microarrays -- three custom maskless photolithographic oligo arrays, designed at different resolutions, and the PCR amplicon array developed by the Ren lab at the Ludwig Institute/UCSD.

Two tracks of RNA transcript data from Yale have been released: Yale RNA and Yale TAR. These tracks show transcriptionally active regions and transcribed fragments for three cell types (neutrophil, placenta, and NB4 variously treated for differentiation).

Thanks to Joel Rozowsky at Yale for providing this data. Additional Yale ChIP-chip data is currently under review by our quality assurance group.

Affymetrix ChIP-chip and RNA: The Affymetrix ChIP-chip dataset now contains experimental results for ten factors (Brg1, CEBPe, CTCF, H3K27me3, H4Kac4, P300, PU1, Pol2, RARA, and SIRT1) in HeLa cells at four timepoints after retinoic acid treatment, plus TFIIB for the final timepoint only. The data is displayed in eight tracks: Affy PVal 0h, 2h, 8h, 32h and Affy Sites 0h, 2h, 8h, 32h. We acknowledge that this track grouping is a bit awkward and are working on composite track enhancements to provide more flexibility.

The Affy RNA tracks show RNA abundance and transfrags in retinoic acid-stimulated HL-60 cells at four timepoints, and in GM06990 and HeLa cells: Affy RNA Signal and Affy Transfrags.

Thanks to Stefan Bekiranov and Srinka Ghosh at Affymetrix for providing these data.

Thanks also to the UCSC ENCODE team members who developed and reviewed these tracks and to Rachel Harte in the UCSC browser group for her assistance with track review.

Posted on 15 June 2005 - Six data sets released in Genome Browser

Six more ENCODE annotation tracks have been added to the Genome Browser this week:

NHGRI DNaseI-Hypersensitive Sites (update): The NHGRI DNaseI-HS track has been updated with new data. The track now includes DNaseI-hypersensitive sites in CD4+ T-cells before and after activation by anti-CD3 and anti-CD28 antibodies. Thanks to Greg Crawford at the Collins lab (NHGRI) for providing these data.

Genome Institute of Singapore PET of PolyA+ RNA: The GIS PET RNA track displays starts and ends of mRNA transcripts determined by paired-end ditag sequencing in two cell lines, MCF7 and HCT116 treated with 5-fluorouracil. A total of 584,624 PETs were generated for MCF7 and 280,340 were generated for HCT116. More than 80% of the PETs in each group were mapped to the genome. Thanks to Atif Shahab, Yijun Ruan, the GIS, and the Bioinformatics Institute of Singapore for providing these data.

Gencode Gene Annotations and Intron Validation: The Gencode Genes track displays high-quality manual annotations in the ENCODE regions generated by the GENCODE project. A companion track, Gencode Introns, shows experimental gene structure validations for these annotations. Thanks to the HAVANA team at the Wellcome Trust Sanger Institute; France Denoeud, Julien Lagarde, and Roderic Guigo at the IMIM; and Alexandre Reymond at the University of Geneva for providing the annotations and experimental confirmation, as well as working with UCSC to develop the track display.

Boston University Hydroxyl Radical Cleavage: The BU ORChID track displays predicted hydroxyl radical cleavage intensity on naked DNA for each nucleotide in the ENCODE regions. The prediction algorithm draws data from a database of 150 experimentally-determined cleavage patterns. Thanks to Jay Greenbaum at the Tullius lab for providing these data.

UC Davis ChIP-chip: The UCD Ng E2F1 track shows ChIP analysis of HeLa cells using E2F1 antibody as assayed on Nimblegen arrays. The E2F1 factor is associated with cell cycle control, transcriptional regulation, and apoptosis. Thanks to the Farnham lab and Kyle Munn, Todd Richmond and Roland Green of Nimblegen for providing these data.

Ludwig Institute/UCSD ChIP-chip: Three tracks of ChIP-chip data using Nimblegen arrays have been released: LI Ng TAF1 IMR90, LI Ng Validation, and LI Ng gIF ChIP. The TAF1 track shows genome-wide TAF1 binding sites as determined by ChIP in IMR90 (fibroblast) cells assayed on Nimblegen high-density oligo arrays. The peaks from the genome scan experiments were verified using four antibodies associated with transcription start (Pol2, H3ac, H3K4me2, and TAF1) on condensed arrays covering the putative TAF1 binding sites. The gIF track shows the results of ChIP-chip experiments using gamma interferon-treated HeLa cells, with Pol2 and H3K4me3 antibodies.

Two additional tracks, LI ChIP Various and LI gIF ChIP, display ChIP-chip data assayed on Ren lab ENCODE PCR tiling arrays for five antibodies (Pol2, TAF1, H3K4me2, SUZ12, H3K27me3) and four cell lines (HeLa, THP1, IMR90, HCT116), as well as gamma interferon experiments using seven antibodies (Pol2, TAF1, H3K4me2, H3K4me3, H3ac, H4ac, and STAT1) in HeLa cells. Thanks to Chunxu Qu and Bing Ren at the Ren lab for providing these data and assisting with track display issues.

We'd also like to acknowledge the UCSC team members who developed these tracks: Kate Rosenbloom, Hiram Clawson, and Angie Hinrichs for track development; Bob Kuhn, Ali Sultan-Qurraie and Galt Barber for quality assurance; and Donna Karolchik and Jim Kent for documentation.

Posted on 9 June 2005 - Sanger Histone Modification ChIP-chip track updated

The Sanger Institute has submitted ChIP-chip data for additional antibodies and cell lines, which we have incorporated into the existing Sanger ChIP browser track. The expanded track now contains data for five antibodies (H3K4me1, H3K4me2, H3K4me3, H3ac, H4ac) and two cell lines (GM06990, K562 (leukemia)).

Thanks to Rob Andrews and Chris Koch at the Dunham lab for providing these data. UCSC team members who developed this track include Hiram Clawson (track development), Ali Sultan-Qurraie (quality assurance), and Donna Karolchik and Jim Kent (documentation).

Posted on 6 June 2005 - Changes to ENCODE track labels in Genome Browser

We have modified the Genome Browser labels for the existing ENCODE data tracks to trim overly-long labels that were truncated in the display and to facilitate cross-track analysis. The new label format shows the submitter and the experiment, followed by the cell line (in tracks where the data includes only one cell line).

Posted on 6 June 2005 - Stanford Promoters and UVa DNA Replication tracks released

We are pleased to announce the first UCSC Genome Browser tracks released for the June 2005 ENCODE data freeze: Stanford Promoters and UVa DNA Replication Temporal Profiling.

Stanford has provided an update of their promoter activity data based on transient transfection luciferase reporter assays of 643 putative promoter fragments in the ENCODE regions. The update includes two additional cell lines and activity averaged across all cell lines. The data tables now contain additional experimental detail to facilitate analysis. This track, containing 17 subtracks (16 cell lines and the average), is labeled "Stanf. Promoter". Thanks to Sara Hartman at the Myers lab for providing these data.

The Dutta lab at the University of Virginia (UVa) has completed the second biological replicate of their temporal profiling of HeLa cell replication products and has provided a dataset containing merged data from the two replicates. The track, containing five subtracks representing two-hour intervals, is labeled "UVa DNA Rep". Thanks to Christopher Taylor at the Dutta lab for providing these data.

We'd also like to acknowledge the UCSC team members who worked on these annotation tracks: Angie Hinrichs (track development), Galt Barber and Ali Sultan-Qurraie (QA), and Jim Kent and Donna Karolchik (track documentation).

Posted on 24 May 2005 - New MSA sequence data freeze available

A new ENCODE MSA sequence data freeze is available on the UCSC downloads server. The latest freeze contains sequences from 23 vertebrates provided by NISC, Baylor, the Broad Institute (2X) and the whole genome shotgun (WGS) assemblies. The data may be downloaded as individual data files or a directory tarball. Aligners are encouraged to upload alignments and related data (such as conservation scores and elements) to the UCSC ENCODE ftp site as soon as possible and then notify Kate Rosenbloom. Other data (conservation, trees, etc.) will be generated based on this dataset.

The following is a summary of data updates from the previous release:

  • The human assembly version remains at hg16 (Jul. 2003).
  • The mouse assembly has been updated from mm5 (May 2004) to mm6 (Mar. 2005).
  • The multiple rat sequences have been replaced by a single sequence: rn3 (Jun. 2003).
  • The cow sequences have been updated using an assembly of BAC-based sequences provided by Baylor College of Medicine.
  • Fugu (fr1), macaque (rheMac1), opossum (monDom1), Tetraodon (tetNig1), Xenopus (xenTro1), and zebrafish (danRer2) have been added. The macaque sequence was obtained from a Baylor College of Medicine assembly that has not yet been officially released.
  • A new NISC species, rfbat (Rhinolophus ferrumequinum), is now available.
  • Platypus data from regions where NISC has not yet generated data were provided by Jim Mullikin from a preliminary assembly of Washington University WGS reads.
  • This freeze includes low-redundancy sequence data from tenrec, elephant, armadillo, and rabbit. Only one set of sequences is provided per species/target combination; where available, NISC data is provided instead of the 2X assemblies. These data are not yet accessioned at NCBI, but were made available by the Broad Institute (rabbit, elephant, armadillo) and Jim Mullikin (tenrec).
  • Orthology predictions for Fugu were made only by MAVID/Mercator; predictions for all other assemblies supported by the UCSC Genome Browser represent a union with UCSC predictions as well. Because no additional post-processing was done on the Fugu predictions, they contain a few very small contigs.

Thanks to the many people, particularly Elliott Margulies and Daryl Thomas, who made this release possible.

Posted on 24 May 2005 - ChIP-PET/GIS annotation track (Genome Institute of Singapore) released

The ENCODE Genome Browser now features the ChIP-PET/GIS annotation track, which shows paired-end ditag (PET) sequences derived from 65,572 individual p53 ChIP fragments of 5-fluorouracil (5FU) stimulated HCT116 (colon) cells. Only PETs with a single specific mapping to the genome are included in this track.

Thanks to Atif Shahab, Chia-Lin Wei, and Yijun Ruan at the Genome Institute of Singapore for providing the p53 ChIP-PET library and sequence data. The data were mapped and analyzed by scientists from the Genome Institute of Singapore, the Bioinformatics Institute, Singapore, and Boston University. For more information about this annotation, see the ChIP-PET/GIS track description page.

Posted on 23 May 2005 - Boston University First Exon annotation track released

The First Exon/BU annotation track, contributed by the ZLAB at Boston University, is now available in the UCSC Genome Browser. This track displays expression levels of computationally identified first exons and a constitutive exon of 20 genes in the ENCODE regions.

For each gene, all alternative first exons were identified based on manual selection of predictions from the PromoSer program. The expression levels of exons were then quantified using rcPCR in ten normal human tissues.

Thanks to Ulas Karaoz and the Zhiping Weng lab at Boston University for providing these data. For more information about this annotation, as well as a complete list of the individuals who contributed to this track, see the First Exon/BU track description page.

Posted on 7 May 2005 - ENCODE status page now available

A simple summary page has been added to the UCSC ENCODE portal to show the status of datasets submitted to UCSC by ENCODE contributors. The page may be found at http://genome.ucsc.edu/ENCODE/trackStatus.html and can be accessed via links on the ENCODE home page and the ENCODE data submission page. The status page will be updated approximately once a week.

To review the latest ENCODE data releases, see the release log.

Posted on 6 May 2005 - UCSD/Ludwig Institute histone modification ChIp/Chip data released

Two additional sets of ChIp/Chip data from UCSD/Ludwig Institute are now available:

    ChIp/LI AcH3     
    ChIp/LI MeH3K4   
These tracks display locations of acetylated H3 and dimethylated K4H3 binding in IMR90 (lung fibroblast) cells.

To consolidate viewing in the browser, the previously released eight datasets from UCSD/LI have been reformatted as two composite tracks (one track per antibody) with each track containing four subtracks (one per cell line). These tracks are:

    ChIp/LI Pol2
    ChIp/LI TAF1
To facilitate data analysis, the data were also reloaded in a format that allows extraction of the original data values via the UCSC table browser.

Thanks to Chunxu Qu and the Ren lab for providing these data.

Posted on 17 Feb. 2005 - ENCODE downloads released

The ENCODE download area has been reorganized and updated on our public download server. The downloads access page is now:


and the annotations are now located in the assembly-specific download directory, currently:


Any web pages referencing the previous UCSC ENCODE downloads will need to be updated. Please contact us if you have any difficulties.

Posted on 2 Feb. 2005 - Affymetrix ChIp/Chip and PolyA RNA data released

A second set of ENCODE ChIp/Chip data is now available on the July 2003 human genome assembly:

    ChIp/Affy Pol2 Pval
    ChIp/Affy Pol2 Sites
    RNA/Affy Signal
    RNA/Affy Sites
These tracks show RNA Polymerase II precipitation and RNA abundance in retinoic acid-stimulated HL-60 cells at 0, 2, 8, and 32 hours, as measured by Affymetrix tiling arrays in the non-repetitive ENCODE regions. The Pval and Signal tracks show values for each tiled probe; the Sites tracks show contiguous regions of enrichment.

A new composite track display was developed to concisely display multiple data sets of similar types, a common feature of ENCODE data. Each of these new tracks contains 4 subtracks, one for each time interval. The subtracks share a single description page and set of visibility controls. Checkboxes on the track configuration page allow selected subtracks to be hidden in the display.

These data were generated and analyzed by Tom Gingeras' group at Affymetrix and Kevin Struhl's group at Harvard Medical School. We would like to thank Stefan Bekiranov at Affymetrix for submitting the data and working closely with us to clarify the experimental methods and verification descriptions.

Posted on 5 Nov. 2004 - First ENCODE data tracks in the UCSC Browser

The first datasets submitted for the ENCODE project are now publicly available:

  • ChIP-chip and transcription (Ludwig Institute/UCSD)
  • Temporal profiling of DNA replication (University of Virginia)
  • Promoter activity (Stanford)
  • DNaseI hypersensitive sites (NHGRI)
These tracks are visible in the ENCODE track group of the July 2003 (hg16) human genome assembly. We would like to thank the labs of Bing Ren (LI/UCSD), Anindya Dutta (UVA), Rick Myers (Stanford), and Francis Collins (NHGRI) for contributing the initial ENCODE data sets.

Posted on 26 Oct. 2004 - Sequence Freeze For Multiple Alignments

We are pleased to release the first "official" sequence data freeze for the ENCODE multiple sequence alignment projects. The data formats are described in the README file, and the sequences and supporting information are collected in the data directory.

The species included in this freeze are as follows:

        _SPECIES_       _SOURCE_
        Human           hg16
        Chimpanzee      panTro1
        Dog             canFam1
        Rat             rn3
        RatB            BCM
        Mouse           mm5
        Chicken         galGal2
        Galago          NISC
        Baboon          NISC
        Marmoset        NISC
        Armadillo       NISC
        Platypus        NISC
You will notice that we have worked hard to include a number of species for which genome-wide sequence data was already available. The process by which these orthologous regions were identified is still an area of active research development, the details of which will be presented at the upcoming ENCODE meeting at CSHL.

Please note that we have also included a *second* rat sequence (ratB) in this freeze. RatB represents an initiative to standardize the quality level of sequences in the ENCODE regions for species with genome-wide sequence data. The data were made available just before our freeze date, so we decided to include both versions of the rat sequence for now. Eventually these sequences will likely be rolled into future genome assemblies.

Remember, this is a work in progress: not all targets have sequence from all species, and some species/target combinations may not be complete yet. Progress on the NISC-generated sequences can be found at the NISC ENCODE Project: Comparative Sequencing.

Many people have worked very hard to make these data available. Special thanks to Daryl Thomas, Kate Rosenbloom and the entire UCSC Team; Greg Schuler and the NCBI team; Colin Dewey and Lior Pachter; Pam Thomas and the NISC team; and David Wheeler and the BCM team.

Please send any comments to the ENCODE MSA Mailing List.

Elliott H. Margulies, Ph.D. Genome Technology Branch, NHGRI

Posted on 24 Jun. 2004 - ENCODE Project Portal Released

We are proud to announce the release of features in the UCSC Genome Browser that are tailored to the ENCODE project community, including this home page to consolidate these resources.

The initial resources include sequences for the current human assemblies (hg16, hg15, hg13, and hg12), sequence of the comparative species from NISC, tools for coordinate conversion between human assemblies, format descriptions for data submission, and contact information for help with submitting annotation data and analyses.

Bulk downloads of the sequence and annotations may be obtained from the ENCODE Project Downloads page. The sequences available here are repeat-masked versions of the GenBank records.