BIOINFORMATICS APPLICATIONS NOTE Vol. 21 no. 8 2005, pages 1699–1700
HESAS: HERVs Expression and Structure Analysis System
Tae-Hyung Kim1, Yeo-Jin Jeon1, Woo-Yeon Kim1 and Heui-Soo Kim1,2,∗
1PBBRC, Interdisciplinary Research Program of Bioinformatics and 2Division of Biological Sciences, College of Natural Sciences, Pusan National University, Busan 609–735, Korea
Received on October 21, 2004; revised on November 11, 2004; accepted on November 26, 2004
Advance Access publication December 10, 2004
ABSTRACT
Summary: The HESAS (HERVs Expression and Structure Analysis System) database was developed to help understand the human endogenous retroviruses (HERVs) that affect the expression of human functional genes. The database content is generated by exon-based expressed sequence tag clustering and by reconstructing partial HERV structures that result from various mutations accumulated during primate evolution. Expression types were classified according to the presence of splicing, transcriptional start and polyadenylation signal sites. The database currently contains HERV information on 26 981 human genes with an exon–intron structure. HERV elements were inserted into 17 317 of these genes and were linked to expression in 898 genes.
Availability:
Contact: [email protected]
∗To whom correspondence should be addressed.
© The Author 2004. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

The human genome contains various endogenous retroviruses (HERVs) that represent the footprints of ancient germ-cell infections (Lower et al., 1996). These HERVs and other long terminal repeat (LTR)-like elements account for ∼8% of the human genome. Most HERVs are no longer able to code for functional proteins, owing to multiple stop codons, insertions, deletions and frameshifts. However, LTR promoters of some HERV families increase their expression in human placenta and in several cancer cell lines (Mi et al., 2000). Recently, these retroelements have been proposed to play an evolutionary role in enhancing the coding capacity and regulatory versatility of the genome without compromising its integrity (Sorek et al., 2002). A dedicated HERV database (HERVd) supports searches of individual HERV families (Paces et al., 2002). Over the past decade, a considerable number of studies have examined the capacity of these elements to modify the expression of neighboring genes (Jordan et al., 2003). These reports focus mainly on strong LTR-derived promoters in specific tissues, including data on alternative splicing and primary polyadenylation signals (Mager et al., 1999). However, there has been no systematic analysis of the various transcripts created by HERV elements within the genomic sequences of human genes. Here, we characterized HERV positions, LTR-truncated constructs and HERV ORFs within human genes. In addition, we present a large number of HERVs linked to genes that are expressed in normal and pathological tissues, identified using an electronic mapping method.

ANALYSIS PIPELINE AND DATABASE CONSTRUCTION
Intron-containing genes, including 5 kb of upstream and downstream flanking sequence, were obtained for HESAS from GoldenPath, based on the gene information of NCBI Build 34.3. The intron/exon structures of the various HERV-related transcripts in genes were obtained by aligning RefSeq mRNAs, as counterparts of the human genes (Pruitt et al., 2003). HERV elements in genes were identified using RepeatMasker with embedded MaskerAid (Bedell et al., 2000) and 352 ERV consensus sequences from Repbase Update (Jurka, 2000). The locus of each HERV on the genome, its position in the consensus sequence, its orientation, subfamily name and Smith–Waterman score were derived from the RepeatMasker output. The expressed sequence tag (EST) sequences were taken from NCBI's dbEST database, which contains 8209 cDNA libraries (Boguski et al., 1993). EST annotations for tissue and pathology types were obtained from the eVOC ontology, a set of controlled vocabularies for unifying gene expression data (Kelso et al., 2003).
EST clustering
To detect HERV-related expression patterns from a given set of ESTs under strict criteria, we collected the intron/exon structures obtained by mapping RefSeq mRNAs with sim4 (Florea et al., 1998). We required that at least one of the two boundaries of an exon overlap an aligned EST with >97% identity, in order to eliminate ESTs contaminated by genomic sequence. To obtain expressed transcripts involving HERVs only, non-ERV repeat sequences such as LINE, SINE, MIR and simple repeat elements within all genes were masked by RepeatMasker before EST clustering.
Identification of HERV genomic structure
Consensus domain libraries for scanning HERVs were newly constructed with HMMER by comparing highly conserved residues in the potential coding regions (gag, pro, RT, RNaseH, IN and env) of internal HERV sequences with the traditional viral genes in Pfam (Bateman et al., 2004). Both the consensus domain library and the original Repbase library were then used together to identify HERV elements across whole gene regions. We found that 17 317 of the 26 981 human genes carry HERV insertions within or in the neighborhood of their gene regions. The genomic structure of HERV
elements was defined as LTR–HERV–LTR, including the boundaries of the coding region. These elements were divided into four types (complete, 5′ truncate, 3′ truncate and 5′–3′ truncate) according to LTR truncation, together with solitary LTRs; all were detected by processing the RepeatMasker output to reconstruct the non-fragmented HERV structure that existed before mutations accumulated (Kim et al., 2004).
Analysis of the types of HERV expression within genes
The genomic loci of the HERVs and the exon information on the splicing structure of mRNAs/ESTs were calculated from their positions within the genes. HERV expression was interpreted as an overlap between a HERV and an mRNA/EST within a gene. Each overlap was classified as full or partial according to the inclusion relationship. The types of HERV expression were classified according to the position of the HERVs within the genes and the splicing structure of the mRNAs/ESTs.

ACCESS AND VISUALIZATION
HESAS can be searched in two main modes. First, users can query the types of expressed transcripts with HERVs and the genomic structure of HERVs by selecting a chromosome number and clicking on 'HERV expression'. Second, users can search for HERV expression by selecting specific genes of interest and an EST category of tissue or pathology. Results are listed in tabular format, showing the evidence and information for expressed HERV events within genes.
Using a Java applet, we also developed a viewer for analyzing the types of HERV expression. The viewer shows a transcript of each expressed HERV, represented by the exon/intron splicing structure of the mRNAs/ESTs together with the merged HERV elements as evidence of expression. Moreover, each expressed HERV event is presented visually, highlighting the various transcript structures, alternative promoters, splicing and polyadenylation signals. The viewer can zoom in and out of any region of a gene to show a detailed image. The 2 bp flanking region of each exon boundary is compared with the canonical splice site (AG-GT), and then it is represented by

REFERENCES
Bateman,A., Coin,L., Durbin,R., Finn,R.D., Hollich,V., Griffiths-Jones,S., Khanna,A., Marshall,M., Moxon,S., Sonnhammer,E.L. et al. (2004) The Pfam protein families database. Nucleic Acids Res., 32, D138–D141.
Bedell,J.A., Korf,I. and Gish,W. (2000) MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics, 16, 1040–1041.
Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) dbEST—database for "expressed sequence tags". Nat. Genet., 4, 332–333.
Florea,L., Hartzell,G., Zhang,Z., Rubin,G.M. and Miller,W. (1998) A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res., 8,
Jordan,I.K., Rogozin,I.B., Glazko,G.V. and Koonin,E.V. (2003) Origin of a substantial fraction of human regulatory sequences from transposable elements. Trends Genet., 19, 68–72.
Jurka,J. (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet., 16, 418–420.
Kelso,J., Visagie,J., Theiler,G., Christoffels,A., Bardien,S., Smedley,D., Otgaar,D., Greyling,G., Jongeneel,C.V., McCarthy,M.I. et al. (2003) eVOC: a controlled vocabulary for unifying gene expression data. Genome Res., 13, 1222–1230.
Kim,T.H., Jeon,Y.J., Yi,J.M., Kim,D.S., Huh,J.W., Hur,C.G. and Kim,H.S. (2004) The distribution and expression of HERV families in the human genome. Mol. Cells, 18,
Lower,R., Lower,J. and Kurth,R. (1996) The viruses in all of us: characteristics and biological significance of human endogenous retrovirus sequences. Proc. Natl Acad. Sci. USA, 93, 5177–5184.
Mager,D.L., Hunter,D.G., Schertzer,M. and Freeman,J.D. (1999) Endogenous retroviruses provide the primary polyadenylation signal for two new human genes (HHLA2 and HHLA3). Genomics, 59, 255–263.
Mi,S., Lee,X., Li,X., Veldman,G.M., Finnerty,H., Racie,L., LaVallie,E., Tang,X.Y., Edouard,P., Howes,S. et al. (2000) Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature, 403, 785–789.
Paces,J., Pavlicek,A. and Paces,V. (2002) HERVd: database of human endogenous retroviruses. Nucleic Acids Res., 30, 205–206.
Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2003) NCBI reference sequence project: update and current status. Nucleic Acids Res., 31, 34–37.
Sorek,R., Ast,G. and Graur,D. (2002) Alu-containing exons are alternatively spliced. Genome Res., 12, 1060–1067.
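Two exon-boundary checks described in this note, the EST clustering criterion (at least one of the two boundaries of an exon must overlap an aligned EST with >97% identity) and the viewer's comparison of the 2 bp exon flanks with the canonical AG-GT splice site, can be illustrated with a short Python sketch. It is not the HESAS implementation. The `Block` type, the function names, 0-based half-open forward-strand coordinates, and the reading of "overlap a boundary" as "an aligned EST block covers the first or last base of the exon" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Block:
    """Hypothetical half-open interval [start, end), 0-based, forward strand."""
    start: int
    end: int

    def contains(self, pos: int) -> bool:
        return self.start <= pos < self.end


def est_supports_exon(exon: Block, est_blocks: Iterable[Block],
                      identity: float, min_identity: float = 97.0) -> bool:
    """Accept an EST only if its identity (percent) exceeds min_identity and
    at least one aligned block covers either boundary base of the exon.
    Assumption: 'overlapping a boundary' means covering the first or last base."""
    if identity <= min_identity:
        return False
    first, last = exon.start, exon.end - 1  # the two boundary bases of the exon
    return any(b.contains(first) or b.contains(last) for b in est_blocks)


def has_canonical_flanks(genome: str, exon: Block) -> bool:
    """Compare the 2 bp upstream of an internal exon with 'AG' (acceptor)
    and the 2 bp downstream with 'GT' (donor)."""
    if exon.start < 2 or exon.end + 2 > len(genome):
        return False  # not enough flanking sequence to compare
    upstream = genome[exon.start - 2:exon.start].upper()
    downstream = genome[exon.end:exon.end + 2].upper()
    return upstream == "AG" and downstream == "GT"


if __name__ == "__main__":
    genome = "CCAGATGCCCGTAA"  # toy sequence: AG | ATGCCC | GT
    exon = Block(4, 10)
    print(has_canonical_flanks(genome, exon))                       # True
    print(est_supports_exon(exon, [Block(0, 6)], identity=98.5))    # True
    print(est_supports_exon(exon, [Block(0, 6)], identity=96.0))    # False
```

The identity test is strict (`>`), matching the ">97%" wording; whether HESAS applied the threshold per alignment or per block is not stated, so the sketch uses one identity value per EST.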
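The HERV structural typing (complete, 5′ truncate, 3′ truncate, 5′–3′ truncate and solitary LTR) and the full/partial overlap test used to classify HERV expression might be sketched in Python as follows. This is a hedged illustration, not the authors' code. The names `HervStructure`, `classify_structure` and `overlap_state` are hypothetical. Two rules are assumptions: a missing 5′ (or 3′) LTR beside a detected internal region is read as a 5′ (or 3′) truncation, and "full" overlap means one interval lies entirely inside the other.

```python
from enum import Enum
from typing import Optional, Tuple

# Hypothetical representation: half-open [start, end), 0-based coordinates.
Interval = Tuple[int, int]


class HervStructure(Enum):
    COMPLETE = "complete"
    TRUNC_5 = "5' truncate"
    TRUNC_3 = "3' truncate"
    TRUNC_5_3 = "5'-3' truncate"
    SOLITARY_LTR = "solitary LTR"


def classify_structure(ltr5: bool, internal: bool, ltr3: bool) -> Optional[HervStructure]:
    """Assign a structural type from which parts of LTR-HERV-LTR were detected.

    Assumption: an LTR found without any internal region is a solitary LTR.
    """
    if internal:
        if ltr5 and ltr3:
            return HervStructure.COMPLETE
        if ltr3:
            return HervStructure.TRUNC_5   # only the 5' LTR is missing
        if ltr5:
            return HervStructure.TRUNC_3   # only the 3' LTR is missing
        return HervStructure.TRUNC_5_3     # both LTRs are missing
    if ltr5 or ltr3:
        return HervStructure.SOLITARY_LTR
    return None  # nothing HERV-derived was detected


def overlap_state(herv: Interval, feature: Interval) -> str:
    """Return 'full' if one interval lies inside the other, 'partial' if they
    merely intersect, and 'none' if they do not overlap at all."""
    (hs, he), (fs, fe) = herv, feature
    if he <= fs or fe <= hs:
        return "none"
    if (fs <= hs and he <= fe) or (hs <= fs and fe <= he):
        return "full"
    return "partial"


if __name__ == "__main__":
    print(classify_structure(ltr5=False, internal=True, ltr3=True).value)  # 5' truncate
    print(overlap_state((100, 200), (150, 300)))                           # partial
```

Treating containment in either direction as "full" is a design choice: the note only says overlaps were classified by their inclusion relationship, so a stricter reading (HERV inside the exon only) would change one condition.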