The current release contains 6496 putative TISes from 4662 human genes (from four biological replicates of HEK293 cells) and 10492 putative TISes from 5361 mouse genes (from ONLY one sample of MEF cells).
TIS information is inferred using the GTI-Seq technique. It is challenging to map TISes in genes with low expression levels, or when the signal at the annotated TIS is lower than that at alternative TISes of the same transcript. TISdb could also miss genes that are resistant to GTI-Seq. Hence, TISdb is not a comprehensive database of translation initiation sites. It aims to give biologists insights into alternative translation initiation sites.
TIS coordinates are based on the UCSC genome assemblies hg19 and mm10. All coordinates are one-based.
The TIS coordinates in the "ORF Prediction View" are relative to the transcription start site and are also one-based.
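Because both coordinate systems are one-based, positions need shifting before they are used with zero-based tools. The following is a minimal sketch, not part of TISdb: the function names are hypothetical, the start codon is assumed to lie on the forward strand, and the target format is assumed to be a zero-based, half-open interval of the kind used in BED files.

```python
def one_based_to_bed(chrom, pos, length=3):
    """Convert a one-based start position to a zero-based, half-open interval.

    Assumption: `pos` is the leftmost base of a forward-strand feature and
    `length` is its size in bases (3 for a start codon).
    """
    if pos < 1:
        raise ValueError("one-based positions start at 1")
    start = pos - 1       # zero-based start
    end = start + length  # exclusive end
    return chrom, start, end


def one_based_to_index(rel_pos):
    """Convert a one-based transcript-relative position to a Python string index."""
    if rel_pos < 1:
        raise ValueError("one-based positions start at 1")
    return rel_pos - 1
```

For a minus-strand TIS the codon extends toward lower genomic coordinates, so the interval would have to be computed differently; that case is left out of this sketch.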
Based on the annotated start codon of the mRNA transcript, we classify TISes into three categories:
An upstream open reading frame (uORF) is defined on a per-transcript basis. If a short open reading frame starting at a uTIS is predicted within the 5' untranslated region, it is classified as a uORF.
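The uORF rule can be illustrated with a short sketch. This is not the TISdb prediction code: the function names are hypothetical, positions are one-based and relative to the transcript start, the standard stop codons are assumed, and no maximum length is enforced because the threshold for "short" is not stated here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_end(seq, tis_pos):
    """Return the one-based position of the last base of the first
    in-frame stop codon after `tis_pos`, or None if there is none."""
    start = tis_pos - 1  # zero-based index of the first codon base
    for j in range(start + 3, len(seq) - 2, 3):
        if seq[j:j + 3].upper() in STOP_CODONS:
            return j + 3
    return None


def is_uorf(seq, utis_pos, atis_pos):
    """True if the ORF starting at `utis_pos` ends before the annotated
    start codon at `atis_pos`, i.e. lies entirely in the 5' UTR."""
    if utis_pos >= atis_pos:
        return False
    end = find_orf_end(seq, utis_pos)
    return end is not None and end < atis_pos
```

An ORF that begins upstream but runs past the annotated start codon returns False under this sketch, since it is not contained within the 5' untranslated region.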
TIS data from two other studies (Cell 147(4):789-802 and Genome Res. 22(11):2208-18) were incorporated into TISdb as additional support for our data.
For each TIS, we also provide the average phyloP score over a [-3,+4] window flanking the start codon. phyloP scores are based on the 46-way (for hg19) and 60-way (for mm10) vertebrate alignments from the UCSC Genome Browser and measure base-by-base conservation of a genomic region. Generally, a POSITIVE phyloP score means CONSERVATION (slower evolution than expected under neutral drift), while a NEGATIVE value reflects ACCELERATION (faster evolution than expected).
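The window average can be sketched as follows. This is an illustration under stated assumptions, not the TISdb implementation: the first base of the start codon is taken as +1 with no position 0 (the usual Kozak numbering), so [-3,+4] covers seven bases; the TIS is assumed to be on the forward strand; and `scores` is a hypothetical dictionary mapping one-based positions to per-base phyloP values that have already been extracted.

```python
def kozak_window(tis_pos):
    """One-based positions for -3..+4 around a start codon.

    Assumption: the first codon base is +1 and there is no position 0,
    so the window holds 3 upstream bases plus 4 bases from the codon on.
    """
    return list(range(tis_pos - 3, tis_pos + 4))


def mean_phylop(scores, tis_pos):
    """Average the phyloP scores over the window, skipping positions
    that have no score. Returns None if no position is covered."""
    values = [scores[p] for p in kozak_window(tis_pos) if p in scores]
    if not values:
        return None
    return sum(values) / len(values)
```

Skipping uncovered positions is a design choice for this sketch; how alignment gaps are actually handled is not stated here.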