Subsequently, the translation initiation sites (TIS) are determined. Here we distinguish two methods, depending on the data type.
HARR/LTM method: To enable translation initiation site (TIS) identification, RIBO-seq is performed in the presence of specific antibiotics: harringtonine (HARR) or lactimidomycin (LTM). In contrast to cycloheximide (CHX) or emetine (EM), these antibiotics halt ribosomes specifically at the translation initiation site, so ribosomes accumulate there. In summary, RIBO-seq is performed twice: once to capture translating ribosomes (CHX, EM) and a second time to allow TIS identification (HARR/LTM). The assembly reconstructs all possible sORFs from the TISs, both spliced (using the Ensembl database) and unspliced. The difference in ribosome accumulation between the HARR/LTM and the CHX/EM/snap-freeze data acts as a criterion for TIS calling.
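The accumulation criterion can be illustrated with a minimal Python sketch. The text does not give the exact formula or cut-offs, so everything here is an assumption made for illustration: the function names (`accumulation_difference`, `call_tis`), the normalisation (reads at a position divided by all reads in the same library for that transcript), and the thresholds `min_diff` and `min_reads`.

```python
# Hypothetical sketch of TIS calling from the HARR/LTM vs. CHX/EM difference.
# The normalisation and thresholds are assumptions, not the pipeline's values.

def accumulation_difference(pos, harr_counts, chx_counts):
    """Fraction of HARR/LTM reads at pos minus fraction of CHX/EM reads at pos."""
    harr_total = sum(harr_counts.values())
    chx_total = sum(chx_counts.values())
    r_harr = harr_counts.get(pos, 0) / harr_total if harr_total else 0.0
    r_chx = chx_counts.get(pos, 0) / chx_total if chx_total else 0.0
    return r_harr - r_chx


def call_tis(harr_counts, chx_counts, min_diff=0.05, min_reads=10):
    """Return sorted positions whose HARR/LTM enrichment passes both cut-offs."""
    return sorted(
        pos
        for pos, reads in harr_counts.items()
        if reads >= min_reads
        and accumulation_difference(pos, harr_counts, chx_counts) >= min_diff
    )


# Toy per-position read counts for one transcript (invented for the example).
harr = {50: 40, 53: 5, 80: 5}
chx = {50: 10, 53: 30, 80: 60}
candidate_tis = call_tis(harr, chx)
print(candidate_tis)
```

In this toy example only position 50 shows a clear surplus of ribosomes in the HARR/LTM library relative to the CHX/EM library, so it is the only candidate returned.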
In-frame coverage method: In the absence of HARR/LTM data, all possible sORFs are reconstructed genome-wide. This implies that sORFs are reconstructed both with and without transcript information, with splicing information fetched from Ensembl. Next, the in-frame coverage (the percentage of in-frame positions covered by ribosomes) is computed for all these sORFs. For sORFs with an in-frame coverage of at least 10% and a ribosome count of at least 10, the TIS is stored. In contrast to the HARR/LTM method, the pipeline does not consider all near-cognate start sites, only the most abundant start codons (ATG, CTG, GTG, TTG).
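The in-frame coverage filter can be sketched as follows. The thresholds (10% coverage, 10 ribosomes) and the four start codons come from the description above; the function names, the data layout (a list of sORF nucleotide positions in transcript order and a dictionary of ribosome counts per position), and the choice to sum the ribosome count over all sORF positions are assumptions made for illustration.

```python
# Hypothetical sketch of the in-frame coverage filter.
# Thresholds and start codons follow the text; names and data layout are assumed.

START_CODONS = {"ATG", "CTG", "GTG", "TTG"}
MIN_COVERAGE_PCT = 10.0
MIN_RIBOSOME_COUNT = 10


def in_frame_coverage(orf_positions, ribo_counts):
    """Percentage of in-frame positions (every third nt from the start) with >= 1 read."""
    in_frame = orf_positions[::3]
    if not in_frame:
        return 0.0
    covered = sum(1 for pos in in_frame if ribo_counts.get(pos, 0) > 0)
    return 100.0 * covered / len(in_frame)


def keep_tis(start_codon, orf_positions, ribo_counts):
    """True if the sORF passes the start codon, coverage and read-count filters."""
    if start_codon.upper() not in START_CODONS:
        return False
    # Assumption: the ribosome count is the total over all positions of the sORF.
    total = sum(ribo_counts.get(pos, 0) for pos in orf_positions)
    coverage = in_frame_coverage(orf_positions, ribo_counts)
    return coverage >= MIN_COVERAGE_PCT and total >= MIN_RIBOSOME_COUNT


# Toy 30-nt sORF; a spliced sORF would simply list the exon positions in order.
orf = list(range(100, 130))
counts = {100: 6, 103: 3, 109: 2, 110: 1}
coverage_pct = in_frame_coverage(orf, counts)
stored = keep_tis("CTG", orf, counts)
print(coverage_pct, stored)
```

Here three of the ten in-frame positions carry reads (30% coverage) and the sORF holds 12 reads in total, so its TIS would be stored. The read at position 110 is out of frame and therefore adds to the count but not to the coverage.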