This function identifies potential cryptic splice sites by comparing the donor and acceptor motifs of each intron with canonical splice site motifs. Here a splice site is treated as cryptic when its donor motif is not in the canonical donor set (GT by default) or its acceptor motif is not in the canonical acceptor set (AG by default). Sites that differ from the canonical patterns are flagged, which makes the function useful for studying aberrant splicing events.
Arguments
- input
A data frame containing intron coordinates, ideally generated by extract_introns() and assign_splice_sites(). Must contain the columns: seqnames, intron_start, intron_end, strand, transcript_id, intron_number, gene_name, gene_id, donor_ss and acceptor_ss.
- genome
A BSgenome object representing the genome sequence. This is used to extract the sequence for each intron to identify splice sites.
- canonical_donor
A character vector of canonical donor splice site motifs. Default is c("GT").
- canonical_acceptor
A character vector of canonical acceptor splice site motifs. Default is c("AG").
- verbose
Logical; if TRUE, progress messages are printed. Default is TRUE.
Value
The input data frame with two additional logical columns:
- cryptic_donor: TRUE if the donor site is non-canonical.
- cryptic_acceptor: TRUE if the acceptor site is non-canonical.
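Because the two flags are ordinary logical columns, the result can be filtered and tabulated with base R. The sketch below uses a small made-up data frame as a stand-in for the returned object; the values and the transcript names are assumptions for illustration only, not output from the function.

```r
# Toy stand-in for the returned data frame; all values are made up.
res <- data.frame(
  transcript_id    = c("tx1", "tx1", "tx2"),
  intron_number    = c(1L, 2L, 1L),
  cryptic_donor    = c(FALSE, TRUE, FALSE),
  cryptic_acceptor = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep introns where at least one splice site is flagged as cryptic.
any_cryptic <- res[res$cryptic_donor | res$cryptic_acceptor, ]

# Cross-tabulate the donor and acceptor flags.
flag_table <- table(donor = res$cryptic_donor, acceptor = res$cryptic_acceptor)
print(any_cryptic)
print(flag_table)
```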
Details
This function performs the following steps:
- It assigns donor and acceptor splice sites to each intron using the assign_splice_sites function.
- It compares the identified donor and acceptor splice sites against the provided canonical motifs (GT for donor and AG for acceptor by default). Splice site sequences that do not match the canonical motifs are flagged as cryptic.
- It returns a data frame with the same intron information plus the additional columns cryptic_donor and cryptic_acceptor, indicating whether each splice site is cryptic.
- If the verbose argument is set to TRUE, it prints progress messages, including the total number of cryptic donor and acceptor sites and their respective percentages.
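The comparison and summary steps described above can be approximated in a few lines of base R. This is a minimal sketch, not the package's implementation: flag_cryptic is a hypothetical helper, the toy motifs are made up, and it assumes donor_ss and acceptor_ss already hold the motif strings for each intron.

```r
# Hypothetical helper (not part of GencoDymo2): a site is flagged when its
# motif is absent from the corresponding canonical set.
flag_cryptic <- function(df,
                         canonical_donor = c("GT"),
                         canonical_acceptor = c("AG")) {
  df$cryptic_donor <- !(df$donor_ss %in% canonical_donor)
  df$cryptic_acceptor <- !(df$acceptor_ss %in% canonical_acceptor)
  df
}

# Made-up motifs for four introns.
toy <- data.frame(
  intron_number = 1:4,
  donor_ss      = c("GT", "GC", "GT", "AT"),
  acceptor_ss   = c("AG", "AG", "AC", "AC"),
  stringsAsFactors = FALSE
)

flagged <- flag_cryptic(toy)

# Count and percentage of cryptic donors, similar in spirit to the verbose summary.
n_donor <- sum(flagged$cryptic_donor)
pct_donor <- 100 * mean(flagged$cryptic_donor)
message(sprintf("Cryptic donors: %d (%.1f%%)", n_donor, pct_donor))

# Widening the canonical donor set changes which sites are flagged.
flagged_gc <- flag_cryptic(toy, canonical_donor = c("GT", "GC"))
```

With the default motifs, the GC and AT donors in the toy data are both flagged; once GC is added to canonical_donor, only the AT donor remains flagged.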
Examples
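if (FALSE) { # \dontrun{
library(GencoDymo2)
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)) {
  # requireNamespace() does not attach the package, so reference the genome object explicitly
  genome <- BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38
  # Load the example annotation shipped with the package
  file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
  gtf_v1 <- load_file(file_v1)
  # Extract introns and attach their donor and acceptor motifs
  introns_df <- extract_introns(gtf_v1)
  introns_ss <- assign_splice_sites(introns_df, genome = genome)
  # Flag introns whose donor or acceptor motif is non-canonical
  cryptic_sites <- find_cryptic_splice_sites(introns_ss, genome)
  head(cryptic_sites)
}
} # }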