Extracts CDS regions from a GTF annotation file or data frame using genomic coordinates and retrieves corresponding DNA sequences from a BSgenome reference.
Arguments
- input
A character string (GTF file path) or a data frame containing CDS annotations. The example below also passes a GRanges object.
- genome
A BSgenome object for the relevant genome. Defaults to human (hg38).
- save_fasta
A logical indicating whether to save sequences to a FASTA file. Defaults to FALSE.
- output_file
A character string specifying the FASTA output path. If NULL, uses "CDS.fa".
- verbose
A logical indicating whether to print progress messages. Defaults to TRUE.
Value
A data frame containing CDS annotations with corresponding sequences. If save_fasta = TRUE, also writes a FASTA file.
Details
This function filters the CDS entries from the input GTF, extracts their sequences from the reference genome, and optionally saves them in FASTA format. The resulting sequences are useful for downstream analyses such as protein translation.
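The steps described above can be sketched with standard Bioconductor calls. This is a minimal illustration, not the package's actual implementation: the table gtf_df, its coordinates, the transcript IDs, and the FASTA header format are all hypothetical and chosen only for demonstration. It assumes GenomicRanges, Biostrings, and BSgenome.Hsapiens.UCSC.hg38 are installed.

```r
suppressPackageStartupMessages({
  library(GenomicRanges)
  library(Biostrings)
  library(BSgenome.Hsapiens.UCSC.hg38)
})

# Hypothetical GTF-like table; coordinates are illustrative, not real CDS records.
gtf_df <- data.frame(
  seqnames      = c("chr1", "chr1", "chr1"),
  start         = c(1000001, 1000101, 1000301),
  end           = c(1000090, 1000190, 1000360),
  strand        = c("+", "+", "-"),
  type          = c("exon", "CDS", "CDS"),
  transcript_id = c("tx_demo_1", "tx_demo_1", "tx_demo_2"),
  stringsAsFactors = FALSE
)

# Step 1: keep only the CDS entries.
cds_df <- gtf_df[gtf_df$type == "CDS", ]

# Step 2: convert to GRanges and fetch the sequences from the reference genome.
# getSeq() returns the reverse complement for ranges on the minus strand.
cds_gr  <- makeGRangesFromDataFrame(cds_df, keep.extra.columns = TRUE)
cds_seq <- getSeq(BSgenome.Hsapiens.UCSC.hg38, cds_gr)

# Step 3: attach the sequences to the annotation table.
cds_df$sequence <- as.character(cds_seq)

# Step 4: optionally write a FASTA file (header format is an assumption).
save_fasta <- FALSE
if (save_fasta) {
  names(cds_seq) <- paste(cds_df$transcript_id, cds_df$start, cds_df$end, sep = "_")
  writeXStringSet(cds_seq, filepath = "CDS.fa")
}
```

Each row of cds_df then carries one CDS segment. For protein translation, the segments of a transcript would first need to be concatenated in transcript order before being passed to Biostrings::translate().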
Examples
file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
gtf_v1 <- load_file(file_v1)
# Human CDS extraction
suppressPackageStartupMessages(library(BSgenome.Hsapiens.UCSC.hg38))
suppressPackageStartupMessages(library(GenomicRanges))
gtf_granges <- GRanges(gtf_v1)
cds_seqs <- extract_cds_sequences(gtf_granges, BSgenome.Hsapiens.UCSC.hg38, save_fasta = FALSE)
#> Using provided GRanges object...
#> Warning: No CDS features found in the GTF data.