Compares the number of annotated genomic elements (genes, transcripts, exons, or introns) between two specified GENCODE releases. The function accepts input data as either file paths (in GTF/GFF format) or pre-loaded data frames. It computes the absolute difference (delta), the percentage change relative to a chosen baseline, and the direction of change (increase, decrease, or no change).
Arguments
- input1
A character string specifying the file path to a GTF/GFF file from the first GENCODE release, or a data frame containing the annotation data.
- input2
A character string specifying the file path to a GTF/GFF file from the second GENCODE release, or a data frame containing the annotation data.
- type
A character string indicating the type of genomic element to compare. Valid options are "gene", "transcript", "exon", or "intron".
- gene_type
An optional character string specifying a particular gene biotype to filter comparisons (e.g., "protein_coding", "lncRNA"). If NULL (default), all gene types are included.
- baseline
A character string defining the baseline for calculating percentage change. Options include:
"count1": Uses the count from the first input (release) as the baseline.
"count2": Uses the count from the second input (release) as the baseline (default).
"average": Uses the average of the counts from both inputs as the baseline.
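To make the effect of the baseline choice concrete, here is a minimal base R sketch. The helper `percent_change` is hypothetical and not part of GencoDymo2. It assumes the percentage is the absolute delta divided by the baseline count, times 100, which is consistent with the example output on this page but is not taken from the package source. The counts 2 and 3 are made-up values.

```r
# Hypothetical helper for illustration only; not a GencoDymo2 function.
# Assumption: percentage = 100 * |count2 - count1| / baseline count.
percent_change <- function(count1, count2,
                           baseline = c("count2", "count1", "average")) {
  baseline <- match.arg(baseline)  # defaults to "count2", as documented
  delta <- abs(count2 - count1)
  denom <- switch(baseline,
    count1  = count1,
    count2  = count2,
    average = (count1 + count2) / 2
  )
  if (denom == 0) return(NA_real_)  # avoid dividing by an empty baseline
  100 * delta / denom
}

# Made-up counts: 2 elements in the first release, 3 in the second.
percent_change(2, 3, "count1")   # 50
percent_change(2, 3, "count2")   # 33.33333
percent_change(2, 3, "average")  # 40
```

The same delta of 1 yields three different percentages, so the baseline should be reported alongside any percentage quoted from a comparison.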
Value
A list with the following elements:
delta: The absolute difference in the number of annotations.
percentage: The percentage change relative to the selected baseline.
direction: A string indicating the direction of the change ("increase", "decrease", or "no change").
Details
This function processes two GENCODE releases to compare annotation counts for a specified genomic element type. Key steps include:
Input Handling: If inputs are file paths, they are loaded into data frames using the load_file function. Data frames are used directly.
Element Filtering: If gene_type is specified, annotations are filtered to include only that gene biotype.
Count Calculation: The number of elements (genes, transcripts, etc.) of the specified type is counted in each release.
Delta and Percentage: The absolute difference (delta) and percentage change are calculated based on the chosen baseline.
Direction Determination: The direction of change is determined by comparing counts between the two releases.
The function provides both numerical results and a formatted console output highlighting key metrics.
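The steps described in this section can be sketched in base R on toy data. This is an illustrative re-implementation under stated assumptions, not the GencoDymo2 source: the function name `compare_counts_sketch`, the column names `type` and `gene_type`, and the toy rows are all hypothetical, and file loading and console output are left out.

```r
# Hypothetical sketch of the documented steps; not the package's own code.
# Assumption: each data frame has character columns `type` and `gene_type`.
compare_counts_sketch <- function(df1, df2, type, gene_type = NULL,
                                  baseline = c("count2", "count1", "average")) {
  baseline <- match.arg(baseline)

  # Element filtering and count calculation for one release
  count_elements <- function(df) {
    keep <- df$type == type
    if (!is.null(gene_type)) keep <- keep & df$gene_type == gene_type
    sum(keep, na.rm = TRUE)
  }
  count1 <- count_elements(df1)
  count2 <- count_elements(df2)

  # Delta and percentage relative to the chosen baseline
  delta <- abs(count2 - count1)
  denom <- switch(baseline,
    count1  = count1,
    count2  = count2,
    average = (count1 + count2) / 2
  )
  percentage <- if (denom == 0) NA_real_ else 100 * delta / denom

  # Direction determination
  direction <- if (count2 > count1) {
    "increase"
  } else if (count2 < count1) {
    "decrease"
  } else {
    "no change"
  }

  list(delta = delta, percentage = percentage, direction = direction)
}

# Toy annotation tables with made-up rows
rel1 <- data.frame(
  type = c("gene", "gene", "exon"),
  gene_type = c("protein_coding", "lncRNA", "protein_coding"),
  stringsAsFactors = FALSE
)
rel2 <- data.frame(
  type = c("gene", "gene", "gene", "exon"),
  gene_type = c("protein_coding", "lncRNA", "lncRNA", "protein_coding"),
  stringsAsFactors = FALSE
)

res_all <- compare_counts_sketch(rel1, rel2, type = "gene")
res_lnc <- compare_counts_sketch(rel1, rel2, type = "gene", gene_type = "lncRNA")
res_pc  <- compare_counts_sketch(rel1, rel2, type = "gene", gene_type = "protein_coding")
```

In this toy data, `res_all` counts 2 versus 3 genes and reports an increase, while `res_pc` counts 1 versus 1 and reports "no change". Filtering by biotype can therefore change the direction as well as the size of the reported difference.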
Examples
file_v1 <- system.file("extdata", "gencode.v1.example.gtf.gz", package = "GencoDymo2")
file_v2 <- system.file("extdata", "gencode.v2.example.gtf.gz", package = "GencoDymo2")
# Example 1: Using data frames with the provided example GTF files
gtf_v1 <- load_file(file_v1)
gtf_v2 <- load_file(file_v2)
comparison <- compare_release(gtf_v1, gtf_v2, type = "gene")
#> Delta: 1
#> Percentage: 33.333%
#> Change Direction: increase