Improved function that annotates genomic coordinates from a GTF file, optionally adding canonical-transcript exon numbers.

**Key speed-ups**

* Canonical transcript and exon information is now pre-computed once in `preload_gtf()` and cached.
* Robust checks ensure a non-NULL `canonical_exons_by_gene` object; if unavailable, the cache is rebuilt.

annotate_genomic_coordinates(
  coordinates,
  genome,
  gtffile,
  tss_upstream = 2000,
  tss_downstream = 200,
  cache_gtf = TRUE,
  include_exon_info = TRUE,
  verbose = FALSE
)

Arguments

coordinates

Data frame with columns `chr` and `pos`.
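As an illustration, a minimal input could be built as shown here. The chromosome names and positions are invented for the example, and the `chr`-prefixed naming is an assumption; in practice the names should match those used in the GTF file.

```r
# Hypothetical input: one row per genomic position to annotate.
coordinates <- data.frame(
  chr = c("chr1", "chr7", "chrX"),
  pos = c(1014500L, 55019100L, 73820700L),
  stringsAsFactors = FALSE
)

# Basic sanity checks on the expected columns before annotation.
stopifnot(
  all(c("chr", "pos") %in% names(coordinates)),
  is.character(coordinates$chr),
  is.numeric(coordinates$pos)
)
```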

genome

Character; genome build name (e.g., `"hg38"`, `"hg19"`, `"mm10"`).

gtffile

Character; path to the GTF annotation file.

tss_upstream

Numeric; bases upstream of TSS to define as promoter (default: 2000).

tss_downstream

Numeric; bases downstream of TSS to include in promoter (default: 200).
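The way `tss_upstream` and `tss_downstream` could combine into a promoter window is sketched here. This is only an illustration, not the package's code: `promoter_window()` and `in_promoter()` are hypothetical helpers, and both the strand-aware handling and the inclusive boundaries are assumptions.

```r
# Hypothetical helper (not part of the package): strand-aware promoter window.
# "Upstream" is taken relative to the direction of transcription.
promoter_window <- function(tss, strand, tss_upstream = 2000, tss_downstream = 200) {
  stopifnot(length(tss) == length(strand), all(strand %in% c("+", "-")))
  plus <- strand == "+"
  data.frame(
    start = ifelse(plus, tss - tss_upstream, tss - tss_downstream),
    end   = ifelse(plus, tss + tss_downstream, tss + tss_upstream)
  )
}

# A position counts as promoter if it lies inside the window (inclusive).
in_promoter <- function(pos, win) pos >= win$start & pos <= win$end

win <- promoter_window(tss = c(10000, 50000), strand = c("+", "-"))
hits <- in_promoter(c(8500, 51500), win)
```

With the defaults, the plus-strand TSS at 10000 yields the window 8000 to 10200, while the minus-strand TSS at 50000 yields 49800 to 52000, so both test positions fall inside their windows.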

cache_gtf

Logical; whether to cache the GTF data between calls (default: TRUE).
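One common way to implement this kind of caching in R is an environment keyed by the file path. The sketch below is an assumption about the general pattern, not the package's actual implementation: `.gtf_cache`, `get_gtf_cached()`, and the `loader` argument are hypothetical names, with `loader` standing in for whatever parses the GTF. It also rebuilds the entry when `canonical_exons_by_gene` is missing, mirroring the check described above.

```r
# Hypothetical cache: an environment keyed by the GTF path.
.gtf_cache <- new.env(parent = emptyenv())

# `loader` stands in for the real GTF parser (an assumption for this sketch).
get_gtf_cached <- function(gtffile, loader, cache_gtf = TRUE) {
  cached <- if (cache_gtf) .gtf_cache[[gtffile]] else NULL
  # Reuse the entry only if the pre-computed exon table is present.
  if (!is.null(cached) && !is.null(cached$canonical_exons_by_gene)) {
    return(cached)
  }
  parsed <- loader(gtffile)
  if (cache_gtf) assign(gtffile, parsed, envir = .gtf_cache)
  parsed
}

# Demonstration with a stub loader that counts how often it is invoked.
n_loads <- 0
stub_loader <- function(path) {
  n_loads <<- n_loads + 1
  list(path = path, canonical_exons_by_gene = list())
}
first  <- get_gtf_cached("example.gtf", stub_loader)
second <- get_gtf_cached("example.gtf", stub_loader)
```

In this demonstration the stub loader runs only once; the second call is served from the cache. Passing `cache_gtf = FALSE` would bypass the cache and invoke the loader on every call.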

include_exon_info

Logical; whether to compute canonical-transcript exon numbers (default: TRUE).

verbose

Logical; if `TRUE`, prints progress messages (default: FALSE).

Value

A data frame with annotation results for each coordinate. If `include_exon_info = TRUE`, extra columns `within_exon`, `fiveprime_exon`, and `threeprime_exon` are included.
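The exact contents of the exon columns are not spelled out here, so the following is only a sketch of the general idea behind transcript-order exon numbering: exons are numbered from the 5' end, which reverses the genomic order on the minus strand. `number_exons()` and `exon_at()` are hypothetical helpers, and the exon coordinates are invented.

```r
# Hypothetical helper: number exons 5'->3' along the transcript.
number_exons <- function(exons, strand) {
  ord <- order(exons$start, decreasing = (strand == "-"))
  exons <- exons[ord, , drop = FALSE]
  exons$exon_number <- seq_len(nrow(exons))
  exons
}

# Hypothetical helper: exon number containing `pos`, or NA if none does.
exon_at <- function(pos, exons) {
  hit <- which(pos >= exons$start & pos <= exons$end)
  if (length(hit) == 0) NA_integer_ else exons$exon_number[hit[1]]
}

exons <- data.frame(start = c(100, 400, 900), end = c(200, 500, 1000))
plus  <- number_exons(exons, "+")
minus <- number_exons(exons, "-")

exon_on_plus  <- exon_at(950, plus)   # last exon in genomic order
exon_on_minus <- exon_at(950, minus)  # first exon on the minus strand
intronic      <- exon_at(300, minus)  # falls between exons
```

Here the same position, 950, lies in exon 3 on the plus strand and exon 1 on the minus strand, and position 300 returns `NA` because it falls in an intron.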