Reads a gzipped VCF file of structural variants, applies the selected filters, optionally annotates fusion breakpoints using a GTF file, and optionally generates a Circos plot.

get_fusions_from_vcf(
  vcf_file,
  gtf_file = NULL,
  genome = "hg38",
  title = "",
  thresh = 20,
  allowed = paste0("chr", c(1:22)),
  highlight = NULL,
  plot = FALSE,
  filter = c("pass", "fully_spanned", "has_strand"),
  annotate = FALSE,
  verbose = FALSE,
  tss_upstream = 2000,
  tss_downstream = 200
)

Arguments

vcf_file

Character. File path to the gzipped VCF file.

gtf_file

Character or NULL. Optional file path to a GTF file for breakpoint annotation. Required when annotate is TRUE (default: NULL).

genome

Character. Genome assembly version (default: "hg38").

title

Character. Title for the plot when plot = TRUE (default: an empty string).

thresh

Numeric. Threshold for read depth filtering (default: 20).

allowed

Character vector. Allowed chromosomes (default: paste0("chr", 1:22)).
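
The default covers the autosomes only. The following is a minimal sketch of building a wider set to pass as allowed, assuming the VCF uses UCSC-style "chr" prefixes; the variable names autosomes and with_sex_chroms are illustrative and not part of the function.

```r
# Default value of `allowed`: the 22 autosomes with a "chr" prefix.
autosomes <- paste0("chr", 1:22)

# Illustrative extension that also keeps the sex chromosomes.
# Assumes the VCF names them "chrX" and "chrY".
with_sex_chroms <- c(autosomes, "chrX", "chrY")

head(autosomes, 3)
length(with_sex_chroms)
```

A vector such as with_sex_chroms could then be supplied as allowed = with_sex_chroms.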

highlight

Optional. Indices or logical vector of fusion events to highlight in the plot (default: NULL).

plot

Logical. If TRUE, a Circos plot is generated (default: FALSE).

filter

Character vector. Filtering criteria; supported values include "pass", "fully_spanned", "protein_coding", and "has_strand" (default: c("pass", "fully_spanned", "has_strand")).
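
Because filter takes free-form strings, a typo could go unnoticed depending on how the function handles unknown values. The sketch below checks a filter vector against the four values listed above before calling the function. The helper check_filter and the vector supported_filters are hypothetical and not part of the package, and the function may accept values beyond these four.

```r
# The four values named in this help page (the function may accept more).
supported_filters <- c("pass", "fully_spanned", "protein_coding", "has_strand")

# Hypothetical helper: stop on any value not in the documented set.
check_filter <- function(filter) {
  unknown <- setdiff(filter, supported_filters)
  if (length(unknown) > 0) {
    stop("Unsupported filter value(s): ", paste(unknown, collapse = ", "))
  }
  invisible(filter)
}

# The documented default, and the default plus the one value it omits.
default_filter <- c("pass", "fully_spanned", "has_strand")
check_filter(default_filter)
check_filter(c(default_filter, "protein_coding"))
```

An exact comparison with setdiff() is used here rather than match.arg(several.ok = TRUE), since the latter allows partial matches and silently drops unmatched values when at least one value matches.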

annotate

Logical. If TRUE, fusion breakpoints will be annotated using gene coordinates (default: FALSE).

verbose

Logical. If TRUE, progress messages are printed (default: FALSE).

tss_upstream

Numeric. Number of base pairs upstream of the transcription start site (default: 2000).

tss_downstream

Numeric. Number of base pairs downstream of the transcription start site (default: 200).
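
This page does not state how tss_upstream and tss_downstream are applied. One common reading is a strand-aware window around each transcription start site, sketched below. The helper tss_window, its boundary convention (both ends inclusive), and the example coordinates are assumptions for illustration, not the package's implementation.

```r
# Hypothetical helper: strand-aware window around a TSS.
# On the "+" strand, upstream means smaller coordinates;
# on the "-" strand, upstream means larger coordinates.
tss_window <- function(tss, strand, upstream = 2000, downstream = 200) {
  stopifnot(all(strand %in% c("+", "-")))
  plus <- strand == "+"
  data.frame(
    start = ifelse(plus, tss - upstream, tss - downstream),
    end   = ifelse(plus, tss + downstream, tss + upstream)
  )
}

# Illustrative coordinates only: the same TSS position on each strand.
tss_window(tss = c(10000, 10000), strand = c("+", "-"))
```

Under these assumptions the "+" strand window is 8000 to 10200 and the "-" strand window is 9800 to 12000.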

Value

A data frame containing fusion events with their breakpoint coordinates and (if requested) annotation details.
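
Examples

A minimal sketch of a call with annotation enabled. The file paths are placeholders, and the call is guarded so that the snippet does nothing when the function or the files are unavailable.

```r
# Placeholder paths; replace with real files.
vcf_path <- "sample.sv.vcf.gz"
gtf_path <- "annotation.gtf"

# Arguments collected in a list so they can be checked before the call.
call_args <- list(
  vcf_file = vcf_path,
  gtf_file = gtf_path,
  annotate = TRUE,
  filter   = c("pass", "fully_spanned", "has_strand"),
  plot     = FALSE
)

# A GTF file is required when annotate is TRUE.
stopifnot(!isTRUE(call_args$annotate) || !is.null(call_args$gtf_file))

# Run only if the function is loaded and both input files exist.
if (exists("get_fusions_from_vcf") &&
    file.exists(vcf_path) && file.exists(gtf_path)) {
  fusions <- do.call(get_fusions_from_vcf, call_args)
  str(fusions)
}
```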