This function creates a genome-wide copy number alteration (CNA) plot from a BigWig depth file and a VCF variant file. It downsamples the coverage data, computes delta values using one of three methods ("fit", "delta", or "loess"), and overlays variant calls as rectangles. A running-median trend line is computed on the weighted delta values.

CNA_plot(
  depth_bigwig_file,
  variant_file,
  txdb,
  method = c("fit", "delta", "loess"),
  gene_delta_threshold = 2,
  downsample = 0.1,
  point_size = 0.01,
  line_size = 0.1,
  line_color = "red",
  colors = NULL,
  max_value = NULL,
  min_value = NULL,
  min_variant_distance = 10000,
  samplename = "",
  chr_filter = NULL,
  trend_window = 50,
  apply_weight = TRUE,
  outside_weight = 0.25,
  inside_weight = 1,
  variant_alpha = 0.1,
  trend_regions = "inside",
  exclude_xy = TRUE,
  return_data = FALSE
)

Arguments

depth_bigwig_file

Character. File path to a BigWig file containing depth/coverage information.

variant_file

Character. File path to a variant call file (VCF) containing structural variant information.

txdb

A transcript database object (e.g., from the GenomicFeatures package) used for gene annotation.

method

Character. One of "fit", "delta", or "loess" determining the method to compute delta values. Only the first element of the provided vector is used.

gene_delta_threshold

Numeric. Threshold applied to the delta values for gene annotation.

downsample

Numeric. Proportion of coverage data to retain (e.g., 0.01 for 1 percent).

point_size

Numeric. Size of the plotted data points.

line_size

Numeric. Size of the plotted trend line.

line_color

Character. Color for the trend line.

colors

Named vector of colors for chromosomes. If NULL, an alternating palette of black/gray is used.

max_value

Numeric. Maximum allowed delta value; values above this are set to max_value. If NULL, no upper cap is applied.

min_value

Numeric. Minimum allowed delta value; values below this are set to min_value. If NULL, no lower cap is applied.

min_variant_distance

Numeric. Minimum distance (in bp) for a variant call to be retained.

samplename

Character. A label for the sample that is appended to the plot title.

chr_filter

Character. If specified, only data from the given chromosome are processed.

trend_window

Integer. The number of consecutive data points over which to compute the running median trend line.

apply_weight

Logical. If TRUE, delta values are multiplied by inside_weight within CNA calls and by outside_weight outside them.

outside_weight

Numeric. Multiplier applied to delta values outside CNA calls.

inside_weight

Numeric. Multiplier applied to delta values inside CNA calls.

variant_alpha

Numeric. Transparency level for variant rectangles.

trend_regions

Character. One of "both", "inside", or "outside" to control where the trend line is plotted. Default is "inside".

exclude_xy

Logical. If TRUE, chromosomes X and Y are excluded from the analysis.

return_data

Logical. If TRUE, returns a list containing the plot, the coverage data, and variant calls.
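
The capping, weighting, and trend arguments can be illustrated on a toy vector. The following is a minimal sketch, not the function's own code: the order of operations (cap, then weight, then smooth) is an assumption, the inside flag stands in for overlap with a CNA call, and stats::runmed() is used as a base-R stand-in for the rollapply() call mentioned in Details.

```r
# Toy delta values; `inside` marks points assumed to fall within a CNA call.
delta  <- c(0.1, -0.2, 3.5, 1.2, 1.0, 1.4, -4.0, 0.3, 0.0, -0.1, 0.2)
inside <- c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE,
            FALSE, FALSE, FALSE, FALSE, FALSE)

max_value      <- 2
min_value      <- -2
outside_weight <- 0.25
inside_weight  <- 1
trend_window   <- 5         # odd here because stats::runmed() expects an odd window
trend_regions  <- "inside"

# Cap values to [min_value, max_value] (assumed to happen before weighting).
capped <- pmin(pmax(delta, min_value), max_value)

# Apply the inside/outside multipliers.
weighted <- capped * ifelse(inside, inside_weight, outside_weight)

# Running median over `trend_window` consecutive points; end values are kept as-is.
trend <- as.numeric(stats::runmed(weighted, k = trend_window, endrule = "keep"))

# Show the trend only where trend_regions asks for it.
show <- switch(trend_regions,
  inside  = inside,
  outside = !inside,
  both    = rep(TRUE, length(inside))
)
trend_shown <- ifelse(show, trend, NA_real_)
```

With trend_regions = "inside", trend_shown is NA everywhere except within the flagged run, which is how a plotted line would break at call boundaries.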

Value

A ggplot2 object representing the CNA plot or, when return_data is TRUE, a list containing the plot, the coverage data, and the variant calls.
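
A call might look like the sketch below. The file paths and sample name are hypothetical, and the TxDb package is only one possible source for the txdb argument. The call is guarded so that the snippet does nothing when the inputs or the function are unavailable. The element names of the returned list are not documented here, so the sketch only inspects its top-level structure.

```r
# Hypothetical input files; replace with real paths.
bw_file  <- "sample1.depth.bw"
vcf_file <- "sample1.sv.vcf"

have_inputs <- exists("CNA_plot") &&
  file.exists(bw_file) && file.exists(vcf_file) &&
  requireNamespace("TxDb.Hsapiens.UCSC.hg38.knownGene", quietly = TRUE)

if (have_inputs) {
  txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene::TxDb.Hsapiens.UCSC.hg38.knownGene

  # Default return: a ggplot2 object that can be printed or saved.
  p <- CNA_plot(bw_file, vcf_file, txdb, method = "delta", samplename = "sample1")
  print(p)

  # With return_data = TRUE: a list holding the plot plus the processed data.
  res <- CNA_plot(bw_file, vcf_file, txdb, method = "delta", return_data = TRUE)
  str(res, max.level = 1)
}
```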

Details

The function performs several steps:

  1. Imports coverage data from the BigWig file using import.bw() and keeps only standard chromosomes (with keepStandardChromosomes()).

  2. Optionally loads external GC/repeat data when using the "fit" method.

  3. Filters and down-samples the coverage data.

  4. Computes delta values using the specified method:

    • "delta": subtracts the mean coverage.

    • "loess": fits a LOESS model and computes log2 ratios.

    • "fit": fits a linear model to predict coverage based on GC content and repeat fraction.

  5. Reads and filters variant calls from the VCF file.

  6. Computes genomic offsets via generate_offsets() (assumed to be defined elsewhere).

  7. Applies weighting to delta values and computes a running median trend line using rollapply().

  8. Restricts the trend line according to trend_regions:

    • "inside": only within CNA call regions (the default),

    • "outside": only outside CNA call regions, or

    • "both": across all regions.

  9. Builds the plot with ggplot2 incorporating points, trend line, variant rectangles, and chromosome boundaries.
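
Steps 3 and 4 can be sketched on synthetic data. This is an illustration under stated assumptions rather than the function's implementation: compute_delta() and the column names score, gc, and repeat_frac are hypothetical, the LOESS model is assumed to regress coverage on GC content, and the "fit" delta is assumed to be observed minus predicted coverage.

```r
set.seed(1)
n <- 500

# Synthetic coverage bins with GC content, repeat fraction, and a simulated gain.
cov_df <- data.frame(
  gc          = runif(n, 0.3, 0.6),
  repeat_frac = runif(n, 0, 0.5),
  in_gain     = seq_len(n) %in% 200:260
)
cov_df$score <- 30 + 20 * (cov_df$gc - 0.45) - 5 * cov_df$repeat_frac +
  rnorm(n, sd = 1)
cov_df$score[cov_df$in_gain] <- cov_df$score[cov_df$in_gain] * 1.5

# Step 3: keep a random proportion of the bins (0.5 here to keep the toy stable).
downsample <- 0.5
idx <- sort(sample(nrow(cov_df), size = ceiling(downsample * nrow(cov_df))))
sub <- cov_df[idx, ]

# Step 4: hypothetical helper covering the three documented methods.
compute_delta <- function(df, method = c("fit", "delta", "loess")) {
  method <- match.arg(method)  # with the default vector, the first element is used
  switch(method,
    delta = df$score - mean(df$score),
    loess = {
      # Assumption: coverage is modelled as a smooth function of GC content.
      fit <- stats::loess(score ~ gc, data = df)
      log2(df$score / as.numeric(stats::fitted(fit)))
    },
    fit = {
      # Assumption: delta is the residual of the GC + repeat linear model.
      fit <- stats::lm(score ~ gc + repeat_frac, data = df)
      df$score - as.numeric(stats::fitted(fit))
    }
  )
}

d_delta <- compute_delta(sub, "delta")
d_loess <- compute_delta(sub, "loess")
d_fit   <- compute_delta(sub, "fit")
```

In this toy setup all three methods place the simulated gain above the rest of the bins. They differ in scale: "delta" and "fit" are in coverage units, while "loess" is a log2 ratio.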

Note

This function assumes that the helper function generate_offsets() is defined in your package.
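
The source does not show generate_offsets(). As a rough idea of what such a helper typically computes, the sketch below derives cumulative chromosome offsets so that per-chromosome positions can be placed on one genome-wide axis. The variable names and the three-chromosome subset are illustrative assumptions; the actual helper's signature and return value may differ.

```r
# Chromosome lengths used for illustration (GRCh38 values for chr1 to chr3).
chr_len <- c(chr1 = 248956422, chr2 = 242193529, chr3 = 198295559)

# Offset of each chromosome = total length of all chromosomes before it.
offsets <- c(0, cumsum(as.numeric(chr_len))[-length(chr_len)])
names(offsets) <- names(chr_len)

# Shift per-chromosome positions onto a single genome-wide coordinate.
pts <- data.frame(
  chr = c("chr1", "chr2", "chr3"),
  pos = c(1000, 1000, 5e6),
  stringsAsFactors = FALSE
)
pts$genome_pos <- pts$pos + unname(offsets[pts$chr])
```

Chromosome boundary lines in a genome-wide plot would then sit at the offset values themselves.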