Generates a genome-wide CNA plot with additional gene annotation and labeling. This function extends CNA_plot() by annotating gene coordinates with annotateCoverageWithGenes() and optionally highlighting selected genes and cytogenetic (G-band) features.

CNA_plot_highlight(
  depth_bigwig_file,
  variant_file,
  txdb,
  org,
  gene_delta_threshold = 2,
  downsample = 0.1,
  point_size = 0.01,
  line_size = 0.1,
  line_color = "red",
  colors = NULL,
  max_value = NULL,
  min_value = NULL,
  min_variant_distance = 10000,
  method = c("fit", "delta", "loess"),
  trend_window = 50,
  apply_weight = TRUE,
  outside_weight = 0.25,
  inside_weight = 1,
  variant_alpha = 0.1,
  nudge_y = 3,
  samplename = "",
  chr_filter = NULL,
  exclude_xy = FALSE,
  highlight_genes = NULL,
  gband_file = file.path("~/develop/pacbiowdlR/cytobands.tsv"),
  gband_y_offset = -0.2,
  gband_text_size = 2,
  showCNAbands = FALSE,
  trend_regions = "both",
  return_data = FALSE
)

Arguments

depth_bigwig_file

Character. File path to a BigWig file containing coverage depth data.

variant_file

Character. File path to a VCF file with variant data.

txdb

A transcript database object used for gene annotation.

org

A species annotation object (e.g., org.Hs.eg.db) for mapping gene identifiers.

gene_delta_threshold

Numeric. Delta threshold for gene annotation filtering.

downsample

Numeric. Proportion of the coverage data points to retain after downsampling (e.g., 0.1 keeps about 10% of them).

point_size

Numeric. Size of individual points in the plot.

line_size

Numeric. Size of the trend line.

line_color

Character. Color of the trend line.

colors

Named vector of colors for chromosomes; if NULL, defaults to an alternating palette.

max_value

Numeric. Maximum allowed delta value; values above this are capped.

min_value

Numeric. Minimum allowed delta value; values below this are capped.

min_variant_distance

Numeric. Minimum variant length (in bp) to include.

method

Character. Method used to compute delta values: one of "fit", "delta", or "loess" (see Details).

trend_window

Integer. Width, in data points, of the rolling window used for the running median trend line.

apply_weight

Logical. If TRUE, multiplies delta values by outside_weight outside CNA calls and by inside_weight inside them.

outside_weight

Numeric. Weight multiplier applied to delta values outside CNA calls.

inside_weight

Numeric. Weight multiplier applied to delta values inside CNA calls.

variant_alpha

Numeric. Alpha (opacity) of the variant call rectangles, from 0 (fully transparent) to 1 (opaque).

nudge_y

Numeric. Vertical nudge for positioning gene labels.

samplename

Character. Sample label used in the plot title.

chr_filter

Character. If provided, restricts the analysis to the specified chromosome.

exclude_xy

Logical. If TRUE, chromosomes X and Y are excluded.

highlight_genes

Character vector. Specific gene symbols to highlight in the plot.

gband_file

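Character. File path to a cytogenetic band data file (e.g., a TSV file). The default points to a developer-specific location (~/develop/pacbiowdlR/cytobands.tsv), so supply a path that exists on your system.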

gband_y_offset

Numeric. Vertical offset for G-band label placement.

gband_text_size

Numeric. Text size for G-band labels.

showCNAbands

Logical. If TRUE, displays G-band annotations on the plot.

trend_regions

Character. One of "both", "inside", or "outside". This option controls where the running median trend line is displayed:

"both"

Trend line is plotted for all points (default).

"inside"

Trend line is only plotted for positions that fall within variant regions.

"outside"

Trend line is only plotted for positions that fall outside variant regions.

return_data

Logical. If TRUE, returns a list containing the plot, the coverage data, variant calls, and gene annotations.
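The way apply_weight, outside_weight, inside_weight, trend_window and trend_regions interact can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the package's code: it assumes weighting is a plain per-point multiplication, approximates the rollapply() running median with a hypothetical rolling_median() helper, and masks excluded positions with NA rather than dropping rows. The helpers apply_weights() and trend_for_regions() are likewise hypothetical.

```r
# Hypothetical helper: centered running median over `width` points, with NA
# where the window does not fit (roughly like zoo::rollapply(x, width, median,
# fill = NA)).
rolling_median <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (width > n) return(out)
  half_left <- (width - 1) %/% 2
  for (i in seq_len(n - width + 1)) {
    out[i + half_left] <- median(x[i:(i + width - 1)])
  }
  out
}

# Hypothetical helper: multiply delta values by inside_weight within variant
# regions and by outside_weight elsewhere (assumed semantics of apply_weight).
apply_weights <- function(delta, in_variant, apply_weight = TRUE,
                          outside_weight = 0.25, inside_weight = 1) {
  if (!apply_weight) return(delta)
  delta * ifelse(in_variant, inside_weight, outside_weight)
}

# Hypothetical helper: keep the trend only where trend_regions asks for it;
# NA breaks the drawn line instead of bridging excluded stretches.
trend_for_regions <- function(trend, in_variant,
                              trend_regions = c("both", "inside", "outside")) {
  trend_regions <- match.arg(trend_regions)
  keep <- switch(trend_regions,
    both = rep(TRUE, length(trend)),
    inside = in_variant,
    outside = !in_variant
  )
  trend[!keep] <- NA
  trend
}

# Simulated example: 200 points with one gained region (points 81-120).
set.seed(1)
n <- 200
in_variant <- seq_len(n) > 80 & seq_len(n) <= 120
delta <- rnorm(n, sd = 0.2) + ifelse(in_variant, 1, 0)
weighted <- apply_weights(delta, in_variant)
trend <- rolling_median(weighted, width = 50)
inside_trend <- trend_for_regions(trend, in_variant, "inside")
```

With the default outside_weight = 0.25, delta values outside calls are shrunk toward zero, which damps the background relative to the called regions before the trend is computed.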

Value

Either a ggplot2 object representing the CNA plot with gene highlights, or a list with additional data when return_data is TRUE.

Details

The function performs several steps:

  1. Imports coverage data from the BigWig file with rtracklayer::import.bw(), filters it, and keeps only standard chromosomes with GenomeInfoDb::keepStandardChromosomes().

  2. When method = "fit", loads external GC-content and repeat-fraction data.

  3. Downsamples the coverage data, retaining the fraction given by downsample.

  4. Computes delta values using the specified method:

    • "delta": subtracts the mean coverage.

    • "loess": fits a LOESS model and computes log2 ratios.

    • "fit": fits a linear model to predict coverage based on GC content and repeat fraction.

  5. Reads and filters variant calls from the VCF file.

  6. Computes chromosome offsets with generate_offsets(), so that all chromosomes share one genome-wide x-axis, and marks which positions fall within variant regions.

  7. Weights the delta values: when apply_weight = TRUE, points inside variant calls are multiplied by inside_weight and points outside them by outside_weight.

  8. Computes a running median trend line over trend_window points with zoo::rollapply() and, depending on trend_regions, restricts the displayed trend line to points inside variant regions, outside them, or both.

  9. Annotates gene information on the coverage data using annotateCoverageWithGenes().

  10. Constructs the final ggplot2 plot showing the data points, the trend line, variant rectangles, and gene labels, with G-band annotations added when showCNAbands = TRUE.
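Steps 3, 4 and 6 can be illustrated with a base-R sketch on simulated per-bin coverage. It is not the package's implementation: downsample_bins(), compute_delta() and chromosome_offsets() are hypothetical helpers, the columns gc and repeat_fraction stand in for the external GC/repeat data, the "loess" branch assumes genomic position as the LOESS predictor, and the "fit" branch assumes delta is the log2 ratio of observed to model-predicted coverage. The package's exact choices may differ.

```r
# Hypothetical helper (step 3): keep a random fraction of bins, in genomic order.
downsample_bins <- function(df, downsample = 0.1) {
  keep <- sort(sample.int(nrow(df), size = max(1, round(downsample * nrow(df)))))
  df[keep, , drop = FALSE]
}

# Hypothetical helper (step 4): one branch per `method` value.
compute_delta <- function(df, method = c("fit", "delta", "loess")) {
  method <- match.arg(method)
  switch(method,
    # "delta": coverage minus its mean
    delta = df$coverage - mean(df$coverage),
    # "loess": log2 ratio of coverage to a LOESS smooth
    # (position as predictor is an assumption)
    loess = {
      smooth <- loess(coverage ~ pos, data = df)
      log2(df$coverage / fitted(smooth))
    },
    # "fit": log2 ratio of coverage to a linear model on GC and repeat
    # fraction (the log2-ratio output is an assumption)
    fit = {
      model <- lm(coverage ~ gc + repeat_fraction, data = df)
      log2(df$coverage / fitted(model))
    }
  )
}

# Hypothetical helper (step 6): start of each chromosome on a concatenated axis.
chromosome_offsets <- function(chr_lengths) {
  starts <- c(0, cumsum(as.numeric(chr_lengths))[-length(chr_lengths)])
  setNames(starts, names(chr_lengths))
}

# Simulated per-bin data; every column here is made up for illustration.
set.seed(42)
n <- 2000
bins <- data.frame(
  pos = seq_len(n) * 1000,
  gc = runif(n, 0.35, 0.6),
  repeat_fraction = runif(n, 0, 0.5)
)
bins$coverage <- 30 + 20 * (bins$gc - 0.45) - 10 * bins$repeat_fraction +
  rnorm(n, sd = 2)

sub <- downsample_bins(bins, downsample = 0.1)
deltas <- lapply(c(fit = "fit", delta = "delta", loess = "loess"),
                 function(m) compute_delta(sub, m))

# Toy chromosome lengths; treat the simulated bins as lying on chr2.
offsets <- chromosome_offsets(c(chr1 = 1e6, chr2 = 2e6, chr3 = 5e5))
genome_x <- sub$pos + offsets[["chr2"]]
```

Sorting the sampled row indices keeps the downsampled bins in genomic order, which a running median over neighbouring points relies on.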