CNA_plot_highlight.Rd
Generates a genome-wide CNA plot with additional gene annotation and labeling.
This function extends CNA_plot by annotating gene coordinates (using
annotateCoverageWithGenes()) and optionally highlighting selected genes
and cytogenetic (G-band) features.
Usage
CNA_plot_highlight(
depth_bigwig_file,
variant_file,
txdb,
org,
gene_delta_threshold = 2,
downsample = 0.1,
point_size = 0.01,
line_size = 0.1,
line_color = "red",
colors = NULL,
max_value = NULL,
min_value = NULL,
min_variant_distance = 10000,
method = c("fit", "delta", "loess"),
trend_window = 50,
apply_weight = TRUE,
outside_weight = 0.25,
inside_weight = 1,
variant_alpha = 0.1,
nudge_y = 3,
samplename = "",
chr_filter = NULL,
exclude_xy = FALSE,
highlight_genes = NULL,
gband_file = file.path("~/develop/pacbiowdlR/cytobands.tsv"),
gband_y_offset = -0.2,
gband_text_size = 2,
showCNAbands = FALSE,
trend_regions = "both",
return_data = FALSE
)
Arguments
depth_bigwig_file: Character. File path to a BigWig file containing coverage depth data.
variant_file: Character. File path to a VCF file with variant data.
txdb: A transcript database object used for gene annotation.
org: A species annotation object (e.g., org.Hs.eg.db) used to map gene identifiers.
gene_delta_threshold: Numeric. Delta threshold for gene annotation filtering.
downsample: Numeric. Proportion of the coverage data to retain after downsampling.
point_size: Numeric. Size of individual points in the plot.
line_size: Numeric. Size of the trend line.
line_color: Character. Color of the trend line.
colors: Named vector of colors for chromosomes; if NULL, an alternating palette is used.
max_value: Numeric. Maximum allowed delta value; larger values are capped at this value.
min_value: Numeric. Minimum allowed delta value; smaller values are capped at this value.
min_variant_distance: Numeric. Minimum variant length (in bp) to include.
method: Character. One of "fit", "delta", or "loess"; selects the method used to compute delta values.
trend_window: Integer. Window size (number of points) for the running median trend line.
apply_weight: Logical. If TRUE, multiplies delta values by outside_weight outside CNA calls and by inside_weight inside them.
outside_weight: Numeric. Weight multiplier applied to delta values outside CNA calls.
inside_weight: Numeric. Weight multiplier applied to delta values inside CNA calls.
variant_alpha: Numeric. Alpha (opacity) of the variant call rectangles.
nudge_y: Numeric. Vertical nudge for positioning gene labels.
samplename: Character. Sample label used in the plot title.
chr_filter: Character. If provided, restricts the analysis to the specified chromosome.
exclude_xy: Logical. If TRUE, chromosomes X and Y are excluded.
highlight_genes: Character vector. Gene symbols to highlight in the plot.
gband_file: Character. File path to a cytogenetic band data file (e.g., a TSV file).
gband_y_offset: Numeric. Vertical offset for G-band label placement.
gband_text_size: Numeric. Text size for G-band labels.
showCNAbands: Logical. If TRUE, displays G-band annotations on the plot.
trend_regions: Character. One of "both", "inside", or "outside". Controls where the running median trend line is displayed:
  "both": the trend line is plotted for all points (default).
  "inside": the trend line is plotted only for positions that fall within variant regions.
  "outside": the trend line is plotted only for positions that fall outside variant regions.
return_data: Logical. If TRUE, returns a list containing the plot, the coverage data, the variant calls, and the gene annotations.
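How the capping arguments (min_value, max_value) and trend_regions act on per-point values can be illustrated with a minimal base-R sketch. The helpers cap_delta() and select_trend() are hypothetical and not part of the package; the sketch assumes capping is a simple clamp to [min_value, max_value] and that each point carries a logical in_variant flag marking whether it lies inside a variant call.

```r
# Hypothetical helpers illustrating the min_value/max_value and trend_regions
# arguments; these are not functions exported by the package.

# Clamp delta values into [min_value, max_value]; a NULL bound is skipped.
cap_delta <- function(delta, min_value = NULL, max_value = NULL) {
  if (!is.null(max_value)) delta <- pmin(delta, max_value)
  if (!is.null(min_value)) delta <- pmax(delta, min_value)
  delta
}

# Keep only the rows selected by trend_regions.
# `df` is assumed to have a logical column `in_variant`.
select_trend <- function(df, trend_regions = c("both", "inside", "outside")) {
  trend_regions <- match.arg(trend_regions)
  keep <- switch(trend_regions,
    both    = rep(TRUE, nrow(df)),
    inside  = df$in_variant,
    outside = !df$in_variant
  )
  df[keep, , drop = FALSE]
}

# Toy example: five points, two of which fall inside a variant call.
pts <- data.frame(
  pos        = 1:5,
  delta      = c(-5, -0.5, 0.2, 1.5, 4),
  in_variant = c(FALSE, FALSE, TRUE, TRUE, FALSE)
)
pts$delta <- cap_delta(pts$delta, min_value = -2, max_value = 2)
pts$delta                           # -2.0 -0.5 0.2 1.5 2.0
select_trend(pts, "inside")$pos     # 3 4
```

Subsetting rows, as the description of trend_regions suggests, means a line geom joins the remaining points across excluded stretches; replacing excluded values with NA instead would typically leave visible gaps in the line.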
Value
A ggplot2 object representing the CNA plot with gene highlights or, when return_data is TRUE, a list that also contains the underlying data.
Details
The function performs the following steps:
Imports and filters coverage data from the BigWig file using import.bw() and retains standard chromosomes (using keepStandardChromosomes()).
Optionally loads external GC/repeat data when using the "fit" method.
Downsamples the coverage data, retaining the proportion given by downsample.
Computes delta values using the method selected by method:
  "delta": subtracts the mean coverage.
  "loess": fits a LOESS model and computes log2 ratios.
  "fit": fits a linear model that predicts coverage from GC content and repeat fraction.
Reads and filters variant calls from the VCF file.
Computes genomic offsets via generate_offsets() and marks variant regions.
If apply_weight is TRUE, multiplies delta values by inside_weight for points inside CNV calls and by outside_weight for points outside them.
Computes a running median trend line with rollapply() over trend_window points and, depending on trend_regions,
keeps the trend line only for points inside variants, only for points outside variants, or for both.
Annotates the coverage data with gene information using annotateCoverageWithGenes().
Constructs the final plot with ggplot2 showing data points, the trend line, variant rectangles, and gene labels (with optional G-band annotations).
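The delta, weighting, and trend steps above can be sketched in base R on simulated data. This is a minimal illustration, not the package's implementation: the helpers compute_delta(), weight_delta() and running_median(), the column names (pos, coverage, gc, repeat_frac), the use of genomic position as the LOESS predictor, and treating the "fit" delta as observed minus predicted coverage are all assumptions. The package computes its trend with rollapply() from the zoo package; the sketch uses a small base-R running median instead so it has no dependencies.

```r
# Hypothetical sketch of the delta, weighting and trend steps; not the package's code.
# Assumed columns in `bins`: pos, coverage, gc, repeat_frac.

compute_delta <- function(bins, method = c("fit", "delta", "loess")) {
  method <- match.arg(method)
  switch(method,
    # "delta": coverage minus its mean
    delta = bins$coverage - mean(bins$coverage),
    # "loess": log2(observed / LOESS-fitted coverage); predictor `pos` is assumed
    loess = {
      model <- stats::loess(coverage ~ pos, data = bins)
      log2(bins$coverage / stats::fitted(model))
    },
    # "fit": observed minus coverage predicted from GC and repeat fraction (assumed)
    fit = {
      model <- stats::lm(coverage ~ gc + repeat_frac, data = bins)
      bins$coverage - stats::fitted(model)
    }
  )
}

# Scale delta by inside_weight inside variant calls and outside_weight elsewhere.
weight_delta <- function(delta, in_variant, apply_weight = TRUE,
                         inside_weight = 1, outside_weight = 0.25) {
  if (!apply_weight) return(delta)
  delta * ifelse(in_variant, inside_weight, outside_weight)
}

# Centered running median over `window` points, truncated at the ends
# (a dependency-free stand-in, roughly like rollapply(..., partial = TRUE)).
running_median <- function(x, window = 50) {
  n <- length(x)
  left <- (window - 1) %/% 2
  right <- window - 1 - left
  vapply(seq_len(n), function(i) {
    stats::median(x[max(1, i - left):min(n, i + right)])
  }, numeric(1))
}

# Simulated toy data: 200 bins with a 1.5x gain between 80 kb and 120 kb.
set.seed(1)
n <- 200
bins <- data.frame(pos = seq_len(n) * 1000,
                   gc = runif(n, 0.35, 0.6),
                   repeat_frac = runif(n, 0, 0.5))
bins$coverage <- 30 + 20 * (bins$gc - 0.45) + rnorm(n, sd = 2)
in_variant <- bins$pos > 80000 & bins$pos <= 120000
bins$coverage[in_variant] <- bins$coverage[in_variant] * 1.5

delta <- weight_delta(compute_delta(bins, "fit"), in_variant)
trend <- running_median(delta, window = 50)
```

Truncating windows at the ends keeps a trend value for every point; a fixed-width window padded with NA would instead leave the first and last few points without one.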