A function to compute the biweight mean vector and covariance matrix
Source: R/biwt.cor.R
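Computes the biweight correlation between each pair of rows of a data matrix, based on the biweight mean vector and covariance matrix of each pair, and returns the correlations as a matrix, as a vector, or converted to a matrix of distances.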
Arguments
- x
a \(g \times n\) matrix or data frame, where \(n\) is the number of measurements and \(g\) is the number of observations (genes). Correlations are computed between the rows of x.
- r
breakdown, \(k/n\), where \(k\) is the largest number of the \(n\) measurements that can be replaced with arbitrarily large values while keeping the estimates bounded. Default is r = 0.2.
- output
a character string specifying the output format. Options are "matrix" (default), "vector", or "distance". See Value below.
- median
a logical value determining whether the initialization uses the coordinate-wise median and squared MAD (TRUE, default) or the minimum covariance determinant (MCD) (FALSE). Using the MCD is substantially slower. The MAD is the median of the absolute deviations from the median; see the R help page for mad.
- full.init
a logical value determining whether the initialization is done only once at the beginning using the entire data matrix (TRUE, default) or separately for each pair (FALSE). Initializing each pair separately is substantially slower.
- absval
a logical value determining whether the distance is measured as 1 minus the absolute value of the correlation (TRUE, default) or as 1 minus the correlation (FALSE).
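As an illustration of the quantities behind the default median = TRUE initialization, the base-R sketch below computes the coordinate-wise median and the squared MAD of each row of a small made-up matrix. R's mad() rescales by 1.4826 by default, whereas the description above defines the MAD as the unscaled median absolute deviation; which version biwt.cor uses internally is not stated here, so the sketch computes the unscaled MAD and checks it against mad(constant = 1).

```r
# Made-up data (not from biwt): g = 3 observations (rows), n = 5 measurements (columns)
x <- rbind(c(1, 2, 3, 4, 100),
           c(2, 4, 6, 8, 10),
           c(5, 6, 7, 9, 12))

# Coordinate-wise median of each row (initial location)
init.center <- apply(x, 1, median)

# Unscaled MAD: median of the absolute deviations from the median
raw.mad <- apply(x, 1, function(v) median(abs(v - median(v))))

# mad() multiplies by 1.4826 by default; constant = 1 gives the unscaled MAD
stopifnot(isTRUE(all.equal(raw.mad, apply(x, 1, mad, constant = 1))))

# Squared MAD as the initial (variance-type) scale for each row
init.scale <- raw.mad^2

init.center  # 3 6 7
init.scale   # 1 4 4
```

The outlying value 100 in the first row leaves that row's median (3) and MAD (1) unchanged, which is why these robust summaries are a reasonable starting point.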
Value
Specifying "vector" for the output argument returns the lower triangle of the correlation matrix, stored by columns in a vector, say \(bwcor\). If \(g\) is the number of observations and \(bwcor\) is the correlation vector, then for \(i < j \leq g\), the biweight correlation between rows \(i\) and \(j\) is \(bwcor[(j-1)*(j-2)/2 + i]\). The length of the vector is \(g*(g-1)/2\), i.e., of order \(g^2\).
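The indexing formula \((j-1)*(j-2)/2 + i\) enumerates the pairs \(i < j\) column by column above the diagonal, which is the order in which R's m[upper.tri(m)] returns entries; for a symmetric matrix this is the same as reading the lower triangle row by row. The sketch below checks the formula on a made-up \(4 \times 4\) symmetric matrix using base R only; it illustrates the indexing and does not call biwt.cor.

```r
# Made-up symmetric 4 x 4 "correlation" matrix, used only to illustrate indexing
g <- 4
m <- matrix(0, g, g)
m[upper.tri(m)] <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
m <- m + t(m)
diag(m) <- 1

# Pack the entries above the diagonal column by column
bwcor <- m[upper.tri(m)]

# Position of the pair (i, j), i < j, under bwcor[(j-1)*(j-2)/2 + i]
idx <- function(i, j) (j - 1) * (j - 2) / 2 + i

# Every pair is found at the position the formula predicts
for (j in 2:g) for (i in 1:(j - 1)) {
  stopifnot(bwcor[idx(i, j)] == m[i, j])
}
stopifnot(length(bwcor) == g * (g - 1) / 2)
```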
Specifying "matrix" for the output argument returns a matrix of the biweight correlations.
Specifying "distance" for the output argument returns a matrix of the biweight distances (default is 1 minus absolute value of the biweight correlation).
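The conversion from correlations to distances is an elementwise transformation controlled by absval. The sketch below applies both settings to the correlation matrix printed in the Examples section; with absval = TRUE it reproduces the printed distance matrix, but it does not call biwt.cor itself.

```r
# Biweight correlation matrix as printed in the Examples (rounded to 7 digits)
bw.cor <- matrix(c( 1.0000000,  0.8002356, -0.8044575,
                    0.8002356,  1.0000000, -0.7784625,
                   -0.8044575, -0.7784625,  1.0000000),
                 nrow = 3, byrow = TRUE)

# absval = TRUE (default): distance = 1 - |correlation|
dist.abs <- 1 - abs(bw.cor)

# absval = FALSE: distance = 1 - correlation, so strong negative correlations become large distances
dist.signed <- 1 - bw.cor

round(dist.abs, 7)
```

With absval = FALSE the strongly negatively correlated pair (1, 3) ends up at distance 1.8044575 instead of 0.1955425, so the choice matters whenever sign should count as dissimilarity.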
If there is too much missing data or if the initialization is not accurate, the function will compute the MCD for a given pair of observations before computing the biweight correlation (regardless of the initial settings given in the call to the function).
The "vector" output option is provided so that the correlations can be stored as a vector of length \(g*(g-1)/2\) rather than as a \(g \times g\) matrix, which saves memory when \(g\) is large.
Returns a list with components:
- corr
a vector consisting of the lower triangle of the correlation matrix stored by columns, say bwcor. If \(g\) is the number of observations, then for \(i < j \leq g\), the biweight correlation between rows \(i\) and \(j\) is \(bwcor[g*(i-1) - i*(i-1)/2 + j-i]\). The length of the vector is \(g*(g-1)/2\), i.e., of order \(g^2\).
- corr.mat
a \(g \times g\) matrix of the biweight correlations.
- dist.mat
a matrix consisting of the correlations converted to distances (either 1 - correlation or 1 - abs(correlation)).
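The formula \(g*(i-1) - i*(i-1)/2 + j - i\) given for corr corresponds to storing the lower triangle column by column, the same layout R uses inside dist objects. The sketch below checks this on a made-up \(4 \times 4\) symmetric matrix with base R only (lower.tri() and as.dist()); it illustrates the indexing, not biwt.cor's internals.

```r
# Made-up symmetric 4 x 4 matrix, used only to illustrate indexing
g <- 4
m <- matrix(0, g, g)
m[lower.tri(m)] <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6)
m <- m + t(m)

# Lower triangle stored by columns; dist objects use the same order
v <- m[lower.tri(m)]
stopifnot(identical(v, as.vector(as.dist(m))))

# Position of the pair (i, j), i < j, under g*(i-1) - i*(i-1)/2 + j - i
pos <- function(i, j, g) g * (i - 1) - i * (i - 1) / 2 + j - i

for (i in 1:(g - 1)) for (j in (i + 1):g) {
  stopifnot(v[pos(i, j, g)] == m[i, j])
}
```

For \(g = 3\), as in the Examples, this order coincides with the \((j-1)*(j-2)/2 + i\) order; for \(g \geq 4\) the two layouts differ, e.g., the pair (1, 4) is at position 3 here but at position 4 under the other formula.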
Examples
# Note that biwt.cor() takes g x n data, where the
# goal is to find correlations or distances between each of the g items
library(biwt)
samp.data <- t(MASS::mvrnorm(30,mu=c(0,0,0),
Sigma=matrix(c(1,.75,-.75,.75,1,-.75,-.75,-.75,1),ncol=3)))
# To compute the 3 pairwise correlations from the sample data:
samp.bw.cor <- biwt.cor(samp.data, output="vector")
samp.bw.cor
#> [1] 0.8002356 -0.8044574 -0.7784624
# To compute the 3 pairwise correlations in matrix form:
samp.bw.cor.mat <- biwt.cor(samp.data)
samp.bw.cor.mat
#> [,1] [,2] [,3]
#> [1,] 1.0000000 0.8002356 -0.8044575
#> [2,] 0.8002356 1.0000000 -0.7784625
#> [3,] -0.8044575 -0.7784625 1.0000000
# To compute the 3 pairwise distances in matrix form:
samp.bw.dist.mat <- biwt.cor(samp.data, output="distance")
samp.bw.dist.mat
#> [,1] [,2] [,3]
#> [1,] 0.0000000 0.1997644 0.1955425
#> [2,] 0.1997644 0.0000000 0.2215375
#> [3,] 0.1955425 0.2215375 0.0000000
# To convert the distances into an object of class 'dist':
as.dist(samp.bw.dist.mat)
#> 1 2
#> 2 0.1997644
#> 3 0.1955425 0.2215375
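If only the "vector" output is kept, the full matrix can be rebuilt when it is needed. The sketch below assumes the \((j-1)*(j-2)/2 + i\) layout described under Value (for \(g = 3\) the two layouts described there agree), recovers \(g\) by solving \(g(g-1)/2 = \) length(bwcor), and starts from the values printed for samp.bw.cor above; it needs only base R.

```r
# Vector output printed in the example above (output = "vector")
bwcor <- c(0.8002356, -0.8044574, -0.7784624)

# Recover g from length(bwcor) = g * (g - 1) / 2
g <- (1 + sqrt(1 + 8 * length(bwcor))) / 2

# Fill the entries above the diagonal column by column, then mirror them below
bw.mat <- diag(g)
bw.mat[upper.tri(bw.mat)] <- bwcor
bw.mat[lower.tri(bw.mat)] <- t(bw.mat)[lower.tri(bw.mat)]

# Distances with the default absval = TRUE, as a 'dist' object
as.dist(1 - abs(bw.mat))
```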