A function to compute the biweight mean vector and covariance matrix
Source: R/biwt_cor.R
biwt_cor.Rd
Arguments
- x
an n x g matrix or data frame (n is the number of measurements, g is the number of observations (genes))
- r
breakdown (k/n, where k is the largest number of observations that can be replaced with arbitrarily large values while keeping the estimates bounded)
- median
a logical value that determines whether the initialization is done using the coordinate-wise median and MAD (TRUE) or using the minimum covariance determinant (MCD) (FALSE). Using the MCD is substantially slower.
- full_init
a logical value that determines whether the initialization is done for each pair separately (FALSE) or only once at the beginning using the entire data matrix (TRUE). Initializing for each pair separately is substantially slower.
Value
Using biwt_est to estimate the robust covariance matrix, a robust measure of correlation is computed using Tukey's biweight M-estimator. The biweight correlation is essentially a weighted correlation where the weights are calculated based on the distance of each measurement to the data center with respect to the shape of the data. The correlations are computed pair-by-pair because the weights should depend only on the pairwise relationship at hand and not the relationship between all the observations globally. The biwt functions compute many pairwise correlations and create distance matrices for use in other algorithms (e.g., clustering).
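To illustrate the idea of a distance-based weighted correlation, here is a minimal base-R sketch. It is not the package's algorithm: it takes a single weighting step from a coordinate-wise median/MAD start, and the cutoff `c_tune = 3` is an arbitrary value assumed for illustration, whereas the package iterates the estimate and ties the cutoff to the breakdown r. The simulated data and all variable names are hypothetical.

```r
set.seed(1)
n <- 50
a <- rnorm(n)
b <- 0.8 * a + 0.6 * rnorm(n)
x <- cbind(a, b)
x[1, ] <- c(8, -8)                      # plant one gross outlier

# Initial center and (diagonal) shape from the coordinate-wise median and MAD
center <- apply(x, 2, median)
shape  <- diag(apply(x, 2, mad)^2)

# Distance of each measurement to the center, relative to the shape
d <- sqrt(mahalanobis(x, center, shape))

# Tukey biweight weights: smooth down-weighting, exactly zero beyond the cutoff
c_tune <- 3                             # assumed cutoff, for illustration only
w <- ifelse(d < c_tune, (1 - (d / c_tune)^2)^2, 0)

# Weighted correlation via base R's cov.wt(), compared with the plain estimate
w_cor     <- cov.wt(x, wt = w / sum(w), cor = TRUE)$cor[1, 2]
plain_cor <- cor(x)[1, 2]
c(weighted = w_cor, plain = plain_cor)
```

The planted outlier lies far beyond the cutoff and receives weight zero, so it does not affect the weighted estimate, while the plain Pearson correlation is pulled strongly toward it.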
In order for the biweight estimates to converge, a reasonable initialization must be given. Typically, setting both the median and full_init arguments to TRUE will provide acceptable initializations. With particularly irregular data, the MCD should be used (median=FALSE) to give the initial estimate of center and shape. With data sets in which the observations differ by orders of magnitude, full_init=FALSE should be specified.
Returns a list with components:
- biwt_corr
a vector, say bwcor, holding the lower triangle of the correlation matrix stored by columns. If g is the number of observations, then for \(i < j \leq g\), the biweight correlation between observations i and j is bwcor[\(g*(i-1) - i*(i-1)/2 + j-i\)]. The length of the vector is \(g*(g-1)/2\), i.e., of order \(g^2\).
- biwt_NAid
a vector which is indexed in the same way as biwt_corr. Each entry indicates whether the biweight correlation could be computed: 0 if it was computed accurately, 1 if it is NA (which happens when too much data is missing or the initializations are not accurate).
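The index formula for biwt_corr can be checked with a short base-R sketch. The helper `pair_index()` is hypothetical (it is not part of the package); it simply encodes the formula above and compares it against the column-wise order that `lower.tri()` produces.

```r
# Hypothetical helper: position of pair (i, j), with i < j <= g, in the
# lower-triangle-by-columns vector
pair_index <- function(i, j, g) {
  stopifnot(i < j, j <= g)
  g * (i - 1) - i * (i - 1) / 2 + j - i
}

g <- 5

# Number the lower triangle by columns, the same order used for the vector
pos <- matrix(NA_real_, g, g)
pos[lower.tri(pos)] <- seq_len(g * (g - 1) / 2)

# Compare the formula with the matrix position for every pair i < j
pairs <- t(combn(g, 2))
from_formula <- mapply(pair_index, pairs[, 1], pairs[, 2],
                       MoreArgs = list(g = g))
from_matrix  <- pos[cbind(pairs[, 2], pairs[, 1])]  # pair (i, j) sits at [j, i]
all(from_formula == from_matrix)
```

Note that the pair (i, j) with i < j occupies row j, column i of the lower triangle, which is why the matrix lookup swaps the two indices.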
Examples
# note that biwt_cor() takes data that is nxg where the
# goal is to find correlations between each of the g items
samp_data <- MASS::mvrnorm(30,mu=c(0,0,0),Sigma=matrix(c(1,.75,-.75,.75,1,-.75,-.75,-.75,1),ncol=3))
r <- 0.2 # breakdown
# To compute the 3 pairwise correlations from the sample data:
samp_bw_cor <- biwt_cor(samp_data,r)
samp_bw_cor
#> $biwt_corr
#> [1] 0.6850272 -0.6567926 -0.6451526
#>
#> $biwt_NAid
#> NULL
#>
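As a possible follow-up, the sketch below rebuilds the full symmetric correlation matrix from the lower-triangle vector printed above and passes it to a clustering routine. It uses only base R. The values are copied from the example output; using 1 minus the correlation as a distance is an assumption made for illustration and is not necessarily the distance the package itself uses.

```r
# Values copied from the $biwt_corr output above (g = 3 items)
bwcor <- c(0.6850272, -0.6567926, -0.6451526)
g <- 3

# Fill the lower triangle by columns, then mirror it into the upper triangle
cor_mat <- diag(g)
cor_mat[lower.tri(cor_mat)] <- bwcor
cor_mat[upper.tri(cor_mat)] <- t(cor_mat)[upper.tri(cor_mat)]

# Assumed distance: 1 - correlation, so strongly correlated items are close
d  <- as.dist(1 - cor_mat)
hc <- hclust(d)
hc$merge
```

With these values, items 1 and 2 have the largest correlation and therefore the smallest distance, so they are merged first.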