A function to compute the biweight mean vector and covariance matrix
Source:R/biwt_est.R
biwt_est.Rd
Compute a multivariate location and scale estimate based on Tukey's biweight weight function.
Arguments
- x
an
n x 2
matrix or data frame (n
is the number of observations)- r
breakdown (
k/n
wherek
is the largest number of observations that can be replaced with arbitrarily large values while keeping the estimates bounded)- med.init
a (robust) initial estimate of the center and shape of the data. form is a list with components center and cov.
Value
A robust measure of center and shape is computed using Tukey's biweight M-estimator. The biweight estimates are essentially weighted means and covariances where the weights are calculated based on the distance of each measurement to the data center with respect to the shape of the data. The estimates should be computed pair-by-pair because the weights should depend only on the pairwise relationship at hand and not the relationship between all the observations globally.
Returns a list with components:
- biwt_mu
the final estimate of location
- biwt_sig
the final estimate of scatter
- biwt_NAid
a logical of whether the given initialization was used (coded as 0) or whether a more precise initialization was used (namely, the pair by pair MCD, coded as 1).
References
Hardin, J., Mitani, A., Hicks, L., VanKoten, B.; A Robust Measure of Correlation Between Two Genes on a Microarray, BMC Bioinformatics, 8:220; 2007.
Examples
# note that biwt_est() takes data that is nx2 where the
# goal is to find the correlation between the 2 items
samp_data <- MASS::mvrnorm(30,mu=c(0,0),Sigma=matrix(c(1,.75,.75,1),ncol=2))
r <- 0.2 # breakdown
samp_mcd <- robustbase::covMcd(samp_data)
samp_bw <- biwt_est(samp_data, r, samp_mcd)
samp_bw_var1 <- samp_bw$biwt_sig[1,1]
samp_bw_var2 <- samp_bw$biwt_sig[2,2]
samp_bw_cov <- samp_bw$biwt_sig[1,2]
samp_bw_corr <- samp_bw_cov / sqrt(samp_bw_var1 * samp_bw_var2)
# or:
samp_bw_corr <- samp_bw$biwt_sig[1,2] / sqrt(samp_bw$biwt_sig[1,1]*samp_bw$biwt_sig[2,2])
samp_bw_corr
#> [1] 0.7951656
##############
# to speed up the calculations, use the median/mad for the initialization:
##############
samp_init <- list()
samp_init$cov <- diag(apply(samp_data, 2, stats::mad, na.rm=TRUE))
samp_init$center <- apply(samp_data, 2, median, na.rm=TRUE)
samp_bw <- biwt_est(samp_data, r, samp_init)
samp_bw_corr <- samp_bw$biwt.sig[1,2] / sqrt(samp_bw$biwt.sig[1,1]*samp_bw$biwt.sig[2,2])