Fitting individual models
Umut Caglar, Claus O. Wilke
2025-10-03
Source:vignettes/fitting_individual_models.Rmd
Sometimes it can be useful to fit only specific models to a dataset
rather than fit multiple models and run the decision algorithm. For this
purpose, sicegar provides the function multipleFitFunction(). This
function fits a chosen model to the input dataset. It calls the fitting
algorithm multiple times with different, randomly generated start
parameters, which makes the fit more robust to poor starting points.
The default functions assume that the lower asymptote, h0 (the
intensity as x approaches negative infinity), is zero. If the data are
better fit by a model that estimates the lower asymptote, the function
multipleFitFunction_h0() can be used instead.
We will demonstrate the use of multipleFitFunction() on double-sigmoidal
data, generated by adding some noise to the double-sigmoidal curve used
for fitting. That curve is implemented as doublesigmoidalFitFormula(),
and thus can be used to generate model data.
time <- seq(3, 24, 0.5)
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
intensity <- doublesigmoidalFitFormula(time,
                                       finalAsymptoteIntensityRatio = .3,
                                       maximum = 4,
                                       slope1Param = 1,
                                       midPoint1Param = 7,
                                       slope2Param = 1,
                                       midPointDistanceParam = 8)
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)
head(dataInput) # the generated input data
## time intensity
## 1 3.0 0.2342345
## 2 3.5 0.2634529
## 3 4.0 0.2884316
## 4 4.5 0.3297332
## 5 5.0 0.5435398
## 6 5.5 0.7838986
Before we can perform the fit, we need to normalize the data
appropriately. All sicegar fit functions work on normalized data, where
time and intensity are normalized to the interval from 0 to 1. Sicegar
provides a convenient normalization function, normalizeData(), that
normalizes data appropriately while storing the information required to
transform fitted parameters back into non-normalized coordinates:
normalizedInput <- normalizeData(dataInput = dataInput,
                                 dataInputName = "doubleSigmoidalSample")
head(normalizedInput$timeIntensityData) # the normalized time and intensity data
## time intensity
## 1 0.1250000 0.000000000
## 2 0.1458333 0.007491139
## 3 0.1666667 0.013895288
## 4 0.1875000 0.024484372
## 5 0.2083333 0.079300988
## 6 0.2291667 0.140925160
The data scaling parameters and the data input name are stored as well:
normalizedInput$dataScalingParameters # scaling parameters
## timeRange intensityMin intensityMax intensityRange
## 24.0000000 0.2342345 4.1346318 3.9003973
normalizedInput$dataInputName # data input name
## [1] "doubleSigmoidalSample"
Note that normalizeData() normalizes time with respect to the maximum
value the time parameter takes:
timeRange <- max(time)
timeNormalized <- time/timeRange # normalized time values
Intensity, in contrast, is normalized with respect to the intensity interval:
intensityMin <- min(intensity)
intensityMax <- max(intensity)
intensityRange <- intensityMax - intensityMin
intensityNormalized <- (intensity-intensityMin)/intensityRange # normalized intensity values
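The two normalization rules, together with their inverses, can be sketched in base R as follows. This is an illustration on a small made-up data set, not the package's internal code; the variable names timeRaw and intensityRaw and their values are hypothetical.

```r
# Small made-up example data (hypothetical values, not from the vignette)
timeRaw <- c(3, 6, 12, 24)
intensityRaw <- c(0.2, 1.5, 4.1, 1.3)

# Time is divided by its maximum, so normalized time ends at 1
# but does not necessarily start at 0
timeRange <- max(timeRaw)
timeNormalized <- timeRaw / timeRange

# Intensity is shifted and scaled so that it covers the interval [0, 1]
intensityMin <- min(intensityRaw)
intensityRange <- max(intensityRaw) - intensityMin
intensityNormalized <- (intensityRaw - intensityMin) / intensityRange

# Inverting the two steps recovers the raw values
timeBack <- timeNormalized * timeRange
intensityBack <- intensityNormalized * intensityRange + intensityMin

range(timeNormalized)       # 0.125 1.000
range(intensityNormalized)  # 0 1
```

Because time is only divided by its maximum, the first normalized time point in the example data above is 3/24 = 0.125 rather than 0.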
Fitting the models to the data
To fit a model to the data using the function multipleFitFunction(), we
provide as input the normalized data and the model type to be fitted,
which can be "sigmoidal" or "doublesigmoidal". Here we fit both models
to the same input data:
# Do the sigmoidal fit
sigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
                                      model = "sigmoidal")
# Do the double-sigmoidal fit
doubleSigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
                                            model = "doublesigmoidal")
The two generated model objects contain a large number of computed parameters, which are printed here and described in detail below.
t(sigmoidalModel)
## [,1]
## maximum_N_Estimate "0.5817715"
## maximum_Std_Error "0.0471064"
## maximum_t_value "12.35016"
## maximum_Pr_t "3.144059e-15"
## slopeParam_N_Estimate "56.69418"
## slopeParam_Std_Error "62.92244"
## slopeParam_t_value "0.901017"
## slopeParam_Pr_t "0.3729725"
## midPoint_N_Estimate "0.254053"
## midPoint_Std_Error "0.02232296"
## midPoint_t_value "11.38079"
## midPoint_Pr_t "4.07523e-14"
## residual_Sum_of_Squares "2.947177"
## log_likelihood "-3.38678"
## AIC_value "14.77356"
## BIC_value "21.81836"
## isThisaFit "TRUE"
## startVector.maximum "0.4293649"
## startVector.slopeParam "17.12084"
## startVector.midPoint "-0.2478345"
## dataScalingParameters.timeRange "24"
## dataScalingParameters.intensityMin "0.2342345"
## dataScalingParameters.intensityMax "4.134632"
## dataScalingParameters.intensityRange "3.900397"
## model "sigmoidal"
## additionalParameters "FALSE"
## maximum_Estimate "2.503374"
## slopeParam_Estimate "2.362258"
## midPoint_Estimate "6.097271"
## dataInputName "doubleSigmoidalSample"
## betterFit "4"
## correctFit "20"
## totalFit "28"
t(doubleSigmoidalModel)
## [,1]
## finalAsymptoteIntensityRatio_N_Estimate "0.2783729"
## finalAsymptoteIntensityRatio_Std_Error "0.005673176"
## finalAsymptoteIntensityRatio_t_value "49.06827"
## finalAsymptoteIntensityRatio_Pr_t "2.802985e-35"
## maximum_N_Estimate "0.9890303"
## maximum_Std_Error "0.00674476"
## maximum_t_value "146.6368"
## maximum_Pr_t "9.166088e-53"
## slope1Param_N_Estimate "26.72541"
## slope1Param_Std_Error "0.9129164"
## slope1Param_t_value "29.27476"
## slope1Param_Pr_t "3.431544e-27"
## midPoint1Param_N_Estimate "0.2942993"
## midPoint1Param_Std_Error "0.001583065"
## midPoint1Param_t_value "185.9048"
## midPoint1Param_Pr_t "1.427291e-56"
## slope2Param_N_Estimate "25.36587"
## slope2Param_Std_Error "1.253865"
## slope2Param_t_value "20.23014"
## slope2Param_Pr_t "1.33991e-21"
## midPointDistanceParam_N_Estimate "0.334362"
## midPointDistanceParam_Std_Error "0.003221369"
## midPointDistanceParam_t_value "103.795"
## midPointDistanceParam_Pr_t "3.171517e-47"
## residual_Sum_of_Squares "0.01073605"
## log_likelihood "117.3356"
## AIC_value "-220.6713"
## BIC_value "-208.3429"
## isThisaFit "TRUE"
## startVector.finalAsymptoteIntensityRatio "0.7374777"
## startVector.maximum "0.6159543"
## startVector.slope1Param "86.40114"
## startVector.midPoint1Param "0.5335278"
## startVector.slope2Param "179.7345"
## startVector.midPointDistanceParam "0.212152"
## dataScalingParameters.timeRange "24"
## dataScalingParameters.intensityMin "0.2342345"
## dataScalingParameters.intensityMax "4.134632"
## dataScalingParameters.intensityRange "3.900397"
## model "doublesigmoidal"
## additionalParameters "FALSE"
## finalAsymptoteIntensityRatio_Estimate "0.3196819"
## maximum_Estimate "4.091846"
## slope1Param_Estimate "1.113559"
## midPoint1Param_Estimate "7.063182"
## slope2Param_Estimate "1.056911"
## midPointDistanceParam_Estimate "8.024689"
## dataInputName "doubleSigmoidalSample"
## betterFit "5"
## correctFit "20"
## totalFit "40"
The calculated quantities can be grouped into several categories:
1. Information about the fitting process
- model: String indicating the type of the model, "sigmoidal" for the sigmoidal model and "doublesigmoidal" for the double-sigmoidal model.
- isThisaFit: A boolean that equals TRUE if at least one fit was successful and FALSE otherwise.
- betterFit: The number of times that the minimum AIC score improved with a subsequent fitting attempt. In other words, this counts how often an additional fit attempt improved on the best fit found so far.
- correctFit: The total number of successful fits.
- totalFit: The total number of fit attempts.
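As a small illustration of how these counters relate, the fraction of successful fit attempts can be computed from correctFit and totalFit. The sketch below uses the counts from the printed output above; the data frame fitCounts and the column successRate are hypothetical names introduced only for this example.

```r
# Counts copied from the printed output above
fitCounts <- data.frame(
  model      = c("sigmoidal", "doublesigmoidal"),
  correctFit = c(20, 20),
  totalFit   = c(28, 40)
)

# Fraction of fit attempts that converged to a valid fit
fitCounts$successRate <- fitCounts$correctFit / fitCounts$totalFit
fitCounts
```

In this run, the double-sigmoidal model needed more attempts to reach the same number of successful fits.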
2. Estimates of the fitted parameters
These estimates have been converted from the normalized scale back to the scale of the original raw data, and are the main quantities of interest to the user. Which estimates are reported depends on the type of the model, sigmoidal vs. double-sigmoidal.
Estimates for the sigmoidal model are:
- maximum_Estimate: Maximum intensity estimate for the raw data.
- slopeParam_Estimate: Slope parameter estimate for the raw data. Note that the slope parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
- midPoint_Estimate: Mid-point estimate (time at which the intensity reaches 1/2 of maximum) for the raw data.
Estimates for the double-sigmoidal model are:
- maximum_Estimate: Maximum intensity estimate for the raw data.
- slope1Param_Estimate: Slope 1 parameter estimate for the raw data. Note that the slope 1 parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
- midPoint1Param_Estimate: Mid-point 1 estimate (time at which the intensity reaches 1/2 of maximum) for the raw data. Needs numerical correction. See the vignette on additional parameters for details.
- slope2Param_Estimate: Slope 2 parameter estimate for the raw data. Note that the slope 2 parameter is related to but not equal to the slope. See the vignette on additional parameters for details.
- midPointDistanceParam_Estimate: Distance between mid-point 1 and mid-point 2, where mid-point 2 is the time at which intensity decreases to the mean between the final asymptote intensity and the maximum value. Needs numerical correction. See the vignette on additional parameters for details.
- finalAsymptoteIntensityRatio_Estimate: Ratio between the final asymptote intensity and the maximum intensity of the fitted curve.
3. Quantities measuring the overall quality of fit
- residual_Sum_of_Squares: Residual sum of squares. Smaller values indicate a better fit.
- log_likelihood: Log likelihood. Larger values indicate a better fit.
- AIC_value: Akaike Information Criterion. Smaller values indicate a better fit.
- BIC_value: Bayesian Information Criterion. Smaller values indicate a better fit.
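The relationship between these quantities can be checked by hand. The sketch below recomputes AIC and BIC from the printed log likelihoods using the standard definitions AIC = 2k - 2 log L and BIC = k log(n) - 2 log L. It assumes that k counts the fitted model parameters plus one for the residual variance; this assumption is not stated in the text but is consistent with the printed values. The helper functions aic() and bic() are defined here for illustration only.

```r
n <- length(seq(3, 24, 0.5))  # 43 data points, as in the example above

# Log likelihoods copied from the printed output
logLik_sigmoidal <- -3.38678
logLik_double    <- 117.3356

# Assumption: k = number of model parameters + 1 (residual variance)
k_sigmoidal <- 3 + 1
k_double    <- 6 + 1

# Standard definitions of the two information criteria
aic <- function(logLik, k) 2 * k - 2 * logLik
bic <- function(logLik, k, n) k * log(n) - 2 * logLik

aic_sigmoidal <- aic(logLik_sigmoidal, k_sigmoidal)     # close to printed 14.77356
bic_sigmoidal <- bic(logLik_sigmoidal, k_sigmoidal, n)  # close to printed 21.81836
aic_double    <- aic(logLik_double, k_double)           # close to printed -220.6713
bic_double    <- bic(logLik_double, k_double, n)        # close to printed -208.3429

# The model with the lower AIC is preferred
aic_double < aic_sigmoidal
```

For these data, both criteria strongly favor the double-sigmoidal model, as expected given how the data were generated.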
4. Start point of the gradient descent algorithm
Each time a fit is attempted, the likelihood maximization algorithm
starts from a random starting point and finds the final parameter
estimates by gradient descent. The start vector for the best fit is
returned in the form of variables whose names start with startVector.,
followed by the name of the estimated parameter. For example:
- startVector.maximum: Value of the maximum parameter at the starting point.
5. Parameters related to the normalization step
- dataScalingParameters.timeRange: Maximum of raw time data.
- dataScalingParameters.intensityMin: Minimum of raw intensity data.
- dataScalingParameters.intensityMax: Maximum of raw intensity data.
- dataScalingParameters.intensityRange: Maximum - Minimum of intensity data.
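These scaling parameters are what connect the normalized estimates to the raw-scale estimates. The sketch below applies the inverse of the normalization to the double-sigmoidal values from the printed output above. The conversion rules are inferred from the normalization formulas and agree with the printed numbers; they are not taken from the package source, and the variable names are chosen for this example only.

```r
# Scaling parameters copied from the printed output above
timeRange      <- 24
intensityMin   <- 0.2342345
intensityRange <- 3.900397

# Normalized estimates of the double-sigmoidal fit
maximum_N        <- 0.9890303
slope1Param_N    <- 26.72541
midPoint1Param_N <- 0.2942993
ratio_N          <- 0.2783729

# Intensities are rescaled and shifted, times are multiplied by timeRange,
# and slope parameters (inverse time units) are divided by timeRange
maximum_raw        <- maximum_N * intensityRange + intensityMin
midPoint1Param_raw <- midPoint1Param_N * timeRange
slope1Param_raw    <- slope1Param_N / timeRange

# The asymptote ratio is recomputed from the raw-scale intensities
finalAsymptote_raw <- ratio_N * maximum_N * intensityRange + intensityMin
ratio_raw <- finalAsymptote_raw / maximum_raw

round(c(maximum = maximum_raw,
        slope1Param = slope1Param_raw,
        midPoint1Param = midPoint1Param_raw,
        finalAsymptoteIntensityRatio = ratio_raw), 4)
```

The results agree, up to rounding, with the printed maximum_Estimate, slope1Param_Estimate, midPoint1Param_Estimate, and finalAsymptoteIntensityRatio_Estimate.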
6. Error estimates for fitted parameters
For each estimated parameter listed under point 2, the algorithm provides additional statistical parameters, such as the estimate in the normalized scale, the standard error (also in normalized scale), the t value, and the P value. For example, for the maximum estimate, these are:
- maximum_N_Estimate: Estimate in normalized scale.
- maximum_Std_Error: Standard error, in normalized scale.
- maximum_t_value: t value
- maximum_Pr_t: P value
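These statistics are related in the usual way: the t value is the normalized estimate divided by its standard error, and the P value follows from a two-sided t test. The sketch below uses the sigmoidal maximum from the printed output above. The residual degrees of freedom (number of data points minus number of fitted parameters) are an assumption made for this example, so the P value should be read as approximate.

```r
# Values copied from the sigmoidal output above
estimate <- 0.5817715   # maximum_N_Estimate
stdError <- 0.0471064   # maximum_Std_Error

# t value: estimate divided by its standard error
tValue <- estimate / stdError   # close to the printed maximum_t_value, 12.35016

# Assumption: degrees of freedom = 43 data points - 3 fitted parameters
df <- 43 - 3

# Two-sided P value from the t distribution
pValue <- 2 * pt(-abs(tValue), df = df)

c(tValue = tValue, pValue = pValue)
```

A large standard error relative to the estimate, as for slopeParam in the sigmoidal fit, gives a small t value and a large P value, which signals that the parameter is poorly constrained by the data.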