In high-throughput scenarios, we often don’t know a priori whether a given dataset represents a sigmoidal curve, a double-sigmoidal curve, or neither. In these cases, we need to fit several models to the data and then determine which one fits best. This process is normally done automatically by the function sicegar::fitAndCategorize(). In this vignette, we describe how this process works and how it can be done manually.
We will demonstrate this process for an artificial dataset representing a double-sigmoidal function. We first generate the data:
time <- seq(3, 24, 0.5)
noise_parameter <- 0.2
intensity_noise <- runif(n = length(time), min = 0, max = 1) * noise_parameter
intensity <- doublesigmoidalFitFormula(time,
                                       finalAsymptoteIntensityRatio = .3,
                                       maximum = 4,
                                       slope1Param = 1,
                                       midPoint1Param = 7,
                                       slope2Param = 1,
                                       midPointDistanceParam = 8)
intensity <- intensity + intensity_noise
dataInput <- data.frame(time, intensity)
We need to fit the sigmoidal and double-sigmoidal models to this dataset, using multipleFitFunction(). This requires first normalizing the data:
normalizedInput <- normalizeData(dataInput = dataInput,
                                 dataInputName = "doubleSigmoidalSample")
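To give a sense of what normalization involves, here is a minimal base-R sketch of min-max rescaling to the unit interval. The helper `rescale_unit()` and the toy data are hypothetical and only illustrate the general idea; the exact transformation applied by normalizeData() may differ (for example, in how the time axis is scaled).

```r
# Hypothetical helper (not part of sicegar): min-max rescaling to [0, 1].
rescale_unit <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    stop("cannot rescale a constant vector")
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Toy data, invented for illustration only.
toy <- data.frame(time = c(3, 6, 9, 12),
                  intensity = c(0.1, 1.5, 3.9, 2.0))

# Rescale both columns so that each spans the interval [0, 1].
toy_scaled <- data.frame(time = rescale_unit(toy$time),
                         intensity = rescale_unit(toy$intensity))
```

Putting time and intensity on a common scale in this way means that thresholds used later in the decision process can be expressed relative to the data rather than in raw units.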
# Fit sigmoidal model
sigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
                                      model = "sigmoidal",
                                      n_runs_min = 20,
                                      n_runs_max = 500,
                                      showDetails = FALSE)
# Fit double-sigmoidal model
doubleSigmoidalModel <- multipleFitFunction(dataInput = normalizedInput,
                                            model = "doublesigmoidal",
                                            n_runs_min = 20,
                                            n_runs_max = 500,
                                            showDetails = FALSE)
We also need to perform the additional parameter calculations, as these are required by the categorize() function we use below.
# Calculate additional parameters for the sigmoidal model
sigmoidalModel <- parameterCalculation(sigmoidalModel)
# Calculate additional parameters for the double-sigmoidal model
doubleSigmoidalModel <- parameterCalculation(doubleSigmoidalModel)
This is what the two fits look like:
Clearly, the sigmoidal fit is not appropriate but the double-sigmoidal one is. Next we demonstrate how to arrive at this conclusion computationally, using the function categorize(). It takes as input the two fitted models as well as a number of parameters that are used in the decision process (explained below under “The decision process”).
# now we can categorize the fits
decisionProcess <- categorize(threshold_minimum_for_intensity_maximum = 0.3,
                              threshold_intensity_range = 0.1,
                              threshold_t0_max_int = 0.05,
                              parameterVectorSigmoidal = sigmoidalModel,
                              parameterVectorDoubleSigmoidal = doubleSigmoidalModel)
The object returned by categorize() contains extensive information about the decision process, but the key component is the decision variable. Here, it states that the data fits the double-sigmoidal model:
print(decisionProcess$decision)
## [1] "double_sigmoidal"
(The possible values here are “no_signal”, “sigmoidal”, “double_sigmoidal”, and “ambiguous”.)
The decision process consists of two parts. First, the categorize() function checks whether all provided input data are valid. The steps of this verification are as follows:
- Was the categorize() function provided with sigmoidal and double_sigmoidal models as input?
- Was the sigmoidal model generated by sicegar::sigmoidalFitFunctions?
- Was the double-sigmoidal model generated by sicegar::doublesigmoidalFitFunctions?
- Has the sigmoidal model been processed by sicegar::parameterCalculation()?
- Has the double-sigmoidal model been processed by sicegar::parameterCalculation()?
After these steps, the primary decision process begins. It takes a list of four possible outcomes (“no_signal”, “sigmoidal”, “double_sigmoidal”, “ambiguous”) and systematically removes options until only one remains.
First, the algorithm checks whether the provided data includes a signal.
- The maximum intensity in the data must be greater than threshold_minimum_for_intensity_maximum; otherwise, the data is labeled with "no_signal".
- The intensity range of the data must be greater than threshold_intensity_range; otherwise, the data is labeled with "no_signal".
- If the data has not been labeled with "no_signal" at this point, then the data cannot be labeled with "no_signal" anymore.
Next, the algorithm checks whether the sigmoidal and double-sigmoidal models make sense.
- If the sigmoidal fit was not successful, the data cannot be labeled with "sigmoidal".
- If the double-sigmoidal fit was not successful, the data cannot be labeled with "double_sigmoidal".
- The AIC of the sigmoidal model must be smaller than threshold_AIC; otherwise, the data cannot be labeled with "sigmoidal".
- The AIC of the double-sigmoidal model must be smaller than threshold_AIC; otherwise, the data cannot be labeled with "double_sigmoidal".
- startPoint_x for the sigmoidal model must be a positive number; otherwise, the data cannot be labeled with "sigmoidal".
- startPoint_x for the double-sigmoidal model must be a positive number; otherwise, the data cannot be labeled with "double_sigmoidal".
- start_intensity for the sigmoidal model must be smaller than threshold_t0_max_int; otherwise, the data cannot be labeled with "sigmoidal".
- start_intensity for the double-sigmoidal model must be smaller than threshold_t0_max_int; otherwise, the data cannot be labeled with "double_sigmoidal".
- For the double-sigmoidal model, the ratio of the intensity at the last time point to the maximum intensity must be smaller than threshold_dsm_tmax_IntensityRatio; otherwise, the data cannot be labeled with "double_sigmoidal".
- For the sigmoidal model, the ratio of the intensity at the last time point to the maximum intensity must be larger than threshold_sm_tmax_IntensityRatio; otherwise, the data cannot be labeled with "sigmoidal".
Next, the algorithm checks whether the data should be labeled as "ambiguous" or not.
- If the data can still be labeled with "sigmoidal" or "double_sigmoidal", then the data cannot be labeled with "ambiguous".
In the last step, the algorithm checks whether the data should be labeled as "sigmoidal" or "double_sigmoidal".
- If the data can still be labeled with both the "sigmoidal" and "double_sigmoidal" options, then the choice is made based on the AIC scores of those models and the value of threshold_bonus_sigmoidal_AIC. If sigmoidalAIC + threshold_bonus_sigmoidal_AIC < doublesigmoidalAIC, then the data cannot be labeled with "double_sigmoidal". If sigmoidalAIC + threshold_bonus_sigmoidal_AIC > doublesigmoidalAIC, then the data cannot be labeled with "sigmoidal".
The only option that is left at this point is the label of the data and thus the final decision.
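The elimination logic described above can be sketched in a few lines of base R. This is an illustrative simplification, not the code used by categorize(): the function `eliminate_labels()` and all of its arguments are hypothetical, the individual model checks are collapsed into two logical flags, and the AIC values in the example are invented.

```r
# Hypothetical sketch of the elimination logic (not sicegar's implementation).
# signal_present: did the data pass the "no_signal" checks?
# sm_ok, dsm_ok:  did the sigmoidal / double-sigmoidal model pass all checks?
# sm_aic, dsm_aic: AIC scores of the two fitted models.
# bonus_sigmoidal_aic: stands in for threshold_bonus_sigmoidal_AIC.
eliminate_labels <- function(signal_present, sm_ok, dsm_ok,
                             sm_aic, dsm_aic, bonus_sigmoidal_aic = 0) {
  options <- c("no_signal", "sigmoidal", "double_sigmoidal", "ambiguous")

  # Part 1: data without a signal is labeled immediately.
  if (!signal_present) {
    return("no_signal")
  }
  options <- setdiff(options, "no_signal")

  # Part 2: drop models that failed their checks.
  if (!sm_ok)  options <- setdiff(options, "sigmoidal")
  if (!dsm_ok) options <- setdiff(options, "double_sigmoidal")

  # Part 3: if at least one model survives, the data is not ambiguous.
  if (any(c("sigmoidal", "double_sigmoidal") %in% options)) {
    options <- setdiff(options, "ambiguous")
  }

  # Part 4: if both models survive, compare AIC scores.
  # An exact tie is not covered by the description and leaves both options.
  if (all(c("sigmoidal", "double_sigmoidal") %in% options)) {
    if (sm_aic + bonus_sigmoidal_aic < dsm_aic) {
      options <- setdiff(options, "double_sigmoidal")
    } else if (sm_aic + bonus_sigmoidal_aic > dsm_aic) {
      options <- setdiff(options, "sigmoidal")
    }
  }
  options
}

# Invented AIC values: the double-sigmoidal model has the lower (better) score.
example_label <- eliminate_labels(signal_present = TRUE,
                                  sm_ok = TRUE, dsm_ok = TRUE,
                                  sm_aic = -20, dsm_aic = -60)
```

Starting from the full list and removing options, rather than selecting a label directly, mirrors the description above: each check can only rule a label out, and the label that survives is the decision.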