This document demonstrates how to simulate genetic values for multiple traits in multiple environments based on an unstructured model for genotype-by-environment (GxE) interaction. While the simulation of genetic values is not directly implemented in ‘FieldSimR’, the package provides wrapper functions to facilitate the simulation of genetic values for multi-environment field trial settings employing the R package ‘AlphaSimR’. The two wrapper functions to simulate genetic values based on an unstructured model for GxE interaction are:
unstr_asr_input()
unstr_asr_output()
Note: ‘FieldSimR’ also provides wrapper functions to enable simulation of genetic values using a compound symmetry model for GxE interaction. This is demonstrated in the vignette on the Simulation of genetic values based on a compound symmetry model for genotype-by-environment (GxE) interaction.
The core function of ‘FieldSimR’ generates plot errors comprising 1) a spatially correlated error term, 2) a random error term, and 3) an extraneous error term. Spatially correlated errors are simulated using either bivariate interpolation, or a two-dimensional autoregressive process of order one (AR1:AR1). Through the combination of the plot errors with (simulated) genetic values, ‘FieldSimR’ enables simulation of multi-environment plant breeding trials at the plot. This is demonstrated in the vignette on the Simulation of plot errors and phenotypes in a plant breeding field trial.
We conceive a scenario in which 100 maize hybrids are measured for grain yield (t/ha) and plant height (cm) in three environments. The first and the third location include two replicates, and the third location includes three replicates.
The simulation process comprises three steps:
To provide a comprehensive overview of the unstructured modelling approach for GxE interaction, we assume additive and dominance gene action for both grain yield and plant height. Details on how ‘AlphaSimR’ simulates additive and non-additive biological effects can be found in the “Traits in AlphaSimR” vignette.
It should be noted, however, that a simple additive genetic model will be sufficient to answer most experimental questions and may be preferred to more complex models, especially if data to tune the simulation model is not available and the parameters are unknown.
First, we set the number of traits, the number of environments (e.g., locations), and the number of replicates tested within environments. We also define the number of genotypes in our founder population to be simulated, the number of chromosomes, and the number of segregating sites (biallelic QTN) per chromosome. Then, we set the additive genetic parameters, the dominance parameters, and the genetic correlation structures required to simulate the two traits in three environments based on an unstructured model for GxE interaction.
We create a founder population of 20 heterozygous genotypes. These founder genotypes will then be split into two heterotic pools, and one doubled haploid (DH) line will be produced from each founder. To generate hybrids, the two pools will be crossed using a factorial design.
ntraits <- 2 # Number of traits.
nenvs <- 3 # Number of environments.
nreps <- c(2, 2, 3) # Number of replicates tested within environments 1, 2 and 3.
nind <- 20 # Number of founder genotypes in the population.
nchr <- 10 # Number of chromosomes.
nseg_sites <- 200 # Number of QTN per chromosome.
We define mean additive genetic values for all trait x environment combinations. The additive mean values are provided in a single vector with environments nested within traits. Grain yield is measured in tons per hectare (t/ha) and plant height is measured in centimetres (cm).
mean <- c(4.9, 5.4, 5.1, 235.2, 228.5, 239.1) # c(Yld:E1, Yld:E2, Yld:E3, Prt:E1, Prt:E2, Prt:E3)
The genetic variances for the six trait x environment combinations can be set in two different ways.
mean
).T_var
) and a second vector of genetic variances for the
three environments (E_var
), assuming a separable
structure.We choose the first option and provide a single vector of variances.
var <- c(0.085, 0.12, 0.06, 15.1, 8.5, 11.7) # c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)
We assume mean dominance degrees of 0.4 for grain yield and 0.1 for plant height in all three environments, respectively. The dominance degree variance is set to 0.2 for both traits in the three environments.
meanDD <- c(0.4, 0.4, 0.4, 0.1, 0.1, 0.1) # c(Yld:E1, Yld:E2, Yld:E3, Pht:E1, Pht:E2, Pht:E3)
varDD <- 0.2 # Same value set for all environment-within-trait combinations
Additive and non-additive genetic correlations between all trait x environment combinations can be defined in two ways:
corA
).TcorA
) and a matrix of genetic correlations between
environments (EcorA
), assuming a separable correlation
structure.We choose the second option and assume a separable additive genetic correlation structure. This simplifies the generation of the additive correlation matrix, especially when the number of environments and/or traits is high.
TcorA <- matrix( # Matrix of additive genetic correlations between the two traits.
c(
1.0, 0.6,
0.6, 1.0
),
ncol = 2
)
#> [,1] [,2]
#> [1,] 1.0 0.6
#> [2,] 0.6 1.0
EcorA <- matrix(
c( # Matrix of additive genetic correlations between the three environments.
1.0, 0.4, 0.6,
0.4, 1.0, 0.5,
0.6, 0.5, 1.0
),
ncol = 3
)
#> [,1] [,2] [,3]
#> [1,] 1.0 0.4 0.6
#> [2,] 0.4 1.0 0.5
#> [3,] 0.6 0.5 1.0
Alternatively, the function rand_cor_mat()
provides a
quick means to generate random correlation matrices. The correlations
are sampled within a defined range of values.
corA <- rand_cor_mat( # Additive genetic correlation structure.
(ntraits * nenvs), # Could be used instead of TcorA and EcorA.
min.cor = 0.1,
max.cor = 0.9,
pos.def = TRUE
)
round(corA, 2)
#> 1 2 3 4 5 6
#> 1 1.00 0.20 0.80 0.74 0.74 0.38
#> 2 0.20 1.00 0.34 0.29 0.54 0.69
#> 3 0.80 0.34 1.00 0.27 0.75 0.57
#> 4 0.74 0.29 0.27 1.00 0.45 0.30
#> 5 0.74 0.54 0.75 0.45 1.00 0.24
#> 6 0.38 0.69 0.57 0.30 0.24 1.00
Furthermore, we assume independence between the dominance degrees of the six trait-by-environment combinations. Therefore, we generate a 6x6 diagonal matrix (although this is not strictly necessary. A diagonal matrix is constructed by default if no correlation matrix is provided).
corDD <- diag(6)
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,] 1 0 0 0 0 0
#> [2,] 0 1 0 0 0 0
#> [3,] 0 0 1 0 0 0
#> [4,] 0 0 0 1 0 0
#> [5,] 0 0 0 0 1 0
#> [6,] 0 0 0 0 0 1
Once we have defined all simulation parameters, we use the function
unstr_asr_input()
to prepare them into a list that is used
with ‘AlphaSimR’ to
simulate correlated genetic values based on an unstructured model for
GxE interaction.
input_asr <- unstr_asr_input(
ntraits = ntraits,
nenvs = nenvs,
mean = mean,
var = var,
TcorA = TcorA,
EcorA = EcorA,
meanDD = meanDD,
varDD = varDD,
corDD = corDD
)
Note: The object input_asr
should not
be modified and must be used directly with ‘AlphaSimR’ as
demonstrated below.
Our list of simulation parameters input_asr
is now used
with ‘AlphaSimR’ to
simulate correlated genetic values for 100 maize hybrid genotypes tested
for two traits in three environments based on an unstructured model for
GxE interaction.
First, we simulate a population of 20 heterozygous maize
founder genotypes using the function runMacs
in ‘AlphaSimR’.
founders <- runMacs( # Simulation of founder genotypes using AlphaSimR's "MAIZE" presets
nInd = nind, # to mimic the species' evolutionary history.
nChr = nchr,
segSites = nseg_sites,
inbred = FALSE,
species = "MAIZE",
nThreads = 2
)
SP <- SimParam$new(founders)
Then, we use the simulation parameters stored in
input_asr
to simulate correlated genetic values for
grain yield and plant height in three testing environments.
SP$addTraitAD( # Additive + dominance trait simulation.
nQtlPerChr = nseg_sites,
mean = input_asr$mean,
var = input_asr$var,
corA = input_asr$corA,
meanDD = input_asr$meanDD,
varDD = input_asr$varDD,
corDD = input_asr$corDD,
useVarA = FALSE
)
founders <- newPop(founders)
We now split the simulated founders into two heterotic pools A and B. We create one DH line per founder, which gives us 10 DH lines per heterotic pool. Hybrids are then generated by crossing pool A and pool B in a factorial manner (all pairwise combinations), resulting in 100 hybrid genotypes
pool_A <- makeDH(founders[1:10], nDH = 1) # Pool A: 1 DH line from founders 1 to 10, respectively.
pool_B <- makeDH(founders[11:20], nDH = 1) # Pool B: 1 DH line from founders 11 to 20, respectively.
dh_lines <- mergePops(list(pool_A, pool_B))
factorial_plan <- as.matrix(expand.grid(A = pool_A@id, B = pool_B@id)) # Factorial crossing plan.
hybrid_pop <- makeCross(pop = dh_lines, crossPlan = factorial_plan, nProgeny = 1) # Hybrid genotypes.
In the last step, we use the function unstr_asr_output()
to extract the simulated genetic values from the ‘AlphaSimR’
population object hybrid_pop
and store them in a data
frame.
gv_df <- unstr_asr_output(
pop = hybrid_pop,
ntraits = ntraits,
nenvs = nenvs,
nreps = nreps
)
#> env id rep gv.Trait1 gv.Trait2
#> 1 1 41 1 5.077413 237.2122
#> 2 1 42 1 5.049010 236.4975
#> 3 1 43 1 4.594107 232.5589
#> 4 1 44 1 4.462159 233.5065
#> 5 1 45 1 4.934760 240.1853
#> 6 1 46 1 4.775406 238.1134
Histogram showing the genetic values of the 100 maize hybrids for grain yield in the three environments
ggplot(gv_df, aes(x = gv.Trait1, fill = factor(env))) +
geom_histogram(color = "#e9ecef", alpha = 0.8, position = "identity", bins = 50) +
scale_fill_manual(values = c("violetred3", "goldenrod3", "skyblue2")) +
labs(x = "Genetic values for grain yield (t/ha)", y = "Count", fill = "Environment")
The simulated genetic values for grain yield and plant height measured at three environments can now be combined with plot errors to generate plant breeding trial phenotype plot data. We provide an example on how to do that in the vignette Simulation of plot errors and phenotypes in a plant breeding field trial.