Please, copy-and-paste the code below to an R script file in RStudio: `File > New File > R Script`

. See installing the `qtlpoly`

package section if you have not installed it yet.

```
## Load package and data
library(qtlpoly)
data("genoprob4x")
length(genoprob4x)
data("pheno4x")
head(pheno4x)
## Read data to an object
data <- read_data(ploidy = 4, geno.prob = genoprob4x, pheno = pheno4x, step = 1)
print(data, detailed = TRUE)
## Detect QTL. One needs to run the score-based resampling method before
## proceding. Change pheno.col = c(1:3) and increase n.clusters if you
## believe your computer can handle it
sig.fwd <- 0.0011493379 # 20% genome-wide significance level
sig.bwd <- 0.0002284465 # 5% genome-wide significance level
remim.mod <- remim(data = data, pheno.col = 1, w.size = 15, sig.fwd = sig.fwd,
sig.bwd = sig.bwd, d.sint = 1.5, n.clusters = 1)
print(remim.mod)
plot_profile(data = data, model = remim.mod, grid = TRUE)
plot_sint(data = data, model = remim.mod)
## Fit final QTL model
fitted.mod <- fit_model(data = data, model = remim.mod)
summary(fitted.mod)
plot_qtl(data = data, model = remim.mod, fitted = fitted.mod)
## Estimate allele effects
est.effects <- qtl_effects(ploidy = 4, fitted = fitted.mod)
plot(est.effects, p1 = "Atlantic", p2 = "B1829-5")
## Predict QTL-based breeding values
y.hat <- breeding_values(data = data, fitted = fitted.mod)
plot(y.hat)
## Run FEIM for comparison. One needs to run permutation before proceding
sig.lod <- c(5.68, 5.78, 5.6) # 5% genome-wide significance level
feim.mod <- feim(data = data, w.size = 15, sig.lod = sig.lod)
print(feim.mod)
plot_profile(data = data, model = feim.mod, grid = TRUE)
```

Quantitative trait loci (QTL) mapping studies main goal is to investigate the genetic architecture of traits of interest. Single- and multiple-QTL models have been available for inbred, diploid mapping populations for quite some time now (see Da Costa E Silva and Zeng (2010) for a comprehensive review). However, only recently these methods became available for outbred, polyploid mapping populations.

The R package `qtlpoly`

(v. 0.2.1) is an under development software to map multiple QTL in full-sib families of outcrossing, autopolyploid species (G. da S. Pereira, Gemenet, et al. 2020). It is based on the partition of the phenotypic variance (\(\sigma^2_p\)) into variance due to QTL (\(\sigma^2_q\)) in addition to the residual variance (\(\sigma^2_e\)) as follows:

\[\sigma^2_p = \sum_{q=1}^Q \boldsymbol{G}^{(i,i')}_q\sigma^2_q + \sigma^2_{e}\]

where \(\boldsymbol{G}^{(i,i')}_q\) is the additive relationship between full-sibs \(i\) and \(i'\), whose computation is based on the genotype conditional probabilities of QTL \(q\). See how to estimate these genotype probabilities using `mappoly`

in Mollinari et al. (2020).

Because we only need to estimate one parameter per QTL (the very variance component associated with it), it is relatively easy to look for additional QTL and add them to the variance component model, without ending up with an overparameterized model. A multiple-QTL model is known to have increased power when compared to a single-QTL model, with ability to detect minor or separate linked QTL (G. da S. Pereira, Gemenet, et al. 2020).

Variance components associated with putative QTL (\(\sigma^2_q\)) are tested using score statistics from the R package `varComp`

(v. 0.2-0) (Qu, Guennel, and Marshall 2013). Final models are fitted using residual maximum likelihood (REML) from the R package `sommer`

(v. 3.6) (Covarrubias-Pazaran 2016). Plots for visualizing the results are based on `ggplot2`

(v. 3.1.0) (Wickham 2016).

This tutorial was first developed for the Polyploid Tools Training Workshop (December 12-15, 2021), and tested in R version 4.0.3 (2020-10-10) running on either Ubuntu 18.04.1 LTS (64-bit) or Windows 10.

`qtlpoly`

packageIn order to run this tutorial, you will need to install the `qtlpoly`

package, which is available at GitHub. You can install all needed packages within R using the commands below:

```
install.packages("devtools")
devtools::install_url("https://cran.r-project.org/src/contrib/Archive/SPA3G/SPA3G_1.0.tar.gz")
devtools::install_url("https://cran.r-project.org/src/contrib/Archive/varComp/varComp_0.2-0.tar.gz")
devtools::install_version("sommer", version = "3.6", repos = "http://cran.us.r-project.org",
upgrade = FALSE)
devtools::install_github("guilherme-pereira/qtlpoly", upgrade = FALSE)
```

If you have installed `qtlpoly`

before, please make sure to run the last line of the installation process again, so that you will have all the last updates. Then, use the function `library()`

– or `require()`

– to load the package:

A cross between two potato (*Solanum tuberosum*, 2*n* = 4*x* = 48) cultivars, ‘Atlantic’ \(\times\) ‘B1829-5’, resulted in 156 full-sibs. The population has been phenotyped (4-year evaluation) and genotyped (8k SNP array), and analyses have been performed to call SNP dosage, build a genetic map and map QTL (G. da S. Pereira, Mollinari, et al. 2020).

For brevity’s sake, we have selected three phenotypes (foliage maturity evaluated in years 2007, 2008 and 2014, i.e. FM07, FM08 and FM14) and three linkage groups (LGs, namely 1, 5 and 7) for this demo. Foliage maturity is measured as the “area under the curve” of foliage color along the plant cycle. All analyses can be found at this GitHub page, though.

Use the function `data()`

to preload the genotype probabilities computed in the LGs 1, 5 and 7 (now called 1, 2 and 3, respectively) as well as the phenotypic data:

The `genoprob4x`

object contains the genotype probability along the LGs for each individual, reflecting all recombination events. For example, individuals 1 and 2 show the following patterns of genotype probabilities, and so will every other individual:

```
par(mfrow = c(3, 2), pty = "s", mar = c(1, 1, 1, 1))
for (i in 1:3) {
image(t(genoprob4x[[i]]$probs[, , 1]), axes = FALSE, ylab = paste("LG",
i), main = "Ind. 1")
image(t(genoprob4x[[i]]$probs[, , 2]), axes = FALSE, ylab = paste("LG",
i), main = "Ind. 2")
}
```

The `pheno4x`

object contains the phenotypic values of foliage maturity for years 2007, 2008 and 2014:

```
library(ggplot2)
ggplot() + geom_boxplot(data = stack(as.data.frame(pheno4x)), aes(x = ind, y = values,
color = ind)) + xlab("Trait") + theme(legend.position = "none")
```

In a region where a QTL exists, the more alleles a pair of individuals share, the more their phenotypes will look alike. In fact, this is the basis of QTL detection.

The function `read_data()`

reads both ‘genoprob’ and ‘pheno’ objects, together with other relevant information like the ploidy level (`ploidy = 4`

), and the step size (`step = 1`

) in which tests will be performed. A `print`

function provides detailed information on the data:

```
data <- read_data(ploidy = 4, geno.prob = genoprob4x, pheno = pheno4x, step = 1)
## Reading the following data:
## Ploidy level: 4
## No. individuals: 156
## No. linkage groups: 3
## Step size: 1 cM
## Map size: 438.17 cM (443 positions)
## No. phenotypes: 3
print(data, detailed = TRUE)
## This is an object of class 'qtlpoly.data'
## Ploidy level: 4
## No. individuals: 156
## No. linkage groups: 3
## Step size: 1 cM
## Map size: 438.17 cM (443 positions)
## LG 1: 205.88 cM (207 positions)
## LG 2: 106.2 cM (108 positions)
## LG 3: 126.09 cM (128 positions)
## No. phenotypes: 3
## Trait 1: 'FM07' (141 individuals)
## Trait 2: 'FM08' (143 individuals)
## Trait 3: 'FM14' (140 individuals)
```

A algorithm proposed for QTL model selection (Kao, Zeng, and Teasdale 1999) is available within the software, and it can be described as follows:

- Null model: for each trait, a model starts with no QTL \[\sigma^2_p = \sigma^2_{e}\]
- Forward search: QTL (\(q = 1, \dots, Q\)) are added one at a time, conditional to the one(s) (if any) already in the model, under a less stringent genome-wide significance level (e.g., \(\alpha < 0.20\)) \[\sigma^2_p = \sum_{q=1}^Q \boldsymbol{G}^{(i,i')}_q \sigma^2_q + \sigma^2_{e}\]
- Model optimization: each QTL \(r\) is tested again conditional to the remaining one(s) in the model under a more stringent genome-wide significance level (e.g., \(\alpha < 0.05\)) \[\sigma^2_p = \boldsymbol{G}^{(i,i')}_r \sigma^2_r + \sum_{q \neq r} \boldsymbol{G}^{(i,i')}_q \sigma^2_q + \sigma^2_{e}\] Steps 1 and 2 iterate until no more QTL can be added to or dropped from the model, and positions of the remaining QTL do not change. After the first model optimization round, the following forward searches use the more stringent threshold (e.g., \(\alpha < 0.05\)).
- QTL profiling: score statistics for the whole genome are updated conditional to the final set of selected QTL.

Rather than guessing pointwise significance levels for declaring QTL, you can use the score-based resampling method to assess the genome-wide significance level (Zou et al. 2004). This method is relatively intensive, because it involves score statistics computation for every position in the map repeated 1,000 times (resampling):

```
## data.sim <- simulate_qtl(data = data, mu = 0, h2.qtl = NULL, var.error = 1,
## n.sim = 1000, missing = TRUE, seed = 123)
## score.null <- null_model(data = data.sim$results, n.clusters = 6, plot = NULL)
min.pvl <- numeric(length(score.null$results))
for (p in 1:length(score.null$results)) {
min.pvl[p] <- score.null$results[[p]]$pval[which.max(score.null$results[[p]]$stat)]
}
quantile(sort(min.pvl), c(0.2, 0.05))
## 20% 5%
## 0.0011493379 0.0002284465
```

Therefore, we ran it in advance and learned that genome-wide significance levels of \(\alpha=0.20\) and \(\alpha=0.05\) match \(P < 0.0011493\) and \(P < 2.2844648\times 10^{-4}\), respectively, which will be used next. Remember you are supposed to run the resampling method for all the 12 linkage groups together (for our potato example).

Now, we are to use the `remim`

function for building our multiple QTL model. One should include the `data`

object, the window size (`w.size = 15`

), the value \(d\) that is decreased from the log of \(P\)-value to compute the support interval (`d.sint = 1.5`

for ~95%), and the number of cores to be used (`n.clusters = 6`

).

In case you have computed the ‘score.null’ object, both forward search (`sig.fwd = 0.20`

) and backward elimination (`sig.bwd = 0.05`

) will reflect the desired genome-wide significance levels (\(\alpha\)):

```
remim.mod <- remim(data = data, w.size = 15, score.null = score.null, sig.fwd = 0.2,
sig.bwd = 0.05, d.sint = 1.5, n.clusters = 6)
```

Otherwise, you can just include the pointwise significance level computed before based on our resampling method, which is the option we are going to use here:

```
remim.mod <- remim(data = data, w.size = 15, sig.fwd = 0.0011493379, sig.bwd = 0.0002284465,
d.sint = 1.5, n.clusters = 6)
## INFO: Using 6 CPUs for calculation
##
## REMIM for trait 1 'FM07'
## QTL was found on LG 2 at 26 cM (position number 234)
## QTL was found on LG 3 at 66 cM (position number 382)
## No more QTL were found. A putative QTL on LG 1 at 8 cM (position number 9) did not reach the threshold; its p-value was 0.04462
## Refining QTL positions ... 234 ... 382
## Profiling QTL ... 234 ... 382
## Calculation took 265.83 seconds
##
## REMIM for trait 2 'FM08'
## QTL was found on LG 2 at 27 cM (position number 235)
## QTL was found on LG 3 at 66 cM (position number 382)
## QTL was found on LG 1 at 9 cM (position number 10)
## No more QTL were found. A putative QTL on LG 3 at 110 cM (position number 426) did not reach the threshold; its p-value was 0.00419
## Refining QTL positions ... 236 ... 381 ... 10
## Profiling QTL ... 10 ... 235 ... 381
## Calculation took 462.09 seconds
##
## REMIM for trait 3 'FM14'
## QTL was found on LG 2 at 26 cM (position number 234)
## QTL was found on LG 3 at 66 cM (position number 382)
## No more QTL were found. A putative QTL on LG 3 at 2 cM (position number 318) did not reach the threshold; its p-value was 0.08978
## Refining QTL positions ... 226 ... 382
## Profiling QTL ... 226 ... 382
## Calculation took 186.64 seconds
```

Use `print()`

and a summary table for each trait will be shown:

```
print(remim.mod)
## This is an object of class 'qtlpoly.remim'
##
## * Trait 1 'FM07'
## LG Pos Nmrk Mrk Score Pval
## 1 2 26 234 loc_26 355.99 <2.22e-16
## 2 3 66 382 loc_66 167.43 8.98e-07
##
## * Trait 2 'FM08'
## LG Pos Nmrk Mrk Score Pval
## 1 1 9 10 loc_9 101.58 2.16e-04
## 2 2 27 235 loc_27 425.04 <2.22e-16
## 3 3 65 381 loc_65 104.85 2.00e-04
##
## * Trait 3 'FM14'
## LG Pos Nmrk Mrk Score Pval
## 1 2 18 226 loc_18 197.01 9.77e-08
## 2 3 66 382 loc_66 138.45 1.57e-05
```

Since support intervals have been calculated, you can print them as well by specifying the `sint`

argument properly:

```
print(remim.mod, sint = "lower")
## This is an object of class 'qtlpoly.remim'
##
## * Trait 1 'FM07'
## LG Pos_lower Nmrk_lower Mrk_lower Score_lower Pval_lower
## 1 2 18 226 loc_18 293.50 <2.22e-16
## 2 3 52 368 loc_52 129.17 2.50e-05
##
## * Trait 2 'FM08'
## LG Pos_lower Nmrk_lower Mrk_lower Score_lower Pval_lower
## 1 1 0 1 c2_51460 74.70 1.96e-03
## 2 2 0 208 c2_23776 283.91 <2.22e-16
## 3 3 35 351 loc_35 64.98 5.74e-03
##
## * Trait 3 'FM14'
## LG Pos_lower Nmrk_lower Mrk_lower Score_lower Pval_lower
## 1 2 0 208 c2_23776 170.29 5.87e-07
## 2 3 54 370 loc_54 97.91 4.43e-04
```

```
print(remim.mod, sint = "upper")
## This is an object of class 'qtlpoly.remim'
##
## * Trait 1 'FM07'
## LG Pos_upper Nmrk_upper Mrk_upper Score_upper Pval_upper
## 1 2 29 237 loc_29 321.09 <2.22e-16
## 2 3 70 386 loc_70 146.96 6.68e-06
##
## * Trait 2 'FM08'
## LG Pos_upper Nmrk_upper Mrk_upper Score_upper Pval_upper
## 1 1 32 33 loc_32 67.51 4.56e-03
## 2 2 41 249 loc_41 308.78 <2.22e-16
## 3 3 71 387 loc_71 69.15 3.99e-03
##
## * Trait 3 'FM14'
## LG Pos_upper Nmrk_upper Mrk_upper Score_upper Pval_upper
## 1 2 29 237 loc_29 169.67 2.37e-06
## 2 3 70 386 loc_70 109.05 1.82e-04
```

Given the final profiled models, you can plot either individual or joint \(LOP\) profiles using the function `plot_profile()`

. Triangles show where the mapped QTL peaks are located:

The argument `grid`

organizes the multiple plots as a grid if `TRUE`

, or superimposed profiles if `FALSE`

.

You can visualize the QTL distributed along the linkage map, together with their support intervals using the function `plot_sint()`

:

Once final models have been defined, one may use REML to estimate their parameters by using the function `fit_model()`

:

```
fitted.mod <- fit_model(data = data, model = remim.mod)
## There are 2 QTL in the model for trait 1 'FM07'. Fitting model... Done!
##
## There are 3 QTL in the model for trait 2 'FM08'. Fitting model... Done!
##
## There are 2 QTL in the model for trait 3 'FM14'. Fitting model... Done!
```

A `summary()`

function shows parameter estimates together with the QTL heritabilities computed from the variance component estimates:

```
summary(fitted.mod)
## This is an object of class 'qtlpoly.fitted'
##
## * Trait 1 'FM07'
## LG Pos Nmrk Intercept Var(g) Var(e) h2
## 1 2 26 234 NA 336.2883 NA 0.5769673
## 2 3 66 382 NA 72.76961 NA 0.1248503
## 3 NA NA NA 73.17467 NA 173.7972 0.7018176
##
## * Trait 2 'FM08'
## LG Pos Nmrk Intercept Var(g) Var(e) h2
## 1 1 9 10 NA 117.4037 NA 0.07430268
## 2 2 27 235 NA 1010.584 NA 0.6395801
## 3 3 65 381 NA 112.6918 NA 0.07132057
## 4 NA NA NA 79.20089 NA 339.3945 0.7852034
##
## * Trait 3 'FM14'
## LG Pos Nmrk Intercept Var(g) Var(e) h2
## 1 2 18 226 NA 7.123422 NA 0.2982218
## 2 3 66 382 NA 4.006484 NA 0.1677313
## 3 NA NA NA 90.51603 NA 12.75642 0.465953
```

Another way of visualization you can use is the one provided by the function `plot_qtl()`

:

Dots are located on the respective LG positions of the QTL peaks. Size of the dots corresponds to the specific QTL heritability. Color of the dots corresponds to the \(P\)-values, and helps to identify the most or the less significant QTL.

Additive effects contributing to the overall mean by each allele individually as well as their combinations within each parent may be computed using the function `qtl_effects()`

as follows:

```
est.effects <- qtl_effects(ploidy = 4, fitted = fitted.mod)
## There are 2 QTL in the model for trait 1 'FM07'. Computing effects for QTL ... 234 ... 382. Done!
##
## There are 3 QTL in the model for trait 2 'FM08'. Computing effects for QTL ... 10 ... 235 ... 381. Done!
##
## There are 2 QTL in the model for trait 3 'FM14'. Computing effects for QTL ... 226 ... 382. Done!
```

A `plot()`

function allows the user to visualize these contributions graphically: