# Introduction

The R package qtlpoly (v. 0.2.1) (Pereira et al. 2020) is an under development software to map multiple quantitative trait loci (QTL) in full-sib families of outcrossing autopolyploid species. It is based on the following random-effect model:

$\boldsymbol{y} = \boldsymbol{1}\mu + \sum_{q=1}^Q \boldsymbol{g}_q + \boldsymbol{e}$

where the vector of phenotypic values from a specific trait $$\boldsymbol{y}$$ is a function of the fixed intercept $$\mu$$, the $$q = 1, \dots, Q$$ random QTL effects $$\boldsymbol{g}_q \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{G}_q\sigma^2_q)$$, and the random environmental error $$\boldsymbol{e} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}\sigma^2)$$. $$\boldsymbol{G}_q$$ is the additive relationship matrix between full-sibs derived from the genotype conditional probabilities of QTL $$q$$. See model details in Pereira et al. (2020).

Variance components associated with putative QTL ($$\sigma^2_q$$) are tested using score statistics from the R package varComp (v. 0.2-0) (Qu, Guennel, and Marshall 2013). Final models are fitted using residual maximum likelihood (REML) from the R package sommer (v. 3.6) (Covarrubias-Pazaran 2016). Plots for visualizing the results are based on ggplot2 (v. 3.1.0) (Wickham 2016).

This tutorial used R version 4.0.3 (2020-10-10) running on either Ubuntu 18.04.1 LTS (64-bit) or Windows 10.

## Install and load the qtlpoly package and data

As mentioned, the package qtlpoly depends on a couple of functions from sommer (v. 3.6) and varComp (v. 0.2-0). varComp has been archived from CRAN, while sommer has been constantly updated. In order to avoid conflict with updates in functions and object structures, we decided to stick with an earlier sommer version (v. 3.6), so you will need to downgrade if you have installed the most recent version of sommer.

qtlpoly package is available at GitHub. You can install all needed packages within R using the functions from the R package devtools:

> install.packages(c("devtools", "data.table"))
> devtools::install_url("https://cran.r-project.org/src/contrib/Archive/SPA3G/SPA3G_1.0.tar.gz")
> devtools::install_url("https://cran.r-project.org/src/contrib/Archive/varComp/varComp_0.2-0.tar.gz")
> devtools::install_version("sommer", version = "3.6", repos = "http://cran.us.r-project.org",
+     upgrade = FALSE)
> devtools::install_github("guilherme-pereira/qtlpoly", upgrade = FALSE)

Then, use the function library() – or require() – to load the package, and the function data() to preload the simulated linkage map and phenotypic data that comes with the package:

> library(qtlpoly)
> data(maps6x)
> data(pheno6x)

These simulated data were made simpler, so that one could run the functions along with this tutorial in a regular personal computer (with 4 cores and 6 GB of RAM, minimum). For real data analyses, you may need to run an R script in a cluster with more cores and RAM, though. In general, the computational needs depend on ploidy level, population size and number of markers.

The data was based on a full-sib family ($$N = 300$$) simulated from a cross between two parents from a putative autohexaploid species with three chromosomes. The simulated linkage map consists of three linkage groups (LGs) with 1,310 markers, spanning 274.76 centiMorgans (cM) in total. Please see details on the map construction for this simulated cross using mappoly here.

The phenotypic data contains three traits, and it is a simple table with traits in columns and individuals in rows. For example:

> head(pheno)
##              T32        T17        T45
## Ind_1 -0.1698446 -0.9332320 -1.2259338
## Ind_2  2.5319356  0.1997378 -1.8004184
## Ind_3  1.3669074  1.0584794 -0.7980037
## Ind_4  0.7955652 -1.7186921  1.5834176
## Ind_5  1.3168502 -0.7119421  1.3099067
## Ind_6 -0.8778211 -0.2339543 -0.6323779

Trait means were simulated as zero, i.e. $$\mu = 0$$ . Environmental variances were simulated from a standard normal distribution, i.e. $$\sigma^2_e = 1$$. QTL were simulated as described below:

• ‘T32’: 3 QTL, with heritabilities of 0.20 (LG 1 at 32.03 cM), 0.15 (LG 1 at 95.02 cM) and 0.30 (LG 2 at 40.01 cM), i.e. $$\sigma^2_q = \{0.60, 0.45, 0.80\}$$;
• ‘T17’: 1 QTL, with heritability of 0.17 (LG 3 at 34.51 cM), i.e. $$\sigma^2_q = 0.20$$;
• ‘T45’: no QTL.

# Conditional probabilities

Our qtlpoly package is fully integrated with mappoly (Mollinari and Garcia 2019), so that if you have a map built using another software, you will need to re-estimate it using functions from mappoly – or, at least, mimic the ‘mappoly.map’ object structure (good luck with that!). Please refer to its tutorial in order to learn more about the software.

To install mappoly, simply use the following command:

> devtools::install_github("mmollina/MAPpoly")

mappoly uses hidden Markov model (HMM) adapted from Lander and Green (1987) to autopolyploids to estimate the genotype conditional probabilities of putative QTL given an ordered set of markers comprising a linkage map (Mollinari and Garcia 2019). These conditional probabilities are ultimately used for QTL mapping by qtlpoly.

To compute the conditional probabilities for each linkage group, one should use the function calc_genoprob() from mappoly:

> library(mappoly)
> genoprob <- lapply(maps, calc_genoprob)
##  Ploidy level: 6
##  Number of markers: 538
##  Number of individuals: 300
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##
##  Ploidy level: 6
##  Number of markers: 329
##  Number of individuals: 300
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##
##  Ploidy level: 6
##  Number of markers: 443
##  Number of individuals: 300
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  ..................................................
##  

# Prepare data

Once the conditional probabilities have been calculated, you can use the function read_data() to read both ‘genoprob’ and ‘pheno’ objects. Row names (individuals) on ‘pheno’ must match the individual names from the ‘genoprob’ object. The argument ploidy = 6 informs the ploidy level of the cross, i.e. a hexaploid cross in this case. Finally, step = 1 provides the step size, so that only positions at every 1 cM (approximately) will be tested:

> data <- read_data(ploidy = 6, geno.prob = genoprob, pheno = pheno, step = 1)
## Reading the following data:
##   Ploidy level:       6
##   No. individuals:    300
##   No. linkage groups: 3
##   Step size:          1 cM
##   Map size:           274.76 cM (263 positions)
##   No. phenotypes:     3

If you want to see detailed information on the map (number of positions per linkage group) and the phenotypes (number of individuals with non-missing values per trait), use a print() function with detailed = TRUE:

> print(data, detailed = TRUE)
## This is an object of class 'qtlpoly.data'
##   Ploidy level:       6
##   No. individuals:    300
##   No. linkage groups: 3
##   Step size:          1 cM
##   Map size:           274.76 cM (263 positions)
##     LG 1: 116.39 cM (112 positions)
##     LG 2: 55.41 cM (52 positions)
##     LG 3: 102.95 cM (99 positions)
##   No. phenotypes:     3
##     Trait 1: 'T32' (300 individuals)
##     Trait 2: 'T17' (300 individuals)
##     Trait 3: 'T45' (300 individuals)

Notice that regardless of the number of markers per linkage group, we will rather evaluate only a limited number of positions, given the step argument (in our example, every 1 cM). Because of the quite large linkage disequilibrium blocks in mapping populations, specially in full-sib families with relatively small sample sizes (let us say, $$N < 300$$), there is no need to test every single marker in the map. Moreover, with such marker density (e.g., 5+ markers/cM), a lot of markers will contain redundant information for QTL mapping purposes anyway. step = NULL will allow to test every single marker in a high density map if one wants to do so, but be sure that computational time (and needed RAM for loading ‘qtlpoly.data’ object) will be hugely increased.

# Perform QTL detection

Building a multiple QTL model is considered a model selection problem, and there are several ways to approach it. Here, we adapted the algorithm proposed by Kao, Zeng, and Teasdale (1999) for fixed-effect multiple interval mapping (MIM) for diploids to our random-effect MIM (REMIM) for polyploids, which is summarized as follows:

1. Null model: for each trait, a model starts with no QTL $\boldsymbol{y} = \boldsymbol{1}\mu + \boldsymbol{e}$
2. Forward search: QTL ($$q = 1, \dots, Q$$) are added one at a time, conditional to the one(s) (if any) already in the model, under a less stringent threshold (e.g., $$P < 0.01$$) $\boldsymbol{y} = \boldsymbol{1}\mu + \sum_{q=1}^Q \boldsymbol{g}_q + \boldsymbol{e}$
3. Model optimization: each QTL $$r$$ is tested again conditional to the remaining one(s) in the model under a more stringent threshold (e.g., $$P < 10^{-4}$$) $\boldsymbol{y} = \boldsymbol{1}\mu + \boldsymbol{g}_r + \sum_{q \neq r} \boldsymbol{g}_q + \boldsymbol{e}$ Steps 1 and 2 are repeated until no more QTL can be added to or dropped from the model, and positions of the remaining QTL do not change. After the first model optimization, the following forward searches use the more stringent threshold (e.g., $$P < 10^{-4}$$) as the detection power is expected to increase once QTL have already been added to the model.
4. QTL profiling: score statistics for the whole genome are updated conditional to the final set of selected QTL.

It is worth to mention that there is no such definitive way to perform model selection. One may try to come up with alternative ways of building a multiple QTL model, some of which will be mentioned later. qtlpoly tries to provide functions flexible enough, so that the users can build a multiple QTL model on their own, manually. One strategy we have used the most and tested through simulations implements the above-mentioned algorithm of building a multiple QTL model, and will be demonstrated here.

## Add/drop QTL to/from the model

Suppose you have information (from previous work, for instance) that can help you to decide whether a QTL should be in the model or not. You can take advantage of the function modify_qtl() to either include or exclude a certain putative QTL from the model. For example, let us say we want to add a QTL to the trait 3 ‘T17’ model on LG 3 at 13.03 cM (position number 184), which has not reached the most stringent threshold when a second search was performed after (see the results from the step 2. Model optimization). We simply choose the pheno.col and the position where we want a QTL to be added, i.e. add.qtl = 184:

> modified.mod <- modify_qtl(model = profile.mod, pheno.col = 3, add.qtl = 184)
## Model modification for trait 3 'T45'; there are no QTL in the model
##   QTL was added on LG 3 at 20.01 cM (position number 184)

The function print() shows how the model has been modified:

> print(modified.mod)
## This is an object of class 'qtlpoly.modify' and it has not been profiled yet
##
## * Trait 1 'T32'
##   LG    Pos Nmrk   Mrk  Score      Pval
## 1  1  33.07   33 M_167 381.02 <2.22e-16
## 2  1 104.28  100 M_534 281.70  3.94e-08
## 3  2  41.20  152 M_903 548.00 <2.22e-16
##
## * Trait 2 'T17'
##   LG   Pos Nmrk    Mrk  Score     Pval
## 1  3 29.18  193 M_1156 264.14 1.32e-06
##
## * Trait 3 'T45'
##   LG   Pos Nmrk    Mrk  Score     Pval
## 1  3 20.01  184 M_1078 135.74 1.64e-03

Notice that since the model has not been profiled yet, statistics may change.

Now, suppose you want to drop the least significant QTL from the trait 1 ‘T32’ model. You would have to use drop.qtl = 100 instead:

> modified.mod2 <- modify_qtl(model = modified.mod, pheno.col = 1, drop.qtl = 100)
> print(modified.mod2)
## Model modification for trait 1 'T32'; there are 3 QTL in the model already
##   QTL was dropped from LG 1 at 104.28 cM (position number 100)
##
## This is an object of class 'qtlpoly.modify' and it has not been profiled yet
##
## * Trait 1 'T32'
##   LG   Pos Nmrk   Mrk  Score      Pval
## 1  1 33.07   33 M_167 381.02 <2.22e-16
## 2  2 41.20  152 M_903 548.00 <2.22e-16
##
## * Trait 2 'T17'
##   LG   Pos Nmrk    Mrk  Score     Pval
## 1  3 29.18  193 M_1156 264.14 1.32e-06
##
## * Trait 3 'T45'
##   LG   Pos Nmrk    Mrk  Score     Pval
## 1  3 20.01  184 M_1078 135.74 1.64e-03

Once modifications have been suggested, run the function profile() in order to obtain updated statistics:

> profile.mod2 <- profile_qtl(data = data, model = modified.mod2, d.sint = 1.5,
+     polygenes = FALSE, n.clusters = 4, plot = "profile2")
## INFO: Using 4 CPUs for calculation
##
## QTL profile for trait 1 'T32'; there are 2 QTL in the model
##   Profiling QTL ... 37 ... 152
##   Calculation took 409.78 seconds
##
## QTL profile for trait 2 'T17'; there is 1 QTL in the model
##   Profiling QTL ... 193
##   Calculation took 194.8 seconds
##
## QTL profile for trait 3 'T45'; there is 1 QTL in the model
##   Profiling QTL ... 184
##   Calculation took 194.95 seconds

Using print(), notice how positions and statistics may have changed:

> print(profile.mod2)
## This is an object of class 'qtlpoly.profile'
##
## * Trait 1 'T32'
##   LG   Pos Nmrk   Mrk  Score      Pval
## 1  1 37.21   37 M_180 430.03 <2.22e-16
## 2  2 41.20  152 M_903 542.08 <2.22e-16
##
## * Trait 2 'T17'
##   LG   Pos Nmrk    Mrk  Score     Pval
## 1  3 29.18  193 M_1156 264.14 1.32e-06
##
## * Trait 3 'T45'
##   LG   Pos Nmrk    Mrk  Score     Pval
## 1  3 20.01  184 M_1078 135.74 1.64e-03

It is worth to mention that a modification should be considered carefully, as we may not have enough statistical support to hold it. Nevertheless, modification might be interesting to study how much of the phenotypic variance an added QTL can explain, regardless of its $$P$$-value, for example.

## Plot profiles

Given the final profiled models, you can plot either individual or joint $$LOP$$ profiles using the function plot_profile(). For individual plots, one need to specify the pheno.col argument:

> for (p in remim.mod\$pheno.col) plot_profile(data = data, model = remim.mod,
+     pheno.col = p, ylim = c(0, 10))