Multivariable MR Tutorial
Wes Spiller, Eleanor Sanderson, and Jack Bowden
31^{st} January 2021
Source:vignettes/MVMR.rmd
MVMR.rmd
Overview
Multivariable Mendelian Randomisation (MVMR) is a form of
instrumental variable analysis which estimates the direct effect of
multiple exposures on an outcome using genetic variants as instruments.
The MVMR
R package facilitates estimation of causal effects
using MVMR, as well as including a range of sensitivity analyses
evaluating the underlying assumptions of the approach. The methods
included in MVMR
originate from Sanderson, Spiller, and Bowden (2021), available
here.
Step 1: Obtain summary data
The following information is necessary to estimate causal effects using MVMR:
Geneexposure associations for each variant selected as an instrument for any exposure.
Corresponding standard errors for the geneexposure associations.
Geneoutcome associations for each instrument.
Corresponding standard errors for the geneoutcome associations.
The data frame rawdat_mvmr
, included in the
MVMR
package shows an example of such data obtained from
MRBase. When data is extracted from MRBase or using the TwoSampleMR R
package, the mrmvinput_to_mvmr_format()
function can be
used to convert the data to the MVMR
data format. This
specifically requires geneexposure associations for all included SNPs,
geneoutcome associations, and corresponding standard errors. In this
case, lowdensity lipoprotein cholesterol (LDLC), highdensity
lipoprotein cholesterol (HDLC), and triglycerides (Trg) have been
selected as exposures, while systolic blood pressure (SBP) is the
outcome of interest. Here the suffix _beta
is used to
denote association estimates, while _se
denotes standard
errors. Please note that the MVMR
can take an arbirtary
number of exposures (greater than 1), and that three exposures have been
selected purely for illustration.
The first 6 rows of rawdat_mvmr
are:
library(MVMR)
head(rawdat_mvmr)
#> LDL_beta HDL_beta Trg_beta LDL_se HDL_se Trg_se SBP_beta SBP_se
#> 1 0.0270 0.0182 0.0228 0.0046 0.0050 0.0045 0.00426935 0.00280123
#> 2 0.1179 0.0020 0.0379 0.0038 0.0042 0.0039 0.00110389 0.00227690
#> 3 0.0269 0.0079 0.0000 0.0039 0.0042 0.0038 0.01317370 0.00222743
#> 4 0.0081 0.0508 0.0085 0.0064 0.0069 0.0062 0.00111303 0.00374264
#> 5 0.0191 0.0103 0.0270 0.0034 0.0036 0.0033 0.00854986 0.00207701
#> 6 0.0043 0.0320 0.0109 0.0038 0.0040 0.0037 0.00472509 0.00237133
#> SNP
#> 1 rs10019888
#> 2 rs10468017
#> 3 rs1047891
#> 4 rs10490626
#> 5 rs10761762
#> 6 rs10832962
Note that the final column SNP
contains the rsid numbers
for each genetic variant. These are not necessary for conducting MVMR,
but assist in followup analyses. Summary data for LDLC, HDLC, and
Triglycerides originate from GLGC, while SBP data
was obtained using UK
Biobank.
Estimating pairwise covariances between SNP associations
The MVMR approach requires pairwise covariances between an instrument and pairs of exposures to be known across all SNPs for testing and sensitivity analyses. However, this is often not reported in published GWAS analyses. Before continuing with MVMR it is therefore necessary to select one of the following three solutions:

Estimate the covariance terms using individual level data
If individual level data is available from which the GWAS summary estimates were obtained, the
snpcov_mvmr()
function can be used to calculate the necessary covariance terms. Estimate the covariance terms using phenotypic correlation between exposures. an estimate of the correlation between the (phenotypic) exposures is available, the
phenocov_mvmr()
function can be used to provide an approximation for the necessary covariance terms. This function takes the phenotypic correlation between the exposures and the standard error of the SNPexposure betas as inputs.
Obtain geneexposure associations from nonoverlapping samples.
If geneexposure associations are estimated in seperate nonoverlapping samples, then the covariances will be zero by design. It is therefore not necessary to calculate the set of covariances, although this approach can be difficult to apply due to a lack of suitable sources of data.
When the necessary data are provided to the
snpcov_mvmr()
or phenocov_mvmr()
functions, a
set of covariance matrices will be produced equal to the number of SNPs
used in estimation. By saving this output as an object, it is possible
to use this information in downstream senstivity analyses and assumption
testing. As the phenocov_mvmr()
function requires
geneexposure standard errors, it can be useful to estimate the
covariance matrices after initially formatiing the data. An illustrative
example is provided in step 6, creating an object Xcovmat
using the phenocov_mvmr()
function.
Step 2: Format summary data
Downstream functions in the MVMR
package rely upon prior
formatting of raw summary data using the format_mvmr()
function. Specifically, format_mvmr()
checks and organises
summary data columns for use in MVMR analyses. The
format_mvmr()
function takes the following arguments:

BXGs
: A subset containing betacoefficient values for genetic associations with each exposure. Columns should indicate exposure number, with rows representing estimates for a given genetic variant. 
BYG
: A numeric vector of betacoefficient values for genetic associations with the outcome. 
seBXGs
: A subset containing standard errors corresponding to the subset of betacoefficientsBXGs
. 
seBYG
: A numeric vector of standard errors corresponding to the betacoefficientsBYG
. 
RSID
: A vector of names for genetic variants included in the analysis. If variant IDs are not provided (RSID = "NULL"
), a vector of ID numbers will be generated.
Using the previous data rawdat.mvmr
, we can format the
data using the following command:
F.data < format_mvmr(BXGs = rawdat_mvmr[,c(1,2,3)],
BYG = rawdat_mvmr[,7],
seBXGs = rawdat_mvmr[,c(4,5,6)],
seBYG = rawdat_mvmr[,8],
RSID = rawdat_mvmr[,9])
head(F.data)
#> SNP betaYG sebetaYG betaX1 betaX2 betaX3 sebetaX1 sebetaX2
#> 1 rs10019888 0.00426935 0.00280123 0.0270 0.0182 0.0228 0.0046 0.0050
#> 2 rs10468017 0.00110389 0.00227690 0.1179 0.0020 0.0379 0.0038 0.0042
#> 3 rs1047891 0.01317370 0.00222743 0.0269 0.0079 0.0000 0.0039 0.0042
#> 4 rs10490626 0.00111303 0.00374264 0.0081 0.0508 0.0085 0.0064 0.0069
#> 5 rs10761762 0.00854986 0.00207701 0.0191 0.0103 0.0270 0.0034 0.0036
#> 6 rs10832962 0.00472509 0.00237133 0.0043 0.0320 0.0109 0.0038 0.0040
#> sebetaX3
#> 1 0.0045
#> 2 0.0039
#> 3 0.0038
#> 4 0.0062
#> 5 0.0033
#> 6 0.0037
In the above code we have provided the numbered columns for each
argument. For example, BXGs = rawdat.mvmr[,c(1,2,3)]
indicates that columns 1, 2, and 3 are the association estimates for
exposures 1, 2, and 3. It is important to note that standard error
columns seBXGs
should be input in the same order as BXGs to
ensure the correct matching of association estimates with corresponding
standard errors.
In subsequent steps, each exposure is numbered such that
X1
, X2
, and X3
are the first,
second, and third entries in the
BXGs = rawdat.mvmr[,c(1,2,3)]
argument.
Step 3: Test for weak instruments
In univariate twosample summary MR, genetic variants selected as instruments are required to be strongly associated with their corresponding exposure. This is quantified by regressing the exposure upon each instrument, and evaluating conditional dependence using the Fstatistic for the instrument. Conventionally, a Fstatistic greater than 10 is used as a threshold for sufficient instrument strength, representing a 10% relative bias towards the null in the twosample MR setting.
Multivariable MR relies upon an extension of this assumption, requiring instruments to be strongly associated with their corresponding exposure conditioning on the remaining included exposures. Conditional instrument strength is quantified by a conditional Fstatistic which has the same distribution as the univariate Fstatistic. Consequently, the same conventional instrument strength threshold of 10 can be used.
Further details are available here.
The strength_mvmr()
function is used to evaluate
instrument strength in the MVMR setting. The function contains two
arguments:

r_input
: A formatted data frame created using theformat_mvmr()
function or an object of classMRMVInput
from themr_mvinput()
function in the MendelianRandomization package. 
gencov
: A variancecovariance matrix for the effect of the genetic variants on each exposure. This is obtained from eithersnpcov_mvmr()
,phenocov_mvmr()
, or set to zero when omitted.
Note: The strength_mvmr()
function will
output a warning if a variancecovariance matrix is not provided. Please
see Step 1 for further information.
Continuing with the previous example, we can evaluate the conditional strength of the instruments for each exposure using the following command
sres < strength_mvmr(r_input = F.data, gencov = 0)
#> Warning in strength_mvmr(r_input = F.data, gencov = 0): Covariance between
#> effect of genetic variants on each exposure not specified. Fixing covariance at
#> 0.
#>
#> Conditional Fstatistics for instrument strength
#>
#> exposure1 exposure2 exposure3
#> Fstatistic 46.33671 67.80463 38.80184
In this case the set of instruments is sufficiently strong for MVMR
estimation using the conventional Fstatistic threshold of 10. However,
note that we have manually set mvmrcov
to zero, which would
likely not be appropriate given each SNPexposure estimate was obtained
from the same sample. Using a random phenotypic covariance matrix,
conditional Fstatistics can be calculated as
mvmrcovmatrix<matrix(c(1,0.1,0.05,0.1,1,0.2,0.05,0.2,1), nrow = 3, ncol = 3)
Xcovmat<phenocov_mvmr(mvmrcovmatrix,F.data[,7:9])
sres2 < strength_mvmr(r_input = F.data, gencov = Xcovmat)
#>
#> Conditional Fstatistics for instrument strength
#>
#> exposure1 exposure2 exposure3
#> Fstatistic 48.20993 69.55193 39.77326
Step 4: Test for horizontal pleiotropy using conventional Qstatistic estimation
Horizontal pleiotropy can be evaluated using a modified form of Cochran’s Q statistic with respect to differences in MVMR estimates across the set of instruments. In this case, observed heterogeneity is indicative of a violation of the exclusion restriction assumption in MR (validity), which can result in biased effect estimates.
Importantly, weak instruments can increase the false positive rate for pleiotropy detection, as heterogeneity in effect estimates due to weak instrument bias is conflated with heterogeneity as a result of pleiotropic bias. As a correction it is possible to estimate heterogeneity from pleiotropy through Qstatistic minimisation.
The function pleiotropy_mvmr()
can be used to test for
heterogeneity, requiring the same arguments as the
strength_mvmr()
; r_input
and
gencov
.
pres < pleiotropy_mvmr(r_input = F.data, gencov = 0)
#> Warning in pleiotropy_mvmr(r_input = F.data, gencov = 0): Covariance between
#> effect of genetic variants on each exposure not specified. Fixing covariance at
#> 0.
#> QStatistic for instrument validity:
#> 683.0807 on 141 DF , pvalue: 4.880403e72
And with the example covariance matrices from Step 3:
pres < pleiotropy_mvmr(r_input = F.data, gencov = Xcovmat)
#> QStatistic for instrument validity:
#> 682.843 on 141 DF , pvalue: 5.36533e72
Step 5: Estimate causal effects
Two MVMR estimation methods are provided in the MVMR
package. The first method fits an inverse variance weighted (IVW) MVMR
model, providing estimates of the direct effect of each exposure upon
the outcome. This is performed using the ivw_mvmr()
function as shown below:
res < ivw_mvmr(r_input = F.data)
#> Warning in ivw_mvmr(r_input = F.data): Covariance between effect of genetic
#> variants on each exposure not specified. Fixing covariance at 0.
#>
#> Multivariable MR
#>
#> Estimate Std. Error t value Pr(>t)
#> exposure1 0.021845061 0.01417255 1.541364 0.1254538
#> exposure2 0.003735249 0.01033779 0.361320 0.7183973
#> exposure3 0.025572042 0.01601913 1.596344 0.1126351
#>
#> Residual standard error: 2.197 on 142 degrees of freedom
In this case, the effect estimates are interpreted as the direct effects of LDLC (exposure 1), HDLC (exposure 2), and Trg (exposure 3) on SBP. Estimates are not robust to weak instruments of pleiotropic bias, and therefore rely upon the underlying MVMR assumptions being satisfied.
Step 6: Robust causal effect estimation.
Where the MVMR assumptions are potentially violated, specifically
where instruments are weak or exhibit pleiotropy, it is possible to
obtain more robust estimates through Qstatistic minimisation. This can
be performed using the qhet_mvmr()
function.
res1 < qhet_mvmr(F.data, mvmrcovmatrix, CI = F, iterations = 100)
#> Warning in qhet_mvmr(F.data, mvmrcovmatrix, CI = F, iterations = 100):
#> qhet_mvmr() is currently undergoing development.
res1
#> Effect Estimates
#> Exposure 1 0.0264865644
#> Exposure 2 0.0094372624
#> Exposure 3 0.0002575009
It is important to highlight that the phenotypic covariance matrix is
used as an input, and not the set of estimated covariance matrices which
previously formed the gencov
argument. It should also be
noted that as the number of exposure and instruments increases,
estimation using qhet_mvmr()
may prove difficult, owing to
the substantial amount of computing power required. This can be
initially relaxed by not computing 95% confidence intervals as
above.