Title: | Tidy Data and 'Geoms' for Bayesian Models |
---|---|
Description: | Compose data for and extract, manipulate, and visualize posterior draws from Bayesian models ('JAGS', 'Stan', 'rstanarm', 'brms', 'MCMCglmm', 'coda', ...) in a tidy data format. Functions are provided to help extract tidy data frames of draws from Bayesian models and that generate point summaries and intervals in a tidy format. In addition, 'ggplot2' 'geoms' and 'stats' are provided for common visualization primitives like points with multiple uncertainty intervals, eye plots (intervals plus densities), and fit curves with multiple, arbitrary uncertainty bands. |
Authors: | Matthew Kay [aut, cre], Timothy Mastny [ctb] |
Maintainer: | Matthew Kay <[email protected]> |
License: | GPL (>= 3) |
Version: | 3.0.7.9000 |
Built: | 2024-11-16 06:08:46 UTC |
Source: | https://github.com/mjskay/tidybayes |
tidybayes
is an R package that aims to make it easy to integrate
popular Bayesian modeling methods into a tidy data + ggplot workflow.
Tidy data frames (one observation per row) are particularly convenient for
use in a variety of R data manipulation and visualization packages (Wickham 2014).
However, when using Bayesian modeling functions like JAGS or Stan in R,
we often have to translate this data into a form the model understands,
and then after running the model, translate the resulting sample (or
predictions) into a more tidy format for use with other R functions.
tidybayes
aims to simplify these two common (often tedious)
operations. It also provides a variety of ggplot geometries aimed
at making the visualization of model output easier.
For a comprehensive overview of the package, see vignette("tidybayes")
.
For overviews aimed at the rstanarm
and brms
packages, see
vignette("tidy-rstanarm")
and vignette("tidy-brms")
. For an overview
of the majority of geoms in the ggdist/tidybayes family, see
vignette("slabinterval", package = "ggdist")
.
For a list of supported models, see tidybayes-models.
Maintainer: Matthew Kay [email protected]
Other contributors:
Timothy Mastny [email protected] [contributor]
Wickham, Hadley. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23. doi:10.18637/jss.v059.i10.
Useful links:
Report bugs at https://github.com/mjskay/tidybayes/issues
Add draws from a matrix of draws (usually draws from a predictive distribution) to a data frame in tidy format. This is
a generic version of add_predicted_draws()
that can be used with model types
that have their own prediction functions that are not yet supported by tidybayes.
add_draws(data, draws, value = ".value")
add_draws(data, draws, value = ".value")
data |
Data frame to add draws to, with M rows. |
draws |
N by M matrix of draws, with M columns corresponding to the M rows in |
value |
The name of the output column; default |
Given a data frame with M rows and an N by M matrix of N draws, adds a .row
, .draw
, and .value
column (or another name if value
is set) to data
, and expands data
into a long-format dataframe of draws.
add_epred_draws(df, m)
is roughly equivalent to add_draws(df, posterior_epred(m, newdata = df))
, except
that add_epred_draws
standardizes argument names and values across packages and has additional features for some
model types (like handling ordinal responses and distributional parameters in brms).
add_predicted_draws(df, m)
is roughly equivalent to add_draws(df, posterior_predict(m, newdata = df))
, except
that add_predicted_draws
standardizes argument names and values across packages.
A data frame (actually, a tibble) with a .row
column (a
factor grouping rows from the input data
), a .draw
column (a unique index corresponding to each draw
from the distribution), and a column with its name specified by the value
argument (default is .value
)
containing the values of draws from draws
. The data frame is grouped by all rows in data
plus the .row
column.
Matthew Kay
add_predicted_draws()
, add_draws()
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # plot posterior predictive intervals mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% # the line below is roughly equivalent to add_epred_draws(m_mpg), except # that it does not standardize arguments across model types. add_draws(posterior_epred(m_mpg, newdata = .)) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + stat_lineribbon(aes(y = .value), alpha = 0.25) + geom_point(data = mtcars) + scale_fill_brewer(palette = "Greys") ## End(Not run)
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # plot posterior predictive intervals mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% # the line below is roughly equivalent to add_epred_draws(m_mpg), except # that it does not standardize arguments across model types. add_draws(posterior_epred(m_mpg, newdata = .)) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + stat_lineribbon(aes(y = .value), alpha = 0.25) + geom_point(data = mtcars) + scale_fill_brewer(palette = "Greys") ## End(Not run)
Given a data frame and a model, adds draws from the linear/link-level predictor, the expectation of the posterior predictive, the posterior predictive, or the residuals of a model to the data frame in a long format.
add_epred_draws( newdata, object, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) epred_draws( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) ## Default S3 method: epred_draws( object, newdata, ..., value = ".epred", seed = NULL, category = NULL ) ## S3 method for class 'stanreg' epred_draws( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) ## S3 method for class 'brmsfit' epred_draws( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) add_linpred_draws( newdata, object, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL, n ) linpred_draws( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL, n, scale ) ## Default S3 method: linpred_draws( object, newdata, ..., value = ".linpred", seed = NULL, category = NULL ) ## S3 method for class 'stanreg' linpred_draws( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) ## S3 method for class 'brmsfit' linpred_draws( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) add_predicted_draws( newdata, object, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n ) predicted_draws( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n, prediction ) ## Default S3 method: predicted_draws( object, newdata, ..., value = ".prediction", seed = NULL, category = ".category" ) ## S3 method for class 'stanreg' predicted_draws( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category" ) ## S3 method for class 'brmsfit' predicted_draws( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category" ) add_residual_draws( newdata, object, ..., value = ".residual", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n ) residual_draws( object, newdata, ..., value = ".residual", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n, residual ) ## Default S3 method: residual_draws(object, newdata, ...) ## S3 method for class 'brmsfit' residual_draws( object, newdata, ..., value = ".residual", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category" )
add_epred_draws( newdata, object, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) epred_draws( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) ## Default S3 method: epred_draws( object, newdata, ..., value = ".epred", seed = NULL, category = NULL ) ## S3 method for class 'stanreg' epred_draws( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) ## S3 method for class 'brmsfit' epred_draws( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) add_linpred_draws( newdata, object, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL, n ) linpred_draws( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL, n, scale ) ## Default S3 method: linpred_draws( object, newdata, ..., value = ".linpred", seed = NULL, category = NULL ) ## S3 method for class 'stanreg' linpred_draws( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) ## S3 method for class 'brmsfit' linpred_draws( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", dpar = NULL ) add_predicted_draws( newdata, object, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n ) predicted_draws( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n, prediction ) ## Default S3 method: predicted_draws( object, newdata, ..., value = ".prediction", seed = NULL, category = ".category" ) ## S3 method for class 'stanreg' predicted_draws( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category" ) ## S3 method for class 'brmsfit' predicted_draws( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category" ) add_residual_draws( newdata, object, ..., value = ".residual", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n ) residual_draws( object, newdata, ..., value = ".residual", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category", n, residual ) ## Default S3 method: residual_draws(object, newdata, ...) ## S3 method for class 'brmsfit' residual_draws( object, newdata, ..., value = ".residual", ndraws = NULL, seed = NULL, re_formula = NULL, category = ".category" )
newdata |
Data frame to generate predictions from. |
object |
A supported Bayesian model fit that can provide fits and predictions. Supported models
are listed in the second section of tidybayes-models: Models Supporting Prediction. While other
functions in this package (like |
... |
Additional arguments passed to the underlying prediction method for the type of model given. |
value |
The name of the output column:
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
re_formula |
formula containing group-level effects to be considered in the prediction.
If |
category |
For some ordinal, multinomial, and multivariate models (notably, |
dpar |
For |
n |
(Deprecated). Use |
scale |
(Deprecated). Use the appropriate function ( |
prediction , residual
|
(Deprecated). Use |
Consider a model like:
This model has:
an outcome variable,
a response distribution, , having parameters
(with link function
) and
(with link function
)
a single predictor,
coefficients ,
,
, and
We fit this model to some observed data, , and predictors,
. Given new values of predictors,
,
supplied in the data frame
newdata
, the functions for posterior draws are
defined as follows:
add_predicted_draws()
adds draws from the posterior predictive distribution,
,
to the data.
It corresponds to
rstanarm::posterior_predict()
or brms::posterior_predict()
.
add_epred_draws()
adds draws from the expectation of the posterior predictive
distribution, aka the conditional expectation,
,
to the data.
It corresponds to
rstanarm::posterior_epred()
or brms::posterior_epred()
.
Not all models support this function.
add_linpred_draws()
adds draws from the posterior linear predictors to the data.
It corresponds to rstanarm::posterior_linpred()
or brms::posterior_linpred()
.
Depending on the model type and additional parameters passed, this may be:
The untransformed linear predictor, e.g.
=
.
This is returned by
add_linpred_draws(transform = FALSE)
for brms and rstanarm models.
It is analogous to type = "link"
in predict.glm()
.
The inverse-link transformed linear predictor, e.g.
=
.
This is returned by
add_linpred_draws(transform = TRUE)
for brms and rstanarm models.
It is analogous to type = "response"
in predict.glm()
.
NOTE: add_linpred_draws(transform = TRUE)
and add_epred_draws()
may be equivalent but
are not guaranteed to be. They are equivalent when the expectation of the response
distribution is equal to its first parameter, i.e. when . Many
distributions have this property (e.g. Normal distributions, Bernoulli distributions),
but not all. If you want the expectation of the posterior predictive, it is best to
use
add_epred_draws()
if available, and if not available, verify this property holds prior
to using add_linpred_draws()
.
add_residual_draws()
adds draws from residuals,
,
to the data.
It corresponds to
brms::residuals.brmsfit()
.
The corresponding functions without add_
as a prefix are alternate spellings
with the opposite order of the first two arguments: e.g. add_predicted_draws(newdata, object)
versus predicted_draws(object, newdata)
. This facilitates use in data
processing pipelines that start either with a data frame or a model.
Given equal choice between the two, the spellings prefixed with add_
are preferred.
A data frame (actually, a tibble) with a .row
column (a
factor grouping rows from the input newdata
), .chain
column (the chain
each draw came from, or NA
if the model does not provide chain information),
.iteration
column (the iteration the draw came from, or NA
if the model does
not provide iteration information), and a .draw
column (a unique index corresponding to each draw
from the distribution). In addition, epred_draws
includes a column with its name specified by
the epred
argument (default ".epred"
); linpred_draws
includes a column with its name
specified by the linpred
argument (default ".linpred"
), and
predicted_draws
contains a column with its name specified by the .prediction
argument (default
".prediction"
). For convenience, the resulting data frame comes grouped by the original input rows.
Matthew Kay
add_draws()
for the variant of these functions for use with packages that do not have
explicit support for these functions yet. See spread_draws()
for manipulating posteriors directly.
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # draw 100 lines from the posterior means and overplot them mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% # NOTE: only use ndraws here when making spaghetti plots; for # plotting intervals it is always best to use all draws (omit ndraws) add_epred_draws(m_mpg, ndraws = 100) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) + geom_point(data = mtcars) # plot posterior predictive intervals mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_predicted_draws(m_mpg) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + stat_lineribbon(aes(y = .prediction), .width = c(.99, .95, .8, .5), alpha = 0.25) + geom_point(data = mtcars) + scale_fill_brewer(palette = "Greys") ## End(Not run)
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # draw 100 lines from the posterior means and overplot them mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% # NOTE: only use ndraws here when making spaghetti plots; for # plotting intervals it is always best to use all draws (omit ndraws) add_epred_draws(m_mpg, ndraws = 100) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) + geom_point(data = mtcars) # plot posterior predictive intervals mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_predicted_draws(m_mpg) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + stat_lineribbon(aes(y = .prediction), .width = c(.99, .95, .8, .5), alpha = 0.25) + geom_point(data = mtcars) + scale_fill_brewer(palette = "Greys") ## End(Not run)
rvar
s for the linear predictor, posterior expectation, posterior predictive, or residuals of a model to a data frameGiven a data frame and a model, adds rvar
s of draws from the linear/link-level predictor,
the expectation of the posterior predictive, or the posterior predictive to
the data frame.
add_epred_rvars( newdata, object, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) epred_rvars( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## Default S3 method: epred_rvars( object, newdata, ..., value = ".epred", seed = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'stanreg' epred_rvars( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'brmsfit' epred_rvars( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) add_linpred_rvars( newdata, object, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) linpred_rvars( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## Default S3 method: linpred_rvars( object, newdata, ..., value = ".linpred", seed = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'stanreg' linpred_rvars( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'brmsfit' linpred_rvars( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) add_predicted_rvars( newdata, object, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL ) predicted_rvars( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL ) ## Default S3 method: predicted_rvars( object, newdata, ..., value = ".prediction", seed = NULL, columns_to = NULL ) ## S3 method for class 'stanreg' predicted_rvars( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL ) ## S3 method for class 'brmsfit' predicted_rvars( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL )
add_epred_rvars( newdata, object, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) epred_rvars( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## Default S3 method: epred_rvars( object, newdata, ..., value = ".epred", seed = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'stanreg' epred_rvars( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'brmsfit' epred_rvars( object, newdata, ..., value = ".epred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) add_linpred_rvars( newdata, object, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) linpred_rvars( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## Default S3 method: linpred_rvars( object, newdata, ..., value = ".linpred", seed = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'stanreg' linpred_rvars( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) ## S3 method for class 'brmsfit' linpred_rvars( object, newdata, ..., value = ".linpred", ndraws = NULL, seed = NULL, re_formula = NULL, dpar = NULL, columns_to = NULL ) add_predicted_rvars( newdata, object, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL ) predicted_rvars( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL ) ## Default S3 method: predicted_rvars( object, newdata, ..., value = ".prediction", seed = NULL, columns_to = NULL ) ## S3 method for class 'stanreg' predicted_rvars( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL ) ## S3 method for class 'brmsfit' predicted_rvars( object, newdata, ..., value = ".prediction", ndraws = NULL, seed = NULL, re_formula = NULL, columns_to = NULL )
newdata |
Data frame to generate predictions from. |
object |
A supported Bayesian model fit that can provide fits and predictions. Supported models
are listed in the second section of tidybayes-models: Models Supporting Prediction. While other
functions in this package (like |
... |
Additional arguments passed to the underlying prediction method for the type of model given. |
value |
The name of the output column:
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
re_formula |
formula containing group-level effects to be considered in the prediction.
If |
dpar |
For |
columns_to |
For some models, such as ordinal, multinomial, and multivariate models (notably, |
Consider a model like:
This model has:
an outcome variable,
a response distribution, , having parameters
(with link function
) and
(with link function
)
a single predictor,
coefficients ,
,
, and
We fit this model to some observed data, , and predictors,
. Given new values of predictors,
,
supplied in the data frame
newdata
, the functions for posterior draws are
defined as follows:
add_predicted_rvars()
adds rvar
s containing draws from the posterior predictive distribution,
,
to the data.
It corresponds to
rstanarm::posterior_predict()
or brms::posterior_predict()
.
add_epred_rvars()
adds rvar
s containing draws from the expectation of the posterior predictive
distribution, aka the conditional expectation,
,
to the data.
It corresponds to
rstanarm::posterior_epred()
or brms::posterior_epred()
.
Not all models support this function.
add_linpred_rvars()
adds rvar
s containing draws from the posterior linear predictors to the data.
It corresponds to rstanarm::posterior_linpred()
or brms::posterior_linpred()
.
Depending on the model type and additional parameters passed, this may be:
The untransformed linear predictor, e.g.
=
.
This is returned by
add_linpred_rvars(transform = FALSE)
for brms and rstanarm models.
It is analogous to type = "link"
in predict.glm()
.
The inverse-link transformed linear predictor, e.g.
=
.
This is returned by
add_linpred_rvars(transform = TRUE)
for brms and rstanarm models.
It is analogous to type = "response"
in predict.glm()
.
NOTE: add_linpred_rvars(transform = TRUE)
and add_epred_rvars()
may be equivalent but
are not guaranteed to be. They are equivalent when the expectation of the response
distribution is equal to its first parameter, i.e. when . Many
distributions have this property (e.g. Normal distributions, Bernoulli distributions),
but not all. If you want the expectation of the posterior predictive, it is best to
use
add_epred_rvars()
if available, and if not available, verify this property holds prior
to using add_linpred_rvars()
.
The corresponding functions without add_
as a prefix are alternate spellings
with the opposite order of the first two arguments: e.g. add_predicted_rvars(newdata, object)
versus predicted_rvars(object, newdata)
. This facilitates use in data
processing pipelines that start either with a data frame or a model.
Given equal choice between the two, the spellings prefixed with add_
are preferred.
A data frame (actually, a tibble) equal to the input newdata
with
additional columns added containing rvar
s representing the requested predictions or fits.
Matthew Kay
add_predicted_draws()
for the analogous functions that use a long-data-frame-of-draws
format instead of a data-frame-of-rvar
s format. See spread_rvars()
for manipulating posteriors directly.
## Not run: library(ggplot2) library(dplyr) library(posterior) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, family = lognormal(), # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # Look at mean predictions for some cars (epred) and compare to # the exponeniated mu parameter of the lognormal distribution (linpred). # Notice how they are NOT the same. This is because exp(mu) for a # lognormal distribution is equal to its median, not its mean. mtcars %>% select(hp, cyl, mpg) %>% add_epred_rvars(m_mpg) %>% add_linpred_rvars(m_mpg, value = "mu") %>% mutate(expmu = exp(mu), .epred - expmu) # plot intervals around conditional means (epred_rvars) mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_epred_rvars(m_mpg) %>% ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) + stat_lineribbon(aes(dist = .epred), .width = c(.95, .8, .5), alpha = 1/3) + geom_point(aes(y = mpg), data = mtcars) + scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Set2") # plot posterior predictive intervals (predicted_rvars) mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_predicted_rvars(m_mpg) %>% ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) + stat_lineribbon(aes(dist = .prediction), .width = c(.95, .8, .5), alpha = 1/3) + geom_point(aes(y = mpg), data = mtcars) + scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Set2") ## End(Not run)
## Not run: library(ggplot2) library(dplyr) library(posterior) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, family = lognormal(), # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # Look at mean predictions for some cars (epred) and compare to # the exponeniated mu parameter of the lognormal distribution (linpred). # Notice how they are NOT the same. This is because exp(mu) for a # lognormal distribution is equal to its median, not its mean. mtcars %>% select(hp, cyl, mpg) %>% add_epred_rvars(m_mpg) %>% add_linpred_rvars(m_mpg, value = "mu") %>% mutate(expmu = exp(mu), .epred - expmu) # plot intervals around conditional means (epred_rvars) mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_epred_rvars(m_mpg) %>% ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) + stat_lineribbon(aes(dist = .epred), .width = c(.95, .8, .5), alpha = 1/3) + geom_point(aes(y = mpg), data = mtcars) + scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Set2") # plot posterior predictive intervals (predicted_rvars) mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_predicted_rvars(m_mpg) %>% ggplot(aes(x = hp, color = ordered(cyl), fill = ordered(cyl))) + stat_lineribbon(aes(dist = .prediction), .width = c(.95, .8, .5), alpha = 1/3) + geom_point(aes(y = mpg), data = mtcars) + scale_color_brewer(palette = "Dark2") + scale_fill_brewer(palette = "Set2") ## End(Not run)
Combines the chain and iteration columns of a tidy data frame of draws from a Bayesian model fit into a new column
that can uniquely identify each draw. Generally speaking not needed for pure tidybayes code, as tidybayes
functions now automatically include a .draw
column, but can be useful when interacting with packages that
do not provide such a column.
combine_chains(data, chain = .chain, iteration = .iteration, into = ".draw")
combine_chains(data, chain = .chain, iteration = .iteration, into = ".draw")
data |
Tidy data frame of draws with columns representing the chain and iteration of each draw. |
chain |
Bare name of column in |
iteration |
Bare name of column in |
into |
Name (as a character vector) of the column to combine chains into. The default, |
A data frame of tidy draws with a combined iteration column
Matthew Kay
library(magrittr) library(coda) data(line, package = "coda") # The `line` posterior has two chains with 200 iterations each: line %>% tidy_draws() %>% summary() # combine_chains combines the chain and iteration column into the .draw column. line %>% tidy_draws() %>% combine_chains() %>% summary()
library(magrittr) library(coda) data(line, package = "coda") # The `line` posterior has two chains with 200 iterations each: line %>% tidy_draws() %>% summary() # combine_chains combines the chain and iteration column into the .draw column. line %>% tidy_draws() %>% combine_chains() %>% summary()
Given posterior draws from a Bayesian model in long format (e.g. as
returned by spread_draws()
), compare the value of a variable in those draws
across different paired combinations of levels of a factor.
compare_levels( data, variable, by, fun = `-`, comparison = "default", draw_indices = c(".chain", ".iteration", ".draw"), ignore_groups = ".row" )
compare_levels( data, variable, by, fun = `-`, comparison = "default", draw_indices = c(".chain", ".iteration", ".draw"), ignore_groups = ".row" )
data |
Long-format |
variable |
Bare (unquoted) name of a column in data representing the
variable to compare across levels. Can be a numeric variable (as in
long-data-frame-of-draws format) or a |
by |
Bare (unquoted) name of a column in data that is a
|
fun |
Binary function to use for comparison. For each pair of levels of
|
comparison |
One of (a) the comparison types |
draw_indices |
Character vector of column names that should be treated
as indices of draws. Operations are done within combinations of these values.
The default is |
ignore_groups |
character vector of names of groups to ignore by
default in the input grouping. This is primarily provided to make it
easier to pipe output of |
This function simplifies conducting comparisons across levels of some
variable in a tidy data frame of draws. It applies fun
to all
values of variable
for each pair of levels of by
as selected
by comparison
. By default, all pairwise comparisons are generated if
by
is an unordered factor
and ordered comparisons are made if
by
is ordered
.
The included comparison
types are:
ordered
:
compare each level i
with level i - 1
; e.g. fun(i, i - 1)
pairwise
: compare each level of by
with every other
level.
control
: compare each level of by
with the first
level of by
. If you wish to compare with a different level, you can
first apply relevel()
to by
to set the control
(reference) level.
default
: use ordered
if
is.ordered(by)
and pairwise
otherwise.
A data.frame
with the same columns as data
, except
that the by
column contains a symbolic representation of the
comparison of pairs of levels of by
in data
, and
variable
contains the result of that comparison.
Matthew Kay
emmeans_comparison()
to use emmeans
-style contrast methods with
compare_levels()
.
library(dplyr) library(ggplot2) data(RankCorr, package = "ggdist") # Let's do all pairwise comparisons of b[i,1]: RankCorr %>% spread_draws(b[i,j]) %>% filter(j == 1) %>% compare_levels(b, by = i) %>% median_qi() # Or let's plot all comparisons against the first level (control): RankCorr %>% spread_draws(b[i,j]) %>% filter(j == 1) %>% compare_levels(b, by = i, comparison = control) %>% ggplot(aes(x = b, y = i)) + stat_halfeye() # Or let's plot comparisons of all levels of j within # all levels of i RankCorr %>% spread_draws(b[i,j]) %>% group_by(i) %>% compare_levels(b, by = j) %>% ggplot(aes(x = b, y = j)) + stat_halfeye() + facet_grid(cols = vars(i))
library(dplyr) library(ggplot2) data(RankCorr, package = "ggdist") # Let's do all pairwise comparisons of b[i,1]: RankCorr %>% spread_draws(b[i,j]) %>% filter(j == 1) %>% compare_levels(b, by = i) %>% median_qi() # Or let's plot all comparisons against the first level (control): RankCorr %>% spread_draws(b[i,j]) %>% filter(j == 1) %>% compare_levels(b, by = i, comparison = control) %>% ggplot(aes(x = b, y = i)) + stat_halfeye() # Or let's plot comparisons of all levels of j within # all levels of i RankCorr %>% spread_draws(b[i,j]) %>% group_by(i) %>% compare_levels(b, by = j) %>% ggplot(aes(x = b, y = j)) + stat_halfeye() + facet_grid(cols = vars(i))
Compose data into a list suitable to be passed into a Bayesian model (JAGS, BUGS, Stan, etc).
compose_data(..., .n_name = n_prefix("n"))
compose_data(..., .n_name = n_prefix("n"))
... |
Data to be composed into a list suitable for being passed into
Stan, JAGS, etc. Named arguments will have their name used as the |
.n_name |
A function that is used to form dimension index variables (a variable
whose value is number of levels in a factor or the length of a data frame in
|
This function recursively translates each argument into list elements using
as_data_list()
, merging all resulting lists together. By
default this means that:
numerics are included as-is.
logicals are translated into numeric using as.numeric()
.
factors are translated into numeric using as.numeric()
,
and an additional element named .n_name(argument_name)
is added
with the number of levels in the factor. The default .n_name
function prefixes "n_"
before the factor name; e.g. a factor
named foo
will have an element named n_foo
added containing
the number of levels in foo
.
character vectors are converted into factors then translated into numeric in the same manner as factors are.
lists are translated by translating all elements of the list (recursively) and adding them to the result.
data.frames are translated by translating every column of the data.frame
and adding them to the result. A variable named "n"
(or
.n_name(argument_name)
if the data.frame is passed as a named
argument argument_name
) is also added containing the number of rows
in the data frame.
NULL
values are dropped. Setting a named argument to NULL
can be used to drop that item from the resulting list (if an unwanted
element was added to the list by a previous argument, such as a column
from a data frame that is not needed in the model).
all other types are dropped (and a warning given)
As in functions like dplyr::mutate()
, each expression is evaluated in an
environment containing the data list built up so far.
For example, this means that if the first argument to compose_data
is a data frame, subsequent arguments can include direct references to columns
from that data frame. This allows you, for example, to easily use
x_at_y()
to generate indices for nested models.
If you wish to add support for additional types not described above,
provide an implementation of as_data_list()
for the type. See
the implementations of as_data_list.numeric
,
as_data_list.logical
, etc for examples.
A list where each element is a translated variable as described above.
Matthew Kay
x_at_y()
, spread_draws()
,
gather_draws()
.
library(magrittr) df = data.frame( plot = factor(paste0("p", rep(1:8, times = 2))), site = factor(paste0("s", rep(1:4, each = 2, times = 2))) ) # without changing `.n_name`, compose_data() will prefix indices # with "n" by default df %>% compose_data() # you can use n_prefix() to define a different prefix (e.g. "N"): df %>% compose_data(.n_name = n_prefix("N")) # If you have nesting, you may want a nested index, which can be generated using x_at_y() # Here, site[p] will give the site for plot p df %>% compose_data(site = x_at_y(site, plot))
library(magrittr) df = data.frame( plot = factor(paste0("p", rep(1:8, times = 2))), site = factor(paste0("s", rep(1:4, each = 2, times = 2))) ) # without changing `.n_name`, compose_data() will prefix indices # with "n" by default df %>% compose_data() # you can use n_prefix() to define a different prefix (e.g. "N"): df %>% compose_data(.n_name = n_prefix("N")) # If you have nesting, you may want a nested index, which can be generated using x_at_y() # Here, site[p] will give the site for plot p df %>% compose_data(site = x_at_y(site, plot))
Functions used by compose_data()
to create lists of data suitable for
input into a Bayesian modeling function. These functions typically should not be called directly
(instead use compose_data()
), but are exposed for the rare cases in which
you may need to provide your own conversion routines for a data type not already
supported (see Details).
data_list(...) as_data_list(object, name = "", ...) ## Default S3 method: as_data_list(object, name = "", ...) ## S3 method for class 'numeric' as_data_list(object, name = "", scalar_as_array = FALSE, ...) ## S3 method for class 'logical' as_data_list(object, name = "", ...) ## S3 method for class 'factor' as_data_list(object, name = "", .n_name = n_prefix("n"), ...) ## S3 method for class 'character' as_data_list(object, name = "", ...) ## S3 method for class 'list' as_data_list(object, name = "", ...) ## S3 method for class 'data.frame' as_data_list(object, name = "", .n_name = n_prefix("n"), ...) ## S3 method for class 'data_list' as_data_list(object, name = "", ...)
data_list(...) as_data_list(object, name = "", ...) ## Default S3 method: as_data_list(object, name = "", ...) ## S3 method for class 'numeric' as_data_list(object, name = "", scalar_as_array = FALSE, ...) ## S3 method for class 'logical' as_data_list(object, name = "", ...) ## S3 method for class 'factor' as_data_list(object, name = "", .n_name = n_prefix("n"), ...) ## S3 method for class 'character' as_data_list(object, name = "", ...) ## S3 method for class 'list' as_data_list(object, name = "", ...) ## S3 method for class 'data.frame' as_data_list(object, name = "", .n_name = n_prefix("n"), ...) ## S3 method for class 'data_list' as_data_list(object, name = "", ...)
... |
Additional arguments passed to other implementations of
|
object |
The object to convert (see Details). |
name |
The name of the element in the returned list corresponding to this object. |
scalar_as_array |
If |
.n_name |
A function that is used to form variables storing the number of
rows in data frames or the number of levels in factors in |
data_list
creates a list with class c("data_list", "list")
instead of c("list")
, but largely otherwise acts like the list()
function.
as_data_list
recursively translates its first argument into list elements,
concatenating all resulting lists together. By default this means that:
numerics are included as-is.
logicals are translated into numeric using as.numeric()
.
factors are translated into numeric using as.numeric()
, and an additional element named
.n_name(name)
is added with the number of levels in the factor.
character vectors are converted into factors then translated into numeric in the same manner as factors are.
lists are translated by translating all elements of the list (recursively) and adding them to the result.
data.frames are translated by translating every column of the data.frame and adding them to
the result. A variable named "n"
(or .n_name(name)
if name
is not ""
)
is also added containing the number of rows in the data frame.
all other types are dropped (and a warning given)
If you wish to add support for additional types not described above, provide
an implementation of as_data_list()
for the type. See the
implementations of as_data_list.numeric
, as_data_list.logical
,
etc for examples.
An object of class c("data_list", "list")
, where each element
is a translated variable as described above.
Matthew Kay
# Typically these functions should not be used directly. # See the compose_data function for examples of how to translate # data in lists for input to Bayesian modeling functions.
# Typically these functions should not be used directly. # See the compose_data function for examples of how to translate # data in lists for input to Bayesian modeling functions.
Generates a data frame of bins representing the kernel density (or
histogram) of a vector, suitable for use in generating predictive
distributions for visualization. These functions were originally
designed for use with the now-deprecated predict_curve()
, and
may be deprecated in the future.
density_bins(x, n = 101, ...) histogram_bins(x, n = 30, breaks = n, ...)
density_bins(x, n = 101, ...) histogram_bins(x, n = 30, breaks = n, ...)
x |
A numeric vector |
n |
Number of bins |
... |
|
breaks |
Used to set bins for |
These functions are simple wrappers to density()
and
hist()
that compute density estimates and return their results
in a consistent format: a data frame of bins suitable for use with
the now-deprecated predict_curve()
.
density_bins
computes a kernel density estimate using
density()
.
histogram_bins
computes a density histogram using hist()
.
A data frame representing bins and their densities with the following columns:
mid |
Bin midpoint |
lower |
Lower endpoint of each bin |
upper |
Upper endpoint of each bin |
density |
Density estimate of the bin |
Matthew Kay
See add_predicted_draws()
and stat_lineribbon()
for a better approach. These
functions may be deprecated in the future.
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars) step = 1 mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, by = step)) %>% add_predicted_draws(m_mpg) %>% summarise(density_bins(.prediction), .groups = "drop") %>% ggplot() + geom_rect(aes( xmin = hp - step/2, ymin = lower, ymax = upper, xmax = hp + step/2, fill = ordered(cyl), alpha = density )) + geom_point(aes(x = hp, y = mpg, fill = ordered(cyl)), shape = 21, data = mtcars) + scale_alpha_continuous(range = c(0, 1)) + scale_fill_brewer(palette = "Set2") ## End(Not run)
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars) step = 1 mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, by = step)) %>% add_predicted_draws(m_mpg) %>% summarise(density_bins(.prediction), .groups = "drop") %>% ggplot() + geom_rect(aes( xmin = hp - step/2, ymin = lower, ymax = upper, xmax = hp + step/2, fill = ordered(cyl), alpha = density )) + geom_point(aes(x = hp, y = mpg, fill = ordered(cyl)), shape = 21, data = mtcars) + scale_alpha_continuous(range = c(0, 1)) + scale_fill_brewer(palette = "Set2") ## End(Not run)
Convert emmeans contrast methods into comparison functions
suitable for use with compare_levels()
.
emmeans_comparison(method, ...)
emmeans_comparison(method, ...)
method |
An emmeans-style contrast method. One of: (1) a string specifying
the name of an emmeans contrast method, like
|
... |
Arguments passed on to the contrast method. |
Given an emmeans contrast method name as a string
(e.g., "pairwise"
, "trt.vs.ctrl"
, etc) or an emmeans-style contrast function
(e.g., emmeans::pairwise.emmc, emmeans::trt.vs.ctrl.emmc, etc), emmeans_comparison()
returns a new function that can be used in the comparison
argument to compare_levels()
to compute those contrasts.
A function that takes a single argument, var
, containing a variable
to generate contrasts for (e.g., a factor or a character vector) and returns
a function that generates a list of named unevaluated expressions representing
different contrasts of that variable. This function is suitable to be used
as the comparison
argument in compare_levels()
.
Matthew Kay
compare_levels()
, emmeans::contrast-methods.
See gather_emmeans_draws()
for a different approach to using emmeans
with
tidybayes
.
library(dplyr) library(ggplot2) data(RankCorr, package = "ggdist") # emmeans contrast methods return matrices. E.g. the "eff" comparison # compares each level to the average of all levels: emmeans:::eff.emmc(c("a","b","c","d")) # tidybayes::compare_levels() can't use a contrast matrix like this # directly; it takes arbitrary expressions of factor levels. But # we can use `emmeans_comparison` to generate the equivalent expressions: emmeans_comparison("eff")(c("a","b","c","d")) # We can use the "eff" comparison type with `compare_levels()` as follows: RankCorr %>% spread_draws(b[i,j]) %>% filter(j == 1) %>% compare_levels(b, by = i, comparison = emmeans_comparison("eff")) %>% median_qi()
library(dplyr) library(ggplot2) data(RankCorr, package = "ggdist") # emmeans contrast methods return matrices. E.g. the "eff" comparison # compares each level to the average of all levels: emmeans:::eff.emmc(c("a","b","c","d")) # tidybayes::compare_levels() can't use a contrast matrix like this # directly; it takes arbitrary expressions of factor levels. But # we can use `emmeans_comparison` to generate the equivalent expressions: emmeans_comparison("eff")(c("a","b","c","d")) # We can use the "eff" comparison type with `compare_levels()` as follows: RankCorr %>% spread_draws(b[i,j]) %>% filter(j == 1) %>% compare_levels(b, by = i, comparison = emmeans_comparison("eff")) %>% median_qi()
Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames.
gather_draws( model, ..., regex = FALSE, sep = "[, ]", ndraws = NULL, seed = NULL, draw_indices = c(".chain", ".iteration", ".draw"), n ) spread_draws( model, ..., regex = FALSE, sep = "[, ]", ndraws = NULL, seed = NULL, draw_indices = c(".chain", ".iteration", ".draw"), n )
gather_draws( model, ..., regex = FALSE, sep = "[, ]", ndraws = NULL, seed = NULL, draw_indices = c(".chain", ".iteration", ".draw"), n ) spread_draws( model, ..., regex = FALSE, sep = "[, ]", ndraws = NULL, seed = NULL, draw_indices = c(".chain", ".iteration", ".draw"), n )
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Expressions in the form of
|
regex |
If |
sep |
Separator used to separate dimensions in variable names, as a regular expression. |
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
draw_indices |
Character vector of column names that should be treated
as indices of draws. Operations are done within combinations of these values.
The default is |
n |
(Deprecated). Use |
Imagine a JAGS or Stan fit named model
. The model may contain a variable named
b[i,v]
(in the JAGS or Stan language) with dimension i
in 1:100
and
dimension v
in 1:3
. However, the default format for draws returned from
JAGS or Stan in R will not reflect this indexing structure, instead
they will have multiple columns with names like "b[1,1]"
, "b[2,1]"
, etc.
spread_draws
and gather_draws
provide a straightforward
syntax to translate these columns back into properly-indexed variables in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as it does so.
spread_draws
and gather_draws
return data frames already grouped by
all dimensions used on the variables you specify.
The difference between spread_draws
is that names of variables in the model will
be spread across the data frame as column names, whereas gather_draws
will
gather variables into a single column named ".variable"
and place values of variables into a
column named ".value"
. To use naming schemes from other packages (such as broom
), consider passing
results through functions like to_broom_names()
or to_ggmcmc_names()
.
For example, spread_draws(model, a[i], b[i,v])
might return a grouped
data frame (grouped by i
and v
), with:
column ".chain"
: the chain number. NA
if not applicable to the model
type; this is typically only applicable to MCMC algorithms.
column ".iteration"
: the iteration number. Guaranteed to be unique within-chain only.
NA
if not applicable to the model type; this is typically only applicable to MCMC algorithms.
column ".draw"
: a unique number for each draw from the posterior. Order is not
guaranteed to be meaningful.
column "i"
: value in 1:5
column "v"
: value in 1:10
column "a"
: value of "a[i]"
for draw ".draw"
column "b"
: value of "b[i,v]"
for draw ".draw"
gather_draws(model, a[i], b[i,v])
on the same model would return a grouped
data frame (grouped by i
and v
), with:
column ".chain"
: the chain number
column ".iteration"
: the iteration number
column ".draw"
: the draw number
column "i"
: value in 1:5
column "v"
: value in 1:10
, or NA
if ".variable"
is "a"
.
column ".variable"
: value in c("a", "b")
.
column ".value"
: value of "a[i]"
(when ".variable"
is "a"
)
or "b[i,v]"
(when ".variable"
is "b"
) for draw ".draw"
spread_draws
and gather_draws
can use type information
applied to the model
object by recover_types()
to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the v
dimension
in the original data frame data
was a factor with levels c("a","b","c")
,
then we could use recover_types
before spread_draws
:
model %>% recover_types(data) spread_draws(model, b[i,v])
Which would return the same data frame as above, except the "v"
column
would be a value in c("a","b","c")
instead of 1:3
.
For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable d[i]
with the same i
subscript
as b[i,v]
, and a variable x
with no subscripts, we could do this:
spread_draws(model, x, d[i], b[i,v])
Which is roughly equivalent to this:
spread_draws(model, x) %>% inner_join(spread_draws(model, d[i])) %>% inner_join(spread_draws(model, b[i,v])) %>% group_by(i,v)
Similarly, this:
gather_draws(model, x, d[i], b[i,v])
Is roughly equivalent to this:
bind_rows( gather_draws(model, x), gather_draws(model, d[i]), gather_draws(model, b[i,v]) )
The c
and cbind
functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts i
and v
, we could do either of these:
spread_draws(model, c(w, x, y, z)[i,v])
spread_draws(model, cbind(w, x, y, z)[i,v]) # equivalent
Each of which is roughly equivalent to this:
spread_draws(model, w[i,v], x[i,v], y[i,v], z[i,v])
Besides being more compact, the c()
-style syntax is currently also
faster (though that may change).
Dimensions can be omitted from the resulting data frame by leaving their names
blank; e.g. spread_draws(model, b[,v])
will omit the first dimension of
b
from the output. This is useful if a dimension is known to contain all
the same value in a given model.
The shorthand ..
can be used to specify one column that should be put
into a wide format and whose names will be the base variable name, plus a dot
("."), plus the value of the dimension at ..
. For example:
spread_draws(model, b[i,..])
would return a grouped data frame
(grouped by i
), with:
column ".chain"
: the chain number
column ".iteration"
: the iteration number
column ".draw"
: the draw number
column "i"
: value in 1:20
column "b.1"
: value of "b[i,1]"
for draw ".draw"
column "b.2"
: value of "b[i,2]"
for draw ".draw"
column "b.3"
: value of "b[i,3]"
for draw ".draw"
An optional clause in the form | wide_dimension
can also be used to put
the data frame into a wide format based on wide_dimension
. For example, this:
spread_draws(model, b[i,v] | v)
is roughly equivalent to this:
spread_draws(model, b[i,v]) %>% spread(v,b)
The main difference between using the |
syntax instead of the
..
syntax is that the |
syntax respects prototypes applied to
dimensions with recover_types()
, and thus can be used to get
columns with nicer names. For example:
model %>% recover_types(data) %>% spread_draws(b[i,v] | v)
would return a grouped data frame
(grouped by i
), with:
column ".chain"
: the chain number
column ".iteration"
: the iteration number
column ".draw"
: the draw number
column "i"
: value in 1:20
column "a"
: value of "b[i,1]"
for draw ".draw"
column "b"
: value of "b[i,2]"
for draw ".draw"
column "c"
: value of "b[i,3]"
for draw ".draw"
The shorthand .
can be used to specify columns that should be nested
into vectors, matrices, or n-dimensional arrays (depending on how many dimensions
are specified with .
).
For example, spread_draws(model, a[.], b[.,.])
might return a
data frame, with:
column ".chain"
: the chain number.
column ".iteration"
: the iteration number.
column ".draw"
: a unique number for each draw from the posterior.
column "a"
: a list column of vectors.
column "b"
: a list column of matrices.
Ragged arrays are turned into non-ragged arrays with
missing entries given the value NA
.
Finally, variable names can be regular expressions by setting regex = TRUE
; e.g.:
spread_draws(model, `b_.*`[i], regex = TRUE)
Would return a tidy data frame with variables starting with b_
and having one dimension.
A data frame.
Matthew Kay
spread_rvars()
, recover_types()
, compose_data()
.
library(dplyr) library(ggplot2) data(RankCorr, package = "ggdist") RankCorr %>% spread_draws(b[i, j]) RankCorr %>% spread_draws(b[i, j], tau[i], u_tau[i]) RankCorr %>% gather_draws(b[i, j], tau[i], u_tau[i]) RankCorr %>% gather_draws(tau[i], typical_r) %>% median_qi()
library(dplyr) library(ggplot2) data(RankCorr, package = "ggdist") RankCorr %>% spread_draws(b[i, j]) RankCorr %>% spread_draws(b[i, j], tau[i], u_tau[i]) RankCorr %>% gather_draws(b[i, j], tau[i], u_tau[i]) RankCorr %>% gather_draws(tau[i], typical_r) %>% median_qi()
Extract draws from the result of a call to emmeans::emmeans()
(formerly lsmeans
)
or emmeans::ref_grid()
applied to a Bayesian model.
gather_emmeans_draws(object, value = ".value", ...) ## Default S3 method: gather_emmeans_draws(object, value = ".value", ...) ## S3 method for class 'emm_list' gather_emmeans_draws(object, value = ".value", grid = ".grid", ...)
gather_emmeans_draws(object, value = ".value", ...) ## Default S3 method: gather_emmeans_draws(object, value = ".value", ...) ## S3 method for class 'emm_list' gather_emmeans_draws(object, value = ".value", grid = ".grid", ...)
object |
An |
value |
The name of the output column to use to contain the values of draws. Defaults to |
... |
Additional arguments passed to the underlying method for the type of object given. |
grid |
If |
emmeans::emmeans()
provides a convenient syntax for generating draws from "estimated marginal means" from a model,
and can be applied to various Bayesian models, like rstanarm::stanreg-objects and
MCMCglmm::MCMCglmm()
. Given a emmeans::ref_grid()
object as returned by functions like
emmeans::ref_grid()
or emmeans::emmeans()
applied to a Bayesian model,
gather_emmeans_draws
returns a tidy format data frame of draws from
the marginal posterior distributions generated by emmeans::emmeans()
.
A tidy data frame of draws. The columns of the reference grid are returned as-is, with an
additional column called .value
(by default) containing marginal draws. The resulting data
frame is grouped by the columns from the reference grid to make use of summary functions like
point_interval()
straightforward.
If object
is an emmeans::emm_list()
, which contains estimates from different reference grids,
an additional column with the default name of ".grid"
is added to indicate the reference grid for each row in the output.
The name of this column is controlled by the grid
argument.
Matthew Kay
## Not run: library(dplyr) library(magrittr) library(brms) library(emmeans) # Here's an example dataset with a categorical predictor (`condition`) with several levels: set.seed(5) n = 10 n_condition = 5 ABC = tibble( condition = rep(c("A","B","C","D","E"), n), response = rnorm(n * 5, c(0,1,2,1,-1), 0.5) ) m = brm(response ~ condition, data = ABC, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # Once we've fit the model, we can use emmeans() (and functions # from that package) to get whatever marginal distributions we want. # For example, we can get marginal means by condition: m %>% emmeans(~ condition) %>% gather_emmeans_draws() %>% median_qi() # or we could get pairwise differences: m %>% emmeans(~ condition) %>% contrast(method = "pairwise") %>% gather_emmeans_draws() %>% median_qi() # see the documentation of emmeans() for more examples of types of # contrasts supported by that packge. ## End(Not run)
## Not run: library(dplyr) library(magrittr) library(brms) library(emmeans) # Here's an example dataset with a categorical predictor (`condition`) with several levels: set.seed(5) n = 10 n_condition = 5 ABC = tibble( condition = rep(c("A","B","C","D","E"), n), response = rnorm(n * 5, c(0,1,2,1,-1), 0.5) ) m = brm(response ~ condition, data = ABC, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # Once we've fit the model, we can use emmeans() (and functions # from that package) to get whatever marginal distributions we want. # For example, we can get marginal means by condition: m %>% emmeans(~ condition) %>% gather_emmeans_draws() %>% median_qi() # or we could get pairwise differences: m %>% emmeans(~ condition) %>% contrast(method = "pairwise") %>% gather_emmeans_draws() %>% median_qi() # see the documentation of emmeans() for more examples of types of # contrasts supported by that packge. ## End(Not run)
Fast method for producing combinations of values in a value column for different levels of a key column, assuming long-format (tidy) data with an equal number of values per key. Among other things, this is useful for producing scatter-plot matrices.
gather_pairs( data, key, value, row = ".row", col = ".col", x = ".x", y = ".y", triangle = c("lower only", "upper only", "lower", "upper", "both only", "both") )
gather_pairs( data, key, value, row = ".row", col = ".col", x = ".x", y = ".y", triangle = c("lower only", "upper only", "lower", "upper", "both only", "both") )
data |
Tidy data frame. |
key |
Bare name of column in |
value |
Bare name of column in |
row |
Character vector giving the name of the output column identifying rows in the matrix
of pairs (takes values of |
col |
Character vector giving the name of the output column identifying columns in the matrix
of pairs (takes values of |
x |
Character vector giving the name of the output column with x values in the matrix
of pairs (takes values of |
y |
Character vector giving the name of the output column with y values in the matrix
of pairs (takes values of |
triangle |
Should the upper or lower triangle of the matrix of all possible combinations be returned?
The default, This method is particularly useful for constructing scatterplot matrices. See examples below. |
A tidy data frame of combinations of values in key
and value
, with columns row
and col
(default names ".row"
and ".col"
) containing values from key
,
and columns y
and x
(default names ".y"
and ".x"
) containing values
from value
.
Matthew Kay
library(ggplot2) library(dplyr) t_a = rnorm(100) t_b = rnorm(100, t_a * 2) t_c = rnorm(100) df = rbind( data.frame(g = "a", t = t_a), data.frame(g = "b", t = t_b), data.frame(g = "c", t = t_c) ) df %>% gather_pairs(g, t, row = "g_row", col = "g_col", x = "t_x", y = "t_y") %>% ggplot(aes(t_x, t_y)) + geom_point() + facet_grid(vars(g_row), vars(g_col)) df %>% gather_pairs(g, t, triangle = "upper") %>% ggplot(aes(.x, .y)) + geom_point() + facet_grid(vars(.row), vars(.col)) df %>% gather_pairs(g, t, triangle = "both") %>% ggplot(aes(.x, .y)) + geom_point() + facet_grid(vars(.row), vars(.col)) data(line, package = "coda") line %>% tidy_draws() %>% gather_variables() %>% gather_pairs(.variable, .value) %>% ggplot(aes(.x, .y)) + geom_point(alpha = .25) + facet_grid(vars(.row), vars(.col)) line %>% tidy_draws() %>% gather_variables() %>% gather_pairs(.variable, .value) %>% ggplot(aes(.x, .y, color = factor(.chain))) + geom_density_2d(alpha = .5) + facet_grid(vars(.row), vars(.col))
library(ggplot2) library(dplyr) t_a = rnorm(100) t_b = rnorm(100, t_a * 2) t_c = rnorm(100) df = rbind( data.frame(g = "a", t = t_a), data.frame(g = "b", t = t_b), data.frame(g = "c", t = t_c) ) df %>% gather_pairs(g, t, row = "g_row", col = "g_col", x = "t_x", y = "t_y") %>% ggplot(aes(t_x, t_y)) + geom_point() + facet_grid(vars(g_row), vars(g_col)) df %>% gather_pairs(g, t, triangle = "upper") %>% ggplot(aes(.x, .y)) + geom_point() + facet_grid(vars(.row), vars(.col)) df %>% gather_pairs(g, t, triangle = "both") %>% ggplot(aes(.x, .y)) + geom_point() + facet_grid(vars(.row), vars(.col)) data(line, package = "coda") line %>% tidy_draws() %>% gather_variables() %>% gather_pairs(.variable, .value) %>% ggplot(aes(.x, .y)) + geom_point(alpha = .25) + facet_grid(vars(.row), vars(.col)) line %>% tidy_draws() %>% gather_variables() %>% gather_pairs(.variable, .value) %>% ggplot(aes(.x, .y, color = factor(.chain))) + geom_density_2d(alpha = .5) + facet_grid(vars(.row), vars(.col))
Extract draws from a Bayesian model for one or more variables (possibly with named dimensions) into one of two types of long-format data frames of posterior::rvar objects.
gather_rvars(model, ..., ndraws = NULL, seed = NULL) spread_rvars(model, ..., ndraws = NULL, seed = NULL)
gather_rvars(model, ..., ndraws = NULL, seed = NULL) spread_rvars(model, ..., ndraws = NULL, seed = NULL)
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Expressions in the form of
|
ndraws |
The number of draws to return, or |
seed |
A seed to use when subsampling draws (i.e. when |
Imagine a JAGS or Stan fit named model
. The model may contain a variable named
b[i,v]
(in the JAGS or Stan language) with dimension i
in 1:100
and
dimension v
in 1:3
. However, the default format for draws returned from
JAGS or Stan in R will not reflect this indexing structure, instead
they will have multiple columns with names like "b[1,1]"
, "b[2,1]"
, etc.
spread_rvars
and gather_rvars
provide a straightforward
syntax to translate these columns back into properly-indexed rvar
s in two different
tidy data frame formats, optionally recovering dimension types (e.g. factor levels) as it does so.
spread_rvars
will spread names of variables in the model across the data frame as column names,
whereas gather_rvars
will gather variable names into a single column named ".variable"
and place
values of variables into a column named ".value"
. To use naming schemes from other packages
(such as broom
), consider passing
results through functions like to_broom_names()
or to_ggmcmc_names()
.
For example, spread_rvars(model, a[i], b[i,v])
might return a data frame with:
column "i"
: value in 1:5
column "v"
: value in 1:10
column "a"
: rvar
containing draws from "a[i]"
column "b"
: rvar
containing draws from "b[i,v]"
gather_rvars(model, a[i], b[i,v])
on the same model would return a data frame with:
column "i"
: value in 1:5
column "v"
: value in 1:10
, or NA
on rows where ".variable"
is "a"
.
column ".variable"
: value in c("a", "b")
.
column ".value"
: rvar
containing draws from "a[i]"
(when ".variable"
is "a"
)
or "b[i,v]"
(when ".variable"
is "b"
)
spread_rvars
and gather_rvars
can use type information
applied to the model
object by recover_types()
to convert columns
back into their original types. This is particularly helpful if some of the dimensions in
your model were originally factors. For example, if the v
dimension
in the original data frame data
was a factor with levels c("a","b","c")
,
then we could use recover_types
before spread_rvars
:
model %>% recover_types(data) spread_rvars(model, b[i,v])
Which would return the same data frame as above, except the "v"
column
would be a value in c("a","b","c")
instead of 1:3
.
For variables that do not share the same subscripts (or share
some but not all subscripts), we can supply their specifications separately.
For example, if we have a variable d[i]
with the same i
subscript
as b[i,v]
, and a variable x
with no subscripts, we could do this:
spread_rvars(model, x, d[i], b[i,v])
Which is roughly equivalent to this:
spread_rvars(model, x) %>% inner_join(spread_rvars(model, d[i])) %>% inner_join(spread_rvars(model, b[i,v]))
Similarly, this:
gather_rvars(model, x, d[i], b[i,v])
Is roughly equivalent to this:
bind_rows( gather_rvars(model, x), gather_rvars(model, d[i]), gather_rvars(model, b[i,v]) )
The c
and cbind
functions can be used to combine multiple variable names that have
the same dimensions. For example, if we have several variables with the same
subscripts i
and v
, we could do either of these:
spread_rvars(model, c(w, x, y, z)[i,v])
spread_rvars(model, cbind(w, x, y, z)[i,v]) # equivalent
Each of which is roughly equivalent to this:
spread_rvars(model, w[i,v], x[i,v], y[i,v], z[i,v])
Besides being more compact, the c()
-style syntax is currently also slightly
faster (though that may change).
Dimensions can be left nested in the resulting rvar
objects by leaving their names
blank; e.g. spread_rvars(model, b[i,])
will place the first index (i
) into
rows of the data frame but leave the second index nested in the b
column
(see Examples below).
A data frame.
Matthew Kay
spread_draws()
, recover_types()
, compose_data()
. See also
posterior::rvar()
and posterior::as_draws_rvars()
, the functions that power
spread_rvars
and gather_rvars
.
library(dplyr) data(RankCorr, package = "ggdist") RankCorr %>% spread_rvars(b[i, j]) # leaving an index out nests the index in the column containing the rvar RankCorr %>% spread_rvars(b[i, ]) RankCorr %>% spread_rvars(b[i, j], tau[i], u_tau[i]) # gather_rvars places variables and values in a longer format data frame RankCorr %>% gather_rvars(b[i, j], tau[i], typical_r)
library(dplyr) data(RankCorr, package = "ggdist") RankCorr %>% spread_rvars(b[i, j]) # leaving an index out nests the index in the column containing the rvar RankCorr %>% spread_rvars(b[i, ]) RankCorr %>% spread_rvars(b[i, j], tau[i], u_tau[i]) # gather_rvars places variables and values in a longer format data frame RankCorr %>% gather_rvars(b[i, j], tau[i], typical_r)
Given a data frame such as might be returned by tidy_draws()
or spread_draws()
,
gather variables and their values from that data frame into a ".variable"
and ".value"
column.
gather_variables(data, exclude = c(".chain", ".iteration", ".draw", ".row"))
gather_variables(data, exclude = c(".chain", ".iteration", ".draw", ".row"))
data |
A data frame with variable names spread across columns, such as one returned by
|
exclude |
A character vector of names of columns to be excluded from the gather. Default ignores several meta-data column names used in tidybayes. |
This function gathers every column except grouping columns and those matching the expression
exclude
into key/value columns ".variable"
and ".value"
.
Imagine a data frame data
as returned by spread_draws(fit, a[i], b[i,v])
, like this:
column ".chain"
: the chain number
column ".iteration"
: the iteration number
column ".draw"
: the draw number
column "i"
: value in 1:5
column "v"
: value in 1:10
column "a"
: value of "a[i]"
for draw number ".draw"
column "b"
: value of "b[i,v]"
for draw number ".draw"
gather_variables(data)
on that data frame would return a grouped
data frame (grouped by i
and v
), with:
column ".chain"
: the chain number
column ".iteration"
: the iteration number
column ".draw"
: the draw number
column "i"
: value in 1:5
column "v"
: value in 1:10
column ".variable"
: value in c("a", "b")
.
column ".value"
: value of "a[i]"
(when ".variable"
is "a"
;
repeated for every value of "v"
) or "b[i,v]"
(when ".variable"
is
"b"
) for draw number ".draw"
In this example, this call:
gather_variables(data)
Is roughly equivalent to:
data %>% gather(.variable, .value, -c(.chain, .iteration, .draw, i, v)) %>% group_by(.variable, .add = TRUE)
A data frame.
Matthew Kay
library(dplyr) data(RankCorr, package = "ggdist") RankCorr %>% spread_draws(b[i,v], tau[i]) %>% gather_variables() %>% median_qi() # the first three lines below are roughly equivalent to ggmcmc::ggs(RankCorr) RankCorr %>% tidy_draws() %>% gather_variables() %>% median_qi()
library(dplyr) data(RankCorr, package = "ggdist") RankCorr %>% spread_draws(b[i,v], tau[i]) %>% gather_variables() %>% median_qi() # the first three lines below are roughly equivalent to ggmcmc::ggs(RankCorr) RankCorr %>% tidy_draws() %>% gather_variables() %>% median_qi()
Get a character vector of the names of the variables in a variety of fitted
Bayesian model types. All models supported by tidy_draws()
are
supported.
get_variables(model) ## Default S3 method: get_variables(model) ## S3 method for class 'mcmc' get_variables(model) ## S3 method for class 'mcmc.list' get_variables(model)
get_variables(model) ## Default S3 method: get_variables(model) ## S3 method for class 'mcmc' get_variables(model) ## S3 method for class 'mcmc.list' get_variables(model)
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
This function is often useful for inspecting a model interactively in order
to construct calls to spread_draws()
or gather_draws()
in order to extract draws from models in a tidy format.
A character vector of variable names in the fitted model.
Matthew Kay
spread_draws()
, gather_draws()
.
data(line, package = "coda") get_variables(line) data(RankCorr, package = "ggdist") get_variables(RankCorr)
data(line, package = "coda") get_variables(line) data(RankCorr, package = "ggdist") get_variables(RankCorr)
Generates a function for generating names of index columns for factors in
compose_data()
by prefixing a character vector to the original
column name.
n_prefix(prefix)
n_prefix(prefix)
prefix |
Character vector to be prepended to column names by
Returns a function. The function returned takes a character vector,
|
The .n_name
argument of compose_data()
.
library(magrittr) df = data.frame( plot = factor(paste0("p", rep(1:8, times = 2))), site = factor(paste0("s", rep(1:4, each = 2, times = 2))) ) # without changing `.n_name`, compose_data() will prefix indices # with "n" by default df %>% compose_data() # you can use n_prefix() to define a different prefix (e.g. "N"): df %>% compose_data(.n_name = n_prefix("N"))
library(magrittr) df = data.frame( plot = factor(paste0("p", rep(1:8, times = 2))), site = factor(paste0("s", rep(1:4, each = 2, times = 2))) ) # without changing `.n_name`, compose_data() will prefix indices # with "n" by default df %>% compose_data() # you can use n_prefix() to define a different prefix (e.g. "N"): df %>% compose_data(.n_name = n_prefix("N"))
rvar
columns in data framesConverts between data-frame-of-rvar
s format and long-data-frame-of-draws
formats by nesting or unnesting all columns containing posterior::rvar
objects.
nest_rvars(data) unnest_rvars(data)
nest_rvars(data) unnest_rvars(data)
data |
A data frame to nest or unnest.
|
For nest_rvars()
, returns a data frame without .chain
, .iteration
, and
.draw
columns, where all non-grouped columns have been converted to rvar
s.
For unnest_rvars()
, returns a data frame with .chain
, .iteration
, and
.draw
columns added, where every rvar
column in the input has been
converted to (one or more) columns containing draws from those rvar
s
in long format. The result is grouped by all non-rvar
columns in the
input; this ensures that nest_rvars(unnest_rvars(x))
returns x
.
library(dplyr) data(RankCorr, package = "ggdist") # here's a data frame with some rvars rvar_df = RankCorr %>% spread_rvars(b[i,], tau[i]) rvar_df # we can unnest it into long format. # note how the result is grouped by all non-rvar input columns, # and nested indices in `b` are converted into columns. draws_df = rvar_df %>% unnest_rvars() draws_df # calling nest_rvars() again on the result of unnest_rvars() # recovers the original data frame nest_rvars(draws_df)
library(dplyr) data(RankCorr, package = "ggdist") # here's a data frame with some rvars rvar_df = RankCorr %>% spread_rvars(b[i,], tau[i]) rvar_df # we can unnest it into long format. # note how the result is grouped by all non-rvar input columns, # and nested indices in `b` are converted into columns. draws_df = rvar_df %>% unnest_rvars() draws_df # calling nest_rvars() again on the result of unnest_rvars() # recovers the original data frame nest_rvars(draws_df)
Deprecated function for generating prediction curves (or a density for a prediction curve).
predict_curve(data, formula, summary = median, ...) predict_curve_density( data, formula, summary = function(...) density_bins(..., n = n), n = 50, ... )
predict_curve(data, formula, summary = median, ...) predict_curve_density( data, formula, summary = function(...) density_bins(..., n = n), n = 50, ... )
data |
A |
formula |
A formula specifying the prediction curve. The left-hand side
of the formula should be a name representing the name of the column that
will hold the predicted response in the returned data frame. The right-hand
side is an expression that may include numeric columns from |
summary |
The function to apply to summarize each predicted response.
Useful functions (if you just want a curve) might be |
... |
Variables defining the curve. The right-hand side of
|
n |
For |
This function is deprecated. Use modelr::data_grid()
combined
with point_interval()
or dplyr::do()
and
density_bins()
instead.
The function generates a predictive curve given posterior draws
(data
), an expression (formula
), and a set of variables
defining the curve (...
). For every group in data
(if it is a
grouped data frame—see group_by(); otherwise the entire data
frame is taken at once), and for each combination of values in ...
,
the right-hand side of formula
is evaluated and its results passed to
the summary
function. This allows a predictive curve to be generated,
given (e.g.) some samples of coefficients in data
and a set of
predictors defining the space of the curve in ...
.
Given a summary function like median()
or mean()
,
this function will produce the median (resp. mean) prediction at each point
on the curve.
Given a summary function like density_bins()
, this function will
produce a predictive distribution for each point on the curve.
predict_curve_density
is a shorthand for such a call, with a
convenient argument for adjusting the number of bins per point on the
curve.
If formula
is in the form lhs ~ rhs
and summary
is a function that returns a single value, such as median
or
mode
, then predict_curve
returns a data.frame
with a
column for each group in data
(if it was grouped), a column for each
variable in ...
, and a column named lhs
with the value of
summary(rhs)
evaluated for every group in data
and combination
of variables in ...
.
If summary
is a function that returns a data.frame
, such as
density_bins()
, predict_curve
has the same set of columns
as above, except that in place of the lhs
column is a set of columns
named lhs.x
for every column named x
returned by
summary
. For example, density_bins()
returns a data frame
with the columns mid
, lower
, upper
, and density
,
so the data frame returned by predict_curve
with summary = density_bins
will have columns lhs.mid
, lhs.lower
,
lhs.upper
, and lhs.density
in place of lhs
.
Matthew Kay
See density_bins()
.
# Deprecated; see examples for density_bins
# Deprecated; see examples for density_bins
Decorate a Bayesian model fit or a sample from it with types for
variable and dimension data types. Meant to be used before calling
spread_draws()
or gather_draws()
so that the values returned by
those functions are translated back into useful data types.
recover_types(model, ...)
recover_types(model, ...)
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Lists (or data frames) providing data prototypes used to convert
columns returned by |
Each argument in ...
specifies a list or data.frame. The model
is decorated with a list of constructors that can convert a numeric column
into the data types in the lists in ...
.
Then, when spread_draws()
or gather_draws()
is called on the decorated
model
, each list entry with the same name as the variable or a dimension
in variable_spec
is a used as a prototype for that variable or dimension —
i.e., its type is taken to be the expected type of that variable or dimension.
Those types are used to translate numeric values of variables back into
useful values (for example, levels of a factor).
The most common use of recover_types
is to automatically translate
dimensions of a variable that correspond to levels of a factor in the original data back into
levels of that factor. The simplest way to do this is to pass in the data
frame from which the original data came.
Supported types of prototypes are factor, ordered, and logical. For example:
if prototypes$v
is a factor, the v column in the
returned draws is translated into a factor using factor(v, labels=levels(prototypes$v), ordered=is.ordered(prototypes$v))
.
if
prototypes$v
is a logical, the v column is translated into a logical
using as.logical(v)
.
Additional data types can be supported by providing a custom implementation
of the generic function as_constructor
.
A decorated version of model
.
Matthew Kay
spread_draws()
, gather_draws()
, compose_data()
.
## Not run: library(dplyr) library(magrittr) library(rstan) # Here's an example dataset with a categorical predictor (`condition`) with several levels: set.seed(5) n = 10 n_condition = 5 ABC = tibble( condition = factor(rep(c("A","B","C","D","E"), n)), response = rnorm(n * 5, c(0,1,2,1,-1), 0.5) ) # We'll fit the following model to it: stan_code = " data { int<lower=1> n; int<lower=1> n_condition; int<lower=1, upper=n_condition> condition[n]; real response[n]; } parameters { real overall_mean; vector[n_condition] condition_zoffset; real<lower=0> response_sd; real<lower=0> condition_mean_sd; } transformed parameters { vector[n_condition] condition_mean; condition_mean = overall_mean + condition_zoffset * condition_mean_sd; } model { response_sd ~ cauchy(0, 1); // => half-cauchy(0, 1) condition_mean_sd ~ cauchy(0, 1); // => half-cauchy(0, 1) overall_mean ~ normal(0, 5); //=> condition_mean ~ normal(overall_mean, condition_mean_sd) condition_zoffset ~ normal(0, 1); for (i in 1:n) { response[i] ~ normal(condition_mean[condition[i]], response_sd); } } " m = stan(model_code = stan_code, data = compose_data(ABC), control = list(adapt_delta=0.99), # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # without using recover_types(), the `condition` column returned by spread_draws() # will be an integer: m %>% spread_draws(condition_mean[condition]) %>% median_qi() # If we apply recover_types() first, subsequent calls to other tidybayes functions will # automatically back-convert factors so that they are labeled with their original levels # (assuming the same name is used) m %<>% recover_types(ABC) # now the `condition` column with be a factor with levels "A", "B", "C", ... m %>% spread_draws(condition_mean[condition]) %>% median_qi() ## End(Not run)
## Not run: library(dplyr) library(magrittr) library(rstan) # Here's an example dataset with a categorical predictor (`condition`) with several levels: set.seed(5) n = 10 n_condition = 5 ABC = tibble( condition = factor(rep(c("A","B","C","D","E"), n)), response = rnorm(n * 5, c(0,1,2,1,-1), 0.5) ) # We'll fit the following model to it: stan_code = " data { int<lower=1> n; int<lower=1> n_condition; int<lower=1, upper=n_condition> condition[n]; real response[n]; } parameters { real overall_mean; vector[n_condition] condition_zoffset; real<lower=0> response_sd; real<lower=0> condition_mean_sd; } transformed parameters { vector[n_condition] condition_mean; condition_mean = overall_mean + condition_zoffset * condition_mean_sd; } model { response_sd ~ cauchy(0, 1); // => half-cauchy(0, 1) condition_mean_sd ~ cauchy(0, 1); // => half-cauchy(0, 1) overall_mean ~ normal(0, 5); //=> condition_mean ~ normal(overall_mean, condition_mean_sd) condition_zoffset ~ normal(0, 1); for (i in 1:n) { response[i] ~ normal(condition_mean[condition[i]], response_sd); } } " m = stan(model_code = stan_code, data = compose_data(ABC), control = list(adapt_delta=0.99), # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # without using recover_types(), the `condition` column returned by spread_draws() # will be an integer: m %>% spread_draws(condition_mean[condition]) %>% median_qi() # If we apply recover_types() first, subsequent calls to other tidybayes functions will # automatically back-convert factors so that they are labeled with their original levels # (assuming the same name is used) m %<>% recover_types(ABC) # now the `condition` column with be a factor with levels "A", "B", "C", ... m %>% spread_draws(condition_mean[condition]) %>% median_qi() ## End(Not run)
Given a tidy-format data frame of draws with a column indexing each draw, subsample the data frame to a given size based on a column indexing draws, ensuring that rows in sub-groups of a grouped data frame are sampled from the same draws.
sample_draws(data, ndraws, draw = ".draw", seed = NULL)
sample_draws(data, ndraws, draw = ".draw", seed = NULL)
data |
Data frame to sample from |
ndraws |
The number of draws to return, or |
draw |
The name of the column indexing the draws; default |
seed |
A seed to use when subsampling draws (i.e. when |
sample_draws()
makes it easier to sub-sample a grouped, tidy-format data frame of draws. On a grouped data frame,
the naive approach of using filter
with the .draw
column will give incorrect results as it will
select a different sample within each group. sample_draws()
ensures the same sample is selected within
each group.
Matthew Kay
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # draw 100 fit lines from the posterior and overplot them mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_epred_draws(m_mpg) %>% # NOTE: only use sample_draws here when making spaghetti plots; for # plotting intervals it is always best to use all draws sample_draws(100) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) + geom_point(data = mtcars) ## End(Not run)
## Not run: library(ggplot2) library(dplyr) library(brms) library(modelr) theme_set(theme_light()) m_mpg = brm(mpg ~ hp * cyl, data = mtcars, # 1 chain / few iterations just so example runs quickly # do not use in practice chains = 1, iter = 500) # draw 100 fit lines from the posterior and overplot them mtcars %>% group_by(cyl) %>% data_grid(hp = seq_range(hp, n = 101)) %>% add_epred_draws(m_mpg) %>% # NOTE: only use sample_draws here when making spaghetti plots; for # plotting intervals it is always best to use all draws sample_draws(100) %>% ggplot(aes(x = hp, y = mpg, color = ordered(cyl))) + geom_line(aes(y = .epred, group = paste(cyl, .draw)), alpha = 0.25) + geom_point(data = mtcars) ## End(Not run)
grouped_df
objectsAn implementation of posterior::summarise_draws()
for grouped data frames
(dplyr::grouped_df
objects) such as returned by dplyr::group_by()
and
the various grouped-data-aware functions in tidybayes, such as spread_draws()
,
gather_draws()
, add_epred_draws()
, and add_predicted_draws()
.
This function provides a quick way to get a variety of summary statistics
and diagnostics on draws.
## S3 method for class 'grouped_df' summarise_draws(.x, ...)
## S3 method for class 'grouped_df' summarise_draws(.x, ...)
.x |
A grouped data frame ( |
... |
Name-value pairs of summary or diagnostic functions. The provided names will be used as the names of the columns in the result unless the function returns a named vector, in which case the latter names are used. The functions can be specified in any format supported by as_function(). See Examples. |
While posterior::summarise_draws()
can operate on tidy data frames of
draws in the posterior::draws_df()
format, that format does not support
grouping columns. This provides an implementation of summarise_draws()
that does support grouped data tables, essentially applying
posterior::summarise_draws()
to every sub-table of .x
implied by the
groups defined on the data frame.
See posterior::summarise_draws()
for
more details on the summary statistics and diagnostics you can use with this
function. If you just want point summaries and intervals (not diagnostics),
particularly for plotting, see point_interval()
, which returns long-format
data tables more suitable for that purpose (especially if you want to plot
multiple uncertainty levels).
A data frame (actually, a tibble) with all grouping
columns from .x
, a "variable"
column containing variable names from .x
,
and the remaining columns containing summary statistics and diagnostics.
Matthew Kay
posterior::summarise_draws()
, point_interval()
library(posterior) library(dplyr) d = posterior::example_draws() # The default posterior::summarise_draws() summarises all variables without # splitting out indices: summarise_draws(d) # The grouped_df implementation of summarise_draws() in tidybayes can handle # output from spread_draws(), which is a grouped data table with the indices # (here, `i`) left as columns: d %>% spread_draws(theta[i]) %>% summarise_draws() # Summary functions can also be provided, as in posterior::summarise_draws(): d %>% spread_draws(theta[i]) %>% summarise_draws(median, mad, rhat, ess_tail)
library(posterior) library(dplyr) d = posterior::example_draws() # The default posterior::summarise_draws() summarises all variables without # splitting out indices: summarise_draws(d) # The grouped_df implementation of summarise_draws() in tidybayes can handle # output from spread_draws(), which is a grouped data table with the indices # (here, `i`) left as columns: d %>% spread_draws(theta[i]) %>% summarise_draws() # Summary functions can also be provided, as in posterior::summarise_draws(): d %>% spread_draws(theta[i]) %>% summarise_draws(median, mad, rhat, ess_tail)
Extract draws from a Bayesian fit into a wide-format data frame with a
.chain
, .iteration
, and .draw
column, as well as all variables
as columns. This function does not parse indices from variable names
(e.g. for variable names like "x[1]"
); see spread_draws()
or
gather_draws()
for functions that parse variable indices.
tidy_draws(model, ...) ## Default S3 method: tidy_draws(model, ...) ## S3 method for class 'draws' tidy_draws(model, ...) ## S3 method for class 'data.frame' tidy_draws(model, ...) ## S3 method for class 'mcmc.list' tidy_draws(model, ...) ## S3 method for class 'stanfit' tidy_draws(model, ...) ## S3 method for class 'stanreg' tidy_draws(model, ...) ## S3 method for class 'runjags' tidy_draws(model, ...) ## S3 method for class 'jagsUI' tidy_draws(model, ...) ## S3 method for class 'brmsfit' tidy_draws(model, ...) ## S3 method for class 'CmdStanFit' tidy_draws(model, ...) ## S3 method for class 'CmdStanMCMC' tidy_draws(model, ...) ## S3 method for class 'matrix' tidy_draws(model, ...) ## S3 method for class 'MCMCglmm' tidy_draws(model, ...)
tidy_draws(model, ...) ## Default S3 method: tidy_draws(model, ...) ## S3 method for class 'draws' tidy_draws(model, ...) ## S3 method for class 'data.frame' tidy_draws(model, ...) ## S3 method for class 'mcmc.list' tidy_draws(model, ...) ## S3 method for class 'stanfit' tidy_draws(model, ...) ## S3 method for class 'stanreg' tidy_draws(model, ...) ## S3 method for class 'runjags' tidy_draws(model, ...) ## S3 method for class 'jagsUI' tidy_draws(model, ...) ## S3 method for class 'brmsfit' tidy_draws(model, ...) ## S3 method for class 'CmdStanFit' tidy_draws(model, ...) ## S3 method for class 'CmdStanMCMC' tidy_draws(model, ...) ## S3 method for class 'matrix' tidy_draws(model, ...) ## S3 method for class 'MCMCglmm' tidy_draws(model, ...)
model |
A supported Bayesian model fit. Tidybayes supports a variety of model objects; for a full list of supported models, see tidybayes-models. |
... |
Further arguments passed to other methods (mostly unused). |
This function can be useful for quick glances at models (especially
combined with gather_variables()
and median_qi()
), and for models with
parameters without indices in their names (like "x[1]"
).
spread_draws()
and gather_draws()
, which do parse variable name indices,
call this function internally if their input is not already a tidy data frame.
To provide support for new models in tidybayes,
you must provide an implementation of this function or an implementation
of coda::as.mcmc.list()
(tidy_draws
should work on any model
with an implementation of coda::as.mcmc.list()
)
tidy_draws()
can be applied to a data frame that is already a tidy-format data frame
of draws, provided it has one row per draw. In other words, it can be applied to data frames
that have the same format it returns, and it will return the same data frame back, while
checking to ensure the .chain
, .iteration
, and .draw
columns are all
integers (converting if possible) and that the .draw
column is unique. This allows
you to pass already-tidy-format data frames into other tidybayes functions, like
spread_draws()
or gather_draws()
. This functionality can be useful if the
tidying step is expensive: you can tidy once, possibly subsetting to some
particular variables of interest, then call spread_draws()
or gather_draws()
repeatedly to extract variables and indices from the already-tidied data frame.
A data frame (actually, a tibble) with a .chain
column,
.iteration
column, .draw
column, and one column for every variable in model
.
Matthew Kay
spread_draws()
or gather_draws()
, which use this function
internally and provides a friendly interface for extracting tidy data frames from model fits.
library(magrittr) data(line, package = "coda") line %>% tidy_draws()
library(magrittr) data(line, package = "coda") line %>% tidy_draws()
Deprecated functions, arguments, and column names and their alternatives are listed below. Many of the deprecations are due to a naming scheme overhaul in tidybayes version 1.0 (see Deprecated Functions and Deprecated Arguments and Column Names below) or due to the deprecation of horizontal shortcut geoms and stats in tidybayes 2.1 (see Deprecated Horizontal Shortcut Geoms and Stats).
Several deprecated versions of functions use slightly different output
formats (e.g., they use names like term
and estimate
where new
functions use .variable
and .value
; or they set .iteration
even
when iteration information is not available — new functions always set .draw
but may not set .iteration
), so be careful when upgrading to new function names.
See Deprecated Arguments and Column Names, below, for more information.
Functions deprecated in tidybayes 3.0:
fitted_draws
and add_fitted_draws
are deprecated because their names
were confusing: it was unclear to many users if these functions returned
draws from the posterior predictive, the mean of the posterior predictive,
or the linear predictor (and depending on model type it might have been
either of the latter). Use epred_draws()
/add_epred_draws()
if you
want the expectation of the posterior predictive and use
linpred_draws()
/add_linpred_draws()
if you want the linear predictor.
Functions deprecated in tidybayes 1.0:
spread_samples
, extract_samples
, and tidy_samples
are
deprecated names for spread_draws()
. The spread/gather terminology
better distinguishes the resulting data frame format, and draws is more
correct terminology than samples for describing multiple realizations from
a posterior distribution.
gather_samples
is a deprecated name for gather_draws()
,
reflecting a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution.
unspread_samples
is a deprecated name for unspread_draws()
,
reflecting a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution.
ungather_samples
is a deprecated name for ungather_draws()
,
reflecting a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution.
fitted_samples
/ add_fitted_samples
are deprecated names for
fitted_draws
/ add_fitted_draws
,
reflecting a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution. (though see
the note above about the deprecation of fitted_draws
in favor of
epred_draws()
and linpred_draws()
).
predicted_samples
/ add_predicted_samples
are deprecated names for
predicted_draws()
/ add_predicted_draws()
,
reflecting a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution.
gather_lsmeans_samples
and gather_emmeans_samples
are deprecated aliases
for gather_emmeans_draws()
. The new name (estimated marginal means) is more
appropriate for Bayesian models than the old name (least-squares means), and reflects the
naming of the newer emmeans
package. It also reflects
a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution.
as_sample_tibble
and as_sample_data_frame
are deprecated aliases
for tidy_draws()
. The original intent of as_sample_tibble
was to be
used primarily internally (hence its less user-friendly name); however, increasingly
I have come across use cases of tidy_draws
that warrant a more user-friendly name.
It also reflects a package-wide move to using draws instead of samples for
describing multiple realizations from a distribution.
ggeye
is deprecated: for a package whose goal is flexible and customizable
visualization, monolithic functions are inflexible and do not sufficiently capitalize on users'
existing knowledge of ggplot; instead, I think it is more flexible to design geoms and stats
that can used within a complete ggplot workflow. stat_eye()
offers a horizontal
eye plot geom that can be used instead of ggeye
.
See the sections below for additional deprecated functions, including horizontal geoms, stats, and point_intervals
geom_eye
, geom_eyeh
, and geom_halfeyeh
are deprecated spellings of stat_eye()
and
stat_halfeye()
from before name standardization of stats and geoms. Use those functions instead.
Due to the introduction of automatic orientation detection in tidybayes 2.1,
shortcut geoms and stats (which end in h
) are no longer necessary, and are
deprecated. In most cases, these can simply be replaced with the same
geom without the h
suffix and they will remain horizontal; e.g.
stat_halfeyeh(...)
can simply be replaced with stat_halfeye(...)
.
If automatic orientation detection fails, override it with the orientation
parameter; e.g. stat_halfeye(orientation = "horizontal")
.
These deprecated stats and geoms include:
stat_eyeh
/ stat_dist_eyeh
stat_halfeyeh
/ stat_dist_halfeyeh
geom_slabh
/ stat_slabh
/ stat_dist_slabh
geom_intervalh
/ stat_intervalh
/ stat_dist_intervalh
geom_pointintervalh
/ stat_pointintervalh
/ stat_dist_pointintervalh
stat_gradientintervalh
/ stat_dist_gradientintervalh
stat_cdfintervalh
/ stat_dist_cdfintervalh
stat_ccdfintervalh
/ stat_dist_ccdfintervalh
geom_dotsh
/ stat_dotsh
/ stat_dist_dotsh
geom_dotsintervalh
/ stat_intervalh
/ stat_dist_intervalh
stat_histintervalh
These functions ending in h
(e.g., point_intervalh
, median_qih
)
used to be needed for use with ggstance::stat_summaryh
, but are
no longer necessary because ggplot2::stat_summary()
supports
automatic orientation detection, so they have been deprecated.
They behave identically to the corresponding function without the h
,
except that when passed a vector, they return a data frame with
x
/xmin
/xmax
instead of y
/ymin
/ymax
.
point_intervalh
mean_qih
/ median_qih
/ mode_qih
mean_hdih
/ median_hdih
/ mode_hdih
mean_hdcih
/ median_hdcih
/ mode_hdcih
Arguments deprecated in tidybayes 3.0 are:
The n
argument is now called ndraws
in predicted_draws()
, linpred_draws()
, etc.
This prevents some bugs due to partial matching of argument names where n
might
be mistaken for newdata
.
The value
argument in linpred_draws()
is now spelled linpred
and defaults to
".linpred"
in the same way that the predicted_draws()
and epred_draws()
functions
work.
The scale
argument in linpred_draws()
is no longer allowed (use transform
instead)
as this naming scheme only made sense when linpred_draws()
was an alias for
fitted_draws()
, which it no longer is (see note above about the deprecation of
fitted_draws()
).
Versions of tidybayes before version 1.0 used a different naming scheme for several arguments and output columns.
Arguments and column names deprecated in tidybayes 1.0 are:
term
is now .variable
estimate
is now .value
pred
is now .prediction
conf.low
is now .lower
conf.high
is now .upper
.prob
is now .width
The .draw
column was added, and should be used instead of .chain
and .iteration
to uniquely identify draws when you do not care about chains. (.chain
and
.iteration
are still provided for identifying draws within chains, if desired).
To translate to/from the old naming scheme in output, use to_broom_names()
and from_broom_names()
.
Many of these names were updated in version 1.0 in order to make terminology more consistent and in order to satisfy these criteria:
Ignore compatibility with broom names on the assumption an adapter function can be created.
Use names that could be compatible with frequentist approaches (hence .width
instead of .prob
).
Always precede with "." to avoid collisions with variable names in models.
No abbreviations (remembering if something is abbreviated or not can be a pain).
No two-word names (multi-word names can always be standardized on and used in documentation, but I think data frame output should be succinct).
Names should be nouns (I made an exception for lower/upper because they are common).
Matthew Kay
Tidybayes supports two classes of models and sample formats: Models/formats that provide prediction functions, and those that do not.
All supported models/formats support the base tidybayes sample extraction functions, such as
tidy_draws()
, spread_draws()
, gather_draws()
, spread_rvars()
, and gather_rvars()
.
These models/formats include:
rstan models
cmdstanr models
brms::brm()
models
rstanarm models
runjags::runjags()
models
rjags::jags.model()
models, if sampled using rjags::coda.samples()
jagsUI::jags()
models
MCMCglmm::MCMCglmm()
models
coda::mcmc()
and coda::mcmc.list()
objects, which are output by several model
types.
posterior::draws objects
Any object with an implementation of posterior::as_draws_df()
or posterior::as_draws()
.
For a list of those available in your environment, run methods(as_draws_df)
or methods(as_draws)
Any object with an implementation of coda::as.mcmc.list()
.
For a list of those available in your environment, run methods(as.mcmc.list)
If you install the tidybayes.rethinking package, models from the rethinking package are also supported.
In addition, the following models support fit and prediction extraction functions, such as
add_epred_draws()
, add_predicted_draws()
, add_linpred_draws()
, add_epred_rvars()
,
add_predicted_rvars()
, and add_linpred_rvars()
:
brms::brm()
models
rstanarm models
any package with implementations of rstantools::posterior_epred()
,
rstantools::posterior_predict()
, or rstantools::posterior_linpred()
that
include an argument called newdata
which takes a data frame of predictors.
If your model type is not in the above list, you may still be able to use the add_draws()
function to turn matrices of predictive draws (or fit draws) into tidy data frames. Or,
you can wrap output from a prediction function in posterior::rvar()
and add it
to a data frame so long as that output is a matrix with draws as rows.
If you install the tidybayes.rethinking package, models from the rethinking package are also supported.
To include basic support for new models, one need only implement the tidy_draws()
generic function
for that model. Alternatively, objects that support posterior::as_draws()
or coda::as.mcmc.list()
will automatically be supported by tidy_draws()
.
To include support for estimation and prediction, one must either implement the epred_draws()
,
predicted_draws()
, and linpred_draws()
functions or their correspond functions from
rstantools: rstantools::posterior_epred()
, rstantools::posterior_predict()
, and
rstantools::posterior_linpred()
. If you take the latter approach, you should include
newdata
and ndraws
arguments that work as documented in predicted_draws()
.
Inverse operations of spread_draws()
and gather_draws()
, giving
results that look like tidy_draws()
.
ungather_draws( data, ..., variable = ".variable", value = ".value", draw_indices = c(".chain", ".iteration", ".draw"), drop_indices = FALSE ) unspread_draws( data, ..., draw_indices = c(".chain", ".iteration", ".draw"), drop_indices = FALSE )
ungather_draws( data, ..., variable = ".variable", value = ".value", draw_indices = c(".chain", ".iteration", ".draw"), drop_indices = FALSE ) unspread_draws( data, ..., draw_indices = c(".chain", ".iteration", ".draw"), drop_indices = FALSE )
data |
A tidy data frame of draws, such as one output by |
... |
Expressions in the form of
|
variable |
The name of the column in |
value |
The name of the column in |
draw_indices |
Character vector of column names that should be treated
as indices of draws. Operations are done within combinations of these values.
The default is |
drop_indices |
Drop the columns specified by |
These functions take symbolic specifications of variable names and dimensions in the same format as
spread_draws()
and gather_draws()
and invert the tidy data frame back into
a data frame whose column names are variables with dimensions in them.
A data frame.
Matthew Kay
spread_draws()
, gather_draws()
, tidy_draws()
.
library(dplyr) data(RankCorr, package = "ggdist") # We can use unspread_draws to allow us to manipulate draws with tidybayes # and then transform the draws into a form we can use with packages like bayesplot. # Here we subset b[i,j] to just values of i in 1:2 and j == 1, then plot with bayesplot RankCorr %>% spread_draws(b[i,j]) %>% filter(i %in% 1:2, j == 1) %>% unspread_draws(b[i,j], drop_indices = TRUE) %>% bayesplot::mcmc_areas() # As another example, we could use compare_levels to plot all pairwise comparisons # of b[1,j] for j in 1:3 RankCorr %>% spread_draws(b[i,j]) %>% filter(i == 1, j %in% 1:3) %>% compare_levels(b, by = j) %>% unspread_draws(b[j], drop_indices = TRUE) %>% bayesplot::mcmc_areas()
library(dplyr) data(RankCorr, package = "ggdist") # We can use unspread_draws to allow us to manipulate draws with tidybayes # and then transform the draws into a form we can use with packages like bayesplot. # Here we subset b[i,j] to just values of i in 1:2 and j == 1, then plot with bayesplot RankCorr %>% spread_draws(b[i,j]) %>% filter(i %in% 1:2, j == 1) %>% unspread_draws(b[i,j], drop_indices = TRUE) %>% bayesplot::mcmc_areas() # As another example, we could use compare_levels to plot all pairwise comparisons # of b[1,j] for j in 1:3 RankCorr %>% spread_draws(b[i,j]) %>% filter(i == 1, j %in% 1:3) %>% compare_levels(b, by = j) %>% unspread_draws(b[j], drop_indices = TRUE) %>% bayesplot::mcmc_areas()
Generates a lookup vector such that x_at_y(x, y)[y] == x
. Particularly useful
for generating lookup tables for nested indices in conjunction with compose_data()
.
x_at_y(x, y, missing = NA)
x_at_y(x, y, missing = NA)
x |
Values in the resulting lookup vector. There should be only
one unique value of |
y |
Keys in the resulting lookup vector. Should be factors or integers. |
missing |
Missing levels from |
x_at_y(x, y)
returns a vector k
such that k[y] == x
. It also
fills in missing values in y
: if y
is an integer, k
will contain
entries for all values from 1
to max(y)
; if y
is a factor,
k
will contain entries for all values from 1
to nlevels(y)
.
Missing values are replaced with missing
(default NA
).
Matthew Kay
library(magrittr) df = data.frame( plot = factor(paste0("p", rep(1:8, times = 2))), site = factor(paste0("s", rep(1:4, each = 2, times = 2))) ) # turns site into a nested index: site[p] gives the site for plot p df %>% compose_data(site = x_at_y(site, plot))
library(magrittr) df = data.frame( plot = factor(paste0("p", rep(1:8, times = 2))), site = factor(paste0("s", rep(1:4, each = 2, times = 2))) ) # turns site into a nested index: site[p] gives the site for plot p df %>% compose_data(site = x_at_y(site, plot))