gpboost.GPModel

class gpboost.GPModel(likelihood='gaussian', group_data=None, group_rand_coef_data=None, ind_effect_group_rand_coef=None, drop_intercept_group_rand_effect=None, gp_coords=None, gp_rand_coef_data=None, cov_function='matern', cov_fct_shape=1.5, gp_approx='none', num_parallel_threads=None, GPU_use=False, matrix_inversion_method='default', weights=None, likelihood_learning_rate=1.0, cov_fct_taper_range=1.0, cov_fct_taper_shape=1.0, num_neighbors=None, vecchia_ordering='random', ind_points_selection='kmeans++', num_ind_points=None, cover_tree_radius=1.0, seed=0, cluster_ids=None, num_data=None, likelihood_additional_param=None, fidelity_specific_mean=True, free_raw_data=False, model_file=None, model_dict=None, vecchia_approx=None, vecchia_pred_type=None, num_neighbors_pred=None)[source]

Bases: object

Class for random effects models (Gaussian processes, grouped random effects, mixed effects models, etc.)

Authors: Fabio Sigrist

__init__(likelihood='gaussian', group_data=None, group_rand_coef_data=None, ind_effect_group_rand_coef=None, drop_intercept_group_rand_effect=None, gp_coords=None, gp_rand_coef_data=None, cov_function='matern', cov_fct_shape=1.5, gp_approx='none', num_parallel_threads=None, GPU_use=False, matrix_inversion_method='default', weights=None, likelihood_learning_rate=1.0, cov_fct_taper_range=1.0, cov_fct_taper_shape=1.0, num_neighbors=None, vecchia_ordering='random', ind_points_selection='kmeans++', num_ind_points=None, cover_tree_radius=1.0, seed=0, cluster_ids=None, num_data=None, likelihood_additional_param=None, fidelity_specific_mean=True, free_raw_data=False, model_file=None, model_dict=None, vecchia_approx=None, vecchia_pred_type=None, num_neighbors_pred=None)[source]

Initialize a GPModel.

Parameters:

likelihood –

likelihood function (distribution) of the response variable. Available options:

”gaussian”

”bernoulli_logit”:

Bernoulli likelihood with a logit link function for binary classification. Aliases: “binary”, “binary_logit”

”bernoulli_probit”:

Bernoulli likelihood with a probit link function for binary classification. Aliases: “binary_probit”

”quasi_bernoulli_logit”:

quasi-Bernoulli likelihood with a logit link function for y in [0,1]. Aliases: “quasi_binary”, “quasi_binary_logit”

”quasi_bernoulli_probit”:

quasi-Bernoulli likelihood with a probit link function for y in [0,1]. Aliases: “quasi_binary_probit”

”binomial_logit”:

Binomial likelihood with a logit link function. The response variable ‘y’ needs to contain proportions of successes / trials, and the ‘weights’ parameter needs to contain the numbers of trials. Aliases: “binomial”

”binomial_probit”:

Binomial likelihood with a probit link function. The response variable ‘y’ needs to contain proportions of successes / trials, and the ‘weights’ parameter needs to contain the numbers of trials

”beta_binomial”:

Beta-binomial likelihood with a logit link function. The response variable ‘y’ needs to contain proportions of successes / trials, and the ‘weights’ parameter needs to contain the numbers of trials. Aliases: “betabinomial”, “beta-binomial”

”poisson”:

Poisson likelihood with a log link function

”negative_binomial”:

Negative binomial likelihood with a log link function (aka “nbinom2”, “negative_binomial_2”). The variance is mu * (mu + r) / r, mu = mean, r = shape, with this parametrization

”negative_binomial_1”:

Negative binomial 1 (aka “nbinom1”) likelihood with a log link function. The variance is mu * (1 + phi), mu = mean, phi = dispersion, with this parametrization

”gamma”:

Gamma likelihood with a log link function

”tweedie”:

Compound Poisson–Gamma Tweedie likelihood with a log link, variance phi * mu**p, and 1.01 < p < 1.99. Both phi and p are estimated.

”tweedie_fixed_p”:

The same Tweedie likelihood with p fixed through ‘likelihood_additional_param’; only phi is estimated. The fixed power is mandatory and must satisfy 1.01 < p < 1.99.

”gpd”: generalized Pareto likelihood for finite positive responses. The log scale parameter equals the latent predictor eta (sum of fixed and random effects), sigma = exp(eta), and the estimated auxiliary parameter is shape > -0.5.

”egpd_power”: Naveau power-carrier extended generalized Pareto likelihood with auxiliary parameters shape and kappa.

”egpd_power_mixture”: Naveau ordered power-mixture carrier with auxiliary parameters shape, kappa1, delta_kappa, and p.

”egpd_beta”: Naveau beta-carrier extended generalized Pareto likelihood with auxiliary parameters shape and delta.

”egpd_power_beta”: Naveau power-beta carrier with auxiliary parameters shape, delta, and kappa.

All GPD/EGPD likelihoods require finite y > 0. Response means exist for shape < 1 and response variances for shape < 0.5.

”lognormal”:

Log-normal likelihood with a log link function

”beta”:

Beta likelihood with a logit link function (parametrization of Ferrari and Cribari-Neto, 2004)

”t”:

t-distribution (e.g., for robust regression)

”t_fix_df”:

t-distribution with the degrees-of-freedom (df) held fixed and not estimated

The degrees-of-freedom (df) can be set via the ‘likelihood_additional_param’ parameter. The default is df = 2

“quantile_regression” / “asymmetric_laplace”: asymmetric Laplace likelihood for quantile regression. The two names are aliases for the same likelihood.

The quantile must be supplied through ‘likelihood_additional_param’ and must be strictly between 0 and 1.

”hurdle_gamma”:

The hurdle (or zero-inflated) gamma likelihood is intended for nonnegative continuous response variables with an excess probability of exact zeros. It combines a point mass at zero with a gamma distribution for positive observations. The log-transformed conditional mean of the positive part equals the sum of fixed and random effects, E(y | y > 0) = mu = exp(F(X) + Zb), and the gamma rate parameter equals gamma / mu, where gamma is the shape parameter. Consequently, the mean of the entire response distribution is E(y) = (1-p0) * mu. Both the zero-probability parameter ‘p0’ and the shape parameter ‘gamma’ are estimated.
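The variance and mean formulas quoted for several of the likelihoods above can be checked numerically. The following is a minimal sketch in plain Python; the function names are hypothetical helpers for illustration and are not part of gpboost.

```python
def var_negative_binomial(mu, r):
    """Variance for "negative_binomial" (nbinom2): mu * (mu + r) / r."""
    return mu * (mu + r) / r


def var_negative_binomial_1(mu, phi):
    """Variance for "negative_binomial_1" (nbinom1): mu * (1 + phi)."""
    return mu * (1.0 + phi)


def var_tweedie(mu, phi, p):
    """Tweedie variance phi * mu**p, with 1.01 < p < 1.99 as stated above."""
    if not 1.01 < p < 1.99:
        raise ValueError("p must satisfy 1.01 < p < 1.99")
    return phi * mu ** p


def mean_hurdle_gamma(mu, p0):
    """Mean of the whole hurdle gamma response: E(y) = (1 - p0) * mu."""
    if not 0.0 <= p0 <= 1.0:
        raise ValueError("p0 must be a probability")
    return (1.0 - p0) * mu


mu = 4.0
print(var_negative_binomial(mu, r=2.0))      # 4 * (4 + 2) / 2
print(var_negative_binomial_1(mu, phi=0.5))  # 4 * (1 + 0.5)
print(var_tweedie(mu, phi=2.0, p=1.5))       # 2 * 4**1.5
print(mean_hurdle_gamma(mu, p0=0.25))        # (1 - 0.25) * 4
```

With the same mean, the two negative binomial parametrizations give different variances (quadratic versus linear in mu), which is why the choice between them matters for overdispersed counts.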

Example

>>> # Grouped random effects model
>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> # Gaussian process model
>>> gp_model = gpb.GPModel(gp_coords=coords, cov_function="matern", cov_fct_shape=1.5, likelihood="gaussian")
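For the binomial-type likelihoods described above, the response must hold proportions of successes and ‘weights’ must hold the numbers of trials. A minimal sketch of that conversion in plain Python follows; `to_binomial_inputs` is a hypothetical helper, not a gpboost function, and the commented-out call at the end only indicates where the results would presumably go.

```python
def to_binomial_inputs(successes, trials):
    """Convert success counts and trial counts into (proportions, weights)."""
    if len(successes) != len(trials):
        raise ValueError("successes and trials must have the same length")
    y, weights = [], []
    for s, n in zip(successes, trials):
        if n <= 0 or not 0 <= s <= n:
            raise ValueError("need trials > 0 and 0 <= successes <= trials")
        y.append(s / n)    # proportion of successes, used as the response 'y'
        weights.append(n)  # number of trials, used as 'weights'
    return y, weights


y, weights = to_binomial_inputs(successes=[3, 0, 10], trials=[10, 5, 10])
print(y)        # [0.3, 0.0, 1.0]
print(weights)  # [10, 5, 10]

# Assumed usage (not run here; 'group' is a placeholder for your grouping data):
# gp_model = gpb.GPModel(group_data=group, likelihood="binomial_logit", weights=weights)
# gp_model.fit(y=y)
```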

Methods

`__init__`([likelihood, group_data, ...])	Initialize a GPModel.
`fit`(y[, X, params, offset, fixed_effects])	Fit / estimate a GPModel by maximizing the marginal likelihood
`get_aux_pars`([std_err, format_pandas])	Get (estimated) auxiliary (additional) parameters of the likelihood such as the shape parameter of a gamma or a negative binomial distribution.
`get_coef`([std_err, format_pandas])	Get (estimated) linear regression coefficients
`get_cov_pars`([std_err, format_pandas])	Get (estimated) covariance parameters
`get_current_neg_log_likelihood`()	Get the current value of the negative log-likelihood
`model_to_dict`([include_response_data])	Convert a GPModel to a dict for saving.
`neg_log_likelihood`(cov_pars, y[, ...])	Evaluate the negative log-likelihood.
`predict`([predict_response, predict_var, ...])	Make predictions for a GPModel.
`predict_training_data_random_effects`([...])	Predict ("estimate") training data random effects.
`save_model`(filename)	Save a GPModel to file.
`set_optim_params`(params)	Set parameters for estimation of the covariance parameters.
`set_prediction_data`([vecchia_pred_type, ...])	Set the data required for making predictions with a GPModel.
`summary`([std_err])	Print summary of fitted model parameters.

fit(y, X=None, params=None, offset=None, fixed_effects=None)[source]

Fit / estimate a GPModel by maximizing the marginal likelihood

Parameters:

y (list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)) – Response variable data
X (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Covariate data for the fixed effects linear regression term (if there is one)
params (dict or None, optional (default=None)) –
Parameters for the estimation / optimization
- trace: bool, optional (default = False)
  If True, information on the progress of the parameter optimization is printed.
- init_cov_pars: numpy array or pandas DataFrame, optional (default = None)
  Initial values for covariance parameters of Gaussian process and random effects (can be None). The order is the same as the order of the parameters in the summary function: first is the error variance (only for “gaussian” likelihood), next follow the variances of the grouped random effects (if there are any, in the order provided in ‘group_data’), and then follow the marginal variance and the range of the Gaussian process. If there are multiple Gaussian processes, then the variances and ranges alternate. If ‘init_cov_pars = None’, an internal choice is used that depends on the likelihood, the random effects type, and the covariance function. If you select the option ‘trace = True’ in the ‘params’ argument, you will see the initial covariance parameters in iteration 0.
- init_coef: numpy array or pandas DataFrame, optional (default = None)
  Initial values for the regression coefficients (if there are any, can be None)
- init_aux_pars: numpy array or pandas DataFrame, optional (default = None)
  Initial values for additional parameters for non-Gaussian likelihoods (e.g., shape parameter of a gamma or negative binomial likelihood) (can be None).
- init_coef_aux_pars_from_iid_model: bool, optional (default = True)
  If True, regression coefficients and auxiliary parameters are initialized from an iid model (only for models with a linear regression term). This option is ignored if init_coef is provided. If init_aux_pars is provided but init_coef is not, only regression coefficients are initialized from an iid model.
- estimate_cov_par_index: list, numpy 1-D array, pandas Series / one-column DataFrame with integer data or None, optional (default = -1)
  This allows for disabling the estimation of some (or all) covariance parameters. If estimate_cov_par_index = -1, all covariance parameters are estimated. If estimate_cov_par_index != -1, this should be a vector with length equal to the number of covariance parameters, and estimate_cov_par_index[i] should be of bool type indicating whether parameter number i is estimated or not. For instance, “estimate_cov_par_index”: [1,1,0] means that the first two covariance parameters are estimated and the last one not. Parameters that are not estimated are kept at their initial values (see ‘init_cov_pars’).
- estimate_aux_pars: bool, optional (default = True)
  If True, any additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of a gamma or negative binomial likelihood).
- optimizer_cov: string, optional (default = “lbfgs”)
  Optimizer used for estimating covariance parameters. Options: “lbfgs”, “gradient_descent”, “fisher_scoring”, “newton”, “nelder_mead”. If there are additional auxiliary parameters for non-Gaussian likelihoods, ‘optimizer_cov’ is also used for those.
- optimizer_coef: string, optional (default = “wls” for Gaussian data and “lbfgs” for other likelihoods)
  Optimizer used for estimating linear regression coefficients, if there are any (for the GPBoost algorithm there are usually none). Options: “gradient_descent”, “lbfgs”, “wls”, “nelder_mead”. Gradient descent steps are done simultaneously with gradient descent steps for the covariance parameters. “wls” refers to doing coordinate descent for the regression coefficients using weighted least squares. If ‘optimizer_cov’ is set to “nelder_mead” or “lbfgs”, ‘optimizer_coef’ is automatically also set to the same value.
- maxit: integer, optional (default = 1000)
  Maximal number of iterations for optimization algorithm. If maxit = -999, internal default values are used.
- delta_rel_conv: double, optional (default = 1e-6 except for “nelder_mead” for which the default is 1e-8)
  Convergence tolerance. The algorithm stops if the relative change in either the (approximate) log-likelihood or the parameters is below this value. If delta_rel_conv = -999, internal default values are used.
- cg_max_num_it: integer, optional (default = 1000)
  Maximal number of iterations for conjugate gradient algorithms. If cg_max_num_it = -999, internal default values are used.
- cg_max_num_it_tridiag: integer, optional (default = 1000)
  Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization. If cg_max_num_it_tridiag = -999, internal default values are used.
- cg_delta_conv: double, optional (default = 1e-2)
  Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for parameter estimation. If cg_delta_conv = -999, internal default values are used.
- num_rand_vec_trace: integer, optional (default = 50)
  Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix. If num_rand_vec_trace = -999, internal default values are used.
- reuse_rand_vec_trace: boolean, optional (default = True)
  If true, random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix are sampled only once at the beginning of Newton’s method for finding the mode in the Laplace approximation and are then reused in later trace approximations. Otherwise they are sampled every time a trace is calculated.
- seed_rand_vec_trace: integer, optional (default = 1)
  Seed number to generate random vectors (e.g., Rademacher).
- cg_preconditioner_type: string, optional
  Type of preconditioner used for conjugate gradient algorithms.
  
  Options for grouped random effects:
  
  ”ssor” (= default): SSOR preconditioner
  
  ”incomplete_cholesky”: zero fill-in incomplete Cholesky factorization
  
  Options for likelihood != “gaussian” and gp_approx == “vecchia” or likelihood == “gaussian” and gp_approx == “vecchia_latent”:
  
  ”vadu” (= default): (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1
  
  ”fitc”: modified predictive process preconditioner for inverting (B^-1 * D * B^-T + W^-1)
  
  ”pivoted_cholesky” (= default): (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma
  
  ”incomplete_cholesky”: zero fill-in incomplete (reverse) Cholesky factorization of (B^T * D^-1 * B + W) using the sparsity pattern of B^T * D^-1 * B approx= Sigma^-1
  
  Options for likelihood != “gaussian” and gp_approx == “full_scale_vecchia”
  
  ”fitc” ( = default): FITC / modified predictive process preconditioner
  
  ”vifdu”: VIF with diagonal update preconditioner
  
  Options for likelihood == “gaussian” and gp_approx == “full_scale_tapering”:
  
  ”fitc” (= default): modified predictive process preconditioner
  
  ”none”: no preconditioner
- fitc_piv_chol_preconditioner_rank: integer, optional
  Rank of the FITC and pivoted Cholesky decomposition preconditioners for iterative methods for Vecchia and VIF approximations (for full_scale_tapering, the same inducing points as in the approximation are used). If fitc_piv_chol_preconditioner_rank = -999, internal default values are used:
  
  200 for the FITC preconditioner
  
  50 for the pivoted Cholesky decomposition preconditioner
- convergence_criterion: string, optional (default = “relative_change_in_log_likelihood”, only relevant for “gradient_descent”, “fisher_scoring”, and “newton”)
  The convergence criterion used for terminating the optimization algorithm. Options: “relative_change_in_log_likelihood” or “relative_change_in_parameters”. If convergence_criterion = “default”, internal default values are used.
- lr_cov: double, optional (default = 0.1 for “gradient_descent” and 1. otherwise, only relevant for “gradient_descent”, “fisher_scoring”, and “newton”)
  Initial learning rate for covariance parameters if a gradient-based optimization method is used.
  
  If lr_cov = -999, internal default values are used (0.1 for “gradient_descent” and 1. otherwise).
  
  If there are additional auxiliary parameters for non-Gaussian likelihoods, ‘lr_cov’ is also used for those.
  
  For “lbfgs”, this is divided by the norm of the gradient in the first iteration.
- lr_coef: double, optional (default = 0.1, only relevant for “gradient_descent”, “fisher_scoring”, and “newton”)
  Learning rate for fixed effect regression coefficients. If lr_coef = -999, internal default values are used.
- use_nesterov_acc: bool, optional (default = True, only relevant for “gradient_descent”)
  If True, Nesterov acceleration is used for gradient descent.
- nesterov_schedule_version: integer, optional (default = 0, only relevant for “gradient_descent”)
  Selects the version of the Nesterov schedule (0 or 1). If nesterov_schedule_version = -999, internal default values are used.
- acc_rate_cov: double, optional (default = 0.5, only relevant for “gradient_descent”)
  Acceleration rate for covariance parameters for Nesterov acceleration. If acc_rate_cov = -999, internal default values are used.
- acc_rate_coef: double, optional (default = 0.5, only relevant for “gradient_descent”)
  Acceleration rate for regression coefficients (if there are any) for Nesterov acceleration. If acc_rate_coef = -999, internal default values are used.
- momentum_offset: integer, optional (default = 2, only relevant for “gradient_descent”)
  Number of iterations for which no momentum is applied in the beginning. If momentum_offset = -999, internal default values are used.
- m_lbfgs: integer, optional (default = 6)
  Number of corrections to approximate the inverse Hessian matrix for the “lbfgs” optimizer. If m_lbfgs = -999, internal default values are used.
- delta_conv_mode_finding: double, optional (default = 1e-8)
  Convergence tolerance in mode finding algorithm for Laplace approximation for non-Gaussian likelihoods. If delta_conv_mode_finding = -999, internal default values are used.
offset (numpy 1-D array or None, optional (default=None)) – Additional fixed effects contributions that are added to the linear predictor (= offset). The length of this vector needs to equal the number of training data points times the number of fixed-effect sets.
fixed_effects (numpy 1-D array or None, optional (default=None)) – This is discontinued. Use the renamed equivalent argument ‘offset’ instead.
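As a sketch of how the ‘params’ options above combine, the following plain-Python helper builds a dict that keeps selected covariance parameters fixed at their initial values. `make_fixed_cov_params` is a hypothetical helper for illustration; only the dict keys come from the parameter list above, and whether plain lists are accepted in place of numpy arrays is an assumption you may need to adapt.

```python
def make_fixed_cov_params(init_cov_pars, fixed_positions, trace=False):
    """Build a 'params' dict that holds some covariance parameters fixed.

    init_cov_pars: initial values, in the order used by the summary function.
    fixed_positions: 0-based positions of the parameters that are NOT estimated.
    """
    n = len(init_cov_pars)
    fixed = set(fixed_positions)
    if any(not 0 <= i < n for i in fixed):
        raise ValueError("fixed position out of range")
    # One entry per covariance parameter: 1 = estimate, 0 = keep at initial value.
    estimate_index = [0 if i in fixed else 1 for i in range(n)]
    return {
        "trace": trace,
        "init_cov_pars": list(init_cov_pars),  # convert to a numpy array if required
        "estimate_cov_par_index": estimate_index,
    }


# Gaussian likelihood with one grouped random effect:
# the order is [error variance, group variance].
params = make_fixed_cov_params(init_cov_pars=[1.0, 0.5], fixed_positions=[0])
print(params["estimate_cov_par_index"])  # [0, 1]

# Assumed usage (not run here): gp_model.fit(y=y, X=X, params=params)
```

Here the error variance stays at its initial value 1.0 while the group variance is estimated starting from 0.5.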

Example

>>> # Grouped random effects model
>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> # Gaussian process model
>>> gp_model = gpb.GPModel(gp_coords=coords, cov_function="matern", cov_fct_shape=1.5, likelihood="gaussian")
>>> gp_model.fit(y=y)
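The ‘num_rand_vec_trace’ and ‘seed_rand_vec_trace’ options above refer to stochastic approximation of a matrix trace. As a rough illustration of the idea (Hutchinson's estimator with Rademacher vectors), here is a plain-Python sketch; it is not gpboost's internal implementation, and `hutchinson_trace` is a hypothetical name.

```python
import random


def hutchinson_trace(matrix, num_rand_vec=50, seed=1):
    """Estimate trace(A) as the average of z^T A z over Rademacher vectors z."""
    rng = random.Random(seed)
    n = len(matrix)
    total = 0.0
    for _ in range(num_rand_vec):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # entries are +1 or -1
        az = [sum(matrix[i][j] * z[j] for j in range(n)) for i in range(n)]
        total += sum(z[i] * az[i] for i in range(n))
    return total / num_rand_vec


a = [[2.0, 1.0],
     [1.0, 3.0]]
print(hutchinson_trace(a, num_rand_vec=2000))  # close to the exact trace 5.0
```

More random vectors reduce the variance of the estimate at a higher cost, and a fixed seed makes the estimate reproducible across calls.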

get_aux_pars(std_err=False, format_pandas=True)[source]

Get (estimated) auxiliary (additional) parameters of the likelihood such as the shape parameter of a gamma or a negative binomial distribution. Some likelihoods (e.g., bernoulli_logit or poisson) have no auxiliary parameters

Parameters:

std_err (bool (default=False)) – If True, (approximate) standard errors are calculated
format_pandas (bool (default=True)) – If True, a pandas DataFrame is returned, otherwise a numpy array is returned

Returns:

result – auxiliary (additional) parameters of the likelihood and standard errors (if std_err=True)

Return type:

numpy array or pandas DataFrame

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gamma")
>>> gp_model.fit(y=y, X=X)
>>> gp_model.get_aux_pars()

get_coef(std_err=False, format_pandas=True)[source]

Get (estimated) linear regression coefficients

Parameters:

std_err (bool (default=False)) – If True, (approximate) standard errors are calculated
format_pandas (bool (default=True)) – If True, a pandas DataFrame is returned, otherwise a numpy array is returned

Returns:

result – (estimated) linear regression coefficients and standard errors (if std_err=True)

Return type:

numpy array or pandas DataFrame

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> gp_model.get_coef()

get_cov_pars(std_err=False, format_pandas=True)[source]

Get (estimated) covariance parameters

Parameters:

std_err (bool (default=False)) – If True, (approximate) standard errors are calculated
format_pandas (bool (default=True)) – If True, a pandas DataFrame is returned, otherwise a numpy array is returned

Returns:

result – (estimated) covariance parameters and standard errors (if std_err=True)

Return type:

numpy array or pandas DataFrame

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> gp_model.get_cov_pars()

get_current_neg_log_likelihood()[source]

Get the current value of the negative log-likelihood

Returns:

result

Return type:

the current value of the negative log-likelihood

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y)
>>> gp_model.get_current_neg_log_likelihood()

model_to_dict(include_response_data=True)[source]

Convert a GPModel to a dict for saving.

Parameters:

include_response_data (bool (default=True)) – If True, the response variable data is also included in the dict

Returns:

model_dict – GPModel in dict format.

Return type:

dict

neg_log_likelihood(cov_pars, y, fixed_effects=None, aux_pars=None)[source]

Evaluate the negative log-likelihood. If there is a linear fixed effects predictor term, this needs to be calculated “manually” prior to calling this function (see example below)

Parameters:

cov_pars (list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)) – Covariance parameters of Gaussian process and random effects
y (list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)) – Response variable data
fixed_effects (numpy 1-D array or None, optional (default=None)) – A vector with fixed effects, e.g., containing a linear predictor. The length of this vector needs to equal the number of training data points times the number of fixed-effect sets.
aux_pars (numpy array or pandas DataFrame, optional (default = None)) – Additional parameters for non-Gaussian likelihoods (e.g., shape parameter of a gamma or negative binomial likelihood) (can be None)

Returns:

result

Return type:

the value of the negative log-likelihood

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> coef = [0, 0.1]
>>> fixed_effects = X.dot(coef)
>>> gp_model.neg_log_likelihood(y=y, cov_pars=[1.,1.], fixed_effects=fixed_effects)

predict(predict_response=True, predict_var=False, predict_cov_mat=False, sample_posterior=False, sample_prior=False, num_post_samples=100, num_prior_samples=100, y=None, cov_pars=None, group_data_pred=None, group_rand_coef_data_pred=None, gp_coords_pred=None, gp_rand_coef_data_pred=None, cluster_ids_pred=None, X_pred=None, use_saved_data=False, offset=None, offset_pred=None, fixed_effects=None, fixed_effects_pred=None, vecchia_pred_type=None, num_neighbors_pred=None)[source]

Make predictions for a GPModel.

Parameters:

predict_response (bool (default=True)) – If True, the response variable (label) is predicted, otherwise the latent random effects
predict_var (bool (default=False)) – If True, the (posterior) predictive variances are calculated
predict_cov_mat (bool (default=False)) – If True, the (posterior) predictive covariance is calculated in addition to the (posterior) predictive mean
sample_posterior (bool (default=False)) – If True, samples from the posterior are drawn
sample_prior (bool (default=False)) – If True, samples from the prior are drawn
num_post_samples (integer (default=100)) – Number of posterior samples to draw if ‘sample_posterior=True’
num_prior_samples (integer (default=100)) – Number of prior samples to draw if ‘sample_prior=True’
y (list, numpy 1-D array, pandas Series / one-column DataFrame or None, optional (default=None)) – Observed response variable data (can be None, e.g. when the model has been estimated already and the same data is used for making predictions)
cov_pars (numpy array or None, optional (default = None)) – A vector containing covariance parameters which are used if the gp_model has not been trained or if predictions should be made for other parameters than the estimated ones
group_data_pred (numpy array or pandas DataFrame with numeric or string data or None, optional (default=None)) – The elements are group levels for which predictions are made (if there are any grouped random effects in the model)
group_rand_coef_data_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Covariate data for grouped random coefficients (if there are some in the model)
gp_coords_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Prediction coordinates (=features) for Gaussian process (if there is a GP in the model)
gp_rand_coef_data_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Covariate data for Gaussian process random coefficients (if there are some in the model)
cluster_ids_pred (list, numpy 1-D array, pandas Series / one-column DataFrame with numeric or string data or None, optional (default=None)) – The elements indicating independent realizations of random effects / Gaussian processes for which predictions are made (set to None if you have not specified this when creating the model)
X_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Prediction covariate data for the fixed effects linear regression term (if there is one)
use_saved_data (bool (default=False)) – If True, predictions are made using data set beforehand via the function ‘set_prediction_data’ (this option is not used by users directly)
offset (numpy 1-D array or None, optional (default=None)) – Additional fixed effects contributions that are added to the linear predictor (= offset). The length of this vector needs to equal the number of training data points times the number of fixed-effect sets.
offset_pred (numpy 1-D array or None, optional (default=None)) – Additional fixed effects contributions that are added to the linear predictor for the prediction points (= offset). The length of this vector needs to equal the number of prediction points times the number of fixed-effect sets.
fixed_effects (numpy 1-D array or None, optional (default=None)) – This is discontinued. Use the renamed equivalent argument ‘offset’ instead
fixed_effects_pred (numpy 1-D array or None, optional (default=None)) – This is discontinued. Use the renamed equivalent argument ‘offset_pred’ instead
vecchia_pred_type (string, optional (default=None)) – The type of Vecchia approximation used for making predictions. This is discontinued here. Use the function ‘set_prediction_data’ to specify this
num_neighbors_pred (integer or None, optional (default=None)) – The number of neighbors for making predictions. This is discontinued here. Use the function ‘set_prediction_data’ to specify this

Returns:

result –

‘mu’ (first entry):

Predictive (=posterior) mean. For (generalized) linear mixed effects models, i.e., models with a linear regression term, this consists of the sum of fixed effects and random effects predictions
‘cov’ (second entry):

Predictive (=posterior) covariance matrix. This is None if ‘predict_cov_mat=False’
‘var’ (third entry):

Predictive (=posterior) variances. This is None if ‘predict_var=False’

Return type:

a dict with three entries having numpy arrays as values

Example

>>> # Grouped random effects model
>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> pred = gp_model.predict(X_pred=X_test, group_data_pred=group_test,
...                         predict_var=True, predict_response=False)
>>> print(pred['mu']) # Predicted latent mean
>>> print(pred['var']) # Predicted latent variance
>>> # Gaussian process model
>>> gp_model = gpb.GPModel(gp_coords=coords, cov_function="matern", cov_fct_shape=1.5, likelihood="gaussian")
>>> gp_model.fit(y=y)
>>> pred = gp_model.predict(gp_coords_pred=coords_test,
...                         predict_var=True, predict_response=False)
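Given the ‘mu’ and ‘var’ entries described above, approximate pointwise prediction intervals can be formed. The sketch below assumes an approximately Gaussian predictive distribution, which is an assumption of this example rather than a statement from the documentation. `pred` is a hand-made stand-in for the returned dict (plain lists instead of numpy arrays), and `prediction_intervals` is a hypothetical helper.

```python
import math


def prediction_intervals(pred, z=1.96):
    """Return (lower, upper) bounds mu -/+ z * sqrt(var) for each prediction point."""
    if pred.get("var") is None:
        raise ValueError("variances are missing; predict with predict_var=True")
    lower, upper = [], []
    for mu, var in zip(pred["mu"], pred["var"]):
        half_width = z * math.sqrt(var)
        lower.append(mu - half_width)
        upper.append(mu + half_width)
    return lower, upper


# Stand-in for the output of gp_model.predict(..., predict_var=True)
pred = {"mu": [1.0, 0.0], "cov": None, "var": [4.0, 0.25]}
lower, upper = prediction_intervals(pred)
print(lower)  # approximately [-2.92, -0.98]
print(upper)  # approximately [4.92, 0.98]
```

With z = 1.96 these are roughly 95% intervals under the Gaussian assumption; for non-Gaussian likelihoods with predict_response=True, that assumption is generally less appropriate.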

predict_training_data_random_effects(predict_var=False, offset=None)[source]

Predict (“estimate”) training data random effects.

Parameters:

predict_var (bool (default=False)) – If True, the (posterior) predictive variances are calculated
offset (numpy array or None, optional (default=None)) – Fixed effects for the training data. For likelihoods with multiple fixed-effect sets, this must contain one block per fixed-effect set.

Returns:

result

Return type:

a matrix with predicted (“estimated”) training data random effects

Example

>>> # Grouped random effects model
>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> all_training_data_random_effects = gp_model.predict_training_data_random_effects()
>>> # The function 'predict_training_data_random_effects' returns predicted random effects for all data points.
>>> # Unique random effects for every group can be obtained as follows
>>> first_occurences = [np.where(group==i)[0][0] for i in np.unique(group)]
>>> training_data_random_effects = all_training_data_random_effects.iloc[first_occurences]
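The first-occurrence logic in the example above can also be written without numpy or pandas. This is a sketch on made-up values; `unique_group_effects` is a hypothetical helper, and the inputs are plain lists rather than the objects gpboost returns.

```python
def unique_group_effects(group, random_effects):
    """Map each group level to the random effect at its first occurrence."""
    if len(group) != len(random_effects):
        raise ValueError("group and random_effects must have the same length")
    effects = {}
    for level, effect in zip(group, random_effects):
        # setdefault keeps the value stored at the first occurrence of each level
        effects.setdefault(level, effect)
    return effects


group = ["a", "b", "a", "c", "b"]
random_effects = [0.5, -0.2, 0.5, 1.0, -0.2]  # one value per data point
print(unique_group_effects(group, random_effects))
# {'a': 0.5, 'b': -0.2, 'c': 1.0}
```

Taking the first occurrence is sufficient in this single-grouping setting because all data points in the same group share the same predicted random effect.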

save_model(filename)[source]

Save a GPModel to file.

Parameters:

filename (string) – Filename to save a GPModel.

Returns:

self – Returns self.

Return type:

GPModel

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> gp_model.save_model('gp_model.json')

set_optim_params(params)[source]

Set parameters for estimation of the covariance parameters.

Parameters:

params (dict or None, optional (default=None)) –

Parameters for the estimation / optimization

trace: bool, optional (default = False)
If True, information on the progress of the parameter optimization is printed.

init_cov_pars: numpy array or pandas DataFrame, optional (default = None)
Initial values for covariance parameters of Gaussian process and random effects (can be None). The order is the same as the order of the parameters in the summary function: first is the error variance (only for “gaussian” likelihood), next follow the variances of the grouped random effects (if there are any, in the order provided in ‘group_data’), and then follow the marginal variance and the range of the Gaussian process. If there are multiple Gaussian processes, then the variances and ranges alternate. If ‘init_cov_pars = None’, an internal choice is used that depends on the likelihood, the random effects type, and the covariance function. If you select the option ‘trace = True’ in the ‘params’ argument, you will see the initial covariance parameters in iteration 0.

init_coef: numpy array or pandas DataFrame, optional (default = None)
Initial values for the regression coefficients (if there are any, can be None)

init_aux_pars: numpy array or pandas DataFrame, optional (default = None)
Initial values for additional parameters for non-Gaussian likelihoods (e.g., shape parameter of a gamma or negative binomial likelihood) (can be None).

init_coef_aux_pars_from_iid_model: bool, optional (default = True)
If True, regression coefficients and auxiliary parameters are initialized from an iid model (only for models with a linear regression term). This option is ignored if init_coef is provided. If init_aux_pars is provided but init_coef is not, only regression coefficients are initialized from an iid model.

estimate_cov_par_index: list, numpy 1-D array, pandas Series / one-column DataFrame with integer data or None, optional (default = -1)
This allows for disabling the estimation of some (or all) covariance parameters. If estimate_cov_par_index = -1, all covariance parameters are estimated. If estimate_cov_par_index != -1, this should be a vector with length equal to the number of covariance parameters, and estimate_cov_par_index[i] should be of bool type indicating whether parameter number i is estimated or not. For instance, “estimate_cov_par_index”: [1,1,0] means that the first two covariance parameters are estimated and the last one not. Parameters that are not estimated are kept at their initial values (see ‘init_cov_pars’).

estimate_aux_pars: bool, optional (default = True)
If True, any additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of a gamma or negative binomial likelihood).

optimizer_cov: string, optional (default = “lbfgs”)
Optimizer used for estimating covariance parameters. Options: “lbfgs”, “gradient_descent”, “fisher_scoring”, “newton”, “nelder_mead”. If there are additional auxiliary parameters for non-Gaussian likelihoods, ‘optimizer_cov’ is also used for those.

optimizer_coef: string, optional (default = “wls” for Gaussian data and “lbfgs” for other likelihoods)
Optimizer used for estimating linear regression coefficients, if there are any (for the GPBoost algorithm there are usually none). Options: “gradient_descent”, “lbfgs”, “wls”, “nelder_mead”. Gradient descent steps are done simultaneously with gradient descent steps for the covariance parameters. “wls” refers to doing coordinate descent for the regression coefficients using weighted least squares. If ‘optimizer_cov’ is set to “nelder_mead” or “lbfgs”, ‘optimizer_coef’ is automatically also set to the same value.

maxit: integer, optional (default = 1000)
Maximal number of iterations for optimization algorithm. If maxit = -999, internal default values are used.

delta_rel_conv: double, optional (default = 1e-6 except for “nelder_mead” for which the default is 1e-8)
Convergence tolerance. The algorithm stops if the relative change in either the (approximate) log-likelihood or the parameters is below this value. If delta_rel_conv = -999, internal default values are used.
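
A relative-change stopping rule of this kind can be sketched as follows. The function has_converged is hypothetical, and the exact formula used internally by GPBoost may differ (for example in how a value of zero is handled); the sketch only shows the general idea for the log-likelihood criterion.

```python
def has_converged(old_value, new_value, delta_rel_conv=1e-6):
    """Check a relative-change criterion between two successive values.

    Hypothetical helper; an assumption about the general form of the rule,
    not GPBoost's exact implementation.
    """
    # Guard against division by zero when the previous value is exactly 0.
    scale = abs(old_value) if old_value != 0.0 else 1.0
    return abs(new_value - old_value) / scale < delta_rel_conv


# Successive (approximate) log-likelihood values from two iterations.
print(has_converged(-1000.0, -1000.0005))  # relative change of about 5e-7
print(has_converged(-1000.0, -999.0))      # relative change of about 1e-3
```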

cg_max_num_it: integer, optional (default = 1000)
Maximal number of iterations for conjugate gradient algorithms. If cg_max_num_it = -999, internal default values are used.

cg_max_num_it_tridiag: integer, optional (default = 1000)
Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization. If cg_max_num_it_tridiag = -999, internal default values are used.

cg_delta_conv: double, optional (default = 1e-2)
Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for parameter estimation. If cg_delta_conv = -999, internal default values are used.
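
To show how an iteration limit and a residual-norm tolerance interact, here is a minimal textbook conjugate gradient solver in pure Python. It is a sketch under stated assumptions, not GPBoost's solver: it has no preconditioner, it compares the absolute L2 norm of the residual with the tolerance, and only the argument names cg_delta_conv and cg_max_num_it are borrowed from the parameters above.

```python
import math


def conjugate_gradient(A, b, cg_delta_conv=1e-2, cg_max_num_it=1000):
    """Solve A x = b for a symmetric positive definite matrix A (list of lists)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the starting point x = 0
    p = list(r)  # first search direction
    rs_old = sum(ri * ri for ri in r)
    num_it = 0
    # Stop when the L2 norm of the residual is small enough or the limit is hit.
    while math.sqrt(rs_old) > cg_delta_conv and num_it < cg_max_num_it:
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
        num_it += 1
    return x, num_it


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, num_it = conjugate_gradient(A, b, cg_delta_conv=1e-8)
print(x, num_it)
```

A looser tolerance such as the default 1e-2 stops earlier and trades accuracy for speed, which is why a separate, tighter tolerance can make sense for prediction.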

num_rand_vec_trace: integer, optional (default = 50)
Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix. If num_rand_vec_trace = -999, internal default values are used.

reuse_rand_vec_trace: boolean, optional (default = True)
If true, random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix are sampled only once at the beginning of Newton’s method for finding the mode in the Laplace approximation and are then reused in later trace approximations. Otherwise they are sampled every time a trace is calculated.

seed_rand_vec_trace: integer, optional (default = 1)
Seed number to generate random vectors (e.g., Rademacher).
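
The stochastic trace approximation referred to by the three parameters above can be sketched with a Hutchinson-type estimator: the trace of A is approximated by the average of z^T A z over random Rademacher vectors z. The function below is a hypothetical pure-Python illustration that reuses the parameter names as arguments; it is not GPBoost's implementation.

```python
import random


def stochastic_trace(A, num_rand_vec_trace=50, seed_rand_vec_trace=1):
    """Estimate trace(A) as the mean of z^T A z over Rademacher vectors z."""
    rng = random.Random(seed_rand_vec_trace)  # fixed seed for reproducibility
    n = len(A)
    total = 0.0
    for _ in range(num_rand_vec_trace):
        # Rademacher vector: entries are +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        total += sum(z[i] * Az[i] for i in range(n))
    return total / num_rand_vec_trace


A = [
    [2.0, 0.5, 0.0],
    [0.5, 3.0, 0.1],
    [0.0, 0.1, 1.0],
]
estimate = stochastic_trace(A)
exact = sum(A[i][i] for i in range(len(A)))
print(estimate, exact)
```

More random vectors reduce the variance of the estimate at a proportional cost in matrix-vector products. Fixing the seed makes repeated calls return the same estimate.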

cg_preconditioner_type: string, optional
Type of preconditioner used for conjugate gradient algorithms.

Options for grouped random effects:

”ssor” (= default): SSOR preconditioner

”incomplete_cholesky”: zero fill-in incomplete Cholesky factorization

Options for likelihood != “gaussian” and gp_approx == “vecchia” or likelihood == “gaussian” and gp_approx == “vecchia_latent”:

”vadu” (= default): (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1

”fitc”: modified predictive process preconditioner for inverting (B^-1 * D * B^-T + W^-1)

”pivoted_cholesky” (= default): (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma

”incomplete_cholesky”: zero fill-in incomplete (reverse) Cholesky factorization of (B^T * D^-1 * B + W) using the sparsity pattern of B^T * D^-1 * B approx= Sigma^-1

Options for likelihood != “gaussian” and gp_approx == “full_scale_vecchia”:

”fitc” (= default): FITC / modified predictive process preconditioner

”vifdu”: VIF with diagonal update preconditioner

Options for likelihood == “gaussian” and gp_approx == “full_scale_tapering”:

”fitc” (= default): modified predictive process preconditioner

”none”: no preconditioner

fitc_piv_chol_preconditioner_rank: integer, optional
Rank of the FITC and pivoted Cholesky decomposition preconditioners for iterative methods for Vecchia and VIF approximations (for full_scale_tapering, the same inducing points as in the approximation are used). If fitc_piv_chol_preconditioner_rank = -999, internal default values are used:

200 for the FITC preconditioner

50 for the pivoted Cholesky decomposition preconditioner

convergence_criterion: string, optional (default = “relative_change_in_log_likelihood”, only relevant for “gradient_descent”, “fisher_scoring”, and “newton”)
The convergence criterion used for terminating the optimization algorithm. Options: “relative_change_in_log_likelihood” or “relative_change_in_parameters”. If convergence_criterion = “default”, internal default values are used.

lr_cov: double, optional (default = 0.1 for “gradient_descent” and 1. otherwise, only relevant for “gradient_descent”, “fisher_scoring”, and “newton”)
Initial learning rate for covariance parameters if a gradient-based optimization method is used.

If lr_cov = -999, internal default values are used (0.1 for “gradient_descent” and 1. otherwise).

If there are additional auxiliary parameters for non-Gaussian likelihoods, ‘lr_cov’ is also used for those.

For “lbfgs”, this is divided by the norm of the gradient in the first iteration.

lr_coef: double, optional (default = 0.1, only relevant for “gradient_descent”, “fisher_scoring”, and “newton”)
Learning rate for fixed effect regression coefficients. If lr_coef = -999, internal default values are used.

use_nesterov_acc: bool, optional (default = True, only relevant for “gradient_descent”)
If True, Nesterov acceleration is used for gradient descent.

nesterov_schedule_version: integer, optional (default = 0, only relevant for “gradient_descent”)
Selects the version of the Nesterov schedule (0 or 1). If nesterov_schedule_version = -999, internal default values are used.

acc_rate_cov: double, optional (default = 0.5, only relevant for “gradient_descent”)
Acceleration rate for covariance parameters for Nesterov acceleration. If acc_rate_cov = -999, internal default values are used.

acc_rate_coef: double, optional (default = 0.5, only relevant for “gradient_descent”)
Acceleration rate for regression coefficients (if there are any) for Nesterov acceleration. If acc_rate_coef = -999, internal default values are used.

momentum_offset: integer, optional (default = 2, only relevant for “gradient_descent”)
Number of iterations for which no momentum is applied in the beginning. If momentum_offset = -999, internal default values are used.
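
The interplay of a learning rate, an acceleration rate, and a momentum offset can be sketched with a generic Nesterov-style gradient descent on a one-dimensional problem. This is an assumption-laden illustration: it uses a constant acceleration rate, does not reproduce either of GPBoost's Nesterov schedule versions, and the function gradient_descent is hypothetical.

```python
def gradient_descent(grad, x0, lr=0.1, maxit=200, use_nesterov_acc=True,
                     acc_rate=0.5, momentum_offset=2):
    """Minimize a 1-D function given its gradient (generic sketch)."""
    x_prev = x0
    x = x0
    for it in range(maxit):
        if use_nesterov_acc and it >= momentum_offset:
            # Look-ahead point: extrapolate along the previous step.
            y = x + acc_rate * (x - x_prev)
        else:
            # No momentum during the first 'momentum_offset' iterations.
            y = x
        x_prev = x
        # Gradient step taken from the look-ahead point.
        x = y - lr * grad(y)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_acc = gradient_descent(lambda v: 2.0 * (v - 3.0), x0=0.0)
x_plain = gradient_descent(lambda v: 2.0 * (v - 3.0), x0=0.0, use_nesterov_acc=False)
print(x_acc, x_plain)
```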

m_lbfgs: integer, optional (default = 6)
Number of corrections to approximate the inverse Hessian matrix for the “lbfgs” optimizer. If m_lbfgs = -999, internal default values are used.

delta_conv_mode_finding: double, optional (default = 1e-8)
Convergence tolerance in mode finding algorithm for Laplace approximation for non-Gaussian likelihoods. If delta_conv_mode_finding = -999, internal default values are used.
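
To illustrate what mode finding for a Laplace approximation involves, the sketch below runs Newton's method for a single random effect b with prior N(0, sigma2) and Bernoulli observations with a logit link. It is a toy example under stated assumptions: the function find_mode is hypothetical, and the stopping rule (relative change in the objective) is an assumption about the general form of the criterion, not GPBoost's exact rule.

```python
import math


def find_mode(y, sigma2, delta_conv_mode_finding=1e-8, max_it=100):
    """Find the mode of log p(y | b) + log N(b; 0, sigma2) by Newton's method."""

    def objective(b):
        log_lik = sum(yi * b - math.log1p(math.exp(b)) for yi in y)
        return log_lik - b * b / (2.0 * sigma2)

    b = 0.0
    obj = objective(b)
    for _ in range(max_it):
        p = 1.0 / (1.0 + math.exp(-b))  # success probability under the logit link
        grad = sum(yi - p for yi in y) - b / sigma2
        hess = -len(y) * p * (1.0 - p) - 1.0 / sigma2  # always negative
        b = b - grad / hess  # Newton update
        new_obj = objective(b)
        # Assumed stopping rule: relative change in the objective.
        if abs(new_obj - obj) < delta_conv_mode_finding * abs(obj):
            break
        obj = new_obj
    return b


y = [1, 1, 0, 1]
mode = find_mode(y, sigma2=1.0)
print(mode)
```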

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.set_optim_params(params={"optimizer_cov": "nelder_mead", "trace": True})

set_prediction_data(vecchia_pred_type=None, num_neighbors_pred=None, cg_delta_conv_pred=None, nsim_var_pred=None, rank_pred_approx_matrix_lanczos=None, group_data_pred=None, group_rand_coef_data_pred=None, gp_coords_pred=None, gp_rand_coef_data_pred=None, cluster_ids_pred=None, X_pred=None)[source]

Set the data required for making predictions with a GPModel.

Parameters:

vecchia_pred_type (string, optional (default=None)) –
Type of Vecchia approximation used for making predictions

Default value if vecchia_pred_type = None: “order_obs_first_cond_obs_only”

Available options:
- ”order_obs_first_cond_obs_only”:
  
  Vecchia approximation for the observable process and observed training data is ordered first and the neighbors are only observed training data points
- ”order_obs_first_cond_all”:
  
  Vecchia approximation for the observable process and observed training data is ordered first and the neighbors are selected among all points (training + prediction)
- ”latent_order_obs_first_cond_obs_only”:
  
  Vecchia approximation for the latent process and observed data is ordered first and neighbors are only observed points
- ”latent_order_obs_first_cond_all”:
  
  Vecchia approximation for the latent process and observed data is ordered first and neighbors are selected among all points
- ”order_pred_first”:
  
  Vecchia approximation for the observable process and prediction data is ordered first for making predictions. This option is only available for Gaussian likelihoods
num_neighbors_pred (integer or None, optional (default=None)) –
Number of neighbors for the Vecchia approximation for making predictions

Default value if None: num_neighbors_pred = 2 * num_neighbors
cg_delta_conv_pred (double or None, optional (default=None)) –
Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for prediction

Default value if None: 1e-3
nsim_var_pred (integer or None, optional (default=None)) –
The number of samples when simulation is used for calculating predictive variances

Internal default values if None:
- 500 for grouped random effects
- 1000 for gp_approx = “vecchia” and gp_approx = “full_scale_tapering”
- 100 for gp_approx = “full_scale_vecchia”
rank_pred_approx_matrix_lanczos (integer or None, optional (default=None)) –
The rank of the matrix for approximating predictive covariances obtained using the Lanczos algorithm

Default value if None: 1000
group_data_pred (numpy array or pandas DataFrame with numeric or string data or None, optional (default=None)) – The elements are group levels for which predictions are made (if there are any grouped random effects in the model)
group_rand_coef_data_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Covariate data for grouped random coefficients (if there are some in the model)
gp_coords_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Prediction coordinates (=features) for Gaussian process (if there is a GP in the model)
gp_rand_coef_data_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Covariate data for Gaussian process random coefficients (if there are some in the model)
cluster_ids_pred (list, numpy 1-D array, pandas Series / one-column DataFrame with numeric or string data or None, optional (default=None)) – The elements indicating independent realizations of random effects / Gaussian processes for which predictions are made (set to None if you have not specified this when creating the model)
X_pred (numpy array or pandas DataFrame with numeric data or None, optional (default=None)) – Prediction covariate data for the fixed effects linear regression term (if there is one)
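
The fallback values listed above for arguments left at None can be summarized in a short sketch. The function resolve_prediction_defaults and the model_type labels are hypothetical and exist only for illustration; the numbers are the defaults quoted in the parameter descriptions, and GPBoost resolves them internally.

```python
# Hypothetical lookup table; the values are the internal defaults quoted above.
NSIM_VAR_PRED_DEFAULTS = {
    "grouped": 500,
    "vecchia": 1000,
    "full_scale_tapering": 1000,
    "full_scale_vecchia": 100,
}


def resolve_prediction_defaults(model_type, num_neighbors=None,
                                vecchia_pred_type=None,
                                num_neighbors_pred=None,
                                cg_delta_conv_pred=None,
                                nsim_var_pred=None,
                                rank_pred_approx_matrix_lanczos=None):
    """Fill in the documented defaults for arguments left at None."""
    if vecchia_pred_type is None:
        vecchia_pred_type = "order_obs_first_cond_obs_only"
    if num_neighbors_pred is None and num_neighbors is not None:
        num_neighbors_pred = 2 * num_neighbors
    if cg_delta_conv_pred is None:
        cg_delta_conv_pred = 1e-3
    if nsim_var_pred is None:
        # Stays None for model types without a documented default.
        nsim_var_pred = NSIM_VAR_PRED_DEFAULTS.get(model_type)
    if rank_pred_approx_matrix_lanczos is None:
        rank_pred_approx_matrix_lanczos = 1000
    return {
        "vecchia_pred_type": vecchia_pred_type,
        "num_neighbors_pred": num_neighbors_pred,
        "cg_delta_conv_pred": cg_delta_conv_pred,
        "nsim_var_pred": nsim_var_pred,
        "rank_pred_approx_matrix_lanczos": rank_pred_approx_matrix_lanczos,
    }


resolved = resolve_prediction_defaults("vecchia", num_neighbors=20)
print(resolved)
```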

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> pred = gp_model.set_prediction_data(group_data_pred=group_valid)

summary(std_err=True)[source]

Print summary of fitted model parameters.

Parameters:: std_err (bool (default=True)) – If True, (approximate) standard errors are calculated

Example

>>> gp_model = gpb.GPModel(group_data=group, likelihood="gaussian")
>>> gp_model.fit(y=y, X=X)
>>> gp_model.summary()