C API

Defines

C_API_DTYPE_FLOAT32 (0)

float32 (single precision float).

C_API_DTYPE_FLOAT64 (1)

float64 (double precision float).

C_API_DTYPE_INT32 (2)

int32.

C_API_DTYPE_INT64 (3)

int64.

C_API_FEATURE_IMPORTANCE_GAIN (1)

Gain type of feature importance.

C_API_FEATURE_IMPORTANCE_SPLIT (0)

Split type of feature importance.

C_API_MATRIX_TYPE_CSC (1)

CSC sparse matrix type.

C_API_MATRIX_TYPE_CSR (0)

CSR sparse matrix type.

C_API_PREDICT_CONTRIB (3)

Predict feature contributions (SHAP values).

C_API_PREDICT_LEAF_INDEX (2)

Predict leaf index.

C_API_PREDICT_NORMAL (0)

Normal prediction, with transform (if needed).

C_API_PREDICT_RAW_SCORE (1)

Predict raw score.

THREAD_LOCAL thread_local

Thread local specifier.
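
As a rough illustration of how a caller might use the data type constants above, the sketch below maps each C_API_DTYPE_* value to an element size in bytes. The constants are redefined locally, with the values listed above, only so that the snippet compiles without the library header. The helper dtype_element_size is hypothetical and not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the defines listed above; redefined here only so the
   sketch compiles on its own. */
#define C_API_DTYPE_FLOAT32 (0)
#define C_API_DTYPE_FLOAT64 (1)
#define C_API_DTYPE_INT32 (2)
#define C_API_DTYPE_INT64 (3)

/* Hypothetical helper: size in bytes of one element of the given data type,
   or 0 if the constant is not recognized. */
static size_t dtype_element_size(int dtype) {
    switch (dtype) {
        case C_API_DTYPE_FLOAT32: return sizeof(float);
        case C_API_DTYPE_FLOAT64: return sizeof(double);
        case C_API_DTYPE_INT32:   return sizeof(int32_t);
        case C_API_DTYPE_INT64:   return sizeof(int64_t);
        default:                  return 0;
    }
}
```

A caller could use such a helper to size a buffer before passing a void pointer together with its type constant.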

Typedefs

typedef void *BoosterHandle

Handle of booster.

typedef void *DatasetHandle

Handle of dataset.

This file is part of GPBoost, a C++ library for combining boosting with Gaussian process and mixed effects models.

Original work Copyright (c) 2016 Microsoft Corporation. All rights reserved. Modified work Copyright (c) 2020 Fabio Sigrist. All rights reserved.

Licensed under the Apache License, Version 2.0. See the LICENSE file in the project root for license information.

Note

To avoid type conversion on large data, most of the exposed interface supports both float32 and float64, except for the following:

  1. gradient and Hessian;

  2. current score for training and validation data.

The reason is that these are accessed frequently, so type conversion on them can be time-consuming.

typedef void *FastConfigHandle

Handle of FastConfig.

typedef void *REModelHandle

Handle of re_model.

Functions

GPBOOST_C_EXPORT int GPB_CanCalculateStandardErrorsCovPars(REModelHandle handle, int *out)
GPBOOST_C_EXPORT int GPB_CreateREModel(int32_t num_data, const int32_t *cluster_ids_data, const char *re_group_data, int32_t num_re_group, const double *re_group_rand_coef_data, const int32_t *ind_effect_group_rand_coef, int32_t num_re_group_rand_coef, const int *drop_intercept_group_rand_effect, int32_t num_gp, const double *gp_coords_data, const int dim_gp_coords, const double *gp_rand_coef_data, int32_t num_gp_rand_coef, const char *cov_fct, double cov_fct_shape, const char *gp_approx, double cov_fct_taper_range, double cov_fct_taper_shape, int num_neighbors, const char *vecchia_ordering, int num_ind_points, double cover_tree_radius, const char *ind_points_selection, const char *likelihood, double likelihood_additional_param, const char *matrix_inversion_method, int seed, int num_parallel_threads, bool GPU_use, bool has_weights, const double *weights, double likelihood_learning_rate, REModelHandle *out)

Create REModel.

Parameters:
  • num_data – Number of data points

  • cluster_ids_data – IDs / labels indicating independent realizations of random effects / Gaussian processes (same values = same process realization)

  • re_group_data – Labels of group levels for the grouped random effects in column-major format (i.e. first the levels for the first effect, then for the second, etc.). Every group label needs to end with the null character ‘\0’

  • num_re_group – Number of grouped random effects

  • re_group_rand_coef_data – Covariate data for grouped random coefficients

  • ind_effect_group_rand_coef – Indices that relate every random coefficient to a “base” intercept grouped random effect. Counting starts at 1.

  • num_re_group_rand_coef – Number of grouped random coefficients

  • drop_intercept_group_rand_effect – Indicates whether intercept random effects are dropped (only for random coefficients). If drop_intercept_group_rand_effect[k] > 0, the intercept random effect number k is dropped. Only random effects with random slopes can be dropped.

  • num_gp – Number of Gaussian processes (intercept only, random coefficients not counting)

  • gp_coords_data – Coordinates (features) for Gaussian process

  • dim_gp_coords – Dimension of the coordinates (=number of features) for Gaussian process

  • gp_rand_coef_data – Covariate data for Gaussian process random coefficients

  • num_gp_rand_coef – Number of Gaussian process random coefficients

  • cov_fct – Type of covariance function for Gaussian process (GP)

  • cov_fct_shape – Shape parameter of covariance function (= smoothness parameter for Matern and Wendland covariance). This parameter is irrelevant for some covariance functions such as the exponential or Gaussian

  • gp_approx – Type of GP-approximation for handling large data

  • cov_fct_taper_range – Range parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

  • cov_fct_taper_shape – Shape parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

  • num_neighbors – The number of neighbors used in the Vecchia approximation

  • vecchia_ordering – Ordering used in the Vecchia approximation. “none” = no ordering, “random” = random ordering

  • num_ind_points – Number of inducing points / knots for, e.g., a predictive process approximation

  • cover_tree_radius – Radius (= “spatial resolution”) for the cover tree algorithm

  • ind_points_selection – Method for choosing inducing points

  • likelihood – Likelihood function for the observed response variable

  • likelihood_additional_param – Additional parameter for the likelihood which cannot be estimated (e.g., degrees of freedom for likelihood = “t”)

  • matrix_inversion_method – Method which is used for matrix inversion

  • seed – Seed used for model creation (e.g., random ordering in Vecchia approximation)

  • num_parallel_threads – Number of parallel threads for OMP

  • GPU_use – If TRUE, GPU acceleration will be used if supported.

  • has_weights – True, if sample weights should be used

  • weights – Sample weights

  • likelihood_learning_rate – Likelihood learning rate for generalized Bayesian inference (only non-Gaussian likelihoods)

  • out[out] Created REModel

Returns:

0 when succeed, -1 when failure happens
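
The description of re_group_data above (column-major order, every label terminated by ‘\0’) can be sketched as follows. This is a minimal illustration under the assumption that the labels are simply stored back to back in one character buffer. The helper pack_group_labels and its input layout are hypothetical and not part of the API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the API). labels[j * num_data + i] is the
   label of data point i for grouped random effect j, i.e. the input is already
   ordered effect by effect. Returns a malloc'ed buffer that holds all labels
   back to back, each followed by '\0'. The caller frees the buffer. */
static char *pack_group_labels(const char *const *labels, int num_data,
                               int num_re_group, size_t *out_len) {
    size_t total = 0, pos = 0;
    int n = num_data * num_re_group;
    for (int k = 0; k < n; ++k) total += strlen(labels[k]) + 1;
    char *buf = malloc(total > 0 ? total : 1);
    if (buf == NULL) return NULL;
    for (int k = 0; k < n; ++k) {
        size_t len = strlen(labels[k]) + 1; /* include the terminator */
        memcpy(buf + pos, labels[k], len);
        pos += len;
    }
    if (out_len != NULL) *out_len = total;
    return buf;
}
```

For two data points and two grouped random effects with labels {"a", "b"} and {"x", "yy"}, the resulting buffer would contain a, b, x, yy in that order, each followed by a null character.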

GPBOOST_C_EXPORT int GPB_EvalNegLogLikelihood(REModelHandle handle, const double *y_data, double *cov_pars, const double *fixed_effects, double *negll)

Calculate the value of the negative log-likelihood.

Parameters:
  • handle – Handle of REModel

  • y_data – Response variable data

  • cov_pars – Values for covariance parameters of RE components

  • fixed_effects – Fixed effects component of location parameter for observed data (only used for non-Gaussian data). For Gaussian data, this is ignored

  • negll[out] Negative log-likelihood

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetAuxPars(REModelHandle handle, double *aux_pars, char *out_str)

Get additional likelihood parameters (e.g., shape parameter for a gamma likelihood)

Parameters:
  • handle – Handle of REModel

  • aux_pars[out] Additional likelihood parameters (aux_pars_). This vector needs to be pre-allocated

  • out_str[out] Name of the first parameter

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetCGPreconditionerType(REModelHandle handle, char *out_str, int *num_char)

Get name of preconditioner for conjugate gradient algorithm.

Parameters:
  • handle – Handle of REModel

  • out_str[out] Preconditioner name

  • num_char[out] Number of characters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetCoef(REModelHandle handle, double *optim_coef, bool calc_std_dev)

Get / export regression coefficients. Note: You should pre-allocate memory for optim_coef. Its length equals the number of covariates, or twice this if calc_std_dev = true.

Parameters:
  • handle – Handle of REModel

  • optim_coef[out] Optimal regression coefficients

  • calc_std_dev – If true, standard deviations are also exported

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetCovariateData(REModelHandle handle, double *covariate_data)

Return covariate data.

Parameters:
  • handle – Handle of REModel

  • covariate_data[out] covariate data

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetCovPar(REModelHandle handle, double *optim_cov_pars, bool calc_std_dev)

Get covariance parameters. Note: You should pre-allocate memory for optim_cov_pars. Its length equals the number of covariance parameters (num_cov_pars), or twice this if calc_std_dev = true.

Parameters:
  • handle – Handle of REModel

  • optim_cov_pars[out] Optimal covariance parameters

  • calc_std_dev – If true, standard deviations are also exported

Returns:

0 when succeed, -1 when failure happens
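
The pre-allocation rule in the note above (one value per parameter, or two per parameter when standard deviations are requested) can be sketched as a small sizing helper. Both functions below are hypothetical illustrations and not part of the API; the number of parameters is assumed to be known to the caller.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: number of doubles needed for an output buffer that
   holds num_params values, doubled when calc_std_dev is true. */
static size_t param_buffer_len(int num_params, bool calc_std_dev) {
    if (num_params <= 0) return 0;
    return (size_t)num_params * (calc_std_dev ? 2u : 1u);
}

/* Hypothetical helper: zero-initialized buffer of the required length,
   or NULL if the length is 0 or the allocation fails. The caller frees it. */
static double *alloc_param_buffer(int num_params, bool calc_std_dev) {
    size_t len = param_buffer_len(num_params, calc_std_dev);
    return len > 0 ? calloc(len, sizeof(double)) : NULL;
}
```

The same sizing logic would apply to the regression coefficient buffer of GPB_GetCoef, with the number of covariates in place of the number of covariance parameters.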

GPBOOST_C_EXPORT int GPB_GetCurrentNegLogLikelihood(REModelHandle handle, double *negll)

Get the current value of the negative log-likelihood.

Parameters:
  • handle – Handle of REModel

  • negll[out] Negative log-likelihood

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetInitAuxPars(REModelHandle handle, double *aux_pars)

Get initial values for additional likelihood parameters (e.g., shape parameter for a gamma likelihood)

Parameters:
  • handle – Handle of REModel

  • aux_pars[out] Initial values for additional likelihood parameters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetInitCovPar(REModelHandle handle, double *init_cov_pars)

Get initial values for covariance parameters. Note: You should pre-allocate memory for init_cov_pars. Its length equals the number of covariance parameters (num_cov_pars).

Parameters:
  • handle – Handle of REModel

  • init_cov_pars[out] Initial covariance parameters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetLikelihoodName(REModelHandle handle, char *out_str, int *num_char)

Get name of likelihood.

Parameters:
  • handle – Handle of REModel

  • out_str[out] Likelihood name

  • num_char[out] Number of characters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetNumAuxPars(BoosterHandle handle, int *num_aux_pars)

Get number of additional likelihood parameters (e.g., shape parameter for a gamma likelihood)

Parameters:
  • handle – Handle of booster

  • num_aux_pars[out] Number of additional likelihood parameters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetNumCGSteps(BoosterHandle handle, int *num_cg_steps)

Returns the number of CG steps when the CG method was last run.

Parameters:
  • handle – Handle of booster

  • num_cg_steps[out] Number of CG steps

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetNumCGStepsTridiag(BoosterHandle handle, int *num_cg_steps)

Returns the number of CG steps when the CG method was last run for the SLQ method.

Parameters:
  • handle – Handle of booster

  • num_cg_steps[out] Number of CG steps

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetNumIt(REModelHandle handle, int *num_it)

Get / export the number of iterations until convergence. Note: You should pre-allocate memory for num_it (length = 1).

Parameters:
  • handle – Handle of REModel

  • num_it[out] Number of iterations for convergence

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetNumModeFindingSteps(BoosterHandle handle, int *num_cg_steps)

Returns the number of mode finding steps from the last mode finding in a Laplace approximation.

Parameters:
  • handle – Handle of booster

  • num_cg_steps[out] Number of steps

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetOffsetData(REModelHandle handle, double *fixed_effects)

Return offset data.

Parameters:
  • handle – Handle of REModel

  • fixed_effects[out] offset data

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetOptimizerCoef(REModelHandle handle, char *out_str, int *num_char)

Get name of linear regression coefficients optimizer.

Parameters:
  • handle – Handle of REModel

  • out_str[out] Optimizer name

  • num_char[out] Number of characters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetOptimizerCovPars(REModelHandle handle, char *out_str, int *num_char)

Get name of covariance parameter optimizer.

Parameters:
  • handle – Handle of REModel

  • out_str[out] Optimizer name

  • num_char[out] Number of characters

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_GetResponseData(REModelHandle handle, double *response_data)

Return (last used) response variable data.

Parameters:
  • handle – Handle of REModel

  • response_data[out] Response variable data (memory needs to be preallocated)

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_OptimCovPar(REModelHandle handle, const double *y_data, const double *fixed_effects)

Find parameters that minimize the negative log-likelihood (=MLE).

Parameters:
  • handle – Handle of REModel

  • y_data – Response variable data

  • fixed_effects – Fixed effects component of location parameter (only used for non-Gaussian data). For Gaussian data, this is ignored

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_OptimLinRegrCoefCovPar(REModelHandle handle, const double *y_data, const double *covariate_data, int num_covariates, const double *fixed_effects)

Find linear regression coefficients and covariance parameters that minimize the negative log-likelihood (=MLE). Note: You should pre-allocate memory for optim_pars. Its length equals 1 + number of covariance parameters + number of linear regression coefficients and 1.

Parameters:
  • handle – Handle of REModel

  • y_data – Response variable data

  • covariate_data – Covariate (=independent variable, feature) data

  • num_covariates – Number of covariates

  • fixed_effects – Additional fixed effects that are added to the linear predictor (= offset)

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_PredictREModel(REModelHandle handle, const double *y_data, int32_t num_data_pred, double *out_predict, bool predict_cov_mat, bool predict_var, bool predict_response, bool sample_posterior, bool sample_prior, int num_post_samples, int num_prior_samples, const int32_t *cluster_ids_data_pred, const char *re_group_data_pred, const double *re_group_rand_coef_data_pred, double *gp_coords_data_pred, const double *gp_rand_coef_data_pred, const double *cov_pars, const double *covariate_data_pred, bool use_saved_data, const double *fixed_effects, const double *fixed_effects_pred)

Make predictions: calculate the conditional mean and variances or covariance matrix. Note: You should pre-allocate memory for out_predict. Its length equals num_data_pred if only the conditional mean is predicted (predict_cov_mat==false && predict_var==false), num_data_pred * (1 + num_data_pred) if the predictive covariance matrix is also calculated (predict_cov_mat==true), or num_data_pred * 2 if predictive variances are also calculated (predict_var==true).

Parameters:
  • handle – Handle of REModel

  • y_data – Response variable for observed data

  • num_data_pred – Number of data points for which predictions are made

  • out_predict[out] Predictive mean at prediction points followed by the predictive covariance matrix in column-major format (if predict_cov_mat==true) or the predictive variances (if predict_var==true)

  • predict_cov_mat – If true, the predictive/conditional covariance matrix is calculated (default=false) (predict_var and predict_cov_mat cannot be both true)

  • predict_var – If true, the predictive/conditional variances are calculated (default=false) (predict_var and predict_cov_mat cannot be both true)

  • predict_response – If true, the response variable (label) is predicted, otherwise the latent random effects

  • sample_posterior – If true, posterior samples are generated

  • sample_prior – If true, prior samples are generated

  • num_post_samples – Number of posterior samples

  • num_prior_samples – Number of prior samples

  • cluster_ids_data_pred – IDs / labels indicating independent realizations of Gaussian processes (same values = same process realization) for which predictions are to be made

  • re_group_data_pred – Labels of group levels for the grouped random effects in column-major format (i.e. first the levels for the first effect, then for the second, etc.). Every group label needs to end with the null character ‘\0’

  • re_group_rand_coef_data_pred – Covariate data for grouped random coefficients

  • gp_coords_data_pred – Coordinates (features) for Gaussian process

  • gp_rand_coef_data_pred – Covariate data for Gaussian process random coefficients

  • cov_pars – Covariance parameters of RE components

  • covariate_data_pred – Covariate data (=independent variables, features) for prediction

  • use_saved_data – If true, previously set data on groups, coordinates, and covariates are used and some arguments of this function are ignored

  • fixed_effects – Fixed effects component of location parameter for observed data (only used for non-Gaussian data). For Gaussian data, this is ignored

  • fixed_effects_pred – Fixed effects component of location parameter for predicted data (only used for non-Gaussian data)

Returns:

0 when succeed, -1 when failure happens
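
The three cases for the length of out_predict described above can be written as a small sizing function. This is only a sketch of that rule: the helper out_predict_len is hypothetical, and it does not cover any additional output that might be produced when sample_posterior or sample_prior is set, since this section does not state the required length for those cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of doubles to pre-allocate for out_predict.
   Returns -1 for invalid input (negative size, or both flags set, which the
   parameter description above rules out). */
static int64_t out_predict_len(int32_t num_data_pred, bool predict_cov_mat,
                               bool predict_var) {
    int64_t n = num_data_pred; /* widen first to avoid 32-bit overflow */
    if (n < 0 || (predict_cov_mat && predict_var)) return -1;
    if (predict_cov_mat) return n * (1 + n); /* mean + full covariance matrix */
    if (predict_var) return n * 2;           /* mean + variances */
    return n;                                /* mean only */
}
```

Widening to 64 bits before multiplying matters for the covariance case, where the length grows quadratically in num_data_pred.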

GPBOOST_C_EXPORT int GPB_PredictREModelTrainingDataRandomEffects(REModelHandle handle, const double *cov_pars_pred, const double *y_obs, double *out_predict, const double *fixed_effects, bool calc_var)

Predict (“estimate”) training data random effects.

Parameters:
  • handle – Handle of REModel

  • cov_pars_pred – Covariance parameters of components

  • y_obs – Response variable for observed data

  • out_predict[out] Predicted training data random effects and variances if calc_var

  • fixed_effects – Fixed effects component of location parameter for observed data (only used for non-Gaussian data). For Gaussian data, this is ignored

  • calc_var – If true, variances are also calculated

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_REModelFree(REModelHandle handle)

Free space for REModel.

Parameters:
  • handle – Handle of REModel to be freed

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_SetLikelihood(REModelHandle handle, const char *likelihood)

Set the type of likelihood.

Parameters:
  • handle – Handle of REModel

  • likelihood – Likelihood name

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_SetOffsetData(REModelHandle handle, const double *fixed_effects)

Set offset data.

Parameters:
  • handle – Handle of REModel

  • fixed_effects – offset data

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int GPB_SetOptimConfig(REModelHandle handle, double *init_cov_pars, double lr, double acc_rate_cov, int max_iter, double delta_rel_conv, bool use_nesterov_acc, int nesterov_schedule_version, bool trace, const char *optimizer, int momentum_offset, const char *convergence_criterion, int num_covariates, double *init_coef, double lr_coef, double acc_rate_coef, const char *optimizer_coef, int cg_max_num_it, int cg_max_num_it_tridiag, double cg_delta_conv, int num_rand_vec_trace, bool reuse_rand_vec_trace, const char *cg_preconditioner_type, int seed_rand_vec_trace, int piv_chol_rank, double *init_aux_pars, bool estimate_aux_pars, const int *estimate_cov_par_index, int m_lbfgs, double delta_conv_mode_finding)

Set configuration parameters for the optimizer.

Parameters:
  • handle – Handle of REModel

  • init_cov_pars – Initial values for covariance parameters of RE components

  • lr – Learning rate for covariance parameters. If lr <= 0, internal default values are used (0.1 for “gradient_descent” and 1.0 for “fisher_scoring”)

  • acc_rate_cov – Acceleration rate for covariance parameters for Nesterov acceleration (only relevant if nesterov_schedule_version == 0)

  • max_iter – Maximal number of iterations

  • delta_rel_conv – Convergence tolerance. The algorithm stops if the relative change in either the log-likelihood or the parameters is below this value. For “bfgs”, the L2 norm of the gradient is used instead of the relative change in the log-likelihood

  • use_nesterov_acc – Indicates whether Nesterov acceleration is used in the gradient descent for finding the covariance parameters (only used for “gradient_descent”)

  • nesterov_schedule_version – Which version of Nesterov schedule should be used (only relevant if use_nesterov_acc)

  • trace – If true, the value of the gradient is printed for some iterations

  • optimizer – Optimizer for covariance parameters

  • momentum_offset – Number of iterations for which no momentum is applied in the beginning (only relevant if use_nesterov_acc)

  • convergence_criterion – The convergence criterion used for terminating the optimization algorithm. Options: “relative_change_in_log_likelihood” or “relative_change_in_parameters”

  • num_covariates – Number of covariates

  • init_coef – Initial values for the regression coefficients

  • lr_coef – Learning rate for fixed-effect linear coefficients

  • acc_rate_coef – Acceleration rate for coefficients for Nesterov acceleration (only relevant if nesterov_schedule_version == 0)

  • optimizer_coef – Optimizer for linear regression coefficients

  • cg_max_num_it – Maximal number of iterations for conjugate gradient algorithm

  • cg_max_num_it_tridiag – Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization

  • cg_delta_conv – Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for parameter estimation

  • num_rand_vec_trace – Number of random vectors (e.g. Rademacher) for stochastic approximation of the trace of a matrix

  • reuse_rand_vec_trace – If true, random vectors (e.g. Rademacher) for stochastic approximation of the trace of a matrix are sampled only once at the beginning and then reused in later trace approximations, otherwise they are sampled every time a trace is calculated

  • cg_preconditioner_type – Type of preconditioner used for the conjugate gradient algorithm

  • seed_rand_vec_trace – Seed number to generate random vectors (e.g. Rademacher) for stochastic approximation of the trace of a matrix

  • piv_chol_rank – Rank of the pivoted Cholesky decomposition used as preconditioner of the conjugate gradient algorithm

  • init_aux_pars – Initial values for aux_pars_ (e.g., shape parameter of gamma likelihood)

  • estimate_aux_pars – If true, any additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of gamma likelihood)

  • estimate_cov_par_index – If estimate_cov_par_index[0] >= 0, some covariance parameters might not be estimated; estimate_cov_par_index[i] is then interpreted as a boolean that indicates whether covariance parameter i is estimated

  • m_lbfgs – Number of corrections to approximate the inverse Hessian matrix for the lbfgs optimizer

  • delta_conv_mode_finding – Used for checking convergence in mode finding algorithm for non-Gaussian likelihoods

Returns:

0 when succeed, -1 when failure happens
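
Several of the parameters above (num_rand_vec_trace, reuse_rand_vec_trace, seed_rand_vec_trace) refer to a stochastic approximation of the trace of a matrix using random vectors such as Rademacher vectors. The sketch below shows the general idea with a textbook Hutchinson-type estimator on a small dense matrix. It is not the library's implementation: the function names, the dense row-major storage, and the simple linear congruential generator are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator so the sketch is deterministic for a
   given seed. Returns +1.0 or -1.0 (a Rademacher draw). */
static double rademacher_draw(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return (*state >> 31) ? 1.0 : -1.0;
}

/* Hutchinson-type estimate of trace(A) for a dense n x n matrix stored in
   row-major order: the average of z^T A z over num_rand_vec Rademacher
   vectors z. The caller provides z as scratch space of length n. */
static double estimate_trace(const double *a, size_t n, int num_rand_vec,
                             uint32_t seed, double *z) {
    uint32_t state = seed;
    double sum = 0.0;
    if (num_rand_vec <= 0) return 0.0;
    for (int v = 0; v < num_rand_vec; ++v) {
        for (size_t i = 0; i < n; ++i) z[i] = rademacher_draw(&state);
        for (size_t i = 0; i < n; ++i) {
            double row = 0.0;
            for (size_t j = 0; j < n; ++j) row += a[i * n + j] * z[j];
            sum += z[i] * row; /* accumulates z^T A z */
        }
    }
    return sum / num_rand_vec;
}
```

For a diagonal matrix every draw reproduces the trace exactly, because each squared entry of z equals 1. For a general matrix the off-diagonal terms only average out, which is why using more random vectors gives a more accurate estimate.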

GPBOOST_C_EXPORT int GPB_SetPredictionData(REModelHandle handle, int32_t num_data_pred, const int32_t *cluster_ids_data_pred, const char *re_group_data_pred, const double *re_group_rand_coef_data_pred, double *gp_coords_data_pred, const double *gp_rand_coef_data_pred, const double *covariate_data_pred, const char *vecchia_pred_type, int num_neighbors_pred, double cg_delta_conv_pred, int nsim_var_pred, int rank_pred_approx_matrix_lanczos)

Set the data used for making predictions (useful if the same data is used repeatedly, e.g., in validation of GPBoost)

Parameters:
  • handle – Handle of REModel

  • num_data_pred – Number of data points for which predictions are made

  • cluster_ids_data_pred – IDs / labels indicating independent realizations of Gaussian processes (same values = same process realization) for which predictions are to be made

  • re_group_data_pred – Labels of group levels for the grouped random effects in column-major format (i.e. first the levels for the first effect, then for the second, etc.). Every group label needs to end with the null character ‘\0’

  • re_group_rand_coef_data_pred – Covariate data for grouped random coefficients

  • gp_coords_data_pred – Coordinates (features) for Gaussian process

  • gp_rand_coef_data_pred – Covariate data for Gaussian process random coefficients

  • covariate_data_pred – Covariate data (=independent variables, features) for prediction

  • vecchia_pred_type – Type of Vecchia approximation for making predictions. “order_obs_first_cond_obs_only” = observed data is ordered first and neighbors are only observed points, “order_obs_first_cond_all” = observed data is ordered first and neighbors are selected among all points (observed + predicted), “order_pred_first” = predicted data is ordered first for making predictions, “latent_order_obs_first_cond_obs_only” = Vecchia approximation for the latent process and observed data is ordered first and neighbors are only observed points, “latent_order_obs_first_cond_all” = Vecchia approximation for the latent process and observed data is ordered first and neighbors are selected among all points

  • num_neighbors_pred – The number of neighbors used in the Vecchia approximation for making predictions (-1 means that the value already set at initialization is used)

  • cg_delta_conv_pred – Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for prediction

  • nsim_var_pred – Number of samples when simulation is used for calculating predictive variances

  • rank_pred_approx_matrix_lanczos – Rank of the matrix for approximating predictive covariances obtained using the Lanczos algorithm

static char *LastErrorMsg()

Handle of error message.

Returns:

Error message
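
Every exported function in this section reports success as 0 and failure as -1, so callers typically wrap each call in a return-code check. The helper below is a minimal sketch of such a check. It is hypothetical and not part of the API, and it deliberately does not show how to retrieve the error text, since this section does not document a public accessor for it.

```c
#include <stdio.h>

/* Hypothetical helper, not part of the API: returns 1 when rc is 0 (success).
   Otherwise it logs the failing call to stderr and returns 0. */
static int call_succeeded(int rc, const char *call_name) {
    if (rc == 0) return 1;
    fprintf(stderr, "%s failed with return code %d\n", call_name, rc);
    return 0;
}
```

A caller might write `if (!call_succeeded(rc, "GPB_REModelFree")) { /* clean up and bail out */ }` after storing the return value of an API call in rc.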

GPBOOST_C_EXPORT int LGBM_BoosterAddValidData(BoosterHandle handle, const DatasetHandle valid_data)

Add new validation data to booster.

Parameters:
  • handle – Handle of booster

  • valid_data – Validation dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterCalcNumPredict(BoosterHandle handle, int num_row, int predict_type, int start_iteration, int num_iteration, int64_t *out_len)

Get number of predictions.

Parameters:
  • handle – Handle of booster

  • num_row – Number of rows

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • out_len[out] Length of prediction

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterCreate(const DatasetHandle train_data, const char *parameters, BoosterHandle *out)

Create a new boosting learner.

Parameters:
  • train_data – Training dataset

  • parameters – Parameters in format ‘key1=value1 key2=value2’

  • out[out] Handle of created booster

Returns:

0 when succeed, -1 when failure happens
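
The parameters argument is a single string of space-separated key=value pairs. One possible way to assemble such a string from separate keys and values is sketched below. The helper build_params is hypothetical and not part of the API, and the keys used in the usage note are purely illustrative.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "key1=value1 key2=value2 ..." into buf.
   Returns the number of characters written (excluding '\0'), or -1 if the
   buffer is too small. */
static int build_params(char *buf, size_t buf_size, const char *const *keys,
                        const char *const *values, int n) {
    size_t pos = 0;
    if (buf == NULL || buf_size == 0) return -1;
    buf[0] = '\0';
    for (int i = 0; i < n; ++i) {
        /* key + '=' + value, plus a separating space after the first pair */
        size_t need = strlen(keys[i]) + 1 + strlen(values[i]) + (i > 0 ? 1 : 0);
        if (need >= buf_size - pos) return -1; /* keep room for '\0' */
        pos += (size_t)snprintf(buf + pos, buf_size - pos, "%s%s=%s",
                                i > 0 ? " " : "", keys[i], values[i]);
    }
    return (int)pos;
}
```

With keys {"learning_rate", "num_leaves"} and values {"0.1", "31"}, the buffer would hold the string `learning_rate=0.1 num_leaves=31`, which could then be passed as the parameters argument.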

GPBOOST_C_EXPORT int LGBM_BoosterCreateFromModelfile(const char *filename, int *out_num_iterations, BoosterHandle *out)

Load an existing booster from model file.

Parameters:
  • filename – Filename of model

  • out_num_iterations[out] Number of iterations of this booster

  • out[out] Handle of created booster

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterDumpModel(BoosterHandle handle, int start_iteration, int num_iteration, int feature_importance_type, int64_t buffer_len, int64_t *out_len, char *out_str)

Dump model to JSON.

Parameters:
  • handle – Handle of booster

  • start_iteration – Start index of the iteration that should be dumped

  • num_iteration – Index of the iteration that should be dumped, <= 0 means dump all

  • feature_importance_type – Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN

  • buffer_len – String buffer length, if buffer_len < out_len, you should re-allocate buffer

  • out_len[out] Actual output length

  • out_str[out] JSON format string of model, should pre-allocate memory

Returns:

0 when succeed, -1 when failure happens
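
The buffer_len / out_len contract described above suggests a two-step pattern: call once with an initial buffer, and if the reported out_len exceeds buffer_len, re-allocate and call again. The sketch below demonstrates that pattern against a stand-in function, because the real call needs a trained booster. Both stub_dump and dump_with_retry are hypothetical, and the stub assumes that out_len counts the terminating null character.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a dump-style call. It reports the required length
   (including the terminating '\0', which is an assumption) and copies the
   string only if the buffer is large enough. */
static int stub_dump(int64_t buffer_len, int64_t *out_len, char *out_str) {
    static const char json[] = "{\"name\":\"tree\"}";
    *out_len = (int64_t)sizeof(json);
    if (out_str != NULL && buffer_len >= *out_len) {
        memcpy(out_str, json, sizeof(json));
    }
    return 0;
}

/* Calls the stub with an initial guess and retries once with the reported
   length if the guess was too small. The caller frees the result. */
static char *dump_with_retry(int64_t initial_len) {
    int64_t buffer_len = initial_len > 0 ? initial_len : 1;
    int64_t out_len = 0;
    char *buf = malloc((size_t)buffer_len);
    if (buf == NULL) return NULL;
    if (stub_dump(buffer_len, &out_len, buf) != 0) { free(buf); return NULL; }
    if (buffer_len < out_len) {
        char *bigger = realloc(buf, (size_t)out_len);
        if (bigger == NULL) { free(buf); return NULL; }
        buf = bigger;
        if (stub_dump(out_len, &out_len, buf) != 0) { free(buf); return NULL; }
    }
    return buf;
}
```

In real use, the stub would be replaced by the actual API call with the remaining arguments filled in. The retry logic around it would stay the same.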

GPBOOST_C_EXPORT int LGBM_BoosterFeatureImportance(BoosterHandle handle, int num_iteration, int importance_type, double *out_results)

Get model feature importance.

Parameters:
  • handle – Handle of booster

  • num_iteration – Number of iterations for which feature importance is calculated, <= 0 means use all

  • importance_type – Method of importance calculation:

    • C_API_FEATURE_IMPORTANCE_SPLIT: result contains numbers of times the feature is used in a model;

    • C_API_FEATURE_IMPORTANCE_GAIN: result contains total gains of splits which use the feature

  • out_results[out] Result array with feature importance

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterFree(BoosterHandle handle)

Free space for booster.

Parameters:
  • handle – Handle of booster to be freed

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterFreePredictSparse(void *indptr, int32_t *indices, void *data, int indptr_type, int data_type)

Method corresponding to LGBM_BoosterPredictSparseOutput to free the allocated data.

Parameters:
  • indptr – Pointer to output row headers or column headers to be deallocated

  • indices – Pointer to sparse indices to be deallocated

  • data – Pointer to sparse data space to be deallocated

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetCurrentIteration(BoosterHandle handle, int *out_iteration)

Get index of the current boosting iteration.

Parameters:
  • handle – Handle of booster

  • out_iteration[out] Index of the current boosting iteration

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetEval(BoosterHandle handle, int data_idx, int *out_len, double *out_results)

Get evaluation for training data and validation data.

Note

  1. You should call LGBM_BoosterGetEvalNames first to get the names of evaluation datasets.

  2. You should pre-allocate memory for out_results, you can get its length by LGBM_BoosterGetEvalCounts.

Parameters:
  • handle – Handle of booster

  • data_idx – Index of data, 0: training data, 1: 1st validation data, 2: 2nd validation data and so on

  • out_len[out] Length of output result

  • out_results[out] Array with evaluation results

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetEvalCounts(BoosterHandle handle, int *out_len)

Get number of evaluation datasets.

Parameters:
  • handle – Handle of booster

  • out_len[out] Total number of evaluation datasets

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetEvalNames(BoosterHandle handle, const int len, int *out_len, const size_t buffer_len, size_t *out_buffer_len, char **out_strs)

Get names of evaluation datasets.

Parameters:
  • handle – Handle of booster

  • len – Number of char* pointers stored at out_strs. If smaller than the max size, only this many strings are copied

  • out_len[out] Total number of evaluation datasets

  • buffer_len – Size of pre-allocated strings. Content is copied up to buffer_len - 1 and null-terminated

  • out_buffer_len[out] String sizes required to do the full string copies

  • out_strs[out] Names of evaluation datasets, should pre-allocate memory

Returns:

0 when succeed, -1 when failure happens
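
The caller-owned buffer layout described above (an array of len pointers, each pointing at buffer_len bytes) can be sketched as follows. The helper names alloc_name_buffers, free_name_buffers, and names_truncated are hypothetical and not part of the library. The truncation check assumes that out_buffer_len counts the terminating null byte, which this reference does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `len` zero-filled buffers of `buffer_len`
 * bytes each, suitable for passing as out_strs. Returns NULL on failure. */
char **alloc_name_buffers(int len, size_t buffer_len) {
    assert(len > 0 && buffer_len > 0);
    char **strs = calloc((size_t)len, sizeof *strs);
    if (strs == NULL) return NULL;
    for (int i = 0; i < len; ++i) {
        strs[i] = calloc(buffer_len, 1);
        if (strs[i] == NULL) {
            for (int j = 0; j < i; ++j) free(strs[j]);
            free(strs);
            return NULL;
        }
    }
    return strs;
}

/* Hypothetical helper: release buffers made by alloc_name_buffers. */
void free_name_buffers(char **strs, int len) {
    if (strs == NULL) return;
    for (int i = 0; i < len; ++i) free(strs[i]);
    free(strs);
}

/* After the library call: were names dropped (out_len > len) or cut short
 * (out_buffer_len > buffer_len)? Assumes out_buffer_len includes the '\0';
 * if it does not, compare with >= instead. */
int names_truncated(int len, int out_len,
                    size_t buffer_len, size_t out_buffer_len) {
    return out_len > len || out_buffer_len > buffer_len;
}
```

If names_truncated reports 1, a caller would typically free the buffers, allocate again with out_len and out_buffer_len, and repeat the call. The same pattern applies to LGBM_BoosterGetFeatureNames.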

GPBOOST_C_EXPORT int LGBM_BoosterGetFeatureNames(BoosterHandle handle, const int len, int *out_len, const size_t buffer_len, size_t *out_buffer_len, char **out_strs)

Get names of features.

Parameters:
  • handle – Handle of booster

  • len – Number of char* pointers stored at out_strs. If smaller than the max size, only this many strings are copied

  • out_len[out] Total number of features

  • buffer_len – Size of pre-allocated strings. Content is copied up to buffer_len - 1 and null-terminated

  • out_buffer_len[out] String sizes required to do the full string copies

  • out_strs[out] Names of features, should pre-allocate memory

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetLeafValue(BoosterHandle handle, int tree_idx, int leaf_idx, double *out_val)

Get leaf value.

Parameters:
  • handle – Handle of booster

  • tree_idx – Index of tree

  • leaf_idx – Index of leaf

  • out_val[out] Output result from the specified leaf

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetLinear(BoosterHandle handle, bool *out)

Get boolean representing whether booster is fitting linear trees.

Parameters:
  • handle – Handle of booster

  • out[out] The address to hold linear trees indicator

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetLowerBoundValue(BoosterHandle handle, double *out_results)

Get model lower bound value.

Parameters:
  • handle – Handle of booster

  • out_results[out] Result pointing to min value

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetNumClasses(BoosterHandle handle, int *out_len)

Get number of classes.

Parameters:
  • handle – Handle of booster

  • out_len[out] Number of classes

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetNumFeature(BoosterHandle handle, int *out_len)

Get number of features.

Parameters:
  • handle – Handle of booster

  • out_len[out] Total number of features

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetNumPredict(BoosterHandle handle, int data_idx, int64_t *out_len)

Get number of predictions for training data and validation data (this can be used to support customized evaluation functions).

Parameters:
  • handle – Handle of booster

  • data_idx – Index of data, 0: training data, 1: 1st validation data, 2: 2nd validation data and so on

  • out_len[out] Number of predictions

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetPredict(BoosterHandle handle, int data_idx, int64_t *out_len, double *out_result)

Get prediction for training data and validation data.

Note

You should pre-allocate memory for out_result, its length is equal to num_class * num_data.

Parameters:
  • handle – Handle of booster

  • data_idx – Index of data, 0: training data, 1: 1st validation data, 2: 2nd validation data and so on

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterGetUpperBoundValue(BoosterHandle handle, double *out_results)

Get model upper bound value.

Parameters:
  • handle – Handle of booster

  • out_results[out] Result pointing to max value

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterLoadModelFromString(const char *model_str, int *out_num_iterations, BoosterHandle *out)

Load an existing booster from string.

Parameters:
  • model_str – Model string

  • out_num_iterations[out] Number of iterations of this booster

  • out[out] Handle of created booster

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterMerge(BoosterHandle handle, BoosterHandle other_handle)

Merge model from other_handle into handle.

Parameters:
  • handle – Handle of booster, will merge another booster into this one

  • other_handle – Other handle of booster

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterNumberOfTotalModel(BoosterHandle handle, int *out_models)

Get number of weak sub-models.

Parameters:
  • handle – Handle of booster

  • out_models[out] Number of weak sub-models

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterNumModelPerIteration(BoosterHandle handle, int *out_tree_per_iteration)

Get number of trees per iteration.

Parameters:
  • handle – Handle of booster

  • out_tree_per_iteration[out] Number of trees per iteration

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSC(BoosterHandle handle, const void *col_ptr, int col_ptr_type, const int32_t *indices, const void *data, int data_type, int64_t ncol_ptr, int64_t nelem, int64_t num_row, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)

Make prediction for a new dataset in CSC format.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • handle – Handle of booster

  • col_ptr – Pointer to column headers

  • col_ptr_type – Type of col_ptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to row indices

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • ncol_ptr – Number of columns in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • num_row – Number of rows

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens
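
To make the CSC arguments concrete, the sketch below fills col_ptr, indices, and data from a small row-major dense matrix. The function dense_to_csc is a hypothetical helper, not a library function, and it assumes that exact zeros are the entries to leave out of the sparse representation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: fill CSC arrays from a row-major dense matrix,
 * skipping zeros. col_ptr must hold ncol + 1 entries; indices and data
 * must be large enough for all nonzeros. Returns nelem. */
int64_t dense_to_csc(const double *dense, int32_t nrow, int32_t ncol,
                     int32_t *col_ptr, int32_t *indices, double *data) {
    assert(dense && col_ptr && indices && data);
    int64_t nelem = 0;
    for (int32_t c = 0; c < ncol; ++c) {
        col_ptr[c] = (int32_t)nelem;          /* where column c starts */
        for (int32_t r = 0; r < nrow; ++r) {
            double v = dense[(int64_t)r * ncol + c];
            if (v != 0.0) {
                indices[nelem] = r;           /* row index of this entry */
                data[nelem] = v;
                ++nelem;
            }
        }
    }
    col_ptr[ncol] = (int32_t)nelem;           /* one past the last entry */
    return nelem;
}
```

With arrays built this way, col_ptr_type would be C_API_DTYPE_INT32, data_type would be C_API_DTYPE_FLOAT64, ncol_ptr would be ncol + 1, and num_row would be nrow.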

GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSR(BoosterHandle handle, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)

Make prediction for a new dataset in CSR format.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • handle – Handle of booster

  • indptr – Pointer to row headers

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to column indices

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nindptr – Number of rows in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • num_col – Number of columns

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens
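
Since indptr, indices, nindptr, nelem, and num_col must agree with each other, a caller may want to check them before prediction. The sketch below is a hypothetical pre-flight check for the C_API_DTYPE_INT32 case; csr_is_consistent is not a library function, and the library may apply different or additional checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check for CSR inputs with int32 indptr.
 * Returns 1 if the arrays are mutually consistent, 0 otherwise. */
int csr_is_consistent(const int32_t *indptr, int64_t nindptr,
                      const int32_t *indices, int64_t nelem,
                      int64_t num_col) {
    assert(indptr != NULL && (indices != NULL || nelem == 0));
    if (nindptr < 1 || indptr[0] != 0) return 0;
    for (int64_t i = 1; i < nindptr; ++i) {
        if (indptr[i] < indptr[i - 1]) return 0;  /* row headers never decrease */
    }
    if (indptr[nindptr - 1] != nelem) return 0;   /* last header equals nelem */
    for (int64_t k = 0; k < nelem; ++k) {
        if (indices[k] < 0 || indices[k] >= num_col) return 0;
    }
    return 1;
}
```

For the 2 x 3 matrix with rows (1, 0, 2) and (0, 3, 0), the CSR arrays are indptr = {0, 2, 3} and indices = {0, 2, 1}, with nindptr = 3 and nelem = 3.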

GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSRSingleRow(BoosterHandle handle, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)

Make prediction for a new dataset in CSR format. This method re-uses the internal predictor structure from previous calls and is optimized for single row invocation.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • handle – Handle of booster

  • indptr – Pointer to row headers

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to column indices

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nindptr – Number of rows in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • num_col – Number of columns

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSRSingleRowFast(FastConfigHandle fastConfig_handle, const void *indptr, const int indptr_type, const int32_t *indices, const void *data, const int64_t nindptr, const int64_t nelem, int64_t *out_len, double *out_result)

Faster variant of LGBM_BoosterPredictForCSRSingleRow.

Score single rows after setup with LGBM_BoosterPredictForCSRSingleRowFastInit.

Moving the setup steps out of this call allows extra optimizations, such as initializing the config only once instead of once per call.

Note

Setting up the number of threads is only done once at LGBM_BoosterPredictForCSRSingleRowFastInit instead of at each prediction. If you use a different number of threads in other calls, you need to start the setup process over, or that number of threads will be used for these calls as well.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • fastConfig_handle – FastConfig object handle returned by LGBM_BoosterPredictForCSRSingleRowFastInit

  • indptr – Pointer to row headers

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to column indices

  • data – Pointer to the data space

  • nindptr – Number of rows in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSRSingleRowFastInit(BoosterHandle handle, const int predict_type, const int start_iteration, const int num_iteration, const int data_type, const int64_t num_col, const char *parameter, FastConfigHandle *out_fastConfig)

Initialize and return a FastConfigHandle for use with LGBM_BoosterPredictForCSRSingleRowFast.

Release the FastConfig by passing its handle to LGBM_FastConfigFree when no longer needed.

Parameters:
  • handle – Booster handle

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • num_col – Number of columns

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_fastConfig[out] FastConfig object with which you can call LGBM_BoosterPredictForCSRSingleRowFast

Returns:

0 when it succeeds, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForFile(BoosterHandle handle, const char *data_filename, int data_has_header, int predict_type, int start_iteration, int num_iteration, const char *parameter, const char *result_filename)

Make prediction for file.

Parameters:
  • handle – Handle of booster

  • data_filename – Filename of file with data

  • data_has_header – Whether file has header or not

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • result_filename – Filename of result file in which predictions will be written

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForMat(BoosterHandle handle, const void *data, int data_type, int32_t nrow, int32_t ncol, int is_row_major, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)

Make prediction for a new dataset.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • handle – Handle of booster

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nrow – Number of rows

  • ncol – Number of columns

  • is_row_major – 1 for row-major, 0 for column-major

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens
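
The three sizing rules in the note above can be collected in one function. This is a minimal sketch: predict_out_len is a hypothetical helper, the macro values are copied from the Defines section, and num_iteration must be the number of iterations actually used for prediction (not a <= 0 placeholder).

```c
#include <assert.h>
#include <stdint.h>

/* Values as listed under "Defines"; guarded in case the real header is included. */
#ifndef C_API_PREDICT_NORMAL
#define C_API_PREDICT_NORMAL     (0)
#define C_API_PREDICT_RAW_SCORE  (1)
#define C_API_PREDICT_LEAF_INDEX (2)
#define C_API_PREDICT_CONTRIB    (3)
#endif

/* Hypothetical helper: number of doubles to allocate for out_result.
 * Returns -1 for an unknown predict_type. */
int64_t predict_out_len(int predict_type, int64_t num_class, int64_t num_data,
                        int64_t num_iteration, int64_t num_feature) {
    assert(num_class > 0 && num_data >= 0);
    switch (predict_type) {
    case C_API_PREDICT_NORMAL:
    case C_API_PREDICT_RAW_SCORE:
        return num_class * num_data;
    case C_API_PREDICT_LEAF_INDEX:
        return num_class * num_data * num_iteration;
    case C_API_PREDICT_CONTRIB:
        return num_class * num_data * (num_feature + 1);
    default:
        return -1;
    }
}
```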

GPBOOST_C_EXPORT int LGBM_BoosterPredictForMats(BoosterHandle handle, const void **data, int data_type, int32_t nrow, int32_t ncol, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)

Make prediction for a new dataset presented as an array of pointers to rows.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • handle – Handle of booster

  • data – Pointer to an array of row pointers, one per row

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nrow – Number of rows

  • ncol – Number of columns

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens
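
When the rows already live in one contiguous row-major block, the array of row pointers expected by the data argument can be built as sketched below. The function make_row_pointers is a hypothetical helper, and the example assumes double data (C_API_DTYPE_FLOAT64).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: build one pointer per row into a contiguous
 * row-major double matrix. The caller releases the array with free(). */
const void **make_row_pointers(const double *flat, int32_t nrow, int32_t ncol) {
    assert(flat != NULL && nrow > 0 && ncol > 0);
    const void **rows = malloc((size_t)nrow * sizeof *rows);
    if (rows == NULL) return NULL;
    for (int32_t r = 0; r < nrow; ++r) {
        rows[r] = flat + (size_t)r * (size_t)ncol;  /* start of row r */
    }
    return rows;
}
```

Only the pointer array is allocated here; the underlying matrix must stay alive until the prediction call returns.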

GPBOOST_C_EXPORT int LGBM_BoosterPredictForMatSingleRow(BoosterHandle handle, const void *data, int data_type, int ncol, int is_row_major, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)

Make prediction for a new dataset. This method re-uses the internal predictor structure from previous calls and is optimized for single row invocation.

Note

You should pre-allocate memory for out_result:

  • for normal and raw score, its length is equal to num_class * num_data;

  • for leaf index, its length is equal to num_class * num_data * num_iteration;

  • for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).

Parameters:
  • handle – Handle of booster

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • ncol – Number of columns

  • is_row_major – 1 for row-major, 0 for column-major

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle, const void *data, int64_t *out_len, double *out_result)

Faster variant of LGBM_BoosterPredictForMatSingleRow.

Score a single row after setup with LGBM_BoosterPredictForMatSingleRowFastInit.

Moving the setup steps out of this call allows extra optimizations, such as initializing the config only once instead of once per call.

Note

Setting up the number of threads is only done once at LGBM_BoosterPredictForMatSingleRowFastInit instead of at each prediction. If you use a different number of threads in other calls, you need to start the setup process over, or that number of threads will be used for these calls as well.

Parameters:
  • fastConfig_handle – FastConfig object handle returned by LGBM_BoosterPredictForMatSingleRowFastInit

  • data – Single-row array data (row-major form only)

  • out_len[out] Length of output result

  • out_result[out] Pointer to array with predictions

Returns:

0 when it succeeds, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictForMatSingleRowFastInit(BoosterHandle handle, const int predict_type, const int start_iteration, const int num_iteration, const int data_type, const int32_t ncol, const char *parameter, FastConfigHandle *out_fastConfig)

Initialize and return a FastConfigHandle for use with LGBM_BoosterPredictForMatSingleRowFast.

Release the FastConfig by passing its handle to LGBM_FastConfigFree when no longer needed.

Parameters:
  • handle – Booster handle

  • predict_type – What should be predicted

    • C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);

    • C_API_PREDICT_RAW_SCORE: raw score;

    • C_API_PREDICT_LEAF_INDEX: leaf index;

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • ncol – Number of columns

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • out_fastConfig[out] FastConfig object with which you can call LGBM_BoosterPredictForMatSingleRowFast

Returns:

0 when it succeeds, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterPredictSparseOutput(BoosterHandle handle, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col_or_row, int predict_type, int start_iteration, int num_iteration, const char *parameter, int matrix_type, int64_t *out_len, void **out_indptr, int32_t **out_indices, void **out_data)

Make sparse prediction for a new dataset in CSR or CSC format. Currently only used for feature contributions.

Note

The outputs are allocated by the library rather than by the caller, because their sizes can vary between invocations; the overall shape stays the same:

  • for feature contributions, the shape of sparse matrix will be num_class * num_data * (num_feature + 1). The output indptr_type for the sparse matrix will be the same as the given input indptr_type. Call LGBM_BoosterFreePredictSparse to deallocate resources.

Parameters:
  • handle – Handle of booster

  • indptr – Pointer to row headers for CSR or column headers for CSC

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to column indices for CSR or row indices for CSC

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nindptr – Number of rows in the matrix + 1 for CSR or number of columns in the matrix + 1 for CSC

  • nelem – Number of nonzero elements in the matrix

  • num_col_or_row – Number of columns for CSR or number of rows for CSC

  • predict_type – What should be predicted, only feature contributions supported currently

    • C_API_PREDICT_CONTRIB: feature contributions (SHAP values)

  • start_iteration – Start index of the iteration to predict

  • num_iteration – Number of iterations for prediction, <= 0 means no limit

  • parameter – Other parameters for prediction, e.g. early stopping for prediction

  • matrix_type – Type of matrix input and output, can be C_API_MATRIX_TYPE_CSR or C_API_MATRIX_TYPE_CSC

  • out_len[out] Length of output indices and data

  • out_indptr[out] Pointer to output row headers for CSR or column headers for CSC

  • out_indices[out] Pointer to sparse column indices for CSR or row indices for CSC

  • out_data[out] Pointer to sparse data space

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterRefit(BoosterHandle handle, const int32_t *leaf_preds, int32_t nrow, int32_t ncol)

Refit the tree model using the new data (online learning).

Parameters:
  • handle – Handle of booster

  • leaf_preds – Pointer to predicted leaf indices

  • nrow – Number of rows of leaf_preds

  • ncol – Number of columns of leaf_preds

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterResetParameter(BoosterHandle handle, const char *parameters)

Reset config for booster.

Parameters:
  • handle – Handle of booster

  • parameters – Parameters in format ‘key1=value1 key2=value2’

Returns:

0 when succeed, -1 when failure happens
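
The ‘key1=value1 key2=value2’ format can be assembled from separate key and value strings as sketched below. The function build_params is a hypothetical helper, and the parameter names used in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: join n key/value pairs into "key1=value1 key2=value2".
 * Returns 0 on success, -1 if buf is too small or formatting fails. */
int build_params(char *buf, size_t buf_len,
                 const char *const *keys, const char *const *values, int n) {
    assert(buf != NULL && buf_len > 0);
    size_t used = 0;
    buf[0] = '\0';
    for (int i = 0; i < n; ++i) {
        /* A single space separates consecutive pairs. */
        int w = snprintf(buf + used, buf_len - used, "%s%s=%s",
                         i == 0 ? "" : " ", keys[i], values[i]);
        if (w < 0 || (size_t)w >= buf_len - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

For example, the keys {"learning_rate", "num_leaves"} with the values {"0.1", "31"} produce the string learning_rate=0.1 num_leaves=31, which could then be passed as the parameters argument.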

GPBOOST_C_EXPORT int LGBM_BoosterResetTrainingData(BoosterHandle handle, const DatasetHandle train_data)

Reset training data for booster.

Parameters:
  • handle – Handle of booster

  • train_data – Training dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterRollbackOneIter(BoosterHandle handle)

Rollback one iteration.

Parameters:
  • handle – Handle of booster

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterSaveModel(BoosterHandle handle, int start_iteration, int num_iteration, int feature_importance_type, const char *filename)

Save model into file.

Parameters:
  • handle – Handle of booster

  • start_iteration – Start index of the iteration that should be saved

  • num_iteration – Number of iterations that should be saved, <= 0 means save all

  • feature_importance_type – Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN

  • filename – The name of the file

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterSaveModelToString(BoosterHandle handle, int start_iteration, int num_iteration, int feature_importance_type, int64_t buffer_len, int64_t *out_len, char *out_str)

Save model to string.

Parameters:
  • handle – Handle of booster

  • start_iteration – Start index of the iteration that should be saved

  • num_iteration – Number of iterations that should be saved, <= 0 means save all

  • feature_importance_type – Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN

  • buffer_len – String buffer length, if buffer_len < out_len, you should re-allocate buffer

  • out_len[out] Actual output length

  • out_str[out] String of model, should pre-allocate memory

Returns:

0 when succeed, -1 when failure happens
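
The re-allocation rule for buffer_len and out_len can be sketched as a call, check, and retry sequence. Everything named below is hypothetical: save_fn stands for a thin wrapper around LGBM_BoosterSaveModelToString that forwards the handle and iteration arguments, and fake_save is a stand-in used only to exercise the logic without the library. The sketch also assumes that out_len includes the terminating null byte.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback with the same buffer contract as the library call:
 * reports the required length in *out_len and returns 0 on success. */
typedef int (*save_fn)(void *ctx, int64_t buffer_len, int64_t *out_len,
                       char *out_str);

/* Call once with a guessed size; if out_len exceeds it, re-allocate and call
 * again. Returns a malloc'ed string (caller frees) or NULL on failure. */
char *save_with_retry(save_fn save, void *ctx, int64_t initial_len) {
    assert(save != NULL && initial_len > 0);
    int64_t out_len = 0;
    char *buf = malloc((size_t)initial_len);
    if (buf == NULL) return NULL;
    if (save(ctx, initial_len, &out_len, buf) != 0) { free(buf); return NULL; }
    if (out_len > initial_len) {
        char *bigger = realloc(buf, (size_t)out_len);
        if (bigger == NULL) { free(buf); return NULL; }
        buf = bigger;
        if (save(ctx, out_len, &out_len, buf) != 0) { free(buf); return NULL; }
    }
    return buf;
}

/* Stand-in for the library call: copies the string in ctx only if it fits. */
int fake_save(void *ctx, int64_t buffer_len, int64_t *out_len, char *out_str) {
    const char *model = (const char *)ctx;
    *out_len = (int64_t)strlen(model) + 1;   /* assumed to include the '\0' */
    if (*out_len <= buffer_len) memcpy(out_str, model, (size_t)*out_len);
    return 0;
}
```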

GPBOOST_C_EXPORT int LGBM_BoosterSetLeafValue(BoosterHandle handle, int tree_idx, int leaf_idx, double val)

Set leaf value.

Parameters:
  • handle – Handle of booster

  • tree_idx – Index of tree

  • leaf_idx – Index of leaf

  • val – Leaf value

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterShuffleModels(BoosterHandle handle, int start_iter, int end_iter)

Shuffle models.

Parameters:
  • handle – Handle of booster

  • start_iter – The first iteration that will be shuffled

  • end_iter – The last iteration that will be shuffled

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterUpdateOneIter(BoosterHandle handle, int *is_finished)

Update the model for one iteration.

Parameters:
  • handle – Handle of booster

  • is_finished[out] 1 means training is finished (no further splits are possible), 0 means it is not finished

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_BoosterUpdateOneIterCustom(BoosterHandle handle, const float *grad, const float *hess, int *is_finished)

Update the model by specifying gradient and Hessian directly (this can be used to support customized loss functions).

Parameters:
  • handle – Handle of booster

  • grad – The first order derivative (gradient) statistics

  • hess – The second order derivative (Hessian) statistics

  • is_finished[out] 1 means training is finished (no further splits are possible), 0 means it is not finished

Returns:

0 when succeed, -1 when failure happens
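
As an illustration of what grad and hess might contain, the sketch below computes them for a squared-error loss. The choice of loss and the helper name l2_grad_hess are assumptions made for this example. The outputs are float arrays because the signature takes const float pointers, while the scores are double, as returned by LGBM_BoosterGetPredict.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for the loss 0.5 * (score - label)^2:
 * gradient = score - label, Hessian = 1, one entry per score. */
void l2_grad_hess(const double *score, const float *label, int64_t n,
                  float *grad, float *hess) {
    assert(score && label && grad && hess);
    for (int64_t i = 0; i < n; ++i) {
        grad[i] = (float)(score[i] - (double)label[i]);
        hess[i] = 1.0f;
    }
}
```

A custom training loop would typically fetch the current scores, fill grad and hess this way, and pass both arrays to LGBM_BoosterUpdateOneIterCustom once per iteration.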

GPBOOST_C_EXPORT int LGBM_DatasetAddFeaturesFrom(DatasetHandle target, DatasetHandle source)

Add features from source to target.

Parameters:
  • target – The handle of the dataset to add features to

  • source – The handle of the dataset to take features from

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_DatasetCreateByReference(const DatasetHandle reference, int64_t num_total_row, DatasetHandle *out)

Allocate the space for dataset and bucket feature bins according to reference dataset.

Parameters:
  • reference – Used to align bin mapper with other dataset

  • num_total_row – Number of total rows

  • out[out] Created dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromCSC(const void *col_ptr, int col_ptr_type, const int32_t *indices, const void *data, int data_type, int64_t ncol_ptr, int64_t nelem, int64_t num_row, const char *parameters, const DatasetHandle reference, DatasetHandle *out)

Create a dataset from CSC format.

Parameters:
  • col_ptr – Pointer to column headers

  • col_ptr_type – Type of col_ptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to row indices

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • ncol_ptr – Number of columns in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • num_row – Number of rows

  • parameters – Additional parameters

  • reference – Used to align bin mapper with other dataset, nullptr means it is not used

  • out[out] Created dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromCSR(const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, const char *parameters, const DatasetHandle reference, DatasetHandle *out)

Create a dataset from CSR format.

Parameters:
  • indptr – Pointer to row headers

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to column indices

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nindptr – Number of rows in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • num_col – Number of columns

  • parameters – Additional parameters

  • reference – Used to align bin mapper with other dataset, nullptr means it is not used

  • out[out] Created dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromCSRFunc(void *get_row_funptr, int num_rows, int64_t num_col, const char *parameters, const DatasetHandle reference, DatasetHandle *out)

Create a dataset from CSR format through callbacks.

Parameters:
  • get_row_funptr – Pointer to std::function<void(int idx, std::vector<std::pair<int, double>>& ret)> (called for every row and expected to clear and fill ret)

  • num_rows – Number of rows

  • num_col – Number of columns

  • parameters – Additional parameters

  • reference – Used to align bin mapper with other dataset, nullptr means it is not used

  • out[out] Created dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromFile(const char *filename, const char *parameters, const DatasetHandle reference, DatasetHandle *out)

Load dataset from file (like LightGBM CLI version does).

Parameters:
  • filename – The name of the file

  • parameters – Additional parameters

  • reference – Used to align bin mapper with other dataset, nullptr means it is not used

  • out[out] A loaded dataset

Returns:

0 when succeed, -1 when failure happens

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromMat(const void *data, int data_type, int32_t nrow, int32_t ncol, int is_row_major, const char *parameters, const DatasetHandle reference, DatasetHandle *out)

Create dataset from dense matrix.

Parameters:
  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nrow – Number of rows

  • ncol – Number of columns

  • is_row_major – 1 for row-major, 0 for column-major

  • parameters – Additional parameters

  • reference – Used to align the bin mapper with another dataset; nullptr means it is not used

  • out – [out] Created dataset

Returns:

0 on success, -1 on failure
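
The is_row_major flag selects between the two usual dense layouts. The sketch below shows where element (row, col) sits in the buffer under each layout, using the standard definitions of row-major and column-major storage. dense_offset is a hypothetical helper for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: offset of element (row, col) in a dense buffer
// with nrow rows and ncol columns, for the layout chosen by is_row_major.
std::size_t dense_offset(std::int32_t row, std::int32_t col,
                         std::int32_t nrow, std::int32_t ncol,
                         int is_row_major) {
  assert(row >= 0 && row < nrow && col >= 0 && col < ncol);
  if (is_row_major == 1) {
    // Row-major: the ncol values of one row are contiguous.
    return static_cast<std::size_t>(row) * static_cast<std::size_t>(ncol) +
           static_cast<std::size_t>(col);
  }
  // Column-major: the nrow values of one column are contiguous.
  return static_cast<std::size_t>(col) * static_cast<std::size_t>(nrow) +
         static_cast<std::size_t>(row);
}
```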

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromMats(int32_t nmat, const void **data, int data_type, int32_t *nrow, int32_t ncol, int is_row_major, const char *parameters, const DatasetHandle reference, DatasetHandle *out)

Create dataset from array of dense matrices.

Parameters:
  • nmat – Number of dense matrices

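  • data – Array of nmat pointers, one per matrix, each pointing to that matrix's data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nrow – Array of length nmat holding the number of rows of each matrix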

  • ncol – Number of columns

  • is_row_major – 1 for row-major, 0 for column-major

  • parameters – Additional parameters

  • reference – Used to align the bin mapper with another dataset; nullptr means it is not used

  • out – [out] Created dataset

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetCreateFromSampledColumn(double **sample_data, int **sample_indices, int32_t ncol, const int *num_per_col, int32_t num_sample_row, int32_t num_total_row, const char *parameters, DatasetHandle *out)

Allocate the space for dataset and bucket feature bins according to sampled data.

Parameters:
  • sample_data – Sampled data, grouped by the column

  • sample_indices – Indices of sampled data

  • ncol – Number of columns

  • num_per_col – Size of each sampling column

  • num_sample_row – Number of sampled rows

  • num_total_row – Number of total rows

  • parameters – Additional parameters

  • out – [out] Created dataset

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetDumpText(DatasetHandle handle, const char *filename)

Save dataset to text file, intended for debugging use only.

Parameters:
  • handle – Handle of dataset

  • filename – The name of the file

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetFree(DatasetHandle handle)

Free space for dataset.

Parameters:
  • handle – Handle of dataset to be freed

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetGetFeatureNames(DatasetHandle handle, const int len, int *num_feature_names, const size_t buffer_len, size_t *out_buffer_len, char **feature_names)

Get feature names of dataset.

Parameters:
  • handle – Handle of dataset

  • len – Number of char* pointers stored at feature_names. If smaller than the number of feature names, only this many strings are copied

  • num_feature_names – [out] Number of feature names

  • buffer_len – Size of pre-allocated strings. Content is copied up to buffer_len - 1 and null-terminated

  • out_buffer_len – [out] String size required to copy every name in full

  • feature_names – [out] Feature names, should pre-allocate memory

Returns:

0 on success, -1 on failure
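
Because feature_names must be pre-allocated by the caller, a small amount of buffer bookkeeping is needed. The following is a sketch under stated assumptions: NameBuffers, allocate_name_buffers and names_truncated are hypothetical helpers, and the truncation check assumes the size reported in out_buffer_len includes the terminating null character.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical holder for caller-owned name buffers.
// Do not copy this struct: the pointers refer to the original storage.
struct NameBuffers {
  std::vector<std::vector<char>> storage;  // one zero-filled buffer per name
  std::vector<char*> pointers;             // pointers.data() is a char**
};

// Allocates len buffers of buffer_len chars each.
NameBuffers allocate_name_buffers(int len, std::size_t buffer_len) {
  assert(len >= 0 && buffer_len > 0);
  NameBuffers buffers;
  buffers.storage.assign(static_cast<std::size_t>(len),
                         std::vector<char>(buffer_len, '\0'));
  for (std::vector<char>& buffer : buffers.storage) {
    buffers.pointers.push_back(buffer.data());
  }
  return buffers;
}

// Assumption: out_buffer_len counts the terminating null, so a full copy
// needs buffer_len >= out_buffer_len.
bool names_truncated(std::size_t out_buffer_len, std::size_t buffer_len) {
  return out_buffer_len > buffer_len;
}

// With the library linked, a call could look like:
//   NameBuffers b = allocate_name_buffers(len, buffer_len);
//   LGBM_DatasetGetFeatureNames(handle, len, &num_names, buffer_len,
//                               &out_buffer_len, b.pointers.data());
// If names_truncated(out_buffer_len, buffer_len) is true, or num_names > len,
// the buffers can be reallocated with larger sizes and the call repeated.
```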

GPBOOST_C_EXPORT int LGBM_DatasetGetField(DatasetHandle handle, const char *field_name, int *out_len, const void **out_ptr, int *out_type)

Get info vector from dataset.

Parameters:
  • handle – Handle of dataset

  • field_name – Field name

  • out_len – [out] Used to set result length

  • out_ptr – [out] Pointer to the result

  • out_type – [out] Type of result pointer, can be C_API_DTYPE_INT32, C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetGetNumData(DatasetHandle handle, int *out)

Get number of data points.

Parameters:
  • handle – Handle of dataset

  • out – [out] The address to hold number of data points

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetGetNumFeature(DatasetHandle handle, int *out)

Get number of features.

Parameters:
  • handle – Handle of dataset

  • out – [out] The address to hold number of features

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetGetSubset(const DatasetHandle handle, const int32_t *used_row_indices, int32_t num_used_row_indices, const char *parameters, DatasetHandle *out)

Create a subset of a dataset.

Parameters:
  • handle – Handle of full dataset

  • used_row_indices – Indices used in subset

  • num_used_row_indices – Length of used_row_indices

  • parameters – Additional parameters

  • out – [out] Subset of data

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetPushRows(DatasetHandle dataset, const void *data, int data_type, int32_t nrow, int32_t ncol, int32_t start_row)

Push data to an existing dataset. If nrow + start_row == num_total_row, dataset->FinishLoad is called.

Parameters:
  • dataset – Handle of dataset

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nrow – Number of rows

  • ncol – Number of columns

  • start_row – Row start index

Returns:

0 on success, -1 on failure
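
Since loading is finished once nrow + start_row reaches num_total_row, rows can be pushed in consecutive chunks. The sketch below only plans the (start_row, nrow) pairs for such a sequence of calls. RowChunk and plan_row_chunks are hypothetical names, and the chunk size is an arbitrary choice of the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One planned push: rows [start_row, start_row + nrow).
struct RowChunk {
  std::int32_t start_row;
  std::int32_t nrow;
};

// Hypothetical helper: splits num_total_row rows into consecutive chunks of
// at most max_rows_per_push rows. For the last chunk,
// nrow + start_row == num_total_row holds.
std::vector<RowChunk> plan_row_chunks(std::int32_t num_total_row,
                                      std::int32_t max_rows_per_push) {
  assert(num_total_row >= 0 && max_rows_per_push > 0);
  std::vector<RowChunk> chunks;
  std::int32_t start = 0;
  while (start < num_total_row) {
    const std::int32_t n = std::min(max_rows_per_push, num_total_row - start);
    chunks.push_back({start, n});
    start += n;
  }
  return chunks;
}

// With the library linked, each chunk would correspond to one call such as:
//   LGBM_DatasetPushRows(dataset, chunk_data, data_type,
//                        chunk.nrow, ncol, chunk.start_row);
// where chunk_data points to the values of that chunk's rows.
```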

GPBOOST_C_EXPORT int LGBM_DatasetPushRowsByCSR(DatasetHandle dataset, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, int64_t start_row)

Push data to an existing dataset. If nrow + start_row == num_total_row, where nrow is nindptr - 1, dataset->FinishLoad is called.

Parameters:
  • dataset – Handle of dataset

  • indptr – Pointer to row headers

  • indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64

  • indices – Pointer to column indices

  • data – Pointer to the data space

  • data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

  • nindptr – Number of rows in the matrix + 1

  • nelem – Number of nonzero elements in the matrix

  • num_col – Number of columns

  • start_row – Row start index

Returns:

0 on success, -1 on failure
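
To make the relationship between indptr, indices, data, nindptr and nelem concrete, here is a minimal sketch that converts a dense row-major matrix to CSR arrays. CsrMatrix and dense_to_csr are hypothetical helpers; the arrays they produce would correspond to indptr_type C_API_DTYPE_INT32 and data_type C_API_DTYPE_FLOAT64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for the three CSR arrays.
struct CsrMatrix {
  std::vector<std::int32_t> indptr;   // row headers, length = number of rows + 1
  std::vector<std::int32_t> indices;  // column index of each stored value
  std::vector<double> data;           // stored (nonzero) values
};

// Converts a dense row-major matrix with nrow rows and ncol columns to CSR.
CsrMatrix dense_to_csr(const std::vector<double>& dense,
                       std::int32_t nrow, std::int32_t ncol) {
  assert(nrow >= 0 && ncol >= 0);
  assert(dense.size() ==
         static_cast<std::size_t>(nrow) * static_cast<std::size_t>(ncol));
  CsrMatrix csr;
  csr.indptr.push_back(0);
  for (std::int32_t row = 0; row < nrow; ++row) {
    for (std::int32_t col = 0; col < ncol; ++col) {
      const double value =
          dense[static_cast<std::size_t>(row) * static_cast<std::size_t>(ncol) +
                static_cast<std::size_t>(col)];
      if (value != 0.0) {
        csr.indices.push_back(col);
        csr.data.push_back(value);
      }
    }
    // Row `row` occupies positions [indptr[row], indptr[row + 1]).
    csr.indptr.push_back(static_cast<std::int32_t>(csr.data.size()));
  }
  return csr;
}

// nindptr would be csr.indptr.size() and nelem would be csr.data.size().
```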

GPBOOST_C_EXPORT int LGBM_DatasetSaveBinary(DatasetHandle handle, const char *filename)

Save dataset to binary file.

Parameters:
  • handle – Handle of dataset

  • filename – The name of the file

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetSetFeatureNames(DatasetHandle handle, const char **feature_names, int num_feature_names)

Save feature names to dataset.

Parameters:
  • handle – Handle of dataset

  • feature_names – Feature names

  • num_feature_names – Number of feature names

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_DatasetSetField(DatasetHandle handle, const char *field_name, const void *field_data, int num_element, int type)

Set an info field of the dataset to the given vector.

Note

  • group only works for C_API_DTYPE_INT32;

  • label and weight only work for C_API_DTYPE_FLOAT32;

  • init_score only works for C_API_DTYPE_FLOAT64.

Parameters:
  • handle – Handle of dataset

  • field_name – Field name, can be label, weight, init_score, group

  • field_data – Pointer to data vector

  • num_element – Number of elements in field_data

  • type – Type of field_data pointer, can be C_API_DTYPE_INT32, C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64

Returns:

0 on success, -1 on failure
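
The type restrictions in the note above can be checked before calling the function. The following sketch mirrors the C_API_DTYPE_* values listed under Defines as local constants. required_field_dtype is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <string>

// Local mirrors of the values listed under "Defines".
constexpr int kDtypeFloat32 = 0;  // C_API_DTYPE_FLOAT32
constexpr int kDtypeFloat64 = 1;  // C_API_DTYPE_FLOAT64
constexpr int kDtypeInt32 = 2;    // C_API_DTYPE_INT32

// Hypothetical helper: returns the data type a field accepts according to
// the note above, or -1 for a field name not covered by the note.
int required_field_dtype(const std::string& field_name) {
  assert(!field_name.empty());
  if (field_name == "label" || field_name == "weight") {
    return kDtypeFloat32;
  }
  if (field_name == "init_score") {
    return kDtypeFloat64;
  }
  if (field_name == "group") {
    return kDtypeInt32;
  }
  return -1;
}

// With the library linked, labels held in a std::vector<float> could be set as:
//   LGBM_DatasetSetField(handle, "label", labels.data(),
//                        static_cast<int>(labels.size()), C_API_DTYPE_FLOAT32);
```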

GPBOOST_C_EXPORT int LGBM_DatasetUpdateParamChecking(const char *old_parameters, const char *new_parameters)

Raise errors for attempts to update dataset parameters.

Parameters:
  • old_parameters – Current dataset parameters

  • new_parameters – New dataset parameters

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_FastConfigFree(FastConfigHandle fastConfig)

Release FastConfig object.

Parameters:
  • fastConfig – Handle to the FastConfig object acquired with a *FastInit() method.

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT const char *LGBM_GetLastError()

Get string message of the last error.

Returns:

Error information
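
Since the functions in this API report failure through a -1 return code, a common pattern is to convert that code into an exception carrying the last error message. The sketch below takes the message getter as a function pointer so that it compiles without the library. throw_on_failure and LastErrorFn are hypothetical names.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Pointer type compatible with a function such as LGBM_GetLastError.
using LastErrorFn = const char* (*)();

// Hypothetical helper: throws std::runtime_error with the last error
// message when return_code is not 0, and does nothing otherwise.
void throw_on_failure(int return_code, LastErrorFn last_error) {
  assert(last_error != nullptr);
  if (return_code != 0) {
    const char* message = last_error();
    throw std::runtime_error(message != nullptr ? std::string(message)
                                                : std::string("unknown error"));
  }
}

// With the library linked, usage could look like:
//   throw_on_failure(LGBM_DatasetFree(handle), LGBM_GetLastError);
```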

GPBOOST_C_EXPORT int LGBM_GPBoosterCreate(const DatasetHandle train_data, const char *parameters, const REModelHandle re_model, BoosterHandle *out)

Create a new boosting learner.

Parameters:
  • train_data – Training dataset

  • parameters – Parameters in format ‘key1=value1 key2=value2’

  • re_model – Handle of the random effects model (re_model), for example a Gaussian process model

  • out – [out] Handle of created booster

Returns:

0 on success, -1 on failure
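
The parameters argument is a single string in the format 'key1=value1 key2=value2'. A minimal sketch for assembling such a string from key/value pairs follows. join_parameters is a hypothetical helper, and it assumes keys and values contain no spaces.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: joins key/value pairs into 'key1=value1 key2=value2'.
std::string join_parameters(
    const std::vector<std::pair<std::string, std::string>>& parameters) {
  std::string out;
  for (const auto& entry : parameters) {
    // Assumption of this sketch: a space would be read as a separator.
    assert(entry.first.find(' ') == std::string::npos);
    assert(entry.second.find(' ') == std::string::npos);
    if (!out.empty()) {
      out += ' ';
    }
    out += entry.first;
    out += '=';
    out += entry.second;
  }
  return out;
}

// With the library linked, the result could be passed as:
//   std::string params = join_parameters(pairs);
//   LGBM_GPBoosterCreate(train_data, params.c_str(), re_model, &booster);
```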

GPBOOST_C_EXPORT int LGBM_NetworkFree()

Finalize the network.

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_NetworkInit(const char *machines, int local_listen_port, int listen_time_out, int num_machines)

Initialize the network.

Parameters:
  • machines – List of machines in format ‘ip1:port1,ip2:port2’

  • local_listen_port – TCP listen port for local machines

  • listen_time_out – Socket time-out in minutes

  • num_machines – Total number of machines

Returns:

0 on success, -1 on failure
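
The machines argument uses the format 'ip1:port1,ip2:port2'. The sketch below builds that string from a list of addresses. Machine and join_machines are hypothetical names, and passing the number of entries as num_machines is an assumption of this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one machine in the cluster.
struct Machine {
  std::string ip;
  int port;
};

// Joins the machines into 'ip1:port1,ip2:port2'.
std::string join_machines(const std::vector<Machine>& machines) {
  std::string out;
  for (const Machine& machine : machines) {
    assert(machine.port > 0 && machine.port <= 65535);
    if (!out.empty()) {
      out += ',';
    }
    out += machine.ip;
    out += ':';
    out += std::to_string(machine.port);
  }
  return out;
}

// With the library linked, the result could be passed as:
//   std::string list = join_machines(machines);
//   LGBM_NetworkInit(list.c_str(), local_listen_port, listen_time_out,
//                    static_cast<int>(machines.size()));
```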

GPBOOST_C_EXPORT int LGBM_NetworkInitWithFunctions(int num_machines, int rank, void *reduce_scatter_ext_fun, void *allgather_ext_fun)

Initialize the network with external collective functions.

Parameters:
  • num_machines – Total number of machines

  • rank – Rank of local machine

  • reduce_scatter_ext_fun – The external reduce-scatter function

  • allgather_ext_fun – The external allgather function

Returns:

0 on success, -1 on failure

GPBOOST_C_EXPORT int LGBM_RegisterLogCallback(void (*callback)(const char*))

Register a callback function for log redirecting.

Parameters:
  • callback – The callback function to register

Returns:

0 on success, -1 on failure
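
A log callback must be a plain function taking a const char*. The following sketch collects messages into an in-memory list instead of printing them. captured_log_lines, capture_log and last_log_line are hypothetical names, and the shared list is not synchronized, so this version assumes single-threaded logging.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Storage for captured messages (function-local static, not thread-safe).
std::vector<std::string>& captured_log_lines() {
  static std::vector<std::string> lines;
  return lines;
}

// Has the signature void(const char*) required by the callback parameter.
void capture_log(const char* message) {
  if (message != nullptr) {
    captured_log_lines().emplace_back(message);
  }
}

// Returns the most recently captured message; at least one must exist.
const std::string& last_log_line() {
  assert(!captured_log_lines().empty());
  return captured_log_lines().back();
}

// With the library linked, registration could look like:
//   LGBM_RegisterLogCallback(capture_log);
```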

inline void LGBM_SetLastError(const char *msg)

Set string message of the last error.

Parameters:
  • msg – Error message