C API
Defines
-
C_API_DTYPE_FLOAT32 (0)
float32 (single precision float).
-
C_API_DTYPE_FLOAT64 (1)
float64 (double precision float).
-
C_API_DTYPE_INT32 (2)
int32.
-
C_API_DTYPE_INT64 (3)
int64.
-
C_API_FEATURE_IMPORTANCE_GAIN (1)
Gain type of feature importance.
-
C_API_FEATURE_IMPORTANCE_SPLIT (0)
Split type of feature importance.
-
C_API_MATRIX_TYPE_CSC (1)
CSC sparse matrix type.
-
C_API_MATRIX_TYPE_CSR (0)
CSR sparse matrix type.
-
C_API_PREDICT_CONTRIB (3)
Predict feature contributions (SHAP values).
-
C_API_PREDICT_LEAF_INDEX (2)
Predict leaf index.
-
C_API_PREDICT_NORMAL (0)
Normal prediction, with transform (if needed).
-
C_API_PREDICT_RAW_SCORE (1)
Predict raw score.
-
THREAD_LOCAL thread_local
Thread local specifier.
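The data-type codes above accompany the untyped buffers that some functions take or return. As a minimal sketch, the helper below maps a code to its element size in bytes. The constants are redefined locally with the values listed above so that the snippet compiles without the GPBoost header. `dtype_size` is a hypothetical helper name and is not part of the API.

```c
#include <stddef.h>
#include <stdint.h>

/* Values copied from the Defines list above; the guard lets the real header win. */
#ifndef C_API_DTYPE_FLOAT32
#define C_API_DTYPE_FLOAT32 (0)
#define C_API_DTYPE_FLOAT64 (1)
#define C_API_DTYPE_INT32 (2)
#define C_API_DTYPE_INT64 (3)
#endif

/* Hypothetical helper: element size in bytes for a C_API_DTYPE_* code,
 * or 0 if the code is not one of the four listed values. */
static size_t dtype_size(int dtype) {
  switch (dtype) {
    case C_API_DTYPE_FLOAT32: return sizeof(float);
    case C_API_DTYPE_FLOAT64: return sizeof(double);
    case C_API_DTYPE_INT32:   return sizeof(int32_t);
    case C_API_DTYPE_INT64:   return sizeof(int64_t);
    default:                  return 0;
  }
}
```

A caller could use the returned size to compute the byte length of a buffer before copying it, and treat a return value of 0 as an invalid type code.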
Typedefs
-
typedef void *BoosterHandle
Handle of booster.
-
typedef void *DatasetHandle
Handle of dataset.
This file is part of GPBoost, a C++ library for combining boosting with Gaussian process and mixed effects models.
Original work Copyright (c) 2016 Microsoft Corporation. All rights reserved. Modified work Copyright (c) 2020 Fabio Sigrist. All rights reserved.
Licensed under the Apache License, Version 2.0. See LICENSE file in the project root for license information.
Note
To avoid type conversion on large data, most of our exposed interface supports both float32 and float64, except the following:
gradient and Hessian;
current score for training and validation data.
-
typedef void *FastConfigHandle
Handle of FastConfig.
-
typedef void *REModelHandle
Handle of re_model.
Functions
-
GPBOOST_C_EXPORT int GPB_CanCalculateStandardErrorsCovPars(REModelHandle handle, int *out)
-
GPBOOST_C_EXPORT int GPB_CreateREModel(int32_t num_data, const int32_t *cluster_ids_data, const char *re_group_data, int32_t num_re_group, const double *re_group_rand_coef_data, const int32_t *ind_effect_group_rand_coef, int32_t num_re_group_rand_coef, const int *drop_intercept_group_rand_effect, int32_t num_gp, const double *gp_coords_data, const int dim_gp_coords, const double *gp_rand_coef_data, int32_t num_gp_rand_coef, const char *cov_fct, double cov_fct_shape, const char *gp_approx, double cov_fct_taper_range, double cov_fct_taper_shape, int num_neighbors, const char *vecchia_ordering, int num_ind_points, double cover_tree_radius, const char *ind_points_selection, const char *likelihood, double likelihood_additional_param, const char *matrix_inversion_method, int seed, int num_parallel_threads, bool GPU_use, bool has_weights, const double *weights, double likelihood_learning_rate, REModelHandle *out)
Create REModel.
- Parameters:
num_data – Number of data points
cluster_ids_data – IDs / labels indicating independent realizations of random effects / Gaussian processes (same values = same process realization)
re_group_data – Labels of group levels for the grouped random effects in column-major format (i.e. first the levels for the first effect, then for the second, etc.). Every group label needs to end with the null character ‘\0’
num_re_group – Number of grouped random effects
re_group_rand_coef_data – Covariate data for grouped random coefficients
ind_effect_group_rand_coef – Indices that relate every random coefficient to a “base” intercept grouped random effect. Counting starts at 1.
num_re_group_rand_coef – Number of grouped random coefficients
drop_intercept_group_rand_effect – Indicates whether intercept random effects are dropped (only for random coefficients). If drop_intercept_group_rand_effect[k] > 0, the intercept random effect number k is dropped. Only random effects with random slopes can be dropped.
num_gp – Number of Gaussian processes (intercept only, random coefficients not counting)
gp_coords_data – Coordinates (features) for Gaussian process
dim_gp_coords – Dimension of the coordinates (=number of features) for Gaussian process
gp_rand_coef_data – Covariate data for Gaussian process random coefficients
num_gp_rand_coef – Number of Gaussian process random coefficients
cov_fct – Type of covariance function for Gaussian process (GP)
cov_fct_shape – Shape parameter of covariance function (=smoothness parameter for Matern and Wendland covariance). This parameter is irrelevant for some covariance functions such as the exponential or Gaussian
gp_approx – Type of GP-approximation for handling large data
cov_fct_taper_range – Range parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)
cov_fct_taper_shape – Shape parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)
num_neighbors – The number of neighbors used in the Vecchia approximation
vecchia_ordering – Ordering used in the Vecchia approximation. “none” = no ordering, “random” = random ordering
num_ind_points – Number of inducing points / knots for, e.g., a predictive process approximation
cover_tree_radius – Radius (= “spatial resolution”) for the cover tree algorithm
ind_points_selection – Method for choosing inducing points
likelihood – Likelihood function for the observed response variable
likelihood_additional_param – Additional parameter for the likelihood which cannot be estimated (e.g., degrees of freedom for likelihood = “t”)
matrix_inversion_method – Method which is used for matrix inversion
seed – Seed used for model creation (e.g., random ordering in Vecchia approximation)
num_parallel_threads – Number of parallel threads for OMP
GPU_use – If true, GPU acceleration will be used if supported
has_weights – If true, sample weights are used
weights – Sample weights
likelihood_learning_rate – Likelihood learning rate for generalized Bayesian inference (only non-Gaussian likelihoods)
out – [out] Created REModel
- Returns:
0 when succeed, -1 when failure happens
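The description of re_group_data above (column-major order, every label ending with the null character) can be sketched as a packing routine. This is an illustration under the assumption that the labels are simply concatenated, each followed by its terminating '\0'. `pack_group_labels` is a hypothetical helper, not a GPBoost function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: concatenates num_data * num_re_group labels, each followed
 * by '\0', into out. labels[j * num_data + i] is the level of observation i for
 * grouped random effect j (column-major order).
 * Returns the number of bytes required. Nothing is written if out is NULL or
 * out_cap is smaller than the required size. */
static size_t pack_group_labels(const char *const *labels, int32_t num_data,
                                int32_t num_re_group, char *out, size_t out_cap) {
  size_t n = (size_t)num_data * (size_t)num_re_group;
  size_t needed = 0;
  for (size_t k = 0; k < n; ++k) needed += strlen(labels[k]) + 1;
  if (out == NULL || out_cap < needed) return needed;
  size_t pos = 0;
  for (size_t k = 0; k < n; ++k) {
    size_t len = strlen(labels[k]) + 1; /* include the terminating '\0' */
    memcpy(out + pos, labels[k], len);
    pos += len;
  }
  return needed;
}
```

Calling the helper once with out set to NULL yields the buffer size to allocate; a second call fills the buffer, which could then be passed as re_group_data.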
-
GPBOOST_C_EXPORT int GPB_EvalNegLogLikelihood(REModelHandle handle, const double *y_data, double *cov_pars, const double *fixed_effects, double *negll)
Calculate the value of the negative log-likelihood.
- Parameters:
handle – Handle of REModel
y_data – Response variable data
cov_pars – Values for covariance parameters of RE components
fixed_effects – Fixed effects component of location parameter for observed data (only used for non-Gaussian data). For Gaussian data, this is ignored
negll – [out] Negative log-likelihood
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetAuxPars(REModelHandle handle, double *aux_pars, char *out_str)
Get additional likelihood parameters (e.g., shape parameter for a gamma likelihood)
- Parameters:
handle – Handle of REModel
aux_pars – [out] Additional likelihood parameters (aux_pars_). This vector needs to be pre-allocated
out_str – [out] Name of the first parameter
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetCGPreconditionerType(REModelHandle handle, char *out_str, int *num_char)
Get name of preconditioner for conjugate gradient algorithm.
- Parameters:
handle – Handle of REModel
out_str – [out] Preconditioner name
num_char – [out] Number of characters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetCoef(REModelHandle handle, double *optim_coef, bool calc_std_dev)
Get / export regression coefficients. Note: You should pre-allocate memory for optim_coef. Its length equals the number of covariates or twice this if calc_std_dev = true.
- Parameters:
handle – Handle of REModel
optim_coef – [out] Optimal regression coefficients
calc_std_dev – If true, standard deviations are also exported
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetCovariateData(REModelHandle handle, double *covariate_data)
Return covariate data.
- Parameters:
handle – Handle of REModel
covariate_data – [out] covariate data
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetCovPar(REModelHandle handle, double *optim_cov_pars, bool calc_std_dev)
Get covariance parameters. Note: You should pre-allocate memory for optim_cov_pars. Its length equals the number of covariance parameters (num_cov_pars) or twice this if calc_std_dev = true.
- Parameters:
handle – Handle of REModel
optim_cov_pars – [out] Optimal covariance parameters
calc_std_dev – If true, standard deviations are also exported
- Returns:
0 when succeed, -1 when failure happens
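The pre-allocation rule in the note above (num_cov_pars entries, or twice that when calc_std_dev is true) can be sketched as follows. Both function names are hypothetical helpers introduced for illustration, and the caller is assumed to know num_cov_pars already.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: number of doubles needed for optim_cov_pars. */
static size_t cov_par_buffer_len(int num_cov_pars, bool calc_std_dev) {
  if (num_cov_pars <= 0) return 0;
  return (size_t)num_cov_pars * (calc_std_dev ? 2u : 1u);
}

/* Hypothetical helper: zero-initialized buffer of that length, or NULL if the
 * length is 0 or the allocation fails. The caller releases it with free(). */
static double *alloc_cov_par_buffer(int num_cov_pars, bool calc_std_dev) {
  size_t len = cov_par_buffer_len(num_cov_pars, calc_std_dev);
  return len ? calloc(len, sizeof(double)) : NULL;
}
```

The returned pointer would be passed as optim_cov_pars together with the same calc_std_dev value that was used to size it.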
-
GPBOOST_C_EXPORT int GPB_GetCurrentNegLogLikelihood(REModelHandle handle, double *negll)
Get the current value of the negative log-likelihood.
- Parameters:
handle – Handle of REModel
negll – [out] Negative log-likelihood
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetInitAuxPars(REModelHandle handle, double *aux_pars)
Get initial values for additional likelihood parameters (e.g., shape parameter for a gamma likelihood)
- Parameters:
handle – Handle of REModel
aux_pars – [out] Initial values for additional likelihood parameters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetInitCovPar(REModelHandle handle, double *init_cov_pars)
Get initial values for covariance parameters. Note: You should pre-allocate memory for init_cov_pars. Its length equals the number of covariance parameters (num_cov_pars).
- Parameters:
handle – Handle of REModel
init_cov_pars – [out] Initial covariance parameters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetLikelihoodName(REModelHandle handle, char *out_str, int *num_char)
Get name of likelihood.
- Parameters:
handle – Handle of REModel
out_str – [out] Likelihood name
num_char – [out] Number of characters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetNumAuxPars(BoosterHandle handle, int *num_aux_pars)
Get number of additional likelihood parameters (e.g., shape parameter for a gamma likelihood)
- Parameters:
handle – Handle of booster
num_aux_pars – [out] Number of additional likelihood parameters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetNumCGSteps(BoosterHandle handle, int *num_cg_steps)
Returns the number of CG steps when the CG method was last run.
- Parameters:
handle – Handle of booster
num_cg_steps – [out] Number of CG steps
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetNumCGStepsTridiag(BoosterHandle handle, int *num_cg_steps)
Returns the number of CG steps when the CG method was last run for the SLQ method.
- Parameters:
handle – Handle of booster
num_cg_steps – [out] Number of CG steps
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetNumIt(REModelHandle handle, int *num_it)
Get / export the number of iterations until convergence. Note: You should pre-allocate memory for num_it (length = 1).
- Parameters:
handle – Handle of REModel
num_it – [out] Number of iterations for convergence
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetNumModeFindingSteps(BoosterHandle handle, int *num_cg_steps)
Returns the number of mode finding steps from the last mode finding in a Laplace approximation.
- Parameters:
handle – Handle of booster
num_cg_steps – [out] Number of steps
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetOffsetData(REModelHandle handle, double *fixed_effects)
Return offset data.
- Parameters:
handle – Handle of REModel
fixed_effects – [out] offset data
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetOptimizerCoef(REModelHandle handle, char *out_str, int *num_char)
Get name of linear regression coefficients optimizer.
- Parameters:
handle – Handle of REModel
out_str – [out] Optimizer name
num_char – [out] Number of characters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetOptimizerCovPars(REModelHandle handle, char *out_str, int *num_char)
Get name of covariance parameter optimizer.
- Parameters:
handle – Handle of REModel
out_str – [out] Optimizer name
num_char – [out] Number of characters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_GetResponseData(REModelHandle handle, double *response_data)
Return (last used) response variable data.
- Parameters:
handle – Handle of REModel
response_data – [out] Response variable data (memory needs to be preallocated)
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_OptimCovPar(REModelHandle handle, const double *y_data, const double *fixed_effects)
Find parameters that minimize the negative log-likelihood (=MLE).
- Parameters:
handle – Handle of REModel
y_data – Response variable data
fixed_effects – Fixed effects component of location parameter (only used for non-Gaussian data). For Gaussian data, this is ignored
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_OptimLinRegrCoefCovPar(REModelHandle handle, const double *y_data, const double *covariate_data, int num_covariates, const double *fixed_effects)
Find linear regression coefficients and covariance parameters that minimize the negative log-likelihood (=MLE). Note: You should pre-allocate memory for optim_pars. Its length equals 1 + number of covariance parameters + number of linear regression coefficients and 1.
- Parameters:
handle – Handle of REModel
y_data – Response variable data
covariate_data – Covariate (=independent variable, feature) data
num_covariates – Number of covariates
fixed_effects – Additional fixed effects that are added to the linear predictor (= offset)
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_PredictREModel(REModelHandle handle, const double *y_data, int32_t num_data_pred, double *out_predict, bool predict_cov_mat, bool predict_var, bool predict_response, bool sample_posterior, bool sample_prior, int num_post_samples, int num_prior_samples, const int32_t *cluster_ids_data_pred, const char *re_group_data_pred, const double *re_group_rand_coef_data_pred, double *gp_coords_data_pred, const double *gp_rand_coef_data_pred, const double *cov_pars, const double *covariate_data_pred, bool use_saved_data, const double *fixed_effects, const double *fixed_effects_pred)
Make predictions: calculate conditional mean and variances or covariance matrix. Note: You should pre-allocate memory for out_predict. Its length is equal to num_data_pred if only the conditional mean is predicted (predict_cov_mat==false && predict_var==false), num_data_pred * (1 + num_data_pred) if the predictive covariance matrix is also calculated (predict_cov_mat==true), or num_data_pred * 2 if predictive variances are also calculated (predict_var==true).
- Parameters:
handle – Handle of REModel
y_data – Response variable for observed data
num_data_pred – Number of data points for which predictions are made
out_predict – [out] Predictive mean at prediction points followed by the predictive covariance matrix in column-major format (if predict_cov_mat==true) or the predictive variances (if predict_var==true)
predict_cov_mat – If true, the predictive/conditional covariance matrix is calculated (default=false) (predict_var and predict_cov_mat cannot be both true)
predict_var – If true, the predictive/conditional variances are calculated (default=false) (predict_var and predict_cov_mat cannot be both true)
predict_response – If true, the response variable (label) is predicted, otherwise the latent random effects
sample_posterior – If true, posterior samples are generated
sample_prior – If true, prior samples are generated
num_post_samples – Number of posterior samples
num_prior_samples – Number of prior samples
cluster_ids_data_pred – IDs / labels indicating independent realizations of Gaussian processes (same values = same process realization) for which predictions are to be made
re_group_data_pred – Labels of group levels for the grouped random effects in column-major format (i.e. first the levels for the first effect, then for the second, etc.). Every group label needs to end with the null character ‘\0’
re_group_rand_coef_data_pred – Covariate data for grouped random coefficients
gp_coords_data_pred – Coordinates (features) for Gaussian process
gp_rand_coef_data_pred – Covariate data for Gaussian process random coefficients
cov_pars – Covariance parameters of RE components
covariate_data_pred – Covariate data (=independent variables, features) for prediction
use_saved_data – If true, previously set data on groups, coordinates, and covariates are used and some arguments of this function are ignored
fixed_effects – Fixed effects component of location parameter for observed data (only used for non-Gaussian data). For Gaussian data, this is ignored
fixed_effects_pred – Fixed effects component of location parameter for predicted data (only used for non-Gaussian data)
- Returns:
0 when succeed, -1 when failure happens
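The length rule for out_predict stated in the note above can be written as a small function. This is a sketch only; `predict_out_len` is a hypothetical helper, and returning -1 for the disallowed combination of both flags is a choice made here, not library behavior.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: number of doubles to pre-allocate for out_predict.
 * Returns -1 if num_data_pred is negative or if both flags are true, since
 * predict_var and predict_cov_mat cannot both be true. */
static int64_t predict_out_len(int32_t num_data_pred, bool predict_cov_mat,
                               bool predict_var) {
  int64_t n = num_data_pred;
  if (n < 0 || (predict_cov_mat && predict_var)) return -1;
  if (predict_cov_mat) return n * (1 + n); /* mean followed by n x n covariance */
  if (predict_var) return n * 2;           /* mean followed by variances */
  return n;                                /* mean only */
}
```

The computation is done in 64-bit arithmetic because num_data_pred * (1 + num_data_pred) can exceed the range of int32_t for moderately large prediction sets.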
-
GPBOOST_C_EXPORT int GPB_PredictREModelTrainingDataRandomEffects(REModelHandle handle, const double *cov_pars_pred, const double *y_obs, double *out_predict, const double *fixed_effects, bool calc_var)
Predict (“estimate”) training data random effects.
- Parameters:
handle – Handle of REModel
cov_pars_pred – Covariance parameters of components
y_obs – Response variable for observed data
out_predict – [out] Predicted training data random effects and variances if calc_var
fixed_effects – Fixed effects component of location parameter for observed data (only used for non-Gaussian data). For Gaussian data, this is ignored
calc_var – If true, variances are also calculated
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_REModelFree(REModelHandle handle)
Free space for REModel.
- Parameters:
handle – Handle of REModel to be freed
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_SetLikelihood(REModelHandle handle, const char *likelihood)
Set the type of likelihood.
- Parameters:
handle – Handle of REModel
likelihood – Likelihood name
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_SetOffsetData(REModelHandle handle, const double *fixed_effects)
Set offset data.
- Parameters:
handle – Handle of REModel
fixed_effects – offset data
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int GPB_SetOptimConfig(REModelHandle handle, double *init_cov_pars, double lr, double acc_rate_cov, int max_iter, double delta_rel_conv, bool use_nesterov_acc, int nesterov_schedule_version, bool trace, const char *optimizer, int momentum_offset, const char *convergence_criterion, int num_covariates, double *init_coef, double lr_coef, double acc_rate_coef, const char *optimizer_coef, int cg_max_num_it, int cg_max_num_it_tridiag, double cg_delta_conv, int num_rand_vec_trace, bool reuse_rand_vec_trace, const char *cg_preconditioner_type, int seed_rand_vec_trace, int piv_chol_rank, double *init_aux_pars, bool estimate_aux_pars, const int *estimate_cov_par_index, int m_lbfgs, double delta_conv_mode_finding)
Set configuration parameters for the optimizer.
- Parameters:
handle – Handle of REModel
init_cov_pars – Initial values for covariance parameters of RE components
lr – Learning rate for covariance parameters. If lr<= 0, internal default values are used (0.1 for “gradient_descent” and 1. for “fisher_scoring”)
acc_rate_cov – Acceleration rate for covariance parameters for Nesterov acceleration (only relevant if nesterov_schedule_version == 0)
max_iter – Maximal number of iterations
delta_rel_conv – Convergence tolerance. The algorithm stops if the relative change in either the log-likelihood or the parameters is below this value. For “bfgs”, the L2 norm of the gradient is used instead of the relative change in the log-likelihood
use_nesterov_acc – Indicates whether Nesterov acceleration is used in the gradient descent for finding the covariance parameters (only used for “gradient_descent”)
nesterov_schedule_version – Which version of Nesterov schedule should be used (only relevant if use_nesterov_acc)
trace – If true, the value of the gradient is printed for some iterations
optimizer – Optimizer for covariance parameters
momentum_offset – Number of iterations for which no momentum is applied in the beginning (only relevant if use_nesterov_acc)
convergence_criterion – The convergence criterion used for terminating the optimization algorithm. Options: “relative_change_in_log_likelihood” or “relative_change_in_parameters”
num_covariates – Number of covariates
init_coef – Initial values for the regression coefficients
lr_coef – Learning rate for fixed-effect linear coefficients
acc_rate_coef – Acceleration rate for coefficients for Nesterov acceleration (only relevant if nesterov_schedule_version == 0)
optimizer_coef – Optimizer for linear regression coefficients
cg_max_num_it – Maximal number of iterations for conjugate gradient algorithm
cg_max_num_it_tridiag – Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization
cg_delta_conv – Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for parameter estimation
num_rand_vec_trace – Number of random vectors (e.g. Rademacher) for stochastic approximation of the trace of a matrix
reuse_rand_vec_trace – If true, random vectors (e.g. Rademacher) for stochastic approximation of the trace of a matrix are sampled only once at the beginning and then reused in later trace approximations, otherwise they are sampled every time a trace is calculated
cg_preconditioner_type – Type of preconditioner used for the conjugate gradient algorithm
seed_rand_vec_trace – Seed number to generate random vectors (e.g. Rademacher) for stochastic approximation of the trace of a matrix
piv_chol_rank – Rank of the pivoted Cholesky decomposition used as preconditioner of the conjugate gradient algorithm
init_aux_pars – Initial values for aux_pars_ (e.g., shape parameter of gamma likelihood)
estimate_aux_pars – If true, any additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of gamma likelihood)
estimate_cov_par_index – If estimate_cov_par_index[0] >= 0, some covariance parameters might not be estimated; estimate_cov_par_index[i] is then interpreted as a boolean that indicates whether covariance parameter i is estimated
m_lbfgs – Number of corrections to approximate the inverse Hessian matrix for the lbfgs optimizer
delta_conv_mode_finding – Used for checking convergence in mode finding algorithm for non-Gaussian likelihoods
- Returns:
0 when succeed, -1 when failure happens
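The default rule documented for lr above can be mirrored on the caller side, for example to log the learning rate that will effectively be used. The library applies its defaults internally; the function below is only a sketch of the documented rule, `effective_lr` is a hypothetical name, and for optimizers other than the two named in the parameter description the input is returned unchanged because no default is stated for them.

```c
#include <string.h>

/* Hypothetical helper mirroring the documented rule for lr:
 * lr > 0 is used as given; otherwise 0.1 for "gradient_descent" and
 * 1.0 for "fisher_scoring". Other optimizers: lr is returned unchanged. */
static double effective_lr(double lr, const char *optimizer) {
  if (lr > 0.0) return lr;
  if (strcmp(optimizer, "gradient_descent") == 0) return 0.1;
  if (strcmp(optimizer, "fisher_scoring") == 0) return 1.0;
  return lr;
}
```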
-
GPBOOST_C_EXPORT int GPB_SetPredictionData(REModelHandle handle, int32_t num_data_pred, const int32_t *cluster_ids_data_pred, const char *re_group_data_pred, const double *re_group_rand_coef_data_pred, double *gp_coords_data_pred, const double *gp_rand_coef_data_pred, const double *covariate_data_pred, const char *vecchia_pred_type, int num_neighbors_pred, double cg_delta_conv_pred, int nsim_var_pred, int rank_pred_approx_matrix_lanczos)
Set the data used for making predictions (useful if the same data is used repeatedly, e.g., in validation of GPBoost)
- Parameters:
handle – Handle of REModel
num_data_pred – Number of data points for which predictions are made
cluster_ids_data_pred – IDs / labels indicating independent realizations of Gaussian processes (same values = same process realization) for which predictions are to be made
re_group_data_pred – Labels of group levels for the grouped random effects in column-major format (i.e. first the levels for the first effect, then for the second, etc.). Every group label needs to end with the null character ‘\0’
re_group_rand_coef_data_pred – Covariate data for grouped random coefficients
gp_coords_data_pred – Coordinates (features) for Gaussian process
gp_rand_coef_data_pred – Covariate data for Gaussian process random coefficients
covariate_data_pred – Covariate data (=independent variables, features) for prediction
vecchia_pred_type – Type of Vecchia approximation for making predictions. “order_obs_first_cond_obs_only” = observed data is ordered first and neighbors are only observed points, “order_obs_first_cond_all” = observed data is ordered first and neighbors are selected among all points (observed + predicted), “order_pred_first” = predicted data is ordered first for making predictions, “latent_order_obs_first_cond_obs_only” = Vecchia approximation for the latent process and observed data is ordered first and neighbors are only observed points, “latent_order_obs_first_cond_all” = Vecchia approximation for the latent process and observed data is ordered first and neighbors are selected among all points
num_neighbors_pred – The number of neighbors used in the Vecchia approximation for making predictions (-1 means that the value already set at initialization is used)
cg_delta_conv_pred – Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for prediction
nsim_var_pred – Number of samples when simulation is used for calculating predictive variances
rank_pred_approx_matrix_lanczos – Rank of the matrix for approximating predictive covariances obtained using the Lanczos algorithm
-
static char *LastErrorMsg()
Handle of error message.
- Returns:
Error message
-
GPBOOST_C_EXPORT int LGBM_BoosterAddValidData(BoosterHandle handle, const DatasetHandle valid_data)
Add new validation data to booster.
- Parameters:
handle – Handle of booster
valid_data – Validation dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterCalcNumPredict(BoosterHandle handle, int num_row, int predict_type, int start_iteration, int num_iteration, int64_t *out_len)
Get number of predictions.
- Parameters:
handle – Handle of booster
num_row – Number of rows
predict_type – What should be predicted:
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed);
C_API_PREDICT_RAW_SCORE: raw score;
C_API_PREDICT_LEAF_INDEX: leaf index;
C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
out_len – [out] Length of prediction
- Returns:
0 when succeed, -1 when failure happens
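When the predict_type codes above are passed around as plain integers, a small lookup can help with validation and logging. The sketch below redefines the constants locally with the values from the Defines list so that it compiles on its own. `predict_type_name` and the returned labels are hypothetical and not part of the API.

```c
#include <stddef.h>

/* Values copied from the Defines list; the guard lets the real header win. */
#ifndef C_API_PREDICT_NORMAL
#define C_API_PREDICT_NORMAL (0)
#define C_API_PREDICT_RAW_SCORE (1)
#define C_API_PREDICT_LEAF_INDEX (2)
#define C_API_PREDICT_CONTRIB (3)
#endif

/* Hypothetical helper: short label for a predict_type code, or NULL if the
 * code is not one of the four listed values. */
static const char *predict_type_name(int predict_type) {
  switch (predict_type) {
    case C_API_PREDICT_NORMAL:     return "normal";
    case C_API_PREDICT_RAW_SCORE:  return "raw_score";
    case C_API_PREDICT_LEAF_INDEX: return "leaf_index";
    case C_API_PREDICT_CONTRIB:    return "contrib";
    default:                       return NULL;
  }
}
```

A NULL result can be used to reject an invalid code before calling LGBM_BoosterCalcNumPredict.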
-
GPBOOST_C_EXPORT int LGBM_BoosterCreate(const DatasetHandle train_data, const char *parameters, BoosterHandle *out)
Create a new boosting learner.
- Parameters:
train_data – Training dataset
parameters – Parameters in format ‘key1=value1 key2=value2’
out – [out] Handle of created booster
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterCreateFromModelfile(const char *filename, int *out_num_iterations, BoosterHandle *out)
Load an existing booster from model file.
- Parameters:
filename – Filename of model
out_num_iterations – [out] Number of iterations of this booster
out – [out] Handle of created booster
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterDumpModel(BoosterHandle handle, int start_iteration, int num_iteration, int feature_importance_type, int64_t buffer_len, int64_t *out_len, char *out_str)
Dump model to JSON.
- Parameters:
handle – Handle of booster
start_iteration – Start index of the iteration that should be dumped
num_iteration – Index of the iteration that should be dumped, <= 0 means dump all
feature_importance_type – Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN
buffer_len – String buffer length, if buffer_len < out_len, you should re-allocate buffer
out_len – [out] Actual output length
out_str – [out] JSON format string of model, should pre-allocate memory
- Returns:
0 when succeed, -1 when failure happens
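The buffer_len / out_len contract described above suggests a call-then-retry pattern. The sketch below shows that pattern against a function pointer that carries only the buffer-related arguments, so it runs without the library. `dump_fn`, `fake_dump`, and `dump_to_string` are hypothetical names, and the stand-in assumes that out_len counts the terminating null character.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Buffer-related tail of a dump call: returns 0 on success, writes the required
 * length to out_len, and fills out_str only if buffer_len is large enough. */
typedef int (*dump_fn)(int64_t buffer_len, int64_t *out_len, char *out_str);

/* Hypothetical stand-in for the real call, present only to make the sketch
 * runnable. Assumption: out_len includes the terminating '\0'. */
static int fake_dump(int64_t buffer_len, int64_t *out_len, char *out_str) {
  static const char json[] = "{\"name\":\"tree\"}";
  *out_len = (int64_t)sizeof(json);
  if (buffer_len >= *out_len) memcpy(out_str, json, sizeof(json));
  return 0;
}

/* Calls dump with an initial buffer and retries once with the reported length
 * if the first buffer was too small. Returns a malloc'ed string or NULL. */
static char *dump_to_string(dump_fn dump, int64_t initial_len) {
  int64_t buffer_len = initial_len > 0 ? initial_len : 1;
  int64_t out_len = 0;
  char *buf = malloc((size_t)buffer_len);
  if (buf == NULL) return NULL;
  if (dump(buffer_len, &out_len, buf) != 0) { free(buf); return NULL; }
  if (buffer_len < out_len) { /* too small: re-allocate and call again */
    char *bigger = realloc(buf, (size_t)out_len);
    if (bigger == NULL) { free(buf); return NULL; }
    buf = bigger;
    buffer_len = out_len;
    if (dump(buffer_len, &out_len, buf) != 0) { free(buf); return NULL; }
  }
  return buf; /* caller releases with free() */
}
```

With the real API, the function pointer would be replaced by a thin wrapper that forwards handle, start_iteration, num_iteration, and feature_importance_type to LGBM_BoosterDumpModel.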
-
GPBOOST_C_EXPORT int LGBM_BoosterFeatureImportance(BoosterHandle handle, int num_iteration, int importance_type, double *out_results)
Get model feature importance.
- Parameters:
handle – Handle of booster
num_iteration – Number of iterations for which feature importance is calculated, <= 0 means use all
importance_type – Method of importance calculation:
C_API_FEATURE_IMPORTANCE_SPLIT: result contains numbers of times the feature is used in a model;
C_API_FEATURE_IMPORTANCE_GAIN: result contains total gains of splits which use the feature
out_results – [out] Result array with feature importance
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterFree(BoosterHandle handle)
Free space for booster.
- Parameters:
handle – Handle of booster to be freed
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterFreePredictSparse(void *indptr, int32_t *indices, void *data, int indptr_type, int data_type)
Method corresponding to LGBM_BoosterPredictSparseOutput to free the allocated data.
- Parameters:
indptr – Pointer to output row headers or column headers to be deallocated
indices – Pointer to sparse indices to be deallocated
data – Pointer to sparse data space to be deallocated
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetCurrentIteration(BoosterHandle handle, int *out_iteration)
Get index of the current boosting iteration.
- Parameters:
handle – Handle of booster
out_iteration – [out] Index of the current boosting iteration
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetEval(BoosterHandle handle, int data_idx, int *out_len, double *out_results)
Get evaluation for training data and validation data.
Note
You should call LGBM_BoosterGetEvalNames first to get the names of evaluation datasets.
You should pre-allocate memory for out_results, you can get its length by LGBM_BoosterGetEvalCounts.
- Parameters:
handle – Handle of booster
data_idx – Index of data, 0: training data, 1: 1st validation data, 2: 2nd validation data and so on
out_len – [out] Length of output result
out_results – [out] Array with evaluation results
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetEvalCounts(BoosterHandle handle, int *out_len)
Get number of evaluation datasets.
- Parameters:
handle – Handle of booster
out_len – [out] Total number of evaluation datasets
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetEvalNames(BoosterHandle handle, const int len, int *out_len, const size_t buffer_len, size_t *out_buffer_len, char **out_strs)
Get names of evaluation datasets.
- Parameters:
handle – Handle of booster
len – Number of char* pointers stored at out_strs. If smaller than the max size, only this many strings are copied
out_len – [out] Total number of evaluation datasets
buffer_len – Size of pre-allocated strings. Content is copied up to buffer_len - 1 and null-terminated
out_buffer_len – [out] String sizes required to do the full string copies
out_strs – [out] Names of evaluation datasets, should pre-allocate memory
- Returns:
0 when succeed, -1 when failure happens
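The out_strs layout described above (len pointers, each to a pre-allocated string of buffer_len bytes, with content copied up to buffer_len - 1 and null-terminated) can be sketched as follows. All three functions are hypothetical helpers. `copy_name` only imitates the documented copy rule and assumes that the required size includes the terminating null character.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocates len zeroed buffers of buffer_len bytes each.
 * Returns NULL on invalid arguments or allocation failure. */
static char **alloc_name_buffers(int len, size_t buffer_len) {
  if (len <= 0 || buffer_len == 0) return NULL;
  char **strs = calloc((size_t)len, sizeof(char *));
  if (strs == NULL) return NULL;
  for (int i = 0; i < len; ++i) {
    strs[i] = calloc(buffer_len, 1);
    if (strs[i] == NULL) { /* undo the buffers allocated so far */
      while (i-- > 0) free(strs[i]);
      free(strs);
      return NULL;
    }
  }
  return strs;
}

/* Hypothetical helper: releases what alloc_name_buffers returned. */
static void free_name_buffers(char **strs, int len) {
  if (strs == NULL) return;
  for (int i = 0; i < len; ++i) free(strs[i]);
  free(strs);
}

/* Imitates the documented rule: copies at most buffer_len - 1 characters and
 * null-terminates. Returns the size needed for a full copy. */
static size_t copy_name(char *dst, size_t buffer_len, const char *src) {
  size_t full = strlen(src) + 1;
  if (buffer_len > 0) {
    size_t n = full - 1 < buffer_len - 1 ? full - 1 : buffer_len - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
  }
  return full;
}
```

If the size reported for a full copy exceeds buffer_len, a caller would free the buffers, allocate larger ones, and call the getter again.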
-
GPBOOST_C_EXPORT int LGBM_BoosterGetFeatureNames(BoosterHandle handle, const int len, int *out_len, const size_t buffer_len, size_t *out_buffer_len, char **out_strs)
Get names of features.
- Parameters:
handle – Handle of booster
len – Number of char* pointers stored at out_strs. If smaller than the max size, only this many strings are copied
out_len – [out] Total number of features
buffer_len – Size of pre-allocated strings. Content is copied up to buffer_len - 1 and null-terminated
out_buffer_len – [out] String sizes required to do the full string copies
out_strs – [out] Names of features, should pre-allocate memory
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetLeafValue(BoosterHandle handle, int tree_idx, int leaf_idx, double *out_val)
Get leaf value.
- Parameters:
handle – Handle of booster
tree_idx – Index of tree
leaf_idx – Index of leaf
out_val – [out] Output result from the specified leaf
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetLinear(BoosterHandle handle, bool *out)
Get boolean representing whether booster is fitting linear trees.
- Parameters:
handle – Handle of booster
out – [out] The address to hold linear trees indicator
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetLowerBoundValue(BoosterHandle handle, double *out_results)
Get model lower bound value.
- Parameters:
handle – Handle of booster
out_results – [out] Result pointing to min value
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetNumClasses(BoosterHandle handle, int *out_len)
Get number of classes.
- Parameters:
handle – Handle of booster
out_len – [out] Number of classes
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetNumFeature(BoosterHandle handle, int *out_len)
Get number of features.
- Parameters:
handle – Handle of booster
out_len – [out] Total number of features
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetNumPredict(BoosterHandle handle, int data_idx, int64_t *out_len)
Get number of predictions for training data and validation data (this can be used to support customized evaluation functions).
- Parameters:
handle – Handle of booster
data_idx – Index of data, 0: training data, 1: 1st validation data, 2: 2nd validation data and so on
out_len – [out] Number of predictions
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetPredict(BoosterHandle handle, int data_idx, int64_t *out_len, double *out_result)
Get prediction for training data and validation data.
Note
You should pre-allocate memory for out_result, its length is equal to num_class * num_data.
- Parameters:
handle – Handle of booster
data_idx – Index of data, 0: training data, 1: 1st validation data, 2: 2nd validation data and so on
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterGetUpperBoundValue(BoosterHandle handle, double *out_results)
Get model upper bound value.
- Parameters:
handle – Handle of booster
out_results – [out] Result pointing to max value
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterLoadModelFromString(const char *model_str, int *out_num_iterations, BoosterHandle *out)
Load an existing booster from string.
- Parameters:
model_str – Model string
out_num_iterations – [out] Number of iterations of this booster
out – [out] Handle of created booster
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterMerge(BoosterHandle handle, BoosterHandle other_handle)
Merge model from other_handle into handle.
- Parameters:
handle – Handle of booster, will merge another booster into this one
other_handle – Other handle of booster
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterNumberOfTotalModel(BoosterHandle handle, int *out_models)
Get number of weak sub-models.
- Parameters:
handle – Handle of booster
out_models – [out] Number of weak sub-models
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterNumModelPerIteration(BoosterHandle handle, int *out_tree_per_iteration)
Get number of trees per iteration.
- Parameters:
handle – Handle of booster
out_tree_per_iteration – [out] Number of trees per iteration
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSC(BoosterHandle handle, const void *col_ptr, int col_ptr_type, const int32_t *indices, const void *data, int data_type, int64_t ncol_ptr, int64_t nelem, int64_t num_row, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)
Make prediction for a new dataset in CSC format.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
handle – Handle of booster
col_ptr – Pointer to column headers
col_ptr_type – Type of col_ptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to row indices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
ncol_ptr – Number of columns in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_row – Number of rows
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
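The sizing rules for out_result can be collected in one small function. The sketch below is an illustration only: predict_out_len is a hypothetical helper, and the EX_PREDICT_* constants are local copies of the C_API_PREDICT_* values listed under Defines, redefined so that the snippet compiles without the library header.

```c
#include <stdint.h>

/* Local copies of the C_API_PREDICT_* values from the Defines section,
 * renamed with an EX_ prefix so the sketch is self-contained. */
enum {
  EX_PREDICT_NORMAL = 0,
  EX_PREDICT_RAW_SCORE = 1,
  EX_PREDICT_LEAF_INDEX = 2,
  EX_PREDICT_CONTRIB = 3
};

/* Hypothetical helper: number of doubles to allocate for out_result.
 * Returns -1 for an unknown predict_type. */
int64_t predict_out_len(int predict_type, int64_t num_class, int64_t num_data,
                        int64_t num_feature, int64_t num_iteration) {
  switch (predict_type) {
    case EX_PREDICT_NORMAL:
    case EX_PREDICT_RAW_SCORE:
      return num_class * num_data;
    case EX_PREDICT_LEAF_INDEX:
      return num_class * num_data * num_iteration;
    case EX_PREDICT_CONTRIB:
      return num_class * num_data * (num_feature + 1);
    default:
      return -1;
  }
}
```

A caller would typically allocate predict_out_len(...) * sizeof(double) bytes before invoking one of the prediction functions.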
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSR(BoosterHandle handle, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)
Make prediction for a new dataset in CSR format.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
handle – Handle of booster
indptr – Pointer to row headers
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to column indices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nindptr – Number of rows in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_col – Number of columns
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
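For readers unfamiliar with the CSR arguments (indptr, indices, data, nindptr, nelem), the following sketch shows how they relate to a dense matrix. It is an illustration only: dense_to_csr is a hypothetical helper, and it assumes int32 row headers (C_API_DTYPE_INT32) and float64 values (C_API_DTYPE_FLOAT64).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a row-major dense matrix to CSR arrays.
 * indptr must hold nrow + 1 entries (this count is nindptr); indices and
 * data must hold at least as many entries as there are nonzeros.
 * Returns the number of nonzero elements (nelem). */
int64_t dense_to_csr(const double *dense, int32_t nrow, int32_t ncol,
                     int32_t *indptr, int32_t *indices, double *data) {
  int64_t nelem = 0;
  indptr[0] = 0;
  for (int32_t i = 0; i < nrow; ++i) {
    for (int32_t j = 0; j < ncol; ++j) {
      double v = dense[(size_t)i * (size_t)ncol + (size_t)j];
      if (v != 0.0) {
        indices[nelem] = j; /* column index of this nonzero */
        data[nelem] = v;
        ++nelem;
      }
    }
    indptr[i + 1] = (int32_t)nelem; /* end of row i in indices/data */
  }
  return nelem;
}
```

Row i then occupies positions indptr[i] up to, but not including, indptr[i + 1] in indices and data.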
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSRSingleRow(BoosterHandle handle, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)
Make prediction for a new dataset in CSR format. This method re-uses the internal predictor structure from previous calls and is optimized for single row invocation.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
handle – Handle of booster
indptr – Pointer to row headers
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to column indices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nindptr – Number of rows in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_col – Number of columns
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSRSingleRowFast(FastConfigHandle fastConfig_handle, const void *indptr, const int indptr_type, const int32_t *indices, const void *data, const int64_t nindptr, const int64_t nelem, int64_t *out_len, double *out_result)
Faster variant of LGBM_BoosterPredictForCSRSingleRow.
Score single rows after setup with LGBM_BoosterPredictForCSRSingleRowFastInit.
By removing the setup steps from this call, extra optimizations can be made, such as initializing the config only once instead of once per call.
Note
Setting up the number of threads is only done once at LGBM_BoosterPredictForCSRSingleRowFastInit instead of at each prediction. If you use a different number of threads in other calls, you need to start the setup process over, or that number of threads will be used for these calls as well.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
fastConfig_handle – FastConfig object handle returned by LGBM_BoosterPredictForCSRSingleRowFastInit
indptr – Pointer to row headers
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to column indices
data – Pointer to the data space
nindptr – Number of rows in the matrix + 1
nelem – Number of nonzero elements in the matrix
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForCSRSingleRowFastInit(BoosterHandle handle, const int predict_type, const int start_iteration, const int num_iteration, const int data_type, const int64_t num_col, const char *parameter, FastConfigHandle *out_fastConfig)
Initialize and return a FastConfigHandle for use with LGBM_BoosterPredictForCSRSingleRowFast.
Release the FastConfig by passing its handle to LGBM_FastConfigFree when no longer needed.
- Parameters:
handle – Booster handle
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
num_col – Number of columns
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_fastConfig – [out] FastConfig object with which you can call LGBM_BoosterPredictForCSRSingleRowFast
- Returns:
0 when it succeeds, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForFile(BoosterHandle handle, const char *data_filename, int data_has_header, int predict_type, int start_iteration, int num_iteration, const char *parameter, const char *result_filename)
Make prediction for file.
- Parameters:
handle – Handle of booster
data_filename – Filename of file with data
data_has_header – Whether file has header or not
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
result_filename – Filename of result file in which predictions will be written
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForMat(BoosterHandle handle, const void *data, int data_type, int32_t nrow, int32_t ncol, int is_row_major, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)
Make prediction for a new dataset.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
handle – Handle of booster
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nrow – Number of rows
ncol – Number of columns
is_row_major – 1 for row-major, 0 for column-major
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
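The is_row_major flag determines how element (i, j) is located inside the flat data array. The sketch below illustrates the two layouts for a float64 matrix; mat_at is a hypothetical helper written for this example, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read element (i, j) of a dense float64 matrix
 * stored with the given layout flag (1 = row-major, 0 = column-major). */
double mat_at(const double *data, int32_t nrow, int32_t ncol,
              int is_row_major, int32_t i, int32_t j) {
  size_t idx = is_row_major ? (size_t)i * (size_t)ncol + (size_t)j
                            : (size_t)j * (size_t)nrow + (size_t)i;
  return data[idx];
}
```

For the 2 x 3 matrix with rows (1, 2, 3) and (4, 5, 6), the row-major array is {1, 2, 3, 4, 5, 6} and the column-major array is {1, 4, 2, 5, 3, 6}.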
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForMats(BoosterHandle handle, const void **data, int data_type, int32_t nrow, int32_t ncol, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)
Make prediction for a new dataset presented in a form of array of pointers to rows.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
handle – Handle of booster
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nrow – Number of rows
ncol – Number of columns
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
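An "array of pointers to rows" can be built on top of an existing contiguous matrix without copying any values. The following sketch assumes float64, row-major storage; make_row_pointers is a hypothetical helper used only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: build an array of per-row pointers into a
 * contiguous row-major float64 matrix. The caller frees the returned
 * array itself; the rows still belong to `data`. */
const void **make_row_pointers(const double *data, int32_t nrow, int32_t ncol) {
  const void **rows = (const void **)malloc((size_t)nrow * sizeof(*rows));
  if (rows == NULL) return NULL;
  for (int32_t i = 0; i < nrow; ++i) {
    rows[i] = data + (size_t)i * (size_t)ncol;
  }
  return rows;
}
```

The same layout also fits data whose rows live in separate allocations, which is the case a pointer-per-row interface is presumably meant to serve.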
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForMatSingleRow(BoosterHandle handle, const void *data, int data_type, int ncol, int is_row_major, int predict_type, int start_iteration, int num_iteration, const char *parameter, int64_t *out_len, double *out_result)
Make prediction for a new dataset. This method re-uses the internal predictor structure from previous calls and is optimized for single row invocation.
Note
You should pre-allocate memory for out_result:
for normal and raw score, its length is equal to num_class * num_data;
for leaf index, its length is equal to num_class * num_data * num_iteration;
for feature contributions, its length is equal to num_class * num_data * (num_feature + 1).
- Parameters:
handle – Handle of booster
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
ncol – Number of columns
is_row_major – 1 for row-major, 0 for column-major
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForMatSingleRowFast(FastConfigHandle fastConfig_handle, const void *data, int64_t *out_len, double *out_result)
Faster variant of LGBM_BoosterPredictForMatSingleRow.
Score a single row after setup with LGBM_BoosterPredictForMatSingleRowFastInit.
By removing the setup steps from this call, extra optimizations can be made, such as initializing the config only once instead of once per call.
Note
Setting up the number of threads is only done once at LGBM_BoosterPredictForMatSingleRowFastInit instead of at each prediction. If you use a different number of threads in other calls, you need to start the setup process over, or that number of threads will be used for these calls as well.
- Parameters:
fastConfig_handle – FastConfig object handle returned by LGBM_BoosterPredictForMatSingleRowFastInit
data – Single-row array data (no other way than row-major form).
out_len – [out] Length of output result
out_result – [out] Pointer to array with predictions
- Returns:
0 when it succeeds, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictForMatSingleRowFastInit(BoosterHandle handle, const int predict_type, const int start_iteration, const int num_iteration, const int data_type, const int32_t ncol, const char *parameter, FastConfigHandle *out_fastConfig)
Initialize and return a FastConfigHandle for use with LGBM_BoosterPredictForMatSingleRowFast.
Release the FastConfig by passing its handle to LGBM_FastConfigFree when no longer needed.
- Parameters:
handle – Booster handle
predict_type – What should be predicted
C_API_PREDICT_NORMAL: normal prediction, with transform (if needed); C_API_PREDICT_RAW_SCORE: raw score; C_API_PREDICT_LEAF_INDEX: leaf index; C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
ncol – Number of columns
parameter – Other parameters for prediction, e.g. early stopping for prediction
out_fastConfig – [out] FastConfig object with which you can call LGBM_BoosterPredictForMatSingleRowFast
- Returns:
0 when it succeeds, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterPredictSparseOutput(BoosterHandle handle, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col_or_row, int predict_type, int start_iteration, int num_iteration, const char *parameter, int matrix_type, int64_t *out_len, void **out_indptr, int32_t **out_indices, void **out_data)
Make sparse prediction for a new dataset in CSR or CSC format. Currently only used for feature contributions.
Note
The outputs are pre-allocated, as they can vary for each invocation, but the shape should be the same:
for feature contributions, the shape of sparse matrix will be num_class * num_data * (num_feature + 1).
The output indptr_type for the sparse matrix will be the same as the given input indptr_type. Call LGBM_BoosterFreePredictSparse to deallocate resources.
- Parameters:
handle – Handle of booster
indptr – Pointer to row headers for CSR or column headers for CSC
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to column indices for CSR or row indices for CSC
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nindptr – Number of rows in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_col_or_row – Number of columns for CSR or number of rows for CSC
predict_type – What should be predicted, only feature contributions supported currently
C_API_PREDICT_CONTRIB: feature contributions (SHAP values)
start_iteration – Start index of the iteration to predict
num_iteration – Number of iterations for prediction, <= 0 means no limit
parameter – Other parameters for prediction, e.g. early stopping for prediction
matrix_type – Type of matrix input and output, can be C_API_MATRIX_TYPE_CSR or C_API_MATRIX_TYPE_CSC
out_len – [out] Length of output indices and data
out_indptr – [out] Pointer to output row headers for CSR or column headers for CSC
out_indices – [out] Pointer to sparse column indices for CSR or row indices for CSC
out_data – [out] Pointer to sparse data space
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterRefit(BoosterHandle handle, const int32_t *leaf_preds, int32_t nrow, int32_t ncol)
Refit the tree model using the new data (online learning).
- Parameters:
handle – Handle of booster
leaf_preds – Pointer to predicted leaf indices
nrow – Number of rows of leaf_preds
ncol – Number of columns of leaf_preds
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterResetParameter(BoosterHandle handle, const char *parameters)
Reset config for booster.
- Parameters:
handle – Handle of booster
parameters – Parameters in format ‘key1=value1 key2=value2’
- Returns:
0 when succeed, -1 when failure happens
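A parameter string in the 'key1=value1 key2=value2' format can be assembled with snprintf. The sketch below is illustrative only: format_params is a hypothetical helper, and the two keys shown (learning_rate, num_leaves) are used as example settings, not as a statement about which parameters can be reset.

```c
#include <stdio.h>

/* Hypothetical helper: format two example settings as
 * 'key1=value1 key2=value2'. Returns what snprintf returns: the number
 * of characters needed (excluding the terminator), or a negative value
 * on error. */
int format_params(char *buf, size_t buf_len, double learning_rate,
                  int num_leaves) {
  return snprintf(buf, buf_len, "learning_rate=%g num_leaves=%d",
                  learning_rate, num_leaves);
}
```

If the return value is greater than or equal to buf_len, the string was truncated and a larger buffer is needed.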
-
GPBOOST_C_EXPORT int LGBM_BoosterResetTrainingData(BoosterHandle handle, const DatasetHandle train_data)
Reset training data for booster.
- Parameters:
handle – Handle of booster
train_data – Training dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterRollbackOneIter(BoosterHandle handle)
Rollback one iteration.
- Parameters:
handle – Handle of booster
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterSaveModel(BoosterHandle handle, int start_iteration, int num_iteration, int feature_importance_type, const char *filename)
Save model into file.
- Parameters:
handle – Handle of booster
start_iteration – Start index of the iteration that should be saved
num_iteration – Index of the iteration that should be saved, <= 0 means save all
feature_importance_type – Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN
filename – The name of the file
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterSaveModelToString(BoosterHandle handle, int start_iteration, int num_iteration, int feature_importance_type, int64_t buffer_len, int64_t *out_len, char *out_str)
Save model to string.
- Parameters:
handle – Handle of booster
start_iteration – Start index of the iteration that should be saved
num_iteration – Index of the iteration that should be saved, <= 0 means save all
feature_importance_type – Type of feature importance, can be C_API_FEATURE_IMPORTANCE_SPLIT or C_API_FEATURE_IMPORTANCE_GAIN
buffer_len – String buffer length, if buffer_len < out_len, you should re-allocate buffer
out_len – [out] Actual output length
out_str – [out] String of model, should pre-allocate memory
- Returns:
0 when succeed, -1 when failure happens
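The buffer_len / out_len convention suggests a two-step pattern: call once with an initial buffer, and if the reported length is larger, re-allocate and call again. The sketch below demonstrates the pattern against a mock. It assumes that the required length is always reported through out_len and that the string is only written when the buffer is large enough; mock_save, save_fn and save_with_retry are hypothetical names and not part of the library.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Signature of a hypothetical "save to string" callback following the
 * buffer_len / out_len convention: it sets *out_len to the required size
 * (including the terminator) and writes only if the buffer is big enough. */
typedef int (*save_fn)(int64_t buffer_len, int64_t *out_len, char *out_str);

/* Mock implementation used only for this sketch. */
int mock_save(int64_t buffer_len, int64_t *out_len, char *out_str) {
  static const char model[] = "tree\nversion=v3\n";
  *out_len = (int64_t)sizeof(model);
  if (buffer_len >= *out_len) memcpy(out_str, model, sizeof(model));
  return 0;
}

/* Call once with an initial guess, grow the buffer if it was too small,
 * then call again. Returns a malloc'd string, or NULL on failure. */
char *save_with_retry(save_fn save, int64_t initial_len) {
  int64_t buffer_len = initial_len > 0 ? initial_len : 1;
  int64_t out_len = 0;
  char *buf = (char *)malloc((size_t)buffer_len);
  if (buf == NULL) return NULL;
  if (save(buffer_len, &out_len, buf) != 0) { free(buf); return NULL; }
  if (buffer_len < out_len) {
    char *bigger = (char *)realloc(buf, (size_t)out_len);
    if (bigger == NULL) { free(buf); return NULL; }
    buf = bigger;
    buffer_len = out_len;
    if (save(buffer_len, &out_len, buf) != 0) { free(buf); return NULL; }
  }
  return buf;
}
```

With the real function, the callback would be a thin wrapper that forwards the booster handle and the remaining arguments.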
-
GPBOOST_C_EXPORT int LGBM_BoosterSetLeafValue(BoosterHandle handle, int tree_idx, int leaf_idx, double val)
Set leaf value.
- Parameters:
handle – Handle of booster
tree_idx – Index of tree
leaf_idx – Index of leaf
val – Leaf value
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterShuffleModels(BoosterHandle handle, int start_iter, int end_iter)
Shuffle models.
- Parameters:
handle – Handle of booster
start_iter – The first iteration that will be shuffled
end_iter – The last iteration that will be shuffled
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterUpdateOneIter(BoosterHandle handle, int *is_finished)
Update the model for one iteration.
- Parameters:
handle – Handle of booster
is_finished – [out] 1 means training is finished (cannot split any more), 0 means it is not finished
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_BoosterUpdateOneIterCustom(BoosterHandle handle, const float *grad, const float *hess, int *is_finished)
Update the model by specifying gradient and Hessian directly (this can be used to support customized loss functions).
- Parameters:
handle – Handle of booster
grad – The first order derivative (gradient) statistics
hess – The second order derivative (Hessian) statistics
is_finished – [out] 1 means training is finished (cannot split any more), 0 means it is not finished
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetAddFeaturesFrom(DatasetHandle target, DatasetHandle source)
Add features from source to target.
- Parameters:
target – The handle of the dataset to add features to
source – The handle of the dataset to take features from
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateByReference(const DatasetHandle reference, int64_t num_total_row, DatasetHandle *out)
Allocate the space for dataset and bucket feature bins according to reference dataset.
- Parameters:
reference – Used to align bin mapper with other dataset
num_total_row – Number of total rows
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromCSC(const void *col_ptr, int col_ptr_type, const int32_t *indices, const void *data, int data_type, int64_t ncol_ptr, int64_t nelem, int64_t num_row, const char *parameters, const DatasetHandle reference, DatasetHandle *out)
Create a dataset from CSC format.
- Parameters:
col_ptr – Pointer to column headers
col_ptr_type – Type of col_ptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to row indices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
ncol_ptr – Number of columns in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_row – Number of rows
parameters – Additional parameters
reference – Used to align bin mapper with other dataset, nullptr means isn’t used
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromCSR(const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, const char *parameters, const DatasetHandle reference, DatasetHandle *out)
Create a dataset from CSR format.
- Parameters:
indptr – Pointer to row headers
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to column indices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nindptr – Number of rows in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_col – Number of columns
parameters – Additional parameters
reference – Used to align bin mapper with other dataset, nullptr means isn’t used
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromCSRFunc(void *get_row_funptr, int num_rows, int64_t num_col, const char *parameters, const DatasetHandle reference, DatasetHandle *out)
Create a dataset from CSR format through callbacks.
- Parameters:
get_row_funptr – Pointer to std::function<void(int idx, std::vector<std::pair<int, double>>& ret)> (called for every row and expected to clear and fill ret)
num_rows – Number of rows
num_col – Number of columns
parameters – Additional parameters
reference – Used to align bin mapper with other dataset, nullptr means isn’t used
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromFile(const char *filename, const char *parameters, const DatasetHandle reference, DatasetHandle *out)
Load dataset from file (like LightGBM CLI version does).
- Parameters:
filename – The name of the file
parameters – Additional parameters
reference – Used to align bin mapper with other dataset, nullptr means isn’t used
out – [out] A loaded dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromMat(const void *data, int data_type, int32_t nrow, int32_t ncol, int is_row_major, const char *parameters, const DatasetHandle reference, DatasetHandle *out)
Create dataset from dense matrix.
- Parameters:
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nrow – Number of rows
ncol – Number of columns
is_row_major – 1 for row-major, 0 for column-major
parameters – Additional parameters
reference – Used to align bin mapper with other dataset, nullptr means isn’t used
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromMats(int32_t nmat, const void **data, int data_type, int32_t *nrow, int32_t ncol, int is_row_major, const char *parameters, const DatasetHandle reference, DatasetHandle *out)
Create dataset from array of dense matrices.
- Parameters:
nmat – Number of dense matrices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nrow – Number of rows
ncol – Number of columns
is_row_major – 1 for row-major, 0 for column-major
parameters – Additional parameters
reference – Used to align bin mapper with other dataset, nullptr means isn’t used
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetCreateFromSampledColumn(double **sample_data, int **sample_indices, int32_t ncol, const int *num_per_col, int32_t num_sample_row, int32_t num_total_row, const char *parameters, DatasetHandle *out)
Allocate the space for dataset and bucket feature bins according to sampled data.
- Parameters:
sample_data – Sampled data, grouped by the column
sample_indices – Indices of sampled data
ncol – Number of columns
num_per_col – Size of each sampling column
num_sample_row – Number of sampled rows
num_total_row – Number of total rows
parameters – Additional parameters
out – [out] Created dataset
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetDumpText(DatasetHandle handle, const char *filename)
Save dataset to text file, intended for debugging use only.
- Parameters:
handle – Handle of dataset
filename – The name of the file
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetFree(DatasetHandle handle)
Free space for dataset.
- Parameters:
handle – Handle of dataset to be freed
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetGetFeatureNames(DatasetHandle handle, const int len, int *num_feature_names, const size_t buffer_len, size_t *out_buffer_len, char **feature_names)
Get feature names of dataset.
- Parameters:
handle – Handle of dataset
len – Number of char* pointers stored at feature_names. If smaller than the max size, only this many strings are copied
num_feature_names – [out] Number of feature names
buffer_len – Size of pre-allocated strings. Content is copied up to buffer_len - 1 and null-terminated
out_buffer_len – [out] String sizes required to do the full string copies
feature_names – [out] Feature names, should pre-allocate memory
- Returns:
0 when succeed, -1 when failure happens
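The buffer contract described for buffer_len and out_buffer_len can be illustrated without the library. The sketch below uses a hypothetical stand-in, copy_name_truncated, which is not a GPBoost function; it only mimics the documented behaviour (copy at most buffer_len - 1 characters, null-terminate, report the size a full copy needs). Whether the reported size counts the terminating null is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in, NOT part of the GPBoost API. Copies src into dst
 * as the parameter notes describe: at most buffer_len - 1 characters plus
 * a terminating '\0'. Returns the size a full copy would need, assumed
 * here to include the terminator. */
size_t copy_name_truncated(char *dst, size_t buffer_len, const char *src) {
    assert(dst != NULL && src != NULL);
    size_t needed = strlen(src) + 1;
    if (buffer_len > 0) {
        size_t n = needed - 1 < buffer_len - 1 ? needed - 1 : buffer_len - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return needed;
}

/* A caller can detect truncation by comparing the reported size with the
 * size it allocated, then retry with larger buffers. */
int was_truncated(size_t buffer_len, size_t needed) {
    return needed > buffer_len;
}
```

With the real function, the same check would compare the value written to out_buffer_len against the buffer_len that was passed in.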
-
GPBOOST_C_EXPORT int LGBM_DatasetGetField(DatasetHandle handle, const char *field_name, int *out_len, const void **out_ptr, int *out_type)
Get info vector from dataset.
- Parameters:
handle – Handle of dataset
field_name – Field name
out_len – [out] Used to set result length
out_ptr – [out] Pointer to the result
out_type – [out] Type of result pointer, can be C_API_DTYPE_INT32, C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
- Returns:
0 when succeed, -1 when failure happens
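Because out_ptr is an untyped pointer, the caller has to cast it according to out_type. A minimal sketch of that dispatch follows; field_value_as_double is a hypothetical helper, and the constants repeat the values listed under "Defines" so the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Values as listed under "Defines" in this reference. */
#ifndef C_API_DTYPE_FLOAT32
#define C_API_DTYPE_FLOAT32 (0)
#define C_API_DTYPE_FLOAT64 (1)
#define C_API_DTYPE_INT32 (2)
#endif

/* Hypothetical helper: reads element i of a field buffer as a double.
 * Returns 0 on success, -1 if the type code is not one of the three
 * documented for out_type. */
int field_value_as_double(const void *ptr, int type, int i, double *out) {
    assert(ptr && out && i >= 0);
    switch (type) {
    case C_API_DTYPE_FLOAT32:
        *out = ((const float *)ptr)[i];
        return 0;
    case C_API_DTYPE_FLOAT64:
        *out = ((const double *)ptr)[i];
        return 0;
    case C_API_DTYPE_INT32:
        *out = ((const int32_t *)ptr)[i];
        return 0;
    default:
        return -1;
    }
}
```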
-
GPBOOST_C_EXPORT int LGBM_DatasetGetNumData(DatasetHandle handle, int *out)
Get number of data points.
- Parameters:
handle – Handle of dataset
out – [out] The address to hold number of data points
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetGetNumFeature(DatasetHandle handle, int *out)
Get number of features.
- Parameters:
handle – Handle of dataset
out – [out] The address to hold number of features
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetGetSubset(const DatasetHandle handle, const int32_t *used_row_indices, int32_t num_used_row_indices, const char *parameters, DatasetHandle *out)
Create a subset of a dataset.
- Parameters:
handle – Handle of full dataset
used_row_indices – Indices used in subset
num_used_row_indices – Length of used_row_indices
parameters – Additional parameters
out – [out] Subset of data
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetPushRows(DatasetHandle dataset, const void *data, int data_type, int32_t nrow, int32_t ncol, int32_t start_row)
Push data to an existing dataset. If nrow + start_row == num_total_row, dataset->FinishLoad is called.
- Parameters:
dataset – Handle of dataset
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nrow – Number of rows
ncol – Number of columns
start_row – Row start index
- Returns:
0 when succeed, -1 when failure happens
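When rows are pushed in several chunks, each call needs its own start_row and nrow, and only the final chunk satisfies nrow + start_row == num_total_row. The sketch below computes those values for a fixed chunk size; plan_chunk is a hypothetical helper and does not call the library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: for the 0-based chunk number `chunk` and a fixed
 * chunk size `chunk_rows`, computes the start_row and nrow arguments for
 * one push call.
 * Returns 1 if this is the final chunk (nrow + start_row == num_total_row),
 * 0 if more chunks follow, -1 if the arguments are out of range. */
int plan_chunk(int32_t num_total_row, int32_t chunk_rows, int32_t chunk,
               int32_t *start_row, int32_t *nrow) {
    assert(start_row && nrow);
    if (num_total_row <= 0 || chunk_rows <= 0 || chunk < 0) return -1;
    int64_t start = (int64_t)chunk * chunk_rows;  /* avoid int32 overflow */
    if (start >= num_total_row) return -1;
    int64_t remaining = num_total_row - start;
    *start_row = (int32_t)start;
    *nrow = (int32_t)(remaining < chunk_rows ? remaining : chunk_rows);
    return *start_row + *nrow == num_total_row;
}
```

For 10 total rows and chunks of 4, this yields (start_row, nrow) pairs (0, 4), (4, 4) and (8, 2), with only the last one flagged as final.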
-
GPBOOST_C_EXPORT int LGBM_DatasetPushRowsByCSR(DatasetHandle dataset, const void *indptr, int indptr_type, const int32_t *indices, const void *data, int data_type, int64_t nindptr, int64_t nelem, int64_t num_col, int64_t start_row)
Push data to an existing dataset. If nrow + start_row == num_total_row, dataset->FinishLoad is called.
- Parameters:
dataset – Handle of dataset
indptr – Pointer to row headers
indptr_type – Type of indptr, can be C_API_DTYPE_INT32 or C_API_DTYPE_INT64
indices – Pointer to column indices
data – Pointer to the data space
data_type – Type of data pointer, can be C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
nindptr – Number of rows in the matrix + 1
nelem – Number of nonzero elements in the matrix
num_col – Number of columns
start_row – Row start index
- Returns:
0 when succeed, -1 when failure happens
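The CSR arguments (indptr, indices, data, nindptr, nelem) follow the usual compressed sparse row layout. As an illustration, the sketch below converts a small row-major dense matrix into those three arrays; dense_to_csr is a hypothetical helper, and the choice of int32_t row headers and double values (which would correspond to C_API_DTYPE_INT32 and C_API_DTYPE_FLOAT64) is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: converts a row-major dense nrow x ncol matrix to
 * CSR. The caller provides indptr with nrow + 1 entries (this count is
 * nindptr) and indices/data with room for every nonzero; nrow * ncol
 * entries is always enough. Returns the number of nonzeros (nelem). */
int64_t dense_to_csr(const double *dense, int32_t nrow, int32_t ncol,
                     int32_t *indptr, int32_t *indices, double *data) {
    assert(dense && indptr && indices && data);
    int64_t nelem = 0;
    indptr[0] = 0;
    for (int32_t r = 0; r < nrow; ++r) {
        for (int32_t c = 0; c < ncol; ++c) {
            double v = dense[(int64_t)r * ncol + c];
            if (v != 0.0) {
                indices[nelem] = c;  /* column index of this nonzero */
                data[nelem] = v;     /* its value */
                ++nelem;
            }
        }
        indptr[r + 1] = (int32_t)nelem;  /* row r ends here */
    }
    return nelem;
}
```

For the 2 x 3 matrix {1, 0, 2; 0, 0, 3} this produces indptr = {0, 2, 3}, indices = {0, 2, 2}, data = {1, 2, 3}, so nindptr is 3 and nelem is 3.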
-
GPBOOST_C_EXPORT int LGBM_DatasetSaveBinary(DatasetHandle handle, const char *filename)
Save dataset to binary file.
- Parameters:
handle – Handle of dataset
filename – The name of the file
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetSetFeatureNames(DatasetHandle handle, const char **feature_names, int num_feature_names)
Save feature names to dataset.
- Parameters:
handle – Handle of dataset
feature_names – Feature names
num_feature_names – Number of feature names
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_DatasetSetField(DatasetHandle handle, const char *field_name, const void *field_data, int num_element, int type)
Set a vector as the content of an info field.
Note
group only works for C_API_DTYPE_INT32;
label and weight only work for C_API_DTYPE_FLOAT32;
init_score only works for C_API_DTYPE_FLOAT64.
- Parameters:
handle – Handle of dataset
field_name – Field name, can be label, weight, init_score, group
field_data – Pointer to data vector
num_element – Number of elements in field_data
type – Type of field_data pointer, can be C_API_DTYPE_INT32, C_API_DTYPE_FLOAT32 or C_API_DTYPE_FLOAT64
- Returns:
0 when succeed, -1 when failure happens
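The note on field types can be encoded as a small lookup so that a mismatch is caught before the call. required_field_dtype below is a hypothetical helper, not part of the API; the constants repeat the values listed under "Defines".

```c
#include <assert.h>
#include <string.h>

/* Values as listed under "Defines" in this reference. */
#ifndef C_API_DTYPE_FLOAT32
#define C_API_DTYPE_FLOAT32 (0)
#define C_API_DTYPE_FLOAT64 (1)
#define C_API_DTYPE_INT32 (2)
#endif

/* Hypothetical helper: returns the type code that the note above requires
 * for a field name, or -1 if the name is not one of the four listed. */
int required_field_dtype(const char *field_name) {
    assert(field_name != NULL);
    if (strcmp(field_name, "group") == 0) return C_API_DTYPE_INT32;
    if (strcmp(field_name, "label") == 0 ||
        strcmp(field_name, "weight") == 0) return C_API_DTYPE_FLOAT32;
    if (strcmp(field_name, "init_score") == 0) return C_API_DTYPE_FLOAT64;
    return -1;
}
```

A caller could compare the result with the type argument it is about to pass and convert the buffer first if they differ.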
-
GPBOOST_C_EXPORT int LGBM_DatasetUpdateParamChecking(const char *old_parameters, const char *new_parameters)
Raise errors for attempts to update dataset parameters.
- Parameters:
old_parameters – Current dataset parameters
new_parameters – New dataset parameters
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_FastConfigFree(FastConfigHandle fastConfig)
Release FastConfig object.
- Parameters:
fastConfig – Handle to the FastConfig object acquired with a *FastInit() method.
- Returns:
0 when it succeeds, -1 when failure happens
-
GPBOOST_C_EXPORT const char *LGBM_GetLastError()
Get string message of the last error.
- Returns:
Error information
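Most functions in this reference return 0 on success and -1 on failure, with the message retrievable afterwards. The sketch below imitates that convention with stand-ins only: demo_set_last_error, demo_get_last_error and demo_set_num_rows are hypothetical and are not the GPBoost implementation. It uses one static buffer for simplicity; a real library may keep the message per thread.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for illustration only, NOT the GPBoost implementation. */
static char demo_last_error[256] = "";

void demo_set_last_error(const char *msg) {
    assert(msg != NULL);
    strncpy(demo_last_error, msg, sizeof demo_last_error - 1);
    demo_last_error[sizeof demo_last_error - 1] = '\0';
}

const char *demo_get_last_error(void) {
    return demo_last_error;
}

/* Example operation following the convention: returns 0 on success,
 * or stores a message and returns -1 on failure. */
int demo_set_num_rows(int nrow) {
    if (nrow <= 0) {
        demo_set_last_error("nrow must be positive");
        return -1;
    }
    return 0;
}
```

Caller code then checks the return value of every call and reads the message only when the result is -1.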
-
GPBOOST_C_EXPORT int LGBM_GPBoosterCreate(const DatasetHandle train_data, const char *parameters, const REModelHandle re_model, BoosterHandle *out)
Create a new boosting learner.
- Parameters:
train_data – Training dataset
parameters – Parameters in format ‘key1=value1 key2=value2’
re_model – Gaussian process model
out – [out] Handle of created booster
- Returns:
0 when succeed, -1 when failure happens
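The parameters argument is a single string in the 'key1=value1 key2=value2' format. A minimal sketch of assembling such a string from key/value pairs is shown below; build_params is a hypothetical helper, and the keys used in the usage note are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: joins n key/value pairs into out as
 * "key1=value1 key2=value2". Returns 0 on success, -1 if out is too
 * small to hold the result including the terminating '\0'. */
int build_params(const char *const *keys, const char *const *values,
                 size_t n, char *out, size_t out_len) {
    assert(keys && values && out);
    size_t used = 0;
    if (out_len == 0) return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; ++i) {
        size_t klen = strlen(keys[i]);
        size_t vlen = strlen(values[i]);
        /* optional separator + key + '=' + value */
        size_t need = (i ? 1 : 0) + klen + 1 + vlen;
        if (need >= out_len - used) return -1;  /* keep room for '\0' */
        if (i) out[used++] = ' ';
        memcpy(out + used, keys[i], klen);
        used += klen;
        out[used++] = '=';
        memcpy(out + used, values[i], vlen);
        used += vlen;
        out[used] = '\0';
    }
    return 0;
}
```

For example, the pairs ("objective", "regression") and ("num_leaves", "31") give "objective=regression num_leaves=31", which could then be passed as parameters.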
-
GPBOOST_C_EXPORT int LGBM_NetworkFree()
Finalize the network.
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_NetworkInit(const char *machines, int local_listen_port, int listen_time_out, int num_machines)
Initialize the network.
- Parameters:
machines – List of machines in format ‘ip1:port1,ip2:port2’
local_listen_port – TCP listen port for local machines
listen_time_out – Socket time-out in minutes
num_machines – Total number of machines
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_NetworkInitWithFunctions(int num_machines, int rank, void *reduce_scatter_ext_fun, void *allgather_ext_fun)
Initialize the network with external collective functions.
- Parameters:
num_machines – Total number of machines
rank – Rank of local machine
reduce_scatter_ext_fun – The external reduce-scatter function
allgather_ext_fun – The external allgather function
- Returns:
0 when succeed, -1 when failure happens
-
GPBOOST_C_EXPORT int LGBM_RegisterLogCallback(void (*callback)(const char*))
Register a callback function for log redirecting.
- Parameters:
callback – The callback function to register
- Returns:
0 when succeed, -1 when failure happens
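A callback passed here must match the type void (*)(const char*). The sketch below shows one such function that keeps the most recent message and counts how many it has seen; capture_log and its accessors are hypothetical names, and the registration call itself is left out so that the snippet compiles without the library.

```c
#include <assert.h>
#include <string.h>

static char captured_log[256] = "";
static int captured_count = 0;

/* Matches the callback type void (*)(const char*). Stores the latest
 * message (truncated to fit) and counts the messages received. */
void capture_log(const char *msg) {
    assert(msg != NULL);
    strncpy(captured_log, msg, sizeof captured_log - 1);
    captured_log[sizeof captured_log - 1] = '\0';
    ++captured_count;
}

const char *last_captured_log(void) {
    return captured_log;
}

int captured_log_count(void) {
    return captured_count;
}
```

With the library linked, capture_log would be passed as the callback argument, and the return value checked as for any other call.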
-
inline void LGBM_SetLastError(const char *msg)
Set string message of the last error.
- Parameters:
msg – Error message