EDAspy.optimization.custom.probabilistic_models package

Submodules

EDAspy.optimization.custom.probabilistic_models.adaptive_univariate_gaussian module

class EDAspy.optimization.custom.probabilistic_models.adaptive_univariate_gaussian.AdaptUniGauss(variables: list, lower_bound: float, alpha: float = 0.5)[source]

Bases: ProbabilisticModel

This class implements adaptive univariate Gaussians. With this implementation we update N univariate Gaussians in each iteration. When a dataset is given, each column is updated independently. The implementation involves a matrix with two rows: the first row holds the means and the second the standard deviations. Each Gaussian mean is updated as follows, where the two best individuals and the worst one are considered.

\[\mu_{l+1} = (1 - \alpha) \mu_l + \alpha (x^{best, 1}_l + x^{best, 2}_l - x^{worst}_l)\]
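The update above can be sketched in plain NumPy (an illustrative sketch of the formula, not the EDAspy implementation; all names are hypothetical):

```python
import numpy as np

def adaptive_mean_update(mu, best1, best2, worst, alpha=0.5):
    """Adaptive update of the per-variable Gaussian means.

    mu, best1, best2 and worst are arrays of length N (one entry per
    variable); alpha is the adaptation rate from the formula above.
    """
    mu = np.asarray(mu, dtype=float)
    return (1 - alpha) * mu + alpha * (
        np.asarray(best1) + np.asarray(best2) - np.asarray(worst)
    )

# Example with two variables and alpha = 0.5
mu_next = adaptive_mean_update([0.0, 0.0], [1.0, 2.0], [1.0, 2.0], [0.0, 0.0])
# → array([1., 2.])
```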
sample(size: int) array[source]

Samples new solutions from the probabilistic model. In each solution, each variable is sampled from its respective normal distribution.

Parameters:

size – number of samplings of the probabilistic model.

Returns:

array with the dataset sampled

Return type:

np.array

learn(dataset: array, *args, **kwargs)[source]

Estimates the independent Gaussian for each variable.

Parameters:
  • dataset – dataset from which to learn the probabilistic model.

  • alpha – adaptive parameter in the update formula.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process. Univariate approaches generate graphs with no edges.

Returns:

list of arcs between variables

Return type:

list

EDAspy.optimization.custom.probabilistic_models.discrete_bayesian_network module

class EDAspy.optimization.custom.probabilistic_models.discrete_bayesian_network.BN(variables: list)[source]

Bases: ProbabilisticModel

This probabilistic model is a discrete Bayesian network. This implementation uses the pgmpy library [1].

References

[1]: Ankan, A., & Panda, A. (2015). pgmpy: Probabilistic graphical models using python. In Proceedings of the 14th python in science conference (scipy 2015) (Vol. 10). Citeseer.

learn(dataset: array, score: str = 'bicscore', *args, **kwargs)[source]

Learn a discrete Bayesian network from the dataset passed as argument.

Parameters:
  • dataset – dataset from which to learn the BN.

  • score – score used for the score-based structure learning algorithm.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process.

Returns:

list of arcs between variables

Return type:

list

sample(size: int) array[source]

EDAspy.optimization.custom.probabilistic_models.gaussian_bayesian_network module

class EDAspy.optimization.custom.probabilistic_models.gaussian_bayesian_network.GBN(variables: list, white_list: list | None = None, black_list: list | None = None, evidences: dict | None = None)[source]

Bases: ProbabilisticModel

This probabilistic model is a Gaussian Bayesian network. All the relationships between the variables in the model are defined to be linear Gaussian, and the variable distributions are assumed to be Gaussian. This is a very common approach when facing continuous data, as it is relatively easy and fast to learn Gaussian distributions between variables. This implementation uses the PyBNesian library [1].

References

[1]: Atienza, D., Bielza, C., & Larrañaga, P. (2022). PyBNesian: an extensible Python package for Bayesian networks. Neurocomputing, 504, 204-209.

learn(dataset: array, *args, **kwargs)[source]

Learn a Gaussian Bayesian network from the dataset passed as argument.

Parameters:

dataset – dataset from which to learn the GBN.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process.

Returns:

list of arcs between variables

Return type:

list

sample(size: int) array[source]
logl(data: DataFrame)[source]

Returns the log-likelihood of some data under the model.

Parameters:

data – dataset to evaluate its likelihood in the model.

Returns:

log-likelihood of the instances in the model.

Return type:

np.array

get_mu(var_mus=None) array[source]

Computes the conditional mean of the Gaussians of each node in the GBN.

Parameters:

var_mus (list) – Variables for which to compute the Gaussian mean. If None, all the variables are computed.

Returns:

Array with the conditional Gaussian means.

Return type:

np.array

get_sigma(var_sigma=None) array[source]

Computes the conditional covariance matrix of the model for the variables in the GBN.

Parameters:

var_sigma (list) – Variables for which to compute the covariance. If None, all the variables are computed.

Returns:

Matrix with the conditional covariance matrix.

Return type:

np.array

inference(evidence, var_names) -> (array, array)[source]

Compute the posterior conditional probability distribution conditioned on the given evidence.

Parameters:
  • evidence (list) – list of values fixed as evidence in the model.

  • var_names (list) – list of variables measured in the model.

Returns:

(posterior mean, posterior covariance matrix)

Return type:

(np.array, np.array)
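The posterior of a multivariate Gaussian given evidence has a closed form; the following NumPy sketch illustrates the computation (an illustration only, not the PyBNesian-backed implementation; `condition_gaussian` and its index-based evidence encoding are hypothetical):

```python
import numpy as np

def condition_gaussian(mu, sigma, ev_idx, ev_values):
    """Posterior mean and covariance of a joint Gaussian N(mu, sigma)
    after fixing the variables at positions ev_idx to ev_values."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    ev_idx = list(ev_idx)
    free = [i for i in range(len(mu)) if i not in ev_idx]
    s_ff = sigma[np.ix_(free, free)]      # free-free block
    s_fe = sigma[np.ix_(free, ev_idx)]    # free-evidence block
    s_ee = sigma[np.ix_(ev_idx, ev_idx)]  # evidence-evidence block
    k = s_fe @ np.linalg.inv(s_ee)
    post_mu = mu[free] + k @ (np.asarray(ev_values) - mu[ev_idx])
    post_sigma = s_ff - k @ s_fe.T
    return post_mu, post_sigma

# Two correlated variables; observe the second one at 1.0
m, s = condition_gaussian([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [1], [1.0])
# → posterior mean [0.5], posterior variance [[0.75]]
```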

maximum_a_posteriori(evidence, var_names)[source]

EDAspy.optimization.custom.probabilistic_models.kde_bayesian_network module

class EDAspy.optimization.custom.probabilistic_models.kde_bayesian_network.KDEBN(variables: list, white_list: list | None = None, black_list: list | None = None)[source]

Bases: ProbabilisticModel

This probabilistic model is a Kernel Density Estimation Bayesian network [1]. It allows dependencies between variables whose distributions have been estimated using KDE.

References

[1]: Atienza, D., Bielza, C., & Larrañaga, P. (2022). PyBNesian: an extensible Python package for Bayesian networks. Neurocomputing, 504, 204-209.

learn(dataset: array, num_folds: int = 10, *args, **kwargs)[source]

Learn a KDE Bayesian network from the dataset passed as argument.

Parameters:
  • dataset – dataset from which to learn the KDEBN.

  • num_folds – number of folds used for the KDEBN learning. The higher the value, the more accurate the result, but the higher the CPU demand. By default, it is set to 10.

sample(size: int) array[source]

Samples the KDE Bayesian network a number of times defined by the user. The dataset is returned as a NumPy matrix. The sampling process is implemented using probabilistic logic sampling.

Parameters:

size – number of samplings of the KDE Bayesian network.

Returns:

array with the dataset sampled.

Return type:

np.array
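Probabilistic logic sampling draws each node in topological order, conditioning on the values already drawn for its parents. A toy NumPy sketch for a two-node linear Gaussian network X → Y (illustrative only; the real implementation handles arbitrary structures and KDE conditionals):

```python
import numpy as np

def ancestral_sample_xy(size, rng=None):
    """Forward-sample a toy network X -> Y where X ~ N(0, 1) and
    Y | X ~ N(2 + 0.5 * X, 0.1) (0.1 is the standard deviation).
    Nodes are visited in topological order, so every parent is
    sampled before its children."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, 1.0, size=size)   # root node, no parents
    y = rng.normal(2.0 + 0.5 * x, 0.1)    # child, conditioned on x
    return np.column_stack([x, y])

data = ancestral_sample_xy(1000, rng=0)
```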

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process.

Returns:

list of arcs between variables

Return type:

list

logl(data: DataFrame)[source]

Returns the log-likelihood of some data under the model.

Parameters:

data – dataset to evaluate its likelihood in the model.

Returns:

log-likelihood of the instances in the model.

Return type:

np.array

EDAspy.optimization.custom.probabilistic_models.multivariate_gaussian module

class EDAspy.optimization.custom.probabilistic_models.multivariate_gaussian.MultiGauss(variables: list, lower_bound: float, upper_bound: float)[source]

Bases: ProbabilisticModel

This class implements all the code needed to learn and sample multivariate Gaussian distributions defined by a vector of means and a covariance matrix among the variables. This is a simpler approach compared to Gaussian Bayesian networks, as multivariate Gaussian distributions do not identify conditional dependences between the variables.
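As a rough NumPy sketch of what learning and sampling a multivariate Gaussian involves (illustrative only; the function names are hypothetical, not the EDAspy API):

```python
import numpy as np

def learn_multigauss(dataset):
    """Estimate the mean vector and covariance matrix from a dataset
    whose rows are individuals and whose columns are variables."""
    dataset = np.asarray(dataset, dtype=float)
    return dataset.mean(axis=0), np.cov(dataset, rowvar=False)

def sample_multigauss(mu, sigma, size, rng=None):
    """Draw `size` new solutions from N(mu, sigma)."""
    rng = np.random.default_rng(rng)
    return rng.multivariate_normal(mu, sigma, size=size)

data = np.random.default_rng(0).normal(size=(200, 3))
mu, sigma = learn_multigauss(data)
new = sample_multigauss(mu, sigma, size=50, rng=0)
```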

sample(size: int) array[source]

Samples the multivariate Gaussian distribution a number of times defined by the user. The dataset is returned as a NumPy matrix.

Parameters:

size – number of samplings of the multivariate Gaussian distribution.

Returns:

array with the dataset sampled.

Return type:

np.array

learn(dataset: array, *args, **kwargs)[source]

Estimates a multivariate Gaussian distribution from the dataset.

Parameters:

dataset – dataset from which to learn the multivariate Gaussian distribution.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process. Univariate approaches generate graphs with no edges.

Returns:

list of arcs between variables

Return type:

list

EDAspy.optimization.custom.probabilistic_models.semiparametric_bayesian_network module

class EDAspy.optimization.custom.probabilistic_models.semiparametric_bayesian_network.SPBN(variables: list, white_list: list | None = None, black_list: list | None = None)[source]

Bases: ProbabilisticModel

This probabilistic model is a semiparametric Bayesian network [1]. It allows dependencies between variables whose distributions have been estimated using KDE and variables which fit a Gaussian distribution.

References

[1]: Atienza, D., Bielza, C., & Larrañaga, P. (2022). PyBNesian: an extensible Python package for Bayesian networks. Neurocomputing, 504, 204-209.

learn(dataset: array, num_folds: int = 10, *args, **kwargs)[source]

Learn a semiparametric Bayesian network from the dataset passed as argument.

Parameters:
  • dataset – dataset from which to learn the SPBN.

  • num_folds – number of folds used for the SPBN learning. The higher the value, the more accurate the result, but the higher the CPU demand. By default, it is set to 10.

  • max_iters – maximum number of iterations for the learning process.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process.

Returns:

list of arcs between variables

Return type:

list

sample(size: int) array[source]

Samples the semiparametric Bayesian network a number of times defined by the user. The dataset is returned as a NumPy matrix. The sampling process is implemented using probabilistic logic sampling.

Parameters:

size – number of samplings of the Semiparametric Bayesian network.

Returns:

array with the dataset sampled.

Return type:

np.array

logl(data: DataFrame)[source]

Returns the log-likelihood of some data under the model.

Parameters:

data – dataset to evaluate its likelihood in the model.

Returns:

log-likelihood of the instances in the model.

Return type:

np.array

EDAspy.optimization.custom.probabilistic_models.univariate_binary module

class EDAspy.optimization.custom.probabilistic_models.univariate_binary.UniBin(variables: list, upper_bound: float, lower_bound: float)[source]

Bases: ProbabilisticModel

This is the simplest probabilistic model implemented in this package. It is used for binary EDAs where all the solutions are binary. The implementation involves a vector of independent probabilities in [0, 1]. When sampling, a random float in [0, 1] is drawn for each variable; if the float is below the variable's probability, the sample is a 1. Thus, each entry of the vector is the probability of a sample being 1.
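The probability-vector mechanics can be sketched in NumPy (an illustration, not the EDAspy code; function names are hypothetical):

```python
import numpy as np

def learn_unibin(dataset):
    """Probability of each variable being 1: the column-wise mean of a
    binary dataset (rows are solutions, columns are variables)."""
    return np.asarray(dataset, dtype=float).mean(axis=0)

def sample_unibin(probs, size, rng=None):
    """Sample `size` binary solutions: a 1 wherever a uniform draw in
    [0, 1) falls below the variable's probability."""
    rng = np.random.default_rng(rng)
    u = rng.random((size, len(probs)))
    return (u < np.asarray(probs)).astype(int)

probs = learn_unibin([[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1]])
# → probabilities [1.0, 0.5, 0.5]
solutions = sample_unibin(probs, size=10, rng=0)
```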

sample(size: int) array[source]

Samples new solutions from the probabilistic model. In each solution, each variable is sampled from its respective binary probability.

Parameters:

size – number of samplings of the probabilistic model.

Returns:

array with the dataset sampled.

Return type:

np.array

learn(dataset: array, *args, **kwargs)[source]

Estimates the independent probability of each variable of being 1.

Parameters:

dataset – dataset from which to learn the probabilistic model.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process. Univariate approaches generate graphs with no edges.

Returns:

list of arcs between variables

Return type:

list

EDAspy.optimization.custom.probabilistic_models.univariate_categorical module

EDAspy.optimization.custom.probabilistic_models.univariate_categorical.obtain_probabilities(array) dict[source]
class EDAspy.optimization.custom.probabilistic_models.univariate_categorical.UniCategorical(variables: list)[source]

Bases: ProbabilisticModel

This probabilistic model is discrete and univariate.
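A minimal NumPy sketch of per-variable categorical learning and sampling (illustrative only; function names are hypothetical):

```python
import numpy as np

def learn_unicategorical(dataset):
    """Per-column empirical category frequencies of a discrete dataset
    (rows are solutions, columns are variables)."""
    dataset = np.asarray(dataset)
    models = []
    for col in dataset.T:
        values, counts = np.unique(col, return_counts=True)
        models.append((values, counts / counts.sum()))
    return models

def sample_unicategorical(models, size, rng=None):
    """Sample each variable independently from its categorical
    distribution, then stack the columns into a dataset."""
    rng = np.random.default_rng(rng)
    cols = [rng.choice(values, size=size, p=probs) for values, probs in models]
    return np.stack(cols, axis=1)

models = learn_unicategorical([["a", "x"], ["a", "y"], ["b", "y"]])
new = sample_unicategorical(models, size=5, rng=0)
```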

learn(dataset: array, *args, **kwargs)[source]

Estimates the independent categorical probability distribution for each variable.

Parameters:

dataset – dataset from which to learn the probabilistic model.

sample(size: int) array[source]

Samples new solutions from the probabilistic model. In each solution, each variable is sampled from its respective categorical distribution.

Parameters:

size – number of samplings of the probabilistic model.

Returns:

array with the dataset sampled

Return type:

np.array

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process. Univariate approaches generate graphs with no edges.

Returns:

list of arcs between variables

Return type:

list

EDAspy.optimization.custom.probabilistic_models.univariate_gaussian module

class EDAspy.optimization.custom.probabilistic_models.univariate_gaussian.UniGauss(variables: list, lower_bound: float)[source]

Bases: ProbabilisticModel

This class implements univariate Gaussians. With this implementation we update N univariate Gaussians in each iteration. When a dataset is given, each column is updated independently. The implementation involves a matrix with two rows: the first row holds the means and the second the standard deviations.
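The two-row matrix representation can be sketched as follows (illustrative NumPy, not the EDAspy implementation; clipping the standard deviations at `lower_bound` is an assumption mirroring the constructor argument):

```python
import numpy as np

def learn_unigauss(dataset, lower_bound=0.3):
    """Two-row model matrix: row 0 holds the per-column means, row 1
    the per-column standard deviations, clipped from below so the
    variance never collapses to zero (the clipping is an assumption)."""
    dataset = np.asarray(dataset, dtype=float)
    stds = np.maximum(dataset.std(axis=0), lower_bound)
    return np.vstack([dataset.mean(axis=0), stds])

def sample_unigauss(model, size, rng=None):
    """Each variable is drawn from its own N(mean, std)."""
    rng = np.random.default_rng(rng)
    return rng.normal(model[0], model[1], size=(size, model.shape[1]))

model = learn_unigauss([[0.0, 10.0], [2.0, 12.0], [1.0, 11.0]])
samples = sample_unigauss(model, size=100, rng=0)
```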

sample(size: int) array[source]

Samples new solutions from the probabilistic model. In each solution, each variable is sampled from its respective normal distribution.

Parameters:

size – number of samplings of the probabilistic model.

Returns:

array with the dataset sampled

Return type:

np.array

learn(dataset: array, *args, **kwargs)[source]

Estimates the independent Gaussian for each variable.

Parameters:

dataset – dataset from which to learn the probabilistic model.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process. Univariate approaches generate graphs with no edges.

Returns:

list of arcs between variables

Return type:

list

EDAspy.optimization.custom.probabilistic_models.univariate_kde module

class EDAspy.optimization.custom.probabilistic_models.univariate_kde.UniKDE(variables: list)[source]

Bases: ProbabilisticModel

This class implements univariate Kernel Density Estimation. With this implementation we update N univariate KDEs in each iteration. When a dataset is given, each column is updated independently.
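A hand-rolled sketch of per-column KDE sampling (illustrative; EDAspy's internals may differ, and the Silverman-style bandwidth rule used here is an assumption):

```python
import numpy as np

def kde_sample_column(column, size, rng=None):
    """Sample from a Gaussian KDE fitted to one column: pick a data
    point at random, then add Gaussian noise scaled by a
    Silverman-style bandwidth (a common default, not necessarily
    EDAspy's choice)."""
    column = np.asarray(column, dtype=float)
    rng = np.random.default_rng(rng)
    n = len(column)
    h = 1.06 * column.std() * n ** (-1 / 5)   # bandwidth
    centers = rng.choice(column, size=size)
    return centers + rng.normal(0.0, h, size=size)

def sample_unikde(dataset, size, rng=None):
    """Sample each column of the dataset independently from its KDE."""
    dataset = np.asarray(dataset, dtype=float)
    rng = np.random.default_rng(rng)
    return np.stack([kde_sample_column(c, size, rng) for c in dataset.T], axis=1)

data = np.random.default_rng(0).normal(size=(100, 2))
new = sample_unikde(data, size=30, rng=1)
```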

sample(size: int) array[source]

Samples new solutions from the probabilistic model. In each solution, each variable is sampled from its respective estimated density.

Parameters:

size – number of samplings of the probabilistic model.

Returns:

array with the dataset sampled

Return type:

np.array

learn(dataset: array, *args, **kwargs)[source]

Estimates the independent KDE for each variable.

Parameters:

dataset – dataset from which to learn the probabilistic model.

print_structure() list[source]

Prints the arcs between the nodes that represent the variables in the dataset. This function must be used after the learning process. Univariate approaches generate graphs with no edges.

Returns:

list of arcs between variables

Return type:

list

Module contents