autopandas.generators package

Submodules

autopandas.generators.anm module

class autopandas.generators.anm.ANM(model=None)

Bases: object

__init__(model=None)

Data generator using multiple imputations with random forest (or another model).

Parameters: model – Model used for imputations.

fit(data, noise=False)

Fit one random forest (or another model) for each column, given the others.

Parameters:
  • data – Training data.
  • noise – If True, add noise during sampling, relative to the residual matrix.

partial_fit_generate(n=1, p=0.8, replace=True, noise=False)

Fit and generate in the high-dimensional case. To avoid memory errors, features are fitted and generated one at a time.

Parameters:
  • n – Number of examples to sample
  • p – Probability of changing each value. If p=0, the generated dataset will be equal to the original; if p=1, it will contain only new values.
  • replace – If True, sample the original data with replacement before the imputations
  • noise – If True, add noise relative to the residual matrix. Not implemented for this method (it may not be feasible here).
Returns:

Generated data

Return type:

pd.DataFrame

sample(n=1, p=0.8, replace=True, noise=False)

Generate n rows by copying the data and then imputing values.

Parameters:
  • n – Number of examples to sample
  • p – Probability of changing each value. If p=0, the generated dataset will be equal to the original; if p=1, it will contain only new values.
  • replace – If True, sample the original data with replacement before the imputations
  • noise – If True, add noise relative to the residual matrix
Returns:

Generated data

Return type:

pd.DataFrame
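To make the copy-then-impute idea behind `sample` concrete, here is a minimal NumPy sketch. It is not the autopandas implementation: `anm_sample` and its `seed` argument are hypothetical, and ordinary least squares stands in for the random forest (the docs allow another model). Each copied cell is regenerated with probability `p`, and `noise=True` adds a residual drawn from the training fit.

```python
import numpy as np


def anm_sample(data, n=1, p=0.8, replace=True, noise=False, seed=None):
    """Copy rows, then re-impute cells from the other columns (sketch).

    Hypothetical helper: least squares replaces the random forest.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape
    # Step 1: copy the data by sampling rows (with or without replacement).
    idx = rng.choice(n_rows, size=n, replace=replace)
    out = data[idx].copy()
    # Step 2: choose which cells to regenerate, each with probability p.
    mask = rng.random(out.shape) < p
    for j in range(n_cols):
        others = [k for k in range(n_cols) if k != j]
        # Fit column j given the other columns, plus an intercept.
        A = np.column_stack([data[:, others], np.ones(n_rows)])
        coef, *_ = np.linalg.lstsq(A, data[:, j], rcond=None)
        residuals = data[:, j] - A @ coef
        # Predict column j for the copied rows.
        B = np.column_stack([data[idx][:, others], np.ones(n)])
        pred = B @ coef
        if noise:
            # Add noise drawn from the training residuals.
            pred = pred + rng.choice(residuals, size=n)
        # Overwrite only the masked cells of column j.
        out[mask[:, j], j] = pred[mask[:, j]]
    return out
```

Predictions are made from the copied values of the other columns, so with p=0 the output is just a resample of the original rows.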

autopandas.generators.artificial module

class autopandas.generators.artificial.Artificial(method='moons')

Bases: object

__init__(method='moons')

Artificial data generator. Generates 2D classification datasets.

Parameters: method – ‘moons’, ‘blobs’ or ‘circles’.

sample(n=1, noise=0.01)

Sample data from the artificial data generator.

Parameters: n – Number of artificial points to create.
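The ‘moons’ option presumably produces two interleaving half circles, as scikit-learn's `make_moons` does; that is an assumption, as is reading `noise` as the standard deviation of added Gaussian noise. A dependency-light NumPy sketch of such a dataset (`make_two_moons` is a hypothetical name):

```python
import numpy as np


def make_two_moons(n=100, noise=0.01, seed=None):
    """Two interleaving half circles with Gaussian noise (sketch)."""
    rng = np.random.default_rng(seed)
    n_outer = n // 2
    n_inner = n - n_outer
    t_outer = np.linspace(0, np.pi, n_outer)
    t_inner = np.linspace(0, np.pi, n_inner)
    # Upper half circle centred at (0, 0): class 0.
    outer = np.column_stack([np.cos(t_outer), np.sin(t_outer)])
    # Lower half circle shifted to (1, 0.5): class 1.
    inner = np.column_stack([1 - np.cos(t_inner), 1 - np.sin(t_inner) - 0.5])
    X = np.vstack([outer, inner])
    y = np.concatenate([np.zeros(n_outer, dtype=int), np.ones(n_inner, dtype=int)])
    # Assumed meaning of `noise`: standard deviation of Gaussian jitter.
    X += rng.normal(scale=noise, size=X.shape)
    return X, y
```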

autopandas.generators.autoencoder module

class autopandas.generators.autoencoder.AE(input_dim, layers=[], latent_dim=2, architecture='fully', loss='nll', optimizer='rmsprop', decoder_layers=None)

Bases: object

__init__(input_dim, layers=[], latent_dim=2, architecture='fully', loss='nll', optimizer='rmsprop', decoder_layers=None)

Autoencoder with fully connected layers.

Default behaviour: symmetric layers, but no weight sharing. For the CNN architecture, if latent_dim is None, there are no dense layers and the latent space dimension depends on the convolutional layers.

Parameters:
  • input_dim – Input/output size.
  • layers – Dimension of intermediate layers (encoder and decoder). It can be either an integer (one intermediate layer) or a list of integers (several intermediate layers).
  • latent_dim – Dimension of latent space layer.
  • architecture – ‘fully’, ‘cnn’.
  • decoder_layers – Dimension of intermediate decoder layers for asymmetrical architectures.
distance(X, Y, **kwargs)

Project X and Y into the learned latent space, then compute the distance between the two projections (NNAA score by default).
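The sketch below illustrates the two steps with a common definition of nearest neighbour adversarial accuracy (NNAA): it averages how often a point's nearest neighbour in the other sample is farther away than its nearest neighbour in its own sample, so a score near 0.5 suggests the samples are hard to tell apart. Whether autopandas uses exactly this formula is an assumption; `nnaa` and `latent_distance` are hypothetical names, and `encode` stands for any callable mapping rows to latent vectors.

```python
import numpy as np


def nnaa(real, synth):
    """Nearest neighbour adversarial accuracy between two samples (sketch)."""
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)

    def pairwise(a, b):
        # Euclidean distance matrix of shape (len(a), len(b)).
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))

    d_rr = pairwise(real, real)
    d_ss = pairwise(synth, synth)
    # Leave-one-out: a point is not its own neighbour.
    np.fill_diagonal(d_rr, np.inf)
    np.fill_diagonal(d_ss, np.inf)
    d_rs = pairwise(real, synth)
    # Is the nearest point in the other sample farther than in its own?
    real_term = np.mean(d_rs.min(axis=1) > d_rr.min(axis=1))
    synth_term = np.mean(d_rs.min(axis=0) > d_ss.min(axis=1))
    return 0.5 * (real_term + synth_term)


def latent_distance(encode, X, Y):
    # Step 1: project both samples with a (hypothetical) encoder callable.
    # Step 2: compare the projections with NNAA.
    return nnaa(encode(X), encode(Y))
```

Identical samples score 0 and well-separated samples score 1.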

fit(X, X2=None, **kwargs)
get_autoencoder()
get_decoder()
get_encoder()
init_loss(loss='nll')
init_model(architecture='fully')

Parameters: architecture – ‘fully’ or ‘cnn’.

sample(n=100, loc=0, scale=1)

Parameters: scale – Standard deviation of the Gaussian distribution prior.
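Assuming `sample` draws latent codes from an isotropic Gaussian with mean `loc` and standard deviation `scale` and passes them through the decoder, the procedure reduces to the sketch below. `sample_from_prior` is hypothetical, and `decode` can be any callable from latent vectors to data space.

```python
import numpy as np


def sample_from_prior(decode, latent_dim, n=100, loc=0.0, scale=1.0, seed=None):
    """Draw latent codes from N(loc, scale**2) and decode them (sketch)."""
    rng = np.random.default_rng(seed)
    # One latent vector per requested example.
    z = rng.normal(loc=loc, scale=scale, size=(n, latent_dim))
    return decode(z)
```

For example, with a toy linear decoder `lambda z: z @ W`, each output row is a linear mix of Gaussian latent coordinates.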

siamese_distance(x, y, **kwargs)

autopandas.generators.copula module

class autopandas.generators.copula.Copula

Bases: object

__init__()

Copula generator.

fit(data)

Apply the copula trick and train the generator on data.

Parameters: data – DataFrame to use as the training set.

sample(n=1, replace=False)

Sample from the trained generator.

Parameters:
  • n – Number of examples to sample.
  • replace – If True, sample with replacement.

autopandas.generators.copula.copula_generate(X, generator=None, n=None)

Generate data using the copula trick.

Parameters:
  • X – Data to imitate.
  • generator – Model to fit and sample from. KDE by default.
  • n – Number of examples to generate. By default it is the number of observations in X.
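The copula trick separates the marginals from the dependence structure: each column is replaced by its ranks, a generator is fitted in that rank space, and generated values are mapped back through the empirical marginals of X. The sketch below is a Gaussian-copula variant under stated assumptions: ranks become normal scores via `statistics.NormalDist`, and a multivariate Gaussian replaces the default KDE generator. `copula_generate_sketch` is a hypothetical name, not the autopandas function.

```python
from statistics import NormalDist

import numpy as np


def copula_generate_sketch(X, n=None, seed=None):
    """Gaussian copula: ranks -> normal scores -> fit -> sample -> back."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n_obs, n_feat = X.shape
    n = n_obs if n is None else n
    std_normal = NormalDist()
    # 1. Rank each column and map ranks to uniform scores in (0, 1).
    ranks = X.argsort(axis=0).argsort(axis=0)
    u = (ranks + 0.5) / n_obs
    # 2. Map uniforms to standard normal scores.
    z = np.vectorize(std_normal.inv_cdf)(u)
    # 3. Fit the generator in this space (a multivariate Gaussian here).
    mean = z.mean(axis=0)
    cov = np.atleast_2d(np.cov(z, rowvar=False))
    z_new = rng.multivariate_normal(mean, cov, size=n)
    # 4. Back to uniforms, then to the empirical marginals of X.
    u_new = np.vectorize(std_normal.cdf)(z_new)
    return np.column_stack(
        [np.quantile(X[:, j], u_new[:, j]) for j in range(n_feat)]
    )
```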

autopandas.generators.copula.marginal_retrofit(Xartif, Xreal)

Retrofit the marginal distributions of the features in Xartif to those in Xreal.
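One way to retrofit marginals while keeping the dependence structure of Xartif is a rank-based swap: within each column, the k-th smallest artificial value is replaced by the real value at the matching empirical quantile. This sketch assumes that approach (autopandas may treat ties and unequal sample sizes differently), and `marginal_retrofit_sketch` is a hypothetical name.

```python
import numpy as np


def marginal_retrofit_sketch(X_artif, X_real):
    """Give each column of X_artif the marginal of X_real, keeping its ranks."""
    X_artif = np.asarray(X_artif, dtype=float)
    X_real = np.asarray(X_real, dtype=float)
    n, m = X_artif.shape[0], X_real.shape[0]
    out = np.empty_like(X_artif)
    for j in range(X_artif.shape[1]):
        # Rank of each artificial value within its column (0 .. n-1).
        ranks = X_artif[:, j].argsort().argsort()
        real_sorted = np.sort(X_real[:, j])
        # Index of the real value at the same empirical quantile.
        idx = np.floor((ranks + 0.5) * m / n).astype(int)
        out[:, j] = real_sorted[idx]
    return out
```

When both inputs have the same number of rows, each output column is exactly a permutation of the corresponding real column.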

autopandas.generators.copula.matrix_to_rank(X)
autopandas.generators.copula.rank_matrix_to_inverse(X)
autopandas.generators.copula.rank_vector_to_inverse(x)
autopandas.generators.copula.vector_to_rank(x, reverse=False)

autopandas.generators.copycat module

class autopandas.generators.copycat.Copycat

Bases: object

__init__()

Baseline generator: simply copies the training data.

fit(data)

Train the generator with data.

Parameters: data – The data to copy.

sample(n=1, replace=False)

Sample from the training data.

Parameters:
  • n – Number of examples to sample.
  • replace – If True, sample with replacement.
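A copy baseline boils down to drawing row indices. The class below is a hypothetical NumPy stand-in for Copycat (its `seed` argument is an addition for reproducibility), not the library's source.

```python
import numpy as np


class CopycatSketch:
    """Baseline generator that returns rows of its training data."""

    def fit(self, data):
        # Keep a reference to the training rows.
        self.data = np.asarray(data)
        return self

    def sample(self, n=1, replace=False, seed=None):
        rng = np.random.default_rng(seed)
        # Without replacement, n cannot exceed the number of training rows.
        idx = rng.choice(len(self.data), size=n, replace=replace)
        return self.data[idx]
```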

autopandas.generators.gmm module

class autopandas.generators.gmm.GMM(**kwargs)

Bases: object

__init__(**kwargs)

Gaussian Mixture Model.

fit(data, **kwargs)

Train the generator with data.

Parameters: data – The training data.

sample(n=1, **kwargs)

Sample from the trained GMM.

Parameters: n – Number of examples to sample.
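Sampling from a Gaussian mixture is a two-stage (ancestral) process: pick a component according to the mixture weights, then draw from that component's Gaussian. The sketch assumes the weights, means and covariances are already fitted; `sample_gmm` is a hypothetical helper, not part of autopandas.

```python
import numpy as np


def sample_gmm(weights, means, covs, n=1, seed=None):
    """Ancestral sampling from a Gaussian mixture (sketch)."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    # 1. Pick a mixture component for each sample according to the weights.
    comps = rng.choice(len(weights), size=n, p=weights)
    # 2. Draw each sample from its component's Gaussian.
    return np.array([rng.multivariate_normal(means[k], covs[k]) for k in comps])
```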

autopandas.generators.kde module

class autopandas.generators.kde.KDE(**kwargs)

Bases: object

__init__(**kwargs)

Kernel Density Estimation (Parzen windows).

fit(data, **kwargs)

Train the generator with data.

Parameters: data – The training data.

sample(n=1, **kwargs)

Sample from the trained KDE.

Parameters: n – Number of examples to sample.
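A Gaussian KDE is an equal-weight mixture with one kernel centred on each training point, so sampling from it means picking a training row uniformly at random and adding kernel noise. In this sketch, `sample_gaussian_kde` and its `bandwidth` argument are hypothetical, and a Gaussian kernel with a single isotropic bandwidth is assumed.

```python
import numpy as np


def sample_gaussian_kde(data, n=1, bandwidth=1.0, seed=None):
    """Sample from a Gaussian KDE fitted on data (sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Choose a kernel centre (a training row) uniformly for each sample.
    idx = rng.integers(0, len(data), size=n)
    # Add isotropic Gaussian kernel noise around the chosen centres.
    return data[idx] + rng.normal(scale=bandwidth, size=(n, data.shape[1]))
```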

autopandas.generators.sae module

class autopandas.generators.sae.SAE(layers, normalization=False, **kwargs)

Bases: autopandas.generators.autoencoder.AE

__init__(layers, normalization=False, **kwargs)

Stacked Autoencoder: an AE with submodel training.

Parameters: layers – List of layer dimensions, including the input layer, the intermediate layers (at least one) and the latent layer.
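The docs describe SAE only as an AE with submodel training. One common reading is greedy layer-wise training, where each consecutive pair of entries in `layers` forms a small autoencoder trained on the codes of the previous one. The sketch below assumes that reading and, to stay in closed form, replaces each Keras submodel with a linear autoencoder solved by SVD (the optimal linear autoencoder spans the top principal directions). `greedy_linear_stack` and `decode_stack` are hypothetical names.

```python
import numpy as np


def greedy_linear_stack(X, layers):
    """Greedy layer-wise training of linear autoencoders (sketch).

    `layers` lists all dimensions, input first and latent last, e.g. [10, 6, 3].
    Each stage is fitted on the codes produced by the previous stage.
    """
    codes = np.asarray(X, dtype=float)
    if codes.shape[1] != layers[0]:
        raise ValueError("layers[0] must equal the number of input features")
    stages = []
    for d_out in layers[1:]:
        mean = codes.mean(axis=0)
        # The best linear autoencoder with d_out units spans the top d_out
        # principal directions of its input, read off an SVD.
        _, _, vt = np.linalg.svd(codes - mean, full_matrices=False)
        weights = vt[:d_out].T  # encoder weights, shape (d_in, d_out)
        stages.append((mean, weights))
        codes = (codes - mean) @ weights
    return stages, codes


def decode_stack(stages, codes):
    """Invert the stack by applying each stage's decoder in reverse order."""
    for mean, weights in reversed(stages):
        # For the optimal linear AE, the decoder is the encoder's transpose.
        codes = codes @ weights.T + mean
    return codes
```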
autoencode(X)
decode(X)
encode(X)
fit(X, epochs=10, validation_data=None, **kwargs)
normalize(X, i=None)
reset_normalization()
sample(n=100, loc=0, scale=1)

Parameters: scale – Standard deviation of the Gaussian distribution prior.

autopandas.generators.sae.merge(model1, model2)

autopandas.generators.vae module

class autopandas.generators.vae.KLDivergenceLayer(*args, **kwargs)

Bases: tensorflow.python.keras.engine.base_layer.Layer

Identity transform layer that adds KL divergence to the final model loss.
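Assuming the layer receives the encoder's mean and log-variance, as in common Keras VAE examples, the quantity it adds to the loss is the closed-form KL divergence between a diagonal Gaussian and the standard normal prior. A NumPy sketch of that term (`gaussian_kl` is hypothetical; the real layer works on tensors and may average rather than sum):

```python
import numpy as np


def gaussian_kl(mu, log_var):
    """KL(N(mu, exp(log_var)) || N(0, 1)), summed over latent dimensions.

    Closed form: -0.5 * sum(1 + log_var - mu**2 - exp(log_var)).
    """
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var), axis=-1)
```

The term is zero exactly when the posterior matches the prior (mu = 0, log_var = 0) and positive otherwise.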

__init__(*args, **kwargs)
call(inputs)

This is where the layer’s logic lives.

Parameters:
  • inputs – Input tensor, or list/tuple of input tensors.
  • **kwargs – Additional keyword arguments.
Returns:

A tensor or list/tuple of tensors.

class autopandas.generators.vae.VAE(input_dim, layers=[], latent_dim=2, architecture='fully', epsilon_std=1.0, loss='nll', optimizer='rmsprop', decoder_layers=None)

Bases: autopandas.generators.autoencoder.AE

__init__(input_dim, layers=[], latent_dim=2, architecture='fully', epsilon_std=1.0, loss='nll', optimizer='rmsprop', decoder_layers=None)

Variational Autoencoder.

Parameters:
  • input_dim – Input/output size.
  • layers – Dimension of intermediate layers (encoder and decoder). It can be either an integer (one intermediate layer) or a list of integers (several intermediate layers).
  • latent_dim – Dimension of latent space layer.
  • architecture – ‘fully’, ‘cnn’.
  • epsilon_std – Standard deviation of the Gaussian distribution prior.
  • decoder_layers – Dimension of intermediate decoder layers for asymmetrical architectures.
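The role of `epsilon_std` is easiest to see in the reparameterization trick that VAEs use to sample latent codes: z = mu + sigma * eps, with eps drawn from N(0, epsilon_std**2). This is the standard formulation; that autopandas scales eps exactly this way is an assumption based on the parameter description, and `reparameterize` is a hypothetical helper.

```python
import numpy as np


def reparameterize(mu, log_var, epsilon_std=1.0, seed=None):
    """z = mu + sigma * eps, with eps ~ N(0, epsilon_std**2) (sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    # Standard deviation recovered from the log-variance.
    sigma = np.exp(0.5 * np.asarray(log_var, dtype=float))
    # Noise is drawn outside the network, so gradients flow through mu and sigma.
    eps = rng.normal(loc=0.0, scale=epsilon_std, size=mu.shape)
    return mu + sigma * eps
```

Setting epsilon_std to 0 makes sampling deterministic (z equals mu), which can be handy for debugging.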

Module contents