Data Batches
pysgmcmc.data_batches.generate_batches(x, y, x_placeholder, y_placeholder, batch_size=20, seed=None)[source]

Infinite generator of random minibatches for a dataset.

For general reference on (infinite) generators, see: https://www.python.org/dev/peps/pep-0255/

Parameters:
- x (np.ndarray (N, D)) – Training data points/features
- y (np.ndarray (N, 1)) – Training data labels
- x_placeholder (tensorflow.placeholder) – Placeholder for batches of data from x.
- y_placeholder (tensorflow.placeholder) – Placeholder for batches of data from y.
- batch_size (int, optional) – Number of datapoints to put into a batch.
- seed (int, optional) – Random seed to use during batch generation. Defaults to None.
Yields: batch_dict (dict) – A dictionary that maps x_placeholder and y_placeholder to batch_size-sized minibatches of data (numpy.ndarray) from the dataset x, y.
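To make the contract above concrete, here is a minimal NumPy sketch of such an infinite minibatch generator. This is an illustrative stand-in, not the library's implementation: the real function additionally maps the sampled arrays to the TensorFlow placeholders, whereas this sketch simply yields the arrays.

```python
import numpy as np

def minibatch_sketch(x, y, batch_size=20, seed=None):
    """Illustrative infinite generator of random minibatches (not the library code)."""
    rng = np.random.RandomState(seed)
    n_examples = x.shape[0]
    # Shrink the batch if the dataset is smaller than the requested batch size.
    batch_size = min(batch_size, n_examples)
    while True:  # infinite generator: callers pull batches with next()
        indices = rng.choice(n_examples, size=batch_size, replace=False)
        yield x[indices], y[indices].reshape(batch_size, 1)

x = np.random.uniform(-10, 10, size=(100, 3))
y = np.random.choice([0., 1.], size=100)
gen = minibatch_sketch(x, y, batch_size=20, seed=1)
x_batch, y_batch = next(gen)  # x_batch: (20, 3), y_batch: (20, 1)
```

Because the generator never terminates, it can be passed directly to a training loop that calls next() once per iteration.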
Examples
Simple batch extraction example:
>>> import numpy as np
>>> import tensorflow as tf
>>> N, D = 100, 3  # 100 datapoints with 3 features each
>>> x = np.asarray([np.random.uniform(-10, 10, D) for _ in range(N)])
>>> y = np.asarray([np.random.choice([0., 1.]) for _ in range(N)])
>>> x.shape, y.shape
((100, 3), (100,))
>>> x_placeholder, y_placeholder = tf.placeholder(dtype=tf.float64), tf.placeholder(dtype=tf.float64)
>>> batch_size = 20
>>> gen = generate_batches(x, y, x_placeholder, y_placeholder, batch_size)
>>> batch_dict = next(gen)  # extract a batch
>>> set(batch_dict.keys()) == set((x_placeholder, y_placeholder))
True
>>> batch_dict[x_placeholder].shape, batch_dict[y_placeholder].shape
((20, 3), (20, 1))
Batch extraction shrinks the batch size if the dataset is too small:
>>> import numpy as np
>>> import tensorflow as tf
>>> N, D = 10, 3  # 10 datapoints with 3 features each
>>> x = np.asarray([np.random.uniform(-10, 10, D) for _ in range(N)])
>>> y = np.asarray([np.random.choice([0., 1.]) for _ in range(N)])
>>> x.shape, y.shape
((10, 3), (10,))
>>> x_placeholder, y_placeholder = tf.placeholder(dtype=tf.float64), tf.placeholder(dtype=tf.float64)
>>> batch_size = 20
>>> gen = generate_batches(x, y, x_placeholder, y_placeholder, batch_size)
>>> batch_dict = next(gen)  # extract a batch
>>> set(batch_dict.keys()) == set((x_placeholder, y_placeholder))
True
>>> batch_dict[x_placeholder].shape, batch_dict[y_placeholder].shape
((10, 3), (10, 1))
In this case, the batches contain exactly all datapoints:
>>> np.allclose(batch_dict[x_placeholder], x), np.allclose(batch_dict[y_placeholder].reshape(N,), y)
(True, True)
pysgmcmc.data_batches.generate_shuffled_batches(x, y, x_placeholder, y_placeholder, batch_size=20, seed=None)[source]

Infinite generator of shuffled random minibatches for a dataset.

For general reference on (infinite) generators, see: https://www.python.org/dev/peps/pep-0255/

Parameters:
- x (np.ndarray (N, D)) – Training data points/features
- y (np.ndarray (N, 1)) – Training data labels
- x_placeholder (tensorflow.placeholder) – Placeholder for batches of data from x.
- y_placeholder (tensorflow.placeholder) – Placeholder for batches of data from y.
- batch_size (int, optional) – Number of datapoints to put into a batch.
- seed (int, optional) – Random seed to use during batch generation (and for shuffling!). Defaults to None.
Yields: batch_dict (dict) – A dictionary that maps x_placeholder and y_placeholder to batch_size-sized minibatches of data (numpy.ndarray) from the dataset x, y.
Examples
Simple shuffled batch extraction example:
>>> import numpy as np
>>> import tensorflow as tf
>>> N, D = 100, 3  # 100 datapoints with 3 features each
>>> x = np.asarray([np.random.uniform(-10, 10, D) for _ in range(N)])
>>> y = np.asarray([np.random.choice([0., 1.]) for _ in range(N)])
>>> x.shape, y.shape
((100, 3), (100,))
>>> x_placeholder, y_placeholder = tf.placeholder(dtype=tf.float64), tf.placeholder(dtype=tf.float64)
>>> batch_size = 20
>>> gen = generate_shuffled_batches(x, y, x_placeholder, y_placeholder, batch_size)
>>> batch_dict = next(gen)  # extract a batch
>>> set(batch_dict.keys()) == set((x_placeholder, y_placeholder))
True
>>> batch_dict[x_placeholder].shape, batch_dict[y_placeholder].shape
((20, 3), (20, 1))
TODO: Demonstrate that shuffled batches are shuffled correctly, e.g. datapoint still matches corresponding label
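The alignment property the TODO refers to can be sketched in plain NumPy: shuffling features and labels with a shared permutation keeps every datapoint paired with its own label. This is a hypothetical illustration of the property, not the library's shuffling code.

```python
import numpy as np

# Hypothetical sketch: shuffling x and y with a *shared* permutation
# keeps each datapoint aligned with its label.
rng = np.random.RandomState(0)
N, D = 10, 3
x = rng.uniform(-10, 10, size=(N, D))
y = (x.sum(axis=1) > 0).astype(np.float64)  # label derived from the datapoint

permutation = rng.permutation(N)
x_shuffled, y_shuffled = x[permutation], y[permutation]

# Every shuffled datapoint still matches its original label.
assert np.array_equal(y_shuffled, (x_shuffled.sum(axis=1) > 0).astype(np.float64))
```

Had x and y been shuffled with two independent permutations, the final assertion would (almost surely) fail, which is exactly the mismatch a correct shuffled-batch generator must avoid.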