febhd_clustering.FebHD

class febhd_clustering.FebHD(clusters: int, features: int, dim: int = 4000)

Bases: object

Hyperdimensional clustering algorithm. FebHD uses a (c, d)-sized tensor as its model, where c is the number of clusters and d is dim; the tensor is initially empty. Each row of this matrix is the high-dimensional representation of one cluster. Once learning starts (i.e. through fit), the clusters are initialized from randomly chosen input samples. After that, the iterative algorithm begins.

During each iteration, FebHD updates the model based on the samples most similar to each cluster. Iteration continues until the predicted cluster of every sample remains unchanged for two consecutive iterations, or until the preset maximum number of iterations (epochs) is reached.
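The loop described above can be sketched in PyTorch as follows. This is a hypothetical illustration, not FebHD's implementation: the function name `cluster_sketch`, the cosine-similarity assignment, and the plain bundling (summing) update are all assumptions, and the adaptive and binary update options of fit are not modeled. It operates on already-encoded data of shape (n, dim).

```python
import torch
import torch.nn.functional as F


def cluster_sketch(h: torch.Tensor, clusters: int, epochs: int = 40,
                   seed: int = 0) -> torch.Tensor:
    """Hypothetical sketch of a FebHD-style loop on encoded data h of shape (n, dim).

    Returns the predicted cluster of each sample, shape (n,).
    """
    if epochs < 1:
        raise ValueError("epochs must be > 0")
    if not 0 < clusters <= h.shape[0]:
        raise ValueError("clusters must be > 0 and <= number of samples")
    gen = torch.Generator().manual_seed(seed)
    # (clusters, dim) model initialized from randomly chosen samples.
    idx = torch.randperm(h.shape[0], generator=gen)[:clusters]
    model = h[idx].clone().float()
    prev, unchanged = None, 0
    for _ in range(epochs):
        # Assumption: assign each sample to the cluster with the highest cosine similarity.
        sims = F.normalize(h.float(), dim=1) @ F.normalize(model, dim=1).T
        pred = sims.argmax(dim=1)
        # Stop once predictions stay identical across two consecutive comparisons.
        if prev is not None and torch.equal(pred, prev):
            unchanged += 1
            if unchanged >= 2:
                break
        else:
            unchanged = 0
        prev = pred
        # Assumption: plain bundling update (sum of each cluster's members);
        # clusters that received no samples keep their previous vector.
        for c in range(clusters):
            members = h[pred == c]
            if members.shape[0] > 0:
                model[c] = members.float().sum(dim=0)
    return pred
```

With as many clusters as samples, every sample ends up in its own cluster, since each one is most similar to itself.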

Parameters

clusters (int, > 0) – The number of clusters.
features (int, > 0) – Dimensionality of the original data.
dim (int, > 0) – Target dimensionality of the high-dimensional representation.

Example

>>> import torch
>>> import febhd_clustering
>>> dim = 10000
>>> n_samples = 1000
>>> features = 100
>>> clusters = 5
>>> x = torch.randn(n_samples, features) # dummy data
>>> model = febhd_clustering.FebHD(clusters, features, dim=dim)
>>> if torch.cuda.is_available():
...     print('Training on GPU!')
...     model = model.to('cuda')
...     x = x.to('cuda')
...
Training on GPU!
>>> model.fit(x, epochs=10)
>>> ypred = model(x)
>>> ypred.size()
torch.Size([1000])

__call__(x: torch.Tensor, encoded: bool = False)

Returns the predicted cluster of each data point in x.

Parameters

x (torch.Tensor) – The data points to predict. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.

Returns

The predicted cluster of each data point. Has size (n?,).

Return type

torch.Tensor
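A minimal sketch of how a predicted cluster can be derived from encoded data, assuming (as the scores() description suggests) that prediction picks the most similar cluster hypervector. `predict_sketch` is a hypothetical name. For bipolar vectors the dot product equals dim - 2H, where H is the Hamming distance, so the argmax of the sign-vector dot products ranks clusters exactly as Hamming similarity would.

```python
import torch


def predict_sketch(h: torch.Tensor, model: torch.Tensor) -> torch.Tensor:
    """Hypothetical: predicted cluster = most similar cluster hypervector.

    h: encoded data, shape (n, dim); model: cluster hypervectors, shape (clusters, dim).
    For +1/-1 vectors, x . m = dim - 2 * H(x, m), so the dot product of the
    sign vectors yields the same argmax as Hamming similarity.
    """
    dots = torch.sign(h) @ torch.sign(model).T  # shape (n, clusters)
    return dots.argmax(dim=1)                   # shape (n,)
```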

encode(x: torch.Tensor): Encodes the input data x into its high-dimensional representation.

See also

febhd.Encoder for more information.
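To make the shape change concrete, here is a sketch of one simple hyperdimensional encoder, a fixed Gaussian random projection from (n?, features) to (n?, dim). The class name `RandomProjectionEncoder` is hypothetical, and this is not a description of febhd.Encoder, which may use a different (for example nonlinear) mapping.

```python
import torch


class RandomProjectionEncoder:
    """Hypothetical encoder: (n, features) -> (n, dim) via a fixed Gaussian projection.

    Illustration only; febhd.Encoder may use a different mapping.
    """

    def __init__(self, features: int, dim: int, seed: int = 0):
        gen = torch.Generator().manual_seed(seed)
        # One random base hypervector per input feature, drawn once and reused.
        self.basis = torch.randn(features, dim, generator=gen)

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # Each encoded row is a weighted sum of the feature base hypervectors.
        return x @ self.basis
```

Because the projection is drawn once, the same input always maps to the same hypervector, which is what makes similarity comparisons between encoded samples meaningful.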

fit(x: torch.Tensor, encoded: bool = False, epochs: int = 40, batch_size: Optional[Union[int, float]] = None, adaptive_update: bool = True, binary_update: bool = False)

Starts the learning process using the data points x as input.

Parameters

x (torch.Tensor) – Input data points. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.
epochs (int, > 0) – Maximum number of epochs allowed.
batch_size (int, > 0 and <= n?, or float, > 0 and <= 1, or None) – If int, the number of samples to use in each batch. If float, the fraction of the samples to use in each batch. If None, the whole dataset is used in each epoch (equivalent to passing 1.0 or n?).
adaptive_update (bool) – Whether to use adaptive update.
binary_update (bool) – Whether to use binarized data points to update the clustering model.

Returns

self

Return type

FebHD
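The batch_size rules of fit can be made concrete with a small helper. `resolve_batch_size` is hypothetical and does not claim to mirror FebHD's code; in particular, rounding a fractional batch size to the nearest integer (with a floor of one sample) is an assumption.

```python
from typing import Optional, Union


def resolve_batch_size(batch_size: Optional[Union[int, float]], n: int) -> int:
    """Hypothetical helper: number of samples per batch out of n samples."""
    if batch_size is None:
        # None: the whole dataset is used in each epoch.
        return n
    if isinstance(batch_size, bool):
        raise TypeError("batch_size must be an int, a float or None")
    if isinstance(batch_size, int):
        if not 0 < batch_size <= n:
            raise ValueError("int batch_size must satisfy 0 < batch_size <= n")
        return batch_size
    if isinstance(batch_size, float):
        if not 0.0 < batch_size <= 1.0:
            raise ValueError("float batch_size must satisfy 0 < batch_size <= 1")
        # Assumption: round to the nearest integer, but use at least one sample.
        return max(1, round(batch_size * n))
    raise TypeError("batch_size must be an int, a float or None")
```

For example, `range(0, n, resolve_batch_size(0.25, n))` yields the start index of each batch; None, 1.0 and n all resolve to a single full-size batch.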

predict(x: torch.Tensor, encoded: bool = False): Returns the predicted cluster of each element in x. See __call__() for details.

probabilities(x: torch.Tensor, encoded: bool = False)

Returns, for each data point in x, the probability of belonging to each cluster.

Parameters

x (torch.Tensor) – The data points to use. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.

Returns

The cluster probability of each data point. Has size (n?, clusters).

Return type

torch.Tensor
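The documentation does not state how similarity scores become probabilities. One common choice, assumed here purely for illustration, is a softmax over each row of the (n?, clusters) score matrix; `probabilities_sketch` and its `temperature` parameter are hypothetical.

```python
import torch


def probabilities_sketch(scores: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Hypothetical: convert (n, clusters) similarity scores into per-row probabilities.

    Assumption: a softmax; FebHD may normalize its scores differently.
    """
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    # Lower temperatures sharpen the distribution toward the best-matching cluster.
    return torch.softmax(scores / temperature, dim=1)
```

Since softmax is monotonic within each row, the most probable cluster is always the one with the highest score, so this choice agrees with predict().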

scores(x: torch.Tensor, encoded: bool = False)

Returns the Hamming similarity of each data point in x with each cluster hypervector. The output of this function is the matrix \(\delta\) given by:

\[\delta_{ij} = 1 - \frac{H(\operatorname{sign}(x_i), \operatorname{sign}(\text{models}_j))}{\text{dim}}\]

where \(x_i\) is the encoded representation of the i-th data point, \(\text{models}_j\) is the j-th cluster hypervector, and \(H(\cdot, \cdot)\) is the Hamming distance, i.e. the number of positions in which the two sign vectors differ.
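A direct translation of the formula into PyTorch, assuming x is already encoded with shape (n, dim) and models has shape (clusters, dim). `hamming_similarity` is a hypothetical name. The broadcasted comparison materializes an (n, clusters, dim) boolean tensor, which is simple but memory-hungry for large inputs.

```python
import torch


def hamming_similarity(x: torch.Tensor, models: torch.Tensor) -> torch.Tensor:
    """delta[i, j] = 1 - H(sign(x_i), sign(models_j)) / dim.

    x: encoded data (n, dim); models: cluster hypervectors (clusters, dim).
    H counts the positions where the two sign vectors differ.
    """
    dim = x.shape[1]
    sx = torch.sign(x).unsqueeze(1)       # (n, 1, dim)
    sm = torch.sign(models).unsqueeze(0)  # (1, clusters, dim)
    hamming = (sx != sm).sum(dim=2)       # (n, clusters) Hamming distances
    return 1 - hamming.to(x.dtype) / dim
```

For example, x = [[1, -2, 3, -4]] has sign vector [1, -1, 1, -1]; it differs from [1, 1, 1, 1] in 2 of 4 positions (similarity 0.5) and matches [5, -1, 2, -7] in every position (similarity 1.0).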

Parameters

x (torch.Tensor) – The data points to score. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.

Returns

The similarity score between each data point and each cluster hypervector. Has size (n?, clusters).

Return type

torch.Tensor

to(*args)

Moves the model's data to the specified device (e.g. cuda or cpu) or changes the dtype of its representation (e.g. half or double). Because the internal data is stored as torch.Tensor, the arguments can be anything that torch.Tensor.to accepts. The change is done in place.

Returns: self
Return type: FebHD
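The in-place, return-self contract of to() is what makes calls like `model = model.to('cuda')` in the class example work, and it also allows chaining. The hypothetical `TensorHolder` below illustrates that pattern with a single internal tensor; FebHD's actual internal attributes are not shown here.

```python
import torch


class TensorHolder:
    """Hypothetical stand-in illustrating the documented to() contract."""

    def __init__(self, clusters: int, dim: int):
        # Internal state kept as a torch.Tensor, like a (clusters, dim) model.
        self.model = torch.zeros(clusters, dim)

    def to(self, *args):
        # Forward the arguments to torch.Tensor.to, rebind in place, return self.
        self.model = self.model.to(*args)
        return self
```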