febhd_clustering.FebHD
-
class febhd_clustering.FebHD(clusters: int, features: int, dim: int = 4000)
Bases: object

Hyperdimensional clustering algorithm. FebHD uses a tensor of size (clusters, dim) as its model, initialized empty. Each row of this matrix is the high-dimensional representation (hypervector) of one cluster. Once the learning algorithm starts (i.e. through fit), the clusters are initialized randomly from the input data. After this, the iterative algorithm starts.
During each iteration, the model is updated based on the most similar samples. The iterations continue until the predicted cluster of every sample remains unchanged for two iterations in a row, or until a preset maximum number of iterations is reached.
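The iterative loop described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not FebHD's implementation: it takes already-encoded hypervectors as lists of floats, picks the most similar cluster by cosine similarity, and replaces FebHD's own update rule (not documented on this page) with a simple k-means-style re-summation of each cluster's members. The helper names `cosine` and `cluster` are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(h, clusters, max_epochs=40, seed=0):
    """Cluster encoded hypervectors h (list of lists); returns (labels, model)."""
    rng = random.Random(seed)
    # Initialize each cluster hypervector from a randomly chosen sample.
    model = [list(v) for v in rng.sample(h, clusters)]
    prev = None
    pred = None
    for _ in range(max_epochs):
        # Assign every sample to its most similar cluster hypervector.
        pred = [max(range(clusters), key=lambda j: cosine(v, model[j])) for v in h]
        # Stop as soon as no assignment changed since the previous iteration.
        if pred == prev:
            break
        prev = pred
        # Simplified update (assumption): each cluster becomes the sum of its members.
        for j in range(clusters):
            members = [v for v, label in zip(h, pred) if label == j]
            if members:  # keep the old hypervector if a cluster is empty
                model[j] = [sum(col) for col in zip(*members)]
    return pred, model
```

For two well-separated groups, such as `[[1, 1, 1, 1], [1, 1, 1, 0.9], [-1, -1, -1, -1], [-1, -0.9, -1, -1]]`, `cluster(h, 2)` puts the first two samples in one cluster and the last two in the other, even if both initial clusters happen to be drawn from the same group.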
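- Parameters
clusters (int) – Number of clusters.
features (int) – Number of features of each input data point.
dim (int) – Dimensionality of the hypervectors used to represent the data points and the clusters.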
Example
>>> import torch
>>> import febhd_clustering
>>> dim = 10000
>>> n_samples = 1000
>>> features = 100
>>> clusters = 5
>>> x = torch.randn(n_samples, features)  # dummy data
>>> model = febhd_clustering.FebHD(clusters, features, dim=dim)
>>> if torch.cuda.is_available():
...     print('Training on GPU!')
...     model = model.to('cuda')
...     x = x.to('cuda')
...
Training on GPU!
>>> model.fit(x, epochs=10)
>>> ypred = model(x)
>>> ypred.size()
torch.Size([1000])
-
__call__(x: torch.Tensor, encoded: bool = False)
Returns the predicted cluster of each data point in x.
- Parameters
x (torch.Tensor) – The data points to predict. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.
- Returns
The predicted cluster of each data point. Has size (n?,).
- Return type
torch.Tensor
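A minimal sketch of how a prediction can be derived from a score matrix, assuming (not stated on this page) that the predicted cluster is the one with the highest similarity, i.e. the row-wise argmax of the matrix returned by scores(). With a PyTorch tensor this would be `scores.argmax(dim=1)`; the pure-Python helper `predict_from_scores` below is hypothetical.

```python
def predict_from_scores(scores):
    """Index of the highest-scoring cluster for each row of a score matrix."""
    # max() returns the first maximal index, so ties go to the lowest cluster id.
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


print(predict_from_scores([[0.52, 0.91, 0.48], [0.77, 0.40, 0.61]]))  # [1, 0]
```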
-
encode(x: torch.Tensor)
Encodes input data.
See also
febhd.Encoder for more information.
-
fit(x: torch.Tensor, encoded: bool = False, epochs: int = 40, batch_size: Optional[Union[int, float]] = None, adaptive_update: bool = True, binary_update: bool = False)
Starts the learning process using the data points x as input.
- Parameters
x (torch.Tensor) – Input data points. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.
epochs (int, > 0) – Max number of epochs allowed.
batch_size (int, > 0 and <= n?, or float, > 0 and <= 1, or None) – If int, the number of samples in each batch. If float, the fraction of the samples in each batch. If None, the whole dataset is used in each epoch (equivalent to 1.0 or n?).
adaptive_update (bool) – Whether to use the adaptive update rule.
binary_update (bool) – Whether to use binarized data points to update the clustering model.
- Returns
self
- Return type
FebHD
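The three accepted forms of batch_size (int, float, or None) can be resolved to a concrete number of samples per batch as sketched below. This is an illustration of the documented semantics, not FebHD's code; the helpers `resolve_batch_size` and `batch_ranges` are hypothetical, and rounding a fractional batch to the nearest whole sample is an assumption.

```python
def resolve_batch_size(batch_size, n):
    """Samples per batch for a dataset of n samples (hypothetical helper)."""
    if batch_size is None:
        return n  # None: the whole dataset in a single batch per epoch
    if isinstance(batch_size, float):
        if not 0 < batch_size <= 1:
            raise ValueError("a float batch_size must be in (0, 1]")
        # Assumption: the fraction is rounded to the nearest whole sample.
        return max(1, round(batch_size * n))
    if not 0 < batch_size <= n:
        raise ValueError("an int batch_size must be in (0, n]")
    return batch_size


def batch_ranges(n, batch_size):
    """Index ranges covering all n samples, one range per batch."""
    size = resolve_batch_size(batch_size, n)
    return [range(start, min(start + size, n)) for start in range(0, n, size)]


print(resolve_batch_size(0.25, 1000))  # 250
print(len(batch_ranges(10, 3)))  # 4
```

Note that 1.0 and n both resolve to a single full-size batch, matching the behavior of None.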
-
predict(x: torch.Tensor, encoded: bool = False)
Returns the predicted cluster of each element in x. See __call__() for details.
-
probabilities(x: torch.Tensor, encoded: bool = False)
Returns the probabilities of belonging to a certain cluster for each data point in x.
- Parameters
x (torch.Tensor) – The data points to use. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.
- Returns
The cluster probability of each data point. Has size (n?, clusters).
- Return type
torch.Tensor
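This page does not say how the similarity scores are turned into probabilities. One common choice, shown here purely as an assumption, is a row-wise softmax over the (n?, clusters) score matrix, which yields rows that sum to 1 and preserves the argmax; the helper `softmax_rows` and its `temperature` parameter are hypothetical.

```python
import math


def softmax_rows(scores, temperature=1.0):
    """Turn each row of similarity scores into probabilities that sum to 1."""
    probs = []
    for row in scores:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((s - m) / temperature) for s in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs
```

For example, scores of 0.75 and 0.25 against two clusters become probabilities of roughly 0.62 and 0.38; a smaller temperature sharpens the distribution.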
-
scores(x: torch.Tensor, encoded: bool = False)
Returns the Hamming similarity of each data point in x with each cluster hypervector. The output of this function is the matrix \(\delta\) given by:
\[\delta_{ij} = 1 - \frac{H(\operatorname{sign}(x_i), \operatorname{sign}(\text{models}_j))}{\text{dim}}\]
where \(x\) is the input data, \(\text{models}\) are the cluster hypervectors and \(H(\cdot, \cdot)\) is the Hamming distance.
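To make the formula concrete, the following pure-Python sketch computes \(\delta\) for plain lists of floats. It assumes that sign maps zero components to +1 (the convention is not stated here); the helpers `sign`, `hamming_similarity` and `scores` are illustrative stand-ins, not the library's code.

```python
def sign(v):
    # Assumption: zero components are mapped to +1.
    return [1 if x >= 0 else -1 for x in v]


def hamming_similarity(x, model):
    """1 - H(sign(x), sign(model)) / dim for two equal-length vectors."""
    dim = len(x)
    # H: Hamming distance, i.e. the number of positions where the signs differ.
    distance = sum(a != b for a, b in zip(sign(x), sign(model)))
    return 1 - distance / dim


def scores(xs, models):
    """Matrix delta with delta[i][j] = similarity of sample i to cluster j."""
    return [[hamming_similarity(x, m) for m in models] for x in xs]


x = [[0.3, -1.2, 0.8, -0.1]]
models = [[1.0, -0.5, 0.2, 0.4], [-0.7, 0.9, -0.3, -0.2]]
print(scores(x, models))  # [[0.75, 0.25]]
```

The sample's sign pattern (+, -, +, -) differs from the first cluster in one of four positions and from the second in three, giving similarities of 0.75 and 0.25.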
- Parameters
x (torch.Tensor) – The data points to score. Must have size (n?, features) if encoded=False, otherwise must have size (n?, dim).
encoded (bool) – Specifies whether the input data is already encoded.
- Returns
The Hamming similarity of each data point with each cluster hypervector. Has size (n?, clusters).
- Return type
torch.Tensor