febhd_clustering.Encoder

class febhd_clustering.Encoder(features: int, dim: int = 4000)

Bases: object

The nonlinear encoder class maps data nonlinearly to high dimensional space. To do this task, it uses two randomly generated tensors:

  • \(B\). The (dim, features) sized random basis hypervectors, drawn from a standard normal distribution.

  • \(b\). An additional (dim,) sized base, drawn from a uniform distribution over \([0, 2\pi]\).

The hypervector \(H \in \mathbb{R}^D\) of a sample \(X \in \mathbb{R}^f\), where \(D\) is dim and \(f\) is features, has components:

\[H_i = \cos(X \cdot B_i + b_i) \sin(X \cdot B_i)\]

Parameters
  • features (int, > 0) – Dimensionality of original data.

  • dim (int, > 0) – Target dimension for output data.
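The construction described above can be sketched in plain PyTorch. This is a minimal illustration that assumes only the formula and the two distributions stated in this section; the names `encode_sketch`, `basis` and `base` are hypothetical, and the library's actual implementation may differ (for example in how it batches or stores the tensors).

```python
import math
import torch

def encode_sketch(x, basis, base):
    """Hypothetical helper: H[i, j] = cos(x_i . B_j + b_j) * sin(x_i . B_j)."""
    # x: (n, features), basis: (dim, features), base: (dim,)
    proj = x @ basis.T  # (n, dim); proj[i, j] is the dot product x_i . B_j
    return torch.cos(proj + base) * torch.sin(proj)

torch.manual_seed(0)
features, dim = 5, 4000

# B: (dim, features) basis drawn from a standard normal distribution.
basis = torch.randn(dim, features)
# b: (dim,) base drawn uniformly from [0, 2*pi].
base = torch.empty(dim).uniform_(0.0, 2 * math.pi)

x = torch.randn(3, features)       # three toy samples
h = encode_sketch(x, basis, base)  # shape (3, dim)
```

Each output row is the hypervector of the corresponding input row, and every entry lies in [-1, 1] because it is a product of a cosine and a sine.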

__call__(x: torch.Tensor)

Encodes each data point in x into high-dimensional space. The encoded representation of the (n?, features) samples in \(x\) is the (n?, dim) matrix \(H\):

\[H_{ij} = \cos(x_i \cdot B_j + b_j) \sin(x_i \cdot B_j)\]

Note

This encoder is very sensitive to data preprocessing. Try giving the input unit norm (normalizing) or standardizing each feature to have mean=0 and std=1/sqrt(features) (scaling).
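The two preprocessing options in this note can be sketched as follows. One plausible reading of the advice (an interpretation, not something the documentation states) is that both options keep the projection \(x_i \cdot B_j\) at roughly unit variance: for a fixed sample, a dot product with a standard normal vector has variance equal to the squared norm of the sample. The sketch assumes that "unit norm" means per-sample Euclidean norm; the toy data and variable names are hypothetical.

```python
import math
import torch

torch.manual_seed(0)
x = torch.randn(200, 8) * 5.0 + 3.0  # toy data with arbitrary scale and offset
features = x.shape[1]

# Option 1 (normalizing): give each sample unit Euclidean norm.
norms = x.norm(dim=1, keepdim=True).clamp(min=1e-12)  # guard against zero rows
x_unit = x / norms

# Option 2 (scaling): give each feature mean 0 and std 1/sqrt(features).
mean = x.mean(dim=0, keepdim=True)
std = x.std(dim=0, keepdim=True).clamp(min=1e-12)     # guard against constant features
x_scaled = (x - mean) / (std * math.sqrt(features))
```

In practice the mean and standard deviation would be computed on the training data and reused for any data encoded later, so that all samples are encoded on the same scale.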

Parameters

x (torch.Tensor) – The original data points to encode. Must have size (n?, features).

Returns

The high dimensional representation of each of the n? data points in x, which respects the equation given above. It has size (n?, dim).

Return type

torch.Tensor

to(*args)

Moves the data to the specified device (e.g. cuda or cpu) or changes the dtype of the data representation (e.g. half or double). Because the internal data is stored as torch.Tensor objects, the arguments can be anything that torch.Tensor.to accepts. The change is done in place.

Returns

self

Return type

Encoder
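The in-place behaviour and the self return value of to can be illustrated with a small stand-in class. `EncoderSketch` and its attributes `basis` and `base` are hypothetical and only mimic the semantics described above by forwarding the arguments to torch.Tensor.to; they are not the library's implementation.

```python
import math
import torch

class EncoderSketch:
    """Hypothetical stand-in that mirrors the documented `to` semantics."""

    def __init__(self, features, dim=4000):
        self.basis = torch.randn(dim, features)
        self.base = torch.empty(dim).uniform_(0.0, 2 * math.pi)

    def to(self, *args):
        # Forward the arguments to torch.Tensor.to for every stored tensor,
        # rebind the attributes, and return self so calls can be chained.
        self.basis = self.basis.to(*args)
        self.base = self.base.to(*args)
        return self

enc = EncoderSketch(features=5, dim=16)

# Change dtype; the returned object is the same encoder instance.
same = enc.to(torch.float64)

# Move to a GPU if one is available, otherwise stay on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
enc.to(device)
```

Because the method returns self, a call such as `enc = EncoderSketch(5).to(device)` constructs and moves the object in one expression.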