This vignette presents a general overview of the clugen algorithm. A complete description of the algorithm’s theoretical framework is available in the article “Generating multidimensional clusters with support lines” (an open version is available on arXiv).
Clugen is an algorithm for generating multidimensional clusters. Each cluster is supported by a line segment, the position, orientation and length of which guide where the respective points are placed. For brevity, line segments will be referred to as lines.
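The idea of a cluster supported by a line segment can be illustrated with a minimal Python sketch. This is not the clugen implementation. The function name `sketch_cluster`, the uniform placement of projections along the line, and the Gaussian lateral offset are all assumptions made for illustration, and the sketch is restricted to two dimensions.

```python
import math
import random


def sketch_cluster(center, direction, length, lateral_sd, num_points, rng):
    """Place points around a 2D line segment (illustrative sketch only)."""
    # Normalize the direction so that `length` is measured in data units.
    norm = math.hypot(direction[0], direction[1])
    ux, uy = direction[0] / norm, direction[1] / norm
    # Unit vector perpendicular to the supporting line.
    px, py = -uy, ux
    points = []
    for _ in range(num_points):
        # Position of the point's projection along the segment,
        # measured from the segment's center (assumed uniform here).
        t = rng.uniform(-length / 2, length / 2)
        # Signed perpendicular offset from the line (assumed Gaussian here).
        w = rng.gauss(0.0, lateral_sd)
        points.append((center[0] + t * ux + w * px,
                       center[1] + t * uy + w * py))
    return points


rng = random.Random(42)  # fixed seed so the sketch is reproducible
pts = sketch_cluster(center=(0.0, 0.0), direction=(1.0, 1.0),
                     length=10.0, lateral_sd=1.0, num_points=50, rng=rng)
```

In this sketch the segment's center, direction and length determine where the projections fall, while `lateral_sd` controls how far points stray from the line.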
Given an \(n\)-dimensional direction vector \(\mathbf{d}\) (and a number of additional parameters, which will be discussed shortly), the clugen algorithm works as follows (\(^*\) means the algorithm step is stochastic):
Figure 1 provides a stylized overview of the algorithm’s steps.
The example in Figure 1 was generated with the following parameters:
Parameter value | Description |
---|---|
\(n=2\) | Number of dimensions. |
\(c=4\) | Number of clusters. |
\(p=200\) | Total number of points. |
\(\mathbf{d}=\begin{bmatrix}1 & 1\end{bmatrix}^T\) | Average direction. |
\(\theta_\sigma=\pi/16=11.25^{\circ}\) | Angle dispersion. |
\(\mathbf{s}=\begin{bmatrix}10 & 10\end{bmatrix}^T\) | Average cluster separation. |
\(l=10\) | Average line length. |
\(l_\sigma=1.5\) | Line length dispersion. |
\(f_\sigma=1\) | Cluster lateral dispersion. |
Additionally, all optional parameters (not listed above) were left at their default values. The complete list of parameters is presented in the clugen() function documentation.
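For reference, the Figure 1 values can be gathered in a short Python snippet, together with a few quantities that follow directly from them. The dictionary keys are hypothetical descriptive names and need not match the argument names of clugen(). The unit-length direction is computed on the assumption that only the orientation of \(\mathbf{d}\) matters, not its magnitude.

```python
import math

# Figure 1 parameter values (key names are illustrative, not the clugen() API).
params = {
    "num_dims": 2,                # n
    "num_clusters": 4,            # c
    "num_points": 200,            # p
    "direction": (1.0, 1.0),      # d
    "angle_disp": math.pi / 16,   # theta_sigma, in radians
    "cluster_sep": (10.0, 10.0),  # s
    "line_length": 10.0,          # l
    "line_length_disp": 1.5,      # l_sigma
    "lateral_disp": 1.0,          # f_sigma
}

# Angle dispersion expressed in degrees (pi/16 rad corresponds to 11.25 degrees).
angle_disp_deg = math.degrees(params["angle_disp"])

# Unit-length version of the average direction d.
d_norm = math.hypot(*params["direction"])
unit_direction = tuple(x / d_norm for x in params["direction"])

# Average number of points per cluster if p points are spread over c clusters.
mean_points_per_cluster = params["num_points"] / params["num_clusters"]
```

With these values, each of the four clusters holds 50 points on average; the actual cluster sizes may differ from that average.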