Class KMeansPlusPlusClusterer<T extends Clusterable<T>>
- java.lang.Object
  - org.apache.commons.math.stat.clustering.KMeansPlusPlusClusterer<T>
- Type Parameters:
  T - type of the points to cluster
public class KMeansPlusPlusClusterer<T extends Clusterable<T>> extends java.lang.Object
Clustering algorithm based on the k-means++ algorithm of David Arthur and Sergei Vassilvitskii.
- Since:
  2.0
- See Also:
  K-means++ (wikipedia)
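The k-means++ idea referenced above is mainly about how the initial centers are chosen: the first center is drawn uniformly, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. The following is a minimal, self-contained sketch of that seeding step under stated assumptions. It works on plain `double[]` points rather than `Clusterable<T>`, and the class and method names (`KMeansPlusPlusSeedingSketch`, `chooseInitialCenters`, `squaredDistance`) are hypothetical. It is not the Commons Math source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch of k-means++ seeding; not the Commons Math source.
class KMeansPlusPlusSeedingSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Picks k entries of points (distinct indices) as initial centers using D^2 weighting. */
    static List<double[]> chooseInitialCenters(List<double[]> points, int k, Random random) {
        int n = points.size();
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        List<double[]> centers = new ArrayList<>();
        boolean[] taken = new boolean[n];

        // The first center is drawn uniformly at random.
        int first = random.nextInt(n);
        centers.add(points.get(first));
        taken[first] = true;

        // minDistSq[i] = squared distance from point i to its nearest chosen center.
        double[] minDistSq = new double[n];
        for (int i = 0; i < n; i++) {
            minDistSq[i] = squaredDistance(points.get(i), points.get(first));
        }

        while (centers.size() < k) {
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                if (!taken[i]) total += minDistSq[i];
            }
            // Draw a free point with probability proportional to minDistSq.
            double r = random.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = -1;
            for (int i = 0; i < n; i++) {
                if (taken[i]) continue;
                chosen = i; // falls back to the last free index on rounding error
                cumulative += minDistSq[i];
                if (cumulative >= r) break;
            }
            centers.add(points.get(chosen));
            taken[chosen] = true;
            // Update nearest-center distances with the newly chosen center.
            for (int i = 0; i < n; i++) {
                double d = squaredDistance(points.get(i), points.get(chosen));
                if (d < minDistSq[i]) minDistSq[i] = d;
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0, 0});
        points.add(new double[] {0, 1});
        points.add(new double[] {10, 10});
        points.add(new double[] {10, 11});
        List<double[]> centers = chooseInitialCenters(points, 2, new Random(42));
        System.out.println("chosen centers: " + centers.size());
    }
}
```

Passing a seeded `java.util.Random` makes the selection reproducible, which is presumably why the clusterer takes the generator as a constructor argument.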
-
Nested Class Summary
Modifier and Type | Class | Description
static class | KMeansPlusPlusClusterer.EmptyClusterStrategy | Strategies to use for replacing an empty cluster.
-
Constructor Summary
Constructor | Description
KMeansPlusPlusClusterer(java.util.Random random) | Build a clusterer.
KMeansPlusPlusClusterer(java.util.Random random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy) | Build a clusterer.
-
Method Summary
Modifier and Type | Method | Description
java.util.List<Cluster<T>> | cluster(java.util.Collection<T> points, int k, int maxIterations) | Runs the K-means++ clustering algorithm.
-
Constructor Detail
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(java.util.Random random)
Build a clusterer.
The default strategy for handling empty clusters that may appear during algorithm iterations is to split the cluster with the largest distance variance.
- Parameters:
  random - random generator to use for choosing initial centers
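To make the default empty-cluster strategy more concrete, here is a hedged sketch of how "the cluster with the largest distance variance" could be identified. It assumes that "distance variance" means the variance of the distances from each point in a cluster to that cluster's center, and it uses the population variance (dividing by the number of points); the library may use a different estimator. The names `LargestVarianceSketch`, `distanceVariance`, and `indexOfLargestVariance` are hypothetical.

```java
import java.util.List;

// Illustrative sketch only; the variance definition used here is an assumption.
class LargestVarianceSketch {

    /** Population variance of the Euclidean distances from each point to the center. */
    static double distanceVariance(List<double[]> points, double[] center) {
        int n = points.size();
        if (n == 0) {
            return 0.0;
        }
        double[] dist = new double[n];
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            double[] p = points.get(i);
            double sumSq = 0.0;
            for (int j = 0; j < center.length; j++) {
                double d = p[j] - center[j];
                sumSq += d * d;
            }
            dist[i] = Math.sqrt(sumSq);
            mean += dist[i] / n;
        }
        // Second pass: average squared deviation from the mean distance.
        double variance = 0.0;
        for (double d : dist) {
            variance += (d - mean) * (d - mean) / n;
        }
        return variance;
    }

    /** Index of the cluster whose distance variance is largest (first wins on ties). */
    static int indexOfLargestVariance(List<List<double[]>> clusters, List<double[]> centers) {
        int best = 0;
        double bestVariance = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < clusters.size(); c++) {
            double v = distanceVariance(clusters.get(c), centers.get(c));
            if (v > bestVariance) {
                bestVariance = v;
                best = c;
            }
        }
        return best;
    }
}
```

Once such a cluster is found, one of its points can be moved out to seed the empty cluster; how the library picks that point is not stated on this page.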
-
KMeansPlusPlusClusterer
public KMeansPlusPlusClusterer(java.util.Random random, KMeansPlusPlusClusterer.EmptyClusterStrategy emptyStrategy)
Build a clusterer.
- Parameters:
  random - random generator to use for choosing initial centers
  emptyStrategy - strategy to use for handling empty clusters that may appear during algorithm iterations
- Since:
  2.2
-
Method Detail
-
cluster
public java.util.List<Cluster<T>> cluster(java.util.Collection<T> points, int k, int maxIterations)
Runs the K-means++ clustering algorithm.
- Parameters:
  points - the points to cluster
  k - the number of clusters to split the data into
  maxIterations - the maximum number of iterations to run the algorithm for. If negative, no maximum will be used.
- Returns:
  a list of clusters containing the points
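The `maxIterations` contract described above (a negative value means no iteration limit) can be illustrated with a small stand-alone sketch of the assign-and-recompute loop. This is an assumption-laden illustration, not the library's code: it uses `double[][]` instead of `Clusterable<T>`, takes the initial centers as an argument, stops when assignments no longer change, and simply keeps the previous center for an empty cluster instead of applying an `EmptyClusterStrategy`. The names `LloydIterationSketch`, `nearest`, `assignAll`, and `recomputeCenters` are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch of the iteration loop; not the Commons Math source.
class LloydIterationSketch {

    /** Index of the center nearest to p (squared Euclidean distance, first wins on ties). */
    static int nearest(double[][] centers, double[] p) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int j = 0; j < p.length; j++) {
                double d = p[j] - centers[c][j];
                dist += d * d;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Assigns every point to its nearest center. */
    static int[] assignAll(double[][] points, double[][] centers) {
        int[] assignment = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            assignment[i] = nearest(centers, points[i]);
        }
        return assignment;
    }

    /** Mean of each cluster; an empty cluster keeps its previous center (a simplification). */
    static double[][] recomputeCenters(double[][] points, int[] assignment, double[][] old) {
        int dim = old[0].length;
        double[][] sums = new double[old.length][dim];
        int[] counts = new int[old.length];
        for (int i = 0; i < points.length; i++) {
            counts[assignment[i]]++;
            for (int j = 0; j < dim; j++) {
                sums[assignment[i]][j] += points[i][j];
            }
        }
        for (int c = 0; c < old.length; c++) {
            for (int j = 0; j < dim; j++) {
                sums[c][j] = (counts[c] == 0) ? old[c][j] : sums[c][j] / counts[c];
            }
        }
        return sums;
    }

    /** Returns the cluster index of each point after at most maxIterations refinements. */
    static int[] cluster(double[][] points, double[][] initialCenters, int maxIterations) {
        // A negative limit means "iterate until the assignment stops changing".
        final int max = (maxIterations < 0) ? Integer.MAX_VALUE : maxIterations;
        double[][] centers = initialCenters;
        int[] assignment = assignAll(points, centers);
        for (int iter = 0; iter < max; iter++) {
            centers = recomputeCenters(points, assignment, centers);
            int[] next = assignAll(points, centers);
            if (Arrays.equals(next, assignment)) {
                break; // converged
            }
            assignment = next;
        }
        return assignment;
    }
}
```

With the library itself, the corresponding call has the shape `clusterer.cluster(points, k, -1)` to run until convergence, or a non-negative third argument to cap the number of iterations.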
-