C# | Implementing the KMeans Clustering Algorithm: Grouping Data Points into Clusters with Similar Features
Posted by: shili8
Published: 2023-07-01 10:01
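KMeans is a widely used unsupervised learning algorithm that partitions data points into clusters whose members have similar features. In this article we implement KMeans in C#, with code examples and comments that walk through each part of the implementation.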
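For reference, here is the standard formulation (often called Lloyd's algorithm) that the implementation in this article follows. With clusters $C_1, \dots, C_k$ and centroids $\mu_1, \dots, \mu_k$, KMeans tries to minimize the within-cluster sum of squared distances $J$ by alternating an assignment step and an update step:

$$
J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2
$$

$$
\text{assignment:}\quad c(x) = \arg\min_{1 \le i \le k} \lVert x - \mu_i \rVert,
\qquad
\text{update:}\quad \mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x
$$

Neither step increases $J$. Moving a point to a nearer centroid can only shrink its term, and the mean is the point that minimizes the sum of squared distances to a cluster's members. This is the usual argument for why the loop eventually stops, although it may stop at a local rather than a global minimum.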
First, we define a class for a data point that stores its features and the cluster it belongs to:
```csharp
public class DataPoint
{
    public double[] Features { get; set; } // the point's feature vector
    public int Cluster { get; set; }       // index of the cluster it is assigned to
}
```
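Next, we implement the core KMeans logic. The algorithm chooses k initial centroids, then repeatedly assigns every data point to its nearest centroid and moves each centroid to the mean of the points assigned to it, stopping once no centroid changes: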
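```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class KMeans
{
    private readonly int k;                      // number of clusters
    private readonly List<DataPoint> dataPoints; // the data points to cluster
    private readonly List<DataPoint> centroids;  // one centroid per cluster

    public KMeans(int k, List<DataPoint> dataPoints)
    {
        if (k < 1 || k > dataPoints.Count)
            throw new ArgumentOutOfRangeException(nameof(k), "k must be between 1 and the number of data points.");
        this.k = k;
        this.dataPoints = dataPoints;
        this.centroids = new List<DataPoint>();
    }

    public void Cluster()
    {
        // Initialize the centroids
        InitializeCentroids();
        bool converged = false;
        while (!converged)
        {
            // Assign each data point to its nearest centroid
            AssignDataPointsToCentroids();
            // Move the centroids; converged is true once none of them changes
            converged = UpdateCentroids();
        }
    }

    private void InitializeCentroids()
    {
        // Randomly pick k distinct data points as the initial centroids.
        // Their features are copied so that updating a centroid later
        // does not overwrite the original data point's features.
        centroids.Clear();
        Random random = new Random();
        var chosen = new HashSet<int>();
        while (centroids.Count < k)
        {
            int index = random.Next(dataPoints.Count);
            if (chosen.Add(index))
            {
                centroids.Add(new DataPoint { Features = (double[])dataPoints[index].Features.Clone() });
            }
        }
    }

    private void AssignDataPointsToCentroids()
    {
        foreach (DataPoint dataPoint in dataPoints)
        {
            double minDistance = double.MaxValue;
            int minIndex = -1;
            for (int i = 0; i < k; i++)
            {
                double distance = CalculateDistance(dataPoint.Features, centroids[i].Features);
                if (distance < minDistance)
                {
                    minDistance = distance;
                    minIndex = i;
                }
            }
            dataPoint.Cluster = minIndex;
        }
    }

    private bool UpdateCentroids()
    {
        bool converged = true;
        int dimensions = dataPoints[0].Features.Length;
        for (int i = 0; i < k; i++)
        {
            List<DataPoint> clusterDataPoints = dataPoints.Where(dp => dp.Cluster == i).ToList();
            // An empty cluster keeps its previous centroid
            if (clusterDataPoints.Count > 0)
            {
                // The new centroid is the per-dimension mean of the cluster's points
                double[] newCentroid = new double[dimensions];
                for (int j = 0; j < dimensions; j++)
                {
                    double sum = 0;
                    foreach (DataPoint dataPoint in clusterDataPoints)
                    {
                        sum += dataPoint.Features[j];
                    }
                    newCentroid[j] = sum / clusterDataPoints.Count;
                }
                if (!centroids[i].Features.SequenceEqual(newCentroid))
                {
                    centroids[i].Features = newCentroid;
                    converged = false;
                }
            }
        }
        return converged;
    }

    // Euclidean distance between two feature vectors
    private double CalculateDistance(double[] features1, double[] features2)
    {
        double sum = 0;
        for (int i = 0; i < features1.Length; i++)
        {
            sum += Math.Pow(features1[i] - features2[i], 2);
        }
        return Math.Sqrt(sum);
    }
}
```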
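The class stops when the centroids no longer move, but it does not report how compact the resulting clusters are. A common measure is the within-cluster sum of squared distances, often called inertia or SSE. The sketch below uses C# top-level statements and a hypothetical `Inertia` helper that is not part of the original class. It works on plain arrays of features, labels and centroids. The sample values are an assumption for illustration: the six points from the usage example, split into the first four and the last two, with each centroid placed at the mean of its cluster.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original KMeans class: the within-cluster
// sum of squared Euclidean distances (often called inertia or SSE).
static double Inertia(IReadOnlyList<double[]> points, IReadOnlyList<int> labels, IReadOnlyList<double[]> centroids)
{
    double total = 0;
    for (int p = 0; p < points.Count; p++)
    {
        double[] centroid = centroids[labels[p]]; // centroid of this point's cluster
        for (int j = 0; j < centroid.Length; j++)
        {
            double diff = points[p][j] - centroid[j];
            total += diff * diff; // squared distance, so no square root here
        }
    }
    return total;
}

// Assumed example: the six sample points, split into the first four and the
// last two, with each cluster's centroid at the mean of its members.
var data = new List<double[]>
{
    new double[] { 1, 2 }, new double[] { 2, 1 }, new double[] { 5, 6 },
    new double[] { 6, 5 }, new double[] { 10, 12 }, new double[] { 12, 10 },
};
int[] assignment = { 0, 0, 0, 0, 1, 1 };
var means = new List<double[]> { new double[] { 3.5, 3.5 }, new double[] { 11, 11 } };

double sse = Inertia(data, assignment, means);
Console.WriteLine(sse); // 38
```

Lower values mean tighter clusters. The measure is mainly useful for comparing clusterings of the same data with the same k, for example across several runs with different random starting centroids.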
Now we can use the `KMeans` class to cluster a set of data points:
```csharp
using System;
using System.Collections.Generic;

List<DataPoint> dataPoints = new List<DataPoint>
{
    new DataPoint { Features = new double[] { 1, 2 } },
    new DataPoint { Features = new double[] { 2, 1 } },
    new DataPoint { Features = new double[] { 5, 6 } },
    new DataPoint { Features = new double[] { 6, 5 } },
    new DataPoint { Features = new double[] { 10, 12 } },
    new DataPoint { Features = new double[] { 12, 10 } }
};

KMeans kMeans = new KMeans(2, dataPoints);
kMeans.Cluster();

foreach (DataPoint dataPoint in dataPoints)
{
    Console.WriteLine($"Data point: [{string.Join(", ", dataPoint.Features)}], Cluster: {dataPoint.Cluster}");
}
```
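The code above creates a list of six data points and uses KMeans to split them into two clusters, then prints each point's features and the cluster it was assigned to. Because the initial centroids are chosen at random, the cluster numbers can differ from run to run.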
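The choice of starting centroids can change more than the cluster numbers. It can change the grouping itself. To make this visible, the sketch below uses C# top-level statements and a hypothetical `RunFrom` helper that is not part of the original code. It runs the same assign/update loop on the same six points, but starts from fixed centroids instead of random ones, so each run is reproducible. The two starting pairs are chosen for illustration.

```csharp
using System;
using System.Linq;

// Hypothetical helper: the same assign/update loop as the KMeans class, but started
// from caller-supplied centroids instead of random ones, so every run is reproducible.
static int[] RunFrom(double[][] points, double[][] initialCentroids)
{
    double[][] centroids = initialCentroids.Select(c => (double[])c.Clone()).ToArray();
    int[] labels = new int[points.Length];
    while (true)
    {
        // Assignment step: nearest centroid by squared Euclidean distance,
        // which ranks centroids in the same order as the distance itself.
        for (int p = 0; p < points.Length; p++)
        {
            double best = double.MaxValue;
            for (int i = 0; i < centroids.Length; i++)
            {
                double d = 0;
                for (int j = 0; j < points[p].Length; j++)
                {
                    double diff = points[p][j] - centroids[i][j];
                    d += diff * diff;
                }
                if (d < best) { best = d; labels[p] = i; }
            }
        }

        // Update step: move each non-empty cluster's centroid to the mean of its members.
        bool converged = true;
        for (int i = 0; i < centroids.Length; i++)
        {
            double[][] members = points.Where((_, idx) => labels[idx] == i).ToArray();
            if (members.Length == 0) continue; // an empty cluster keeps its centroid
            double[] mean = Enumerable.Range(0, centroids[i].Length)
                .Select(dim => members.Average(m => m[dim]))
                .ToArray();
            if (!centroids[i].SequenceEqual(mean)) { centroids[i] = mean; converged = false; }
        }
        if (converged) return labels;
    }
}

double[][] data =
{
    new double[] { 1, 2 }, new double[] { 2, 1 }, new double[] { 5, 6 },
    new double[] { 6, 5 }, new double[] { 10, 12 }, new double[] { 12, 10 },
};

// Start A: seeds far apart, at (1, 2) and (12, 10).
int[] a = RunFrom(data, new[] { new double[] { 1, 2 }, new double[] { 12, 10 } });
// Start B: seeds next to each other, at (1, 2) and (2, 1).
int[] b = RunFrom(data, new[] { new double[] { 1, 2 }, new double[] { 2, 1 } });

Console.WriteLine(string.Join(", ", a)); // 0, 0, 0, 0, 1, 1
Console.WriteLine(string.Join(", ", b)); // 0, 1, 0, 1, 0, 1
```

Start A separates the lower-left group from the upper-right pair. Start B gets stuck in a diagonal split whose within-cluster sum of squared distances is about 182.7, versus 38 for A. Because the `KMeans` class picks its starting points with `Random`, it can also end up in a split like B. Common remedies are to run the clustering several times and keep the result with the lowest sum of squared distances, or to use a smarter seeding scheme such as k-means++.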
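We hope this article helps you understand and implement the KMeans clustering algorithm. Note that the code above is meant for demonstration. It has no iteration limit and uses a single random initialization, so you may need to adapt and optimize it for real-world data.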