C# | Implementing the KMeans Clustering Algorithm: Easily Group Data Points into Clusters with Similar Features

Posted by: shili8 | Published: 2023-07-01 10:01

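KMeans is a widely used unsupervised learning algorithm that groups data points into clusters with similar features. In this article we implement KMeans in C# and walk through the code, with examples and comments, to help you follow how the implementation works.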

First, we define a class for a data point. It stores the point's feature vector and the index of the cluster the point belongs to:

```csharp
public class DataPoint
{
    public double[] Features { get; set; } // feature vector of the point
    public int Cluster { get; set; }       // index of the cluster it is assigned to
}
```


Next, we implement the core logic of the KMeans algorithm:

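```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class KMeans
{
    private readonly int k;                      // number of clusters
    private readonly List<DataPoint> dataPoints; // data points to cluster
    private readonly List<DataPoint> centroids;  // current centroids

    public KMeans(int k, List<DataPoint> dataPoints)
    {
        if (dataPoints == null || dataPoints.Count == 0)
            throw new ArgumentException("At least one data point is required.", nameof(dataPoints));
        if (k < 1 || k > dataPoints.Count)
            throw new ArgumentOutOfRangeException(nameof(k), "k must be between 1 and the number of data points.");

        this.k = k;
        this.dataPoints = dataPoints;
        this.centroids = new List<DataPoint>();
    }

    public void Cluster()
    {
        // Initialize the centroids
        InitializeCentroids();

        bool converged = false;
        while (!converged)
        {
            // Assign each data point to its nearest centroid
            AssignDataPointsToCentroids();

            // Move the centroids; stop once none of them changes
            converged = UpdateCentroids();
        }
    }

    private void InitializeCentroids()
    {
        // Pick k distinct data points at random as the initial centroids.
        // Each centroid gets its own copy of the feature array, so moving a
        // centroid later never overwrites the data point it was seeded from.
        centroids.Clear();
        Random random = new Random();
        IEnumerable<int> indices = Enumerable.Range(0, dataPoints.Count)
            .OrderBy(_ => random.Next())
            .Take(k);

        foreach (int index in indices)
        {
            double[] features = (double[])dataPoints[index].Features.Clone();
            centroids.Add(new DataPoint { Features = features });
        }
    }

    private void AssignDataPointsToCentroids()
    {
        foreach (DataPoint dataPoint in dataPoints)
        {
            double minDistance = double.MaxValue;
            int minIndex = -1;

            for (int i = 0; i < k; i++)
            {
                double distance = CalculateDistance(dataPoint.Features, centroids[i].Features);
                if (distance < minDistance)
                {
                    minDistance = distance;
                    minIndex = i;
                }
            }

            dataPoint.Cluster = minIndex;
        }
    }

    private bool UpdateCentroids()
    {
        bool converged = true;
        int dimensions = dataPoints[0].Features.Length;

        for (int i = 0; i < k; i++)
        {
            List<DataPoint> clusterDataPoints = dataPoints.Where(dp => dp.Cluster == i).ToList();

            // An empty cluster keeps its previous centroid
            if (clusterDataPoints.Count > 0)
            {
                double[] newCentroid = new double[dimensions];

                // The new centroid is the per-feature mean of the cluster's points
                for (int j = 0; j < dimensions; j++)
                {
                    double sum = 0;
                    foreach (DataPoint dataPoint in clusterDataPoints)
                    {
                        sum += dataPoint.Features[j];
                    }

                    newCentroid[j] = sum / clusterDataPoints.Count;
                }

                if (!centroids[i].Features.SequenceEqual(newCentroid))
                {
                    centroids[i].Features = newCentroid;
                    converged = false;
                }
            }
        }

        return converged;
    }

    // Euclidean distance between two feature vectors of equal length
    private double CalculateDistance(double[] features1, double[] features2)
    {
        double sum = 0;
        for (int i = 0; i < features1.Length; i++)
        {
            sum += Math.Pow(features1[i] - features2[i], 2);
        }

        return Math.Sqrt(sum);
    }
}
```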
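
The assignment step only needs to know which centroid is closest, not how far away it is, so the square root can be skipped without changing the result. The following is a minimal sketch of that idea and is not part of the class above: `NearestCentroid`, `SquaredDistance` and `IndexOfNearest` are hypothetical names, the helper works on plain arrays, and the length check is an added assumption about how mismatched inputs should be handled.

```csharp
using System;

public static class NearestCentroid
{
    // Squared Euclidean distance. It orders candidates exactly as the
    // true distance does, so it is enough for finding the nearest one.
    public static double SquaredDistance(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Feature vectors must have the same length.");

        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the index of the centroid closest to the point.
    // On a tie, the centroid with the lower index wins.
    public static int IndexOfNearest(double[] point, double[][] centroids)
    {
        int best = -1;
        double bestDistance = double.MaxValue;

        for (int i = 0; i < centroids.Length; i++)
        {
            double distance = SquaredDistance(point, centroids[i]);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = i;
            }
        }
        return best;
    }
}
```

For example, with centroids at (1.5, 1.5) and (8.25, 8.25), the point (5, 6) has squared distances 32.5 and 15.625, so `IndexOfNearest` returns 1.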


Now we can use the code above to cluster some data points:

```csharp
// Run inside Main or as top-level statements;
// requires using System and System.Collections.Generic.
List<DataPoint> dataPoints = new List<DataPoint>
{
    new DataPoint { Features = new double[] { 1, 2 } },
    new DataPoint { Features = new double[] { 2, 1 } },
    new DataPoint { Features = new double[] { 5, 6 } },
    new DataPoint { Features = new double[] { 6, 5 } },
    new DataPoint { Features = new double[] { 10, 12 } },
    new DataPoint { Features = new double[] { 12, 10 } }
};

KMeans kMeans = new KMeans(2, dataPoints);
kMeans.Cluster();

foreach (DataPoint dataPoint in dataPoints)
{
    Console.WriteLine($"Data point: [{string.Join(", ", dataPoint.Features)}] Cluster: {dataPoint.Cluster}");
}
```


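In the code above, we create a list of six data points and use KMeans to split them into two clusters, then print each point's features and the cluster it was assigned to. Because the initial centroids are chosen at random, the cluster indices, and sometimes the grouping itself, can differ from one run to the next.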
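
Printing one line per point makes it hard to see the clusters at a glance. As a sketch, the points can instead be grouped by their `Cluster` value with LINQ once clustering has finished. `ClusterReport.Describe` is a hypothetical helper, not part of the code above, and the `DataPoint` class is repeated only so the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class DataPoint
{
    public double[] Features { get; set; }
    public int Cluster { get; set; }
}

public static class ClusterReport
{
    // Builds one text line per cluster, ordered by cluster index.
    public static List<string> Describe(IEnumerable<DataPoint> points)
    {
        return points
            .GroupBy(p => p.Cluster)
            .OrderBy(g => g.Key)
            .Select(g => $"Cluster {g.Key}: " +
                         string.Join("; ", g.Select(p => "[" + string.Join(", ", p.Features) + "]")))
            .ToList();
    }
}
```

After `kMeans.Cluster()` has run, calling `ClusterReport.Describe(dataPoints)` and writing each returned line to the console gives one line per cluster instead of one per point.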

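We hope this article helps you understand and implement the KMeans clustering algorithm. Note that the code above is for demonstration only and may need to be adapted and optimized for real-world use.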
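
One adaptation that is often useful: with randomly chosen starting centroids, separate runs can end in different clusterings, and a common way to compare them is the within-cluster sum of squared distances, where a lower value means tighter clusters. The sketch below shows one way this could be computed. `ClusterQuality.WithinClusterSumOfSquares` is a hypothetical name, it takes plain arrays rather than `DataPoint` objects, and it recomputes each cluster's mean from the assignments instead of reading centroids from the `KMeans` class, which keeps them private.

```csharp
using System;
using System.Linq;

public static class ClusterQuality
{
    // features[i] is the feature vector of point i;
    // clusters[i] is the cluster index assigned to point i.
    public static double WithinClusterSumOfSquares(double[][] features, int[] clusters)
    {
        if (features.Length != clusters.Length)
            throw new ArgumentException("Each point needs exactly one cluster index.");

        double total = 0;

        // Group point indices by their cluster index
        foreach (var group in Enumerable.Range(0, features.Length).GroupBy(i => clusters[i]))
        {
            int[] members = group.ToArray();
            int dimensions = features[members[0]].Length;

            for (int j = 0; j < dimensions; j++)
            {
                // Mean of feature j within this cluster
                double mean = members.Average(i => features[i][j]);

                // Add the squared deviations from that mean
                total += members.Sum(i => (features[i][j] - mean) * (features[i][j] - mean));
            }
        }

        return total;
    }
}
```

With the article's types, the inputs could be built as `dataPoints.Select(p => p.Features).ToArray()` and `dataPoints.Select(p => p.Cluster).ToArray()` after clustering, and the value compared across several runs or several choices of k.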
