MINERÍA / August 2022 / Issue 539

Abstract

The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. The actual data mining task is the automatic or semi-automatic analysis of large amounts of data to extract interesting patterns that were previously unknown, or that cannot be detected by inspection with traditional statistics, such as groups of data records (cluster analysis), unusual records (anomaly detection) and dependencies (association rule mining). Artificial intelligence methods, machine learning and database systems are used for this purpose.

This technical paper uses the K-Means machine learning algorithm, a clustering method which, by fitting a model, makes it possible to determine the optimal number of clusters. Here it is used, for example, to establish whether there are one or more clusters in the oxide content of the ore and, by determining the limit values between clusters, what their impact is on the recovery of valuable mineral in a copper flotation process.

Clustering is a technique for finding and classifying K groups of data (clusters). Elements that share similar characteristics are grouped together in the same cluster and separated from the other clusters, with which they do not share those characteristics. The K-means algorithm uses the distance between data points to decide whether they are similar or different: similar observations have a smaller distance between them. The Euclidean distance is generally used as the measure, although other functions can be used depending on the type of data, such as the Manhattan, Minkowski, Chebyshev and Mahalanobis distances.

It should be noted that this work only applies the K-means algorithm to identify the impact groups present in the %Tox and their relationship with the recovery of a copper ore.
The next step would be to keep feeding results back into the model to raise its current accuracy of 65% and progressively improve its predictive power. This is referred to here as causal inference, but it is not the objective of this work.
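To make the clustering idea in the abstract concrete, the following is a minimal sketch, not the implementation used in the paper. It runs K-means (Lloyd's algorithm) on a single variable with the Euclidean distance and then derives the limit values between neighbouring clusters as the midpoints between adjacent centroids. The function names `kmeans_1d` and `cluster_limits`, the choice of K = 3, the deterministic starting centroids and the `tox` values are all assumptions made up for illustration; they are not data or results from the study.

```python
def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on scalar data; returns (centroids, labels)."""
    data = sorted(values)
    n = len(data)
    # Deterministic start (an assumption of this sketch):
    # one starting centroid taken from each equal-sized slice of the sorted data.
    centroids = [data[int((i + 0.5) * n / k)] for i in range(k)]

    def assign(cs):
        # In one dimension the Euclidean distance reduces to |x - c|.
        return [min(range(k), key=lambda j: abs(x - cs[j])) for x in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [x for x, lab in zip(values, labels) if lab == j]
            # A cluster that received no points keeps its previous centroid.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break  # centroids stopped moving: converged
        centroids = updated
    return centroids, assign(centroids)


def cluster_limits(centroids):
    """Limit values between neighbouring clusters: midpoints of adjacent centroids."""
    ordered = sorted(centroids)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]


# Hypothetical oxide-content readings, invented for this example only.
tox = [2.1, 2.4, 2.8, 3.0, 7.9, 8.3, 8.8, 9.1, 14.6, 15.2, 15.9]

centroids, labels = kmeans_1d(tox, k=3)
limits = cluster_limits(centroids)

print("centroids:", centroids)
print("labels:   ", labels)
print("limits:   ", limits)
```

With the Euclidean distance on one variable, a point belongs to whichever centroid is nearest, so the midpoint between two adjacent centroids is exactly where membership switches from one cluster to the next. Each cluster could then be compared against the measured recovery to assess its impact; with a different distance function or more than one variable the boundaries would no longer be simple midpoints.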