Clustering Analysis of Customers Based on Purchasing Patterns with K-Means Clustering

Wayne Joel Marcelino Lubis

Abstract

There are various techniques for grouping data, one of which is clustering. Unlike classification, clustering does not rely on labels in the dataset. Its main purpose is to divide data into clusters whose members share similar characteristics, whereas classification groups data according to the known labels of the data under study. In this study, the dataset was built from secondary data obtained from Kaggle. The analysis begins with data pre-processing to normalize the variables used, followed by the application of K-Means Clustering to group customers into several clusters based on the similarity of their purchasing patterns. This research demonstrates the potential of clustering analysis to improve the understanding of customer behavior and to support the development of more effective business strategies.
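The two-stage pipeline described in the abstract (normalize the variables, then run K-Means) can be sketched in plain Python as follows. This is a minimal illustration, not the author's code: it assumes min-max normalization (the abstract does not say which normalization was used), uses a hypothetical toy dataset of two purchasing features instead of the Kaggle data, and initializes centroids by sampling k customers at random.

```python
import random


def min_max_normalize(rows):
    """Rescale every column to [0, 1]. Assumed method; the study's exact normalization is unspecified."""
    cols = list(zip(*rows))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    # A constant column carries no information, so it is mapped to 0.0 to avoid division by zero.
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and update steps until the labels stop changing."""
    rng = random.Random(seed)
    # Initial centroids: k distinct customers chosen at random (a simple, assumed initialization).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the centroid with the smallest squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members (empty clusters keep their centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data: [annual spend, purchases per year] for six customers.
customers = [[100, 2], [120, 3], [110, 2], [900, 20], [950, 22], [880, 19]]
labels, centroids = kmeans(min_max_normalize(customers), k=2)
print(labels)  # customers with similar spend and frequency share a cluster label
```

Normalizing first matters because K-Means compares customers by Euclidean distance: without rescaling, annual spend (hundreds) would dominate purchase frequency (tens). The choice of k=2 here is arbitrary, since the abstract does not state how many clusters were used; in practice a library implementation such as scikit-learn's `KMeans` would typically replace this hand-written loop.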
