Data Mining Through Cluster Analysis with Python

Learn about an important data mining technique, cluster analysis


This beginner course is ideal for anyone interested in data mining. You should have at least beginner-to-intermediate Python skills, since the course focuses on cluster analysis and much less on programming.

Most data in the world (whether text, audio, visual, etc.) is raw or unlabeled. This is precisely why unsupervised machine learning has become so important. By using certain approaches to unsupervised machine learning, such as clustering, we can discover patterns or underlying structures in data. This is a major component of exploratory data mining, which is used to draw hypotheses, assess the assumptions behind our statistical inferences, and provide a basis for further research. For example, the conclusion of a cluster analysis could result in the initiation of a full-scale experiment.

The course covers two of the most important and common non-hierarchical clustering algorithms, K-means and DBSCAN, using Python.
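To give a feel for what running K-means in Python looks like, here is a minimal sketch. It assumes scikit-learn and NumPy are installed; the synthetic dataset, the cluster centers, and the parameter values are illustrative choices, not material taken from the course.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, unlabeled-style data: 300 points scattered around three
# well-separated centers (the centers and spread are assumed values).
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# K-means needs the number of clusters up front; here we ask for three.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Inspect the learned cluster centers and how many points fell in each cluster.
print(kmeans.cluster_centers_)
print(np.bincount(labels))
```

Note that the number of clusters has to be chosen before fitting, which is one of the practical differences between K-means and DBSCAN.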

With K-means, we start with a simple ‘starter’ example. We then discuss the ‘Completeness Score’. In the next lesson, we discuss how K-means deals with clusters of larger variance and different shapes. Then we discuss ‘Color Quantization’, which is used to decrease the size of an image and/or to see whether an image has any underlying structure. Finally, we take a look at cells of the human body and do some cell segmentation. For DBSCAN, we also begin with a starter example, using blobs. Then I will show you how DBSCAN overcomes some of the issues of K-means.
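As a rough preview of the K-means versus DBSCAN comparison, the sketch below clusters two interleaved half-moon shapes with both algorithms and compares their completeness scores. It assumes scikit-learn is installed; the dataset (make_moons) and the values of eps and min_samples are illustrative assumptions, not the examples used in the course.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import completeness_score

# Two interleaved half-moons: a non-spherical shape that K-means tends to
# split incorrectly (dataset and noise level are assumed for illustration).
X, y_true = make_moons(n_samples=400, noise=0.05, random_state=0)

# K-means partitions the points around two centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN groups points that are densely connected; eps and min_samples are
# assumed values that suit this particular dataset. Label -1 marks noise.
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Completeness is 1.0 when all members of each true class end up in the
# same cluster, and lower when a class is split across clusters.
kmeans_score = completeness_score(y_true, kmeans_labels)
dbscan_score = completeness_score(y_true, dbscan_labels)

print(f"K-means completeness: {kmeans_score:.2f}")
print(f"DBSCAN completeness:  {dbscan_score:.2f}")
```

On data shaped like this, DBSCAN would be expected to score noticeably higher, because it follows the density of each moon instead of assuming compact, roughly round clusters.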

If you are interested in data mining, and want to get a taste of how it works, this course is a great introduction!