Hierarchical Clustering

It often makes sense to observe data by grouping them, for example, by finding groups of similar countries, images or people. The popular hierarchical clustering can be experience with a kinesthetic activity described here. To lead somewhere, it can be followed by one of activities in which we cluster real data.

Preparation of “data”

Draw the axes of the scatter diagram on the floor of the classroom or a suitably sized room (you will need about 3x3 metres) using painter’s tape. If we don’t have tape, we will just wave around a bit more.
We will place the students in a scatter diagram according to their math knowledge and football skills. We can give them random names, but we arrange them in such a way that we get three groups and a lone student. (Let’s make it fun: we declare a typical football player a mathematician, or vice versa. Of course, let’s make sure we don’t offend anyone.)
In the example above, we have two excellent footballers, Emma and Hannah, with Emma not being particularly good at math but Hannah not being much better either. Fanny, for example, excels at both. Ivy, on the other hand, is an excellent mathematician with two left feet for football.
We ask students this: if we were to divide students into groups according to their football and math skills, how many groups do we have? Here, probably three: footballers, mathematicians and those who are both. And we have Helga, who is not good at any of these.
Kako pa bi skupine določili z računalnikom? Računalnik ne more kar “videti skupin”, temveč potrebuje postopek. Odvisno od starosti in računalniškega znanja učencev, si morda lahko zamislijo kak algoritem - morda pa tudi ne. :)
But how would we use a computer to determine the groups? Computer cannot simply “see” the groups, like we can; it needs a procedure. Depending upon the students age and algorithmic skill, they may come up with an algorithm - or maybe not. :)

Clustering process

The below text describes the process for the specific arrangement from the image above. In the classroom, the names will be different, but the process will be the same.

At the beginning, we say that each student is a group on their own.
In the first step, we will group the two students who are most similar. Using a tape measure, we measure the distances (at least between those who are a little closer, for example Ginny and Cecilia, Ginny and Ivy, and Hannah and Benjamin). Hannah and Benjamin are the closest, so they will be in the same group. To remember this, Hannah puts her hand on Benjamin’s shoulder.
In the next step, we again search for the nearest pair. We measure and find that it is probably Ginny and Cecilia. Ginny puts her hand on Cecilia’s shoulder.
The next closest couple are Ginny and Ivy. So Ginny will put her hand on Ivan’s shoulder (or vice versa). That makes three students in this group.
We continue to do so. Sometimes we merge individuals, sometimes we add individuals to groups, towards the end we merge whole groups. It can get a bit tricky with the hands; it may happen that the person who is supposed to join the groups has both hands already occupied. Reorganize; instead of Ginny having her hand on Ivy’s shoulder, Ivy puts her on Ginny’s shoulder, freeing on of Ginny’s hands. Hands cannot run out.
Keep doing this as long as you can - either until the end, or until the arms get too short. Both are appropriate for explaining the algorithm. If we go to the end, we explain that we have obtained a hierarchy of groups. If we finish a little earlier, we say that the grouping stops when the groups get too big.

How does a computer do this?

It does the same, but some details cannot be swept under the rug.

Here we use measure distances as the crow flies. In practice, there are different definitions of distances, and we also need to be careful with scales. Body height in metres (between 1 and 2) is not icommensurable with weight in kilograms, which is between 30 and 100. They need to be scaled. Categorical variables, such as eye colour, should also be treated appropriately.

The second detail is the distance between the groups. How far apart are the two groups? As far as its nearest members or its furthest members? Or do we calculate the average distance? This is called the linkage function. In this demonstration we measure the closest distance, but in practice a more complicated definition - rouglhly the average distance - works much better.

This demonstration will typically be part of the processing of some concrete data and will continue by demonstrating the process in the Hierarchical Clustering widget.

Subject:
Age:
AI topic:

Preparation for the lesson
- painter's tape for marking the axis of the graph on the floor
- tape measure