In this activity, we get to know a simple example of a recommendation system. Students find the activity attractive because of its topic, picking their favourite cartoons, and at the same time they are curious about how video recommendation websites actually work. We can also use the activity as a starting point for a discussion about how such systems, which recommend not just videos but also news and social media posts, tend to trap us in bubbles by showing us only the type of content we already know and like.
The activity has a mathematical foundation. If we carry it out with students older than ten or eleven, we can also incorporate set operations (intersections, unions); with younger students, on the other hand, we calculate the sizes of intersections without using the mathematical terminology. In the activity, students also draw a similarity graph, a kind of sociogram, but again, we don’t need to use these terms.
One way to do the activity is to start with the students selecting the cards with their favourite cartoons and then, divided into groups, have each group create a recommendation system on a piece of paper. We then move to the computer and continue with a recommendation system based on the data for the entire class. In this version, the activity takes two school hours (i.e. 90 minutes). If we want to make it shorter, we can skip the first part and start directly with the students entering their data into the computers (or tablets). That, however, makes the activity less fun and less educational, since the students then don’t construct a recommendation system themselves but only observe it.
To introduce the topic of recommendation systems, we engage the students in a discussion about their experiences with video websites – we ask them whether they use such sites and how they think these websites work. They will probably say that the sites recommend them videos that are similar to those they typically watch. We can then ask them how they think a website like that decides which videos are similar to each other. If the students reply that it does that based on descriptions next to the videos, we can argue that those descriptions are typically really short. But more importantly – how does the computer compare them? Are two descriptions similar if they contain similar words?
We promise to show the students how a video recommendation system that doesn’t rely on such descriptions could work.
Before bringing the activity into the classroom, we set up a data entry form on the data entry website (https://data.pumice.si/cartoons). We decide on the number of cartoons that each student will need to select (the default setting of 7 should be suitable for the fourth grade, i.e. students aged 11). Once the form is set up, we get a link, for instance https://data.pumice.si/princess-dragon. We enter this link into the tablets (or computers) that the students will use to enter their data.
You can test the data entry site yourself beforehand by submitting some random selections, or using an example containing fictitious data. For the work in the classroom, you should then create a new form with a new link.
We first divide the students into groups of approximately eight.
Each group receives an A3-sized sheet of paper and cards with images of cartoons; a good idea is to print out two sets of cards for each group, so that each cartoon appears twice. Each student also receives another piece of paper (A5 should do).
Each student selects the 7 cartoons that they like the most. On the left side of their A5 sheet, they write down the names and reference numbers of their selected cartoons. (The same cartoon can be chosen by more than one student. We mention this because in one group, “the girls took all the coolest cartoons”, so there were none left for the boys. :))
If we decide to do the activity with older students, we can allow them to pick a different number of cartoons, for example 6-9. In that case, the activity also includes calculating intersections and unions, as well as practising division and decimal numbers. Details for this second version are provided below (section Levelling Up).
On the right side of their A5 sheet, each student writes down the names of all the other students in their group, and next to each of those names, how many of the same cartoons as them they have chosen. If it makes it easier for them, they can first write down the reference numbers of the cartoons they have in common with each student, and then count them.
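For teachers who want to check the counting step on a computer, here is a minimal Python sketch. The student names and reference numbers are invented for illustration, and the helper `common_counts` is a hypothetical name, not part of any tool used in the activity.

```python
# Hypothetical picks: each student's seven cartoon reference numbers.
picks = {
    "Ana": {1, 4, 7, 9, 12, 15, 20},
    "Ben": {1, 4, 8, 9, 13, 15, 21},
    "Cilka": {2, 4, 7, 9, 12, 16, 20},
    "Dan": {3, 5, 6, 10, 11, 14, 20},
}


def common_counts(me, picks):
    """For every other student, count the cartoons they share with `me`."""
    return {
        other: len(picks[me] & chosen)  # size of the intersection
        for other, chosen in picks.items()
        if other != me
    }


ana_counts = common_counts("Ana", picks)
print(ana_counts)  # what Ana would write on the right side of her sheet
```

The `&` operator on sets does exactly what the students do by hand: it keeps only the reference numbers that appear on both lists.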
Then, each student circles the three members of their group who have the most cartoons in common with them. If several classmates are tied for a place, the student simply picks any of them; what matters is that in the end, exactly three names are circled.
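The circling step, including one possible way of handling ties, could be sketched as follows. The counts and the function name `three_most_similar` are made up for this example; resolving ties by the order in which the names were written down is just one arbitrary choice among many.

```python
def three_most_similar(counts):
    """Return three names with the highest counts.

    Python's sort is stable, so classmates with equal counts stay in the
    order in which they were written down; ties are thus resolved
    arbitrarily but predictably.
    """
    ranked = sorted(counts, key=lambda name: counts[name], reverse=True)
    return ranked[:3]


# Hypothetical counts from one student's sheet.
counts = {"Ben": 4, "Cilka": 5, "Dan": 1, "Eva": 4, "Filip": 4, "Gal": 2, "Hana": 0}
circled = three_most_similar(counts)
print(circled)
```

Here Ben, Eva and Filip are tied with four common cartoons each, but only two of them fit into the circled three.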
Each group then takes their A3 sheet and writes on it the names of all the group members. After that, each member draws arrows pointing from their name to the 3 fellow members of the group who are most similar to them in their cartoon choices. We encourage the groups to make the drawing (a “sociogram”) as neat as possible, i.e. by writing the names of the students that are most similar to each other closer together, and having the arrows cross as little as possible.
Next to their own name, each group member also writes the reference numbers of the cartoons they have selected, in blue.
Finally, each group member writes, in red, the reference numbers of the cartoons selected by the three students who are most similar to them in their cartoon picks. If the same cartoon has been selected by two of the students that the group member is connected with, they put a circle around it; if it has been chosen by all three, they circle it twice.
The circled cartoons are each student’s personalised recommendations, and the students can look them up. Do they like the selection? Does the recommendation system work?
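The paper procedure can be mirrored in a few lines of Python. This is only a sketch with invented data; it also assumes that cartoons the student has already picked are left out of the recommendations, which the paper version does not state explicitly.

```python
from collections import Counter


def neighbour_votes(own, neighbours_picks):
    """Count how many of the neighbours chose each cartoon.

    Assumption of this sketch: cartoons the student already picked
    are skipped, since recommending them would add nothing new.
    """
    votes = Counter()
    for chosen in neighbours_picks:
        votes.update(chosen - own)
    return votes


# Hypothetical data: one student's picks and those of their three neighbours.
own = {1, 4, 7, 9, 12, 15, 20}
neighbours = [
    {1, 4, 8, 9, 13, 15, 21},
    {2, 4, 7, 9, 12, 16, 20},
    {1, 8, 9, 12, 16, 20, 22},
]

votes = neighbour_votes(own, neighbours)
circled_once = sorted(n for n, v in votes.items() if v == 2)
circled_twice = sorted(n for n, v in votes.items() if v == 3)
print(circled_once, circled_twice)
```

A cartoon with two votes corresponds to one circle on paper, and a cartoon with three votes to two circles.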
When the students are far enough along with their work (for example, when they are drawing on the A3 sheet), we hand the groups the tablets on which we have entered the link to the data entry form. The students search for and select their cartoons.
Once they are done, the teacher transfers the data to their own computer; the link is the same as the data entry link, except it has “data” at the end, for example: http://data.pumice.si/princess-dragon/data. (The teacher can already try this at home beforehand, using this example of a similar kind of archive: https://pumice.si/aktivnosti/risanke/resources/princess-dragon.xlsx.) We then use Orange to construct a workflow that looks like this:
Don’t panic, we’ll explain everything step by step :). You can also get the workflow from this site, and just change the URL in the classroom. However, it is of course much more interesting if you construct it as you go. (For example, when we did this with 4th graders, the students were curious about Orange and wanted to know, “What programme is that you’re using that just lets you drag things together like that?”)
In the widget File, we change the URL to our own one, for instance http://data.pumice.si/princess-dragon/data (the actual URL will of course look different). In the widget Table, we can show the students what the collected data looks like: the rows represent the cartoons, while the columns represent the students. The value is 1 if a student has selected a particular cartoon, or 0 if they haven’t. Additionally, each row contains the cartoon title and the name of the file containing that cartoon’s poster. It’s pretty straightforward.
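To make the table layout concrete, here is a small Python sketch that turns such a table into one set of cartoons per student. The column names `title` and `poster`, the cartoon titles and the student names are assumptions made for illustration; the actual file may label things differently.

```python
# Hypothetical rows: one per cartoon, one 0/1 column per student.
rows = [
    {"title": "Cartoon A", "poster": "a.jpg", "Ana": 1, "Ben": 1, "Cilka": 0},
    {"title": "Cartoon B", "poster": "b.jpg", "Ana": 0, "Ben": 1, "Cilka": 1},
    {"title": "Cartoon C", "poster": "c.jpg", "Ana": 1, "Ben": 0, "Cilka": 1},
]

# Columns that describe the cartoon rather than a student (assumed names).
META_COLUMNS = {"title", "poster"}


def picks_per_student(rows):
    """Collect, for each student column, the titles marked with 1."""
    students = [col for col in rows[0] if col not in META_COLUMNS]
    return {
        student: {row["title"] for row in rows if row[student] == 1}
        for student in students
    }


picks = picks_per_student(rows)
print(picks["Ana"])
```

Reading the table column by column in this way gives the same lists the students wrote on the left side of their A5 sheets.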
The widget Distances computes the distances between students. Here, we mustn’t forget to choose the correct settings: we compute the distances between columns, and use the Jaccard distance.
Using the Distance Matrix widget, we can, if we wish, show the students the result of that calculation. If two students have chosen the same cartoons, the distance will be 0, and if their choices are completely different, it will be 1. This definition of distance/similarity is slightly different from the one used by the students, who simply counted the number of common cartoons, but the results are similar. If you don’t think showing and explaining this is necessary, you can easily skip this widget. (The Jaccard similarity is defined as the size of the intersection of two students’ sets of cartoons, divided by the size of their union. The result is a number between 0 and 1, since the intersection can never be larger than the union. To calculate the Jaccard distance, we subtract the Jaccard similarity from 1.)
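The definition in the parentheses can be written out as a short Python function. This is a hand-written sketch for illustration, not the code Orange uses internally; the example sets are invented, and returning 0 for two empty selections is a convention assumed here.

```python
def jaccard_distance(a, b):
    """1 minus (size of intersection divided by size of union)."""
    union = a | b
    if not union:
        # Assumed convention for this sketch: two empty selections are identical.
        return 0.0
    return 1 - len(a & b) / len(union)


same = jaccard_distance({1, 2, 3}, {1, 2, 3})
different = jaccard_distance({1, 2, 3}, {4, 5, 6})
# Two students with 7 picks each, 5 of them in common: the union has 9 cartoons.
partial = jaccard_distance({1, 2, 3, 4, 5, 6, 7}, {1, 2, 3, 4, 5, 8, 9})
print(same, different, round(partial, 3))
```

In the last example the similarity is 5/9, so the distance is 4/9, or roughly 0.444.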
We then come to the widget called Network of Neighbours. We configure this widget so that it connects each student with their three nearest neighbours.
The result can be seen in the Network Explorer widget.
The links between the students are directional: there are exactly three arrows leading from each student. Where there is more than one arrow pointing to a single student, that means that student probably has a more “mainstream” taste.
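Counting the incoming arrows is easy to illustrate in Python. The network below is invented; each student points to exactly three others, as in the activity.

```python
from collections import Counter

# Hypothetical directed network: each student points to three nearest neighbours.
network = {
    "Ana": ["Ben", "Cilka", "Eva"],
    "Ben": ["Ana", "Cilka", "Eva"],
    "Cilka": ["Ana", "Ben", "Eva"],
    "Dan": ["Eva", "Ana", "Cilka"],
    "Eva": ["Cilka", "Ana", "Ben"],
}

# Count how many arrows end at each student.
incoming = Counter(target for targets in network.values() for target in targets)

for name in network:
    print(name, incoming[name])
```

In this made-up class, nobody points to Dan, while Ana, Cilka and Eva each receive four arrows, which would suggest that their taste is the most widely shared.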
And finally, it’s time for the most important part: recommendations! The Recommendations widget needs to be fed data about similarity, i.e. the network we have obtained from the Network of Neighbours, as well as the data about cartoons, which is provided by the File widget. For this reason, we need to connect both of those widgets to Recommendations. Now each student can see which cartoons they have selected, which other students are similar to them, and, most importantly, which cartoons this type of recommendation system recommends to them.
If we allow students to select different numbers of cartoons, the similarity can no longer be defined as the number of cartoons they have in common, since that would clearly make those who have selected more cartoons appear “more similar” to each other. In this scenario, the number of cartoons that two students have in common needs to be divided by the total number of cartoons chosen by one or the other or both students. In more mathematical terms: the size of the intersection of the sets of selected cartoons must be divided by the size of the union of those sets.
To be able to do that, the students need to be familiar with decimal numbers, meaning the activity is no longer suitable for fourth graders. Another option is to work around this limitation by multiplying the number of cartoons the students have in common by 10 or even 100 before dividing by the size of the union, and then keeping only the integer quotient.
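The integer workaround might look like this in Python. The function name `similarity_times_100` and the example picks are invented for illustration; the factor 100 is one of the two options mentioned above.

```python
def similarity_times_100(a, b):
    """Common cartoons times 100, divided by the size of the union.

    Only the integer quotient is kept, so no decimal numbers are needed.
    """
    return len(a & b) * 100 // len(a | b)


# Hypothetical picks of different sizes (6 and 9 cartoons, 3 in common).
first = similarity_times_100({1, 2, 3, 4, 5, 6}, {1, 2, 3, 7, 8, 9, 10, 11, 12})
# 6 and 7 cartoons, 4 in common: the union has 9 cartoons.
second = similarity_times_100({1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 7, 8, 9})
print(first, second)
```

In the first case the students compute 300 divided by 12, which is 25; in the second, 400 divided by 9 gives 44 with a remainder that is simply ignored.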
Students typically really enjoy this activity: they have fun selecting the cartoons, and are intrigued by computing similarity and drawing graphs. In one class, in the discussion towards the end of the activity, they told us they like the fact that they now know (more or less) how these types of websites work.