Yesterday, Neil Kodner wrote an interesting post in which he scraped and analysed the tribute messages for Steve Jobs on the Apple website. Among his findings: people talked about the Mac and iPhone the most, and compared Steve Jobs with great minds like Einstein, Ford and Edison. Neil also found that ‘rest in peace’ was the most used trigram across all the messages.
This made me think of applying the Apriori algorithm, which I recently implemented for my Web Text Mining class, to the tribute messages. Wikipedia describes the Apriori algorithm as follows:
In computer science and data mining, Apriori is a classic algorithm for learning association rules. As is common in association rule mining, given a set of itemsets, the algorithm attempts to find subsets which are common to at least a minimum number C of the itemsets.
The way my group-mate Rene Dekker and I implemented it, the algorithm extracts association rules for words from a sentence or document (stopwords, punctuation and numbers are removed before analysis). So I took the text file with tribute messages and applied the algorithm to see which word combinations are used frequently within a single tribute message. I’ll get into the algorithm in a later blogpost, but here are the results for a minimum support level of 1% and a minimum confidence level of 85%.
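For readers who want a rough idea of what such a pipeline looks like before the results, here is a minimal Python sketch of the frequent-itemset part of Apriori applied to messages. It is not the implementation Rene and I wrote: the names `tokenize` and `frequent_itemsets`, the tiny `STOPWORDS` set and the regular expression are all assumptions made for illustration.

```python
import re
from itertools import combinations

# Tiny illustrative stopword list (an assumption; a real run would use a fuller list).
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "for", "you", "is", "was"}


def tokenize(message):
    """Turn one message into a set of words, dropping stopwords, punctuation and numbers."""
    words = re.findall(r"[A-Za-z]+", message)  # letters only, so digits and punctuation vanish
    return frozenset(w for w in words if w.lower() not in STOPWORDS)


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support} for every frequent itemset."""
    n = len(transactions)
    # Level 1 candidates: every single word that occurs anywhere.
    candidates = {frozenset([w]) for t in transactions for w in t}
    frequent = {}
    k = 1
    while candidates:
        # Count in how many messages each candidate itemset is fully contained.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

With the messages loaded into a list called `messages` (a hypothetical name), `frequent_itemsets([tokenize(m) for m in messages], 0.01)` would correspond to the 1% support level used here. Association rules are then derived from these itemsets by checking their confidence.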
Interpreting the results
Jobs, friends, condolences -> family
This means that the four words ‘Jobs’, ‘friends’, ‘condolences’ and ‘family’ occur together (but not necessarily next to each other) in at least 1% of the tribute messages. In addition, of the messages that contain ‘Jobs’, ‘friends’ and ‘condolences’, at least 85% also contain the word ‘family’.
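To make the two thresholds concrete, here is a small Python sketch that computes support and confidence for a single rule. The function name `rule_stats` and the toy messages are invented for illustration; they are not the real tribute data or our actual code.

```python
def rule_stats(messages, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    messages: list of sets of words, one set per tribute message.
    """
    antecedent = set(antecedent)
    both = antecedent | {consequent}
    # Messages containing every word on the left-hand side of the rule.
    n_antecedent = sum(1 for m in messages if antecedent <= m)
    # Messages containing the left-hand side and the consequent.
    n_both = sum(1 for m in messages if both <= m)
    support = n_both / len(messages)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy data, invented purely for illustration (not the real tribute messages).
toy = [
    {"Jobs", "friends", "condolences", "family"},
    {"Jobs", "friends", "condolences", "family", "Apple"},
    {"Jobs", "friends", "condolences"},
    {"Steve", "thank"},
]
support, confidence = rule_stats(toy, ["Jobs", "friends", "condolences"], "family")
print(support, confidence)
```

In this toy set all four words appear together in two of the four messages (support of one half), and ‘family’ appears in two of the three messages that contain the other three words (confidence of two thirds), so the rule would pass the 1% support threshold but fail the 85% confidence threshold.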
friends, Steve -> family
Peace, Jobs -> Steve
Thank, Jobs, us -> Steve
world, friends, condolences -> family
Mr, world -> Jobs
Jobs, computers -> Steve
know, friends -> family
friends, many -> family
Mr, friends -> family
friends -> family
iPad, Jobs, Apple -> Steve
Jobs, created -> Steve
condolences, friends -> family
Mr, friends -> Jobs
friends, lost -> family
go, friends -> family
friends, Apple -> family
never, friends, Steve -> family
people, Jobs, world -> Steve
friends, like -> family
life, friends, Steve -> family
Jobs, friends, condolences -> family
friends, thoughts -> family
friends, always -> family
never, Mr -> Jobs
friends, Steves -> family
friends, Apple, condolences -> family
world, friends, Steve -> family
friends, man -> family
condolences, Steves -> family
Jobs, life, great -> Steve
prayers, friends -> family
Jobs, world, changed -> Steve
human, Jobs -> Steve
friends, Apple, Steve -> family
brought, Jobs -> Steve
friends, condolences, Steve -> family
friends, condolences -> family
friends, us -> family
I’ll elaborate on the algorithm, and on the improvements in efficiency and usefulness we made, in a later blogpost. Please stay tuned.