Text Mining with the tm Library

Mhairi McNeill
16/09/2015

library(tm)

What I'll cover

  • Loading in the data
  • Cleaning text
  • Making a document-term matrix
  • Where you can go from there

Loading in Data

  • First make a source object
  • Then a corpus object

reviews <- read.csv("reviews.csv", stringsAsFactors=FALSE)

review_source <- VectorSource(reviews$text)
corpus <- Corpus(review_source)

Cleaning

inspect(corpus)
[1] " SO ADDICTING  DEFF DOWNLAOD ITS EPIC YOU CAT LOVERS WILL FALL IN LOVE <3"
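
A review like the one above needs tidying before it can be counted. Here is a minimal sketch of a typical cleaning pass with tm_map. The second review text is made up for illustration, and I use VCorpus here (rather than Corpus) only so the example behaves the same across tm versions; the exact steps and their order are an assumption, so adjust them to your data.

```r
library(tm)

# Toy stand-in for reviews$text; the second string is a made-up example
texts <- c(" SO ADDICTING  DEFF DOWNLAOD ITS EPIC YOU CAT LOVERS WILL FALL IN LOVE <3",
           "Great game, but it crashes on my phone!")
corpus <- VCorpus(VectorSource(texts))

# Lower-case everything; base functions need wrapping in content_transformer()
corpus <- tm_map(corpus, content_transformer(tolower))
# Drop punctuation characters
corpus <- tm_map(corpus, removePunctuation)
# Drop common English stop words ("you", "in", "but", ...)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
# Collapse the runs of spaces left behind by the steps above
corpus <- tm_map(corpus, stripWhitespace)

# Pull the cleaned text back out as a character vector
cleaned <- sapply(corpus, as.character)
```

Order matters a little: lower-casing first means the stop word list (which is lower case) actually matches.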

Top Terms for each country

Country       Top Terms
UK            iphone, version, watch
US            everything, wish, back
Australia     gems, amount, phone
New Zealand   clans, troops, thanks
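
One way a table like this could be built is to sum the document-term matrix within each country and take the most frequent terms. This is only a sketch on made-up data: the `country` column and the toy review texts are assumptions, and ranking by raw counts is not necessarily the method behind the table above.

```r
library(tm)

# Made-up reviews; a `country` column is assumed to exist alongside `text`
reviews <- data.frame(
  country = c("UK", "UK", "US", "US"),
  text    = c("iphone update iphone", "iphone watch version",
              "wish everything back", "wish everything worked"),
  stringsAsFactors = FALSE
)

# One row per review, one column per term
dtm <- DocumentTermMatrix(VCorpus(VectorSource(reviews$text)))
counts <- as.matrix(dtm)

# Add up the term counts of all reviews from the same country
by_country <- rowsum(counts, group = reviews$country)

# For each country, keep the names of the three most frequent terms
top_terms <- apply(by_country, 1, function(row) {
  names(sort(row, decreasing = TRUE))[1:3]
})
```

`top_terms` has one column per country and one row per rank. Ties are broken arbitrarily, so with real data you may prefer a weighting such as tf-idf over raw counts.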

Summary

  • Read in data
  • Clean text
  • Make a document-term matrix
  • Remove sparse terms
  • Do anything else you like
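
The matrix and sparse-term steps from the list above can be sketched as follows. The four texts are made up and already clean, and the 0.6 threshold is chosen only so the toy example has something to drop; on real reviews a value much closer to 1 is more usual.

```r
library(tm)

# Made-up, already-cleaned reviews
texts <- c("love this game", "this game crashes",
           "love the cats", "game needs more gems")
corpus <- VCorpus(VectorSource(texts))

# Rows are documents, columns are terms, cells are counts
dtm <- DocumentTermMatrix(corpus)

# Drop terms that are missing from too large a share of documents;
# with 4 documents and sparse = 0.6, a term must appear in at least 2
dtm_small <- removeSparseTerms(dtm, 0.6)

# Overall term frequencies, most common first
freq <- sort(colSums(as.matrix(dtm_small)), decreasing = TRUE)
```

From here `dtm_small` is an ordinary (sparse) matrix of counts, so it can go into whatever comes next: clustering, a classifier, or just a word frequency plot.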

Thanks for listening!

https://github.com/mhairi