I decided to try and use the ‘rtweet’ package tonight!
I tested it out by scraping 5 minutes of tweets directed at President Donald Trump (@realDonaldTrump), and then tried to analyze what sentiments were being sent to the commander in chief. I think the results are pretty cool!
First, I had to set up a Twitter app and get the access token and secret (that step is hidden, but I did it!)
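The token step is hidden in the post. One common way to keep API keys out of a shared script is to read them from environment variables. Here is a minimal base-R sketch of that idea. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the `get_twitter_keys()` helper are hypothetical. The returned values would still need to be passed to the authentication function in your installed version of rtweet.

```r
# Hypothetical helper: collect Twitter API credentials from environment
# variables so they never appear in the knitted post.
get_twitter_keys <- function(vars = c("TWITTER_CONSUMER_KEY",
                                      "TWITTER_CONSUMER_SECRET",
                                      "TWITTER_ACCESS_TOKEN",
                                      "TWITTER_ACCESS_SECRET")) {
  # names = TRUE keeps the result named even when only one variable is requested
  keys <- Sys.getenv(vars, names = TRUE)   # unset variables come back as ""
  missing <- names(keys)[keys == ""]
  if (length(missing) > 0) {
    stop("Missing environment variables: ", paste(missing, collapse = ", "))
  }
  as.list(keys)
}
```

The variables can be set in `~/.Renviron`, which R reads at startup, so they never have to be typed into the script.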
Now it’s time to scrape some tweets!
Here I collected 5 minutes' worth of tweets mentioning "@realDonaldTrump" (11:41 p.m. to 11:46 p.m. EDT, Monday, August 14, 2017).
```r
# library(rtweet)  # needed for stream_tweets()

# Stream tweets mentioning @realDonaldTrump for 5 minutes (300 seconds)
# this_guy <- stream_tweets("realDonaldTrump", timeout = 300)

# Write the streamed tweet data to a CSV file
# write.csv(this_guy, file = "this_guy.csv", row.names = FALSE)

# Read the saved tweets back into a data frame
this_guy <- read.csv("this_guy.csv", header = TRUE, stringsAsFactors = FALSE)

# Make sure the tweet text is character data (not a factor)
this_guy$text <- as.character(this_guy$text)

# When I knit this file, I don't want it to run another 5 minutes of tweet
# scraping, so I commented out the original scrape, saved those tweets to a
# CSV file, and read that file back in instead.
```
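Commenting the scrape out works fine. An alternative is to scrape only when the cached CSV doesn't exist yet, so re-knitting just reads the file. Below is a minimal sketch of that pattern. `cached_tweets()` is a hypothetical helper, not part of rtweet, and `fetch` stands in for whatever function does the actual scraping.

```r
# Hypothetical caching helper: call fetch() only if the CSV at `path`
# doesn't exist yet; otherwise reuse the previously saved tweets.
cached_tweets <- function(path, fetch) {
  if (!file.exists(path)) {
    tweets <- fetch()
    write.csv(tweets, file = path, row.names = FALSE)
  }
  read.csv(path, header = TRUE, stringsAsFactors = FALSE)
}

# In this post it might be used roughly like:
# this_guy <- cached_tweets("this_guy.csv",
#                           function() stream_tweets("realDonaldTrump", timeout = 300))
```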
Okay! So now there is a data frame called “this_guy” with 8,007 tweets mentioning President Trump’s Twitter account.
That’s about 1,601 tweets per minute (tpm), or nearly 27 tweets per second (tps)!
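These rates are just the tweet count divided by the length of the stream window. A small helper (the name `tweet_rates()` is only for illustration) reproduces both this figure and the post-filtering one below:

```r
# Tweets per minute and per second over a stream window of `window_min` minutes
tweet_rates <- function(n_tweets, window_min = 5) {
  c(tpm = n_tweets / window_min,
    tps = n_tweets / (window_min * 60))
}

round(tweet_rates(8007), 1)  # all tweets:   tpm 1601.4, tps 26.7
round(tweet_rates(3672), 1)  # no retweets:  tpm  734.4, tps 12.2
```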
First, I only wanted to look at original tweets (no retweets), so I filtered out all the retweets.
That left 3,672 original tweets (about 734 tpm, or 12 tps).
## n()
## 1 3672
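The filtering code isn't shown above. Here is a minimal base-R sketch of the same step, assuming the data has a logical `is_retweet` column (older rtweet versions returned one; check the column names in your own CSV). `read.csv()` turns a column of `TRUE`/`FALSE` values into a logical vector automatically. The three-row data frame is made up for illustration.

```r
# Toy stand-in for this_guy; the real data has many more columns
tweets <- data.frame(
  text = c("RT @someone: hello", "original thought", "another one"),
  is_retweet = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Keep only original tweets (drop retweets)
originals <- tweets[!tweets$is_retweet, ]
nrow(originals)  # 2
```

With dplyr, the equivalent would be `filter(this_guy, !is_retweet)`.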
Now the data is in good enough shape to look at:
What are the most common words in these tweets?
## # A tibble: 8,249 × 2
## word n
## <chr> <int>
## 1 realdonaldtrump 3155
## 2 https 2019
## 3 t.co 2018
## 4 jackposobiec 365
## 5 potus 242
## 6 president 169
## 7 home 155
## 8 white 155
## 9 people 148
## 10 jules_su 136
## # ... with 8,239 more rows
“realdonaldtrump” is by far the most common word, which makes sense: tweets mentioning him have to include his Twitter handle.
Below is a bar chart of word counts with the first 3 hits filtered out (the Twitter handle and the “https” and “t.co” tokens that come from links).
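In the Text Mining with R examples, the tidytext pattern is `unnest_tokens(word, text)`, then `anti_join(stop_words)`, then `count(word, sort = TRUE)`. As a rough base-R approximation, the sketch below lowercases the text, splits on anything that isn't a letter, digit, underscore, or period (so “t.co” survives as one token, as in the table above), counts the words, and drops the handle, link tokens, and stop words. The two tweets and the tiny stop list are made up. Real tokenizers handle punctuation more carefully.

```r
# Two made-up tweets standing in for this_guy$text
text <- c("@realDonaldTrump The White House! https://t.co/abc123",
          "@realDonaldTrump @POTUS people at the white house")

# Lowercase and split on anything that is not a letter, digit, "_" or "."
tokens <- unlist(strsplit(tolower(text), "[^a-z0-9_.]+"))
tokens <- gsub("^\\.+|\\.+$", "", tokens)   # trim stray leading/trailing dots
tokens <- tokens[tokens != ""]              # drop empty strings left by "@" etc.

# Count words, most frequent first
counts <- sort(table(tokens), decreasing = TRUE)

# Drop the Twitter handle, the pieces of link URLs, and a tiny made-up stop list
stop_list <- c("the", "at")
drop <- c("realdonaldtrump", "https", "t.co", stop_list)
counts <- counts[!names(counts) %in% drop]
head(counts)
```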
Now “jackposobiec” is the most common word. Those are most likely tweets that also mention his Twitter handle (@jackposobiec). Apparently he is an alt-right conspiracy theorist? I’ll need to read up on him on Wikipedia later.
EDIT 2:25am 8/15/17 - Apparently President Trump retweeted Jack Posobiec, who asked where the outrage was over all of the shootings in Chicago this weekend. People are taking this to mean that President Trump is okay with retweeting people who openly associate with the alt-right movement.
Moving on…
Let’s look at how frequent positive and negative words are. Word sentiment comes from the ‘bing’ lexicon included in the tidytext package.
Many more negative words than positive words were being tweeted at Donald Trump.
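In tidytext, `get_sentiments("bing")` returns a table of words labeled `positive` or `negative`, and the usual step is an inner join against the word counts followed by a sum per sentiment. Here is a base-R sketch of that join-and-count using a tiny made-up lexicon and made-up counts (not the real Bing lexicon or this post’s data):

```r
# Made-up word counts (standing in for the output of the word-count step)
word_counts <- data.frame(word = c("great", "sad", "bad", "white", "wrong"),
                          n    = c(10, 25, 30, 155, 12),
                          stringsAsFactors = FALSE)

# Tiny made-up lexicon in the same two-column shape as get_sentiments("bing")
lexicon <- data.frame(word = c("great", "sad", "bad", "wrong"),
                      sentiment = c("positive", "negative", "negative", "negative"),
                      stringsAsFactors = FALSE)

# Inner join: words not in the lexicon (like "white") drop out
scored <- merge(word_counts, lexicon, by = "word")

# Total word occurrences per sentiment
totals <- tapply(scored$n, scored$sentiment, sum)
totals  # negative = 67, positive = 10
```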
And now for everyone’s favorite: word clouds!
This first one is a general word cloud of the 100 most common words that have a sentiment label in the lexicon.
This one is a comparison cloud for positive and negative words!
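A comparison cloud needs a matrix with one row per word and one column per sentiment. The Text Mining with R examples build it with `reshape2::acast()` and draw it with `wordcloud::comparison.cloud()`. The base-R sketch below builds the same shape from made-up counts with `tapply()`, filling missing word/sentiment combinations with 0. The plotting call is left commented out because it needs the wordcloud package.

```r
# Made-up sentiment-labeled word counts
scored <- data.frame(word = c("great", "win", "sad", "bad", "wrong"),
                     sentiment = c("positive", "positive",
                                   "negative", "negative", "negative"),
                     n = c(10, 4, 25, 30, 12),
                     stringsAsFactors = FALSE)

# Word x sentiment matrix of counts
m <- tapply(scored$n, list(scored$word, scored$sentiment), sum)
m[is.na(m)] <- 0   # a word with no count in a column gets 0

# Plotting step (requires the wordcloud package, not run here):
# wordcloud::comparison.cloud(m, max.words = 100)
```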
Of note: “supremacy” was being coded as a positive word. Since here it is most likely being used as part of “white supremacy”, I removed it entirely from the data set. It must have been mentioned fewer than 50 times, since it doesn’t show up in any of the frequency graphs, but it did show up in the comparison cloud.
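The post doesn’t show how the word was removed. Dropping it from the word data before any counting is a one-line filter, sketched here in base R with made-up tokens (with dplyr the same step would be `filter(word != "supremacy")`). Keeping the excluded words in a vector makes it easy to add more if other lexicon labels turn out not to fit.

```r
# Made-up tokenized words, one row per word occurrence
words <- data.frame(word = c("supremacy", "great", "sad", "great"),
                    stringsAsFactors = FALSE)

# Words whose lexicon label doesn't match how they're used in these tweets
exclude <- c("supremacy")
words <- words[!words$word %in% exclude, , drop = FALSE]

table(words$word)  # great 2, sad 1
```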
…
This has been an interesting (and very informative) exercise using the rtweet package and LOTS of help from the web version of the book “Text Mining with R” by Julia Silge and David Robinson. A lot of the code was adapted from their examples, so you should buy their book!