Goal of this blog post: to prove to myself (and other overly ambitious data scientists or those interested in television analytics) that personally curated content can lead to weird thought patterns.
NOTE: I am not in any way affiliated with Roku. Apparently I’m just very passionate about their product when I’m running on little sleep.
What is this new TV streaming app I’ve never seen before? (Side note: when I just went back to this screen, the This is Us picture was replaced with an announcement that Boy Meets World is now on Hulu. Smartly played, Hulu.)
Free TV for me? As in just for me? As in aggregating my user preferences and clustering me with other users that have similar watching behavior? Genius, that’s pretty much my idea, but in execution so of course it’s better.
Whaaaat Roku has a handy search feature that searches across all streaming sites? Let’s search this movie that I remember being hilarious and seems super relevant to the current state of the world but no one has ever heard of. Is Roku going to judge me for using the search term “Dick”? Wait, is this movie titled Dick so it can fly under the radar? That’s my idea, to use loaded terms to fly under the search radar. Wait, the Koch Brothers do that so much better than I ever could, is that why their last name is a homophone with a company that is basically America’s greatest accomplishment?
NOTE: Since I watched this movie (twice in two days) it has been taken off my Roku channel. Coincidence? Probably.
NOTE p2: When it was on the Roku Channel, Ryan Reynolds and Will Ferrell were billed first. Will Ferrell plays Bob Woodward; Ryan Reynolds’s part was so small I had no memory of him being in the movie. So I turned to imdb.
This is terrible search engine optimization. What happened to you imdb, who owns you?
Oh that’s right.
Okay let’s get to the actual movie. Kirsten Dunst and Michelle Williams play ditsy 15 year old girls who get to the bottom of the Watergate Scandal by stumbling their way into a friendship with Richard Nixon after getting lost on a class field trip. Best side plot line ever, Kirsten Dunst’s brother hides his pot in a walnut jar, hidden in plain sight in their kitchen, tells her it’s just the walnut leaves, she and Michelle Williams make cookies with the contaminated walnuts and they feed the resulting cookies to everyone in the White House.
Am I on to something? Will I be rewarded for finding out about this amazing Roku channel that is pretty much talking to me right now by finding this movie and watching it? I’ve googled “what streaming sites is Dick on” with very terrible results enough times to conceivably trigger some sort of watch list.
Ha Dick on TV, that’s a funny image.
Wait, I’m really on to something, all this deep study of the Bachelor is finally paying off. Politics is basically the bachelor. Richard Nixon = Dean Unglert, celebrity status sucks. And if Dean had to sign an iron clad contract not to disparage the Bachelor after his unflattering edit I can’t imagine what Richard Nixon had to sign. Maybe if I take a picture of it sideways the robots at the CIA won’t realize I’m on to them.
The CIA drives around in a van that says Plumbers? That’s not suspicious at all, flying under the radar. Oh wait, Plumbers is a superset of my new team name (Player Lifecycle Marketing, the u is for fun). Do I work for the CIA?
Pretty much. I have to take this video sideways and from far away because it’s so relevant to my life it has to be proprietary.
Baby Ryan Reynolds, I remember you, how could anyone forget? Oh wait, I forgot you were in this movie.
Are burps the only thing that are real? Is Elon Musk right, are we really just living in a simulation? Is this movie at this time a break in the simulation? Or am I just being personally recruited by Elon Musk after googling him so many times?
DEAN?!?! DEAN UNGLERT? Are you in on this too? Is the Bachelor just a front for the CIA? Do they have a hit out on Dean for being so vocal and candid about his experience on the show? Do they pick exotic locations because some Bachelor producer (or possibly Mike Fleiss) has to travel around the world to kill operatives? I knew Chuck Barris wasn’t crazy for writing Confessions of a Dangerous Mind.
Oh I am just high? It’s possible. I do live in Seattle. Yeah the Bachelor probably has no connection to the CIA.
**You’re so Vain by Carly Simon starts playing for the closing montage. This is the moment the whole movie has been leading up to, Nixon’s resignation. Brought down by two 15 year old girls just trying their best to be helpful.**
This song is so iconic, there’s no way they could have messed with this song. Wait, what is a gavotte?? Are they cutting up the flag?? I guess they can, they pretty much saved America by getting Nixon stoned enough to stop the war. I better shake this video as I’m taking it so it can’t be traced by robots owned by the CIA. Or just because I don’t want to have to pay for clips from this movie when I present my case to an interested party. Is that how movie rights work?
Clouds in my coffee? Carly Simon, you’re deep. Maybe this is just a really great movie made by someone who likes puns. I didn’t think puns were funny, but maybe they’re just too high brow for me to understand because this is probably my favorite still from a movie, punny sign and all.
SUPER DRUGS ARE REAL?!?! Or maybe the message is just follow your bliss and it doesn’t matter if you make money. Or maybe I should just invest my money in big pharma. Was this really a documentary hidden in plain sight with amazing actors a la HBO’s hit movies, Game Change and Too Big to Fail? Only one way to check, let’s consult the most reputable source I know.
Yep, basically a documentary.
In hindsight this maybe wasn’t the best thing to watch instead of sleeping.
Does this show exist because of a sponsorship deal with Walmart? If so, good job Walmart, your online ordering system is easy to use. Bravo for creatively getting off my blacklist.
I want to be just like Emily.
She don’t take no shit from no one.
Arie is hotter than he was when I was watching at age 20, my taste in men has really changed.
Chris Bukowski used to be kind of a good person. What has the Bachelor Franchise turned you into Chris?
Jef with one freaking f, what a fuckboi you turned out to be, roommates with Chase and Robby? Really! Really? Really.
I remember the night this episode came out, it was my first Bachelorette watch party. First we made fun of the mushroom farmer, we all fell in love with Jef’s skateboard limo exit, and some drunk idiot would not shut up in the important parts. After the episode my best friend and I sat in the middle of the street talking about our boy problems. College was so great, I should be more social.
The Bachelorette really is the strongest sensory experience tied to memory.
It’s funny that the pressure of doing something every week kept me from doing something I love so much, analyzing bachelor data. I couldn’t get the thought of the judgement of bachelor nation out of my head.
When Lauren B started her style blog it was hyped up so much. Tons of “news articles” were written about it (wetpaint, bustle, etc), but they were just citing each other to get clicks. She had a launch party, but I doubt she did any of the planning. Other people were behind this blog trying to make money using the Lauren B brand. Eventually she stopped posting and all I could think was what could Lauren B possibly have going on in her life that would stand in the way of her creating blog posts about her personal style? It turns out, a lot. We all saw on Ben and Lauren, Happily Ever After? the breakdown of a relationship, of an engagement. It wasn’t Ben and Lauren’s fault they were being overexposed, it was the people behind that show, the people encouraging us to judge them. I thought “Why is it Happily Ever After with a question mark? What could this perfect white bread couple actually have to struggle with?” I have a lot of thoughts on that unpopular spin off but that is for another post.
That judgement is so debilitating. Even if it’s not actually there. Even if I just imagine it. I think about thousands of Data Science PhDs reading my work and laughing at how uninformed it is. But that’s not reality. Why am I so worried about not telling my truth for fear of critique? Critique would make me better, it’s not someone laughing at how stupid I am.
It’s easy to forget that people on the Bachelor are actually people. It’s easy to forget how hard it is to put yourself out there for the world to judge. But it’s also easy to keep all of these thoughts in your head until it drives you insane and turns you into a shell of a person who feels like they can’t accomplish anything they set their mind to.
I’m cautiously optimistic that naming Arie as the next bachelor is a sign of things to change. Arie was on Emily Maynard’s season, which aired in 2012. It was my first season of the Bachelorette and at the time I didn’t realize how amazing Emily was. I’ve been slowly rewatching and all of these thoughts are flooding into my head. Emily knew how much the producers wanted her to be the Bachelorette because of her amazing story and she used it as a platform to be a total badass. Yet I’ve still seen criticisms of Emily as “not the smartest.”
It’s obvious that Emily knows her story is amazing, and knows her value. Yet I am so quick to tear down Kelsey Poe for doing the same thing, for telling the cameras that her story is amazing. The famous “isn’t my story amazing” sound bite was repeated over and over on the show making Kelsey look unstable and insane when in reality she was just grieving. Why am I so quick to judge someone going through a personal tragedy based on 30 seconds of strategically placed footage?
I want to read more articles judging the producers, editors, Mike Fleiss, Chris Harrison, instead of criticizing people who are just trying to live their best life, people who are just trying to get their own redemption edit. And I’m tired of sitting and waiting for the judgement to stop. The only thing I can do is get over my social anxiety and feeling sorry for myself and write them myself.
This was an especially crazy week for the Bachelor (and for the US). Corinne continued to be the perfect Bachelor villain or just very drunk, it’s too early to differentiate. We heard Liz repeat the same soundbite about her and Nick hooking up at Jade and Tanner’s wedding after every commercial break, and when Nick finally reveals the truth about their relationship on the group date we get hit with a “to be continued…”
Let’s see what people are talking about:
Timeline notes provided by my dad. His complete detailed minute by minute notes are here.
This week I wanted to look at location data. About 1% of the tweets I pulled had full location data, so to look at a bigger set I extracted state information from the user location description. Since the location description is freeform text I had to do some extra work to extract usable data. I made a dictionary of state names, abbreviations, and big unambiguous cities (no Springfields or Auburns) that mapped to each state. After this process I ended up with location data for a little over 50% of tweets. And then I made a ton of maps.
NOTE: This is not a random sample, if the other 50% of tweets had extractable locations it could completely change these maps. There are also not as many west coast tweeters in the time I collect tweets, so that influences the numbers.
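The state lookup described above can be sketched roughly like this. This is a minimal illustration, not the code behind the maps: the tiny dictionaries and the function name `extract_state` are placeholders I made up, and a real lookup would need every state plus a hand-picked list of unambiguous cities.

```python
import re

# Hypothetical, heavily truncated lookup tables (assumptions for illustration).
STATE_NAMES = {"alabama": "AL", "south carolina": "SC", "texas": "TX", "washington": "WA"}
STATE_ABBREVS = {"AL", "SC", "TX", "WA"}
CITIES = {"seattle": "WA", "dallas": "TX", "tuscaloosa": "AL"}  # big, unambiguous cities only


def extract_state(location):
    """Return a two-letter state code from a freeform location, or None."""
    if not location:
        return None
    text = location.lower()
    # Full state names and cities: match whole words, case-insensitively.
    # Caveat: "washington" would also match "Washington, DC".
    for name, code in {**STATE_NAMES, **CITIES}.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            return code
    # Abbreviations: only accept stand-alone upper-case pairs, so the "al"
    # in "Al's house" is not read as Alabama.
    for token in re.findall(r"\b[A-Z]{2}\b", location):
        if token in STATE_ABBREVS:
            return token
    return None


for loc in ["Dallas, TX", "Clemson, SC", "ROLL TIDE | Alabama", "somewhere over the rainbow"]:
    print(loc, "->", extract_state(loc))
```

Anything that returns `None` falls into the half of the tweets that never makes it onto a map.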
Total Number of Tweets (Retweets Included)
Number of Tweeters
In the above two visualizations, states in the south, particularly Alabama and South Carolina, have some of the most dramatic decreases from week 1 to week 2. These numbers could have been affected by the CFP National Championship, which also aired at 8pm EST; the two teams playing were Alabama and Clemson.
Number of Verified Tweeters
Most Talked about Contestant
Contestants with prominent story lines were excluded from analysis.
I spent a lot of time last week looking at what people were saying about the women, this week I wanted to look at what people are saying about Nick. Almost none of it was flattering. I’m still a fan of Nick, but for one reason or another (producers) he is making some questionable decisions.
Last week I used n-grams to pull out popular phrases about the top women. This works well when everyone is saying similar things because it just uses frequency. People were saying a lot of different things about Nick, and I had some ideas about the topics, but I wasn’t quite sure where to start. With a term frequency method some of the smaller topics (still 300-500 tweets) would have gotten buried, and I would have never gotten to read all of the tweets about Nick’s camo button down.
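To make the "buried topics" problem concrete, here is a small sketch of frequency-based n-gram counting. The function name `top_ngrams` and the three sample tweets are my own illustration, not the tool I actually used.

```python
import re
from collections import Counter


def top_ngrams(tweets, n=2, k=3):
    """Count word n-grams across tweets and return the k most frequent."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"[a-z']+", tweet.lower())
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)


sample = [
    "Nick held my boobs today",
    "Nick held my boobs today My BOOBS",
    "Nick is wearing a camo button down",
]
print(top_ngrams(sample, n=2, k=1))
```

The loudest phrase wins, and the lone camo tweet never surfaces, which is exactly why pure frequency is a poor fit when people are talking about many different things.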
To find out the kinds of things people are saying about Nick, I used LDA (Latent Dirichlet Allocation) to pull out popular topics. Here are some of the more well-defined topics that came out of the analysis and a few tweets that summarize each topic:
Nick and Danielle M:
Danielle is an absolute angel. Nick doesn’t deserve her NO ONE DOES -@summer95
Danielle M is the not only the best but as my mom said brings out the best in Nick #TeamDanielleM #hopeforthisshow -@Feeeeeeney
Danielle is too good for Nick but can we please introduce her to LukePell -@haleyrgeorge
Nick and the Boob Hold:
Nick held my boobs today LIKE HE HELD MY BOOBS. This girl is literally the reason I hate everyone -@edem_ily
Nick held my boobs today Like my BOOBS Great commentary Corrine -@BrittJayne28
Nick held my boobs today My BOOBS -Corinne future CEO and successful business woman -@Swainsch
911 yes hello Nick is wearing a camo button down -@monica_aldean
Oh man Forgot how much Nick likes to wear jeggings -@Uve_Been_Duped
Nick looks like a motor biker if all motor bikers were characters from Westside Story -@NealLovesYou
Nick and Kissing:
Well were only 10 minutes into TheBachelor and Nick has already kissed 6 girls -@Sammy10101
Nick is kissing everyone tonight He would have kissed Chris Harrison if he was on right now -@rhlederer
Nick is bored with this though Hes like can we kiss or something -@paigeDav
Nick giving Corinne the group date rose:
I have been so Team Nick but giving Corrine the rose makes me think twice -@liz_lolol
Nick gave the date rose to Corinne. Is he trying to prove hes scummy? -@Kinabutterjelly
Nick whyy I stood up for you. WHYYYY HER #byecorrine -@CadieNGetz
Liz and Nick, the perfect drinking game to blackout:
Did Liz hook up with Nick at Jade and Tanner's wedding? I can’t tell -@expecthexpectd
I'm taking a shot every time Liz mentions sleeping with Nick at the wedding #cirrhosis -@toni_nic0le
SHE MET NICK AT JADE AND TANNER'S WEDDING OMG HIW DO WE NOT KNOW THIS YET? -@charex07
General Nick Criticism:
The robots on westworld have more natural conversations than Nick & the women on #TheBachelor... -@seabass5555
Why are all these girls literally like 13 years younger than Nick -@whitmv_
Oh Nick pls try to act like you're looking for a wife and not your extra 15 minutes boy bye #ICant -@CarolinaGirlToo
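For anyone curious what LDA is doing to produce topics like the ones above, here is a toy version. To be clear about the assumptions: this is a bare-bones collapsed Gibbs sampler I wrote for illustration, not the software behind my results; the four mini-documents are made up, and the hyperparameters `alpha` and `beta` are arbitrary. In practice a library such as gensim or scikit-learn would be the sensible choice.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # words per topic, per doc
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # total words per topic
    assign = []
    # Start from a random topic for every word.
    for d, doc in enumerate(docs):
        assign.append([])
        for w in doc:
            z = rng.randrange(n_topics)
            assign[d].append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    # Repeatedly resample each word's topic given all the other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Made-up mini-documents in the spirit of the Nick tweets.
docs = [
    "nick kissed six girls already".split(),
    "nick is kissing everyone tonight".split(),
    "nick is wearing a camo button down".split(),
    "nick likes to wear jeggings".split(),
]
doc_topic, topic_word = lda_gibbs(docs)
for k, words in enumerate(topic_word):
    print(k, [w for w, c in words.most_common(3) if c > 0])
```

With four tiny documents the topics will not be meaningful; the point is only that each word gets assigned to a topic based on which topics its document and its fellow words already favor, so smaller themes can survive next to the loud ones.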
So all of this criticism brings up the question: would Luke have been a better bachelor? There are a lot of people who think so:
If Liz had just given Nick her number all those months ago then LUKE could've been our rightful bachelor TheBachelor -@emilynrutt
Luke wouldn't have given the rose to Corinne -@daniip13
Danielle and Luke would have been the cutest couple -@RachelJ614
I can't even watch the bachelor anymore -worst two episodes I’ve ever seen. Should have chosen Luke #trash -@lindseyytanner
I assumed that these people would be more focused in the south, but they are pretty spread out across the country with a huge hub in Texas. The below map shows the breakdown of tweets about Luke by state (retweets included).
We’ll see if Nick wins over the naysayers in weeks to come.
A few of the women have won over the haters this week:
Raven, Taylor, and Christen have really come up in the sentiment rankings. Danielle M is still a fan favorite, Corinne and Liz are still very hated, and we didn’t see much of anyone else this week.
Just a note, the next two episodes will have a combined analysis because I’ll be out of the country next week, hopefully I don’t miss the chance to watch The Most Dramatic Episode Ever live, but with the Liz drama probably wrapping up this week I think I’m safe.
Week one of The Bachelor did not disappoint. It was an episode filled with moments that are bound to provoke deep conversation, contestants with very clear villain and fan favorite edits, and an A+ bachelor. Let’s look at the data.
I collected ~168k tweets in a period of 4 hours and 15 minutes. I took out all retweets from analysis this week, this left me with a set of 95,075 tweets. Below is the number of tweets by minute over the period my tweet streamer was active (times in EST).
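Dropping retweets and bucketing by minute is simple enough to sketch. The record layout here (a `text` field and a `created_at` string), the "RT @" prefix check, and the timestamps are all assumptions made for the example, not a description of my actual pipeline.

```python
from collections import Counter
from datetime import datetime

# Made-up records; the field names and timestamps are hypothetical.
tweets = [
    {"text": "RT @someone: Shark or dolphin??", "created_at": "2017-01-02 21:00:20"},
    {"text": "Shark or dolphin??", "created_at": "2017-01-02 21:00:45"},
    {"text": "Rachel gets the first impression rose", "created_at": "2017-01-02 21:34:10"},
]


def is_retweet(tweet):
    """Treat anything starting with 'RT @' as a retweet (a simplifying assumption)."""
    return tweet["text"].startswith("RT @")


def tweets_per_minute(tweets):
    """Count original (non-retweet) tweets in each HH:MM bucket."""
    return Counter(
        datetime.strptime(t["created_at"], "%Y-%m-%d %H:%M:%S").strftime("%H:%M")
        for t in tweets
        if not is_retweet(t)
    )


print(tweets_per_minute(tweets))
```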
Obviously the west coast needs to step up its live tweeting game. Because they showed such a disappointing live tweet performance, I eliminated their tweets from the tweet count timeline. If you’re offended by the lack of west coast topics, tune in next week when I take a closer look at regional differences between tweet content.
In case you don’t remember exactly what was happening in the episode to cause the peaks of tweets, my dad took extremely detailed notes. It was the first episode he’s ever seen, but I think his timeline really captures what people are tweeting about at these popular times. Here are some excerpts of his timeline that correspond to the peak tweet times.
8:01:20 I'm nick and I'm the bachelor -- tastes funny coming out of the mouth
8:10:53 Nick gets advice from Ben Chris and Sean
8:12:57 Trust yourself....Be Nick
8:25:14 She [Corrine] has a nanny
8:38:34 Yellow dancer -- Christine [Christien]
8:39:13 Taylor in maroon
8:43:50 Hailey short intro
8:44:06 Lame joke [Do you know what a girl wearing underwear says?]
8:56:36 Red dresses
9:00:20 Shark or Dolphin??
9:12:47 Girl talk --what a hoe [Corinne]
9:22:59 Liz and Nick are talking
9:34:10 Gives the first impression rose to Rachel
9:46:34 Hailey gets rose
9:46:55 Whitney get rose
9:51:28 Stay tuned for exciting highlights
9:59 ??? [Timeline cuts off at 9:51]
Obviously my Bachelor rookie dad didn’t realize the best part of the premiere is the “This season on the Bachelor…” promo.
A quick look at the data answers the question of what people were tweeting about at 9:59
My vagiiiiiineeeee is platinum DYING -@NikNacNicky
Her vageeennn is platinum -@HollywoodTony28
Omg Did Corinne just say that -@JMP119
My heart is gold but my vajeen is platinum. Corinne what would your nanny think -@monicakwatson
Oh right, the highlight of the premiere:
Oh Corinne, I don’t know what happened to you to make you the perfect Bachelor contestant. This prompts the question, what is the proper spelling of the word “vagine”? To answer I made a word cloud.
I also wanted to look at what people thought of this season’s contestants and determine some early fan favorites.
I used text tagging to separate out tweets by contestant. With text tagging I could define rules to classify each of the contestants’ tweets. Many people don’t tweet about the contestant by name, so I had to dive into the personality of each contestant.
Example rule for a singular name:
To separate the two Danielles I used different rule sets.
Even with this deep analysis there were still tweets that didn’t fall into Danielle L or Danielle M’s categories, so there is also an Ambiguous Danielle tag.
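The tagging rules can be approximated with regular expressions. The patterns below are a hypothetical sketch: the cue words (nanny, shark, dolphin) come from moments mentioned in this post, but the exact rules I used were more involved, and the names `RULES` and `tag_tweet` are just for the example.

```python
import re

# Hypothetical rule set: one pattern per contestant tag.
RULES = {
    "Corinne": re.compile(r"\bcorr?inn?e\b|\bnanny\b", re.I),        # also catches "Corrine"
    "Alexis": re.compile(r"\balexis\b|\bshark\b|\bdolphin\b", re.I),
    "Danielle M": re.compile(r"\bdanielle\s+m\b", re.I),
    "Danielle L": re.compile(r"\bdanielle\s+l\b", re.I),
}
ANY_DANIELLE = re.compile(r"\bdanielle\b", re.I)


def tag_tweet(text):
    """Return every contestant tag whose rule matches the tweet."""
    tags = [name for name, pattern in RULES.items() if pattern.search(text)]
    # A Danielle mention that no specific rule claimed goes to the catch-all tag.
    if ANY_DANIELLE.search(text) and not any(t.startswith("Danielle") for t in tags):
        tags.append("Ambiguous Danielle")
    return tags


print(tag_tweet("Shark girl got skills And gills"))
print(tag_tweet("Danielle is a host from westworld"))
```

The catch-all is what keeps a bare "Danielle" from being silently credited to the wrong woman.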
To determine fan favorites and contestants fans love to hate I used Sentiment Analysis. Sentiment Analysis scores each tweet based on a dictionary of positive and negative words and adds the word scores to determine if a tweet is positive, negative, or neutral.
Here are some examples of the scoring process:
Vanessa looks very fertile -@StevenWoahdick
Shark girl got skills And gills -@SarahJulson
Positive tweets have scores greater than 0. These tweets are positive because the words skills and fertile each have a score of 1. Since both tweets have one positive word they both have a score of 1.
I do not like Josephine. Do not. Not at all. She's crazy. Mark my words AND THIS BITCH BROUGHT A HOT DOG OMG I TOLD YOU CRAZY -@aMyLyNn1984
Negative tweets have scores less than 0. This tweet is negative and has multiple words with a score of -1: crazy x 2, not like, bitch. If you add these -1 scores together you get the score of the tweet, -4.
Danielle is a host from westworld -@jvandegriff92
Corinne is gorgeous and successful but her nasty attitude will be her downfall -@paper_canyon
Neutral tweets have scores of 0. The first tweet has no words in the sentiment dictionary so its default score is 0. The second tweet has positive and negative words whose scores add up to 0: gorgeous 1, successful 1, nasty -1, downfall -1.
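The scoring walked through above fits in a few lines. This sketch assumes a tiny lexicon containing only the words from those examples and matches single words and two-word phrases; the real dictionary is far larger, and `score_tweet` and `label` are names I picked for the illustration.

```python
import re

# Only the entries needed for the worked examples (an assumption, not the full dictionary).
SENTIMENT = {
    "skills": 1, "fertile": 1, "gorgeous": 1, "successful": 1,
    "crazy": -1, "bitch": -1, "not like": -1, "nasty": -1, "downfall": -1,
}


def score_tweet(text, lexicon=SENTIMENT):
    """Sum the scores of every single word and two-word phrase found in the lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return sum(lexicon.get(term, 0) for term in tokens + bigrams)


def label(score):
    """Map a numeric score to positive, negative, or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


josephine = ("I do not like Josephine. Do not. Not at all. She's crazy. Mark my words "
             "AND THIS BITCH BROUGHT A HOT DOG OMG I TOLD YOU CRAZY")
for tweet in ["Vanessa looks very fertile", josephine, "Danielle is a host from westworld"]:
    s = score_tweet(tweet)
    print(s, label(s))
```

Longer phrases such as three-word entries would need the same treatment with trigrams.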
There is not one perfect sentiment dictionary for all situations, different types of text need different sentiment dictionaries, and for bachelor tweets the dictionary needed a lot of adjustments. In the base sentiment dictionary words like dope, shark, and damn had negative scores. In the world of bachelor tweets none of these words are necessarily negative, so I added and subtracted words from the base sentiment dictionary to make a custom bachelor dictionary.
Here are a few examples of words and scores that I added:
(‘kween’, 1), (‘queen’, 1), (‘my girl’, 1), (‘front runner’, 1), (‘sociopath’, -1), (‘ho’, -1), (‘giving me life’, 1), (‘smh’, -1), (‘da fuq’, -1), (‘bachelorette’, 1), (‘yaaaas’, 1), (‘yass’, 1), (‘trump supporter’, -1), (‘psycho’, -1)
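Building the custom dictionary amounts to dropping some base entries and layering new ones on top. In this sketch the base scores are made up (the post only establishes that dope, shark, and damn were negative), the added entries are a subset of the list above, and treating "subtracted" as outright removal is my simplification.

```python
# Hypothetical slice of a base dictionary; the scores are assumptions.
BASE = {"dope": -1, "shark": -1, "damn": -1, "crazy": -1, "gorgeous": 1}

# Words that are not reliably negative in Bachelor tweets.
REMOVED = {"dope", "shark", "damn"}

# A few of the Bachelor-specific additions listed above.
ADDED = {
    "kween": 1, "queen": 1, "my girl": 1, "front runner": 1,
    "sociopath": -1, "ho": -1, "giving me life": 1, "smh": -1,
    "yaaaas": 1, "yass": 1, "psycho": -1,
}


def build_lexicon(base, removed, added):
    """Drop unwanted base words, then add or override with custom entries."""
    lexicon = {word: score for word, score in base.items() if word not in removed}
    lexicon.update(added)
    return lexicon


bachelor_lexicon = build_lexicon(BASE, REMOVED, ADDED)
print(sorted(bachelor_lexicon.items()))
```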
I ran sentiment analysis on all of the tweets with contestant tags, put it in Tableau, and made the below visualization which shows number of tweets and the sentiment breakdown by contestant. Red is negative sentiment, orange is neutral, and green is positive.
I also extracted some popular phrases for some of the most talked about contestants. I didn’t include some sentiments on some contestants because there wasn’t enough consensus, i.e. Vanessa did have negative feedback, but people didn’t give the same negative feedback so there weren’t popular negative phrases associated with her.
Alexis was by far the most talked about contestant, her appearance sparked the Shark or Dolphin debate heard ’round the world. I wouldn’t call her a fan favorite because of the fairly equal amount of positive and negative tweets.
Early front runners seem to be Rachel, Vanessa, and Danielle M. Danielle L, Sarah and Raven also look like early fan favorites. They weren’t as talked about, but they do have a very high proportion of positive feedback and low negative feedback. It’s likely the next bachelorette will come from this group of women, Rachel in particular is getting a lot of bachelorette buzz.
Corinne, Liz, and Josephine seem to be the most controversial with higher negative tweets than positive. Taylor could potentially join that group in the future, her positive tweets barely beat out the negative.
If you have some analysis you would like to see in the future, think my analytics are total crap, or want to know my thoughts about the shark vs dolphin debate, leave a comment or email me at email@example.com
Since I learned that analytics could be applied to television I knew that’s what I wanted to do, but I quickly learned interesting data about television is hard to acquire for a nonprofessional television and analytics enthusiast. A few months ago I got put on a project at work pulling and analyzing tweets for a company around the same time as speculation started about who was going to be the next bachelor. It finally occurred to me that I had access to a huge repository of data about television, social media, twitter in particular.
I wanted to know what people were actually saying about the next bachelor and break it down a little more than the one sided view I was getting on the internet. At that time people were mostly tweeting about Bachelor in Paradise with the #TheBachelor hashtag and with the Twitter API limits if I just used the standard hashtag I would get a very small number of tweets about the next Bachelor. Fortunately Bachelor creator Mike Fleiss started tweeting clues about the next bachelor, so I pulled all of the tweets in response to his clues, did some simple sentiment analysis, and came up with this:
Not exactly earth shattering stuff, I didn’t even standardize the axes, but I had a lot of fun doing it. I currently work in a consulting type role, so the data I analyze is from a variety of industries. I never get the opportunity to be a subject matter expert in any of the data that I work with. I would consider myself a Bachelor Franchise subject matter expert, and it was exciting to actually go into a data set and know what I was looking at right away. I loved it and I wanted to do more of it, this time with bigger data sets and more analysis.