Does Roy Kent Drop More F-Bombs When Coaching or Dating?

An investigation into the verbal tendencies of one of Ted Lasso's favorite characters

Sep 30, 2023

Every Tuesday, the R for Data Science community publishes a new “Tidy Tuesday” dataset that people can use to produce interesting graphics, build models, or just practice their data wrangling skills.

This week’s Tidy Tuesday data comes from the hit TV show Ted Lasso: someone documented the number of times one of the show’s stars, Roy Kent, drops the F-bomb in each episode. Alongside the counts, the dataset records a few other details about each episode, such as whether or not Roy was dating and/or coaching at the time.

The inspiration for this writeup came from Julia Silge’s recent YouTube video, where she demonstrated how bootstrap resampling can help produce better-informed statistics and confidence intervals, particularly when sample sizes are small. In the code below, I will show my own implementation of the same approach using bootstrapping and Poisson regression.

First, I’ll load the data and do some minor variable name cleanup.

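(* Import the CSV, pair each row with the header row, and build one association per episode *)
data = Import[FileNameJoin[{NotebookDirectory[], "Roy_Kent.csv"}]] //
    Function[Function[row, Thread[First[#] -> row]] /@ Rest[#]] //
    Map[Association];

(* Rename the columns of interest; all other keys pass through unchanged *)
data = data //
    Map[KeyMap[Replace[{"Dating_flag" -> "DatingQ",
      "Coaching_flag" -> "CoachingQ", "F_count_RK" -> "FCount"}]]];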

Once we have the data, we see that it has only 34 rows, one for each of the 34 episodes of Ted Lasso so far.

If we start to look at the counts of F-Bombs, we can get an idea of the potential distributions in three scenarios:

Overall
When Roy is coaching
When Roy is dating

Looking at these distributions, we start to see potential evidence that the counts may be higher when Roy starts coaching. But given the low sample size, how sure can we actually be that this is “real”?

(* "Overall" uses every episode; the coaching and dating groups may overlap *)
Histogram[
 <|"Overall" -> Lookup[data, "FCount"],
  "Coaching" -> Lookup[Select[data, #CoachingQ === "Yes" &], "FCount"],
  "Dating" -> Lookup[Select[data, #DatingQ === "Yes" &], "FCount"]|>, {1},
 ChartLegends -> Automatic, Frame -> {{True, False}, {True, False}},
 FrameLabel -> {"Number of F-Bombs", "Count"}]

This is where bootstrap resampling comes in. Bootstrap resampling is a statistical technique used to estimate the uncertainty or variability of a statistic or model parameter by repeatedly sampling from your data with replacement. It's a relatively simple concept with powerful applications.
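
As a minimal sketch of the idea, separate from the regression used in this post, the snippet below bootstraps the mean of a small made-up sample. The `sample` values and the names `bootMeans` and `ci` are illustrative assumptions and are not part of the Roy Kent data.

```wolfram
SeedRandom[1234];
(* Hypothetical toy sample, for illustration only *)
sample = {2, 5, 3, 8, 4, 6, 1, 7};
(* Draw 1000 resamples with replacement, each the same size as the sample,
   and record the mean of each one *)
bootMeans = Table[Mean[RandomChoice[sample, Length[sample]]], 1000];
(* Percentile interval: the middle 95% of the bootstrap means *)
ci = Quantile[bootMeans, {0.025, 0.975}] // N
```

The spread of `bootMeans` is the bootstrap's estimate of how much the mean would vary from sample to sample.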

In this case, we’ll combine bootstrap resampling with Poisson regression to estimate how dating and coaching each relate to the F-Bomb count while controlling for the other. Poisson regression is a better fit here than linear regression because the F-Bomb counts are non-negative integers rather than continuous values.
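
Written out, the model takes roughly the following form, assuming the usual log link for a Poisson regression. The symbols $\beta_0$, $\beta_1$, $\beta_2$ and the indicator notation are introduced here for illustration.

```latex
\mathrm{FCount}_i \sim \mathrm{Poisson}(\lambda_i), \qquad
\log \lambda_i = \beta_0
  + \beta_1\,\mathbb{1}[\mathrm{dating}_i = \text{Yes}]
  + \beta_2\,\mathbb{1}[\mathrm{coaching}_i = \text{Yes}]
```

Under this form, a coefficient of 0 means no change in the expected count, and $e^{\beta}$ is the multiplicative change in the expected count when the corresponding flag is "Yes".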

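SeedRandom[1234];
(* Columns: the two predictors first, the response (FCount) last *)
rows = Values[data[[All, {"DatingQ", "CoachingQ", "FCount"}]]];
{datingYes, coachingYes} = Transpose@Table[
    (* Resample the episodes with replacement and refit the model *)
    fit = GeneralizedLinearModelFit[
      RandomChoice[rows, Length[rows]], {dating, coaching}, {dating, coaching},
      ExponentialFamily -> "Poisson",
      NominalVariables -> {dating, coaching}];
    (* Parameter 1 is the intercept; 2 and 3 are the dating and coaching coefficients *)
    fit["BestFitParameters"][[{2, 3}]]
    , 1000];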

Now, by looking at our distributions of the estimate for F-Bomb count when Roy is dating and when he is coaching, we can start to see a very clear (and more interpretable) difference.
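
One way to draw these distributions is sketched below. It assumes the `datingYes` and `coachingYes` lists produced by the bootstrap code, and the options simply mirror the earlier histogram.

```wolfram
(* Overlay the 1000 bootstrap estimates for each coefficient *)
Histogram[<|"Dating" -> datingYes, "Coaching" -> coachingYes|>,
 ChartLegends -> Automatic, Frame -> {{True, False}, {True, False}},
 FrameLabel -> {"Bootstrap coefficient estimate", "Count"}]
```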

The estimate for when Roy is dating appears to be centered around 0. In fact, if we take the 2.5th and 97.5th percentiles of the bootstrap estimates, we get a 95% confidence interval, and because that interval contains 0, the effect is NOT statistically significant. In other words, we have no evidence that the number of F-Bombs Roy drops depends on whether or not he’s dating.

In contrast, the estimate for when Roy is coaching behaves very differently. The values are nearly all positive. Taking the same 2.5th and 97.5th percentiles to create a 95% confidence interval, we see that 0 is not contained in the interval, so the estimate is statistically significant.

This means that when Roy is coaching, he tends to drop more F-Bombs.
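
The percentile intervals described above can be computed directly from the bootstrap estimates. This is a minimal sketch that assumes the `datingYes` and `coachingYes` lists from the bootstrap code; the names `datingCI`, `coachingCI`, and `excludesZeroQ` are introduced here for illustration.

```wolfram
(* 95% percentile intervals from the bootstrap estimates *)
datingCI = Quantile[datingYes, {0.025, 0.975}]
coachingCI = Quantile[coachingYes, {0.025, 0.975}]

(* Hypothetical helper: True when the interval {lo, hi} excludes 0 *)
excludesZeroQ = Function[ci, ! (First[ci] <= 0 <= Last[ci])];
excludesZeroQ /@ {datingCI, coachingCI}
```

Because the coefficients are on the log scale under a log link, `Exp[coachingCI]` would express the coaching interval as a multiplicative change in the expected F-Bomb count.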

If you’re a fan of the show, this shouldn’t really come as a surprise, but it’s interesting nonetheless.

Data and Discipline
