More than blue jeans, trucks, and beer?

An exploration of country music lyrics

Data analysis: Darby Schumacher | Data visualization: Ian Jones

When we listen to country music, we hear certain themes repeated over and over and over: love for God, love for America, driving trucks, country roads, falling in love, wearing blue jeans, etc. But are these themes really as prevalent as they seem?

Using data analysis and visualization techniques, we want to identify words that occur more often in country music than in an amalgamation of other popular genres in order to answer the question: what words make up the country music genre?
First, we're going to rank words that are most common in country lyrics. "Baby", for example, appears 49 times for every 10,000 words. Looking only at these words, it seems that "baby" is used in country the most. But what we really care about in our analysis is usage in country compared to usage in all music lyrics.

We want to determine words that make "country" feel "country", that is, words that have high frequency in country and low frequency everywhere else.

A lot of this analysis focuses on comparing lyric occurrence between different categories. First, we'll look at lyric occurrence in country vs. all other genres.
In order to compare the two categories, we first compiled a "country" corpus and an "all other genres" corpus. To create the "country" corpus, we took songs from the Billboard Hot Country charts from December 2013 - November 2018; to create the "all other genres" corpus, we took songs from the Billboard Hot 100 charts (removing any songs that appeared in the Billboard Hot Country charts) during the same time period.

This resulted in a grand total of 3,044 songs (about 1.1 million words), whose lyrics we then gathered from Genius Lyrics.
As you can see, although "baby" occurs in country the most out of these few words, it also occurs frequently in all other genres.

Interestingly, although "trailer" appears much less frequently in both categories, it occurs much more frequently in country music than in all other genres. This is what we care about.
Now you can see all the words we measured. Dots farther to the right are more popular in country. Dots farther up are more popular in other genres.

Keep scrolling to see some interesting trends we found (don't worry, you'll still be able to play around with the graph).
The "Least Country" words
Rank | Word | Odds not in country | Country word count | Other word count
1 | b***h | 353:1 | 2 | 2631
2 | ayy | 291:1 | 1 | 1084
3 | f**k | 207:1 | 3 | 2309
4 | ho | 88:1 | 1 | 328
5 | trap | 70:1 | 1 | 260
6 | bounce | 47:1 | 1 | 174
7 | mi | 44:1 | 1 | 162
8 | plug | 42:1 | 1 | 157
9 | boom | 41:1 | 1 | 151
10 | vibe | 40:1 | 1 | 149

Notice anything about these particular words? Sound like words that are pretty common in hip-hop songs, right? We think this is really interesting: looking into the list of "least country words", it appears that country and hip-hop are pretty much opposites when it comes to common word usage.
This is the one you've all been waiting for. We present the top 10 "country" words.

The "Most Country" words
Rank | Word | Odds in country | Country word count | Other word count
1 | beer | 106:1 | 143 | 5
2 | southern | 96:1 | 77 | 3
3 | whiskey | 58:1 | 187 | 12
4 | homegrown | 37:1 | 30 | 3
5 | boots | 35:1 | 102 | 11
6 | hometown | 34:1 | 45 | 5
7 | Tennessee | 26:1 | 48 | 7
8 | gravel | 23:1 | 19 | 3
9 | sunset | 22:1 | 53 | 9
10 | Carolina | 21:1 | 23 | 4

Pretty much what we expected, huh? Take some time to explore the visualization. It's particularly interesting to look at the words that hug the x and y axes: these points correspond to words that are virtually absent from one of the corpora (points on the x axis appear very infrequently in the "other genres" corpus, while points on the y axis appear very infrequently in the "country" corpus).

[Interactive chart: What lyrics are "Most Country"? (occurrences per 10,000 lyrics)]

After comparing country songs with other genres, we wanted to look more into the country genre in particular. We wondered if there was a difference in lyric usage between male and female country artists.

So we crunched some numbers and came up with some pretty neat findings.
Here's that line plot from earlier, only this time, we're plotting a couple of the more frequent country words (remember 'beer' and 'gravel'?), as well as one not-so-frequent one.

Notice that these points correspond to the words' occurrences in the country corpus in general. Now remember what this looks like...
OK, so what just happened? Now we're looking at the occurrences of these words in lyrics by male artists vs. in lyrics by female artists. Notice the discrepancy between the line plot above and this graph: although 'beer' and 'gravel' both had extremely high odds of appearing in country music compared to other genres, when we look at their occurrences within country in particular, male artists tend to use the word 'beer' much more than female artists do.

Furthermore, even though 'blessed' doesn't appear very frequently in country lyrics, when it does, there is a high likelihood that it was used by a female artist.
Similar to before, words that appear closer to the x axis appear more frequently in lyrics by female artists, and words that appear closer to the y axis appear more frequently in lyrics by male artists.
Before you see the lists of words that are used most frequently by male and female artists, respectively, take some time to explore the graph. We recommend looking up words that appeared very frequently in the country corpus and seeing how their usage varies between male and female artists.
Here's the list of words that are used most frequently by male country artists compared to female country artists. Interestingly, 'beer' is still at the top (they really like their beer).

The "Most Male" country words
Rank | Word | Odds in male | Male word count | Female word count
1 | beer | 12:1 | 139 | 2
2 | blame | 6:1 | 72 | 2
3 | body | 6:1 | 70 | 2
4 | turned | 6:1 | 66 | 2
5 | gettin | 5:1 | 64 | 2
6 | loud | 5:1 | 62 | 2
7 | tshirt | 5:1 | 58 | 2
8 | top | 5:1 | 116 | 4
9 | ones | 5:1 | 87 | 3
10 | taste | 5:1 | 55 | 2
And now, the words used most frequently by female country artists compared to male country artists. If you notice that the word counts are exceptionally low for most of these words, that's because only a small percentage of artists on the Billboard Hot Country charts are female (just 14 percent!).

The "Most Female" country words
Rank | Word | Odds in female | Female word count | Male word count
1 | tux | 50:1 | 17 | 2
2 | runaway | 35:1 | 17 | 2
3 | horse | 29:1 | 12 | 2
4 | toy | 25:1 | 15 | 3
5 | mmmm | 22:1 | 17 | 4
6 | queens | 19:1 | 19 | 5
7 | boyfriend | 19:1 | 13 | 4
8 | smokin | 18:1 | 19 | 6
9 | biscuits | 18:1 | 6 | 2
10 | giddy | 21:1 | 6 | 2

[Interactive chart: How does lyric usage vary with gender? (occurrences per 10,000 lyrics)]

Now that all that analysis is behind us, you probably want to actually listen to some music, huh?

We did one final bit of analysis on these datasets: in order to create a ranking of the "most country" songs, we used a measure called "cosine similarity" to determine which songs most closely resemble the country corpus. From these results, we've compiled our Top 15 chart. But before we show you that, here's the "most country" Hot 100 song:
The Top Non-Country "Most Country" Song
Thanks for making it through this journey with us. As promised, here's our Top 15 chart. We hope you enjoy:
The Top Fifteen "Most Country" Songs
Methodology Notes
All lyric data was compiled from five years (December 2013 to November 2018) of songs that charted on the Billboard Hot 100 and Hot Country charts.

Using the GeniusR package, we scraped the lyrics for each song and built a corpus for each chart containing the number of occurrences of each word. We removed all songs from the Hot 100 corpus that also charted on Hot Country. For each word in each corpus, we calculated its occurrences per 10,000 words.
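
For readers who want to try something similar, here is a minimal Python sketch of the steps just described: drop crossover songs, count words per corpus, and convert counts to occurrences per 10,000 words. It is not our original code (that was written in R), and the song titles, lyrics, and function names are made up for illustration. The text above does not define how the "odds" columns were computed, so the `odds` function assumes they are simply the ratio of the two per-10,000 rates.

```python
import re
from collections import Counter

def tokenize(lyrics):
    """Lowercase the lyrics and split them into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", lyrics.lower())

def build_corpus(songs):
    """Count every word across a list of (title, lyrics) pairs."""
    counts = Counter()
    for _title, lyrics in songs:
        counts.update(tokenize(lyrics))
    return counts

def per_10k(counts):
    """Convert raw word counts to occurrences per 10,000 words."""
    total = sum(counts.values())
    return {word: 10_000 * n / total for word, n in counts.items()}

def odds(word, rates_a, rates_b):
    """ASSUMPTION: 'odds' = rate in corpus A divided by rate in corpus B.
    Returns None when the word is missing from either corpus."""
    if word not in rates_a or word not in rates_b:
        return None
    return rates_a[word] / rates_b[word]

# Hypothetical toy charts, not real Billboard data.
country_chart = [
    ("Song A", "cold beer on a gravel road"),
    ("Song B", "beer and boots baby"),
]
hot_100_chart = [
    ("Song B", "beer and boots baby"),  # crossover hit, also on the country chart
    ("Song C", "baby baby feel the vibe"),
    ("Song D", "one beer then we bounce baby"),
]

# Remove any Hot 100 song that also appears on the country chart.
country_titles = {title for title, _ in country_chart}
other_chart = [(t, l) for t, l in hot_100_chart if t not in country_titles]

country_counts = build_corpus(country_chart)
other_counts = build_corpus(other_chart)
country_rates = per_10k(country_counts)
other_rates = per_10k(other_counts)

beer_odds = odds("beer", country_rates, other_rates)
print(f"beer is {beer_odds:.1f} times as frequent in the toy country corpus")
```

Words that never appear in one corpus have no defined ratio here, which is why the sketch returns None for them; a real analysis would need to decide whether to filter such words out or smooth the counts.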

We used JavaScript's D3 library to implement our visualizations.
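
The "most country" song ranking relied on cosine similarity between each song and the country corpus. The Python sketch below shows one way that comparison can work on raw word counts. It is an illustration under assumptions, not our original implementation: the notes above do not say whether counts, rates, or some other weighting fed the measure, and the songs and corpus here are invented.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors (dict-like)."""
    dot = sum(n * b.get(word, 0) for word, n in a.items())
    norm_a = math.sqrt(sum(n * n for n in a.values()))
    norm_b = math.sqrt(sum(n * n for n in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical toy data: a tiny "country corpus" and two made-up songs.
corpus = Counter("beer whiskey boots beer truck baby".split())
songs = {
    "Song X": Counter("beer beer whiskey".split()),
    "Song Y": Counter("baby vibe bounce".split()),
}

# Rank songs from most to least similar to the corpus.
ranked = sorted(
    songs,
    key=lambda title: cosine_similarity(songs[title], corpus),
    reverse=True,
)
for title in ranked:
    print(title, round(cosine_similarity(songs[title], corpus), 3))
```

Because cosine similarity looks only at the direction of the vectors, a short song and a long song with the same mix of words score the same, which makes it a reasonable fit for comparing single songs against a much larger corpus.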