Deeper Look into Hamilton, the Musical

Source: Majestic Theatre

Text Analysis

Rebecca Wycoff

The musical Hamilton runs through a full range of emotions, touching on war, revolution, love, time, and betrayal. Created by Lin-Manuel Miranda, it tells the story of Alexander Hamilton, one of America’s Founding Fathers, through a diverse cast that brings historical figures to life with a blend of hip-hop, R&B, and traditional show tunes. The narrative highlights not only Hamilton’s rise from humble beginnings as an orphan in the Caribbean but also his pivotal role in shaping the nation’s financial systems and political landscape during the Revolutionary War and the early years of the United States. In Act I, Hamilton rises from those beginnings to become a key figure in the American Revolution, driven by ambition and determination. In Act II, he faces personal and political challenges as that ambition leads to conflict, scandal, and his downfall. From this, we can hypothesize that Act I will carry a more positive sentiment than Act II, and that the female characters will be more positive than the male characters.

This project tests that hypothesis through sentiment analysis of the musical’s lyrics, exploring average sentiment scores for individual songs, acts, and characters, with particular attention to gender differences and emotional shifts across the narrative. It also examines the phrase “Alexander Hamilton” itself, showing how other characters connect to him through his name.

The data set comes from GitHub and can be found here:

Hamilton Lyrics

The project begins by loading necessary packages, tokenizing the lyrics, filtering out common stop words, and counting word frequencies to identify prominent themes.

Code
# Load packages for data wrangling, tokenization, word clouds, and interactive plots
library(tidyverse)
library(tidytext)
library(devtools)
library(wordcloud2)
library(plotly)

library(readr)
ham_lyrics <- read_csv("ham_lyrics.csv")

# Tokenize the lyrics into one word per row
ham_lyrics |> 
  unnest_tokens(word, lines) -> ham_words

# Remove stop words and count the most frequent remaining words
ham_words |> 
  anti_join(stop_words) |> 
  count(word, sort = TRUE)
# A tibble: 2,464 × 2
   word          n
   <chr>     <int>
 1 da           89
 2 wait         81
 3 time         77
 4 hamilton     75
 5 hey          69
 6 burr         63
 7 shot         58
 8 sir          56
 9 alexander    50
10 whoa         42
# ℹ 2,454 more rows

Being a musical, the lyrics contain a lot of filler vocables that do not contribute to the meaning of the show or its overall sentiment. Once those musical fillers were removed, I created a word cloud using the top 180 remaining words.

Code
# Drop filler vocables, remove stop words, and keep the 180 most frequent words
ham_words |> 
  filter(!word %in% c('da', 'whoa', 'hey', 'ha', 'em', 'ya', 'ooo',
                      'aaa', 'dat', 'uh', 'ooh', 'yo')) |> 
  anti_join(stop_words) |> 
  count(word, sort = TRUE) |> 
  head(180) -> ham_top_words

# Plot the top words as a star-shaped word cloud
wordcloud2(ham_top_words, shape = 'star', size = 0.33)

From there, I organized the data by assigning each song a “track” number and an act, as well as giving each speaker a corresponding gender.

Code
# Number each song by order of appearance, then rename that column to 'track'
ham_words$order <- match(ham_words$title, unique(ham_words$title)) 
colnames(ham_words)[4] <- 'track'

# Score each word with the AFINN lexicon (unmatched words are dropped)
ham_words |> 
  arrange(track) |> 
  inner_join(get_sentiments('afinn')) -> ham_sentiment

# Label each song with its act
ham_sentiment |> 
  mutate(act = ifelse(title %in% c(
    "Alexander Hamilton", "Aaron Burr, Sir", "My Shot", 
    "The Story of Tonight", "The Schuyler Sisters", "Farmer Refuted", "You'll Be Back", "Right Hand Man", 
    "A Winter's Ball", "Helpless", "Satisfied", "The Story of Tonight (Reprise)", 
    "Wait For It", "Stay Alive", "Ten Duel Commandments", "Meet Me Inside", 
    "That Would Be Enough", "Guns and Ships", "History Has Its Eyes On You", 
    "Yorktown (The World Turned Upside Down)", "What Comes Next?", "Dear Theodosia", 
    "Non-Stop"), "Act I", "Act II")) -> ham_sentiment

# Assign each speaker a gender; "Mixed" covers company and ensemble numbers
ham_sentiment |> 
  mutate(gender = case_when(
    speaker %in% c('ANGELICA', 'DOLLY', 'ELIZA',
                   'MARIA', 'MARTHA', 'PEGGY', 'WOMEN', 'ALL WOMEN',
                   'ENSEMBLE WOMEN', 'FEMALE ENSEMBLE', 'TWO WOMEN', 'FEMALE VOTER 1') ~ "Female",
    speaker %in% c('BURR', 'GEORGE WACKER', 'HAMILTON', 'JAMES REYNOLDS', 'JEFFERSON', 
                   'KING GEORGE', 'LAFAYETTE', 'LAURENS', 'LEE', 'MADISON', 
                   'MULLIGAN', 'PHILIP', 'SEABURY', 'WASHINGTON', 'MEN', 'ENSEMBLE MAN', 'ENSEMBLE MEN', 'ALL MEN',
                   'MALE VOTER 1', 'MALE VOTER 2', 'TWO MEN VOTERS') ~ "Male",
    speaker %in% c('COMPANY', 'VOTERS', 'FULL COMPANY', 'ENSEMBLE', 'FULL COMPANY (EXCEPT HAMILTON)') ~ "Mixed"
  )) -> ham_sentiment

Sentiment by Act and Song

With this in place, I could dive deeper into the data, starting with sentiment by act and then by individual song.

Code
ham_sentiment |> 
  group_by(act) |> 
  summarise(avg_sentiment = mean(value)) |> 
  ggplot(aes(reorder(act, avg_sentiment), avg_sentiment, fill = act,
             text = paste( "average sentiment:", avg_sentiment, "<br>",
                         "act:", act))) + geom_col() + coord_flip() +
  labs(title = "Average Sentiment By Act",
       y = "Average Sentiment",
       x = NULL) +   theme(legend.position = "none") -> sent_act
  
ggplotly(sent_act, tooltip = "text")
Code
ham_sentiment |> 
  group_by(title, act) |> 
  summarise(avg_sentiment = mean(value), track = first(track)) |> 
  ggplot(aes(reorder(title, -track), avg_sentiment, fill = act,
                          text = paste("song:", title, "<br>",
                          "average sentiment:", avg_sentiment, "<br>",
                         "act:", act))) + geom_col() + coord_flip() +
  labs(title = "Average Sentiment By Song",
       y = "Average Sentiment",
       x = NULL) +
  theme(axis.text.y = element_text(size = 8)) -> sent_song

ggplotly(sent_song, tooltip = "text", height = 650)

Here we can see the first part of our hypothesis hold: the average sentiment of Act I is higher than that of Act II. Looking at the songs with the highest and lowest average sentiment, I was curious which words pulled their scores so far in either direction.
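
As a quick check before looking at those songs in detail, here is a minimal sketch (assuming the ham_sentiment tibble built above; slice_max() and slice_min() are only used to pull out the extremes) of how the highest- and lowest-scoring songs can be identified directly:

Code
ham_sentiment |> 
  group_by(title) |> 
  summarise(avg_sentiment = mean(value)) |> 
  slice_max(avg_sentiment, n = 1)   # song with the highest average sentiment

ham_sentiment |> 
  group_by(title) |> 
  summarise(avg_sentiment = mean(value)) |> 
  slice_min(avg_sentiment, n = 1)   # song with the lowest average sentiment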

In “Best of Wives and Best of Women,” Eliza expresses her love and support for Alexander Hamilton as they face personal and political challenges, capturing the depth of their relationship and her unwavering strength. “Hurricane” portrays Hamilton’s introspective moment as he reflects on his tumultuous life and the consequences of his choices, blending themes of regret and determination with a sense of urgency.

Code
# Words with AFINN scores in the highest- and lowest-scoring songs
ham_sentiment |> 
  select(title, speaker, value, word) |> 
  filter(title == "Best of Wives and Best of Women") |> 
  ggplot(aes(word, value, fill = value)) + geom_col() + coord_flip() +
  labs(title = "Words with Sentiment Value \n in 'Best Of Wives and Best of Women'",
       y = "Value",
       x = NULL) + theme(legend.position = "none") -> word_bestofwives
ggplotly(word_bestofwives)

ham_sentiment |> 
  select(title, speaker, value, word) |> 
  filter(title == "Hurricane") |>
  ggplot(aes(word, value, fill = value)) + geom_col() + coord_flip() +
  labs(title = "Words with Sentiment Value \n in 'Hurricane'",
       y = "Value",
       x = NULL) + theme(legend.position = "none") -> word_hurricane
ggplotly(word_hurricane)

The positive words center on love and caring, while the negative words focus on death and destruction.
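
To check whether the same pattern holds across the entire show rather than just these two songs, here is a rough sketch (assuming the ham_sentiment tibble above, where the contribution column is simply the AFINN value times each word’s frequency) of the words that push overall sentiment furthest in either direction:

Code
# Rank words by how much they contribute to total sentiment (value * frequency)
ham_sentiment |> 
  count(word, value, sort = TRUE) |> 
  mutate(contribution = n * value) |> 
  slice_max(abs(contribution), n = 20) |> 
  arrange(desc(contribution))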

Sentiment by Character and Gender

From here, we can look at each character in the show to see who speaks more positively and who more negatively, using the gender labels assigned earlier: Male, Female, and Mixed (for the ensemble and company).

Code
ham_sentiment |> 
  na.omit() |> 
  group_by(speaker, gender) |> 
  summarise(avg_sentiment = mean(value)) |> 
  ggplot(aes(reorder(speaker, avg_sentiment), avg_sentiment, fill = gender,
             text = paste("character:", speaker, "<br>",
                          "average sentiment:", avg_sentiment, "<br>",
                         "gender:", gender))) + geom_col() + coord_flip() +
  labs(title = "Average Sentiment by Speaker",
       y = "Average Sentiment",
       x = NULL) -> sent_speaker
ggplotly(sent_speaker, tooltip = "text", height = 650)

Here, it is interesting to see how many characters have an overall positive sentiment, and that the two most negative speakers are female. Even so, when the scores are averaged at the gender level, the more positive female characters offset the negative averages of “WOMEN” and “PEGGY.”
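
One way to see why that offsetting happens is to count how many AFINN-scored words each female speaker contributes - the gender average computed below is a word-level mean, so characters who sing more carry more weight (a small sanity check assuming the gender labels in the ham_sentiment tibble above):

Code
# Number of AFINN-scored words contributed by each female speaker
ham_sentiment |> 
  filter(gender == "Female") |> 
  count(speaker, sort = TRUE)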

Code
ham_sentiment |>
  na.omit() |> 
  group_by(gender) |> 
  summarise(avg_sentiment = mean(value)) |> 
  ggplot(aes(gender, avg_sentiment, fill = gender, 
             text = paste("gender:", gender, "<br>",
                          "average sentiment:", avg_sentiment))) + geom_col() + coord_flip() +
  labs(title = "Average Sentiment by Gender",
       y = "Average Sentiment",
       x = NULL) + theme(legend.position = "none") ->sent_gender
ggplotly(sent_gender, tooltip = "text")

I then wanted to take a closer look at the principal characters in the show, using ggplot to compare them to each other. In the code below, I estimate the standard error of each character’s sentiment score by taking the standard deviation and dividing it by the square root of the number of scored words they sing - this gives a sense of the variability in sentiment for each speaker.

Code
ham_sentiment |> 
  filter(speaker %in% c('ANGELICA', 'ELIZA',
                        'PEGGY', 'BURR', 'HAMILTON', 'JEFFERSON', 
                        'KING GEORGE', 'LAFAYETTE', 'LAURENS', 'MADISON', 
                        'MULLIGAN', 'PHILIP', 'WASHINGTON', 'MARIA')) |> 
  group_by(speaker) |>
  # standard error of the mean sentiment: sd divided by the square root of n
  summarise(avg_sentiment = mean(value),
            se = sd(value)/sqrt(n())) |>
  ggplot(mapping = aes(y = fct_reorder(speaker, avg_sentiment), avg_sentiment,
                       text = paste("character:", speaker, "<br>",
                          "average sentiment:", avg_sentiment))) +
  geom_pointrange(mapping = aes(
    xmin = avg_sentiment - 2 * se,
    xmax = avg_sentiment + 2 * se
  )) +
  labs(
    title = "Average Sentiment by Principal",
    x = "Average sentiment",
    y = NULL) +
  theme(legend.position = "none") -> avg_sent
ggplotly(avg_sent, tooltip = "text")

Source: Playbill

The Schuyler sisters—Angelica, Eliza, and Peggy—are central characters in “Hamilton,” each representing distinct traits and strengths. Angelica is witty and fiercely intelligent, serving as a confidante to both her sisters and Hamilton, while Eliza embodies warmth and loyalty, ultimately becoming Hamilton’s devoted wife. Peggy, the youngest, is less developed than her sisters. In the chart, we can see that, of the principal characters, Angelica speaks the most positively, while Peggy is the most negative.

Code
ham_sentiment |> 
  filter(speaker %in% "ANGELICA") |> 
  ggplot(aes(reorder(word, value), value, fill = value, 
             text = paste("word:", word, "<br>",
                          "value:", value))) + geom_col() + coord_flip() +
  labs(title = "Words with Sentiment Value \n Spoken by Angelica",
       y = "Value",
       x = NULL) -> word_angelica
ggplotly(word_angelica, tooltip = "text", height = 650)
ham_sentiment |> 
  filter(speaker %in% "PEGGY") |> 
  ggplot(aes(reorder(word, value), value, fill = value,
             text = paste("word:", word, "<br>",
                          "value:", value))) + geom_col() + coord_flip() +
  labs(title = "Words with Sentiment Value \n Spoken by Peggy",
       y = "Value",
       x = NULL) -> word_peggy
ggplotly(word_peggy, tooltip = "text")

Peggy’s negative average is dragged down heavily by three words: “bad,” “violence,” and “war.” She does not sing much, so the few scored words she does have weigh heavily in her average. Angelica, who stays active throughout the whole show, has her words spread more evenly. She scores highly with words such as “win,” “fun,” “praise,” “happy,” “rich,” “freedom,” and “satisfied,” and more negatively with words like “leave,” “suffering,” “regret,” and “hunger.”
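
A quick way to confirm how thin Peggy’s sample is (a small check assuming the ham_sentiment tibble above) is to count the scored words behind each sister’s average:

Code
# How many AFINN-scored words sit behind each average
ham_sentiment |> 
  filter(speaker %in% c("ANGELICA", "PEGGY")) |> 
  count(speaker, sort = TRUE)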

Alexander Hamilton

Alexander Hamilton is the focus of the production, although most of the musical is narrated by Aaron Burr. I wanted to focus on the name “Alexander Hamilton” itself to see what it reveals about his connections to the other characters.

I began by creating bigrams, both to capture the phrase “Alexander Hamilton” as a unit and to look at the words that come immediately before and after “Hamilton.”

Code
# Tokenize the lyrics into bigrams (pairs of consecutive words)
ham_lyrics |> 
  unnest_tokens(bigram, lines, token = 'ngrams', n = 2) |> 
  filter(!is.na(bigram)) -> ham_bigrams

# Split each bigram into its two component words
ham_bigrams |> 
  separate(bigram, c("word1", "word2"), sep = " ") -> bigrams_separated

# Keep only bigrams where neither word is a stop word
bigrams_separated |> 
  filter(!word1 %in% stop_words$word) |> 
  filter(!word2 %in% stop_words$word) -> bigrams_filtered
Code
bigrams_filtered |> 
  filter(word1 == "alexander", word2 == "hamilton") |> 
  count(speaker, sort = TRUE) |> 
  ggplot(aes(speaker,n, fill = n, 
             text = paste("character:", speaker, "<br>",
                          "number:", n))) + geom_col() +
  labs(title = "People who say 'Alexander Hamilton'", x = "Speaker", y = "Number of Times Said") -> alexander_ham
ggplotly(alexander_ham, tooltip = "text")

Looking at the count of the phrase “Alexander Hamilton,” we can see different characters’ relationships with him. It is striking that the full name is voiced most often by the company as a whole rather than by any single principal, and the sheer repetition underscores how central he is to the story.
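
As a rough companion to the bigram count (a sketch assuming the tokenized ham_words tibble above, which keeps the speaker column), counting every occurrence of the single token “hamilton” by speaker gives a broader picture of who invokes his name at all:

Code
# Every mention of the word "hamilton," counted by speaker
ham_words |> 
  filter(word == "hamilton") |> 
  count(speaker, sort = TRUE)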

The next sections analyze the words that appear immediately before and after “Hamilton.” These can give clues about what qualities or actions are most associated with him:

Code
# Words that appear immediately before "hamilton," by speaker
bigrams_filtered |> 
  filter(word2 == "hamilton") |> 
  count(word1, speaker, sort = TRUE)

# Words that appear immediately after "hamilton," by speaker
bigrams_filtered |> 
  filter(word1 == "hamilton") |> 
  count(word2, speaker, sort = TRUE)
# A tibble: 12 × 3
   word1     speaker        n
   <chr>     <chr>      <int>
 1 alexander COMPANY        8
 2 alexander HAMILTON       6
 3 alexander JEFFERSON      2
 4 alexander WASHINGTON     2
 5 secretary WASHINGTON     2
 6 alexander BURR           1
 7 fires     BURR           1
 8 hires     BURR           1
 9 monsieur  LAFAYETTE      1
10 recess    WASHINGTON     1
11 walk      WASHINGTON     1
12 watched   BURR           1
# A tibble: 10 × 3
   word2     speaker         n
   <chr>     <chr>       <int>
 1 arrived   BURR            1
 2 drew      BURR            1
 3 examine   BURR            1
 4 forgets   JEFFERSON       1
 5 ha        MEN & WOMEN     1
 6 john      ENSEMBLE        1
 7 publishes BURR            1
 8 sit       BURR            1
 9 sits      JEFFERSON       1
10 wrote     BURR            1

We can see from these counts even more about the relationships between Hamilton and the other characters. The words before “Hamilton” point to his role as secretary. Burr saying “fires” and “hires” suggests moments where Hamilton takes significant actions, such as hiring or firing individuals - pivotal decisions with serious implications for those around him, which further fuel Burr’s anger.

Looking at the words after “Hamilton” shows the actions associated with him and how other characters describe him. Burr mentions that Hamilton “arrived,” “drew,” “publishes,” “sit,” and “examine,” reflecting Hamilton’s decisive actions in pivotal moments like the duel, his public disclosures, and his contemplative nature, often narrated by Burr with a sense of rivalry and impending doom. Jefferson says Hamilton “forgets” and “sits,” which may reflect his criticism of Hamilton’s political approach, implying that Hamilton overlooks important values. The ensemble collectively says “ha,” which could be a reaction to a particularly dramatic or emotional moment involving Hamilton, possibly underscoring the tension or humor in a scene.

Conclusion

When looking at the words in Hamilton, you can pick apart each song and character to determine its sentiment. The hypothesis held: Act I sentiment is higher than Act II, and the female characters, on average, were more positive than the male characters. Throughout, it was interesting to see which words stood out; the positive and negative words that surfaced are the ones that encapsulate the themes of the musical. Love, hatred, desire, death, and war run through almost every scene, and that shows in the graphs. The word-frequency analysis highlights key phrases and sentiments expressed by characters like Angelica and Peggy, while the bigram analysis shows that “Alexander Hamilton” is frequently referenced, underscoring his central role. Overall, these findings illuminate the emotional landscape of the musical, the distinct voices of its characters, and the interplay between gender and sentiment throughout the narrative.
