How do I count thee? Let me count the ways?

Miss Google

Miss Google Some time ago I reported that the voice of Google Maps, whom I call Miss Google, seemed to be getting lazy. I facetiously sai...

Sunday, February 11, 2024

Taylor Swift and Data Analysis

Taylor Swift and Data Analysis. by Jerry Tuttle

Who will be the most talked-about celebrity before, during, and after the Super Bowl?

She is an accomplished performer. songwriter, businesswoman, and philanthropist. I think she is very pretty. And those lips!

So what can a data analyst add to everything that has been said about her?

I was curious whether R could identify her lipstick color. I should preface this by saying I have some degree of color-challengedness, although I am not colorblind. I am also aware that you can Google something like "what lipstick shade does taylor swift use" and you will get many replies. But I am more interested in an answer like E41D4F. I do wonder if I could visit a cosmetics store and say, "I'd like to buy a lipstick for my wife. Do you have anything in E41D4F ?"

There are sites that take an image, allow you to hover over a particular point, and the site will attempt to identify the computer color. Here is one: RedKetchup But I want a more R-related approach. A note on computers and colors: A computer represents an image in units called pixels. Each pixel contains a combination of base sixteen numbers for red, green and blue. A base 16 number ranges from 0 through F. Each of red, green and blue is a two-digit base 16 number, so a full number is a six-digit base 16 number. There are 16^ 6 = 16,777,216 possible colors. E41D4F is one of those 16.8 million colors.

There are R packages that will take an image and identify the most frequent colors. I tried this with the image above, and I got unhelpful colors. This is because the image contains the background, her hair, her clothing, and lots of other things unrelated to her lips. If you think about it, the lips are really a small portion of a face anyway. So I need to narrow down the image to her lips.

I plotted the image on a rectangular grid, using the number of columns of the image file as the xlimit width, and the number of rows as the ylimit height. Then by trial and error I manually found the coordinates of a rectangle for the lips. The magick library allows you to crop an image, using this crop format:

<width>x<height>{+-}<xoffset>{+-}<yoffset> The y offset is from the top, not the bottom. The cropped image can be printed.

The package colouR will then identify the most frequent colors. I found it necessary to save the cropped image to my computer and then read it back in because colouR would not accept it otherwise. The getTopCol command will extract the top colors by frequency. I assume it is counting frequency of hex color codes among the pixel elements. Here is a histogram of the result:

Really? I'm disappointed. Although I am color-challenged, this can't be right.

I have tried this with other photos of Taylor. I do get that she wears more than one lipstick color. I have also learned that with 16.8 million colors, perhaps the color is not identical on the entire lip - maybe some of you lipstick aficionados have always known this. All of these attempts have been somewhat unsatisfactory. There are too many colors on the graph that seem absolutely wrong, and no one color seems to really capture her shade, at least as I perceive it. Any suggestions from the R community?

No matter who you root for in the Super Bowl - go Taylor.

Here is my R code:

library(png)
library(ggplot2)
library(grid)
library(colouR)
library(magick)

xpos <- c(0,0,0)
ypos <- c(0,0,0)
df <- data.frame(xpos = xpos, ypos = ypos)

# downloaded from
# https://img.etimg.com/thumb/msid-100921419,width-300,height-225,imgsize-50890,resizemode-75/taylor-swift-mitchell-taebel-from-indiana-arrested-for-stalking-threatening-singer.jpg

img <- "C:/Users/Jerry/Desktop/R_files/taylor/taylor_swift.png"
img <- readPNG(img, native=TRUE)
height <- nrow(img) # 457
width <- ncol(img) # 584
img <- rasterGrob (img, interpolate = TRUE)

# print onto grid
ggplot(data = df,
aes(xpos, ypos)) +
xlim(0, width) + ylim(0, height) +
geom_blank() +
annotation_custom(img, xmin=0, xmax=width, ymin=0, ymax=height)

#############################################
# choose dimensions of subset rectangle

width <- 105
height <- 47
x1 <- 215 # from left
y1 <- 300 # from top

library(magick)
# must read in as magick object
img <- image_read('C:/Users/Jerry/Desktop/R_files/taylor/taylor_swift.png')
# print(img)

# crop format: x{+-}{+-}
cropped_img <- image_crop(img, "105x47+215+300")
print(cropped_img) # lips only
image_write(cropped_img, path = "C:/Users/Jerry/Desktop/R_files/taylor/lips1.png", format = "png")

##############################################

# extract top colors of lips image

top10 <- colouR::getTopCol(path = "C:/Users/Jerry/Desktop/R_files/taylor/lips1.png",
n = 10, avgCols = FALSE, exclude = FALSE)
top10

# plot
ggplot(top10, aes(x = hex, y = freq, , fill = hex)) +
geom_bar(stat = 'identity') +
labs(title="Top 10 colors by frequency") +
xlab("HEX colour code") + ylab("Frequency") +
theme(
legend.position="NULL",
plot.title = element_text(size=15, face="bold"),
axis.title = element_text(size=15, face="bold"),
axis.text.x = element_text(angle = 45, hjust = 1, size=12, face="bold"))

# End
##################################################################################

2 comments:

  1. Hi Jerry
    aesthetic fill interpret hex as label and not a color. To do so you need to as scale_fill_manual(values = top10$hex) and then colorbar will have same color as the variable content

    ReplyDelete
  2. First comment makes sense! Colors look suspiciously similar to default ggplot2 color scale ... Interesting approach for a blog post, though, thanks for writing it up!

    ReplyDelete