Online College Math Teacher

Sunday, December 3, 2023

Ten Lords-a-Leaping

Just what is a lord-a-leaping? Well, what is a lord? A lord is a title of nobility, usually inherited, that exists in the UK and other countries. And those lords like to leap, especially during the twelve days of Christmas.

The Twelve Days of Christmas is a well-known Christmas song whose earliest known publication was in London in 1780. There are various versions of the lyrics, various melodies, and various interpretations of the gifts. As usual, this is all nicely summarized in Wikipedia https://en.wikipedia.org/wiki/The_Twelve_Days_of_Christmas_(song) .

PNC Bank, one of the largest banks in the US, has been calculating the prices of the twelve gifts given by my true love since 1984, and has trademarked its PNC Christmas Price Index ® . Two senior executives at PNC calculate the prices, and many of the details are available at https://www.pnc.com/en/about-pnc/topics/pnc-christmas-price-index.html#about , especially in their FAQ. In particular, they note that the price of services has generally increased while the growth in the price of goods has slowed. The price index is a humorous proxy for the general rate of inflation.

On day one there was 1 gift (the partridge). On day two there were 3 gifts (2 doves + 1 partridge). On day three there were 6 gifts (3 hens + 2 doves + 1 partridge). On day twelve there were 78 gifts, and 78 is the sum of the first 12 natural numbers, whose general formula 1 + 2 + … + n = n(n+1)/2 was known by Gauss in the 1700s.

The cumulative number of gifts is 1 + 3 + 6 + … + 78, whose sum is 364. (One fewer than the number of days in a year. Coincidence?) Each of these numbers is called a Triangular number T_i , and the general formula of their sum is Σ T_i = n(n+1)(n+2)/6.
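
Both sums are easy to check in R. This is only a minimal sketch, and the variable names are mine, chosen for illustration.

```r
# Gifts received on each of the twelve days: 1, 3, 6, ..., 78
n <- 12
daily_gifts <- cumsum(1:n)        # triangular numbers T_1 ... T_12
daily_gifts[n]                    # 78
n * (n + 1) / 2                   # 78, the closed form for T_n

# Cumulative gifts over all twelve days
sum(daily_gifts)                  # 364
n * (n + 1) * (n + 2) / 6         # 364, the closed form for the sum of T_1 ... T_n
```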

The PNC Christmas Price Index ®, or the Total Cost of Christmas, reflects the total cost of the 78 gifts: one set of each of the gifts. For 2023 that cost is $46,729.86, versus $45,523.33 in 2022, a change of +2.7%. The prior year’s change was 10.5%. The largest individual item in the index is not the five gold rings as I had thought ($1,245), but rather those leaping lords ($14,539, up 4.0%), followed by the swimming swans ($13,125 and unchanged for many years).
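
As a quick check of the 2.7% figure, here is the percent-change arithmetic on the two totals quoted above. The variable names are my own.

```r
cost_2023 <- 46729.86
cost_2022 <- 45523.33
pct_change <- 100 * (cost_2023 / cost_2022 - 1)   # year-over-year percent change
round(pct_change, 1)                              # 2.7
```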

PNC also calculates the True Cost of Christmas, which is the cost of 364 gifts. For 2023 that cost is $201,972.66, a change of 2.5% over a year ago.
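
The 364 can also be reached gift by gift instead of day by day. The first gift is given on all twelve days, the second on eleven days, and so on, so the k-th gift (k items at a time) is given 13 - k times. A small sketch, with names of my own choosing:

```r
items_per_day <- 1:12                          # 1 of the first gift, 2 of the second, ..., 12 of the twelfth
times_given   <- 12:1                          # first gift on all 12 days, twelfth gift only once
items_total   <- items_per_day * times_given   # total items of each gift over the twelve days
items_total                                    # 12 22 30 36 40 42 42 40 36 30 22 12
sum(items_total)                               # 364
```

These are the same 12, 11, ..., 1 weights that multiply the individual prices when the True Cost is computed.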

And PNC calculates a core index excluding the swans, which some time ago had been the most volatile item, and also an e-commerce index buying all items online.

The overall Bureau of Labor Statistics CPI for All Urban Consumers (CPI-U) increased 3.2% for twelve months ending October 2023. October is the closest month for CPI-U to the PNC data. CPI-U of course is based on a broad market basket of goods including food, energy, medical care, housing, transportation, etc., which are not the gifts given in the song, but CPI-U is a common measure of inflation. The PNC index is based on a very specific twelve items and is heavily weighted toward the lords and the swans.

The PNC website contains detailed information on its calculations, but it does not contain historical information on CPI-U. I used twelve-month October historical CPI-U percent changes from https://www.bls.gov/regions/mid-atlantic/data/consumerpriceindexhistorical_us_table.htm . Then I graphed the percentage changes of the PNC Christmas Price Index® , the PNC True Cost of Christmas index, and the CPI.

With such a small number of items, the two PNC indices fluctuate drastically. 2014 reflects a one-time increase in the cost of the swans. 2020 was the unusual year during the pandemic when some of the gifts (including the lords!) were unavailable, and so their cost that year was counted as zero. The two PNC indices were fairly close to CPI-U for five years starting in 2015 and again for 2022 and 2023. Maybe these PNC indices are pretty good.

PNC uses the Philadelphia Ballet to calculate the cost of the lords-a-leaping.

Here is the R code I used:

library(readxl)
library(ggplot2)
df1 <- read_excel("C:/Users/Jerry/Desktop/R_files/xmas.xlsx", sheet = 1)
df2 <- read_excel("C:/Users/Jerry/Desktop/R_files/xmas.xlsx", sheet = 2)
cpi <- round(df2$Percent_change,3)
df1 <- df1[c(3:13)]
year <- as.numeric(colnames(df1)[2:11])
total_cost_dollars <- colSums(df1)
total_cost_index <- vector()
true_cost_dollars <- vector()
true_cost_index <- vector()
for(i in 1:length(total_cost_dollars)){
true_cost_dollars[i] <- 12*df1[1,i] + 11*df1[2,i] + 10*df1[3,i] + 9*df1[4,i] + 8*df1[5,i] +
7*df1[6,i] + 6*df1[7,i] + 5*df1[8,i] + 4*df1[9,i] + 3*df1[10,i] + 2*df1[11,i] + 1*df1[12,i]
}
true_cost_dollars <- unlist(true_cost_dollars)
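# Year-over-year percent changes. Note that 1:length(x) - 1 parses as (1:length(x)) - 1, so use seq_len()
for(i in seq_len(length(total_cost_dollars) - 1)){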
total_cost_index[i] <- round(100*(total_cost_dollars[i+1]/total_cost_dollars[i] - 1),1)
true_cost_index[i] <- round(100*(true_cost_dollars[i+1]/true_cost_dollars[i] - 1),1)
}
df <- data.frame(cbind(year, total_cost_index, true_cost_index, cpi))

colors <- c("total_cost_index" = "red", "true_cost_index" = "navy", "cpi" = "grey")
ggplot(df, aes(x=year)) +
geom_line(aes(y=total_cost_index, color="total_cost_index")) +
geom_line(aes(y=true_cost_index, color="true_cost_index")) +
geom_line(aes(y=cpi, color="cpi")) +
labs(title = "12 Days of Christmas", x = "Year", y = "Percent change", color = "Legend") +
scale_color_manual(values = colors) +
# scale_y_continuous(labels = scales::percent_format(scale = 1, prefix = "", suffix = "%")) +
theme(
legend.position="right",
plot.title = element_text(size=15, face="bold"),
axis.title = element_text(size=15, face="bold"),
axis.text = element_text(size=15, face="bold"),
legend.title = element_text(size=15, face="bold"),
legend.text = element_text(size=15, face="bold"))

Tuesday, August 8, 2023

Black hole word numbers in multiple languages

      A few months ago I used R to investigate black hole word numbers in the English language. A friend suggested there are probably black hole word numbers in other languages. There are only three other languages that I have a nodding acquaintance with (spoken languages, not computer languages), and all three do have such black holes. Here is the result of my research. My R code for all four languages is at the end.

      First, a review with English words. Every English word gets you to the same black hole number as you count the number of letters in the word and then successively count the number of letters in the resulting word number. That black hole is at four. Once you get to four, you are stuck and can't get out. Here is an example.

The word hippopotomonstrosesquippedaliophobia (fear of long words) has 36 letters.
The word thirtysix has nine letters.
The word nine has four letters.
The word four has four letters.

      Here are some more English words, with their word number length counting sequence. I found a long list of English words, so this list is truly a random sample. (For the other languages, I could not find a nice long list, so the words are not random but rather a convenience sample.)

miscognizable thirteen eight five four
harvestry nine four
geopolitist eleven six three five four
jessed six three five four
pardonee eight five four
whitfield nine four
ghazal six three five four
morphophonemically eighteen eight five four
calonectria eleven six three five four
conceptiveness fourteen eight five four

      Every German word also gets you to the same black hole number: vier.

handschuh neun vier
flugzeug acht vier
staubsauger elf drei vier
waschmaschine dreizehn acht vier
haustürschlüssel sechzehn acht vier
lächeln sieben sechs fünf vier
geutscher neun vier
danke fünf vier
morgen sechs fünf vier
tee drei vier
torschlusspanik fünfzehn acht vier

      In Hebrew, where there is the complication that letters are written from right to left, there are two black hole numbers: ארבע and שלש . Below, the rightmost word is the word whose letters are first counted, and the subsequent counting is from right to left.

פירת ארבע
אורתודוקסית אחדעשר שש שתים ארבע
קומוניסטית עשר שלש
ומועמדויות עשר שלש
עיתונות שבע שלש
ארוך ארבע
שלה שלש
כך שתים ארבע
לראות חמש שלש
להסתכל שש שתים ארבע

      In Spanish there is a black hole at cinco. However, unlike the previous languages that had a black hole where you are stuck and can't get out, Spanish also has some words where you oscillate back and forth between two numbers but never really fall into a hole. These two Spanish numbers are seis and cuatro.

montaña siete cinco
Iglesia siete cinco
computadora once cuatro seis cuatro seis cuatro seis cuatro seis cuatro
oficina siete cinco
preguntar nueve cinco
entender ocho cuatro seis cuatro seis cuatro seis cuatro seis cuatro
hermosa siete cinco
asombroso nueve cinco
perezoso ocho cuatro seis cuatro seis cuatro seis cuatro seis cuatro
somnoliento once cuatro seis cuatro seis cuatro seis cuatro seis cuatro
saludable nueve cinco

      This is reminiscent of some numerical algorithms that oscillate and never converge. For example, take f(x) = x³ - 2x + 2, which has a single real root at approximately -1.769. Starting from x₀ = 1, the Newton-Raphson approximations oscillate between x = 1 and x = 0 (where f(1) = 1 and f(0) = 2) and never find the root. You can see from the first graph that the oscillation occurs at the wrong section of the curve.

      If you think about it, the trick to why these black holes exist is not too difficult, and the same trick works in these four languages. I'm sure there are other languages that have no such black hole.
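
      One way to hunt for such holes directly is to look only at the number words themselves: a black hole is a number word whose letter count equals the number it names, and an oscillation is a pair of number words that point at each other. The sketch below is my own illustration and is separate from the code further down. The helper name find_holes is hypothetical, and the Spanish list is cut off at quince to keep it short.

```r
# For a vector of number words (position i holds the word for i), report
# words whose length equals their own value (fixed points) and pairs of
# words that send you back and forth (two-cycles).
find_holes <- function(numbs) {
  nxt <- nchar(numbs)                               # nxt[i] = letters in the word for i
  idx <- seq_along(numbs)
  fixed     <- idx[nxt == idx]                      # i -> i
  two_cycle <- idx[nxt != idx & nxt[nxt] == idx]    # i -> j -> i with i != j
  list(fixed = numbs[fixed], two_cycle = numbs[two_cycle])
}

english_words <- c("one", "two", "three", "four", "five", "six", "seven", "eight",
                   "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
                   "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty")
spanish_words <- c("uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
                   "nueve", "diez", "once", "doce", "trece", "catorce", "quince")

find_holes(english_words)   # fixed: "four"; no two-cycle
find_holes(spanish_words)   # fixed: "cinco"; two-cycle: "cuatro" "seis"
```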

      Here is the R code I used:

####################################################
# Try hippopotomonstrosesquippedaliophobia (fear of long words) which has 36 letters.

library(english)
x <- "hippopotomonstrosesquippedaliophobia"
y <- -99     # Initialize y
while(y != "four"){
y <- nchar(x)
y <- as.character(english(y))     # Spell out an integer as a word
if (grepl('-', y, fixed = TRUE)) y <- gsub('-', '', y)     # delete hyphen
print(c(x,y))
x <- y
}

####################################################
# Try ten random English words

library(english)
library(wordcloud)
set.seed(123)
words <- read.table("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt")
original <- sample(words$V1, 10, replace = FALSE)
# To reproduce the sample shown above exactly, uncomment the next three lines:
# original <- c(
# "miscognizable","harvestry","geopolitist","jessed","pardonee","whitfield","ghazal","morphophonemically",
# "calonectria","conceptiveness")
wordcloud(word=original, random.order = TRUE, colors=c("red","blue","darkgreen","brown","black","red",
"blue","darkgreen","navy","black"), ordered.colors=TRUE, scale=c(3,7))
rm(words)     # free up memory
for (i in 1:10){
x <- original
y <- vector()
y[1] <- "dummy"     # Initialize y
for (j in 1:100){
c <- nchar(x[i])
c <- as.character(english(c))     # Spell out an integer as a word
if (grepl('-', c, fixed = TRUE)) y[j] <- gsub('-', '', c) else y[j] <- c     # delete hyphen
x[i] <- y[j]
if (y[j] == "four") {
break
}
}
cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 10 Hebrew words

original <- c("פירת", "אורתודוקסית", "קומוניסטית", "ומועמדויות", "עיתונות", "ארוך", "שלה", "כך", "לראות", "להסתכל" )

numbs <-
c("אחת", "שתים", "שלש", "ארבע", "חמש", "שש", "שבע", "שמונה", "תשע", "עשר","אחד עשר","שתיים עשרה","שלוש עשרה","ארבעה עשר","חמש עשרה","שש עשרה","שבע עשרה","שמונה עשרה","תשע עשרה","עשרים")
for (i in 1:10){
x <- original
y <- vector()
for (j in 1:10){
c <- nchar(x[i])
y[j] <- numbs[c]
if (grepl(' ', y[j], fixed = TRUE)) y[j] <- gsub(' ', '', y[j])     # delete space
x[i] <- y[j]
if (y[j] == "ארבע" | y[j] == "שלש") {
break
}
}
cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 11 Spanish words; however, infinite oscillation without convergence at cuatro and seis

x <- c("montaña","Iglesia","computadora","oficina","preguntar","entender","hermosa","asombroso","perezoso","somnoliento","saludable")
numbs <- c(
"uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
"nueve", "diez", "once", "doce", "trece", "catorce", "quince",
"dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte")
original <- x
for (i in 1:11){
y <- vector()
for (j in 1:10){
c <- nchar(x[i])
y[j] <- numbs[c]
x[i] <- y[j]
if (y[j] == "cinco") {
break
}
}
cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 11 German words

x <- c("handschuh","flugzeug","staubsauger","waschmaschine","haustürschlüssel","lächeln","geutscher", "danke", "morgen","tee","torschlusspanik")
numbs <- c(
"eins","zwei","drei","vier","fünf","sechs","sieben","acht","neun","zehn",
"elf","zwölf","dreizehn","vierzehn","fünfzehn","sechzehn","siebzehn",
"achtzehn","neunzehn","zwanzig")
original <- x
for (i in 1:11){
y <- vector()
for (j in 1:10){
c <- nchar(x[i])
y[j] <- numbs[c]
x[i] <- y[j]
if (y[j] == "vier") {
break
}
}
cat(c(original[i], "\t", y), "\n")
}

####################################################
# Newton-Raphson: x_(n+1) = x_n - f(x_n) / f ' (x_n)

# f(x) = x^3 -2*x + 2
# f ' (x) = 3*(x^2) - 2

par(mfrow = c(1, 2))

# quick plot to choose initial value
x<- seq(from=-5, to=5, .001)
y <- x^3 - 2*x + 2
plot(x,y, main="f(x) = x^3 -2*x + 2", xlab="x", ylab="y", col="red", ylim=c(-2,4), cex.main = 3)
axis(side = 1, font = 2, cex.axis = 2)
axis(side = 2, font = 2, cex.axis = 2)
abline(h=0)

# Newton-Raphson
x <- vector()
f <- vector()
x_new <- 1     # initial guess
for (n in 1:10){
x[n] <- x_new
f[n] <- (x[n])^3 - 2*x[n] + 2
fprime <- 3 * (x[n])^2 -2     # manual derivative calculation
x_new <- x[n] - f[n]/fprime
if ( abs(x[n] - x_new)/abs(x[n]) < .00005 ){break}     # relative change; abs() in the denominator so a negative x[n] cannot stop the loop early
}

df <- data.frame(cbind(x,f))
df <- head(df, 10)
df

plot(df$x, df$f, pch = 16, cex = 2, main="Sequence of N-R points", xlab="x", ylab="y", cex.main = 3)
for (i in seq_len(nrow(df) - 1)){     # one arrow per consecutive pair of points
arrows(x0 = x[i], y0 = f[i], x1 = x[i+1], y1 = f[i+1], col="blue")
}
axis(side = 1, font = 2, cex.axis = 2)
axis(side = 2, font = 2, cex.axis = 2)
abline(h=0)

dev.off()     # reset par

Wednesday, July 26, 2023

Barbie and math

      With the Barbie movie coming out, I think it's time to review Barbie and math.

      If the original Barbie doll were an actual woman, she would be 5'9" tall, have a 39-inch bust, 18-inch waist, 33-inch hips, a size 3 shoe, a weight of 110 pounds, a BMI of 16.24, and perhaps would be anorexic. (Click anorexic for the source of those measurements.) Obviously these measurements are unrealistic and send a harmful message to children. But there's a lot more to Barbie and math.
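
      The BMI figure follows from the quoted height and weight using the usual imperial formula, 703 × weight in pounds ÷ (height in inches)². A minimal check, with variable names of my own:

```r
height_in <- 5 * 12 + 9                 # 5'9" is 69 inches
weight_lb <- 110
bmi <- 703 * weight_lb / height_in^2    # imperial BMI formula
round(bmi, 2)                           # 16.24
```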

      In 1992 the infamous talking Barbie included the phrase "Math class is tough!" which was bad enough, but it was ironically misreported by the press as "Math is hard." Neither is the message we want to give to children. This immediately drew protests from the National Council of Teachers of Mathematics, the American Association of University Women, and others. Mattel removed the phrase from future dolls, and the original is now a collector's item.

      Perhaps embarrassed by this experience, subsequently Barbie put her life on the line in a high school project, Barbie Bungee, to help students learn algebra, physics, and statistics. This is a hands-on experiment attaching Barbie to a thick rubber band, dropping her from a height, measuring the distance of a jump and the time to descend, and then estimating the line of best fit. NCTM has a suggested lesson on this.

      But I want to spend the remainder of this article talking about Barbie and the mathematics of her facial beauty.

      The ancient Greeks discovered a particular number called the Golden Ratio, denoted by Greek letter Φ (phi), that has many interesting mathematical properties, appears in some patterns of nature, and is considered by many to be aesthetically pleasing. The Golden Ratio results from finding the point on a line segment that splits the segment into two smaller segments with lengths a and b, such that (a + b)/a = a/b.

      That ratio a/b is the Golden Ratio, Φ. With a little algebra, Φ = (1 + √5)/2 , which is an irrational number so it has an infinite non-repeating decimal, and rounded to three decimal places is 1.618.
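
      The "little algebra" goes like this: set b = 1, so that a = Φ and the defining equation (a + b)/a = a/b becomes (Φ + 1)/Φ = Φ, or Φ² - Φ - 1 = 0. The quadratic formula gives the positive root (1 + √5)/2. A short numerical check in R (a sketch, with names of my own):

```r
phi <- (1 + sqrt(5)) / 2
a <- phi
b <- 1
all.equal((a + b) / a, a / b)   # TRUE: the defining proportion holds
all.equal(phi^2, phi + 1)       # TRUE: phi solves x^2 - x - 1 = 0
round(phi, 3)                   # 1.618
```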

      Renaissance artists, plastic surgeons, and makeup artists are among those who use Golden Ratios in various ways with faces to create ideally proportioned faces. Gary Meisner has written extensively on the Golden Ratio, and he believes there are over 20 different ways that the Golden Ratio shows up in human faces and that “the Golden Ratio is also found very commonly in beautiful models of today across all ethnic groups.” Biostatistician professor Dr. Kendra Schmid and her colleagues performed various measures of many faces. They began with 17 potential Golden Ratios, and they decided only six of these ratios were predictors of facial attractiveness. See Schmid.

      This takes us to Barbie. I attempted to measure these six ratios on a picture of Barbie (the doll, not the actress). There are many pictures of Barbie, and she does enjoy experimenting with different hairstyles, so I had to find one with a hairstyle that gave me the best chance of measuring her from her hairline and also between her ears. The measurement is not exact for many reasons, and because we are using a two-dimensional photo of a three-dimensional object there is certainly some loss of accuracy. Nevertheless, here are the results:

Face length / Face Width	1.07560
Mouth width / Interocular distance	1.93750
Mouth width / Nose width	1.97872
Lips to chin / Interocular	1.54167
Lips to chin / Nose width	1.57447
Ear length / Nose width	1.57447
Average	1.61374
% Deviation from Φ	- 0.27%
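
      The last two rows of the table can be recomputed from the six ratios. In this sketch the fifth ratio is taken as 1.57447, which is the value that reproduces the stated average; the variable names are my own.

```r
# The six measured ratios, in the order listed in the table
ratios <- c(1.07560, 1.93750, 1.97872, 1.54167, 1.57447, 1.57447)
phi <- (1 + sqrt(5)) / 2

avg <- mean(ratios)
round(avg, 5)                        # 1.61374
round(100 * (avg - phi) / phi, 2)    # -0.27 (percent deviation from phi)
```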

      I have repeated this measurement process with celebrity faces that I think most people would consider attractive. (This is the sort of thing I would do.) Many celebrities have close phi-ratios such as Scarlett Johansson, Ryan Gosling, Brad Pitt, and Lupita Nyong'o, but none are as close to Φ as Barbie. Some celebrity faces that I think most people would consider attractive did not score well, but possibly this is due to the measurement difficulties I discussed above.

      However, I think the conclusion is clear: Barbie is an ideal beauty, using the Golden Ratio as a standard. But as they say, beauty is in the Phi of the beholder.

R Programming Notes:
      I attempted doing the facial measurements in R. A tip of the hat to @technocrat who helped me with some of the code. With his help, I was able to read the Barbie graphic image into R and add the image onto a ggplot with coordinate axes. I then attempted to find the coordinates of the line segments corresponding to the 6 ratios and to calculate the ratios of the appropriate line segments. See the image below with the line segments. However, drawing these segments and finding the coordinates turned out to be too crude, and the results were unreliable. I include the code below as a reference for superimposing a graphic onto a ggplot. However, I redid the measurements with more precise software using Gary Meisner's software PhiMatrix , and the results in the table above are based on that software. Nevertheless, here is the R code I used:

library(ggplot2)
library(magick)
library(grid)

# Read the barbie image
barbie <- image_read("barbie.jpg")
barbie <- image_scale(barbie, "300")

# Create a data frame for the line segments
line_data <- data.frame(
x1 = c(100, 150, 140, 138, 210, 138, 150),
y1 = c(290, 355, 302, 260, 280, 260, 275),
x2 = c(210, 150, 167, 170, 210, 138, 155),
y2 = c(290, 235, 302, 260, 310, 235, 275)
)
rownames(line_data) <- c("Face_width","Face_length","Interocular", "Mouth_width", "Lips_2_chin", "Ear_length", "Nose_width" )

# Create a ggplot
p <- ggplot() +
geom_blank() +
theme_minimal() +
theme(
plot.background = element_blank(),
panel.grid = element_blank()
) +
coord_fixed(xlim = c(0, 300), ylim = c(0, 606)) +
xlab("") +
ylab("") +
scale_x_continuous(breaks = seq(0, 300, by = 50)) +
scale_y_continuous(breaks = seq(0, 606, by = 50)) +
geom_hline(yintercept = seq(0, 606, by = 50), linetype = "dotted", color = "gray") +
geom_vline(xintercept = seq(0, 300, by = 50), linetype = "dotted", color = "gray")

# Convert the barbie image to a raster object
barbie_raster <- as.raster(barbie)

# Add the barbie image to the ggplot2 plot
p <- p +
annotation_custom(
grob = rasterGrob(barbie_raster),
xmin = 0, xmax = 300,
ymin = 0, ymax = 606
)

# Add the lines to the plot
p <- p +
geom_segment(
data = line_data,
aes(x = x1, y = y1, xend = x2, yend = y2),
color = c("red", "blue", "black", "red", "navy", "black","navy"),
size = 1.5
)
# Display the plot
print(p)

rownames(line_data) <- c("Face_width","Face_length","Interocular", "Mouth_width", "Lips_2_chin", "Ear_length", "Nose_width" )

d <- vector()
d <- sqrt((line_data$x1 - line_data$x2)^2 + (line_data$y1 - line_data$y2)^2)

Face_width <- d[1]
Face_length <- d[2]
Interocular <- d[3]
Mouth_width <- d[4]
Lips_2_chin <- d[5]
Ear_length <- d[6]
Nose_width <- d[7]
r <- vector()
r[1] <- Face_length / Face_width
r[2] <- Mouth_width / Interocular
r[3] <- Mouth_width / Nose_width
r[4] <- Lips_2_chin / Interocular
r[5] <- Lips_2_chin / Nose_width
r[6] <- Ear_length / Nose_width
m <- mean(r)
phi <- (1 + sqrt(5))/2
percent_deviation <- 100 * (m - phi)/phi     # multiply by 100 to express as a percent

Saturday, July 22, 2023

Happy Pi Approximation Day

Many people know March 14 is celebrated as Pi Day because 3, 1, and 4 are the first three significant digits of π (using the month, day date format). I just learned that July 22 is celebrated as Pi Approximation Day (using the day/month date format) because 22/7 is a common approximation of π .

π is defined as the ratio of a circle’s circumference to its diameter. π is an irrational number (it cannot be expressed as the ratio of two integers), and it has an infinite number of non-repeating digits. Approximations of π date back to ancient civilizations and continue today as people compete to calculate π to billions of decimal places on supercomputers.

The 22/7 approximation only matches π to the second digit after the decimal place, 3.14, and 22/7 is greater than π, a fact known by Archimedes. The error in the approximation is only about 0.04%, which is close enough for most of us.
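
The size and direction of the error can be checked with R's built-in constant pi. This is a minimal sketch with variable names of my own.

```r
approx_pi <- 22 / 7
approx_pi > pi                                   # TRUE: 22/7 overshoots
rel_error_pct <- 100 * (approx_pi - pi) / pi     # relative error in percent
round(rel_error_pct, 2)                          # 0.04
```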

People also compete in the number of decimal places they can recite from memory, such as by using mnemonic techniques with words, where the length of each word represents a digit of π . There are many creative π mnemonics , but I am content to remember the 15-word "How I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."

A quite impressive feat is Apu from the Simpsons who claimed to be able to recite 40,000 digits of π, and proved it by correctly stating that the 40,000th digit is 1. Apu

The R computer language carries 15 or 16 digits. That seems like enough. A NASA engineer says he can’t think of a practical application that would require more than 15 digits of π . NASA

π appears in many areas of math besides geometry and trigonometry. It is hidden away in statistics where the probability density function formula of the normal curve has a √(2π) term in the denominator to get the integral equal to 1, and elsewhere in other branches of math.

Happy Pi Approximation Day.

Here is some R code:

library(dplyr)
library(tidytext)

# compare 15 digits of 22/7 to pi
print(22/7, digits=15)
print(pi, digits=15)

# count word length & number of words in this mnemonic
textfile <- c("How I need a drink",
"alcoholic of course",
"after the heavy lectures involving quantum mechanics.")

df<-data.frame(line=1:length(textfile), text=textfile)
df_words <- df %>% unnest_tokens(word, text) %>% mutate(word_length = nchar(word))
df_words
n <- nrow(df_words)
cat("Number of words: ", n)

Friday, April 7, 2023

A more interesting pictorial numerical puzzle

I am getting tired of these little pictorial numerical puzzles with the four equations, like one where three chickens equals 60, one chicken plus two plates of two eggs per plate equals 26, and so on, until the final equation is to evaluate some mathematical expression involving chickens, eggs, and bananas.

The solution generally requires that you remember the PEMDAS (in the US, or BODMAS elsewhere) rules for order of operations especially that multiplication takes precedence over addition, and also that you carefully count the number of eggs and number of bananas. I get 36.

OK, let me try to create a more interesting pictorial puzzle.

Mathematicians agree on the PEMDAS rules, although there are many situations that PEMDAS doesn't handle. Perhaps the most common is the unary minus operator as in -3². It is unary because unlike subtraction that has two operands, the unary operator only has one. I think mathematicians would like to see the unary operator as changing the sign of the argument, so that -3² equals -9, although some software, most notably Excel, merrily calculates this as +9.

I don't believe there is a single authority for all the order of operations cases. For example, Excel, Google Search, and Wolfram Alpha do not always agree. I bet there are some pretty smart people in those companies.

Nowadays I am doing my fun calculations in the R computer language, so for the remainder of this post I will require R as the authority.

So here is my attempt at a more interesting problem, but remember, you have to use the order of operation precedence rules of R: (Let me add the link to the first item: https://www.facebook.com/watch/?v=10158293605695705 )

Do you want to try it before I reveal the R code?

The R code is:

apple <- 1
banana <- 2
kiwi <- 3
lemon <- 4
peach <- banana + lemon
pear <- banana^banana^kiwi # 2^(2^3) = 256
pineapple <- (pear - banana) %% kiwi^2 * lemon # (254 %% 9) * 4 = 2 * 4 = 8
strawberry <- pineapple / peach * peach # 8; no obelus in R
kiwi <- c(lemon, pineapple, strawberry)
watermelon <- kiwi[kiwi == lemon | kiwi == pineapple & kiwi == strawberry]
# watermelon <- (lemon V pineapple ∧ strawberry) # 4 V (8 & 8) = 4, 8

The ordering rules of R include:

The modulo operator %% binds more tightly than multiplication and division: a %% b * c is (a %% b) * c, and a * b %% c is a * (b %% c)
The obelus does not appear in R but is just a division symbol: a ÷ b * c = (a / b) * c
Repeated exponentiation goes right to left: a ^ b ^ c = a^(b^c); lots of disagreement outside R on this one
Logical AND precedes logical OR
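
Each of these rules, plus the unary minus case from earlier, can be tried at the R console. The numbers below are my own small examples, not part of the puzzle, and the precedence order they illustrate is the one listed on R's ?Syntax help page.

```r
-3^2                    # -9:  ^ binds tighter than unary minus, so this is -(3^2)
2^2^3                   # 256: exponentiation groups right to left, 2^(2^3)
2 * 5 %% 3              # 4:   %% is evaluated before *, so this is 2 * (5 %% 3)
8 / 4 * 2               # 4:   / and * share a level and group left to right
TRUE | FALSE & FALSE    # TRUE: & is evaluated before |
```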

Monday, March 27, 2023

There's a black hole in the number line

The government doesn't want you to know about this, but I have discovered it and I will share this with a few close friends: There is a black hole in the number line, and it's at number 4. Every word in the English language will eventually fall into it and can't get out of it.

As an example, take the word mathematical.

Mathematical has twelve letters.
Twelve has six letters.
Six has three letters.
Three has five letters.
Five has four letters.
Four has four letters.
Four has four letters, and now we entered this black hole at 4, and we can't get out of it!

Try a few more words. Try words as long as you like. Try morphophonemically, which has 18 letters. I have done exhaustive research on this with R, and you will find every English word eventually falls into the black hole at 4 and can't get out. At this rate, there will be no words left!

Of course, this is an April Fool's Day prank.

Did you figure it out?

Here is some R code to test the word mathematical.

# Try word mathematical
library(broman)
x <- "mathematical"
y <- -99 # Initialize y
while(y != "four"){
y <- nchar(x)
y <- broman::spell_out(y, max_value = 20) # Spell out an integer as a word
print(c(x,y))
x <- y
}

Here is some R code to test ten random words.

# Try ten random words
set.seed(123)
words <- read.table("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt")
original <- sample(words$V1, 10, replace = FALSE)
x <- original
rm(words) # free up memory
for (i in 1:10){
y <- vector()
y[1] <- "dummy"
for (j in 1:100){
c <- nchar(x[i])
y[j] <- spell_out(c, max_value = 20)
x[i] <- y[j]
if (y[j] == "four") {
break
}
}
cat(c(original[i], "\t", y), "\n")
}

Here is the code for the number line:

library(ggplot2)
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10),
y = rep(0,10),
group = c("A","A","A","B","A","A","A","A","A","A"))
ggplot(df, aes(x = x, y = y)) +
geom_point(aes(color = group, size = ifelse(x == 4, 15, 15))) +
geom_hline(yintercept = 0, linetype = 1, color = "lightblue", size = 1) +
scale_x_continuous(limits = c(0, 11), expand = c(0, 0),
breaks = NULL, minor_breaks = NULL) +
scale_y_continuous(limits = c(-0.2, 0.2), expand = c(0, 0),
breaks = NULL, minor_breaks = NULL) +
scale_color_manual(values = c("red", "black")) +
ggtitle("THE BLACK HOLE AT NUMBER 4") +
theme_void() +
theme(legend.position = "none",
axis.line = element_blank(),
axis.text = element_blank(),
plot.title = element_text(color="black", size=14, face="bold")) +
geom_text(aes(x = x, y = -0.1, label = x), size = 5)

Saturday, February 25, 2023

These drinking glasses are too short!

Some of my reinsurance and math teacher friends may remember that when I am out of town and having an adult beverage with friends, I have been known to stare at the drinking glass and say something like, "I don't mean to be rude, but the glasses are certainly short here. They are much shorter than what we have back home. In fact, they are so short, that I think that the circumference of the top of the glass is larger than the height."

Then there is generally a long pause as the group considers this. The reinsurance group may require some reminder of what circumference means.

Either group (unless they have heard this before, or unless they can guess that this is a setup) will likely disagree with me. I will reply that I am pretty sure about this, and I am willing to bet a dollar.

How do you measure this in a bar or restaurant? I use a paper or cloth napkin to measure the circumference from one end of the napkin to somewhere in the middle of the napkin, and then I use that length to compare to the height.

I have done this enough times so that I am nearly always right. Try it with your own drinking glasses. The only time it consistently fails is with champagne glasses.

Recently it occurred to me that there must be a website with a wide variety of glasses and their measurements, and I found Dimensions.com, https://www.dimensions.com . Dimensions.com is a database of drawings with standard measurements. Measurements are based on industry standards and averages and may differ among manufacturers and regions. Here is a sample of glasses with their images and measurements which are used here with permission, plus my calculations in the last three columns of the table. Volumes are in ounces, heights and diameters are in centimeters. The source is https://www.dimensions.com/collection/drinking-glasses and https://www.dimensions.com/collection/wine-glasses.

Here is some R code to do the calculations and to draw the above graph. Note that pi (lower case) is an inbuilt R constant whose value is approximately 3.141593. (Yes, I am well aware that π is an infinite, non-repeating decimal, and I believe R carries 16 decimal digits, but that is beyond the scope of this article.)

df <- data.frame(glass = c("Kalina10", "Pokal22", "Chardonnay", "XL Oversized","Cordial", "Shooter", "Champagne"),
volume = c(10, 22, 12.3, 25.36, 1.5, 2, 9),
height = c(11, 17.75, 19.8, 22.9, 15.9, 10.5, 23.5),
diameter = c(8, 9.5, 7.9, 10.8, 5.1, 4.13, 6.35))
df$circumference <- round(pi * df$diameter, 1)
df$larger <- ifelse(df$circumference > df$height, "Circumference", "Height")
df$c_to_h <- round(df$circumference / df$height, 1)
df

library(ggplot2)
ggplot(df, aes(x=factor(glass, levels = c("Kalina10", "Pokal22", "Chardonnay", "XL Oversized","Cordial", "Shooter", "Champagne")), y=c_to_h, fill = glass, color="black")) +
geom_col(width = 1, position = position_dodge(1)) +
geom_hline(yintercept=1) +
ggtitle("Ratio of Circumference to Height by Glass") + xlab("Glass") + ylab("Ratio") +
theme(plot.title = element_text(face="bold", size=12)) +
theme(axis.text.x = element_text(face="bold", size=12)) +
theme(axis.text.y = element_text(size=12, face="bold")) +
theme(legend.position="none") +
scale_fill_manual("glass", values=c("red", "yellow", "blue", "green", "grey", "brown", "violet"))

I think the reason this is a good bet is that the mind cannot easily compare a circular length to a linear length (I don't know if that is scientifically accurate). Perhaps we also look at the diameter and forget that we are comparing the height not to the diameter, but to π times the diameter.

Feel free to make this bet with your friends or your students. How about sharing 10% of your winnings with me as a commission?

Incidentally, a beverage can is approximately a right circular cylinder. (But not exactly; look at the top and bottom to see why.) Calculus students can derive that the closed cylinder with the largest volume for a given total surface area (the side plus the top and bottom, so more than just the rectangular paper label that wraps around the can) has height equal to diameter. A typical 12-ounce soda can does not have height equal to diameter, but its circumference is greater than its height. A fun supermarket experiment is to examine different shaped cans (a soup can, a tuna fish can, etc.) to determine which meet the largest volume criterion.
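The height-equals-diameter result can be checked numerically. The following is only a sketch: it assumes the fixed quantity is the total surface area of a closed cylinder (side plus top and bottom), and the surface area of 300 and the function names cyl_height and cyl_volume are my own illustrative choices.

```r
# Height of a closed cylinder of radius r whose total surface area is fixed at S.
# From S = 2*pi*r^2 + 2*pi*r*h we get h = (S - 2*pi*r^2) / (2*pi*r).
cyl_height <- function(r, S) (S - 2 * pi * r^2) / (2 * pi * r)
cyl_volume <- function(r, S) pi * r^2 * cyl_height(r, S)

S <- 300                      # arbitrary surface area in square centimeters (assumption)
r_max <- sqrt(S / (2 * pi))   # radius at which the height shrinks to zero

# optimize() searches the interval for the radius that maximizes the volume
best <- optimize(cyl_volume, interval = c(0.01, r_max), S = S, maximum = TRUE)
r_best <- best$maximum
h_best <- cyl_height(r_best, S)

round(c(radius = r_best, height = h_best, diameter = 2 * r_best), 2)
h_best / (2 * r_best)         # close to 1, so height is about equal to diameter
```

Changing S changes the size of the best can but should leave the ratio of height to diameter near 1.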

To my reinsurance friends: I learned the circumference greater than height trick from Paul Hawksworth of M&G.

Sunday, February 12, 2023

Some different graph types in R

I don't know about you, but I get tired of seeing column charts and pie charts. It's not difficult to create a few more interesting chart types once in a while. Whether these are relevant for a particular audience and truly display your message is a different question.

I wanted a really small dataset to experiment in R, so I used numbers of days in office for US presidents who were assassinated. Students of American history may want to pause reading this post and think about whether they can name the Presidents (and estimate the number of days) before continuing. Kennedy and Lincoln are the best known, but there were others.

I decided I wanted a column chart with images on the x-axis, a wordcloud with the font size proportional to the number of days, a lolliplot chart which is a variation of a column chart but with a line instead of a bar and a dot at the end, and a donut chart which is a variation of a pie chart but where your eye focuses on the length of the arc rather than on the area of the sector.

The first chart requires images. I grabbed the images I needed (hopefully these are either old or Federal and therefore not subject to copyright prohibitions) and saved them as png files so they could be read with readPNG from the png package.

Of course there are many more chart types that are beyond the scope of this blog post. One reference is Top 50 ggplot2 Visualizations - The Master List (With Full R Code). A very cool chart type is the radar chart which you can see at How to Create Radar Charts in R (With Examples).

Here is my output (click to enlarge) and my R code:

setwd("C:/Users/ ... ")
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
library(png)
library(ggtext)
df <- data.frame(President = c("Lincoln", "Garfield", "McKinley", "Kennedy"), Days_in_office = c(1503,199,1654,1036) )
# column chart with images on x-axis
p <- df %>%

ggplot() +
geom_col(mapping=aes(President, y=Days_in_office, fill=President)) +
scale_fill_manual(values=c("blue", "yellow", "red", "black")) +
labs(title="Axis Labels as Images") +
theme(plot.title = element_text(hjust = .5))

garfield <- readPNG("garfield.png")
kennedy <- readPNG("kennedy.png")
lincoln <- readPNG("lincoln.png")
mckinley <- readPNG("mckinley.png")
# axis labels are HTML img tags rendered by ggtext::element_markdown(); the order matches the alphabetical x-axis
labels <- c("<img src='garfield.png' width='35' />","<img src='kennedy.png' width='35' />","<img src='lincoln.png' width='35' />","<img src='mckinley.png' width='40' height='42' />")

p <- p +
scale_x_discrete(labels = labels) +
theme(axis.text.x = ggtext::element_markdown())
p # takes a moment to draw

# ==============================================================
suppressMessages(library(wordcloud))
df %>% with(wordcloud(words=President, freq=Days_in_office, random.order=FALSE, random.color=FALSE, rot.per = 0,
colors = c("blue","black", "red")))

# =====================================================
# lolliplot plot
df %>%
ggplot() +
geom_segment(mapping=aes(x=President, xend=President, y=0, yend=Days_in_office), color=c("blue", "yellow", "red", "black") ) +
geom_point(aes(x=President, y=Days_in_office), size=4, color=c("blue", "yellow", "red", "black") ) +
ylab("Days_in_office") +
theme(axis.text = element_text(face="bold")) +
theme(text = element_text(size =16)) +
labs(title="Lolliplot Plot") +
theme(plot.title = element_text(hjust = .5))
df

# ===========================================================
# donut plot
donut <- df
donut$fraction = donut$Days_in_office / sum(donut$Days_in_office)

# Compute the cumulative percentages (top of each rectangle)
donut$ymax = cumsum(donut$fraction)

# Compute the bottom of each rectangle
donut$ymin = c(0, head(donut$ymax, n=-1))

# Compute label position

donut$labelPosition <- (donut$ymax + donut$ymin) / 2

# Create label
donut$label <- paste0(donut$President, "\n value: ", donut$Days_in_office)

# Make the plot
ggplot(donut, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3, fill=President)) +
geom_rect() +
geom_label( x=3.5, aes(y=labelPosition, label=label), size=6) +
scale_fill_brewer(palette=4) +
coord_polar(theta="y") +
xlim(c(2, 4)) +
theme_void() +
theme(legend.position = "none") +
theme(text = element_text(size =16)) +
labs(title="Donut Plot") +
theme(plot.title = element_text(hjust = .5))

Saturday, October 29, 2022

Find the next number in the sequence

Between ages two and four, most children can count up to at least ten.

If you ask your child, "What number comes next after 1, 2, 3, 4, 5?" they will probably say "6."

But to math nerds, any number can be the next number in a finite sequence. I like -14.

Given a sequence of n real numbers f(x₁), f(x₂), f(x₃), ... , f(x_n), there is always a mathematical procedure to find the next number f(x_{n+1}) of the sequence. The resulting solution may not appear to be satisfying to students, but it is mathematically logical.

I can draw a smooth curve through the points (1,1), (2,2), (3,3), (4,4), (5,5), (6, -14). If I can find an equation for that smooth curve, then I know my answer of -14 has some logic to it. Actually many equations will work.

In my example one equation is of the form y = (x-1)*(x-2)*(x-3)*(x-4)*(x-5)*(A/120) + x, where A is chosen so that when x is 6, the first term reduces to A, and A + 6 equals the -14 I want. So A is -20. This is called a collocation polynomial.

There is a theorem that for n+1 distinct values of x_i and their corresponding y_i values, there is a unique polynomial P of degree at most n with P(x_i) = y_i. One method to find P is to use polynomial regression. Another way is to use Newton's Forward Difference Formula (probably no longer taught in Numerical Analysis courses).
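For readers who have not seen Newton's Forward Difference Formula, here is a minimal sketch for equally spaced x values. The names lead_diffs and newton_forward are my own, and the code assumes the six points (1,1), (2,2), (3,3), (4,4), (5,5), (6,-14) with spacing h = 1. The polynomial is the sum over k of choose(s, k) times the k-th leading forward difference, where s = (x - x0)/h.

```r
xpoints <- 1:6
ypoints <- c(1, 2, 3, 4, 5, -14)

# Leading forward differences: y0, then the first entry of each successive difference row
lead_diffs <- numeric(length(ypoints))
d <- ypoints
for (k in seq_along(ypoints)) {
  lead_diffs[k] <- d[1]
  d <- diff(d)
}
lead_diffs

# Newton's forward difference polynomial; choose() accepts a non-integer first argument
newton_forward <- function(x, x0 = 1, h = 1) {
  s <- (x - x0) / h
  total <- 0
  for (k in seq_along(lead_diffs)) {
    total <- total + choose(s, k - 1) * lead_diffs[k]
  }
  total
}

newton_forward(xpoints)   # should reproduce ypoints
```

Only the constant, first, and fifth differences are nonzero for this data, which is why the formula collapses to x plus a multiple of (x-1)*(x-2)*(x-3)*(x-4)*(x-5).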

Polynomials of degree higher than n are one reason why additional equations will work.

The equation does not have to be a polynomial at all, which adds rational functions, among others, to the list.

Of course the next number after -14 can be any number. It could be 7 :)
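To illustrate, here is a hedged sketch of one higher-degree polynomial that passes through the same six points and then continues to 7. The function names p5 and p6 and the argument target are my own. The idea is that adding a multiple of (x-1)*(x-2)*(x-3)*(x-4)*(x-5)*(x-6) changes nothing at x = 1, ..., 6.

```r
# Degree-5 collocation polynomial: passes through (1,1), ..., (5,5), (6,-14)
p5 <- function(x) (x-1)*(x-2)*(x-3)*(x-4)*(x-5)*(-20/120) + x

# The extra term is zero at x = 1, ..., 6, so the first six values are unchanged.
# B is chosen so that the value at x = 7 equals target.
p6 <- function(x, target = 7) {
  B <- (target - p5(7)) / factorial(6)   # (7-1)*(7-2)*...*(7-6) equals 6! = 720
  p5(x) + B * (x-1)*(x-2)*(x-3)*(x-4)*(x-5)*(x-6)
}

p6(1:7)   # first six values match the original points, the seventh is the target
```

Calling p6(1:7, target = 100) would make 100 the seventh number instead, so any continuation can be justified this way.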

There are many famous sequences, and of course someone catalogued them.

Here is some R code.

xpoints <- c(1,2,3,4,5,6)
ypoints <- c(1,2,3,4,5,-14)
y <- vector()
x <- seq(from=1, to=6, by=.01)
y <- (x-1)*(x-2)*(x-3)*(x-4)*(x-5)*(-20/120) + x
plot(xpoints, ypoints, pch=18, type="p", cex=2, col="blue", xlim=c(1,6), ylim=c(-14,6), xlab="x", ylab="y")
lines(x,y, pch = 19, cex=1.3, col = "red")
fit <- lm(ypoints ~ xpoints + I(xpoints^2) + I(xpoints^3) +I(xpoints^4) +I(xpoints^5) )
# coef() returns the fitted coefficients directly: b[1] is the intercept, b[2] to b[6] multiply x^1 to x^5
b <- unname(coef(fit))
x <- seq(from=1, to=6, by=.01)
z <- b[1]+b[2]*x+b[3]*x^2+b[4]*x^3+b[5]*x^4+b[6]*x^5
plot(xpoints, ypoints, pch=18, type="p", cex=2, col="blue", xlim=c(1,6), ylim=c(-14,6), xlab="x", ylab="y")
lines(x,z, pch = 19, cex=1.3, col = "red")

More great R blogs at r-bloggers.com

Saturday, September 17, 2022

How do I count thee? Let me count the ways?

by Jerry Tuttle

In Major League Baseball, a player who hits 50 home runs in a single season has hit a lot of home runs. Suppose I want to count the number of 50 homer seasons by team, and also the number of 50 homer seasons by New York Yankees. (I will count Maris and Mantle in 1961 as two.) Here is the data including Aaron Judge's 62 in 2022:

You would think base R would have a count function such as count(df$Team) and count(df$Team == "NYY") but this gives the error "could not find function 'count'". Base R does not have a count function.

Base R has at least four ways to perform a count:

1. The table function will count items in a vector. table(df$Team) presents results horizontally, and data.frame(table(df$Team)) presents results vertically. table(df$Team == "NYY") displays results 37 false and 10 true, while table(df$Team == "NYY")[2] just displays the result 10 true.

2. The sum function can be used to count the number of rows meeting a condition. sum(df$Team == "NYY") displays the result 10. Here df$Team == "NYY" is creating a logical vector, and sum is summing the number of true = 1.

3. Similar to sum, nrow(df[df$Team == "NYY", ]) counts the number of rows meeting the NYY condition.

4. The length function counts the number of elements in an R object. length(which(df$Team == "NYY")) , length(df$Team[df$Team == "NYY"]) , and length(grep("NYY", df[ , "Team"])) are all ways that will count the 10 Yankees.

The more direct solution to counting uses the count function in the dplyr library. Note that dplyr's count function applies to a data frame or tibble, but not to a vector. After loading library(dplyr) ,

1. df %>% count(Team) lists the count for each team.

2. df %>% filter(Team == "NYY") lists each Yankee, and you can see there are 10.

3. df %>% count(Team == "NYY") displays 37 false and 10 true, while df %>% filter(Team == "NYY") %>% count() just displays the 10 true.

The following is a bar chart of the results by team for teams with at least one 50-homer season:

Finally, "How do I count thee? Let me count the ways?" is of course adapted from Elizabeth Barrett Browning's poem "How do I love thee? Let me count the ways." But in her poem, just how would we count the number of times "love" is mentioned? The tidytext library makes counting words fairly easy, and the answer is ten, the same number of 50 homer Yankee seasons. Coincidence?

The following is all the R code. Happy counting!

library(dplyr)
library(ggplot2)
library(tidytext)

df <- data.frame(
   Player=c('Ruth','Ruth','Ruth','Ruth','Wilson','Foxx','Greenberg','Foxx','Kiner','Mize','Kiner','Mays','Mantle','Maris', 'Mantle','Mays','Foster','Fielder','Belle','McGwire','Anderson','McGwire','Griffey','McGwire','Sosa','Griffey', 'Vaughn','McGwire','Sosa','Sosa','Bonds','Sosa','Gonzalez','Rodriguez','Rodriguez','Thome','Jones','Howard','Ortiz', 'Rodriguez','Fielder','Bautista','Davis','Stanton','Judge','Alonso','Judge'),
   Year=c(1920,1921,1927,1928,1930,1932,1938,1938,1947,1947,1949,1955,1956,1961,1961,1965,1977,1990,1995,1996,1996,1997,1997, 1998,1998,1998,1998,1999,1999,2000,2001,2001,2001,2001,2002,2002,2005,2006,2006,2007,2007,2010,2013,2017,2017,2019,2022),
   Homers=c(54,59,60,54,56,58,58,50,51,51,54,51,52,61,54,52,52,51,50,52,50,58,56,70,66,56,50,65,63,50,73,64,57,52,57,52,51, 58,54,54,50,54,53,59,52,53,62),
   Team=c('NYY','NYY','NYY','NYY','CHC','PHA','DET','BOS','PIT','NYG','PIT','NYG','NYY','NYY','NYY','SF','CIN','DET','CLE', 'OAK','BAL','OAK/SLC','SEA','SLC','CHC','SEA','SD','SLC','CHC','CHC','SF','CHC','ARI','TEX','TEX','CLE','ATL','PHP', 'BOS','NYY','MIL','TOR','BAL','MIA','NYY','NYM','NYY'))

head(df)

# base R ways to count:

table(df$Team)    # shows results horizontally
data.frame(table(df$Team))    #shows results vertically
table(df$Team == "NYY")    # displays 37 false and 10 true
table(df$Team == "NYY")[2]

sum(df$Team == "NYY") # displays the result 10.

nrow(df[df$Team == "NYY", ]) # counts the number of rows meeting the NYY condition.

length(which(df$Team == "NYY")) # which returns a vector of indices which are true
length(df$Team[df$Team == "NYY"])
length(grep("NYY", df[ , "Team"])) # grep returns a vector of indices that match the pattern

# dplyr R ways to count; remember to load library(dplyr):

df %>% count(Team) # lists the count for each team.

df %>% filter(Team == "NYY") # lists each Yankee, and you can see there are 10.

df %>% count(Team == "NYY") # displays 37 false and 10 true, while
df %>% filter(Team == "NYY") %>% count() # just displays the 10 true.

# barplot of all teams with at least 1 50 homer season; remember to load library(ggplot2)

df %>%
    group_by(Team) %>%
    summarise(count = n()) %>%
    ggplot(aes(x=reorder(Team, count), y=count, fill=Team)) +
    geom_bar(stat = 'identity') +
    ggtitle("Count of 50 Homer Seasons") +
    xlab("Team") +
    scale_y_continuous(breaks=c(1,2,3,4,5,6,7,8,9,10)) +
    coord_flip() +
    theme(plot.title = element_text(face="bold", size=18)) +
    theme(axis.title.y = element_text(face="bold")) +
    theme(axis.title.x = element_blank()) +
    theme(axis.text.x = element_text(size=12, face="bold"),
    axis.text.y = element_text(size=12, face="bold")) +
    theme(legend.position="none")

# count number of times "love" is mentioned in Browning's poem; remember to load library(tidytext)

textfile <- c("How do I love thee? Let me count the ways.",
"I love thee to the depth and breadth and height",
"My soul can reach, when feeling out of sight",
"For the ends of being and ideal grace.",
"I love thee to the level of every day's",
"Most quiet need, by sun and candle-light.",
"I love thee freely, as men strive for right.",
"I love thee purely, as they turn from praise.",
"I love thee with the passion put to use",
"In my old griefs, and with my childhood's faith.",
"I love thee with a love I seemed to lose",
"With my lost saints. I love thee with the breath,",
"Smiles, tears, of all my life; and, if God choose,",
"I shall but love thee better after death.")

df<-data.frame(line=1:length(textfile), text=textfile)
df_words <- df %>% unnest_tokens(word, text)
cleaned_words <- df_words %>% anti_join(get_stopwords())
cleaned_words %>% count(word, sort = TRUE) %>% head(6)
cleaned_words %>% filter(word == "love") %>% count()
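As a cross-check that does not need tidytext, the same kind of count can be sketched in base R. This is only an illustration on two lines of the poem; the names sample_lines, words, and love_count are my own, and the simple split rule is an assumption that may not match tidytext's tokenizer in every case.

```r
# Two lines of the poem, used here as a small self-contained sample
sample_lines <- c("I love thee with a love I seemed to lose",
                  "With my lost saints. I love thee with the breath,")

# Lower-case the text, split on anything that is not a letter or apostrophe,
# and drop any empty strings that splitting can leave behind
words <- unlist(strsplit(tolower(sample_lines), "[^a-z']+"))
words <- words[words != ""]

# Count exact matches only, so a word such as "lovely" would not be counted
love_count <- sum(words == "love")
love_count
```

Applying the same steps to the full textfile vector should agree with the tidytext count.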
