How do I count thee? Let me count the ways?

Tuesday, August 8, 2023

Black hole word numbers in multiple languages

      A few months ago I used R to investigate black hole word numbers in the English language. A friend suggested that other languages probably have black hole word numbers too. I have a nodding acquaintance with only three other languages (spoken languages, not computer languages), and all three do have such black holes. Here are the results of my research. My R code for all four languages is at the end.

      First, a review with English words. Take any English word, count its letters, spell out that count as a word, and repeat. Every English word ends up at the same black hole number: four. Once you get to four, you are stuck and can't get out, because the word four itself has four letters. Here is an example.

The word hippopotomonstrosesquippedaliophobia (fear of long words) has 36 letters.
The word thirtysix has nine letters.
The word nine has four letters.
The word four has four letters.
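The counting procedure in this example can be sketched in base R without any packages. This is only a minimal sketch: the helper names spell_number and letter_chain are made up for illustration, the speller handles counts from 1 to 99 only, and the code at the end of the post uses the english package instead.

```r
# Number words needed to spell 1..99 without hyphens (assumption: no word is longer than 99 letters).
ones <- c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
          "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
          "sixteen", "seventeen", "eighteen", "nineteen")
tens <- c("twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety")

# Hypothetical helper: spell an integer from 1 to 99 as one unhyphenated word.
spell_number <- function(n) {
  stopifnot(n >= 1, n <= 99)
  if (n < 20) return(ones[n])
  word <- tens[n %/% 10 - 1]
  if (n %% 10 > 0) word <- paste0(word, ones[n %% 10])
  word
}

# Hypothetical helper: follow word -> letter count -> number word
# until the number word maps to itself.
letter_chain <- function(word) {
  chain <- character(0)
  repeat {
    nxt <- spell_number(nchar(word))
    if (nxt == word) break      # the word describes its own length: stuck
    chain <- c(chain, nxt)
    word <- nxt
  }
  chain
}

print(letter_chain("hippopotomonstrosesquippedaliophobia"))
```

The loop stops as soon as a number word has as many letters as the number it names, so it does not need to know in advance that the black hole is four.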

      Here are some more English words, with their word number length counting sequence. I found a long list of English words, so this list is truly a random sample. (For the other languages, I could not find a nice long list, so the words are not random but rather a convenience sample.)

miscognizable thirteen eight five four
harvestry nine four
geopolitist eleven six three five four
jessed six three five four
pardonee eight five four
whitfield nine four
ghazal six three five four
morphophonemically eighteen eight five four
calonectria eleven six three five four
conceptiveness fourteen eight five four

      Every German word also gets you to the same black hole number: vier.

handschuh neun vier
flugzeug acht vier
staubsauger elf drei vier
waschmaschine dreizehn acht vier
haustürschlüssel sechzehn acht vier
lächeln sieben sechs fünf vier
geutscher neun vier
danke fünf vier
morgen sechs fünf vier
tee drei vier
torschlusspanik fünfzehn acht vier

      In Hebrew, where there is the complication that letters are written from right to left, there are two black hole numbers: ארבע and שלש . Below, the rightmost word is the word whose letters are first counted, and the subsequent counting is from right to left.

פירת ארבע
אורתודוקסית אחדעשר שש שתים ארבע
קומוניסטית עשר שלש
ומועמדויות עשר שלש
עיתונות שבע שלש
ארוך ארבע
שלה שלש
כך שתים ארבע
לראות חמש שלש
להסתכל שש שתים ארבע

      In Spanish there is a black hole at cinco. However, unlike the previous languages that had a black hole where you are stuck and can't get out, Spanish also has some words where you oscillate back and forth between two numbers but never really fall into a hole. These two Spanish numbers are seis and cuatro.

montaña siete cinco
Iglesia siete cinco
computadora once cuatro seis cuatro seis cuatro seis cuatro seis cuatro
oficina siete cinco
preguntar nueve cinco
entender ocho cuatro seis cuatro seis cuatro seis cuatro seis cuatro
hermosa siete cinco
asombroso nueve cinco
perezoso ocho cuatro seis cuatro seis cuatro seis cuatro seis cuatro
somnoliento once cuatro seis cuatro seis cuatro seis cuatro seis cuatro
saludable nueve cinco
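Because some Spanish chains never settle, a loop that waits for cinco has to be cut off after a fixed number of steps. One alternative, sketched here under my own assumptions, is to stop as soon as a number word shows up a second time. The function name chain_until_repeat is hypothetical, and the lookup only covers words of up to 20 letters.

```r
# Spanish number words for 1..20 (index = number of letters counted).
numbs <- c("uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
           "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
           "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte")

# Hypothetical helper: follow the chain until a number word repeats.
# Returns the distinct words visited and the word at which the chain re-enters itself.
chain_until_repeat <- function(word, numbs) {
  seen <- character(0)
  repeat {
    word <- numbs[nchar(word)]
    if (word %in% seen) break     # we have been here before: fixed point or cycle
    seen <- c(seen, word)
  }
  list(chain = seen, re_enters_at = word)
}

res <- chain_until_repeat("computadora", numbs)
print(res$chain)
print(res$re_enters_at)
```

If re_enters_at equals the last element of chain, the word has fallen into a black hole; otherwise the chain has entered an oscillation that starts at re_enters_at.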

      This is reminiscent of some numerical algorithms that oscillate and never converge. For example, take f(x) = x^3 - 2*x + 2, which has a single real root at approximately -1.769. Starting from x0 = 1, the Newton-Raphson approximations oscillate between x = 1 and x = 0, where f(x) = 1 and f(x) = 2 respectively, and never find the root. You can see from the first graph that the oscillation occurs at the wrong section of the curve.
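Writing out the first two Newton-Raphson steps by hand shows why the iteration is trapped. The derivation below uses only the function and starting value given above; the subscript notation is my own.

```latex
\begin{align*}
x_{n+1} &= x_n - \frac{f(x_n)}{f'(x_n)}, \qquad f(x) = x^3 - 2x + 2, \quad f'(x) = 3x^2 - 2 \\
x_0 = 1:&\quad f(1) = 1,\ f'(1) = 1, \quad x_1 = 1 - \tfrac{1}{1} = 0 \\
x_1 = 0:&\quad f(0) = 2,\ f'(0) = -2, \quad x_2 = 0 - \tfrac{2}{-2} = 1 = x_0
\end{align*}
```

Since the second step lands exactly on the starting value, every later step repeats the same two points.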

      If you think about it, the trick to why these black holes exist is not too difficult, and the same trick works in these four languages. I'm sure there are other languages that have no such black hole.
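One way to look for the trick is to treat the count as a map from a number to the length of its number word, and ask which numbers that map keeps returning to. The sketch below does this for the numbers 1 to 20. The function name attractors is hypothetical, the Hebrew and Spanish vectors follow the code at the end of the post, the German vector uses standard spellings, and the 1 to 20 range is an assumption that covers the sample words but not every possible word.

```r
# Number words for 1..20 in each language.
number_words <- list(
  english = c("one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
              "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
              "seventeen", "eighteen", "nineteen", "twenty"),
  german  = c("eins", "zwei", "drei", "vier", "fünf", "sechs", "sieben", "acht", "neun", "zehn",
              "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn", "sechzehn", "siebzehn",
              "achtzehn", "neunzehn", "zwanzig"),
  hebrew  = c("אחת", "שתים", "שלש", "ארבע", "חמש", "שש", "שבע", "שמונה", "תשע", "עשר",
              "אחד עשר", "שתיים עשרה", "שלוש עשרה", "ארבעה עשר", "חמש עשרה", "שש עשרה",
              "שבע עשרה", "שמונה עשרה", "תשע עשרה", "עשרים"),
  spanish = c("uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho", "nueve", "diez",
              "once", "doce", "trece", "catorce", "quince", "dieciséis", "diecisiete",
              "dieciocho", "diecinueve", "veinte")
)

# Hypothetical helper: return the number words that lie on a fixed point or a cycle
# of the map n -> (letters in the word for n).
attractors <- function(numbs) {
  len <- nchar(gsub("[ -]", "", numbs))        # letters in each number word, spaces and hyphens dropped
  recurrent <- integer(0)
  for (start in seq_along(numbs)) {
    n <- start
    for (k in seq_along(numbs)) n <- len[n]    # after this many steps n must be on a cycle
    while (!(n %in% recurrent)) {              # walk once around that cycle and record it
      recurrent <- c(recurrent, n)
      n <- len[n]
    }
  }
  numbs[sort(recurrent)]
}

print(lapply(number_words, attractors))
```

Under these assumptions the sketch reports four for English, vier for German, שלש and ארבע for Hebrew, and cuatro, cinco and seis for Spanish, which matches the sequences listed above. A language has a single black hole when this list has exactly one entry.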

      Here is the R code I used:

####################################################
# Try hippopotomonstrosesquippedaliophobia (fear of long words), which has 36 letters.

library(english)
x <- "hippopotomonstrosesquippedaliophobia"
y <- ""     # Initialize y
while (y != "four") {
  y <- nchar(x)                           # count the letters
  y <- as.character(english(y))           # spell out the count as a word
  y <- gsub("-", "", y, fixed = TRUE)     # delete hyphen (thirty-six -> thirtysix)
  print(c(x, y))
  x <- y
}


####################################################
# Try ten random English words

library(english)
library(wordcloud)
set.seed(123)
words <- read.table("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt")
original <- sample(words$V1, 10, replace = FALSE)
# Uncomment to use the ten words listed in the post without downloading the word list:
# original <- c(
#   "miscognizable","harvestry","geopolitist","jessed","pardonee","whitfield","ghazal","morphophonemically",
#   "calonectria","conceptiveness")
wordcloud(words = original, random.order = TRUE,
          colors = c("red","blue","darkgreen","brown","black","red","blue","darkgreen","navy","black"),
          ordered.colors = TRUE, scale = c(3,7))
rm(words)     # free up memory
for (i in seq_along(original)) {
  x <- original[i]
  y <- character(0)
  for (j in 1:100) {
    n <- nchar(x)
    n <- as.character(english(n))               # spell out the count as a word
    y[j] <- gsub("-", "", n, fixed = TRUE)      # delete hyphen
    x <- y[j]
    if (y[j] == "four") break
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 10 Hebrew words

original <- c("פירת", "אורתודוקסית", "קומוניסטית", "ומועמדויות", "עיתונות", "ארוך", "שלה", "כך", "לראות", "להסתכל" )

numbs <-
c("אחת", "שתים", "שלש", "ארבע", "חמש", "שש", "שבע", "שמונה", "תשע", "עשר","אחד עשר","שתיים עשרה","שלוש עשרה","ארבעה עשר","חמש עשרה","שש עשרה","שבע עשרה","שמונה עשרה","תשע עשרה","עשרים")
for (i in seq_along(original)) {
  x <- original[i]
  y <- character(0)
  for (j in 1:10) {
    n <- nchar(x)
    y[j] <- gsub(" ", "", numbs[n], fixed = TRUE)     # look up the number word, delete space
    x <- y[j]
    if (y[j] == "ארבע" || y[j] == "שלש") break
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 11 Spanish words. Words that reach cuatro oscillate forever between cuatro and seis,
# so the inner loop simply stops after 10 steps.

x <- c("montaña","Iglesia","computadora","oficina","preguntar","entender","hermosa","asombroso","perezoso","somnoliento","saludable")
numbs <- c(
  "uno", "dos", "tres", "cuatro", "cinco", "seis", "siete", "ocho",
  "nueve", "diez", "once", "doce", "trece", "catorce", "quince",
  "dieciséis", "diecisiete", "dieciocho", "diecinueve", "veinte")
original <- x
for (i in seq_along(x)) {
  y <- character(0)
  for (j in 1:10) {
    n <- nchar(x[i])
    y[j] <- numbs[n]     # look up the number word for the letter count
    x[i] <- y[j]
    if (y[j] == "cinco") break
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Try 11 German words

x <- c("handschuh","flugzeug","staubsauger","waschmaschine","haustürschlüssel","lächeln","geutscher", "danke", "morgen","tee","torschlusspanik")
numbs <- c(
  "eins","zwei","drei","vier","fünf","sechs","sieben","acht","neun","zehn",
  "elf","zwölf","dreizehn","vierzehn","fünfzehn","sechzehn","siebzehn",
  "achtzehn","neunzehn","zwanzig")
original <- x
for (i in seq_along(x)) {
  y <- character(0)
  for (j in 1:10) {
    n <- nchar(x[i])
    y[j] <- numbs[n]     # look up the number word for the letter count
    x[i] <- y[j]
    if (y[j] == "vier") break
  }
  cat(c(original[i], "\t", y), "\n")
}

####################################################
# Newton-Raphson: x(n+1) = x(n) - f(x(n)) / f'(x(n))

# f(x) = x^3 - 2*x + 2
# f'(x) = 3*x^2 - 2

par(mfrow = c(1, 2))

# quick plot to choose initial value
x <- seq(from = -5, to = 5, by = .001)
y <- x^3 - 2*x + 2
plot(x, y, main = "f(x) = x^3 -2*x + 2", xlab = "x", ylab = "y", col = "red", ylim = c(-2,4), cex.main = 3)
axis(side = 1, font = 2, cex.axis = 2)
axis(side = 2, font = 2, cex.axis = 2)
abline(h = 0)

# Newton-Raphson
x <- vector()
f <- vector()
x_new <- 1     # initial guess
for (n in 1:10) {
  x[n] <- x_new
  f[n] <- (x[n])^3 - 2*x[n] + 2
  fprime <- 3 * (x[n])^2 - 2     # manual derivative calculation
  x_new <- x[n] - f[n]/fprime
  if (abs((x[n] - x_new)/x[n]) < .00005) break     # stop when the relative change is tiny
}

df <- data.frame(x, f)
df

plot(df$x, df$f, pch = 16, cex = 2, main = "Sequence of N-R points", xlab = "x", ylab = "y", cex.main = 3)
for (i in seq_len(nrow(df) - 1)) {     # one arrow from each point to the next
  arrows(x0 = x[i], y0 = f[i], x1 = x[i+1], y1 = f[i+1], col = "blue")
}
axis(side = 1, font = 2, cex.axis = 2)
axis(side = 2, font = 2, cex.axis = 2)
abline(h = 0)

dev.off()     # close the graphics device, which also resets par



Monday, March 27, 2023

There's a black hole in the number line

The government doesn't want you to know about this, but I have discovered it and I will share this with a few close friends: There is a black hole in the number line, and it's at number 4. Every word in the English language will eventually fall into it and can't get out of it.

As an example, take the word mathematical.

Mathematical has twelve letters.
Twelve has six letters.
Six has three letters.
Three has five letters.
Five has four letters.
Four has four letters.
Four has four letters, and now we have entered this black hole at 4, and we can't get out of it!

Try a few more words. Try words as long as you like. Try morphophonemically, which has 18 letters. I have done exhaustive research on this with R, and you will find every English word eventually falls into the black hole at 4 and can't get out. At this rate, there will be no words left!

Of course, this is an April Fools' Day prank.

Did you figure it out?

Here is some R code to test the word mathematical.

# Try word mathematical
library(broman)
x <- "mathematical"
y <- "" # Initialize y
while (y != "four") {
  y <- nchar(x)
  y <- broman::spell_out(y, max_value = 20) # Spell out an integer as a word
  print(c(x, y))
  x <- y
}

Here is some R code to test ten random words.

# Try ten random words
library(broman)
set.seed(123)
words <- read.table("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt")
original <- sample(words$V1, 10, replace = FALSE)
x <- original
rm(words) # free up memory
for (i in seq_along(original)) {
  y <- character(0)
  for (j in 1:100) {
    n <- nchar(x[i])
    y[j] <- spell_out(n, max_value = 20) # Spell out the letter count as a word
    x[i] <- y[j]
    if (y[j] == "four") break
  }
  cat(c(original[i], "\t", y), "\n")
}

Here is the code for the number line:

library(ggplot2)
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10),
                 y = rep(0,10),
                 group = c("A","A","A","B","A","A","A","A","A","A"))
ggplot(df, aes(x = x, y = y)) +
  geom_point(aes(color = group, size = ifelse(x == 4, 15, 15))) +
  geom_hline(yintercept = 0, linetype = 1, color = "lightblue", size = 1) +
  scale_x_continuous(limits = c(0, 11), expand = c(0, 0),
                     breaks = NULL, minor_breaks = NULL) +
  scale_y_continuous(limits = c(-0.2, 0.2), expand = c(0, 0),
                     breaks = NULL, minor_breaks = NULL) +
  scale_color_manual(values = c("red", "black")) +
  ggtitle("THE BLACK HOLE AT NUMBER 4") +
  theme_void() +
  theme(legend.position = "none",
        axis.line = element_blank(),
        axis.text = element_blank(),
        plot.title = element_text(color="black", size=14, face="bold")) +
  geom_text(aes(x = x, y = -0.1, label = x), size = 5)