I’m currently binge re-re-watching Breaking Bad, and I started thinking about the names in the credits. Each name has the symbol of one chemical element highlighted in green. Here’s an example:
Vince Gilligan’s name highlights V, the symbol for vanadium. As far as I noticed, every name in the credits contained at least one element symbol. That got me wondering: out of the 118 identified elements, how likely is it that we can find at least one symbol to insert into a random name?
I found a list of 5163 unique first names and 88799 unique last names on this site (note: the raw lists contain some duplicates that need to be removed first). Using these lists and a list of the 118 chemical symbols, I wrote a small Python script to count how many symbols could be inserted into each first name and each last name separately.
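A minimal sketch of such a script is below. It is not the original script: the file format (one name per line), the helper names `load_unique_names` and `count_symbols`, and the counting rule (each symbol counted at most once per name, ignoring case) are all assumptions. Under that rule Catherine scores 11 and Vince scores 6.

```python
from pathlib import Path

# The 118 element symbols in atomic-number order.
SYMBOLS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe "
    "Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In "
    "Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf "
    "Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am "
    "Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og"
).split()


def load_unique_names(path: str) -> list[str]:
    """Hypothetical loader: one name per line, lower-cased, blanks and duplicates dropped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return sorted({line.strip().lower() for line in lines if line.strip()})


def count_symbols(name: str) -> int:
    """Count distinct element symbols occurring in `name`, ignoring case.

    Assumption: a symbol that occurs twice in the same name is counted once.
    """
    lowered = name.lower()
    return sum(1 for symbol in SYMBOLS if symbol.lower() in lowered)


if __name__ == "__main__":
    for name in ("Vince", "Gilligan", "Catherine", "Ada"):
        print(name, count_symbols(name))
```

On the real data you would call `load_unique_names` on each downloaded list and apply `count_symbols` to every entry.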
I expected it to be rare for a name to have no possible symbol insertions, but I didn’t realize just how rare. Only 0.81% of the first names in this list had none. Here they are:
* ada, adele, adell, adelle, aja
* dede, dee, deedee, deja, del, dell, delma
* ed, eda, edda, elda, elma, elza, ema, emelda, emma, emmett
* ja, jada, jade, jae, jed
* le, lea, leda, lee
* ma, mae, max, meda, mee, meg, mel, melda
* zada, zelda, zelma
Let’s look at a graph of the percentage of first names versus the number of possible symbol insertions:
The most common number of possible insertions is 4, and the largest in this list of names is 11. One of the names with 11 possibilities is Catherine, even though it has only 9 letters (the matched symbol is shown in brackets):
- [C]atherine
- [Ca]therine
- C[At]herine
- Ca[Th]erine
- Cat[H]erine
- Cat[He]rine
- Cath[Er]ine
- Cather[I]ne
- Cather[In]e
- Catheri[N]e
- Catheri[Ne]
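A sketch of how the distribution and a bracketed listing like the one for Catherine could be produced. The function names are hypothetical, and counting each symbol at most once per name is an assumption. `distribution` returns percentages keyed by match count, `summarize` pulls out the zero-match percentage, the most common count and the largest count, and `highlight` lists every position where a one- or two-letter symbol fits. Because `highlight` lists every occurrence, it can return more entries than the distinct count for a name in which a symbol repeats; for Catherine both give 11.

```python
from collections import Counter

# The 118 element symbols in atomic-number order.
SYMBOLS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe "
    "Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In "
    "Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf "
    "Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am "
    "Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og"
).split()
SYMBOL_SET = set(SYMBOLS)


def count_symbols(name: str) -> int:
    """Distinct symbols occurring in `name`, ignoring case (assumed rule)."""
    lowered = name.lower()
    return sum(1 for symbol in SYMBOLS if symbol.lower() in lowered)


def distribution(names: list[str]) -> dict[int, float]:
    """Percentage of names having exactly n matches, keyed by n."""
    counts = Counter(count_symbols(name) for name in names)
    return {n: 100 * k / len(names) for n, k in sorted(counts.items())}


def summarize(names: list[str]) -> tuple[float, int, int]:
    """(percent with zero matches, most common count, largest count)."""
    dist = distribution(names)
    return dist.get(0, 0.0), max(dist, key=dist.get), max(dist)


def highlight(name: str) -> list[str]:
    """Every placement of a symbol in `name`, with the match in brackets."""
    placements = []
    for start in range(len(name)):
        for length in (1, 2):  # element symbols have one or two letters
            piece = name[start:start + length]
            if len(piece) == length and piece.capitalize() in SYMBOL_SET:
                end = start + length
                placements.append(f"{name[:start]}[{piece.capitalize()}]{name[end:]}")
    return placements


if __name__ == "__main__":
    print(summarize(["ada", "max", "vince", "catherine"]))
    for line in highlight("Catherine"):
        print(line)
```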
For the last names, just 0.18% had no possible symbol insertions! I won’t list all of them here, because there are about 160 of them.
The last names follow a similar distribution, but with a longer tail and a larger mean, since last names tend to be longer.
We can combine these datasets to find symbol insertions for full names. If we assume that any first name can be combined with any last name in the dataset, then the probability of a full name having $n$ insertions is

$$P(n) = \sum_{k=0}^{n} P_F(k)\,P_L(n-k),$$

where $P_F(k)$ and $P_L(k)$ are the probabilities that a first name and a last name, respectively, have exactly $k$ insertions.

Let’s break down this equation. The probability of getting 0 symbol matches on the full name is just the probability of getting 0 matches on the first name and 0 matches on the last name, $P(0) = P_F(0)\,P_L(0)$. To get 1 match on the full name, we must consider both the case where the match comes from the first name and the case where it comes from the last name:

$$P(1) = P_F(0)\,P_L(1) + P_F(1)\,P_L(0).$$

The general equation follows by extending this logic to every way of splitting $n$ matches between the two names.
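Combining the two distributions this way is a discrete convolution: the probability of n matches sums P_F(k)·P_L(n−k) over every split k. A minimal sketch, assuming each distribution is stored as a dictionary mapping a match count to its probability (fractions summing to 1, not percentages); `combine` is a hypothetical helper name.

```python
def combine(p_first: dict[int, float], p_last: dict[int, float]) -> dict[int, float]:
    """Distribution of full-name counts: P(n) = sum_k P_F(k) * P_L(n - k).

    Assumes first- and last-name counts are independent, i.e. every first
    name is equally likely to be paired with every last name.
    """
    p_full: dict[int, float] = {}
    for k, pf in p_first.items():
        for j, pl in p_last.items():
            # A first name with k matches plus a last name with j matches
            # gives a full name with k + j matches.
            p_full[k + j] = p_full.get(k + j, 0.0) + pf * pl
    return dict(sorted(p_full.items()))


if __name__ == "__main__":
    # Toy distributions (probabilities, not percentages).
    print(combine({0: 0.5, 1: 0.5}, {0: 0.25, 2: 0.75}))
```

If the distributions are stored instead as arrays indexed by match count, `numpy.convolve` computes the same thing. Plugging in only the zero-match terms gives $P(0) = 0.0081 \times 0.0018 \approx 1.5 \times 10^{-5}$, or about 0.0015%.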
Here are the results:
The chance that a full name has no possible symbol insertions is 0.0015%, which is simply 0.81% × 0.18%! Full names of course contain more letters than either the first or last name alone, so the distribution again shifts upward, with a larger mean and a tail that extends out to 25 symbol insertions! An example of such a dense name is Catherine Bernasconi, which has only 19 letters.
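To check the 25 for Catherine Bernasconi, the sketch below counts symbols in each name separately and adds the results. It assumes each symbol is counted at most once per name; under that assumption Catherine contributes 11 and Bernasconi 14. Counting every occurrence instead would give 26, because N appears twice in Bernasconi, so exact counts depend on this convention. The function names are hypothetical.

```python
# The 118 element symbols in atomic-number order.
SYMBOLS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe "
    "Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In "
    "Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf "
    "Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am "
    "Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og"
).split()
SYMBOL_SET = set(SYMBOLS)


def distinct_symbols(name: str) -> set[str]:
    """Symbols that appear at least once in `name`, ignoring case."""
    lowered = name.lower()
    return {symbol for symbol in SYMBOLS if symbol.lower() in lowered}


def all_occurrences(name: str) -> int:
    """Every (position, symbol) placement, so a repeated symbol counts again."""
    total = 0
    for start in range(len(name)):
        for length in (1, 2):
            piece = name[start:start + length]
            if len(piece) == length and piece.capitalize() in SYMBOL_SET:
                total += 1
    return total


def full_name_count(first: str, last: str) -> int:
    """Sum of per-name distinct counts; symbols spanning the space are ignored."""
    return len(distinct_symbols(first)) + len(distinct_symbols(last))


if __name__ == "__main__":
    print(full_name_count("Catherine", "Bernasconi"))
    print(all_occurrences("Catherine") + all_occurrences("Bernasconi"))
```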
Note: I did not consider symbols that span the gap between the first and last names. For example, if an element with the symbol Eb existed, it would not be matched in Catherine Bernasconi. Allowing symbols to span the gap would shift the distribution up slightly, since it lets one more two-character symbol potentially match the name. It would also be interesting to compare these distributions with those of random strings of letters of the same length, but I’ll leave that for a future post.
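A sketch of that junction check, assuming it only needs the last letter of the first name and the first letter of the last name (the helper name `junction_symbol` is hypothetical). Since every symbol has at most two letters, at most one extra symbol can appear at each gap. For example, Andy Bohn would gain Yb (ytterbium), while Catherine Bernasconi gains nothing because Eb is not an element.

```python
from typing import Optional

# The 118 element symbols in atomic-number order.
SYMBOLS = (
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe "
    "Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In "
    "Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf "
    "Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am "
    "Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og"
).split()
SYMBOL_SET = set(SYMBOLS)


def junction_symbol(first: str, last: str) -> Optional[str]:
    """Two-letter symbol formed by the last letter of `first` and the first
    letter of `last`, or None if those two letters are not a symbol."""
    if not first or not last:
        return None
    candidate = (first[-1] + last[0]).capitalize()  # e.g. "y" + "B" -> "Yb"
    return candidate if candidate in SYMBOL_SET else None


if __name__ == "__main__":
    pairs = [("Catherine", "Bernasconi"), ("Andy", "Bohn"), ("Vince", "Gilligan")]
    for first, last in pairs:
        print(first, last, junction_symbol(first, last))
```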
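As one more example, the name Andy Bohn allows 8 symbol insertions (the matched symbol is shown in brackets):
- A[N]dy Bohn
- A[Nd]y Bohn
- An[Dy] Bohn
- And[Y] Bohn
- Andy [B]ohn
- Andy B[O]hn
- Andy Bo[H]n
- Andy Boh[N]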