We'll choose two positive integers at random. If they have any divisor in common (other than 1) I'll pay you a dollar, else you'll pay me a dollar. Are you in?

Apart from the question of what kind of establishments you frequent, you should be wondering: is this a good bet for you?

When two integers have no divisors in common except the trivial divisor 1, we say they are *coprime* or *relatively prime*. For example, 4 and 6 have the common divisor 2, so they are *not* coprime, whilst 4 and 9 only have the trivial common divisor 1, so they are coprime.

This makes you start thinking: "As numbers grow bigger, aren't there a lot of divisors out there? After all, half the numbers are even, so if we hit two even numbers, they'll have the factor 2 in common and I'll win. And then there's 3, 5, 7, ... Seems like a good deal!"

Is it, though? Let's do some experiments! This simple Python script should give a decent approximation:

```python
from math import gcd
from random import randint
from sys import maxsize

repetitions = 10**8
count = sum(1 if gcd(randint(1, maxsize), randint(1, maxsize)) == 1 else 0
            for _ in range(repetitions))
print('{count:d} out of {repetitions:d} pairs were coprime ({pct:.3f}%)'
      .format(count=count, repetitions=repetitions,
              pct=100 * count / repetitions))
```

This will pick two (largish) integers at random^{1}, check if they are coprime – which is the same as saying that their greatest common divisor (*gcd*) is 1 – and repeat the procedure 100 million times. At the end, it will print something like this:

60797595 out of 100000000 pairs were coprime (60.798%)

If you run this, the exact numbers will obviously be different, but the bottom line is that in over 60% of the cases you'll end up with two coprime numbers. Don't take the bet! (This is generally a good rule of thumb when a stranger makes you any kind of offer.)

So, where does that number – 60%-ish – come from? There is a nice and easy representation of the exact number, and if you've never heard of this problem, you might be surprised that the answer involves $\pi$ – the ubiquitous constant of circle fame. What's it got to do with number theory? Quite frankly, I'm not sure, but we can prove it, so bear with me.

First, let's look at the above probability more closely. For two numbers to have a non-trivial common divisor means in particular that there is a prime $p$ that divides both of these numbers. If they are coprime, then there's no prime dividing both of them. What are the odds of that? Well, half the numbers are divisible by $2$ – the even numbers. The probability that both numbers are even is $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, so the probability that they don't share the divisor $2$ is the complement $\frac{3}{4}$. Likewise, a third of all numbers are divisible by $3$, so the probability of having $3$ as a common factor is $\frac{1}{9}$, with a complement of $\frac{8}{9}$.
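As a quick sanity check, here's a sketch that counts exactly over all pairs up to a cutoff instead of sampling; the fraction of pairs sharing a given divisor $p$ should come out close to $\frac{1}{p^2}$ (the function name is mine):

```python
# Among all pairs (a, b) with 1 <= a, b <= N, compute the fraction
# where both members are divisible by p. It should be close to 1/p^2.
N = 1000

def shared_divisor_fraction(p, N):
    multiples = N // p              # numbers in 1..N divisible by p
    return (multiples / N) ** 2     # both components divisible by p

print(shared_divisor_fraction(2, N))  # exactly 0.25 for N = 1000
print(shared_divisor_fraction(3, N))  # close to 1/9 ≈ 0.111
```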

We can consider the probabilities of being divisible by any two given prime numbers as independent, so we can just multiply all these terms together for the probability that two integers are coprime:

$$P(\text{coprime}) = \prod_p \left(1 - \frac{1}{p^2}\right),$$

where the product runs over all primes $p$. This should look awfully familiar to you. If it's been too long and your memory needs a jog, remember the Euler product that is at the very core of our investigations:

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \frac{1}{1 - p^{-s}}.$$

In other words, our probability is nothing but

$$P(\text{coprime}) = \prod_p \left(1 - \frac{1}{p^2}\right) = \frac{1}{\zeta(2)}.$$

The value of this particular sum $\zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2}$ became rather famous in its own right as the **Basel problem**, first posed in 1644 and finally solved in 1734 by none other than the great Leonhard Euler. By now, dozens of proofs have been published, most of which calculate some form of integral, but none is clearer and more beautiful than the master's original one, so let's dive into that.

We'll obtain two series representations of $\frac{\sin(x)}{x}$ and compare them with one another. First, let's take a look at the **Taylor series** of the sine function. In general, this series is defined for a smooth (read: well-behaved) function $f$ as

$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n,$$

where $n!$ is the factorial function and $f^{(n)}(0)$ is the $n$-th derivative of $f$ evaluated at $0$. The derivatives of $\sin(x)$ are about as easy as they get:

$$\sin'(x) = \cos(x), \quad \cos'(x) = -\sin(x), \quad (-\sin(x))' = -\cos(x), \quad (-\cos(x))' = \sin(x),$$

from where it all begins over and over in a neat $4$-cycle. In particular we have $\sin(0) = 0$, $\cos(0) = 1$, $-\sin(0) = 0$, and $-\cos(0) = -1$, which yields the following classic series representation for the sine function:

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} \pm \cdots = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)!} x^{2k+1}.$$

We'll take one $x$ out of the sum and use the slightly different form

$$\frac{\sin(x)}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \frac{x^6}{7!} \pm \cdots$$
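To get a feeling for how quickly this series converges, here's a small numerical sketch (function name and truncation point are mine) comparing a truncated version against the library sine:

```python
from math import sin, factorial

def sinc_series(x, terms=10):
    """Truncated Taylor series of sin(x)/x."""
    return sum((-1)**k * x**(2 * k) / factorial(2 * k + 1)
               for k in range(terms))

x = 1.5
print(sinc_series(x))  # ≈ 0.665
print(sin(x) / x)      # same value up to floating point error
```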

Half way there. Now we'll develop a product representation for $\frac{\sin(x)}{x}$, more precisely an expression over its zeros. We used a very similar expression when we developed the product representation for the $\xi$-function. The general form is

$$f(x) = \prod_{\rho} \left(1 - \frac{x}{\rho}\right),$$

where the product runs over all the zeros $\rho$ of $f$ (normalised so that $f(0) = 1$). Euler was very much ahead of his time in using such a product and just manipulated formal expressions, much like in the famous sum $1 + 2 + 3 + \cdots = -\frac{1}{12}$. It wasn't until the late 19th century that Karl Weierstraß put such product representations on a rigorous footing, which then made it possible to verify some of the claims in Riemann's famous paper.

Now, remembering that $\sin(x)$ has zeros at $k\pi$ for all $k \in \mathbb{Z}$ with $k \neq 0$, and $\frac{\sin(x)}{x} \to 1$ as $x \to 0$, we can write

$$\frac{\sin(x)}{x} = \prod_{k=1}^{\infty} \left(1 - \frac{x}{k\pi}\right)\left(1 + \frac{x}{k\pi}\right) = \prod_{k=1}^{\infty} \left(1 - \frac{x^2}{k^2\pi^2}\right).$$

This is a product of sums in $x^2$, so if we manage to multiply out everything we'll have another series representation of $\frac{\sin(x)}{x}$. In general, multiplying out such an expression means for every factor choosing either the left ($1$) or the right ($-\frac{x^2}{k^2\pi^2}$) summand and then summing everything up over all possible combinations (like *left, left, left, ...* or *left, right, left, ...*). This is actually much easier than it sounds since we don't need to deal with all the infinitely many coefficients, each of which will in turn be an infinite sum. Instead, all we care about are the terms that contain $x^2$. Because all factors of the product above are of the form $1 - \frac{x^2}{k^2\pi^2}$, this means we choose for exactly one $k$ the right summand and the left one (which is always $1$) for everything else. Hence the above product will expand into a series like so:

$$\frac{\sin(x)}{x} = 1 - \left(\sum_{k=1}^{\infty} \frac{1}{k^2\pi^2}\right) x^2 + \cdots$$

The rest will be higher order terms of degree $4$ or more which we don't care about. What we've obtained are two series representations for $\frac{\sin(x)}{x}$, both of the form $\sum_n a_n x^n$. It's pretty easy to show that if you have two such series for the same function, all of their coefficients must be identical. This means in particular that the coefficients in front of $x^2$ must match. From the Taylor series we got $-\frac{1}{3!}$, while the product yielded $-\sum_{k=1}^{\infty} \frac{1}{k^2\pi^2}$, so we just obtained the equality

$$\frac{1}{3!} = \sum_{k=1}^{\infty} \frac{1}{k^2\pi^2}.$$

We can take the $\pi$s out of the series since they don't depend on $k$ and rearrange a little to finally arrive at Euler's solution to the Basel problem:

$$\zeta(2) = \sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}.$$

Using this in our calculations above we obtain the exact result for the probability that two randomly chosen positive integers are coprime:

$$P(\text{coprime}) = \frac{1}{\zeta(2)} = \frac{6}{\pi^2} \approx 60.79\%.$$
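We can convince ourselves numerically: a quick sketch (helper names are mine) that truncates the Euler product at some bound and compares it with $6/\pi^2$ – and hence with the roughly 60.8% our simulation produced:

```python
from math import pi

def primes_below(n):
    """Simple sieve of Eratosthenes."""
    sieve = [True] * n
    sieve[0:2] = [False, False]
    for i in range(2, int(n**0.5) + 1):
        if sieve[i]:
            sieve[i*i::i] = [False] * len(sieve[i*i::i])
    return [i for i, is_p in enumerate(sieve) if is_p]

# Truncated version of prod_p (1 - 1/p^2).
product = 1.0
for p in primes_below(10**6):
    product *= 1 - 1 / p**2

print(product)    # ≈ 0.60793...
print(6 / pi**2)  # 0.6079271018540267
```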

As I said before, I'm still puzzled every time I work on this how the trigonometric constant $\pi$ manages to creep up all over maths in seemingly unrelated fields.

Euler's solution as I presented it here lacks some rigour since he couldn't prove that the product representation actually converges, but his manipulations of formal identities not only solved the famous Basel problem correctly, they can actually do a lot more. When multiplying out the product above I ignored any terms of degree higher than $x^2$. However, if you do carry out these calculations you'll get expressions in $x^4$, $x^6$, ..., so you'll obtain similar results for all $\zeta(4)$, $\zeta(6)$, ... The exact calculations are well beyond our scope here, but it's worth noticing that Euler solved the values of $\zeta(s)$ for all even integers $s$, and in particular we know therefore that all $\zeta(2n)$ are irrational and even transcendental numbers since they are rational multiples of powers of $\pi$. You might ask: what about odd arguments $\zeta(2n+1)$?

Somewhat surprisingly the answer is: very little. It wasn't until 1978 that Roger Apéry proved that $\zeta(3)$, a number that's now referred to as *Apéry's constant*, is irrational. No other value of $\zeta(2n+1)$ is known to be irrational^{2} (or rational, for that matter), let alone to have any nice closed form like we had with $\zeta(2n)$.

I'll leave you with the somewhat comforting note that Apéry was of the almost biblical age (for a mathematician) of over 60 years when he made his breakthrough. Mathematics is often called "a young man's game" – you either make a significant contribution before you reach tenure or you never will. Roger Apéry is one counter-example, Andrew Wiles and Yitang Zhang are others. It's never too late to start working on that one big theorem!

1. I was (deliberately) vague when saying *choose two random numbers*. There is no way of choosing two *arbitrarily large* integers uniformly at random, so what you need to do is pick an upper bound *N* and choose numbers below *N* uniformly at random. In the calculations of the probabilities you'd take the limit *N* → ∞. But we'll be hand-waving to begin with, so no reason to be exact here. ↩
2. Some progress has been made, but no single value has been cracked so far. ↩

The basic idea is the following:

- *one* has 3 letters,
- *two* has 3 letters,
- *three* has 5 letters,
- *four* has 4 letters,
- *five* has 4 letters,
- *six* has 3 letters,
- *seven* has 5 letters,
- *eight* has 5 letters,
- *nine* has 4 letters,
- *ten* has 3 letters,

and so on... This can be seen as a function

$$\ell(n) = \text{number of letters in the English word for } n.$$

(Note that it says "number of letters" and not "length of word" – English and most other languages put spaces, commas, hyphens, and other decorations in between words for numbers, which we'll ignore in everything that follows.)

As the title of the video points out, we have $\ell(4) = 4$. This is called a fixed point, i.e., a point that just refuses to move when you apply the function to it. "Four" is the only word/number with this property in English. What's more, you can repeatedly apply the word-length function to its own result, which yields a sort of chain:

**one** → **three** → **five** → **four** → **four** → **four** → ...

OK, we're kinda stuck at this point. In fact, every number will eventually get stuck in this very loop when $\ell$ is applied to it over and over. These chains make pretty pictures:

This graph uses every number up to 10000, but leaves out leaves (pun intended), i.e., those numbers that have no other number pointing to them. In other words: for every number you see in the graph above, there is a number (below 10001) in the English language with that length. In the original video, Matt presents a chain of 6 elements; we can now extend that to 7 elements:

**one hundred and twenty-four** → **twenty-three** → **eleven** → **six** → **three** → **five** → **four**,

which brings us back to the infinite loop of the fixed point. This is probably a good point to discuss the inherent ambiguity in the function's definition: natural languages are a mess. You might well have chosen to represent 124 above as "one hundred twenty-four" instead, which yields a disappointing chain of length 6. The best you can do is choose one convention and stick to it. I didn't have much of a choice – I used existing software libraries to generate the spellings for me, so their convention is mine.
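The chain-building is easy to replay in code. Here's a minimal sketch with a hand-rolled word list for the numbers up to ten (the real experiments use proper spelling libraries, as discussed at the end of this article; function names are mine):

```python
# Letter counts for a tiny hand-rolled English word list.
WORDS = {1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five',
         6: 'six', 7: 'seven', 8: 'eight', 9: 'nine', 10: 'ten'}

def ell(n):
    """The letter-count function for small numbers."""
    return len(WORDS[n])

def chain(n):
    """Apply the letter-count function until we hit a fixed point."""
    result = [n]
    while ell(n) != n:
        n = ell(n)
        result.append(n)
    return result

print(chain(1))  # [1, 3, 5, 4] -- one -> three -> five -> four
```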

This is pretty much all there is to say about English. Numerologically, it's a pretty dull language. The graph is not particularly inspiring: no really long chains, only one fixed point, no other cycles, and every initial condition leads to the same number. What about other languages?

OK, call this nepotism, because the only reason I included German is that this is my native language. The same single fixed point as English, and no chain longer than 6. German deserves the title of *dullest of all languages*. Sigh...

No particularly long chains, but no less than 3 fixed points. Pretty self-centred! Unlike German and English, you can categorise numbers depending on which infinite loop they will eventually be stuck in: any of the three fixed points 2, 3, or 4 (popular choices for fixed points, by the way). In graph theory, these are called the *connected components* of a graph. Here's some homework for you: calculate the proportion of numbers in each connected component!

Indonesian is our first example of a language with a non-trivial cycle: 4 (*empat*) maps to 5 (*lima*) which maps back to 4. It also forms some pretty long chains, the longest having 8 elements:

**delapan puluh delapan** (88) → **sembilan belas** (19) → **tiga belas** (13) → **sembilan** (9) → **delapan** (8) → **tujuh** (7) → **lima** (5) → **empat** (4)

Again, there's a bit of convention necessary for how to count the chains: I opted for counting the longest chain without duplicate members. This does favour languages with long cycles: every chain is at least as long as that cycle, plus however long it takes to reach that cycle. Well, so be it.

Another language that made the list mostly because of personal taste. This constructed language is as regular as it could be – generating all the number words in Esperanto takes only a dozen or so lines of code. It has the same three fixed points we've seen in Danish – 2, 3, and 4 – but the graph overall is really compact: the longest chain has a mere 4 members. If you think this is only possible in an artificial language, wait for our next candidate:

A little "bushier" than Esperanto, but the same short routes – no other natural language has shorter chains. (And again, it's the same egoistic numbers 2, 3, and 4.)

So far we've either seen fixed points, or one longer cycle. Lithuanian has both: the fixed point 7 (*septyni*) and the cycle 4 (*keturi*) → 6 (*šeši*) → 4.

French boasts the longest cycle in our little survey with 4 members:

**quatre** (4) → **six** (6) → **trois** (3) → **cinq** (5)

Every number will settle in this loop, and still French has no chains longer than 7, so every number pretty quickly hits one of the four members of the omnipotent French cycle.

As well as being the only non-Latin script in the contest, it also seems to have a pretty lonely number 3 (*три*). It's not all alone though: 2 (*два*) and 100 (*сто*) point to it. Still, this is the smallest connected component we've seen so far. Russian also has one other fixed point (11 – *одиннадцать*) and a 3-cycle:

**четыре** (4) → **шесть** (6) → **пять** (5)

A lot of structure for one language!

The thing you notice immediately in Polish is just how spread out it is – in fact it takes some numbers 7 steps before they reach the language's all-consuming 3-cycle, which gives a record chain of 10 elements:

**dwieście sześćdziesiąt dziewięć** (269) → **dwadzieścia dziewięć** (29) → **dziewiętnaście** (19) → **czternaście** (14) → **jedenaście** (11) → **dziesięć** (10) → **osiem** (8) → **pięć** (5) → **cztery** (4) → **sześć** (6)

Just for the fun of it, I give you three more languages, not because they have properties we haven't seen, but just because they're pretty.

But why stop at natural languages?

Matt mentions binary in his video, i.e., spelling out 10101 as *one zero one zero one*. The most interesting fact about this number system is the large value of its fixed point 18 – or *one zero zero one zero* to those who like to talk to their computers.

Not strictly speaking a spelling system, yet a way of representing numbers with letters. Not surprisingly, this graph is pretty sleek – and it deserves an honourable mention since it's the one time the number one (1) actually makes it into one of our graphs.

In the video, Matt conjectures that in every language there is some threshold $N$ such that every value greater than $N$ is mapped to a smaller value under that language's function $\ell$. Since most (maybe all?) languages reflect our base 10 positional number system, the value each additional part of a number word encodes grows exponentially, which means that $\ell$ is essentially logarithmic, so large numbers will be rapidly "passed down" to the small numbers where all the action takes place.

This plot shows the values of $\ell$ in Esperanto, which I chose mostly for performance reasons. It is clear just how slowly the function grows.

If you made it this far in the article, you might appreciate a few words on how I produced all the graphics in this article. There are a few steps to it.

Despite the many exceptions, at the latest from numbers above 100 things get very regular and are pretty much just concatenations of what happened before, so this task screams for computer programs. Many exist, but most are dirty one-time scripts. One pretty decent Python library is *num2words*, which is where most of the number words in this article come from. Unfortunately, the code is a little messy as well, and I just wasn't able to extend it to add Esperanto support, so I started my own project *literumi* ("spell" in Esperanto) which right now is mostly a thin wrapper around *num2words*, but that might change over time.

Since I had to rely on these software tools for the spelling of the numbers in most of the above languages I cannot guarantee their correctness – if you spot any mistakes, please do let me know!

There are plenty of sites on the Internet that list the first couple of numbers, maybe up to 100, in a bunch of languages. (This site really should get an honourable mention for listing 5000 languages, albeit only numbers 1 to 10.) But that's all dictionary style and meant to be educational, not to be processed easily with a machine. This article might be the only application for it, but it bothered me that there are no readily available files where nothing but the first couple of thousand number words are listed. So – of course – I started a project for it: *nombroj* ("numbers" in Esperanto).

This pretty much only contains lists I could generate with *literumi*, but I also tried the approach of sending the English number words to Google Translate and thus receiving a list in any of their dozens of languages. But the approach has two flaws: (a) Google Translate is extremely expensive – if you use their API they charge you about USD 1 for translating 1000 numbers. (b) It's terrible – the spellings are inconsistent if not incorrect and are often interspersed with the expression in digits. And you charge for this, Google?

All the graphs (in the mathematical sense, other people would maybe call them networks) you see above are produced with Neo4j, an excellent graph database. I wrote a small script (another project on GitHub, of course) to load the data into Neo4j, and then used their default visualisation in the browser to get the graphics – I simply made a screenshot. This is the query I used for English:

```
MATCH (n1:Number)
WHERE (n1)<-[:LINK {lang: "en"}]-(:Number)
OPTIONAL MATCH (n1)<-[e1:LINK {lang: "en"}]-(n2:Number)
WHERE (n2)<-[:LINK {lang: "en"}]-(:Number)
RETURN n1, e1, n2
```

There might be smarter ways of doing this, but I'm very much a Neo4j novice and it certainly does the trick for me.

The last plot, of the function $\ell$ itself, was created with *matplotlib*, through something like this:

```python
import matplotlib.pyplot as plt
from literumi import spell

mx = 10**7 + 1
x = range(mx)
y = [len(spell(i, lang='eo')) for i in range(mx)]
plt.plot(x, y, linestyle='', marker='.')
plt.show()
```

There are numbers that behave very much like the integers but have a different structure. One rather simple example are the Gaussian integers (usually denoted $\mathbb{Z}[i]$) which look just like complex numbers $a + bi$, except that $a$ and $b$ are restricted to integer values. They live in the complex plane, but exclusively on a discrete grid amongst their continuous cousins.

Every ordinary integer $a$ is also a Gaussian integer (think $a = a + 0i$); other examples include $i$, $1 + i$, $2 - 3i$, $-5 + 2i$, etc. Just as with the integers we can start decomposing numbers into products and develop a notion of primality. Our first important observation is that $2$ is no longer prime! It decomposes to $2 = (1 + i)(1 - i)$. On the other hand, $3$ is still prime as a Gaussian integer. But as it turns out, other than losing a few primes and gaining a few new ones, the Gaussian integers are still very well behaved and have most of the properties the integers have, including unique prime factorisation. The primes in $\mathbb{Z}[i]$ make pretty pictures:

Things change when we consider a different set of numbers, which we'll denote $\mathbb{Z}[\sqrt{-5}]$. They are of the form $a + b\sqrt{-5}$ and have a surprise in store: looking at the decomposition of $6$, we can of course still do $6 = 2 \cdot 3$, but we also have $6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$. Maybe this does not surprise you – after all we've already seen that some of our beloved primes decompose in some other systems, such as $2$ does in the Gaussian integers. The difference here is that neither $2$, $3$, $1 + \sqrt{-5}$, nor $1 - \sqrt{-5}$ decompose any further in $\mathbb{Z}[\sqrt{-5}]$ – we have found a truly different decomposition! This proves that our new set of numbers does not have unique prime factorisation.
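Both decompositions are easy to check with a few lines of code. A sketch, representing $a + b\sqrt{-5}$ as the pair $(a, b)$ (the helper name is mine, not standard):

```python
# Gaussian integers: Python's complex numbers do the job directly.
print((1 + 1j) * (1 - 1j))  # (2+0j), i.e. 2 = (1+i)(1-i)

# Z[sqrt(-5)]: represent a + b*sqrt(-5) as the pair (a, b).
def mul(x, y):
    """(a + b*sqrt(-5)) * (c + d*sqrt(-5)), using sqrt(-5)^2 = -5."""
    a, b = x
    c, d = y
    return (a * c - 5 * b * d, a * d + b * c)

print(mul((1, 1), (1, -1)))  # (6, 0), i.e. 6 = (1+sqrt(-5))(1-sqrt(-5))
```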

What I've given you is a small taster of *algebraic number theory*. It's the regular follow-up course to the introductory or elementary course and will teach you that much of the regular behaviour of the integers can be restored if you move from considering numbers to ideals. But that's way beyond the scope of this article.

Now, you may find all these "new" numbers esoteric if not abstract nonsense. But they do have broad applications – within mathematics, of course. In fact, these rings or domains first emerged in the study of Fermat's Last Theorem in the middle of the 19th century. After a prize had been announced for its solution by the French Academy of Sciences, the highly reputed mathematician Gabriel Lamé announced a proof within a year. His approach used the kind of numbers I introduced here, but was destroyed by Ernst Kummer when he pointed out that the proof implicitly relied on the unique factorisation property they don't necessarily have. Lamé's assault was called off^{2} and it would take nearly another century and a half before Andrew Wiles would take algebraic number theory to a completely new level. But that is way beyond the scope of this post.^{3}

Yes, you can write 12 as 3*4 or 2*6, but you can continue either way and eventually reach the unambiguous 2*2*3. ↩

Which is not to say the work was in vain: Kummer developed said ideal theory and used it to prove a large (in fact, infinite) number of cases of Fermat's Last Theorem. The study of our new number friends meanwhile developed its own, ever more abstract life which has little to do with number theory and is in my humble opinion nothing but tedious and boring. ↩

If you are interested in the whole story, there's no better (popular) source than Simon Singh’s book. ↩

However, there *is* one more dimension we can exploit: time! Used in the right way, this can produce wonderful videos like this one:

What are we looking at? These are the values of $\zeta(s)$ as $t$ goes up the critical line $s = \frac{1}{2} + it$. We start at^{1} $t = 0$ at the beginning of the video and go all the way up from there. $\zeta(\frac{1}{2}) \approx -1.46$, so this is where the values start. From there, we make an anti-clockwise semicircle until we hit the real axis again. After that, the $\zeta$-function "turns right" and settles into a clockwise spiral with most of the action happening in the right half-plane. This goes on and on forever. Notably, after about four seconds, the graph passes the origin for the first time. This is the first of infinitely many $\zeta$-zeros on the critical line, at about $t \approx 14.13$. From then on, it winds around in seemingly unpredictable circles, sometimes small and hasty, sometimes wide and elegant, but it never forgets to visit the origin every so often.

I produced this video with a relatively simple SageMath script quite some time ago, but I didn't write this post up until now since I would really like to turn this into an interactive sheet where you can play with the values for yourself, step away from the critical line, see how the spiral will miss the origin, and so on. But it's unlikely I will get around to doing that any time soon, and I've realised the video got quite some audience on YouTube, so I thought it's high time I shared it here as well!

As far as I know, it's total coincidence we conventionally use t for both the imaginary part of a complex argument and a time variable, but it makes talking about this animation surprisingly natural. ↩

For instance, one way two parties (whom we, by convention, call Alice and Bob) could hide their secret communication is if Alice writes a letter, puts it in a box, and locks it with a padlock for which both she and Bob have a key, but no one else. Then Alice can send this box safely through any public means, as anyone who intercepts it will not be able to open the box, rendering it useless to them. This is the principle behind symmetric cryptography. The obvious problem is that Alice and Bob will only be able to communicate if they managed to obtain identical keys beforehand.

Alternatively, Bob could distribute padlocks for which only he possesses the key to anyone who's interested. Now Alice can obtain such a padlock, use it to close the box with her letter and send it off to Bob. She won't be able to open the box, but neither will anybody else -- except Bob. It sounds tedious to ship padlocks for every single communication, but the advantage is that there's no need whatsoever for Bob to restrict distribution to trusted parties, making it possible for anyone to send him messages securely. This is the basis of public key cryptography (a field which is extremely number-theory heavy -- a course about it will feature theorems on primes, groups, elliptic curves, lattices, and many other really abstract concepts).

But there is yet another way parties could communicate securely. Let's imagine Alice uses a lock for which only she has a key. She ships this off to Bob, who cannot unlock it himself -- instead, he'll add another padlock of his own and send everything back to Alice. She can't open the box anymore, but she can remove her own padlock and send it off to Bob once more, knowing no one except Bob will be able to open her secret. Finally, Bob opens his lock and can read Alice's message. No need to meet up and share keys, no shipping padlocks.

The problem when trying to translate this to some mathematical model is that "adding Bob's padlock" in most encryption schemes actually means taking Alice's locked box, *putting it in another box*, and then adding the padlock. When Alice receives this package, all of a sudden she's unable to remove her padlock anymore. In proper mathematical terms: encryption is not commutative. (Sorry, it's going to get a little mathematical. Feel free to skip a few paragraphs until you feel again comfortable with the formula-to-text ratio.) What we are trying to achieve looks like this:

$$D_A(E_B(E_A(m))) = E_B(m),$$

where $E$ and $D$ stand for encryption and decryption, and the indices $A$ and $B$ for Alice's and Bob's key, respectively. This simply does not check out in general, since all we can rely on is

$$D_A(E_A(m)) = m.$$

What we need is a scheme where

$$D_B \circ D_A \circ E_B \circ E_A = \mathrm{id},$$

with $\circ$ meaning the composition or chaining of functions, and $\mathrm{id}$ standing for the identity, i.e., the lazy operation that just leaves every argument as it is – exactly what we want for a message that is encrypted and then decrypted. (End of the mathsy symbols section. Please switch brains back on *now*!)

It turns out there is one encryption scheme with this property, and what's more, it's in fact the best possible encryption: the **one-time pad** (OTP).^{1} It's a deep (though not hard to prove) result in information theory that the OTP provides perfect secrecy, i.e., an attacker who intercepts the encrypted message has no chance of learning any information about the original message whatsoever. This is an incredibly strong statement: it does not depend on the smartness, luck, or resources of the attacker; it's a mathematical fact: not a single bit of the message will be revealed. Unfortunately, it requires a secret (shared) key that is as long as the plaintext, and if the two parties have a way to communicate this key securely -- why not use this channel to send the message in the first place?

Here we go back to the scheme I described earlier -- Alice and Bob add their padlocks individually, no need to exchange keys beforehand. Concretely, this means Alice and Bob each choose their own keys which have to be as long as the original message, but there's no need to ever communicate these with anyone.

So, how does the OTP work? It's about as simple as it could be:

$$c = m \oplus k,$$

where $m$ is the message, $k$ the key, and $c$ the resulting cipher text.

Here, the "funny plus" $\oplus$ means XOR if you're a computer scientist, or (bitwise) addition modulo 2 if you're a mathematician. If you're neither, just think of it as a special kind of addition and don't worry about the details. The important fact is that it has all the properties of regular addition, plus the following: $x \oplus x = 0$, i.e., an element XORed with itself vanishes. This is all it takes to show that the OTP has the required consistency property:

$$c \oplus k = (m \oplus k) \oplus k = m \oplus (k \oplus k) = m \oplus 0 = m,$$

just as we needed. The secrecy follows from the fact that if the key $k$ looks like "random noise", so will the resulting cipher text. In other words, if I intercept a cipher $c$, *any* message (of the same length) is equally likely to be the original message. This is what it means that the eavesdropper learns nothing about the plaintext.
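In code, the OTP is as short as the formula suggests. A minimal sketch on bytes (the helper names are mine):

```python
from secrets import token_bytes

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b'attack at dawn'
key = token_bytes(len(message))  # truly random, as long as the message

cipher = xor(message, key)       # c = m XOR k
print(xor(cipher, key))          # b'attack at dawn' -- c XOR k = m
```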

How would our little keyless protocol work? First, Alice and Bob both choose keys $k_A$ and $k_B$ uniformly at random. (As I stated above, the whole security of the OTP hinges on this being truly random, which is non-trivial, or maybe impossible,^{2} with software, but this is way beyond our scope here.) Then the dance begins:

- Alice sends Bob $c_1 = m \oplus k_A$.
- Bob sends Alice $c_2 = c_1 \oplus k_B$.
- Alice sends Bob $c_3 = c_2 \oplus k_A$.
- Bob retrieves the message as $m = c_3 \oplus k_B$.

You don't believe me that this checks out? Well, let's see... Plugging things in from the bottom up yields:

$$c_3 \oplus k_B = c_2 \oplus k_A \oplus k_B = c_1 \oplus k_B \oplus k_A \oplus k_B = m \oplus k_A \oplus k_B \oplus k_A \oplus k_B = m.$$

All we need is the fact that XOR, like any good addition, is commutative and associative, i.e., order does not matter, and neither do parentheses.
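The whole dance fits in a few lines of Python; here's a sketch (variable names are mine) that replays the four steps and confirms Bob ends up with the plaintext:

```python
from secrets import token_bytes

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b'meet me at noon'
k_a = token_bytes(len(message))  # Alice's private key
k_b = token_bytes(len(message))  # Bob's private key

c1 = xor(message, k_a)  # Alice -> Bob
c2 = xor(c1, k_b)       # Bob -> Alice
c3 = xor(c2, k_a)       # Alice -> Bob, her padlock removed

print(xor(c3, k_b))     # b'meet me at noon' -- Bob unlocks the message
```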

*Of course, this protocol is absolutely and entirely useless.*

Why? Well, I claimed throughout that it was Bob who added his lock. But how do we know that it was in fact Bob? What do we know about this "Bob guy" anyways? It could in fact be anyone at all! There's no proof of his identity. So, when we receive the double-locked package back with our little secret safely inside, and remove our lock as dutifully as gullibly, we may in fact have unlocked it for Eve, Alice's evil cousin, who intercepted our message en route and sent it back with her own lock instead of Bob's. It's a good example of why secrecy against eavesdropping is never enough -- we must equally protect against any tampering.^{3}

But it gets worse. Note that the problem above holds for my real-world padlock analogy just the same: when we unlock our part, we have no clue who added the second padlock. In fact, if there was no previous contact, how could we have any information at all about that other party? At least, we would be guaranteed that no one else could read our message, even if we don't know who that other dude is we're communicating with.

Alas, our use of the OTP does not provide even that guarantee. Anyone who intercepted our exchange can recover Bob's key easily and hence eventually the plaintext. Just look what happens when I XOR the first two cipher texts:

$$c_1 \oplus c_2 = m \oplus k_A \oplus m \oplus k_A \oplus k_B = k_B.$$
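Eve's attack is just as mechanical as the protocol itself. A sketch (again with my own helper names), where Eve sees only the three transmissions:

```python
from secrets import token_bytes

def xor(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b'meet me at noon'
k_a = token_bytes(len(message))
k_b = token_bytes(len(message))

# The three transmissions Eve can intercept:
c1 = xor(message, k_a)
c2 = xor(c1, k_b)
c3 = xor(c2, k_a)

k_b_recovered = xor(c1, c2)    # c1 XOR c2 = k_b
print(xor(c3, k_b_recovered))  # b'meet me at noon' -- Eve reads it all
```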

Out falls $k_B$, so nothing stops us from using this to unlock $c_3$ and hence steal the secret message $m = c_3 \oplus k_B$. This is just more evidence for the first rule of cryptography:

Don't invent your own crypto -- particularly if you're a clueless number theorist writing some weird blog...


where $a(1) = 7$ and

$$a(n) = a(n-1) + \gcd(n, a(n-1)).$$

Here, $\gcd$ stands for the greatest common divisor,^{1} i.e., the largest integer that divides both $n$ and $a(n-1)$. This may not seem terribly interesting at first sight, but if you look at the first few values of the differences $a(n) - a(n-1)$ you'll notice something curious:

1, 1, 1, 5, 3, 1, 1, 1, 1, 11, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 23, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 47, 3, 1, 5, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 101...

There are for sure a lot of ones in there, but other than that, all the numbers are primes. This is not a bias in the first few examples -- Eric Rowland proved that all values of $a(n) - a(n-1)$ are either $1$ or a prime in a beautiful little paper back in 2008.
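The sequence of differences is a few lines to generate; here's a sketch (function names are mine) that also double-checks Rowland's theorem for the first few thousand values:

```python
from math import gcd

def rowland_differences(count):
    """Differences a(n) - a(n-1) of a(1) = 7, a(n) = a(n-1) + gcd(n, a(n-1))."""
    a, diffs = 7, []
    for n in range(2, count + 2):
        d = gcd(n, a)   # the difference added at step n
        a += d
        diffs.append(d)
    return diffs

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

diffs = rowland_differences(5000)
print(diffs[:10])                                 # [1, 1, 1, 5, 3, 1, 1, 1, 1, 11]
print(all(d == 1 or is_prime(d) for d in diffs))  # True
```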

I can't help getting a little sentimental here -- this was the paper my supervisor assigned to me for my undergrad thesis, and the first research paper I really got my teeth into. The beginning of a wonderful journey!

The proof is far from difficult, and a good overview of follow-up articles is given on bit-player.org. But the question on MSE wasn't about Rowland's sequence; it was about a similar one that uses the lowest common multiple ($\operatorname{lcm}$) instead of the gcd:

where $b(1) = 1$ and

$$b(n) = b(n-1) + \operatorname{lcm}(n, b(n-1)).$$

When we look at the values $\frac{b(n)}{b(n-1)} - 1$ of this sequence, you'll spot a similar pattern to the one above:

2, 1, 2, 5, 1, 7, 1, 1, 5, 11, 1, 13, 1, 5, 1, 17, 1, 19, 1, 1, 11, 23, 1, 5, 13, 1, 1, 29, 1, 31, 1, 11, 17, 1, 1, 37, 1, 13, 1, 41, 1, 43, 1, 1, 23, 47, 1, 1, 1, 17, 13, 53, 1, 1, 1, 1, 29, 59, 1, 61, 1, 1, 1, 13, 1, 67, 1, 23, 1, 71, 1, 73, 1, 1, 1, 1, 13, 79, 1, 1, 41, 83, 1, 1, 43, 29, 1, 89, 1, 13, 23, 1, 47, 1, 1, 97, 1, 1, 1, 101...
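These values are just as quick to generate. A sketch of my own, assuming b(1) = 1 and b(n) = b(n - 1) + lcm(n, b(n - 1)), and printing b(n)/b(n - 1) - 1:

```python
from math import gcd

def lcm_variant_values(count):
    """First values of b(n)/b(n-1) - 1 for the sequence b(1) = 1,
    b(n) = b(n-1) + lcm(n, b(n-1))."""
    b, values = 1, []
    for n in range(2, count + 2):
        q = n // gcd(n, b)   # equals b(n)/b(n-1) - 1, see the proof below
        b *= q + 1
        values.append(q)
    return values

print(lcm_variant_values(10))  # [2, 1, 2, 5, 1, 7, 1, 1, 5, 11]
```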

The sequence seems to be much richer in non-one values, but this comes at a price: we don't know if there are any composite values in the sequence! A proof was announced by Benoit Cloitre (as reported in Rowland's original paper), but it hasn't materialised as of 2015. One thing that is very easy to prove is that every prime (except for 3) is a member of the sequence -- a nice fact, given that we have very little knowledge of the values that appear in Rowland's sequence.^{2}

My answer to the MSE question made me think about the problem again, and instead of the pages of awkward arguments it took me in my thesis I came up with a very simple and short proof. However, the question is about a slightly different sequence, and I wasn't able to use the same shortcut for the definition above. Please do let me know if you find a shorter argument!

First, we need the trivial connection between gcd and lcm:

gcd(a, b) · lcm(a, b) = a · b

This is best understood if you compare the prime factorisations on the left and the right hand side: for every prime p, we have the highest power of p that divides a and the highest power that divides b. The smaller of the two will be part of the gcd, the larger part of the lcm. United in products, both sides will yield the same result.
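Incidentally, this identity is exactly how the lcm is usually implemented in practice. A quick sketch:

```python
from math import gcd

def lcm(a, b):
    # Rearranging gcd(a, b) * lcm(a, b) = a * b gives the usual lcm formula:
    return a * b // gcd(a, b)

# Spot-check the identity on a few pairs:
pairs = [(12, 18), (7, 5), (100, 64), (21, 6)]
print([gcd(a, b) * lcm(a, b) == a * b for a, b in pairs])
```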

If we apply this formula to b(n) = b(n - 1) + lcm(n, b(n - 1)), we obtain

b(n) = b(n - 1) · (1 + n/gcd(n, b(n - 1))).

Equipped with this, it's now easy to conclude that for every prime p we must have either b(p)/b(p - 1) - 1 = 1 or b(p)/b(p - 1) - 1 = p, since the gcd in the denominator can only be 1 or p itself. In order to prove b(p)/b(p - 1) - 1 = p (for primes bigger than 3) we need to prove that gcd(p, b(p - 1)) = 1. For this we need to show that b(p - 1) only has "small" prime factors, and in particular that the largest prime factor of b(n - 1) is strictly less than n (for n > 3). Once we've got this fact, we immediately see that b(p)/b(p - 1) - 1 = p for all primes p > 3, and so all primes other than 3 are included in the sequence. Since we further have b(n)/b(n - 1) - 1 = n/gcd(n, b(n - 1)) ≤ n, this also implies that a prime p can appear at position p at the earliest, and hence when we read off the primes in the sequence in increasing order we will end up with a full list of primes (except for 3).

So, it remains to prove that b(n) has only small prime factors. This in turn follows from the fact that q = b(n)/b(n - 1) - 1 = n/gcd(n, b(n - 1)) is odd for all n > 4. If this is true, then we can write

b(n) = b(n - 1) · (1 + q) = b(n - 1) · 2k

for some integer k ≤ (n + 1)/2, and we can argue inductively that all factors in this product must be small (i.e., less than n), as well as 2 and k, so no prime factor in b(n) can exceed n.

Now, for odd n it is obvious that n/gcd(n, b(n - 1)) is odd too, since it divides n. So let n be even. Then we can write n = 2^j · m, where m is odd and j must be less than n - 3. On the other hand, we can argue again inductively that b(n - 1) must have at least n - 3 factors of 2 -- at each step, at least one factor of 2 will be added to the product. Since 2^j ≤ n, we know that 2^j divides both n and b(n - 1), and hence also their gcd. We conclude that n/gcd(n, b(n - 1)) must divide m and thus also be odd.
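We can at least check the conclusion numerically. A quick sketch of my own, assuming the definition b(1) = 1, b(n) = b(n - 1) + lcm(n, b(n - 1)) from above, and verifying that every prime up to 101 other than 3 appears exactly at its own position:

```python
from math import gcd

def sequence_values(limit):
    """Values b(n)/b(n-1) - 1 for b(1) = 1, b(n) = b(n-1) + lcm(n, b(n-1))."""
    b, values = 1, {}
    for n in range(2, limit + 1):
        q = n // gcd(n, b)   # b(n)/b(n-1) - 1 = n / gcd(n, b(n-1))
        b *= q + 1
        values[n] = q
    return values

values = sequence_values(101)
primes = [p for p in range(2, 102)
          if all(p % d for d in range(2, int(p ** 0.5) + 1))]

# Every prime other than 3 appears at its own position; 3 yields a 1:
print(all(values[p] == (1 if p == 3 else p) for p in primes))  # True
```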

Puhhh... That was some piece of work! I bet I lost everyone by now. (I certainly got lost multiple times when writing this thing up.) If this easy fact already takes so much space to prove, how much longer would the full proof that the sequence contains no composite values have to be? I'd still be very interested to see one; it'd finally give me closure to move on from my undergraduate project...

PS: This faulty proof has been on my bedroom closet for months now. Can you spot the mistake?

PPS: Here is a Sage script to reproduce some of these numbers.

It's common to abbreviate gcd(a,b)=(a,b) in number theory, and I shall do so in the remainder of the article. Similarly, it's convention to write lcm(a,b)=[a,b]. ↩

A more recent paper by Fernando Chamizo, Dulcinea Raboso, and Serafín Ruiz-Cabello sheds some more light on this question, but still only conditionally. ↩

It is notoriously difficult to find exact formulae for general combinatorial constructs. Typically, we want to know how many objects with certain properties -- e.g., trees, permutations, sequences -- there are of a given size. Famously, the number of binary trees (and about a million other constructions) is governed by the Catalan numbers

C_n = binom(2n, n) / (n + 1).

How do we get to this result?^{1} We could start by constructing small examples. How many trees are there of size 1? How many of size 2? 3, 4, 5? This will get tedious very soon, may not help you find a formula if the behaviour does not match any of the known sequences, or may even mislead you if the beginning of the sequence is very different from the asymptotic behaviour.

You may be tempted to think of constructing a general formula like this: here we have n nodes, so how many trees can I build from these? In some cases, this approach actually works, e.g., for permutations, but it soon reaches its limits. Working example by example, number by number, won't get us very far. But working on all examples *simultaneously* will!

What I mean by this is that we find a general recipe to construct our object of interest, usually from smaller parts we already understand, i.e., atoms and smaller sub-structures. This is particularly natural for recursively defined structures like trees. Now, you have *all* the objects at your disposal. The trick is to encode them all together in one function -- the *generating* function.

In the case of combinatorial objects, we define a polynomial series of the form

F(z) = a_0 + a_1 z + a_2 z^2 + a_3 z^3 + ...

where the a_n are the quantities we're interested in, i.e., a_n is the number of objects of size n. Once we understand the resulting function^{2} F, we can simply extract the coefficient of z^n to recover a_n. This may sound pretty circular, since we need the a_n to define the function in the first place, but the recursive constructions I mentioned before lead to functional equations that allow us to easily retrieve the function (i.e., write down a nice, closed form of the function). Once we have that, we can recover the coefficients by applying techniques like comparing it to other well-understood functions, calculating the Taylor coefficients, or finding asymptotic expressions for the coefficients.
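As a toy instance of this programme: the standard functional equation for binary trees, C(z) = 1 + z · C(z)^2 (a tree is either empty or a root with two subtrees), translates directly into a recurrence for the coefficients. A small sketch of my own:

```python
from math import comb

def catalan_from_recurrence(count):
    """Coefficients of C(z) solving C(z) = 1 + z * C(z)^2, extracted via
    the implied recurrence C_0 = 1, C_{n+1} = sum_i C_i * C_{n-i}."""
    c = [1]
    for n in range(count - 1):
        c.append(sum(c[i] * c[n - i] for i in range(n + 1)))
    return c

coeffs = catalan_from_recurrence(8)
print(coeffs)  # [1, 1, 2, 5, 14, 42, 132, 429]

# They agree with the closed form binom(2n, n) / (n + 1):
assert coeffs == [comb(2 * n, n) // (n + 1) for n in range(8)]
```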

You may now be enchanted by the magic powers of analytic combinatorics, but may ask: what does this again have to do with primes? In fact, *analytic* number theory owes its name to the exact same reason as *analytic* combinatorics: we take stubborn and hard to handle discrete objects (e.g., primes or trees), throw them all together into one nice function (e.g., the zeta function or a generating function), and -- voilà! -- finally, we have a handle on those objects by applying all these wonderful analytical^{3} tools we developed over the centuries. It's as though we smooth out the erratic behaviour of these discrete^{4} structures by zooming out of the nitty-gritty details and taking a look at the big picture.

As Marcus du Sautoy^{5} puts it more poetically:

The zeta function provided Riemann with a looking-glass in which the primes appeared transformed. As in Alice in Wonderland, Riemann's paper sucked mathematicians from their familiar world of numbers through a rabbit hole into a new and often counterintuitive mathematical land.

John Derbyshire^{6} calls it "the great fusion": the discrete and the continuous worlds, which for centuries had been thought of as completely independent, "counting and measuring", "numbers *staccato* and numbers *legato*", come together in an unexpected harmony. For me, this is what makes the beauty of mathematics: even the most harmless problems require the full power of our mind and imagination to come up with new and creative concepts. Only the brightest mathematicians will pioneer new techniques like Bernhard Riemann did.^{7} But by following in their footsteps, we may hope to breathe some of the inspiration they exhale.

The exact result depends on what exactly the question is, e.g., if you count internal nodes, external nodes, only full trees, etc. Either way, the Catalan numbers are a recurring theme. Here, we don't care about the details, but just the general idea. ↩

Of course, if we want to apply analytical tools, we need this series to converge, which isn't guaranteed at all. If it doesn't, we can usually make it converge by applying weights, e.g., inverses of factorials like in the definition of the exponential function. ↩

At this point, I fully got confused between *analytic* and *analytical*. A quick internet search indicates that there is simply no difference. However, I didn't find a single mention of analytic*al* number theory. Still, I somehow feel that *analytical* tools sounds better than *analytic* tools. English is such a confusing language! ↩

Luckily, the distinction between *discrete* and *discreet* is an easy one! ↩

In *The Music of the Primes*, chapter 3. ↩

In *Prime Obsession*, chapter 6. ↩

Derbyshire rightfully attributes "the great fusion" to Peter Gustav Lejeune Dirichlet, who first used analytical methods to prove his famous theorem on arithmetic progressions, but it was Riemann who brought the subject to its full powers by leaving the known waters of the real numbers and setting sail for the complex numbers. ↩

where Π(x) is the prime power counting function introduced even earlier. It's high time we applied this!

First, let's take a look at Π(x) for small x, calculating it exactly:

You see how this jumps by one unit at prime values (2, 3, 5, 7, 11, 13, 17, 19), by half a unit at squares of primes (4, 9), by a third at cubes (8), and by a quarter at fourth powers (16), but is constant otherwise.
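If you'd like to compute this step function yourself, here is a minimal sketch (the function name is mine; exact arithmetic via fractions keeps the jumps of 1/k visible):

```python
from fractions import Fraction

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def prime_power_count(x):
    """Riemann's prime power counting function: every prime power
    p^k <= x contributes a jump of 1/k."""
    total = Fraction(0)
    for p in range(2, x + 1):
        if is_prime(p):
            k, power = 1, p
            while power <= x:
                total += Fraction(1, k)  # jump of 1/k at p^k
                k += 1
                power *= p
    return total

# The jumps described above: 8 primes, squares 4 and 9, cube 8, power 16
print(prime_power_count(20))  # 8 + 1/2 + 1/2 + 1/3 + 1/4
```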

Now, the point about Riemann's formula is that if you worked out diligently every single term on the right hand side exactly (and there's an infinity of smooth functions there) -- you'd end up with the same step function I just plotted!

This thought always makes my brain hurt a little, so let's take it step by step. Of course, it's impossible to work out *every* term, since this would require knowing the value of every single zeta zero. Instead, we will look at increasingly better approximations as we add more and more zeros into the equation. Let's start with what it looks like when we only use the terms without the zeta zeros, i.e., li(x), log 2, and the integral:

This already looks like a pretty good approximation, but we do much better by just adding the first three^{1} zeros:

Note how elegantly the previously monotone function dances in waves around the prime steps. But this is only the start -- with ten zeros the waves are getting denser:

By the time we've added one hundred zeros, it's getting pretty hard to distinguish the two functions:

I think you can't get much closer to seeing the music of the primes!

If you're worried that we just got lucky with the small values, here's a plot over a larger range of values, with twenty zeros used in the approximation:

Note what a close match the approximation is for small values of x; even further up, the approximation never strays too far.

I could carry on with more plots with higher values of x or more zeros added, but these would really just look like two identical smooth lines, so the way this works is actually much clearer at smaller values.

I'll conclude with two thoughts. First, the equation and the pictures are in terms of this somewhat exotic Π(x), but we were really interested in π(x), the number of primes less than x. This is really not a big deal, since we can just apply Möbius inversion to the definition of Π and get back the values of π:

π(x) = Σ μ(n)/n · Π(x^(1/n)), summed over n ≥ 1.

Just plug in our approximations for Π and out will come an approximation for π.
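A small sketch of this inversion (entirely my own naming, with a naive primality test for brevity); plugging in the exact Π instead of an approximation recovers π exactly:

```python
from math import log

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # repeated prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

def Pi(x):
    """Prime power counting function: sum of 1/k over all p^k <= x."""
    return sum(1 / k
               for p in range(2, int(x) + 1) if is_prime(p)
               for k in range(1, int(log(x) / log(p)) + 1))

def pi_from_Pi(x):
    """Recover pi(x) via Moebius inversion: sum of mu(n)/n * Pi(x^(1/n))."""
    return sum(mobius(n) / n * Pi(x ** (1.0 / n))
               for n in range(1, int(log(x) / log(2)) + 1))

print(pi_from_Pi(100))  # very close to pi(100) = 25
```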

Second, the approximations are smooth functions, but our actual objects of study are discrete step functions. In particular, π(x) can only ever take integer values, which actually helps a lot when doing computations. We don't need to add more and more zeros to get the precise value of π(x). Instead, as soon as our approximation is zooming in on an integer, this must be the correct value, and we can move on.

That said, calculating the approximation to any degree of accuracy for high values of x requires precise knowledge of a lot of zeta zeros and even more computing power. In the end, it is much faster just to tabulate the prime numbers and count them directly.

But of course, we are not doing this for any particular practical reason, but to understand prime numbers -- and Riemann's formula finally gave the researchers of his time the tools to prove the Prime Number Theorem.

In what follows I always refer to the zeros ordered in increasing height up the critical line, *plus* their respective mirror images with negative imaginary part. ↩

(Here is the link to the video directly on the Numberphile webpage.) As mentioned before, it's not easy to explain the details and the beauty of the Riemann Hypothesis in a few words, but I think the video definitely succeeds in compressing the essentials into 17 minutes.

One thing that caught my attention is the discussion of the CMI's Millennium Prize. Of course, you couldn't possibly fail to mention the $1,000,000 bounty on the problem in a popular account of it, but I was surprised to hear that you could earn the money not only by *proving* the RH, but also by *disproving* it. I was convinced that I read somewhere that a counterexample would not earn you the prize, so I took a look at the rules for the Millennium Prize -- and indeed, a counterexample would be considered a solution the same way as a proof. (So where the heck did I read the opposite? I couldn't find it anymore...)

Mathematicians' beliefs about whether the RH is true or false are an interesting discussion in themselves. But since we are on the topic of money, it's also interesting to ask whether it's worth putting resources into calculating zeta zeros in the hope of finding a counterexample, i.e., a zero off the critical line -- after all, it would earn you a million bucks.

But there are two reasons why this would be ill-invested money. First, much computing power has gone into calculating zeta zeros already, worth well in excess of the prize money.

In fact, the anecdote goes that the RH is responsible for the "most expensive bottles of wine ever drunk": the two eminent mathematicians Don Zagier and Enrico Bombieri made a bet (worth two bottles of good Bordeaux wine) that there would be no counterexamples to the RH amongst the first 300,000,000 zeros. As it so happened, a research team around Herman te Riele and Richard Brent some time later confirmed the RH for the first 200,000,000 zeros. When they learnt about the bet, they went the extra mile and calculated another 100,000,000 zeros just to settle it -- of course in favour of Bombieri, who is a firm believer in the RH. But the point of this story is that at that time (we're talking about the late 70s) the computing power to find these extra 100,000,000 zeros was worth $700,000 alone -- all this to settle a bet over two bottles of wine!

So if you think it's a good investment to calculate zeta zeros in the hope of finding a counterexample just to win the Millennium Prize -- a lottery ticket is the safer choice.

The second reason is that it seems extremely unlikely that a (hypothetical) counterexample lies within the reach of today's computing capabilities. There are two famous conjectures closely related to the Riemann Hypothesis which seem perfectly sound for any number you could possibly calculate, but have been proven wrong by theoretical or indirect means. The first is Gauss's conjecture that the logarithmic integral li(x) always overestimates the number of primes up to x. This was vaporised by John Edensor Littlewood in 1914^{1}. In fact, Littlewood's student Stanley Skewes later found an explicit upper bound for the first violation of Gauss's conjecture -- this mind-boggling number is now known as Skewes' Number.^{2}

The second is the Mertens Conjecture, which I mentioned in the previous article:

|M(x)| < √x for all x > 1.

This one was disproved by Andrew Odlyzko and Herman te Riele by calculating zeta zeros to high precision, but it would still appear to hold true no matter how many values of M(x) you calculated directly. Prime numbers are governed by weird things like iterated logarithms which make even the longest and most intensive calculations look like pre-school counting exercises.

As it's said in the video: if you want a really, *really* arduous way of earning a million dollars, do number theory...

**Edit**: Just found this video where Numberphile regular James Grime claims (towards the end) that a counterexample to the Riemann Hypothesis will not earn you the million dollars. It's certainly not where I first heard this, but clearly those statements do exist!

Based on Riemann's work, by the way, which is kind of ironic since Riemann himself mentions the conjecture towards the end of his paper and sees his results as evidence for its truth. ↩

Of course, this number has later been dwarfed by Graham's Number, which I couldn't fail to mention... ↩

Let's first take a look at a regular (fair) coin, that is, one where the two outcomes "heads" and "tails" are equally likely. We now do an experiment: we start counting at 0, and toss our coin repeatedly. Whenever we throw "heads", we add 1 to our count; when it's "tails", we subtract 1. What will the count be after n coin tosses?

Mathematically speaking, we define a sequence of independent random variables X_1, X_2, X_3, ... where

X_i = +1 with probability 1/2, and X_i = -1 with probability 1/2.

Now we are interested in what the sum is after n steps, i.e., we define

Y_n = X_1 + X_2 + ... + X_n.

Because of the X_i's symmetry, they each have expectation E[X_i] = 0, and hence we also have E[Y_n] = 0 for all n (due to the linearity of the expectation value). In other words, if we were to bet on what our total count would be at any given point, we should put our money on 0.^{1}

However, reality looks different.

This is an example of how Y_n might develop. (You see why this process is known as a random walk.) When we stopped, we were off from 0 by a fair amount. After all, that shouldn't be too surprising. In theory, Y_n could take any value between -n and n. However, these boundary values would be extremely unlikely: the probability that Y_n = n is indeed 2^(-n), a ridiculously small number.

Still, despite its tendency to oscillate around the x-axis, we would observe that Y_n reaches arbitrary distances from it as n grows larger and larger. The way to quantify this behaviour is to measure the *standard deviation* of Y_n. I won't go into the details, but if you're familiar with these calculations you won't have a problem verifying that

σ(Y_n) = √n.

This means that while we expect Y_n to be 0 on average, we would expect the values to spread up to about ±√n:

More precisely, we would expect values between -2√n and 2√n in about 95% of the cases if we repeated the experiment over and over again. We can visualise this if we take a look at a few more random walks:

Out of these 100 random walks, indeed 5 ended up outside the square root region.
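You can reproduce an experiment like this in a few lines of Python. A sketch with a fixed seed (all names my own):

```python
import random
from math import sqrt

random.seed(42)  # fixed seed for reproducibility

def random_walk(steps):
    """Simulate Y_n: a running sum of independent +1/-1 coin tosses."""
    position, path = 0, []
    for _ in range(steps):
        position += random.choice((-1, 1))
        path.append(position)
    return path

n, walks = 1000, 200
finals = [random_walk(n)[-1] for _ in range(walks)]

# Roughly 95% of the endpoints should land within two standard
# deviations of 0, i.e., inside [-2*sqrt(n), 2*sqrt(n)]:
inside = sum(1 for y in finals if abs(y) <= 2 * sqrt(n))
print(inside / walks)
```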

But for random walks, we can make an even more precise statement. The law of the iterated logarithm asserts that the rate of growth of a random walk is governed by √(n log log n), so we might say^{2}

Y_n = O(√(n log log n)),

which of course has to be interpreted with some goodwill^{3}, since we are talking about a random sequence here, not a deterministic one.

That's all very well, but what does it have to do with primes or the Riemann Hypothesis? To see this, we need to remember the Möbius function μ(n) that I half-heartedly introduced some time ago. The definition of this function looks very odd and artificial at first, but it is actually a very natural and useful function in the theory of numbers, not least because of the Möbius inversion mentioned in the article above. We define μ(n) as follows: if n has k prime factors, all of which are distinct (i.e., n is squarefree), then μ(n) = (-1)^k; otherwise, if n has at least one repeated prime factor, μ(n) = 0. The sequence looks like this:

1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0, -1, 1, 1, 0, -1, 0, -1, 0, ...

Notice how μ(p) = -1 for primes p, since they have exactly one prime factor (themselves). But there are also composite values with μ(n) = -1, e.g., μ(30) = -1. In general, it seems pretty hard to predict what value will come next. The sequence looks *random*. Of course, it isn't actually random, since the values are predefined by the arguments' prime factors. It is exactly this unpredictable, yet not erratic nature of the primes that the zeta zeros encode.
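A simple trial-division implementation makes the definition concrete (a naive sketch, not optimised):

```python
def mobius(n):
    """mu(n) = (-1)^k if n is a product of k distinct primes, 0 otherwise."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:   # repeated prime factor: n is not squarefree
                return 0
            result = -result
        p += 1
    if n > 1:                # one leftover prime factor
        result = -result
    return result

print([mobius(n) for n in range(1, 13)])  # [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
print(mobius(30))  # 30 = 2 * 3 * 5, three distinct primes, so -1
```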

To see what I mean, we'll sum up the values of μ just like we did with our coin tosses. The result

M(x) = Σ μ(n), summed over n ≤ x,

is called the *Mertens function*, and looks like this:

From this (small) sample, M(x) looks pretty much like one of our random walks. One crucial difference, however, is that about 39% of the values of μ are 0. (More precisely, the proportion of squarefree numbers amongst all integers converges to 6/π², which may look familiar to you as the value of 1/ζ(2), but this deserves an article of its own.) To get a better comparison, we'll just skip these values in our "random prime walk": if we take only the nonzero values of μ, can we distinguish the resulting sequence of +1s and -1s from a sequence of random coin tosses? A small comparison actually makes it look rather better behaved than a coin toss (the prime walk is overlaid in red):
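A quick sketch to reproduce both observations -- the Mertens partial sums and the density of nonzero values (naive implementation, names mine):

```python
def mobius(n):
    """Moebius function via trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # repeated prime factor
            result = -result
        p += 1
    return -result if n > 1 else result

N = 10000
mu = [mobius(n) for n in range(1, N + 1)]

# Mertens function: the running sum M(x) of mu(n) for n <= x
mertens, total = [], 0
for value in mu:
    total += value
    mertens.append(total)

print(mertens[9])  # M(10) = -1
# Proportion of nonzero values, approaching 6/pi^2 = 0.6079...
print(sum(1 for v in mu if v != 0) / N)
```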

Since this is not actually a random variable, we cannot talk of expectation values or standard deviations, but we can ask about its rate of growth. If M(x)^{4} really behaved like a random walk, we would expect it to be delimited by the square root function, and Franz Mertens indeed conjectured the stronger statement that |M(x)| < √x for all x > 1. This turned out to be wrong^{5}, but not all is lost -- after all, we don't have such a tight bound on our coin tosses either. Instead, we would be happy with something like

M(x) = O(x^(1/2 + ε)) for every ε > 0.

*This indeed is equivalent to the Riemann Hypothesis.* To see this, we first observe that

1/ζ(s) = Σ μ(n)/n^s, summed over n ≥ 1.

(There are many clever ways to see this, but you will have to trust me on it -- or, much better, verify it yourself.) Through a clever sequence of transformations, we express this sum as an integral involving M, and then get M back out of it via a Mellin transform to obtain

M(x) = 1/(2πi) ∫ x^s / (s ζ(s)) ds, taken along the vertical line Re(s) = σ.

The crucial question is for what values of σ this is valid. We will change the value of the integral if we push the line of integration to the left past a singularity in the integrand. This will happen if there is a zero in the denominator. Voilà, the zeta zeros have entered the stage!

If Riemann was right, there will be no zeros with real part bigger than 1/2, and hence we could choose σ to be as small as 1/2 + ε (where ε is an arbitrarily small positive number). But then it is not difficult to convince yourself^{6} that this implies M(x) = O(x^(1/2 + ε)), just as we were hoping for.

In other words: if the Riemann Hypothesis is true, then the sequence of values of μ is *very* close (though not quite) to looking like a random walk. What's more, the converse is true as well, i.e., you could deduce the Riemann Hypothesis from the above bound on M: if Riemann was wrong, then there are zeros off the critical line, and we would know that the Mertens function has a wider spread than a random walk. That's why we say that these statements are equivalent.

So when someone asks you next time how it is to work on primes, just say that it's like flipping a coin -- you never know what you'll get.

PS: This is the Sage script I used to generate the graphs in this article.

Of course, *Y_n* can *actually* only ever be 0 for even *n*. ↩

The law of the iterated logarithm even gives us the value of the implicit constant that the big-O notation suppresses: it's the square root of 2. ↩

That is, the set of exceptions will have probability 0. ↩

We have entered the realm of big O, where constants don't matter. The zero values of the Möbius function only change the constant, and hence it doesn't matter if we consider the Mertens function *with* the zero values, or the prime walk *without* them. ↩

Though only the *existence* of a violation is known -- an actual value is beyond the scope of the current calculating power. Yet another example that you can't trust empirical evidence when it comes to primes. ↩

Just observe that x^(s+it) = O(x^s). Then all you need is a good bound for the denominator. ↩