The Sieve of Eratosthenes

In this article...
I will begin talking about the fascinating topic of prime sieves. Here we are looking at the most basic one, the Sieve of Eratosthenes.
The number 2.
2 is the first prime. As far as I know, it has the distinction of being the only prime that has all of its products referred to by a categorical name: "even". What would we call all numbers divisible by 3? Threeven? We know that 2 is the only even number to be a prime number. Why? Because all other even numbers are products of 2. If we were seeking only primes out of a sequential pool of numbers, we could immediately eliminate all other even numbers we are looking at - boom, half of our numbers are gone. Thanos would be proud. 2 is the first of the even series so we get to keep it in our list of primes and discard the rest of the evens.
The rest of the primes.
So now let's move on to the next prime: 3. First of the threevens. Much like all evens after 2, we know that all products of 3 (after 3) can't be prime numbers either. This eliminates many more numbers in our set of numbers that could potentially be a prime (like 9, 15, and 21). If we look at the next number after 3, not eliminated as either being a product of 2 or 3, we come to 5. The first prime of its kind and all other numbers ending in 5 or 0 can also be eliminated. 6 is eliminated as being a product of 2 and of 3. The next non eliminated number will be 7, and of course all subsequent numbers divisible by 7 can't be a prime. Sorry 49.
The pattern.
By removing the products of primes from a number pool in an orderly fashion, we can find the next prime. Finding the next prime allows us to remove more numbers from the pool and this will lead us to the next prime. This pattern continues until we have found all primes and eliminated all other numbers as products of a prime in our pool of numbers.

Here are the basic broad steps of the pattern:
For a pool of sequential numbers starting with 2 and going up to n.
- for each number x in the pool
- if x is a prime not eliminated as a product of a previous prime,
- eliminate each remaining number in the pool that is greater than x and is a product of x
- if x is a prime not eliminated as a product of a previous prime,
- numbers not eliminated after going through the process are primes.
In step 1, we will repeat the sub-steps for each number in the pool. It is very broadly stated to try to start expressing the steps in an algorithm. We will be more detailed soon. This pattern is known as the Sieve of Eratosthenes.
Some observations from the above example:
- The smaller primes are good at eliminating many other numbers from the pool. The bigger primes, not so much. This is expected as we will see.
- For a pool of limit n and a prime p, there could be up to (n/p)-1 products of p (not including p itself). So in our pool of 50, there are 24 products of the prime 2 that can be eliminated. Whereas 7 could only potentially eliminate 6 products in that same pool. The actual count of numbers removed from our pool of 50 based on the prime 7 is only 1 (the number 49).
- The overlap of products of previous primes further reduces how many numbers can be eliminated as a product of a given prime. 5 can't eliminate 10, 15, or 20 because they are already eliminated as products of 2 or 3.
- In general, the series of a prime can't eliminate any number remaining in the pool less than p2. Why not? Let's say we consider the prime multiplicand 7 and its multipliers between 1 and 7:
- 7*2 (a product of 2)
- 7*3 (a product of 3)
- 7*4 (a product of 2)
- 7*5 (a product of 5)
- 7*6 (a product of 2,3)
- 7*7 is the first product of 7 that is not the product of another prime. We may think that the next number 7 would be able to eliminate would be 7*7*7, and no doubt this number would be eliminated if our pool were that big, however there are several other numbers that 7 could eliminate between 7*7 and 7*7*7 (such as 7*11, 7*13, 7*17, and 7 multiplied by any other prime number between 7 and 49.
- Because of this, it's not too surprising that after the prime 7 and its product of 49 is removed from the number pool in the above example, the remaining numbers are all prime. The next prime number (11) wouldn't be able to eliminate a number in our pool unless that pool extended to 121 or more. So after 7, there's nothing any of the remaining primes can eliminate (and by extension, there is nothing left that can eliminate them).
- Another way we could think about the last point is that for a pool of numbers of limit n, once we've sieved up to the square root of n, any non-eliminated number remaining in the pool is going to be a prime.
Not only are these points fun to notice (right?), they help us to optimize our algorithm some so that now it can be said:
For a set of sequential numbers starting with 2 and going up to n.
- for each number x in the pool under the square root of n
- if x is a prime not eliminated as a product of a previous prime,
- eliminate each remaining number in the pool that is x2 or greater and is a product of x
- if x is a prime not eliminated as a product of a previous prime,
- numbers not eliminated after going through the process are primes.
Notice the underlined additions. By adding this we are able to restrict the number of comparisons we need to perform in the outer loop and the inner loop.
Implementing the Algorithm
As mentioned, our algorithms so far have been generally stated in order to help think about the steps and how to express them. But if we are going to write a program to do this we must consider several things:
- what is the best way to store our pool of numbers?
- what is the best way to indicate if a number is eliminated or not?
- what is the best way to retrieve our list of primes afterwards?
We would probably want an array to work with our pool of numbers and iterate through. What would this array contain? Our number pool? The problem with this is that we have to use an index number to refer to which element of the array we want to look at. So let's say we look at arrayitem[45] that so happened to have the number 45 stored in there, it just seems kind of wasteful. (It would probably actually be arrayitem[45]=47 since we are starting at 2, but still you get the point). But suppose we did this, we would iterate through the array, open each element and see what number is in there and change that value to 0 or something to indicate that it is eliminated. This is one way we could do it. But another way would be to use the index as the number and make it an array of boolean values all set to true and we simply set it to false if it's been eliminated.
For a boolean array [A] indexed 2...n, with values initially set to true:
for i=2 up to the square root of n
if A[i]=true //it is a prime
for j = i2, i2 + i, i2 + 2i,... up to n
A[j] = false
After iterating, all array indexes that refer to an array element that is still true is a prime..
Here's a sample C# program:
using System.Globalization;
//Setting the stage
Console.WriteLine("Enter a limiting integer:");
string input = Console.ReadLine();
int n;
while(!int.TryParse(input, out n))
{
Console.WriteLine("Not a good number, is it? Enter a limiting INTEGER:");
input = Console.ReadLine();
}
var watch = new System.Diagnostics.Stopwatch(); //used for timing performance.
watch.Start();
int indexOffset = 2;//indexes are 0 based and we need to start at 2.
int sqrt = (int)Math.Sqrt(n);
bool[] pool = Enumerable.Repeat(true, n - 1).ToArray(); //size is decreased by 1 because we start at 2
//finding the primes
for(int i=2; i<=sqrt; i++)
{
if (pool[i - indexOffset])
{
for(int j=i*i; j<=n; j += i)
{
pool[j - indexOffset] = false;
}
}
}
watch.Stop();
//output
Console.WriteLine("Primes found:");
int primeCount = 0;
for (int i=0; i<n-1; i++)
{
if (pool[i])
{
Console.Write($"{i + indexOffset}\t");
primeCount++;
if (primeCount % 5 == 0) Console.WriteLine();
}
}
Console.WriteLine($"\nFound {primeCount} primes in {watch.ElapsedMilliseconds} ms.");
Running the code gives the following benchmarks on an overworked virtual machine I'm testing from:

We will investigate some other sieves to improve performance in future articles. The expected performance of this function is O(n log log n) according to the before linked article, but I/m not convinced I practically got there. Just to point out a couple of limitations here, time quickly crept up. Also using a signed int to store the max number in really limits how big we can run this sieve on.