Share your repls and programming experiences

← Back to all posts
Optimized Sieve of Eratosthenes.
h
eric_wang (10)

Optimized Sieve of Eratosthenes

Given two naturals, start and end, print all primes between start and end inclusive.

Preconditions

1 <= start <= end <= 2,100,000,000
end - start <= 10,000,000

Input specification

start and end are entered on the same line, separated by a single space.
For example, 4 20 is a valid input.

Strengths

This sieve is memory efficient when start is a large natural number.
The sieve considers the naturals from [0, sqrt(end)], and the odd naturals from [start, end]

Limitations

Despite improvements, The time complexity of the sieve is still O(n log log n), where n is the end

Improvements

To improve the sieve, a larger possible value to filter by, given any prime number, must be found.

Example

If 2 were to be considered, this implies that

4, 6, 8, 10, 12, 14, 16, 18, 20, 22 ...

are not primes.


If 3 were to be considered, this implies that

6, 9, 12, 15, 18, 21 ...

are not primes.


However, 6, 12, 18 were considered more than necessary, as all these values occurred
when sieving both in multiples of 2 and in multiples of 3. This makes the sieving process
inefficient. When considering multiples of 3, a more efficient means of sieving would have been

9, 15, 21 ...

with a distance of 6 between each value.

Although this is a trivial example, this becomes increasingly complex when more
primes are taken into consideration.

Comments

The maximum step value used in this sieve is 2 x i, where i is the ith prime found.

The maximum step one can sieve by when considering
multiples of a prime, continues to elude me to this day.

Commentshotnewtop
Highwayman (1340)

Maybe it’s not that 6, 12, and 18 are be considered too much, but that the iterations 1, 2, and 3 are being considered too much? Too an extent of course. I’m no programmer lol.

Edit: sry not 1,2,3 ; I mean 2,4,6.

Edit: oh here we go.
Generate the set S of integers between start and sqrt(start), take out the multiples of the first element, creating set S1. Then take the next number in S1 and do the same, making S2, then again with S2 and so on and so forth....?

eric_wang (10)

@Highwayman That does seem like a lovely idea, I might try that out sometime.

eric_wang (10)

@Highwayman

I tried to code your idea in python.
After some brief testing, the code
has trouble dealing with larger
start and end values.

In concept, that is still an amazing idea,
even though its not exactly
reflected through my execution.

However, I think that sifting through all the numbers
in the set would still be over linear time.

In an ideal situation, if the set has n elements:

The first sift would have to sift through
all n elements eliminating half,

The second sift would iterate through
the remaining half, eliminating one third of
the remaining elements, and so forth.

The amount of sifts performed in total should be roughly

n
+ n / 2
+ 2 / 3 x n / 2
+ 4 / 5 x 2 / 3 x n / 2
+ ... + 
+ (p1 - 1) / p1 x (p2 - 1) / p2 x ... x 4 / 5 x 2 / 3 x n / 2

where p1, p2, ... are all the primes before the sqrt(end), in descending order.

import math
from timeit import timeit


def sieve(start, ending):

    # Initialising the sieve
    lop = [2] if start < 3 else []

    start = max(3, start + (start + 1) % 2)
    ending -= (ending + 1) % 2

    small = set([i for i in range(3, int(math.sqrt(ending)) + 1, 2)])
    large = set([j for j in range(start, ending + 1, 2)])

    # Process of the sieving that is timed.
    def sieving(small, large):
        index = 3
        while index ** 2 <= ending:
            if index in small:
                new = set()
                new.add(index)
                for s in small:
                    if s % index:
                        new.add(s)
                small = new

                new = set()
                if index in large:
                    new.add(index)
                for p in large:
                    if p % index:
                        new.add(p)
                large = new
            index += 2
        return small, large

    small, large = sieving(small, large)

    print(timeit(lambda: sieving(small, large), number=5) / 5)

    # Sorting the list of primes does not count,
    # Only the sieving process is important.
    # An ordered set can be implemented.
    lop.extend(list(large))
    return sorted(lop)


if __name__ == '__main__':

    # print(sieve(1, 1))
    # print(sieve(1, 2))
    # print(sieve(2, 2))
    # print(sieve(2, 3))
    # print(sieve(4, 20))

    # print(len(sieve(1, 100)))
    # print(len(sieve(1, 1000)))
    # print(len(sieve(1, 10000)))
    # print(len(sieve(1, 100000)))
    # print(len(sieve(1, 1000000)))
    # print(len(sieve(1, 10000000)))
    # print(len(sieve(995000000, 1000000000)))

    stuff = input().split()
    for p in sieve(int(stuff[0]), int(stuff[1])):
        print(p)
Highwayman (1340)

@eric_wang oof. :/ oh well it was worth a try lol. Thank you!
I spent like 10 min trying to parse the code even though it was my algo idea 😂😂😂

eric_wang (10)

@Highwayman Thank you so much for the suggestion, if you have more ideas, please share :D

Highwayman (1340)

@eric_wang your welcome, absolutely! :)

xxpertHacker (382)

Why does the code look more like C, but use C++ headers?

Might I ask, what is the difference between <cstring>, <cstdio>, and <cmath> when compared to <string.h>, <stdio.h>, and <math.h>?

I wonder if you could make a simple, but ridiculously inefficient recursive to determine if something is a prime or not.

Like, I'm not even joking, can you?

If you care about speed, since you wrote "Optimized" in the title, I might be able to optimize it.

eric_wang (10)

@StudentFires Alright, sounds like a lovely idea

Why does the code look more like C, but use C++ headers?

I originally coded this in C++ but found out later that using certain features
in C (scanf and printf), made the program slightly faster. As for the headers,
the difference between <cmath> vs <math.h>, <cstring> vs <string.h>, and <cstdio> and <stdio.h>, should not make a difference in the performance of the code.

Might I ask, what is the difference between <cstring>, <cstdio>, and <cmath> when compared to <string.h>, <stdio.h>, and <math.h>?

As for the differences between libraries, after a quick google search, math.h is the deprecated C header.
<cmath> is the C++ header. The difference is that **<cmath> puts all the names in the std namespace.
I think the same concept is applied for <cstring> vs <string.h> and <cstdio> and <stdio.h>

I wonder if you could make a simple, but ridiculously inefficient recursive to determine if something is a prime or not.

I found this original piece of code here: https://www.geeksforgeeks.org/recursive-program-prime-number/

After some slight modifications, I simplified the base case, and increased the step value.
However, I think the time complexity of the program is at least O((end - start) sqrt(end)), since this
program basically performs trial division on every prime in the interval from start to end.

If the finding of primes were to be done iteratively, with a vector that stores the results of previous
primes found to test for divisibility of future primes, the program would still be slower than the sieve.

The main reason the sieve works is because its much easier to determine if a number is not prime,
compared to determining if a number is prime.

def is_prime(n, i=3):
    # Base case
    if i * i > n:
        return True

    if n % i == 0:
        return False

    return is_prime(n, i + 2)


start, end = input().split()
start, end = int(start), int(end)

# lop = []
if start < 3:
    # lop.append(2)
    print(2)

start += not(start % 2)
start = max(3, start)
end -= not (end % 2)

for i in range(start, end + 1, 2):
    if is_prime(i):
        print(i)
        # lop.append(i)

# print(len(lop))

If you care about speed, since you wrote "Optimized" in the title, I might be able to optimize it.

How would you optimize this? I am open to suggestions.