
Optimising sqlite3 access to 1.4 million rows

I run a small python3 service, triggered by a systemd timer, that checks some TLS certificates. Over the months its database accumulated over 1.4 million rows, and memory consumption rose to over 750MB per check run.

The initial code was like this:

import sqlite3

db = sqlite3.connect('database.sqlite3')
cur = db.cursor()

cur.execute('SELECT timestamp, hostname, common_name, alt_names FROM samples ORDER BY id DESC')

# fetchall() materialises every result row as Python objects in memory
rows = cur.fetchall()
for row in rows:
    ...  # do the things

db.close()

The biggest problem was this line: rows = cur.fetchall(). It copied the entire result set into memory, and because every value is turned into a Python object with its own overhead, the 253MB sqlite3 file ended up occupying over 750MB of RAM.
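One option is to pull results in fixed-size batches with the DB-API's fetchmany(), so only one batch is held in memory at a time. A quick sketch against the same table; the batch size of 10,000 is an arbitrary pick, not something from my original setup:

import sqlite3

db = sqlite3.connect('database.sqlite3')
cur = db.cursor()
cur.execute('SELECT timestamp, hostname, common_name, alt_names FROM samples ORDER BY id DESC')

# Fetch a bounded number of rows per call so only one batch
# is held in memory at any time.
batch_size = 10_000  # arbitrary choice, tune for your workload
while True:
    batch = cur.fetchmany(batch_size)
    if not batch:
        break
    for row in batch:
        ...  # do the things

db.close()

The version I settled on below goes a step further and simply iterates over the cursor, which streams rows one at a time.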

A small speedup came from changing the isolation_level, since I know that no other program accesses the database while the check runs.

Using with sqlite3.connect(...) as db: automatically commits the transaction when the block exits cleanly and rolls it back on an exception. It does not close the connection, though, so db.close() is still needed afterwards.

import sqlite3

# EXCLUSIVE prevents other database connections from reading the database
# while the transaction is underway.
with sqlite3.connect('database.sqlite3', isolation_level='EXCLUSIVE') as db:
    cur = db.cursor()

    # Iterating over the cursor streams rows one at a time
    # instead of materialising them all with fetchall().
    rows = cur.execute('SELECT timestamp, hostname, common_name, alt_names FROM samples ORDER BY id DESC')

    for row in rows:
        ...  # do the things

db.close()

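If the connection should be closed automatically as well, the standard-library contextlib.closing() wrapper can take care of that. A small sketch of that variant (not what I run, just the same code with both context managers combined):

import contextlib
import sqlite3

# closing() guarantees db.close() runs when the outer block exits,
# while the inner "with db:" handles commit/rollback of the transaction.
with contextlib.closing(sqlite3.connect('database.sqlite3', isolation_level='EXCLUSIVE')) as db:
    with db:
        cur = db.cursor()
        rows = cur.execute('SELECT timestamp, hostname, common_name, alt_names FROM samples ORDER BY id DESC')
        for row in rows:
            ...  # do the things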
With these tweaks I was able to reduce the memory footprint from 750MB to 37MB and the runtime from 3.7s to 2.9s. 🎉
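For numbers like these, one simple way to capture peak memory and runtime on Linux is resource.getrusage together with time.perf_counter. A quick sketch; the main() wrapper here is just a placeholder for the check logic above:

import resource
import time

def main():
    ...  # open the database and iterate over the rows as shown above

if __name__ == '__main__':
    start = time.perf_counter()
    main()
    elapsed = time.perf_counter() - start
    # On Linux, ru_maxrss is the peak resident set size in kilobytes.
    peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print(f'runtime: {elapsed:.1f}s, peak memory: {peak_kb / 1024:.0f}MB')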
