Go back

Go, Sqlite and 7+ million records borks it all

I've been writing a small software in my spare time to index all files on my different machines on my network. There's some old disks running and I have no clue what files are stored and where.

Since I'm a go (golang) -fan the choice of language was easy enough. Being a hobby-project I tried to keep it as close to a POC or MVP as possible seing as I just wanted the result and not a robust general software. My intent was never for anyone else to use it really.

After a few moments - between putting kids to bed and going to bed myself - I got something running. It's a two piece software. One server-part which is meant to recieve the file-list, store it in a db and provide a simple web UI where one can view, search and filter.

The second part is a client which you run on the machine where you've got some files to index. The client walks through the dir you want and builds a struct for each file - containing mine-type, size, name, and maybe ten more pieces of data. When it's built the entire index it stores it in a sqlite db and sends that db over to the server. When the server recieves a sqlite db from a client it uses it to populate it's own sqlite and then throws away the client one.

This all works like a charm - until I stopped trying it on my dev machine and actually wanted to index one of my other machines.

Well, to be honest - it still worked like a charm. The problem was just that it worked a bit too well. The client-part ended up indexing around 7.5 million files and sent over the sqlite-db to the server. The server inserted them just fine in it's own db. The problems then arose trying to actually handle the files.

Everything became monumentally slow. Adding new records to the db is real slow and reading the db takes so long everything pretty much times out.

My conclusion is two-fold:

I have some studying to do on the subject of handling many millions of files (these 7+ millon records is just one /mnt -dir on one machine) SQLite - as great as it is - might not be a great tool for the job. I hardly think MariaDB is the best either - perhaps a nosql db or maybe elastic?