mulander @mulander

In our app today we needed to frequently bulk insert largish (~230MB) batches of data into the DB without the process taking too long (2-3 seconds max).

Sequential inserts in ActiveRecord - took forever (40s).
Bulk insert using a PostgreSQL multi-row INSERT - still too slow (21s), because Ruby ate all the time building the query.

Ended up dumping the data to disk (SSD) and importing it via the PostgreSQL COPY command - nearly instant.
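Roughly what that ended up looking like inside the Rails app - a minimal sketch only, where `records` (an array of hashes), the `measurements` table and its columns are made-up names, and server-side COPY assumes the file is readable on the DB host by a role with superuser or pg_read_server_files:

```ruby
require 'csv'

path = '/tmp/measurements.csv'

# Dump the batch to disk as CSV: plain string writes, no SQL string building.
CSV.open(path, 'w') do |csv|
  records.each { |r| csv << [r[:device_id], r[:recorded_at], r[:value]] }
end

# Server-side COPY reads the file directly on the database host.
ActiveRecord::Base.connection.execute(<<~SQL)
  COPY measurements (device_id, recorded_at, value)
  FROM '#{path}' WITH (FORMAT csv)
SQL
```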

@mulander what do you mean by "Ruby ate all the time building the query"? The DB driver, or actually serializing the query, or something else?

@kurisu serializing the query - that was ~10k records building up to 230MB of data. The bulk insert gems try to be smart about it, but they still lose out with that many string interpolations/concatenations (plus things like checking column default values). Even building the query carefully by hand would take longer than dumping the data to disk raw.
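For contrast, the multi-row INSERT path boils down to something like this (a sketch with the same invented table/column names) - it's this per-value quoting and concatenation over ~10k rows that eats the time:

```ruby
conn = ActiveRecord::Base.connection

# Every value is escaped and spliced into one giant SQL string (~230MB here)
# before anything is sent to the server.
values = records.map do |r|
  "(#{conn.quote(r[:device_id])}, #{conn.quote(r[:recorded_at])}, #{conn.quote(r[:value])})"
end.join(",\n")

conn.execute(<<~SQL)
  INSERT INTO measurements (device_id, recorded_at, value)
  VALUES #{values}
SQL
```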

@mulander interesting, I wonder why Postgres doesn't have a binary wire format for bulk insert. The Postgres protocol is already binary and efficient for binding values into normal queries using $1 etc., but for some reason they don't extend that to bulk insert.
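For reference, the $1-style binding mentioned here looks roughly like this with the pg gem (hypothetical connection and table names) - the values travel as parameters in the extended protocol, but it's still one statement per row:

```ruby
require 'pg'

conn = PG.connect(dbname: 'app_db')  # hypothetical connection details

# Extended query protocol: values are sent as $1/$2/$3 parameters rather
# than being interpolated into the SQL text.
conn.prepare('insert_measurement',
             'INSERT INTO measurements (device_id, recorded_at, value) VALUES ($1, $2, $3)')

records.each do |r|
  conn.exec_prepared('insert_measurement',
                     [r[:device_id], r[:recorded_at], r[:value]])
end
```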

@kurisu I guess you could use COPY with its STDIN input method. I considered that briefly but didn't get around to trying it / seeing if it's possible, since what I tested already satisfied the criteria.
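A rough sketch of that route with the pg gem's copy_data helper (same made-up table/columns as above), should anyone want to try it:

```ruby
require 'pg'

pg = ActiveRecord::Base.connection.raw_connection  # the underlying PG::Connection
encoder = PG::TextEncoder::CopyRow.new

# COPY ... FROM STDIN streams rows over the existing connection,
# so nothing has to be written to disk on either end.
pg.copy_data('COPY measurements (device_id, recorded_at, value) FROM STDIN', encoder) do
  records.each do |r|
    pg.put_copy_data([r[:device_id], r[:recorded_at], r[:value]])
  end
end
```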

@mulander yeah, that looks like the ideal answer. I wonder why none of the Ruby libraries do that.

@kurisu I did spot this gem (github.com/diogob/postgres-cop) that wraps the COPY command, which is closest to what I did - I didn't use it since I don't need a generalized solution for importing/exporting records.