Bundled mysql inserts

Posted by Chris on August 12th, 2009 filed in programming

I often “bundle” my MySQL inserts – instead of inserting one row at a time, I’ll insert 5,000 at once, for example (or, more usually, a user-specified number at a time). The only trick is remembering to insert any leftovers at the end. Here’s the code:

my $count = 0;
my $max_inserts = 15000;
my @inserts = ();
my $insert_sql = "insert into some_table (col1,col2,col,etc) values ";
while(my @things = $sth->fetchrow_array())
{
    $count++;
    # quote each value so embedded quotes don't break the statement
    my $bigline = "(" . join(",", map { $dbh->quote($_) } @things) . ")";
    push(@inserts, $bigline);
    # flush @inserts after $max_inserts rows get pushed onto it
    if(scalar(@inserts) >= $max_inserts)
    {
        $dbh->do($insert_sql . join(",", @inserts))
            or die "Couldn't insert: " . $dbh->errstr . "\n";
        @inserts = ();    # empty back out the array
    }
}
# don't forget to insert any leftovers :)
if(scalar(@inserts))
{
    $dbh->do($insert_sql . join(",", @inserts))
        or die "Couldn't insert: " . $dbh->errstr . "\n";
    @inserts = ();
}

What it’s doing: getting a bunch of rows from a database (or wherever). While it loops through that giant result set, it pushes the back half of an insert statement (the values list) onto an array. Once that array gets big enough, it “flushes” it into the database and empties it out again. Finally, after the loop, it flushes any leftovers. I find this is faster than preparing and executing one insert at a time by a factor of 10 to 50. Must do some number crunching soon…
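The flush-and-reset logic is easy to test on its own. Here’s a minimal self-contained sketch of the same pattern, with the database call swapped for an in-memory `@statements` log (that log, the tiny batch size, and the sample rows are all illustrative, not from the real script) – note how 7 rows with batches of 3 leaves one leftover row for the final flush:

```perl
use strict;
use warnings;

my $insert_sql  = "insert into some_table (col1,col2) values ";
my $max_inserts = 3;      # tiny batch size, just for demonstration
my @inserts     = ();
my @statements  = ();     # stands in for $dbh->do(...)

# pretend these are rows fetched from $sth->fetchrow_array()
my @rows = ([1,'a'], [2,'b'], [3,'c'], [4,'d'], [5,'e'], [6,'f'], [7,'g']);

for my $row (@rows) {
    push @inserts, "('" . join("','", @$row) . "')";
    # flush once the batch is full, then empty the array back out
    if (scalar(@inserts) >= $max_inserts) {
        push @statements, $insert_sql . join(",", @inserts);
        @inserts = ();
    }
}
# don't forget the leftovers: 3 + 3 + 1 rows -> 3 statements
push @statements, $insert_sql . join(",", @inserts) if scalar(@inserts);

print scalar(@statements), " statements\n";   # 3 statements
```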


2 Responses to “Bundled mysql inserts”

  1. James Says:

    No equivalent of the COPY command in MySQL?

  2. Matt S Trout Says:

    You know, if you’ve got a $sth, I’m guessing those are the results of the SELECT – so you could equally do:

    INSERT INTO foo (…) SELECT …;

    Of course, that can get slow if mysql tries to slurp the entire thing up front, so a COUNT(*) on the SELECT and then doing the INSERT several times with a LIMIT clause of some sort may be required.

    The other thing to remember is that wrapping a transaction around a bunch of inserts may give you most of the same speed.

    Oh, and of course LOAD DATA INFILE may be a better strategy if your data is external rather than a SELECT.

    Finally, a neat trick with splice –

    while (my @one_thousand = splice(@array, 0, 1000)) {

    should, I -think-, DWIM – since the boolification of an array is based on its scalarification, which is the number of elements – so you can use the truth value for that. Or you could do –

    do {
    my @one_thousand = splice(@array, 0, 1000);

    } while (@array);

    this same trick is how you can have an iterator that may return undef – your next method returns () for exhaustion and then

    while (my ($value) = $it->next) {

    will do the right thing.

    — mst
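The splice trick mst describes does DWIM: the `while` condition is false exactly when the source array runs out, since the assigned chunk array boolifies to its element count. A runnable sketch (chunk size and data shrunk for illustration):

```perl
use strict;
use warnings;

my @array = (1 .. 10);   # illustrative data; mst's example splices 1000 at a time
my @chunk_sizes;

# splice removes up to 3 elements from the front of @array each pass;
# when @array is empty, splice returns () and the loop condition is false
while (my @chunk = splice(@array, 0, 3)) {
    push @chunk_sizes, scalar(@chunk);
}

print "@chunk_sizes\n";   # 3 3 3 1
```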
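The same list-assignment trick lets an iterator yield `undef` as a legitimate value: in boolean context, `my ($value) = $it->next` is the number of elements the call returned, not the value itself. A sketch with a made-up closure-based iterator (mst’s comment assumes an object with a `next` method; the `make_iter` helper here is purely illustrative):

```perl
use strict;
use warnings;

# Toy iterator over a list that contains undef as a real value.
# It returns a one-element list while there's data, and the empty
# list () on exhaustion, so an undef element doesn't end the loop.
sub make_iter {
    my @items = @_;
    my $i = 0;
    return sub {
        return () if $i >= @items;
        return ($items[$i++]);
    };
}

my $it = make_iter(1, undef, 3);
my @seen;
while (my ($value) = $it->()) {
    push @seen, defined($value) ? $value : "undef";
}

print "@seen\n";   # 1 undef 3
```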

Leave a Comment