DSXchange: DataStage and IBM Websphere Data Integration Forum
jreddy



Group memberships:
Premium Members

Joined: 03 Feb 2004
Posts: 202

Points: 1953

Posted: Wed May 09, 2018 11:54 am

DataStage® Release: 11x
Job Type: Parallel
OS: Windows
Looking for some guidance on how we can set/use the transaction handling options in the ODBC Connector (or other SQL Server stages) to get the best performance when inserting/updating data in a SQL Server database.

For a job that inserts 50K rows, for example, I set Array Size and Record Count to 2000 in the SQL Server Enterprise stage for the INSERT, but on the DB side I still see 50K individual insert statements issued.

I tried changing Array Size and Record Count to 20K each, but still see the same thing on the DB side.

What option(s) could be set so we can reduce the number of insert statements? I was going to try the same thing with the SQL Server Bulk stage and the ODBC Connector stage, but wanted to see if anyone has been able to gain performance improvements by playing with these options, and also commit frequencies, for example.

Any guidance is appreciated, thanks
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42753
Location: Denver, CO
Points: 220316

Posted: Wed May 09, 2018 12:01 pm

None of those settings will affect the number of inserts issued. If you've got 50k rows to process, 50k insert statements will be issued. Array Size controls how many are "bundled together" to be sent to the database at any given time. And Record Count, doesn't that control how often a commit is performed?
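
As a rough analogue of how I picture those two settings (not DataStage itself, just a Python sketch using pyodbc, with a made-up DSN, table and columns):

Code:
# Rough analogue of Array Size vs. Record Count using Python + pyodbc.
# The DSN, credentials, table and columns are made up for illustration.
import pyodbc

ARRAY_SIZE = 2000      # rows bundled into each round trip to the database
RECORD_COUNT = 20000   # rows between commits

conn = pyodbc.connect("DSN=sqlserver_dev;UID=etl;PWD=secret", autocommit=False)
cur = conn.cursor()

rows = [(i, "name_%d" % i) for i in range(50000)]   # stand-in for the 50K source rows
sql = "INSERT INTO MyTable (id, name) VALUES (?, ?)"

uncommitted = 0
for start in range(0, len(rows), ARRAY_SIZE):
    chunk = rows[start:start + ARRAY_SIZE]
    cur.executemany(sql, chunk)    # one round trip, but still one insert per row
    uncommitted += len(chunk)
    if uncommitted >= RECORD_COUNT:
        conn.commit()              # Record Count: how often the work is committed
        uncommitted = 0

conn.commit()
conn.close()

Either way the database still ends up executing 50K individual inserts; the array just cuts down on the number of round trips.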

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
jreddy



Group memberships:
Premium Members

Joined: 03 Feb 2004
Posts: 202

Points: 1953

Posted: Wed May 09, 2018 12:07 pm

Thank you, Craig.

Our DBA pointed out that these 50K inserts are being issued one by one and asked me to see if there is a way to group them for better efficiency, so the database isn't hit that many times. Is that something we can do with a bulk load?
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42753
Location: Denver, CO
Points: 220316

Posted: Wed May 09, 2018 1:03 pm

Not sure why there would be an issue with that number of inserts. Sure, they get processed 'one by one', but that's not a volume where I would think switching to bulk loads is warranted. Are they taking an inordinate amount of time, and is that why the DBA is involved?

Bulk inserts are a different beast. I'm not familiar with SQL Server and all of its nuances, but I have dealt with this in Oracle. They use a different API, bypass a lot of the things a "conventional" insert needs to deal with, and there's no longer a concept of "commits". For Oracle, you cannot have any indexes on the table. And then it either all processes successfully or it fails, and when there's a bulk load failure it can leave your table in a bit of a compromised state. That's why we save them for Loads of Unusual Size - say, for example, when our normal daily load of 1 to 1.5 million rows (which we still load conventionally) expands to 100 to 150 million, as it does once a year.

And from the Oracle side I could also write a PL/SQL procedure to generate and bulk collect the inserts and let the database handle them in large chunks that way. It's been a heck of a long time since I've had a situation where that looked like a viable solution. Never mind that it's not an ETL solution, more of a PITA solution.

Not sure how much of that helps. I'm just wondering if you are trying to solve something that really isn't a problem, per se.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy
qt_ky



Group memberships:
Premium Members

Joined: 03 Aug 2011
Posts: 2813
Location: USA
Points: 21315

Posted: Thu May 10, 2018 5:10 am

Sounds quite normal.

_________________
Choose a job you love, and you will never have to work a day in your life. - Confucius
jreddy



Group memberships:
Premium Members

Joined: 03 Feb 2004
Posts: 202

Points: 1953

Posted: Fri May 11, 2018 11:57 am

Thank you, Craig.
Yes, we are having performance issues with the load jobs, so we are looking into every possible way to tune them. The whole process breezes through up until the last stage, where the insert happens, and then the insert takes anywhere between 2 and 11 minutes, seemingly at random.
chulett

Premium Poster


since January 2006

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 12 Nov 2002
Posts: 42753
Location: Denver, CO
Points: 220316

Posted: Fri May 11, 2018 4:25 pm

I would hazard a guess that the load on the database, whatever other processes are running on it, or just the server itself, is what is driving your variances. That, and probably volumes, since I don't imagine each run is always 50K records. Another factor could simply be the number of indexes on the table. Or network congestion. Lots of factors could be at play here.

Just from personal experience I would caution against going down the bulk load path for such a small number of rows considering the (mostly single digit) load times you are seeing. I don't believe the level of complexity it would add would be worth whatever performance gains you may or may not see. All IMHO, of course.

_________________
-craig

Research shows that 6 out of 7 dwarves aren't happy

Last edited by chulett on Sun May 20, 2018 4:40 pm; edited 1 time in total
UCDI



Group memberships:
Premium Members

Joined: 21 Mar 2016
Posts: 335

Points: 3391

Posted: Tue May 15, 2018 10:05 am

Have you done stats and indexes on the target? Inserts can run long if the stats are a mess, or if the table isn't indexed well for the job.

Bulk works well too, if you go that route. Hardware and systems vary, so it's hard to say where the cutoff between inserts and bulk mode should fall. What seems like a small volume to a big system can be overwhelming to a tiny one. Try both and see what you get.
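
If you want a rough feel for that difference outside of DataStage, something like the sketch below (Python with pyodbc; the DSN, table and columns are made up) times a plain parameter-array insert against pyodbc's fast_executemany, which pushes the whole parameter array to SQL Server in bulk. Truncate the table between runs so the comparison is fair.

Code:
# Quick, unscientific comparison of a conventional parameter-array insert
# vs. a bulk-style push.  DSN, table and columns are made up for illustration.
import time
import pyodbc

rows = [(i, "name_%d" % i) for i in range(50000)]
sql = "INSERT INTO MyTable (id, name) VALUES (?, ?)"

def load(bulk):
    conn = pyodbc.connect("DSN=sqlserver_dev;UID=etl;PWD=secret", autocommit=False)
    cur = conn.cursor()
    cur.fast_executemany = bulk    # True = send the whole parameter array at once
    started = time.perf_counter()
    cur.executemany(sql, rows)
    conn.commit()
    elapsed = time.perf_counter() - started
    conn.close()
    return elapsed

print("conventional: %.1f seconds" % load(False))
print("bulk-style:   %.1f seconds" % load(True))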
rschirm

Premium Poster



Group memberships:
Premium Members, Inner Circle

Joined: 13 Dec 2002
Posts: 27

Points: 203

Posted: Sun May 20, 2018 3:26 pm

I would try the following.

Calculate the width of each row (add up the lengths of all the columns). Divide 64,000 by that value and round the result down to the nearest whole number. Use that number for the Array Size. Then make the Record Count a multiple of the Array Size so that it comes out to around 5,000.
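
For example, with a hypothetical 120-byte row, the arithmetic works out like this (just a Python sketch of the calculation above):

Code:
# Worked example of the Array Size / Record Count calculation,
# using a made-up 120-byte row width.
row_width = 120                    # sum of the lengths of all columns, in bytes

array_size = 64000 // row_width    # 64,000 / 120 = 533.33, rounded down to 533
record_count = array_size * round(5000 / array_size)   # nearest multiple of Array Size to ~5,000

print(array_size)      # 533
print(record_count)    # 4797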



Please post back with the stats of this change.