Array Size, Record Count, Commit Frequency, Bulk Insert

jreddy
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Array Size, Record Count, Commit Frequency, Bulk Insert

Post by jreddy »

Looking for some guidance on how we can set/use the transaction handling options in the ODBC Connector (or other SQL Server stages) to get the best performance when inserting/updating data to a SQL Server database.

For example, for a job that inserts 50K rows, I have set both Array Size and Record Count to 2000 in the SQL Server Enterprise stage for the INSERT, but on the DB side I still see 50K individual insert statements being issued.

I tried changing Array Size and Record Count to 20K each, but still see the same behavior on the DB side.

What option(s) could be set so we can reduce the number of insert statements? I was going to try the same with the SQL Server Bulk stage and the ODBC Connector stage, but wanted to see if anyone has been able to gain performance improvements by playing with these options and with commit frequencies, for example.

Any guidance is appreciated, thanks
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

None of those settings will affect the number of inserts issued. If you've got 50k rows to process, 50k insert statements will be issued. Array Size controls how many are "bundled together" to be sent to the database at any given time. And Record Count, doesn't that control how often a commit is performed?
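
To make the distinction concrete, here is a rough sketch of those same two knobs outside of DataStage, using Python and pyodbc against SQL Server (the DSN, table, and column names are made up for the example): an Array Size worth of rows travels to the database in a single round trip, a Record Count worth of rows is covered by each commit, but every row is still its own INSERT statement as far as the database is concerned.

    # Illustrative sketch only (pyodbc, not DataStage): ARRAY_SIZE rows per
    # round trip, RECORD_COUNT rows per commit. The DSN, table, and columns
    # are hypothetical placeholders.
    import pyodbc

    ARRAY_SIZE = 2000      # rows bundled together per execute round trip
    RECORD_COUNT = 10000   # rows per commit; a multiple of ARRAY_SIZE

    conn = pyodbc.connect("DSN=sqlserver_dsn;UID=etl;PWD=secret", autocommit=False)
    cur = conn.cursor()
    cur.fast_executemany = True   # ship each batch as one array-bound round trip

    rows = [(i, "name_%d" % i) for i in range(50000)]   # stand-in for the 50K rows

    since_commit = 0
    for start in range(0, len(rows), ARRAY_SIZE):
        batch = rows[start:start + ARRAY_SIZE]
        # Logically still one INSERT per row; the batch just travels together.
        cur.executemany("INSERT INTO dbo.target_table (id, name) VALUES (?, ?)", batch)
        since_commit += len(batch)
        if since_commit >= RECORD_COUNT:
            conn.commit()
            since_commit = 0

    conn.commit()
    conn.close()

The 50K statements your DBA sees are expected either way; batching changes how they are shipped and committed, not how many of them there are.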
-craig

"You can never have too many knives" -- Logan Nine Fingers
jreddy
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

Thank you Craig.

Our DBA has pointed out that these 50K inserts are getting issued one by one and has asked me to see if there is a way to group them for better efficiency, so we don't hit the database that many times. Is that something we can do with a bulk load?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure why there would be an issue with that number of inserts. Sure, they get processed 'one by one', but that's not a volume where I would think switching to bulk loads is warranted. Are they taking an inordinate amount of time, and that's why the DBA is involved?

Bulk inserts are a different beast. I'm not familiar with SQL Server and all of its nuances but have dealt with this in Oracle. They use a different API, bypass a lot of the things a "conventional" insert needs to deal with, and there's no longer a concept of "commits". For Oracle, you cannot have any indexes on the table. It all processes successfully or it fails, and when there's a bulk load failure it can leave your table in a bit of a compromised state. That's why we save them for Loads of Unusual Size - say, when our normal daily load of 1 to 1.5 million rows (which we still load conventionally) expands to 100 to 150 million, as it does once a year.
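
For reference, on the SQL Server side a bulk path looks roughly like the sketch below: a T-SQL BULK INSERT issued here through pyodbc rather than DataStage. The DSN, table name, and file path are hypothetical, and the data file has to be readable by the SQL Server instance itself.

    # Hedged sketch of a SQL Server bulk load issued outside DataStage.
    # The DSN, table, and file path are placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=sqlserver_dsn;UID=etl;PWD=secret", autocommit=True)
    cur = conn.cursor()

    cur.execute(r"""
        BULK INSERT dbo.target_table
        FROM 'D:\loads\big_extract.csv'
        WITH (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR   = '\n',
            BATCHSIZE       = 50000,  -- rows per internal batch
            TABLOCK                   -- table lock; helps minimal logging
        )
    """)
    conn.close()

The trade-off described above applies here as well: a failure partway through can leave the table in an awkward state, which is part of why the bulk path tends to be reserved for unusually large loads.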

And from the Oracle side I could also write a PL/SQL procedure to generate and bulk collect the inserts and let the database handle them in large chunks that way. It's been a heck of a long time since I've had a situation where that looked like a viable solution. Never mind that it's not an ETL solution, more of a PITA solution.

Not sure how much of that helps. I'm just wondering if you are trying to solve something that really isn't a problem, per se. :(
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Sounds quite normal.
Choose a job you love, and you will never have to work a day in your life. - Confucius
jreddy
Premium Member
Posts: 202
Joined: Tue Feb 03, 2004 5:09 pm

Post by jreddy »

Thank you Craig
Yes, we are having performance issues with the load jobs, hence we are looking into every possible way to tune them. The whole process breezes through up until the last stage where the insert happens, and then the insert takes anywhere between 2 and 11 minutes, seemingly at random.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I would hazard a guess that the load on the database, whatever other processes are running on it, or just the server itself, is what is driving your variances. That and probably volumes, since I don't imagine each run is always 50K records. Another factor could simply be the number of indexes on the table. Or network congestion. Lots of factors could be at play here.

Just from personal experience I would caution against going down the bulk load path for such a small number of rows considering the (mostly single digit) load times you are seeing. I don't believe the level of complexity it would add would be worth whatever performance gains you may or may not see. All IMHO, of course.
Last edited by chulett on Sun May 20, 2018 4:40 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

Have you looked at statistics and indexing on the target? Inserts can run long if the stats are a mess, or if the table isn't indexed well for the job.
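
For what it's worth, a quick way to eyeball both of those outside DataStage is something like the sketch below (pyodbc again; the DSN and table name are placeholders): refresh the statistics on the target and list the indexes that every insert has to maintain.

    # Hedged sketch: refresh stats on the target and list its indexes.
    import pyodbc

    conn = pyodbc.connect("DSN=sqlserver_dsn;UID=etl;PWD=secret", autocommit=True)
    cur = conn.cursor()

    # Rebuild distribution statistics so the optimizer sees current row counts.
    cur.execute("UPDATE STATISTICS dbo.target_table WITH FULLSCAN")

    # Each index listed here adds maintenance cost to every insert.
    cur.execute("""
        SELECT i.name, i.type_desc
        FROM sys.indexes AS i
        WHERE i.object_id = OBJECT_ID('dbo.target_table')
          AND i.name IS NOT NULL
    """)
    for name, type_desc in cur.fetchall():
        print(name, type_desc)

    conn.close()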

Bulk works well too, if you go that route. Hardware and systems vary, so it's hard to say where the cutoff between inserts and bulk mode lies. What seems like a small load to a big system can be overwhelming to a tiny system. Try both and see what you get.
rschirm
Premium Member
Posts: 27
Joined: Fri Dec 13, 2002 2:53 pm

Post by rschirm »

I would try the following.

Calculate the width of each row (add up the lengths of all the columns). Divide that value into 64,000 and round the result down to the nearest whole number. Use this number for the Array Size. Then make the Record Count a multiple of the Array Size so that it comes out around 5,000.
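
For illustration, here is that arithmetic with a hypothetical 250-byte row (the actual width depends on your column definitions):

    # Worked example of the sizing rule above; the 250-byte row width is assumed.
    row_width_bytes = 250                    # sum of all the column lengths

    array_size = 64000 // row_width_bytes    # 64,000 / 250 = 256, rounded down
    record_count = array_size * round(5000 / array_size)   # multiple of 256 nearest 5,000

    print(array_size, record_count)          # -> 256 5120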



Please post back with the stats of this change.