
Using SqlBulkCopy for fast inserts

Problem

You need to insert a large number of records into one or more tables in a SQL Server database. By large, I mean several hundred thousand or even millions. The source of the data can be another database, an Excel spreadsheet, CSV files, XML, literally anything. Writing one record at a time using a SqlCommand with an INSERT INTO statement is very costly and slow. You need an efficient way to insert a large number of records quickly.
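For a sense of why this is slow, here is a minimal sketch of the row-at-a-time approach; the Employees table and _connectionString are assumptions for illustration. Every ExecuteNonQuery call is a separate round trip to the server.

```csharp
// Slow baseline: one INSERT statement, and one round trip, per record.
// The Employees table and _connectionString are assumed for illustration.
using (var connection = new SqlConnection(_connectionString))
{
    connection.Open();

    for (var i = 1; i <= 400000; i++)
    {
        using (var command = new SqlCommand(
            "INSERT INTO Employees (Name) VALUES (@name)", connection))
        {
            command.Parameters.AddWithValue("@name", "Name " + i);
            command.ExecuteNonQuery(); // one round trip per row
        }
    }
}
```

With 400,000 rows, the per-command overhead dominates the run time, which is exactly what SqlBulkCopy avoids.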

Solution

In .NET, we have a class called SqlBulkCopy to achieve this. You give it a DataTable, an array of DataRows, or an instance of a class that implements the IDataReader interface. Next, you tell it the name of the destination table in the database, and let it do its magic.

I used this in one of my projects where I had to migrate data from one of our legacy models into a new model. First, I used a plain SqlCommand to insert one record at a time (about 400,000 records in total). The process took 12 minutes to run. With SqlBulkCopy, I reduced the data migration time to 6 seconds!

How to use it?

Here is the most basic way to use SqlBulkCopy. Read the code first and then I’ll explain it line by line.

var dt = new DataTable();
dt.Columns.Add("EmployeeID");
dt.Columns.Add("Name"); 

for (var i = 1; i <= 1000000; i++)
    dt.Rows.Add(i, "Name " + i);

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

In this example I’m assuming we have a table in the database called Employees with two columns: EmployeeID and Name. I’m also assuming that the EmployeeID column is marked as IDENTITY.

In the first part, we simply create a DataTable that resembles the structure of the target table. That’s the reason this DataTable has two columns. So, to keep things simple, I’m assuming that the source DataTable and the target table in the database have identical schema. Later in this post I’ll show you how to use mappings if your source DataTable has a different schema.

The second part is purely for demonstration. We populate this DataTable with a million records. In your project, you'll get data from somewhere and put it into a DataTable, or you might use an IDataReader for more efficient reads.
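As a sketch of that last point: WriteToServer also accepts an IDataReader, so if your source is a database query you can stream rows straight into the destination without materializing them all in a DataTable first. The LegacyEmployees query and both connection strings below are assumptions for illustration.

```csharp
// Stream rows from a source query directly into the destination table.
// WriteToServer(IDataReader) reads rows as they arrive instead of
// buffering the whole result set in memory.
using (var sourceConnection = new SqlConnection(_sourceConnectionString))
{
    sourceConnection.Open();

    var command = new SqlCommand(
        "SELECT EmployeeID, Name FROM LegacyEmployees", sourceConnection);

    using (var reader = command.ExecuteReader())
    using (var sqlBulk = new SqlBulkCopy(_connectionString))
    {
        sqlBulk.DestinationTableName = "Employees";
        sqlBulk.WriteToServer(reader);
    }
}
```

A DataTable you already have can be exposed the same way via dt.CreateDataReader().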

And finally, we create a SqlBulkCopy and use it to write the content of our DataTable to the Employees table in the database.

Pretty simple, right? Let’s take this to the next level.

Using the identity values from the source

In the above example, I assumed that the EmployeeID column is marked as IDENTITY, hence the values are generated by the Employees table. What if you need to use the identity values in the source? It’s pretty simple. You need to use the KeepIdentity option when instantiating your SqlBulkCopy.

using (var sqlBulk = new SqlBulkCopy(_connectionString, SqlBulkCopyOptions.KeepIdentity))

With this option, the EmployeeID in our DataTable will be used.

Transactions

You can wrap all your inserts in a transaction, so either all will succeed or all will fail. This way you won’t leave your database in an inconsistent state. To use a transaction, you need to use a different constructor of SqlBulkCopy that takes a SqlConnection, options (as above) and a SqlTransaction object.

using (var connection = new SqlConnection(_connectionString))
{
    connection.Open();

    using (var transaction = connection.BeginTransaction())
    {
        using (var sqlBulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction))
        {
            sqlBulk.DestinationTableName = "Employees";
            sqlBulk.WriteToServer(dt);
        }

        transaction.Commit();
    }
}

Note that here we have to explicitly create a SqlConnection object in order to create a SqlTransaction. That's why this example is slightly more involved than the previous ones, where we simply passed a connection string to SqlBulkCopy. We manually create the connection, open it, begin a transaction, pass the connection and transaction objects to our SqlBulkCopy, and finally call Commit once the write succeeds. Don't forget the Commit: if the transaction is never committed, it is rolled back when the connection is disposed and no rows are saved.

Batch size

By default, all the records in the source are written to the target table in one batch. This means that as the number of records in the source grows, so does the memory consumed by SqlBulkCopy. If you have memory limitations, you can reduce the number of records written in each batch. This way, SqlBulkCopy writes smaller batches to the database and consumes less memory. Since this means multiple round trips to the database, it can have a negative impact on performance. So, based on your circumstances, you may need to try a few different batch sizes and find a number that works for you.

To set the batch size:

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.BatchSize = 5000;
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

This example writes 5000 records in each batch.

Notifications

You might need to write a message to the console or a log as records are written to the database. SqlBulkCopy supports notifications: it exposes an event you can subscribe to, and you set the number of records to be processed before each notification is raised.

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.NotifyAfter = 1000;
    sqlBulk.SqlRowsCopied += (sender, eventArgs) => Console.WriteLine("Wrote " + eventArgs.RowsCopied + " records.");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

In this example, once every 1000 records are processed, we get notified and display a message on the Console. Note that SqlRowsCopied is the event name and here we use a lambda expression to create an anonymous method as the event handler. If you’re not familiar with lambda expressions, delegates and event handlers, check out my C# Advanced course.

Column mappings

In all the examples so far, I assumed our DataTable has the exact same schema as the target table. What if the “Name” column in the source is actually called “FullName”? Here is how we can create a mapping between the columns in the source and the target table. One thing to keep in mind: as soon as you add any mapping, only the mapped columns are copied, so map every column you want transferred, not just the ones whose names differ.

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.ColumnMappings.Add("EmployeeID", "EmployeeID");
    sqlBulk.ColumnMappings.Add("FullName", "Name");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

How about multiple tables?

So far we've only inserted records into one table: the Employees table. What if we wanted to populate Employees and their Timesheets? We can have two DataTables, one for Employees and one for Timesheets. Then we reuse the same SqlBulkCopy to populate one table at a time:

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dtEmployees);

    sqlBulk.DestinationTableName = "Timesheets";
    sqlBulk.WriteToServer(dtTimesheets);
}

 

I hope you enjoyed this post and learned something new. If you enjoy my teaching style and like to learn more from me, subscribe to my blog. Also, check out my courses for more substantial learning.

 

Hi! My name is Mosh Hamedani. I’m a software engineer with two decades of experience and I’ve taught over three million people how to code or how to become professional software engineers through my YouTube channel and online courses. It’s my mission to make software engineering accessible to everyone.

29 responses to “Using SqlBulkCopy for fast inserts”

  1. Jon says:

    Great article Mosh!

    One thing to mention though are the downsides of using Sql Bulk commands, the major one being that you can leave your database in an inconsistent state if you’re not careful.

    Sql Bulk Copy by default disables foreign keys and triggers:
    http://stackoverflow.com/questions/22092091/how-does-sqlbulkcopy-circumnavigate-foreign-key-constraints and marks the potentially violated constraint on the table as is_not_trusted in the sys.check_constraints metadata table.

    To make it check constraints and fire triggers you can use the SqlBulkCopyOptions enum with the following options: SqlBulkCopyOptions.FireTriggers | SqlBulkCopyOptions.CheckConstraints.

    Be aware this has a large impact on performance but it will maintain the consistency of the database.

    • admin says:

      Excellent tip! Thanks Jon 🙂

    • Markus says:

      Jon, how and why exactly does it have a large impact on performance? If the trigger gets fired once after the inserts and does its work, sure, some things may have to be done, but is it correct to state a large impact on performance in general?

      • Jonathan van Doore says:

        Markus,

        Triggers actually execute on every row, not just once for the entire operation. This is because there may be logic within the trigger that modifies the row being inserted, or updates a value within the current row from elsewhere, especially in the case of INSTEAD OF INSERT triggers (https://technet.microsoft.com/en-us/library/ms187640(v=sql.105).aspx). This obviously has a performance impact, as it will be performing more work for every row.

        Checking foreign keys also impacts performance, as for each row inserted SQL Server has to check the referential integrity of the insert, which will increase the time in proportion to the number of FKs you have on the table in question (https://www.brentozar.com/archive/2015/05/do-foreign-keys-matter-for-insert-speed).

        Both of these concerns are minor when you're dealing with a few thousand rows; it'll still be fast. But it is something to be mindful of when considering the real use case of bulk insert, which caters for scenarios with millions of rows.

  2. Jaspal says:

    Very well explained – Well done Mosh and thanks!

  3. luis says:

    Hello Mosh,

    I really appreciate the post ‘Using SqlBulkCopy for fast inserts’. It’s very useful. Thank you 😉

    I have two doubts regarding them, if you can help.

    1) Is it possible, when I insert into a table with an identity column, to receive the generated IDs (for example, to use in child tables)?

    2) Using SqlBulkCopy, is it possible for the system to do an update instead of an insert where applicable?

    Thank you and best regards

    • Anastasios Markos says:

      Unfortunately I haven’t found a direct way of doing this either.

      One way is to add a GUID column as an identifier for your bulk copy, then go back and select the inserted IDs by that GUID column.

  4. Michael says:

    Hello Mosh,

    I have your Angular 2 for Beginners @ Udemy. I appreciate your teaching style.

    Here, I am trying to combine transactions with updating multiple tables. So far it’s failing silently. If I comment out the connection.BeginTransaction() and the SqlBulkCopy parameter, the data shows up. My code is below, and any input is appreciated.

    using (var connection = new SqlConnection(new _DataDbContext().Database.Connection.ConnectionString))
    {
        connection.Open();

        var transaction = connection.BeginTransaction();

        foreach (var table in DataSetToUpload)
        {
            using (var sqlBulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.Default, transaction))
            {
                sqlBulk.ColumnMappings.Clear();
                foreach (DataColumn clmn in table.Value.Columns)
                {
                    sqlBulk.ColumnMappings.Add(clmn.ColumnName, clmn.ColumnName);
                }

                sqlBulk.BatchSize = 10000;
                sqlBulk.DestinationTableName = table.Key;
                try
                {
                    await sqlBulk.WriteToServerAsync(table.Value);
                }
                catch (InvalidOperationException ex)
                { var x = ex; }
                catch (SqlException ex)
                { var x = ex; }
            }
        }
    }

  5. Ali Göl says:

    The bad thing about this class is: if you have data like DateTime.Min in your DataTable, it gives an error, but it doesn’t really tell you where the problem is or what the data is. I’ve been debugging the SqlBulkCopy class to find out where the problem is, and suddenly a programmer comment on some line made me laugh:
    ———


    // this is important
    foo.blabla().DoSomething();

    • Don Carroll says:

      hello Ali Gol,
      Yes I know that I’m responding over 2 years later but I just came across this posting by Mosh today on 1/7/2020. Anyway, there might be an answer to your problem in the SQLBulkCopy Microsoft docs in the Remarks section at this Url: https://docs.microsoft.com/en-us/dotnet/api/system.data.sqlclient.sqlbulkcopy?view=netframework-4.8

      It says this: SqlBulkCopy will fail when bulk loading a DataTable column of type SqlDateTime into a SQL Server column whose type is one of the date/time types added in SQL Server 2008.

      I think this is why you got an error for your data like DateTime.Min.

      Sorry for the late response. I just found out about SQLBulkCopy today for the very first time. Previously I was passing large amounts of data using a DataTable as a stored proc parameter mapped to a user-defined table type, then using a Select Into From style syntax to bulk insert data from another source using C#.Net.

  6. mike2023910 says:

    Why not use this for every SQL insert then?

  7. EZT says:

    Finally, a quick and easy explanation of SQL Bulk insert that covers the basics. Thank-you

  8. Julian says:

    Hi Mosh

    How would I do this for all the tables in the database? I have over a thousand tables, and I need to move the data out to another clean database as the current one has some IO consistency errors.

  9. Patrick Johnson says:

    using System.Data.SqlClient;

  10. Narendra Kumar says:

    What about when my columns vary? I need a service method which accepts any DataTable and inserts it each time with dynamic columns, which may or may not be present.

  11. Shamshad Ali says:

    Is there a similar method to get data out from SQL to generate .csv files, and get the status like:

    sqlBulk.SqlRowsCopied += (sender, eventArgs) => Console.WriteLine("Wrote " + eventArgs.RowsCopied + " records.");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
    }

    We are using different db environments [Oracle, SQL Server, DB2, MySQL] on multiple servers. I want to export data from all these environments on some defined schedule to a target destination as .csv files, and get the status as the above code explains.

    How do we do that ?

  12. Richard LLL says:

    Thanks for this tutorial. It’s clear, concise and very helpful

  13. Dave Ocean says:

    Thanks Mosh,

    Came looking for a way of handling the identity columns – just needed the KeepIdentity option.

    Apart from using tables as the source of the bulk copy, try using a data reader. I use it all the time for memory-resident data structures. When tables in memory are too slow (even indexed) and you need faster access and processing of data, you end up with custom structures. Finally, put these out via bulk copy and a data reader that reads through the data structure. Tens of millions of records can be pushed out very quickly with this method.

  14. Fish says:

    Another question: where there are two or more tables within one SqlBulkCopy block, how do you set BatchSize when the tables have different schemas?

  15. Meir Krudo says:

    Thank you Mosh!!!
    as always great & clear explanation!
    you just saved my client 20 minutes…. 🙂

  16. Ali says:

    Thank you MOSH, your Bulk Insert method saved my life 🙂

  17. Aquim Elifá says:

    I do it with VB.Net. That’s the best; I populated 5,000 rows (10 columns) very fast.

  18. Pawan Nayak says:

    I am a VB.Net developer (new). While searching for VB.Net SQL bulk copy options, this article showed up. I don’t understand what “SqlBulkCopy(_connectionString)” means. Is it a connection string parameter to the database or a language implementation? What’s the equivalent in VB.Net? I hope you have written something similar for VB.Net developers as well.

  19. Ashwini says:

    Thanks Mosh for this post

  20. Ahsan Fraz says:

    Great Mosh! You are really a mentor, not only to me but to millions of programmers who earn a living because you play a massive role in it. I also wrote a program for importing an Excel file. I used a DataReader and INSERT in a while loop; I didn’t use a DataTable because when I tried to fill it, a memory exception was thrown, so I couldn’t fetch half a million rows with 80 columns into a DataTable. Second, and most important, from an Oracle perspective bulk copy belongs to the non-managed data access client, so it is an issue when I use managed data access. Also, how do I validate whether the records I want to enter already exist? And if I don’t want to use an auto-generated ID from the back end but want to generate a sequence instead, how could we use bulk copy in that scenario?

  21. Kevin Knaus says:

    Bulk copy is rather tedious when you have to define all the column mappings for an entire set of columns in a large table, among many large tables. I expected the Add option to allow a list of source columns and a list of target columns that would then be positionally mapped, but it appears you have to add each column to the mapping one at a time. The example here implied that only one mapping, to pick up the column that was named differently (FullName to Name), is required, and that the other columns of the Employee table would come along nicely. But wouldn’t your example transfer just one column from source to target?
    To make this sort of script useful, you would pass a list of tables and have it derive which columns can be in the selection and insert (excluding identities or temporal columns that you don’t want to insert values for, as that insert will fail if triggers are enabled).

    Is there a syntax that allows the whole column lists for source and target to be specified in one statement using two variables for the column lists?
    Or can I put more than one column pair in Add()?
    Thanks

  22. pururaba says:

    Hello Mosh! It’s a well explained article. But I have to ask you about the exception handling for this bulk copy. As far as I know, you can’t log row-wise exceptions with it. If there is any idea, I am eagerly waiting to read it. Thanks
