
Using SqlBulkCopy for fast inserts

Problem

You need to insert a large number of records into one or more tables in a SQL Server database. By large, I mean several hundred thousand records, or even millions. The source of the data can be another database, an Excel spreadsheet, a CSV file, an XML file, or practically anything. Writing one record at a time using a SqlCommand with an INSERT INTO statement is costly and slow. You need an efficient way to insert a large number of records quickly.

Solution

In .NET, we have a class called SqlBulkCopy to achieve this. You give it a DataTable, an array of DataRows, or an instance of a class that implements the IDataReader interface. Next, you tell it the name of the destination table in the database, and let it do its magic.

I used this in one of my projects where I had to migrate data from one of our legacy models into a new model. First, I used a plain SqlCommand to insert one record at a time (about 400,000 records in total). The process took 12 minutes to run. With SqlBulkCopy, I reduced the migration time to 6 seconds!

How to use it?

Here is the most basic way to use SqlBulkCopy. Read the code first and then I’ll explain it line by line.

var dt = new DataTable();
dt.Columns.Add("EmployeeID", typeof(int));
dt.Columns.Add("Name", typeof(string));

for (var i = 1; i <= 1000000; i++)
    dt.Rows.Add(i, "Name " + i);

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

In this example I’m assuming we have a table in the database called Employees with two columns: EmployeeID and Name. I’m also assuming that the EmployeeID column is marked as IDENTITY.

In the first part, we simply create a DataTable that resembles the structure of the target table. That’s the reason this DataTable has two columns. So, to keep things simple, I’m assuming that the source DataTable and the target table in the database have identical schema. Later in this post I’ll show you how to use mappings if your source DataTable has a different schema.

The second part is purely for demonstration: we populate this DataTable with a million records. In your project, you'd get the data from somewhere and put it into a DataTable. Or you might use an IDataReader for more efficient reads.
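For example, if your source is another SQL Server database, you can stream rows from a SqlDataReader (which implements IDataReader) straight into SqlBulkCopy, without buffering everything in a DataTable first. This is just a sketch; the LegacyEmployees table and the _sourceConnectionString are assumptions for illustration:

```
// Stream rows from a source query directly into the destination table.
// WriteToServer consumes the reader row by row, so the full result set
// never has to be held in memory at once.
using (var source = new SqlConnection(_sourceConnectionString))
using (var command = new SqlCommand("SELECT EmployeeID, Name FROM LegacyEmployees", source))
{
    source.Open();

    using (var reader = command.ExecuteReader())
    using (var sqlBulk = new SqlBulkCopy(_connectionString))
    {
        sqlBulk.DestinationTableName = "Employees";
        sqlBulk.WriteToServer(reader);
    }
}
```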

And finally, we create a SqlBulkCopy and use it to write the content of our DataTable to the Employees table in the database.

Pretty simple, right? Let’s take this to the next level.

Using the identity values from the source

In the above example, I assumed that the EmployeeID column is marked as IDENTITY, hence the values are generated by the Employees table. What if you need to use the identity values in the source? It’s pretty simple. You need to use the KeepIdentity option when instantiating your SqlBulkCopy.

using (var sqlBulk = new SqlBulkCopy(_connectionString, SqlBulkCopyOptions.KeepIdentity))

With this option, the EmployeeID in our DataTable will be used.

Transactions

You can wrap all your inserts in a transaction, so either all will succeed or all will fail. This way you won’t leave your database in an inconsistent state. To use a transaction, you need to use a different constructor of SqlBulkCopy that takes a SqlConnection, options (as above) and a SqlTransaction object.

using (var connection = new SqlConnection(_connectionString))
{
    connection.Open();

    using (var transaction = connection.BeginTransaction())
    {
        using (var sqlBulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction))
        {
            sqlBulk.DestinationTableName = "Employees";
            sqlBulk.WriteToServer(dt);
        }

        // Don't forget to commit; an uncommitted transaction is rolled
        // back when it is disposed, and nothing gets persisted.
        transaction.Commit();
    }
}

Note that here we have to explicitly create a SqlConnection object in order to create a SqlTransaction. That’s why this example is slightly more complicated than the previous ones where we simply passed a string (connection string) to SqlBulkCopy. So here we need to manually create the connection, open it, create a transaction, and then pass the connection and the transaction objects to our SqlBulkCopy.
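If an exception is thrown partway through the copy, you will usually want to roll back explicitly and rethrow. Here is one way to sketch that, using the same dt and _connectionString as in the examples above:

```
using (var connection = new SqlConnection(_connectionString))
{
    connection.Open();

    using (var transaction = connection.BeginTransaction())
    {
        try
        {
            using (var sqlBulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction))
            {
                sqlBulk.DestinationTableName = "Employees";
                sqlBulk.WriteToServer(dt);
            }

            transaction.Commit();
        }
        catch
        {
            // Undo any batches that already reached the server.
            transaction.Rollback();
            throw;
        }
    }
}
```

Disposing an uncommitted SqlTransaction also rolls it back, so the explicit Rollback here is mainly about making the intent obvious.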

Batch size

By default, all the records in the source are written to the target table in one batch. This means that as the number of records in the source grows, so does the memory consumed by SqlBulkCopy. If you have memory limitations, you can reduce the number of records written in each batch. This way, SqlBulkCopy writes smaller batches to the database and consumes less memory. Since this means more round trips to the database, it can have a negative impact on performance. So, depending on your circumstances, you may need to try a few different batch sizes and find a number that works for you.

To set the batch size:

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.BatchSize = 5000;
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

This example writes 5000 records in each batch.

Notifications

You might need to write a message to the console or a log as records are written to the database. SqlBulkCopy supports notifications: it has an event you can subscribe to, and you set the number of records to be processed before each notification event is raised.

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.NotifyAfter = 1000;
    sqlBulk.SqlRowsCopied += (sender, eventArgs) => Console.WriteLine("Wrote " + eventArgs.RowsCopied + " records.");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

In this example, once every 1000 records are processed, we get notified and display a message on the Console. Note that SqlRowsCopied is the event name and here we use a lambda expression to create an anonymous method as the event handler. If you’re not familiar with lambda expressions, delegates and event handlers, check out my C# Advanced course.

Column mappings

In all the examples so far, I assumed our DataTable has the exact same schema as the target table. What if the "Name" column in the source is actually called "FullName"? Here is how we can create a mapping between the columns in the source and the target table.

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    // Note: once the ColumnMappings collection is non-empty, only mapped
    // columns are copied, so map every column you want written.
    sqlBulk.ColumnMappings.Add("FullName", "Name");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

How about multiple tables?

So far we've only inserted records into one table: the Employees table. What if we wanted to populate Employees and their Timesheets? With DataTables as the data source, we can have two of them: one for Employees, one for Timesheets. Then we use our SqlBulkCopy to populate one table at a time:

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dtEmployees);

    sqlBulk.DestinationTableName = "Timesheets";
    sqlBulk.WriteToServer(dtTimesheets);
}

 

I hope you enjoyed this post and learned something new. If you enjoy my teaching style and like to learn more from me, subscribe to my blog. Also, check out my courses for more substantial learning.

 


Do Microsoft certificates make you a better developer?

I was having a chat with one of my students on Udemy. He mentioned that he took one of the Microsoft exams but didn't pass. He had heard from a friend that many of the questions were tricky and not indicative of how good a programmer one is.

I can't agree more! Ten years ago, it was a big deal to pass Microsoft exams. It was a fairly new concept and, as a 20-something-year-old, I was so excited to get all these certificates! I passed many exams and became a Microsoft Certified Technology Specialist (MCTS), Application Developer (MCAD) and Professional (MCP). But do Microsoft certificates really make one a good programmer?

First, we need to define what a good programmer is. Is there an official definition of a "good programmer"? Scott Hanselman has a list of what makes a good .NET developer. But with all due respect to Scott, that is purely his personal opinion. This is very subjective, and if you asked ten developers the same question, each would probably come up with a different list of criteria! Many of the criteria in his list really depend on the kind of applications a programmer has been exposed to.

But what is a good programmer? In my humble opinion:

  • A good programmer spends enough time to really understand the problem he is solving. Einstein said: "If I'm given 60 minutes to solve a problem, I'll spend 55 minutes understanding it and 5 minutes solving it." I can't agree more! In my 14-year programming career, I've seen several examples of programmers solving the wrong problem. Sometimes the problem they themselves liked to solve, not the problem the business wanted them to solve! I've seen developers debating for months whether they should use SOA, BPMS, REST, RPC, etc., when no one really knew what the actual business problem was!
  • A good programmer solves a problem with a simple elegant solution without over-engineering. The old saying: keep it simple, stupid! Leonardo Da Vinci said: “Simplicity is the ultimate sophistication.”
  • A good programmer does enough design without modelling the entire universe. It’s easy to spend several months drawing 100+ pages of UML diagrams but not producing working software!
  • A good programmer writes clean, maintainable code that tells a story. That means small (10 lines or fewer), well-named methods, each responsible for one thing. Small classes with high cohesion and loose coupling. And no hacks!
  • A good programmer cares about the long-term maintainability of software rather than a quick fix. Sometimes we just want a quick fix, but all these quick fixes and patches in the long term lead to rotten software, the kind that needs to be re-written from scratch. Isn't it better to spend a bit more time coming up with a clean, elegant solution than having to re-write the entire application?
  • A good programmer writes tests before writing production code. This one is subjective and you may disagree with me. Some hate TDD, some love it. When I first learned about TDD, I loved the philosophy of it. Then I tried to apply it and didn't have much success, so I hated it. A while later, I tried again and got the hang of it. Then I fell in love with it. These days, I'm neither a TDD evangelist nor anti-TDD; I take the middle ground. TDD is great for two reasons. First, you write just enough code to make sure the actual problem is solved: nothing more, nothing less. Second, you end up with tests that don't lie. If you write tests after writing the production code, it's possible that your tests pass while the production code is not working. These tests lie! They're not trustworthy, and it's better to delete them than to have written them in the first place. TDD helps you write tests that tell the truth at all times! But is TDD good at all times? I personally prefer to use TDD as much as I can, but if I find situations where it affects my productivity and wastes my time, I'm happy to break the rules.
  • A good programmer thinks of edge cases: The code he writes doesn’t blow up in rare circumstances.

So, that is my personal definition of what makes a good programmer. Being a good programmer shouldn't be tied to .NET or any specific platform. Good programmers can write good code on any platform. There are a million little details in .NET that you may need to know (if ever) only once in a blue moon! The same goes for all the other platforms out there. And interestingly, we're getting more programming languages, frameworks and libraries every single day. Who can know all these languages in full detail these days anyway?


Hierarchical Views with Backbone

One of the common questions amongst developers starting with Backbone.js is: how should I implement hierarchical (master-detail / parent-child) views with Backbone? So, in this post I'm going to present a clean solution to get you started.

If you want to jump to the final solution, here is the JSBin link:

http://jsbin.com/sozaxejoji/1/edit?js

If you prefer a step-by-step approach, it's still best to have a quick look at the link above to get an idea of what we'll be building.

Ok now, let’s get started.

Step 1: Add the containers in the HTML

We need two separate containers in our HTML: one for the master view, the other for the child view.

<div id="container">
  <div id="masters" class="region"></div>
  <div id="detail" class="region"></div>
</div>

We use the “masters” container to render a list of master items. When the user clicks on a master, the “detail” container will be refreshed showing the details for that master.

Step 2: Add a bit of CSS

To make our containers more visible, let’s add the following CSS:

.region {
  width: 50%;
  float: left;
  border: 1px solid #ccc;
  box-sizing: border-box;
  padding: 30px;
}

Step 3: Render the list of masters

To render the list of master items, we need a model, a view for rendering each model, a collection and a view to render the collection.

First, let’s start by creating a namespace for our app.

var App = App || {};

From this point on, any model, collection or view we define will hang off this App object rather than the global window object.

Now, let’s define a model and a view for our masters:

App.Master = Backbone.Model.extend();

App.MasterView = Backbone.View.extend({
  render: function(){
    this.$el.html("<a href='#'>" + this.model.get("name") + "</a>");
    
    return this;
  }
});

Next, to render a list of masters, we need a collection and a collection view:

App.Masters = Backbone.Collection.extend({
  model: App.Master
});

App.MastersView = Backbone.View.extend({
  render: function(){
    this.collection.each(function(p){
      var masterView = new App.MasterView({ model: p });
      this.$el.append(masterView.render().$el);
    }, this);
    
    return this; 
  }
});

Nothing fancy yet. In the MastersView, we simply iterate over the collection, wrap each model in a MasterView, render the view and append it to the collection view's DOM element.

Finally, to wire all these together:

$(document).ready(function(){
  var masters = new App.Masters([
    new App.Master({ name: "Master 1" }),
    new App.Master({ name: "Master 2" })
  ]);
  
  var mastersView = new App.MastersView({ collection: masters });
  $("#masters").html(mastersView.render().$el); 
});

So at this point we should be able to see a list of master items in the container on the left.

Step 4: Handle the click event of masters

Now, we’re going to take our code one step further and handle the click event of master items. At this point we don’t care about showing the details in the detail container. All we want to do is to handle the click event and make sure the plumbing code is working. Remember: baby steps, always!

So, we change our MasterView to the following:

App.MasterView = Backbone.View.extend({
  events: {
    "click": "onClick"
  },
  
  onClick: function(){
     alert(this.model.get("name"));
  },

  render: function(){
    this.$el.html("<a href='#'>" + this.model.get("name") + "</a>");
    
    return this;
  }
});

Note that all I did here was add the events hash and the onClick handler. Run the application and make sure that when you click a master item, you see the alert.

All good? Ok, let’s go to the next step.

Step 5: Display the details when a master is clicked

We're almost there! All we need to do now is to display the details in the detail container, instead of showing an alert. So how can we do this? We use events. Here is the idea: the master view publishes an event to which the detail view listens. This is how various views can collaborate without being coupled to each other.

To get this to work we need an event aggregator or event bus. It’s best to define it right after declaring the App namespace.

App.eventBus = _.extend({}, Backbone.Events);

The eventBus object here extends Backbone.Events, which provides the base functionality for publishing and subscribing to events.

Now, we need to change our MasterView to publish an event instead of displaying an alert:

  onClick: function(){
    App.eventBus.trigger("master:select", this.model);
  }

The first argument to the trigger method is the name of the event. You can use any name you would like, but it’s a good idea to use namespaces (note the colon). In this case, “master:select” simply means master is selected.

The second argument to this method is the data we would like to publish. The subscriber will then receive this data with the event.

Now, who is the subscriber? The detail view, which we haven’t implemented yet. So here is our DetailView:

App.DetailView = Backbone.View.extend({
  initialize: function(){
    App.eventBus.on("master:select", this.onMasterSelected, this);
  },
  
  onMasterSelected: function(master){
    this.model = master;
    this.render();
  },
  
  render: function(){
    if (!this.model) {
      this.$el.html("Please select an item from the master list.");
    } else {
      this.$el.html(this.model.get("name"));
    }
     
    return this;
  }
});

Let's see what is happening here. In the initialize method, we use the on() method to subscribe to the "master:select" event. The second argument to this method is our event handler (onMasterSelected in this case). The third argument is the context: by default, when the handler is called, "this" refers to something else, but we want "this" inside the handler to refer to the view itself. That's why we pass "this" (a reference to the view) as the third argument.

The onMasterSelected method has an argument, which is the data published with the event. In this case, it's a master model, so we keep a reference to it in the view's model property.

Finally, in the render() method we have a simple conditional statement to display a default message if no master is selected, or the name of the currently selected master. Rather than just displaying the name, you may want to display a whole heap of data. Or you may want to fetch another model or collection based on that and populate the detail view. It’s up to you and the project you’re working on.

One last step: we defined the DetailView but we never rendered it! So, in our $(document).ready(), after rendering the masters, we need to render the DetailView as well:

  var detailView = new App.DetailView();
  $("#detail").html(detailView.render().$el);
});

And that brings us to the end of this post! You can see the completed solution at:

http://jsbin.com/sozaxejoji/1/edit?js


Tell, Don’t Ask, the Pragmatic Way

Tell, Don’t Ask is a design guideline that helps us adhere to the encapsulation principle of object-oriented programming: data and functions that operate on the data belong to the same class. Dismissing this guideline often leads to an anaemic domain model like the following:

public class Account
{
    public int Id { get; set; }
    public float Balance { get; set; }
}

And the domain logic appearing in a service class like here:

public class AccountService
{
    public void Withdraw(int accountId, float amount)
    {
        var account = _repository.GetById(accountId);

        if (account.Balance < amount)
            throw new InvalidOperationException();

        account.Balance -= amount;

        _repository.SaveChanges();
    }
}

While this is not a serious problem in this example, in the real world such service classes often quickly become fat, unmaintainable and hard to unit test. You'll also often find the same logic appearing in various methods of a service class, or even across different service classes.

In this example, checking the balance of an account against the given amount and updating the balance is the responsibility of the Account class itself, not AccountService.

Tell, Don't Ask suggests: tell your objects what to do; don't ask them questions. With Tell, Don't Ask, this example can be modified to something like the following:

public class Account
{
    public int Id { get; set; }
    private float _balance;

    public void Withdraw(float amount)
    {
        if (_balance < amount)
            throw new InvalidOperationException();

        _balance -= amount;
    }
}

public class AccountService
{
    public void Withdraw(int accountId, float amount)
    {
        var account = _repository.GetById(accountId);
        account.Withdraw(amount);
        _repository.SaveChanges();
    }
}

Note that I've made the balance a private field of the Account class (as opposed to a public property with a getter and setter). So, as you can see, with Tell, Don't Ask our objects become more about "behaviors" rather than property bags. Our service layer also becomes slimmer. It's purely responsible for orchestration, which is the actual responsibility of a service: it fetches a domain object from the repository, invokes an operation on it, and then persists it.

While this principle looks great on the surface, there are situations where it becomes impractical. For example, if you’re using Entity Framework as your O/RM, you should probably know that Entity Framework cannot map a private field to a column in the database. Even if you’re not using Entity Framework and you manually persist or hydrate your domain objects from the database, you’ll still have difficulty working with a private field in this case.

So, here is one pragmatic way to put Tell, Don’t Ask into practice:

Having a public property with a getter is not necessarily a bad thing. Sometimes (at least for display purposes) we certainly need to get the value of a field and display it. It is the public setter that is the problem. When we expose a public setter, the client of that class can set any value, and this causes two issues:

1- The client (e.g. the service class) becomes responsible for enforcing the business logic.

2- If the client forgets to check the business rules, the object goes into an invalid state and will eventually get persisted. This is how we create bugs in our programs!

So, with a public property that has a public getter and a private setter, we can prevent these two issues, yet keep the flexibility of using an O/RM. We still adhere to Tell, Don't Ask (at least partially) because we tell our object what to do.

Our code will eventually look like this:

public class Account
{
    public int Id { get; set; }
    public float Balance { get; private set; }

    public void Withdraw(float amount)
    {
        if (Balance < amount)
            throw new InvalidOperationException();

        Balance -= amount;
    }
}

public class AccountService
{
    public void Withdraw(int accountId, float amount)
    {
        var account = _repository.GetById(accountId);
        account.Withdraw(amount);
        _repository.SaveChanges();
    }
}
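One payoff of pushing the behaviour into Account is that the business rule can now be unit tested without a repository or a database. Here is a minimal sketch against the Account class above; the use of a Main method with a plain console check (rather than a test framework) is my own simplification:

```
using System;

public class AccountTests
{
    public static void Main()
    {
        // A fresh account has a balance of zero.
        var account = new Account();

        // Withdrawing more than the balance should throw.
        try
        {
            account.Withdraw(100);
            Console.WriteLine("FAIL: expected InvalidOperationException");
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("PASS: overdraw rejected");
        }
    }
}
```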

Hello world!

Yes, a true Hello World! As much as this title sounds cliché, it's so true of me. For years, I've wanted to have a blog, but something always got in the way. Finally, I've made the commitment to keep this space updated on a regular basis.

I'm gonna keep my posts short and pragmatic, with useful content.

Hope you enjoy this blog! :)