Comments

Critical stuff that every junior C# developer must know

A common question I often get from my students at Udemy is:

Mosh, I just got my first junior level C# job. What advice do you have for me? What are some critical stuff I need to learn?

So, whether you’re looking for your first junior C# job, or you just got one, this post will give you an overview of the kind of skills that you need to be familiar with as a junior C# developer. I’ve tried to put it in a “learning path” that would give you direction, whether you want to build web or desktop applications.

Before getting into details, I need to clarify something: as a junior, you’re not expected to know everything! No one does, even many senior developers! The world of programming is so big and it’s constantly getting bigger. So, every developer has strengths in some specific areas based on the projects they have worked on.

For each skill, I’ve added one or more links to good resources I have found. If you know better resources, please let me know and I’ll update the post.

Core Skills

Whether you want to focus on building desktop or windows apps, here are a few key things that you must know.

path2.001

Data Structures and Algorithms

If you don’t have a computer science degree, I strongly recommend you to spend only one month and study data structures and algorithms. These are the alphabets of programming. Sure you can skip this and jump straight into web development stuff, but trust me, there is a difference between a programmer who has been exposed to data structures and algorithms and one who hasn’t. This stuff help you think like a programmer.

You may be surprised that most big companies like Microsoft, Apple and Amazon dedicate a significant part (if not all) of their technical interviews to data structures and algorithms, not ASP.NET 5 or WPF! Because they just want to see if you can think like a programmer or not.

This is a good book to get you started:
Data Structures and Algorithms Made Easy

If you get stuck on some parts, don’t get discouraged. Move on. Just make sure you understand the basics of lists, stacks, queues, trees and hashtables. Try to implement each of these data structures using plain C# (arrays, loops, conditionals) without using LINQ or .NET collections. Implement a couple of search and sort algorithms.

Databases

SQL Server is the most commonly used Relational Database Management System (DBMS) amongst .NET developers. Make sure you’re familiar with the basics of relational databases and how to create tables, views and stored procedures in SQL Server.

T-SQL is the query language we use to query or modify data in a SQL Server database. Make sure you know your SELECT, INSERT, UPDATE, DELETE, JOIN and GROUP BY.

Zero to Hero with Microsoft SQL Server 2014
T-SQL Step by Step

O/RMs

When using a relational database, we often use an Object/Relational Mapper (O/RM) to save or load objects in a database. There are many O/RMs out there including Entity Framework, nHibernate, Dapper, PetaPoco, etc, but Entity Framework is the most commonly used amongst many teams.

Getting Started with Entity Framework 7

I also have a comprehensive 6-hour Entity Framework course on Udemy.

For Web Development

Building web applications is fundamentally different from building desktop applications. A web application at a minimum includes two parts: one that runs in the user’s browser (front-end), and one that runs on the server (back-end). As you view web pages in your browser, click on buttons and links, a request is sent from your browser to the server. The request is processed on the server, some data fetched from or written to the database and results are returned to your browser.

Web developers are often classified in three groups:

  • Front-end developers
  • Back-end developers
  • Full-stack developers: those who do both the front-end and the back-end

You should choose one of these paths depending on your interests. Full-stack developers often have more job opportunities because they can do both the front-end and the back-end.

As a front-end developer, you need to be familiar with basics of HTML, CSS and Javascript at a minimum.

HTML is the markup language we use to build structure for a web page. Unlike programming languages like C#, it doesn’t have logic.With the structure in place, we use CSS to make the page beautiful. CSS is all about styles (colors, padding, borders, fonts, etc). And finally we use Javascript to add behaviour to a webpage: what happens when you click a button or drag-and-drop an element.

HTML & CSS for Beginners
Learn to Code HTML & CSS
HTML5 & CSS Fundamentals on Channel9
Javascript on Code Academy

path3.001

ASP.NET MVC is the dominant framework (amongst C# developers) for building web applications on the server. As an ASP.NET MVC developer, you should still have some basic familiarity with HTML, CSS and Javascript. So, I’d suggest you to start with front-end development and then move to back-end development, which would make you a full-stack developer.

Here is a comprehensive tutorial I’ve published on Udemy blog to get you started. I’ll show you how to build a simple application with CRUD operations using ASP.NET MVC5 and Entity Framework6:

A Step-by-Step ASP.NET MVC Tutorial for Beginners

For Desktop Development

If you want to build desktop applications for Windows, you need a different set of skills than HTML, CSS and Javascript. Even though some are working on using HTML, CSS and Javascript to build desktop applications, it’s still new and 99% of the jobs out there require you to know XAML, WPF or Windows Forms.

 

path3.002

WPF: A Beginner’s Guide

 

If you’re a junior C# developer and have a question, drop a comment below and I’ll do my best to guide you in the right direction. If you’re an experienced C# developer and think I missed something to include in this post, or you know better resources for any of these topics, please let me know. I’ll update the post.

 

Tags: , ,
Comments

5 C# Collections that Every C# Developer Must Know

Finding the right collection in .NET is like finding the right camera in a camera shop! There are so many options to choose from, and each is strong in certain scenarios and weak in others. If looking for a collection in .NET has left you confused, you’re not alone.

In this post, which is the first in the series on .NET collections, I’m going to cover 5 essential collection types that every C# developer must know. These are the collections that you’ll use 80 – 90% of the time, if not more. In the future posts in this series, I’ll be covering other collection types that are used in special cases, where performance and concurrency are critical.

So, in this post, I’m going to explore the following collection types. For each type, I’ll explain what it is, when to use and how to use it.

  • List
  • Dictionary
  • HashSet
  • Stack
  • Queue

List<T>

Represents a list of objects that can be accessed by an index. <T> here means this is a generic list. If you’re not familiar with generics, check out my YouTube video.

Unlike arrays that are fixed in size, lists can grow in size dynamically. That’s why they’re also called dynamic arrays or vectors. Internally, a list uses an array for storage. If it becomes full, it’ll create a new larger array, and will copy items from the existing array into the new one.

These days, it’s common to use lists instead of arrays, even if you’re working with a fixed set of items.

To create a list:

var list = new List<int>();

If you plan to store large number of objects in a list, you can reduce the cost of reallocations of the internal array by setting an initial size:

// Creating a list with an initial size
var list = new List<int>(10000);

Here are some useful operations with lists:

// Add an item at the end of the list
list.Add(4);

// Add an item at index 0
list.Insert(4, 0);

// Remove an item from list
list.Remove(1);

// Remove the item at index 0
list.RemoveAt(0);

// Return the item at index 0
var first = list[0];

// Return the index of an item
var index = list.IndexOf(4);

// Check to see if the list contains an item
var contains = list.Contains(4);

// Return the number of items in the list 
var count = list.Count;

// Iterate over all objects in a list
foreach (var item in list)
    Console.WriteLine(item);

Now, let’s see where a list performs well and where it doesn’t.

Adding/Removing Items at the Beginning or Middle

If you add/remove an item at the beginning or middle of a list, it needs to shift one or more items in its internal array. In the worst case scenario, if you add/remove an item at the very beginning of a list, it needs to shift all existing items. The larger the list, the more costly this operation is going to be. We specify the cost of this operation using Big O notation: O(n), which simply means the cost increases linearly in direct proportion to the size of the input. So, as n grows, the execution time of the algorithm increases in direct proportion to n.

Adding/Removing Items at the End

Adding/removing an item at the end of a list is a relatively fast operation and does not depend on the size of the list. The existing items do not have to be shifted. This is why the cost of this operation is relatively constant and is not dependent on the number of items in the list. We represent the execution cost of this operation with Big O notation: O(1). So, 1 here means constant.

Searching for an Item

When using methods that involve searching for an item(e.g. IndexOf, Contains and Find), List performs a linear search. This means, it iterates over all items in its internal array and if it finds a match, it returns it. In the worst case scenario, if this item is at the end of the list, all items in the list need to be scanned before finding the match. Again, this is another example of O(n), where the cost of finding a match is linear and in direct proportion with the number of elements in the list.

Accessing an Item by an Index

This is what lists are good at. You can use an index to get an item in a list and no matter how big the list is, the cost of accessing an item by index remains relatively constant, hence O(1).

List in a Nutshell

So, adding/removing items at the end of a list and accessing items by index are fast and efficient operations with O(1). Searching for an item in a list involves a linear search and in the worst case scenario is O(n). If you need to search for items based on some criteria, and not an index (e.g. customer with ID 1234), you may better use a Dictionary.

 

Dictionary<TKey, TValue>

Dictionary is a collection type that is useful when you need fast lookups by keys. For example, imagine you have a list of customers and as part of a task, you need to quickly look up a customer by their ID (or some other unique identifier, which we call key). With a list, looking up a customer involves a linear search and the cost of this operation, as you learned earlier, is O(n) in the worst case scenario. With a dictionary, however, look ups are very fast with O(1), which means no matter how large the dictionary is, the look up time remans relatively constant.

When storing or retrieving an object in a dictionary, you need to supply a key. The key is a value that uniquely identifies an object and cannot be null. For example, to store a Customer in a Dictionary, you can use CustomerID as the key.

To create a dictionary, first you need to specify the type of keys and values:

var dictionary = new Dictionary<int, Customer>();

Here, our dictionary uses int keys and Customer values. So, you can store a Customer object in this dictionary as follows:

dictionary.Add(customer.Id, customer);

You can also add objects to a dictionary during initialization:

var dictionary = new Dictionary<int, Customer>
{
     { customer1.Id, customer1 },
     { customer2.Id, customer2 }
}

Later, you can look up customers by their IDs very quickly:

// Return the customer with ID 1234 
var customer = dictionary[1234];

You can remove an object by its key or remove all objects using the Clear method:

// Removing an object by its key
dictionary.Remove(1);

// Removing all objects
dictionary.Clear();

And here are some other useful methods available in the Dictionary class:

var count = dictionary.Count; 

var containsKey = dictionary.ContainsKey(1);

var containsValue = dictionary.ContainsValue(customer1);

// Iterate over keys 
foreach (var key in dictionary.Keys)
     Console.WriteLine(dictionary[key]);

// Iterate over values
foreach (var value in dictionary.Values)
     Console.WriteLine(value);

// Iterate over dictionary
foreach (var keyValuePair in dictionary)
{
     Console.WriteLine(keyValuePair.Key);
     Console.WriteLine(keyValuePair.Value);
}

So, why are dictionary look ups so fast? A dictionary internally stores objects in an array, but unlike a list, where objects are added at the end of the array (or at the provided index), the index is calculated using a hash function. So, when we store an object in a dictionary, it’ll call the GetHashCode method on the key of the object to calculate the hash. The hash is then adjusted to the size of the array to calculate the index into the array to store the object. Later, when we lookup an object by its key, GetHashCode method is used again to calculate the hash and the index. As you learned earlier, looking up an object by index in an array is a fast operation with O(1). So, unlike lists, looking up an object in a dictionary does not require  scanning every object and no matter how large the dictionary is, it’ll remain extremely fast.

So, in the following figure, when we store this object in a dictionary, the GetHashCode method on the key is called. Let’s assume it returns 1234. This hash value is then adjusted based on the size of the internal array. In this figure, length of the internal array is 6. So, the remainder of the division of 1234 by 6 is used to calculate the index (in this case 4). Later, when we need to look up this object, its key used again to calculate the index.

Hashtable in C#

Now, this was a simplified explanation of how hashing works. There is more involved in calculation of hashes, but you don’t really need to know the exact details at this stage (unless for personal interests). All you need to know as a C# developer is that dictionaries are hash-based collections and for that reason lookups are very fast.

 

HashSet<T>

A HashSet represents a set of unique items, just like a mathematical set (e.g. { 1, 2, 3 }). A set cannot contain duplicates and the order of items is not relevant. So, both { 1, 2, 3 } and { 3, 2, 1 } are equal.

Use a HashSet when you need super fast lookups against a unique list of items. For example, you might be processing a list of orders, and for each order, you need to quickly check the supplier code from a list of valid supplier codes.

A HashSet, similar to a Dictionary, is a hash-based collection, so look ups are very fast with O(1). But unlike a dictionary, it doesn’t store key/value pairs; it only stores values. So, every objects should be unique and this is determined by the value returned from the GetHashCode method. So, if you’re going to store custom types in a set, you need to override GetHashCode and Equals methods in your type.

To create a HashSet:

var hashSet = new HashSet<int>();

You can add/remove objects to a HashSet similar to a List:

// Initialize the set using object initialization syntax 
var hashSet = new HashSet<int>() { 1, 2, 3 };

// Add an object to the set
hashSet.Add(4);

// Remove an object 
hashSet.Remove(3);

// Remove all objects 
hashSet.Clear();

// Check to see if the set contains an object 
var contains = hashSet.Contains(1);

// Return the number of objects in the set 
var count = hashSet.Count;

HashSet provides many mathematical set operations:

// Modify the set to include only the objects present in the set and the other set
hashSet.IntersectWith(another);

// Remove all objects in "another" set from "hashSet" 
hashSet.ExceptWith(another);

// Modify the set to include all objects included in itself, in "another" set, or both
hashSet.UnionWith(another);

var isSupersetOf = hashSet.IsSupersetOf(another);
var isSubsetOf = hashSet.IsSubsetOf(another);
var equals = hashSet.SetEquals(another);

 

Stack<T>

Stack is a collection type with Last-In-First-Out (LIFO) behaviour. We often use stacks in scenarios where we need to provide the user with a way to go back. Think of your browser. As you navigate to different web sites, these addresses that you visit are pushed on a stack. Then, when you click the back button, the item on the stack (which represents the current address in the browser) is popped and now we can get the last address you visited from the item on the stack. The undo feature in applications is implemented using a stack as well.

Here is how you can use a Stack in C#:

var stack = new Stack<string>();
            
// Push items in a stack
stack.Push("http://www.google.com");

// Check to see if the stack contains a given item 
var contains = stack.Contains("http://www.google.com");

// Remove and return the item on the top of the stack
var top = stack.Pop();

// Return the item on the top of the stack without removing it 
var top = stack.Peek();

// Get the number of items in stack 
var count = stack.Count;

// Remove all items from stack 
stack.Clear();

Internally, a stack is implemented using an array. Since arrays in C# have a fixed size, as you push items into a stack, it may need to increase its capacity by re-allocating a larger array and copying existing items into the new array. If re-allocation doesn’t need to happen, push is O(1) operation; otherwise, if re-allocation is required, assuming the stack has n elements, all these elements need to be copied to the new array. This leads to runtime complexity of O(n).

Pop is an O(1) operation.

Contains is a linear search operation with O(n).

 

Queue<T>

Queue represents a collection with First-In-First-Out (FIFO) behaviour. We use queues in situations where we need to process items as they arrive.

Three main operations on queue include:

  • Enqueue: adding an element to the end of a queue
  • Dequeue: removing the element at the front of the queue
  • Peek: inspecting the element at the front without removing it.

Here is how you can use a queue:

var queue = new Queue<string>();

// Add an item to the queue
queue.Enqueue("transaction1");

// Check to see if the queue contains a given item 
var contains = queue.Contains("transaction1");

// Remove and return the item on the front of the queue
var front = queue.Dequeue();

// Return the item on the front without removing it 
var top = queue.Peek();
            
// Remove all items from queue 
queue.Clear();

// Get the number of items in the queue
var count = queue.Count;

 

Summary

Lists are fast when you need to access an element by index, but searching for an item in a list is slow since it requires a linear search.

Dictionaries provide fast lookups by key. Keys should be unique and cannot be null.

HashSets are useful when you need fast lookups to see if an element exists in a set or not.

Stacks provide LIFO (Last-In-First-Out) behaviour and are useful when you need to provide the user with a way to go back.

Queues provide FIFO (First-In-First-Out) behaviour and are useful to process items in the order arrived.

 

Love your feedback!

If you enjoyed this post, please share it and leave a comment. If you got any questions, feel free to post them here. I’ll answer every question.

Tags: , ,
Comments

Using SqlBulkCopy for fast inserts

Problem

You need to insert large number of records into one or more tables in a SQL Server database. By large, I mean several hundred thousands or even millions. The source of the data can be another database, an Excel spreadsheet, CSV files, XML and literally anything. Writing one record at a time using a SqlCommand along with a INSERT INTO statement is very costly and slow. You need an efficient solution to insert large number of records quickly.

Solution

In .NET, we have a class called SqlBulkCopy to achieve this. You give it a DataTable, or an array of DataRows, or simply an instance of a class that implements IDataReader interface. Next, you tell it the name of the destination table in the database, and let it do its magic.

I used this in one of my projects were I had to migrate data from one of our legacy models into a new model. First, I used a plain SqlCommand to insert one record at a time (totally about 400,000 records). The process took 12 minutes to run. With SqlBulkCopy, I reduced the data migration time to 6 seconds!

How to use it?

Here is the most basic way to use SqlBulkCopy. Read the code first and then I’ll explain it line by line.

var dt = new DataTable();
dt.Columns.Add("EmployeeID");
dt.Columns.Add("Name"); 

for (var i = 1; i < 1000000; i++)    
    dt.Rows.Add(i + 1, "Name " + i + 1);

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

In this example I’m assuming we have a table in the database called Employees with two columns: EmployeeID and Name. I’m also assuming that the EmployeeID column is marked as IDENTITY.

In the first part, we simply create a DataTable that resembles the structure of the target table. That’s the reason this DataTable has two columns. So, to keep things simple, I’m assuming that the source DataTable and the target table in the database have identical schema. Later in this post I’ll show you how to use mappings if your source DataTable has a different schema.

The second part is purely for demonstration. We populate this DataTable with a million record. In your project, you get data from somewhere and put it into a DataTable. Or you might use an IDataReader for more efficient reads.

And finally, we create a SqlBulkCopy and use it to write the content of our DataTable to the Employees table in the database.

Pretty simple, right? Let’s take this to the next level.

Using the identity values from the source

In the above example, I assumed that the EmployeeID column is marked as IDENTITY, hence the values are generated by the Employees table. What if you need to use the identity values in the source? It’s pretty simple. You need to use the KeepIdentity option when instantiating your SqlBulkCopy.

using (var sqlBulk = new SqlBulkCopy(_connectionString, SqlBulkCopyOptions.KeepIdentity))

With this option, the EmployeeID in our DataTable will be used.

Transactions

You can wrap all your inserts in a transaction, so either all will succeed or all will fail. This way you won’t leave your database in an inconsistent state. To use a transaction, you need to use a different constructor of SqlBulkCopy that takes a SqlConnection, options (as above) and a SqlTransaction object.

using (var connection = new SqlConnection(_connectionString))
{
    connection.Open();

    var transaction = connection.BeginTransaction();

    using (var sqlBulk = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepIdentity, transaction))
    {
        sqlBulk.DestinationTableName = "Employees";
        sqlBulk.WriteToServer(dt);
    }
}

Note that here we have to explicitly create a SqlConnection object in order to create a SqlTransaction. That’s why this example is slightly more complicated than the previous ones where we simply passed a string (connection string) to SqlBulkCopy. So here we need to manually create the connection, open it, create a transaction, and then pass the connection and the transaction objects to our SqlBulkCopy.

Batch size

By default, all the records in the source will be written to the target table in one batch. This means, as the number of records in the source increases, the memory consumed by SqlBulkCopy will increase. If you have memory limitations, you can reduce the number of records written in each batch. This way, SqlBulkCopy will write smaller batches to the database, hence it will consume less memory. Since there are multiple conversations with the database, this will have a negative impact on the performance. So, based on your circumstances, you may need to try a few different batch sizes and find a number that works for you.

To set the batch size:

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.BatchSize = 5000;
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

This example writes 5000 records in each batch.

Notifications

You might need to write a message in the console or a log as records are written to the database. SqlBulkCopy supports notifications. So it has an event to which you can subscribe. You set the number of records to be processed before a notification event is generated.

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.NotifyAfter = 1000;
    sqlBulk.SqlRowsCopied += (sender, eventArgs) => Console.WriteLine("Wrote " + eventArgs.RowsCopied + " records.");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

In this example, once every 1000 records are processed, we get notified and display a message on the Console. Note that SqlRowsCopied is the event name and here we use a lambda expression to create an anonymous method as the event handler. If you’re not familiar with lambda expressions, delegates and event handlers, check out my C# Advanced course.

Column mappings

In all the examples so far, I assumed our DataTable has the exact same schema as the target table. What if the “Name” column in the source is actually called “FullName”. Here is how we can create a mapping between the columns in the source and target table.

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.ColumnMappings.Add("FullName", "Name");
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dt);
}

How about multiple tables?

So far we’ve only inserted records into one table: the Employees table. What if we wanted to populate Employees and their Timesheet? With our DataTables as the data source, we can have two data tables, one for Employees, one for Timesheets. Then, we use our SqlBulkCopy to populate one table at a time:

using (var sqlBulk = new SqlBulkCopy(_connectionString))
{
    sqlBulk.DestinationTableName = "Employees";
    sqlBulk.WriteToServer(dtEmployees);

    sqlBulk.DestinationTableName = "Timesheets";
    sqlBulk.WriteToServer(dtTimesheets);
}

 

I hope you enjoyed this post and learned something new. If you enjoy my teaching style and like to learn more from me, subscribe to my blog. Also, check out my courses for more substantial learning.

 

Tags: , , ,