IEnumerable and IEnumerator in C#

Many junior C# developers find the two IEnumerable and IEnumerator interfaces confusing. In fact, I was one of them when I first started learning C#! So, in this post, I’m going to explore these two interfaces in detail.

I’ll start by giving you a quick answer if you’re too busy to read the rest of the post, and then I’ll get into the details.

IEnumerable and IEnumerator in a Nutshell

IEnumerable and IEnumerator are an implementation of the iterator pattern in .NET. I’ll explain the iterator pattern and the problem it aims to solve in detail shortly. But if you’re looking for a quick, pragmatic tip, remember that when a class implements IEnumerable, it can be enumerated. This means you can use a foreach block to iterate over that type.

In C#, all collections (e.g. lists, dictionaries, stacks, queues, etc.) are enumerable because they implement the IEnumerable interface. So are strings. You can iterate over a string using a foreach block to get every character in the string.

Iterator Pattern

Consider the following implementation of a List class. (This is an over-simplified example and not a proper/full implementation of the List class).

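public class List
{
    public object[] Objects;
    private int _count; // number of items added so far

    public List()
    {
        Objects = new object[100];
    }

    public void Add(object obj)
    {
        // Arrays have no Count property, so we track the next free slot ourselves
        Objects[_count++] = obj;
    }
}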

The problem with this implementation is that the List class is exposing its internal structure (object[]) for storing data. This violates the information hiding principle of object-oriented programming. It gives the outside world intimate knowledge of the design of this class. If tomorrow we decide to replace the array with a binary search tree, all the code that directly references the Objects array needs to be modified.

So, objects should not expose their internal structure. This means we need to modify our List class and make the Objects array private:

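public class List
{
    private object[] _objects;
    private int _count; // number of items added so far

    public List()
    {
        _objects = new object[100];
    }

    public void Add(object obj)
    {
        // Arrays have no Count property, so we track the next free slot ourselves
        _objects[_count++] = obj;
    }
}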

Note that I renamed Objects to _objects because, by convention, private fields in C# are named using camel case prefixed with an underscore.

So, with this change, we’re hiding the internal structure of this class from the outside. But this leads to a new problem: how are we going to iterate over this list? We no longer have access to the Objects array, and we cannot use it in a loop.

That’s when the iterator pattern comes into the picture. It provides a mechanism to traverse an object irrespective of how it is internally represented.

The IEnumerable and IEnumerator interfaces in .NET are implementations of the iterator pattern. So, let’s see how these interfaces work, and how to implement them in our List class here.

The IEnumerable interface represents an object that can be enumerated, like the List class here. It has one method:

public interface IEnumerable
{
    IEnumerator GetEnumerator();
}

The GetEnumerator method here returns an IEnumerator object, which can be used to iterate (or enumerate) the given object. Here is the declaration of the IEnumerator interface:

public interface IEnumerator
{
    bool MoveNext();
    object Current { get; }
    void Reset();
}

With this, the client code can use the MoveNext() method to iterate the given object and use the Current property to access one element at a time. Here is an example:

var enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

Note that with this interface, the client of our class no longer knows about its internal structure. It doesn’t know if we have an array or a binary search tree or some other data structure in the List class. It simply calls GetEnumerator, receives an enumerator and uses that to enumerate the List. If we change the internal structure, this client code will not be affected whatsoever.

So, the iterator pattern provides a mechanism to iterate a class without being coupled to its internal structure.

Implementing IEnumerable and IEnumerator

So, now let’s see how we can implement the IEnumerable interface on our List class. First, we need to change our List class as follows:

public class List : IEnumerable
{
    private object[] _objects;
    private int _count; // number of items added so far

    public List()
    {
        _objects = new object[100];
    }

    public void Add(object obj)
    {
        _objects[_count++] = obj;
    }

    public IEnumerator GetEnumerator()
    {
        // To be implemented in the next step
    }
}

So I added the IEnumerable interface at the declaration of the class and also created the GetEnumerator method. This method should return an instance of a class that implements IEnumerator. So, we’re going to create a new class called ListEnumerator.

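public class List : IEnumerable
{
    private object[] _objects;
    private int _count; // number of items added so far

    public List()
    {
        _objects = new object[100];
    }

    public void Add(object obj)
    {
        _objects[_count++] = obj;
    }

    public IEnumerator GetEnumerator()
    {
        // The enumerator needs the array and the item count to do its job
        return new ListEnumerator(_objects, _count);
    }

    private class ListEnumerator : IEnumerator
    {
        // Constructor and IEnumerator members are implemented below
    }
}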

So, I modified the GetEnumerator method to return a new ListEnumerator. I also declared the ListEnumerator class, but I haven’t implemented the members of the IEnumerator interface yet. That will come shortly.

You might ask: “Mosh, why are you declaring ListEnumerator as a nested private class? Aren’t nested classes ugly?” The ListEnumerator class is part of the implementation of our List class. As you’ll see shortly, it’ll have intimate knowledge of the internal structure of the List class. If tomorrow I replace the array with a binary search tree, I need to modify ListEnumerator to support this. I don’t want anywhere else in the code to have a reference to the ListEnumerator; otherwise, the internals of the List class will be leaked to the outside again.

Alright, so let’s quickly recap up to this point. I implemented IEnumerable on our List class and defined the GetEnumerator method. This method returns a new ListEnumerator that the clients will use to iterate the List. I declared ListEnumerator as a private nested class inside List.

Now, it’s time to complete the implementation of ListEnumerator. It’s pretty easy:

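// Declared inside the List class as a private nested class
private class ListEnumerator : IEnumerator
{
    private readonly object[] _objects;
    private readonly int _count;
    private int _currentIndex = -1;

    // List.GetEnumerator() passes in its array and item count:
    // return new ListEnumerator(_objects, _count);
    public ListEnumerator(object[] objects, int count)
    {
        _objects = objects;
        _count = count;
    }

    public bool MoveNext()
    {
        _currentIndex++;

        return (_currentIndex < _count);
    }

    public object Current
    {
        get
        {
            try
            {
                return _objects[_currentIndex];
            }
            catch (IndexOutOfRangeException)
            {
                throw new InvalidOperationException();
            }
        }
    }

    public void Reset()
    {
        _currentIndex = -1;
    }
}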

Let’s examine this class bit by bit.

The _currentIndex field is used to maintain the position of the current element in the list. Initially, it is set to -1, which is before the first element in the list. As we call the MoveNext method, it is incremented by one.

The MoveNext method returns a boolean value to indicate whether there is another element to read or we’ve reached the end of the list. Note that ListEnumerator works directly with the _objects array. This is why I said our ListEnumerator has intimate knowledge of the internal structure of the List. It knows we’re using an object[] there. If we replace the array with a binary search tree, we need to modify the enumerator, because trees are traversed with different algorithms.

The Current property returns the current element in the list. I’ve used a try/catch block here, in case the client of the List class tries to access the Current property before calling the MoveNext method. In this case, _currentIndex will be -1 and accessing _objects[-1] will throw IndexOutOfRangeException. I’ve caught this exception and thrown a more meaningful one (InvalidOperationException). The reason is that I don’t want the clients of the list to know anything about the fact that we’re using an array with an index. So, IndexOutOfRangeException is too detailed for the clients of the List class to know and should be replaced with InvalidOperationException.

And finally, in the Reset method, we set _currentIndex back to -1, so we can re-iterate the List from the beginning, if we want.

So, let’s review. I modified our List class to hide its internal structure by making the object[] private. With this, I had to implement the IEnumerable interface so that the clients of the List could enumerate it without knowing about its internal structure. IEnumerable interface has only a single method: GetEnumerator, which is used by the clients to enumerate the List. I created another class called ListEnumerator that knows how to iterate the List. It implements a standard interface (IEnumerator) and hides the details of how the List is enumerated.
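Putting the pieces above together, here is a minimal sketch of the whole List class that should compile as-is. The _count field is an assumption about how the list tracks its item count, and the ListDemo helper is a hypothetical addition for illustration only. This sketch also swaps the try/catch in Current for an explicit bounds check, which additionally covers reading Current after the last element.

```csharp
using System;
using System.Collections;

public class List : IEnumerable
{
    private readonly object[] _objects = new object[100];
    private int _count; // assumption: number of slots currently in use

    public void Add(object obj)
    {
        _objects[_count++] = obj;
    }

    public IEnumerator GetEnumerator()
    {
        return new ListEnumerator(_objects, _count);
    }

    // Private and nested, so nothing outside List can depend on it
    private class ListEnumerator : IEnumerator
    {
        private readonly object[] _objects;
        private readonly int _count;
        private int _currentIndex = -1;

        public ListEnumerator(object[] objects, int count)
        {
            _objects = objects;
            _count = count;
        }

        public bool MoveNext()
        {
            _currentIndex++;
            return _currentIndex < _count;
        }

        public object Current
        {
            get
            {
                // Explicit check instead of try/catch (a swapped-in variation)
                if (_currentIndex < 0 || _currentIndex >= _count)
                    throw new InvalidOperationException();
                return _objects[_currentIndex];
            }
        }

        public void Reset()
        {
            _currentIndex = -1;
        }
    }
}

// Hypothetical helper: drives the enumerator by hand and joins the items
public static class ListDemo
{
    public static string Join(List list)
    {
        var result = "";
        var enumerator = list.GetEnumerator();
        while (enumerator.MoveNext())
            result += enumerator.Current + ";";
        return result;
    }
}
```

The client code in ListDemo.Join only sees IEnumerator, so it would keep working unchanged if the array were replaced with another data structure.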

The beauty of IEnumerable and IEnumerator is that we’ll end up with a simple and consistent mechanism to iterate any object, irrespective of its internal structure. All we need to do is:

var enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

Any changes in the internals of our enumerable classes will be protected from leaking outside. So the client code will not be affected, and this means: more loosely-coupled software.

Generic IEnumerable<T> and IEnumerator<T>

In the examples in this post, I showed you the non-generic versions of these interfaces. They were originally added in .NET v1, but Microsoft later introduced generic versions to provide type safety and to avoid the additional cost of boxing/unboxing value types. If you’re not familiar with generics, check out my video on YouTube.
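As a rough sketch of what the generic version can look like, here is a hypothetical SimpleList<T> (the name and the fixed capacity of 100 are assumptions carried over from the simplified List above). Instead of a hand-written enumerator class, it uses the C# yield return statement, which asks the compiler to generate the IEnumerator<T> implementation for us. That is a different technique from the one shown earlier in this post.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical generic counterpart of the simplified List class
public class SimpleList<T> : IEnumerable<T>
{
    private readonly T[] _items = new T[100];
    private int _count; // number of items added so far

    public void Add(T item)
    {
        _items[_count++] = item;
    }

    // The compiler turns this method into an IEnumerator<T> implementation
    public IEnumerator<T> GetEnumerator()
    {
        for (var i = 0; i < _count; i++)
            yield return _items[i];
    }

    // IEnumerable<T> extends IEnumerable, so the non-generic method is required too
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

With a SimpleList<int>, the items come back as int rather than object, so no casting or boxing is involved when you enumerate them.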

Misconception about IEnumerable and Foreach

A common misconception about IEnumerable is that it is used so we can iterate over the underlying class using a foreach block. While this is true on the surface, the foreach block is simply syntactic sugar to make your code neater. IEnumerable, as I explained earlier, is the implementation of the iterator pattern and is used to give the ability to iterate a class without knowing its internal structure.

In the examples earlier in this post, we used IEnumerable/IEnumerator as follows:

var enumerator = list.GetEnumerator();
while (enumerator.MoveNext())
{
    Console.WriteLine(enumerator.Current);
}

So, as you see, we can still iterate the list using a while loop. But with a foreach block, our code looks cleaner:

foreach (var item in list)
{
    Console.WriteLine(item);
}

When you compile your code, the compiler translates your foreach block to a while loop like the earlier example. So, under the hood, it’ll use the IEnumerator object returned from the GetEnumerator method.

So, while you can use the foreach block on any type that implements IEnumerable, IEnumerable is not designed for the foreach block!
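To make that translation a bit more concrete, here is an approximate, hand-written equivalent of a foreach loop over a non-generic IEnumerable. The ForeachExpansion class and its Describe method are hypothetical names used only for this sketch. The exact code the compiler generates differs in its details, but the shape is the same: get an enumerator, loop on MoveNext, read Current, and dispose the enumerator at the end if it is disposable.

```csharp
using System;
using System.Collections;

public static class ForeachExpansion
{
    // Roughly what a foreach over "source" that appends each item turns into
    public static string Describe(IEnumerable source)
    {
        var result = "";
        IEnumerator enumerator = source.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                object item = enumerator.Current;
                result += item + ",";
            }
        }
        finally
        {
            // Enumerators that hold resources implement IDisposable
            (enumerator as IDisposable)?.Dispose();
        }
        return result;
    }
}
```

Since a string is enumerable, calling ForeachExpansion.Describe("abc") walks its characters one by one, just as a foreach block would.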

Wrapping it Up

In this post, you learned that IEnumerable and IEnumerator are used to enumerate (or iterate) a class that has a collection nature. These interfaces are the implementation of the iterator pattern. They aim to provide a mechanism to iterate an object without knowing its internal structure.

If you enjoyed this post, please share it and leave your comment below. If you have any questions, feel free to post them here. I’ll answer every question.

Hi, my name is Mosh Hamedani and I am the author of several best-selling courses on Udemy and Pluralsight with more than 130,000 students in 196 countries. You can see the list of all my web and mobile development courses on this website.

5 C# Collections that Every C# Developer Must Know

Finding the right collection in .NET is like finding the right camera in a camera shop! There are so many options to choose from, and each is strong in certain scenarios and weak in others. If looking for a collection in .NET has left you confused, you’re not alone.

In this post, which is the first in the series on .NET collections, I’m going to cover 5 essential collection types that every C# developer must know. These are the collections that you’ll use 80 – 90% of the time, if not more. In future posts in this series, I’ll be covering other collection types that are used in special cases, where performance and concurrency are critical.

So, in this post, I’m going to explore the following collection types. For each type, I’ll explain what it is, when to use it and how to use it.

  • List
  • Dictionary
  • HashSet
  • Stack
  • Queue

List<T>

Represents a list of objects that can be accessed by an index. <T> here means this is a generic list. If you’re not familiar with generics, check out my YouTube video.

Unlike arrays that are fixed in size, lists can grow in size dynamically. That’s why they’re also called dynamic arrays or vectors. Internally, a list uses an array for storage. If it becomes full, it’ll create a new larger array, and will copy items from the existing array into the new one.

These days, it’s common to use lists instead of arrays, even if you’re working with a fixed set of items.

To create a list:

var list = new List<int>();

If you plan to store a large number of objects in a list, you can reduce the cost of reallocating the internal array by setting an initial capacity:

// Creating a list with an initial capacity
var list = new List<int>(10000);
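Note that the number passed to the constructor reserves room in the internal array; it does not add any items. A small sketch, using a hypothetical CapacityDemo helper, shows the difference between the Count and Capacity properties:

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Returns how many items the list holds and how many it has room for
    public static (int Count, int Capacity) Inspect()
    {
        var list = new List<int>(10000);
        list.Add(4);

        // Count is the number of items actually stored (1 here);
        // Capacity is the size of the internal array (10000 here)
        return (list.Count, list.Capacity);
    }
}
```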

Here are some useful operations with lists:

// Add an item at the end of the list
list.Add(4);

// Insert an item at index 0 (the index comes first, then the item)
list.Insert(0, 4);

// Remove the first occurrence of an item (here, the value 1)
list.Remove(1);

// Remove the item at index 0
list.RemoveAt(0);

// Return the item at index 0
var first = list[0];

// Return the index of an item
var index = list.IndexOf(4);

// Check to see if the list contains an item
var contains = list.Contains(4);

// Return the number of items in the list 
var count = list.Count;

// Iterate over all objects in a list
foreach (var item in list)
    Console.WriteLine(item);

Now, let’s see where a list performs well and where it doesn’t.

Adding/Removing Items at the Beginning or Middle

If you add/remove an item at the beginning or middle of a list, it needs to shift one or more items in its internal array. In the worst case scenario, if you add/remove an item at the very beginning of a list, it needs to shift all existing items. The larger the list, the more costly this operation is going to be. We specify the cost of this operation using Big O notation: O(n), which simply means the cost increases linearly in direct proportion to the size of the input. So, as n grows, the execution time of the algorithm increases in direct proportion to n.

Adding/Removing Items at the End

Adding/removing an item at the end of a list is a relatively fast operation and does not depend on the size of the list. The existing items do not have to be shifted. The only exception is when the internal array is full and has to be reallocated, which happens only occasionally as the list grows. This is why the cost of this operation is relatively constant and is not dependent on the number of items in the list. We represent the execution cost of this operation with Big O notation: O(1). So, 1 here means constant.

Searching for an Item

When using methods that involve searching for an item (e.g. IndexOf, Contains and Find), List performs a linear search. This means it iterates over the items in its internal array and, if it finds a match, returns it. In the worst case scenario, if this item is at the end of the list, all items in the list need to be scanned before finding the match. Again, this is another example of O(n), where the cost of finding a match is linear and in direct proportion to the number of elements in the list.

Accessing an Item by an Index

This is what lists are good at. You can use an index to get an item in a list and no matter how big the list is, the cost of accessing an item by index remains relatively constant, hence O(1).

List in a Nutshell

So, adding/removing items at the end of a list and accessing items by index are fast and efficient operations with O(1). Searching for an item in a list involves a linear search and in the worst case scenario is O(n). If you need to search for items based on some criteria, and not an index (e.g. customer with ID 1234), you may be better off using a Dictionary.

 

Dictionary<TKey, TValue>

Dictionary is a collection type that is useful when you need fast lookups by keys. For example, imagine you have a list of customers and as part of a task, you need to quickly look up a customer by their ID (or some other unique identifier, which we call a key). With a list, looking up a customer involves a linear search and the cost of this operation, as you learned earlier, is O(n) in the worst case scenario. With a dictionary, however, lookups are very fast with O(1), which means no matter how large the dictionary is, the lookup time remains relatively constant.

When storing or retrieving an object in a dictionary, you need to supply a key. The key is a value that uniquely identifies an object and cannot be null. For example, to store a Customer in a Dictionary, you can use CustomerID as the key.

To create a dictionary, first you need to specify the type of keys and values:

var dictionary = new Dictionary<int, Customer>();

Here, our dictionary uses int keys and Customer values. So, you can store a Customer object in this dictionary as follows:

dictionary.Add(customer.Id, customer);

You can also add objects to a dictionary during initialization:

var dictionary = new Dictionary<int, Customer>
{
    { customer1.Id, customer1 },
    { customer2.Id, customer2 }
};

Later, you can look up customers by their IDs very quickly:

// Return the customer with ID 1234 
var customer = dictionary[1234];
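One thing to watch out for: the indexer throws a KeyNotFoundException if the key is not in the dictionary. When you are not sure the key exists, TryGetValue lets you check and fetch in a single lookup. Here is a minimal sketch; the Customer class with its Id and Name properties and the CustomerLookup helper are hypothetical, made up for this example.

```csharp
using System.Collections.Generic;

// Hypothetical customer type for illustration
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class CustomerLookup
{
    // Returns the customer's name, or a placeholder if the ID is unknown
    public static string FindName(Dictionary<int, Customer> customers, int id)
    {
        // TryGetValue returns false instead of throwing when the key is missing
        if (customers.TryGetValue(id, out var customer))
            return customer.Name;

        return "(not found)";
    }
}
```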

You can remove an object by its key or remove all objects using the Clear method:

// Removing an object by its key
dictionary.Remove(1);

// Removing all objects
dictionary.Clear();

And here are some other useful methods available in the Dictionary class:

var count = dictionary.Count; 

var containsKey = dictionary.ContainsKey(1);

var containsValue = dictionary.ContainsValue(customer1);

// Iterate over keys 
foreach (var key in dictionary.Keys)
     Console.WriteLine(dictionary[key]);

// Iterate over values
foreach (var value in dictionary.Values)
     Console.WriteLine(value);

// Iterate over dictionary
foreach (var keyValuePair in dictionary)
{
     Console.WriteLine(keyValuePair.Key);
     Console.WriteLine(keyValuePair.Value);
}

So, why are dictionary lookups so fast? A dictionary internally stores objects in an array, but unlike a list, where objects are added at the end of the array (or at the provided index), the index is calculated using a hash function. So, when we store an object in a dictionary, it’ll call the GetHashCode method on the key of the object to calculate the hash. The hash is then adjusted to the size of the array to calculate the index into the array to store the object. Later, when we look up an object by its key, the GetHashCode method is used again to calculate the hash and the index. As you learned earlier, looking up an object by index in an array is a fast operation with O(1). So, unlike lists, looking up an object in a dictionary does not require scanning every object and no matter how large the dictionary is, it’ll remain extremely fast.

So, in the following figure, when we store this object in a dictionary, the GetHashCode method on the key is called. Let’s assume it returns 1234. This hash value is then adjusted based on the size of the internal array. In this figure, the length of the internal array is 6. So, the remainder of the division of 1234 by 6 is used to calculate the index (in this case 4). Later, when we need to look up this object, its key is used again to calculate the index.

[Figure: Hashtable in C#]
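The arithmetic in this example can be sketched in a few lines. This is only an illustration of the idea, not the actual code inside Dictionary, which does more work (for example, handling two keys that land on the same index). The BucketIndex class is a hypothetical name.

```csharp
public static class BucketIndex
{
    // Maps a hash code to a position in an array of the given length
    public static int For(int hashCode, int arrayLength)
    {
        // Clear the sign bit first so the remainder is never negative
        return (hashCode & 0x7FFFFFFF) % arrayLength;
    }
}
```

For an int key, GetHashCode returns the number itself, so BucketIndex.For(1234.GetHashCode(), 6) gives 4, matching the example above.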

Now, this was a simplified explanation of how hashing works. There is more involved in the calculation of hashes, but you don’t really need to know the exact details at this stage (unless out of personal interest). All you need to know as a C# developer is that dictionaries are hash-based collections and for that reason lookups are very fast.

 

HashSet<T>

A HashSet represents a set of unique items, just like a mathematical set (e.g. { 1, 2, 3 }). A set cannot contain duplicates and the order of items is not relevant. So, both { 1, 2, 3 } and { 3, 2, 1 } are equal.

Use a HashSet when you need super fast lookups against a unique list of items. For example, you might be processing a list of orders, and for each order, you need to quickly check the supplier code from a list of valid supplier codes.

A HashSet, similar to a Dictionary, is a hash-based collection, so lookups are very fast with O(1). But unlike a dictionary, it doesn’t store key/value pairs; it only stores values. So, every object should be unique, and this is determined by the GetHashCode and Equals methods: two objects are treated as the same item when they have the same hash code and Equals returns true. So, if you’re going to store custom types in a set, you need to override the GetHashCode and Equals methods in your type.
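As a minimal sketch of what overriding these two methods can look like, here is a hypothetical SupplierCode type (the name and its single Value property are assumptions for this example). Two SupplierCode objects with the same Value are treated as the same item by a HashSet.

```csharp
using System.Collections.Generic;

// Hypothetical custom type that can be stored in a HashSet
public class SupplierCode
{
    public string Value { get; }

    public SupplierCode(string value)
    {
        Value = value;
    }

    // Two codes are equal when their values match
    public override bool Equals(object obj)
    {
        return obj is SupplierCode other && Value == other.Value;
    }

    // Equal objects must return the same hash code
    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }
}
```

With this in place, adding new SupplierCode("A1") to a set twice leaves the set with one item, because the second Add finds an equal object and returns false.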

To create a HashSet:

var hashSet = new HashSet<int>();

You can add/remove objects to a HashSet similar to a List:

// Initialize the set using collection initializer syntax
var hashSet = new HashSet<int>() { 1, 2, 3 };

// Add an object to the set
hashSet.Add(4);

// Remove an object 
hashSet.Remove(3);

// Remove all objects 
hashSet.Clear();

// Check to see if the set contains an object 
var contains = hashSet.Contains(1);

// Return the number of objects in the set 
var count = hashSet.Count;

HashSet provides many mathematical set operations:

// Modify the set to include only the objects present in the set and the other set
hashSet.IntersectWith(another);

// Remove all objects in "another" set from "hashSet" 
hashSet.ExceptWith(another);

// Modify the set to include all objects included in itself, in "another" set, or both
hashSet.UnionWith(another);

var isSupersetOf = hashSet.IsSupersetOf(another);
var isSubsetOf = hashSet.IsSubsetOf(another);
var equals = hashSet.SetEquals(another);
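Keep in mind that IntersectWith, ExceptWith and UnionWith modify the set they are called on. If you want to keep the original set intact, one option is to copy it first. The SetOperationsDemo helper below is a hypothetical sketch of that approach:

```csharp
using System.Collections.Generic;

public static class SetOperationsDemo
{
    // Items present in both collections
    public static HashSet<int> Intersect(IEnumerable<int> first, IEnumerable<int> second)
    {
        var result = new HashSet<int>(first); // copy, so "first" is not modified
        result.IntersectWith(second);
        return result;
    }

    // Items in "first" that are not in "second"
    public static HashSet<int> Except(IEnumerable<int> first, IEnumerable<int> second)
    {
        var result = new HashSet<int>(first);
        result.ExceptWith(second);
        return result;
    }

    // Items present in either collection
    public static HashSet<int> Union(IEnumerable<int> first, IEnumerable<int> second)
    {
        var result = new HashSet<int>(first);
        result.UnionWith(second);
        return result;
    }
}
```

For example, with { 1, 2, 3 } and { 2, 3, 4 }, the intersection is { 2, 3 }, the difference is { 1 } and the union is { 1, 2, 3, 4 }.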

 

Stack<T>

Stack is a collection type with Last-In-First-Out (LIFO) behaviour. We often use stacks in scenarios where we need to provide the user with a way to go back. Think of your browser. As you navigate to different web sites, the addresses that you visit are pushed on a stack. Then, when you click the back button, the item on top of the stack (which represents the current address in the browser) is popped, and we can get the last address you visited from the item that is now on top of the stack. The undo feature in applications is implemented using a stack as well.

Here is how you can use a Stack in C#:

var stack = new Stack<string>();
            
// Push items in a stack
stack.Push("http://www.google.com");

// Check to see if the stack contains a given item 
var contains = stack.Contains("http://www.google.com");

// Return the item on the top of the stack without removing it
// (Peek and Pop throw InvalidOperationException if the stack is empty)
var top = stack.Peek();

// Remove and return the item on the top of the stack
var popped = stack.Pop();

// Get the number of items in stack 
var count = stack.Count;

// Remove all items from stack 
stack.Clear();

Internally, a stack is implemented using an array. Since arrays in C# have a fixed size, as you push items into a stack, it may need to increase its capacity by re-allocating a larger array and copying existing items into the new array. If re-allocation doesn’t need to happen, push is an O(1) operation; otherwise, if re-allocation is required, assuming the stack has n elements, all these elements need to be copied to the new array. This leads to a runtime complexity of O(n) for that particular push.

Pop is an O(1) operation.

Contains is a linear search operation with O(n).
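To tie this back to the browser example, here is a small sketch of a back button built on a stack. The BrowserHistory class and its members are hypothetical names, and a real browser keeps more state than this (for instance, a forward history).

```csharp
using System.Collections.Generic;

// Hypothetical, simplified model of a browser's back button
public class BrowserHistory
{
    private readonly Stack<string> _visited = new Stack<string>();

    // The page currently shown is always on top of the stack
    public string Current => _visited.Peek();

    // We can only go back if there is a page underneath the current one
    public bool CanGoBack => _visited.Count > 1;

    public void Visit(string url)
    {
        _visited.Push(url);
    }

    public void Back()
    {
        // Pop the current page; the previous page is now on top
        if (CanGoBack)
            _visited.Pop();
    }
}
```

After visiting two pages and calling Back once, Current is the first page again, which is the LIFO behaviour described above.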

 

Queue<T>

Queue represents a collection with First-In-First-Out (FIFO) behaviour. We use queues in situations where we need to process items as they arrive.

The three main operations on a queue are:

  • Enqueue: adding an element to the end of a queue
  • Dequeue: removing the element at the front of the queue
  • Peek: inspecting the element at the front without removing it.

Here is how you can use a queue:

var queue = new Queue<string>();

// Add an item to the queue
queue.Enqueue("transaction1");

// Check to see if the queue contains a given item 
var contains = queue.Contains("transaction1");

// Return the item at the front without removing it
// (Peek and Dequeue throw InvalidOperationException if the queue is empty)
var front = queue.Peek();

// Remove and return the item at the front of the queue
var dequeued = queue.Dequeue();
            
// Remove all items from queue 
queue.Clear();

// Get the number of items in the queue
var count = queue.Count;
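As a small illustration of processing items in the order they arrive, the sketch below drains a queue until it is empty. The TransactionProcessor class is a hypothetical name, and "processing" here simply means recording each item.

```csharp
using System.Collections.Generic;

public static class TransactionProcessor
{
    // Takes items off the front of the queue one by one, in arrival order
    public static List<string> ProcessAll(Queue<string> pending)
    {
        var processed = new List<string>();

        // Checking Count first avoids calling Dequeue on an empty queue
        while (pending.Count > 0)
            processed.Add(pending.Dequeue());

        return processed;
    }
}
```

If "transaction1" is enqueued before "transaction2", it is also processed before it, and the queue is empty once ProcessAll returns.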

 

Summary

Lists are fast when you need to access an element by index, but searching for an item in a list is slow since it requires a linear search.

Dictionaries provide fast lookups by key. Keys should be unique and cannot be null.

HashSets are useful when you need fast lookups to see if an element exists in a set or not.

Stacks provide LIFO (Last-In-First-Out) behaviour and are useful when you need to provide the user with a way to go back.

Queues provide FIFO (First-In-First-Out) behaviour and are useful for processing items in the order they arrived.

 

Love your feedback!

If you enjoyed this post, please share it and leave a comment. If you have any questions, feel free to post them here. I’ll answer every question.
