LINQ Join Queries vs. Nested Sub-Queries

Developing web applications, there are scenarios and patterns which repeat over and over, day after day, when working with the underlying data. One of these scenarios which I see very often is when there are two or more collections with one-to-many or many-to-many relationships which require filtering or joining on some type of key to ultimately perform a computation that requires the parent and the child in a hierarchy. Let’s explore two ways of solving this problem, first using nested loops with sub-queries, and then using LINQ join queries, and figure out which of the two solutions has the best performance and performs the least amount of iterations.

First, let’s define our Parent class:

public class Parent
{
    public Guid Id { get; set; }
 
    public string Name { get; set; }
}

Next, let’s define our Child class, which will contain a ParentId property to reference its Parent:

public class Child
{
    public Guid Id { get; set; }
 
    public string Name { get; set; }
 
    public Guid ParentId { get; set; }
}

Next, in order to gather profiling data, we’re going to create a custom implementation of the IEnumerable and IEnumerator interfaces, named LoggingEnumerable and LoggingEnumerator, which will log each call to the IEnumerator members.

The LoggingEnumerable class implements GetEnumerator by returning a new instance of the LoggingEnumerator class, which will do the actual logging.

public class LoggingEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _enumerable;
    private readonly Action<string> _log;
 
    public LoggingEnumerable(IEnumerable<T> enumerable, Action<string> log)
    {
        _enumerable = enumerable;
        _log = log;
    }
 
    public IEnumerator<T> GetEnumerator()
    {
        return new LoggingEnumerator<T>(_enumerable.GetEnumerator(), _log);
    }
 
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

The LoggingEnumerator implements the IEnumerator interface by proxying calls to the inner IEnumerator instance, but logs access to the members using the provided logging function.

public class LoggingEnumerator<T> : IEnumerator<T>
{
    private readonly IEnumerator<T> _enumerator;
    private readonly Action<string> _log;
 
    public LoggingEnumerator(IEnumerator<T> enumerator, Action<string> log)
    {
        _enumerator = enumerator;
        _log = log;
    }
 
    public void Dispose()
    {
        _log("Dispose");
        _enumerator.Dispose();
    }
 
    public bool MoveNext()
    {
        _log("MoveNext");
        return _enumerator.MoveNext();
    }
 
    public void Reset()
    {
        _log("Reset");
        _enumerator.Reset();
    }
 
    public T Current
    {
        get
        {
            _log("get_Current");
            return _enumerator.Current;
        }
    }
 
    object IEnumerator.Current
    {
        get { return Current; }
    }
}

Having defined our profiling classes, now let’s define our logging functions; we’ll want to log the Parent iterations and the Child iterations separately, so let’s setup two functions that we can pass to the LoggingEnumerable constructor:

 



private readonly Dictionary<string, int> _childrenLog = new Dictionary<string, int>();
private void ChildrenLog(string key)
{
    if (!_childrenLog.ContainsKey(key))
        _childrenLog[key] = 0;
    _childrenLog[key]++;
}
private readonly Dictionary<string, int> _parentLog = new Dictionary<string, int>();
private void ParentsLog(string key)
{
    if (!_parentLog.ContainsKey(key))
        _parentLog[key] = 0;
    _parentLog[key]++;
}

Next, to simulate some work we’ll write a function that we can execute during our iterations; this simple function will compute the hash code of parent and child.

 



private int ComputeHash(object parent, object child)
{
    return parent.GetHashCode() & child.GetHashCode();
}

Now we’re ready to write our profiling code, using a data source _parents and _children collection with 100 parents and 10000 children.

Using nested loops with LINQ subqueries the resulting profiling code would be:

 



public void Profile_Enumerations_Using_Nested_Loops()
{
    var parents = new LoggingEnumerable<Parent>(_parents, ParentsLog);
    var children = new LoggingEnumerable<Child>(_children, ChildrenLog);
 
    var codes = new List<int>();
    foreach (Parent parent in parents)
    {
        Guid parentId = parent.Id;
        IEnumerable<Child> ptc = children.Where(x => x.ParentId == parentId);
        foreach (Child child in ptc)
        {
            int hash = ComputeHash(child, parent);
            codes.Add(hash);
        }
    }
 
    Console.WriteLine("{0} computations completed.", codes.Count);
    foreach (var kvp in _childrenLog)
    {
        Console.WriteLine("Children_{0}:{1}", kvp.Key, kvp.Value);
    }
    foreach (var kvp in _parentLog)
    {
        Console.WriteLine("Parent_{0}:{1}", kvp.Key, kvp.Value);
    }
}

Using LINQ join queries the resulting profiling code would be:

 



public void Profile_Enumerations_Using_Linq_Join()
{
    var parents = new LoggingEnumerable<Parent>(_parents, ParentsLog);
    var children = new LoggingEnumerable<Child>(_children, ChildrenLog);
 
    IEnumerable<int> ptc = from p in parents
        join c in children
            on p.Id equals c.ParentId
        select ComputeHash(p, c);
 
    List<int> codes = ptc.ToList();
 
    Console.WriteLine("{0} computations completed.", codes.Count);
    foreach (var kvp in _childrenLog)
    {
        Console.WriteLine("Children_{0}:{1}", kvp.Key, kvp.Value);
    }
    foreach (var kvp in _parentLog)
    {
        Console.WriteLine("Parent_{0}:{1}", kvp.Key, kvp.Value);
    }
}

Finally, running the profiling methods as tests yields the following data:

 



Output from LinqCycles.LinqCyclesFixture.Profile_Enumerations_Using_Nested_Loops:
  10000 computations completed.
  Children_MoveNext:1000100
  Children_get_Current:1000000
  Children_Dispose:100
  Parent_MoveNext:101
  Parent_get_Current:100
  Parent_Dispose:1
 
Output from LinqCycles.LinqCyclesFixture.Profile_Enumerations_Using_Linq_Join:
  10000 computations completed.
  Children_MoveNext:10001
  Children_get_Current:10000
  Children_Dispose:1
  Parent_MoveNext:101
  Parent_get_Current:100
  Parent_Dispose:1

From which can validate that using LINQ join queries is far more efficient, and that the program iterates over the entire children collection only once, while using nested loops with sub-queries the sub-queries iterate over the entire children collection on for each parent! See the included sample solution and play around with the tests; things get much more interesting when adding additional layers to the hierarchy (e.g. adding grandparents).

Clone the sample solution here

Want brilliance sent straight to your inbox?