So here we are at part 2 in the series of posts looking at Functional Data Structures from the book of the same name by Chris Okasaki. Last time we looked at what is perhaps the simplest of the functional data structures, the List (also useful as a LIFO stack).  Up next we’ll continue in the order that Chris Okasaki used in his book, and take a look at implementing a Set using a Binary Tree.

Diving right in, here is an implementation for a Set using a binary tree in F#:

module Set

    type Tree<'a when 'a:comparison> =
        | Empty
        | Tree of Tree<'a>*'a*Tree<'a> 

    let rec isMember value tree =
        match (value,tree) with
        | (_,Empty) -> false
        | (x,Tree(a,y,b)) ->
            if value < y then
                isMember x a
            elif value > y then
                isMember x b
            else
                true

    let rec insert value tree = 
        match (value,tree) with
        | (_,Empty) -> Tree(Empty,value,Empty)
        | (v,Tree(a,y,b)) as s -> 
            if v < y then
                Tree(insert v a,y,b)
            elif v > y then
                Tree(a,y,insert v b)
            else
                snd s

This is pretty simple. Like the List, we’re working with a Discriminated Union, this time with an Empty case and a Tree case that is implemented using a 3-tuple (threeple?) of a Tree, an element, and a Tree. There is a constraint on the elements that ensures they are comparable, since this is going to be an ordered tree.

We only have two functions here: isMember, which says whether or not the element exists in the set, and insert, which adds a new element. The isMember function is not too difficult; it is a recursive search of the tree for the element. Since this is a sorted tree, each step compares the element being searched for with the element in the current node. If it is less than the current element we follow the left-hand subtree, and if it is greater we follow the right-hand subtree. If we reach an empty tree, the element doesn’t exist. Insert is a little more difficult…it’s recursive like isMember, but it is also copying as it goes. The bits that are copied are the nodes along the search path; the subtrees that are not traversed are simply referenced, so in reality the majority of the tree returned from the insert function is shared with the source tree. Take a hard look at that for a moment, and see if the pain begins to subside…then we’ll look at the C# version.
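To make the sharing a bit more concrete, here is a small sketch that checks it with reference equality. It repeats a trimmed-down copy of the tree type and insert so it can run on its own; the children helper is something I made up just for this demonstration.

```fsharp
// Minimal copy of the tree type and insert so the snippet runs on its own.
type Tree<'a when 'a : comparison> =
    | Empty
    | Tree of Tree<'a> * 'a * Tree<'a>

let rec insert value tree =
    match tree with
    | Empty -> Tree(Empty, value, Empty)
    | Tree(left, y, right) ->
        if value < y then Tree(insert value left, y, right)
        elif value > y then Tree(left, y, insert value right)
        else tree

// Hypothetical helper: pull out the two subtrees of a non-empty tree.
let children tree =
    match tree with
    | Tree(left, _, right) -> left, right
    | Empty -> failwith "Empty tree has no children"

let before = Empty |> insert 5 |> insert 3 |> insert 8
let after = insert 1 before

let (beforeLeft, beforeRight) = children before
let (afterLeft, afterRight) = children after

// The insert went down the left side, so the left subtree is a new copy...
let leftCopied = not (obj.ReferenceEquals(afterLeft, beforeLeft))
// ...while the untouched right subtree is the very same object in both trees.
let rightShared = obj.ReferenceEquals(afterRight, beforeRight)
printfn "left copied: %b, right shared: %b" leftCopied rightShared
```

Only the nodes between the root and the new leaf are allocated; everything hanging off that path is reused as-is.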

public static class Tree
{
    public static EmptyTree<T> Empty<T>() where T: IComparable
    {
        return new EmptyTree<T>();
    }
}

public class EmptyTree<T> : Tree<T> where T: IComparable
{
    public override bool IsEmpty { get { return true; }}
}

public class Tree<T> where T: IComparable
{
    public Tree<T> LeftSubtree { get; internal set; }
    public Tree<T> RightSubtree { get; internal set; }
    public T Element { get; internal set; }
    public virtual bool IsEmpty
    {
        get { return false; }
    }
}

public static class Set
{
    public static bool IsMember<T>(T element, Tree<T> tree) where T: IComparable
    {
        var currentTree = tree;
        while (!currentTree.IsEmpty)
        {
            // CompareTo only guarantees the sign of its result, so test against 0 rather than 1.
            var comparison = element.CompareTo(currentTree.Element);
            if (comparison == 0)
                return true;
            // Larger elements live in the right subtree, smaller ones in the left.
            currentTree = comparison > 0
                ? currentTree.RightSubtree
                : currentTree.LeftSubtree;
        }
        return false;
    }

    public static Tree<T> Insert<T>(T element, Tree<T> tree) where T: IComparable
    {
        if (tree.IsEmpty)
            return new Tree<T> { LeftSubtree = Tree.Empty<T>(), Element = element, 
                                 RightSubtree = Tree.Empty<T>() };
        // CompareTo only guarantees the sign of its result, so don't switch on 0/1.
        var comparison = element.CompareTo(tree.Element);
        if (comparison == 0)
            return tree;
        if (comparison < 0)
            // Smaller elements go left; the right subtree is shared, not copied.
            return new Tree<T> { LeftSubtree = Set.Insert<T>(element, tree.LeftSubtree), Element = tree.Element, 
                                 RightSubtree = tree.RightSubtree };
        // Larger elements go right; the left subtree is shared, not copied.
        return new Tree<T> { LeftSubtree = tree.LeftSubtree, Element = tree.Element, 
                             RightSubtree = Set.Insert<T>(element, tree.RightSubtree) };
    }
}

This is a reasonable chunk of code, so let’s work it from the top down. We start off by defining the Tree data structure. We use inheritance in this case to make an Empty tree, since we don’t have Discriminated Unions in C#. (If I were a good person I would update that right now to return a singleton of the EmptyTree class, but alas, I’m lazy.) The static Tree class provides the convenience method for creating the empty tree, and the Tree<T> type is our parameterized tree.

The methods in the Set class do the work of checking for an existing member in the set, and inserting a new member in the set. I took the opportunity to convert the recursive isMember function to a looping construct in C# (which is what the F# compiler will do for you, since isMember is tail recursive). This is not really possible with the Insert method because it is not tail recursive: each call still has to build a new node after the recursive call returns. The logic is the same in both versions, but the C# version is a bit more verbose (though having LeftSubtree and RightSubtree makes things a little clearer in my opinion). Again, the biggest difference between the two is the amount of code (since we don’t have Discriminated Unions and Pattern Matching in C# land).

Summing Up Persistent Structures

Interestingly this is where the first section of Okasaki’s book ends (it’s actually chapter 2, but chapter 1 is more of a foundational thing…no code). These two implementations show the basic ideas behind what are described as “Persistent” data structures…meaning bits of the existing structure are re-used when creating a new structure as part of an operation that would mutate the original in a non-functional (mutable) data structure. In the case of a List/Stack we are referencing the old list as the “Tail” of the new list, so each time we add a new item we are simply allocating space for the new item. In the case of the Tree/Set, an insert creates new nodes along the path from the root to the new node, and references every subtree off that path as-is (if the item already exists, that path gets copied for nothing…avoiding the copy is something Okasaki suggests the reader should solve as an additional exercise). These concepts are fundamental to the more complex data structures that follow, and present the basic ideas that are employed to make the structures efficient within the context of functional programming.
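For the curious, here is one possible take on that exercise: a sketch that bails out with an exception when the element is already in the tree, so the caller gets the original tree back and the half-built copy is thrown away. The names insertNoCopy and AlreadyPresent are my own, and this is just one way to approach it, not necessarily the answer the book has in mind.

```fsharp
type Tree<'a when 'a : comparison> =
    | Empty
    | Tree of Tree<'a> * 'a * Tree<'a>

// Hypothetical exception used only to abandon the copy in progress.
exception AlreadyPresent

let insertNoCopy value tree =
    let rec go t =
        match t with
        | Empty -> Tree(Empty, value, Empty)
        | Tree(left, y, right) ->
            if value < y then Tree(go left, y, right)
            elif value > y then Tree(left, y, go right)
            else raise AlreadyPresent
    // If the element turns up, discard the partial copy and hand back the original.
    try go tree
    with AlreadyPresent -> tree

let original = Empty |> insertNoCopy 5 |> insertNoCopy 3 |> insertNoCopy 8
let same = insertNoCopy 3 original
printfn "returned the original tree: %b" (obj.ReferenceEquals(same, original))
```

The trade-off is that an exception is not free either, so whether this wins in practice depends on how often duplicates are inserted.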

Up next in the book is a look at how more traditional data structures, such as heaps and queues, can be converted to a more functional setting.  Expect more goodness in the area, but I would also like to revisit some of the basics here.  The more observant readers may have noticed that the majority of the functions used on these simple types were not Tail Recursive, which means the compiler and JIT cannot optimize them, which ultimately means they are going to cause your stack to blow up if you’re dealing with large structures.  It might be worth exploring how to go about converting these to make them Tail Recursive.
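As a taste of what that conversion could look like, here is a sketch of a tail-recursive append for the list from the previous post. It works by reversing the left list onto an accumulator twice; the names reverseOnto, append, build and length are mine, and the list type is repeated so the snippet runs on its own.

```fsharp
type List<'a> =
    | Empty
    | Cons of 'a * List<'a>

// Tail recursive: push every element of `source` onto `acc`, which reverses `source`.
let rec reverseOnto acc source =
    match source with
    | Empty -> acc
    | Cons(head, tail) -> reverseOnto (Cons(head, acc)) tail

// Append by reversing the left list, then reversing it back onto the right list.
let append leftList rightList =
    reverseOnto rightList (reverseOnto Empty leftList)

// Hypothetical helpers for the demo, both tail recursive as well.
let build count =
    let rec loop i acc =
        if i = 0 then acc else loop (i - 1) (Cons(i, acc))
    loop count Empty

let length list =
    let rec loop total remaining =
        match remaining with
        | Empty -> total
        | Cons(_, tail) -> loop (total + 1) tail
    loop 0 list

// A million elements on the left would be risky for the non-tail-recursive version.
let combined = append (build 1000000) (build 3)
printfn "length = %d" (length combined)
```

This is still O(n) in the length of the left list, and it walks that list twice, but the stack depth stays constant. The right list is still shared rather than copied.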

I thought it might be fun to explore a little bit of CS as it applies to functional programming, by looking at the idea of Functional Data Structures. This is actually an area that is still getting a lot of active research, and is pretty interesting stuff overall. The general idea is to try and figure out ways to provide immutable data structures which can be efficiently implemented in a functional setting. So you look at some standard data structures, like a linked list, and find a way to implement that as an immutable linked list. One of the really cool features of Functional Data Structures is that because you’re dealing with them in an immutable setting, you can actually get a lot of re-use out of them…specifically for something like a list, you can add an item to the list, and return a “new” list that consists of the old list and the new item, and literally provide a structure that points to the old list instead of copying items. Even if you have other parts of the code referencing older versions of the list without the new item, you don’t have to worry since none of them can mutate the list.
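Here is a tiny sketch of that idea, using F#’s built-in list purely as an illustration. It checks that the tail of the “new” list is literally the same object as the old list.

```fsharp
let oldList = [2; 3; 4]
// Adding an item allocates one new cell that points at the old list.
let newList = 1 :: oldList

// The tail of the new list is the old list itself, not a copy of it.
let tailIsShared = obj.ReferenceEquals(List.tail newList, oldList)
printfn "tail is shared: %b" tailIsShared
// Anyone still holding oldList sees exactly what they saw before.
printfn "old list: %A" oldList
```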

The biggest body of research on this topic was published by Chris Okasaki in 1998, and is still the definitive reference on the subject today. Just for fun I’m going to look at some of the structures discussed in the original book and see what the implementations would look like in F# and C#. The original text provided samples in Standard ML, with an appendix containing Haskell versions. I won’t go into too much depth on the theory behind the structures, but I will try to point out the interesting bits.

Without further ado, let’s get rolling with our first data structure, which is also Okasaki’s first: Lists

Specifically, we’re going to implement a singly-linked list, which can be used rather effectively as a LIFO stack. To start off let’s look at the F# version of the list, which is closest to what Okasaki listed in his book. The basic list type looks like this:

type List<'a> =
| Empty
| Cons of 'a * List<'a>

This is a simple Discriminated Union, with two options, Empty, and something I’ve called Cons in honor of the Lisp folks. The Cons option is basically a tuple containing an element of type ‘a, and a List of ‘a. This by itself is reasonably uninteresting, so let’s actually do something with this.

let isEmpty = function
    | Empty -> true
    | _ -> false

let cons head tail= Cons(head,tail)

let head = function
    | Empty -> failwith "Source list is empty"
    | Cons(head,tail) -> head

let tail = function
    | Empty -> failwith "Source list is empty"
    | Cons(head,tail) -> tail

let rec (++) leftList rightList = 
    match leftList with
    | Empty -> rightList
    | Cons(head,tail) -> Cons(head,tail ++ rightList)

let rec update list index value =
    match (list,index,value) with
    | (Empty,_,_) -> failwith "Source list is empty"
    | (Cons(_,tail),0,v) -> Cons(v,tail)
    // Keep the current head; only the cells in front of the index are rebuilt
    | (Cons(head,tail),i,v) -> Cons(head,update tail (i - 1) v)

Here we have some basic functions: an isEmpty check, a cons function (which creates a list), the head and tail functions, a ++ function, which appends two lists, and an update function, which returns a new list with the value at a particular index replaced. Notice the update and ++ functions are both recursive, and in the case of the ++ function, it is not tail recursive. This is probably ok in this case since the performance of the ++ function is O(n) where n = length of the left list. Both of these functions are also interesting because the F# compiler is unable to optimize them by converting them into a loop.

If we look at the C# version of these same structures things look pretty much the same:

public static class List
{
    public static List<T> Empty<T>()
    {
        return new EmptyList<T>();
    }

    public static List<T> Cons<T>(T head, List<T> tail)
    {
        return new List<T> { Head = head, Tail = tail };
    }
}
public class EmptyList<T> : List<T>
{
    // Override (rather than hide) IsEmpty so the check also works through a List<T> reference.
    public override bool IsEmpty { get { return true; } }
}

// Note: no base class here; the static List class above cannot be inherited from.
public class List<T>
{
    public T Head {get; set; }
    public List<T> Tail {get; set; }

    public virtual bool IsEmpty 
    {
        get { return false; }
    }

    public List<T> Update(int index, T value)
    {
        if(this.IsEmpty)
            throw new InvalidOperationException("You can't update an empty list");
        if(index == 0)
            return List.Cons<T>(value,this.Tail);
        // Keep the current head; only the cells in front of the index are rebuilt.
        return List.Cons<T>(this.Head, this.Tail.Update(index - 1, value));
    }

    public static List<T> operator +(List<T> leftList, List<T> rightList)
    {
        if(leftList.IsEmpty)
            return rightList;

        return List.Cons<T>(leftList.Head, leftList.Tail + rightList);
    }
}

Other than being almost twice as long, there are not many differences between the C# version of this structure and the F# version. In this version I’ve opted to make the empty list a subclass of the List that has the IsEmpty property return true all the time. There is also a static Empty<T>() method which returns an empty list. A reasonable improvement could be to make this a singleton, so that empty lists would also share reference equality. Since the ++ operator in C# is a unary operator (it can’t be overloaded to take two operands), I’ve used an overload of the + operator for concatenating two lists. The implementations are the same as the F# versions, though honestly recursion is a little strange in C#. We still have the same performance characteristics, where appending a list is an O(n) operation. We also have the same issues with recursion, namely a stack overflow if we have a large enough list. Though, honestly with the performance of the update operation overall, you should probably find a new structure before you get to the point where you’re going to overflow your stack.

One very nice use for this particular structure is the LIFO stack. Rather than the typical “push” and “pop” operations, we have the “cons” and “head”/“tail” operations (in the case of pop, you have “head” which gives you the element, and “tail” which gives you the rest of the list). This works well because pushing and popping are O(1). This structure is not all that different than the built-in List type in F#, without the benefit of the additional functions (filter, map, tryFind, etc). Though it would be reasonably trivial to implement these in a recursive fashion.
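Just to show how little is involved, here is a rough sketch of the stack operations along with recursive map and filter. The push and pop names are my own wrappers around cons and head/tail, and the list type is repeated so the snippet runs on its own.

```fsharp
type List<'a> =
    | Empty
    | Cons of 'a * List<'a>

// push is just cons with the arguments flipped so it pipes nicely.
let push value stack = Cons(value, stack)

// pop hands back the head and the tail together.
let pop stack =
    match stack with
    | Empty -> failwith "Source list is empty"
    | Cons(head, tail) -> head, tail

// Neither of these is tail recursive, same as ++.
let rec map f list =
    match list with
    | Empty -> Empty
    | Cons(head, tail) -> Cons(f head, map f tail)

let rec filter predicate list =
    match list with
    | Empty -> Empty
    | Cons(head, tail) ->
        if predicate head then Cons(head, filter predicate tail)
        else filter predicate tail

let stack = Empty |> push 1 |> push 2 |> push 3
let (top, rest) = pop stack
let doubledEvens = stack |> filter (fun x -> x % 2 = 0) |> map (fun x -> x * 2)
printfn "top = %d" top
printfn "rest = %A" rest
printfn "doubledEvens = %A" doubledEvens
```

Note that pop does not change stack at all; it just returns the pieces, so the original three-element stack is still there for the filter and map calls afterwards.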

 

That’s it for this segment…up next we’re going to look at using an immutable binary tree to implement a Set….good stuff for sure.

Tentatively subtitled: “How scale can make fools of us all”

This is going to be a real life war story…cause I haven’t done one of those in a while, and this particular case really ticked me off.  Here’s the scoop:  I’ve got a “service” which is called by other parts of the system.  And by “service” I don’t mean something running in its own process and waiting for SOAP/REST requests or messages, I simply mean something that has a defined entry point (a static method in this case), where you pass in some data, and get something back.

Like many others, I’m sure, I’m using an IoC container to wire up bits so that I can have a big ball of interfaces “to make testing easier” (one of these days I’ll break that rather nasty habit and figure out a better way to do things, but I’m getting off topic). Specifically, I’m using Windsor for my dependency injection because it seems to have become the Container du jour among the devs that actually are using containers at work (StructureMap was in there for a while too, but it seems to have faded). As many of you may know, Windsor is one of those containers that tracks instances for you so that it can use a Lifestyle rule to decide whether to give you an already existing instance of an object, or create a new one for you. It will also automatically call Dispose() on IDisposable objects that it may be tracking, thus helping ensure proper cleanup of resources.

In my case I had everything set up using the Transient lifestyle, because each request was essentially stateless, and there really wasn’t a lot of expense involved in creating a new instance of the objects.  Because I’ve done my homework, I know that if you’re using Transient objects in Windsor, you should explicitly call Release on the container to release the object when you’re done with it, otherwise you’re likely to get a memory leak, since the container would be holding on to an instance of the object, not letting the GC do its thing.  So, I made sure I did that, and my code looked something like this:

var myService = _container.GetService<IMyService>();
try
{
    myService.DoWork();
}
finally
{
    _container.Release(myService);
}

The one thing to point out here, is that my reference to _container was a singleton, so I would get it set up the first time and then use the pre-configured container after that. So, where is the problem? Anyone? Well, I didn’t see anything wrong with it. And neither did the person doing the code review.  But, as you might guess from the fact that I’m writing about this, there was a problem, and here’s how it manifested itself:

Approximately 6 days after this went to production, one particular set of servers in one of our data centers (let’s say for the sake of this post we have 2) started kicking out OutOfMemoryExceptions during calls to the service. My first thought was, “strange, but I’m doing the right thing here and releasing, so it’s probably just something else eating up memory and my code is suffering”. To help demonstrate this I even set up a test running 1000 calls to the service in a while loop and watching the memory…nothing unusual, hovered around 33MB. So I fired up the most excellent dotTrace memory profiler, and it confirmed the same thing.

4 more days go by and our operations folks come and beat the crap out of me because they have had to reboot production servers every couple of hours. Ok, they didn’t beat the crap out of me, but they wanted to, and they did send along a dump, which one of the other devs who is a wiz with windbg was able to translate into something meaningful for me. The dump showed thread contention in ReaderWriterLockSlim.WaitOnEvent(), and about 200MB worth of an object called Castle.MicroKernel.Burden. And here are some other interesting details: The service is called by all kinds of different servers; Web servers, SOAP servers, REST servers, but none of these were showing problems. The only one that was having issues was a server that was set up to process asynchronous SOAP requests (don’t ask). And each server could process up to 20 at a time.

Armed with this information I did some googling, and discovered that the Burden object is the thing you leak when you don’t call Release() on the container in Windsor….But I was calling release!  I found a blog post by Davy Brion that talked about getting leaks when using your own Windsor container with NServiceBus, and how to deal with it….seemed interesting, but it also seemed like something that didn’t apply, since the problem there was that NServiceBus didn’t know about calling Release() since it was written with a container that didn’t keep references.  It did lead me to the source code for the release policy, which showed me something very interesting.

The Windsor object tracking is basically doing some reference counting. The ReaderWriterLockSlim is being used to manage the count of instance references, so when you create a new instance it is incremented, and when you release an instance it is decremented. In either case you’re doing a write, so you’re calling a ForWriting() method on a lock wrapper, which is effectively trying to do a write lock (at some point down the call stack)…very interesting. At this point I decided to see if I could reproduce the problem, and so I took my earlier test running 1000 calls in a loop, and kicked it up a few notches on the concurrency scale, and set it up to run calls in a while loop until the thread was canceled. I fired up 25 threads to do this, launched the little console app and waited. Sure enough I was able to see in process monitor that memory was rising…there were some spots where a large collection was taking place, but it wouldn’t release everything, and so soon my little app which started at around 40 MB was using 50 MB, then 60 MB. It was the concurrency! The multiple requests were stacking up new instances of objects, and new instances of the Burden object, faster than they could be collected because the whole thing was bottle-necked by the ReaderWriterLockSlim!
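For anyone who wants to try the same kind of experiment, here is a rough sketch of that sort of load harness, written in F# to match the other samples. It is not my actual test code: callService is a made-up stand-in for one resolve/DoWork/Release round trip, and runLoad and its parameters are names I invented for the sketch.

```fsharp
open System
open System.Threading

// Hypothetical stand-in for one resolve / DoWork / Release round trip.
let callService () =
    let buffer : byte[] = Array.zeroCreate 1024
    buffer.[0] <- 1uy

let runLoad (threadCount: int) (duration: TimeSpan) (work: unit -> unit) =
    use cancellation = new CancellationTokenSource()
    let token = cancellation.Token
    // One counter slot per thread, so no locking is needed to count calls.
    let counts : int64[] = Array.zeroCreate threadCount
    let startWorker (i: int) =
        let body () =
            // Keep calling until the run is canceled.
            while not token.IsCancellationRequested do
                work ()
                counts.[i] <- counts.[i] + 1L
        let thread = Thread(ThreadStart(body))
        thread.Start()
        thread
    let before = GC.GetTotalMemory(false)
    let threads = Array.init threadCount startWorker
    Thread.Sleep(duration)
    cancellation.Cancel()
    threads |> Array.iter (fun t -> t.Join())
    let after = GC.GetTotalMemory(false)
    Array.sum counts, before, after

let (totalCalls, before, after) = runLoad 25 (TimeSpan.FromSeconds 2.0) callService
printfn "%d calls; managed heap went from %d to %d bytes" totalCalls before after
```

A short run like this only shows the shape of the thing; to see a slow leak you would swap in the real call and let it run for a good while, watching the process from the outside as well.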

So I plugged in a version of Davy’s code to fix the NServiceBus issue, only I decided since I was managing this container local to my service, and I was also dealing with any Disposables myself, that I would not let it track anything (there is actually a built-in policy object for not tracking anything…just realized that).  Plugged it in, fired up the test, and I had a little console app that ran for about an hour and hovered at about 40MB of memory in use.

We actually did an emergency deployment to push this to the affected set of servers in production, and I’m happy to say that so far I’ve not seen an issue…of course our logs stopped showing the OutOfMemory exceptions about 24 hours before we pushed the fix, so we have that to feed our doubt about whether the issue is really resolved. And even though I could create something suspicious locally, we were never able to recreate the production issue in QA. One of the interesting things about our environment is that we have a lot of customers who do things that we don’t exactly expect, or want, them to do. It looks like in this case we had some customers who were doing a lot of asynchronous calls and they just managed to stack up in a way where things got ugly.

On November 27th, a beta release of the 9.3 version of the Developer Express components, including CodeRush and Refactor Pro!, was made available to subscribers. This release is pretty significant to me because it contains a major feature that I have been awaiting for a long time: A Unit Test Runner. There were some teasers released by Mark Miller a while back, which only made me want to get my hands on the tool that much more. My initial impressions are that it is very nice. It is similar to TestDriven.Net in that it provides context menu options to run tests at various levels of granularity (single test, file, project, and solution level) and includes a debug option. At this point it does not contain some of the additional coolness that TestDriven gives you like NCover/Team Coverage and TypeMock integration, but it does have the advantage of being extensible.

I know it was extensible because Mr. Miller told me it was extensible (the title “The Extensible Unit Test Runner You’ve Been Waiting For” was a clue). I did not realize how extensible, however, until after I submitted a bug report to DevExpress. The bug I was reporting (the NUnit TestCase attributes were not recognized), it turns out, was already brought to the attention of the DX team by way of a forum post, and they had already planned on correcting it with the next 9.3 release, but I could have saved myself (and Vito on the DevExpress team) some time by taking a peek at the source samples bundled with the 9.3 release. Yep, you guessed it, there with a shared source license were all of the test framework implementation projects. So this meant I could whip together my own temporary fix while I was waiting for the next release. It seemed like something that other folks might want to know about, so I thought I would share it here.

The biggest piece of the puzzle is a new TestExecuteTask class for handling the TestCaseAttribute.  Due to my complete lack of creativity, I called mine TestCaseExecuteTask, and it looks like this:

using System;
using System.Collections.Generic;
using System.Text;
using DevExpress.CodeRush.Core.Testing;
using System.Reflection;
using DevExpress.CodeRush.Core;

namespace CR_NUnitTesting
{
    public class TestCaseExecuteTask : TestExecuteTask
    {
        public override TaskExecuteResult CollectTestParameters()
        {
            TaskExecuteResult result = TaskExecuteResult.SkippedTaskResult;
            Attribute testCase = GetMethodAttribute("NUnit.Framework.TestCaseAttribute");
            if (testCase == null)
                return result;
            
            foreach(Attribute testCaseItem in TestMethod.GetCustomAttributes(true))
            {
                if(testCaseItem == null)
                    continue;
                var testCaseType = testCaseItem.GetType();
                if(testCaseType == null || testCaseType.FullName != "NUnit.Framework.TestCaseAttribute")
                    continue;
                PropertyInfo prop = testCaseType.GetProperty("Arguments");
                if(prop == null)
                    continue;
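                // Arguments is read through its getter; no need to loop over the accessors.
                object[] parameters = prop.GetValue(testCaseItem, null) as object[];
                if(parameters != null)
                    result.AddParameters(parameters);
            }
            // Every code path has to return the (possibly updated) result.
            return result;
        }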
    }
}

This could be cleaned up some, and some of the magic strings extracted to constants, but overall it is pretty simple. Basically what is going on here is that we are looking for the TestCase attribute, and extracting the arguments for any attributes we find.  It just so happens that the TestExecuteTask base class has a CollectTestParameters() method we can override which allows for this sort of Row testing.  The parameters we extract get stashed in the execution result, which causes the test runner to execute the test once for each group of parameters (the result has a list of parameters, which gets populated with an array of objects for each TestCase attribute), and will correctly display which cases failed if there is a failure.
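If the reflection part seems a bit opaque, here is a stripped-down sketch of just that piece, in F# and with no CodeRush or NUnit dependency. The TestCaseAttribute and SampleTests types below are stand-ins I defined for the example, and collectArguments is a made-up name; the real task does the same kind of lookup against NUnit’s attribute by name.

```fsharp
open System
open System.Reflection

// Hypothetical stand-in for NUnit's TestCaseAttribute, so the sketch has no NUnit dependency.
[<AttributeUsage(AttributeTargets.Method, AllowMultiple = true)>]
type TestCaseAttribute([<ParamArray>] arguments: obj[]) =
    inherit Attribute()
    member this.Arguments = arguments

// Hypothetical test class with two cases on one method.
type SampleTests() =
    [<TestCase(1, 2)>]
    [<TestCase(3, 4)>]
    member this.Adds(a: int, b: int) = a + b

// Find every attribute whose type name matches, and read its Arguments property by name.
let collectArguments (testMethod: MethodInfo) =
    [ for attribute in testMethod.GetCustomAttributes(true) do
        let attributeType = attribute.GetType()
        if attributeType.Name = "TestCaseAttribute" then
            let prop = attributeType.GetProperty("Arguments")
            if not (isNull prop) then
                match prop.GetValue(attribute, null) with
                | :? (obj[]) as args -> yield args
                | _ -> () ]

let testMethod = typeof<SampleTests>.GetMethod("Adds")
let groups = collectArguments testMethod
printfn "found %d argument groups" (List.length groups)
for args in groups do
    printfn "%A" args
```

Each array that comes back corresponds to one TestCase attribute, which is the same grouping the runner uses to decide how many times to execute the method.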

There are a couple other small changes that need to happen to get this to work.  There is an NUnitExtension.cs  class, which is the Plug-In class for the NUnit support, and it handles wiring everything up for us.  First off we need to initialize our new TestExecuteTask, and add it to the list of tasks that run for NUnit tests.  We do that in the InitializePlugin method of the NUnitExtension class:

public override void InitializePlugin()
{
    base.InitializePlugin();
    nUnitProvider.AvailableTasks.Add(new NUnitIgnoreTask());
    nUnitProvider.AvailableTasks.Add(new NUnitSetupTearDownTask());
    nUnitProvider.AvailableTasks.Add(new NUnitExpectedExceptionTask());
    nUnitProvider.AvailableTasks.Add(new NUnitValuesTask());
    nUnitProvider.AvailableTasks.Add(new NUnitRowTestTask());
    nUnitProvider.AvailableTasks.Add(new NUnitTimeoutTask());
    nUnitProvider.AvailableTasks.Add(new NUnitExplicitTask());
    nUnitProvider.AvailableTasks.Add(new TestCaseExecuteTask());
}

Ours gets added to the end of the list, so it will be executed. The next step is to get the plug-in to realize that a method with a TestCase attribute is an executable test method. That trick happens in the handler for the CheckTestMethod event on the UnitTestProvider. All we’re going to do is add another condition to an if statement like so:

void nUnitProvider_CheckTestMethod(object sender, CheckTestMethodEventArgs ea)
{
    IMethodElement method = ea.Method;
    if(//method.Name != null && method.Name.StartsWith("Test")
       ea.GetAttribute("NUnit.Framework", "Test", method) != null
    || ea.GetAttribute("NUnit.Framework.Extensions", "RowTest", method) != null
    || ea.GetAttribute("NUnit.Framework", "TestCase", method) != null)
    {
        ea.IsTestMethod = true;
        ea.Description = ea.GetAttributeText("NUnit.Framework", "Description", method);
        ea.Category = ea.GetAttributeText("NUnit.Framework", "Category", method);
    }
}

The only change to the original code was the additional GetAttribute call at the end of the if statement (the comments were there when I got there, I swear).  Now the only thing left to do is to compile it and drop it in the plug-ins directory.  Now when you are looking at a test class, you should be able to run TestCase decorated test methods without problem.  Well, almost.  There is one thing I was not able to find a clean way to implement, and that is the Result property of the TestCase attribute.  This allows you to streamline tests which are doing equals assertions by having the test method return the actual result, and you specify the expected result by using the result property.  Unfortunately I could not find a way to hook into the actual execution of the test in such a way that I could have access to the specific test properties being used, and the result of the test method execution.  But considering the DevExpress folks will be fixing this issue, I’m sure when they release it there will be support for this feature.  After all, this is simply a stop-gap solution until the next CodeRush release is available, so I’m willing to live with this slight inconvenience.

Happy Testing!