Tuesday, November 13, 2007

Wield the Yield

Recently I've noticed an IO pattern emerging that combines yield and LINQ: a powerful and efficient mechanism for reading data.

Consider the following code: we safely open a file (with the using statement) and yield each line from a loop.
IEnumerable<string> ReadFile(string filename)
{
    // The using statement closes the file once enumeration finishes or the enumerator is disposed.
    using (StreamReader reader = new StreamReader(File.OpenRead(filename)))
        while (!reader.EndOfStream)
            yield return reader.ReadLine();
}

This simple (and beautiful) code allows us to treat a file like a sequence of strings - which becomes very powerful when coupled with LINQ.
// Contrived example - How many file lines refer to "fred"?
var result = ReadFile("file.txt").Where(x => x.Contains("fred")).Count();

I'm sure this same pattern could also be applied to other data retrieval methods, e.g. algorithmic calculations (prime numbers, Fibonacci, etc.), network communication (message queues) and even as an abstraction over asynchronous data methods.
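As a rough sketch of the "algorithm calculation" case, here is what a yielded Fibonacci sequence might look like. The Sequences class and Fibonacci method are names made up for this illustration, not part of the file example above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Sequences
{
    // Hypothetical helper: an endless Fibonacci sequence.
    // Each value is computed only when the caller asks for the next item.
    // Note: long arithmetic will silently overflow after roughly 90 terms.
    public static IEnumerable<long> Fibonacci()
    {
        long current = 0, next = 1;
        while (true)
        {
            yield return current;
            long sum = current + next;
            current = next;
            next = sum;
        }
    }

    static void Main()
    {
        // Take(10) cuts the otherwise infinite sequence off after ten values.
        Console.WriteLine(string.Join(", ", Fibonacci().Take(10)));
        // prints: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
    }
}
```

The caller decides how much of the sequence to compute; the generator itself never needs to know when to stop.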

Yield provides lazy enumeration (nothing executes until a value is actually requested), so the solution is generic and efficient - I like it.
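To make the laziness visible, here is a small sketch that logs when the iterator body actually runs. LazyDemo, Numbers and Log are hypothetical names used only for this demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LazyDemo
{
    // Records which parts of the iterator body have executed.
    public static readonly List<string> Log = new List<string>();

    public static IEnumerable<int> Numbers()
    {
        Log.Add("started");
        yield return 1;
        Log.Add("after 1");
        yield return 2;
        Log.Add("finished");
    }

    static void Main()
    {
        IEnumerable<int> numbers = Numbers();       // calling the method runs none of its body
        Console.WriteLine(Log.Count);               // prints: 0

        int first = numbers.First();                // runs only as far as the first yield
        Console.WriteLine(first);                   // prints: 1
        Console.WriteLine(string.Join(", ", Log));  // prints: started
    }
}
```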

7 comments:

Nick said...

Beautiful ..... but deadly!!!!

If the user of the code doesn't enumerate through the entire sequence, Dispose is not called. E.g.:

foreach (string s in ReadFile(...))
    if (s == "Bam!")
        break;

If the file contains "Bam!" then the control flow never returns to the yielding method.

Nasty! I don't know if there is a fix for this in VS2008, doubt it though.

(You might want to run the example to make sure I'm not mad ;))

Nick said...

Oops humble pie I must eat! Guess what, I'm wrong :)

The foreach loop takes care of this scenario. The problem I hit occurs if you don't use a foreach loop: if you assign the IEnumerator to a variable and iterate using a while loop and MoveNext() you hit this.

Guess that makes foreach a closer sibling of 'using' than of 'while'. The enumerator objects implement IDisposable and foreach magically calls Dispose regardless of how the loop exits...
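For anyone who wants to check this without touching a real file, here is a minimal sketch. A finally block stands in for the using statement, and the DisposeDemo, Lines and CleanedUp names are invented for the example.

```csharp
using System;
using System.Collections.Generic;

static class DisposeDemo
{
    // Set when the iterator's cleanup code has run.
    public static bool CleanedUp;

    public static IEnumerable<string> Lines()
    {
        try
        {
            yield return "one";
            yield return "Bam!";
            yield return "three";
        }
        finally
        {
            // Stand-in for closing the file; a using statement compiles to try/finally.
            CleanedUp = true;
        }
    }

    static void Main()
    {
        foreach (string s in Lines())
            if (s == "Bam!")
                break;                      // leave the loop before the sequence ends

        Console.WriteLine(CleanedUp);       // prints: True
    }
}
```

The break leaves the loop early, but foreach still disposes the enumerator on the way out, which runs the finally block.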

Yuck :)

Glad your pattern works though! Seen any documentation on this?

Luke Marshall said...

Excellent! I was wondering if anyone would pick up that gotcha.

The 'yield' syntax produces a compiler-generated class at compile time - which implements IDisposable.

Its Dispose implementation runs the cleanup of our using statement (disposing the StreamReader and, with it, the underlying FileStream), and (as you noted) it is called at the end of a 'foreach'.

You are correct about the 'while' statement - if anyone is handling the MoveNext() code manually, they need to call Dispose() to have correct 'foreach' behaviour.
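Here is a sketch of the manual case, with and without the Dispose call. As before, a finally block stands in for the file cleanup, and all names (ManualEnumeration, Lines, CleanedUp and the two StopEarly methods) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class ManualEnumeration
{
    // Set when the iterator's cleanup code has run.
    public static bool CleanedUp;

    public static IEnumerable<string> Lines()
    {
        try
        {
            yield return "one";
            yield return "Bam!";
            yield return "three";
        }
        finally
        {
            CleanedUp = true;   // stand-in for closing the file
        }
    }

    // Stops early and never calls Dispose: the cleanup is skipped.
    public static void StopEarlyWithoutDispose()
    {
        IEnumerator<string> e = Lines().GetEnumerator();
        while (e.MoveNext())
            if (e.Current == "Bam!")
                break;
    }

    // Wrapping the enumerator in using gives the same guarantee foreach does.
    public static void StopEarlyWithUsing()
    {
        using (IEnumerator<string> e = Lines().GetEnumerator())
        {
            while (e.MoveNext())
                if (e.Current == "Bam!")
                    break;
        }
    }

    static void Main()
    {
        StopEarlyWithoutDispose();
        Console.WriteLine(CleanedUp);   // prints: False

        StopEarlyWithUsing();
        Console.WriteLine(CleanedUp);   // prints: True
    }
}
```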

I haven't seen any documentation about this - so if you see some let me know!

Nick said...

I wonder what other atrocities can be committed by taking advantage of foreach's Dispose()-calling behaviour :)

Luke Marshall said...

haha... Here's a dodgy one - an automatic progress bar.

You pass a UI element into the yielding function, and each iteration updates the progress.

You implement a dispose that sets the progress to 100%!
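A rough sketch of how that might look, assuming an Action<int> callback in place of a real UI element. ProgressDemo, WithProgress and report are hypothetical names; the "dispose" here is just a finally block, which the generated Dispose runs.

```csharp
using System;
using System.Collections.Generic;

static class ProgressDemo
{
    // Hypothetical helper: 'report' stands in for updating a progress bar.
    public static IEnumerable<T> WithProgress<T>(IList<T> items, Action<int> report)
    {
        try
        {
            for (int i = 0; i < items.Count; i++)
            {
                report(i * 100 / items.Count);  // percentage completed before this item
                yield return items[i];
            }
        }
        finally
        {
            // Runs on normal completion and when the consumer stops early.
            report(100);
        }
    }

    static void Main()
    {
        var seen = new List<int>();
        string[] items = { "a", "b", "c", "d" };

        foreach (string item in WithProgress<string>(items, seen.Add))
            if (item == "b")
                break;

        Console.WriteLine(string.Join(", ", seen));  // prints: 0, 25, 100
    }
}
```

Whether jumping to 100% on an early exit is honest progress reporting is another matter, hence "dodgy".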


Here's a cleaner trick to cover my dodgy sins:

Yielding asynchronous data (with caching) - the dispose could clean up all the async calls.

Nick said...

Brilliant... and scary...

Luke Marshall said...

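I should also mention that AFAIK all the enumerating LINQ methods call Dispose as well.

e.g. Sum, Aggregate, ToList, etc...

Where, Take, Skip, etc. don't call it up front, as they simply wrap the iterator instead of enumerating the collection. Once something enumerates (and disposes) the wrapper, the Dispose is passed along to the wrapped iterator.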
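A quick way to see how the wrapping operators behave once something does enumerate them. This is only a sketch: LinqDisposeDemo, Source and CleanedUp are invented names, and the finally block stands in for real cleanup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqDisposeDemo
{
    // Set when the source iterator's cleanup code has run.
    public static bool CleanedUp;

    public static IEnumerable<int> Source()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            CleanedUp = true;
        }
    }

    static void Main()
    {
        // Where and Take only wrap the source; nothing has been enumerated yet.
        IEnumerable<int> query = Source().Where(x => x > 1).Take(1);
        Console.WriteLine(CleanedUp);   // prints: False

        // ToList enumerates the wrappers and disposes them, which in turn
        // disposes the source iterator even though it never reached 3.
        List<int> result = query.ToList();
        Console.WriteLine(result[0]);   // prints: 2
        Console.WriteLine(CleanedUp);   // prints: True
    }
}
```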