Monday, October 31, 2011

FileSystemWatcher With the BlockingCollection

While working with the FileSystemWatcher I found that if too many files were created at once, the built-in buffer would overflow and files would be skipped.  After much research I found out about the Producer-Consumer Problem.  Then I found that .NET 4 has the BlockingCollection, which helps solve the issue.  But how do you use it with the FileSystemWatcher?

On StackOverflow I found "Making PLINQ and BlockingCollection work together".  I'm not so interested in the PLINQ issue, but this is a great example of using the BlockingCollection with the FileSystemWatcher.

[csharp]
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;

namespace ConsoleApplication4
{
    public class Program
    {
        private const string Folder = "C:\\Temp\\InputData";

        static void Main(string[] args) {

            var cts = new CancellationTokenSource();
            foreach (var obj in Input(cts.Token))
                Console.WriteLine(obj);
        }

        public static IEnumerable<object> Input(CancellationToken cancellationToken) {
            var fileList = new BlockingCollection<string>();

            var watcher = new FileSystemWatcher(Folder);
            watcher.Created += (source, e) => {
                if (cancellationToken.IsCancellationRequested)
                    watcher.EnableRaisingEvents = false;
                else if (Path.GetFileName(e.FullPath) == "STOP") {
                    watcher.EnableRaisingEvents = false;
                    fileList.CompleteAdding();
                    File.Delete(e.FullPath);
                } else
                    fileList.Add(e.FullPath);
            };
            watcher.EnableRaisingEvents = true;

            return from file in
                       fileList
                            .GetConsumingEnumerable(cancellationToken)
                            .AsParallel()
                            .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                            .WithCancellation(cancellationToken)
                            .WithDegreeOfParallelism(5)
                   let obj = CreateMyObject(file)
                   select obj;
        }

        private static object CreateMyObject(string file) {
            return file;
        }
    }
}
[/csharp]
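
Stripped of the FileSystemWatcher and PLINQ parts, the producer-consumer handoff in the listing above comes down to three calls: Add, CompleteAdding and GetConsumingEnumerable.  The following is only a minimal sketch of that handoff; the ProducerConsumerSketch class, the Run method and the file names are made up for illustration, and the background task simply stands in for the watcher's Created events.

[csharp]
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerSketch
{
    // Hypothetical helper: pushes the given names through a BlockingCollection
    // and returns them in the order the consumer received them.
    public static List<string> Run(IEnumerable<string> names)
    {
        var queue = new BlockingCollection<string>();
        var consumed = new List<string>();

        // Producer: stands in for the FileSystemWatcher.Created handler.
        var producer = Task.Factory.StartNew(() =>
        {
            foreach (var name in names)
                queue.Add(name);

            // Tells the consumer that no more items are coming.
            queue.CompleteAdding();
        });

        // Consumer: blocks while the queue is empty and finishes once
        // CompleteAdding has been called and the queue is drained.
        foreach (var name in queue.GetConsumingEnumerable())
            consumed.Add(name);

        producer.Wait();
        return consumed;
    }

    public static void Main()
    {
        foreach (var name in Run(new[] { "a.txt", "b.txt", "c.txt" }))
            Console.WriteLine(name);
    }
}
[/csharp]

With a single producer and a single consumer the items come out in the order they went in, because the BlockingCollection wraps a ConcurrentQueue by default.  The sketch does not handle a producer that fails before calling CompleteAdding; in that case the consumer would wait forever.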

Wednesday, September 28, 2011

Running Totals in Excel

Excel does not have a running total feature or function built in.  All the examples I found on the web for running totals included VBA code.  Not that I have anything against VBA, but I thought there should be a way to do running totals with built-in worksheet functions.

Enter one of our favorite functions: OFFSET().  But first, what is a running total?

Running Total Example

A running total is when you have a list of Values and you want the total of the current Value and all the previous Values.  Wikipedia states that a running total is the "summation of a sequence of numbers which is updated each time a new number is added to the sequence, simply by adding the value of the new number to the running total."

The key to getting the SUM() correct is getting the Range correct.  For a given Range of Values, start with the First number and SUM() until you get to the current row.  You can do this by using the OFFSET() function and taking advantage of Excel's table features to get the column range.

[vb] OFFSET ( cell reference, rows, columns, [ height ], [ width ] ) [/vb]

In the above case the Running Total column's formula becomes:

[vb] =SUM( OFFSET( [Values], 0, 0, ROW() - ROW([Values]) + 1, 1 ) ) [/vb]

[Values] is the Column we want the running total for.

rows = 0 and columns = 0 because we want to start at the very first cell of [Values]

[width] = 1 because we want only the [Values] column

[height] = ROW() - ROW([Values]) + 1, this is the magic line.

To get the height we have to figure out our current Row number, subtract off the starting Row of [Values] then add 1.  ROW([Values]) gives us the starting row of the column and ROW() gives us the current row.  For example, if the Table starts on row 3 (headers are on row 3) then the column [Values] starts on row 4.  The height of the very first cell in [Values] is:

[vb] ROW() - ROW([Values]) + 1 = 4 - 4 + 1 = 1 [/vb]

The height of the 3rd cell in the [Values] column is:

[vb] ROW() - ROW([Values]) + 1 = 6 - 4 + 1 = 3 [/vb]
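
For readers who think in code rather than formulas, here is a rough C# equivalent of the same idea.  It is only a sketch: the RunningTotalSketch class, the firstRow parameter and the sample values are assumptions for illustration, not anything Excel exposes.  Each result sums a range that starts at the first value and whose height is currentRow - firstRow + 1, just as the OFFSET() formula does.

[csharp]
using System;
using System.Linq;

public static class RunningTotalSketch
{
    // values:   the cells of the [Values] column, top to bottom.
    // firstRow: worksheet row of the first value (4 in the example above).
    public static int[] RunningTotals(int[] values, int firstRow)
    {
        var totals = new int[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            int currentRow = firstRow + i;            // what ROW() returns
            int height = currentRow - firstRow + 1;   // ROW() - ROW([Values]) + 1
            totals[i] = values.Take(height).Sum();    // SUM(OFFSET([Values], 0, 0, height, 1))
        }
        return totals;
    }

    public static void Main()
    {
        var totals = RunningTotals(new[] { 10, 20, 30 }, 4);
        // prints 10, 30, 60
        Console.WriteLine(string.Join(", ", totals.Select(t => t.ToString())));
    }
}
[/csharp]

Like the worksheet formula, this re-sums from the top for every row.  That mirrors the formula rather than being the fastest way to do it in code.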

Offset Function in Excel

Running Total

Tuesday, January 18, 2011

Scratching Parallel with StopWatch

Threw together a quick parallel stopwatch test. Not sure if the times prove anything.

[csharp highlight="28,33,39"]
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

namespace Scratch.ParallelProcessing
{
    class Program
    {
        static void Main(string[] args)
        {
            const int count = 10000000;
            var source1 = Enumerable.Range(0, count).ToArray();
            var source2 = Enumerable.Range(0, count).ToArray();
            var source3 = Enumerable.Range(0, count).ToArray();

            Stopwatch stopwatch = new Stopwatch();

            var parallelElapsedTimes = new List<long>();
            var linearElapsedTimes = new List<long>();
            var linqSelectElapsedTimes = new List<long>();

            for (int i = 0; i < 10; i++)
            {
                // Parallel.ForEach across the available cores.
                stopwatch.Restart();
                Parallel.ForEach(source1, s => s %= 2);
                parallelElapsedTimes.Add(stopwatch.ElapsedMilliseconds);

                // Plain foreach loop on a single thread.
                stopwatch.Restart();
                LinearAction(source2, s => s %= 2);
                linearElapsedTimes.Add(stopwatch.ElapsedMilliseconds);

                // Array.ForEach is not a LINQ operator; the "Linq" label is kept
                // so the report matches the results listed below.
                stopwatch.Restart();
                Array.ForEach(source3, s => s = s % 2);
                linqSelectElapsedTimes.Add(stopwatch.ElapsedMilliseconds);
            }

            // ElapsedMilliseconds is the total elapsed time, unlike TimeSpan.Milliseconds,
            // which is only the 0-999 component and wraps for runs longer than a second.
            Console.WriteLine("Elapsed Time\t\tMin\t\tMax\t\t\tAvg");
            Console.WriteLine("============\t\t===\t\t===\t\t\t===");
            Console.WriteLine("{0}\t\t{1}\t\t{2}\t\t\t{3}", "Parallel", parallelElapsedTimes.Min(), parallelElapsedTimes.Max(), parallelElapsedTimes.Average());
            Console.WriteLine("{0}\t\t\t{1}\t\t{2}\t\t\t{3}", "Linear", linearElapsedTimes.Min(), linearElapsedTimes.Max(), linearElapsedTimes.Average());
            Console.WriteLine("{0}\t\t\t{1}\t\t{2}\t\t\t{3}", "Linq", linqSelectElapsedTimes.Min(), linqSelectElapsedTimes.Max(), linqSelectElapsedTimes.Average());
        }

        public static void LinearAction<T>(IEnumerable<T> source, Action<T> action)
        {
            foreach (var s in source) action(s);
        }
    }
}
[/csharp]

Results of the timer:

Elapsed Time    Min    Max    Avg
============    ===    ===    ===
Parallel        63     191    79.5
Linear          138    143    140.3
Linq            54     56     54.5
Press any key to continue . . .


I'm running 64-bit Vista on an Intel Core2 Duo with 4GB RAM. The Parallel timing seems to be inconsistent, and depends a lot on whether or not it grabs that second CPU.

Monday, January 10, 2011

Simple MapReduce

Open a file, read in the lines, return the individual words, get the length of each word, order by the length of the words, and count the words of each length.

[csharp]
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main()
    {
        // Map: each word to its length. Reduce: group by length and count.
        var counts = OpenFileReturnWords(@"LoremIpsumDolor.txt")
            .AsParallel().Select(w => w.Length)
            .ToLookup(k => k)
            .Select(c => new { Number = c.Key, CountOfNumber = c.Count() })
            .OrderBy(c => c.Number);

        foreach (var count in counts)
            Console.WriteLine("Count of {0:0000}: {1}", count.Number, count.CountOfNumber);

        Console.WriteLine("Total Count: {0}", counts.Sum(c => c.CountOfNumber));
    }

    // Streams the file line by line and yields each non-empty word.
    public static IEnumerable<string> OpenFileReturnWords(string fileName)
    {
        using (var reader = new StreamReader(fileName))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                var wordsInLine = line.Split(new[] { ' ', '.' })
                    .Where(word => !string.IsNullOrEmpty(word));

                foreach (var word in wordsInLine)
                    yield return word;
            }
        }
    }
}
[/csharp]

Tuesday, January 04, 2011

String Joins

I've seen a lot of code that generates SQL statements. Invariably the programmer has an array of strings that they loop through (for example, to put into an IN clause) and they always have a check to see if the current item is the first or last in the list. The typical approach is a StringBuilder plus an if statement that decides whether an extra comma (or plus sign or whatever) is added or left out.

I say: Stop Doing That!

Use string.Join().

[csharp]

var strings = new[] { "Darren", "Dawn", "Thomas", "Zoey" };

var results = string.Format("Replace \"{0}\" with {1} Question Marks: ({2})",
string.Join(",", strings), strings.Length,
string.Join(",", Enumerable.Repeat("?", strings.Length))
);

Console.WriteLine(results);

[/csharp]

The resulting output is:

Replace "Darren,Dawn,Thomas,Zoey" with 4 Question Marks: (?,?,?,?)
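
Applied to the IN clause case mentioned above, the same call replaces the whole loop and the first/last check.  This is a sketch under assumptions: the People table, the Name column and the @p0-style parameter names are hypothetical, and the values would still need to be bound to those parameters through whatever data provider you use.

[csharp]
using System;
using System.Linq;

public static class InClauseSketch
{
    // Builds a query with one numbered placeholder per value: IN (@p0,@p1,...).
    public static string BuildQuery(string[] names)
    {
        var placeholders = names.Select((name, index) => "@p" + index);
        return "SELECT * FROM People WHERE Name IN ("
            + string.Join(",", placeholders) + ")";
    }

    public static void Main()
    {
        var names = new[] { "Darren", "Dawn", "Thomas", "Zoey" };
        // prints SELECT * FROM People WHERE Name IN (@p0,@p1,@p2,@p3)
        Console.WriteLine(BuildQuery(names));
    }
}
[/csharp]

An empty array would produce IN (), which is not valid SQL, so a real caller would want to check for that case first.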