
Parsing Delicious Export File

With all the brouhaha about Delicious being shut down, I made a backup of my Delicious bookmarks. While I was at it, I wrote a quick utility to parse the export file so I could play with the links and tags.

The key to parsing the file was the HTML Agility Pack. Each link in the export becomes an instance of the Bookmark class:

[csharp]
public class Bookmark
{
    public string Title { get; set; }
    public string Href { get; set; }
    public DateTime AddDate { get; set; }
    public string AddDateEpoch { get; set; }
    public List<string> Tags { get; set; }
    public bool IsPrivate { get; set; }

    public Bookmark()
    {
        Tags = new List<string>();
    }

    // Builds a Bookmark from one <a> element of the export file.
    public static Bookmark New(HtmlNode node)
    {
        if (node == null) throw new ArgumentNullException("node");

        // GetAttributeValue returns the supplied default when the attribute
        // is missing, so an incomplete link does not throw.
        var epoch = node.GetAttributeValue("ADD_DATE", "0");

        var bookmark = new Bookmark
        {
            Title = node.InnerText ?? string.Empty,
            Href = node.GetAttributeValue("href", string.Empty),
            AddDateEpoch = epoch,
            AddDate = FromUnixTime(Convert.ToDouble(epoch)),
            // Private bookmarks carry PRIVATE="1" in the export.
            IsPrivate = node.GetAttributeValue("PRIVATE", "0").Equals("1")
        };

        bookmark.Tags.AddRange(GetTags(node.GetAttributeValue("tags", string.Empty)));

        return bookmark;
    }

    // Converts seconds since 1970-01-01 00:00:00 UTC to a DateTime.
    protected static DateTime FromUnixTime(double unixTime)
    {
        DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return epoch.AddSeconds(unixTime);
    }

    // Splits the comma-separated tag list, dropping empty entries.
    protected static string[] GetTags(string tagList)
    {
        if (string.IsNullOrEmpty(tagList)) return new string[] { };

        return tagList.Trim().Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
[/csharp]

The only tricky part was the AddDate field, which the export stores as Unix epoch time (seconds since January 1, 1970 UTC) and which has to be converted to a DateTime.
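To make the epoch conversion concrete, here is a minimal standalone sketch. The ADD_DATE value and the EpochDemo class are made up for illustration. It also shows DateTimeOffset.FromUnixTimeSeconds, which does the same job on .NET Framework 4.6 or later and could stand in for the hand-rolled helper if that version is available.

[csharp]
using System;

public static class EpochDemo
{
    // Manual conversion: add the seconds to the Unix epoch (UTC).
    public static DateTime ToUtc(string addDate)
    {
        long seconds = long.Parse(addDate);
        var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return epoch.AddSeconds(seconds);
    }

    public static void Main()
    {
        // Hypothetical ADD_DATE value, as it might appear in the export file.
        string addDate = "1292889600";

        DateTime manual = ToUtc(addDate);

        // Built-in equivalent, available from .NET Framework 4.6 onward.
        DateTime builtIn = DateTimeOffset
            .FromUnixTimeSeconds(long.Parse(addDate))
            .UtcDateTime;

        Console.WriteLine(manual.ToString("u")); // 2010-12-21 00:00:00Z
        Console.WriteLine(manual == builtIn);    // True
    }
}
[/csharp]

Both results are in UTC; call ToLocalTime() on them if the dates should be shown in the local time zone.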

Using the class:
[csharp]
var doc = new HtmlDocument();
doc.Load(@"Data\delicious.htm");

// SelectNodes returns null when nothing matches. ToList keeps the parsed
// bookmarks in memory so they are not rebuilt for every tag in the loop.
var links = doc.DocumentNode.SelectNodes("//a[@href]");
var bookmarks = links == null
    ? new List<Bookmark>()
    : links.Select(Bookmark.New).ToList();

var tags = bookmarks.SelectMany(b => b.Tags).Distinct().OrderBy(t => t);

var output = new StringBuilder();

// Write each tag followed by the titles of its bookmarks, oldest first.
foreach (var tag in tags)
{
    output.AppendLine(tag);
    string localTag = tag;
    var taggedBookmarks = bookmarks.Where(b => b.Tags.Contains(localTag)).OrderBy(b => b.AddDate);
    foreach (var taggedBookmark in taggedBookmarks)
    {
        output.AppendFormat("\t{0}", taggedBookmark.Title).AppendLine();
    }
}

File.WriteAllText("TaggedBookmarks.txt", output.ToString());
Console.WriteLine(output.ToString());
[/csharp]
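Once the bookmarks are loaded, other questions about the tags are a LINQ query away. As a rough sketch, here is one way to count how many bookmarks use each tag. The TagCountDemo class and the sample tag lists are hypothetical; in the utility, the input would be the Tags list of each parsed bookmark.

[csharp]
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagCountDemo
{
    // Counts how many tag lists contain each tag, most used first,
    // with ties broken alphabetically.
    public static List<KeyValuePair<string, int>> CountTags(
        IEnumerable<IEnumerable<string>> tagLists)
    {
        return tagLists
            .SelectMany(tags => tags.Distinct()) // count a tag once per bookmark
            .GroupBy(tag => tag)
            .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()))
            .OrderByDescending(p => p.Value)
            .ThenBy(p => p.Key)
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample data standing in for bookmarks.Select(b => b.Tags).
        var sample = new List<List<string>>
        {
            new List<string> { "csharp", "linq" },
            new List<string> { "csharp", "html" },
            new List<string> { "csharp" }
        };

        foreach (var pair in CountTags(sample))
        {
            Console.WriteLine("{0}\t{1}", pair.Key, pair.Value);
        }
    }
}
[/csharp]

With the sample data above, csharp is listed first with a count of 3, followed by html and linq with 1 each.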
