Parsing Delicious Export File

With all the brouhaha about Delicious possibly being shut down, I made a backup of my Delicious bookmarks. While I was at it, I wrote a quick utility to parse the export file so I could play with the links and tags.

The key to parsing the file was the HTML Agility Pack, which loads the export's HTML and exposes each link as an HtmlNode. Here is the Bookmark class:

[csharp]
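using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack;

public class Bookmark
{
    public string Title { get; set; }
    public string Href { get; set; }
    public DateTime AddDate { get; set; }
    public string AddDateEpoch { get; set; }
    public List<string> Tags { get; set; }
    public bool IsPrivate { get; set; }

    public Bookmark()
    {
        Tags = new List<string>();
    }

    // Builds a Bookmark from an <a> node in the export file.
    public static Bookmark New(HtmlNode node)
    {
        if (node == null) throw new ArgumentNullException("node");

        // GetAttributeValue returns the supplied default when an attribute is
        // missing, so a link without ADD_DATE, PRIVATE, or TAGS does not throw.
        string addDate = node.GetAttributeValue("add_date", "0");

        var bookmark = new Bookmark
        {
            Title = node.InnerText ?? string.Empty,
            Href = node.GetAttributeValue("href", string.Empty),
            AddDate = FromUnixTime(Convert.ToDouble(addDate, CultureInfo.InvariantCulture)),
            AddDateEpoch = addDate,
            // The private flag lives in its own PRIVATE attribute, not in ADD_DATE.
            IsPrivate = node.GetAttributeValue("private", "0").Equals("1")
        };

        bookmark.Tags.AddRange(GetTags(node.GetAttributeValue("tags", string.Empty)));

        return bookmark;
    }

    // Converts seconds since 1970-01-01 00:00:00 UTC to a DateTime.
    protected static DateTime FromUnixTime(double unixTime)
    {
        DateTime epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return epoch.AddSeconds(unixTime);
    }

    // Splits the comma-separated tag list, trimming whitespace and dropping empty entries.
    protected static string[] GetTags(string tagList)
    {
        if (string.IsNullOrEmpty(tagList)) return new string[] { };

        return tagList.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                      .Select(t => t.Trim())
                      .Where(t => t.Length > 0)
                      .ToArray();
    }
}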
[/csharp]

The only tricky part was the AddDate field: the export stores ADD_DATE as Unix epoch time (seconds since January 1, 1970 UTC), so it has to be converted to a DateTime.
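
As a minimal standalone sketch of that conversion, here it is outside the Bookmark class. The EpochExample class name and the sample timestamp are made up for illustration; they are not taken from the export file.

[csharp]
using System;

public static class EpochExample
{
    // Converts a Unix timestamp (whole seconds since 1970-01-01 00:00:00 UTC)
    // to a UTC DateTime by adding the seconds to the epoch.
    public static DateTime FromUnixTime(long unixSeconds)
    {
        var epoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);
        return epoch.AddSeconds(unixSeconds);
    }
}
[/csharp]

For example, EpochExample.FromUnixTime(1292544000) returns December 17, 2010 at 00:00:00 UTC, because 1292544000 is exactly 14960 days' worth of seconds. On .NET Framework 4.6 or later, DateTimeOffset.FromUnixTimeSeconds does the same conversion out of the box.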

Using the class:
[csharp]
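using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.Load(@"Data\delicious.htm");

// SelectNodes returns null when nothing matches, so fall back to an empty sequence.
IEnumerable<HtmlNode> links = doc.DocumentNode.SelectNodes("//a[@href]")
                              ?? Enumerable.Empty<HtmlNode>();

// ToList materializes the bookmarks once; without it the query below
// would rebuild every Bookmark for each tag.
var bookmarks = links.Select(Bookmark.New).ToList();

var tags = bookmarks.SelectMany(b => b.Tags).Distinct().OrderBy(t => t);

var output = new StringBuilder();

// Write each tag followed by its bookmarks, oldest first.
foreach (var tag in tags)
{
    output.AppendLine(tag);
    string localTag = tag;
    var taggedBookmarks = bookmarks.Where(b => b.Tags.Contains(localTag)).OrderBy(b => b.AddDate);
    foreach (var taggedBookmark in taggedBookmarks)
    {
        output.AppendFormat("\t{0}", taggedBookmark.Title).AppendLine();
    }
}

File.WriteAllText("TaggedBookmarks.txt", output.ToString());
Console.WriteLine(output.ToString());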
[/csharp]
