How to extract data from Microsoft Excel XML Files

Today I want to show how to extract data from Microsoft Excel Files. I needed it myself the time I looked for a solution to create some testdata for my application. Instead of putting the testdata into a database – which would be very hard to handle – I decided to put them into an Excel file.

As you can see the Excel file contains multiple worksheets for Users, Projects, Tasks and more. And every worksheet has a single header row.

excelread.jpg

The key solution to extract data from it is XML. Therefore the Excel file has to be stored in XML. You can also see this in the screenshot, when you look at the header of Microsoft Excel.

Knowing this presumtions, I wrote a method to extract a single worksheet from the excel file. The result is a list of rows that contains the data as a string array.

private List<string[]> ExtractWorksheet(string excelXmlFile, string worksheet, out string[] header)
{
    const string SpreadSheetSchema = "urn:schemas-microsoft-com:office:spreadsheet";
    
    XmlNamespaceManager nsMgr = new XmlNamespaceManager(new NameTable());
    nsMgr.AddNamespace("ss", SpreadSheetSchema);
    
    /*
     * Load XPath Document
     */
    XPathDocument testData = new XPathDocument(excelXmlFile);
    XPathNavigator navigator = testData.CreateNavigator();
    
    /*
     * Select amount of columns
     */
    XPathExpression expression = navigator.Compile(string.Format("/ss:Workbook/ss:Worksheet[@ss:Name='{0}']/*", worksheet));
    expression.SetContext(nsMgr);
    XPathNodeIterator expIterator = navigator.Select(expression);
    int columns = 0;
    while (expIterator.MoveNext())
    if (expIterator.Current.Name == "Table")
    {
        string attrib = expIterator.Current.GetAttribute("ExpandedColumnCount", SpreadSheetSchema);
        columns = int.Parse(attrib);
        break;
    }
    
    /*
     * Select rows
     */
    expression = navigator.Compile(string.Format("/ss:Workbook/ss:Worksheet[@ss:Name='{0}']/ss:Table/*", worksheet));
    expression.SetContext(nsMgr);
    expIterator = navigator.Select(expression);
    List<string[]> rows = new List<string[]>();
    header = null;
    while (expIterator.MoveNext())
    {
        if (expIterator.Current.HasChildren == false)
        continue;
        
        string[] row = new string[columns];
        int currentColumn = 0;
        
        /*
         * Select columns
         */
        XPathNodeIterator colIterator = expIterator.Current.SelectChildren(XPathNodeType.Element);
        while (colIterator.MoveNext())
        row[currentColumn++] = colIterator.Current.Value;
        
        if (header == null)
        header = row;
        else
        rows.Add(row);
    }
    
    return rows;
}

As you can imagine, it’s know easy to read the testdata from the Excel file.

/*
* Create Users
*/
string[] userColumns;
List<string[]> usersSheet = ExtractWorksheet("TestData.xml", "Users", out userColumns);
foreach (string[] userRow in usersSheet)
{
    User user = Factory.UserService.CreateUser(connection, userRow[0]);
    user.UserName = userRow[1];
    user.Rights = (UserRights) Enum.Parse(typeof (UserRights), userRow[2]);
    
    Factory.UserService.SaveUser(connection, user);
}

Hope you enjoyed my explations.
Cheers

- Gerhard

kick it on DotNetKicks.com