Today I want to show how to extract data from Microsoft Excel Files. I needed it myself the time I looked for a solution to create some testdata for my application. Instead of putting the testdata into a database – which would be very hard to handle – I decided to put them into an Excel file.
As you can see the Excel file contains multiple worksheets for Users, Projects, Tasks and more. And every worksheet has a single header row.

The key solution to extract data from it is XML. Therefore the Excel file has to be stored in XML. You can also see this in the screenshot, when you look at the header of Microsoft Excel.
Knowing this presumtions, I wrote a method to extract a single worksheet from the excel file. The result is a list of rows that contains the data as a string array.
private List<string[]> ExtractWorksheet(string excelXmlFile, string worksheet, out string[] header)
{
const string SpreadSheetSchema = "urn:schemas-microsoft-com:office:spreadsheet";
XmlNamespaceManager nsMgr = new XmlNamespaceManager(new NameTable());
nsMgr.AddNamespace("ss", SpreadSheetSchema);
/*
* Load XPath Document
*/
XPathDocument testData = new XPathDocument(excelXmlFile);
XPathNavigator navigator = testData.CreateNavigator();
/*
* Select amount of columns
*/
XPathExpression expression = navigator.Compile(string.Format("/ss:Workbook/ss:Worksheet[@ss:Name='{0}']/*", worksheet));
expression.SetContext(nsMgr);
XPathNodeIterator expIterator = navigator.Select(expression);
int columns = 0;
while (expIterator.MoveNext())
if (expIterator.Current.Name == "Table")
{
string attrib = expIterator.Current.GetAttribute("ExpandedColumnCount", SpreadSheetSchema);
columns = int.Parse(attrib);
break;
}
/*
* Select rows
*/
expression = navigator.Compile(string.Format("/ss:Workbook/ss:Worksheet[@ss:Name='{0}']/ss:Table/*", worksheet));
expression.SetContext(nsMgr);
expIterator = navigator.Select(expression);
List<string[]> rows = new List<string[]>();
header = null;
while (expIterator.MoveNext())
{
if (expIterator.Current.HasChildren == false)
continue;
string[] row = new string[columns];
int currentColumn = 0;
/*
* Select columns
*/
XPathNodeIterator colIterator = expIterator.Current.SelectChildren(XPathNodeType.Element);
while (colIterator.MoveNext())
row[currentColumn++] = colIterator.Current.Value;
if (header == null)
header = row;
else
rows.Add(row);
}
return rows;
}
As you can imagine, it’s know easy to read the testdata from the Excel file.
/*
* Create Users
*/
string[] userColumns;
List<string[]> usersSheet = ExtractWorksheet("TestData.xml", "Users", out userColumns);
foreach (string[] userRow in usersSheet)
{
User user = Factory.UserService.CreateUser(connection, userRow[0]);
user.UserName = userRow[1];
user.Rights = (UserRights) Enum.Parse(typeof (UserRights), userRow[2]);
Factory.UserService.SaveUser(connection, user);
}
Hope you enjoyed my explations.
Cheers
- Gerhard
October 15, 2010 at 7:43 pm
Hey this is a great example and a very powerful tool thanks.
However, the Xml/excel file I was given did not have a ExpandedColumnCount attritbute. As a result the columns count is a 0, and then very bad things happen. What do I do if ExpandedColumnCount is not in the xml file?
October 24, 2010 at 7:52 pm
Hi Jonathan,
sry. that I respond that late, but I checked some excel files and exported it to XML. But in all that files, the ExpandedColumnCount attribute did exist.
Could it be, that you’re using another version of Excel that I do? I used Excel 2007 in order to do the XML export.
January 21, 2011 at 6:01 pm
alas I didn’t generate the xml file, it was handed to me.
however I have a nice solution. compare my version of extracworksheet vs yours:
/*********************************************/
private List ExtractWorksheet(string excelXmlFile, string worksheet, out string[] header)
{
const string SpreadSheetSchema = “urn:schemas-microsoft-com:office:spreadsheet”;
XmlNamespaceManager nsMgr = new XmlNamespaceManager(new NameTable());
nsMgr.AddNamespace(“ss”, SpreadSheetSchema);
/*
* Load XPath Document
*/
XPathDocument testData = new XPathDocument(excelXmlFile);
XPathNavigator navigator = testData.CreateNavigator();
/*
* Select amount of columns
*/
XPathExpression expression = navigator.Compile(string.Format(
“/ss:Workbook/ss:Worksheet[@ss:Name='{0}']/*”, worksheet));
expression.SetContext(nsMgr);
XPathNodeIterator expIterator = navigator.Select(expression);
//int columns = 0;
while (expIterator.MoveNext())
if (expIterator.Current.Name == “Table”) {
string attrib = expIterator.Current.GetAttribute(“ExpandedColumnCount”, SpreadSheetSchema);
//had to add this attribute to the data. not good!!!!
//columns = int.Parse(attrib);
break;
}//then
/*
* Select rows
*/
expression = navigator.Compile(string.Format(
“/ss:Workbook/ss:Worksheet[@ss:Name='{0}']/ss:Table/*”, worksheet));
expression.SetContext(nsMgr);
expIterator = navigator.Select(expression);
List rows = new List();
header = null;
while (expIterator.MoveNext()) {
if (expIterator.Current.HasChildren == false)
continue;
string[] row;
List row_list_form =new List();
//int currentColumn = 0;
/*
* Select columns
*/
XPathNodeIterator colIterator = expIterator.Current.SelectChildren(XPathNodeType.Element);
while (colIterator.MoveNext()) {
//row[currentColumn++] = colIterator.Current.Value;
row_list_form.Add(colIterator.Current.Value);
}//while
row = row_list_form.ToArray();
if (header == null)
header = row;
else
rows.Add(row);
}//while
return rows;
}//extractworksheet
/*****************************************/
/*****
Note the use of the List variable “row_list_form” it allows the data to be loaded dynamically so I don’t have to pre-declare the size. Afther is all done, I convert it back to a normal array to work with the rest of the program.
******/
Now as a follow-up, I now have an issue where I don’t know the name of the worksheet. How can I get a list of all the worksheets of an xml document???