Excel Doesn’t Always Excel

While working on one of the change requests for one of our web products, I came across a very peculiar issue with importing Excel data into the system. The usual way of uploading data from Excel spreadsheets is that you browse to the folder where the Excel file is located, select it then click some sort of “Upload” button and more often than not you would want to include all the header columns as well so you can map the right data with the right type. This process works just about fine in most cases. In this particular case however, a different approach to upload Excel data was adopted, that of copy and paste.

The way it works is, all the data is selected in the Excel spreadsheet and copied to clipboard (through usual Ctrl+C), then a button “Paste Data” in the system is clicked which
actually uses javascript behind the scenes to run Windows’ “paste” command to actually paste it into a hidden text box. From there its parsed out into individual rows and columns mapping with the right data type using C#’s Split() function, splitting by a delimiter in this case a tab “\t”.

The problem was: If, say, the excel spreadsheet had 20 columns and 10 rows and the last ‘n’ number of columns did not contain any data in all those 10 rows, those columns did
not get copied or more correctly, pasted. It turned out the way the copy-paste was internally working was to seek ahead and omit any last ‘n’ empty columns if all the rows
starting from the first data row, had those columns empty. This resulted in IndexOutOfRange exception because the indices that were being accessed in the parsing code did not actually exist because they were not copied/pasted. The code does a hard-coded parsing so it expected a particular column to be at a particular place, for e.g. column 5 is “Customer Name” then the array would access the index 4 to get the data corresponding to that column. So if that column was empty and omitted, array[4] would result in IOR exception. The peculiarity was, the first data row was read all the way till the end correctly even if the last few columns were empty, but any blank columns towards the end corresponding to the subsequent rows were omitted. The figure below shows the empty columns boxed in red which would be omitted during copy and paste operations:


The problem was made tricky because it wasn’t a code fault neither a data fault, so something between copying the data and pasting it went wrong. Fixing the Windows’ “Paste” command was a bit out of jurisdiction, so I started looking at the solution from outside-in. I observed, that no matter what, the headers were being read correctly and hence the string array containing the header values would be a correct indicator of how many columns are there. After that, when the data row was parsed out into another array, that count would show how many columns were actually copied, taking the difference of these two numbers would give the deficit and if I can append blank tab-delimited strings for the number equal to the deficit columns to the data array, I would have made both these arrays the same size and hence solved the problem.

The code below shows how I achieved this:

private string AbraKaDabra(string a_data, int a_headercount)
{
    string[] l_splits = a_data.Split(Helpers.Constants.CHR_TAB_DELIMITER);
    int l_howmanycols;
    //append only if there is a mismatch in the number of columns/items
    if (l_splits.Length < a_headercount)
    {
        l_howmanycols = a_headercount - l_splits.Length;
        for (int i = 0; i < l_howmanycols; i++)
        {
            a_data += Helpers.Constants.CHR_TAB_DELIMITER;
        }
    }

    return a_data;
}

Calling this function prior to saving the parsed data to the DB, fixed any omitted columns problem for good. Seems like a simple problem but it baffled me and my other team mate for a good couple of days. Most of the times, hacking pays off. 🙂

EDIT: The original code never failed because the data excel files uploaded before always had data in all columns or so I assume or may be the original developer had a data work around for the issue. Obviously, that wasn’t the best practice as has been made clear in this case.

One Reply to “Excel Doesn’t Always Excel”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.