Tuesday, February 28, 2006

Excel formulas and VBA user defined functions

Excel formulas and User Defined Functions

The conventional wisdom is that Excel formulas are superior to VBA-based User Defined Functions (UDFs) because the former are faster. Once that claim is debunked, the fallback argument of proponents of formulas is that use of UDFs lead to warnings about the file containing macros. This contrarian view not only argues that UDFs are better from a ‘total cost’ perspective but also shows how UDFs, properly written, can be very, very fast.

Of course, we are not discussing simple formulas but rather complex ones such as those used in the examples below.

In this day and age, one should find the claim of performance of a function to be of dubious merit. Granted it was important in the '70s and the '80s to conserve precious hardware resources – and, yes, I have done my share of scrounging – but that is no longer the case. No one in their right mind will believe that saving a CPU cycle or two is important to a computer running at 1.4 Giga – that’s right Giga – hertz! Or that saving a bit of memory here or a byte there has an serious performance benefit on a computer that supports if not 1GB of memory then at least 512MB.

Benefits of an UDF

Relative to an Excel formula, a UDF has three key positive factors, each of which should be paramount to a good software designer: transparency of intent, ease of understandability and maintenance flexibility.

Transparency of intent

What does the following formula do?

=SUMPRODUCT(LEN(C1)-LEN(SUBSTITUTE(C1,{"M";"D";"C";"L";"X";"V";"I"},"")),{1000;500;100;50;10;5;1})+SUMPRODUCT(LEN(C1)-LEN(SUBSTITUTE(C1,{"CM";"CD";"XC";"XL";"IX";"IV";0},"")),{-100;-100;-10;-10;-1;-1;0})

By contrast, the corresponding UDF in a worksheet would look like

=RomanToArabic(C1)

Need I write any more on the transparency of intent?

Ease of understandability

The formula above must be entered as an array formula. There is no way to document that requirement. In fact, there is no way to add any comment about how a formula works, what it is supposed to do, or why it is written the way it is. By contrast a UDF, written in VBA, can be documented ad nauseum.

Ease of maintainability

Suppose one were to discover a bug in a formula or wanted to enhance its capability. This requires finding *every* cell in every worksheet in every workbook that uses this formula and updating it! Miss one – just one – and the integrity of your work is hopelessly compromised. By contrast, with a properly deployed UDF, one must fix only *one* instance of the code. Now, every time a workbook is opened, it will automatically access the updated function!

The UDF and Excel’s macro warning

The next argument of the formula proponents is that if one creates a UDF and leaves it in a workbook that is subsequently opened by someone, Excel will alert the user to the potential risk of malicious code. However, distributing code through workbooks is far from the smart way. One should put all code into an add-in, which is loaded once by the user. Subsequently, workbooks that require a particular function will access the single add-in.

The real problems with formulas and UDFs

Having addressed the ‘conventional wisdom’ of the problems with UDFs let me discuss a few real problems and possible workarounds. The problems are not likely to be fixed without Microsoft making necessary enhancements to Excel. While in theory I know some about what is and is not in the pipeline for Excel’s future let us pretend otherwise, which, in any case, is not far from the truth.

At least two of the arguments against formulas that I mentioned above would disappear if Excel supported arguments in named formulas – what might be called codeless functions. Imagine being able to create a named formula (Insert | Name > Define…) such as

RomanToArabic(x)

=SUMPRODUCT(LEN(x)-LEN(SUBSTITUTE(x,{"M";"D";"C";"L";"X";"V";"I"},"")),{1000;500;100;50;10;5;1})+SUMPRODUCT(LEN(x)-LEN(SUBSTITUTE(x,{"CM";"CD";"XC";"XL";"IX";"IV";0},"")),{-100;-100;-10;-10;-1;-1;0})

Now, one could simply use

=RomanToArabic(C1)

There is a way to create named formulas that use pseudo-arguments but it is far from robust. See the POWER FORMULA TECHNIQUE section in the Excel Experts E-letter Issue 9 (http://j-walk.com/ss/excel/eee/eee009.txt)

On the other hand, the real limitation with a UDF is tied to how Excel maintains the link between the function in a cell and the VBA function in the add-in file. While it might show the function as =RomanToArabic(C1) it actually retains the full path to the file. To see this, unload the add-in (with Tools | Add-Ins…). The formula in the cell containing the function will look like

='C:\temp\Book1.xla'! RomanToArabic (C1)

Now, if the add-in file is moved to another folder and then loaded back in Excel, the program will not recognize this new add-in as the same one that was originally in the C:\Temp folder. The workbook using the function will show all sorts of errors until the user tells Excel that the two files are the same (with Edit | Links…)

The same problem exists if the workbook is moved to another machine where the add-in is located in a folder other than C:\Temp. Once again, the user will have to reconnect the function in the workbook with the function in the add-in through Edit | Links…

Blazing fast UDFs

As discussed above, the typical UDF is not sufficiently slower than an Excel formula to warrant the use of a hard-to-understand-hard-to-maintain formula over a UDF. But, can we write a VBA UDF that is really, really fast? The answer is yes. In some cases, a UDF is simply perceived to be slow. After all, that is the conventional wisdom, isn’t it? If it is written in VBA it has to be slower, doesn’t it? Not necessarily. For an example of a fast UDF see http://groups-beta.google.com/group/microsoft.public.excel.worksheet.functions/browse_thread/thread/26fba3fbc08c5db1/bec0d915203b0f96#bec0d915203b0f96

Further, a UDF designed properly for use as an array function can be blazing fast. See http://www.tushar-mehta.com/excel/newsgroups/rand_selection/vba.html#from_worksheet .

An Introduction

This blog will contain posts that run contrary to the "conventional wisdom" of software development. These ideas may be high-level as in pie in the sky project ideas or micro as in human vs. machine efficiency but irrespective of where they fit in the scheme of programming or development or management, they won't fit the norm.

Over time I hope to share ideas that are meant to provoke. I invite discussion and disagreement and hope we keep it at a professional level.

Do keep in mind that my guiding philosophy for software development is this. The system -- at the macro level and at the code at micro level -- must be reliable, robust, easy to understand, transparent in intent, and facilitate maintenance. The reader, especially the programmer, will most likely be tempted to add "and efficient." So, in a sense this introduction serves as the first contrarian post.

A system must be efficient only to the extent that it meets the demands placed on it. Keep in mind that in most software suites 80-90% of execution time is spent in 10-15% of the code. So, why focus efficiency concerns elsewhere? One also finds that micro-improvements are almost never great performance enhancers. Finding a better way to tackle the problem will lead to improvements a magnitude better.