Tuesday, February 28, 2006

Excel formulas and VBA user defined functions

Excel formulas and User Defined Functions

The conventional wisdom is that Excel formulas are superior to VBA-based User Defined Functions (UDFs) because the former are faster. Once that claim is debunked, the fallback argument of proponents of formulas is that use of UDFs lead to warnings about the file containing macros. This contrarian view not only argues that UDFs are better from a ‘total cost’ perspective but also shows how UDFs, properly written, can be very, very fast.

Of course, we are not discussing simple formulas but rather complex ones such as those used in the examples below.

In this day and age, one should find the claim of performance of a function to be of dubious merit. Granted it was important in the '70s and the '80s to conserve precious hardware resources – and, yes, I have done my share of scrounging – but that is no longer the case. No one in their right mind will believe that saving a CPU cycle or two is important to a computer running at 1.4 Giga – that’s right Giga – hertz! Or that saving a bit of memory here or a byte there has an serious performance benefit on a computer that supports if not 1GB of memory then at least 512MB.

Benefits of an UDF

Relative to an Excel formula, a UDF has three key positive factors, each of which should be paramount to a good software designer: transparency of intent, ease of understandability and maintenance flexibility.

Transparency of intent

What does the following formula do?

=SUMPRODUCT(LEN(C1)-LEN(SUBSTITUTE(C1,{"M";"D";"C";"L";"X";"V";"I"},"")),{1000;500;100;50;10;5;1})+SUMPRODUCT(LEN(C1)-LEN(SUBSTITUTE(C1,{"CM";"CD";"XC";"XL";"IX";"IV";0},"")),{-100;-100;-10;-10;-1;-1;0})

By contrast, the corresponding UDF in a worksheet would look like

=RomanToArabic(C1)

Need I write any more on the transparency of intent?

Ease of understandability

The formula above must be entered as an array formula. There is no way to document that requirement. In fact, there is no way to add any comment about how a formula works, what it is supposed to do, or why it is written the way it is. By contrast a UDF, written in VBA, can be documented ad nauseum.

Ease of maintainability

Suppose one were to discover a bug in a formula or wanted to enhance its capability. This requires finding *every* cell in every worksheet in every workbook that uses this formula and updating it! Miss one – just one – and the integrity of your work is hopelessly compromised. By contrast, with a properly deployed UDF, one must fix only *one* instance of the code. Now, every time a workbook is opened, it will automatically access the updated function!

The UDF and Excel’s macro warning

The next argument of the formula proponents is that if one creates a UDF and leaves it in a workbook that is subsequently opened by someone, Excel will alert the user to the potential risk of malicious code. However, distributing code through workbooks is far from the smart way. One should put all code into an add-in, which is loaded once by the user. Subsequently, workbooks that require a particular function will access the single add-in.

The real problems with formulas and UDFs

Having addressed the ‘conventional wisdom’ of the problems with UDFs let me discuss a few real problems and possible workarounds. The problems are not likely to be fixed without Microsoft making necessary enhancements to Excel. While in theory I know some about what is and is not in the pipeline for Excel’s future let us pretend otherwise, which, in any case, is not far from the truth.

At least two of the arguments against formulas that I mentioned above would disappear if Excel supported arguments in named formulas – what might be called codeless functions. Imagine being able to create a named formula (Insert | Name > Define…) such as

RomanToArabic(x)

=SUMPRODUCT(LEN(x)-LEN(SUBSTITUTE(x,{"M";"D";"C";"L";"X";"V";"I"},"")),{1000;500;100;50;10;5;1})+SUMPRODUCT(LEN(x)-LEN(SUBSTITUTE(x,{"CM";"CD";"XC";"XL";"IX";"IV";0},"")),{-100;-100;-10;-10;-1;-1;0})

Now, one could simply use

=RomanToArabic(C1)

There is a way to create named formulas that use pseudo-arguments but it is far from robust. See the POWER FORMULA TECHNIQUE section in the Excel Experts E-letter Issue 9 (http://j-walk.com/ss/excel/eee/eee009.txt)

On the other hand, the real limitation with a UDF is tied to how Excel maintains the link between the function in a cell and the VBA function in the add-in file. While it might show the function as =RomanToArabic(C1) it actually retains the full path to the file. To see this, unload the add-in (with Tools | Add-Ins…). The formula in the cell containing the function will look like

='C:\temp\Book1.xla'! RomanToArabic (C1)

Now, if the add-in file is moved to another folder and then loaded back in Excel, the program will not recognize this new add-in as the same one that was originally in the C:\Temp folder. The workbook using the function will show all sorts of errors until the user tells Excel that the two files are the same (with Edit | Links…)

The same problem exists if the workbook is moved to another machine where the add-in is located in a folder other than C:\Temp. Once again, the user will have to reconnect the function in the workbook with the function in the add-in through Edit | Links…

Blazing fast UDFs

As discussed above, the typical UDF is not sufficiently slower than an Excel formula to warrant the use of a hard-to-understand-hard-to-maintain formula over a UDF. But, can we write a VBA UDF that is really, really fast? The answer is yes. In some cases, a UDF is simply perceived to be slow. After all, that is the conventional wisdom, isn’t it? If it is written in VBA it has to be slower, doesn’t it? Not necessarily. For an example of a fast UDF see http://groups-beta.google.com/group/microsoft.public.excel.worksheet.functions/browse_thread/thread/26fba3fbc08c5db1/bec0d915203b0f96#bec0d915203b0f96

Further, a UDF designed properly for use as an array function can be blazing fast. See http://www.tushar-mehta.com/excel/newsgroups/rand_selection/vba.html#from_worksheet .

1 Comments:

Anonymous Anonymous said...

Hi One nasty gotcha relating to the location of addins you didnt mention got me real bad recently. I have a worksheet which is used by many people - I have generally been working on this on a network share (under source control). But the remote drive was slow as hell, so I decided to start working on my C drive instead. All seemed well, ans saves were taking a fraction of the time. However when I then copied the sheet to the network drive, it failed horribly on all users machines.

What I think happens is - if when you save the sheet, the addin is on the same drive as the spreadsheet, the reference to the addin is saved without reference to the drive, just the path. If you then copy the sheet to another drive and open it, it tries to open the addin from the same path on the new drive - and fails. If however you save the file to a different drive to the addin, then the full path & drive are preserved and you dont get the same issue.

1:48 PM  

Post a Comment

<< Home