The Little Things
Analyzing Business Data with Excel is a book by one of my favorite publishers that tries to show you how Excel can make your life easier. Throughout the book, you are presented with strategies to help you tackle business problems through the advanced use of spreadsheets. One of the problems discussed in the book is that of "Measuring Quality" (Chapter 5). In that chapter, the Author discusses "Statistical Process Control" techniques to evaluate the quality of a check processing operation. As part of the analysis, he notes on page 108 that:
In Statistical Process Control, it is considered significant if the metric being measured has been on the same side of the average for three consecutive days. This could mean that a trend has started.
Essentially, the Author compares today's metric to the historical average. He
uses a nested IF
that
returns 1 if the metric is above average, -1 if it is below, or 0 if it is equal to the average.
The formula looks like this:
$$code(lang=python) =IF(D19>G19,1,(IF(D19<G19,-1,0))) $$/code
This got me thinking - the use of the IF
function in spreadsheets is
pervasive. Looking at the formula above, though, makes you wonder how the next
person will decipher what the formula achieves. Or even how you will
decipher it 6 months down the road. You might argue (perhaps correctly) that
the above example is not difficult to interpret. But what if you had another
condition that you wanted to evaluate (or even more)? For example, what if you
wanted to evaluate the magnitute with which the metric differed from the
average. In that case, you might want:
Val | Test / Condition |
---|---|
2 | The metric is greater than twice the average |
1 | Greater than the average, but less than or equal to 2x the average |
0 | Equal to the average |
-1 | Less than the average, but greater than 0.5x the average |
-2 | Less than 0.5x the average |
The corresponding spreadsheet function would then be:
$$code(lang=python) =IF(D19>2G19,2,IF(D19>G19,1,IF(D19<0.5G19,-2,IF(D19<G19,-1,0)))) $$/code
That is virtually impenetrable. What if, instead, that same formula could be written as:
$$code(lang=lua) =if (D19 > G19 * 2) then 2 elseif (D19 > G19) then 1 elseif (D19 < G19 * 0.5) then -2 elseif (D19 < G19) then -1 else 0 end $$/code
I'm not going to argue that you won't have to think when reading the above
formula, but I would argue that it is at least an order of magnitude easier to
understand. This demonstrates one of the types of problems we're trying to
solve - making models not only easy to create, but also easy to understand (and
therefore scale). If you prefer the "old" way of doing things, you have the
ability to use a "spreadsheet like" IF
function in our software, except it's
called IFTHEN
(e.g., =IFTHEN(1=1,"yes","no")
). If you prefer the "more
natural" syntax noted above, use it (e.g., =if 1=1 then "yes" else "no" end
).
You can decide on a case by case basis which method you prefer. I would
recommend that if you want to look at more than 1 condition you use the "new
style" IF
(as I expect it will always be easier to read as compared to nested
IF
s) and choose for yourself otherwise.
One quick final note along a similar line is on the use of AND
and OR
functions. In most spreadsheets, you get something like:
$$code(lang=python) =IF(AND(A1=10,A2=30), 1, 2) $$/code
While not terribly difficult to read or understand, it does not appear to be optimal. In our software you have the ability to write:
$$code(lang=lua) =IFTHEN(A1=10 and A2=30, 1, 2) $$/code
or even:
$$code(lang=lua) =if (A1=10 and A2=30) then 1 else 2 end $$/code
which I find to be the most readable. OR
can be used the same way. Anyway,
I hope you enjoyed the post. If you haven't signed up for our beta you can do
so from our home page. If you want to contact us, we'd
love to hear from you at feedback@gridmule.com.