July 08 2009

Spreadsheet Skills: Working With Multiple Criteria

Tagged Under :

This month we look at another common modelling problem: summarising data that meets multiple criteria. By Liam Bastick, associate director of BPM Analytical Empowerment.

Query

I want to summarise the sales made by my business that meet various criteria (for example, sales for a particular region for a particular product). SUMIF only works with one criterion. Due to end user requirements, I don’t wish to use pivot tables. Is there any way I can obtain the analysis by using a formula?

Advice

This is an example of conditional summing – where you want to add numerical values provided that they meet certain criteria.

For example – imagine you were reviewing this data summary:

Example ‘dummy’ database

The function SUMIF(range,criterion,sum_range) is ideal for summing data based on one criterion:

  • range is the array that you wanted evaluated by the criterion (cells F12:F21 in this instance)
  • criterion in the form of a number, expression, or text defines which cell(s) will be added, for example, “X”, 1, G26 or “<>”&G27 (this last one means ‘not equal to the value in cell G27’)
  • sum_range are the actual cells to be added if their corresponding cells in range match the criterion.

So, to find the sales for Business Unit 1 in the above example, we would type =SUMIF(F12:F21,1,H12:H21) (which is $1,000). Or to find the total sales of Product X we might type =SUMIF(G12:G21,”X”,H12:H21) which is $1,200 (note that the text must be in inverted commas).

This month’s problem: how would you find the total sales of Product Z in Business Unit 1 using a formula?

That’s two criteria and SUMIF will not work with multiple conditions. Fortunately, there are several alternatives.

For once, I am going to stick with Excel 2003 and earlier, because there is a SUMIFS function in Excel 2007 that resolves our query in seconds:
=SUMIFS(sum_range,criterion_range1,criterion1, criterion_range2,criterion2,…)

But this function is new to Excel 2007, and is not backwards compatible with earlier versions of Excel.

I will consider other options instead.

Concatenation

It is often possible to cheat with SUMIF by making a ‘mega-criterion’ out of multiple criteria. This works on joining criteria together usually by using the ampersand (‘&’) operator.

Let’s consider our example, slightly revised, from above.

Revised, dummy database

A new column has been inserted (column H), with a formula combining the contents of columns F and G (for example, the formula in cell H12 is =F12&G12). Provided that all possible combinations are unique (that is, no duplicates can be generated), a simple SUMIF can then be applied – for example:
=SUMIF(H12:H21,”1Z”,I12:I21).

This is by far and away the simplest solution – if it works.

But it can fall down. In another example, the concatenation ”111” might refer to Product 1 in Business Unit 11 or Product 11 in Business Unit 1.

Conditional Sum Wizard

This wizard is not part of the default Excel package. It is an add-in – a supplementary programme that adds custom commands or features to Excel. Unless previously included, this add-in needs to be loaded. ALT+T+I (Tools -> Add-Ins) is the keystroke that works in all versions of Excel.

Add-ins dialog box

Once loaded, shortcut keys will not work (ALT+T+C gives rise to a conflict with Tools -> Customize and possibly Tools -> COM Add-Ins). In Excel 2003 and earlier versions, it can be located on the Tools dropdown menu, whereas in Excel 2007 it is in the final grouping, ‘Solutions’, of the ‘Formulas’ tab.

When activated, the user is prompted by a straightforward multistep wizard, as follows.

Step 1: Identify the database (including the column labels / ‘headers’)

Step 2: Set the conditions

Step 3: Insert the formula

It is recommended that the second option is selected in Step 3, otherwise a formula is constructed containing hard code rather than cell references. This gives rise to a ‘static’ rather than ‘dynamic’ formula. Note that choosing this second option changes the number of steps from four to six.

Final step(s): (assumptions and) output location

The final three steps simply require where the inputs/outputs should go.

In the example above, you can expect to get a formula similar to this:
{=SUM(IF($F$12:$F$21=G26,IF($G$12:$G$21=G27,$H$12:$H$21,0),0))}

The curly brackets at the beginning and the end of the formula cannot be typed in. If the formula had been manually generated, you would not press ENTER once the formula was written, but CTRL+SHIFT+ENTER.

This type of formula is known as an array formula. Array formulas perform multiple calculations on one or more sets of values, and then return either a single result (as in this instance) or multiple results across a range of cells.

Array formulae can be memory intensive and slow down a workbook’s calculation time significantly. The formulae are not always easy to follow – this is often exacerbated by the wizard using ‘nested’ functions (one inside another). In fact, with a little practice, it is often possible to write simpler, more efficient array formulae without the wizard.

But it’s a great place to start!

Dbase Functions

Perhaps one of the lesser known families of Excel functions is the Dbase family – for example, DSUM, DMAX, DMIN. Working similarly to their ‘non-D’ counterparts, these functions interrogate a database in a reasonably transparent way.

The key function for our query is DSUM(range,field,criteria):

  • range is the entire array that you wanted evaluated by the criteria (in our example, cells F12:H21 – that is, the whole table where the first row must be the headings/column labels)
  • field is the column heading cell reference, the column heading typed in inverted commas, or a number (without the quotation marks) which represents the position of the column within the array (1 for the first column, 2 for the second column, etc). In this instance field could be H11, “Sales” or 3
  • criteria is the range of cells that contains the conditions to be specified. This can be highly flexible (the same heading can be used more than once for instance) as long as it includes at least one column label in the top row and at least one cell below each column label in order to specify the condition.

From our example above, the following could be a criteria table:

Example criteria table

Our formula in this case would be DSUM(F11:H21,3,F26:G27) (=$700).

More than one row can be used. Criteria put in the same row are ‘AND’ criteria, so in our example above, to be summed, the Business Unit must be 1 AND the Product must be Z.

If more rows were added, each row is read as ‘OR’ criteria. For example:

Another example criteria table

Here, all sales are sold which relate to Business Unit 1 OR Product Z (=$2,600).

Dbase calculations are like halfway houses between formulas and pivot tables (that do not refresh automatically if data or assumptions are changed). Once mastered, they are fairly easy to construct and are quite transparent for users and developers alike. Unlike SUMIF and the Conditional Sum Wizard, OR criteria can easily be selected too.

Their main drawback here is if multiple results are required. The criteria tables needed may add significantly to the overall file size and confuse users as to which criteria table relates to which formula.

Sumproduct

This is my favourite function in Excel. For those who do not know it, SUMPRODUCT(array1*array2*…) appears quite humble upon first glance. Consider the following sales report:

Example sales report

The sales in column H are simply the product of columns F and G, for example, the formula in cell H12 is simply =F12*G12. Then, to calculate the entire amount cell H19 sums column H. This could all be performed much quicker using the following formula:
=SUMPRODUCT(F12:F17*G12:G17)

SUMPRODUCT does exactly what it says on the tin. It sums the individual products.

How does this help the cause? It is all to do with the properties of TRUE and FALSE in Excel, namely:

  • TRUE*number = number (for example, TRUE*7 = 7)
  • FALSE*number = 0 (for example, FALSE*7=0).

In our earlier example (repeated below)

Example ‘dummy’ database

we can test columns F and G to check whether they equal our required values. SUMPRODUCT could be used as follows:
=SUMPRODUCT((F12:F21=1)*(G12:G21=”Z”)*H12:H21).

For the purposes of this calculation, (F12:F21=1) replaces the contents of cells F12:F21 with either TRUE or FALSE depending on whether the value contained in each cell equals 1 or not.

Similarly, (G12:G21=”Z”) replaces the contents of cells G12:G21 with either TRUE or FALSE depending on whether the value “Z” is contained in each cell.

The only time cells H12:H21 will be summed is when the corresponding cell in the arrays F12:F21 and G12:G21 are both TRUE. Then you will get TRUE*TRUE*number, which equals the said number.

Notice that SUMPRODUCT is not an array formula (you do not use CTRL+SHIFT+ENTER), but it is an array function, so again it can use a lot of memory making the calculation speed of the file slow down.

Used sparingly, it can be a highly versatile addition to the modeller’s repertoire. It is a sophisticated function, but once you understand how it works, you can start to use SUMPRODUCT for a whole array of problems (pun intended!).

This Excel file shows examples of all of the above solutions and may help you to choose which solution to use in which scenario.

Excel Example File

March 06 2009

Spreadsheet Skills: Exorcising Phantom Links

Tagged Under :

We look at how to locate and remove unintended links. By Liam Bastick, associate director with BPM Analytical Empowerment.

Query

Whenever I open certain Excel workbooks, I get a message asking if I want to update the links. I’m unsure how to search my workbook to find out what these links are. Please help!

Update Links Prompt (Excel 2003)

Advice

In this instance, the user wishes to find the links and decide individually whether they should be retained or removed. (If the intention is merely to remove them, you may wish to consider the free Microsoft add-in, Delete Links Wizard.)

The first step is to ascertain what type of links you have. One way of doing this is to select Edit–>Links in Excel 2003 or earlier, or use the Connections section of the Data tab (see graphic below) in Excel 2007. Also Alt + E + K, the keyboard shortcut, works in all versions of Excel.

Location of Edit Links in Excel 2007

However this command will not be available in all instances. If it is, you will probably have Formula Links. If Edit–>Links is available, a dialog box will appear.

Edit Links dialog box (illustration)

There may be more than one file linked. Upon inspection, you may notice that one or more file may simply be an older version of the active workbook. If so, the active workbook can be substituted for each file in turn by clicking on the Change Source button (Alt + N) and following the directions. This will remove these referencing errors.

You may not have access to some files and this may cause errors if the file is inadvertently updated. By selecting the Break Link button, these links can be replaced by their current values. This action cannot be undone so you may wish to save the file beforehand in order to rectify errors.

Formula links

If you do have formula links, it is relatively straightforward to search for them:

  • Close all workbooks except the active workbook with the links in.
  • In Excel 2003 / earlier, on the Edit menu, click Find. In Excel 2007, click on Find & Select on the Editing section of the Home tab – or use Ctrl + F in all versions.

Location of Find & Select in Excel 2007

  • Click Options
  • In the Find what box, enter [
  • In the Within box, click Workbook
  • In the Look In box, click Formulas
  • Click Find All
  • In the box at the bottom, look in the Formula column for formulas that contain [
  • To select the cell with a link, select the row in the box at the bottom.

Find Dialog box (illustration)

Other ‘phantom’ links

There are other types of links – often referred to as ‘phantom’ links as they are harder to locate than formula links. But once you know, it’s easy!

Name links

This is probably the most common cause of phantom links: names that reference ranges in other workbooks.

Using Define Name in Excel 2003 or earlier or Name Manager in Excel 2007 (or Ctrl + F3), we can get a list of all the names in the workbook:

Define Name (Excel 2003 or earlier)

Name Manager (Excel 2007)

By scrolling through the list of names and examining the ‘Refers to’ section (a little cumbersome prior to Excel 2007, admittedly), names referring to other workbooks or containing erroneous references such as #REF! can be changed or deleted.

Chart links

If you have charts in your workbook, there are various places where hidden links could be lurking. Click on each text box or title and examine the formula bar, , for references to other workbooks.

Click on each data series in the chart and examine the SERIES formula for external references. These links can be removed by copying (as values!) the data located into the active workbook.

Object links

External references can also be attached to objects. The simplest way of reviewing objects in a workbook is to use the highly underrated Go To–>Special function (use the F5 function key and then click Special). In the next dialog box, select Objects, then click OK.

Go To Dialog Box  Go To Special: Selecting Objects

By pressing the Tab key and examining the formula bar, each object can be reviewed in turn for external references.

And finally…

Once you have completed the above process, unless your workbook includes web queries containing parameters (this could be an article in itself), all links should now have been reviewed. If the intention was to remove all such links, simply save and reopen once all deletions have been made.

Source: CIMA Global Website

March 05 2009

Spreadsheet skills: INDEX and MATCH make powerful combo

Tagged Under : ,

We look at one of the most powerful combinations of Excel functions for looking up data: INDEX and MATCH. By Liam Bastick, associate director with BPM Analytical Empowerment.

Query

Preparing financial statements in Excel, I want to construct a formula that will tell me if my balance sheet is balancing, and, if not, which is the first period that the misbalance occurs in.

Balance Sheet Summary Illustration

Advice

This is a common modelling query. The usual suspects, LOOKUP and HLOOKUP / VLOOKUP do not work here:

  • LOOKUP(lookup_value, lookup_vector,[result_vector]) gives the wrong date as the balance checks are not in strict ascending order (that is, ascending alphanumerically with no duplicates)
  • HLOOKUP(lookup_value,table_array,row_index_num,[range_lookup]) gives #VALUE! since the first row must contain the data to be ‘looked up’, but the Balance Check is in row 28 in our example above, whereas the dates we need to return are in row 6 – hence we get a syntax error.

There is a solution, however: INDEX MATCH. They form a highly versatile tag team, but are worth introducing individually.

Index

Essentially, INDEX(array, row_num,[column_num]) returns a value or the reference to a value from within a table or range (list). For example, INDEX({7,8,9,10,11,12},3) returns the third item in the list {7,8,9,10,11,12}, that is 9. This could have been a range: INDEX(A1:A10,5) gives the value in cell A5, etc.

INDEX can work in two dimensions as well (hence the column_num reference). Consider the following example:

Two Dimensional INDEX Example

INDEX(F11:L21,4,5) returns the value in the fourth row, fifth column of the table array F11:L21 (clearly 26 in the above illustration).

Match

MATCH(lookup_value,lookup_array,[match_type]) returns the relative position of an item in an array that (approximately) matches a specified value. It is not case sensitive.

The third argument, match_type, does not have to be entered, but for many situations, I strongly recommend that it is specified. It allows one of three values:

  • match_type 1 [default if omitted]: finds the largest value less than or equal to the lookup_value – but the lookup_array must be in strict ascending order, limiting flexibility
  • match_type 0: probably the most useful setting, MATCH will find the position of the first value that matches lookup_value exactly. The lookup_array can have data in any order and even allows duplicates
  • match type -1: finds the smallest value greater than or equal to the lookup_value – but the lookup_array must be in strict descending order, again limiting flexibility.

When using MATCH, if there is no (approximate) match, #N/A is returned (this may also occur if data is not correctly sorted depending upon match_type).

MATCH is fairly straightforward to use:

MATCH Example

In the figure above, MATCH(“d”,F12:F22,0) gives a value of 6, being the relative position of the first ‘d’ in the range. Note that having match_type 0 here is important. The data contains duplicates and is not sorted alphanumerically. Consequently, match_types 1 and -1 would give the wrong answer: 7 and #N/A respectively.

Index Match

While useful functions in their own right, combined they form a highly versatile partnership. Consider our original problem:

Balance Sheet Summary Illustration (Revisited)

MATCH(1,J28:O28,0) equals 5, that is, the first period the balance sheet does not balance in is Period 5. But we can do better than that.

INDEX(J6:O6,5) equals May-09, so combining the two functions:
INDEX(J6:O6,MATCH(1,J28:O28,0)) equals May-09 in one step, and answers our reader’s query.

Note how flexible this combination really is. We do not need to specify an order for the lookup range, we can have duplicates and the value to be returned does not have to be in a row / column below / to the right of the lookup range (indeed, it can be in another workbook never mind another worksheet!).

With a little practice, the above technique can be extended to match items on a case sensitive basis, use multiple criteria and even ‘grade’. The attached workbook provides several examples as illustrations.

Source: CIMA Global Website