16 Interval Membership
There are many situations in which categories are defined by intervals and, as a result, it is necessary to find the interval that contains a particular value. This chapter considers one specific way to organize the data and perform such a search.
Motivation
The U.S. tax code defines a group of intervals called tax brackets, each of which has an associated marginal tax rate. In 2017, these tax brackets (for single taxpayers) were defined as in Table 16.1. For example, a person earning $23,000 would be in the 15% marginal bracket (i.e., would pay 10.0% on the first $9,325, and 15.0% on everything over $9,325).
Our objective in this chapter is to organize the information in Table 16.1 in such a way that it is easy to find the marginal tax rate for any income.
From | To | Rate |
---|---|---|
0 | 9,325 | 10.0 |
9,326 | 37,950 | 15.0 |
37,951 | 91,900 | 25.0 |
91,901 | 191,650 | 28.0 |
191,651 | 416,700 | 33.0 |
416,701 | 418,400 | 35.0 |
418,401 | Above | 39.6 |
Review
Though Chapter 15 on lookup arrays used different terminology, some of the examples were closely related to the problem considered here. For example, consider the problem of finding the letter grade that corresponds to a particular numerical grade. Essentially, one needs to find the interval that contains the numerical grade. However, since all of the intervals are the same size, there is no reason to explicitly list the intervals. Hence, the code in Chapter 15 converts an interval like [latex][90, 99][/latex] to the index [latex]9[/latex]. In this chapter, the situations involve intervals which are irregular, and a different solution is required.
Thinking About The Problem
The tax table has two convenient properties (that are common to a wide variety of similar situations). First, the union of all of the intervals is the relevant subset of the real numbers (in this case, the non-negative reals). In other words, the tax table contains the marginal tax rate for every possible income. Second, the intervals are disjoint. That is, the intersection of any two intervals is the empty set.
Hence, each income has a unique marginal tax rate that can be determined using only a sequence of boundary values, [latex]b_{0}, b_{1}, \ldots, b_{n-1}[/latex]. Specifically, assuming you want to “cover” the entire set of real numbers, interval [latex]0[/latex] can be defined as [latex][-\infty, b_{0})[/latex], interval [latex]1[/latex] can be defined as [latex][b_{0}, b_{1})[/latex], interval [latex]2[/latex] can be defined as [latex][b_{1}, b_{2})[/latex], interval [latex]n-1[/latex] can be defined as [latex][b_{n-2}, b_{n-1})[/latex], and interval [latex]n[/latex] can be defined as [latex][b_{n-1}, \infty)[/latex]. For the tax example, the boundaries for single taxpayers are [latex]0[/latex], [latex]9326[/latex], [latex]37951[/latex], [latex]91901[/latex], [latex]191651[/latex], [latex]416701[/latex], [latex]418401[/latex], making the intervals [latex][-\infty, 0)[/latex], [latex][0, 9326)[/latex], [latex][9326, 37951)[/latex], [latex][37951, 91901)[/latex], [latex][91901, 191651)[/latex], [latex][191651, 416701)[/latex], [latex][416701, 418401)[/latex], [latex][418401, \infty)[/latex].
Since the boundaries are homogeneous (i.e., they are values of the same type) they can be stored in a single array. For example, the boundaries for single taxpayers can be stored in a int[]
named single
as follows:
int[] single = {0, 9326, 37951, 91901, 191651, 416701, 418401};
Given this representation of the intervals, you could search through them as follows:
public static int indexOf(int value, int[] boundary) { int i, n; n = boundary.length; for (i = 0; i < n-1; ++i) { if ((value >= boundary[i]) && (value < boundary[i + 1])) { return i + 1; } } return n; }
The Pattern
While the implementation above is fine, it has a couple of shortcomings. First, it uses a for
loop, which might lead someone reading the code to think that the loop is determinate (or definite) when, in fact, it isn’t. That is, someone reading the code might assume that there are always exactly n-1
iterations when there can be fewer. Second, the containment condition checks to see if the target value is greater than or equal to the left boundary and less then the right boundary at every iteration. However, both checks aren’t really necessary since the right boundary of interval [latex]n-1[/latex] is the same as the left boundary of interval [latex]n[/latex].
The first shortcoming can be corrected by using a while
loop. The second shortcoming can be corrected by continuing to loop as long as the target value is greater than or equal to the right boundary (meaning that the correct interval has not yet been found). Combining the two ideas leads to the following pattern:
public static int indexOf(int value, int[] boundary) { int i, n; n = boundary.length; i = 0; while ((i < n) && (value >= boundary[i])) ++i; return i; }
This algorithm increases the index as long as there are more intervals to check and the target value is greater than or equal to the right boundary.
Examples
Continuing with the tax example, you can now find the tax bracket for a particular income level as follows:
int[] single = {0, 9326, 37951, 91901, 191651, 416701, 418401}; int bracket; bracket = indexOf(125350, single);
You can then use a lookup-array (see Chapter 15) to find the marginal tax rate that corresponds to that tax bracket as follows:
double[] rate = {-1.0, 10.0, 15.0, 25.0, 28.0, 33.0, 35.0, 39.6}; double marginal; marginal = rate[bracket];
Some Warnings
The obvious thing to be careful about when using this pattern is which side of the interval is open and which side is closed. Making mistakes here can cause off-by-one defects that are difficult to find.
The other thing to be careful about is much less obvious. One might be tempted to combine the interval membership functionality and the look-up array functionality in a single method. For example, in the tax rate example, one might be tempted to do the following:
public static double taxRate(int income) { int[] single = {0, 9326, 37951, 91901, 191651, 416701, 418401}; double[] rate = {-1.0, 10.0, 15.0, 25.0, 28.0, 33.0, 35.0, 39.6}; return rate[indexOf(income, single)]; }
The drawback of this seemingly elegant idea is that one often wants to perform more than one lookup with the same index.
Perhaps the easiest way to see this is with the grade example from Chapter 15 on lookup arrays. One commonly wants to convert a numeric grade on a 0–100 scale to both a letter grade on an F–A scale and a numeric grade on a 0–4 scale. Hence, one wants to do one interval membership search and two array look-ups. This can be accomplished as follows:
int[] intervals = {0, 60, 63, 67, 70, 73, 77, 80, 83, 87, 90, 93}; double[] gp = { -1.0, 0.0, 0.7, 1.0, 1.3, 1.7, 2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0}; String[] letter = { "NA","F","D-","D","D+","C-","C","C+","B-","B","B+","A-","A"}; int i; String out; i = indexOf(88, intervals); o ut = String.format("Grade: %s (%3.1f)", letter[i], gp[i]);
There’s no reason to do the interval membership search separately for each of the two look-ups. Hence, it is better to have a separate indexOf()
method.
Looking Ahead
In some situations, the intervals don’t cover all the real numbers (i.e., there are gaps). In such situations, the value might not be in any interval. One way to handle this (and other situations) is to use conformal arrays as discussed in Chapter 17 — use one array to hold the left boundaries and another to hold the right boundaries.