Subarrays

David Bernstein

19 Subarrays

One of Java’s most convenient features is the length attribute associated with each array. Because of this feature, it isn’t necessary to pass both the array and its length to a method (as it is in some other languages). However, it leads to people creating methods with inflexible signatures. The subarray pattern is a way to remedy this shortcoming.

Motivation

Most of the examples of arrays that you have seen probably involve iterating over all of the elements. However, there are many situations in which you only need to iterate over some of the elements in the array. Unfortunately, because you have only/mostly been exposed to examples in which this isn’t the case, you may have started to use a pattern that makes this difficult (and, hence, is sometimes called an anti-pattern).

Review

Suppose you were asked to write a method that is passed an array of int values and returns the total. You would probably use an accumulator (see Chapter 13) as in the following method:

    public static int total(int[] data) {
        int result;
        result = 0;

        for (int i = 0; i < data.length; ++i) {
            result += data[i];
        }
        return result;
    }

This method takes advantage of the fact that the number of iterations is determinate (or definite) and uses a for loop. This method also takes advantage of the fact that the array has a length attribute and, so, does not include it in its signature.^[1]

The problem with this implementation is that it does not allow you to find the sum of a subset of the elements. For example, if the indexes represent months and the elements represent sales data, then you might want to find the total sales for only the second quarter (i.e., April, May, and June; 0-based months 3, 4, 5).

Thinking About The Problem

Obviously, what you need to do is add formal parameters to the method that describe the subset of interest. You could, for example, pass another array that contained the indexes to consider. Or, you could pass a conformal boolean[] array that contains true for the elements to use and false for the elements to ignore. In practice, however, both of these solutions are more complicated than is necessary because the most common need is to iterate over a contiguous subset of the elements (i.e., loosely speaking, a subset that contains no “gaps”, or a subset in which the difference between two sequential is exactly 1).

The Pattern

The easiest way to solve this problem of iterating over a contiguous subset of the elements is to add two formal parameters, the index to start with and the size of the subset.^[2] Traditionally, the index to start with is thought of as an offset from 0 and, hence, is named offset. The size of the subset is traditionally named length.

The example of monthly sales data can be illustrated as in Figure 19.1. As is apparent from this illustration, the bounds on the loop control variable are now going to be offset and offset + length. As is also apparent from the illustration, you have to be careful to use a strong inequality when comparing the loop control variable to the upper bound (i.e., < rather that <=) to avoid an off-by-one defect. So, the loop control variable will be initialized to offset and the iterations will continue as long as the loop control variable is strictly less than offset + length.

Figure 19.1. The Parameters for the Second Quarter of a Year of Monthly Data

The one drawback of adding these parameters is that they must be included in every invocation of the method. To avoid this, you can add an overloaded version of the method that is only passed the array and invokes the three-parameter version, passing it 0 for offset and the array’s length attribute for length.

Examples

It’s trivially easy to create and find examples of this pattern.

The Motivating Example

Returning to the example above, the three-parameter version of the method is as follows:

    public static double total(double[] data, int offset, int length) {
        double result;

        result = 0;        
        for (int i = offset; i < offset + length; ++i) {
            result += data[i];
        }
        return result;
    }

The one-parameter version then invokes the three-parameter version as follows:

    public static double total(double[] data) {
        return total(data, 0, data.length);
    }

Examples in the Java Library

Examples of this pattern abound in the Java library. For example, this pattern is used by the fill() methods and the copyOfRange() method in the Arrays class, and the arraycopy() method in the System class.

A Less Obvious Example

The same kind of thing can be done with rectangular arrays of arrays (sometimes called multidimensional arrays). In this case, the flexible method is as follows:

    public static double total(double[][] data, 
                               int roffset, int coffset, 
                               int rlength, int clength) {
        double result;
        
        result = 0;        
        for (int r = roffset; r < roffset + rlength; ++r) {
            for (int c = coffset; c < coffset + clength; ++c) {
                result += data[r][c];
            }
        }
        return result;
    }

The one-parameter version then invokes the five-parameter version as follows:

    public static double total(double[][] data) {
        return total(data, 0, 0, data.length, data[0].length);
    }

A Warning

It is possible for the invoker to pass an invalid offset and/or invalid length. Hence, you should validate these parameters and respond to invalid values appropriately.

There are two common responses to invalid values. One is to throw an IllegalArgumentException. The other is to use 0 for values of offset that are too small, the array’s length for values of offset that are too large, and the array’s length for values of offset + length that are too large.

If you have not yet learned about the length attribute, you can instead invoke the Array.getLength(), passing it the array. ↵
You could, instead, have the second parameter contain the index to end with. The approach discussed here is more common, in part because it eliminates any confusion about whether the end element is or isn't included. ↵

License

Icon for the Creative Commons Attribution 4.0 International License