Continuous Build-It-Yourself Transform [1]

Introduction

The ContinuousWeighted transform adapts values from the driver domain to a range divided into regions, where the distribution within each region is trapezoidal. The ContinuousWeighted transform was implemented not with any particular application in mind, but rather to satisfy a functional role for build-it-yourself distributions when none of the boilerplate transforms will do the job.

The range of values output by ContinuousWeighted.convert() is controlled by two parameters implemented as Java fields: minRange and maxRange. These have the restriction that minRange < maxRange.

Each ContinuousWeighted instance internally maintains a ContinuousDistribution instance whose succession of items is supplied either by the package consumer or the end user. The first item's left field is zero; each following item's left field equals its predecessor's right field, and the final item's right field is unity. The origin and goal fields for all items may not be negative. There is no requirement for any item's origin to equal its predecessor's goal, although a check for this should probably be an option.
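The constraints on the succession of items can be sketched as a validation routine. This is only an illustration, not the actual ContinuousDistribution code: the class name ItemChecks, the method validate(), the requireContinuity flag, and the representation of each item as a four-element array {left, right, origin, goal} are all assumptions made for the example. The flag corresponds to the optional origin-equals-predecessor's-goal check suggested above.

```java
import java.util.List;

/** Hypothetical helper; each item is an array {left, right, origin, goal}. */
class ItemChecks {
    static final double EPS = 1e-12; // tolerance for comparing boundaries

    static void validate(List<double[]> items, boolean requireContinuity) {
        if (items == null || items.isEmpty()) {
            throw new IllegalArgumentException("Item list is null or empty");
        }
        double expectedLeft = 0.0;         // the first item must start at zero
        double previousGoal = Double.NaN;  // no predecessor yet
        for (double[] item : items) {
            double left = item[0], right = item[1], origin = item[2], goal = item[3];
            if (Math.abs(left - expectedLeft) > EPS) {
                throw new IllegalArgumentException("Item does not start where its predecessor ends");
            }
            if (!(right > left)) {
                throw new IllegalArgumentException("Item width is not positive");
            }
            if (origin < 0.0 || goal < 0.0) {
                throw new IllegalArgumentException("Origin and goal may not be negative");
            }
            // Optional check: origin must match the predecessor's goal.
            if (requireContinuity && !Double.isNaN(previousGoal)
                    && Math.abs(origin - previousGoal) > EPS) {
                throw new IllegalArgumentException("Origin differs from predecessor's goal");
            }
            expectedLeft = right;
            previousGoal = goal;
        }
        if (Math.abs(expectedLeft - 1.0) > EPS) {
            throw new IllegalArgumentException("Final item does not end at unity");
        }
    }
}
```

With the flag off, a succession containing jumps between items passes; with the flag on, the same succession is rejected.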

The convert() method maps a value x in the driver domain from zero to unity into a value v in the application-range from minRange to maxRange in two steps. The first step uses ContinuousDistribution.quantile() to recast the driver value x into an intermediate value z, also between zero and unity. The second step applies the linear interpolation formula:

v = (maxRange-minRange)*z + minRange.
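The second step is simple enough to show in isolation. The sketch below assumes the intermediate value z has already been obtained; the class name RangeScaling and the method rescale() are hypothetical and do not appear in the package.

```java
/** Hypothetical illustration of the linear interpolation step. */
class RangeScaling {
    static double rescale(double z, double minRange, double maxRange) {
        if (!(minRange < maxRange)) {
            throw new IllegalArgumentException("minRange must be less than maxRange");
        }
        if (!(z >= 0.0 && z <= 1.0)) {
            throw new IllegalArgumentException("Intermediate value must lie between zero and unity");
        }
        // v = (maxRange - minRange) * z + minRange
        return (maxRange - minRange) * z + minRange;
    }

    public static void main(String[] args) {
        // One quarter of the way from 10 to 30.
        System.out.println(rescale(0.25, 10.0, 30.0)); // prints 15.0
    }
}
```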

Profile

Figure 1 illustrates the influence which ContinuousWeighted.convert() exerts over driver sequences. This panel was created using the same driver sources used for ContinuousUniform, whose earlier panel provides a basis for comparison. The distribution was specified using the text string:

2. 1. 3.; 1. 0. 0.; 1. 2. 2.; 1. 1. 1.;

This distribution was chosen to illustrate features of the ContinuousWeighted transform, such as the ability to specify discontinuities between distribution items and to carve zones of exclusion out of the sample range. These are things the boilerplate transforms never do. I have not imagined any application which would actually call for doing these things, but that is no reason to disallow them.

The text above resolves into four items:

  1. An item 2 units wide (left is 0.0 and right is 0.4) with an origin of 1 and a goal of 3.
  2. An item 1 unit wide (left is 0.4 and right is 0.6) with an origin of 0 and a goal of 0.
  3. An item 1 unit wide (left is 0.6 and right is 0.8) with an origin of 2 and a goal of 2.
  4. An item 1 unit wide (left is 0.8 and right is 1.0) with an origin of 1 and a goal of 1.

Notice that range units are arbitrary and that ContinuousWeighted.setWeights() takes responsibility for calculating left and right fields so that the succession of items appropriately partitions the range from zero to unity. Also, the decimal points in the text were optional.
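How the text resolves into the four items can be sketched with JDK calls alone. This is an approximation for illustration, not the package's code: the class WeightsTextSketch and its methods parse() and partition() are hypothetical, and items are represented here as plain double arrays. Widths are summed, and each boundary is the running total divided by the grand total.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in showing how widths become left and right boundaries. */
class WeightsTextSketch {
    /** Parses "width origin goal" triples; the delimiter is a regular expression, as in String.split(). */
    static List<double[]> parse(String text, String delimiter) {
        List<double[]> triples = new ArrayList<>();
        for (String itemToken : text.split(delimiter)) {
            String[] parts = itemToken.trim().split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException(
                    "Cannot parse [" + itemToken + "] into three item elements");
            }
            double[] triple = new double[3];
            for (int i = 0; i < 3; i++) {
                triple[i] = Double.parseDouble(parts[i]); // accepts "2." as well as "2"
            }
            triples.add(triple);
        }
        return triples;
    }

    /** Converts {width, origin, goal} triples into {left, right, origin, goal} rows spanning zero to unity. */
    static List<double[]> partition(List<double[]> triples) {
        double total = 0.0;
        for (double[] triple : triples) {
            total += triple[0];
        }
        List<double[]> items = new ArrayList<>();
        double cumulative = 0.0;
        for (double[] triple : triples) {
            double left = cumulative / total;
            cumulative += triple[0];
            double right = cumulative / total;
            items.add(new double[] {left, right, triple[1], triple[2]});
        }
        return items;
    }

    public static void main(String[] args) {
        List<double[]> items = partition(parse("2. 1. 3.; 1. 0. 0.; 1. 2. 2.; 1. 1. 1.;", ";"));
        for (double[] item : items) {
            System.out.println(java.util.Arrays.toString(item));
        }
    }
}
```

Because only the ratios of the widths matter, the text "4 1 3; 2 0 0; 2 2 2; 2 1 1" would produce the same boundaries.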


Figure 1: Panel of ContinuousWeighted output from three different Driver sources. Each row of graphs provides a time-series graph of samples (left) and a histogram analyzed from the same samples (right). The first row of graphs was generated using the standard random number generator. The second row was generated using the balanced-bit generator. The third row was generated using an ascending sequence of driver values, equally spaced from zero to unity.

The standard-random time-series graph (top row of Figure 1) has the same relative ups and downs as the standard-random time-series graph prepared for ContinuousUniform. However, the zone from 0.4 to 0.6 is entirely excluded in favor of two regions: the region from 0.0 to 0.4, with relative weight 2×(1+3)/2 = 4 (area of a trapezoid), and the region from 0.6 to 1.0, with relative weight 1×2 + 1×1 = 3 (sum of two rectangles). The time-series graph shows this in the long vertical distances whenever samples cross over the exclusion zone. The histogram of sample values is somewhat distorted by the way the graph joins histogram-region tallies with line segments; what shows as steep transitions into and out of the exclusion zone should actually be horizontal discontinuities. Also, if you refer back to the corresponding histogram for ContinuousUniform, you'll see that the original driver sequence abruptly dipped from maximum density to minimum density near the topmost extent of the range. The driver maximum here overplays the steady weight of 2 in the region from 0.6 to 0.8, while the driver minimum underplays the steady weight of 1 in the region from 0.8 to 1.0.
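The relative weights quoted above follow from the trapezoid area formula, width × (origin + goal) / 2. The short sketch below reproduces the arithmetic; the class RegionWeights and its area() method are hypothetical names used only for this example.

```java
/** Hypothetical illustration of the relative-weight arithmetic. */
class RegionWeights {
    /** Area of a trapezoid with the given width and with heights origin and goal at its two ends. */
    static double area(double width, double origin, double goal) {
        return width * (origin + goal) / 2.0;
    }

    public static void main(String[] args) {
        double lower = area(2, 1, 3);                  // region from 0.0 to 0.4
        double excluded = area(1, 0, 0);               // region from 0.4 to 0.6
        double upper = area(1, 2, 2) + area(1, 1, 1);  // region from 0.6 to 1.0
        double total = lower + excluded + upper;
        System.out.println("lower=" + lower + " excluded=" + excluded + " upper=" + upper);
        System.out.println("share below the exclusion zone: " + lower / total);
        System.out.println("share above the exclusion zone: " + upper / total);
    }
}
```

With weights of 4 and 3, roughly four sevenths of the samples should land below the exclusion zone and three sevenths above it.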

The balanced-bit time-series (middle row of Figure 1) likewise has the same ups and downs as the balanced-bit time-series graph prepared for ContinuousUniform. The histogram regions from 0.6 to 0.8 and from 0.8 to 1.0 adhere much more closely to what the distribution prescribes, although the weights are still not discernible as steady.

The time-series graph generated using ascending, equally spaced driver values (bottom row of Figure 1) presents the percentile function for the custom distribution described above. The histogram of sample values presents the distribution's probability density function or PDF. The PDF does a good impression of a straight line sloping upward from f(v) = 1 when v = 0 to f(v) = 3 when v = 0.4. The exclusion zone is exactly as expected given that a weight of 0 absolutely prevents a value from being selected. The steady regions from 0.6 to 0.8 and from 0.8 to 1.0 do appear steady, except for small kinks which I don't think worth the trouble of explaining. Looking back at the time-series graph, notice how the curve from vertical coordinates 0.0 to 0.4 decreases in steepness, reflecting linearly increasing weights (steeper percentile slopes spread driver values more thinly over the range). The exclusion zone from 0.4 to 0.6 manifests as a vertical discontinuity. The steady weights in the remaining two regions manifest as straight lines; the slope for the region from 0.6 to 0.8, which has steady weight 2, is less steep than the slope for the region from 0.8 to 1.0, which has steady weight 1.
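The shape of the percentile curve can be reproduced with a small quantile routine for piecewise-trapezoidal densities. What follows is a sketch of one way to do it and is not taken from ContinuousDistribution.quantile(); the class TrapezoidQuantile, its methods, and the {left, right, origin, goal} array layout are assumptions. Within an item of slope s = (goal − origin) / width, the area accumulated over a distance t is origin·t + s·t²/2, so the routine finds the item containing the target area and solves that quadratic for t.

```java
/** Hypothetical quantile sketch; each item is {left, right, origin, goal}. */
class TrapezoidQuantile {
    static double area(double[] item) {
        return (item[1] - item[0]) * (item[2] + item[3]) / 2.0;
    }

    /** Maps a driver value x in [0, 1] to the position z in [0, 1] with cumulative weight x. */
    static double quantile(double[][] items, double x) {
        if (!(x >= 0.0 && x <= 1.0)) {
            throw new IllegalArgumentException("Driver value out of range: " + x);
        }
        double total = 0.0;
        for (double[] item : items) {
            total += area(item);
        }
        if (!(total > 0.0)) {
            throw new IllegalArgumentException("Total weight must be positive");
        }
        double target = x * total;
        double cumulative = 0.0;
        for (double[] item : items) {
            double a = area(item);
            // Zero-area items (exclusion zones) can never contain the target.
            if (a > 0.0 && target <= cumulative + a) {
                double remaining = Math.max(0.0, target - cumulative);
                double width = item[1] - item[0];
                double origin = item[2];
                double slope = (item[3] - item[2]) / width;
                double t;
                if (slope == 0.0) {
                    t = remaining / origin; // rectangle: area grows linearly
                } else {
                    double disc = Math.max(0.0, origin * origin + 2.0 * slope * remaining);
                    t = (-origin + Math.sqrt(disc)) / slope; // trapezoid: solve the quadratic
                }
                return item[0] + Math.min(t, width);
            }
            cumulative += a;
        }
        return items[items.length - 1][1]; // reached only through rounding
    }

    public static void main(String[] args) {
        double[][] items = {
            {0.0, 0.4, 1, 3}, {0.4, 0.6, 0, 0}, {0.6, 0.8, 2, 2}, {0.8, 1.0, 1, 1}
        };
        for (int i = 0; i <= 10; i++) {
            double x = i / 10.0;
            System.out.println(x + " -> " + quantile(items, x));
        }
    }
}
```

Driver values up to 4/7 map into the region from 0.0 to 0.4 and larger ones map to 0.6 or above, so no output falls strictly inside the exclusion zone; this is the vertical jump seen in the graph.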

Summary statistics such as the average sample value and the standard deviation have little relevance for Figure 1, where the average falls within a region of exclusion. I therefore disabled the feature which plots the average as a dashed green line and the standard deviation as an enclosing region.

Coding

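// Imports assumed for this listing. StringUtils is presumably the Apache Commons Lang class
// and MathMethods a utility class from the author's own package; both need imports as well.
import java.util.ArrayList;
import java.util.List;

/**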
 * The {@link ContinuousWeighted} class models a continuous distribution
 * over the range from {@link #minRange} to {@link #maxRange}.
 * The distribution is represented internally as ranging from zero to unity; values are
 * rescaled to fit the desired range.
 * <p>
 * As a statistical transform, {@link ContinuousWeighted} does not itself employ probability or randomness.
 * Instead it responds to an externally generated driver sequence which may or may not be random.
 * {@link ContinuousDistribution#quantile(double)} converts the driver value to an outcome.
 * </p>
 * @author Charles Ames
 */
public class ContinuousWeighted extends BoundedTransform {
   /**
    * Constructor for {@link ContinuousWeighted} instances.
    * @param container An entity which contains this transform.
    */
   public ContinuousWeighted(WriteableEntity container) {
      super(container);
   }
   /**
    * Set the weights data from a list of three-element item arrays.
    * @param items A list, each element of which is an array of three double-precision numbers:
    * ordinate range, origin weight, goal weight.
    * @throws IllegalArgumentException when the item list is null.
    * @throws IllegalArgumentException when the item list is empty.
    * @throws IllegalArgumentException when an individual item has other than three elements.
    * @throws IllegalArgumentException when the ordinate range of any item is not positive.
    * @throws IllegalArgumentException when the origin weight of any item is negative.
    * @throws IllegalArgumentException when the goal weight of any item is negative.
    */
   public void setWeights(List<Double[]> items) {
      super.addItems(items);
   }
   /**
    * Set the weights data from a text string.
    * @param text A list of decimal numbers, grouped into items by
    * the indicated delimiter.  Numbers within items are whitespace delimited.
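    * @param delimiter The separator between items, interpreted as a regular expression by {@link String#split(String)}.
    * @throws IllegalArgumentException when an item cannot be parsed into three numbers.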
    */
   public void parseWeightsText(String text, String delimiter) {
      String[] itemTokens = text.split(delimiter);
      List<Double[]> items = new ArrayList<Double[]>();
      for (int itemIndex = 0; itemIndex < itemTokens.length; itemIndex++) {
         String itemToken = itemTokens[itemIndex];
         System.out.println("Item " + itemIndex + ": " + itemToken);
         try {
            Double[] item = new Double[3];
            int k = 0;
            String[] elementTokens = itemToken.split("\\s+");
            for (int elementIndex = 0; elementIndex < elementTokens.length; elementIndex++) {
               String token = elementTokens[elementIndex];
               if (StringUtils.isNotBlank(token)) {
                  item[k++] = MathMethods.doubleFromText(token);
               }
            }
            if (3 != k) throw new Exception();
            items.add(item);
         }
         catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse [" + itemToken + "] into three item elements", e);
         }
      }
      super.addItems(items);
   }
}
Listing 1: The ContinuousWeighted implementation class.

The type hierarchy for ContinuousWeighted draws on two base classes:

  1. BoundedTransform manages the minRange and maxRange fields and the formula scaling intermediate values from zero to unity to the application range.
  2. ContinuousDistributionTransform embeds a ContinuousDistribution to manage the succession of trapezoids.

The all-important convert() method is implemented by BoundedTransform. ContinuousWeighted has no valid field to flag parameter changes. The item properties are in fact the parameters and if you want to change those, you simply make a call to ContinuousWeighted.setWeights(). This method is itself merely a proxy for ContinuousDistributionTransform.addItems().

The only original code provided by ContinuousWeighted happens in the parseWeightsText() method. This method makes rough-and-ready use of String.split() to break up a space-delimited string of numbers into single-value text snippets. I am not totally comfortable with the way this method allocates snippets off the heap, keeps them around just long enough to extract a number, then abandons the snippets to garbage collection. If I thought this method were to be frequently used (and I do not), I would seek a parsing algorithm which was less heap-intensive.
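One possible direction for a less heap-intensive parser is a single pass over the characters, accumulating each number digit by digit and writing results into a caller-supplied buffer. The sketch below is purely illustrative and not part of the package: the class WeightsScanner, the method parseInto(), and the flat width/origin/goal buffer layout are all assumptions. It handles only unsigned decimals without exponents, skips empty groups rather than rejecting them, and for very long digit strings it may differ in the last bit from what Double.parseDouble() would return.

```java
/** Hypothetical single-pass parser that avoids intermediate token strings. */
class WeightsScanner {
    /**
     * Scans the text once, writing width, origin, goal triples into out.
     * @return the number of complete items found.
     */
    static int parseInto(CharSequence text, char delimiter, double[] out) {
        int itemCount = 0;          // completed items
        int k = 0;                  // numbers seen in the current item
        double mantissa = 0.0;      // digits read so far, ignoring the decimal point
        double scale = 1.0;         // power of ten contributed by fractional digits
        int digits = 0;
        boolean seenPoint = false;
        boolean inNumber = false;
        for (int i = 0; i <= text.length(); i++) {
            // Treat the end of the text as one final delimiter.
            char c = (i < text.length()) ? text.charAt(i) : delimiter;
            if (c >= '0' && c <= '9') {
                inNumber = true;
                mantissa = mantissa * 10.0 + (c - '0');
                if (seenPoint) scale *= 10.0;
                digits++;
            } else if (c == '.' && !seenPoint) {
                inNumber = true;
                seenPoint = true;
            } else if (c == delimiter || Character.isWhitespace(c)) {
                if (inNumber) { // a number just ended: store it and reset the accumulator
                    if (digits == 0 || k == 3) {
                        throw new IllegalArgumentException("Malformed item near index " + i);
                    }
                    if (3 * itemCount + k >= out.length) {
                        throw new IllegalArgumentException("Output buffer too small");
                    }
                    out[3 * itemCount + k++] = mantissa / scale;
                    mantissa = 0.0; scale = 1.0; digits = 0;
                    seenPoint = false; inNumber = false;
                }
                if (c == delimiter) { // an item just ended
                    if (k == 3) itemCount++;
                    else if (k != 0) {
                        throw new IllegalArgumentException("Item " + itemCount + " has " + k + " elements");
                    }
                    k = 0;
                }
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
        }
        return itemCount;
    }

    public static void main(String[] args) {
        double[] buffer = new double[12];
        int n = parseInto("2. 1. 3.; 1. 0. 0.; 1. 2. 2.; 1. 1. 1.;", ';', buffer);
        System.out.println(n + " items: " + java.util.Arrays.toString(buffer));
    }
}
```

The only allocation on the success path is the buffer, which the caller can reuse between calls. Whether that saving is worth the extra code is doubtful for a method called as rarely as parseWeightsText().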

Comments

  1. The present text is adapted from my Leonardo Music Journal article from 1991, "A Catalog of Statistical Distributions". The heading is "Continuous Distributions", p. 61.

© Charles Ames Page created: 2022-08-29 Last updated: 2022-08-29