Continuous Beta Transform [1]

Introduction

The ContinuousBeta transform adapts values from the driver domain to a bounded range, where the relative concentrations are governed by a member of the beta family of distribution curves.

The range of values output by ContinuousBeta.convert() is controlled by two parameters implemented as Java fields: minRange and maxRange. These have the restriction that minRange < maxRange.

The shape of the distribution curve is controlled by two additional parameters: alpha (symbolized α) and beta (symbolized β). Both shape parameters must be greater than zero.

Each ContinuousBeta instance internally maintains a ContinuousDistribution instance which divides the range from zero to unity into trapezoids of width 1/itemCount. The trapezoid height for sample value z is calculated using the formula:

Math.pow(z, alpha - 1) * Math.pow(1 - z, beta - 1)
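
As a minimal sketch of the height formula above, the following stand-alone Java class evaluates the unnormalized beta weight at evenly spaced sample values. The class name BetaWeightSketch and the method weight() are hypothetical and are not part of the library described here; how ContinuousDistribution actually stores its trapezoid heights is not shown in this text.

```java
/** Hypothetical helper for illustration only; not part of the original library. */
final class BetaWeightSketch {
    private BetaWeightSketch() {}

    /**
     * Unnormalized beta weight at sample value z.
     * Assumes 0 <= z <= 1 and alpha, beta > 0. Note that for alpha < 1 the
     * weight at z = 0 is infinite (likewise for beta < 1 at z = 1).
     */
    static double weight(double z, double alpha, double beta) {
        if (z < 0.0 || z > 1.0)
            throw new IllegalArgumentException("z must lie between zero and unity");
        if (!(alpha > 0.0) || !(beta > 0.0))
            throw new IllegalArgumentException("alpha and beta must be positive");
        return Math.pow(z, alpha - 1.0) * Math.pow(1.0 - z, beta - 1.0);
    }

    public static void main(String[] args) {
        int itemCount = 8; // number of trapezoids, so itemCount + 1 boundaries
        for (int i = 0; i <= itemCount; i++) {
            double z = (double) i / itemCount;
            System.out.printf("z=%.3f weight=%.5f%n", z, weight(z, 5.0, 2.0));
        }
    }
}
```

The weights are relative heights only. Any normalization needed to make the total area come out to unity would have to happen when the trapezoids are assembled.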

The convert() method maps a value x in the driver domain from zero to unity into a value v in the application-range from minRange to maxRange in two steps. The first step uses ContinuousDistribution.quantile() to recast the driver value x into an intermediate value z, also between zero and unity. The second step applies the linear interpolation formula:

v = (maxRange-minRange)*z + minRange.
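
The two steps can be sketched in stand-alone Java as follows. This is an illustration under stated assumptions, not the library's implementation: the class BetaConvertSketch is hypothetical, the quantile lookup interpolates linearly within each trapezoid rather than solving for the exact area, and the sketch restricts alpha and beta to 1 or more so that the boundary heights stay finite.

```java
/** Hypothetical two-step conversion sketch; not the original ContinuousDistribution. */
final class BetaConvertSketch {
    private final int itemCount;
    /** cumulative[i] is the normalized area under the curve from 0 to i/itemCount. */
    private final double[] cumulative;

    BetaConvertSketch(double alpha, double beta, int itemCount) {
        if (alpha < 1.0 || beta < 1.0)
            throw new IllegalArgumentException("sketch assumes alpha, beta >= 1");
        if (itemCount < 1)
            throw new IllegalArgumentException("itemCount must be positive");
        this.itemCount = itemCount;
        this.cumulative = new double[itemCount + 1];
        double prev = weight(0.0, alpha, beta);
        for (int i = 1; i <= itemCount; i++) {
            double next = weight((double) i / itemCount, alpha, beta);
            // Trapezoid area: average of the two heights times width 1/itemCount.
            cumulative[i] = cumulative[i - 1] + 0.5 * (prev + next) / itemCount;
            prev = next;
        }
        double total = cumulative[itemCount];
        for (int i = 1; i <= itemCount; i++) cumulative[i] /= total;
    }

    private static double weight(double z, double alpha, double beta) {
        return Math.pow(z, alpha - 1.0) * Math.pow(1.0 - z, beta - 1.0);
    }

    /** Step 1: recast driver value x (0 to 1) into intermediate value z (0 to 1). */
    double quantile(double x) {
        if (x < 0.0 || x > 1.0)
            throw new IllegalArgumentException("driver must lie between zero and unity");
        int lo = 0;
        int hi = itemCount;
        // Binary search for the trapezoid whose cumulative areas bracket x.
        while (hi - lo > 1) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] < x) lo = mid; else hi = mid;
        }
        double span = cumulative[hi] - cumulative[lo];
        double t = span > 0.0 ? (x - cumulative[lo]) / span : 0.0;
        return (lo + t) / itemCount;
    }

    /** Step 2: linear interpolation from z into the range minRange to maxRange. */
    double convert(double x, double minRange, double maxRange) {
        double z = quantile(x);
        return (maxRange - minRange) * z + minRange;
    }
}
```

With alpha and beta both 1 the quantile step leaves the driver value unchanged, so new BetaConvertSketch(1.0, 1.0, 200).convert(0.5, 10.0, 20.0) lands at the midpoint of the range.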

For the beta family of distribution curves, Wikipedia gives a parametric mean (average, symbolized μ) of α / (α+β) and a parametric variance (squared deviation, symbolized σ²) of αβ / ((α+β)²(α+β+1)).
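
The parametric formulas translate directly into code. The class below is a hypothetical helper written for this page, not something the library provides:

```java
/** Hypothetical helper evaluating the parametric mean and deviation of a beta curve. */
final class BetaMomentsSketch {
    private BetaMomentsSketch() {}

    /** Parametric mean: alpha / (alpha + beta). */
    static double mean(double alpha, double beta) {
        return alpha / (alpha + beta);
    }

    /** Parametric variance: alpha*beta / ((alpha+beta)^2 * (alpha+beta+1)). */
    static double variance(double alpha, double beta) {
        double sum = alpha + beta;
        return alpha * beta / (sum * sum * (sum + 1.0));
    }

    /** Standard deviation: square root of the variance. */
    static double deviation(double alpha, double beta) {
        return Math.sqrt(variance(alpha, beta));
    }

    public static void main(String[] args) {
        System.out.printf("mean=%.3f deviation=%.3f%n",
                mean(5.0, 2.0), deviation(5.0, 2.0));
    }
}
```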

Profile

Figure 1 illustrates the influence which ContinuousBeta.convert() exerts over driver sequences when alpha is 5 and beta is 2. This panel was created using the same driver sources used for ContinuousUniform, whose earlier panel provides a basis for comparison.


Figure 1: Panel of ContinuousBeta output from three different Driver sources. Each row of graphs provides a time-series graph of samples (left) and a histogram analyzed from the same samples (right). The first row of graphs was generated using the standard random number generator. The second row was generated using the balanced-bit generator. The third row was generated using an ascending sequence of driver values, equally spaced from zero to unity.

The standard-random time-series graph (top row of Figure 1) has the same relative ups and downs as the standard-random time-series graph prepared for ContinuousUniform, but the specific values are squinched up toward the upper range bound. This difference becomes much clearer in the standard-random histogram, where the whitespace separating the vertical v axis from the smallest f(v) value progressively increases as v increases from zero to unity. Notice that while these histogram peaks and valleys are similar to those derived for ContinuousUniform, they are not the same. The fact that values squinch upwards means that range values which fell into the bottommost histogram region in the uniform histogram were spread across the bottom three regions here in the beta histogram. Likewise the range values which fell into the topmost histogram region here were spread across three regions in the uniform histogram.

The balanced-bit time-series (middle row of Figure 1) likewise has the same ups and downs as the balanced-bit time-series graph prepared for ContinuousUniform with values squinched similarly. Since balanced-bit sequences strive aggressively for uniformity, the histogram peaks and valleys are comparatively restrained.

The time-series graph generated using ascending, equally spaced driver values (bottom row of Figure 1) presents the quantile function for this instance of the continuous beta distribution. The histogram of sample values presents the distribution's probability density function or PDF. With alpha 5 and beta 2, the PDF starts from f(v) = 0 when v = 0, climbs to a single peak at v = (α-1) / (α+β-2) = 4/5 = 0.8, and then drops back to f(v) = 0 when v = 1. Looking back at the time-series graph, notice how the quantile function rises more steeply where the distribution is rarefied and less steeply where the distribution is concentrated.

For each graph in Figure 1 the average sample value is plotted as a dashed green line, while the interval between ± one standard deviation around the average is filled in with a lighter green background. For the ideally uniform driver values plotted in the third row of graphs, the average sample value is 0.719 and the standard deviation is 0.159. The interval from 0.719-0.159 to 0.719+0.159 is 2*0.159 = 0.318 = 32% of the full application range from zero to unity. Since the continuous uniform distribution had 58% of samples within ± one standard deviation of the mean, this suggests that the beta distribution with alpha 5 and beta 2 is squeezing 58% of samples into 32% of the application range, giving a concentration rate of 58/32 = 1.81.

Since the bottommost row in Figure 1 illustrates the most ideal conditions under which a profile can be generated, the numerical average and deviation should closely match the parametric values supplied by Wikipedia. For α = 5 and β = 2, the parametric mean calculates out to μ = 5 / (5+2) = 5/7 = 0.714. The parametric variance (square of deviation) is σ² = 5×2 / ((5+2)²(5+2+1)) = 10 / (7×7×8) = 10/392 = 0.0255 so the deviation is σ = √0.0255 = 0.160. Increasing the itemCount from 200 (used for Figure 1) to 500 produces a bottom-row numerical average of 0.716 without noticeably changing the graph.

The Beta Family of Probability Curves


Figure 2 (a): Beta distribution curves for α = 1, with β = 1, 2, 3, 5, and 8.

Figure 2 (b): Beta distribution curves for α = 2, with β = 1, 2, 3, 5, and 8.

Figure 2 (c): Beta distribution curves for α = 3, with β = 1, 2, 3, 5, and 8.

Figure 2 (d): Beta distribution curves for α = 5, with β = 1, 2, 3, 5, and 8.

Figures 2 (a) through 2 (d) show how changes in parameter settings affect the distribution curves. Each figure provides two graphs. The upper graph shows the probability density function or PDF. The lower graph shows the cumulative distribution function or CDF.

The series of graphs starts with α = 1 and β = 1. (Wikipedia shows things going a little crazy when these parameters fall below unity.) The graph for α = β = 1 (Figure 2 (a)) resolves to the flat continuous uniform curve with a deviation of σ = 0.289. For α = β = 2 (Figure 2 (b)), the curve becomes a gentle bump anchored at zero at both extremes, with a deviation of σ = 0.224. For α = β = 3 (Figure 2 (c)), the deviation narrows to σ = 0.189 and the curve just begins to flare outward at the extremes. For α = β = 5 (Figure 2 (d)), the deviation narrows further to σ = 0.151 and the regions near the extremes can definitely be described as tails.

For α ≠ β, keep in mind that the parameters are symmetric: the graph for α = B and β = A is the mirror image of the graph for α = A and β = B. The α parameter exerts a suppressive effect on the distribution near zero, while the β parameter exerts a suppressive effect on the distribution near unity. In consequence, the mean μ shifts leftward as β increases relative to α and rightward as α increases relative to β.
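
The mirror-image relationship can be checked numerically. The sketch below uses the unnormalized height formula from the Introduction and the parametric mean; the class BetaMirrorSketch and its methods are hypothetical names introduced only for this check.

```java
/** Hypothetical check that swapping alpha and beta mirrors the curve about z = 0.5. */
final class BetaMirrorSketch {
    private BetaMirrorSketch() {}

    /** Unnormalized beta weight at z, for 0 <= z <= 1. */
    static double weight(double z, double alpha, double beta) {
        return Math.pow(z, alpha - 1.0) * Math.pow(1.0 - z, beta - 1.0);
    }

    /** Parametric mean: alpha / (alpha + beta). */
    static double mean(double alpha, double beta) {
        return alpha / (alpha + beta);
    }

    /** True when the weight at z equals the swapped-parameter weight at 1 - z. */
    static boolean isMirror(double z, double alpha, double beta) {
        double direct = weight(z, alpha, beta);
        double swapped = weight(1.0 - z, beta, alpha);
        return Math.abs(direct - swapped) < 1e-12;
    }

    public static void main(String[] args) {
        for (double z = 0.0; z <= 1.0; z += 0.125) {
            System.out.printf("z=%.3f mirror=%b%n", z, isMirror(z, 5.0, 2.0));
        }
        // The two means sit at equal distances on either side of 0.5.
        System.out.printf("mean(5,2)=%.3f mean(2,5)=%.3f%n",
                mean(5.0, 2.0), mean(2.0, 5.0));
    }
}
```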

Coding

/**
 * The {@link ContinuousBeta} class maps driver values back to the range
 * from zero to unity.  Emphasis peaks in center when alpha equals beta; it
 * shifts leftward when alpha < beta and rightward when alpha > beta.  When
 * alpha and beta are both 1, the graph is uniform.
 * @author Charles Ames
 */
public class ContinuousBeta extends BoundedTransform {
   /**
    * Left-side attenuation.
    */
   private double alpha;
   /**
    * Right-side attenuation.
    */
   private double beta;
   /**
    * Determines the number of trapezoidal regions.
    */
   private int itemCount;
   /**
    * Determines whether a change in parameter values requires the distribution
    * to be recalculated.
    */
   private boolean valid;
   /**
    * Constructor for {@link ContinuousBeta} instances.
    * @param container An entity which contains this transform.
    */
   public ContinuousBeta(WriteableEntity container) {
      super(container);
      this.alpha = Double.NaN;
      this.beta = Double.NaN;
      this.itemCount = Integer.MIN_VALUE;
   }
   /**
    * Getter for {@link #alpha}.
    * @return The assigned {@link #alpha} value.
    * @throws UninitializedException when {@link #alpha} has not been initialized.
    */
   public double getAlpha() {
      if (Double.isNaN(alpha)) throw new UninitializedException("Alpha not initialized");
      return alpha;
   }
   /**
    * Setter for {@link #alpha}.
    * Distribution is recalculated.
    * @param alpha The intended {@link #alpha} value.
    * @throws IllegalArgumentException when the argument is not positive.
    */
   public void setAlpha(double alpha) {
      checkAlpha(alpha);
      if (this.alpha != alpha) {
         this.alpha = alpha;
         this.valid = false;
      }
   }
   /**
    * Check if the indicated value is suitable for {@link #alpha}.
    * @param alpha The indicated value.
    * @throws IllegalArgumentException when the argument is not positive.
    */
   public void checkAlpha(double alpha) {
      if (alpha < MathMethods.TINY)
         throw new IllegalArgumentException("Alpha "
               + MathMethods.df3.format(alpha) + " is not positive");
   }
   /**
    * Getter for {@link #beta}.
    * @return The assigned {@link #beta} value.
    * @throws UninitializedException when {@link #beta} is not initialized.
    */
   public double getBeta() {
      if (Double.isNaN(beta))
         throw new UninitializedException("Beta not initialized");
      return beta;
   }
   /**
    * Setter for {@link #beta}.
    * Distribution is recalculated.
    * @param beta The intended {@link #beta} value.
    * @throws IllegalArgumentException when the argument is not positive.
    */
   public void setBeta(double beta) {
      checkBeta(beta);
      if (this.beta != beta) {
         this.beta = beta;
         this.valid = false;
      }
   }
   /**
    * Check if the indicated value is suitable for {@link #beta}.
    * @param beta The indicated value.
    * @throws IllegalArgumentException when the argument is not positive.
    */
   public void checkBeta(double beta) {
      if (beta < MathMethods.TINY)
         throw new IllegalArgumentException("Beta "
               + MathMethods.df3.format(beta) + " is not positive");
   }
   /**
    * Getter for {@link #itemCount}.
    * @return The assigned {@link #itemCount} value.
    * @throws UninitializedException when {@link #itemCount} is not initialized.
    */
   public int itemCount() {
      if (Integer.MIN_VALUE == itemCount)
         throw new UninitializedException("Item count not initialized");
      return itemCount;
   }
   /**
    * Setter for {@link #itemCount}.
    * Distribution is recalculated.
    * @param itemCount The intended {@link #itemCount} value.
    * @throws IllegalArgumentException when the argument is less than 8.
    */
   public void setItemCount(int itemCount) {
      checkItemCount(itemCount);
      if (this.itemCount != itemCount) {
         this.itemCount = itemCount;
         this.valid = false;
      }
   }
   /**
    * Check if the indicated value is suitable for {@link #itemCount}.
    * @param itemCount The indicated value.
    * @throws IllegalArgumentException when the argument is less than 8.
    */
   public void checkItemCount(int itemCount) {
      if (itemCount < 8)
         throw new IllegalArgumentException("Item count too small");
   }
   @Override
   public Double convert(double driver) {
      if (!valid) {
         getDistribution().calculateBeta(getAlpha(), getBeta(), itemCount());
         valid = true;
      }
      return super.convert(driver);
   }
}
Listing 1: The ContinuousBeta implementation class.

The type hierarchy for ContinuousBeta is:

Class ContinuousDistributionTransform embeds a ContinuousDistribution instance capable of approximating most any continuous distribution as a succession of trapezoids. Each ContinuousDistribution trapezoid item has left, right, origin, and goal fields.

Understand that the succession of trapezoids ranges from zero to unity, not minRange to maxRange. The trick with leveraging ContinuousDistribution instances is that the trapezoids need recalculating every time a parameter changes. Updating one single trapezoid item is not that big a deal, but more typically the number of trapezoids will be 20 or more (my canned Normal distribution uses 200 trapezoids); also, the calculating formulas often include exponents. So it makes sense to abstract the range boundaries out of the distribution and to apply range scaling separately.

The distributing step of conversion happens in ContinuousDistributionTransform, where the convert() method does this:

return getDistribution().quantile(driver);

Range scaling happens in BoundedTransform, where the convert() method does this:

return interpolate(super.convert(driver));

And BoundedTransform.interpolate(factor) does this (ignoring pesky initialization checks):

return (maxRange-minRange)*factor + minRange;

TransformBase maintains a valid field to flag parameter changes. This field starts out false and reverts to false every time ContinuousBeta calls TransformBase.invalidate(). This happens with any change to alpha, beta, or itemCount. Any call to TransformBase.getDistribution() (and ContinuousDistributionTransform.convert() makes such a call) first creates the distribution if it does not already exist, then checks valid. If false, then getDistribution() calls validate(), which is abstract to TransformBase but whose implementation is made concrete by ContinuousBeta. And that particular implementation of validate() makes use of ContinuousDistribution.calculateBeta(alpha, beta, itemCount) to recalculate the succession of trapezoids.
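
The flag-and-recalculate pattern described above can be sketched in isolation. Everything in this example is hypothetical: the class LazyRecalcSketch stands in for the TransformBase machinery, it tracks a single parameter, its table of heights stands in for the succession of trapezoids, and the choice to set the flag back to true inside getDistribution() is an assumption, since the text does not say where that happens.

```java
/** Hypothetical stand-in for the valid-flag mechanism; not the original TransformBase. */
final class LazyRecalcSketch {
    private double alpha = Double.NaN; // NaN marks "not yet assigned"
    private boolean valid = false;     // starts out false
    private double[] table;            // stands in for the trapezoid succession
    private int recalculations = 0;    // counts how often validate() ran

    /** Setter invalidates only when the value actually changes. */
    void setAlpha(double alpha) {
        if (this.alpha != alpha) {     // always true while this.alpha is NaN
            this.alpha = alpha;
            invalidate();
        }
    }

    void invalidate() {
        valid = false;
    }

    /** Creates the table on first use, then recalculates it only when flagged. */
    double[] getDistribution() {
        if (table == null) table = new double[9];
        if (!valid) {
            validate();
            valid = true;              // assumption: flag is reset here
        }
        return table;
    }

    /** The expensive step: refill every entry from the current parameter. */
    private void validate() {
        recalculations++;
        for (int i = 0; i < table.length; i++) {
            double z = (double) i / (table.length - 1);
            table[i] = Math.pow(z, alpha - 1.0);
        }
    }

    int recalculations() {
        return recalculations;
    }
}
```

The design choice this illustrates is that several setter calls in a row cost nothing until the next conversion, and repeated conversions with unchanged parameters reuse the same table.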

Comments

  1. The present text is adapted from my Leonardo Music Journal article from 1991, "A Catalog of Statistical Distributions". The heading is "Beta", p. 63.

© Charles Ames Page created: 2022-08-29 Last updated: 2022-08-29