Discrete Bell Curve¹

Introduction

The DiscreteBinomial transform adapts values from the driver domain to a bounded integer range, where the weight associated with each range value follows a binomial distribution.

The binomial distribution comes from Jakob Bernoulli's urn model. The scenario involves N Bernoulli trials. These trials are independent (the outcome of one trial has no influence upon any other trial) and identically distributed (every trial has the same probability p of success). The scenario counts the number k of successful trials.

The integers output by DiscreteBinomial.convert() are controlled by two parameters implemented as Java fields. Field trials is an integer corresponding to the number of trials N in the urn scenario. Field weight is a double-precision number corresponding to the probability of single-trial success p in the urn scenario. This second field ranges from zero (always fails) to unity (always succeeds). The point density function f(k) is:

f(k) = N! / (k!(N−k)!) × p^k (1−p)^(N−k)

where the exclamation point indicates a factorial. This is a truly elegant bit of mathematics which I am not competent to explain.
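
To make the formula concrete, here is a minimal standalone Java sketch (my own, not part of the library; the class and method names are hypothetical) which evaluates f(k) directly:

public class BinomialDensityDemo {
   /** Evaluate f(k) = [N!/(k!(N-k)!)] p^k (1-p)^(N-k). */
   static double binomialDensity(int trials, double weight, int k) {
      double coefficient = 1.0;
      for (int i = 0; i < k; i++) {
         // Build the binomial coefficient incrementally, avoiding huge factorials.
         coefficient *= (double) (trials - i) / (i + 1);
      }
      return coefficient * Math.pow(weight, k) * Math.pow(1.0 - weight, trials - k);
   }
   public static void main(String[] args) {
      // Parameters matching the profile panel below: trials = 10, weight = 0.334.
      for (int k = 0; k <= 10; k++)
         System.out.printf("f(%2d) = %.4f%n", k, binomialDensity(10, 0.334, k));
   }
}

The eleven printed weights sum to unity, as every point density function must.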

For the binomial family of point density functions, Wikipedia gives a parametric mean (average, symbolized μ) of Np and a parametric variance (squared deviation, symbolized σ²) of Np(1−p).

The binomial distribution graphs out as a discrete bell curve. As the number of trials N grows, the binomial distribution converges to a normal distribution; that is, to a continuous bell curve. This normal distribution has mean μ = Np and deviation σ = √(Np(1−p)).

Scientific experiments often involve taking readings off gauges, with fine gradations indicated by calibrated marks, and with the white space in between providing one additional digit of accuracy. Thus readings taken off gauges are inherently discrete. According to Judith Grabiner,² 19th-century scientific experimenters noticed that readings off gauges tend to cluster randomly around the 'correct' value. They also noticed that the shape of this random clustering was that of a discrete bell curve.

Profile

Figure 1 illustrates the influence which DiscreteBinomial.convert() exerts over driver sequences when trials = 10 and weight = 0.334. The vertical v axis ranges from 0 to 11, covering the application range from 0 to 10 and its 11 distinct outcomes. Plotted against the horizontal k axis are the sample values vk, each obtained from a driver value xk using convert(). Each left-side sample graph presents 200 values; the right-side histogram presents a sideways bar for each range value.


Figure 1: Panel of DiscreteBinomial output from three different Driver sources. Each row of graphs provides a time-series graph of samples (left) and a histogram analyzed from the same samples (right). The first row of graphs was generated using the standard random number generator. The second row was generated using the balanced-bit generator. The third row was generated using an ascending sequence of driver values, equally spaced from zero to unity.

The source sequences used to create Figure 1 are the same sequences used to create the profile panel for ContinuousUniform, which passes through its driver values undecorated. All three source sequences are nominally uniform. The first source is standard randomness from Lehmer. The second source is balanced-bit values from Balance. The third source is an ascending succession produced using DriverSequence.

For each row in Figure 1 the average sample value is plotted as a dashed green line, while the interval between ± one standard deviation around the average is filled in with a lighter green background. For the ideally uniform driver values plotted in the third row of graphs, the average sample value is 3.340 and the standard deviation is 1.491. These numerical summary statistics are indistinguishable from the ideal mean μ = Np = 10×0.334 = 3.34 and the ideal deviation σ = √(Np(1−p)) = √(10×0.334×0.666) = 1.491.
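
As a quick cross-check on these figures, the parametric formulas can be evaluated in a few lines of Java (a standalone sketch, not library code):

public class BinomialMomentsCheck {
   public static void main(String[] args) {
      int trials = 10;
      double weight = 0.334;
      double mean = trials * weight;                       // μ = Np
      double deviation = Math.sqrt(mean * (1.0 - weight)); // σ = √(Np(1−p))
      System.out.printf("mean = %.3f, deviation = %.3f%n", mean, deviation);
      // Prints: mean = 3.340, deviation = 1.491
   }
}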

The standard-random time-series (top row of Figure 1) bears comparison to the corresponding top-row graphic for DiscreteUniform, which employed the same random source sequence. The relative ups and downs are much alike. The calculated average of 3.360 differs from μ = 3.34 by 0.6%. The calculated deviation of 1.493 around this average differs from σ = 1.491 by 0.1%. However the top-row histogram differs noticeably from the bottom-row histogram, which presents ideal point densities.

The balanced-bit time-series (middle row of Figure 1) likewise bears comparison to the corresponding middle-row graphic for DiscreteUniform, which employed the same balanced-bit source sequence. Again, the relative ups and downs are much alike. The calculated average of 3.315 differs from μ = 3.34 by 0.7%. The calculated deviation of 1.455 around this average differs from σ = 1.491 by 2.4%. The middle-row histogram is barely distinguishable from the bottom-row histogram.

The time-series graph generated using ascending, equally spaced driver values (bottom row of Figure 1) presents the quantile function for the binomial distribution with the indicated parameters. This is an irregular ascending step function, where the run of each step indicates the point density and the rise is fixed at one unit. The bottom-row histogram of sample values presents the distribution's point densities.

The interval from 3.340−1.491 = 1.849 to 3.340+1.491 = 4.831 spans 2×1.491/10 = 0.298 ≈ 30% of the full application range from 0 to 10. Since the continuous uniform distribution had 58% of samples within ± one standard deviation of the mean, this suggests that the binomial distribution with trials = 10 and weight = 0.334 is squeezing 58% of samples into 30% of the application range, giving a concentration rate of 58/30 = 1.93.

Coding

/**
 * The {@link DiscreteBinomial} class implements a discrete statistical transform based on the notion of a Bernoulli trial.
 * The Bernoulli trial was originally conceived by Jakob Bernoulli.
 * It is a random experiment with two outcomes: success and failure, along with a probability p of success.
 * For example, to experience a trial with p=7/10 you can fill an urn with seven white balls and three black balls,
 * close your eyes, mix the balls around, and select one ball.
 * If the ball is white, the trial has succeeded.
 * If the ball is black, the trial has failed.
 * The {@link #weight} property of the {@link DiscreteBinomial} class stands in for the probability of success p.
 * As a statistical transform, {@link DiscreteBinomial} does not itself employ probability or randomness.
 * Instead it responds to an externally generated driver sequence which may or may not be random.
 * {@link DiscreteDistribution#quantile(double)} converts the driver value to an outcome.
 * @author Charles Ames
 */
public class DiscreteBinomial extends DiscreteDistributionTransform {
   /**
    * Determines the success rate for a trial.
    */
   private double weight;
   /**
    * Constructor for {@link DiscreteBinomial} instances.
    * @param container An entity which contains this transform.
    */
   public DiscreteBinomial(WriteableEntity container) {
      super(container);
      this.weight = Double.NaN;
   }
   /**
    * Getter for {@link #weight}.
    * @return The assigned {@link #weight} value.
    * @throws UninitializedException when {@link #weight} has not been initialized.
    */
   public double getWeight() {
      if (Double.isNaN(weight)) throw new UninitializedException("Weight not initialized");
      return weight;
   }
   /**
    * Setter for {@link #weight}.
    * @param weight The intended {@link #weight} value.
    * @return True if {@link #weight} has changed; false otherwise.
    */
   public boolean setWeight(double weight) {
      checkWeight(weight);
      if (this.weight != weight) {
         this.weight = weight;
         invalidate();
         makeDirty();
         return true;
      }
      return false;
   }
   /**
    * Check if the indicated value is suitable for {@link #weight}.
    * @param weight The indicated value.
    */
   public void checkWeight(double weight) {
      if (0. > weight || 1. < weight)
         throw new IllegalArgumentException("Weight not in range from zero to unity");
   }
   @Override
   protected void validate(DistributionBase<Integer> distribution) {
      ((DiscreteDistribution) distribution).calculateBinomial(getTrials(), getWeight());
   }
}
Listing 1: The DiscreteBinomial implementation class.

The type hierarchy for DiscreteBinomial descends from TransformBase through DiscreteDistributionTransform, which embeds a DiscreteDistribution that manages the succession of value-weight items.

Each DiscreteBinomial instance internally maintains a DiscreteDistribution instance whose succession of items is populated by the call to DiscreteDistribution.calculateBinomial() in method DiscreteBinomial.validate(). This call to calculateBinomial() creates trials+1 items with weights according to the point density function given above.
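
The source for calculateBinomial() is not reproduced here, but a standard way to populate the trials+1 weights is the recurrence f(k+1) = f(k) × (N−k)/(k+1) × p/(1−p), starting from f(0) = (1−p)^N. The following hypothetical sketch illustrates the idea, assuming 0 < weight < 1:

import java.util.Arrays;

public class BinomialWeightsSketch {
   /** Populate the trials+1 binomial weights by recurrence. */
   static double[] binomialWeights(int trials, double weight) {
      double q = 1.0 - weight;                 // single-trial failure probability
      double[] f = new double[trials + 1];
      f[0] = Math.pow(q, trials);              // f(0) = (1-p)^N
      for (int k = 0; k < trials; k++) {
         // f(k+1) = f(k) * (N-k)/(k+1) * p/(1-p)
         f[k + 1] = f[k] * ((double) (trials - k) / (k + 1)) * (weight / q);
      }
      return f;
   }
   public static void main(String[] args) {
      System.out.println(Arrays.toString(binomialWeights(10, 0.334)));
   }
}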

The distributing step of conversion happens in DiscreteDistributionTransform, where the convert() method does this:

return getDistribution().quantile(driver);
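
The source for quantile() is likewise not reproduced here, but the distributing step can be sketched as a cumulative scan over the weights: the driver value, ranging from zero to unity, selects the first range value whose cumulative weight exceeds it. A hypothetical version, pairing with the binomialWeights() sketch above:

static int quantile(double[] weights, double driver) {
   double cumulative = 0.0;
   for (int k = 0; k < weights.length; k++) {
      cumulative += weights[k];
      if (driver < cumulative) return k;   // first k whose cumulative weight exceeds the driver
   }
   return weights.length - 1;              // guard against floating-point shortfall
}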

TransformBase maintains a valid field to flag parameter changes. This field starts out false and reverts to false every time DiscreteBinomial calls TransformBase.invalidate(), which happens with any change to trials or weight. Any call to TransformBase.getDistribution() (and DiscreteDistributionTransform.convert() makes such a call) first creates the distribution if it does not already exist, then checks valid. If valid is false, getDistribution() calls validate(), which is abstract in TransformBase but made concrete by DiscreteBinomial. That particular implementation of validate() uses DiscreteDistribution.calculateBinomial(getTrials(), getWeight()) to recalculate the distribution items.
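
In outline, this lazy-validation pattern might look like the following sketch, with field and method names taken from the prose rather than from the TransformBase source:

public DistributionBase<Integer> getDistribution() {
   if (null == distribution) distribution = createDistribution();   // create on first use
   if (!valid) {
      validate(distribution);   // concrete in DiscreteBinomial: recalculates the items
      valid = true;
   }
   return distribution;
}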

Comments

  1. The present text is adapted from my Leonardo Music Journal article from 1991, "A Catalog of Statistical Distributions". The heading is "Binomial", p. 59.
  2. Judith V. Grabiner, "Popular Statistics — Polling and Sampling", lecture 6 of Mathematics, Philosophy, and the Real World video series from The Great Courses.

© Charles Ames Page created: 2020-02-26 Last updated: 2020-02-26