Discrete Distributions

Introduction

This page discusses statistical distributions over discrete ranges, and how to implement them as DiscreteDistribution objects managing lists of DiscreteDistributionItem instances.

When the range of a distribution is discrete, its PDF maps integer values from the range to individual "point" densities in the probability domain. The point densities always sum to unity. The PDF presents visually as a bar graph.

The CDF for a discrete distribution maps integer values from the range to intervals of the probability domain; each interval combining the point density for the current value with the sum of point densities for all previous values:

CDF(k) = PDF(k) + CDF(k-1)

The CDF presents visually as an always-ascending bar graph or alternatively as a step function with fixed runs and variable rises. The bar or step associated with the largest range value (rightmost if the range is plotted horizontally) has a height of one.

The quantile function for a discrete distribution is the mathematical inverse of the CDF. The quantile function maps contiguous intervals of the probability domain to specific range values, with the stipulation that the intervals cover the entire probability domain. The quantile function is always ascending. It presents visually as a step function with variable runs and fixed rises.

Implementing a discrete PDF programmatically involves maintaining a collection of DiscreteDistributionItem instances, where a DiscreteDistributionItem contains a sample value and a weight. Notice "weight" rather than "probability" or "density". I don't find it practical, every time a change happens, to recalculate probabilities so they always sum to unity. Instead the DiscreteDistribution class maintains a running sum which is adjusted whenever a new item is added, whenever an item weight is altered, and whenever an existing item is deleted.

/**
 * The {@link DiscreteDistribution} class implements
 * functionality for a statistical distribution with an integer
 * range.
 * @author Charles Ames
 */
public class DiscreteDistribution
extends DistributionBase<Integer> {
   private double numericalMean;
   private double numericalVariance;
   private double sumOfWeights;
   private List<DiscreteDistributionItem> distributionItems;
   private List<DiscreteDistributionItem> unmodifiableItems;
   /**
    * Constructor for a {@link DiscreteDistribution} instance with
    * a container.
    * @param container A {@link WriteableEntity} which contains this
    * distribution.
    */
   public DiscreteDistribution(WriteableEntity container) {
      super(container);
      this.distributionItems = new ArrayList<DiscreteDistributionItem>();
      initialize();
   }
   public void initialize() {
      this.numericalMean = Double.NaN;
      this.numericalVariance = Double.NaN;
      this.sumOfWeights = 0.;
      this.distributionItems.clear();
      this.unmodifiableItems = null;
   }
   /**
    * Constructor for {@link DiscreteDistribution} instances.
    * @param container A {@link WriteableEntity} which contains this
    * distribution.
    * @param name The unit name, which must be unique under
    * the container.
    */
   public DiscreteDistribution(WriteableEntity container, String name) {
      this(container);
      setName(name);
   }
   /**
    * Constructor for a {@link DiscreteDistribution} instance with no
    * container.
    */
   public DiscreteDistribution() {
      super(null);
      this.distributionItems = new ArrayList<DiscreteDistributionItem>();
      updateUnmodifiableItems();
      this.sumOfWeights = Double.NaN;
   }
   private void updateUnmodifiableItems() {
      this.unmodifiableItems
         = Collections.unmodifiableList(distributionItems);
   }
   @Override
   public double densityAt(Integer value) {
      DiscreteDistributionItem item = findItem(value);
      return item.getWeight() / sumOfWeights();
   }
   @Override
   public double densityUpTo(Integer value) {
      DiscreteDistributionItem item = findItem(value);
      return item.getSum() / sumOfWeights();
   }
   @Override
   public double maxDensity() {
      DiscreteDistributionItem item = heaviestItem();
      return item.getWeight() / sumOfWeights();
   }
   @Override
   public int itemCount() {
      return distributionItems.size();
   }
   /**
    * Get the size of an array which would have one element
    * for each item value represented in this distribution.
    * @return The maximum item value, plus 1.
    */
   public int getLimit() {
      int size = distributionItems.size();
      if (0 == size) return 0;
      DiscreteDistributionItem item = distributionItems.get(size-1);
      return item.getValue() + 1;
   }

   @Override
   public double sumOfWeights() {
      if (!isNormalized())
         throw new UnsupportedOperationException(
               "Distribution has not been normalized");
      return sumOfWeights;
   }
   @Override
   public Integer quantile(double driver) {
      if (!isNormalized())
         throw new UnsupportedOperationException(
               "Distribution has not been normalized");
      Driver.checkDriverValue(driver);
      double chooser = driver * sumOfWeights;
      DiscreteDistributionItem item
         = chooseItem(chooser, 0, itemCount() - 1);
      int value = item.getValue();
      return value;
   }
   /**
    * Generate items for this distribution to emulate probabilities
    * of event counts per time unit in the Poisson point process.
    * The Poisson point process was never actually studied by the
    * mathematician named Poisson.
    * @param mean The average.
    * @param extent The distance from lowest range value (zero) to highest
    * range value, in standard deviations.
    * @throws IllegalArgumentException when the mean is not positive.
    * @throws IllegalArgumentException when the extent is not positive.
    */
   public void calculatePoisson(double mean, double extent) {
      if (MathMethods.TINY > mean)
         throw new IllegalArgumentException(
            "Mean must be positive");
      if (Double.isNaN(extent))
         throw new IllegalArgumentException(
            "Uninitialized extent argument");
      if (2. > extent)
         throw new IllegalArgumentException(
            "Extent must exceed two standard deviations");
      initialize();
      double deviation = Math.sqrt(mean);
      int itemCount = (int) Math.ceil(deviation*extent);
      double numerator = Math.exp(-mean);
      double denominator = 1.;
      double previousQuotient = 0.;
      for (int value = 0; value < itemCount; value++) {
         double quotient = numerator / denominator;
         if (quotient < MathMethods.TINY) {
            if (quotient < previousQuotient) break;
         }
         addItem(value, quotient);
         previousQuotient = quotient;
         numerator *= mean;
         if (value > 0) denominator *= (double) value;
      }
      normalize();
   }
   /**
    * Generate items for this distribution to emulate probabilities
    * for counts of successful outcomes from N Bernoulli trials.
    * A Bernoulli distribution happens when the trials count is 1.
    * @param trials The trial count.
    * @param successRate The success rate for a single trial.
    * @throws IllegalArgumentException when the trials count is not positive.
    * @throws IllegalArgumentException when the success rate is outside the
    * range from zero to unity.
    * @throws IllegalArgumentException when the extent is not positive.
    */
   public void calculateBinomial(int trials, double successRate) {
      if (1 > trials)
         throw new IllegalArgumentException(
            "Trial count must be positive");
      if (0. > successRate || 1. < successRate)
         throw new IllegalArgumentException(
            "Success rate must range from zero to unity");
      initialize();
      double failureRate = 1. - successRate;
      double f1 = 1.;
      double f2 = Math.pow(failureRate, trials);
      for (int value = 0; value <= trials; value++) {
         double w = MathMethods.combinations(trials, value) * f1 * f2;
         addItem(value, w);
         f1 *= successRate;
         f2 /= failureRate;
      }
      normalize();
   }
   /**
    * Generate items for this distribution to emulate probabilities
    * for counts of successful outcomes from Bernoulli trials until
    * the Nth failure.
    * A geometric distribution happens when the failures count is 1.
    * @param failures The trial count.
    * @param successRate The success rate for a single trial.
    * @param extent The distance from lowest range value (zero) to highest
    * range value, in standard deviations.
    * @throws IllegalArgumentException when the failures count is not positive.
    * @throws IllegalArgumentException when the success rate is outside the
    * range from zero to unity.
    * @throws IllegalArgumentException when the extent is not positive.
    */
   public void calculateNegativeBinomial(
         int failures, double successRate, double extent) {
      if (1 > failures)
         throw new IllegalArgumentException(
            "Failures count must be positive");
      if (0. > successRate || 1. < successRate)
         throw new IllegalArgumentException(
            "Success rate must range from zero to unity");
      if (Double.isNaN(extent))
         throw new IllegalArgumentException(
            "Uninitialized extent argument");
      if (2. > extent)
         throw new IllegalArgumentException(
            "Extent must exceed two standard deviations");
      initialize();
      double failureRate = 1. - successRate;
      double deviation = Math.sqrt(failures*successRate/failureRate);
      int itemCount = (int) Math.ceil(deviation*extent);
      double f = 1.;
      for (int value = 0; value < itemCount; value++) {
         double w = MathMethods.combinations(failures + value - 1, value) * f;
         addItem(value, w);
         f *= failureRate;
      }
      normalize();
   }
   /**
    * Generate items for this distribution to emulate probabilities
    * for counts of successful outcomes from T Bernoulli trials drawing
    * from M white balls (success) and N black balls (failure).
    * @param trials The trial count T.
    * @param successes The number of white balls M.
    * @param failures The number of black balls N.
    * @throws IllegalArgumentException when the trial count is not positive.
    * @throws IllegalArgumentException when the successes count is not positive.
    * @throws IllegalArgumentException when the failures count is not positive.
    */
   public void calculateHyperGeometric(
         int trials, int successes, int failures) {
      if (1 > trials)
         throw new IllegalArgumentException(
            "Trial count must be positive");
      if (1 > successes)
         throw new IllegalArgumentException(
            "Success count must be positive");
      if (1 > failures)
         throw new IllegalArgumentException(
            "Failure count must be positive");
      initialize();
      for (int value = 0; value <= trials; value++) {
         double numerator = (double) MathMethods.combinations(successes, value);
         numerator *= (double) MathMethods.combinations(failures, trials - value);
         double denominator
            = (double) MathMethods.combinations(successes+failures, trials);
         addItem(value, numerator / denominator);
      }
      normalize();
   }
   /**
    * Get the distribution items.
    * @return A collection of {@link DiscreteDistributionItem} instances,
    * presented in integer value order.
    */
   public Iterator<DiscreteDistributionItem> iterateItems() {
      return unmodifiableItems.iterator();
   }
   /**
    * Dereference a (value, weight) item from the collection.
    * @param index The position of the desired item.
    * @return The desired item.
    * @throws IndexOutOfBoundsException when the index is out of range.
    */
   public DiscreteDistributionItem getItem(int index) {
      return distributionItems.get(index);
   }
   protected void addItem(DiscreteDistributionItem item) {
      distributionItems.add(item);
   }
   protected void clearSumOfWeights() {
      int itemCount = itemCount();
      if (0 == itemCount) return;
      DiscreteDistributionItem item = distributionItems.get(itemCount-1);
      item.setSum(0.);
   }
   /**
    * Add an integer value with the associated weight.
    * If the value is already present, update its weight.
    * Otherwise append a new item at the end of the collection.
    * Denormalizes the distribution.
    * @param value The indicated value.
    * @param weight The associated weight.
    * @throws IllegalArgumentException when the values are not
    * presented in sequential order.
    * @throws IllegalArgumentException when the weights fall
    * outside the range from zero to unity.
    * @throws UnsupportedOperationException when the distribution
    * has already been normalized.
    */
   public void addItem(int value, double weight) {
      if (isNormalized())
         throw new UnsupportedOperationException("Distribution has already been normalized");
      DiscreteDistributionItem item = findItem(value);
      if (null == item) {
         int index = itemCount() - 1;
         if (0 < index) {
            DiscreteDistributionItem lastValue = getItem(index);
            if (value <= lastValue.getValue())
               throw new IllegalArgumentException(
                  "Values must be presented in sequential order");
         }
         item = new DiscreteDistributionItem(this, value, weight);
         distributionItems.add(item);
      }
      else {
         item.setWeight(weight);
      }
   }
   /**
    * Get the distribution item with the smallest value.
    * @return the distribution item with the smallest value.
    */
   public int getMinValue() {
      return distributionItems.get(0).getID();
   }
   /**
    * Get the distribution item with the largest value.
    * @return the distribution item with the largest value.
    */
   public int getMaxValue() {
      return distributionItems.get(itemCount()-1).getID();
   }
   /**
    * Get the distribution item with the heaviest weight.
    * @return The distribution item with the heaviest weight.
    */
   public DiscreteDistributionItem heaviestItem() {
      if (!isNormalized())
         throw new IllegalArgumentException(
            "Distribution has not been normalized");
      double heaviestWeight = 0.;
      DiscreteDistributionItem result = null;
      for (DiscreteDistributionItem item : distributionItems) {
         double weight = item.getWeight();
         if (weight > heaviestWeight) {
            result = item;
            heaviestWeight = weight;
         }
      }
      if (null == result)
         throw new RuntimeException("No heaviest-weight item found");
      return result;
   }
   /**
    * Get the distribution item with the lightest weight.
    * @return The distribution item with the lightest weight.
    */
   public DiscreteDistributionItem lightestItem() {
      if (!isNormalized())
         throw new IllegalArgumentException(
            "Distribution has not been normalized");
      double lightestWeight = Double.MAX_VALUE;
      DiscreteDistributionItem result = null;
      for (DiscreteDistributionItem item : distributionItems) {
         double weight = item.getWeight();
         if (weight > MathMethods.TINY && weight < lightestWeight) {
            result = item;
            lightestWeight = weight;
         }
      }
      if (null == result)
         throw new RuntimeException("No lightest-weight item found");
      return result;
   }
   /**
    * Find the {@link DiscreteDistributionItem} instance with the
    * indicated integer value.
    * @param value The indicated integer value.
    * @return The {@link DiscreteDistributionItem} instance with the
    * indicated integer value.
    */
   private DiscreteDistributionItem findItem(int value) {
      return findItem(value, 0, itemCount() - 1);
   }
   private DiscreteDistributionItem findItem(int value, int low, int high) {
      if (high < low)
         return null;
      int mid = (high + low) / 2;
      DiscreteDistributionItem item = getItem(mid);
      int midValue = item.getValue();
      if (value == midValue) return item;
      if (value < midValue) return findItem(value, 0, mid - 1);
      return findItem(value, mid + 1, high);
   }
   /**
    * Associate the indicated integer value with the indicated weight.
    * @param value The indicated value.
    * @param weight The weight to be associated with the indicated value.
    * @return True when the associated weight has changed; false otherwise.
    */
   public boolean setWeight(int value, double weight) {
      DiscreteDistributionItem item = findItem(value);
      if (null == item)
         throw new IllegalArgumentException("Invalid value " + value);
      return item.setWeight(weight);
   }
   @Override
   public void normalize() {
      sumOfWeights = 0.;
      numericalMean = 0.;
      for (DiscreteDistributionItem item : distributionItems) {
         double weight = item.getWeight();
         sumOfWeights += weight;
         item.setSum(sumOfWeights);
         int value = item.getID();
         numericalMean += weight * value;
      }
      if (sumOfWeights < MathMethods.TINY)
         throw new RuntimeException(
            "Distribution has no items with positive weight");
      numericalMean /= sumOfWeights;
      numericalVariance = 0.;
      for (DiscreteDistributionItem item : distributionItems) {
         int value = item.getID();
         double weight = item.getWeight();
         double d = value - numericalMean;
         numericalVariance += weight * d * d;
      }
      numericalVariance /= sumOfWeights;
      updateUnmodifiableItems();
   }
   @Override
   public boolean isNormalized() {
      return sumOfWeights > 0.;
   }
   private DiscreteDistributionItem chooseItem(
         double chooser, int low, int high) {
      if (low > high) {
         for (DiscreteDistributionItem v : distributionItems) {
            System.out.println(v
                  + " value " + v.getValue()
                  + " weight " + v.getWeight()
                  + " sum " + v.getSum());
         }
         throw new RuntimeException("Cannot locate item for driver "
         + MathMethods.formatDouble(chooser / sumOfWeights,3)
         + " chooser " + MathMethods.formatDouble(chooser,3)
         + " sum " + MathMethods.formatDouble(sumOfWeights,3));
      }
      int mid = (low + high) / 2;
      DiscreteDistributionItem item = getItem(mid);
      if (low == high) {
         return item;
      }
      double sum = item.getSum();
      double weight = item.getWeight();
      if (chooser >= sum ) {
         return chooseItem(chooser, mid+1, high);
      }
      if (chooser < sum - weight)  {
         return chooseItem(chooser, low, mid);
      }
      return item;
   }
   @Override
   public Integer minRange() {
      return getItem(0).getValue();
   }
   @Override
   public Integer maxRange() {
      return getItem(itemCount()-1).getValue();
   }
   @Override
   public Integer maxGraphValue(double tail) {
      DistributionBase.checkTail(tail);
      if (!isNormalized())
         throw new IllegalArgumentException("Distribution has not been normalized");
      double threshold = (1. - tail) * sumOfWeights;
      for (int index = itemCount(); --index >= 0;) {
         DiscreteDistributionItem item = getItem(index);
         if (item.getSum() - item.getWeight() > threshold) {
            return item.getValue();
         }
      }
      throw new RuntimeException("No maximum graph value");
   }
   @Override
   public Integer minGraphValue(double tail) {
      DistributionBase.checkTail(tail);
      if (!isNormalized())
         throw new IllegalArgumentException("Distribution has not been normalized");
      double threshold = sumOfWeights * tail;
      for (int index = 0; index < itemCount(); index++) {
         DiscreteDistributionItem item = getItem(index);
         if (item.getSum() > threshold) {
            return item.getValue();
         }
      }
      throw new RuntimeException("No maximum graph value");
   }
   /**
    * Locate the distribution item for the indicated value and
    * return the item's weight.
    * @param value The indicated value.
    * @return The weight associated with the indicated value.
    * @throws IllegalArgumentException when the value is valid in
    * the distribution.
    */
   public double getItemWeight(int value) {
      DiscreteDistributionItem item = findItem(value);
      if (null == item) throw new IllegalArgumentException(
            value + " is not valid in the distribution");
      return item.getWeight();
   }
   /**
    * Generate a text string listing the values and their associated weights.
    * @return A text string listing the values and their associated weights.
    */
   public String displayWeights() {
      StringBuilder builder = new StringBuilder();
      for (DiscreteDistributionItem item : distributionItems) {
         if (0 < builder.length()) builder.append(' ');
         builder.append(item.getValue());
         builder.append(':');
         builder.append(MathMethods.formatDouble(item.getWeight(), 3));
      }
      return builder.toString();
   }
   @Override
   public double numericalMean() {
      if (!isNormalized())
         throw new IllegalArgumentException(
            "Distribution has not been normalized");
      return numericalMean;
   }
   @Override
   public double numericalVariance() {
      if (!isNormalized())
         throw new IllegalArgumentException(
            "Distribution has not been normalized");
      return numericalVariance;
   }
}
Listing 1: The DiscreteDistribution implementation class.
/**
 * The {@link DiscreteDistributionItem} class defines an integer value of a discrete distribution and the weight
 * associated with this value.
 * @author Charles Ames
 */
public class DiscreteDistributionItem extends WriteableEntity {
   private double weight;
   private double sum;
   protected DiscreteDistributionItem(DiscreteDistribution container) {
      super(container);
      setIDQuality(AttributeQuality.FIXED);
      setWeight(0.);
   }
   protected DiscreteDistributionItem(DiscreteDistribution container, int value, double weight) {
      this(container);
      setID(value);
      setWeight(weight);
   }
   public DiscreteDistribution getContainer() {
      return (DiscreteDistribution) super.getContainer();
   }
   /**
    * Get the integer value.
    * @return The integer value.
    */
   public int getValue() {
      return getID();
   }
   /**
    * Get the weight associated with this value.
    * @return The weight associated with this value.
    */
   public double getWeight() {
      return weight;
   }
   /**
    * Set the weight associated with this value.  If the distribution has previously
    * been normalized, then calling this function undoes the normalization.
    * @param weight The weight intended to be associated with this value.
    * @throws IllegalArgumentException when the weight is negative.
    */
   public void setWeight(double weight) {
      if (weight < 0.) throw new IllegalArgumentException("Weight is negative");
      this.weight = weight;
      getContainer().clearSumOfWeights();
   }
   /**
    * Get the sum of weights associated with this value and its predecessor.
    * @return The sum of weights associated with this value and its predecessor.
    */
   public double getSum() {
      return sum;
   }
   protected void setSum(double sum) {
      this.sum = sum;
   }
}
Listing 2: The DiscreteDistributionItem implementation class.

Coding

There are two discrete distribution classes which together implement a composite design pattern where a single parent DiscreteDistribution instance (Listing 1) stands for the whole, while multiple child DiscreteDistributionItem instance (Listing 2) detail the parts. Package consumers typically interact with the parent only.

The type hierarchy for DiscreteDistribution is:

The type hierarchy for DiscreteDistributionItem is:

Construction begins with a call to initialize(), next iterates through calls to addItem(int valuedouble weight), and completes with a call to normalize(). The first argument to addItem() is a range value; values must be presented in ascending order, but there is no requirement for the kth value to be one more than the k-1st value. The first and last values presented define the extent of the distribution's range. The second argument to addItem() is a weight; weights may never be negative. There is no requirement that all the weights sum to unity.

The primary role for normalize() is to populate sum. , setting sum=0 for the item in position 0 and sum for subsequent items to weight+sum from the previous item.

In addition to supporting the PDF with value and weight fields, DiscreteDistributionItem supports the CDF and quantile function with sum fields. The sum fields are calculated by normalize(). For the 0th item, normalize() sets DiscreteDistriutionItem.sum to 0. For the k+1st item, normalize() sets DiscreteDistriutionItem.sum to sum+weight from the kth item.

Consumption begins once the call to normalize() returns. The normalize() method iterates through the DiscreteDistributionItem items, populating DiscreteDistributionItem.sum with the CDF up to — but not including — the current range value.

Method densityAt(Integer value) first identifies which DiscreteDistributionItem corresponds to value. It returns the item's weight divided by the distribution's sumOfWeights.

Method densityUpTo(Integer value) also starts by identifying which DiscreteDistributionItem corresponds to value. It returns sum+weight, divided by the distribution's sumOfWeights.

Method quantile(double driver) returns an Integer range value. A double chooser variable is calculated as driver times the distribution's sumOfWeights. Next step is a binary search to identify the DiscreteDistributionItem whose sum is largest yet still < the chooser. That item's value is returned.

© Charles Ames Page created: 2022-08-29 Last updated: 2022-08-29