Discrete Distributions
This page discusses statistical distributions over discrete ranges, and how to implement them as
DiscreteDistribution
objects managing lists of
DiscreteDistributionItem
instances.
When the range of a distribution is discrete, its PDF maps integer values
from the range to individual "point" densities in the probability domain. The point densities
always sum to unity. The PDF presents visually as a bar graph.
The CDF for a discrete distribution
maps integer values from the range to intervals of the probability domain; each
interval combining the point density for the current value with the sum of point densities for all previous values:
CDF(k) = PDF(k) + CDF(k-1)
The CDF presents visually as an always-ascending bar graph or alternatively as
a step function with fixed runs and variable rises. The bar or step associated with the largest range value (rightmost
if the range is plotted horizontally) has a height of one.
The quantile function for a discrete distribution is the mathematical
inverse of the CDF. The quantile function maps contiguous intervals of the
probability domain to specific range
values, with the stipulation that the intervals cover the entire probability domain.
The quantile function is always ascending. It presents visually as a step function with variable runs and fixed rises.
Implementing a discrete PDF programmatically involves maintaining a
collection of DiscreteDistributionItem
instances,
where a DiscreteDistributionItem
contains
a sample value and a weight. Notice "weight" rather than "probability" or "density". I don't find it practical, every time
a change happens, to recalculate probabilities so they always sum to unity. Instead the
DiscreteDistribution
class maintains a running sum
which is adjusted whenever a new item is added, whenever an item weight is altered, and whenever an existing
item is deleted.
/**
* The {@link DiscreteDistribution} class implements
* functionality for a statistical distribution with an integer
* range.
* @author Charles Ames
*/
public class DiscreteDistribution
extends DistributionBase<Integer> {
private double numericalMean;
private double numericalVariance;
private double sumOfWeights;
private List<DiscreteDistributionItem> distributionItems;
private List<DiscreteDistributionItem> unmodifiableItems;
/**
* Constructor for a {@link DiscreteDistribution} instance with
* a container.
* @param container A {@link WriteableEntity} which contains this
* distribution.
*/
public DiscreteDistribution(WriteableEntity container) {
super(container);
this.distributionItems = new ArrayList<DiscreteDistributionItem>();
initialize();
}
public void initialize() {
this.numericalMean = Double.NaN;
this.numericalVariance = Double.NaN;
this.sumOfWeights = 0.;
this.distributionItems.clear();
this.unmodifiableItems = null;
}
/**
* Constructor for {@link DiscreteDistribution} instances.
* @param container A {@link WriteableEntity} which contains this
* distribution.
* @param name The unit name, which must be unique under
* the container.
*/
public DiscreteDistribution(WriteableEntity container, String name) {
this(container);
setName(name);
}
/**
* Constructor for a {@link DiscreteDistribution} instance with no
* container.
*/
public DiscreteDistribution() {
super(null);
this.distributionItems = new ArrayList<DiscreteDistributionItem>();
updateUnmodifiableItems();
this.sumOfWeights = Double.NaN;
}
private void updateUnmodifiableItems() {
this.unmodifiableItems
= Collections.unmodifiableList(distributionItems);
}
@Override
public double densityAt(Integer value) {
DiscreteDistributionItem item = findItem(value);
return item.getWeight() / sumOfWeights();
}
@Override
public double densityUpTo(Integer value) {
DiscreteDistributionItem item = findItem(value);
return item.getSum() / sumOfWeights();
}
@Override
public double maxDensity() {
DiscreteDistributionItem item = heaviestItem();
return item.getWeight() / sumOfWeights();
}
@Override
public int itemCount() {
return distributionItems.size();
}
/**
* Get the size of an array which would have one element
* for each item value represented in this distribution.
* @return The maximum item value, plus 1.
*/
public int getLimit() {
int size = distributionItems.size();
if (0 == size) return 0;
DiscreteDistributionItem item = distributionItems.get(size-1);
return item.getValue() + 1;
}
@Override
public double sumOfWeights() {
if (!isNormalized())
throw new UnsupportedOperationException(
"Distribution has not been normalized");
return sumOfWeights;
}
@Override
public Integer quantile(double driver) {
if (!isNormalized())
throw new UnsupportedOperationException(
"Distribution has not been normalized");
Driver.checkDriverValue(driver);
double chooser = driver * sumOfWeights;
DiscreteDistributionItem item
= chooseItem(chooser, 0, itemCount() - 1);
int value = item.getValue();
return value;
}
/**
* Generate items for this distribution to emulate probabilities
* of event counts per time unit in the Poisson point process.
* The Poisson point process was never actually studied by the
* mathematician named Poisson.
* @param mean The average.
* @param extent The distance from lowest range value (zero) to highest
* range value, in standard deviations.
* @throws IllegalArgumentException when the mean is not positive.
* @throws IllegalArgumentException when the extent is not positive.
*/
public void calculatePoisson(double mean, double extent) {
if (MathMethods.TINY > mean)
throw new IllegalArgumentException(
"Mean must be positive");
if (Double.isNaN(extent))
throw new IllegalArgumentException(
"Uninitialized extent argument");
if (2. > extent)
throw new IllegalArgumentException(
"Extent must exceed two standard deviations");
initialize();
double deviation = Math.sqrt(mean);
int itemCount = (int) Math.ceil(deviation*extent);
double numerator = Math.exp(-mean);
double denominator = 1.;
double previousQuotient = 0.;
for (int value = 0; value < itemCount; value++) {
double quotient = numerator / denominator;
if (quotient < MathMethods.TINY) {
if (quotient < previousQuotient) break;
}
addItem(value, quotient);
previousQuotient = quotient;
numerator *= mean;
if (value > 0) denominator *= (double) value;
}
normalize();
}
/**
* Generate items for this distribution to emulate probabilities
* for counts of successful outcomes from N Bernoulli trials.
* A Bernoulli distribution happens when the trials count is 1.
* @param trials The trial count.
* @param successRate The success rate for a single trial.
* @throws IllegalArgumentException when the trials count is not positive.
* @throws IllegalArgumentException when the success rate is outside the
* range from zero to unity.
* @throws IllegalArgumentException when the extent is not positive.
*/
public void calculateBinomial(int trials, double successRate) {
if (1 > trials)
throw new IllegalArgumentException(
"Trial count must be positive");
if (0. > successRate || 1. < successRate)
throw new IllegalArgumentException(
"Success rate must range from zero to unity");
initialize();
double failureRate = 1. - successRate;
double f1 = 1.;
double f2 = Math.pow(failureRate, trials);
for (int value = 0; value <= trials; value++) {
double w = MathMethods.combinations(trials, value) * f1 * f2;
addItem(value, w);
f1 *= successRate;
f2 /= failureRate;
}
normalize();
}
/**
* Generate items for this distribution to emulate probabilities
* for counts of successful outcomes from Bernoulli trials until
* the Nth failure.
* A geometric distribution happens when the failures count is 1.
* @param failures The trial count.
* @param successRate The success rate for a single trial.
* @param extent The distance from lowest range value (zero) to highest
* range value, in standard deviations.
* @throws IllegalArgumentException when the failures count is not positive.
* @throws IllegalArgumentException when the success rate is outside the
* range from zero to unity.
* @throws IllegalArgumentException when the extent is not positive.
*/
public void calculateNegativeBinomial(
int failures, double successRate, double extent) {
if (1 > failures)
throw new IllegalArgumentException(
"Failures count must be positive");
if (0. > successRate || 1. < successRate)
throw new IllegalArgumentException(
"Success rate must range from zero to unity");
if (Double.isNaN(extent))
throw new IllegalArgumentException(
"Uninitialized extent argument");
if (2. > extent)
throw new IllegalArgumentException(
"Extent must exceed two standard deviations");
initialize();
double failureRate = 1. - successRate;
double deviation = Math.sqrt(failures*successRate/failureRate);
int itemCount = (int) Math.ceil(deviation*extent);
double f = 1.;
for (int value = 0; value < itemCount; value++) {
double w = MathMethods.combinations(failures + value - 1, value) * f;
addItem(value, w);
f *= failureRate;
}
normalize();
}
/**
* Generate items for this distribution to emulate probabilities
* for counts of successful outcomes from T Bernoulli trials drawing
* from M white balls (success) and N black balls (failure).
* @param trials The trial count T.
* @param successes The number of white balls M.
* @param failures The number of black balls N.
* @throws IllegalArgumentException when the trial count is not positive.
* @throws IllegalArgumentException when the successes count is not positive.
* @throws IllegalArgumentException when the failures count is not positive.
*/
public void calculateHyperGeometric(
int trials, int successes, int failures) {
if (1 > trials)
throw new IllegalArgumentException(
"Trial count must be positive");
if (1 > successes)
throw new IllegalArgumentException(
"Success count must be positive");
if (1 > failures)
throw new IllegalArgumentException(
"Failure count must be positive");
initialize();
for (int value = 0; value <= trials; value++) {
double numerator = (double) MathMethods.combinations(successes, value);
numerator *= (double) MathMethods.combinations(failures, trials - value);
double denominator
= (double) MathMethods.combinations(successes+failures, trials);
addItem(value, numerator / denominator);
}
normalize();
}
/**
* Get the distribution items.
* @return A collection of {@link DiscreteDistributionItem} instances,
* presented in integer value order.
*/
public Iterator<DiscreteDistributionItem> iterateItems() {
return unmodifiableItems.iterator();
}
/**
* Dereference a (value, weight) item from the collection.
* @param index The position of the desired item.
* @return The desired item.
* @throws IndexOutOfBoundsException when the index is out of range.
*/
public DiscreteDistributionItem getItem(int index) {
return distributionItems.get(index);
}
protected void addItem(DiscreteDistributionItem item) {
distributionItems.add(item);
}
protected void clearSumOfWeights() {
int itemCount = itemCount();
if (0 == itemCount) return;
DiscreteDistributionItem item = distributionItems.get(itemCount-1);
item.setSum(0.);
}
/**
* Add an integer value with the associated weight.
* If the value is already present, update its weight.
* Otherwise append a new item at the end of the collection.
* Denormalizes the distribution.
* @param value The indicated value.
* @param weight The associated weight.
* @throws IllegalArgumentException when the values are not
* presented in sequential order.
* @throws IllegalArgumentException when the weights fall
* outside the range from zero to unity.
* @throws UnsupportedOperationException when the distribution
* has already been normalized.
*/
public void addItem(int value, double weight) {
if (isNormalized())
throw new UnsupportedOperationException("Distribution has already been normalized");
DiscreteDistributionItem item = findItem(value);
if (null == item) {
int index = itemCount() - 1;
if (0 < index) {
DiscreteDistributionItem lastValue = getItem(index);
if (value <= lastValue.getValue())
throw new IllegalArgumentException(
"Values must be presented in sequential order");
}
item = new DiscreteDistributionItem(this, value, weight);
distributionItems.add(item);
}
else {
item.setWeight(weight);
}
}
/**
* Get the distribution item with the smallest value.
* @return the distribution item with the smallest value.
*/
public int getMinValue() {
return distributionItems.get(0).getID();
}
/**
* Get the distribution item with the largest value.
* @return the distribution item with the largest value.
*/
public int getMaxValue() {
return distributionItems.get(itemCount()-1).getID();
}
/**
* Get the distribution item with the heaviest weight.
* @return The distribution item with the heaviest weight.
*/
public DiscreteDistributionItem heaviestItem() {
if (!isNormalized())
throw new IllegalArgumentException(
"Distribution has not been normalized");
double heaviestWeight = 0.;
DiscreteDistributionItem result = null;
for (DiscreteDistributionItem item : distributionItems) {
double weight = item.getWeight();
if (weight > heaviestWeight) {
result = item;
heaviestWeight = weight;
}
}
if (null == result)
throw new RuntimeException("No heaviest-weight item found");
return result;
}
/**
* Get the distribution item with the lightest weight.
* @return The distribution item with the lightest weight.
*/
public DiscreteDistributionItem lightestItem() {
if (!isNormalized())
throw new IllegalArgumentException(
"Distribution has not been normalized");
double lightestWeight = Double.MAX_VALUE;
DiscreteDistributionItem result = null;
for (DiscreteDistributionItem item : distributionItems) {
double weight = item.getWeight();
if (weight > MathMethods.TINY && weight < lightestWeight) {
result = item;
lightestWeight = weight;
}
}
if (null == result)
throw new RuntimeException("No lightest-weight item found");
return result;
}
/**
* Find the {@link DiscreteDistributionItem} instance with the
* indicated integer value.
* @param value The indicated integer value.
* @return The {@link DiscreteDistributionItem} instance with the
* indicated integer value.
*/
private DiscreteDistributionItem findItem(int value) {
return findItem(value, 0, itemCount() - 1);
}
private DiscreteDistributionItem findItem(int value, int low, int high) {
if (high < low)
return null;
int mid = (high + low) / 2;
DiscreteDistributionItem item = getItem(mid);
int midValue = item.getValue();
if (value == midValue) return item;
if (value < midValue) return findItem(value, 0, mid - 1);
return findItem(value, mid + 1, high);
}
/**
* Associate the indicated integer value with the indicated weight.
* @param value The indicated value.
* @param weight The weight to be associated with the indicated value.
* @return True when the associated weight has changed; false otherwise.
*/
public boolean setWeight(int value, double weight) {
DiscreteDistributionItem item = findItem(value);
if (null == item)
throw new IllegalArgumentException("Invalid value " + value);
return item.setWeight(weight);
}
@Override
public void normalize() {
sumOfWeights = 0.;
numericalMean = 0.;
for (DiscreteDistributionItem item : distributionItems) {
double weight = item.getWeight();
sumOfWeights += weight;
item.setSum(sumOfWeights);
int value = item.getID();
numericalMean += weight * value;
}
if (sumOfWeights < MathMethods.TINY)
throw new RuntimeException(
"Distribution has no items with positive weight");
numericalMean /= sumOfWeights;
numericalVariance = 0.;
for (DiscreteDistributionItem item : distributionItems) {
int value = item.getID();
double weight = item.getWeight();
double d = value - numericalMean;
numericalVariance += weight * d * d;
}
numericalVariance /= sumOfWeights;
updateUnmodifiableItems();
}
@Override
public boolean isNormalized() {
return sumOfWeights > 0.;
}
private DiscreteDistributionItem chooseItem(
double chooser, int low, int high) {
if (low > high) {
for (DiscreteDistributionItem v : distributionItems) {
System.out.println(v
+ " value " + v.getValue()
+ " weight " + v.getWeight()
+ " sum " + v.getSum());
}
throw new RuntimeException("Cannot locate item for driver "
+ MathMethods.formatDouble(chooser / sumOfWeights,3)
+ " chooser " + MathMethods.formatDouble(chooser,3)
+ " sum " + MathMethods.formatDouble(sumOfWeights,3));
}
int mid = (low + high) / 2;
DiscreteDistributionItem item = getItem(mid);
if (low == high) {
return item;
}
double sum = item.getSum();
double weight = item.getWeight();
if (chooser >= sum ) {
return chooseItem(chooser, mid+1, high);
}
if (chooser < sum - weight) {
return chooseItem(chooser, low, mid);
}
return item;
}
@Override
public Integer minRange() {
return getItem(0).getValue();
}
@Override
public Integer maxRange() {
return getItem(itemCount()-1).getValue();
}
@Override
public Integer maxGraphValue(double tail) {
DistributionBase.checkTail(tail);
if (!isNormalized())
throw new IllegalArgumentException("Distribution has not been normalized");
double threshold = (1. - tail) * sumOfWeights;
for (int index = itemCount(); --index >= 0;) {
DiscreteDistributionItem item = getItem(index);
if (item.getSum() - item.getWeight() > threshold) {
return item.getValue();
}
}
throw new RuntimeException("No maximum graph value");
}
@Override
public Integer minGraphValue(double tail) {
DistributionBase.checkTail(tail);
if (!isNormalized())
throw new IllegalArgumentException("Distribution has not been normalized");
double threshold = sumOfWeights * tail;
for (int index = 0; index < itemCount(); index++) {
DiscreteDistributionItem item = getItem(index);
if (item.getSum() > threshold) {
return item.getValue();
}
}
throw new RuntimeException("No maximum graph value");
}
/**
* Locate the distribution item for the indicated value and
* return the item's weight.
* @param value The indicated value.
* @return The weight associated with the indicated value.
* @throws IllegalArgumentException when the value is valid in
* the distribution.
*/
public double getItemWeight(int value) {
DiscreteDistributionItem item = findItem(value);
if (null == item) throw new IllegalArgumentException(
value + " is not valid in the distribution");
return item.getWeight();
}
/**
* Generate a text string listing the values and their associated weights.
* @return A text string listing the values and their associated weights.
*/
public String displayWeights() {
StringBuilder builder = new StringBuilder();
for (DiscreteDistributionItem item : distributionItems) {
if (0 < builder.length()) builder.append(' ');
builder.append(item.getValue());
builder.append(':');
builder.append(MathMethods.formatDouble(item.getWeight(), 3));
}
return builder.toString();
}
@Override
public double numericalMean() {
if (!isNormalized())
throw new IllegalArgumentException(
"Distribution has not been normalized");
return numericalMean;
}
@Override
public double numericalVariance() {
if (!isNormalized())
throw new IllegalArgumentException(
"Distribution has not been normalized");
return numericalVariance;
}
}
Listing 1: The
DiscreteDistribution
implementation class.
/**
* The {@link DiscreteDistributionItem} class defines an integer value of a discrete distribution and the weight
* associated with this value.
* @author Charles Ames
*/
public class DiscreteDistributionItem extends WriteableEntity {
private double weight;
private double sum;
protected DiscreteDistributionItem(DiscreteDistribution container) {
super(container);
setIDQuality(AttributeQuality.FIXED);
setWeight(0.);
}
protected DiscreteDistributionItem(DiscreteDistribution container, int value, double weight) {
this(container);
setID(value);
setWeight(weight);
}
public DiscreteDistribution getContainer() {
return (DiscreteDistribution) super.getContainer();
}
/**
* Get the integer value.
* @return The integer value.
*/
public int getValue() {
return getID();
}
/**
* Get the weight associated with this value.
* @return The weight associated with this value.
*/
public double getWeight() {
return weight;
}
/**
* Set the weight associated with this value. If the distribution has previously
* been normalized, then calling this function undoes the normalization.
* @param weight The weight intended to be associated with this value.
* @throws IllegalArgumentException when the weight is negative.
*/
public void setWeight(double weight) {
if (weight < 0.) throw new IllegalArgumentException("Weight is negative");
this.weight = weight;
getContainer().clearSumOfWeights();
}
/**
* Get the sum of weights associated with this value and its predecessor.
* @return The sum of weights associated with this value and its predecessor.
*/
public double getSum() {
return sum;
}
protected void setSum(double sum) {
this.sum = sum;
}
}
Listing 2: The
DiscreteDistributionItem
implementation class.
There are two discrete distribution classes which together implement a composite design pattern
where a single parent DiscreteDistribution
instance (Listing 1) stands for the
whole, while multiple child DiscreteDistributionItem
instance (Listing 2) detail the parts.
Package consumers typically interact with the parent only.
The type hierarchy for DiscreteDistribution
is:
The type hierarchy for DiscreteDistributionItem
is:
Construction begins with a call to initialize()
, next iterates through
calls to addItem(
int value, double weight), and completes with a call to normalize()
.
The first argument to addItem()
is a range value; values must be presented in ascending
order, but there is no requirement for the kth value to be one more than the k-1st value. The first and last values presented
define the extent of the distribution's range. The second argument to addItem()
is a weight; weights may never
be negative. There is no requirement that all the weights sum to unity.
The primary role for normalize()
is to populate sum
.
, setting
sum=0
for the item in position 0 and sum
for subsequent
items to weight+sum
from the
previous item.
In addition to supporting the PDF with value
and weight
fields,
DiscreteDistributionItem
supports the CDF and quantile function with sum
fields.
The sum
fields are calculated by normalize()
.
For the 0th item, normalize()
sets
DiscreteDistriutionItem.sum
to 0.
For the k+1st item, normalize()
sets
DiscreteDistriutionItem.sum
to
sum+weight
from the kth item.
Consumption begins once the call to normalize()
returns. The normalize()
method iterates through the DiscreteDistributionItem
items, populating
DiscreteDistributionItem.sum
with the CDF up to
— but not including — the current range value.
Method densityAt(Integer value)
first identifies which DiscreteDistributionItem
corresponds to value
.
It returns the item's weight
divided by the distribution's sumOfWeights
.
Method densityUpTo(Integer value)
also starts by identifying which DiscreteDistributionItem
corresponds to value
.
It returns sum+weight
, divided by the distribution's sumOfWeights
.
Method quantile(double driver)
returns an Integer
range value.
A double
chooser variable is calculated as driver
times
the distribution's sumOfWeights
.
Next step is a binary search to identify the DiscreteDistributionItem
whose
sum
is largest yet still < the chooser. That item's value
is returned.
© Charles Ames |
Page created: 2022-08-29 |
Last updated: 2022-08-29 |