05 October, 2014

# Generate random numbers with a given distribution

The rand function in MATLAB returns uniformly distributed pseudorandom values from the open interval (0, 1), but we often need random numbers of other kind of distributions.

A great article written by John S. Denker explains a method of generating random numbers with arbitrary distribution. This post is based on his work, and shows a simple MATLAB implementation.

## Introduction to probability functions

Suppose, that we want to pick an x random value from the (1, 11) interval and all possible values are equally probable. The so called probability density function (PDF) of this case can be seen on the following picture: We have an R range and we can pick any x random value between Rmin  = 1 and Rmax = 11. The probability of all possible x values equals to 0.1.

This is the so-called uniform distribution.

By having a closer look at the p(x) function, we realize, that the area under it equals to 1. That means that if we pick a random x value from the range (1, 11), the probability, that the value falls between 1 and 11 is exactly 1. Similarly, the area under p(x) for a sub-range of R equals to the probability of x falling into that sub-range.

For example the probability of x falling into the (1, 6) sub-range equals to 0.1 * (6 - 1) = 0.5. Because the (1, 6) sub-range covers the half of the possible outcomes, that is true.

Now take the integral of p(x) and have a look at the resulting P(x) function: It is important to realize, that P(Rmin ) equals to 0 and P(Rmax ) is 1. Any P(x) value shows us the probability of the picked value falling below x. This function is the cumulative distribution function (CDF)

Remember of the previous example. P(6) is exactly 0.5. Click on the links to read more about probability density function (PDF) and cumulative distribution function (CDF).

## The general case

Let us create a general probability density function first:

% probability density function is based on this data
probabilities = [0, 1, 10, 2, 0, 0, 4, 5, 3, 1, 0];
x = 1 : length(probabilities);

% do spline interpolation for smoothing
xq = 1 : 0.05 : length(probabilities);
pdf = interp1(x, probability, xq, 'spline');

% remove negative elements due to the spline interpolation
pdf(pdf < 0) = 0;

% normalize the function to have an area of 1.0 under it
pdf = pdf / sum(pdf);


The result is the following: We can see, that we have a rather big probability to get an x value in the (2, 4) range and there is an other probable range between 6 and 10. Now, calculate the cumulative distribution function:

% the integral of PDF is the cumulative distribution function
cdf = cumsum(pdf);


Which looks like: We see, that the more probable a region is, the more the P(x) function increases at that region.

A CDF is a rather nice function: it starts always from zero and results at one, and also the rand function in MATLAB returns uniformly distributed pseudorandom values from the open interval (0, 1). Now imagine, that you pick a lot of random values and project the values by using CDF onto the R region: That looks great, the P(x) → x projection does exactly what we want. Almost all values in the (0, 0.5) region are projected onto the (2, 4) interval, and the (0.5, 1) region is projected onto the (6, 10) interval. On the next picture the slightly increasing-decreasing probabilities can be seen, too: Now turn this into a MATLAB code. We have the discrete CDF P(x) function of a given distribution, and we want to project random values to the x plane. This can be done easily by interpolation. But if you have a closer look you can see, that multiple x values can belong to a P(x) value. So first we have to filter P(x) to have unique values.

% these two variables holds and describes the CDF
xq;         % x
cdf;        % P(x)

% remove non-unique elements


And now do the interpolation. Remember, we have to do it in an inverse way, because we want to project P(x) to x.

% create an array of 2500 random numbers
randomValues = rand(1, 2500);

% inverse interpolation to achieve P(x) -> x projection of the random values
projection = interp1(cdf, xq, randomValues);


To visualize the result, create a histogram from the projection:

hist(projection, 50);


It looks like the original probability density function, so this approach is definitely working:  By using this method any distribution can be achieved by a simple interpolation. To read more about this, have a look at this great article written by John S. Denker, which was the source of this post.