12 Simulation
Simulation is a powerful technique used to model and analyze complex systems by imitating their behavior under different conditions. These conditions can include varying parameters, inputs, or assumptions to understand how the system responds to different scenarios. A simulation frequently relies on randomization to introduce uncertainty and variability into the model, allowing for a more realistic representation of the system’s behavior.
In this chapter, we explore various aspects of simulation in VBA, including randomization, Monte Carlo simulation, and empirical distribution analysis.
12.1 Randomization
Randomization is a process that involves generating random numbers or sequences. This can be useful in various applications, such as simulations, games, and data analysis.
In VBA, we configure random numbers using:
Rnd()
= Generates a pseudo-random number between 0 and 1.Randomize([seed])
= initialize the random number generator with a seed value (if empty, seed is based on the current system time).
data:image/s3,"s3://crabby-images/7d2ad/7d2ad5535ef4a3c82bd5febb15ad44aaf54a2883" alt=""
12.1.1 Seed
A seed is the starting point for generating a sequence of random numbers. It ensures that each time the VBA program runs, the sequence of random numbers is different.
Notice that every time this subprocedure is executed, the sequences of random numbers change because the seed is based on the current system time.
In programming languages, random numbers are generated using algorithms that produce pseudo-random numbers. These numbers are not truly random—they are deterministic and predictable, provided you know the algorithm and the seed value.
Option Explicit
Sub TestGenerateSequenceOfRandomNumbers()
Debug.Print ("## 10 random numbers between 0 and 1:")
Dim i As Integer
For i = 1 To 10
Debug.Print " -"; i, Rnd
Next i
'Random numbers between upper and lower bounds
Const upperBound As Long = 100
Const lowerBound As Long = 200
Debug.Print ("## 10 Integer random numbers between 100 a 200:")
For i = 1 To 10
Debug.Print " -"; i, Int(Rnd * (upperBound - lowerBound) + lowerBound)
Next i
Randomize (123)
Debug.Print ("## 10 Integer random numbers between 100 a 200 (with RandBetween):")
For i = 1 To 10
Debug.Print " -"; i, WorksheetFunction.RandBetween(100, 200)
Next i
End Sub
12.1.2 Generating a Repeatable Sequence
Sometimes, we need to generate the same sequence of random numbers for testing or debugging purposes. For example, we may want to ensure that the same random numbers are generated each time the program runs. This is useful for creating repeatable simulations or ensuring consistent results.
In VBA, we can achieve this by using the Randomize
function with a specific seed value combined with the Rnd(-1)
function. Rnd(-1)
returns the last generated random number and doesn’t change the internal seed of the random number generator. The last generated random number is used as the seed for the next random number.
In Listing 12.1, we demonstrate how to generate a repeatable sequence of random numbers using the Randomize
function with a specific seed value. The sequence of random numbers will be the same each time you run the program with the same seed value. The runCounter
variable is used to keep track of the number of runs across different executions (i.e., each time the program is run, the counter is incremented). To reset the counter, you have press Reset
(the square button) in the VBA editor.
The repeatable sequence in Listing 12.1 is generated as follows:
- Configure
Rnd(-1)
to return the last generated random number. - Initialize the random number generator with a specific seed value using
Randomize
(e.g.,Randomize(42)
). - Generate random numbers using
Rnd
. - The last generated random number is used as the seed for the next random number. So, if the number is 0.5, the seed will be 0.5.
In Figure 12.2, we show the output of the TestGenerateSequenceOfRandomNumbers
subprocedure for two different runs. Notice that the sequence of random numbers will be the same each time you run the program with the same seed value (except for the RandBetween
function, which uses a different method to generate random numbers).
If you only use Randomize(42)
without Rnd(-1)
, the sequence of random numbers will change each time you run the program (see Figure 12.3).
Randomize
function with a specific seed value. The sequence of random numbers will be the same each time you run the program with the same seed value. The runCounter
variable is used to keep track of the number of runs across different executions (i.e., each time the program is run, the counter is incremented).
Option Explicit
Public runCounter As Integer
Sub TestGenerateRepeatableSequenceOfRandomNumbers()
'Counter to keep track of the number of runs across different executions
runCounter = runCounter + 1
Debug.Print ("# Run: " & runCounter)
'If you provide Rnd(-1) as the argument, it returns the last generated
'random number and doesn't change the internal seed of the random number generator.
Rnd (-1)
'Seed 42 instead of system timer
Randomize (42)
Debug.Print ("## 10 random numbers between 0 and 1:")
Dim i As Integer
For i = 1 To 10
Debug.Print " -"; i, Rnd
Next i
'Random numbers between upper and lower bounds
Const upperBound As Long = 100
Const lowerBound As Long = 200
Debug.Print ("## 10 Integer random numbers between 100 a 200:")
For i = 1 To 10
Debug.Print " -"; i, Int(Rnd * (upperBound - lowerBound) + lowerBound)
Next i
'We cannot get repeatable sequences with Excel functions RAND or RANDBETWEEN function
Debug.Print ("## 10 Integer random numbers between 100 a 200 (with RandBetween):")
For i = 1 To 10
Debug.Print " -"; i, WorksheetFunction.RandBetween(100, 200)
Next i
End Sub
In Figure 12.2, we demonstrate how to generate random numbers between 0 and 1 and integer random numbers between 100 and 200. The numbers repeat across different runs because, as shown in Listing 12.1, we use Rnd(-1)
to return the last generated random number and Randomize(42)
to initialize the random number generator with a specific seed value.
# Run: 1
## 10 random numbers between 0 and 1:
- 1 0.9078093
- 2 0.6110868
- 3 0.1681854
- 4 0.4335963
- 5 0.0147441
- 6 0.5766594
- 7 0.6623368
- 8 2.853477E-02
- 9 0.2477508
- 10 0.0741725
## 10 Integer random numbers between 100 a 200:
- 1 116
- 2 120
- 3 160
- 4 145
- 5 110
- 6 163
- 7 139
- 8 163
- 9 167
- 10 134
## 10 Integer random numbers between 100 a 200 (with RandBetween):
- 1 135
- 2 185
- 3 131
- 4 182
- 5 182
- 6 162
- 7 180
- 8 147
- 9 182
- 10 126
# Run: 2
## 10 random numbers between 0 and 1:
- 1 0.9078093
- 2 0.6110868
- 3 0.1681854
- 4 0.4335963
- 5 0.0147441
- 6 0.5766594
- 7 0.6623368
- 8 2.853477E-02
- 9 0.2477508
- 10 0.0741725
## 10 Integer random numbers between 100 a 200:
- 1 116
- 2 120
- 3 160
- 4 145
- 5 110
- 6 163
- 7 139
- 8 163
- 9 167
- 10 134
## 10 Integer random numbers between 100 a 200 (with RandBetween):
- 1 191
- 2 194
- 3 143
- 4 137
- 5 135
- 6 148
- 7 128
- 8 146
- 9 118
- 10 151
In Figure 12.3, we demonstrate what happens when the Rnd
function is not configured. Although the seed is initialized with Randomize(42)
, the sequence of numbers changes across different runs.
# Run: 1
## 10 random numbers between 0 and 1:
- 1 0.7195905
- 2 0.9904406
- 3 0.8360323
- 4 0.2623309
- 5 0.281714
- 6 0.7662282
- 7 0.3721949
- 8 0.3132669
- 9 0.4006346
- 10 0.844281
## 10 Integer random numbers between 100 a 200:
- 1 191
- 2 140
- 3 142
- 4 145
- 5 143
- 6 137
- 7 170
- 8 180
- 9 157
- 10 110
## 10 Integer random numbers between 100 a 200 (with RandBetween):
- 1 149
- 2 101
- 3 141
- 4 114
- 5 124
- 6 160
- 7 192
- 8 188
- 9 165
- 10 145
# Run: 2
## 10 random numbers between 0 and 1:
- 1 0.8477665
- 2 0.4713912
- 3 0.046745
- 4 0.9194995
- 5 0.6922882
- 6 0.6382654
- 7 0.6948041
- 8 0.7792772
- 9 0.9510899
- 10 0.597456
## 10 Integer random numbers between 100 a 200:
- 1 113
- 2 159
- 3 103
- 4 187
- 5 122
- 6 160
- 7 101
- 8 167
- 9 129
- 10 169
## 10 Integer random numbers between 100 a 200 (with RandBetween):
- 1 140
- 2 180
- 3 181
- 4 138
- 5 187
- 6 128
- 7 158
- 8 140
- 9 193
- 10 140
Rnd
function. Two consecutive runs lead to different sequences of random numbers because the seed is based on the current system time.
12.2 Monte Carlo Simulation
Many real-world problems are too complex to be solved using traditional mathematical methods. Monte Carlo simulation is a statistical technique used to model and analyze the behavior of complex systems through random sampling. The method derives its name from the famous Monte Carlo Casino, known for its games of chance.
Monte Carlo simulation applies randomness to understand and predict the outcomes of real-world systems under uncertain conditions. It involves running a large number of simulations by repeatedly sampling random inputs and using them in the model. Each simulation represents one possible outcome of the system based on the random inputs.
12.2.1 Steps in Monte Carlo Simulation
A Monte Carlo simulation typically involves the following steps.
- Step 1: Define the Problem: Identify a scenario where uncertainty and randomness play a significant role.
- Step 2: Set Up the Model: Create a model that represents the system or process you want to analyze. This model should include variables, equations, and assumptions that describe how the system behaves.
- Step 3: Generate Random Inputs: Generate random inputs to represent system uncertainty. Inputs are typically drawn from probability distributions, such as the Poisson distribution, normal distribution, or uniform distribution.
- Step 4: Run Simulations: Run a large number of simulations by repeatedly sampling random inputs and using them in the model. Each simulation represents one possible outcome of the system based on the random inputs.
- Step 5: Collect Results: For each simulation, collect and record the results or outputs of interest, such as completion times, financial metrics, travel times, or delays. Store the results in a data structure.
- Step 6: Analyze the Results: Upon running a large number of simulations, analyze the results to draw conclusions about the system’s behavior. Calculate summary statistics, such as means, standard deviations, and percentiles. Visualize the results using charts or histograms.
- Step 7: Decision Making: Use the results to make predictions, optimize processes, or assess the impact of various factors in different scenarios.
12.2.2 Examples
12.2.2.1 Monte Carlo (KPI)
Monte Carlo simulation can be used to estimate Key Performance Indicators (KPIs) under uncertain conditions. For example, in Listing 12.2, we demonstrate a Monte Carlo simulation to estimate a KPI using random samples. The simulation generates random samples for the KPI based on the mean and standard deviation and stores them in the kpiSamples
array. You may consider that an “uncertain condition” could be the demand for a product, the time taken to complete a task, or the number of customers visiting a store. By varying the distribution of the random samples, you can model different scenarios and assess how they affect the KPI and the overall system.
Option Explicit
'Create a random sequence of `numSamples` using a KPI's mean and standard deviation
Function getKPISampleList(numSamples As Long, _
kpiMean As Double, _
kpiStd As Double) As Double()
'Array of random KPI samples
ReDim kpiSamples(1 To numSamples) As Double
Dim s As Long
'Generate `numSamples` random samples for KPI
For s = 1 To numSamples
kpiSamples(s) = WorksheetFunction.Norm_Inv(Rnd, kpiMean, kpiStd)
Next s
getKPISampleList = kpiSamples
End Function
'Average KPI sample list
Function getAverageFromKPISampleList(kpiSampleList() As Double) As Double
Dim i As Long
Dim kpiTotalSum As Double
Dim numSamples As Long
numSamples = UBound(kpiSampleList) - LBound(kpiSampleList) + 1
For i = LBound(kpiSampleList) To UBound(kpiSampleList)
kpiTotalSum = kpiTotalSum + kpiSampleList(i)
Next i
getAverageFromKPISampleList = kpiTotalSum / numSamples
End Function
Sub TestMonteCarloSimulationKPI()
'KPI features
Dim kpiMean As Double
Dim kpiStd As Double
kpiMean = 100
kpiStd = 20
Debug.Print "# KPI Monte Carlo Simulation KPI=(mean=" & kpiMean & ", std=" & kpiStd & ")"
Dim numSamples() As Variant
numSamples = Array(10, 100, 1000, 10000, 100000, 1000000)
Dim n As Variant
'The more samples, the better the approximation
For Each n In numSamples
'Simulation: sample KPI `n` times
Dim kpiSamples() As Double
kpiSamples = getKPISampleList(Int(n), kpiMean, kpiStd)
'Average samples and measure error (distance from true mean)
Dim kpiAvg As Double, percentageError As Double
kpiAvg = getAverageFromKPISampleList(kpiSamples)
percentageError = (Abs(kpiMean - kpiAvg) / kpiMean) * 100
Debug.Print "#n=" & n, "kpi (avg) = "; Round(kpiAvg, 2), " Error:" & Round(percentageError, 2) & "%"
Next n
End Sub
12.2.2.2 Monte Carlo (PI)
Monte Carlo simulation can also be used to estimate mathematical constants, such as the value of Pi. In Listing 12.3, we demonstrate a Monte Carlo simulation to estimate the value of Pi using random points.
The simulation generates random points within a square and checks how many points fall within a quarter circle inscribed in the square. The ratio of points inside the circle to the total number of points is used to estimate the value of Pi. Mathematically:
\[ \begin{align*} \text{Square Area} &= \pi r^2 \\ \text{Circle Area} &= 4 r^2 \\ \frac{\text{Circle Area}}{\text{Square Area}} &= \frac{\pi}{4} \\ \pi &= 4 \times \left(\frac{\text{Circle Area}}{\text{Square Area}}\right) \end{align*} \]
Where \(\text{Circle Area}\) is the number of points inside the circle, and \(\text{Square Area}\) is the number of points inside the square.
The more points generated, the closer the estimated value of Pi will be to the true value. The TestMonteCarloSimulationPi
subprocedure demonstrates the simulation for different numbers of points (e.g., 100, 1000, 10000, 100000, 1000000).
Option Explicit
Function GenerateRandomPoints(NumPoints As Long) As Double()
Dim Points() As Double
'Columns are x and y coordinates
ReDim Points(1 To NumPoints, 1 To 2)
Dim i As Long
For i = 1 To NumPoints
Points(i, 1) = Rnd() ' X-coordinate (between 0 and 1)
Points(i, 2) = Rnd() ' Y-coordinate (between 0 and 1)
Next i
GenerateRandomPoints = Points
End Function
' Check if the point is inside the quarter circle (x^2 + y^2 <= 1)
Function IsInsideCircle(x As Double, y As Double) As Boolean
IsInsideCircle = (x ^ 2 + y ^ 2) <= 1
End Function
' Check how many points fall within a circle
Function CountPointsInsideCircle(Points() As Double) As Long
Dim InsideCircleCount As Long
InsideCircleCount = 0
Dim i As Long
For i = LBound(Points, 1) To UBound(Points, 1)
Dim RandomX As Double
Dim RandomY As Double
RandomX = Points(i, 1)
RandomY = Points(i, 2)
If IsInsideCircle(RandomX, RandomY) Then
InsideCircleCount = InsideCircleCount + 1
End If
Next i
CountPointsInsideCircle = InsideCircleCount
End Function
Function MonteCarloPiEstimation(NumPoints As Long) As Double
'Generate random points inside a quart
Dim RandomPoints() As Double
RandomPoints = GenerateRandomPoints(NumPoints)
Dim InsideCircleCount As Long
InsideCircleCount = CountPointsInsideCircle(RandomPoints)
' Estimate the value of Pi using the Monte Carlo method
Dim PiEstimation As Double
' Area square = PI * r^2
' Area circle = 4 * r^2
' (Area circle)/(Area square) = PI/4
' PI = 4 * (Area circle) / (Area square)
' *Area circle = number of points inside the circle
' *Area square = number of points inside the square
PiEstimation = 4 * InsideCircleCount / NumPoints
MonteCarloPiEstimation = PiEstimation
End Function
Sub TestMonteCarloSimulationPi()
Dim NumPoints As Long
For Each NumPoints In Array(100, 1000, 10000, 100000, 1000000)
Dim PiEstimation As Double
PiEstimation = MonteCarloPiEstimation(NumPoints)
Debug.Print "# NumPoints=" & NumPoints, "Pi Estimation: " & PiEstimation & " (Error: " & Abs(WorksheetFunction.Pi - PiEstimation) & ")"
Next NumPoints
End Sub
12.2.3 Empirical Distribution
An empirical distribution describes the observed frequency of each unique value within a given dataset. It contrasts with theoretical distributions, which are constructed from mathematical models (e.g., normal distribution), since it is derived directly from real data observations. Empirical distributions can change when additional data is collected or when different samples are used, making it a flexible tool for data analysis.
12.2.3.1 Empirical Distribution Setup
- Data Collection: Start with a dataset that contains a range of values, such as test scores, product prices, or temperature measurements.
- Identifying Unique Values: Identify and list all distinct values present in the dataset.
- Counting Occurrences: For each unique value, calculate how frequently it appears in the dataset. Frequencies represent the empirical probabilities associated with each value.
- Normalizing Frequencies: Divide the frequencies by the total number of data points. This normalization step yields the probability of each unique value occurring in the dataset.
- Data Visualization: Represent the empirical distribution through various graphical forms, such as bar graphs, pie charts, or line plots. Visualization aids in understanding the relative likelihood of each unique value.
12.2.3.2 Practical Use Cases
- Exam Scores: Consider a class of 30 students where their midterm exam scores are recorded. The empirical distribution would show how often each score (e.g., 70, 80, 90) appears in the dataset.
- Sales Prices: In a retail store, the sales team records the prices at which products are sold throughout the week. The empirical distribution would reveal the range of prices and their respective frequencies.
- Customer Ages: A marketing company has demographic data for its customers, including their ages. The empirical distribution would indicate the age groups that are more prevalent among their customer base.
- Website Page Views: A website tracks the number of page views per day for different pages. The empirical distribution would show the frequency of page view counts, indicating popular and less popular pages.
12.2.3.3 Creating Distribution from Values
In the following, we will create an empirical distribution table from a list of values. The table will include columns for the value, frequency, and normalized frequency of each unique value in the dataset. In Table 12.1, we provide an example of student grades and their corresponding empirical distribution table.
SimulationEmpiricalDistribution
. And execute the ExampleCreateEmpiricalDistributionTableFromSpreadsheetValuesAndSaveOnSpreadsheet
subprocedure to create the empirical distribution table. You can assign this sub to a button (Developer > Insert > Button) entitled “Create Empirical Distribution Table”.
A | B | |
---|---|---|
1 | Students | Grades |
2 | Student 1 | A |
3 | Student 2 | A |
4 | Student 3 | B |
5 | Student 4 | B |
6 | Student 5 | B |
7 | Student 6 | B |
8 | Student 7 | B |
9 | Student 8 | B |
10 | Student 9 | C |
11 | Student 10 | C |
D | E | F | |
---|---|---|---|
1 | Value | Frequency | Normal |
2 | A | 2 | 0.2 |
3 | B | 6 | 0.6 |
4 | C | 2 | 0.2 |
In Listing 12.4, we show how to accomplish this in VBA. The logic is divided into several functions and subprocedures to facilitate understanding and testing. They include:
ExampleCreateEmpiricalDistributionTableFromSpreadsheetValuesAndSaveOnSpreadsheet
:- Reads values from a spreadsheet (column B),
- Creates an empirical distribution table, and
- Writes empirical distribution table to the spreadsheet (column D).
GetUniqueValuesFrom
: Identifies and lists all distinct values present in the dataset.- Input: Array of values (e.g.,
[A, A, B, B, B, B, B, B, C, C]
) - Output: Array of unique values (e.g.,
[A, B, C]
)
- Input: Array of values (e.g.,
GetFrequencyUniqueValuesFrom
: For each unique value, calculates how frequently it appears in the dataset.- Input: Array of unique values and array of all values, such as (
[A, B, C], [A, A, B, B, B, B, B, B, C, C]
) - Output: Array of frequencies (e.g.,
[2, 6, 2]
)
- Input: Array of unique values and array of all values, such as (
GetNormalizedFrequencies
: Divides the frequencies by the total number of data points to obtain the probability of each unique value occurring in the dataset.- Input: Array of frequencies (e.g.,
[2, 6, 2]
) - Output: Array of normalized frequencies (e.g.,
[0.2, 0.6, 0.2]
)
- Input: Array of frequencies (e.g.,
FindValueInVector
: Searches for a specific value in an array.- Input: Value to find and array to search
- Output: Index of the value in the array or -1 if not found
- Examples:
FindValueInVector(10, [10, 40, 50, 3])
returns 0FindValueInVector(15, [10, 40, 50, 3])
returns -1
createEmpiricalDistributionTable
: Combines the above functions to create an empirical distribution table.- Input: Array of values (e.g.,
[A, A, B, B, B, B, B, B, C, C]
) - Output: Empirical distribution table (e.g.,
[[Value, Frequency, Normal], [A, 2, 0.2], [B, 6, 0.6], [C, 2, 0.2]]
)
- Input: Array of values (e.g.,
TestGetUniqueValuesFrom
,TestGetFrequencyUniqueValuesFrom
,TestGetNormalizedFrequencies
, andTestFindValueInVector
: Test functions to ensure the logic works correctly.WriteTableInSpreadsheetCell
andReadSpreadsheetColumnValuesFromTable
: Utility functions to read and write data to a spreadsheet.SimulationEmpiricalDistribution
: Spreadsheet used to test the empirical distribution functions.
GetUniqueValuesFrom
, GetFrequencyUniqueValuesFrom
, and GetNormalizedFrequencies
are used to generate the table. The FindValueInVector
function is a utility function that searches for a specific value in an array.
Function createEmpiricalDistributionTable(values() As Variant) As Variant
Dim empiricalDistributionTable() As Variant
Dim uniqueValues() As Variant
Dim frequencies() As Long
Dim normalizedFrequencies() As Double
uniqueValues = GetUniqueValuesFrom(values)
frequencies = GetFrequencyUniqueValuesFrom(uniqueValues, values)
normalizedFrequencies = GetNormalizedFrequencies(frequencies)
'Empirical distribution table columns
Const COLUMN_VALUE = 1
Const COLUMN_FREQUENCY = 2
Const COLUMN_NORMALIZED = 3
'Create table with one extra row for the headers
ReDim empiricalDistributionTable( _
LBound(uniqueValues) To UBound(uniqueValues) + 1, _
COLUMN_VALUE To COLUMN_NORMALIZED)
'Table headers
empiricalDistributionTable(1, COLUMN_VALUE) = "Value"
empiricalDistributionTable(1, COLUMN_FREQUENCY) = "Frequency"
empiricalDistributionTable(1, COLUMN_NORMALIZED) = "Normal"
Dim row As Long
For row = LBound(uniqueValues) To UBound(uniqueValues)
empiricalDistributionTable(row + 1, COLUMN_VALUE) = uniqueValues(row)
empiricalDistributionTable(row + 1, COLUMN_FREQUENCY) = frequencies(row)
empiricalDistributionTable(row + 1, COLUMN_NORMALIZED) = normalizedFrequencies(row)
Next row
createEmpiricalDistributionTable = empiricalDistributionTable
End Function
'See spreadsheet `SimulationEmpiricalDistribution`
Sub ExampleCreateEmpiricalDistributionTableFromSpreadsheetValuesAndSaveOnSpreadsheet()
Dim empiricalDistributionTable() As Variant
Dim distributionValues() As Variant
'Read values from spreadsheet (column B)
distributionValues = ReadSpreadsheetColumnValuesFromTable("SimulationEmpiricalDistribution", 2)
empiricalDistributionTable = createEmpiricalDistributionTable(distributionValues)
'Write empirical distribution table to spreadsheet (column D)
Call WriteTableInSpreadsheetCell(empiricalDistributionTable, "SimulationEmpiricalDistribution", 1, 4)
End Sub
'Identify and list all distinct values present in the dataset
Function GetUniqueValuesFrom(values() As Variant) As Variant()
Dim lenVec As Long
Dim uniqueValues() As Variant
Dim value As Variant
For Each value In values
'Try to find the current value in list of unique values
Dim posValue As Long
posValue = FindValueInVector(value, uniqueValues)
'If value not in list of unique values
If posValue = -1 Then
'Resize the lists of unique values
ReDim Preserve uniqueValues(1 To lenVec + 1)
'Add value
uniqueValues(lenVec + 1) = value
lenVec = lenVec + 1
End If
Next value
GetUniqueValuesFrom = uniqueValues
End Function
Sub TestGetUniqueValuesFrom()
Dim values() As Variant
Dim uniqueValues() As Variant
values = Array("A", "A", "B", "B", "B", "B", "B", "B", "C", "C")
uniqueValues = GetUniqueValuesFrom(values)
Debug.Assert uniqueValues(1) = "A"
Debug.Assert uniqueValues(2) = "B"
Debug.Assert uniqueValues(3) = "C"
End Sub
'For each unique value, calculate how frequently it appears in the dataset.
Function GetFrequencyUniqueValuesFrom(listUniqueValues() As Variant, _
listValues() As Variant) As Long()
Dim frequenciesValues() As Long
If LengthVector(listUniqueValues) > 0 Then
ReDim frequencyValues(LBound(listUniqueValues) To _
UBound(listUniqueValues)) As Long
Dim i As Long
Dim value As Variant
'Loop unique values
For i = LBound(listUniqueValues) To UBound(listUniqueValues)
'Loop all values
For Each value In listValues
'Count how many times unique value appeared
If value = listUniqueValues(i) Then
frequencyValues(i) = frequencyValues(i) + 1
End If
Next value
Next i
End If
GetFrequencyUniqueValuesFrom = frequencyValues
End Function
Sub TestGetFrequencyUniqueValuesFrom()
'Function `GetUniqueValuesFrom` is used below, so it is also tested
TestGetUniqueValuesFrom
Dim values() As Variant
Dim frequencyUniqueValues() As Long
values = Array("A", "A", "B", "B", "B", "B", "B", "B", "C", "C")
frequencyUniqueValues = GetFrequencyUniqueValuesFrom(GetUniqueValuesFrom(values), values)
frequencyUniqueValues(1) = 2
frequencyUniqueValues(2) = 6
frequencyUniqueValues(3) = 2
End Sub
'Divide the frequencies by the total number of data points.
'This normalization step yields the probability of each unique value
'occurring in the dataset.
Function GetNormalizedFrequencies(frequencies() As Long) As Double()
Dim normalFrequencies() As Double
Dim f As Long
Dim sumFrequencies As Long
ReDim normalFrequencies(LBound(frequencies) To _
UBound(frequencies)) As Double
For f = LBound(frequencies) To UBound(frequencies)
sumFrequencies = sumFrequencies + frequencies(f)
Next f
For f = LBound(frequencies) To UBound(frequencies)
normalFrequencies(f) = frequencies(f) / sumFrequencies
Next f
GetNormalizedFrequencies = normalFrequencies
End Function
Function TestGetNormalizedFrequencies()
'Functions `GetUniqueValuesFrom` and `GetFrequencyUniqueValuesFrom` are
'used below, so they is also tested
TestGetFrequencyUniqueValuesFrom
TestGetUniqueValuesFrom
Dim values() As Variant
Dim normalizedFrequencies() As Double
values = Array("A", "A", "B", "B", "B", "B", "B", "B", "C", "C")
normalizedFrequencies = GetNormalizedFrequencies( _
GetFrequencyUniqueValuesFrom(GetUniqueValuesFrom(values), values))
normalizedFrequencies(1) = 0.2
normalizedFrequencies(2) = 0.6
normalizedFrequencies(3) = 0.2
End Function
Function FindValueInVector(value As Variant, vector() As Variant) As Long
FindValueInVector = -1
Dim i As Long
'If joining values results in an empty string "", array is empty
If Join(vector, "") = "" Then
Exit Function
End If
For i = LBound(vector) To UBound(vector)
If vector(i) = value Then
FindValueInVector = i
Exit Function
End If
Next i
End Function
Sub TestFindValueInVector()
Dim vector() As Variant
'Try to find value in empty array
Debug.Assert FindValueInVector(10, vector) = -1
'Populate vector and try to find value
vector = Array(10, 40, 50, 3)
Debug.Assert FindValueInVector(10, vector) = 0
Debug.Assert FindValueInVector(40, vector) = 1
Debug.Assert FindValueInVector(50, vector) = 2
Debug.Assert FindValueInVector(3, vector) = 3
Debug.Assert FindValueInVector(15, vector) = -1
End Sub
12.2.3.4 Using an Empirical Distribution
In Listing 12.5, we demonstrate how to use an empirical distribution to simulate random values according to their probabilities. The simulation is broken down into several functions and subprocedures:
SelectRndValueBasedOnEmpiricalDistribution
: Selects a random value based on its probability in the distribution.- Input: Array of values and array of probabilities (e.g., [“A”, “B”, “C”] and [0.2, 0.6, 0.2])
- Output: Random value (“A”, “B”, “C”) based on the distribution. “B” has 60% chance of being selected, “A” 20%, and “C” 20%.
- Testing:
TestSelectRndValueBasedOnEmpiricalDistribution
subprocedure.
SampleValuesFrom
: Samplesn
values from the distribution according to their probabilities.- Input: Number of samples, array of values, and array of probabilities (e.g., 10, [“A”, “B”, “C”], [0.2, 0.6, 0.2])
- Output: Array of
n
random values based on the distribution (e.g., [“B”, “B”, “A”, “B”, “B”, “B”, “B”, “B”, “B”, “C”])
TestSelectRndValueBasedOnEmpiricalDistribution
: Demonstrates the use of the above functions with an example involving grades (A, B, C) and their associated probabilities.PrintTable
: Prints a table to the debug window with formatted columns. This function is used to display the empirical distribution table.IsEmpiricalDistributionInputValid
,VectorsHaveDifferentLength
, andProbabilitiesDoNotAddUpToOne
: Helper functions to check the validity of the input data for the empirical distribution.LengthVector
andTestLengthVector
: Helper functions to calculate the length of a vector and test theLengthVector
function.
Run the code from TestSelectRndValueBasedOnEmpiricalDistribution
to see how the empirical distribution is used to simulate random values based on their probabilities. The sample size n
is increased to demonstrate how the probabilities become more accurate as the sample size grows.
SelectRndValueBasedOnEmpiricalDistribution
function selects a random value based on its probability in the distribution. The SampleValuesFrom
function samples n
values from the distribution according to their probabilities. The TestSelectRndValueBasedOnEmpiricalDistribution
subprocedure demonstrates the use of these functions with an example involving grades (A, B, C) and their associated probabilities.
Option Explicit
'Select a random value according to its probability in the distribution
Function SelectRndValueBasedOnEmpiricalDistribution( _
values() As Variant, _
probabilities() As Double) As Variant
Dim RandomNumber As Double
Dim TotalProbability As Double
TotalProbability = 0
Dim indexProbability As Long
RandomNumber = Rnd
For indexProbability = LBound(probabilities) To UBound(probabilities)
TotalProbability = TotalProbability + probabilities(indexProbability)
If RandomNumber < TotalProbability Then
Exit For
End If
Next indexProbability
SelectRndValueBasedOnEmpiricalDistribution = values(indexProbability)
End Function
'Sample `n` values from `values` according to `probabilities`
Function SampleValuesFrom(n As Long, _
values() As Variant, _
probabilities() As Double) As Variant()
If IsEmpiricalDistributionInputValid(values, probabilities) Then
Debug.Print vbCrLf & "# Sample size = " & n
Dim count As Long
ReDim listValues(1 To n) As Variant
Do
count = count + 1
Dim rndValue As Variant
rndValue = SelectRndValueBasedOnEmpiricalDistribution(values, probabilities)
listValues(count) = rndValue
'Debug.Print "Value " & count, rndValue
Loop While count < n
SampleValuesFrom = listValues
End If
End Function
'Use empirical distribution to simulate random values according
'to their probabilities.
Sub TestSelectRndValueBasedOnEmpiricalDistribution()
Dim values(1 To 4) As Variant
Dim probabilities(1 To 4) As Double
'Example with grades
values(1) = "A"
probabilities(1) = 0.2
values(2) = "B"
probabilities(2) = 0.6
values(3) = "C"
probabilities(3) = 0.2
'Check if empirical distribution is correct
Debug.Assert IsEmpiricalDistributionInputValid(values, probabilities)
Dim n As Long
n = 10
'Notice how probabities are more accurate as sample size `n` grows
Do
Dim samples() As Variant
samples = SampleValuesFrom(n, values, probabilities)
'Examine probabilities
Dim table() As Variant
table = createEmpiricalDistributionTable(samples)
Call PrintTable(table)
n = n * 10
Loop Until n > 10000000
End Sub
Sub PrintTable(table() As Variant)
Dim i As Long
Dim j As Long
Dim cellWidth As Integer
Dim output As String
' Calculate the maximum cell width based on the longest element in the table
cellWidth = 0
For i = LBound(table, 1) To UBound(table, 1)
For j = LBound(table, 2) To UBound(table, 2)
Dim cellLength As Integer
cellLength = Len(table(i, j))
If cellLength > cellWidth Then
cellWidth = cellLength
End If
Next j
Next i
' Print the table
For i = LBound(table, 1) To UBound(table, 1)
For j = LBound(table, 2) To UBound(table, 2)
output = output & Left(table(i, j) & String(cellWidth, " "), cellWidth) & vbTab
Next j
Debug.Print output
output = ""
Next i
End Sub
'# Helpers to check validity of empirical distribution
' - Values and probabilies vectors have to have same size
' - Probabilities have to add up to one
Function IsEmpiricalDistributionInputValid( _
values() As Variant, _
probabilities() As Double) As Boolean
'Check if input data is valid
If VectorsHaveDifferentLength(values, probabilities) Or _
ProbabilitiesDoNotAddUpToOne(probabilities) Then
IsEmpiricalDistributionInputValid = False
Else
IsEmpiricalDistributionInputValid = True
End If
End Function
Function VectorsHaveDifferentLength(v1 As Variant, v2 As Variant) As Boolean
If LBound(v1) <> LBound(v2) Or UBound(v1) <> UBound(v2) Then
VectorsHaveDifferentLength = True
Debug.Print ("Error! Value and probability vectors have different length.")
Else
VectorsHaveDifferentLength = False
End If
End Function
Function ProbabilitiesDoNotAddUpToOne(probabilities() As Double) As Boolean
Dim sumProbabilities As Double
Dim probability As Variant
For Each probability In probabilities
sumProbabilities = sumProbabilities + probability
Next probability
'Precision level (necessary due to rounding innacuracies by the computer)
If Abs(sumProbabilities - 1) > 0.00000001 Then
ProbabilitiesDoNotAddUpToOne = True
Debug.Print ("Error! Probabilites do not add up to one.")
Else
ProbabilitiesDoNotAddUpToOne = False
End If
End Function
'Assumes vector is populated. For unpopulated arrays, check http://www.cpearson.com/excel/VBAArrays.htm
Function LengthVector(vector() As Variant) As Long
LengthVector = UBound(vector) - LBound(vector) + 1
End Function
Sub TestLengthVector()
Dim vector() As Variant
'Try to get length of empty array
Debug.Assert LengthVector(vector) = 0
'Populate vector and try to find value
vector = Array(10, 40, 50, 3)
Debug.Assert LengthVector(vector) = 4
End Sub
12.2.4 Exercises
12.2.4.1 Number of Points To Match PI Precision
Create a sub to estimate the number of points needed to match the PI precision to a certain number of decimal places. Your sub should receive the maximum error allowed as Double
and return the number of points as Long
.
12.2.4.2 Plotting Random Points (PI Estimation) Inside a Circle
Create a subprocedure to plot the random points generated by the GenerateRandomPoints
function inside a circle with radius 1. Then, create scatter plot to show points inside the circle red and outside blue.