12  Simulation

Simulation is a powerful technique used to model and analyze complex systems by imitating their behavior under different conditions. These conditions can include varying parameters, inputs, or assumptions to understand how the system responds to different scenarios. A simulation frequently relies on randomization to introduce uncertainty and variability into the model, allowing for a more realistic representation of the system’s behavior.

In this chapter, we explore various aspects of simulation in VBA, including randomization, Monte Carlo simulation, and empirical distribution analysis.

12.1 Randomization

Randomization is a process that involves generating random numbers or sequences. This can be useful in various applications, such as simulations, games, and data analysis.

In VBA, we configure random numbers using:

  • Rnd() = Generates a pseudo-random number between 0 and 1.
  • Randomize([seed]) = initialize the random number generator with a seed value (if empty, seed is based on the current system time).
Figure 12.1: How to know if a number is random?

12.1.1 Seed

A seed is the starting point for generating a sequence of random numbers. It ensures that each time the VBA program runs, the sequence of random numbers is different.

Notice that every time this subprocedure is executed, the sequences of random numbers change because the seed is based on the current system time.

Note

In programming languages, random numbers are generated using algorithms that produce pseudo-random numbers. These numbers are not truly random—they are deterministic and predictable, provided you know the algorithm and the seed value.

Option Explicit

Sub TestGenerateSequenceOfRandomNumbers()
       
    
    Debug.Print ("## 10 random numbers between 0 and 1:")
    
    Dim i As Integer
    
    For i = 1 To 10
        Debug.Print " -"; i, Rnd
    Next i
    
    'Random numbers between upper and lower bounds
    Const upperBound As Long = 100
    Const lowerBound As Long = 200
    
    Debug.Print ("## 10 Integer random numbers between 100 a 200:")
    For i = 1 To 10
        Debug.Print " -"; i, Int(Rnd * (upperBound - lowerBound) + lowerBound)
    Next i
    Randomize (123)
    Debug.Print ("## 10 Integer random numbers between 100 a 200 (with RandBetween):")
    For i = 1 To 10
        Debug.Print " -"; i, WorksheetFunction.RandBetween(100, 200)
    Next i

End Sub

12.1.2 Generating a Repeatable Sequence

Sometimes, we need to generate the same sequence of random numbers for testing or debugging purposes. For example, we may want to ensure that the same random numbers are generated each time the program runs. This is useful for creating repeatable simulations or ensuring consistent results.

In VBA, we can achieve this by using the Randomize function with a specific seed value combined with the Rnd(-1) function. Rnd(-1) returns the last generated random number and doesn’t change the internal seed of the random number generator. The last generated random number is used as the seed for the next random number.

In Listing 12.1, we demonstrate how to generate a repeatable sequence of random numbers using the Randomize function with a specific seed value. The sequence of random numbers will be the same each time you run the program with the same seed value. The runCounter variable is used to keep track of the number of runs across different executions (i.e., each time the program is run, the counter is incremented). To reset the counter, you have press Reset (the square button) in the VBA editor.

The repeatable sequence in Listing 12.1 is generated as follows:

  1. Configure Rnd(-1) to return the last generated random number.
  2. Initialize the random number generator with a specific seed value using Randomize (e.g., Randomize(42)).
  3. Generate random numbers using Rnd.
  4. The last generated random number is used as the seed for the next random number. So, if the number is 0.5, the seed will be 0.5.

In Figure 12.2, we show the output of the TestGenerateSequenceOfRandomNumbers subprocedure for two different runs. Notice that the sequence of random numbers will be the same each time you run the program with the same seed value (except for the RandBetween function, which uses a different method to generate random numbers).

If you only use Randomize(42) without Rnd(-1), the sequence of random numbers will change each time you run the program (see Figure 12.3).

Listing 12.1: Example of generating a repeatable sequence of random numbers using the Randomize function with a specific seed value. The sequence of random numbers will be the same each time you run the program with the same seed value. The runCounter variable is used to keep track of the number of runs across different executions (i.e., each time the program is run, the counter is incremented).
Option Explicit

Public runCounter As Integer

Sub TestGenerateRepeatableSequenceOfRandomNumbers()
    
    'Counter to keep track of the number of runs across different executions
    runCounter = runCounter + 1
    Debug.Print ("# Run: " & runCounter)

    'If you provide Rnd(-1) as the argument, it returns the last generated
    'random number and doesn't change the internal seed of the random number generator.
    Rnd (-1)
    'Seed 42 instead of system timer
    Randomize (42)
    
    
    Debug.Print ("## 10 random numbers between 0 and 1:")
    
    Dim i As Integer
    
    For i = 1 To 10
        Debug.Print " -"; i, Rnd
    Next i
    
    'Random numbers between upper and lower bounds
    Const upperBound As Long = 100
    Const lowerBound As Long = 200
    
    Debug.Print ("## 10 Integer random numbers between 100 a 200:")
    For i = 1 To 10
        Debug.Print " -"; i, Int(Rnd * (upperBound - lowerBound) + lowerBound)
    Next i
    
    'We cannot get repeatable sequences with Excel functions RAND or RANDBETWEEN function
    Debug.Print ("## 10 Integer random numbers between 100 a 200 (with RandBetween):")
    For i = 1 To 10
        Debug.Print " -"; i, WorksheetFunction.RandBetween(100, 200)
    Next i
    
End Sub

In Figure 12.2, we demonstrate how to generate random numbers between 0 and 1 and integer random numbers between 100 and 200. The numbers repeat across different runs because, as shown in Listing 12.1, we use Rnd(-1) to return the last generated random number and Randomize(42) to initialize the random number generator with a specific seed value.

# Run: 1
## 10 random numbers between 0 and 1:
 - 1           0.9078093 
 - 2           0.6110868 
 - 3           0.1681854 
 - 4           0.4335963 
 - 5           0.0147441 
 - 6           0.5766594 
 - 7           0.6623368 
 - 8           2.853477E-02 
 - 9           0.2477508 
 - 10          0.0741725 
## 10 Integer random numbers between 100 a 200:
 - 1           116 
 - 2           120 
 - 3           160 
 - 4           145 
 - 5           110 
 - 6           163 
 - 7           139 
 - 8           163 
 - 9           167 
 - 10          134 
## 10 Integer random numbers between 100 a 200 (with RandBetween):
 - 1           135 
 - 2           185 
 - 3           131 
 - 4           182 
 - 5           182 
 - 6           162 
 - 7           180 
 - 8           147 
 - 9           182 
 - 10          126 
(a)
# Run: 2
## 10 random numbers between 0 and 1:
 - 1           0.9078093 
 - 2           0.6110868 
 - 3           0.1681854 
 - 4           0.4335963 
 - 5           0.0147441 
 - 6           0.5766594 
 - 7           0.6623368 
 - 8           2.853477E-02 
 - 9           0.2477508 
 - 10          0.0741725 
## 10 Integer random numbers between 100 a 200:
 - 1           116 
 - 2           120 
 - 3           160 
 - 4           145 
 - 5           110 
 - 6           163 
 - 7           139 
 - 8           163 
 - 9           167 
 - 10          134 
## 10 Integer random numbers between 100 a 200 (with RandBetween):
 - 1           191 
 - 2           194 
 - 3           143 
 - 4           137 
 - 5           135 
 - 6           148 
 - 7           128 
 - 8           146 
 - 9           118 
 - 10          151 
(b)
Figure 12.2: Example of generating random numbers between 0 and 1 and integer random numbers between 100 and 200. The first and second runs show different the same sequence of random numbers because the seed is based on the current system time.

In Figure 12.3, we demonstrate what happens when the Rnd function is not configured. Although the seed is initialized with Randomize(42), the sequence of numbers changes across different runs.

# Run: 1
## 10 random numbers between 0 and 1:
 - 1           0.7195905 
 - 2           0.9904406 
 - 3           0.8360323 
 - 4           0.2623309 
 - 5           0.281714 
 - 6           0.7662282 
 - 7           0.3721949 
 - 8           0.3132669 
 - 9           0.4006346 
 - 10          0.844281 
## 10 Integer random numbers between 100 a 200:
 - 1           191 
 - 2           140 
 - 3           142 
 - 4           145 
 - 5           143 
 - 6           137 
 - 7           170 
 - 8           180 
 - 9           157 
 - 10          110 
## 10 Integer random numbers between 100 a 200 (with RandBetween):
 - 1           149 
 - 2           101 
 - 3           141 
 - 4           114 
 - 5           124 
 - 6           160 
 - 7           192 
 - 8           188 
 - 9           165 
 - 10          145 
(a)
# Run: 2
## 10 random numbers between 0 and 1:
 - 1           0.8477665 
 - 2           0.4713912 
 - 3           0.046745 
 - 4           0.9194995 
 - 5           0.6922882 
 - 6           0.6382654 
 - 7           0.6948041 
 - 8           0.7792772 
 - 9           0.9510899 
 - 10          0.597456 
## 10 Integer random numbers between 100 a 200:
 - 1           113 
 - 2           159 
 - 3           103 
 - 4           187 
 - 5           122 
 - 6           160 
 - 7           101 
 - 8           167 
 - 9           129 
 - 10          169 
## 10 Integer random numbers between 100 a 200 (with RandBetween):
 - 1           140 
 - 2           180 
 - 3           181 
 - 4           138 
 - 5           187 
 - 6           128 
 - 7           158 
 - 8           140 
 - 9           193 
 - 10          140 
(b)
Figure 12.3: Example of generating random numbers between 0 and 1 and integer random numbers between 100 and 200, without using the Rnd function. Two consecutive runs lead to different sequences of random numbers because the seed is based on the current system time.

12.2 Monte Carlo Simulation

Many real-world problems are too complex to be solved using traditional mathematical methods. Monte Carlo simulation is a statistical technique used to model and analyze the behavior of complex systems through random sampling. The method derives its name from the famous Monte Carlo Casino, known for its games of chance.

Monte Carlo simulation applies randomness to understand and predict the outcomes of real-world systems under uncertain conditions. It involves running a large number of simulations by repeatedly sampling random inputs and using them in the model. Each simulation represents one possible outcome of the system based on the random inputs.

12.2.1 Steps in Monte Carlo Simulation

A Monte Carlo simulation typically involves the following steps.

  • Step 1: Define the Problem: Identify a scenario where uncertainty and randomness play a significant role.
  • Step 2: Set Up the Model: Create a model that represents the system or process you want to analyze. This model should include variables, equations, and assumptions that describe how the system behaves.
  • Step 3: Generate Random Inputs: Generate random inputs to represent system uncertainty. Inputs are typically drawn from probability distributions, such as the Poisson distribution, normal distribution, or uniform distribution.
  • Step 4: Run Simulations: Run a large number of simulations by repeatedly sampling random inputs and using them in the model. Each simulation represents one possible outcome of the system based on the random inputs.
  • Step 5: Collect Results: For each simulation, collect and record the results or outputs of interest, such as completion times, financial metrics, travel times, or delays. Store the results in a data structure.
  • Step 6: Analyze the Results: Upon running a large number of simulations, analyze the results to draw conclusions about the system’s behavior. Calculate summary statistics, such as means, standard deviations, and percentiles. Visualize the results using charts or histograms.
  • Step 7: Decision Making: Use the results to make predictions, optimize processes, or assess the impact of various factors in different scenarios.

12.2.2 Examples

12.2.2.1 Monte Carlo (KPI)

Monte Carlo simulation can be used to estimate Key Performance Indicators (KPIs) under uncertain conditions. For example, in Listing 12.2, we demonstrate a Monte Carlo simulation to estimate a KPI using random samples. The simulation generates random samples for the KPI based on the mean and standard deviation and stores them in the kpiSamples array. You may consider that an “uncertain condition” could be the demand for a product, the time taken to complete a task, or the number of customers visiting a store. By varying the distribution of the random samples, you can model different scenarios and assess how they affect the KPI and the overall system.

Listing 12.2: Example of a Monte Carlo simulation to estimate a Key Performance Indicator (KPI) using random samples. The simulation generates random samples for the KPI based on the mean and standard deviation. The average of the samples is calculated, and the error percentage is measured compared to the true mean.
Option Explicit

'Create a random sequence of `numSamples` using a KPI's mean and standard deviation
Function getKPISampleList(numSamples As Long, _
                          kpiMean As Double, _
                          kpiStd As Double) As Double()
    
    'Array of random KPI samples
    ReDim kpiSamples(1 To numSamples) As Double
    
    Dim s As Long
    
    'Generate `numSamples` random samples for KPI
    For s = 1 To numSamples
        kpiSamples(s) = WorksheetFunction.Norm_Inv(Rnd, kpiMean, kpiStd)
    Next s
    
    getKPISampleList = kpiSamples
    
End Function

'Average KPI sample list
Function getAverageFromKPISampleList(kpiSampleList() As Double) As Double

    Dim i As Long
    Dim kpiTotalSum As Double
    Dim numSamples As Long
    
    numSamples = UBound(kpiSampleList) - LBound(kpiSampleList) + 1
    
    For i = LBound(kpiSampleList) To UBound(kpiSampleList)
        kpiTotalSum = kpiTotalSum + kpiSampleList(i)
    Next i
    
    getAverageFromKPISampleList = kpiTotalSum / numSamples
    
End Function

Sub TestMonteCarloSimulationKPI()
    
    'KPI features
    Dim kpiMean As Double
    Dim kpiStd As Double
    kpiMean = 100
    kpiStd = 20
    
    Debug.Print "# KPI Monte Carlo Simulation KPI=(mean=" & kpiMean & ", std=" & kpiStd & ")"
   
    Dim numSamples() As Variant
    numSamples = Array(10, 100, 1000, 10000, 100000, 1000000)
    Dim n As Variant
    
    'The more samples, the better the approximation
    For Each n In numSamples
        
        'Simulation: sample KPI `n` times
        Dim kpiSamples() As Double
        kpiSamples = getKPISampleList(Int(n), kpiMean, kpiStd)
        
        'Average samples and measure error (distance from true mean)
        Dim kpiAvg As Double, percentageError As Double
        kpiAvg = getAverageFromKPISampleList(kpiSamples)
        percentageError = (Abs(kpiMean - kpiAvg) / kpiMean) * 100
        
        Debug.Print "#n=" & n, "kpi (avg) = "; Round(kpiAvg, 2), " Error:" & Round(percentageError, 2) & "%"
    Next n
    
End Sub

12.2.2.2 Monte Carlo (PI)

Monte Carlo simulation can also be used to estimate mathematical constants, such as the value of Pi. In Listing 12.3, we demonstrate a Monte Carlo simulation to estimate the value of Pi using random points.

The simulation generates random points within a square and checks how many points fall within a quarter circle inscribed in the square. The ratio of points inside the circle to the total number of points is used to estimate the value of Pi. Mathematically:

\[ \begin{align*} \text{Square Area} &= \pi r^2 \\ \text{Circle Area} &= 4 r^2 \\ \frac{\text{Circle Area}}{\text{Square Area}} &= \frac{\pi}{4} \\ \pi &= 4 \times \left(\frac{\text{Circle Area}}{\text{Square Area}}\right) \end{align*} \]

Where \(\text{Circle Area}\) is the number of points inside the circle, and \(\text{Square Area}\) is the number of points inside the square.

The more points generated, the closer the estimated value of Pi will be to the true value. The TestMonteCarloSimulationPi subprocedure demonstrates the simulation for different numbers of points (e.g., 100, 1000, 10000, 100000, 1000000).

Listing 12.3: Example of a Monte Carlo simulation to estimate the value of Pi using random points. The simulation generates random points within a square and checks how many points fall within a quarter circle inscribed in the square. The ratio of points inside the circle to the total number of points is used to estimate the value of Pi.
Option Explicit

Function GenerateRandomPoints(NumPoints As Long) As Double()
    Dim Points() As Double
    
    'Columns are x and y coordinates
    ReDim Points(1 To NumPoints, 1 To 2)
    
    Dim i As Long
    For i = 1 To NumPoints
        Points(i, 1) = Rnd() ' X-coordinate (between 0 and 1)
        Points(i, 2) = Rnd() ' Y-coordinate (between 0 and 1)
    Next i
    
    GenerateRandomPoints = Points
End Function

' Check if the point is inside the quarter circle (x^2 + y^2 <= 1)
Function IsInsideCircle(x As Double, y As Double) As Boolean
    IsInsideCircle = (x ^ 2 + y ^ 2) <= 1
End Function

' Check how many points fall within a circle
Function CountPointsInsideCircle(Points() As Double) As Long
    Dim InsideCircleCount As Long
    InsideCircleCount = 0
    
    Dim i As Long
    For i = LBound(Points, 1) To UBound(Points, 1)
        Dim RandomX As Double
        Dim RandomY As Double
        RandomX = Points(i, 1)
        RandomY = Points(i, 2)
        
        If IsInsideCircle(RandomX, RandomY) Then
            InsideCircleCount = InsideCircleCount + 1
        End If
    Next i
    
    CountPointsInsideCircle = InsideCircleCount
End Function

Function MonteCarloPiEstimation(NumPoints As Long) As Double
    
    'Generate random points inside a quart
    Dim RandomPoints() As Double
    RandomPoints = GenerateRandomPoints(NumPoints)
    
    Dim InsideCircleCount As Long
    InsideCircleCount = CountPointsInsideCircle(RandomPoints)
    
    ' Estimate the value of Pi using the Monte Carlo method
    Dim PiEstimation As Double
    ' Area square = PI * r^2
    ' Area circle =  4 * r^2
    ' (Area circle)/(Area square) = PI/4
    ' PI = 4 * (Area circle) / (Area square)
    ' *Area circle = number of points inside the circle
    ' *Area square = number of points inside the square
    PiEstimation = 4 * InsideCircleCount / NumPoints
    
    MonteCarloPiEstimation = PiEstimation
    
End Function

Sub TestMonteCarloSimulationPi()
    
    Dim NumPoints As Long
    
    For Each NumPoints In Array(100, 1000, 10000, 100000, 1000000)
        Dim PiEstimation As Double
        PiEstimation = MonteCarloPiEstimation(NumPoints)
        Debug.Print "# NumPoints=" & NumPoints, "Pi Estimation: " & PiEstimation & " (Error: " & Abs(WorksheetFunction.Pi - PiEstimation) & ")"
    Next NumPoints
    
End Sub

12.2.3 Empirical Distribution

An empirical distribution describes the observed frequency of each unique value within a given dataset. It contrasts with theoretical distributions, which are constructed from mathematical models (e.g., normal distribution), since it is derived directly from real data observations. Empirical distributions can change when additional data is collected or when different samples are used, making it a flexible tool for data analysis.

12.2.3.1 Empirical Distribution Setup

  1. Data Collection: Start with a dataset that contains a range of values, such as test scores, product prices, or temperature measurements.
  2. Identifying Unique Values: Identify and list all distinct values present in the dataset.
  3. Counting Occurrences: For each unique value, calculate how frequently it appears in the dataset. Frequencies represent the empirical probabilities associated with each value.
  4. Normalizing Frequencies: Divide the frequencies by the total number of data points. This normalization step yields the probability of each unique value occurring in the dataset.
  5. Data Visualization: Represent the empirical distribution through various graphical forms, such as bar graphs, pie charts, or line plots. Visualization aids in understanding the relative likelihood of each unique value.

12.2.3.2 Practical Use Cases

  • Exam Scores: Consider a class of 30 students where their midterm exam scores are recorded. The empirical distribution would show how often each score (e.g., 70, 80, 90) appears in the dataset.
  • Sales Prices: In a retail store, the sales team records the prices at which products are sold throughout the week. The empirical distribution would reveal the range of prices and their respective frequencies.
  • Customer Ages: A marketing company has demographic data for its customers, including their ages. The empirical distribution would indicate the age groups that are more prevalent among their customer base.
  • Website Page Views: A website tracks the number of page views per day for different pages. The empirical distribution would show the frequency of page view counts, indicating popular and less popular pages.

12.2.3.3 Creating Distribution from Values

In the following, we will create an empirical distribution table from a list of values. The table will include columns for the value, frequency, and normalized frequency of each unique value in the dataset. In Table 12.1, we provide an example of student grades and their corresponding empirical distribution table.

Table 12.1: Copy the data (Table 12.1 (a)) to a spreadsheet called SimulationEmpiricalDistribution. And execute the ExampleCreateEmpiricalDistributionTableFromSpreadsheetValuesAndSaveOnSpreadsheet subprocedure to create the empirical distribution table. You can assign this sub to a button (Developer > Insert > Button) entitled “Create Empirical Distribution Table”.
(a) Student grades to be used as the dataset for the empirical distribution.
A B
1 Students Grades
2 Student 1 A
3 Student 2 A
4 Student 3 B
5 Student 4 B
6 Student 5 B
7 Student 6 B
8 Student 7 B
9 Student 8 B
10 Student 9 C
11 Student 10 C
(b) Empirical distribution table created from the student grades dataset.
D E F
1 Value Frequency Normal
2 A 2 0.2
3 B 6 0.6
4 C 2 0.2

In Listing 12.4, we show how to accomplish this in VBA. The logic is divided into several functions and subprocedures to facilitate understanding and testing. They include:

  • ExampleCreateEmpiricalDistributionTableFromSpreadsheetValuesAndSaveOnSpreadsheet:
    • Reads values from a spreadsheet (column B),
    • Creates an empirical distribution table, and
    • Writes empirical distribution table to the spreadsheet (column D).
  • GetUniqueValuesFrom: Identifies and lists all distinct values present in the dataset.
    • Input: Array of values (e.g., [A, A, B, B, B, B, B, B, C, C])
    • Output: Array of unique values (e.g., [A, B, C])
  • GetFrequencyUniqueValuesFrom: For each unique value, calculates how frequently it appears in the dataset.
    • Input: Array of unique values and array of all values, such as ([A, B, C], [A, A, B, B, B, B, B, B, C, C])
    • Output: Array of frequencies (e.g., [2, 6, 2])
  • GetNormalizedFrequencies: Divides the frequencies by the total number of data points to obtain the probability of each unique value occurring in the dataset.
    • Input: Array of frequencies (e.g., [2, 6, 2])
    • Output: Array of normalized frequencies (e.g., [0.2, 0.6, 0.2])
  • FindValueInVector: Searches for a specific value in an array.
    • Input: Value to find and array to search
    • Output: Index of the value in the array or -1 if not found
    • Examples:
      • FindValueInVector(10, [10, 40, 50, 3]) returns 0
      • FindValueInVector(15, [10, 40, 50, 3]) returns -1
  • createEmpiricalDistributionTable: Combines the above functions to create an empirical distribution table.
    • Input: Array of values (e.g., [A, A, B, B, B, B, B, B, C, C])
    • Output: Empirical distribution table (e.g., [[Value, Frequency, Normal], [A, 2, 0.2], [B, 6, 0.6], [C, 2, 0.2]])
  • TestGetUniqueValuesFrom, TestGetFrequencyUniqueValuesFrom, TestGetNormalizedFrequencies, and TestFindValueInVector: Test functions to ensure the logic works correctly.
  • WriteTableInSpreadsheetCell and ReadSpreadsheetColumnValuesFromTable: Utility functions to read and write data to a spreadsheet.
  • SimulationEmpiricalDistribution: Spreadsheet used to test the empirical distribution functions.
Listing 12.4: An empirical distribution table is created from a list of values. The table includes columns for the value, frequency, and normalized frequency of each unique value in the dataset. The functions GetUniqueValuesFrom, GetFrequencyUniqueValuesFrom, and GetNormalizedFrequencies are used to generate the table. The FindValueInVector function is a utility function that searches for a specific value in an array.
Function createEmpiricalDistributionTable(values() As Variant) As Variant
   
    Dim empiricalDistributionTable() As Variant
    
    Dim uniqueValues() As Variant
    Dim frequencies() As Long
    Dim normalizedFrequencies() As Double
    
    uniqueValues = GetUniqueValuesFrom(values)
    frequencies = GetFrequencyUniqueValuesFrom(uniqueValues, values)
    normalizedFrequencies = GetNormalizedFrequencies(frequencies)
    
    'Empirical distribution table columns
    Const COLUMN_VALUE = 1
    Const COLUMN_FREQUENCY = 2
    Const COLUMN_NORMALIZED = 3
    
    'Create table with one extra row for the headers
    ReDim empiricalDistributionTable( _
        LBound(uniqueValues) To UBound(uniqueValues) + 1, _
        COLUMN_VALUE To COLUMN_NORMALIZED)
    
    'Table headers
    empiricalDistributionTable(1, COLUMN_VALUE) = "Value"
    empiricalDistributionTable(1, COLUMN_FREQUENCY) = "Frequency"
    empiricalDistributionTable(1, COLUMN_NORMALIZED) = "Normal"
    
    Dim row As Long
    For row = LBound(uniqueValues) To UBound(uniqueValues)
        empiricalDistributionTable(row + 1, COLUMN_VALUE) = uniqueValues(row)
        empiricalDistributionTable(row + 1, COLUMN_FREQUENCY) = frequencies(row)
        empiricalDistributionTable(row + 1, COLUMN_NORMALIZED) = normalizedFrequencies(row)
    Next row
    
    createEmpiricalDistributionTable = empiricalDistributionTable
    
End Function

'See spreadsheet `SimulationEmpiricalDistribution`
Sub ExampleCreateEmpiricalDistributionTableFromSpreadsheetValuesAndSaveOnSpreadsheet()

    Dim empiricalDistributionTable() As Variant
    Dim distributionValues() As Variant
    
    'Read values from spreadsheet (column B)
    distributionValues = ReadSpreadsheetColumnValuesFromTable("SimulationEmpiricalDistribution", 2)
    empiricalDistributionTable = createEmpiricalDistributionTable(distributionValues)

    'Write empirical distribution table to spreadsheet (column D)
    Call WriteTableInSpreadsheetCell(empiricalDistributionTable, "SimulationEmpiricalDistribution", 1, 4)
    
End Sub


'Identify and list all distinct values present in the dataset
Function GetUniqueValuesFrom(values() As Variant) As Variant()
    
    Dim lenVec As Long
    Dim uniqueValues() As Variant
    Dim value As Variant
    
    For Each value In values
        
        'Try to find the current value in list of unique values
        Dim posValue As Long
        posValue = FindValueInVector(value, uniqueValues)
        
        'If value not in list of unique values
        If posValue = -1 Then
        
            'Resize the lists of unique values
            ReDim Preserve uniqueValues(1 To lenVec + 1)
            
            'Add value
            uniqueValues(lenVec + 1) = value

            lenVec = lenVec + 1

        End If
        
        
    Next value
        
    GetUniqueValuesFrom = uniqueValues
    
End Function

Sub TestGetUniqueValuesFrom()

    Dim values() As Variant
    Dim uniqueValues() As Variant
    values = Array("A", "A", "B", "B", "B", "B", "B", "B", "C", "C")
    uniqueValues = GetUniqueValuesFrom(values)
    Debug.Assert uniqueValues(1) = "A"
    Debug.Assert uniqueValues(2) = "B"
    Debug.Assert uniqueValues(3) = "C"
    
End Sub


'For each unique value, calculate how frequently it appears in the dataset.
Function GetFrequencyUniqueValuesFrom(listUniqueValues() As Variant, _
                                      listValues() As Variant) As Long()

    Dim frequenciesValues() As Long
    
    If LengthVector(listUniqueValues) > 0 Then
            
        ReDim frequencyValues(LBound(listUniqueValues) To _
                              UBound(listUniqueValues)) As Long
        
        Dim i As Long
        Dim value As Variant
        
        'Loop unique values
        For i = LBound(listUniqueValues) To UBound(listUniqueValues)
            'Loop all values
            For Each value In listValues
                'Count how many times unique value appeared
                If value = listUniqueValues(i) Then
                    frequencyValues(i) = frequencyValues(i) + 1
                End If
            Next value
        Next i
    
    End If
    
    GetFrequencyUniqueValuesFrom = frequencyValues

End Function


Sub TestGetFrequencyUniqueValuesFrom()
    
    'Function `GetUniqueValuesFrom` is used below, so it is also tested
    TestGetUniqueValuesFrom
    
    Dim values() As Variant
    Dim frequencyUniqueValues() As Long
    values = Array("A", "A", "B", "B", "B", "B", "B", "B", "C", "C")
    frequencyUniqueValues = GetFrequencyUniqueValuesFrom(GetUniqueValuesFrom(values), values)
    
    frequencyUniqueValues(1) = 2
    frequencyUniqueValues(2) = 6
    frequencyUniqueValues(3) = 2
    
End Sub


'Divide the frequencies by the total number of data points.
'This normalization step yields the probability of each unique value
'occurring in the dataset.
Function GetNormalizedFrequencies(frequencies() As Long) As Double()
    
    Dim normalFrequencies() As Double
    
    Dim f As Long
    Dim sumFrequencies As Long

    ReDim normalFrequencies(LBound(frequencies) To _
                            UBound(frequencies)) As Double
    
    For f = LBound(frequencies) To UBound(frequencies)
        sumFrequencies = sumFrequencies + frequencies(f)
    Next f
    
    For f = LBound(frequencies) To UBound(frequencies)
        normalFrequencies(f) = frequencies(f) / sumFrequencies
    Next f
   
    GetNormalizedFrequencies = normalFrequencies
    
End Function

Function TestGetNormalizedFrequencies()
    
    'Functions `GetUniqueValuesFrom` and `GetFrequencyUniqueValuesFrom` are
    'used below, so they is also tested
    TestGetFrequencyUniqueValuesFrom
    TestGetUniqueValuesFrom
    
    Dim values() As Variant
    Dim normalizedFrequencies() As Double
    values = Array("A", "A", "B", "B", "B", "B", "B", "B", "C", "C")
    normalizedFrequencies = GetNormalizedFrequencies( _
        GetFrequencyUniqueValuesFrom(GetUniqueValuesFrom(values), values))
    normalizedFrequencies(1) = 0.2
    normalizedFrequencies(2) = 0.6
    normalizedFrequencies(3) = 0.2
    
End Function


Function FindValueInVector(value As Variant, vector() As Variant) As Long
    
    FindValueInVector = -1
    Dim i As Long
    
    'If joining values results in an empty string "", array is empty
    If Join(vector, "") = "" Then
        Exit Function
    End If
    
    For i = LBound(vector) To UBound(vector)
        If vector(i) = value Then
            FindValueInVector = i
            Exit Function
        End If
    Next i
    
End Function

Sub TestFindValueInVector()

    Dim vector() As Variant
    
    'Try to find value in empty array
    Debug.Assert FindValueInVector(10, vector) = -1
    
    'Populate vector and try to find value
    vector = Array(10, 40, 50, 3)
    Debug.Assert FindValueInVector(10, vector) = 0
    Debug.Assert FindValueInVector(40, vector) = 1
    Debug.Assert FindValueInVector(50, vector) = 2
    Debug.Assert FindValueInVector(3, vector) = 3
    Debug.Assert FindValueInVector(15, vector) = -1
    
End Sub

12.2.3.4 Using an Empirical Distribution

In Listing 12.5, we demonstrate how to use an empirical distribution to simulate random values according to their probabilities. The simulation is broken down into several functions and subprocedures:

  • SelectRndValueBasedOnEmpiricalDistribution: Selects a random value based on its probability in the distribution.
    • Input: Array of values and array of probabilities (e.g., [“A”, “B”, “C”] and [0.2, 0.6, 0.2])
    • Output: Random value (“A”, “B”, “C”) based on the distribution. “B” has 60% chance of being selected, “A” 20%, and “C” 20%.
    • Testing: TestSelectRndValueBasedOnEmpiricalDistribution subprocedure.
  • SampleValuesFrom: Samples n values from the distribution according to their probabilities.
    • Input: Number of samples, array of values, and array of probabilities (e.g., 10, [“A”, “B”, “C”], [0.2, 0.6, 0.2])
    • Output: Array of n random values based on the distribution (e.g., [“B”, “B”, “A”, “B”, “B”, “B”, “B”, “B”, “B”, “C”])
  • TestSelectRndValueBasedOnEmpiricalDistribution: Demonstrates the use of the above functions with an example involving grades (A, B, C) and their associated probabilities.
  • PrintTable: Prints a table to the debug window with formatted columns. This function is used to display the empirical distribution table.
  • IsEmpiricalDistributionInputValid, VectorsHaveDifferentLength, and ProbabilitiesDoNotAddUpToOne: Helper functions to check the validity of the input data for the empirical distribution.
  • LengthVector and TestLengthVector: Helper functions to calculate the length of a vector and test the LengthVector function.

Run the code from TestSelectRndValueBasedOnEmpiricalDistribution to see how the empirical distribution is used to simulate random values based on their probabilities. The sample size n is increased to demonstrate how the probabilities become more accurate as the sample size grows.

Listing 12.5: Example of using an empirical distribution to simulate random values according to their probabilities. The SelectRndValueBasedOnEmpiricalDistribution function selects a random value based on its probability in the distribution. The SampleValuesFrom function samples n values from the distribution according to their probabilities. The TestSelectRndValueBasedOnEmpiricalDistribution subprocedure demonstrates the use of these functions with an example involving grades (A, B, C) and their associated probabilities.
Option Explicit

'Select a random value according to its probability in the distribution
Function SelectRndValueBasedOnEmpiricalDistribution( _
    values() As Variant, _
    probabilities() As Double) As Variant
    
    Dim RandomNumber As Double
    Dim TotalProbability As Double
    TotalProbability = 0
    Dim indexProbability As Long
    RandomNumber = Rnd
    For indexProbability = LBound(probabilities) To UBound(probabilities)
        TotalProbability = TotalProbability + probabilities(indexProbability)
        If RandomNumber < TotalProbability Then
            Exit For
        End If
    Next indexProbability
        
    SelectRndValueBasedOnEmpiricalDistribution = values(indexProbability)
    
End Function

'Sample `n` values from `values` according to `probabilities`
Function SampleValuesFrom(n As Long, _
                          values() As Variant, _
                          probabilities() As Double) As Variant()

    If IsEmpiricalDistributionInputValid(values, probabilities) Then
    
        Debug.Print vbCrLf & "# Sample size = " & n
        Dim count As Long
        ReDim listValues(1 To n) As Variant
        
        Do
            count = count + 1
            Dim rndValue As Variant
            rndValue = SelectRndValueBasedOnEmpiricalDistribution(values, probabilities)
            listValues(count) = rndValue
            
            'Debug.Print "Value " & count, rndValue
        Loop While count < n
        SampleValuesFrom = listValues
    
    End If
    
End Function

'Use empirical distribution to simulate random values according
'to their probabilities.
Sub TestSelectRndValueBasedOnEmpiricalDistribution()

    Dim values(1 To 4) As Variant
    Dim probabilities(1 To 4) As Double
     
    'Example with grades
    values(1) = "A"
    probabilities(1) = 0.2
    
    values(2) = "B"
    probabilities(2) = 0.6
    
    values(3) = "C"
    probabilities(3) = 0.2
    
    'Check if empirical distribution is correct
    Debug.Assert IsEmpiricalDistributionInputValid(values, probabilities)
    
    
    Dim n As Long
    n = 10
    'Notice how probabities are more accurate as sample size `n` grows
    Do
    
        Dim samples() As Variant
        samples = SampleValuesFrom(n, values, probabilities)
        'Examine probabilities
        Dim table() As Variant
        table = createEmpiricalDistributionTable(samples)
        Call PrintTable(table)
        n = n * 10
    Loop Until n > 10000000

End Sub

Sub PrintTable(table() As Variant)
    Dim i As Long
    Dim j As Long
    Dim cellWidth As Integer
    Dim output As String

    ' Calculate the maximum cell width based on the longest element in the table
    cellWidth = 0
    For i = LBound(table, 1) To UBound(table, 1)
        For j = LBound(table, 2) To UBound(table, 2)
            Dim cellLength As Integer
            cellLength = Len(table(i, j))
            If cellLength > cellWidth Then
                cellWidth = cellLength
            End If
        Next j
    Next i

    ' Print the table
    For i = LBound(table, 1) To UBound(table, 1)
        For j = LBound(table, 2) To UBound(table, 2)
            output = output & Left(table(i, j) & String(cellWidth, " "), cellWidth) & vbTab
        Next j
        Debug.Print output
        output = ""
    Next i
End Sub


'# Helpers to check validity of empirical distribution
' - Values and probabilies vectors have to have same size
' - Probabilities have to add up to one

Function IsEmpiricalDistributionInputValid( _
        values() As Variant, _
        probabilities() As Double) As Boolean
        
    'Check if input data is valid
    If VectorsHaveDifferentLength(values, probabilities) Or _
       ProbabilitiesDoNotAddUpToOne(probabilities) Then
        IsEmpiricalDistributionInputValid = False
    Else
        IsEmpiricalDistributionInputValid = True
    End If
End Function

Function VectorsHaveDifferentLength(v1 As Variant, v2 As Variant) As Boolean
    If LBound(v1) <> LBound(v2) Or UBound(v1) <> UBound(v2) Then
        VectorsHaveDifferentLength = True
        Debug.Print ("Error! Value and probability vectors have different length.")
    Else
        VectorsHaveDifferentLength = False
    End If
    
End Function


Function ProbabilitiesDoNotAddUpToOne(probabilities() As Double) As Boolean
    
    Dim sumProbabilities As Double
    Dim probability As Variant
    
    For Each probability In probabilities
        sumProbabilities = sumProbabilities + probability
    Next probability
    
    'Precision level (necessary due to rounding innacuracies by the computer)
    If Abs(sumProbabilities - 1) > 0.00000001 Then
        ProbabilitiesDoNotAddUpToOne = True
        Debug.Print ("Error! Probabilites do not add up to one.")
    Else
        ProbabilitiesDoNotAddUpToOne = False
    End If
    
End Function

'Assumes vector is populated. For unpopulated arrays, check http://www.cpearson.com/excel/VBAArrays.htm
Function LengthVector(vector() As Variant) As Long
    LengthVector = UBound(vector) - LBound(vector) + 1
End Function

Sub TestLengthVector()

    Dim vector() As Variant
    'Try to get length of empty array
    Debug.Assert LengthVector(vector) = 0

    'Populate vector and try to find value
    vector = Array(10, 40, 50, 3)
    Debug.Assert LengthVector(vector) = 4
    
End Sub

12.2.4 Exercises

12.2.4.1 Number of Points To Match PI Precision

Create a sub to estimate the number of points needed to match the PI precision to a certain number of decimal places. Your sub should receive the maximum error allowed as Double and return the number of points as Long.

Function EstimateNumPointsToMatchPiPrecision(maxError As Double) As Long
    ' Your code here!
End Function

Sub TestEstimateNumPointsToMatchPiPrecision()
    Dim maxError As Double
    maxError = 0.0001
    Debug.Print EstimateNumPointsToMatchPiPrecision(maxError)
End Sub

12.2.4.2 Plotting Random Points (PI Estimation) Inside a Circle

Create a subprocedure to plot the random points generated by the GenerateRandomPoints function inside a circle with radius 1. Then, create scatter plot to show points inside the circle red and outside blue.