16  Data Structures

A data structure is a method of organizing and storing data in a computer so that it can be accessed and manipulated efficiently. They provide a level of abstraction, allowing programmers to manage data without needing to worry about the low-level implementation details.

Choosing the appropriate data structure is essential because different structures offer varying performance in terms of memory usage, speed, and ease of use. For example, suppose you are storing a list of customer orders, each containing specific details. If you have n orders, consider the following factors:

The complexity of these operations will depend on the data structure you choose. Different data structures are optimized for specific tasks and involve trade-offs, making it critical to match the structure to the needs of the operations you intend to perform.

Big-O Notation

To compare the performance of different data structures, computer scientists use the Big-O notation. It is a mathematical notation that describes the performance of algorithms and data structures in terms of their time and space complexity when the input size approaches infinity. For example, an algorithm with a time complexity of O(n) means that the time taken by the algorithm grows linearly with the input size n. An algorithm with a time complexity of O(n^2) grows quadratically with the input size. If you are interested in learning more about Big-O notation, check out the Big-O Cheat Sheet.

In this chapter, we will explore common data structures in Python, including lists, tuples, sets, and dictionaries. We will discuss their properties, use cases, and performance characteristics to help you choose the right data structure for your needs.

16.1 Lists

Lists are a collection of elements that are ordered and changeable. They are defined by enclosing the elements in square brackets [] and separated by commas. Lists can contain elements of different data types, including other lists. In Table 16.1, we have an example of a list containing integers, strings, and a nested list.

Table 16.1: Examples of lists in Python.
List Example Description
["apple", "banana", "cherry"] A list of strings.
[1, 2, 3, 4, 5] A list of integers.
["apple", 1, "banana", 2] A list of mixed data types.
["apple", ["banana", "cherry"], "orange"] A list with a nested list.
[] An empty list.
[[], [], []] A list of empty lists.
[1, [2, [3, [4]]]] A nested list with multiple levels. Notice the , separating the elements.
[[1, 2], [3, 4], [5, 6]] A 2D list with three rows and two columns.
Nomeclature: Arrays, Lists, and Vectors

In computer science, the terms array, list, and vector are often used interchangeably to refer to a collection of elements. However, there are subtle differences between them:

  • Array: An array is a fixed-size collection of elements of the same data type. Arrays are contiguous blocks of memory, and elements are accessed using an index. In Python, lists are more flexible than traditional arrays because they can store elements of different data types and grow dynamically.
  • Vector: A vector is a one-dimensional array that can grow or shrink in size. In some programming languages, vectors are implemented as resizable arrays, similar to Python lists. The term vector is commonly used in mathematical and scientific contexts to represent a one-dimensional array of values.
  • List: Lists in Python are similar to arrays in other programming languages but offer additional functionality, such as built-in methods for manipulation and iteration.

In this chapter, we will use the term list to refer to collections of elements in Python.

16.1.1 List Slicing

In Python, you can use list slicing to extract a subset of elements from a list. This is similar to string slicing, where you extract a substring from a string. List slicing is done by specifying the start and end indices separated by a colon :, and an optional step size separated by another colon :. The start index is inclusive, and the end index is exclusive. In Table 16.2, we have examples of list slicing considering the list l = [1, 2, 3, 4, 5].

Table 16.2: Examples of list slicing in Python.
List Slicing Description Example Result
l[start:end] Returns a subset of the list from the start index to the end index (excluding the end index). l[1:3] [2, 3]
l[:end] Returns a subset of the list from the beginning to the end index (excluding the end index). l[:3] [1, 2, 3]
l[start:] Returns a subset of the list from the start index to the end. l[2:] [3, 4, 5]
l[:] Returns a copy of the entire list. l[:] [1, 2, 3, 4, 5]
l[start:end:step] Returns a subset of the list from the start index to the end index (excluding the end index) with a specified step size. l[1:5:2] [2, 4]
l[::-1] Reverses the order of the elements in the list. l[::-1] [5, 4, 3, 2, 1]
l[::2] Returns every second element of the list. l[::2] [1, 3, 5]
l[-1] Returns the last element of the list. l[-1] 5
l[-2] Returns the second-to-last element of the list. l[-2] 4
l[-3:] Returns the last three elements of the list. l[-3:] [3, 4, 5]
l[:-2] Returns all elements except the last two elements of the list. l[:-2] [1, 2, 3]
l[-1:-4:-1] Returns the last three elements of the list in reverse order. l[-1:-4:-1] [5, 4, 3]

16.1.2 Multidimensional Lists

A multidimensional list is a list of lists, where each element in the list is itself a list. This allows you to create a matrix or a table-like structure in Python. In Table 16.3, we have an example of a 2D list representing a matrix.

Table 16.3: Examples of multidimensional lists in Python.
Multidimensional List Description
[[1, 2, 3], [4, 5, 6], [7, 8, 9]] A 2D list representing a 3x3 matrix.
[[1, 2], [3, 4], [5, 6]] A 2D list representing a 3x2 matrix.
[[1], [2], [3], [4]] A 2D list representing a 4x1 matrix.
[[1, 2, 3], [4, 5], [6, 7, 8, 9]] A 2D list with varying row lengths.
[[[1, 2], [3, 4]], [[5, 6], [7, 8]]] A 3D list representing a 2x2x2 cube.

Remember that you can access elements in a multidimensional list by using multiple indices. For example, to access the element at row i and column j of a 2D list matrix, you can use matrix[i][j]. In Listing 16.1, we have an example of accessing elements in a 2D list.

Listing 16.1: Accessing the element at row 1 and column 2 of a 2D list.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Accessing the element at row 1 and column 2
element = matrix[1][2]
print(element)  # Output: 6
6

16.1.3 List Methods

Python provides several built-in methods to work with lists. These methods allow you to perform common operations on lists, such as adding elements, removing elements, sorting elements, and more. In Table 29.14, we have a list of common list methods in Python.

Table 16.4: Common list methods in Python.
List Method Description Example Result
list.append(element) Adds an element to the end of the list. fruits = ["apple", "banana"]; fruits.append("cherry") ["apple", "banana", "cherry"]
list.insert(index, element) Inserts an element at the specified index. fruits = ["apple", "banana"]; fruits.insert(1, "cherry") ["apple", "cherry", "banana"]
list.remove(element) Removes the first occurrence of the specified element from the list. fruits = ["apple", "banana", "cherry"]; fruits.remove("banana") ["apple", "cherry"]
list.pop(index) Removes the element at the specified index (or the last element if no index is specified) and returns it. fruits = ["apple", "banana", "cherry"]; fruits.pop(1) ["apple", "cherry"]
list.index(element) Returns the index of the first occurrence of the specified element in the list. ["apple", "banana", "cherry"].index("banana") 1
element in list Returns True if the element is present in the list, False otherwise. "banana" in ["apple", "banana", "cherry"] True
list.count(element) Returns the number of occurrences of the specified element in the list. ["apple", "banana", "cherry", "banana"].count("banana") 2
list.sort() Sorts the elements of the list in ascending order. fruits = ["banana", "apple", "cherry"]; fruits.sort() ["apple", "banana", "cherry"]
list.reverse() Reverses the order of the elements in the list. fruits = ["apple", "banana", "cherry"]; fruits.reverse() ["cherry", "banana", "apple"]
list.copy() Returns a copy of the list. fruits = ["apple", "banana", "cherry"]; fruits_copy = fruits.copy() ["apple", "banana", "cherry"]
list.clear() Removes all elements from the list. fruits = ["apple", "banana", "cherry"]; fruits.clear() []
len(list) Returns the number of elements in the list. len(["apple", "banana", "cherry"]) 3
max(list) Returns the maximum element in the list. max([3, 1, 4, 1, 5, 9, 2, 6, 5]) 9
min(list) Returns the minimum element in the list. min([3, 1, 4, 1, 5, 9, 2, 6, 5]) 1
sum(list) Returns the sum of all elements in the list. sum([1, 2, 3, 4, 5]) 15
sorted(list) Returns a new list with the elements of the original list sorted. sorted([3, 1, 4, 1, 5, 9, 2, 6, 5]) [1, 1, 2, 3, 4, 5, 5, 6, 9]
list.extend(iterable) Extends the list by appending elements from the iterable. fruits = ["apple", "banana"]; fruits.extend(["cherry", "orange"]) ["apple", "banana", "cherry", "orange"]
sorted(list, key=function) Returns a new list with the elements of the original list sorted using the specified key function. sorted(["apple", "banana", "cherry"], key=len) ["apple", "cherry", "banana"]
reversed(list) Returns an iterator that iterates over the elements of the list in reverse order. list(reversed(["apple", "banana", "cherry"])) ["cherry", "banana", "apple"]

16.1.4 Inplace vs. Out-of-place Operations

In Python, operations on lists can be categorized as inplace or out-of-place operations based on whether the original list is modified or a new list is created. Inplace operations modify the original list, while out-of-place operations create a new list without modifying the original list.

  • Example 1. Append Operation: The list.append(element) method is an inplace operation because it modifies the original list by adding an element to the end. On the other hand, the list + [element] operation is an out-of-place operation because it creates a new list by concatenating the original list with another list.
  • Example 2. Sort Operation: The list.sort() method is an inplace operation because it modifies the original list by sorting the elements in place. In contrast, the sorted(list) function is an out-of-place operation because it creates a new sorted list without modifying the original list.
  • Example 3. Reverse Operation: The list.reverse() method is an inplace operation because it modifies the original list by reversing the order of the elements. The reversed(list) function is an out-of-place operation because it creates a new list with the elements in reverse order.

16.1.5 Looping Through a List

You can iterate over the elements of a list using a for loop. In each iteration, the loop variable takes the value of the current element in the list. In Listing 16.2, we have an example of looping through a list.

Listing 16.2: Looping through a list of fruits and printing each fruit.
fruits = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
apple
banana
cherry

In Listing 16.3, we have an example of looping through a list with the index of each element using the range function.

Listing 16.3: Looping through a list of fruits and printing the index and fruit for each element using the range function.
fruits = ["apple", "banana", "cherry"]
for i in range(len(fruits)):
    print(f"Index: {i}, Fruit: {fruits[i]}")
Index: 0, Fruit: apple
Index: 1, Fruit: banana
Index: 2, Fruit: cherry

In Listing 16.4, we have an example of looping through a list with the index of each element. The enumerate function is used to get both the index and the element in each iteration.

Listing 16.4: Looping through a list of fruits and printing the index and fruit for each element.
fruits = ["apple", "banana", "cherry"]
for i, fruit in enumerate(fruits):
    print(f"Index: {i}, Fruit: {fruit}")
Index: 0, Fruit: apple
Index: 1, Fruit: banana
Index: 2, Fruit: cherry

If you transform the result of the enumerate function into a list, you will get a list of tuples where each tuple contains the index and the element. In Listing 16.5, we have an example of transforming the result of the enumerate function into a list. So, the output will be [(0, "apple"), (1, "banana"), (2, "cherry")]. Therefore, what the for loop does is to map each tuple to the variables i and fruit.

Listing 16.5: Transforming the result of the enumerate function into a list of tuples containing the index and the element.
fruits = ["apple", "banana", "cherry"]
indexed_fruits = list(enumerate(fruits))
print(indexed_fruits)
[(0, 'apple'), (1, 'banana'), (2, 'cherry')]

If you want to loop through a list in reverse order, you can use the reversed function. In Listing 16.6, we have an example of looping through a list in reverse order.

Listing 16.6: Looping through a list of fruits in reverse order and printing each fruit.
fruits = ["apple", "banana", "cherry"]
for fruit in reversed(fruits):
    print(fruit)
cherry
banana
apple

In multidimensional lists, you can use nested loops to iterate over the elements of the inner lists. In Listing 16.7, we have an example of looping through a 2D list.

Listing 16.7: Looping through a 2D list and printing each element. The inner loop iterates over the elements of each row, and the outer loop iterates over the rows.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for row in matrix:
    for element in row:
        print(element, end=" ")
    print()
1 2 3 
4 5 6 
7 8 9 

16.1.6 List Comprehension

List comprehension is a concise way to create lists in Python. It allows you to create a new list by applying an expression to each element of an existing list. In Table 16.5, we have examples of list comprehension.

Table 16.5: Examples of list comprehension in Python.
List Comprehension Description Example Result
[expression for item in list] Applies the expression to each item in the list. [x*2 for x in [1, 2, 3]] [2, 4, 6]
[expression for item in list if condition] Applies the expression to each item in the list that satisfies the condition. [x*2 for x in [1, 2, 3] if x > 1] [4, 6]
[expression if condition else other for item in list] Applies the expression to each item in the list if the condition is True, otherwise applies other. [x*2 if x > 1 else x for x in [1, 2, 3]] [1, 4, 6]

Why Use List Comprehension?

The main advantages of using list comprehension are:

  • Concise Syntax: List comprehension allows you to create lists in a single line of code, making it more concise and readable.
  • Efficient: List comprehension is more efficient than traditional for loops as it reduces the number of lines of code and improves performance. The performance improvement comes from the fact that list comprehension is optimized for speed (it is implemented in C under the hood).
  • Functional Programming: List comprehension is a functional programming concept that emphasizes the use of expressions and functions to create lists, making your code more functional and declarative. So far, you have been using imperative programming, which focuses on statements and loops to achieve a result.
  • Versatile: List comprehension can be used for a wide range of operations, such as filtering, mapping, and transforming lists, making it a versatile tool for data processing and manipulation.
  • Pythonic: List comprehension is a common practice in Python and is considered Pythonic, meaning it follows the idiomatic style and conventions of the Python language.

About the speed-up factor, Listing 16.8 provides an example of how list comprehension can be faster than traditional for loops.

Listing 16.8: Comparing the runtime of creating a list of squared numbers using traditional for loop and list comprehension. The runtime is given in seconds. Using list comprehension is 1.47 times faster than using a for loop. The difference in runtime becomes more significant as the size of the list increases. The time module is used to measure the runtime (in seconds) of each operation.
(a) Version with for loop.
import time
start_time = time.time()
squared = []
for i in range(10_000_000):
    squared.append(i ** 2)
runtime_for = time.time() - start_time
print("Runtime:", runtime_for)
Runtime: 1.840977668762207
(b) Version with list comprehension.
import time
start_time = time.time()
squared = [i ** 2 for i in range(10_000_000)]
runtime_lc = time.time() - start_time
print("Runtime:", runtime_lc)
Runtime: 1.2558317184448242

In examples Listing 16.9, Listing 16.10, Listing 16.11, and Listing 16.12, we have examples of list comprehension compared to traditional for loops.

Listing 16.9: Applying the expression n ** 2 to each element n in the list numbers using traditional for loop and list comprehension.
(a) Version with for loop.
numbers = [1, 2, 3, 4, 5]
squared = []
for n in numbers:
    squared.append(n ** 2)
(b) Version with list comprehension.
numbers = [1, 2, 3, 4, 5]
squared = [n ** 2 for n in numbers]
Listing 16.10: Filtering elements that are even using traditional for loop and list comprehension.
(a) Version with for loop.
numbers = [1, 2, 3, 4, 5]
even = []
for n in numbers:
    if n % 2 == 0:
        even.append(n)
(b) Version with list comprehension.
numbers = [1, 2, 3, 4, 5]
even = [n for n in numbers if n % 2 == 0]
Listing 16.11: Apply the expression n ** 2 to even elements and n to odd elements using traditional for loop and list comprehension.
(a) Version with for loop.
numbers = [1, 2, 3, 4, 5]
result = []
for n in numbers:
    if n % 2 == 0:
        result.append(n ** 2)
    else:
        result.append(n)
(b) Version with list comprehension.
numbers = [1, 2, 3, 4, 5]
result = [n ** 2 if n % 2 == 0 else n for n in numbers]
Listing 16.12: Flattening a 2D list using traditional for loop and list comprehension. Flattening means converting a multi-dimensional list into a single-dimensional list.
(a) Version with for loop.
numbers = [[1, 2], [3, 4], [5, 6]]
flattened = []
for sublist in numbers:
    for n in sublist:
        flattened.append(n)
(b) Version with list comprehension.
numbers = [[1, 2], [3, 4], [5, 6]]
flattened = [n for sublist in numbers for n in sublist]

16.2 Tuples

Tuples are similar to lists but are immutable, meaning their elements cannot be changed after creation. Tuples are defined by enclosing the elements in parentheses () and separating them by commas. In Table 29.19, we have examples of tuples containing integers, strings, and a nested tuple.

Table 16.6: Examples of tuples in Python.
Tuple Example Description
("apple", "banana", "cherry") A tuple of strings.
(1, 2, 3, 4, 5) A tuple of integers.
("apple", 1, "banana", 2) A tuple of mixed data types.
("apple", ("banana", "cherry"), "orange") A tuple with a nested tuple.
() An empty tuple.
((1, 2), (3, 4), (5, 6)) A 2D tuple with three rows and two columns.
(1,) A tuple with a single element. Note the comma , after the element.

16.2.1 Tuple Packing and Unpacking

Tuple packing is the process of creating a tuple by enclosing multiple values in parentheses. Tuple unpacking is the process of extracting the values from a tuple into separate variables. In Listing 16.13, we have examples of tuple packing and unpacking.

Listing 16.13: Tuple packing and unpacking in Python.
# Tuple packing
fruits = "apple", "banana", "cherry"

# Tuple unpacking
fruit1, fruit2, fruit3 = fruits

print(fruit1)  # Output: apple
print(fruit2)  # Output: banana
print(fruit3)  # Output: cherry
apple
banana
cherry

16.2.2 Tuple Methods

Tuples are immutable, so they have fewer methods compared to lists. However, they still provide some useful methods for working with tuples. In Table 29.20, we have a list of common tuple methods in Python for a tuple t = (1, 2, 3, 1, 2).

Table 16.7: Common tuple methods in Python considering a tuple t = (1, 2, 3, 1, 2).
Tuple Method Description Example Result
tuple.count(element) Returns the number of occurrences of the specified element in the tuple. t.count(1) 2
tuple.index(element) Returns the index of the first occurrence of the specified element in the tuple. t.index(2) 1
len(tuple) Returns the number of elements in the tuple. len(t) 5
max(tuple) Returns the maximum element in the tuple. max(t) 3
min(tuple) Returns the minimum element in the tuple. min(t) 1
sum(tuple) Returns the sum of elements in the tuple. sum(t) 9
sorted(tuple) Returns a list with the elements of the original tuple sorted. tuple(sorted(t)) (1, 1, 2, 2, 3)
reversed(tuple) Returns an iterator that iterates over the elements of the tuple in reverse order. tuple(reversed(t)) (2, 1, 3, 2, 1)

16.2.3 Named Tuples

Named tuples are a convenient way to define a new tuple subclass with named fields. Named tuples provide a more readable and self-documenting alternative to regular tuples by allowing you to access fields by name instead of index. In Listing 16.14, we have an example of creating a named tuple using the namedtuple function from the collections module.

Listing 16.14: Creating and using a named tuple in Python.
from collections import namedtuple

# Define a named tuple called 'Point' with fields 'x' and 'y'
Point = namedtuple("Point", ["x", "y"])

# Create a new Point object
p = Point(1, 2)

# Access the fields by name
print(p.x)  # Output: 1
print(p.y)  # Output: 2
1
2

Named tuples are useful for creating lightweight data structures with named fields. They are similar to regular tuples but provide more clarity and readability when working with structured data.

16.2.4 Zip Function

The zip function in Python is used to combine multiple iterables (such as lists, tuples, or strings) into a single iterable of tuples. The resulting iterable contains tuples where the i-th tuple contains the i-th element from each of the input iterables. In Listing 16.15, we have an example of using the zip function to combine two lists into a list of tuples.

Listing 16.15: Combining two lists into a list of tuples using the zip function.
route = ["City A", "City B", "City C"]
arrival_time = ["10:00", "11:30", "13:00"]

# Combine the two lists into a list of tuples
route_times = list(zip(route, arrival_time))

# Origin and destination pairs
od_pairs = list(zip(route[:-1], route[1:]))

print(route_times)
print(od_pairs)
[('City A', '10:00'), ('City B', '11:30'), ('City C', '13:00')]
[('City A', 'City B'), ('City B', 'City C')]

The zip function is commonly used to iterate over multiple iterables simultaneously. In Listing 16.16, we have an example of using the zip function to iterate over two lists simultaneously.

Listing 16.16: Iterating over two lists simultaneously using the zip function.
fruits = ["apple", "banana", "cherry"]
prices = [1.00, 0.50, 2.00]

for fruit, price in zip(fruits, prices):
    print(f"{fruit}: ${price}")
apple: $1.0
banana: $0.5
cherry: $2.0

The zip function can also be used to unzip a list of tuples into separate lists. In Listing 16.17, we have an example of using the zip function to unzip a list of tuples into separate lists.

Listing 16.17: Unzipping a list of tuples into separate lists using the zip function. The * operator is used to unpack the list of tuples, which is then passed to the zip function.
pairs = [
    ("apple", 1.00),
    ("banana", 0.50),
    ("cherry", 2.00)]

# Unzip the list of tuples into separate lists
fruits, prices = zip(*pairs)

print(fruits)
print(prices)
('apple', 'banana', 'cherry')
(1.0, 0.5, 2.0)

16.3 Sets

Sets are unordered collections of unique elements in Python. Sets are defined by enclosing the elements in curly braces {} and separating them by commas. Sets do not allow duplicate elements, and they are mutable, meaning you can add or remove elements from a set. In Table 16.8, we have examples of sets containing integers, strings, and a mixed set.

Table 16.8: Examples of sets in Python.
Set Example Description
{1, 2, 3, 4, 5} A set of integers.
{"apple", "banana", "cherry"} A set of strings.
{1, "apple", 2, "banana"} A mixed set of integers and strings.
set() An empty set.

16.3.1 Set Methods

Sets provide a variety of methods to perform common set operations, such as adding elements, removing elements, and performing set operations like union, intersection, and difference. In Table 29.15, we have a list of common set methods in Python for a set s = {1, 2, 3, 4, 5}.

Table 16.9: Common set methods in Python considering a set s = {1, 2, 3, 4, 5}.
Set Method Description Example Result
set.add(element) Adds an element to the set. s.add(6) {1, 2, 3, 4, 5, 6}
set.remove(element) Removes an element from the set. Raises an error if the element is not present. s.remove(3) {1, 2, 4, 5}
set.discard(element) Removes an element from the set if it is present. Does not raise an error if the element is not present. s.discard(3) {1, 2, 4, 5}
set.pop() Removes and returns an arbitrary element from the set. Raises an error if the set is empty. s.pop() 1
set.clear() Removes all elements from the set. s.clear() set()
len(set) Returns the number of elements in the set. len(s) 5
element in set Returns True if the element is present in the set, False otherwise. 2 in s True
set.union(other) Returns a new set with elements from both sets. s.union({4, 5, 6}) {1, 2, 3, 4, 5, 6}
set.intersection(other) Returns a new set with elements common to both sets. s.intersection({3, 4, 5, 6}) {3, 4, 5}
set.difference(other) Returns a new set with elements in the set but not in the other set. s.difference({3, 4}) {1, 2, 5}
set.symmetric_difference(other) Returns a new set with elements in either set but not in both sets. s.symmetric_difference({4, 5, 6}) {1, 2, 3, 6}
set.issubset(other) Returns True if the set is a subset of the other set. s.issubset({1, 2, 3, 4, 5, 6}) True
set.issuperset(other) Returns True if the set is a superset of the other set. s.issuperset({1, 2, 3}) True

16.3.2 Transforming Lists to Sets

You can convert a list to a set using the set constructor. The set constructor creates a new set from the elements of the list, removing any duplicate elements. In Listing 16.18, we have an example of converting a list to a set.

Listing 16.18: Converting a list of numbers to a set to remove duplicate elements.
numbers = [1, 2, 3, 4, 5, 1, 2, 3]
unique_numbers = set(numbers)

print(unique_numbers)  # Output: {1, 2, 3, 4, 5}
{1, 2, 3, 4, 5}

Converting a list to a set is useful when you want to remove duplicate elements from the list or perform set operations on the elements of the list.

16.3.3 Set Comprehension

Set comprehension is a concise way to create sets in Python. It allows you to create a new set by applying an expression to each element of an existing iterable. In Table 16.10, we have examples of set comprehension.

Table 16.10: Examples of set comprehension in Python.
Set Comprehension Description Example Result
{expression for element in iterable} Applies the expression to each element in the iterable. {x**2 for x in {1, 2, 3, 4, 5}} {1, 4, 9, 16, 25}
{expression for element in iterable if condition} Applies the expression to each element that satisfies the condition. {x**2 for x in {1, 2, 3, 4, 5} if x % 2 == 0} {4, 16}

16.4 Dictionaries

Dictionaries are unordered collections of key-value pairs in Python. Each key-value pair in a dictionary is separated by a colon : and enclosed in curly braces {}. Keys are unique within a dictionary, and they can be of any immutable type (such as strings, numbers, or tuples). Values can be of any type, including lists, tuples, dictionaries, or other objects. In Table 29.16, we have examples of dictionaries containing strings, numbers, and a nested dictionary.

Table 16.11: Examples of dictionaries in Python.
Dictionary Example Description
{"name": "Alice", "age": 30} A dictionary with string keys and values.
{1: "apple", 2: "banana"} A dictionary with integer keys and string values.
{("x", "y"): 10, ("a", "b"): 20} A dictionary with tuple keys and integer values.
{"name": "Alice", "address": {"city": "New York", "zip": 10001}} A dictionary with a nested dictionary as a value.
{} An empty dictionary.

16.4.1 What Can a Dictionary Key Be?

A dictionary key in Python can be of any immutable type. Immutable types are those that cannot be changed after creation. Common examples of immutable types include strings, numbers, and tuples. In Table 16.12, we have examples of valid and invalid dictionary keys.

Hashable Types

Dictionary keys in Python must be hashable, which means they must be immutable and have a hash value that does not change. A β€œhash value” is a unique integer that represents the value of an object. It is like a fingerprint that identifies the object. Using hash values allows Python to quickly look up the key in the dictionary without having to search through all the keys.

Table 16.12: Examples of valid and invalid dictionary keys in Python.
Dictionary Key Type Example Valid? Reason
String "name" Yes Strings are immutable.
Integer 42 Yes Integers are immutable.
Float 3.14 Yes Floats are immutable.
Tuple ("x", "y") Yes Tuples are immutable.
List ["apple", "banana"] No Lists are mutable.
Dictionary {"name": "Alice"} No Dictionaries are mutable.
Set {1, 2, 3} No Sets are mutable.
Frozen Set frozenset({1, 2, 3}) Yes Frozen sets are immutable.
Custom Object Point(1, 2) Depends Custom objects can be mutable or immutable. To make them immutable, you need to implement the __hash__ and __eq__ methods. (advanced topic)

To overcome the limitation of mutable types as dictionary keys, you can convert them to immutable types. For example, you can convert a list to a tuple or a set to a frozen set to use them as dictionary keys. Alternatively, you can use a string representation of the mutable type as the key. In Listing 16.19, we have an example of using a tuple as a dictionary key.

Listing 16.19: Using mutable types as dictionary keys by converting them to immutable types.
# Convert a list to a tuple to use it as a dictionary key
data = {}
key1 = tuple(["apple", "banana"])
data[key1] = 10

# Convert a set to a frozen set to use it as a dictionary key
key2 = frozenset({1, 2, 3})
data[key2] = 20

# Convert a dictionary to a string representation to use it as a dictionary key
key3 = str({"name": "Alice"})
data[key3] = 30

print(data)
{('apple', 'banana'): 10, frozenset({1, 2, 3}): 20, "{'name': 'Alice'}": 30}

16.4.2 Accessing Dictionary Elements

You can access the value associated with a key in a dictionary using the key within square brackets []. If the key is not present in the dictionary, it raises a KeyError exception. Alternatively, you can use the get method to access the value associated with the key. The get method returns None if the key is not found, or you can specify a default value to return if the key is not present. In Listing 16.20, we have examples of accessing dictionary elements by key in Python.

Listing 16.20: Accessing dictionary elements by key in Python. The get method returns a default value if the key is not found. You can also use a try-except block to handle KeyError exceptions.
person = {"name": "Alice", "age": 30, "city": "New York"}

print(person["name"])  # Output: Alice
print(person["age"])   # Output: 30
print(person["city"])  # Output: New York
print(person.get("income", 0))  # Output: 0 (default value)

# Using try-except block to handle KeyError
try:
    print(person["address"])
except Exception as e:
    print(f"KeyError: {e}")
Alice
30
New York
0
KeyError: 'address'

16.4.3 Updating Dictionary Elements

You can update the value associated with a key in a dictionary by assigning a new value to the key. If the key is not present in the dictionary, it is added with the new value. In Listing 16.21, we have examples of updating dictionary elements in Python.

Listing 16.21: Updating dictionary elements by key in Python. If the key is not present, it is added with the new value.
person = {"name": "Alice", "age": 30, "city": "New York"}

# Update the value associated with the key
person["age"] = 31
person["city"] = "Los Angeles"

# Add a new key-value pair
person["email"] = "alice@gmail.com"

print(person)
{'name': 'Alice', 'age': 31, 'city': 'Los Angeles', 'email': 'alice@gmail.com'}

16.4.4 Dictionary Methods

Dictionaries provide a variety of methods to work with key-value pairs. In Table 29.17, we have a list of common dictionary methods in Python for a dictionary d = {"name": "Alice", "age": 30, "city": "New York"}.

How to Study This?

To study the dictionary methods, create example dictionaries and apply each method to see how it worksβ€”When in doubt, code it out!.

Table 16.13: Common dictionary methods in Python considering a dictionary d = {"name": "Alice", "age": 30, "city": "New York"}.
Dictionary Method Description Example Result
dict.keys() Returns a view of the keys in the dictionary. d.keys() dict_keys(["name", "age", "city"])
dict.values() Returns a view of the values in the dictionary. d.values() dict_values(["Alice", 30, "New York"])
dict.items() Returns a view of the key-value pairs in the dictionary. d.items() dict_items([("name", "Alice"), ("age", 30), ("city", "New York")])
dict.get(key) Returns the value associated with the key, or None if the key is not found. d.get("name") "Alice"
dict.get(key, default) Returns the value associated with the key, or the default value if the key is not found. d.get("address", "Unknown") "Unknown"
dict.pop(key) Removes the key and returns the value associated with the key. d.pop("age") 30
dict.popitem() Removes and returns an arbitrary key-value pair from the dictionary. d.popitem() ("city", "New York")
dict.update(other) Updates the dictionary with key-value pairs from another dictionary or iterable. d.update({"city": "Los Angeles"}) {"name": "Alice", "city": "Los Angeles"}
dict.clear() Removes all key-value pairs from the dictionary. d.clear() {}
len(dict) Returns the number of key-value pairs in the dictionary. len(d) 3
key in dict Returns True if the key is present in the dictionary, False otherwise. "name" in d True
del dict[key] Removes the key and its associated value from the dictionary. del d["age"] {"name": "Alice", "city": "New York"}

16.4.5 Looping Through a Dictionary

You can iterate over the key-value pairs of a dictionary using a for loop. In each iteration, the loop variable takes the key of the current key-value pair. In Listing 16.22, we have an example of looping through a dictionary.

Listing 16.22: Looping through a dictionary of a person’s information and printing the key and value for each key-value pair.
person = {"name": "Alice", "age": 30, "city": "New York"}
for key in person:
    print(key, person[key])
name Alice
age 30
city New York

In Listing 16.23, we have an example of looping through a dictionary using the items method, which returns a view of the key-value pairs as tuples.

Listing 16.23: Looping through a dictionary of a person’s information using the items method and printing the key and value for each key-value pair.
person = {"name": "Alice", "age": 30, "city": "New York"}
for key, value in person.items():
    print(key, value)
name Alice
age 30
city New York

In Listing 16.24, we have an example of looping through a dictionary using the keys method, which returns a view of the keys in the dictionary.

Listing 16.24: Looping through a dictionary of a person’s information using the keys method and printing the key and value for each key-value pair.
person = {"name": "Alice", "age": 30, "city": "New York"}

for key in person.keys():
    print(key, person[key])
name Alice
age 30
city New York

In Listing 16.25, we have an example of looping through a dictionary using the values method, which returns a view of the values in the dictionary.

Listing 16.25: Looping through a dictionary of a person’s information using the values method and printing the value for each key-value pair.
person = {"name": "Alice", "age": 30, "city": "New York"}

for value in person.values():
    print(value)
Alice
30
New York

16.4.6 Dictionary Comprehension

Dictionary comprehension is a concise way to create dictionaries in Python. It allows you to create a new dictionary by applying an expression to each key-value pair of an existing dictionary. In Table 29.18, we have examples of dictionary comprehension.

Table 16.14: Examples of dictionary comprehension in Python.
Dictionary Comprehension Description Example Result
{key: value for key, value in dictionary.items()} Applies the expression to each key-value pair in the dictionary. {k: v*2 for k, v in {"a": 1, "b": 2}.items()} {"a": 2, "b": 4}
{key: value for key, value in dictionary.items() if condition} Applies the expression to each key-value pair that satisfies the condition. {k: v*2 for k, v in {"a": 1, "b": 2}.items() if v > 1} {"b": 4}
{key: expression for key in keys} Applies the expression to each key in the list of keys. {k: k*2 for k in ["a", "b"]} {"a": "aa", "b": "bb"}
{key: expression for key in keys if condition} Applies the expression to each key that satisfies the condition. {k: k*2 for k in ["a", "b"] if k == "b"} {"b": "bb"}

16.4.7 Merge Dictionaries

You can merge two dictionaries in Python using the update method or dictionary unpacking. In Listing 16.26, we have an example of merging two dictionaries using the update method and in Listing 16.27, we have an example of merging two dictionaries using dictionary unpacking. The difference between the two methods is that the update method modifies the original dictionary, while dictionary unpacking creates a new dictionary with the merged key-value pairs.

Listing 16.26: Merging two dictionaries in Python using the update method.
# Merge two dictionaries using the update method
d1 = {"a": 1, "b": 2}

d2 = {"b": 3, "c": 4}

d1.update(d2)
print(d1)  # Output: {"a": 1, "b": 3, "c": 4}
{'a': 1, 'b': 3, 'c': 4}
Listing 16.27: Merging two dictionaries in Python using dictionary unpacking.
# Merge two dictionaries using dictionary unpacking
d1 = {"a": 1, "b": 2}
d2 = {"b": 3, "c": 4}

merged = {**d1, **d2}
print(merged)  # Output: {"a": 1, "b": 3, "c": 4}
{'a': 1, 'b': 3, 'c': 4}

16.4.8 Default Dictionary

The defaultdict class from the collections module is a subclass of the standard dictionary that provides a default value for missing keys. When you access a key that does not exist in a defaultdict, it creates the key and assigns a default value to it. In Listing 16.28, we have an example of using the defaultdict class to create a dictionary with a default value of 0.

Listing 16.28: Creating a defaultdict with a default value of 0 and accessing a missing key.
from collections import defaultdict

# Create a defaultdict with a default value of 0
d = defaultdict(int)

# Access a missing key

print(d["a"])  # Output: 0
print(d)  # Output: {"a": 0}
0
defaultdict(<class 'int'>, {'a': 0})

The defaultdict class is useful when you want to avoid KeyError exceptions when accessing missing keys in a dictionary. It simplifies the handling of missing keys by automatically assigning a default value when a key is accessed for the first time.

A useful applcation is creating a frequency dictionary to count the occurrences of elements in a list. In Listing 16.29, we have an example of using a defaultdict to create a frequency dictionary.

Listing 16.29: Creating a frequency dictionary using a defaultdict to count the occurrences of elements in a list.
from collections import defaultdict

# Create a frequency dictionary using defaultdict
numbers = [1, 2, 1, 3, 2, 1, 4, 2, 5]
frequency = defaultdict(int)

for num in numbers:
    frequency[num] += 1

print(frequency)  # Output: {1: 3, 2: 3, 3: 1, 4: 1, 5: 1}
defaultdict(<class 'int'>, {1: 3, 2: 3, 3: 1, 4: 1, 5: 1})

The default value can also be a list, set, dictionary, or any other data type. In Listing 16.30, we have an example of using a defaultdict with a default value of an empty list.

Listing 16.30: Creating a defaultdict with a default value of an empty list and using it to represent a graph. The graph dictionary stores the adjacency list of each vertex in the graph.
from collections import defaultdict

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

# Create a defaultdict with a default value of an empty list
graph = defaultdict(list)

# Add edges to the graph
for u, v in edges:
    graph[u].append(v)

print(graph)  # Output: {"A": ["B", "C"], "B": ["C"], "C": ["D"]}
defaultdict(<class 'list'>, {'A': ['B', 'C'], 'B': ['C'], 'C': ['D']})

Also, you can use a defaultdict with a dictionary as the default value to create a nested dictionary. In Listing 16.31, we have an example of using a defaultdict with a default value of an empty dictionary. This is useful for creating a nested dictionary where the keys of the outer dictionary correspond to the keys of the inner dictionary.

Listing 16.31: Creating a defaultdict with a default value of an empty dictionary and using it to represent a weighted graph. The graph dictionary stores the edge weights between vertices in the graph.
from collections import defaultdict

# Create a defaultdict with a default value of an empty dictionary

graph = defaultdict(dict)

edges = [("A", "B", 30.0), ("A", "C", 20.0), ("B", "C", 10.0)]

# Add edges to the graph
for u, v, w in edges:
    graph[u][v] = w

print(graph)  # Output: {"A": {"B": 30.0, "C": 20.0}, "B": {"C": 10.0}}
defaultdict(<class 'dict'>, {'A': {'B': 30.0, 'C': 20.0}, 'B': {'C': 10.0}})

16.4.9 Ordered Dictionary

The OrderedDict class from the collections module is a subclass of the standard dictionary that maintains the order of key-value pairs based on the order of insertion. In regular dictionaries, the order of key-value pairs is not guaranteed, and it may vary between different Python versions. This is because dictionaries are implemented as hash tables, which do not preserve the order of elements: each key is associated with a hash value (an integer that represents the key) that determines its position in the hash table. Therefore, the order cannot be relied upon when iterating over a regular dictionary. In contrast, when you iterate over an OrderedDict, the key-value pairs are returned in the order they were inserted. In Listing 16.32, we have an example of using the OrderedDict class to create an ordered dictionary.

Listing 16.32: Creating an OrderedDict and adding key-value pairs to maintain the order of insertion.
from collections import OrderedDict

# Create an ordered dictionary
od = OrderedDict()

# Add key-value pairs to the ordered dictionary
od["a"] = 1
od["b"] = 2
od["c"] = 3

print(od)  # Output: OrderedDict([('a', 1), ('b', 2), ('c', 3)])
OrderedDict({'a': 1, 'b': 2, 'c': 3})

16.4.10 Reading JSON Data into a Dictionary

JSON (JavaScript Object Notation) is a popular data format for storing and exchanging data. It is human-readable and easy to parse in most programming languages, including Python. You can read JSON data from a file or a string and convert it into a dictionary using the json module.

Consider you have file data.json with the following JSON data:

{
    "name": "Alice",
    "age": 30,
    "city": "New York"
}

You can read the JSON data from the file and convert it into a dictionary using the json.load function. In Listing 16.33, we have an example of reading JSON data from a file and converting it into a dictionary.

Listing 16.33: Reading JSON data from a file and converting it into a dictionary using the json.load function.
import json

# Read JSON data from a file and convert it into a dictionary
with open("data.json") as file:
    data = json.load(file)

print(data)  # Output: {"name": "Alice", "age": 30, "city": "New York"}

You can also convert a JSON string into a dictionary using the json.loads function. In Listing 16.34, we have an example of converting a JSON string into a dictionary.

Listing 16.34: Converting JSON data from a string into a dictionary using the json.loads function.
import json

# JSON data as a string
json_data = '{"name": "Alice", "age": 30, "city": "New York"}'

# Convert JSON data into a dictionary
data = json.loads(json_data)

print(data)  # Output: {"name": "Alice", "age": 30, "city": "New York"}
{'name': 'Alice', 'age': 30, 'city': 'New York'}

16.5 Type Hints

When you define functions involving data structures, it is helpful to use type hints to specify the expected types of the input arguments and return values. Type hints provide additional information about the function’s parameters and return values, making the code more readable and easier to understand. In Table 16.15, we provide examples of using type hints for functions that work with lists, tuples, sets, and dictionaries.

Table 16.15: Examples of using type hints for functions that work with lists, tuples, sets, and dictionaries.
Function Signature Description
def sum_list(lst: List[int]) -> int: Function that takes a list of integers as input and returns the sum of the elements.
def sum_list(lst: List[int | float]) -> int: Function that takes a list of integers or floats as input and returns the sum of the elements.
def get_tuple_length(tpl: tuple[str, int]) -> int: Function that takes a tuple of a string and an integer as input and returns the length of the tuple.
def get_set_size(s: set[str]) -> int: Function that takes a set of strings as input and returns the size of the set.
def get_dict_value(d: dict[str, int], key: str) -> int: Function that takes a dictionary of strings to integers and a key as input and returns the value associated with the key.
def get_dict_keys(d: dict[str, int]) -> List[str]: Function that takes a dictionary of strings to integers as input and returns a list of keys.
def find_reachable_cities(distances: dict[str, dict[str, float]]) -> dict[str, set]: Function that takes a dictionary of city distances and returns a dictionary of reachable cities for each city.

16.6 pprint Module

The pprint module in Python provides a way to pretty-print data structures, making them more readable and easier to understand. It is particularly useful when working with complex data structures like dictionaries and lists.

To use the pprint module, you need to import it into your Python script. The pprint module provides the pprint function, which formats the data structure in a more human-readable way.

In Listing 16.35, we have an example of pretty-printing a dictionary using the pprint module.

Listing 16.35: Pretty-printing a dictionary using the pprint module.
import pprint

data = {
    "name": "John Doe",
    "age": 30,
    "email": "j.doe@gmail.com",
    "phone": "123-456-7890",
    "address": {
        "street": "123 Main St",
        "city": "New York",
        "zip": "10001"
    }
}
print("Normal print:")
print(data)
print("Pretty print:")
pprint.pprint(data)
Normal print:
{'name': 'John Doe', 'age': 30, 'email': 'j.doe@gmail.com', 'phone': '123-456-7890', 'address': {'street': '123 Main St', 'city': 'New York', 'zip': '10001'}}
Pretty print:
{'address': {'city': 'New York', 'street': '123 Main St', 'zip': '10001'},
 'age': 30,
 'email': 'j.doe@gmail.com',
 'name': 'John Doe',
 'phone': '123-456-7890'}
Listing 16.36: Pretty-printing a list of lists using the pprint module.
import pprint

data = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
    [17, 18, 19, 20],
    [21, 22, 23, 24],
    [25, 26, 27, 28],
    [29, 30, 31, 32],
    [33, 34, 35, 36],
    [37, 38, 39, 40],
]

print("Normal print:")
print(data)
print("Pretty print:")
pprint.pprint(data)
Normal print:
[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24], [25, 26, 27, 28], [29, 30, 31, 32], [33, 34, 35, 36], [37, 38, 39, 40]]
Pretty print:
[[1, 2, 3, 4],
 [5, 6, 7, 8],
 [9, 10, 11, 12],
 [13, 14, 15, 16],
 [17, 18, 19, 20],
 [21, 22, 23, 24],
 [25, 26, 27, 28],
 [29, 30, 31, 32],
 [33, 34, 35, 36],
 [37, 38, 39, 40]]

16.7 Deep Copy vs. Shallow Copy

When working with nested data structures like lists of lists or dictionaries of dictionaries, you may encounter situations where you need to create a copy of the data structure. Python provides two types of copying mechanisms: deep copy and shallow copy. In Table 16.16, we compare deep copy and shallow copy based on several criteria.

Table 16.16: Comparison of deep copy and shallow copy based on several criteria.
Criteria Deep Copy Shallow Copy
Copy Behavior Creates a new object and recursively copies all nested objects. Changes to the original data structure do not affect the copied data structure. Creates a new object but does not create copies of nested objects. Instead, it copies references to the nested objects. Changes to the nested objects in the original data structure are reflected in the copied data structure and vice versa.
Memory Usage Higher memory usage due to creating copies of all nested objects. Lower memory usage as it does not create copies of nested objects.
Performance Slower performance due to the recursive copying of nested objects. Faster performance as it does not copy nested objects.
Use Case Use when you need to create an independent copy of the data structure. Use when you want to create a shallow copy of the data structure and do not need to isolate changes between the original and copied data structures.

16.7.1 Shallow Copy

A shallow copy creates a new object but does not create copies of nested objects. Instead, it copies references to the nested objects. As a result, changes made to the nested objects in the original data structure are reflected in the copied data structure and vice versa.

In Listing 16.37, we have an example of creating a shallow copy of a list of lists. Note that modifying the nested list in the original data structure reflects the changes in the copied data structure. This is a side effect of shallow copying, where changes to the nested objects are not isolated between the original and copied data structures.

Listing 16.37: Modifying a nested list in the original data structure reflects the changes in the copied data structure.
(a) Creating a shallow copy of a list of lists using the copy module.
import copy

# Create a list of lists
data_original = [
    [1, 2, 3], # Nested list 1
    [4, 5, 6], # Nested list 2
    [7, 8, 9], # Nested list 3
]

# Create a shallow copy of the list of lists
data_copy = copy.copy(data_original)

# Modify the nested list in the original
# data structure
data_original[0][0] = 100

# Modify the nested list in the copied
# data structure
data_copy[0][1] = 200

# Delete the second list in the original
# data structure
del data_original[1]

# Delete the third list in the copied
# data structure
del data_copy[2]

# Adding a new list to the original data structure
data_original.append(["O", "O", "O"])

# Adding a new list to the copied data structure
data_copy.append(["C", "C", "C"])

print(f"Original data:\n{data_original}")
print(f"Shallow copy:\n{data_copy}")
Original data:
[[100, 200, 3], [7, 8, 9], ['O', 'O', 'O']]
Shallow copy:
[[100, 200, 3], [4, 5, 6], ['C', 'C', 'C']]
(b) Creating a shallow copy of a list of lists using the list constructor.
# Create a list of lists
data_original = [
    [1, 2, 3], # Nested list 1
    [4, 5, 6], # Nested list 2
    [7, 8, 9], # Nested list 3
]

# Create a shallow copy of the list of lists 
# using the list constructor
data_copy = list(data_original)

# Modify the nested list in the original
# data structure
data_original[0][0] = 100

# Modify the nested list in the copied
# data structure
data_copy[0][1] = 200

# Delete the second list in the original
# data structure
del data_original[1]

# Delete the third list in the copied
# data structure
del data_copy[2]

# Adding a new list to the original data structure
data_original.append(["O", "O", "O"])

# Adding a new list to the copied data structure
data_copy.append(["C", "C", "C"])

print(f"Original data:\n{data_original}")
print(f"Shallow copy:\n{data_copy}")
Original data:
[[100, 200, 3], [7, 8, 9], ['O', 'O', 'O']]
Shallow copy:
[[100, 200, 3], [4, 5, 6], ['C', 'C', 'C']]

The behavior of shallow copying is similar for dictionaries (or other mutable objects) with nested dictionaries. Listing 16.38 shows an example of creating a shallow copy of a dictionary of dictionaries. The dictionary stores data about street networks, where the first key represents the starting node id, the second key represents the ending node id, and the values are attributes of the street. A β€œnode” is a point where two or more streets intersect.

Notice that:

  • Changes in edge 1-2 in the original data structure are reflected in the copied data structure, and vice versa.
  • The new edge 3-5 added to the original data structure is also present in the copied data structure.
  • Removing node 2 from the copied data structure does not affect the original data structure. This is because we are only removing the reference to the nested dictionary, not the nested dictionary itself.
Listing 16.38: Creating a shallow copy of a dictionary of dictionaries using the dict constructor (the same behavior is expected if you use the copy module such as copy.copy(data_original)). The data structure represents a street network with attributes for each street segment. For example, the street segment between nodes 1 and 2 has a street name of β€œ123 Main St,” a maximum speed of 60 km/h, and a length of 100 meters.
import pprint

# Create a dictionary of dictionaries
data_original = {
    1: {2: {"street": "123 Main St", "max_speed_km_h": 60, "length_meters": 100}},
    2: {3: {"street": "456 Elm St", "max_speed_km_h": 50, "length_meters": 200}},
    3: {4: {"street": "789 Oak St", "max_speed_km_h": 40, "length_meters": 300}}
}

# Create a shallow copy of the dictionary of dictionaries
data_copy = dict(data_original)

# Modify the nested dictionary in the original
# data structure
data_original[1][2]["max_speed_km_h"] = 70

# Modify the nested dictionary in the copied
# data structure
data_copy[1][2]["length_meters"] = 1000

# Add a new street segment from node 3 to node 5
# to the original data structure
data_original[3][5] = {"street": "ABC Pine St", "max_speed_km_h": 30, "length_meters": 400}

# Remove the street node 2 data from the copied
del data_copy[2]

print("Original data:")
pprint.pprint(data_original)
print("Shallow copy:")
pprint.pprint(data_copy)
Original data:
{1: {2: {'length_meters': 1000, 'max_speed_km_h': 70, 'street': '123 Main St'}},
 2: {3: {'length_meters': 200, 'max_speed_km_h': 50, 'street': '456 Elm St'}},
 3: {4: {'length_meters': 300, 'max_speed_km_h': 40, 'street': '789 Oak St'},
     5: {'length_meters': 400, 'max_speed_km_h': 30, 'street': 'ABC Pine St'}}}
Shallow copy:
{1: {2: {'length_meters': 1000, 'max_speed_km_h': 70, 'street': '123 Main St'}},
 3: {4: {'length_meters': 300, 'max_speed_km_h': 40, 'street': '789 Oak St'},
     5: {'length_meters': 400, 'max_speed_km_h': 30, 'street': 'ABC Pine St'}}}

16.7.2 Deep Copy

A deep copy creates a new object and recursively creates copies of all nested objects. As a result, changes made to the nested objects in the original data structure are not reflected in the copied data structure and vice versa (i.e., no side effects, since the nested objects are independent).

Recursion

β€œRecursive” copy means that the copying process is applied recursively to all nested objects. For example, when creating a deep copy of a list of lists, the copying process is applied to each nested list, creating a new list for each nested list.

In Listing 16.39, we modify Listing 16.38 to use the copy.deepcopy function to create a deep copy of the dictionary of dictionaries. The deepcopy function recursively creates copies of all nested dictionaries, ensuring that changes to the nested dictionaries in the original data structure do not affect the copied data structure.

Notice that:

  • Changes in edge 1-2 in the original data structure are not reflected in the copied data structure, and vice versa.
  • The new edge 3-5 added to the original data structure is not present in the copied data structure.
  • Removing node 2 from the copied data structure does not affect the original data structure.
Listing 16.39: Creating a deep copy of a dictionary of dictionaries using the copy.deepcopy function. Differently from the shallow copy presented in Listing 16.38, the deep copy ensures that changes to the nested dictionaries in the original data structure do not affect the copied data structure and vice versa.
import copy
import pprint

# Create a dictionary of dictionaries
data_original = {
    1: {2: {"street": "123 Main St", "max_speed_km_h": 60, "length_meters": 100}},
    2: {3: {"street": "456 Elm St", "max_speed_km_h": 50, "length_meters": 200}},
    3: {4: {"street": "789 Oak St", "max_speed_km_h": 40, "length_meters": 300}}
}

# Create a shallow copy of the dictionary of dictionaries
data_copy = copy.deepcopy(data_original)

# Modify the nested dictionary in the original
# data structure
data_original[1][2]["max_speed_km_h"] = 70

# Modify the nested dictionary in the copied
# data structure
data_copy[1][2]["length_meters"] = 1000

# Add a new street segment from node 3 to node 5
# to the original data structure
data_original[3][5] = {"street": "ABC Pine St", "max_speed_km_h": 30, "length_meters": 400}

# Remove the street node 2 data from the copied
del data_copy[2]

print("Original data:")
pprint.pprint(data_original)
print("Shallow copy:")
pprint.pprint(data_copy)
Original data:
{1: {2: {'length_meters': 100, 'max_speed_km_h': 70, 'street': '123 Main St'}},
 2: {3: {'length_meters': 200, 'max_speed_km_h': 50, 'street': '456 Elm St'}},
 3: {4: {'length_meters': 300, 'max_speed_km_h': 40, 'street': '789 Oak St'},
     5: {'length_meters': 400, 'max_speed_km_h': 30, 'street': 'ABC Pine St'}}}
Shallow copy:
{1: {2: {'length_meters': 1000, 'max_speed_km_h': 60, 'street': '123 Main St'}},
 3: {4: {'length_meters': 300, 'max_speed_km_h': 40, 'street': '789 Oak St'}}}

16.8 Comparing Data Structures

When comparing data structures such as lists, tuples, dictionaries, or sets in Python, you can use the == operator to check if the data structures are equal (element-wise comparison). However, to determine if two data structures are identical (i.e., they refer to the same object in memory), you have to use the is operator. Conversely, use not is to check if two data structures are different objects in memory. Listing 16.40 shows an example of comparing nested lists using the == and is operators. Notice that the objects data, data_shallow, and data_deep are different objects in memory, as shown by the is operator. However, the nested lists nested1 and nested2 are the same objects in memory for the shallow copy data_shallow, while they are different objects in memory for the deep copy data_deep.

Listing 16.40: Example of comparing nested lists using the == and is operators. The == operator checks if the data structures are equal (element-wise comparison), while the is operator checks if the data structures are identical (i.e., they refer to the same object in memory).
import copy

# Create data structures
nested1 = [1,2]
nested2 = [3,4]
data = [nested1, nested2]

# Shallow copy of data
data_shallow = copy.copy(data)

# Deep copy of data
data_deep = copy.deepcopy(data)

# Compare data structures
print(f"data == data_shallow: {data == data_shallow}")
print(f"data == data_deep: {data == data_deep}")      

# Compare identity
print(f"data is data_shallow: {data is data_shallow}")
print(f"data is data_deep: {data is data_deep}")

# Compare identity of nested lists
print(f"nested1 is data_shallow[0]: {nested1 is data_shallow[0]}")
print(f"nested2 is data_shallow[1]: {nested2 is data_shallow[1]}")
print(f"nested1 is data_deep[0]: {nested1 is data_deep[0]}")
print(f"nested2 is data_deep[1]: {nested2 is data_deep[1]}")
data == data_shallow: True
data == data_deep: True
data is data_shallow: False
data is data_deep: False
nested1 is data_shallow[0]: True
nested2 is data_shallow[1]: True
nested1 is data_deep[0]: False
nested2 is data_deep[1]: False

16.9 Filter, Map, and Reduce (Advanced Topic)

In functional programming, filter, map, and reduce are higher-order functions that operate on lists or other iterable objects. These functions allow you to perform common data processing tasks, such as filtering elements, applying a function to each element, and aggregating elements.

  • filter(function, iterable): The filter function applies a function to each element in the iterable and returns an iterator containing the elements for which the function returns True. In Listing 16.41, we have an example of using the filter function to filter even numbers from a list.
  • map(function, iterable): The map function applies a function to each element in the iterable and returns an iterator containing the results. In Listing 16.42, we have an example of using the map function to square each element in a list.
  • reduce(function, iterable): The reduce function applies a function to the elements of the iterable cumulatively and returns a single value. In Python 3, the reduce function is available in the functools module. In Listing 16.43, we have an example of using the reduce function to calculate the sum of elements in a list.
Listing 16.41: Filtering even numbers from a list using the filter function.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(lambda x: x % 2 == 0, numbers))
print(even_numbers)  # Output: [2, 4, 6, 8, 10]
[2, 4, 6, 8, 10]
Listing 16.42: Applying the function x ** 2 to each element x in the list numbers using the map function.
numbers = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x ** 2, numbers))
print(squared)  # Output: [1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]
Listing 16.43: Calculating the sum of elements in a list using the reduce function.
from functools import reduce

numbers = [1, 2, 3, 4, 5]
sum = reduce(lambda x, y: x + y, numbers)
print(sum)  # Output: 15
15

Notice the use of the lambda keyword in the examples. A lambda function is an anonymous function that can have any number of arguments but can only have one expression. It is commonly used in functional programming to define small, throwaway functions. These functions could be rewritten using a regular function definition, but the lambda syntax is more concise and is often used for short, simple functions. In Listing 16.44, Listing 16.45, and Listing 16.46, we have examples of rewriting the previous examples using regular function definitions.

Listing 16.44: Filtering even numbers from a list using the filter function with a regular function definition.
def is_even(x):
    return x % 2 == 0
    
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_numbers = list(filter(is_even, numbers))
print(even_numbers)  # Output: [2, 4, 6, 8, 10]
[2, 4, 6, 8, 10]
Listing 16.45: Applying the function x ** 2 to each element x in the list numbers using the map function with a regular function definition.
def square(x):
    return x ** 2

numbers = [1, 2, 3, 4, 5]
squared = list(map(square, numbers))
print(squared)  # Output: [1, 4, 9, 16, 25]
[1, 4, 9, 16, 25]
Listing 16.46: Calculating the sum of elements in a list using the reduce function with a regular function definition.
from functools import reduce

def add(x, y):
    return x + y
    
numbers = [1, 2, 3, 4, 5]
sum = reduce(add, numbers)
print(sum)  # Output: 15
15

16.10 Exercises

16.10.1 Euro Cup Game Assignment

The UEFA European Football Championship, commonly known as the Euro Cup, is a football competition held every four years.

In the Euro Cup, teams are distributed across different groups based on their past performance. For example, in Euro Cup 2024, 24 teams are divided into 6 groups of 4 teams each. In Table 16.17, we have a list of teams that need to be randomly assigned matches against other teams in the same group.

Table 16.17: Teams in Group A of the Euro Cup 2024.
Team Name Opponent
Germany
Switzerland
Hungary
Scotland

The rules to assign matches are as follows:

  1. A team cannot play against itself.
  2. Each team must play against a different team in the same group.

Create a function get_opponents that assigns opponents to each team in a group. The function takes a list of team names as input and returns a list of opponents for each team.

def get_opponents(teams):
    pass # Add code here!

def test_get_opponents():
    a, b = get_opponents(["A", "B"])
    assert a != "A" and b != "B" and a in ["A", "B"] and b in ["A", "B"]
    a, b, c = get_opponents(["A", "B", "C"])
    assert a != "A" and b != "B" and c != "C" and a in ["A", "B", "C"] and b in ["A", "B", "C"] and c in ["A", "B", "C"]

16.10.3 Dot Product

Create a function that takes two lists of the same length as input and returns the dot product of the input lists (a single number).

def dot_product(l1, l2):
    pass # Add code here!

def test_dot_product():
    l1 = [1, 2, 3, 4, 5]
    l2 = [5, 4, 2, 3, 1]
    assert dot_product(l1, l2) == 36

test_dot_product()

16.10.4 Find Maximum Element

Create a function that takes a list of numbers as input and returns the maximum element in the list.

def find_max_element(lst):
    pass # Add code here!

def test_find_max_element():
    lst = [1, 2, 3, 4, 5]
    assert find_max_element(lst) == 5
    lst = [5, 4, 3, 2, 1]
    assert find_max_element(lst) == 5
    lst = [1, 2, 3, 2, 4, 2]
    assert find_max_element(lst) == 4
    lst = []
    assert find_max_element(lst) == None
    lst = [1]
    assert find_max_element(lst) == 1
    lst = [-1, -2, -3, -4, -5]
    assert find_max_element(lst) == -1

test_find_max_element()

16.10.5 Search Element

Create a function that takes a list of numbers and a target value as input and returns True if the target value is found in the list, or False otherwise.

def search_element(lst, target):
    pass # Add code here!

def test_search_element():
    lst = [1, 2, 3, 4, 5]
    assert search_element(lst, 3) == True
    assert search_element(lst, 6) == False

test_search_element()

16.10.6 Count Elements

Create a function that takes a list of numbers and a target value as input and returns the count of occurrences of the target value in the list.

def count_elements(lst, target):
    pass # Add code here!

def test_count_elements():
    l1 = [1, 2, 3, 2, 4, 2]
    l2 = [1, 2, 3, 4, 5]
    assert count_elements(l1, 2) == 3
    assert count_elements(l2, 6) == 0

test_count_elements()

16.10.7 Search Array in Another

Create a function that takes two lists of numbers as input and returns True if the first list can be found as a contiguous subsequence within the second list. Otherwise, it returns False.

def search_array_in_another(sub_array, main_array):
    pass # Add code here!

def test_search_array_in_another():
    main_array = [1, 2, 3, 4, 5]
    sub_array1 = [2, 3]
    sub_array2 = [3, 2]
    assert search_array_in_another(sub_array1, main_array) == True
    assert search_array_in_another(sub_array2, main_array) == False

test_search_array_in_another()

In Listing 16.47, we provide a solution using list slicing to check if one array is a subarray of another array. The function search_array_in_another takes two lists as input and iterates over the possible start indices for the subarray in the main array. It then slices the main array to check if the slice matches the subarray.

Listing 16.47
def search_array_in_another(main_array, sub_array):
    length_sub = len(sub_array)
    length_main = len(main_array)
    # Maximum start index for sub_array in main_array
    max_start_index = length_main - length_sub
    # Iterate over possible start indices for sub_array in main_array
    for start_index in range(max_start_index + 1):
        # Calculate the end index for the slice of main_array
        end_index = start_index + length_sub
        # Get the slice of main_array
        slice_main = main_array[start_index:end_index]
        # Check if the slice of main_array matches sub_array
        if slice_main == sub_array:
            return True
    return False


assert search_array_in_another([1, 2, 3, 4, 5], [2, 3]) == True
assert search_array_in_another([1, 2, 3, 4, 5], [3, 2]) == False
assert search_array_in_another([4, 5], [4, 5]) == True

In Listing 16.48, we provide a solution using nested loops to check if one array is a subarray of another array. Each element of the subarray is checked against the corresponding element in the main array. If all elements match, the function returns True.

Listing 16.48
def search_array_in_another(main_array, sub_array):
    sub_length = len(sub_array)
    main_length = len(main_array)
    
    # Maximum start index for sub_array in main_array
    max_start_index = main_length - sub_length
    
    # Iterate over possible start indices for sub_array in main_array
    for start_index in range(max_start_index + 1):
        # Assume sub_array is found
        is_match = True
        
        # Check each element of sub_array against the corresponding element in main_array
        for i in range(sub_length):
            if main_array[start_index + i] != sub_array[i]:
                is_match = False
                break
        
        # If all elements matched, return True
        if is_match:
            return True
    
    # If no match was found, return False
    return False


assert search_array_in_another([1, 2, 3, 4, 5], [2, 3]) == True
assert search_array_in_another([1, 2, 3, 4, 5], [3, 2]) == False
assert search_array_in_another([4, 5], [4, 5]) == True

16.10.9 Matrix Addition

Create a function that takes two 2D arrays (matrices) as input and returns a new matrix containing the element-wise addition of the input matrices. Ensure that the matrices have the same dimensions.

def matrix_addition(matrix1, matrix2):
    pass # Add code here!

def test_matrix_addition():

    assert matrix_addition([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[6, 8], [10, 12]]
    assert matrix_addition([[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]) == [[8, 10, 12], [14, 16, 18]]
    assert matrix_addition([[1]], [[2]]) == [[3]]
    # Incompatible dimensions
    assert matrix_addition([[1, 2], [3, 4]], [[5, 6]]) == None

test_matrix_addition()

16.10.10 Transpose Matrix

Create a function that takes a 2D array (matrix) as input and returns a new matrix that is the transpose of the input matrix. The transpose swaps rows with columns.

def transpose_matrix(matrix):
    pass # Add code here!

def test_transpose_matrix():
    assert transpose_matrix([[1, 2, 3], [4, 5, 6]]) == [[1, 4], [2, 5], [3, 6]]
    assert transpose_matrix([[1, 2], [3, 4], [5, 6]]) == [[1, 3, 5], [2, 4, 6]]
    assert transpose_matrix([[1]]) == [[1]]
    assert transpose_matrix([]) == []

test_transpose_matrix()

16.10.11 Matrix Multiplication

Create a function that takes two 2D arrays (matrices) as input and returns a new matrix that is the result of matrix multiplication. Ensure that the matrices have compatible dimensions for multiplication (e.g., the number of columns in the first matrix must match the number of rows in the second matrix).

def matrix_multiplication(matrix1, matrix2):
    pass # Add code here!

def test_matrix_multiplication():
    assert matrix_multiplication([[1, 2], [3, 4]], [[5, 6], [7, 8]]) == [[19, 22], [43, 50]]
    assert matrix_multiplication([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]) == [[58, 64], [139, 154]]
    assert matrix_multiplication([[1]], [[2]]) == [[2]]
    # Incompatible dimensions
    assert matrix_multiplication([[1, 2], [3, 4]], [[5, 6]]) == None

test_matrix_multiplication()

16.10.12 Sum of Matrix Rows

Create a function that takes a 2D array (matrix) as input and returns a 1D array containing the sum of each row in the matrix.

def sum_matrix_rows(matrix):
    pass # Add code here!

def test_sum_matrix_rows():
    assert sum_matrix_rows([[1, 2, 3], [4, 5, 6]]) == [6, 15]
    assert sum_matrix_rows([[1, 2], [3, 4], [5, 6]]) == [3, 7, 11]
    assert sum_matrix_rows([[1]]) == [1]
    assert sum_matrix_rows([]) == []

test_sum_matrix_rows()

16.10.13 Empty Array of Size

Create a function that initializes a dynamic one-dimensional array with a specified size and type. The array should be initialized with default values based on the type (e.g., 0 for integers, 0.0 for floats, and False for booleans).

def initialize_list(size, type=int):
    pass # Add code here!

def test_initialize_list():
    assert initialize_list(3) == [0, 0, 0]
    assert initialize_list(5, float) == [0.0, 0.0, 0.0, 0.0, 0.0]
    assert initialize_list(4, bool) == [False, False, False, False]

test_initialize_list()

16.10.14 Inplace List Re-dimensioning

Create a function that takes a dynamic one-dimensional array as input and re-dimensions it to a new specified size. Preserve the existing data while changing the size.

def redimension_list(array, new_size):
    pass # Add code here!

def test_redimension_list():
    
    # Original and modified lists should have the same elements
    originalst = [1, 2, 3, 4, 5]
    modified = redimension_list(original, 3)
    # The === operator checks if the two lists are the same object in memory
    assert originalst === modified
    
test_redimension_list()

16.10.15 List Filtering

Create a function that takes a list of numbers and a threshold value as input and returns an array containing the elements from the input list that are greater than or equal to the threshold.

def filter_list(lst, threshold):
    pass # Add code here!

def test_filter_list():
    lst = [1, 2, 3, 4, 5]
    filtered = filter_list(lst, 3)
    assert filtered == [3, 4, 5]

test_filter_list()

16.10.16 List Reversal

Create a function that takes a list of numbers as input and returns a new list containing the same numbers in reverse order.

def reverse_list(lst):
    pass # Add code here!

def test_reverse_list():
    lst = [1, 2, 3, 4, 5]
    reversed_list = reverse_list(lst)
    assert reversed_list == [5, 4, 3, 2, 1]

test_reverse_list()

16.10.17 List Sorting

Create a function that takes a list of numbers as input and returns a new list containing the same numbers sorted in ascending order.

def sort_list(lst):
    pass # Add code here!

def test_sort_list():
    lst = [3, 1, 2, 5, 4]
    sorted_list = sort_list(lst)
    assert sorted_list == [1, 2, 3, 4, 5]

test_sort_list()

16.10.18 List Sum

Create a function that takes two lists of the same length as input and returns a new list containing the element-wise sum of the input lists.

def list_sum(l1, l2):
    pass # Add code here!

def test_list_sum():
    l1 = [1, 2, 3, 4, 5]
    l2 = [5, 4, 3, 2, 1]
    list_sum_result = list_sum(l1, l2)
    assert list_sum_result == [6, 6, 6, 6, 6]

test_list_sum()

16.10.19 Element-Wise Multiplication

Create a function that takes two lists of the same length as input and returns a new list containing the element-wise multiplication of the input lists.

def elementwise_multiply(l1, l2):
    pass # Add code here!

def test_elementwise_multiply():
    l1 = [1, 2, 3]
    l2 = [4, 5, 6]
    elementwise_multiply_result = elementwise_multiply(l1, l2)
    assert elementwise_multiply_result == [4, 10, 18]

test_elementwise_multiply()

16.10.20 Concatenate Lists

Create a function that takes two lists as input and returns a new list that is the result of concatenating the elements of the second list to the end of the first list.

def concatenate_lists(l1, l2):
    pass # Add code here!

def test_concatenate_lists():
    l1 = [1, 2, 3, 4, 5]
    l2 = [6, 7, 8, 9, 10]
    concatenated = concatenate_lists(l1, l2)
    assert len(concatenated) == 10
    assert concatenated[4] == 5
    assert concatenated[5] == 6
    assert concatenated[6] == 7

test_concatenate_lists()

16.10.21 Find Intersection of Lists

Create a function that takes two lists of numbers as input and returns a new list containing the elements that are common to both input lists (the intersection).

def find_list_intersection(l1, l2):
    pass # Add code here!

def test_find_list_intersection():
    l1 = [1, 2, 3, 4, 5]
    l2 = [3, 4, 5, 6, 7]
    intersection = find_list_intersection(l1, l2)
    assert len(intersection) == 3
    assert intersection == [3, 4, 5]

test_find_list_intersection()

16.10.22 Best Student Finder

In Table 16.18, we have the scores of students in different subjects. The table contains the student ID and the scores for Programming, Probability, and Calculus, respectively.

Table 16.18: Matrix of student scores in different subjects.
ID Programming Probability Calculus
s01 85 75 90
s02 70 80 85
s03 75 70 80

Aiming to analyze the performance of students in each subject to select next year’s TAs, develop a function best_student that finds the best grade in a subject and the student who obtained that grade.

For example, in Table 16.18, the best grade in Programming is 85, and the student who obtained that grade is s01. Therefore, the output in B5 should be 85, and the output in B6 should be s01.

def best_student(scores):
    pass # Add code here!

def test_best_student():
    scores = {
        "s01": [85, 75, 90],
        "s02": [70, 80, 85],
        "s03": [75, 70, 80],
    }
    assert best_student(scores, "Programming") == (85, "s01")
    assert best_student(scores, "Probability") == (80, "s02")
    assert best_student(scores, "Calculus") == (90, "s01")