20  Manipulating Files

Files are an essential part of any programming language. Information is often stored in files, and programs need to read from and write to files to interact with data. When you save data in a variable, it is stored in the computer’s memory (RAM), which is temporary. If you want to store data permanently, you need to save it to a file on disk. Therefore, to persist data across different runs of a program, you need to read from and write to files. Python provides built-in functions and modules to work with files, making it easy to handle file operations.

In this chapter, you will learn how to read and write files in Python, manipulate file paths, organize files, and handle errors related to file operations.

20.1 Parts of a File Path

A file path is the location of a file or folder on a computer. It consists of several parts, including the following:

  • Root: The root is the starting point of the file system. On Windows, the root is typically a drive letter followed by a colon and a backslash (e.g., C:\). On macOS and Linux, the root is /.
  • Directories: Directories are folders that contain files or other directories. They are separated by slashes (/ on macOS and Linux, \ on Windows).
  • File Name: The file name is the name of the file, including the extension (e.g., example.txt).
  • Extension: The extension is the part of the file name after the dot (.) that indicates the file type (e.g., .txt, .csv, .py).
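
As a rough illustration of these parts, the sketch below splits a made-up Windows path with pathlib (covered in Section 20.12). PureWindowsPath is used only so that the result is the same on any operating system; the path and the variable names are assumptions for the example.

```python
from pathlib import PureWindowsPath

# Hypothetical path used only for illustration
path = PureWindowsPath(r"C:\Users\UserName\Documents\example.txt")

root = path.anchor               # drive letter plus root separator
directories = path.parts[1:-1]   # folders between the root and the file
file_name = path.name            # file name including the extension
extension = path.suffix          # extension including the dot

print(root)         # C:\
print(directories)  # ('Users', 'UserName', 'Documents')
print(file_name)    # example.txt
print(extension)    # .txt
```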

20.2 Directory Paths

When working with files, you often need to manipulate directory paths. Here are some common terms related to directory paths:

  • Parent Directory: The parent directory of a file or folder is the directory that contains it. For example:
    • The parent directory of file.txt in C:\Users\UserName\Documents\file.txt is Documents.
    • The parent directory of Documents in C:\Users\UserName\Documents is UserName.
    • The parent directory of UserName in C:\Users\UserName is Users.
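  • Current Directory: The current directory (also called the working directory) is the directory a program uses to resolve relative paths. By default, it is the directory from which the program was started. When you open a file by name only, without a directory, Python looks for the file in the current directory.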
  • Home Directory: The home directory is the default directory for a user. It is typically C:\Users\UserName on Windows, /Users/UserName on macOS, and /home/UserName on Linux.
  • Working Directory: The working directory is another name for the current directory. It starts out as the directory from which the program is launched. For example:
    • If you run a Python script from C:\Users\UserName, the working directory is C:\Users\UserName.
    • If you run a Python script from /Users/UserName, the working directory is /Users/UserName.
    • If you run a Python script from a different directory, that directory becomes the working directory.

20.3 Absolute vs. Relative Paths

There are two types of file paths: absolute and relative.

  • Absolute Path: An absolute path specifies the full path to a file or folder from the root directory.
  • Relative Path: A relative path specifies the path to a file or folder relative to the current working directory. In file paths, .. (double dots) and . (single dot) are relative path notations used to navigate directories:
    • ..: Refers to the parent directory of the current directory.
    • .: Refers to the current directory.

For example, consider the following directory structure:

C:\
└── Users
    └── UserName
        └── Documents
            └── folder
                └── file.txt

The absolute paths are:

  • C:\Users\UserName\Documents\folder\file.txt (full path to file.txt).
  • C:\Users\UserName\Documents\folder (full path to folder).
  • C:\Users\UserName\Documents (full path to Documents).
  • C:\Users\UserName (full path to UserName).
  • C:\Users (full path to Users).

Now, let’s assume the current working directory is C:\Users\UserName\Documents (your terminal or command prompt is open in this directory or your Python script is running from this directory). Table 20.1 shows examples of navigating directories using the cd (change directory) command with relative paths.

Table 20.1: Examples of navigating directories using cd with relative paths assuming the starting directory is C:\Users\UserName\Documents.
| Target Directory | Command | Explanation |
|---|---|---|
| C:\Users\UserName | cd .. | Move up one level from Documents to the parent directory UserName. |
| C:\Users | cd ..\.. | Move up two levels to Users. |
| C:\Users\UserName\Documents\folder | cd folder | Move down into the folder subdirectory of Documents. |
| C:\Users\UserName\Documents | cd . | Stay in the current directory (Documents). |
| C:\Users\UserName\Documents\folder\file.txt | cd folder | Move into folder; file.txt can then be accessed by name (cd works on directories, not on files). |
| C:\ | cd ..\..\.. | Move up three levels to the root directory (C:\). |
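
The same resolution of relative paths can be sketched in Python. The example below is only a textual illustration: it uses ntpath (the Windows flavour of os.path, importable on any platform) and a hypothetical helper named resolve; it never touches the file system or changes the working directory.

```python
import ntpath  # Windows flavour of os.path, importable on any platform

cwd = r"C:\Users\UserName\Documents"

def resolve(relative):
    """Join a relative path onto cwd and collapse '.' and '..' parts."""
    return ntpath.normpath(ntpath.join(cwd, relative))

print(resolve(".."))          # C:\Users\UserName
print(resolve(r"..\.."))      # C:\Users
print(resolve("folder"))      # C:\Users\UserName\Documents\folder
print(resolve("."))           # C:\Users\UserName\Documents
print(resolve(r"..\..\.."))   # C:\
```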

20.4 Checking Path Validity

Before working with files, it’s essential to check if the file or directory path is valid. You can use the os.path module in Python to check the validity of a path. The os.path module provides functions to manipulate file paths and check path validity.

Here are some common functions in the os.path module:

  • os.path.exists(path): Check if the path exists.
  • os.path.isfile(path): Check if the path is a file.
  • os.path.isdir(path): Check if the path is a directory.
  • os.path.getsize(path): Get the size of the file in bytes.
  • os.path.basename(path): Get the base name of the file or directory. For example, os.path.basename('C:/Users/UserName/Documents/file.txt') returns file.txt.
  • os.path.dirname(path): Get the directory name of the file or directory. For example, os.path.dirname('C:/Users/UserName/Documents/file.txt') returns C:/Users/UserName/Documents.
  • os.path.split(path): Split the path into the directory name and the base name. For example, os.path.split('C:/Users/UserName/Documents/file.txt') returns ('C:/Users/UserName/Documents', 'file.txt').
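  • os.path.join(path1, path2, ...): Join two or more path components using the separator of the current operating system. For example, os.path.join('C:/Users/UserName', 'Documents') returns C:/Users/UserName\Documents on Windows and C:/Users/UserName/Documents on macOS and Linux.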
  • os.path.splitext(path): Split the file path into the root and the extension. For example, os.path.splitext('file.txt') returns ('file', '.txt').
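
The following sketch exercises several of these functions on a throwaway file. It assumes nothing beyond the standard library; the temporary folder and the file name file.txt are created only for the demonstration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file created only for this demonstration
    path = os.path.join(tmp, "file.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello")

    file_exists = os.path.exists(path)    # True
    is_file = os.path.isfile(path)        # True
    is_dir = os.path.isdir(path)          # False: it is a file
    size = os.path.getsize(path)          # 5 (five ASCII characters)
    base = os.path.basename(path)         # 'file.txt'
    stem, ext = os.path.splitext(base)    # ('file', '.txt')
    parent_is_tmp = os.path.dirname(path) == tmp  # True

print(file_exists, is_file, is_dir, size, base, stem, ext)
```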

File Paths on Windows

On Windows, file paths use backslashes (\) to separate directories (e.g., C:\Users\UserName\Documents\file.txt). You can see this pattern in the terminal, file explorer, and many applications. However, you can also use forward slashes (/) on Windows. Python supports both backslashes and forward slashes in file paths. If you use backslashes in a string literal, you need to escape them with another backslash (e.g., C:\\Users\\UserName\\Documents\\file.txt). Alternatively, you can use raw strings by prefixing the string with r (e.g., r'C:\Users\UserName\Documents\file.txt'). Raw strings treat backslashes as literal characters and do not escape them.
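
A short sketch of the difference between escaped and raw string literals follows. The folder name new_folder is made up; it was chosen because it starts with the letter n, which makes the unescaped \n turn into a newline character.

```python
escaped = "C:\\Users\\UserName\\Documents\\file.txt"   # doubled backslashes
raw = r"C:\Users\UserName\Documents\file.txt"          # raw string literal

print(escaped == raw)   # True: both literals spell the same characters
print(raw)              # C:\Users\UserName\Documents\file.txt

# Without escaping, "\n" inside a path is read as a newline character
risky = "C:\new_folder"
print(len(risky), len(r"C:\new_folder"))  # 12 13
```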

20.5 Creating Folders

You can create new folders in Python using the os.makedirs() function. The os.makedirs() function creates a directory and any necessary parent directories. For example:

  • os.makedirs('C:/Users/UserName/Documents/NewFolder') creates a new folder named NewFolder in the Documents directory.
    • If any parent directories (e.g., Documents, UserName) do not exist, they are created as well.
    • If the folder NewFolder already exists, os.makedirs() will raise a FileExistsError. If you want to ignore this error, you can use the exist_ok=True parameter, for example: os.makedirs('C:/Users/UserName/Documents/NewFolder', exist_ok=True).
  • os.makedirs('NewFolder') uses a relative path, so NewFolder is created inside the current working directory. You can check which directory that is with os.getcwd().
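
The behavior described above can be tried safely inside a temporary folder. In this sketch the folder names reports and 2024 are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical nested folder; 'reports' does not exist yet either
    target = os.path.join(tmp, "reports", "2024")

    os.makedirs(target)                  # creates 'reports' and '2024'
    created = os.path.isdir(target)

    try:
        os.makedirs(target)              # second call: folder already exists
        raised = False
    except FileExistsError:
        raised = True

    os.makedirs(target, exist_ok=True)   # no error this time

print(created, raised)  # True True
```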

20.6 Folder Contents

You can list the contents of a folder using the os.listdir() function. The os.listdir() function returns a list of files and directories in the specified folder. For example, consider the following directory structure:

C:\
└── Users
    └── UserName
        ├── main.py
        ├── data
        ├── src
        └── tests

Suppose main.py contains the following code:

import os

# List the contents of the current working directory
contents = os.listdir()

# Print the contents
print(contents)

In the terminal, this will result in different outputs depending on the current working directory:

Table 20.2: The output of os.listdir() depends on the current working directory from which the Python script is run.
| Current Working Directory | Terminal Command | Output |
|---|---|---|
| C:\Users\UserName | python main.py | ['main.py', 'data', 'src', 'tests'] |
| C:\Users\ | python UserName\main.py | ['UserName'] |
| C:\ | python Users\UserName\main.py | ['Users'] |

20.7 Reading and Writing Files

In Python, you can read and write files using the built-in open() function. The open() function returns a file object that allows you to interact with the file. You can specify the mode in which you want to open the file (e.g., read, write, append) and the encoding to use when reading or writing the file.

20.7.1 Reading Files

You can read the contents of a file using the read() method of a file object. The read() method reads the entire file and returns its contents as a string. For example:

# Open a file for reading
file = open('file.txt', 'r')

# Read the contents of the file
contents = file.read()

# Close the file
file.close()

# Print the contents
print(contents)

The open() function takes two arguments: the file path and the mode ('r' for reading). The read() method reads the entire file, and the close() method closes the file after reading.

20.7.2 Writing to Files

You can write to a file using the write() method of a file object. The write() method writes the specified string to the file. For example:

# Open a file for writing
file = open('file.txt', 'w')

# Write to the file
file.write('Hello, World!')

# Close the file
file.close()

The open() function takes two arguments: the file path and the mode ('w' for writing). The write() method writes the specified string to the file, and the close() method closes the file after writing. Any existing content in the file is overwritten when you write to a file in write mode ('w'). If you want to append to the file without overwriting the existing content, you can use append mode ('a'). For example:

# Open a file for appending
file = open('file.txt', 'a')

# Append to the file
file.write('\nThis is a new line.')

# Close the file
file.close()
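
To see write mode, append mode, and the encoding argument together, here is a minimal sketch that works in a temporary folder. It uses the with statement (explained in Section 20.7.4) so that each file is closed automatically; the file name is an assumption for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as f:
        f.write("first")
    with open(path, "w", encoding="utf-8") as f:
        f.write("second")          # 'w' discards "first"
    with open(path, "a", encoding="utf-8") as f:
        f.write("\nthird")         # 'a' keeps "second" and adds to it

    with open(path, "r", encoding="utf-8") as f:
        contents = f.read()

print(contents)
```

The file ends up with two lines, second and third, because the second open() in write mode replaced the earlier content.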

20.7.3 If a File Does Not Exist

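When you open a file for reading ('r'), Python raises a FileNotFoundError if the file does not exist. Opening a file in write ('w') or append ('a') mode creates the file if it is missing, although a FileNotFoundError is still raised when the folder that should contain it does not exist. You can handle this error using a try-except block. For example: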

try:
    # Open a file for reading
    file = open('file.txt', 'r')

    # Read the contents of the file
    contents = file.read()

    # Close the file
    file.close()

    # Print the contents
    print(contents)

except FileNotFoundError:
    print('File not found.')
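
How the mode affects a missing file can be checked with a small experiment. This is a sketch in a temporary folder; missing.txt is a hypothetical file name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "missing.txt")  # does not exist yet

    try:
        with open(path, "r") as f:
            f.read()
        read_failed = False
    except FileNotFoundError:
        read_failed = True          # read mode needs an existing file

    with open(path, "w") as f:      # write mode creates the file
        f.write("created")
    exists_after_write = os.path.exists(path)

print(read_failed, exists_after_write)  # True True
```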

20.7.4 Using with Statement

When working with files, it’s a good practice to use the with statement to ensure that the file is properly closed after reading or writing. The with statement automatically closes the file when the block of code is exited. For example:

# Open a file for reading using the with statement
with open('file.txt', 'r') as file:
    contents = file.read()

# The file is automatically closed after the block
print(contents)

The with statement ensures that the file is closed even if an exception occurs while reading or writing the file.
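
The following sketch makes that guarantee visible by raising an error on purpose inside the block and then inspecting the closed attribute of the file object. The ValueError is simulated for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    try:
        with open(path, "w") as file:
            file.write("partial")
            raise ValueError("simulated failure")  # hypothetical error
    except ValueError:
        pass

    closed_after_error = file.closed  # True: 'with' closed the file

print(closed_after_error)  # True
```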

20.7.5 Reading Line by Line

You can read a file line by line using the readline() method of a file object. The readline() method reads a single line from the file and returns it as a string. For example:

# Open a file for reading
with open('file.txt', 'r') as file:
    # Read the first line
    line1 = file.readline()

    # Read the second line
    line2 = file.readline()

# Print the lines
print(line1)
print(line2)

20.7.6 Reading All Lines

You can read all lines from a file into a list using the readlines() method of a file object. The readlines() method reads all lines from the file and returns them as a list of strings. Then, you can iterate over the list to process each line. For example:

# Open a file for reading
with open('file.txt', 'r') as file:
    # Read all lines
    lines = file.readlines()

# Print the lines
for line in lines:
    print(line)

But what happens if the file is too large to fit into memory? For example, if a file is several gigabytes in size, reading it all into memory at once may not be feasible. In such cases, you can read the file line by line using a for loop. For example:

# Open a file for reading
with open('file.txt', 'r') as file:
    # Read the file line by line
    for line in file:
        print(line)

This method reads the file line by line, processing each line as it is read. It is more memory-efficient than reading the entire file into memory at once.
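
Each line yielded by the loop still ends with its newline character, which is why print(line) leaves a blank line between lines. A possible way to process lines one at a time while stripping that newline is sketched below; the sample content is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as file:
        file.write("alpha\nbeta\ngamma\n")

    line_count = 0
    longest = ""
    with open(path, "r") as file:
        for line in file:              # one line in memory at a time
            text = line.rstrip("\n")   # drop the trailing newline
            line_count += 1
            if len(text) > len(longest):
                longest = text

print(line_count, longest)  # 3 alpha
```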

20.8 File Sizes

You can get the size of a file in bytes using the os.path.getsize() function. The os.path.getsize() function takes a file path as an argument and returns the size of the file in bytes. For example, in Listing 20.1, the size of the file vogals.txt is printed after each character is added to the file.

Listing 20.1: Creating a file and getting its size in bytes as more characters are added. Each of these ASCII characters occupies one byte.
import os

def get_file_size(file_path):
    """Return the size of the file in bytes."""
    return os.path.getsize(file_path)

for letter in 'aeiou':

    # Append the letter to the file
    with open('vogals.txt', 'a') as file:
        file.write(letter)

    print(f"## Added letter: {letter}")

    # Show the content of the file
    with open('vogals.txt', 'r') as file:
        print(f"   - Content of file: {file.read()}")

    # Print the size of the file
    print(f"   - Size of file: {get_file_size('vogals.txt')} bytes")

The result of running the code in Listing 20.1 is as follows:

## Added letter: a
   - Content of file: a
   - Size of file: 1 bytes
## Added letter: e
   - Content of file: ae
   - Size of file: 2 bytes
## Added letter: i
   - Content of file: aei
   - Size of file: 3 bytes
## Added letter: o
   - Content of file: aeio
   - Size of file: 4 bytes
## Added letter: u
   - Content of file: aeiou
   - Size of file: 5 bytes
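
The one-byte-per-character pattern holds for plain ASCII letters. As a hedged illustration of where it stops holding, the sketch below writes an accented letter using UTF-8, where that letter needs two bytes; the file name is hypothetical.

```python
import os
import tempfile

text = "a\u00e9"  # the letter 'a' followed by 'e' with an acute accent

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "accents.txt")  # hypothetical file name

    with open(path, "w", encoding="utf-8") as file:
        file.write(text)   # 1 byte for 'a', 2 bytes for the accented letter

    size = os.path.getsize(path)

print(len(text), size)  # 2 3
```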

20.9 Copying Files and Folders

You can copy files and folders in Python using the shutil module. The shutil module provides functions to copy, move, and delete files and directories.

| Function | Description | Example |
|---|---|---|
| shutil.copy(src, dst) | Copy a file from src to dst. | shutil.copy('file.txt', 'copy.txt') |
| shutil.copytree(src, dst) | Copy a directory from src to dst. All files and subdirectories are copied recursively. | shutil.copytree('folder', 'copy') |

Recursively copying a directory copies all files and subdirectories within the directory. For example, consider the following directory structure:

C:\
└── Users
    └── UserName
        └── Documents
            ├── file.txt
            └── folder
                └── file2.txt

If you run the following code from the C:\Users\UserName directory:

import shutil

# Copy a directory
shutil.copytree('Documents', 'Documents2')

The resulting directory structure will be:

C:\
└── Users
    └── UserName
        ├── Documents
        │   ├── file.txt
        │   └── folder
        │       └── file2.txt
        └── Documents2
            ├── file.txt
            └── folder
                └── file2.txt
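
To experiment without touching real folders, the copy can be reproduced inside a temporary directory. The sketch below also shows that, by default, shutil.copytree() raises FileExistsError when the destination already exists (passing dirs_exist_ok=True, available since Python 3.8, relaxes this).

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a small tree: Documents/file.txt and Documents/folder/file2.txt
    docs = os.path.join(tmp, "Documents")
    os.makedirs(os.path.join(docs, "folder"))
    for name in ("file.txt", os.path.join("folder", "file2.txt")):
        with open(os.path.join(docs, name), "w") as f:
            f.write("data")

    docs2 = os.path.join(tmp, "Documents2")
    shutil.copytree(docs, docs2)

    nested_copied = os.path.isfile(os.path.join(docs2, "folder", "file2.txt"))

    try:
        shutil.copytree(docs, docs2)     # destination now exists
        second_copy_failed = False
    except FileExistsError:
        second_copy_failed = True

print(nested_copied, second_copy_failed)  # True True
```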

20.10 Renaming and Moving Files and Folders

To rename or move files and folders in Python, you can use os.rename() or shutil.move(). Both take a source (src) and a destination (dst); the destination can have a different name, a different path, or both. For example:

  • os.rename('file.txt', 'new_file.txt') renames the file file.txt to new_file.txt.
  • os.rename('folder', 'new_folder') renames the directory folder to new_folder.
  • os.rename('file.txt', 'new_folder/file.txt') moves the file file.txt to the new_folder directory with the same name. The folder new_folder must exist before moving the file.
  • shutil.move('file.txt', 'new_folder/new_file.txt') moves the file file.txt into new_folder under the new name new_file.txt. The folder new_folder must already exist; shutil.move() does not create it when moving a file and raises FileNotFoundError if it is missing.
  • shutil.move('folder', 'new_folder') renames the directory folder to new_folder if new_folder does not exist. If new_folder already exists as a directory, folder is moved inside it instead (becoming new_folder/folder).
  • shutil.move('folder', 'new_folder/folder') moves the directory folder into new_folder, keeping its name. Create new_folder beforehand (for example, with os.makedirs()) rather than relying on shutil.move() to create it.
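
A minimal sketch of renaming and then moving a file is shown below. It runs in a temporary folder and creates the destination folder explicitly before moving; all file and folder names are placeholders.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "file.txt")
    with open(src, "w") as f:
        f.write("data")

    # Rename in place
    renamed = os.path.join(tmp, "new_file.txt")
    os.rename(src, renamed)

    # Move into a folder that is created first
    new_folder = os.path.join(tmp, "new_folder")
    os.makedirs(new_folder, exist_ok=True)
    moved = shutil.move(renamed, os.path.join(new_folder, "file.txt"))

    old_gone = not os.path.exists(renamed)
    new_present = os.path.isfile(moved)

print(old_gone, new_present)  # True True
```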

20.11 Removing Files and Folders

You can permanently delete files and folders in Python using:

  • os.remove(file_path): Delete a file.
  • os.unlink(file_path): Delete a file (same as os.remove()).
  • os.rmdir(directory_path): Delete an empty directory.
  • shutil.rmtree(directory_path): Delete a directory and all its contents.

Test Before Deleting

Before deleting files or folders, make sure to test your code thoroughly to avoid accidentally deleting important data. Once a file or folder is deleted, it cannot be recovered easily. Some tips to avoid accidental deletions include:

  • Use version control systems like Git to track changes and revert to previous versions if needed.

  • Make backups of important files and folders before performing any deletion operations.

  • Double-check the file or folder path before deleting it. You can comment out the deletion code and run the script to see the files or folders that will be deleted. For example:

    for file in os.listdir():
        if os.path.isfile(file) and file.endswith('.txt'):
            print(f"Deleting file: {file}")
            # os.remove(file)
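
Instead of commenting the deletion in and out, one option is a "dry run" switch. The function below is a hypothetical helper, not part of the standard library: with dry_run=True it only reports which files it would delete.

```python
import os
import tempfile

def delete_txt_files(folder, dry_run=True):
    """Delete .txt files in folder; with dry_run=True only report them."""
    affected = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.endswith(".txt"):
            affected.append(name)
            if not dry_run:
                os.remove(path)
    return affected

with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "keep.csv"):
        with open(os.path.join(tmp, name), "w") as f:
            f.write("x")

    preview = delete_txt_files(tmp)                  # nothing is deleted
    after_preview = sorted(os.listdir(tmp))
    delete_txt_files(tmp, dry_run=False)             # now delete for real
    after_delete = sorted(os.listdir(tmp))

print(preview)        # ['a.txt', 'b.txt']
print(after_delete)   # ['keep.csv']
```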

20.12 Using pathlib for File Operations

The pathlib module provides an object-oriented interface for file path manipulation. It simplifies working with file paths by providing classes and methods to handle paths in a platform-independent way.

20.12.1 Creating File Paths

In pathlib, file paths are represented as Path objects. You can create a Path object by passing the file path as a string to the Path constructor. Listing 20.2 shows how to create a file path object using pathlib.

Listing 20.2: Creating a file path object in pathlib. The file path Documents/folder/file.txt is created using the constructor Path and the / operator.
(a) Example of creating a file path object using constructor Path. Each part of the path is passed as a separate argument.
from pathlib import Path

# Create a file path object
file_path = Path('Documents', 'folder', 'file.txt')

# Print the file path
print(file_path)
(b) Example of creating a file path object using operator /. Each part of the path is joined using /.
from pathlib import Path

# Create a file path object
file = Path('Documents') / 'folder' / 'file.txt'

# Print the file path
print(file)
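
Both forms build the same path. The sketch below checks this and, as an aside, uses the PureWindowsPath and PurePosixPath classes from pathlib to show how the same parts are printed on each family of operating systems.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

a = Path("Documents", "folder", "file.txt")
b = Path("Documents") / "folder" / "file.txt"
same = a == b                        # both spell the same path

# The pure classes show how the same parts print on each platform
windows_style = str(PureWindowsPath("Documents", "folder", "file.txt"))
posix_style = str(PurePosixPath("Documents", "folder", "file.txt"))

print(same)            # True
print(windows_style)   # Documents\folder\file.txt
print(posix_style)     # Documents/folder/file.txt
```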

20.12.2 Parts of a File Path

In pathlib, you can access different parts of a file path using attributes and methods. For example, the file path C:\Users\UserName\Documents\file.txt has the following parts:

  • C: Drive letter.
  • C:\: Anchor (root directory).
  • \Users\UserName\Documents: Parents (directories leading to the file).
  • file.txt: Name (file name).
    • file: Stem (file name without extension).
    • .txt: Suffix (file extension).

Let p = Path(r"C:\Users\UserName\Documents\file.txt"). Table 20.3 shows some examples of accessing parts of a file path using pathlib.

Table 20.3: Examples of accessing parts of a file path using pathlib.
| Attribute/Method | Description | Result |
|---|---|---|
| p.name | Get the file name. | 'file.txt' |
| p.stem | Get the file name without the extension. | 'file' |
| p.suffix | Get the file extension. | '.txt' |
| p.anchor | Get the root directory. | 'C:\\' |
| p.parent | Get the parent directory. | WindowsPath('C:/Users/UserName/Documents') |
| p.parents | Get the parent directories. | WindowsPath('C:/Users/UserName/Documents'), WindowsPath('C:/Users/UserName'), WindowsPath('C:/Users'), WindowsPath('C:/') |
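
These attributes can be explored on any operating system by using PureWindowsPath, which handles Windows-style paths without accessing the file system. This is a sketch under that assumption; on Windows itself, Path behaves the same way for these attributes.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"C:\Users\UserName\Documents\file.txt")

print(p.drive)    # C:
print(p.anchor)   # C:\
print(p.name)     # file.txt
print(p.stem)     # file
print(p.suffix)   # .txt
print(p.parent)   # C:\Users\UserName\Documents

# p.parents is a sequence; turn each ancestor into a string to list them
ancestors = [str(parent) for parent in p.parents]
print(ancestors)
```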

20.12.3 File Handling Methods

In pathlib, file handling methods are available to read, write, and manipulate files. Consider the directory structure:

C:\
└── Users
    └── UserName
        ├── Documents
        │   └── file.txt
        └── folder
            └── file2.txt

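Table 20.4 shows some examples of how to check whether a path exists, test its type, and obtain absolute paths using pathlib. The results assume the code runs on Windows with C:\Users\UserName as the current working directory.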

Table 20.4: Examples of file handling methods using pathlib.
Command Description Result
| Command | Description | Result |
|---|---|---|
| Path('Documents', 'file.txt').exists() | Check if the file exists. | True |
| Path('Documents', 'file.txt').is_file() | Check if it is a file. | True |
| Path('Documents', 'file.txt').is_dir() | Check if it is a directory. | False |
| Path('Documents', 'file.txt').absolute() | Get the absolute path. | WindowsPath('C:/Users/UserName/Documents/file.txt') |
| Path('../OtherUser').resolve() | Get the absolute path of a relative path. | WindowsPath('C:/Users/OtherUser') |
| Path('/') | Create a root path. | WindowsPath('/') |
| Path('/').absolute() | Get the absolute path of the root. | WindowsPath('C:/') |
| Path.cwd() | Get the current working directory. | WindowsPath('C:/Users/UserName') |
| Path.home() | Get the home directory. This directory is platform-dependent. | WindowsPath('C:/Users/UserName') |
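
Path objects also offer methods for reading, writing, renaming, and deleting. The sketch below uses mkdir(), write_text(), read_text(), rename(), iterdir(), unlink(), and rmdir() inside a temporary folder; the folder and file names are placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Documents"
    folder.mkdir()                                   # create the folder

    file_path = folder / "file.txt"
    file_path.write_text("Hello, World!", encoding="utf-8")
    contents = file_path.read_text(encoding="utf-8")

    new_path = folder / "renamed.txt"
    file_path.rename(new_path)                       # rename the file
    names = [child.name for child in folder.iterdir()]

    new_path.unlink()                                # delete the file
    folder.rmdir()                                   # delete the empty folder
    folder_gone = not folder.exists()

print(contents, names, folder_gone)
```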

20.13 Walking Directories

You can walk through directories and subdirectories using the os.walk() function. The os.walk() function generates the file names in a directory tree by walking either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple containing the directory path, the subdirectories in the directory, and the files in the directory.

For example, consider the following directory structure:

C:\
└── Users
    └── UserName
        ├── Documents
        │   └── file.txt
        └── folder
            └── file2.txt

You can walk through the directories and subdirectories using the os.walk() function. For example:

import os

# Walk through the directory tree
for root, dirs, files in os.walk('C:/Users/UserName'):
    print(f"Directory: {root}")
    print(f"Subdirectories: {dirs}")
    print(f"Files: {files}")
    print()

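On Windows, the os.walk() function generates the following output. The subdirectory paths contain a backslash because os.walk() joins each subdirectory name to the starting path using the operating system's separator:

Directory: C:/Users/UserName
Subdirectories: ['Documents', 'folder']
Files: []

Directory: C:/Users/UserName\Documents
Subdirectories: []
Files: ['file.txt']

Directory: C:/Users/UserName\folder
Subdirectories: []
Files: ['file2.txt']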
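
A common use of os.walk() is collecting every file of a given type in a tree. The sketch below recreates the example tree in a temporary folder and gathers the .txt files, stored relative to the starting folder; it is one possible approach, not the only one.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Recreate the example tree inside a temporary folder
    for folder, name in (("Documents", "file.txt"), ("folder", "file2.txt")):
        os.makedirs(os.path.join(tmp, folder))
        with open(os.path.join(tmp, folder, name), "w") as f:
            f.write("data")

    txt_files = []
    for root, dirs, files in os.walk(tmp):
        for name in files:
            if name.endswith(".txt"):
                full_path = os.path.join(root, name)
                # Store the path relative to the starting folder
                txt_files.append(os.path.relpath(full_path, tmp))

txt_files.sort()
print(txt_files)
```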

20.14 Working with JSON Files

The json module in Python allows you to work with JSON (JavaScript Object Notation) files. JSON is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate.

You can read JSON data from a file using the json.load() function. The json.load() function reads the JSON data from the file and returns it as a Python dictionary. For example, consider the following JSON data in a file named config.json where you have stored configuration settings of a simulation:

{
    "name": "My Simulation",
    "steps": 1000,
    "interval": 0.1,
    "parameters": {
        "temperature": 25,
        "pressure": 1.0
    }
}

You can read this JSON data from the file using the following Python code:

import json

# Open a JSON file for reading
with open('config.json', 'r') as file:
    # Load the JSON data
    config = json.load(file)

# Print the data
print(config)

The json.load() function reads the JSON data from the file and returns it as a Python dictionary. You can then access the data in the dictionary as needed.
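
As a small illustration of accessing the loaded data, the sketch below first writes the same configuration text to a temporary file and then reads values from the resulting dictionary. The total_time calculation is a made-up example of using the loaded settings.

```python
import json
import os
import tempfile

config_text = (
    '{"name": "My Simulation", "steps": 1000, "interval": 0.1, '
    '"parameters": {"temperature": 25, "pressure": 1.0}}'
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.json")
    with open(path, "w") as file:
        file.write(config_text)         # create the file for the example

    with open(path, "r") as file:
        config = json.load(file)

steps = config["steps"]                             # top-level value
temperature = config["parameters"]["temperature"]   # nested value
total_time = config["steps"] * config["interval"]   # hypothetical derived value

print(steps, temperature, total_time)
```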

You can also write JSON data to a file using the json.dump() function. The json.dump() function converts a Python object to JSON text and writes it to the file. For example, consider the following list of dictionaries that you want to write to a JSON file:

data = [
    {"solution_id": 1, "runtime": 10.5, "method": "Heuristic", "score": 70},
    {"solution_id": 2, "runtime": 15.2, "method": "Exact", "score": 100},
    {"solution_id": 3, "runtime": 12.8, "method": "Approximate", "score": 90}
]

You can write this data to a JSON file named results.json using the following Python code:

import json

# Open a JSON file for writing
with open('results.json', 'w') as file:
    # Write the JSON data
    json.dump(data, file, indent=4)

The json.dump() function writes the JSON data to the file in a human-readable format with an indentation of 4 spaces. You can adjust the indentation level as needed.

20.14.1 Working with JSON Data as Strings

The loads() and dumps() functions in the json module are used to work with JSON data as strings. The loads() function parses a JSON string and returns a Python object, while the dumps() function converts a Python object to a JSON string. For example:

import json

# JSON string
json_string = '{"solution_id": 1, "runtime": 10.5, "method": "Heuristic"}'
print(json_string)

# Parse the JSON string
data = json.loads(json_string)
print(data)

# Convert the Python object to a JSON string
json_data = json.dumps(data)
print(json_data)
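
JSON values are mapped to their closest Python types when parsed. The sketch below shows a few of these conversions; the keys and values are invented for the example.

```python
import json

text = '{"ok": true, "missing": null, "values": [1, 2.5, "three"]}'
parsed = json.loads(text)

print(parsed["ok"])        # True  (JSON true  -> Python True)
print(parsed["missing"])   # None  (JSON null  -> Python None)
print(parsed["values"])    # [1, 2.5, 'three']  (JSON array -> list)

# Tuples are written as JSON arrays, so they come back as lists
round_trip = json.loads(json.dumps({"point": (1, 2)}))
print(round_trip)          # {'point': [1, 2]}
```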

20.15 CSV, Excel, and Other File Formats

In addition to JSON files, Python supports reading and writing data in various file formats, such as CSV (Comma-Separated Values), Excel, and text files. Search for the appropriate Python libraries to work with these file formats, such as:

  • csv or pandas for reading and writing CSV files. Useful for tabular data.
  • openpyxl or pandas for reading and writing Excel files. Useful for spreadsheets.
  • sqlite3 for working with SQLite databases. Useful for storing structured data and querying it using SQL.
  • yaml (provided by the third-party PyYAML package) for reading and writing YAML (YAML Ain't Markup Language) files. Useful for configuration files.
  • pickle for serializing and deserializing Python objects (not human-readable, but efficient for storing data). Serializing is the process of converting a Python object into a byte stream, and deserializing is the process of converting the byte stream back into a Python object.
  • pyarrow for reading and writing Apache Parquet files. Parquet is a columnar storage file format that is efficient for analytics and data processing used in big data systems.
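
As one example from this list, the csv module from the standard library can write and read tabular data. The sketch below is a minimal round trip using csv.DictWriter and csv.DictReader; the file name and the rows are assumptions for the example.

```python
import csv
import os
import tempfile

rows = [
    {"solution_id": 1, "method": "Heuristic", "score": 70},
    {"solution_id": 2, "method": "Exact", "score": 100},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")  # hypothetical file name

    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as file:
        writer = csv.DictWriter(file, fieldnames=["solution_id", "method", "score"])
        writer.writeheader()
        writer.writerows(rows)

    with open(path, "r", newline="", encoding="utf-8") as file:
        loaded = list(csv.DictReader(file))

print(loaded[0])
# Values come back as strings, so convert them where numbers are needed
scores = [int(row["score"]) for row in loaded]
print(scores)  # [70, 100]
```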

20.16 References