21 Manipulating Files
Files are an essential part of any programming language. Information is often stored in files, and programs need to read from and write to files to interact with data. When you save data in a variable, it is stored in the computer's memory (RAM), which is temporary. If you want to store data permanently, you need to save it to a file on disk. Therefore, to persist data across different runs of a program, you need to read from and write to files. Python provides built-in functions and modules to work with files, making it easy to handle file operations.
In this chapter, you will learn how to read and write files in Python, manipulate file paths, organize files, and handle errors related to file operations.
21.1 Parts of a File Path
A file path is the location of a file or folder on a computer. It consists of several parts, including the following:
- Root: The root is the starting point of the file system. On Windows, the root is typically a drive letter followed by a colon (e.g., C:). On macOS and Linux, the root is /.
- Directories: Directories are folders that contain files or other directories. They are separated by slashes (/ on macOS and Linux, \ on Windows).
- File Name: The file name is the name of the file, including the extension (e.g., example.txt).
- Extension: The extension is the part of the file name after the dot (.) that indicates the file type (e.g., .txt, .csv, .py).
21.2 Directory Paths
When working with files, you often need to manipulate directory paths. Here are some common terms related to directory paths:
- Parent Directory: The parent directory of a file or folder is the directory that contains it. For example:
  - The parent directory of file.txt in C:\Users\UserName\Documents\file.txt is Documents.
  - The parent directory of Documents in C:\Users\UserName\Documents is UserName.
  - The parent directory of UserName in C:\Users\UserName is Users.
- Current Directory: The current directory (also called the current working directory) is the directory in which the program is running. When you open a file without specifying a path, Python looks for the file in the current directory.
- Home Directory: The home directory is the default directory for a user. On Windows, it is typically C:\Users\UserName; on macOS, /Users/UserName; and on Linux, /home/UserName.
- Working Directory: The working directory is the directory from which the program is running. For example:
  - If you run a Python script from C:\Users\UserName, the working directory is C:\Users\UserName.
  - If you run a Python script from /Users/UserName, the working directory is /Users/UserName.
  - If you run a Python script from a different directory, that directory becomes the working directory.
21.3 Absolute vs. Relative Paths
There are two types of file paths: absolute and relative.
- Absolute Path: An absolute path specifies the full path to a file or folder from the root directory.
- Relative Path: A relative path specifies the path to a file or folder relative to the current working directory. In file paths, .. (double dots) and . (single dot) are relative path notations used to navigate directories:
  - ..: Refers to the parent directory of the current directory.
  - .: Refers to the current directory.
For example, consider a directory structure in which file.txt is stored in folder, which is inside Documents, which is inside UserName, which is inside Users on drive C:.
The absolute paths are:
- C:\Users\UserName\Documents\folder\file.txt (full path to file.txt).
- C:\Users\UserName\Documents\folder (full path to folder).
- C:\Users\UserName\Documents (full path to Documents).
- C:\Users\UserName (full path to UserName).
- C:\Users (full path to Users).
Now, let's assume the current working directory is C:\Users\UserName\Documents (your terminal or command prompt is open in this directory or your Python script is running from this directory). Table 21.1 shows examples of navigating directories using the cd (change directory) command with relative paths.
Table 21.1: Using cd with relative paths, assuming the starting directory is C:\Users\UserName\Documents.

| Target Directory | Command | Explanation |
|---|---|---|
| C:\Users\UserName | cd .. | Move up one level from Documents to the parent directory UserName. |
| C:\Users | cd ..\.. | Move up two levels to Users. |
| C:\Users\UserName\Documents\folder | cd folder | Move down into the folder subdirectory of Documents. |
| C:\Users\UserName\Documents | cd . | Stay in the current directory (Documents). |
| C:\Users\UserName\Documents\folder | cd folder | Move down to folder from Documents. |
| C:\Users\UserName\Documents\folder\file.txt | cd folder | Navigate to folder and access file.txt within it. |
| C:\ | cd ..\..\.. | Move up three levels to the root directory (C:\). |
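The navigation in Table 21.1 can also be checked from Python. The following is a minimal sketch that uses the standard-library ntpath module (the Windows flavor of os.path, importable on any operating system) to resolve each relative path against the assumed starting directory. The variable names are illustrative only.

```python
import ntpath  # Windows implementation of os.path; usable on any platform

start = r'C:\Users\UserName\Documents'  # assumed starting directory

# Join each relative path to the starting directory and normalize the result
for relative in ['..', r'..\..', 'folder', '.', r'..\..\..']:
    target = ntpath.normpath(ntpath.join(start, relative))
    print(f'{relative:10} -> {target}')

up_one = ntpath.normpath(ntpath.join(start, '..'))
root = ntpath.normpath(ntpath.join(start, r'..\..\..'))
```

Note that normpath only rewrites the string; it does not check whether the resulting directory exists.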
21.4 Checking Path Validity
Before working with files, it's essential to check if the file or directory path is valid. You can use the os.path module in Python to check the validity of a path. The os.path module provides functions to manipulate file paths and check path validity.
Here are some common functions in the os.path module:
- os.path.exists(path): Check if the path exists.
- os.path.isfile(path): Check if the path is a file.
- os.path.isdir(path): Check if the path is a directory.
- os.path.getsize(path): Get the size of the file in bytes.
- os.path.basename(path): Get the base name of the file or directory. For example, os.path.basename('C:/Users/UserName/Documents/file.txt') returns file.txt.
- os.path.dirname(path): Get the directory name of the file or directory. For example, os.path.dirname('C:/Users/UserName/Documents/file.txt') returns C:/Users/UserName/Documents.
- os.path.split(path): Split the path into the directory name and the base name. For example, os.path.split('C:/Users/UserName/Documents/file.txt') returns ('C:/Users/UserName/Documents', 'file.txt').
- os.path.join(path1, path2): Join two (or more) path components using the separator of the current operating system. For example, os.path.join('C:/Users/UserName', 'Documents') returns C:/Users/UserName/Documents on macOS and Linux; on Windows, the inserted separator is a backslash.
- os.path.splitext(path): Split the file path into the root and the extension. For example, os.path.splitext('file.txt') returns ('file', '.txt').
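A minimal sketch of these functions in use. The string-manipulation functions are applied to the example path from the list, while the functions that query the file system are applied to a temporary file created only for this illustration (the tempfile usage and variable names are assumptions, not part of the original example).

```python
import os
import tempfile

path = 'C:/Users/UserName/Documents/file.txt'  # example path from the text

# These functions only manipulate the string; the path need not exist
base = os.path.basename(path)       # 'file.txt'
parent = os.path.dirname(path)      # 'C:/Users/UserName/Documents'
head, tail = os.path.split(path)    # the same two pieces as a tuple
stem, ext = os.path.splitext(base)  # ('file', '.txt')

# These functions query the file system, so a real file is created first
with tempfile.TemporaryDirectory() as tmp_dir:
    real_file = os.path.join(tmp_dir, 'file.txt')
    with open(real_file, 'w') as f:
        f.write('hello')
    exists = os.path.exists(real_file)   # True
    is_file = os.path.isfile(real_file)  # True
    is_dir = os.path.isdir(real_file)    # False
    size = os.path.getsize(real_file)    # 5 (bytes)

print(base, parent, stem, ext, exists, is_file, is_dir, size)
```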
On Windows, file paths use backslashes (\) to separate directories (e.g., C:\Users\UserName\Documents\file.txt). You can see this pattern in the terminal, file explorer, and many applications. However, you can also use forward slashes (/) on Windows. Python supports both backslashes and forward slashes in file paths. If you use backslashes in a string literal, you need to escape them with another backslash (e.g., C:\\Users\\UserName\\Documents\\file.txt). Alternatively, you can use raw strings by prefixing the string with r (e.g., r'C:\Users\UserName\Documents\file.txt'). Raw strings treat backslashes as literal characters and do not escape them.
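The difference between escaped and raw strings can be sketched as follows. The path C:\temp is a hypothetical example chosen because \t is a valid escape sequence (a tab character).

```python
escaped = 'C:\\Users\\UserName\\Documents\\file.txt'  # backslashes escaped
raw = r'C:\Users\UserName\Documents\file.txt'         # raw string literal

print(escaped == raw)  # both literals produce the same string
print(raw)

# Without escaping, '\t' is read as a tab character, not backslash + t
broken = 'C:\temp'
print(len(broken), len(r'C:\temp'))
```

The unescaped literal is one character shorter because the two characters \t collapse into a single tab.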
21.5 Creating Folders
You can create new folders in Python using the os.makedirs() function. The os.makedirs() function creates a directory and any necessary parent directories. For example:
- os.makedirs('C:/Users/UserName/Documents/NewFolder') creates a new folder named NewFolder in the Documents directory.
  - If any parent directories (e.g., Documents, UserName) do not exist, they are created as well.
  - If the folder NewFolder already exists, os.makedirs() will raise a FileExistsError. If you want to ignore this error, you can use the exist_ok=True parameter, for example: os.makedirs('C:/Users/UserName/Documents/NewFolder', exist_ok=True).
  - If the path is relative, the folder is created in the current working directory (the directory from which the Python script is run). You can use os.getcwd() to get the current working directory.
- os.makedirs('NewFolder') creates a new folder named NewFolder in the current working directory. The current working directory is the directory from which the Python script is run (you can get the current working directory using os.getcwd()).
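The behavior described above can be sketched as follows. To avoid touching real folders, the example works inside a temporary directory; that choice and the variable names are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    new_folder = os.path.join(base_dir, 'Documents', 'NewFolder')

    os.makedirs(new_folder)  # also creates the missing 'Documents' parent
    created = os.path.isdir(new_folder)

    try:
        os.makedirs(new_folder)  # the folder already exists
    except FileExistsError:
        raised = True
    else:
        raised = False

    os.makedirs(new_folder, exist_ok=True)  # no error this time

print(created, raised)
print(os.getcwd())  # relative paths are resolved against this directory
```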
21.6 Folder Contents
You can list the contents of a folder using the os.listdir() function. The os.listdir() function returns a list of files and directories in the specified folder. For example, consider the following directory structure:
Suppose main.py contains the following code:
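A minimal sketch of main.py, assuming it only prints the result of os.listdir() called with no argument (which lists the current working directory):

```python
# main.py (assumed content)
import os

# With no argument, os.listdir() lists the current working directory
contents = os.listdir()
print(contents)
```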
In the terminal, this will result in different outputs depending on the current working directory:
Table 21.2: The output of os.listdir() depends on the current working directory from which the Python script is run.

| Current Working Directory | Terminal Command | Output |
|---|---|---|
| C:\Users\UserName | python main.py | ['main.py', 'data', 'src', 'tests'] |
| C:\Users\ | python UserName\main.py | ['UserName'] |
| C:\ | python Users\UserName\main.py | ['Users'] |
21.7 Reading and Writing Files
In Python, you can read and write files using the built-in open() function. The open() function returns a file object that allows you to interact with the file. You can specify the mode in which you want to open the file (e.g., read, write, append) and the encoding to use when reading or writing the file.
21.7.1 Reading Files
You can read the contents of a file using the read() method of a file object. The read() method reads the entire file and returns its contents as a string. For example:
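A minimal sketch of reading a file. The file example.txt is hypothetical; it is created in a temporary directory first so that the example can run on its own.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained
file_path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # hypothetical file
file = open(file_path, 'w')
file.write('Hello, world!\nSecond line.\n')
file.close()

file = open(file_path, 'r')  # open the file for reading
content = file.read()        # read the entire file as one string
file.close()                 # close the file when done
print(content)
```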
The open() function takes two arguments: the file path and the mode ('r' for reading). The read() method reads the entire file, and the close() method closes the file after reading.
21.7.2 Writing to Files
You can write to a file using the write() method of a file object. The write() method writes the specified string to the file. For example:
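A minimal sketch of writing to a file, using a hypothetical file output.txt in a temporary directory. The file is written twice to show that write mode replaces the previous content.

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'output.txt')  # hypothetical file

file = open(file_path, 'w')    # open for writing; creates or truncates the file
file.write('Hello, world!\n')  # write a string to the file
file.close()                   # close the file so the data is flushed to disk

file = open(file_path, 'w')    # reopening in 'w' mode discards the old content
file.write('New content\n')
file.close()

file = open(file_path, 'r')
content = file.read()
file.close()
print(content)
```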
The open() function takes two arguments: the file path and the mode ('w' for writing). The write() method writes the specified string to the file, and the close() method closes the file after writing. Any existing content in the file is overwritten when you write to a file in write mode ('w'). If you want to append to the file without overwriting the existing content, you can use append mode ('a'). For example:
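A minimal sketch of append mode, again with a hypothetical file in a temporary directory:

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'log.txt')  # hypothetical file

file = open(file_path, 'w')   # start with a fresh file
file.write('First line\n')
file.close()

file = open(file_path, 'a')   # append mode keeps the existing content
file.write('Second line\n')
file.close()

file = open(file_path, 'r')
content = file.read()
file.close()
print(content)
```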
21.7.3 If a File Does Not Exist
When opening a file for reading, Python raises a FileNotFoundError if the file does not exist. (Opening a file in write or append mode creates the file instead, provided its parent directory exists.) You can handle this error using a try-except block. For example:
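A minimal sketch of handling a missing file. The path missing.txt is hypothetical and points into a freshly created, empty temporary directory, so the file is guaranteed not to exist.

```python
import os
import tempfile

missing_path = os.path.join(tempfile.mkdtemp(), 'missing.txt')  # does not exist

try:
    file = open(missing_path, 'r')
except FileNotFoundError:
    content = None
    print(f'File not found: {missing_path}')
else:
    # Runs only if the file was opened successfully
    content = file.read()
    file.close()
```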
21.7.4 Using with Statement
When working with files, it's a good practice to use the with statement to ensure that the file is properly closed after reading or writing. The with statement automatically closes the file when the block of code is exited. For example:
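A minimal sketch of the with statement for both writing and reading, using a hypothetical file in a temporary directory:

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'example.txt')  # hypothetical file

with open(file_path, 'w') as file:
    file.write('Hello, world!\n')
# The file is closed here, without an explicit call to close()

with open(file_path, 'r') as file:
    content = file.read()

print(content)
print(file.closed)  # the file object reports that it is closed
```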
The with statement ensures that the file is closed even if an exception occurs while reading or writing the file.
21.7.5 Reading Line by Line
You can read a file line by line using the readline() method of a file object. The readline() method reads a single line from the file and returns it as a string. For example:
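A minimal sketch of readline(), using a hypothetical two-line file created in a temporary directory:

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical file
with open(file_path, 'w') as file:
    file.write('first\nsecond\n')

with open(file_path, 'r') as file:
    line1 = file.readline()  # 'first\n' (the newline character is kept)
    line2 = file.readline()  # 'second\n'
    end = file.readline()    # '' signals the end of the file

print(repr(line1), repr(line2), repr(end))
```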
21.7.6 Reading All Lines
You can read all lines from a file into a list using the readlines() method of a file object. The readlines() method reads all lines from the file and returns them as a list of strings. Then, you can iterate over the list to process each line. For example:
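A minimal sketch of readlines(), using a hypothetical three-line file created in a temporary directory:

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical file
with open(file_path, 'w') as file:
    file.write('alpha\nbeta\ngamma\n')

with open(file_path, 'r') as file:
    lines = file.readlines()  # every line, each still ending in '\n'

for line in lines:
    print(line.strip())  # strip() removes the trailing newline
```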
But what happens if the file is too large to fit into memory? For example, if a file is several gigabytes in size, reading it all into memory at once may not be feasible. In such cases, you can read the file line by line using a for loop. For example:
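A minimal sketch of iterating over a file object directly. The small file here is a hypothetical stand-in for a large one; the loop is the same regardless of file size.

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), 'big.txt')  # hypothetical file
with open(file_path, 'w') as file:
    file.write('alpha\nbeta\ngamma\n')

line_count = 0
with open(file_path, 'r') as file:
    for line in file:  # the file object yields one line at a time
        line_count += 1
        print(line.rstrip('\n'))

print(f'{line_count} lines processed')
```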
This method reads the file line by line, processing each line as it is read. It is more memory-efficient than reading the entire file into memory at once.
21.8 File Sizes
You can get the size of a file in bytes using the os.path.getsize() function. The os.path.getsize() function takes a file path as an argument and returns the size of the file in bytes. For example, in Listing 21.1, the size of the file vogals.txt is printed after each character is added to the file.
import os

FILE_PATH = 'vogals.txt'  # assumed not to exist before the first run

def get_file_size(file_path):
    """Return the size of the file in bytes."""
    return os.path.getsize(file_path)

for letter in 'aeiou':
    # Append the letter to the file
    with open(FILE_PATH, 'a') as file:
        file.write(letter)
    print(f"## Added letter: {letter}")
    # Show the content of the file
    with open(FILE_PATH, 'r') as file:
        print(f" - Content of file: {file.read()}")
    # Print the size of the file
    print(f" - Size of file: {get_file_size(FILE_PATH)} bytes")

The result of running the code in Listing 21.1 is as follows:
## Added letter: a
- Content of file: a
- Size of file: 1 bytes
## Added letter: e
- Content of file: ae
- Size of file: 2 bytes
## Added letter: i
- Content of file: aei
- Size of file: 3 bytes
## Added letter: o
- Content of file: aeio
- Size of file: 4 bytes
## Added letter: u
- Content of file: aeiou
- Size of file: 5 bytes
21.9 Copying Files and Folders
You can copy files and folders in Python using the shutil module. The shutil module provides functions to copy, move, and delete files and directories.
| Function | Description | Example |
|---|---|---|
| shutil.copy(src, dst) | Copy a file from src to dst. | shutil.copy('file.txt', 'copy.txt') |
| shutil.copytree(src, dst) | Copy a directory from src to dst. All files and subdirectories are copied recursively. | shutil.copytree('folder', 'copy') |
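A minimal sketch of both functions. The small directory tree (folder/file.txt and folder/sub/nested.txt) is hypothetical and is built inside a temporary directory so that no real files are affected.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small hypothetical tree: folder/file.txt and folder/sub/nested.txt
    folder = os.path.join(base, 'folder')
    os.makedirs(os.path.join(folder, 'sub'))
    for name in ['file.txt', os.path.join('sub', 'nested.txt')]:
        with open(os.path.join(folder, name), 'w') as f:
            f.write('data')

    # Copy a single file
    shutil.copy(os.path.join(folder, 'file.txt'), os.path.join(base, 'copy.txt'))
    # Copy the whole directory tree; by default the destination must not exist
    shutil.copytree(folder, os.path.join(base, 'copy'))

    file_copied = os.path.isfile(os.path.join(base, 'copy.txt'))
    tree_copied = os.path.isfile(os.path.join(base, 'copy', 'sub', 'nested.txt'))
    original_kept = os.path.isfile(os.path.join(folder, 'file.txt'))

print(file_copied, tree_copied, original_kept)
```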
Recursively copying a directory copies all files and subdirectories within the directory. For example, consider the following directory structure:
If you run the following code from the C:\Users\UserName directory:
The resulting directory structure will be:
21.10 Renaming and Moving Files and Folders
To rename files and folders in Python, you can use the os.rename() function, which renames a file or directory from the source (src) to the destination (dst). The destination can have a different name or path. To move files and folders, you can also use the shutil.move() function. For example:
- os.rename('file.txt', 'new_file.txt') renames the file file.txt to new_file.txt.
- os.rename('folder', 'new_folder') renames the directory folder to new_folder.
- os.rename('file.txt', 'new_folder/file.txt') moves the file file.txt to the new_folder directory with the same name. The folder new_folder must exist before moving the file.
- shutil.move('file.txt', 'new_folder/new_file.txt') moves the file file.txt to the new_folder directory with a new name new_file.txt. The folder new_folder must already exist; otherwise, a FileNotFoundError is raised.
- shutil.move('folder', 'new_folder') moves the directory folder to new_folder. If new_folder does not exist, this effectively renames folder to new_folder; if new_folder already exists as a directory, folder is moved inside it.
- shutil.move('folder', 'new_folder/folder') moves the directory folder to the new_folder directory with the same name. The folder new_folder is created if it does not exist.
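A minimal sketch of renaming and then moving a file. The file and folder names follow the examples above, but everything happens inside a temporary directory, and the destination folder is created explicitly before the move (an assumption made here so that the move cannot fail).

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    old = os.path.join(base, 'file.txt')
    with open(old, 'w') as f:
        f.write('data')

    # Rename the file in place
    renamed = os.path.join(base, 'new_file.txt')
    os.rename(old, renamed)
    renamed_ok = os.path.isfile(renamed) and not os.path.exists(old)

    # Move the file into another folder, creating the folder first
    new_folder = os.path.join(base, 'new_folder')
    os.makedirs(new_folder, exist_ok=True)
    moved = shutil.move(renamed, os.path.join(new_folder, 'file.txt'))
    moved_ok = os.path.isfile(moved) and not os.path.exists(renamed)

print(renamed_ok, moved_ok)
```

shutil.move() returns the destination path, which is stored in moved here.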
21.11 Removing Files and Folders
You can permanently delete files and folders in Python using:
- os.remove(file_path): Delete a file.
- os.unlink(file_path): Delete a file (same as os.remove()).
- os.rmdir(directory_path): Delete an empty directory.
- shutil.rmtree(directory_path): Delete a directory and all its contents.
Before deleting files or folders, make sure to test your code thoroughly to avoid accidentally deleting important data. Once a file or folder is deleted, it cannot be recovered easily. Some tips to avoid accidental deletions include:
- Use version control systems like Git to track changes and revert to previous versions if needed.
- Make backups of important files and folders before performing any deletion operations.
- Double-check the file or folder path before deleting it. You can comment out the deletion code and run the script to see the files or folders that will be deleted. For example:
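A minimal sketch of such a dry run. The .tmp files and the temporary folder are hypothetical; the deletion call is commented out, so the script only prints what it would delete.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files to experiment with
    for name in ['keep.txt', 'old.tmp', 'cache.tmp']:
        with open(os.path.join(folder, name), 'w') as f:
            f.write('data')

    to_delete = []
    for name in sorted(os.listdir(folder)):
        if name.endswith('.tmp'):
            path = os.path.join(folder, name)
            to_delete.append(name)
            print(f'Would delete: {path}')
            # os.remove(path)  # uncomment only after checking the printed paths

    remaining = sorted(os.listdir(folder))

print(remaining)  # nothing was deleted during the dry run
```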
21.12 Using pathlib for File Operations
The pathlib module provides an object-oriented interface for file path manipulation. It simplifies working with file paths by providing classes and methods to handle paths in a platform-independent way.
21.12.1 Creating File Paths
In pathlib, file paths are represented as Path objects. You can create a Path object by passing the file path as a string to the Path constructor. Listing 21.2 shows how to create a file path object using pathlib.
The file path Documents/folder/file.txt can be created in two ways: by combining the Path constructor with the / operator, or by passing each part of the path to Path as a separate argument.
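A minimal sketch of building Documents/folder/file.txt with pathlib, first with the / operator and then with separate arguments (the variable names p1 and p2 are illustrative):

```python
from pathlib import Path

# Build the path with the Path constructor and the / operator
p1 = Path('Documents') / 'folder' / 'file.txt'

# Pass each part of the path as a separate argument
p2 = Path('Documents', 'folder', 'file.txt')

print(p1)             # separators follow the current operating system
print(p1.as_posix())  # always uses forward slashes
print(p1 == p2)
```

Neither call touches the file system; a Path object can describe a file that does not exist yet.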
21.12.2 Parts of a File Path
In pathlib, you can access different parts of a file path using attributes and methods. For example, the file path C:\Users\UserName\Documents\file.txt has the following parts:
- C: Drive letter.
- C:\: Anchor (root directory).
- \Users\UserName\Documents: Parents (directories leading to the file).
- file.txt: Name (file name).
- file: Stem (file name without extension).
- .txt: Suffix (file extension).
Let p = Path(r"C:\Users\UserName\Documents\file.txt"). Table 21.4 shows some examples of accessing parts of a file path using pathlib.
Table 21.4: Accessing parts of a file path using pathlib.

| Attribute/Method | Description | Result |
|---|---|---|
| p.name | Get the file name. | 'file.txt' |
| p.stem | Get the file name without the extension. | 'file' |
| p.suffix | Get the file extension. | '.txt' |
| p.anchor | Get the root directory. | 'C:\\' |
| p.parent | Get the parent directory. | WindowsPath('C:/Users/UserName/Documents') |
| p.parents | Get the parent directories. | WindowsPath('C:/Users/UserName/Documents'), WindowsPath('C:/Users/UserName'), WindowsPath('C:/Users'), WindowsPath('C:/') |
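A minimal sketch of these attributes. It uses PureWindowsPath rather than Path, an assumption made so that the Windows-style path is parsed the same way on macOS and Linux; on Windows, Path would give WindowsPath objects with the same attribute values.

```python
from pathlib import PureWindowsPath

# PureWindowsPath parses Windows-style paths on any operating system
p = PureWindowsPath(r'C:\Users\UserName\Documents\file.txt')

print(p.name)    # file name with extension
print(p.stem)    # file name without extension
print(p.suffix)  # extension, including the dot
print(p.drive)   # drive letter with colon
print(p.anchor)  # drive plus root
print(p.parent)  # directory that contains the file

# p.parents is a sequence of all ancestors, nearest first
for ancestor in p.parents:
    print(ancestor)
```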
21.12.3 File Handling Methods
In pathlib, file handling methods are available to read, write, and manipulate files. Consider the directory structure:
Table 21.5 shows some examples of how to inspect and resolve file paths using pathlib.

Table 21.5: Examples of file handling using pathlib.

| Command | Description | Result |
|---|---|---|
| Path('Documents', 'file.txt').exists() | Check if the file exists. | True |
| Path('Documents', 'file.txt').is_file() | Check if it is a file. | True |
| Path('Documents', 'file.txt').is_dir() | Check if it is a directory. | False |
| Path('Documents', 'file.txt').absolute() | Get the absolute path. | WindowsPath('C:/Users/UserName/Documents/file.txt') |
| Path('../OtherUser').resolve() | Get the absolute path of a relative path. | WindowsPath('C:/Users/OtherUser') |
| Path('/') | Create a root path. | WindowsPath('/') |
| Path('/').absolute() | Get the absolute path of the root. | WindowsPath('C:/') |
| Path.cwd() | Get the current working directory. | WindowsPath('C:/Users/UserName') |
| Path.home() | Get the home directory. This directory is platform-dependent. | WindowsPath('C:/Users/UserName') |
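Reading, writing, renaming, and deleting files with pathlib can be sketched as follows. The Documents folder and file names are hypothetical and live in a temporary directory; write_text(), read_text(), rename(), and unlink() are standard Path methods.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    documents = Path(tmp) / 'Documents'
    documents.mkdir()

    file_path = documents / 'file.txt'
    file_path.write_text('Hello, pathlib!')  # create the file and write text
    content = file_path.read_text()          # read the whole file as a string

    checks = (file_path.exists(), file_path.is_file(), file_path.is_dir())

    new_path = documents / 'renamed.txt'
    file_path.rename(new_path)               # rename the file
    renamed = new_path.exists() and not file_path.exists()

    new_path.unlink()                        # delete the file
    deleted = not new_path.exists()

print(content, checks, renamed, deleted)
```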
21.13 Walking Directories
You can walk through directories and subdirectories using the os.walk() function. The os.walk() function generates the file names in a directory tree by walking either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple containing the directory path, the subdirectories in the directory, and the files in the directory.
For example, consider the following directory structure:
You can walk through the directories and subdirectories using the os.walk() function. For example:
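A minimal sketch of os.walk(). The tree (a.txt at the top level and sub/b.txt below it) is hypothetical and is built in a temporary directory; directory paths are printed relative to the starting directory for readability.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical tree: root/a.txt and root/sub/b.txt
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ['a.txt', os.path.join('sub', 'b.txt')]:
        with open(os.path.join(root, rel), 'w') as f:
            f.write('data')

    visited = []
    # os.walk() is top-down by default: a directory is yielded before its children
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        visited.append((rel_dir, sorted(dirnames), sorted(filenames)))
        print(rel_dir, dirnames, filenames)
```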
The os.walk() function generates the following output:
21.14 Working with JSON Files
The json module in Python allows you to work with JSON (JavaScript Object Notation) files. JSON is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate.
You can read JSON data from a file using the json.load() function, which parses the file and returns the corresponding Python object (a dictionary when the top-level JSON value is an object). For example, consider the following JSON data in a file named config.json, where you have stored the configuration settings of a simulation:
You can read this JSON data from the file using the following Python code:
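A minimal sketch of json.load(). The settings shown (time_step, num_steps, output) are hypothetical placeholders for a simulation configuration, and config.json is written to a temporary directory first so that the example can run on its own.

```python
import json
import os
import tempfile

config_path = os.path.join(tempfile.mkdtemp(), 'config.json')

# Hypothetical simulation settings, written here to keep the example self-contained
with open(config_path, 'w') as f:
    f.write('{"time_step": 0.01, "num_steps": 1000, "output": "results.json"}')

with open(config_path, 'r') as f:
    config = json.load(f)  # the top-level JSON object becomes a dict

print(config['time_step'], config['num_steps'], config['output'])
```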
The json.load() function reads the JSON data from the file and returns it as a Python dictionary. You can then access the data in the dictionary as needed.
You can also write JSON data to a file using the json.dump() function. The json.dump() function writes the JSON data to the file in a human-readable format. For example, consider the following Python dictionary that you want to write to a JSON file:
You can write this data to a JSON file named results.json using the following Python code:
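A minimal sketch of json.dump() with an indentation of 4 spaces. The contents of the results dictionary are hypothetical, and results.json is written to a temporary directory.

```python
import json
import os
import tempfile

# Hypothetical simulation results
results = {'simulation': 'run_1', 'steps': 1000, 'energies': [1.5, 1.2, 0.9]}

results_path = os.path.join(tempfile.mkdtemp(), 'results.json')
with open(results_path, 'w') as f:
    json.dump(results, f, indent=4)  # indent=4 pretty-prints the output

with open(results_path, 'r') as f:
    text = f.read()
print(text)
```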
With indent=4, the json.dump() function writes the JSON data to the file in a human-readable format with an indentation of 4 spaces; without the indent argument, the output is written on a single line. You can adjust the indentation level as needed.
21.14.1 Working with JSON Data as Strings
The loads() and dumps() functions in the json module are used to work with JSON data as strings. The loads() function parses a JSON string and returns a Python object, while the dumps() function converts a Python object to a JSON string. For example:
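A minimal sketch of the round trip between a JSON string and a Python object (the sample data is hypothetical):

```python
import json

json_text = '{"name": "Alice", "age": 30, "languages": ["Python", "C"]}'

data = json.loads(json_text)  # JSON string -> Python dict
data['age'] += 1              # work with it like any other dict

back = json.dumps(data)       # Python dict -> JSON string
print(back)
```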
21.15 CSV, Excel, and Other File Formats
In addition to JSON files, Python supports reading and writing data in various file formats, such as CSV (Comma-Separated Values), Excel, and text files. Search for the appropriate Python libraries to work with these file formats, such as:
- csv or pandas for reading and writing CSV files. Useful for tabular data.
- openpyxl or pandas for reading and writing Excel files. Useful for spreadsheets.
- sqlite3 for working with SQLite databases. Useful for storing structured data and querying it using SQL.
- yaml (provided by the PyYAML package) for reading and writing YAML files. YAML originally stood for Yet Another Markup Language and now stands for YAML Ain't Markup Language. Useful for configuration files.
- pickle for serializing and deserializing Python objects (not human-readable, but efficient for storing data). Serializing is the process of converting a Python object into a byte stream, and deserializing is the process of converting the byte stream back into a Python object.
- pyarrow for reading and writing Apache Parquet files. Parquet is a columnar storage file format, used in big data systems, that is efficient for analytics and data processing.