20 Manipulating Files
Files are an essential part of any programming language. Information is often stored in files, and programs need to read from and write to files to interact with data. When you save data in a variable, it is stored in the computer's memory (RAM), which is temporary. If you want to store data permanently, you need to save it to a file on disk. Therefore, to persist data across different runs of a program, you need to read from and write to files. Python provides built-in functions and modules to work with files, making it easy to handle file operations.
In this chapter, you will learn how to read and write files in Python, manipulate file paths, organize files, and handle errors related to file operations.
20.1 Parts of a File Path
A file path is the location of a file or folder on a computer. It consists of several parts, including the following:
- Root: The root is the starting point of the file system. On Windows, the root is typically a drive letter followed by a colon (e.g., `C:`). On macOS and Linux, the root is `/`.
- Directories: Directories are folders that contain files or other directories. They are separated by slashes (`/` on macOS and Linux, `\` on Windows).
- File Name: The file name is the name of the file, including the extension (e.g., `example.txt`).
- Extension: The extension is the part of the file name after the dot (`.`) that indicates the file type (e.g., `.txt`, `.csv`, `.py`).
20.2 Directory Paths
When working with files, you often need to manipulate directory paths. Here are some common terms related to directory paths:
- Parent Directory: The parent directory of a file or folder is the directory that contains it. For example:
  - The parent directory of `file.txt` in `C:\Users\UserName\Documents\file.txt` is `Documents`.
  - The parent directory of `Documents` in `C:\Users\UserName\Documents` is `UserName`.
  - The parent directory of `UserName` in `C:\Users\UserName` is `Users`.
- Current Directory: The current directory is the directory in which the program is running. When you open a file without specifying a path, Python looks for the file in the current directory.
- Home Directory: The home directory is the default directory for a user. On Windows, it is typically `C:\Users\UserName`; on macOS, it is `/Users/UserName`; and on Linux, it is typically `/home/UserName`.
- Working Directory: The working directory is the directory from which the program is running; it is the same thing as the current directory described above. For example:
  - If you run a Python script from `C:\Users\UserName`, the working directory is `C:\Users\UserName`.
  - If you run a Python script from `/Users/UserName`, the working directory is `/Users/UserName`.
  - If you run a Python script from a different directory, that directory becomes the working directory.
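As a small illustration of these terms, the sketch below prints the working directory, the home directory, and the parent of the working directory for whatever machine it runs on. The variable names are chosen here for illustration only.

```python
import os

# Current working directory: relative paths are resolved from here
cwd = os.getcwd()

# Home directory of the current user (platform-dependent)
home = os.path.expanduser("~")

# Parent directory of the working directory
parent = os.path.dirname(cwd)

print(f"Working directory: {cwd}")
print(f"Home directory:    {home}")
print(f"Parent directory:  {parent}")
```

The values differ from machine to machine, which is exactly why code that relies on relative paths behaves differently depending on where it is started.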
20.3 Absolute vs. Relative Paths
There are two types of file paths: absolute and relative.
- Absolute Path: An absolute path specifies the full path to a file or folder from the root directory.
- Relative Path: A relative path specifies the path to a file or folder relative to the current working directory. In file paths, `..` (double dots) and `.` (single dot) are relative path notations used to navigate directories:
  - `..`: Refers to the parent directory of the current directory.
  - `.`: Refers to the current directory.
For example, consider the following directory structure:
The absolute paths are:
- `C:\Users\UserName\Documents\folder\file.txt` (full path to `file.txt`).
- `C:\Users\UserName\Documents\folder` (full path to `folder`).
- `C:\Users\UserName\Documents` (full path to `Documents`).
- `C:\Users\UserName` (full path to `UserName`).
- `C:\Users` (full path to `Users`).
Now, let's assume the current working directory is `C:\Users\UserName\Documents` (your terminal or command prompt is open in this directory or your Python script is running from this directory). Table 20.1 shows examples of navigating directories using the `cd` (change directory) command with relative paths.
Table 20.1: Using `cd` with relative paths, assuming the starting directory is `C:\Users\UserName\Documents`.
Target Directory | Command | Explanation |
---|---|---|
`C:\Users\UserName` | `cd ..` | Move up one level from `Documents` to the parent directory `UserName`. |
`C:\Users` | `cd ..\..` | Move up two levels to `Users`. |
`C:\Users\UserName\Documents\folder` | `cd folder` | Move down into the `folder` subdirectory of `Documents`. |
`C:\Users\UserName\Documents` | `cd .` | Stay in the current directory (`Documents`). |
`C:\Users\UserName\Documents\folder` | `cd folder` | Move down to `folder` from `Documents`. |
`C:\Users\UserName\Documents\folder\file.txt` | `cd folder` | Navigate to `folder`; `file.txt` can then be accessed from there (`cd` itself cannot target a file). |
`C:\` | `cd ..\..\..` | Move up three levels to the root directory (`C:\`). |
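The same navigation can be checked in Python without touching the disk. The sketch below uses the standard `ntpath` module (the Windows flavor of `os.path`, importable on any platform) to join each relative path onto the starting directory and collapse the `.` and `..` parts. The helper name `resolve` is hypothetical and introduced only for this illustration.

```python
import ntpath  # Windows flavor of os.path; works on any operating system

start = r"C:\Users\UserName\Documents"

def resolve(relative):
    """Join a relative path onto `start` and collapse any . and .. parts."""
    return ntpath.normpath(ntpath.join(start, relative))

for relative in ["..", r"..\..", "folder", ".", r"..\..\.."]:
    print(f"{relative!r:>12} -> {resolve(relative)}")
```

This only manipulates strings; it does not check whether the resulting directories exist.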
20.4 Checking Path Validity
Before working with files, it's essential to check whether the file or directory path is valid. You can use the `os.path` module in Python for this; it provides functions to manipulate file paths and check path validity.
Here are some common functions in the `os.path` module:
- `os.path.exists(path)`: Check if the path exists.
- `os.path.isfile(path)`: Check if the path is a file.
- `os.path.isdir(path)`: Check if the path is a directory.
- `os.path.getsize(path)`: Get the size of the file in bytes.
- `os.path.basename(path)`: Get the base name of the file or directory. For example, `os.path.basename('C:/Users/UserName/Documents/file.txt')` returns `file.txt`.
- `os.path.dirname(path)`: Get the directory name of the file or directory. For example, `os.path.dirname('C:/Users/UserName/Documents/file.txt')` returns `C:/Users/UserName/Documents`.
- `os.path.split(path)`: Split the path into the directory name and the base name. For example, `os.path.split('C:/Users/UserName/Documents/file.txt')` returns `('C:/Users/UserName/Documents', 'file.txt')`.
- `os.path.join(path1, path2)`: Join two paths together using the operating system's separator. For example, `os.path.join('C:/Users/UserName', 'Documents')` returns `C:/Users/UserName/Documents` on macOS and Linux, and `C:/Users/UserName\Documents` on Windows.
- `os.path.splitext(path)`: Split the file path into the root and the extension. For example, `os.path.splitext('file.txt')` returns `('file', '.txt')`.
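A minimal sketch of these functions in use follows. The string operations work on the example path from the list above; the checks that consult the file system run against a temporary file (the name `sample.txt` is an assumption made for this example) so the code can run anywhere.

```python
import os
import tempfile

path = "C:/Users/UserName/Documents/file.txt"  # example path from the text

# Pure string operations: these never touch the disk
print(os.path.basename(path))        # file.txt
print(os.path.dirname(path))         # C:/Users/UserName/Documents
print(os.path.split(path))           # ('C:/Users/UserName/Documents', 'file.txt')
print(os.path.splitext("file.txt"))  # ('file', '.txt')

# Checks that consult the file system, run against a temporary file
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(sample, "w") as f:
        f.write("hello")

    exists = os.path.exists(sample)
    is_file = os.path.isfile(sample)
    is_dir = os.path.isdir(sample)
    size = os.path.getsize(sample)

print(exists, is_file, is_dir, size)  # True True False 5
```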
On Windows, file paths use backslashes (`\`) to separate directories (e.g., `C:\Users\UserName\Documents\file.txt`). You can see this pattern in the terminal, file explorer, and many applications. However, you can also use forward slashes (`/`) on Windows. Python supports both backslashes and forward slashes in file paths. If you use backslashes in a string literal, you need to escape them with another backslash (e.g., `'C:\\Users\\UserName\\Documents\\file.txt'`). Alternatively, you can use raw strings by prefixing the string with `r` (e.g., `r'C:\Users\UserName\Documents\file.txt'`). Raw strings treat backslashes as literal characters instead of interpreting them as the start of an escape sequence.
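The three spellings can be compared directly. This short sketch, with variable names chosen for illustration, shows that the escaped and raw literals produce the same string.

```python
escaped = "C:\\Users\\UserName\\Documents\\file.txt"  # each backslash doubled
raw = r"C:\Users\UserName\Documents\file.txt"         # raw string literal
forward = "C:/Users/UserName/Documents/file.txt"      # forward slashes

print(escaped == raw)  # True
print(raw)             # C:\Users\UserName\Documents\file.txt

# Without the r prefix or the doubling, the \U in "C:\Users" would be read
# as the start of a Unicode escape and the literal would fail to compile.
```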
20.5 Creating Folders
You can create new folders in Python using the `os.makedirs()` function, which creates a directory and any necessary parent directories. For example:
- `os.makedirs('C:/Users/UserName/Documents/NewFolder')` creates a new folder named `NewFolder` in the `Documents` directory.
  - If any parent directories (e.g., `Documents`, `UserName`) do not exist, they are created as well.
  - If the folder `NewFolder` already exists, `os.makedirs()` will raise a `FileExistsError`. If you want to ignore this error, you can use the `exist_ok=True` parameter, for example: `os.makedirs('C:/Users/UserName/Documents/NewFolder', exist_ok=True)`.
  - If the path is relative, the folder is created in the current working directory (the directory from which the Python script is run). You can use `os.getcwd()` to get the current working directory.
- `os.makedirs('NewFolder')` creates a new folder named `NewFolder` in the current working directory. The current working directory is the directory from which the Python script is run (you can get the current working directory using `os.getcwd()`).
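The behavior described above can be tried safely inside a temporary directory. This is a minimal sketch; the folder names mirror the example, and the temporary location is an assumption made so that nothing is created in your real `Documents` folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "Documents", "NewFolder")

    os.makedirs(target)           # also creates the missing "Documents" parent
    created = os.path.isdir(target)

    try:
        os.makedirs(target)       # second call: the folder already exists
        raised = False
    except FileExistsError:
        raised = True

    os.makedirs(target, exist_ok=True)  # no error this time

print(created, raised)  # True True
```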
20.6 Folder Contents
You can list the contents of a folder using the `os.listdir()` function, which returns a list of the names of the files and directories in the specified folder. For example, consider the following directory structure:
C:\
└── Users
    └── UserName
        ├── main.py
        ├── data
        ├── src
        └── tests
Suppose `main.py` contains the following code:
import os
# List the contents of the current working directory
contents = os.listdir()
# Print the contents
print(contents)
In the terminal, this will result in different outputs, because `os.listdir()` without an argument lists the current working directory from which the Python script is run:
Current Working Directory | Terminal Command | Output |
---|---|---|
`C:\Users\UserName` | `python main.py` | `['main.py', 'data', 'src', 'tests']` |
`C:\Users\` | `python UserName\main.py` | `['UserName']` |
`C:\` | `python Users\UserName\main.py` | `['Users']` |
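One way to avoid this dependence is to pass the folder to `os.listdir()` explicitly. The sketch below recreates a small version of the example layout in a temporary directory (an assumption made so the example is self-contained) and lists it by path.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Recreate a small version of the example layout
    for name in ["data", "src", "tests"]:
        os.mkdir(os.path.join(tmp, name))
    with open(os.path.join(tmp, "main.py"), "w") as f:
        f.write("# placeholder\n")

    # Passing the folder explicitly gives the same result whatever the
    # working directory is. os.listdir() returns names in arbitrary order,
    # so sort them for a stable result.
    contents = sorted(os.listdir(tmp))

print(contents)  # ['data', 'main.py', 'src', 'tests']
```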
20.7 Reading and Writing Files
In Python, you can read and write files using the built-in `open()` function. The `open()` function returns a file object that allows you to interact with the file. You can specify the mode in which you want to open the file (e.g., read, write, append) and the encoding to use when reading or writing the file.
20.7.1 Reading Files
You can read the contents of a file using the `read()` method of a file object. The `read()` method reads the entire file and returns its contents as a string. For example:
# Open a file for reading
file = open('file.txt', 'r')
# Read the contents of the file
contents = file.read()
# Close the file
file.close()
# Print the contents
print(contents)
Here, the `open()` function is given two arguments: the file path and the mode (`'r'` for reading). The `read()` method reads the entire file, and the `close()` method closes the file after reading.
20.7.2 Writing to Files
You can write to a file using the `write()` method of a file object. The `write()` method writes the specified string to the file. For example:
# Open a file for writing
file = open('file.txt', 'w')
# Write to the file
file.write('Hello, World!')
# Close the file
file.close()
Here, the `open()` function is given two arguments: the file path and the mode (`'w'` for writing). The `write()` method writes the specified string to the file, and the `close()` method closes the file after writing. Opening a file in write mode (`'w'`) erases any existing content in the file. If you want to append to the file without overwriting the existing content, you can use append mode (`'a'`). For example:
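A minimal sketch of append mode, assuming a temporary directory so that no existing `file.txt` is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")

    file = open(path, "w")   # write mode: creates the file or empties it
    file.write("Hello, World!\n")
    file.close()

    file = open(path, "a")   # append mode: keeps the existing content
    file.write("A second line.\n")
    file.close()

    file = open(path, "r")
    contents = file.read()
    file.close()

print(contents)
```

The second `write()` lands after the first line instead of replacing it.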
20.7.3 If a File Does Not Exist
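When you open a file for reading (`'r'`), Python raises a `FileNotFoundError` if the file does not exist. Opening a file for writing (`'w'`) or appending (`'a'`) creates the file instead, and raises `FileNotFoundError` only if the directory that should contain it is missing. You can handle this error using a `try-except` block. For example: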
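A minimal sketch of this pattern follows. The helper name `read_text_or_none` and the file name `missing.txt` are hypothetical and used only for illustration.

```python
import os
import tempfile

def read_text_or_none(path):
    """Return the file's contents, or None if the file does not exist."""
    try:
        file = open(path, "r")
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None
    else:
        contents = file.read()
        file.close()
        return contents

with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.txt")  # never created so far
    result = read_text_or_none(missing)         # prints the message

    # Write mode creates the file instead of raising an error
    file = open(missing, "w")
    file.write("now it exists")
    file.close()
    second = read_text_or_none(missing)

print(result, second)
```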
20.7.4 Using `with` Statement
When working with files, it's a good practice to use the `with` statement to ensure that the file is properly closed after reading or writing. The `with` statement automatically closes the file when the block of code is exited. For example:
# Open a file for reading using the with statement
with open('file.txt', 'r') as file:
    contents = file.read()
# The file is automatically closed after the block
print(contents)
The `with` statement ensures that the file is closed even if an exception occurs while reading or writing the file.
20.7.5 Reading Line by Line
You can read a file line by line using the `readline()` method of a file object. The `readline()` method reads a single line from the file and returns it as a string. For example:
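A minimal sketch, assuming a small three-line file created in a temporary directory for the purpose of the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w") as file:
        file.write("first\nsecond\nthird\n")

    lines_read = []
    with open(path, "r") as file:
        line = file.readline()
        while line != "":  # readline() returns "" at the end of the file
            lines_read.append(line.rstrip("\n"))
            line = file.readline()

print(lines_read)  # ['first', 'second', 'third']
```

Each call to `readline()` continues from where the previous one stopped, and each returned line keeps its trailing newline until it is stripped.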
20.7.6 Reading All Lines
You can read all lines from a file into a list using the `readlines()` method of a file object, which returns them as a list of strings. Then, you can iterate over the list to process each line. For example:
# Open a file for reading
with open('file.txt', 'r') as file:
    # Read all lines
    lines = file.readlines()
# Print the lines
for line in lines:
    # Each line keeps its trailing newline, so do not add another one
    print(line, end='')
But what happens if the file is too large to fit into memory? For example, if a file is several gigabytes in size, reading it all into memory at once may not be feasible. In such cases, you can read the file line by line using a `for` loop. For example:
# Open a file for reading
with open('file.txt', 'r') as file:
    # Read the file line by line
    for line in file:
        print(line, end='')
This method reads the file line by line, processing each line as it is read. It is more memory-efficient than reading the entire file into memory at once.
20.8 File Sizes
You can get the size of a file in bytes using the `os.path.getsize()` function, which takes a file path as an argument. For example, in Listing 20.1, the size of the file `vogals.txt` is printed after each character is added to the file.
import os

def get_file_size(file_path):
    """Return the size of the file in bytes."""
    return os.path.getsize(file_path)

for letter in 'aeiou':
    # Append the letter to the file
    with open('vogals.txt', 'a') as file:
        file.write(letter)
    print(f"## Added letter: {letter}")
    # Show the content of the file
    with open('vogals.txt', 'r') as file:
        print(f" - Content of file: {file.read()}")
    # Print the size of the file
    print(f" - Size of file: {get_file_size('vogals.txt')} bytes")
The result of running the code in Listing 20.1 is as follows:
## Added letter: a
- Content of file: a
- Size of file: 1 bytes
## Added letter: e
- Content of file: ae
- Size of file: 2 bytes
## Added letter: i
- Content of file: aei
- Size of file: 3 bytes
## Added letter: o
- Content of file: aeio
- Size of file: 4 bytes
## Added letter: u
- Content of file: aeiou
- Size of file: 5 bytes
20.9 Copying Files and Folders
You can copy files and folders in Python using the `shutil` module. The `shutil` module provides functions to copy, move, and delete files and directories.
Function | Description | Example |
---|---|---|
`shutil.copy(src, dst)` | Copy a file from `src` to `dst`. | `shutil.copy('file.txt', 'copy.txt')` |
`shutil.copytree(src, dst)` | Copy a directory from `src` to `dst`. All files and subdirectories are copied recursively. | `shutil.copytree('folder', 'copy')` |
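Both functions can be tried on a throwaway tree. The sketch below is an illustration under stated assumptions: the layout is built in a temporary directory, and the names `copy.txt` and `folder_copy` are chosen here, not taken from the chapter's own example.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    docs = os.path.join(tmp, "Documents")
    os.makedirs(os.path.join(docs, "folder"))
    with open(os.path.join(docs, "file.txt"), "w") as f:
        f.write("one")
    with open(os.path.join(docs, "folder", "file2.txt"), "w") as f:
        f.write("two")

    # Copy a single file
    shutil.copy(os.path.join(docs, "file.txt"), os.path.join(docs, "copy.txt"))

    # Copy a whole directory tree; by default the destination must not exist
    shutil.copytree(os.path.join(docs, "folder"),
                    os.path.join(docs, "folder_copy"))

    top_level = sorted(os.listdir(docs))
    copied_tree = os.listdir(os.path.join(docs, "folder_copy"))

print(top_level)    # ['copy.txt', 'file.txt', 'folder', 'folder_copy']
print(copied_tree)  # ['file2.txt']
```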
Recursively copying a directory copies all files and subdirectories within the directory. For example, consider the following directory structure:
C:\
└── Users
    └── UserName
        └── Documents
            ├── file.txt
            └── folder
                └── file2.txt
If you run the following code from the `C:\Users\UserName` directory:
The resulting directory structure will be:
20.10 Renaming and Moving Files and Folders
To rename files and folders in Python, you can use the `os.rename()` function, which renames a file or directory from the source (`src`) to the destination (`dst`). The destination can have a different name or path. The `shutil.move()` function serves the same purpose and also works when the destination is on a different file system. For example:
- `os.rename('file.txt', 'new_file.txt')` renames the file `file.txt` to `new_file.txt`.
- `os.rename('folder', 'new_folder')` renames the directory `folder` to `new_folder`.
- `os.rename('file.txt', 'new_folder/file.txt')` moves the file `file.txt` to the `new_folder` directory with the same name. The folder `new_folder` must exist before moving the file.
- `shutil.move('file.txt', 'new_folder/new_file.txt')` moves the file `file.txt` to the `new_folder` directory with a new name `new_file.txt`. The folder `new_folder` must already exist; `shutil.move()` does not create it when moving a file.
- `shutil.move('folder', 'new_folder')` moves the directory `folder` to `new_folder`. If `new_folder` does not exist, this simply renames the directory `folder` to `new_folder`; if `new_folder` already exists, `folder` is moved inside it.
- `shutil.move('folder', 'new_folder/folder')` moves the directory `folder` to the `new_folder` directory with the same name. The folder `new_folder` is created if it does not exist.
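A short sketch of a rename followed by a move, run in a temporary directory so that no real files are affected. Creating `new_folder` explicitly before the move is a choice made here to keep the example unambiguous.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "file.txt")
    with open(old, "w") as f:
        f.write("data")

    # Rename the file in place
    renamed = os.path.join(tmp, "new_file.txt")
    os.rename(old, renamed)

    # Move it into a folder, creating the folder first
    new_folder = os.path.join(tmp, "new_folder")
    os.makedirs(new_folder, exist_ok=True)
    final = shutil.move(renamed, os.path.join(new_folder, "file.txt"))

    remaining = sorted(os.listdir(tmp))
    moved = os.listdir(new_folder)

print(remaining)  # ['new_folder']
print(moved)      # ['file.txt']
```

`shutil.move()` returns the destination path, which is handy when the final location is computed on the fly.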
20.11 Removing Files and Folders
You can permanently delete files and folders in Python using:
- `os.remove(file_path)`: Delete a file.
- `os.unlink(file_path)`: Delete a file (same as `os.remove()`).
- `os.rmdir(directory_path)`: Delete an empty directory.
- `shutil.rmtree(directory_path)`: Delete a directory and all its contents.
Before deleting files or folders, make sure to test your code thoroughly to avoid accidentally deleting important data. Once a file or folder is deleted, it cannot be recovered easily. Some tips to avoid accidental deletions include:
- Use version control systems like Git to track changes and revert to previous versions if needed.
- Make backups of important files and folders before performing any deletion operations.
- Double-check the file or folder path before deleting it. You can comment out the deletion code and run the script to see the files or folders that will be deleted. For example:
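One possible way to apply this tip is sketched below. Instead of commenting the deletion line in and out, it uses a `dry_run` flag, which is a variation on the idea rather than the chapter's own code; the function name `delete_files` and the `.tmp` suffix are hypothetical.

```python
import os
import tempfile

def delete_files(folder, suffix, dry_run=True):
    """Delete files in `folder` ending in `suffix`; only list them if dry_run."""
    targeted = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and name.endswith(suffix):
            targeted.append(name)
            if dry_run:
                print(f"Would delete: {path}")
            else:
                os.remove(path)
    return targeted

with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.tmp", "b.tmp", "keep.txt"]:
        with open(os.path.join(tmp, name), "w") as f:
            f.write("x")

    preview = delete_files(tmp, ".tmp")        # nothing is removed yet
    after_preview = sorted(os.listdir(tmp))

    delete_files(tmp, ".tmp", dry_run=False)   # now actually delete
    after_delete = sorted(os.listdir(tmp))

print(preview)       # ['a.tmp', 'b.tmp']
print(after_delete)  # ['keep.txt']
```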
20.12 Using `pathlib` for File Operations
The `pathlib` module provides an object-oriented interface for file path manipulation. It simplifies working with file paths by providing classes and methods to handle paths in a platform-independent way.
20.12.1 Creating File Paths
In `pathlib`, file paths are represented as `Path` objects. You can create a `Path` object by passing the file path as a string to the `Path` constructor. Listing 20.2 shows how to create a file path object using `pathlib`.
In that listing, the file path `Documents/folder/file.txt` is created using the `Path` constructor and the `/` operator. Alternatively, each part of the path can be passed to `Path` as a separate argument.
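Both ways of building a path can be sketched as follows. The variable names `p1` and `p2` are chosen here for illustration.

```python
from pathlib import Path

# Build the path with the / operator
p1 = Path("Documents") / "folder" / "file.txt"

# Build the same path by passing each part as a separate argument
p2 = Path("Documents", "folder", "file.txt")

print(p1)        # separator shown depends on the operating system
print(p1 == p2)  # True
```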
20.12.2 Parts of a File Path
In `pathlib`, you can access different parts of a file path using attributes and methods. For example, the file path `C:\Users\UserName\Documents\file.txt` has the following parts:
- `C:`: Drive (the drive letter followed by a colon).
- `C:\`: Anchor (root directory).
- `\Users\UserName\Documents`: Parents (directories leading to the file).
- `file.txt`: Name (file name).
- `file`: Stem (file name without extension).
- `.txt`: Suffix (file extension).
Let `p = Path(r"C:\Users\UserName\Documents\file.txt")`. Table 20.4 shows some examples of accessing parts of a file path using `pathlib`.
Attribute/Method | Description | Result |
---|---|---|
`p.name` | Get the file name. | `'file.txt'` |
`p.stem` | Get the file name without the extension. | `'file'` |
`p.suffix` | Get the file extension. | `'.txt'` |
`p.anchor` | Get the root directory. | `'C:\\'` |
`p.parent` | Get the parent directory. | `WindowsPath('C:/Users/UserName/Documents')` |
`p.parents` | Get the parent directories. | `WindowsPath('C:/Users/UserName/Documents')`, `WindowsPath('C:/Users/UserName')`, `WindowsPath('C:/Users')`, `WindowsPath('C:/')` |
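To try these attributes on any operating system, the sketch below uses `PureWindowsPath`, which applies Windows path rules without needing a Windows machine. That substitution is an assumption made for portability; on Windows, `Path` behaves the same way for these attributes.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"C:\Users\UserName\Documents\file.txt")

print(p.drive)   # C:
print(p.anchor)  # C:\
print(p.name)    # file.txt
print(p.stem)    # file
print(p.suffix)  # .txt
print(p.parent)  # C:\Users\UserName\Documents

# p.parents is a sequence running from the closest ancestor up to the anchor
for ancestor in p.parents:
    print(ancestor)
```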
20.12.3 File Handling Methods
In `pathlib`, file handling methods are available to read, write, and manipulate files. Consider the directory structure:
C:\
└── Users
    └── UserName
        ├── Documents
        │   └── file.txt
        └── folder
            └── file2.txt
Table 20.4 shows some examples of how to read, write, rename, and delete files using `pathlib`. The results assume the code runs on Windows with `C:\Users\UserName` as the current working directory.
Command | Description | Result |
---|---|---|
`Path('Documents', 'file.txt').exists()` | Check if the file exists. | `True` |
`Path('Documents', 'file.txt').is_file()` | Check if it is a file. | `True` |
`Path('Documents', 'file.txt').is_dir()` | Check if it is a directory. | `False` |
`Path('Documents', 'file.txt').absolute()` | Get the absolute path. | `WindowsPath('C:/Users/UserName/Documents/file.txt')` |
`Path('../OtherUser').resolve()` | Get the absolute path of a relative path. | `WindowsPath('C:/Users/OtherUser')` |
`Path('/')` | Create a root path. | `WindowsPath('/')` |
`Path('/').absolute()` | Get the absolute path of the root. | `WindowsPath('C:/')` |
`Path.cwd()` | Get the current working directory. | `WindowsPath('C:/Users/UserName')` |
`Path.home()` | Get the home directory. This directory is platform-dependent. | `WindowsPath('C:/Users/UserName')` |
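Reading, writing, renaming, and deleting can be sketched with `Path` methods as follows. The example runs in a temporary directory, and the name `renamed.txt` is an assumption made for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "Documents"
    folder.mkdir()

    file = folder / "file.txt"
    file.write_text("Hello, World!")   # create the file and write to it
    text = file.read_text()            # read it back

    existed = file.exists()
    was_file = file.is_file()

    target = folder / "renamed.txt"
    file.rename(target)                # rename the file
    names = [child.name for child in folder.iterdir()]

    target.unlink()                    # delete the file
    left = list(folder.iterdir())

print(text)   # Hello, World!
print(names)  # ['renamed.txt']
print(left)   # []
```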
20.13 Walking Directories
You can walk through directories and subdirectories using the `os.walk()` function. The `os.walk()` function generates the file names in a directory tree by walking either top-down or bottom-up. For each directory in the tree, it yields a 3-tuple containing the directory path, the subdirectories in the directory, and the files in the directory.
For example, consider the following directory structure:
C:\
└── Users
    └── UserName
        ├── Documents
        │   └── file.txt
        └── folder
            └── file2.txt
You can walk through the directories and subdirectories using the `os.walk()` function. For example:
import os
# Walk through the directory tree
for root, dirs, files in os.walk('C:/Users/UserName'):
    print(f"Directory: {root}")
    print(f"Subdirectories: {dirs}")
    print(f"Files: {files}")
    print()
The `os.walk()` function generates the following output:
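The exact text printed depends on the platform's path separator and on the order in which the operating system lists entries. As a rough, self-contained illustration, the sketch below rebuilds the example layout in a temporary directory (an assumption made so it can run anywhere) and records what `os.walk()` yields, using paths relative to the starting folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Rebuild the example layout under a temporary folder
    os.makedirs(os.path.join(tmp, "Documents"))
    os.makedirs(os.path.join(tmp, "folder"))
    with open(os.path.join(tmp, "Documents", "file.txt"), "w") as f:
        f.write("one")
    with open(os.path.join(tmp, "folder", "file2.txt"), "w") as f:
        f.write("two")

    visited = []
    for root, dirs, files in os.walk(tmp):
        dirs.sort()  # sorting in place fixes the order of the top-down walk
        relative_root = os.path.relpath(root, tmp)
        visited.append((relative_root, list(dirs), sorted(files)))

for relative_root, dirs, files in visited:
    print(f"Directory: {relative_root}")
    print(f"Subdirectories: {dirs}")
    print(f"Files: {files}")
    print()
```

The first tuple describes the starting folder itself (shown as `.`), followed by one tuple for each subdirectory.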
20.14 Working with JSON Files
The `json` module in Python allows you to work with JSON (JavaScript Object Notation) files. JSON is a lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate.
You can read JSON data from a file using the `json.load()` function, which reads the JSON data from the file and returns the corresponding Python object (a dictionary for a JSON object). For example, consider the following JSON data in a file named `config.json`, where you have stored the configuration settings of a simulation:
{
    "name": "My Simulation",
    "steps": 1000,
    "interval": 0.1,
    "parameters": {
        "temperature": 25,
        "pressure": 1.0
    }
}
You can read this JSON data from the file using the following Python code:
import json
# Open a JSON file for reading
with open('config.json', 'r') as file:
    # Load the JSON data
    config = json.load(file)
# Print the data
print(config)
You can then access the data in the resulting dictionary as needed, for example `config["steps"]` or `config["parameters"]["temperature"]`.
You can also write JSON data to a file using the `json.dump()` function. For example, consider the following list of dictionaries that you want to write to a JSON file:
data = [
    {"solution_id": 1, "runtime": 10.5, "method": "Heuristic", "score": 70},
    {"solution_id": 2, "runtime": 15.2, "method": "Exact", "score": 100},
    {"solution_id": 3, "runtime": 12.8, "method": "Approximate", "score": 90}
]
You can write this data to a JSON file named `results.json` using the following Python code:
import json
# Open a JSON file for writing
with open('results.json', 'w') as file:
    # Write the JSON data
    json.dump(data, file, indent=4)
With `indent=4`, the `json.dump()` function writes the JSON data to the file in a human-readable format with an indentation of 4 spaces. You can adjust the indentation level as needed; without `indent`, the data is written on a single line.
20.14.1 Working with JSON Data as Strings
The `loads()` and `dumps()` functions in the `json` module are used to work with JSON data as strings. The `loads()` function parses a JSON string and returns a Python object, while the `dumps()` function converts a Python object to a JSON string. For example:
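A minimal sketch of the round trip, using a shortened version of the configuration shown earlier as the input string:

```python
import json

text = '{"name": "My Simulation", "steps": 1000, "interval": 0.1}'

config = json.loads(text)     # JSON string -> Python dictionary
print(config["steps"])        # 1000

config["steps"] = 2000
encoded = json.dumps(config)  # Python dictionary -> JSON string
print(encoded)
```

No file is involved here, which makes these two functions useful when JSON arrives from somewhere else, such as a network response.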
20.15 CSV, Excel, and Other File Formats
In addition to JSON files, Python can read and write data in various other file formats, such as CSV (Comma-Separated Values), Excel, and text files. Search for the appropriate Python libraries to work with these file formats, such as:
- `csv` or `pandas` for reading and writing CSV files. Useful for tabular data.
- `openpyxl` or `pandas` for reading and writing Excel files. Useful for spreadsheets.
- `sqlite3` for working with SQLite databases. Useful for storing structured data and querying it using SQL.
- `yaml` (provided by the PyYAML package) for reading and writing YAML (YAML Ain't Markup Language, originally Yet Another Markup Language) files. Useful for configuration files.
- `pickle` for serializing and deserializing Python objects (not human-readable, but efficient for storing data). Serializing is the process of converting a Python object into a byte stream, and deserializing is the process of converting the byte stream back into a Python object.
- `pyarrow` for reading and writing Apache Parquet files. Parquet is a columnar storage file format that is efficient for analytics and data processing used in big data systems.
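As one example from this list, the standard-library `csv` module can write and read tabular data without any extra installation. The sketch below is an illustration only; the rows reuse a few fields from the JSON example, and the file name `results.csv` is an assumption.

```python
import csv
import os
import tempfile

rows = [
    {"solution_id": 1, "runtime": 10.5, "method": "Heuristic"},
    {"solution_id": 2, "runtime": 15.2, "method": "Exact"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "results.csv")

    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="") as file:
        writer = csv.DictWriter(
            file, fieldnames=["solution_id", "runtime", "method"]
        )
        writer.writeheader()
        writer.writerows(rows)

    with open(path, "r", newline="") as file:
        loaded = list(csv.DictReader(file))

print(loaded)
```

Note that every value comes back as a string (for example `'10.5'`), so numeric columns need to be converted after reading.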