How to Automate File Processing with Python: Practical Scripts and Tips

天天向上

Published: 2025-01-12 10:10:18

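Python is well suited to file processing: its standard library and third-party packages cover reading, writing, converting, searching, and batch-processing files. This article shows how to automate file handling with Python and collects some common file-operation scripts and tips.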

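I. Basic File Operations in Python

In Python, file handling relies mainly on the built-in open() function together with the standard-library modules os, shutil, pathlib, and glob. Below are some basic examples.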

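1. Opening and closing files

open() opens a file. By default it opens in text mode for reading; other modes, such as binary or append, can be requested through the mode argument.

# Open the file in read mode
file = open('example.txt', 'r')

# Read the entire file
content = file.read()
print(content)

# Close the file
file.close()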

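Prefer the with statement for managing file resources. It closes the file automatically when the block exits, even if an exception is raised, so a forgotten close() cannot leak the file handle: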

with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
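
One detail the examples above leave implicit: in text mode, open() falls back to a platform-dependent default encoding when none is given. The following is a minimal sketch that passes the encoding explicitly, assuming the file is UTF-8. The helper name read_text_utf8 and the demo file are hypothetical and only serve the illustration.

```python
import os
import tempfile


def read_text_utf8(path):
    """Return the full contents of a text file, decoding it as UTF-8."""
    # Passing encoding explicitly avoids depending on the platform default.
    with open(path, 'r', encoding='utf-8') as file:
        return file.read()


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'example.txt')
    with open(demo_path, 'w', encoding='utf-8') as file:
        file.write('café\n')
    # ascii() escapes non-ASCII characters, so printing works on any console.
    print(ascii(read_text_utf8(demo_path)))
```

If the real encoding of a file is unknown, this sketch may raise UnicodeDecodeError; how to handle that depends on the data.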

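2. Writing to files

open() can also open a file for writing. Mode 'w' truncates the file (creating it if it does not exist), and mode 'a' appends to the end.

# Write to the file (overwrites any existing content)
with open('output.txt', 'w') as file:
    file.write('Hello, World!\n')

# Append to the file (existing content is kept)
with open('output.txt', 'a') as file:
    file.write('Appending some text.\n')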

3. Reading a file line by line

with open('example.txt', 'r') as file:
    for line in file:
        print(line.rstrip('\n'))  # strip only the trailing newline

4. Reading a file into a list

with open('example.txt', 'r') as file:
    lines = file.readlines()  # list of lines; each keeps its trailing newline
print(lines)
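
Because readlines() keeps the newline at the end of each line, a common variation is to read the whole file and split it instead. A minimal sketch, assuming a UTF-8 file small enough to fit in memory; read_lines is a hypothetical helper name.

```python
import os
import tempfile


def read_lines(path, encoding='utf-8'):
    """Return the lines of a text file without their trailing newlines."""
    with open(path, 'r', encoding=encoding) as file:
        return file.read().splitlines()


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'example.txt')
    with open(demo_path, 'w', encoding='utf-8') as file:
        file.write('first\nsecond\nthird\n')
    print(read_lines(demo_path))  # ['first', 'second', 'third']
```

Note that str.splitlines() also splits on a few less common line-boundary characters, so it is not always identical to splitting on '\n'.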

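II. Managing Files and Directories with the `os` and `pathlib` Modules

1. Creating files and directories

os.makedirs() creates a directory (along with any missing parents), and opening a path in write mode creates the file:

import os

# Create the directory (no error if it already exists)
os.makedirs('new_folder', exist_ok=True)

# Create a file and write some text to it
with open('new_folder/example.txt', 'w') as file:
    file.write('This is a new file.')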

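pathlib is a more modern, object-oriented way to work with paths. Here is the same task using pathlib:

from pathlib import Path

# Create the directory (and any missing parents)
Path('new_folder').mkdir(parents=True, exist_ok=True)

# Create a file and write some text to it
file_path = Path('new_folder/example.txt')
file_path.write_text('This is a new file created using pathlib.')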
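
As a small extension of the pathlib example, the sketch below builds the path with the / operator and reads the text back. The function name create_note and the temporary base directory are assumptions made for illustration only.

```python
from pathlib import Path
import tempfile


def create_note(base_dir, text='This is a new file.'):
    """Create base_dir/new_folder/example.txt and return its Path."""
    # The / operator joins path components portably across platforms.
    file_path = Path(base_dir) / 'new_folder' / 'example.txt'
    file_path.parent.mkdir(parents=True, exist_ok=True)
    file_path.write_text(text, encoding='utf-8')
    return file_path


with tempfile.TemporaryDirectory() as tmp_dir:
    note = create_note(tmp_dir)
    print(note.name, note.suffix)            # example.txt .txt
    print(note.read_text(encoding='utf-8'))  # This is a new file.
```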

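2. Deleting files and directories

Use os.remove() to delete a file, os.rmdir() to delete an empty directory, and shutil.rmtree() to delete a directory recursively along with its contents.

import os
import shutil

# Delete a file
os.remove('new_folder/example.txt')

# Delete an empty directory (raises OSError if it is not empty)
os.rmdir('new_folder')

# Alternatively, delete a directory together with everything inside it.
# Use this instead of os.rmdir(), not after it: once the directory is
# gone, rmtree() raises FileNotFoundError.
# shutil.rmtree('new_folder')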
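
The deletion calls above raise an error when the target is missing. One way to make cleanup scripts safe to rerun is sketched below; remove_path is a hypothetical helper, and the missing_ok argument of Path.unlink() assumes Python 3.8 or later.

```python
from pathlib import Path
import shutil
import tempfile


def remove_path(path):
    """Delete a file or a directory tree; do nothing if it doesn't exist."""
    path = Path(path)
    if path.is_dir() and not path.is_symlink():
        shutil.rmtree(path)  # removes the directory and everything in it
    else:
        # missing_ok=True (Python 3.8+) suppresses FileNotFoundError
        path.unlink(missing_ok=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    folder = Path(tmp_dir) / 'new_folder'
    folder.mkdir()
    (folder / 'example.txt').write_text('temporary', encoding='utf-8')
    remove_path(folder)     # non-empty directory
    remove_path(folder)     # already gone: no error
    print(folder.exists())  # False
```

Recursive deletion is irreversible, so it is worth double-checking the path before calling a helper like this on real data.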

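3. Checking whether a file or directory exists

import os

# Check whether the path exists. os.path.exists() is true for files and
# directories alike; use os.path.isfile() to require a regular file.
if os.path.exists('example.txt'):
    print("File exists.")
else:
    print("File does not exist.")

# Check whether the path is an existing directory
if os.path.isdir('new_folder'):
    print("Directory exists.")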

pathlib offers equivalent checks:

from pathlib import Path

file_path = Path('example.txt')
if file_path.exists():
    print("File exists.")
else:
    print("File does not exist.")
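
Since exists() does not distinguish files from directories, a script often needs the more specific is_file() and is_dir() checks. A minimal sketch follows; describe_path and its return labels are hypothetical names chosen for this example.

```python
from pathlib import Path
import tempfile


def describe_path(path):
    """Classify a path as 'file', 'directory', 'other', or 'missing'."""
    path = Path(path)
    if path.is_file():
        return 'file'
    if path.is_dir():
        return 'directory'
    if path.exists():
        return 'other'  # e.g. a socket or device node
    return 'missing'


with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    (base / 'example.txt').write_text('x', encoding='utf-8')
    print(describe_path(base / 'example.txt'))  # file
    print(describe_path(base))                  # directory
    print(describe_path(base / 'nope.txt'))     # missing
```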

III. Batch Processing: Traversing Folders and Operating on Files

1. Listing the files in a directory

import os

for filename in os.listdir('path_to_directory'):
    if filename.endswith('.txt'):
        print(filename)

pathlib can do the same:

from pathlib import Path

for file in Path('path_to_directory').glob('*.txt'):
    print(file.name)

2. Walking all subdirectories and files

Use os.walk() to traverse a directory tree recursively:

import os

for dirpath, dirnames, filenames in os.walk('path_to_directory'):
    print(f"Directory: {dirpath}")
    for filename in filenames:
        print(f"    {filename}")

pathlib's rglob() method also descends into all subdirectories:

from pathlib import Path

for file in Path('path_to_directory').rglob('*.txt'):
    print(file.name)
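
Printing only file.name hides which subdirectory each match came from. The sketch below returns paths relative to the starting directory instead; find_files is a hypothetical helper, and the small directory tree is created only for the demonstration.

```python
from pathlib import Path
import tempfile


def find_files(root, pattern='*.txt'):
    """Return matching files under root as sorted paths relative to root."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob(pattern)
        if path.is_file()  # skip directories whose names happen to match
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    (base / 'sub' / 'deep').mkdir(parents=True)
    (base / 'a.txt').write_text('a', encoding='utf-8')
    (base / 'sub' / 'b.txt').write_text('b', encoding='utf-8')
    (base / 'sub' / 'deep' / 'c.log').write_text('c', encoding='utf-8')
    print(find_files(base))  # ['a.txt', 'sub/b.txt']
```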

3. Batch renaming files

import os

for filename in os.listdir('path_to_directory'):
    if filename.endswith('.txt'):
        old_path = os.path.join('path_to_directory', filename)
        new_path = os.path.join('path_to_directory', f'new_{filename}')
        os.rename(old_path, new_path)
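
Running the loop above twice would prefix the same files again, and os.rename() can overwrite an existing target on some platforms. The sketch below adds a preview mode and skips those cases. The function add_prefix and its dry_run parameter are assumptions introduced for this example, not part of any library.

```python
from pathlib import Path
import tempfile


def add_prefix(directory, prefix='new_', suffix='.txt', dry_run=True):
    """Prefix matching file names; return the (old, new) name pairs planned."""
    planned = []
    # sorted() snapshots the listing so renames don't affect the iteration.
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.suffix != suffix:
            continue
        if path.name.startswith(prefix):
            continue  # already renamed on an earlier run
        target = path.with_name(prefix + path.name)
        if target.exists():
            continue  # never overwrite an existing file
        planned.append((path.name, target.name))
        if not dry_run:
            path.rename(target)
    return planned


with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('a.txt', 'b.txt', 'notes.log'):
        (Path(tmp_dir) / name).write_text('', encoding='utf-8')
    print(add_prefix(tmp_dir))                 # preview only
    print(add_prefix(tmp_dir, dry_run=False))  # actually rename
    print(sorted(p.name for p in Path(tmp_dir).iterdir()))
```

Reviewing the dry-run output before renaming for real is a cheap safeguard when the directory holds data you cannot easily restore.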

4. Batch copying files

import shutil
import os

src_dir = 'path_to_directory'
dst_dir = 'path_to_backup_directory'

# Make sure the destination directory exists
os.makedirs(dst_dir, exist_ok=True)

for filename in os.listdir(src_dir):
    if filename.endswith('.txt'):
        src = os.path.join(src_dir, filename)
        dst = os.path.join(dst_dir, filename)
        shutil.copy(src, dst)
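
The loop above copies only the files directly inside src_dir. If the backup should also cover subdirectories and keep their layout, one possible approach is sketched below. backup_files is a hypothetical helper, and shutil.copy2 is swapped in for shutil.copy because it also tries to preserve file metadata such as timestamps.

```python
from pathlib import Path
import shutil
import tempfile


def backup_files(src_dir, dst_dir, pattern='*.txt'):
    """Copy matching files from src_dir into dst_dir, keeping subfolders."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    copied = []
    # sorted() collects all matches before any copying starts.
    for src in sorted(src_dir.rglob(pattern)):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also tries to preserve timestamps
        copied.append(dst)
    return copied


with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / 'path_to_directory'
    backup = Path(tmp_dir) / 'path_to_backup_directory'
    (source / 'reports').mkdir(parents=True)
    (source / 'a.txt').write_text('a', encoding='utf-8')
    (source / 'reports' / 'b.txt').write_text('b', encoding='utf-8')
    for path in backup_files(source, backup):
        print(path.relative_to(backup).as_posix())
```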

5. Batch deleting files

import os

for filename in os.listdir('path_to_directory'):
    if filename.endswith('.log'):
        os.remove(os.path.join('path_to_directory', filename))

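IV. Working with File Contents: Search and Replace

1. Searching a file for a string

with open('example.txt', 'r') as file:
    lines = file.readlines()

for line in lines:
    if 'search_term' in line:
        print(line, end='')  # the line already ends with a newline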
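
When searching, it is often useful to know where each match occurs. The following sketch returns line numbers alongside the matching text and reads the file lazily rather than loading every line first. find_lines is a hypothetical helper, and UTF-8 is an assumed encoding.

```python
import os
import tempfile


def find_lines(path, term, encoding='utf-8'):
    """Return (line_number, text) pairs for lines that contain term."""
    matches = []
    with open(path, 'r', encoding=encoding) as file:
        # enumerate(..., start=1) gives human-friendly 1-based line numbers
        for number, line in enumerate(file, start=1):
            if term in line:
                matches.append((number, line.rstrip('\n')))
    return matches


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'example.txt')
    with open(demo_path, 'w', encoding='utf-8') as file:
        file.write('alpha\nsearch_term here\nbeta\nanother search_term\n')
    for number, text in find_lines(demo_path, 'search_term'):
        print(f'{number}: {text}')
```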

2. Replacing text in a file

with open('example.txt', 'r') as file:
    content = file.read()

content = content.replace('old_text', 'new_text')

with open('example.txt', 'w') as file:
    file.write(content)
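
Reopening the file in 'w' mode truncates it immediately, so a crash in the middle of the write can leave it empty or partial. A more defensive variant, sketched here under the assumption of a UTF-8 file that fits in memory, writes to a temporary file in the same directory and then swaps it into place with os.replace(). The helper name replace_in_file is hypothetical.

```python
import os
import tempfile


def replace_in_file(path, old, new, encoding='utf-8'):
    """Replace old with new in a text file; return the number of replacements."""
    if not old:
        raise ValueError('old must be a non-empty string')
    with open(path, 'r', encoding=encoding) as file:
        content = file.read()
    count = content.count(old)
    if count == 0:
        return 0  # nothing to do; leave the file untouched
    # Write to a temp file in the same directory, then swap it into place.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding=encoding) as tmp_file:
            tmp_file.write(content.replace(old, new))
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file, then re-raise
        raise
    return count


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'example.txt')
    with open(demo_path, 'w', encoding='utf-8') as file:
        file.write('old_text and more old_text\n')
    print(replace_in_file(demo_path, 'old_text', 'new_text'))  # 2
```

One trade-off of this design: the new file is created by mkstemp(), so it may not carry over the permissions of the file it replaces.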

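V. Handling Large Files: Line-by-Line Reading and Memory Management

When a file is too large to fit comfortably in memory, read it line by line instead of loading it all at once. Iterating over the file object inside a with block does exactly that.

1. Reading line by line

def process(line):
    pass  # placeholder: put your own per-line logic here

with open('large_file.txt', 'r') as file:
    for line in file:
        process(line)  # handle one line at a time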

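2. Reading a file in chunks

For very large files, especially ones without regular line breaks, you can read fixed-size chunks, for example a set number of characters at a time:

def process(chunk):
    pass  # placeholder: put your own per-chunk logic here

def process_file_in_chunks(file_name, chunk_size=1024):
    with open(file_name, 'r') as file:
        # In text mode, read(n) returns up to n characters, not bytes.
        # The := operator requires Python 3.8 or later.
        while chunk := file.read(chunk_size):
            process(chunk)

process_file_in_chunks('large_file.txt', 2048)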
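
Chunked reading is most often done in binary mode, where read(n) returns up to n bytes. As one concrete use, the sketch below computes a SHA-256 checksum of a file without loading it into memory. The function name sha256_of_file and the 64 KiB default chunk size are choices made for this example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in binary chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as file:  # 'rb': read(n) returns up to n bytes
        while chunk := file.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'large_file.bin')
    with open(demo_path, 'wb') as file:
        file.write(b'0123456789' * 100_000)  # about 1 MB of sample data
    print(sha256_of_file(demo_path))
```

Memory use stays bounded by chunk_size no matter how large the file is, which is the point of the chunked approach.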

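VI. Summary

Python offers a rich toolkit for automating file processing. With the os, shutil, and pathlib modules you can read, write, rename, and delete files and directories, and line-by-line or chunked reading keeps memory use bounded on large files. These scripts and techniques can make file handling considerably faster and reduce the errors and time that manual work costs.

Whether you are working on a single file or batch-processing an entire directory tree, Python makes it straightforward to automate the job.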