Practice: Read corrupted files
This practical is an extension of the parsing exercises done previously in
Read/write files and functions (very basic). You will practice the following:
write code in scripts,
use ipython and execute Python programs with the command python3,
use objects of simple types (numbers, str, list, etc.),
index and slice,
use loops and conditions,
try, except,
read and write in text files.
We will write scripts that read a file (or a set of files) with a predefined format and compute simple quantities (sum, average, number) from the values in the files.
Exercise 15 (Parse a file with comments: file_with_comment_col0.txt)
Contrary to the previous exercises, the files contain some comments (i.e. lines
starting with a #). Adapt the previous script so that it ignores these lines (see
file file_with_comment_col0.txt).
Example of output
python3 reading_file_with_comment.py
file = ../data/file_with_comment_col0.txt
nb = 100; total = 53.29; avg = 0.53
To complicate things further, another file contains comments in the middle of the line
(see e.g. file_with_comment_anywhere.txt, which contains comments that mainly
prevent the string-to-float conversion).
Adapt the script reading_file_with_comment to handle this format.
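As a hint, the idea of cutting a line at the first `#` before converting it can be tried on a few in-memory lines. This is only a sketch: the helper name `parse_value` and the sample lines are assumptions made for illustration and are not taken from the data files.

```python
def parse_value(line):
    """Return the float found before any '#' comment, or None if there is none."""
    # Keep only the text before the first '#', then drop surrounding whitespace.
    text = line.split("#", 1)[0].strip()
    if not text:
        # The line was empty or contained only a comment.
        return None
    return float(text)


# Hypothetical sample lines, not the content of the course data files.
sample_lines = ["0.25\n", "# a full-line comment\n", "0.75 # trailing comment\n", "\n"]
values = [parse_value(line) for line in sample_lines]
print(values)  # [0.25, None, 0.75, None]
```

Returning `None` for comment-only lines lets the caller decide whether to skip them or report them.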
Example of output
python3 step1.1.py
file = "../data/file_with_comment_col0.txt"
nb = 100 ; sum = 53.29 ; avg = 0.53
file = "../data/file_with_comment_anywhere.txt"
nb = 96 ; sum = 51.65 ; avg = 0.54
# total over all files:
nb = 196 ; sum = 104.93 ; avg = 0.54
Solution to Exercise 15 (Parse a file with comments: file_with_comment_col0.txt)
#!/usr/bin/env python3
"""Computes basic statistics on file that contains a set of lines, each line
containing one float and possibly some comments in the middle of the line.
"""


def compute_stats(file_name):
    """
    computes the statistics of data in file_name
    :param file_name: the name of the file to process
    :type file_name: str
    :return: the statistics
    :rtype: a tuple (number, sum, average)
    """
    sum_ = 0.0
    number = 0
    with open(file_name) as file:
        for line in file:
            if line.startswith('#'):
                continue
            if '#' in line:
                line = line.split('#', 1)[0]
            elem = float(line)
            sum_ += elem
            number += 1
    return number, sum_, float(sum_ / number)


base_path = '../common/data_read_files'
file_names = [
    f"{base_path}/file_with_comment_col0.txt",
    f"{base_path}/file_with_comment_anywhere.txt",
]
numbers = []
sums = []
for file_name in file_names:
    len_file, sum_file, avg_file = compute_stats(file_name)
    numbers.append(len_file)
    sums.append(sum_file)
    print(
        f'file = "{file_name}"\nnb = {len_file:5}; '
        f"sum = {sum_file:7.2f}; avg = {sum_file / len_file:5.2f}"
    )
all_sum = sum(sums)
all_numbers = sum(numbers)
all_avg = all_sum/all_numbers
print('# total over all files:\n'
      f'nb = {all_numbers}; sum = {all_sum:.2f}; avg = {all_avg:.2f}')
file = "../common/data_read_files/file_with_comment_col0.txt"
nb = 100; sum = 53.29; avg = 0.53
file = "../common/data_read_files/file_with_comment_anywhere.txt"
nb = 96; sum = 51.65; avg = 0.54
# total over all files:
nb = 196; sum = 104.93; avg = 0.54
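The list of topics for this practical includes try/except, which the solution above does not use. A possible variant, sketched below under the assumption that some lines may still fail to convert after the comment is removed, counts such lines instead of stopping with an error. The function name `compute_stats_tolerant` and the sample text are hypothetical; the function takes any iterable of lines so that it can be tried without the data files.

```python
import io


def compute_stats_tolerant(lines):
    """Return (number, sum, average, skipped) for an iterable of text lines."""
    total = 0.0
    number = 0
    skipped = 0
    for line in lines:
        # Drop everything after the first '#', as in the solution above.
        text = line.split("#", 1)[0].strip()
        if not text:
            continue  # blank line or comment-only line
        try:
            value = float(text)
        except ValueError:
            # Assumption: a line that is not a valid float is counted and ignored.
            skipped += 1
            continue
        total += value
        number += 1
    # Avoid a ZeroDivisionError when no value could be read.
    average = total / number if number else float("nan")
    return number, total, average, skipped


# Hypothetical content; io.StringIO behaves like an open text file.
fake_file = io.StringIO("0.5\n# comment\n1.5 # note\nabc\n\n")
number, total, average, skipped = compute_stats_tolerant(fake_file)
print(f"nb = {number}; sum = {total:.2f}; avg = {average:.2f}; skipped = {skipped}")
# nb = 2; sum = 2.00; avg = 1.00; skipped = 1
```

Catching only `ValueError` keeps other problems, such as a missing file, visible rather than silently ignored.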
Exercise 16 (Parse more complicated files)
As a last exercise, we now have to deal with several columns on each line:
p1=0.7742 p2=0.74973 p3=0.77751
p1=0.7493 p2=0.34762 p3=0.44521
p1=0.4261 p3=0.88275 p2=0.74016
Write a function that computes statistics separately for p1, p2 and p3.
BONUS: When parsing file_mut_cols_with_error.txt, print to the screen the lines
which contain errors.
Example of output
python3 step2.0.py
p1 in ../data/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
p2 in ../data/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
p3 in ../data/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
Unexpected field p7=0.213607026802 at line 23 of ../data/file_mut_cols_with_error.txt
p1 in ../data/file_mut_cols_with_error.txt - nb: 25, sum: 12.82, avg: 0.51
p2 in ../data/file_mut_cols_with_error.txt - nb: 23, sum: 11.35, avg: 0.49
p3 in ../data/file_mut_cols_with_error.txt - nb: 23, sum: 11.69, avg: 0.51
Solution to Exercise 16 (Parse more complicated files)
#!/usr/bin/env python3


def compute_p1p2p3_stats(file_name):
    """
    Computes the statistics of data in a file stored in 3 fields.
    Each field of the form key=val where key is p1, p2 or p3, and val is a float.
    :param file_name: the name of the file to process
    :type file_name: str
    :return: A tuple containing the statistics for each field under the form of a tuple (number, sum, average)
    """
    p1_sum = 0.0
    p2_sum = 0.0
    p3_sum = 0.0
    p1_number = 0
    p2_number = 0
    p3_number = 0
    with open(file_name) as handle:
        for i, line in enumerate(handle):
            if line.startswith("#"):
                continue
            fields = line.strip().split()
            for field in fields:
                if field.startswith("p1="):
                    p1_sum = p1_sum + float(field[3:])
                    p1_number = p1_number + 1
                    continue
                if field.startswith("p2="):
                    p2_sum = p2_sum + float(field[3:])
                    p2_number = p2_number + 1
                    continue
                if field.startswith("p3="):
                    p3_sum = p3_sum + float(field[3:])
                    p3_number = p3_number + 1
                    continue
                print(f"Unexpected field {field} at line {i} of {file_name}")
    return (
        (p1_number, p1_sum, p1_sum / p1_number),
        (p2_number, p2_sum, p2_sum / p2_number),
        (p3_number, p3_sum, p3_sum / p3_number),
    )


base_path = '../common/data_read_files'
file_names = [f"{base_path}/file_mut_cols.txt", f"{base_path}/file_mut_cols_with_error.txt"]
for file_name in file_names:
    p1_result, p2_result, p3_result = compute_p1p2p3_stats(file_name)
    print(
        f"p1 in {file_name} - nb: {p1_result[0]}, sum: {p1_result[1]:.2f}, avg: {p1_result[2]:.2f}"
    )
    print(
        f"p2 in {file_name} - nb: {p2_result[0]}, sum: {p2_result[1]:.2f}, avg: {p2_result[2]:.2f}"
    )
    print(
        f"p3 in {file_name} - nb: {p3_result[0]}, sum: {p3_result[1]:.2f}, avg: {p3_result[2]:.2f}"
    )
p1 in ../common/data_read_files/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
p2 in ../common/data_read_files/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
p3 in ../common/data_read_files/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
Unexpected field p7=0.213607026802 at line 23 of ../common/data_read_files/file_mut_cols_with_error.txt
p1 in ../common/data_read_files/file_mut_cols_with_error.txt - nb: 25, sum: 12.82, avg: 0.51
p2 in ../common/data_read_files/file_mut_cols_with_error.txt - nb: 23, sum: 11.35, avg: 0.49
p3 in ../common/data_read_files/file_mut_cols_with_error.txt - nb: 23, sum: 11.69, avg: 0.51
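The three nearly identical branches in the solution could also be folded into dictionaries keyed by field name. The sketch below is one possible way to do it, not the reference solution of the course: the name `compute_field_stats`, the `expected` parameter, and the sample lines are assumptions, and line numbers are counted from 1 here, whereas the `enumerate(handle)` call in the solution counts from 0.

```python
def compute_field_stats(lines, expected=("p1", "p2", "p3")):
    """Return {key: (number, sum, average)} for lines made of key=value fields."""
    sums = {key: 0.0 for key in expected}
    counts = {key: 0 for key in expected}
    for line_no, line in enumerate(lines, start=1):
        if line.startswith("#"):
            continue
        for field in line.split():
            # partition splits at the first '=': "p1=0.5" -> ("p1", "=", "0.5")
            key, _, raw = field.partition("=")
            if key not in sums:
                print(f"Unexpected field {field} at line {line_no}")
                continue
            sums[key] += float(raw)
            counts[key] += 1
    stats = {}
    for key in expected:
        # Guard against a field that never appears in the input.
        average = sums[key] / counts[key] if counts[key] else float("nan")
        stats[key] = (counts[key], sums[key], average)
    return stats


# Hypothetical sample lines in the format shown in the exercise.
sample = ["p1=0.5 p2=1.0 p3=2.0\n", "p1=1.5 p3=4.0 p7=9.9\n"]
stats = compute_field_stats(sample)
for key, (number, total, average) in stats.items():
    print(f"{key} - nb: {number}, sum: {total:.2f}, avg: {average:.2f}")
```

With this layout, supporting an additional field only requires extending `expected`, at the cost of code that is slightly less explicit for beginners than the three separate branches.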