Practice: Read corrupted files

This practical is an extension to the parsing exercises done previously in Read/write files and functions (very basic). You will practice the following:

  • write code in scripts,

  • use ipython and execute Python programs with the command python3,

  • use objects of simple types (numbers, str, list, etc.),

  • index and slice,

  • use loops and conditions,

  • handle errors with try and except,

  • read and write in text files.

We will write scripts that read a file (or a set of files) with a predefined format and compute simple quantities (number of values, sum, average) from the values in the files.

Exercise 22 (Parse a file with comments: file_with_comment_col0.txt)

Contrary to the previous exercises, the file contains comments (i.e. lines starting with a #). Adapt the previous script so that these lines are ignored (see the file file_with_comment_col0.txt). You can use the skeleton for this problem: /common/data_read_files/your_solutions/treat_files_comments.py.
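The filtering step could look like the minimal sketch below. It assumes one float per data line and works on a list of strings rather than on a real file. The function name `sum_non_comment_lines` and the sample lines are hypothetical and are not taken from the skeleton.

```python
def sum_non_comment_lines(lines):
    """Return (size, total) over the lines that do not start with '#'."""
    size = 0
    total = 0.0
    for line in lines:
        if line.startswith("#"):
            continue  # whole-line comment: ignore it
        total += float(line)  # float() tolerates the trailing newline
        size += 1
    return size, total


# Hypothetical sample standing in for the content of a file.
sample = ["# a comment\n", "0.25\n", "0.5\n", "# another comment\n", "0.25\n"]
size, total = sum_non_comment_lines(sample)
print(f"size = {size}; total = {total:.2f}; avg = {total / size:.2f}")
```

Since iterating over an open file also yields one string per line, the same function should work when given a file object instead of the list.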

Example of output

python3 reading_file_with_comment.py
file = ../file_with_comment_col0.txt
size = 100; total = 53.29; avg = 0.53

To complicate things further, another file contains comments in the middle of a line (see e.g. file_with_comment_anywhere.txt). These trailing comments make the string-to-float conversion fail.

Adapt the script so that the comments are not taken into account.
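One possible way to handle inline comments is to cut each line at the first "#" before converting it. The sketch below is only an illustration: the helper names `strip_comment` and `to_float_or_none` are hypothetical. Returning `None` for lines without a usable number is a design choice made here, not something the exercise requires.

```python
def strip_comment(line):
    """Return the text before the first '#', without surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def to_float_or_none(line):
    """Convert the non-comment part of a line to float, or return None."""
    text = strip_comment(line)
    if not text:
        return None  # blank line or comment-only line
    try:
        return float(text)
    except ValueError:
        return None  # what remains is not a valid number


# Hypothetical lines illustrating the different cases.
for raw in ["0.42 # trailing comment\n", "# full-line comment\n", "0.5#glued\n", "oops\n"]:
    print(repr(raw), "->", to_float_or_none(raw))
```

Catching only `ValueError` keeps other problems (for example a missing file) visible instead of silently hiding them.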

Example of output

python3 step1.1.py
file = "../file_with_comment_col0.txt"
size = 100  ; sum = 53.29  ; avg = 0.53
file = "../file_with_comment_anywhere.txt"
size = 96   ; sum = 51.65  ; avg = 0.54
# total over all files:
size = 196  ; sum = 104.93 ; avg = 0.54

Solution to Exercise 22 (Parse a file with comments: file_with_comment_col0.txt)

#!/usr/bin/python3
"""computes basic statistics (size, sum and average) on files
containing lines with one float and possibly some comments in
the middle of the line.
"""


def compute_stats(path):
    """
    computes the statistics of data in a file.

    :param path: the name of the file to process
    :type path: str
    :return: the statistics
    :rtype: a tuple (size, sum, average)
    """
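    sum_ = 0.0
    size = 0
    with open(path, encoding="utf-8") as file:
        for line in file:
            # drop everything after the first "#" (whole-line or inline comment)
            line = line.split("#", 1)[0].strip()
            if not line:
                # comment-only or blank line: nothing to convert
                continue
            elem = float(line)
            sum_ += elem
            size += 1

    # avoid a ZeroDivisionError when the file contains no data line
    average = sum_ / size if size else float("nan")
    return size, sum_, average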


file_paths = [
    "../file_with_comment_col0.txt",
    "../file_with_comment_anywhere.txt",
]

sizes = []
sums = []

for file_path in file_paths:
    len_file, sum_file, avg_file = compute_stats(file_path)
    sizes.append(len_file)
    sums.append(sum_file)

    # left-align the values so the columns match the example output
    print(
        f'file = "{file_path}"\nsize = {len_file:<5}; '
        f"sum = {sum_file:<7.2f}; avg = {avg_file:.2f}"
    )


all_sum = sum(sums)
all_size = sum(sizes)
all_avg = all_sum / all_size
print(
    "# total over all files:\n"
    f"size = {all_size:<5}; sum = {all_sum:<7.2f}; avg = {all_avg:.2f}"
)

Exercise 23 (Parse more complicated files)

As a last exercise, we now have to deal with several columns on each line (note that the fields do not always appear in the same order):

p1=0.7742 p2=0.74973 p3=0.77751
p1=0.7493 p2=0.34762 p3=0.44521
p1=0.4261 p3=0.88275 p2=0.74016

  • Write a function that computes statistics separately for p1, p2 and p3.

  • BONUS: When parsing file_mut_cols_with_error.txt, print the lines that contain errors to the screen.
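A possible starting point for this exercise is sketched below; it is not the reference solution. It assumes that fields are separated by whitespace and written as `key=value`, and that only p1, p2 and p3 are valid keys. The names `EXPECTED_KEYS`, `parse_line` and `compute_column_stats`, as well as the sample lines, are hypothetical.

```python
EXPECTED_KEYS = ("p1", "p2", "p3")  # assumption: the only valid column names


def parse_line(line, expected_keys=EXPECTED_KEYS):
    """Split a line into {key: float} plus a list of fields that are invalid."""
    values = {}
    errors = []
    for field in line.split():
        key, sep, raw = field.partition("=")
        if not sep or key not in expected_keys:
            errors.append(field)  # no "=" or unknown key
            continue
        try:
            values[key] = float(raw)
        except ValueError:
            errors.append(field)  # value is not a number
    return values, errors


def compute_column_stats(lines):
    """Return {key: [size, sum]} accumulated over all lines."""
    stats = {key: [0, 0.0] for key in EXPECTED_KEYS}
    for number, line in enumerate(lines, start=1):
        values, errors = parse_line(line)
        for field in errors:
            print(f"Unexpected field {field} at line {number}")
        for key, value in values.items():
            stats[key][0] += 1
            stats[key][1] += value
    return stats


# Hypothetical sample: fields in varying order and one unknown key.
sample = [
    "p1=0.5 p2=0.25 p3=1.0",
    "p1=0.5 p3=0.5 p2=0.25",
    "p1=1.0 p7=0.1 p3=0.0",
]
stats = compute_column_stats(sample)
for key, (size, total) in stats.items():
    avg = total / size if size else float("nan")
    print(f"{key} - size: {size}, sum: {total:.2f}, avg: {avg:.2f}")
```

Storing the values in a dictionary keyed by column name makes the order of the fields on a line irrelevant, and a line with one bad field still contributes its valid fields.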

Example of output

python3 step2.0.py
p1 in ../file_mut_cols.txt - size: 25, sum: 12.72, avg: 0.51
p2 in ../file_mut_cols.txt - size: 25, sum: 12.72, avg: 0.51
p3 in ../file_mut_cols.txt - size: 25, sum: 12.72, avg: 0.51
Unexpected field p7=0.213607026802 at line 23 of ../file_mut_cols_with_error.txt
p1 in ../file_mut_cols_with_error.txt - size: 25, sum: 12.82, avg: 0.51
p2 in ../file_mut_cols_with_error.txt - size: 23, sum: 11.35, avg: 0.49
p3 in ../file_mut_cols_with_error.txt - size: 23, sum: 11.69, avg: 0.51