Practice: Read corrupted files

This practical extends the parsing exercises done previously in Read/write files and functions (very basic). You will practice the following:

  • write code in scripts,

  • use ipython and execute Python programs with the command python3,

  • use objects of simple types (numbers, str, list, etc.),

  • index and slice,

  • use loops and conditions,

  • handle errors with try and except,

  • read and write in text files.

We will write scripts that read a file (or a set of files) with a predefined format and compute simple quantities (sum, average, number) from the values in the files.

Exercise 15 (Parse a file with comments: file_with_comment_col0.txt)

Unlike in the previous exercises, the file contains some comments (i.e. lines starting with a #). Adapt the previous script so that these lines are ignored (see the file file_with_comment_col0.txt).

Example of output

python3 reading_file_with_comment.py
file = ../data/file_with_comment_col0.txt
nb = 100; total = 53.29; avg = 0.53
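
One possible way to ignore comment lines is sketched below. It is only an illustration, not the reference solution: it assumes that each data line holds a single number, and the function name compute_stats and the in-memory sample are hypothetical.

```python
def compute_stats(lines):
    """Return (nb, total, avg) for an iterable of text lines.

    Assumption: each data line holds exactly one number.
    Blank lines and lines starting with '#' are ignored.
    """
    nb = 0
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        total += float(line)
        nb += 1
    # Avoid a division by zero when no value was found.
    avg = total / nb if nb else float("nan")
    return nb, total, avg


# Hypothetical sample standing in for the content of a data file.
sample = ["# a comment", "0.5", "1.5", "", "# another comment", "1.0"]
nb, total, avg = compute_stats(sample)
print(f"nb = {nb}; total = {total:.2f}; avg = {avg:.2f}")
# prints: nb = 3; total = 3.00; avg = 1.00
```

Because an open text file is itself an iterable of lines, the same function can be called on a file object inside a with open(...) block.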

To complicate things further, another file contains comments in the middle of a line (see e.g. file_with_comment_anywhere.txt). These comments mainly break the conversion from string to float.

Adapt the script reading_file_with_comment.py to handle this format.

Example of output

python3 step1.1.py
file = "../data/file_with_comment_col0.txt"
nb = 100  ; sum = 53.29  ; avg = 0.53
file = "../data/file_with_comment_anywhere.txt"
nb = 96   ; sum = 51.65  ; avg = 0.54
# total over all files:
nb = 196  ; sum = 104.93 ; avg = 0.54
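
A comment in the middle of a line can be handled by keeping only the text before the first # and letting try/except catch whatever still cannot be converted. The sketch below assumes that the number, when present, comes before the comment; parse_value and the sample lines are hypothetical names and data used for illustration only.

```python
def parse_value(line):
    """Return the float found before any '#' in line, or None if there is none."""
    # Keep only what precedes the first '#', then remove surrounding spaces.
    text = line.split("#", 1)[0].strip()
    try:
        return float(text)
    except ValueError:
        # Empty string (whole-line comment) or text that is not a number.
        return None


# Hypothetical sample lines mixing values and comments.
sample_lines = [
    "0.25",
    "0.75 # trailing comment",
    "# whole-line comment",
    "0.5# no space before the comment",
    "abc",
]
values = [v for v in map(parse_value, sample_lines) if v is not None]
nb = len(values)
total = sum(values)
print(f"nb = {nb}; sum = {total:.2f}; avg = {total / nb:.2f}")
# prints: nb = 3; sum = 1.50; avg = 0.50
```

Returning None rather than raising keeps the calling loop simple. Whether unreadable lines should be silently skipped or reported is a design choice left to the exercise.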

Exercise 16 (Parse more complicated files)

As a last exercise, we now have to deal with several columns on each line, for example:

p1=0.7742 p2=0.74973 p3=0.77751
p1=0.7493 p2=0.34762 p3=0.44521
p1=0.4261 p3=0.88275 p2=0.74016

  • Write a function that computes statistics separately for p1, p2 and p3.

  • BONUS: When parsing file_mut_cols_with_error.txt, print the lines that contain errors to the screen.

Example of output

python3 step2.0.py
p1 in ../data/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
p2 in ../data/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
p3 in ../data/file_mut_cols.txt - nb: 25, sum: 12.72, avg: 0.51
Unexpected field p7=0.213607026802 at line 23 of ../data/file_mut_cols_with_error.txt
p1 in ../data/file_mut_cols_with_error.txt - nb: 25, sum: 12.82, avg: 0.51
p2 in ../data/file_mut_cols_with_error.txt - nb: 23, sum: 11.35, avg: 0.49
p3 in ../data/file_mut_cols_with_error.txt - nb: 23, sum: 11.69, avg: 0.51
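
The key=value format can be parsed by splitting each line on whitespace and each field on the = sign. The sketch below is one possible approach under stated assumptions: fields are separated by spaces, only p1, p2 and p3 are expected, and the names parse_fields and sample are hypothetical. The message it prints is also illustrative and does not reproduce the expected output above exactly.

```python
def parse_fields(lines, expected=("p1", "p2", "p3")):
    """Collect the values of each expected key from lines of 'key=value' fields.

    Returns (values, errors): values maps each key to a list of floats,
    errors is a list of (line_number, field) pairs that could not be used.
    """
    values = {key: [] for key in expected}
    errors = []
    for line_number, line in enumerate(lines, start=1):
        for field in line.split():
            key, sep, text = field.partition("=")
            if not sep or key not in values:
                # No '=' sign, or a key that is not expected.
                errors.append((line_number, field))
                continue
            try:
                values[key].append(float(text))
            except ValueError:
                # The value part is not a valid number.
                errors.append((line_number, field))
    return values, errors


# Hypothetical sample; the order of the fields may change from line to line.
sample = [
    "p1=0.7742 p2=0.74973 p3=0.77751",
    "p1=0.4261 p3=0.88275 p2=0.74016",
    "p1=0.5 p7=0.2 p3=oops",
]
values, errors = parse_fields(sample)
for line_number, field in errors:
    print(f"Cannot use field {field} at line {line_number}")
for key, numbers in values.items():
    total = sum(numbers)
    print(f"{key} - nb: {len(numbers)}, sum: {total:.2f}, avg: {total / len(numbers):.2f}")
```

Storing the values in a dictionary keyed by field name means the order of the fields on a line does not matter, which suits lines such as the third one in the example above.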