Practice: Read corrupted files

This practical extends the parsing exercises from Read/write files and functions (very basic). You will practice the following:

  • write code in scripts,

  • use ipython and execute Python programs with the command python3,

  • use objects of simple types (numbers, str, list, etc.),

  • index and slice,

  • use loops and conditions,

  • handle errors with try and except,

  • read and write in text files.

We will write scripts that read a file (or a set of files) with a predefined format and compute simple quantities (sum, average, number) from the values in the files.

Exercise 24 (Parse a file with comments: file_with_comment_col0.txt)

Unlike in the previous exercises, the file contains some comments (i.e. lines starting with a #). Adapt the previous script so that these lines are ignored (see the file file_with_comment_col0.txt). You can use the skeleton for this problem: /common/data_read_files/your_solutions/treat_files_comments.py.

Example of output

python3 reading_file_with_comment.py
file = ../file_with_comment_col0.txt
size = 100; total = 53.29; avg = 0.53
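
One possible way to skip the comment lines is sketched below. It is a minimal sketch, not the reference solution: the function name summarize_lines and the in-memory sample are hypothetical, and it assumes that every non-comment, non-empty line holds exactly one number.

```python
def summarize_lines(lines):
    """Return (size, total, avg) for text lines, ignoring lines that start with '#'."""
    values = []
    for line in lines:
        stripped = line.strip()
        # Skip empty lines and comment lines (assumption: '#' is the first
        # non-blank character of a comment line).
        if not stripped or stripped.startswith("#"):
            continue
        values.append(float(stripped))
    size = len(values)
    total = sum(values)
    avg = total / size if size else 0.0
    return size, total, avg


# Hypothetical sample standing in for the content of a file.
sample = ["# header comment\n", "0.5\n", "# another comment\n", "1.5\n", "\n"]
size, total, avg = summarize_lines(sample)
print(f"size = {size}; total = {total:.2f}; avg = {avg:.2f}")
```

Because an open text file yields its lines when iterated over, the same function can be given a file object opened with open() instead of a list.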

To complicate things further, another file contains comments in the middle of a line (see e.g. file_with_comment_anywhere.txt, whose comments mostly cause the string-to-float conversion to fail).

Adapt the script so that the comments are not taken into account.

Example of output

python3 step1.1.py
file = "../file_with_comment_col0.txt"
size = 100  ; sum = 53.29  ; avg = 0.53
file = "../file_with_comment_anywhere.txt"
size = 96   ; sum = 51.65  ; avg = 0.54
# total over all files:
size = 196  ; sum = 104.93 ; avg = 0.54

Exercise 25 (Parse more complicated files)

As a last exercise, we now have to deal with several columns on each line:

p1=0.7742 p2=0.74973 p3=0.77751
p1=0.7493 p2=0.34762 p3=0.44521
p1=0.4261 p3=0.88275 p2=0.74016

  • Write a function that computes statistics separately for p1, p2, p3.

  • BONUS: When parsing file_mut_cols_with_error.txt, print the lines that contain errors to the screen.

Example of output

python3 step2.0.py
p1 in ../file_mut_cols.txt - size: 25, sum: 12.72, avg: 0.51
p2 in ../file_mut_cols.txt - size: 25, sum: 12.72, avg: 0.51
p3 in ../file_mut_cols.txt - size: 25, sum: 12.72, avg: 0.51
Unexpected field p7=0.213607026802 at line 23 of ../file_mut_cols_with_error.txt
p1 in ../file_mut_cols_with_error.txt - size: 25, sum: 12.82, avg: 0.51
p2 in ../file_mut_cols_with_error.txt - size: 23, sum: 11.35, avg: 0.49
p3 in ../file_mut_cols_with_error.txt - size: 23, sum: 11.69, avg: 0.51
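
The name=value fields can be split with str.split and str.partition, collecting the values of each column in a dictionary. The following is a minimal sketch under stated assumptions: the function name parse_columns, the third sample line and the choice to report a bad field while keeping the rest of its line are all hypothetical, and line numbers are assumed to start at 1.

```python
def parse_columns(lines, expected=("p1", "p2", "p3")):
    """Collect the values of each expected column and the fields that look wrong."""
    values = {name: [] for name in expected}
    errors = []  # list of (line number, offending field)
    for lineno, line in enumerate(lines, start=1):
        for field in line.split():
            name, sep, raw = field.partition("=")
            # Unknown column name or missing '=': report the field.
            if sep != "=" or name not in values:
                errors.append((lineno, field))
                continue
            try:
                values[name].append(float(raw))
            except ValueError:
                errors.append((lineno, field))
    return values, errors


# The first two lines come from the example above; the third is a made-up bad line.
sample = [
    "p1=0.7742 p2=0.74973 p3=0.77751",
    "p1=0.4261 p3=0.88275 p2=0.74016",
    "p1=0.5 p7=0.2 p3=oops",
]
values, errors = parse_columns(sample)
for lineno, field in errors:
    print(f"Unexpected field {field} at line {lineno}")
for name, vals in values.items():
    total = sum(vals)
    avg = total / len(vals) if vals else 0.0
    print(f"{name} - size: {len(vals)}, sum: {total:.2f}, avg: {avg:.2f}")
```

Looking fields up by name rather than by position means that lines in which the columns appear in a different order are handled without extra code.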