Read / write files#
There are many specialized tools to open specific file types (images, xml, csv, hdf5, netcdf, etc.). Here we focus on the general low-level method to open files.
open built-in function and file object#
file = open("../common/examples/helloworld.py")
txt = file.read()
file.close()
print(txt)
print("Hello world")
name = "Pierre"
print("My name is " + name)
But what if something unexpected happens while the file is open (e.g. a division by 0)?
-> An exception is raised. It may be caught by code that is not aware that the file is open.
-> The file remains open.
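The scenario above can be reproduced with a minimal sketch. The scratch file and the names `path`, `setup` and `closed_in_handler` are assumptions made for illustration only. They are not part of the course material.

```python
import os
import tempfile

# Hypothetical scratch file, created only so that the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "helloworld.py")
setup = open(path, "w")
setup.write('print("Hello world")\n')
setup.close()

file = open(path)
try:
    txt = file.read()
    1 / 0  # something goes wrong before file.close() is reached
    file.close()
except ZeroDivisionError:
    # The handler runs, but nobody has closed the file.
    closed_in_handler = file.closed
    print("file closed?", closed_in_handler)

file.close()  # clean up by hand
```

Here the handler reports that the file is still open. It only gets closed because of the explicit call on the last line.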
Context: with keyword#
For objects that need to be closed, it is good practice to use the with keyword.
This way, we are sure that the file is closed even if an error occurs:
with open("../common/examples/helloworld.py") as file:
    txt = file.read()
print(txt)
print("Hello world")
name = "Pierre"
print("My name is " + name)
Important
This is much better than calling the close method explicitly: use with!
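As a quick check of this claim, the following sketch raises an error inside a with block and then inspects the closed attribute of the file object. The scratch file and the variable names are hypothetical and only keep the example self-contained.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not depend on the course tree.
path = os.path.join(tempfile.mkdtemp(), "helloworld.py")
with open(path, "w") as setup:
    setup.write('print("Hello world")\n')

try:
    with open(path) as file:
        txt = file.read()
        1 / 0  # error raised while the file is open
except ZeroDivisionError:
    # Leaving the with block has already closed the file.
    closed_in_handler = file.closed
    print("file closed?", closed_in_handler)
```

No explicit call to close is needed: the file is closed as soon as the with block is left, whether normally or through an exception.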
Loop over lines#
with open("../common/examples/helloworld.py") as file:
    for line in file:
        print("line ?: " + line.strip())
line ?: print("Hello world")
line ?:
line ?: name = "Pierre"
line ?: print("My name is " + name)
And now using enumerate to get the index of the line:
with open("../common/examples/helloworld.py") as file:
    for i, line in enumerate(file):
        print(f"line {i:2d}: {line.strip()}")
line 0: print("Hello world")
line 1:
line 2: name = "Pierre"
line 3: print("My name is " + name)
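Text editors usually number lines from 1. A small variation, sketched here, passes start=1 to enumerate. To keep the sketch self-contained, an io.StringIO object stands in for the open file. This is an assumption of the example, since a real file object is iterated in the same way.

```python
import io

# io.StringIO behaves like an open text file (stand-in for helloworld.py).
fake_file = io.StringIO('print("Hello world")\n\nname = "Pierre"\n')

numbered = []
for i, line in enumerate(fake_file, start=1):
    numbered.append(f"line {i:2d}: {line.strip()}")

for text in numbered:
    print(text)
```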
Options of the built-in function open (read, write, append)#
# write data in a file ("w" creates the file or overwrites it)
with open("/tmp/zoo.txt", "w") as file_zoo:
    file_zoo.write("sam;cat;2\n")
    file_zoo.write("liloo;lion;2\n")
# append data at the end of the file ("a")
with open("/tmp/zoo.txt", "a") as file_zoo:
    file_zoo.write("peter;panda;5\n")
# read the file (the default mode is "r")
with open("/tmp/zoo.txt") as file_zoo:
    print(file_zoo.read())
sam;cat;2
liloo;lion;2
peter;panda;5
with open("/tmp/zoo.txt", "r") as file_zoo:
    print(file_zoo.readline())
    print(file_zoo.read())
sam;cat;2
liloo;lion;2
peter;panda;5
Difference between write and print
write writes the raw string to the file: as long as no newline character (\n) is added, write keeps writing on the same line.
print prints the string to standard output and adds a newline character at the end.
This is why sam;cat;2 is followed by two newlines: one from the line itself (raw: "sam;cat;2\n") and one added by print.
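The difference can be seen in a short sketch. An in-memory io.StringIO buffer is used instead of /tmp/zoo.txt. This is an assumption made so that the example runs anywhere. The file and end arguments of print are standard.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for a text file

buffer.write("sam;cat;2")           # write adds nothing: still on the same line
buffer.write("\n")                  # the newline has to be written explicitly
print("liloo;lion;2", file=buffer)  # print adds the newline itself

content = buffer.getvalue()

# When printing lines that already end with "\n", end="" avoids blank lines.
for line in io.StringIO(content):
    print(line, end="")
```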
Options of the built-in function open (binary file)#
Until now, we have only written text files. Storing data in a binary format is usually more space-efficient.
with open("/tmp/test", "wb") as file:
    file.write(b"a")
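As an illustration of the size argument, the sketch below stores one float as 8 raw bytes with the standard struct module and reads it back in "rb" mode. The scratch path, the value and the choice of struct are assumptions of this example, not something the course prescribes.

```python
import os
import struct
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "value.bin")
value = 0.1234567890123456

with open(path, "wb") as file:
    # "<d" = little-endian double precision float (8 bytes)
    file.write(struct.pack("<d", value))

with open(path, "rb") as file:
    raw = file.read()  # a bytes object, not a str

restored = struct.unpack("<d", raw)[0]
print(len(raw), "bytes in binary;", len(repr(value)), "characters as text")
```

The value survives the round trip exactly, but the file is unreadable without knowing the format that was used to write it.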
Remarks:
In practice, saving data in a hand-made binary file is, most of the time, a bad idea. There are much better solutions for this (see for example h5py and h5netcdf).
There are Python libraries to read and process many types of files (csv, xml, json, images, tabular data, etc.).
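For instance, the semicolon-separated zoo data written earlier could be parsed with the standard csv module, as in this sketch. The data is repeated in an in-memory buffer to keep the example self-contained, and the column names (`name`, `species`, `number`) are guesses made for illustration.

```python
import csv
import io

# Same content as /tmp/zoo.txt above, held in memory for the example.
text = "sam;cat;2\nliloo;lion;2\npeter;panda;5\n"

reader = csv.reader(io.StringIO(text), delimiter=";")
animals = []
for name, species, number in reader:
    # csv gives strings; the last column is converted explicitly.
    animals.append((name, species, int(number)))

print(animals)
```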
Material#
There is a file file0.1.txt in the folder ../common/data_read_files. The folder
contains several files, but this exercise uses only file0.1.txt. Keep your solution in
a text file: it will be improved to handle the other files in later exercises.
Exercise 11 (Parsing file0.1.txt)
Look at the content of the file file0.1.txt, then compute the sum, the average and the
number of values in the file using a script named
compute_stats_single_file.py.
This can be done through the following steps:
open the file,
iterate on the lines,
for each line, convert its value to a float,
update the current statistics.
Example of output
python3 compute_stats_single_file.py
file = "../data/file0.1.txt"
# on Windows:
# file = r"..\data\file0.1.txt"
# r like "raw" = no interpretation of the special characters ("\n", "\t", etc.)
# Such r-strings are also useful when we write LaTeX code in Python.
nb = 78; sum = 42.46; avg = 0.54
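The remark on raw strings in the example above can be checked with a short sketch. The variable names are hypothetical.

```python
# In a normal string, each backslash has to be doubled.
escaped = "..\\data\\file0.1.txt"
# In a raw string, backslashes are kept as they are.
raw = r"..\data\file0.1.txt"
same_path = escaped == raw

# "\n" is one newline character, r"\n" is a backslash followed by "n".
lengths = (len("\n"), len(r"\n"))

print(same_path, lengths)
```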
Solution to Exercise 11 (Parsing file0.1.txt)
#!/usr/bin/env python3
"""Computes basic statistics on a file that contains a set of lines, each line
containing one float.
"""
file_name = "../common/data_read_files/file0.1.txt"
my_sum = 0.0
number = 0
with open(file_name) as handle:
    for line in handle:
        elem = float(line)
        my_sum = my_sum + elem
        number += 1
# not formatted output
# print('nb={}, sum={}, avg={}'.format(number, my_sum, my_sum / number))
# formatted output
print(
    f'file = "{file_name}"\nnb = {number}; sum = {my_sum:.2f}; '
    f"avg = {my_sum / number:.2f}"
)
file = "../common/data_read_files/file0.1.txt"
nb = 78; sum = 42.46; avg = 0.54