Documentation and testing#
Document your code#
Why?
“Code is more often read than written.” - Guido van Rossum
Who do you write code for?
users
developers (yourself + others, potentially)
You will go back to code you’ve written some time ago and think “What in the world was I thinking?”. If you are having trouble reading your own code, imagine what your users or other developers are experiencing when they are trying to use or contribute to your code.
Documentation is essential
It doesn’t matter how good your software is: if the documentation is not good enough, people will not use it!
Documentation versus comments#
Documentation: specific position (docstrings) and format -> describes use and functionality. For the users.
Comments: in/between code lines -> why I’m doing this? For the developers.
About comments#
A comment starts with a hash sign (#), sits next to the code it comments on, and is short.
def sum_numbers(*numbers):
    """Return the sum of numbers."""
    # initialize the total_sum var
    total_sum = 0
    print("numbers =", numbers)
    # TODO: check numbers type
    # calculate the sum
    for number in numbers:
        total_sum += number
        print("total_sum =", total_sum)
    return total_sum
Note
Python code is usually quite readable in itself. When this is the case, it is not necessary to add too many comments and it is much better to choose nice names and to organize the code to increase its readability and understandability.
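As a rough illustration of this note, the function above could be rewritten so that the names carry the meaning and no line-by-line comments are needed. The name `sum_numbers_readable` is hypothetical and only used here for comparison.

```python
def sum_numbers_readable(*numbers):
    """Return the sum of numbers."""
    total = 0
    for number in numbers:
        total += number
    return total


# A comment is still worth adding when it explains *why*, not *what*:
# the total starts at 0 so that a call without arguments returns 0.
print(sum_numbers_readable(1, 2, 3))  # prints 6
```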
About docstring#
A docstring is a string literal that occurs as the first statement in a module, function,
class, or method definition. When written consistently, it can help your users and
yourself with your project’s documentation. Such a docstring becomes the __doc__ special
attribute of that object, and it can be printed to the console using the built-in
function help():
help(str.startswith)
Help on method_descriptor:
startswith(self, prefix[, start[, end]], /) unbound builtins.str method
Return True if the string starts with the specified prefix, False otherwise.
prefix
A string or a tuple of strings to try.
start
Optional start position. Default: start of the string.
end
Optional stop position. Default: end of the string.
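To make the link between a docstring and the `__doc__` attribute concrete, here is a minimal sketch. The function `greet` is a hypothetical example, not part of the original material.

```python
def greet(name):
    """Return a greeting for name."""
    return f"Hello, {name}!"


# The docstring is stored on the function object itself.
print(greet.__doc__)  # prints: Return a greeting for name.

# help(greet) displays the same text together with the signature.
```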
Structure:
A one-line summary line
A blank line
Any further elaboration for the docstring
Another blank line
Syntax: triple-double quote (""")
import pandas as pd


def get_spreadsheet_cols(file_loc, print_cols=False):
    """Gets and prints the spreadsheet's header columns

    Parameters
    ----------
    file_loc : str
        The file location of the spreadsheet
    print_cols : bool, optional
        A flag used to print the columns to the
        console (default is False)

    Returns
    -------
    list
        a list of strings used that are the
        header columns
    """
    file_data = pd.read_excel(file_loc)
    col_headers = list(file_data.columns.values)
    if print_cols:
        print("\n".join(col_headers))
    return col_headers
help(get_spreadsheet_cols)
Help on function get_spreadsheet_cols in module __main__:
get_spreadsheet_cols(file_loc, print_cols=False)
Gets and prints the spreadsheet's header columns
Parameters
----------
file_loc : str
The file location of the spreadsheet
print_cols : bool, optional
A flag used to print the columns to the
console (default is False)
Returns
-------
list
a list of strings used that are the
header columns
How to document a project#
Readme file, docs folder (with tutorials, technical doc…), license, …
There are tools to help you create your documentation and automate doc generation from docstrings (for example Sphinx and Read The Docs).
Typing annotations for documentation#
Modern Python code can include type annotations. These annotations can be used by third-party tools such as type checkers, IDEs and linters. They are also useful from a documentation perspective. For example, one can write:
def compute_quantities(a: float, b: float) -> dict[str, float]:
    return {"product": a * b, "sum": a + b}
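Annotations are not enforced when the code runs; they are stored as metadata on the function, where tools (and readers) can find them. The sketch below shows one way to inspect them, assuming Python 3.9 or later for the `dict[str, float]` syntax. The added docstring is an illustration only.

```python
import inspect


def compute_quantities(a: float, b: float) -> dict[str, float]:
    """Return the product and the sum of a and b."""
    return {"product": a * b, "sum": a + b}


# The annotations are available as a plain dictionary on the function.
print(compute_quantities.__annotations__["a"])  # prints <class 'float'>

# inspect.signature gives the annotated signature as an object.
print(inspect.signature(compute_quantities))
```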
Testing#
Why testing?#
Coding without testing is dangerous.
To make sure the code conforms with the specs, and/or define correct specs.
Fig. 2 solid-code-wrong-specs#
To avoid regression:
when there is a refactor
when there is a critical code evolution
when the code crashes, to narrow down where to look for the problem
When to test?#
Historically, we used to test after coding:
1. Code,
2. write the tests,
3. if the tests fail, go to 1.
But it is better to do TDD (Test-Driven Development), i.e. to write the tests before coding:
1. Define the spec,
2. write the tests,
3. code,
4. test,
5. if the tests fail, go to 1, 2 or 3.
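The TDD steps above can be sketched as follows. The spec and the function `is_even` are hypothetical and chosen only to keep the example short: the test is written from the spec first, and the implementation comes afterwards.

```python
# Step 1, the spec: is_even(n) returns True when the integer n is divisible by 2.


# Step 2, the test, written before the implementation exists.
def test_is_even():
    assert is_even(0) is True
    assert is_even(4) is True
    assert is_even(7) is False
    assert is_even(-2) is True


# Step 3, the code, written to satisfy the spec and the test.
def is_even(n):
    """Return True if the integer n is divisible by 2."""
    return n % 2 == 0


# Step 4, run the test; an AssertionError here means going back to step 1, 2 or 3.
test_is_even()
```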
If enough resources are available, the person who writes the tests and the person who writes the code are different. But they follow the same specs!
What do we test?#
Unit tests
Functional tests
Unit tests#
Test that functions conform with the specs, i.e. with the “paper” analysis
def add(a, b):
    """Add a and b and return a+b."""
    return a+b
Let’s write the test:
What do we do if a and b are not of the same type?
What if a or b (or a and b) are None?
What if a + b does not exist? What should happen?
What if a or b is NaN?
All these questions need to be answered in order to write the test…thus to write the function!
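One possible set of answers, written as a test, is sketched below. It assumes the spec says that `add` simply relies on Python's `+` operator, so unsupported combinations raise `TypeError` and `NaN` propagates; a different spec would lead to different tests. The helper `raises_type_error` is hypothetical.

```python
import math


def add(a, b):
    """Add a and b and return a + b."""
    return a + b


def raises_type_error(func, *args):
    """Return True if calling func(*args) raises TypeError."""
    try:
        func(*args)
    except TypeError:
        return True
    return False


def test_add_edge_cases():
    # Different but compatible types: int + float gives a float.
    assert add(1, 2.5) == 3.5
    # Incompatible types: a + b does not exist, so TypeError is expected.
    assert raises_type_error(add, 1, "a")
    # None does not support +.
    assert raises_type_error(add, None, 1)
    assert raises_type_error(add, None, None)
    # NaN propagates through the sum instead of raising.
    assert math.isnan(add(float("nan"), 1.0))


test_add_edge_cases()
```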
Functional tests#
Check that the code works when assembling different functions, i.e. the functions can work together!
Fig. 3 A failing functional test#
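A minimal sketch of the difference: the two hypothetical functions below each have their own unit test, and the functional test checks that they still give the right result when chained.

```python
def parse_numbers(text):
    """Convert a comma-separated string such as "1, 2, 3" into a list of floats."""
    return [float(item) for item in text.split(",")]


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def test_parse_numbers():
    # Unit test: one function in isolation.
    assert parse_numbers("1, 2, 3") == [1.0, 2.0, 3.0]


def test_mean():
    # Unit test: the other function in isolation.
    assert mean([1.0, 2.0, 3.0]) == 2.0


def test_mean_of_text():
    # Functional test: the output of one function feeds the other.
    assert mean(parse_numbers("1, 2, 3")) == 2.0


test_parse_numbers()
test_mean()
test_mean_of_text()
```

If `parse_numbers` were changed to return strings instead of floats, `test_mean` would still pass, but `test_mean_of_text` would fail with a `TypeError`.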
How do we test?#
With the assert keyword#
assert tests a condition and raises an AssertionError if the condition does not evaluate
to True.
assert 42 == 40 + 2, "This expression has to be True"
assert type(10) is int
Both conditions evaluate to True, so nothing is printed. But if we run:
assert 42 == 40 + 4, "42 is not equal to 44"
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
Cell In[1], line 1
----> 1 assert 42 == 40 + 4, "42 is not equal to 44"
AssertionError: 42 is not equal to 44
we get an AssertionError message and the process stops.
Example: let’s write a test for a simple function#
def add(arg0, arg1):
    """Return the sum of the two arguments (duck typing).

    Assumes arg0 + arg1 is well defined.
    """
    result = arg0 + arg1
    return result
def test_add():
    """Test that add is ok with int and strings."""
    print("testing add with int ", end="")
    assert add(1, 2) == 3
    print(" .... OK")
    print("testing add ok with str", end="")
    assert add("a", "b") == "ab"
    print("... OK")
    print("test add ok")


test_add()
testing add with int .... OK
testing add ok with str... OK
test add ok
Note
Test function names should always start with test_, so that test runners such as pytest can discover them.
You should write tests to cover most (if not all) of your code…
With the Pytest package#
Pytest is a software testing framework that helps you write and run readable and scalable tests.
It is not part of the standard library so it needs to be installed (typically with
pip/conda/mamba install pytest).
Once your tests are written in test_xxx.py files, just run:
pytest
This will collect and run the tests found in files named test_*.py or *_test.py in the current
directory and its subdirectories, and return a detailed report on the test session.
Alternatively, one can run pytest test_xxx.py to only run this test file.
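As an illustration, a test file for pytest could look like the sketch below. The file name `test_add.py` is hypothetical, and `add` is defined in the same file only to keep the example self-contained; in a real project it would be imported from the module under test.

```python
# Content of a hypothetical file test_add.py


def add(arg0, arg1):
    """Return arg0 + arg1."""
    return arg0 + arg1


# pytest collects functions whose name starts with test_ and runs each one;
# there is no need to call them or to print progress messages by hand.
def test_add_int():
    assert add(1, 2) == 3


def test_add_str():
    assert add("a", "b") == "ab"


def test_add_list():
    assert add([1], [2]) == [1, 2]
```

Running `pytest -v test_add.py` in the directory containing this file should list each test function with its outcome.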
There are several useful options (see pytest -h) but here is a selection of the most
useful ones:
-v, --verbose Increase verbosity
-s Shortcut for --capture=no
-x, --exitfirst Exit instantly on first error or failed test
--lf, --last-failed Rerun only the tests that failed at the last run (or all
if none failed)
--ff, --failed-first Run all tests, but run the last failures first.
Tip
pytest --pdb --pdbcls=IPython.terminal.debugger:TerminalPdb starts a debug
session where an error was raised (pdb is the builtin Python debugger).
The related help says:
--pdb Start the interactive Python debugger on errors or
KeyboardInterrupt
--pdbcls=modulename:classname
Specify a custom interactive Python debugger for use
with --pdb. For example:
--pdbcls=IPython.terminal.debugger:TerminalPdb
To find these options again later, run pytest -h | grep pdb.
Test coverage and the Coverage package#
The notion of test coverage is useful. It is important to know which code is at least executed during testing. The coverage is the percentage of lines run by the test suite.
One can measure the coverage with the package coverage
and the Pytest plugin pytest-cov
(pip install pytest coverage pytest-cov).
If your code is in a directory src and your test files in a directory tests,
pytest --cov=src tests will run your tests, measure the coverage and produce a short
report. You can then produce an HTML visualisation of these results by running
coverage html.
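To see what coverage adds, consider the following sketch, where the hypothetical function `absolute_value` has two branches but the test exercises only one. The test passes, yet a coverage report would be expected to flag the line in the negative branch as never executed.

```python
def absolute_value(x):
    """Return the absolute value of x."""
    if x < 0:
        return -x  # not executed by the test below
    return x


def test_absolute_value_positive():
    # Only the non-negative branch is exercised here.
    assert absolute_value(3) == 3


test_absolute_value_positive()
```

Adding an assertion such as `assert absolute_value(-3) == 3` would exercise the remaining branch.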
Exercise 10
Do it yourself:
The goal is to write a function that returns the sum of the first argument with twice the
second argument. First write a test for this function. Try to use pytest!
Solution to Exercise 10
First write the tests in a file named test_*.py
from my_mod import add_second_twice


def test_add_second_twice():
    """Test add second twice."""
    print("testing add second twice with int ", end="")
    assert add_second_twice(3, 5) == 13
    print("...OK")
    print("testing add second twice with strings ", end="")
    assert add_second_twice("aa", "bb") == "aabbbb"
    print("...OK")
    print("testing add second twice with list ", end="")
    assert add_second_twice([1, 2], [3, 4]) == [1, 2, 3, 4, 3, 4]
    print("...OK")
    print("test add second twice OK with int, string and list")
and an empty function:
def add_second_twice(arg0, arg1):
    """Return the sum of the first argument with twice the second one.

    Arguments should be of type that support
    sum and product by an integer
    (e.g. numerical, string, list, ...)

    :param arg0: first argument
    :param arg1: second argument
    :return: arg0 + 2 * arg1
    """
    pass
Then implement the function and test:
def add_second_twice(arg0, arg1):
    """Return the sum of the first argument with twice the second one.

    Arguments should be of type that support sum and product by
    an integer (e.g. numerical, string, list, ...)

    :param arg0: first argument
    :param arg1: second argument
    :return: arg0 + 2 * arg1
    """
    result = arg0 + 2*arg1
    print(f'arg0 + 2*arg1 = {arg0} + 2*{arg1} = {result}')
    return result
def test_add_second_twice():
    """Test add second twice."""
    print("testing add second twice with int ", end="")
    assert add_second_twice(3, 5) == 13
    print("...OK")
    print("testing add second twice with strings ", end="")
    assert add_second_twice("aa", "bb") == "aabbbb"
    print("...OK")
    print("testing add second twice with list ", end="")
    assert add_second_twice([1, 2], [3, 4]) == [1, 2, 3, 4, 3, 4]
    print("...OK")
    print("test add second twice OK with int, string and list")


test_add_second_twice()
testing add second twice with int arg0 + 2*arg1 = 3 + 2*5 = 13
...OK
testing add second twice with strings arg0 + 2*arg1 = aa + 2*bb = aabbbb
...OK
testing add second twice with list arg0 + 2*arg1 = [1, 2] + 2*[3, 4] = [1, 2, 3, 4, 3, 4]
...OK
test add second twice OK with int, string and list