Documentation and testing

Document your code

Why?

“Code is more often read than written.” - Guido von Rossum

Who do you write code for?

  • users

  • developers (yourself + others, potentially)

You will come back to code you wrote some time ago and think: “What in the world was I thinking?” If you have trouble reading your own code, imagine what your users or other developers experience when they try to use or contribute to it.

Documentation is essential

It doesn’t matter how good your software is: if the documentation is not good enough, people will not use it!

Documentation versus comments

  • Documentation: written in a specific place (docstrings) and format; describes usage and functionality. Aimed at users.

  • Comments: placed in or between code lines; explain why the code does what it does. Aimed at developers.

About comments

A comment starts with a hash sign (#), sits next to the code it comments, and should be short.

def sum_numbers(*numbers):
    """Return the sum of numbers."""

    # initialize the total_sum var
    total_sum = 0

    print("numbers =", numbers)

    # TODO: check numbers type

    # calculate the sum
    for number in numbers:
        total_sum += number
    print("total_sum =", total_sum)

    return total_sum

Note

Python code is often quite readable on its own. When it is, there is no need for many comments: it is much better to choose meaningful names and to organize the code so that it is easy to read and understand.
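As an illustration of this note, the sum_numbers example above could rely on a descriptive name and the built-in sum() instead of line-by-line comments. A minimal sketch: the debugging prints are dropped, and the TODO is kept as the one comment that says something the code cannot.

```python
def sum_numbers(*numbers):
    """Return the sum of numbers."""
    # TODO: check numbers type
    return sum(numbers)


print(sum_numbers(1, 2, 3))
```

With no arguments, sum() of an empty tuple returns 0, which matches the behavior of the original loop.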

About docstrings

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Written properly, it helps both your users and yourself document your project. The docstring becomes the __doc__ special attribute of that object, and it can be printed to the console with the built-in function help():

help(str.startswith)
Help on method_descriptor:

startswith(self, prefix[, start[, end]], /) unbound builtins.str method
    Return True if the string starts with the specified prefix, False otherwise.

    prefix
      A string or a tuple of strings to try.
    start
      Optional start position. Default: start of the string.
    end
      Optional stop position. Default: end of the string.
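The same mechanism works for your own functions. A minimal sketch with a hypothetical function greet: the docstring ends up in __doc__, and the standard library function inspect.getdoc() returns it with the indentation cleaned up (help(greet) would display it as well).

```python
import inspect


def greet(name):
    """Return a greeting for name.

    The greeting is built with an f-string.
    """
    return f"Hello, {name}!"


# The raw docstring is stored in the __doc__ attribute
print(greet.__doc__.splitlines()[0])

# inspect.getdoc() returns the same text with the indentation removed
print(inspect.getdoc(greet))
```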

Structure:

  • A one-line summary

  • A blank line

  • Any further elaboration for the docstring

  • Another blank line

Syntax: triple double quotes (""")

import pandas as pd  # assumption: pandas is installed (.xlsx files also need openpyxl)


def get_spreadsheet_cols(file_loc, print_cols=False):
    """Gets and optionally prints the spreadsheet's header columns

    Parameters
    ----------
    file_loc : str
        The file location of the spreadsheet
    print_cols : bool, optional
        A flag used to print the columns to the
        console (default is False)

    Returns
    -------
    list
        a list of strings that are the
        header columns
    """

    file_data = pd.read_excel(file_loc)
    col_headers = list(file_data.columns.values)

    if print_cols:
        print("\n".join(col_headers))

    return col_headers


help(get_spreadsheet_cols)
Help on function get_spreadsheet_cols in module __main__:

get_spreadsheet_cols(file_loc, print_cols=False)
    Gets and optionally prints the spreadsheet's header columns

    Parameters
    ----------
    file_loc : str
        The file location of the spreadsheet
    print_cols : bool, optional
        A flag used to print the columns to the
        console (default is False)

    Returns
    -------
    list
        a list of strings that are the
        header columns

How to document a project

A README file, a docs folder (with tutorials, technical doc…), a license, …

Tools can help you write your documentation and generate it automatically from docstrings (for example Sphinx, and Read the Docs to build and host it).
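As a sketch of how Sphinx turns docstrings into documentation: after running sphinx-quickstart, a conf.py along these lines enables the autodoc extension (which extracts docstrings) and the napoleon extension (which parses NumPy-style sections such as Parameters and Returns, as in the docstring above). The project name and the ../src path are hypothetical; a .rst page would then pull in a module with a directive such as `.. automodule:: mypackage` (mypackage is also hypothetical).

```python
# conf.py: a minimal Sphinx configuration (sketch; names and paths are assumptions)
import os
import sys

# Assumption: the package to document lives in ../src relative to this file
sys.path.insert(0, os.path.abspath("../src"))

project = "my-project"  # hypothetical project name

extensions = [
    "sphinx.ext.autodoc",   # generate documentation from docstrings
    "sphinx.ext.napoleon",  # understand NumPy- and Google-style docstring sections
]

html_theme = "alabaster"  # Sphinx's default HTML theme
```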

Typing annotations for documentation

Modern Python code can include typing annotations. These annotations can be used by third-party tools such as type checkers, IDEs and linters. They are also useful from the documentation perspective. For example, one can write:

def compute_quantities(a: float, b: float) -> dict[str, float]:
    return {"product": a * b, "sum": a + b}
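A short sketch of what happens to these annotations (the dict[str, float] syntax assumes Python 3.9 or later): they are stored on the function, can be read back with typing.get_type_hints() and are displayed by help(), but Python does not enforce them at runtime. Passing ints instead of floats runs without error; only a static type checker would report a real type mismatch.

```python
import typing


def compute_quantities(a: float, b: float) -> dict[str, float]:
    """Return the product and the sum of a and b."""
    return {"product": a * b, "sum": a + b}


# Annotations are kept on the function object and can be inspected
hints = typing.get_type_hints(compute_quantities)
print(hints)

# They are not checked at runtime: ints are accepted without complaint
print(compute_quantities(2, 3))
```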

Testing

Why testing?

  • Coding without testing is dangerous.

    https://lesjoiesducode.fr/content/034/yqo5ASD.gif

    Fig. 1 When I deploy to production without testing

  • To make sure the code conforms to the specs, and/or to define correct specs.

    https://thecodinglove.com/content/037/g784kEU.gif

    Fig. 2 Solid code, wrong specs

  • To avoid regressions:

    • when the code is refactored

    • when the code undergoes a critical change

    • when it crashes, to narrow down where to look for the problem

When to test?

Historically, tests were written after the code:

  1. Code,

  2. write the tests,

  3. if the tests fail, go to 1.

But it is better to do TDD (Test-Driven Development), i.e. to write the tests before the code:

  1. Define the spec,

  2. write the tests,

  3. code,

  4. test,

  5. if the tests fail, go to 1, 2 or 3.

If enough resources are available, the person who writes the tests and the person who writes the code can be different people. But they follow the same specs!
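A minimal sketch of steps 1 to 4 on a hypothetical function is_even (not part of the original course): the spec is written down, the test is written first (running it at that point fails with a NameError because is_even does not exist yet), then the code is written and the test is run again.

```python
# 1. Spec (hypothetical): is_even(n) returns True if the integer n is even, else False.


# 2. Write the test first. Running it at this stage fails,
#    because is_even has not been written yet.
def test_is_even():
    assert is_even(0)
    assert is_even(4)
    assert not is_even(7)
    assert is_even(-2)


# 3. Write the code that satisfies the spec.
def is_even(n):
    """Return True if the integer n is even, False otherwise."""
    return n % 2 == 0


# 4. Run the test again: it now passes silently.
test_is_even()
```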

What do we test?

  • Unit tests

  • Functional tests

Unit tests

Unit tests check that individual functions conform to the specs, i.e. to the “paper” analysis.

def add(a, b):
    """Add a and b and return a + b."""
    return a + b

Let’s write the test:

  • What do we do if a and b are not of the same type?

  • What if a, b, or both are None?

  • What if a + b does not exist? What should happen?

  • What if a or b is NaN?

All these questions need to be answered in order to write the test, and therefore to write the function!
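One possible set of answers, written as tests. The spec here is an assumption chosen for illustration, not the course’s: ints and floats may be mixed, None or incompatible types must raise TypeError (which a + b already does), and NaN is propagated. Only the standard library is used, so each test can also be run by hand.

```python
import math


def add(a, b):
    """Add a and b and return a + b."""
    return a + b


def test_add_numbers():
    assert add(1, 2) == 3
    assert add(1, 2.5) == 3.5  # mixing int and float is allowed by this assumed spec


def test_add_nan_propagates():
    assert math.isnan(add(float("nan"), 1.0))


def test_add_none_raises():
    try:
        add(None, 1)
    except TypeError:
        pass
    else:
        raise AssertionError("add(None, 1) should raise TypeError")


def test_add_incompatible_types_raises():
    try:
        add(1, "a")
    except TypeError:
        pass
    else:
        raise AssertionError("add(1, 'a') should raise TypeError")


for test in (test_add_numbers, test_add_nan_propagates,
             test_add_none_raises, test_add_incompatible_types_raises):
    test()
```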

Functional tests

Functional tests check that the code works when different functions are assembled, i.e. that the functions work together!

Fig. 3 A failing functional test
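A small sketch of a functional test with two hypothetical functions, parse_numbers and mean (not from the original course). Each is simple on its own; the test checks that the output of one is valid input for the other, as it would be in real use.

```python
def parse_numbers(text):
    """Convert a comma-separated string such as "1, 2, 3" into a list of floats."""
    return [float(item) for item in text.split(",")]


def mean(numbers):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)


def test_mean_of_parsed_numbers():
    # Functional test: both functions are used together
    assert mean(parse_numbers("1, 2, 3")) == 2.0


test_mean_of_parsed_numbers()
```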

How do we test?

With the assert keyword

assert tests a condition and raises an AssertionError if the condition is false.

assert 42 == 40 + 2, "This expression has to be True"
assert type(10) is int

Both conditions are true, so nothing happens. But if we run:

assert 42 == 40 + 4, "42 is not equal to 44"
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[1], line 1
----> 1 assert 42 == 40 + 4, "42 is not equal to 44"

AssertionError: 42 is not equal to 44

we get an AssertionError carrying our message, and execution stops.
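The message after the comma is only evaluated when the assertion fails, so it can include the values involved at no cost. A minimal sketch with a hypothetical helper check_total. Also note that assert statements are removed when Python runs with the -O option, which is one reason they belong in tests rather than in production input validation.

```python
def check_total(total, expected):
    # Put both values in the message so that a failure explains itself
    assert total == expected, f"expected {expected!r}, got {total!r}"


check_total(40 + 2, 42)  # passes silently

try:
    check_total(40 + 4, 42)
except AssertionError as err:
    print("AssertionError:", err)
```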

Example: let’s write a test for a simple function
def add(arg0, arg1):
    """Return the sum of the two arguments (duck typing),
    assuming arg0 + arg1 is well defined.
    """
    result = arg0 + arg1
    return result

def test_add():
    """Test that add works with ints and strings."""
    print("testing add with int ", end="")
    assert add(1, 2) == 3
    print(" .... OK")
    print("testing add ok with str", end="")
    assert add("a", "b") == "ab"
    print("... OK")
    print("test add ok")

test_add()
testing add with int  .... OK
testing add ok with str... OK
test add ok

Note

Test function names should start with test_ so that test runners such as pytest can discover them.

You should write tests that cover most (if not all) of your code.

With the Pytest package

Pytest is a software testing framework that helps you write and run readable and scalable tests.

It is not part of the standard library, so it must be installed (typically with pip install pytest, or conda/mamba install pytest).

Once your tests are written in test_xxx.py files, just run:

pytest

This will discover and run the tests in all files named test_*.py (or *_test.py) in the current directory and its subdirectories, and return a detailed report on the test session.

Alternatively, one can run pytest test_xxx.py to run only this test file.
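A sketch of what such a test file could contain. In a real project, add would be imported from your own module (for example from mymath import add, where mymath is a hypothetical module name); here it is defined inline so the snippet runs on its own. With pytest, plain assert statements are enough: pytest reports the compared values when a test fails, so the print calls of test_add above are no longer needed. pytest also provides helpers such as pytest.raises to check that an exception is raised.

```python
# test_add.py: collected by pytest because of its test_ prefix


def add(arg0, arg1):
    """Return arg0 + arg1 (defined inline so this file runs on its own)."""
    return arg0 + arg1


def test_add_int():
    assert add(1, 2) == 3


def test_add_str():
    assert add("a", "b") == "ab"


def test_add_list():
    assert add([1], [2]) == [1, 2]
```

Running pytest -v test_add.py then reports each of the three test functions separately.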

pytest has many options (see pytest -h); here is a selection of the most useful ones:

  -v, --verbose         Increase verbosity
  -s                    Shortcut for --capture=no

  -x, --exitfirst       Exit instantly on first error or failed test

  --lf, --last-failed   Rerun only the tests that failed at the last run (or all
                        if none failed)
  --ff, --failed-first  Run all tests, but run the last failures first.

Tip

pytest --pdb --pdbcls=IPython.terminal.debugger:TerminalPdb starts an IPython debugging session at the point where an error was raised (this requires IPython; pdb is the built-in Python debugger).

The related help says:

  --pdb                 Start the interactive Python debugger on errors or
                        KeyboardInterrupt
  --pdbcls=modulename:classname
                        Specify a custom interactive Python debugger for use
                        with --pdb.For example:
                        --pdbcls=IPython.terminal.debugger:TerminalPdb

To find this command again, just remember pytest -h | grep pdb.

Test coverage and the Coverage package

Test coverage is a useful notion: it is important to know which parts of the code are at least executed during testing. Coverage is the percentage of lines run by the test suite.

One can measure coverage with the coverage package and the pytest plugin pytest-cov (pip install pytest coverage pytest-cov).

If your code is in a directory src and your test files are in a directory tests, pytest --cov=src tests will run your tests, measure the coverage and print a short report. You can then produce an HTML visualization of these results by running coverage html (written to the htmlcov directory by default).
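To make “percentage of lines run” concrete, here is a toy line tracer built on the standard library hook sys.settrace, applied to a hypothetical function absolute. It only illustrates the idea; it is not a replacement for coverage or pytest-cov. Calling absolute with a positive argument never runs the return -x line, so that call alone covers two of the three body lines.

```python
import sys


def absolute(x):
    if x < 0:
        return -x
    return x


def executed_lines(func, *args):
    """Run func(*args) and return the line offsets (relative to the def line) that ran."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Record 'line' events only for frames executing func's code
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving events inside new frames

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always stop tracing
    return executed


body_lines = {1, 2, 3}  # offsets of the three lines in the body of absolute
positive_only = executed_lines(absolute, 5) & body_lines
print(f"coverage with absolute(5): {len(positive_only) / len(body_lines):.0%}")

both_signs = positive_only | (executed_lines(absolute, -5) & body_lines)
print(f"coverage with both calls: {len(both_signs) / len(body_lines):.0%}")
```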

Exercise 10

Do it yourself:

The goal is to write a function that returns the sum of the first argument with twice the second argument. First write a test for this function. Try to use pytest!