Classes and objects

Learning objectives

  • classes, types, objects, attributes, methods

  • special methods (“dunder”)

  • OOP and encapsulation

  • when (not) to use OOP

Object-oriented programming: encapsulation

Python is also an object-oriented language. For some problems, Object-Oriented Programming (OOP) is a very efficient paradigm. Many libraries use it, so it is worth understanding what OOP is, when it is useful, and how it is used in Python.

In this notebook, we focus on the OOP notion of encapsulation and do not study the more advanced concept of inheritance.

When to use OOP

OOP is a good fit when:

  • you have state (data that evolves over time), and

  • you have operations that are naturally tied to that state.

A typical example: a simulation that tracks the position and velocity of a particle and provides methods to advance it in time. A counter-example: a collection of utility functions that transform data independently — plain functions are simpler and more appropriate there.

A rough heuristic: if you find yourself passing the same “context” dictionary to every function you write, a class is probably a better design.
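Here is a minimal sketch of the particle example. It assumes a one-dimensional particle with constant velocity, advanced by a simple explicit Euler step. All names (make_state, advance, Particle) are hypothetical. The first version uses the "context dictionary" style. The second bundles the same state and operation into a class.

```python
# Version 1: plain functions sharing a "context" dictionary
def make_state(position, velocity):
    return {"position": position, "velocity": velocity}

def advance(state, dt):
    # explicit Euler step: x <- x + v * dt (velocity assumed constant)
    state["position"] += state["velocity"] * dt

state = make_state(0.0, 2.0)
advance(state, 0.5)

# Version 2: the same state and operation bundled in a class
class Particle:
    def __init__(self, position, velocity):
        self.position = position
        self.velocity = velocity

    def advance(self, dt):
        # same explicit Euler step, acting on the object's own state
        self.position += self.velocity * dt

particle = Particle(0.0, 2.0)
particle.advance(0.5)
print(state["position"], particle.position)  # 1.0 1.0
```

In the class version, the state no longer has to be threaded through every call: it travels with the object.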

Concepts

Object

An object is an entity that has a state and a behaviour. Objects are the basic elements of object-oriented systems.

Class

Classes are “families” of objects. A class is a pattern that describes how objects are built and behave.

Introduction based on the tuple type

These concepts are so fundamental to Python that we have already been using objects and classes throughout this course.

In particular, str, list, and dict are “types” or “classes”. In Python these two names mean essentially the same thing. We tend to say “type” for built-in types and “class” for types defined in libraries or user code.
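A quick check shows that "type" and "class" name the same concept. Built-in types and user-defined classes are both objects of type type. Point is a hypothetical empty class used only for this illustration.

```python
class Point:
    """A hypothetical, empty user-defined class."""

my_tuple = tuple("abca")
print(type(my_tuple))               # <class 'tuple'>
print(type(tuple), type(Point))     # <class 'type'> <class 'type'>
print(isinstance(my_tuple, tuple))  # True
```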

We have already used tuple:

my_tuple = tuple("abca")

Here we have instantiated (i.e. created an instance of) the built-in type tuple.

We can use dir to inspect its attribute names. We filter out names starting with __ for now, since those are special methods we will look at shortly.

[name for name in dir(my_tuple) if not name.startswith("__")]
['count', 'index']

tuple exposes two methods: count and index. Methods are functions attached to an object that act on or with it.

result = my_tuple.count("a")
result
2

tuple exposes no plain data attributes. Its state (the elements) is encapsulated inside the object and can only be reached through methods and operations such as indexing.

We are now going to build our own Tuple class step by step.

Attributes and the __init__ special method

It is good practice to write a test function first, which defines exactly what we want.

def test_init(cls):
    obj = cls("abc")

We can verify that the built-in tuple passes:

test_init(tuple)

No error is raised, so the test is consistent with the built-in type. (test_init contains no assertion: it only checks that construction succeeds.) Now let us start with a minimal class:

class Tuple:
    """Our own tuple class."""
test_init(Tuple)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[7], line 1
----> 1 test_init(Tuple)

Cell In[4], line 2, in test_init(cls)
      1 def test_init(cls):
----> 2     obj = cls("abc")

TypeError: Tuple() takes no arguments

We need an __init__ method. This is the initialiser: Python calls it automatically right after the object is created, to set up its initial state.

class Tuple:
    def __init__(self, iterable):
        """Initialise from any iterable (list, string, range, ...)."""
        if not isinstance(iterable, list):
            iterable = list(iterable)
        # "private" attribute (convention: leading underscore)
        self._list = iterable

isinstance(x, T) checks whether x is an instance of T (or a subclass of it). Prefer it over type(x) == T, which would reject subclasses.
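To see the difference, consider a hypothetical trivial subclass of list. isinstance accepts it as a list, while an exact type comparison rejects it.

```python
class MyList(list):
    """A trivial subclass of list (hypothetical, for illustration)."""

value = MyList([1, 2])
print(isinstance(value, list))  # True: a MyList is a list
print(type(value) == list)      # False: its exact type is MyList
```

With the isinstance check in __init__, a MyList argument is stored as it is. A type(x) == list check would instead make an unnecessary conversion.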

Note

The self argument is a convention, not a keyword. It refers to the object on which the method is called. We will see exactly how that works after introducing a simpler method.
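As a small illustration, the hypothetical Counter class below names its first parameter this instead of self. It works because Python simply passes the object as the first argument, whatever that parameter is called. PEP 8 recommends always using self, so this is shown only to make the mechanism visible.

```python
class Counter:
    # "this" instead of "self": legal, but unidiomatic
    def __init__(this):
        this.value = 0

    def increment(this):
        this.value += 1
        return this.value

c = Counter()
c.increment()
print(c.increment())  # 2
```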

test_init(Tuple)

Attributes are accessed with the dot. Note that “private” attributes (prefixed with _) are accessible in Python — the underscore is a social contract, not enforcement.

tup = Tuple("abca")
tup._list
['a', 'b', 'c', 'a']
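One consequence of the __init__ above is worth noticing. When the argument is already a list, it is stored as it is, so the caller and the object share the same list. Mutating the caller's list then silently changes a supposedly immutable Tuple. The sketch below reproduces that behaviour (as SharingTuple, a hypothetical name). It also shows one possible hardening, assumed here rather than taken from the notebook: CopyingTuple always makes a private copy.

```python
class SharingTuple:
    # same logic as the Tuple.__init__ above
    def __init__(self, iterable):
        if not isinstance(iterable, list):
            iterable = list(iterable)
        self._list = iterable  # a list argument is stored as-is (shared)

class CopyingTuple:
    def __init__(self, iterable):
        self._list = list(iterable)  # always a fresh, private copy

data = ["a", "b"]
shared = SharingTuple(data)
copied = CopyingTuple(data)
data.append("c")     # mutate the caller's list afterwards
print(shared._list)  # ['a', 'b', 'c']: the change leaks in
print(copied._list)  # ['a', 'b']: unaffected
```

Copying costs one extra list allocation, but it keeps the object's state truly encapsulated.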

__len__: making len() work

The built-in len function does not know about our class yet:

len(tup)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[11], line 1
----> 1 len(tup)

TypeError: object of type 'Tuple' has no len()

We define __len__ to fix this. This is the protocol Python uses for any object that has a notion of length.

def test_len(cls):
    obj = cls("abca")
    assert len(obj) == 4
test_len(tuple)
class Tuple:
    def __init__(self, iterable):
        if not isinstance(iterable, list):
            iterable = list(iterable)
        self._list = iterable

    def __len__(self):
        return len(self._list)
test_len(Tuple)
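__len__ is used beyond len(). When a class defines no __bool__, truth testing (if obj:, bool(obj)) falls back to __len__, and a length of 0 counts as false. The hypothetical Box class below defines only __init__ and __len__.

```python
class Box:
    """Hypothetical container that only defines __len__."""
    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

print(len(Box("ab")))   # 2
print(bool(Box("")))    # False: no __bool__, so len() == 0 means falsy
print(bool(Box("ab")))  # True
```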

Adding the count method

def test_count(cls):
    obj = cls("abca")
    assert obj.count("a") == 2
    assert obj.count("b") == 1
test_count(tuple)
test_count(Tuple)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[18], line 1
----> 1 test_count(Tuple)

Cell In[16], line 3, in test_count(cls)
      1 def test_count(cls):
      2     obj = cls("abca")
----> 3     assert obj.count("a") == 2
      4     assert obj.count("b") == 1

AttributeError: 'Tuple' object has no attribute 'count'
class Tuple:
    def __init__(self, iterable):
        if not isinstance(iterable, list):
            iterable = list(iterable)
        self._list = iterable

    def __len__(self):
        return len(self._list)

    def count(self, obj):
        """Return the number of occurrences of obj."""
        return self._list.count(obj)
test_count(Tuple)

This is a good moment to understand the self argument:

tup = Tuple("abca")
assert Tuple.count(tup, "a") == tup.count("a")

Important

tup.count("a") is syntactic sugar for Tuple.count(tup, "a"). The self parameter is the object the method is called on.
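You can inspect this mechanism directly. Accessing tup.count without calling it gives a bound method, which remembers the object (__self__) and the plain function defined in the class (__func__). MiniTuple is a hypothetical pared-down stand-in for Tuple, used so as not to overwrite the class built in this notebook.

```python
class MiniTuple:
    """Pared-down stand-in for Tuple (hypothetical name)."""
    def __init__(self, iterable):
        self._list = list(iterable)

    def count(self, obj):
        return self._list.count(obj)

mini = MiniTuple("abca")
bound = mini.count  # a bound method: it remembers mini as its self
print(bound.__self__ is mini)                  # True
print(bound.__func__ is MiniTuple.count)       # True
print(bound("a"), MiniTuple.count(mini, "a"))  # 2 2
```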

Special (“dunder”) methods and __repr__

Special methods — whose names start and end with __ — define how objects behave in built-in operations. Let us see how many a tuple has:

tup_builtin = tuple("abca")
[name for name in dir(tup_builtin) if name.startswith("__")]
['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

One of the most useful is __repr__, which controls how the object is displayed in the Python shell or in Jupyter:

tup_builtin
('a', 'b', 'c', 'a')
repr(tup_builtin)  # same as tup_builtin.__repr__()
"('a', 'b', 'c', 'a')"

Let us see what our class currently produces:

tup = Tuple("abca")
tup
<__main__.Tuple at 0x7fab307c6c10>

Not very informative. Let us write a test:

def test_repr(cls):
    assert repr(cls("abca")) == "('a', 'b', 'c', 'a')"
    # edge cases: empty tuple and single-element tuple
    assert repr(cls("")) == "()"
    assert repr(cls("a")) == "('a',)"  # note the trailing comma!
test_repr(tuple)
test_repr(Tuple)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[28], line 1
----> 1 test_repr(Tuple)

Cell In[26], line 2, in test_repr(cls)
      1 def test_repr(cls):
----> 2     assert repr(cls("abca")) == "('a', 'b', 'c', 'a')"
      3     # edge cases: empty tuple and single-element tuple
      4     assert repr(cls("")) == "()"

AssertionError: 

The single-element case is tricky: ('a',) requires a trailing comma to be unambiguous — without it Python reads the parentheses as grouping, not as a tuple.
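A two-line check of this rule:

```python
not_a_tuple = ("a")  # parentheses only group: this is just the string "a"
one_tuple = ("a",)   # the trailing comma makes a one-element tuple
print(type(not_a_tuple), type(one_tuple))  # <class 'str'> <class 'tuple'>
print(repr(one_tuple))                     # ('a',)
```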

class Tuple:
    def __init__(self, iterable):
        if not isinstance(iterable, list):
            iterable = list(iterable)
        self._list = iterable

    def __len__(self):
        return len(self._list)

    def count(self, obj):
        """Return the number of occurrences of obj."""
        return self._list.count(obj)

    def __repr__(self):
        if len(self._list) == 0:
            return "()"
        if len(self._list) == 1:
            return f"({self._list[0]!r},)"
        return "(" + ", ".join(repr(x) for x in self._list) + ")"
test_repr(Tuple)
Tuple("abca")
('a', 'b', 'c', 'a')

__str__ vs __repr__

Python has two string-conversion methods:

  • __repr__ should be unambiguous — ideally, eval(repr(obj)) == obj.

  • __str__ should be readable — used by print().

If only __repr__ is defined, str() falls back to it.
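The sketch below uses two hypothetical classes. Temperature defines both methods, so repr() and print() give different results. OnlyRepr defines just __repr__, so str() falls back to it.

```python
class Temperature:
    """Hypothetical class defining both string conversions."""
    def __init__(self, celsius):
        self.celsius = celsius

    def __repr__(self):
        # unambiguous: looks like the code that rebuilds the object
        return f"Temperature({self.celsius!r})"

    def __str__(self):
        # readable: meant for end users
        return f"{self.celsius} degrees Celsius"

class OnlyRepr:
    def __repr__(self):
        return "OnlyRepr()"

t = Temperature(21.5)
print(repr(t))          # Temperature(21.5)
print(t)                # 21.5 degrees Celsius  (print uses __str__)
print(str(OnlyRepr()))  # OnlyRepr()  (str falls back to __repr__)
```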

__getitem__: enabling indexing

Why does tup_builtin[0] work? Because tuple defines __getitem__. Let us add it:

def test_getitem(cls):
    obj = cls("abca")
    assert obj[0] == "a"
    assert obj[-1] == "a"
test_getitem(tuple)
test_getitem(Tuple)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[34], line 1
----> 1 test_getitem(Tuple)

Cell In[32], line 3, in test_getitem(cls)
      1 def test_getitem(cls):
      2     obj = cls("abca")
----> 3     assert obj[0] == "a"
      4     assert obj[-1] == "a"

TypeError: 'Tuple' object is not subscriptable
class Tuple:
    def __init__(self, iterable):
        if not isinstance(iterable, list):
            iterable = list(iterable)
        self._list = iterable

    def __len__(self):
        return len(self._list)

    def __getitem__(self, index):
        return self._list[index]

    def count(self, obj):
        """Return the number of occurrences of obj."""
        return self._list.count(obj)

    def __repr__(self):
        if len(self._list) == 0:
            return "()"
        if len(self._list) == 1:
            return f"({self._list[0]!r},)"
        return "(" + ", ".join(repr(x) for x in self._list) + ")"
test_getitem(Tuple)
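One difference from the built-in remains. Because __getitem__ forwards the index to the underlying list, slicing (obj[1:3]) returns a plain list rather than a tuple-like object. The refinement below is one possible approach, assuming slices should return the same class. It checks whether the index is a slice object. SliceableTuple is a hypothetical, pared-down name.

```python
class SliceableTuple:
    def __init__(self, iterable):
        self._list = list(iterable)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # a slice: wrap the sub-list in a new instance of the same class
            return type(self)(self._list[index])
        return self._list[index]  # a single integer index

    def __repr__(self):
        return f"SliceableTuple({self._list!r})"

tup = SliceableTuple("abcd")
print(tup[1:3])  # SliceableTuple(['b', 'c'])
print(tup[-1])   # d
```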

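A pleasant side effect: once __getitem__ is defined, Python’s for loop works too, for free. Without an __iter__ method, for calls __getitem__ with 0, 1, 2, ... and stops at the first IndexError, which list indexing raises past the last element. __len__ is not involved.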

for item in Tuple("abc"):
    print(item)
a
b
c
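As a check that __getitem__ alone drives the loop, here is a hypothetical Squares class that defines neither __len__ nor __iter__. Iteration stops at the first IndexError.

```python
class Squares:
    """Indexable, but with no __len__ and no __iter__ (hypothetical)."""
    def __init__(self, n):
        self._n = n

    def __getitem__(self, index):
        if not 0 <= index < self._n:
            raise IndexError(index)  # signals the end of iteration
        return index * index

print(list(Squares(4)))  # [0, 1, 4, 9]
```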

This is an example of Python’s protocol system: you get iteration, membership tests with in, and more, simply by implementing a small set of dunder methods.
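The following sketch uses a hypothetical pared-down MiniTuple with only __init__, __len__ and __getitem__. Membership tests, list(), unpacking and sorted() all work because each of them falls back to iteration.

```python
class MiniTuple:
    """Pared-down stand-in for Tuple: only __init__, __len__, __getitem__."""
    def __init__(self, iterable):
        self._list = list(iterable)

    def __len__(self):
        return len(self._list)

    def __getitem__(self, index):
        return self._list[index]

mini = MiniTuple("abc")
print("b" in mini)   # True: there is no __contains__, so Python iterates
print(list(mini))    # ['a', 'b', 'c']
first, *rest = mini  # unpacking iterates too
print(first, rest)   # a ['b', 'c']
print(sorted(mini, reverse=True))  # ['c', 'b', 'a']
```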

Back to __init__

tup = Tuple("abca")

is roughly equivalent to the following (Python calls __init__ only if __new__ returns an instance of the class):

# Step 1: allocate a bare, uninitialised object
tup = Tuple.__new__(Tuple)
# Step 2: initialise it (equivalent to Tuple.__init__(tup, "abca"))
tup.__init__("abca")

The second step, tup.__init__("abca"), is itself shorthand for Tuple.__init__(tup, "abca"). This is exactly the general rule we saw for count.
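To watch the two steps happen in order, the hypothetical Traced class below logs each one. Its __new__ passes only cls on to object.__new__, because object.__new__ rejects extra arguments when __new__ is overridden.

```python
class Traced:
    """Hypothetical class that logs the two construction steps."""
    def __new__(cls, *args):
        print(f"__new__ called with {args}")
        return super().__new__(cls)  # allocate the bare object

    def __init__(self, value):
        print(f"__init__ called with {value!r}")
        self.value = value  # set up the initial state

obj = Traced(42)
# __new__ called with (42,)
# __init__ called with 42
```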

Summary

Here is the interface our Tuple class now exposes:

What you write    Dunder method called
Tuple(...)        __new__ + __init__
len(tup)          __len__
tup[i]            __getitem__
repr(tup)         __repr__
for x in tup      __iter__, or __getitem__ as a fallback (our Tuple relies on the fallback)

Python’s data model is built on this protocol system: any object that implements the right dunder methods integrates seamlessly with built-in operations and the standard library.