Practice: weather data


In this section, you will compute statistics and observe trends in monthly weather data over the period 1951-2023. We use real measurements from Météo-France, which are available for any region in France (https://meteo.data.gouv.fr/datasets/). To simplify the task, we created the data file chamonix_weather_data_1951-2023.csv, which contains data from one of the Chamonix observatories. You can reuse the code from the previous section, where you built a weather station.

The following exercises can be solved either with built-in Python tools only or with specialized libraries for handling CSV data. The former is better for learning Python, but the latter is a lot more efficient.

Load the data

The main difference from the previous exercises is that the files are getting bigger, so processing them by hand is no longer practical. Still, since we are dealing with a text file, it is a good idea to open it and check its content. The first line gives the name of each column: NOM_USUEL is the location name (only Chamonix in our case), AAAAMM is the date (year and month), RR is the rainfall (more precisely, the monthly cumulative precipitation, in mm and tenths of a mm), and TM is the average temperature.
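A quick way to check the header without scrolling through the whole file is to read only the first few rows. The sketch below works on a small in-memory sample rather than the real file: the values are invented, the station label CHAMONIX and the YYYYMM date format are assumptions, and the real file may contain more columns.

```python
import csv
import io

# Made-up sample mimicking the file layout; the values are invented.
sample = io.StringIO(
    "NOM_USUEL;AAAAMM;RR;TM\n"
    "CHAMONIX;195101;120.5;-3.2\n"
    "CHAMONIX;195102;;-1.0\n"
)

reader = csv.reader(sample, delimiter=";")
header = next(reader)      # first line: the column names
first_rows = list(reader)  # remaining rows, each a list of strings

print(header)
for row in first_rows:
    print(row)
```

With the real file, `sample` would be replaced by a file object obtained from `open(path, newline="", encoding="utf-8")`. Note that the missing rainfall value in the second row shows up as an empty string.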

Exercise 30

Read the file chamonix_weather_data_1951-2023.csv in ../../common/data_read_files/ and load the data into the structure of your choice. You can use a structure defined elsewhere. Some data points are missing from the series; you can discard them.

You can use functions:

def load_data(path, separator=";"):
    """Loading the data from a file in a single dictionary"""
    pass


path = "../../common/data_read_files/chamonix_weather_data_1951-2023.csv"
station = load_data(path)

or classes:

from typing import Self


class WeatherStation:
    """A weather station that holds rain and temperature"""

    def __init__(self):
        """initialize the weather station with empty values"""
        pass

    @classmethod
    def from_csv(cls, path: str, name: str, sep=";") -> Self:
        """Load a CSV file using the fields
            NOM_USUEL       : usual name of the station
            AAAAMM          : month (YYYYMM)
            RR              : monthly cumulative precipitation (in mm and 1/10)
            TM              : monthly mean of the daily (TN+TX)/2 (in °C and 1/10)
        Raise RuntimeError if the import fails.
        :path: str
        :returns: a new WeatherStation instance
        """
        pass
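As one possible starting point for the function-based variant, here is a minimal sketch that uses csv.DictReader and discards incomplete rows. The function name `load_rows`, the dictionary keys, and the sample values are all hypothetical; it reads from any iterable of lines, so an in-memory sample can stand in for the real file.

```python
import csv
import io


def load_rows(lines, separator=";"):
    """Return a dict of lists, skipping rows with a missing RR or TM value."""
    data = {"date": [], "rain": [], "temperature": []}
    for row in csv.DictReader(lines, delimiter=separator):
        rr = (row.get("RR") or "").strip()
        tm = (row.get("TM") or "").strip()
        if not rr or not tm:
            continue  # discard months with a missing measurement
        data["date"].append(row["AAAAMM"].strip())
        data["rain"].append(float(rr))
        data["temperature"].append(float(tm))
    return data


# Made-up sample; the second row has no rainfall value and is dropped.
sample = io.StringIO(
    "NOM_USUEL;AAAAMM;RR;TM\n"
    "CHAMONIX;195101;120.5;-3.2\n"
    "CHAMONIX;195102;;-1.0\n"
    "CHAMONIX;195103;80.0;2.4\n"
)
station = load_rows(sample)
print(station["date"])
```

Storing the columns as parallel lists keeps the later statistics simple, at the cost of having to keep the lists aligned by hand.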

Compute statistics

Exercise 31

In this part, you will use the data structure you loaded previously and compute statistics from the data. Specifically:

  • Compute the average temperature over the full period 1951-2023 and its standard deviation

  • Compute the average rainfall in June in the full period

  • Compute the average temperature over the months with more than 100 mm of rainfall

Note: When computing averages, we give each month the same weight (February counts as much as any other month despite having fewer days).
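The three quantities can be computed with the standard statistics module, as in the sketch below. The records are hypothetical toy values, not real measurements, and the tuple layout (year, month, rain, temperature) is only one possible choice. The population standard deviation (pstdev) is used here; statistics.stdev would give the sample version instead.

```python
import statistics

# Hypothetical toy records: (year, month, rain in mm, mean temperature in °C).
records = [
    (2000, 1, 120.0, -2.0),
    (2000, 6, 90.0, 14.0),
    (2001, 1, 60.0, -4.0),
    (2001, 6, 110.0, 16.0),
]

# Every month enters the average with the same weight.
temps = [t for _, _, _, t in records]
mean_temp = statistics.mean(temps)
std_temp = statistics.pstdev(temps)  # population standard deviation

# Average rainfall restricted to June.
june_rain = statistics.mean([r for _, m, r, _ in records if m == 6])

# Average temperature over months with more than 100 mm of rain.
wet_temp = statistics.mean([t for _, _, r, t in records if r > 100])

print(mean_temp, std_temp, june_rain, wet_temp)
```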

Climate evolution

We now turn to the more interesting part: computing how much temperatures have increased in Chamonix. We then model the temperatures to make a prediction for 2050.

Exercise 32

We model the temperature evolution in the simplest way possible and assume that the temperature increases linearly.

  • Use numerical dates (for instance ‘2020-06’ should be 2020.5)

  • Plot the temperatures as a function of time.
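The conversion to numerical dates can be sketched as follows. The helper name `to_decimal_year` is hypothetical, and the convention year + month / 12 is chosen only because it reproduces the example given above (June 2020 becomes 2020.5); other conventions, such as the middle of the month, would work equally well for the fit.

```python
def to_decimal_year(date: str) -> float:
    """Convert 'YYYYMM' or 'YYYY-MM' to a decimal year, e.g. '2020-06' -> 2020.5."""
    digits = date.replace("-", "")
    year, month = int(digits[:4]), int(digits[4:6])
    return year + month / 12


dates = [to_decimal_year(d) for d in ["195101", "2020-06", "202312"]]
print(dates)
```

With this convention, December maps to the start of the following year, which shifts every date by the same small offset and therefore does not affect the slope of a linear fit.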

Approximate the temperature \(T\) as a function of time \(t\) by \(T(t) = a t + b\). Find the two parameters \(a\) and \(b\) (the slope and the intercept, respectively) that provide the best fit to the data. We recommend using the function scipy.optimize.curve_fit.

from scipy.optimize import curve_fit


def linear(dates, slope, intercept):
    return slope * dates + intercept


# fit the model with curve_fit
pass

Other options, such as scikit-learn or solving the ordinary least squares problem directly, are also possible but are more involved mathematically. Check that the model follows the data by plotting the model and the data on the same plot. Print the change in degrees per century and the average temperature predicted for 2050.
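To illustrate the ordinary least squares option, the sketch below uses the closed-form solution for a straight line, \(a = \sum_i (t_i - \bar{t})(T_i - \bar{T}) / \sum_i (t_i - \bar{t})^2\) and \(b = \bar{T} - a \bar{t}\), where \(\bar{t}\) and \(\bar{T}\) are the means of the dates and temperatures. The function name `fit_line` is hypothetical and the data are synthetic (an exact trend of 0.02 °C per year), not real measurements.

```python
def fit_line(t, y):
    """Ordinary least squares fit of y = a * t + b; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    a = sxy / sxx
    b = y_mean - a * t_mean
    return a, b


# Synthetic data, not real measurements: exactly 0.02 °C per year.
dates = [1960.0, 1980.0, 2000.0, 2020.0]
temps = [6.0, 6.4, 6.8, 7.2]

a, b = fit_line(dates, temps)
print(f"trend: {a * 100:.2f} °C per century")
print(f"predicted for 2050: {a * 2050 + b:.2f} °C")
```

Centering the sums on the means avoids subtracting very large numbers from each other, which matters here because the dates are around 2000 while the slope is small.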

Final comment: The exercises can be solved with either pandas or numpy. However, the two libraries have different scopes. For reading CSV files, computing means, or handling dates, pandas offers a simpler interface. For more complex mathematical operations, such as linear algebra, numpy is a better choice. Furthermore, numpy arrays are handled by a variety of other libraries (e.g. scipy). In practice, you can of course use both libraries, depending on the task. If you want to redo the exercise with other stations, which will likely have missing data, the simplest way is to use pandas.