Liam Brannigan

Unit testing example - mixed layer depth

The chart shows a temperature profile measured by an ARGO float in the North Atlantic in May 2010. The profile has a region of relative uniform temperature near the surface known as the mixed layer. We want a function that will find the depth of the bottom of this layer.

We define the mixed layer depth as the first depth where the temperature is 0.1 degrees colder than the near-surface temperature. We define the near-surface temperature as the average of the top 2 measurements.

Define our first function for 1D data

We define our first function. This function takes a 1D array of temperature values and a float to set the temperature difference threshold. It does a simple loop through the profile until it finds the first index where the temperature difference threshold is passed.

def findMixedLayerIndex(temperature:np.ndarray,thresholdTemperatureDifference:float):
    surfaceTemps = temperature[:2].mean()
    depthIndex = 2
    temperatureDifference = surfaceTemps - temperature[depthIndex]
    while temperatureDifference < thresholdTemperatureDifference:
        depthIndex += 1
        temperatureDifference = surfaceTemps - temperature[depthIndex]
    return depthIndex

We now want to test this function with some data. We’ll define the temperature array manually.

temperature = np.array([5.0,5.0,4.95,4.89,4.85])
targetDepthIndex = 3
output = mixedLayerIndex(temperature=temperature,thresholdTemperatureDifference=0.1)
assert output == targetDepthIndex,f"output:{output},targetDepthIndex:{targetDepthIndex}"

Great - that worked!

We used python’s assert statement here to test if the output was correct. This is fine when we have scalar values or text, but doesn’t cover some scientific use-cases for example:

Define our second function for 2D data

Instead we use Numpy’s built-in testing module through np.testing. We demonstrate this here by modifying our mixed layer depth index function to work with 2D arrays instead of 1D arrays.

def mixedLayerIndexArray(temperature:np.ndarray,thresholdTemperatureDifference:float):
    surfaceTemps = temperature[:2].mean(axis=0)
    depthIndexList = []
    for col in range(temperature.shape[1]):
        depthIndex = 2
        temperatureDifference = surfaceTemps[col] - temperature[depthIndex,col]
        while (temperatureDifference < thresholdTemperatureDifference) and (depthIndex < temperature.shape[0]-1):
            depthIndex += 1
            temperatureDifference = surfaceTemps[col] - temperature[depthIndex,col]
        depthIndexList.append(depthIndex)
    depthIndexArray = np.array(depthIndexList)
    return depthIndexArray
temperature = np.array([
    [5.0,5.0,4.95,4.89,4.85],
    [5.0,5.0,4.95,4.94,4.93]
]).T
print(f"Shape of temperature array: {temperature.shape}")
assert temperature.shape[1] == 2
targetDepthIndexArray = np.array([3,4])
output = mixedLayerIndexArray(temperature=temperature,thresholdTemperatureDifference=0.1)
np.testing.assert_array_equal(output,targetDepthIndexArray)

Numpy’s testing module also allows you to test whether 2 arrays are almost equal within a specified tolerance with np.testing.assert_array_almost_equal.

Testing with dataframes

Testing with dataframes is similar to testing with numpy. Pandas comes with its own testing module at pd.testing.

def mixedLayerIndexDataframe(temperatureDf:pd.DataFrame,thresholdTemperatureDifference:float):
    surfaceTemps = temperatureDf.iloc[:2].mean(axis=0)
    depthIndexList = []
    baseMixedLayerTemperature = []
    for col in temperatureDf.columns:
        depthIndex = 2
        temperatureDifference = surfaceTemps.iloc[col] - temperatureDf.iloc[depthIndex].loc[col]
        while (temperatureDifference < thresholdTemperatureDifference) and (depthIndex < temperature.shape[0]-1):
            depthIndex += 1
            temperatureDifference = surfaceTemps.iloc[col] - temperatureDf.iloc[depthIndex].loc[col]
        depthIndexList.append(depthIndex)
        baseMixedLayerTemperature.append(temperatureDf.iloc[depthIndex].loc[col])

    mixedLayerDf = pd.DataFrame({'depthIndex': depthIndexList,'mlTemp':baseMixedLayerTemperature})
    return mixedLayerDf

Testing principles

The basic principles for all software testing are: Arrange, Act, Assert

Unit testing in practice

In practice you need an automated testing framework to:

I highly recommend the third-party pytest package rather than the built-in unit test module:

To date I have only used the built-in unit test module for certain functionality e.g. for testing if a test will return an exception.

Test datasets

What goes in the test dataset?

Further reading

Blog post on why we write tests for data analysis and some strategies for what to test

Unit testing principles

Intro to unit testing for data science video