The chart shows a temperature profile measured by an ARGO float in the North Atlantic in May 2010. The profile has a region of relatively uniform temperature near the surface, known as the mixed layer. We want a function that will find the depth of the bottom of this layer.
We define the mixed layer depth as the first depth where the temperature is 0.1 degrees colder than the near-surface temperature. We define the near-surface temperature as the average of the top 2 measurements.
We define our first function. This function takes a 1D array of temperature values and a float to set the temperature difference threshold. It does a simple loop through the profile until it finds the first index where the temperature difference threshold is passed.
import numpy as np

def mixedLayerIndex(temperature: np.ndarray, thresholdTemperatureDifference: float):
    # Near-surface temperature: mean of the top two measurements
    surfaceTemps = temperature[:2].mean()
    depthIndex = 2
    temperatureDifference = surfaceTemps - temperature[depthIndex]
    # Step down the profile until the difference reaches the threshold
    while temperatureDifference < thresholdTemperatureDifference:
        depthIndex += 1
        temperatureDifference = surfaceTemps - temperature[depthIndex]
    return depthIndex
We now want to test this function with some data. We’ll define the temperature array manually.
temperature = np.array([5.0,5.0,4.95,4.89,4.85])
targetDepthIndex = 3
output = mixedLayerIndex(temperature=temperature,thresholdTemperatureDifference=0.1)
assert output == targetDepthIndex,f"output:{output},targetDepthIndex:{targetDepthIndex}"
Great - that worked!
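We used Python’s assert statement here to test whether the output was correct. This is fine when we have scalar values or text, but doesn’t cover some scientific use cases, for example comparing every element of two arrays or comparing floating-point values that are only approximately equal.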
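As a minimal illustration of where a bare assert falls short, the sketch below compares two small arrays. The arrays a and b are made up for this example. Comparing arrays with == returns one boolean per element, so a plain assert on the result is ambiguous and NumPy raises a ValueError.

```python
import numpy as np

# Two illustrative arrays (made up for this sketch)
a = np.array([3, 4])
b = np.array([3, 4])

# == compares element by element and returns a boolean array
elementwise = a == b

# A bare assert needs a single True/False, so NumPy raises ValueError here
try:
    assert a == b
    bareAssertWorked = True
except ValueError:
    bareAssertWorked = False

# Reducing with .all() does work, but a failure would not say which elements differ
assert (a == b).all()
```

The .all() workaround passes or fails as a whole, which is one reason to prefer a dedicated testing function that reports the mismatched elements.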
Instead we use NumPy’s built-in testing module through np.testing. We demonstrate this here by modifying our mixed layer depth index function to work with 2D arrays instead of 1D arrays.
def mixedLayerIndexArray(temperature: np.ndarray, thresholdTemperatureDifference: float):
    surfaceTemps = temperature[:2].mean(axis=0)
    depthIndexList = []
    for col in range(temperature.shape[1]):
        depthIndex = 2
        temperatureDifference = surfaceTemps[col] - temperature[depthIndex, col]
        while (temperatureDifference < thresholdTemperatureDifference) and (depthIndex < temperature.shape[0] - 1):
            depthIndex += 1
            temperatureDifference = surfaceTemps[col] - temperature[depthIndex, col]
        depthIndexList.append(depthIndex)
    depthIndexArray = np.array(depthIndexList)
    return depthIndexArray
temperature = np.array([
[5.0,5.0,4.95,4.89,4.85],
[5.0,5.0,4.95,4.94,4.93]
]).T
print(f"Shape of temperature array: {temperature.shape}")
assert temperature.shape[1] == 2
targetDepthIndexArray = np.array([3,4])
output = mixedLayerIndexArray(temperature=temperature,thresholdTemperatureDifference=0.1)
np.testing.assert_array_equal(output,targetDepthIndexArray)
NumPy’s testing module also allows you to test whether two arrays are almost equal, to a specified number of decimal places, with np.testing.assert_array_almost_equal.
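A short sketch of approximate comparison follows. The values are made up for illustration. It also shows np.testing.assert_allclose, which the NumPy documentation recommends for floating-point comparisons because it takes explicit relative and absolute tolerances.

```python
import numpy as np

# Illustrative values: 0.1 + 0.2 is not exactly 0.3 in floating point
computed = np.array([0.1 + 0.2, 4.89])
expected = np.array([0.3, 4.89])

# An exact element-wise comparison therefore fails
exactMatch = bool((computed == expected).all())

# Passes: the arrays agree to 6 decimal places
np.testing.assert_array_almost_equal(computed, expected, decimal=6)

# Passes: the arrays agree within a relative tolerance of 1e-7
np.testing.assert_allclose(computed, expected, rtol=1e-7, atol=0)
```

Both functions raise an AssertionError that describes the mismatched elements when the comparison fails.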
Testing with dataframes is similar to testing with NumPy. Pandas comes with its own testing module at pd.testing.
import pandas as pd

def mixedLayerIndexDataframe(temperatureDf: pd.DataFrame, thresholdTemperatureDifference: float):
    # Near-surface temperature for each profile (column): mean of the top two rows
    surfaceTemps = temperatureDf.iloc[:2].mean(axis=0)
    depthIndexList = []
    baseMixedLayerTemperature = []
    for col in temperatureDf.columns:
        depthIndex = 2
        temperatureDifference = surfaceTemps.loc[col] - temperatureDf.iloc[depthIndex].loc[col]
        # Stop at the last row of the dataframe if the threshold is never reached
        while (temperatureDifference < thresholdTemperatureDifference) and (depthIndex < temperatureDf.shape[0] - 1):
            depthIndex += 1
            temperatureDifference = surfaceTemps.loc[col] - temperatureDf.iloc[depthIndex].loc[col]
        depthIndexList.append(depthIndex)
        baseMixedLayerTemperature.append(temperatureDf.iloc[depthIndex].loc[col])
    mixedLayerDf = pd.DataFrame({'depthIndex': depthIndexList, 'mlTemp': baseMixedLayerTemperature})
    return mixedLayerDf
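No test for the dataframe function appears here, so the following is a minimal sketch of what one could look like, assuming the same two-profile data used for the 2D array test. The function is repeated, with the loop bounded by the dataframe’s own length and label-based lookups on the column name, so that the block runs on its own. The expected dataframe targetDf is a hypothetical result built by hand for this sketch.

```python
import numpy as np
import pandas as pd

def mixedLayerIndexDataframe(temperatureDf, thresholdTemperatureDifference):
    # Repeated here so that this block is self-contained
    surfaceTemps = temperatureDf.iloc[:2].mean(axis=0)
    depthIndexList = []
    baseMixedLayerTemperature = []
    for col in temperatureDf.columns:
        depthIndex = 2
        temperatureDifference = surfaceTemps.loc[col] - temperatureDf.iloc[depthIndex].loc[col]
        while (temperatureDifference < thresholdTemperatureDifference) and (depthIndex < temperatureDf.shape[0] - 1):
            depthIndex += 1
            temperatureDifference = surfaceTemps.loc[col] - temperatureDf.iloc[depthIndex].loc[col]
        depthIndexList.append(depthIndex)
        baseMixedLayerTemperature.append(temperatureDf.iloc[depthIndex].loc[col])
    return pd.DataFrame({'depthIndex': depthIndexList, 'mlTemp': baseMixedLayerTemperature})

# Two profiles stored as columns, matching the 2D array example
temperatureDf = pd.DataFrame(np.array([
    [5.0, 5.0, 4.95, 4.89, 4.85],
    [5.0, 5.0, 4.95, 4.94, 4.93],
]).T)

# Hand-built expected result: one row per profile
targetDf = pd.DataFrame({'depthIndex': [3, 4], 'mlTemp': [4.89, 4.93]})

outputDf = mixedLayerIndexDataframe(temperatureDf, thresholdTemperatureDifference=0.1)

# Checks values, dtypes, index and column labels in one call
pd.testing.assert_frame_equal(outputDf, targetDf)
```

By default assert_frame_equal also compares dtypes, so an integer column in the output against a float column in the target would fail even if the values match.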
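The basic principles for all software testing are: Arrange, Act, Assert
Arrange: set up the inputs and the expected output
Act: call the function you are testing with those inputs
Assert: check that the output of the function meets your expectations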
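The three steps can be sketched as a single test function. This is only an illustration: the 1D function is repeated so that the block runs on its own, and the test function name is a hypothetical choice.

```python
import numpy as np

def mixedLayerIndex(temperature, thresholdTemperatureDifference):
    # 1D mixed layer index function, repeated so this block is self-contained
    surfaceTemps = temperature[:2].mean()
    depthIndex = 2
    temperatureDifference = surfaceTemps - temperature[depthIndex]
    while temperatureDifference < thresholdTemperatureDifference:
        depthIndex += 1
        temperatureDifference = surfaceTemps - temperature[depthIndex]
    return depthIndex

def test_mixedLayerIndex_first_crossing():
    # Arrange: define the input profile and the expected answer
    temperature = np.array([5.0, 5.0, 4.95, 4.89, 4.85])
    targetDepthIndex = 3

    # Act: call the function under test
    output = mixedLayerIndex(temperature=temperature, thresholdTemperatureDifference=0.1)

    # Assert: compare the output with the expectation
    assert output == targetDepthIndex, f"output:{output},targetDepthIndex:{targetDepthIndex}"

# Call the test directly; it raises AssertionError if the check fails
test_mixedLayerIndex_first_crossing()
```

Keeping the three steps visibly separate makes it easier to see, when a test fails, whether the problem lies in the setup, the function, or the expectation.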
In practice you need an automated testing framework to:
I highly recommend the third-party pytest package rather than the built-in unittest module.
To date I have only used the built-in unittest module for certain functionality, e.g. for testing whether a piece of code raises an exception.
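As a sketch of that kind of exception test, the block below uses unittest’s assertRaises on the 1D function, which is repeated so the block runs on its own. The test class, the test name, and the input profile are hypothetical choices for this illustration. The profile never cools by 0.1 degrees, so the unbounded loop runs off the end of the array and NumPy raises an IndexError.

```python
import unittest
import numpy as np

def mixedLayerIndex(temperature, thresholdTemperatureDifference):
    # 1D mixed layer index function with no bound on the loop
    surfaceTemps = temperature[:2].mean()
    depthIndex = 2
    temperatureDifference = surfaceTemps - temperature[depthIndex]
    while temperatureDifference < thresholdTemperatureDifference:
        depthIndex += 1
        temperatureDifference = surfaceTemps - temperature[depthIndex]
    return depthIndex

class TestMixedLayerIndex(unittest.TestCase):
    def test_threshold_never_reached_raises(self):
        # The largest difference here is 0.02, below the 0.1 threshold
        temperature = np.array([5.0, 5.0, 4.99, 4.98])
        # The test passes only if the call raises IndexError
        with self.assertRaises(IndexError):
            mixedLayerIndex(temperature, thresholdTemperatureDifference=0.1)

# Load and run the test case programmatically
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMixedLayerIndex)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

pytest offers pytest.raises as a context manager for the same purpose.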
Blog post on why we write tests for data analysis and some strategies for what to test