*In January 2021, Alexander Amini, Chief Scientific Officer at Themis AI, delivered a **lecture** on "Evidential Deep Learning." This post highlights the difference between probability and confidence, and explains how at Themis AI we are developing tools to estimate both, so that you can determine a model’s reliability and trustworthiness. For more information on the lecture, see **MIT Introduction to Deep Learning**.*

At Themis AI, we are building groundbreaking technology to create and ensure trustworthy AI. More specifically, we have designed tools that calculate *whether and when* a model can be trusted to deliver correct results. This approach differs fundamentally from existing techniques and may require a slight shift in how you think about robustness.

To trust your model’s outputs, you need to assess at least two things:

1. Your model’s gaps in knowledge.

2. How and whether those gaps affect the reliability of your model when producing specific outputs.

The first issue concerns the data your model was trained on. If the training dataset was biased or incomplete, there might be gaps in your model’s knowledge, which might affect its performance. The second issue concerns inference: given a certain input x, if the model lacks data and knowledge about x, its assessment of x will likely be unreliable.

Now, and this is key: we argue that to estimate both points above, you need to give your model a way to estimate its own (epistemic) uncertainty. Your model needs to estimate the gaps in its knowledge (based on training) and, in addition, the reliability of its specific outputs by assessing uncertainty during inference.

You might think that your model already does that by estimating likelihood. Historically, the probability a neural network outputs has been treated as a surrogate for model confidence. But a traditional network produces a deterministic output: a single point estimate for each input. Such an output sheds no light on the uncertainty or variance associated with the prediction.

Consider an automated image classifier trained on numerous images of cats and dogs. If shown a photograph containing either a cat or a dog, this classifier can distinguish between the two with impressive accuracy. But what if the image captures both a cat and a dog *together*? One might reasonably expect the classifier to produce an evenly split output, given that it recognizes features of both animals. In this case, the probability would be 0.5 for each class. However, this does not mean the model should lack confidence that there is a dog (and a cat) in the input image. Indeed, the model, an expert in both cat and dog images, detected features of both animals in the input. Confidence should therefore be high, even though the likelihood is 0.5 because of the ambiguity in the input data. In other words, probability is not the same as confidence.
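One way to make this distinction concrete is the Dirichlet-based formulation commonly used in the evidential deep learning literature, where a network outputs non-negative per-class "evidence" instead of probabilities. The sketch below is only an illustration under that assumption, not a description of Themis AI's tools: the function name `evidential_summary` and the evidence values are hypothetical. It shows two inputs that receive the same 0.5 probability per class but very different uncertainty.

```python
def evidential_summary(evidence):
    """Return (expected class probabilities, uncertainty mass).

    Assumes a Dirichlet-based evidential formulation:
    alpha_k = evidence_k + 1, probability_k = alpha_k / sum(alpha),
    uncertainty = num_classes / sum(alpha).
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]      # Dirichlet parameters
    strength = sum(alpha)                    # total evidence plus one per class
    probs = [a / strength for a in alpha]    # expected class probabilities
    uncertainty = num_classes / strength     # equals 1.0 when there is no evidence
    return probs, uncertainty


# Hypothetical evidence values for [cat, dog], made up for illustration.
both_animals = [40.0, 40.0]    # strong evidence for both classes
little_evidence = [0.1, 0.1]   # almost no evidence for either class

p_both, u_both = evidential_summary(both_animals)
p_little, u_little = evidential_summary(little_evidence)

print("cat and dog together:", p_both, "uncertainty:", round(u_both, 3))
print("little evidence:     ", p_little, "uncertainty:", round(u_little, 3))
```

Both inputs yield a probability of 0.5 per class, yet the uncertainty is close to 0 in the first case and close to 1 in the second. The probability reflects the ambiguity of the input, while the uncertainty reflects how much evidence the model has.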

Go a step further and show the classifier an entirely unfamiliar image, say, of a car. A traditional neural network would still attempt a classification. Its outputs would inevitably be misleading: because the probabilities it calculates always sum to 1, the model is forced to categorize the car as either a cat or a dog, and to attach a likelihood to that answer. It is important to see that this likelihood is not the confidence the model should have in its answer. For instance, if the model says that the image of the car is a cat with a probability of 0.8, not only is the answer wrong, but the probability estimate also cannot and should not be read as the model’s confidence in that answer. Instead, the model should be able to tell us that it has low confidence that the image is a cat, based on an uncertainty estimate that recognizes the image as out of distribution, i.e., not among the things the model knows.
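The sum-to-1 constraint is easy to see in a standard softmax output layer. The following minimal sketch assumes a two-class cat-vs-dog classifier with a softmax output; the logit values are hypothetical and chosen only so that the result resembles the 0.8 example above.

```python
import math


def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for a photo of a car, in the order [cat, dog].
# The classifier has no "car" or "unknown" class to fall back on.
car_logits = [1.2, -0.2]
car_probs = softmax(car_logits)

print("P(cat), P(dog):", [round(p, 2) for p in car_probs])
print("sum:", sum(car_probs))
```

With these logits the model assigns roughly 0.8 to "cat", and the two probabilities still sum to 1. Nothing in the output signals that the input lies outside what the model was trained on.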

In conclusion, we need to be extremely careful to distinguish between probability and confidence in the model’s output. The former tracks the input and its possible ambiguity. The latter tracks whether the inferences that produce a certain output are based on knowledge or not. To determine confidence, we therefore also need to establish the uncertainty the model inherits from training, that is, the extent to which it has been trained on a biased or incomplete dataset. At Themis AI, we provide the tools to estimate these different elements in the model, allowing you to determine its reliability and trustworthiness.
