In April 2023, Sadhana Lolla, machine learning scientist at Themis AI, delivered a lecture on the theme of "Robust and Trustworthy Deep Learning." During her presentation, she unveiled the cutting-edge technological advancements in progress at Themis AI. Over the next few weeks, we will publish a sequence of blog posts that showcase the main points of her talk. Last week we talked about bias in ML; in this blog post we will discuss why uncertainty-awareness is important for robustness and how we can estimate uncertainty in deep learning. For more information on the lecture, see MIT Introduction to Deep Learning.
Uncertainty in deep learning refers to the model's inability to reliably make predictions in certain scenarios. Estimating uncertainty is important because it allows the model to communicate when its answers are not based on sufficient knowledge and are, therefore, untrustworthy. This uncertainty estimation can be helpful to the users of the model in two ways:
First, users can adjust their trust in the model’s answers based on its uncertainty level. By providing an uncertainty score alongside its predictions, the model enables users to make more informed decisions about whether to rely on the model’s recommendations. This helps address cases in which the model provides confident but incorrect predictions. Relying on such predictions may lead users to make unsafe and unethical decisions in critical domains.
Second, users can, in some cases, remedy this lack of knowledge: by adding training data that covers the scenarios in which uncertainty is high, they can make the model more reliable there. However, this remedy is not available for all types of uncertainty, as we will see shortly.
There are multiple types of uncertainty in neural networks. The two main types of uncertainty are uncertainty in the data and uncertainty within the model.
Uncertainty in the Data: Aleatoric Uncertainty
The first type of uncertainty comes from noisy or ambiguous data points that do not follow the expected pattern in the training set. These data points can lead to inaccurate predictions since similar inputs may result in widely different outputs. For example, in semantic segmentation, areas with high data uncertainty are the boundaries between objects because the data covering those regions is typically quite noisy. This uncertainty is called aleatoric uncertainty. It is irreducible in real-world scenarios: even adding more data will not remedy the ambiguous nature of the input.
Luckily, we can train models to estimate aleatoric uncertainty. The goal is to learn a variance for each input. One way to achieve this is to add an extra output layer to the model. That is, when given an input, we expect the model to provide not only a prediction (ŷ) but also a variance (σ²), which will be higher in regions of the input space with more noise. This implies that the variance is not constant: our data distribution may have areas with both high and low variance, so the variance depends on the specific input. To train this model with its additional layer, we need to modify our loss function to account for the variance at every point. Minimizing the mean squared error is equivalent to learning a Gaussian with mean ŷi and a constant variance. To extend this to non-constant variances, we change the loss function to the negative log-likelihood (NLL) of a Gaussian whose variance is also predicted by the model. In essence, we can view this NLL as a generalization of the mean squared error loss that accounts for non-constant variances. With variance included in the loss function, we can measure how well the predicted mean and variance describe the distribution that generated our data.
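To make the relationship between the mean squared error and the NLL concrete, here is a minimal NumPy sketch of a Gaussian NLL with a per-input variance. The function name gaussian_nll, the toy numbers, and the choice to predict the log-variance (which keeps the variance positive) are assumptions made for illustration; they are not taken from the lecture.

```python
import numpy as np

def gaussian_nll(y_true, y_pred, log_var):
    """Mean Gaussian negative log-likelihood with one variance per input.

    log_var is the predicted log-variance (an assumption of this sketch);
    exponentiating it guarantees a positive variance.
    """
    y_true, y_pred, log_var = map(np.asarray, (y_true, y_pred, log_var))
    var = np.exp(log_var)
    per_point = 0.5 * (np.log(2.0 * np.pi) + log_var + (y_true - y_pred) ** 2 / var)
    return per_point.mean()

# Toy targets and predictions; the third prediction is far off (a "noisy" point).
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 1.0])

# With the variance fixed at 1 (log_var = 0), the NLL is the MSE up to a
# factor of 1/2 and an additive constant.
mse = np.mean((y_true - y_pred) ** 2)
nll_unit_var = gaussian_nll(y_true, y_pred, np.zeros(3))

# Assigning a larger variance to the noisy third point lowers the loss.
nll_adaptive = gaussian_nll(y_true, y_pred, np.array([0.0, 0.0, np.log(4.0)]))
```

The last two lines show why the model is rewarded for predicting a high variance exactly where its errors are large, while the log-variance term stops it from predicting a high variance everywhere.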
Uncertainty in the Model: Epistemic Uncertainty
Uncertainty in the model is closely related to the concepts discussed in the previous blog post, specifically, the bias resulting from inadequate representation in the data on which the model is trained. High model uncertainty arises in areas where the model lacks sufficient training data. In such regions, the model cannot offer reliable predictions because it has seen no similar examples. This type of uncertainty is termed epistemic uncertainty.
To illustrate, consider a classifier initially trained to identify images of dogs and cats. When presented with an image of a horse, a type of data it hasn't encountered during training, the model struggles as it lacks familiarity with horses' visual characteristics. Despite this unfamiliarity, the model is compelled to provide an output (e.g. "it is a dog"), which is clearly incorrect. Addressing this challenge is key, and one way to do so is by ensuring that alongside its outputs, the model also delivers an assessment of its level of uncertainty.
The ability to estimate uncertainty is essential in real-world, high-risk scenarios where the model may encounter unfamiliar objects or situations. In applications like autonomous driving, for instance, being aware of uncertainty can avert accidents and failures resulting from erroneous estimations. Furthermore, in regions of high uncertainty, adding training data is helpful to mitigate the lack of knowledge and subsequently reduce epistemic uncertainty.
Familiar techniques for estimating epistemic uncertainty are ensembling and dropout layers. Ensemble methods require training the same network multiple times with different random initializations and asking each copy to make a prediction for the exact same input. If the models have never seen a specific input before, or that input is very hard to learn, they should predict different answers, and the variance of those answers should be higher than for a familiar input. This technique for determining uncertainty is, however, quite expensive, as it requires training multiple copies of the same model.
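The ensembling idea can be sketched in a few lines of NumPy. To keep the example short, each ensemble member here is a tiny network with a randomly initialized tanh hidden layer whose output weights are solved in closed form by ridge regression; a real ensemble would train every layer by gradient descent. The function names, network size, and toy data are all hypothetical.

```python
import numpy as np

def train_member(x, y, seed, hidden=20, ridge=1e-3):
    """Fit one member: random tanh hidden layer + ridge-regressed output layer."""
    rng = np.random.default_rng(seed)          # a different seed = a different initialization
    w = rng.normal(0.0, 2.0, size=(1, hidden))
    b = rng.normal(0.0, 1.0, size=hidden)
    h = np.tanh(x[:, None] * w + b)            # hidden activations, shape (n, hidden)
    out = np.linalg.solve(h.T @ h + ridge * np.eye(hidden), h.T @ y)
    return w, b, out

def predict(member, x):
    w, b, out = member
    return np.tanh(x[:, None] * w + b) @ out

# Training data only covers the interval [-1, 1].
x_train = np.linspace(-1.0, 1.0, 50)
y_train = np.sin(3.0 * x_train)
ensemble = [train_member(x_train, y_train, seed) for seed in range(10)]

# One familiar input (0.0) and one far outside the training range (4.0).
x_test = np.array([0.0, 4.0])
preds = np.stack([predict(m, x_test) for m in ensemble])   # shape (10, 2)
epistemic_var = preds.var(axis=0)                          # disagreement per input
```

The members agree closely at the familiar input and disagree at the unfamiliar one, so epistemic_var is larger for the second entry. The cost is visible too: the training step runs once per member.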
The technique of using dropout layers is a way of introducing a degree of randomness into our neural networks. Dropout layers are usually adopted as a means of mitigating overfitting since they randomly deactivate various nodes within a layer during training. However, in the context of epistemic uncertainty, we can go a step further. That is, we can incorporate dropout layers after each layer in our model and, notably, keep these dropout layers active during testing as well. Running the same input through the network several times then yields a different prediction on each pass, and the variance of those predictions serves as an estimate of epistemic uncertainty. This approach relies on sampling, which is a computationally intensive procedure. In contrast, at Themis AI, we designed and implemented strategies that are efficient and universally applicable.
One alternative approach to estimating epistemic uncertainty involves generative modeling. We've already discussed Variational Autoencoders (VAEs) in the previous blog post. If we train a VAE on the same dataset we previously mentioned, composed solely of images of dogs and cats, the latent space of this model would consist of features associated with these animals. When presented with a prototypical image of a dog, the VAE should generate a representation of the dog with minimal reconstruction loss. Now, if we were to input an image of a horse to the VAE, the latent vector representing this horse would likely be incomprehensible to the decoder of this network. The decoder wouldn't possess the knowledge to map this latent vector back to the original input space, resulting in a notably inferior reconstruction and a higher reconstruction loss than for a familiar input. The reconstruction loss can therefore serve as a measure of epistemic uncertainty. This approach avoids the limitations of sampling-intensive methods, in line with our commitment to developing more versatile strategies for diverse industries and users. Unfortunately, generative modeling can also be computationally intensive. For example, if your task does not otherwise require a variational autoencoder, training an entire decoder solely for the purpose of estimating epistemic uncertainty would be too costly.
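The reconstruction-loss idea can be illustrated without training a full VAE. The sketch below swaps in a linear autoencoder (PCA computed with an SVD) as a deliberately simplified stand-in: it is not a VAE, but it shows the same effect, namely that inputs unlike the training data reconstruct poorly. The synthetic data, dimensions, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training set: 200 points in 10-D that lie close to a 2-D subspace
# (playing the role of "dogs and cats").
basis = rng.normal(size=(2, 10))
train = rng.normal(size=(200, 2)) @ basis + 0.01 * rng.normal(size=(200, 10))

# "Train" the linear autoencoder: keep the top 2 principal directions.
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
components = vt[:2]

def reconstruction_error(x):
    """Encode to the 2-D latent space, decode, and return the per-input MSE."""
    z = (x - mean) @ components.T       # encoder
    x_hat = z @ components + mean       # decoder
    return np.mean((x - x_hat) ** 2, axis=-1)

familiar = rng.normal(size=(1, 2)) @ basis        # resembles the training data
unfamiliar = 3.0 * rng.normal(size=(1, 10))       # off the learned subspace ("a horse")

err_familiar = reconstruction_error(familiar)[0]
err_unfamiliar = reconstruction_error(unfamiliar)[0]
```

The unfamiliar input produces a much larger reconstruction error, which is the signal used as an uncertainty score. A real VAE replaces the linear encoder and decoder with neural networks, which is where the training cost comes from.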
To overcome the limitations of both sampling and generative modeling, at Themis AI we have developed an innovative method to estimate epistemic uncertainty in a reliable and efficient way: evidential learning. Evidential learning produces estimates without relying on generative modeling or sampling. We view learning as an evidence-based process. Recall our earlier discussion on training ensembles: from the ensembles we were getting multiple predictions for the same input, which allowed us to calculate variance. The concept behind evidential learning is that those predictions could themselves be drawn from a distribution. This method allows us to learn variance directly by placing priors on the distribution that the evidence comes from. That is, once we determine the parameters of this higher-order evidential distribution, we obtain the variance (i.e., the measure of epistemic uncertainty) automatically, all without resorting to sampling or generative modeling.
In conclusion, by estimating aleatoric and epistemic uncertainty, we can gain insights into the model's level of knowledge and reliability in making predictions and thus improve safety in real-world applications. Whereas aleatoric uncertainty is irreducible, epistemic uncertainty can be substantially reduced by adding more relevant data to the training set.
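This post does not specify which evidential distribution is used. As one possible illustration, the sketch below assumes the Normal-Inverse-Gamma parameterization from the published deep evidential regression literature, in which a network outputs four parameters per input and both kinds of uncertainty follow in closed form. The function name and parameter names are assumptions, and this may differ from what Themis AI implements.

```python
def evidential_uncertainty(gamma, nu, alpha, beta):
    """Closed-form uncertainties under an assumed Normal-Inverse-Gamma prior.

    gamma: predicted mean; nu, alpha, beta: evidence parameters.
    Requires nu > 0, alpha > 1, beta > 0 for the moments to exist.
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("need nu > 0, alpha > 1, beta > 0")
    prediction = gamma                          # E[mu]
    aleatoric = beta / (alpha - 1.0)            # E[sigma^2], noise in the data
    epistemic = beta / (nu * (alpha - 1.0))     # Var[mu], uncertainty in the model
    return prediction, aleatoric, epistemic

# Hypothetical network outputs for a single input.
prediction, aleatoric, epistemic = evidential_uncertainty(
    gamma=1.0, nu=2.0, alpha=3.0, beta=4.0
)

# More evidence (larger nu) shrinks the epistemic term but not the aleatoric one.
_, aleatoric_more, epistemic_more = evidential_uncertainty(
    gamma=1.0, nu=20.0, alpha=3.0, beta=4.0
)
```

Under this assumed parameterization, a single forward pass yields the prediction and both uncertainty estimates, so no repeated sampling and no decoder are needed.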