Uncertainty in Deep Learning, p.1 (Intro)
🚨 🚧 Under Construction 🚧 🚨
Introduction
The ability of deep neural networks to produce useful predictions is now abundantly clear across hundreds of applications and tasks in dozens of domains, and these models continue to reshape our world. In the rapidly evolving landscape of deep learning, one often-overlooked aspect, overshadowed by the unprecedented accuracy and flexibility of neural networks, is the model's confidence in its predictions: its uncertainty. Along with the predictions \(\textbf{y}\) made by the model for an input \(\textbf{x}\), it is often crucial to assess how confident the model is in these predictions and then use this information for other downstream applications.
The problem is not of interest only to academia; it often brings significant value to practical applications ranging from medicine to self-driving, which we will cover further in this series. One of the most important areas where uncertainty methods can be used is safety-critical applications, where we must be able to vouch for the accuracy of the predictions made by the network. As an example, consider the 2016 [incident] in which an autonomous driving system tragically misidentified the side of a trailer as the bright sky, or the misclassification of individuals in images that led to public outrage. These are not just failures in object recognition but failures in assessing the model's own confidence.
Another example involves the numerous regulations and guidelines that apply to medical diagnoses made using neural networks and other forms of artificial intelligence. The specific regulations vary significantly by country and region, but several key principles are generally observed worldwide. Regulatory bodies often impose requirements on the performance of medical devices, including metrics such as false positive rates. These requirements can be met in part by rejecting low-confidence, and therefore potentially erroneous, predictions.
In the context of large language models (LLMs), addressing alignment, hallucinations, and explainable AI (xAI) is critical for enhancing their trustworthiness and effectiveness. Alignment ensures that LLMs operate in ways that resonate with human values and intentions, which is essential for preventing adverse outcomes. This is highlighted in Bender and Gebru's work on data and model biases in AI systems. Hallucinations, where models generate incorrect or nonsensical outputs, are particularly concerning. Research by McCoy et al. underscores the impact of hallucinations in undermining model reliability, especially when factual accuracy is crucial. Additionally, the field of explainable AI, as explored by Guidotti et al., aims to demystify complex model decisions, making them transparent and understandable to users. This transparency is vital in sectors like healthcare and finance, where understanding the reasoning behind model decisions is paramount for acceptance and ethical application.
As we will explore further, over-relying on neural networks can be a rookie mistake if not managed properly, leading to unpleasant or even catastrophic consequences. Blindly trusting the technology without understanding its inner workings and limitations is a dangerous game that will eventually be lost. This is one of the reasons why the topic of model uncertainty and interpretability is becoming increasingly relevant and popular, gaining significant attention from major players such as [OpenAI], [xAI], [Google], [DeepMind], and many others.
In this series, I will cover uncertainty estimation for deep learning: current challenges and approaches to the task, its applications, and modern developments. I aim to delve deeply into many of these topics, so the reader should expect a considerable amount of formulas and technical details (hopefully, this does not scare you away). My main reason for writing this series is to share my accumulated knowledge of the topic with you and to demystify some popular misconceptions and misunderstandings. The text above should have given you a good answer to the "why?" of caring about uncertainty; what follows will address the "how?". In this part, we will cover the formal definition of the tasks, discuss the forms that uncertainty estimation can take, and touch on probabilistic modeling and its connection to modern neural networks.
Probabilistic Modelling
Probabilistic modeling in machine learning is a mathematical framework used to make predictions about future events or unknown data. It uses probability to model the uncertainty inherent in real-world phenomena. Here's a basic explanation using some formulas:

Model Definition: A probabilistic model describes the likelihood of outcomes given certain input data. It typically involves defining a probability distribution that models your data. For example, you might assume a normal distribution for your data points:
\[P(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]Here, \(P(x \mid \mu, \sigma^2)\) is the probability of observing a data point \(x\) given the parameters \(\mu\) (mean) and \(\sigma^2\) (variance).
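As a quick sanity check, the density above is straightforward to evaluate directly. Below is a minimal sketch using only the standard library (the function name `normal_pdf` is mine, not from any package):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of a normal distribution N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# The standard normal density peaks at its mean, where it equals
# 1/sqrt(2*pi) ~ 0.3989:
print(normal_pdf(0.0, 0.0, 1.0))
```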

Parameter Estimation: The goal is often to estimate the parameters of the model that best fit the observed data. This can be done using methods like maximum likelihood estimation (MLE). For the normal distribution example, MLE would maximize the likelihood function:
\[L(\mu, \sigma^2 \mid x_1, x_2, \ldots, x_n) = \prod_{i=1}^n P(x_i \mid \mu, \sigma^2)\]This product of probabilities indicates the likelihood of observing the data set \(\{x_1, x_2, \ldots, x_n\}\) given the parameters \(\mu\) and \(\sigma^2\).
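For the normal model, maximizing this likelihood has a closed-form solution: the sample mean and the (biased) sample variance. A minimal sketch, assuming the data arrive as a plain Python list:

```python
def gaussian_mle(xs):
    """Closed-form maximum likelihood estimates (mu_hat, sigma2_hat)
    for a normal model fit to the observations xs."""
    n = len(xs)
    mu_hat = sum(xs) / n
    # Note the 1/n divisor: the MLE of the variance is biased,
    # unlike the usual 1/(n-1) sample variance.
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

mu_hat, sigma2_hat = gaussian_mle([1.0, 2.0, 3.0, 4.0])
# mu_hat = 2.5, sigma2_hat = 1.25
```

In practice one maximizes the log-likelihood instead of the product of densities, since the product underflows quickly; for the Gaussian case the resulting estimates are the same.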

Prediction: Once the model parameters are estimated, predictions for new, unseen data are made by computing the probability of different outcomes. For example, the probability of a new data point \(x\) given the estimated parameters \(\hat{\mu}\) and \(\hat{\sigma}^2\) can be computed as:
\[P(x \mid \hat{\mu}, \hat{\sigma}^2)\] 
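Concretely, plugging the fitted parameters back into the density gives a plausibility score for new points. A small illustrative sketch (the parameter values are made up for illustration, not fitted to real data):

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Hypothetical estimates obtained earlier via MLE:
mu_hat, sigma2_hat = 2.5, 1.25

# A point near the fitted mean is far more plausible under the model
# than a point deep in the tail:
near = normal_pdf(2.4, mu_hat, sigma2_hat)  # high density
far = normal_pdf(9.0, mu_hat, sigma2_hat)   # vanishingly small
```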
Incorporating Prior Knowledge (Bayesian Approach): In some cases, you might use a Bayesian approach to include prior knowledge about the parameters through a prior distribution. The posterior distribution, which updates beliefs about the parameters after seeing the data, is calculated using Bayes' theorem:
\[P(\mu, \sigma^2 \mid x) = \frac{P(x \mid \mu, \sigma^2) P(\mu, \sigma^2)}{P(x)}\]Here, \(P(\mu, \sigma^2)\) is the prior, and \(P(x)\) is a normalizing constant.
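A fully tractable special case is the conjugate Normal-Normal model: the observation variance \(\sigma^2\) is assumed known, and the mean gets a Gaussian prior \(\mu \sim \mathcal{N}(\mu_0, \tau^2)\). The posterior over \(\mu\) is then Gaussian with a closed-form update. A sketch under those assumptions (function and variable names are mine):

```python
def posterior_over_mean(xs, sigma2, mu0, tau2):
    """Posterior N(post_mean, post_var) over mu in the conjugate
    Normal-Normal model: xs ~ N(mu, sigma2), prior mu ~ N(mu0, tau2)."""
    n = len(xs)
    xbar = sum(xs) / n
    # Precision (inverse variance) is additive: prior precision plus
    # one unit of data precision per observation.
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    # Posterior mean is a precision-weighted average of the prior mean
    # and the sample mean.
    post_mean = post_var * (mu0 / tau2 + n * xbar / sigma2)
    return post_mean, post_var

m, v = posterior_over_mean([2.0, 2.2, 1.8, 2.1], sigma2=1.0, mu0=0.0, tau2=10.0)
# The posterior mean lands between the prior mean (0.0) and the sample
# mean (2.025), and the posterior variance is smaller than the prior's:
# observing data reduces uncertainty about the parameter.
```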
In summary, probabilistic modeling in machine learning deals with using probability distributions and statistical methods to estimate the likelihood of various outcomes, integrating both data and uncertainty in predictions. This approach is fundamental in fields where uncertainty is intrinsic, such as in weather forecasting, market prediction, or medical diagnostics.
Types of Uncertainty
In the realm of uncertainty estimation, it is essential to distinguish between two primary types: aleatoric and epistemic uncertainty. Both types play critical roles in the analysis and modeling of various scientific and practical phenomena, but they originate from fundamentally different sources and require distinct approaches to manage. Understanding these differences is crucial for researchers and practitioners as they develop models and make decisions based on uncertain information. This insight forms the foundation for effectively dealing with uncertainty in numerous fields, from artificial intelligence to risk assessment. Let's explore each type in more detail to grasp their implications and management strategies.
Aleatoric Uncertainty
Aleatoric uncertainty, derived from the Latin word "alea" meaning dice, refers to the inherent randomness or variability in a system or process that cannot be reduced even with more information. This type of uncertainty is often contrasted with epistemic uncertainty, which can be mitigated through additional data or improved understanding. In practical terms, aleatoric uncertainty manifests in phenomena such as the variability in sensor measurements due to noise, or the unpredictability in outcomes like coin tosses. It is intrinsic to the system being observed and must be managed rather than eliminated. For example, in statistical modeling and machine learning, aleatoric uncertainty is typically addressed by integrating it into predictive models, enabling them to acknowledge and quantify the randomness inherent in observations or results. This is crucial for making informed decisions under uncertainty, particularly in fields like finance, meteorology, and various engineering disciplines.
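One common way to integrate aleatoric uncertainty into a predictive model is to have the network output both a mean and a (log-)variance for each input and train with the Gaussian negative log-likelihood. A minimal sketch of just the per-example loss term, with no network or training loop:

```python
import math

def gaussian_nll(y, mu, log_var):
    """Per-example negative log-likelihood (up to a constant) for a model
    that predicts a mean mu and a log-variance log_var for target y."""
    return 0.5 * (log_var + (y - mu) ** 2 / math.exp(log_var))

# A confidently wrong prediction is penalised much more than an
# uncertain wrong one, so a model trained with this loss learns to
# report high aleatoric variance on inherently noisy inputs:
confident_wrong = gaussian_nll(y=0.0, mu=3.0, log_var=math.log(0.1))
uncertain_wrong = gaussian_nll(y=0.0, mu=3.0, log_var=math.log(4.0))
```

Predicting the log-variance rather than the variance keeps the output unconstrained while guaranteeing the implied variance is positive.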
Epistemic Uncertainty
Epistemic uncertainty, often referred to as systematic uncertainty, stems from a lack of knowledge or incomplete information about a system or process. Unlike aleatoric uncertainty, which is inherent and irreducible, epistemic uncertainty can be diminished or resolved through further research, data collection, or advancements in technology. This type of uncertainty is prevalent in scenarios where the models or theories used to predict outcomes are underdeveloped, or where there are gaps in our understanding. For instance, in climate science, epistemic uncertainties may arise due to incomplete knowledge about certain climate processes or interactions. Addressing epistemic uncertainty is crucial for refining predictive models and improving accuracy. By enhancing our knowledge base and refining our theoretical frameworks, we can reduce epistemic uncertainty, leading to more reliable and actionable insights across various domains such as medicine, environmental science, and economic forecasting.
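Because epistemic uncertainty reflects what the model does not know, a standard practical proxy for it is disagreement among several independently trained models, i.e. a deep ensemble. A toy sketch with hand-picked prediction values purely for illustration:

```python
def ensemble_disagreement(predictions):
    """Mean prediction and variance across an ensemble of models;
    the variance serves as a proxy for epistemic uncertainty."""
    n = len(predictions)
    mean = sum(predictions) / n
    var = sum((p - mean) ** 2 for p in predictions) / n
    return mean, var

# Ensemble members tend to agree on inputs similar to the training data...
_, var_in = ensemble_disagreement([0.71, 0.69, 0.70, 0.72])
# ...and to disagree on out-of-distribution inputs:
_, var_out = ensemble_disagreement([0.10, 0.95, 0.40, 0.80])
# var_out is orders of magnitude larger than var_in. Unlike aleatoric
# noise, this disagreement shrinks with more data or better models,
# which is exactly what makes the uncertainty epistemic.
```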