Uncertainty-Aware AI from Multimodal Data: A PyTorch Tutorial with LUMA Dataset

Grigor Bezirganyan

We perceive the world in a multimodal manner, combining information from our various senses — such as sight, hearing, smell, touch, and taste — to form a comprehensive understanding of our surroundings. To develop AI models capable of making decisions as well as, or better than, humans, it is essential for these models to also consider multimodal data. Furthermore, AI models must be aware of the confidence levels in their decisions, as incorrect decisions can lead to catastrophic outcomes. In this tutorial, we present a simple guide on how to use the LUMA multimodal dataset to introduce varying levels of uncertainty into the data and estimate the model’s uncertainty.

Uncertainty Quantification

Machine Learning and Deep Learning now drive a wide range of products and applications that we use daily, from image editing software to self-driving cars. These applications often process diverse types of information, including audio, images, text, and sensor data. To build Deep Learning models that perform well, it is crucial to integrate all these types of information during training. We refer to these various forms of data as “data modalities,” and the deep learning models that utilize them are known as Multimodal Deep Learning models.

Similar to conventional deep learning models, Multimodal Deep Learning models also suffer from overconfidence. Overconfidence occurs when a model assigns excessively high probabilities to its predictions, even when they are incorrect, and it can lead to catastrophic results. For example, a confidently wrong prediction in a self-driving car can injure or kill its passengers, as happened in a fatal accident in 2016. To avoid such scenarios, we need to understand how confident deep learning models really are in their predictions. Uncertainty Quantification (UQ) serves this purpose: it tries to quantify the uncertainty in the data and in the trained model.

Bayesian statistics mostly distinguishes between two types of uncertainty: aleatoric and epistemic. Aleatoric uncertainty refers to the uncertainty inherent in the data, and it cannot be reduced by observing more data. For example, if we look at the image below, we can see that the two classes are mixed, and it is hard to infer what the label of a new point in the mixed regions should be. Adding more data will not make the classification easier.


Aleatoric Uncertainty [left] and Epistemic Uncertainty [right]. Image retrieved from: https://link.springer.com/article/10.1007/s10994-021-05946-3

Epistemic uncertainty, on the other hand, is the uncertainty of the model due to a lack of knowledge. For example, in the image above, we don’t have enough data points to confidently say which decision boundary is the best one. In contrast to aleatoric uncertainty, adding more data points in this case provides additional information and hence reduces the epistemic uncertainty.
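
One common way to make this distinction concrete (independent of the evidential approach we use later in this post) is to look at an ensemble of predictive distributions: the average entropy of the individual predictions captures the aleatoric part, while the mutual information (the extra entropy of the averaged prediction) captures the epistemic part. The snippet below is only an illustrative sketch with made-up probabilities:

import numpy as np

def entropy(p, axis=-1):
    return -np.sum(p * np.log(p + 1e-12), axis=axis)

# 5 hypothetical ensemble predictions over 3 classes (made-up numbers)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.4, 0.1]])

total = entropy(probs.mean(axis=0))   # total predictive uncertainty
aleatoric = entropy(probs).mean()     # average uncertainty of the individual predictions
epistemic = total - aleatoric         # disagreement between predictions (mutual information)
print(total, aleatoric, epistemic)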

In Multimodal Deep Learning we can have more complex interactions between the uncertainties of the different modalities. The modalities may carry complementary information, which reduces the overall uncertainty, or conflicting information, which can increase it.

In this blog post we will explore different uncertainty scenarios and measure the corresponding uncertainties on the LUMA multimodal dataset¹.

LUMA Dataset

We are going to use the LUMA dataset, which allows us to inject different types of noise into each of the modalities and observe the changes in uncertainty. The LUMA dataset comprises three modalities: audio, image and text. The image modality contains small 32x32 images of different objects. The audio modality contains pronunciations of the labels of these objects, and the text modality contains text passages about the objects. In total there are 50 classes, 42 of which are designed for model training and testing, while the other 8 are provided as out-of-distribution data.

First, we need to download and compile the dataset. For that, we go to our command line interface (bash in my case) and run the following commands, which clone the LUMA dataset compiler and noise injector:

git clone https://github.com/bezirganyan/LUMA.git 
cd LUMA

Then, we need to install the dependencies by creating and activating a conda environment (make sure you have Anaconda or Miniconda installed):

conda env create -f environment.yml
conda activate luma_env

Having installed all the dependencies, we can download the dataset into the data directory with:

git lfs install
git clone https://huggingface.co/datasets/bezirganyan/LUMA data

Finally, we can compile different versions of the dataset with different types and amounts of noise in each modality. To compile the default dataset (i.e. without additional noise), we run:

python compile_dataset.py

The LUMA tool allows us to inject several different types of noise:

  • Sample Noise — This type of noise adds realistic noise to each of the modalities. For the text modality, it can replace words with antonyms, introduce typos and spelling errors, etc. For the audio modality, it can add background conversations, typing noises, etc. And for the image modality, noises like blur, defocus, frost, etc., can be added.
  • Label Noise — This type of noise randomly switches the labels of data samples to their closest classes, which increases the mixing between the classes.
  • Diversity — This controls how diverse the data points are. If we reduce the diversity, the data points become more concentrated in the latent space, which means the models have less information to work with.
  • Out-of-distribution (OOD) samples — The LUMA dataset also provides OOD samples, i.e. samples that lie outside the training distribution. Ideally, an ML model should have high uncertainty on these kinds of samples, so that it does not make a confidently wrong decision on a distribution it has not seen before.


Noise injection pipeline in LUMA Dataset

Let’s inject these noises one at a time. To control the amount of noise, we can modify (or create) a configuration file in the cfg folder. There are already some pre-configured options available, which we will use. For sample noise, we can make use of the pre-defined configuration file cfg/noise_sample.yml. In particular, we can pay attention to these lines in the configuration of each modality:

sample_noise:
  add_noise_train: True
  add_noise_test: True

They turn the sample noise on or off per modality. The lines immediately below them control the noise parameters and differ for each modality. For audio they look like this:

  sample_noise:
    add_noise_train: True
    add_noise_test: True
    noisy_data_ratio: 1
    min_snr: 3
    max_snr: 5
    output_path: data/noisy_audio

where we can control the ratio of noisy data (0.0–1.0), the minimum and maximum signal-to-noise ratio, and where to save the noisy audio files.
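
To build some intuition for what the SNR bounds mean, here is a minimal, illustrative sketch (not LUMA's actual noise-injection code) of mixing a background-noise waveform into a clean waveform at a chosen signal-to-noise ratio:

import torch

def mix_at_snr(clean: torch.Tensor, noise: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Scale the noise so that the clean-to-noise power ratio equals snr_db, then add it."""
    noise = noise[..., :clean.shape[-1]]                    # trim the noise to the clip length
    clean_power = clean.pow(2).mean()
    noise_power = noise.pow(2).mean().clamp_min(1e-12)
    target_noise_power = clean_power / (10 ** (snr_db / 10))
    return clean + (target_noise_power / noise_power).sqrt() * noise

noisy = mix_at_snr(torch.randn(1, 16000), torch.randn(1, 16000), snr_db=4.0)

A lower SNR means the background noise is louder relative to the clean signal, so min_snr and max_snr effectively bound how aggressive the corruption is.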

For text, they look like this:

  sample_noise:
    add_noise_train: True
    add_noise_test: True
    noisy_data_ratio: 1
    noise_config: 
      KeyboardNoise:
        aug_char_min: 1
        aug_char_max: 5
        aug_word_min: 3
        aug_word_max: 8
      BackTranslationNoise:
        device: cuda # cuda or cpu
    ...

Here, you can specify noises from: KeyboardNoise, BackTranslationNoise, SpellingNoise, OCRNoise, RandomCharNoise, RandomWordNoise, AntonymNoise. The parameters for each noise can be found here.
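
As a rough illustration of what typo-style noise does to a passage (a toy sketch, not LUMA's implementation, and the keyboard-neighbour map below is made up), we can randomly replace characters with neighbouring keys:

import random

# Hypothetical, simplified keyboard-neighbour map (illustration only)
NEIGHBOURS = {'a': 'qwsz', 'e': 'wrds', 'o': 'ipkl', 's': 'awedxz', 't': 'rfgy'}

def keyboard_typos(text: str, char_prob: float = 0.1, seed: int = 0) -> str:
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch.lower() in NEIGHBOURS and rng.random() < char_prob:
            out.append(rng.choice(NEIGHBOURS[ch.lower()]))
        else:
            out.append(ch)
    return ''.join(out)

print(keyboard_typos("a short passage about the object"))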

Finally, for image modality, the configuration looks like this:

    sample_noise:
        add_noise_train: True
        add_noise_test: True
        noisy_data_ratio: 1
        output_path: data/noisy_images.pth
        noise_config:
          gaussian_noise:
            severity: 4
          shot_noise:
              severity: 4
          impulse_noise:
            severity: 4

You can choose noises from: gaussian_noise, shot_noise, impulse_noise, defocus_blur, frosted_glass_blur, motion_blur, zoom_blur, snow, frost, fog, brightness, contrast, elastic, pixelate, jpeg_compression. For each noise, you can specify a severity parameter, which takes values from 1–5. Below you can see examples of the different noise types for images:

Image noise types. Image retrieved from: https://arxiv.org/pdf/1903.12261
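
To give a feel for how a severity level turns into an actual corruption, here is an illustrative sketch of severity-controlled Gaussian noise in the spirit of the benchmark above; the severity-to-sigma mapping is made up for illustration and is not LUMA's exact implementation:

import torch

def gaussian_noise(image: torch.Tensor, severity: int = 1) -> torch.Tensor:
    """Add zero-mean Gaussian noise to an image in [0, 1]; higher severity means larger sigma."""
    sigmas = [0.04, 0.06, 0.08, 0.09, 0.10]   # hypothetical mapping for severities 1-5
    sigma = sigmas[severity - 1]
    return (image + sigma * torch.randn_like(image)).clamp(0.0, 1.0)

noisy = gaussian_noise(torch.rand(3, 32, 32), severity=4)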

Then, we can compile the dataset with sample noise with:

python compile_dataset.py -c cfg/noise_sample.yml

You can of course use any other configuration files.

To add label noise, one only needs to change the label_switch_prob parameter for each modality; cfg/noise_label.yml provides an example configuration. Finally, for diversity, one needs to change the compactness parameter: the higher the compactness value, the less diverse the data will be. An example of this can be seen in cfg/noise_diversity.yml.
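
As a rough sketch of the idea behind label switching (again, not LUMA's exact implementation), the function below flips each label to its nearest other class with probability label_switch_prob, with "nearest" defined by distances between hypothetical class centroids:

import numpy as np

def switch_labels(labels: np.ndarray, centroids: np.ndarray, switch_prob: float, seed: int = 0) -> np.ndarray:
    """Flip each label to its closest other class with probability switch_prob."""
    rng = np.random.default_rng(seed)
    dists = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                  # exclude the class itself
    nearest = dists.argmin(axis=1)                   # closest class for every class
    flip = rng.random(len(labels)) < switch_prob
    noisy = labels.copy()
    noisy[flip] = nearest[labels[flip]]
    return noisy

# Toy usage: 42 classes with hypothetical 2-D centroids
centroids = np.random.default_rng(1).normal(size=(42, 2))
labels = np.random.default_rng(2).integers(0, 42, size=1000)
noisy_labels = switch_labels(labels, centroids, switch_prob=0.3)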

The OOD data for each generation is saved in a separate file specified in the configuration file.

Loading the Dataset in PyTorch

We can use the class from dataset.py to load the dataset in PyTorch.

from dataset import LUMADataset

train_audio_path = 'data/audio/datalist_train.csv'
train_text_path = 'data/text_data_train.tsv'
train_image_path = 'data/image_data_train.pickle'
train_audio_data_path = 'data/audio'

train_dataset = LUMADataset(train_image_path, 
                            train_audio_path, 
                            train_audio_data_path,
                            train_text_path)

Nevertheless, this returns raw texts, audio and images, which may not be very convenient to use in our models. Hence, we would like to process these samples before feeding them to our models and convert them into more convenient formats. For audio, we want to convert the raw waveforms into mel-spectrograms. For that, we define a transform as:

from torchvision.transforms import Compose
from torchaudio.transforms import MelSpectrogram
import torch

class PadCutToSizeAudioTransform():
    def __init__(self, size):
        self.size = size

    def __call__(self, audio):
        if audio.shape[-1] < self.size:
            audio = torch.nn.functional.pad(audio, (0, self.size - audio.shape[-1]))
        elif audio.shape[-1] > self.size:
            audio = audio[:, :self.size]
        return audio

audio_transform = Compose([MelSpectrogram(), PadCutToSizeAudioTransform(128)])

Here we use the MelSpectrogram transform, followed by a custom transform that pads or cuts the spectrograms to the same size for all samples.
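
As a quick sanity check, we can run the transform on a dummy one-second waveform; the exact number of time frames depends on the MelSpectrogram defaults (128 mel bins), so treat the shape as indicative:

waveform = torch.randn(1, 16000)    # dummy mono clip, roughly 1 second at 16 kHz
spectrogram = audio_transform(waveform)
print(spectrogram.shape)            # expected: torch.Size([1, 128, 128]) after padding/cutting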

For text data, we choose to use averaged BERT² embeddings for training. To do that, we can extract the text features into a file and then define a custom transform that loads the embeddings instead of the raw text:

import numpy as np

from data_generation.text_processing import extract_deep_text_features

extract_deep_text_features(train_text_path, output_path='text_features_train.npy')
class Text2FeatureTransform():
    def __init__(self, features_path):
        with open(features_path, 'rb') as f:
            self.features = np.load(f)

    def __call__(self, text, idx):
        return self.features[idx]

text_transform=Text2FeatureTransform('text_features_train.npy')

For the image modality, we will normalize the images and convert them to tensors:

from torchvision.transforms import ToTensor, Normalize

image_transform = Compose([
    ToTensor(),
    Normalize(mean=(0.51, 0.49, 0.44),
              std=(0.27, 0.26, 0.28))
])

Finally, we will apply these transforms by passing them to the dataset class:

train_dataset = LUMADataset(train_image_path, train_audio_path, train_audio_data_path, train_text_path,
                            text_transform=text_transform,
                            audio_transform=audio_transform,
                            image_transform=image_transform)

We can load test and OOD data in a similar fashion. The final data loading procedure will be:

import numpy as np
import torch
from torchaudio.transforms import MelSpectrogram
from torchvision.transforms import Compose, Normalize, ToTensor

from data_generation.text_processing import extract_deep_text_features
from dataset import LUMADataset

train_audio_path = 'data/audio/datalist_train.csv'
train_text_path = 'data/text_data_train.tsv'
train_image_path = 'data/image_data_train.pickle'
audio_data_path = 'data/audio'

test_audio_path = 'data/audio/datalist_test.csv'
test_text_path = 'data/text_data_test.tsv'
test_image_path = 'data/image_data_test.pickle'

ood_audio_path = 'data/audio/datalist_ood.csv'
ood_text_path = 'data/text_data_ood.tsv'
ood_image_path = 'data/image_data_ood.pickle'


class PadCutToSizeAudioTransform():
    def __init__(self, size):
        self.size = size

    def __call__(self, audio):
        if audio.shape[-1] < self.size:
            audio = torch.nn.functional.pad(audio, (0, self.size - audio.shape[-1]))
        elif audio.shape[-1] > self.size:
            audio = audio[:, :self.size]
        return audio


class Text2FeatureTransform():
    def __init__(self, features_path):
        with open(features_path, 'rb') as f:
            self.features = np.load(f)

    def __call__(self, text, idx):
        return self.features[idx]


extract_deep_text_features(train_text_path, output_path='text_features_train.npy')
extract_deep_text_features(test_text_path, output_path='text_features_test.npy')
extract_deep_text_features(ood_text_path, output_path='text_features_ood.npy')

image_transform = Compose([
    ToTensor(),
    Normalize(mean=(0.51, 0.49, 0.44),
              std=(0.27, 0.26, 0.28))
])

text_transform_train = Text2FeatureTransform('text_features_train.npy')
text_transform_test = Text2FeatureTransform('text_features_test.npy')
text_transform_ood = Text2FeatureTransform('text_features_ood.npy')

audio_transform = Compose([MelSpectrogram(), PadCutToSizeAudioTransform(128)])

train_dataset = LUMADataset(train_image_path, train_audio_path, audio_data_path, train_text_path,
                            text_transform=text_transform_train,
                            audio_transform=audio_transform,
                            image_transform=image_transform)

test_dataset = LUMADataset(test_image_path, test_audio_path, audio_data_path, test_text_path,
                           text_transform=text_transform_test,
                           audio_transform=audio_transform,
                           image_transform=image_transform)

ood_dataset = LUMADataset(ood_image_path, ood_audio_path, audio_data_path, ood_text_path,
                          text_transform=text_transform_ood,
                          audio_transform=audio_transform,
                          image_transform=image_transform)

Building a Multimodal UQ Model

For building the multimodal UQ model, we are going to use a recent multimodal approach based on evidential learning. Evidential deep learning³ is a method that enhances traditional deep learning models by not only making predictions but also providing a measure of uncertainty about those predictions. It leverages principles from Dempster-Shafer theory, a mathematical framework for evidence-based reasoning. This theory allows the model to combine different pieces of evidence to calculate degrees of belief, rather than a single deterministic output. Instead of just giving a single answer, evidential learning outputs a range of possible answers along with the confidence level in each.
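
Concretely, for classification the network outputs a non-negative evidence vector e, which parameterizes a Dirichlet distribution with concentration α = e + 1. From α we can read off the expected class probabilities and a vacuity-style uncertainty that is high when there is little evidence. The snippet below is a simplified illustration of these quantities, not the exact code we use later:

import torch

def dirichlet_summary(evidence: torch.Tensor):
    """Summarize a batch of non-negative evidence vectors of shape (batch, num_classes)."""
    alphas = evidence + 1.0                        # Dirichlet concentration parameters
    strength = alphas.sum(dim=-1, keepdim=True)    # total evidence plus the number of classes
    probs = alphas / strength                      # expected class probabilities
    vacuity = evidence.shape[-1] / strength.squeeze(-1)  # high when little evidence is collected
    return probs, vacuity

# Toy usage: softplus makes the raw logits non-negative, so they can act as evidence
probs, vacuity = dirichlet_summary(torch.nn.functional.softplus(torch.randn(4, 42)))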

Following the ideas presented by Xu et al. (2024), we are going to build evidential networks for each modality and combine them using their proposed conflictive opinion aggregation strategy (RCML⁴). The image classifier will hence look like this:

class ImageClassifier(torch.nn.Module):
    def __init__(self, num_classes, dropout=0.3):
        super(ImageClassifier, self).__init__()
        self.image_model = torch.nn.Sequential(
            torch.nn.Conv2d(3, 32, 3),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(dropout),
            torch.nn.Conv2d(32, 64, 3),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(dropout),
            torch.nn.Flatten(),
        )
        self.classifier = torch.nn.Linear(64 * 6 * 6, num_classes)

    def forward(self, x):
        image, audio, text = x
        image = self.image_model(image.float())
        return self.classifier(image)

Similarly, the audio and text classifiers will be:

class AudioClassifier(torch.nn.Module):
    def __init__(self, num_classes, dropout=0.5):
        super(AudioClassifier, self).__init__()
        self.audio_model = torch.nn.Sequential(  # from batch_size x 1 x 128 x 128 spectrogram
            torch.nn.Conv2d(1, 32, 5),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(dropout),
            torch.nn.Conv2d(32, 64, 3),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(dropout),
            torch.nn.Conv2d(64, 64, 3),
            torch.nn.ReLU(),
            torch.nn.MaxPool2d(2),
            torch.nn.Dropout(dropout),
            torch.nn.Flatten()
        )
        self.classifier = torch.nn.Linear(64 * 14 * 14, num_classes)

    def forward(self, x):
        image, audio, text = x
        audio = self.audio_model(audio)
        return self.classifier(audio)

class TextClassifier(torch.nn.Module):
    def __init__(self, num_classes, dropout=0.5):
        super(TextClassifier, self).__init__()
        self.text_model = torch.nn.Sequential(
            torch.nn.Linear(768, 512),
            torch.nn.ReLU(),
            torch.nn.Dropout(dropout),
            torch.nn.Linear(512, 256),
            torch.nn.ReLU(),
            torch.nn.Dropout(dropout),
        )
        self.classifier = torch.nn.Linear(256, num_classes)

    def forward(self, x):
        image, audio, text = x
        text = self.text_model(text)
        return self.classifier(text)

Having these uni-modal classifiers, we will combine them into a multimodal network:

class MultimodalClassifier(torch.nn.Module):
    def __init__(self, num_classes, dropout=0.5):
        super(MultimodalClassifier, self).__init__()
        self.image_model = ImageClassifier(num_classes, dropout)
        self.audio_model = AudioClassifier(num_classes, dropout)
        self.text_model = TextClassifier(num_classes, dropout)

    def forward(self, x):
        image_outputs = self.image_model(x)
        audio_outputs = self.audio_model(x)
        text_outputs = self.text_model(x)

        image_logits = torch.nn.functional.softplus(image_outputs)
        audio_logits = torch.nn.functional.softplus(audio_outputs)
        text_logits = torch.nn.functional.softplus(text_outputs)
        logits = [image_logits, audio_logits, text_logits]
        agg_logits = image_logits
        for i in range(1, 3):
            agg_logits = (agg_logits + logits[i])/2
        return agg_logits, (image_logits, audio_logits, text_logits)

Here we use the softplus function, since in evidential networks the evidence must be non-negative. The diagram of the architecture can be seen in the image below:


The architecture of the Multimodal Classifier. It takes input from the 3 modalities and provides a prediction based on the information from those modalities. The fusion is performed using the RCML approach discussed above.

To make our training easier, we are going to use the PyTorch Lightning framework. For that, we need to define another lightning class:

import numpy as np
import pytorch_lightning as pl
import torch
from torchmetrics import Accuracy

from baselines.utils import AvgTrustedLoss

class DirichletModel(pl.LightningModule):
    def __init__(self, model, num_classes=42, dropout=0.):
        super(DirichletModel, self).__init__()
        self.num_classes = num_classes
        self.model = model(num_classes=num_classes, dropout=dropout)
        self.train_acc = Accuracy(task='multiclass', num_classes=num_classes)
        self.val_acc = Accuracy(task='multiclass', num_classes=num_classes)
        self.test_acc = Accuracy(task='multiclass', num_classes=num_classes)
        self.criterion = AvgTrustedLoss(num_views=3)
        self.aleatoric_uncertainties = None
        self.epistemic_uncertainties = None

    def forward(self, inputs):
        return self.model(inputs)

    def training_step(self, batch, batch_idx):
        loss, output, target = self.shared_step(batch)
        self.log('train_loss', loss)
        acc = self.train_acc(output, target)
        self.log('train_acc_step', acc, prog_bar=True)
        return loss

    def shared_step(self, batch):
        image, audio, text, target = batch
        output_a, output = self((image, audio, text))
        output = torch.stack(output)
        loss = self.criterion(output, target, output_a)
        return loss, output_a, target

    def validation_step(self, batch, batch_idx):
        loss, output, target = self.shared_step(batch)
        self.val_acc(output, target)
        alphas = output + 1
        probs = alphas / alphas.sum(dim=-1, keepdim=True)
        entropy = self.num_classes / alphas.sum(dim=-1)  # Dirichlet vacuity, used as the epistemic uncertainty
        alpha_0 = alphas.sum(dim=-1, keepdim=True)
        # expected entropy of the categorical distribution under the Dirichlet: the aleatoric uncertainty
        aleatoric_uncertainty = -torch.sum(probs * (torch.digamma(alphas + 1) - torch.digamma(alpha_0 + 1)), dim=-1)
        return loss, output, target, entropy, aleatoric_uncertainty

    def test_step(self, batch, batch_idx):
        loss, output, target = self.shared_step(batch)
        self.test_acc(output, target)
        alphas = output + 1
        probs = alphas / alphas.sum(dim=-1, keepdim=True)
        entropy = self.num_classes / alphas.sum(dim=-1)
        alpha_0 = alphas.sum(dim=-1, keepdim=True)
        aleatoric_uncertainty = -torch.sum(probs * (torch.digamma(alphas + 1) - torch.digamma(alpha_0 + 1)), dim=-1)
        return loss, output, target, entropy, aleatoric_uncertainty

    def training_epoch_end(self, outputs):
        self.log('train_acc', self.train_acc.compute(), prog_bar=True)
        self.criterion.annealing_step += 1

    def validation_epoch_end(self, outputs):
        self.log('val_acc', self.val_acc.compute(), prog_bar=True)
        self.log('val_loss', np.mean([x[0].detach().cpu().numpy() for x in outputs]), prog_bar=True)
        self.log('val_entropy', torch.cat([x[3] for x in outputs]).mean(), prog_bar=True)
        self.log('val_sigma', torch.cat([x[4] for x in outputs]).mean(), prog_bar=True)

    def test_epoch_end(self, outputs):
        self.log('test_acc', self.test_acc.compute(), prog_bar=True)
        self.log('test_entropy_epi', torch.cat([x[3] for x in outputs]).mean())
        self.log('test_ale', torch.cat([x[4] for x in outputs]).mean())
        self.aleatoric_uncertainties = torch.cat([x[4] for x in outputs]).detach().cpu().numpy()
        self.epistemic_uncertainties = torch.cat([x[3] for x in outputs]).detach().cpu().numpy()

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-2)
        scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.33, patience=5,
                                                               verbose=True)
        return {
            'optimizer': optimizer,
            'lr_scheduler': scheduler,
            'monitor': 'val_loss'
        }

Here the model predicts the class and we also compute the aleatoric and epistemic uncertainties from the Dirichlet parameters.
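
Continuing from the definitions above, here is a small sketch of a forward pass on dummy tensors shaped like our transformed modalities (3x32x32 images, 1x128x128 spectrograms, 768-dimensional text features); the shapes and numbers are illustrative only:

# Dummy batch shaped like the transformed LUMA modalities (illustration only)
image = torch.rand(4, 3, 32, 32)
audio = torch.rand(4, 1, 128, 128)
text = torch.rand(4, 768)

uq_model = DirichletModel(MultimodalClassifier, num_classes=42)
uq_model.eval()
evidence, _ = uq_model((image, audio, text))

alphas = evidence + 1
predictions = alphas.argmax(dim=-1)      # predicted classes
epistemic = 42 / alphas.sum(dim=-1)      # vacuity-style epistemic uncertainty per sample
print(predictions, epistemic)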

Training the Multimodal Model

For training, we just need to define the dataloaders and use the PyTorch Lightning Trainer class.

from sklearn.metrics import roc_auc_score

batch_size = 128
classes = 42
dropout_p = 0.3
train_dataset, val_dataset = torch.utils.data.random_split(train_dataset, [int(0.8 * len(train_dataset)),
                                                                           len(train_dataset) - int(
                                                                               0.8 * len(train_dataset))])

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=8)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=8)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=8)
ood_loader = torch.utils.data.DataLoader(ood_dataset, batch_size=batch_size, shuffle=False, num_workers=8)

# Now we can use the loaders to train a model


model = DirichletModel(MultimodalClassifier, classes, dropout=dropout_p)
trainer = pl.Trainer(max_epochs=300,
                     gpus=1 if torch.cuda.is_available() else 0,
                     callbacks=[pl.callbacks.EarlyStopping(monitor='val_loss', patience=10, mode='min'),
                                pl.callbacks.ModelCheckpoint(monitor='val_loss', mode='min', save_last=True)])
trainer.fit(model, train_loader, val_loader)
print('Testing model')
trainer.test(model, test_loader)
print('Test results:')
print(trainer.callback_metrics)
aleatoric_uncertainties = model.aleatoric_uncertainties
epistemic_uncertainties = model.epistemic_uncertainties
print('Testing OOD')
trainer.test(model, ood_loader)
aleatoric_uncertainties_ood = model.aleatoric_uncertainties
epistemic_uncertainties_ood = model.epistemic_uncertainties
auc_score = roc_auc_score(
    np.concatenate([np.zeros(len(epistemic_uncertainties)), np.ones(len(epistemic_uncertainties_ood))]),
    np.concatenate([epistemic_uncertainties, epistemic_uncertainties_ood]))
print(f'AUC score: {auc_score}')

Here we are logging the classification accuracy, the average uncertainty values and the AUC score for OOD detection.
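
If we want to look beyond a single AUC number, a quick way to inspect the uncertainties is to plot the epistemic-uncertainty histograms of the in-distribution test set against the OOD set (assuming matplotlib is available); well-separated histograms correspond to a high OOD-detection AUC:

import matplotlib.pyplot as plt

plt.hist(epistemic_uncertainties, bins=50, alpha=0.6, density=True, label='test (in-distribution)')
plt.hist(epistemic_uncertainties_ood, bins=50, alpha=0.6, density=True, label='OOD')
plt.xlabel('Epistemic uncertainty')
plt.ylabel('Density')
plt.legend()
plt.savefig('ood_uncertainty_hist.png')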

For training on the noisy versions of the datasets, we just need to change the data paths to noisy data paths.

Training Results

On the clean data (without injecting additional noise), we get the following results:

UQ Results

As we can see, adding noise effectively raises the uncertainty metrics. An interesting research direction, hence, is to adjust the noise levels and observe how the uncertainties change. It is essential not only to build DL models that are robust to these noises, but also to find UQ methods that can reliably indicate when the models are unsure about their predictions.

Acknowledgements

This blog post is based on the code and dataset of LUMA, published within the scope of my PhD thesis at Aix-Marseille University (AMU), CNRS, LIS. I would like to thank my PhD supervisors and paper co-authors Sana Sellami (AMU, CNRS, LIS), Laure Berti-Équille (IRD, ESPACE-DEV), and Sébastien Fournier (AMU, CNRS, LIS).

If you liked this post, please star LUMA on GitHub. We will be happy to hear your thoughts, questions or suggestions in the discussion below.


[1] Bezirganyan, G., Sellami, S., Berti-Équille, L., & Fournier, S. (2024). LUMA: A Benchmark Dataset for Learning from Uncertain and Multimodal Data. arXiv:2406.09864. http://arxiv.org/abs/2406.09864

[2] Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. North American Chapter of the Association for Computational Linguistics.

[3] Sensoy, M., Kandemir, M., & Kaplan, L.M. (2018). Evidential Deep Learning to Quantify Classification Uncertainty. arXiv:1806.01768.

[4] Xu, C., Si, J., Guan, Z., Zhao, W., Wu, Y., & Gao, X. (2024). Reliable Conflictive Multi-View Learning. AAAI Conference on Artificial Intelligence.
