Stephen Collins

What is a tensor

Used as the building block of machine learning models, a tensor is an N-dimensional container of data. Two of the most common Python machine learning libraries, PyTorch and TensorFlow, each provide their own object-oriented tensor abstraction used to build PyTorch and TensorFlow models, respectively. In this blog post, we will cover basic tensor operations (in the context of the PyTorch and TensorFlow Python libraries), along with examples of how we use tensors in our own machine learning code here at Crypto Clamor.
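Concretely, the number of dimensions of a tensor (often called its rank) is just how many axes it has. As a minimal sketch in TensorFlow (a standalone illustration, not part of our production code):

import tensorflow as tf

# rank 0: a single scalar value
scalar = tf.constant(4)

# rank 1: a vector (one axis)
vector = tf.constant([1.0, 2.0, 3.0])

# rank 2: a matrix (two axes)
matrix = tf.constant([[1, 2], [3, 4]])

print(scalar.ndim, vector.ndim, matrix.ndim)
# log output:
# 0 1 2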

Basic tensor operations

Tensors in both PyTorch and TensorFlow support a wide variety of operations for use in computations. Many of the most common ones are exposed through operator overloading.

Operator overloading

Tensors in both PyTorch and TensorFlow make heavy use of Python's operator overloading functionality. Operator overloading is the ability of a programming language to redefine what built-in operators (e.g., "+" and "-") do when applied to instances of a class.
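Under the hood, this works through Python's special methods. As a minimal sketch (a toy class, not the actual PyTorch or TensorFlow implementation), a class can overload "+" by defining __add__:

class MiniTensor:
    def __init__(self, values):
        self.values = values

    def __add__(self, other):
        # called when we write mini_tensor + other
        # here "other" is assumed to be a plain number
        return MiniTensor([v + other for v in self.values])

    def __repr__(self):
        return f"MiniTensor({self.values})"

t = MiniTensor([1, 1, 1])
print(t + 1)
# log output:
# MiniTensor([2, 2, 2])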

For example, to "add 1" to every element of a 1-dimensional TensorFlow tensor:

import tensorflow as tf

# initialize a 1-D tensor
rank_1_tensor = tf.constant([1,1,1])

# add "1" to each element in the rank_1_tensor
x = rank_1_tensor + 1
print(x)
# log output:
# tf.Tensor([2 2 2], shape=(3,), dtype=int32)

and in PyTorch:

import torch

# initialize a 1-D tensor
rank_1_tensor = torch.tensor([1,1,1])

# add "1" to each element in the rank_1_tensor
x = rank_1_tensor + 1
print(x)
# log output:
# tensor([2, 2, 2])
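Addition is only one of many overloaded operations. As a quick sketch (using PyTorch here, though TensorFlow tensors support the same operators), element-wise multiplication and matrix multiplication are also available directly through operators:

import torch

a = torch.tensor([[1, 2], [3, 4]])
b = torch.tensor([[10, 20], [30, 40]])

# element-wise multiplication via the overloaded * operator
print(a * b)
# log output:
# tensor([[ 10,  40],
#         [ 90, 160]])

# matrix multiplication via the overloaded @ operator
print(a @ b)
# log output:
# tensor([[ 70, 100],
#         [150, 220]])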

Examples of tensor usage in a machine learning model

Here at Crypto Clamor, one example of tensor usage we can share comes from how we initially fine-tuned our BERT model. We loaded a labeled CSV data file (tweet text with an associated sentiment score) into a TensorFlow dataset. That dataset is an iterable data structure in which each element is a batch, and each batch is a tuple of tensors.

import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification, InputFeatures
import pandas as pd
import matplotlib.pyplot as plt
import tarfile
import os

# path to the labeled tweet/sentiment CSV file
CSV_PATH = './labeled_model_data.csv'

NUM_EPOCHS = 10
DATASET_SIZE = 2500
BATCH_SIZE = 25
AUTOTUNE = tf.data.experimental.AUTOTUNE

# build a batched tf.data.Dataset directly from the CSV file
dataset = tf.data.experimental.make_csv_dataset(
    CSV_PATH,
    batch_size=BATCH_SIZE,
    column_names=['score', 'timestamp', 'datestring', 'N/A', 'user', 'tweet'],
    label_name='score',                  # the 'score' column becomes the label tensor
    select_columns=['score', 'tweet'],   # only keep these columns as features/labels
    num_epochs=NUM_EPOCHS,
    header=False,                        # the CSV file has no header row
    shuffle_seed=0,
    shuffle=True,
    num_rows_for_inference=1600000,      # rows sampled to infer column types
    ignore_errors=True).prefetch(AUTOTUNE)

Here, we are creating a TensorFlow dataset using the experimental API's make_csv_dataset function. The object stored in our dataset variable is a TensorFlow Dataset in which each element is a batch (of size batch_size). Each element (that is, each batch) is a tuple (features, labels), where features holds the batch's feature data as tensors (keyed by column name) and labels is a tensor containing the batch's corresponding label data.
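To make the (features, labels) structure concrete, one way to inspect a single batch might look roughly like this (a sketch built on the dataset variable above; take(1) pulls one batch):

# peek at one batch from the dataset defined above
for features, labels in dataset.take(1):
    # features maps each selected column name to a batch-sized tensor
    print(features['tweet'].shape)   # (BATCH_SIZE,) tensor of tweet strings
    print(labels.shape)              # (BATCH_SIZE,) tensor of sentiment scores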

Conclusion

This blog post has barely scratched the surface of what tensors are and how they are used to build machine learning models. Hopefully you've learned a thing or two about tensors along the way.

Questions or comments? Connect with us on Twitter, LinkedIn or Facebook!
