Open Source software: PyTorch features and how to use them

Make deep learning easier with these PyTorch features, tips and best practices.

PyTorch is an open-source machine learning framework that enables developers to create and train neural networks, accelerating the path from research to production. As an industry-leading technology company, Capital One values technologies like PyTorch for their ability to drive innovation, improve efficiency and enhance customer experiences.

What is PyTorch?

PyTorch is the brainchild of Facebook's AI Research (FAIR) lab, designed to make the development of deep learning models effortless and efficient. Its flexibility and high performance stem from the way it harnesses the power of GPUs and its dynamic computation graph. PyTorch has gained popularity for its intuitive interface, ease of debugging and seamless integration within the wider Python ecosystem.

The PyTorch machine learning framework accelerates the development process, enabling rapid prototyping and iteration. This agility is crucial in competitive fields that demand constant innovation. It also has strong community support and a robust library of pre-trained models and tools. Coupled with its efficient scaling and extensive integrations with existing systems, PyTorch is an excellent choice for a range of machine learning applications, from fraud detection to personalized customer experiences.


Tensors

Tensors are multidimensional arrays that provide the foundation for data representation and manipulation in PyTorch. Understanding the nature of tensors, their relationship with NumPy arrays, and the various tensor operations that are available will help you handle data more efficiently in deep learning tasks.

What are tensors and how do they differ from NumPy arrays?

Tensors are the primary building blocks of PyTorch. They're multidimensional arrays that enable you to perform mathematical operations on the contained data. However, there are a few key differences when comparing tensors and NumPy arrays.

For starters, tensors are better at leveraging GPU capabilities for faster calculations, which has considerable performance impacts. However, the biggest difference is that they support automatic differentiation—a must-have for training deep learning models.
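
As a quick illustration, the snippet below creates a tensor alongside a NumPy array, moves it to a GPU when one is available and enables gradient tracking; the values are arbitrary placeholders.

    import numpy as np
    import torch

    # A NumPy array and an equivalent PyTorch tensor
    arr = np.array([1.0, 2.0, 3.0])
    t = torch.tensor([1.0, 2.0, 3.0])

    # Tensors can be moved to a GPU when one is available
    if torch.cuda.is_available():
        t = t.to("cuda")

    # Tensors can track gradients; NumPy arrays cannot
    t_grad = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

    # Conversion back to NumPy is straightforward (CPU tensors only)
    back_to_numpy = t_grad.detach().numpy()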

Tensor operations and mathematical functions available in PyTorch

PyTorch comes with a comprehensive suite of tensor operations and mathematical functions, including element-wise operations, linear algebra functions and reductions. These operations form the building blocks of a model's computations, and because PyTorch can track them for automatic differentiation, they also feed into the optimization process that adjusts the model's parameters to minimize the loss function.
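
The short example below sketches a few of these operations on small, arbitrary tensors: element-wise arithmetic, a matrix multiplication and a couple of reductions.

    import torch

    a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
    b = torch.tensor([[5.0, 6.0], [7.0, 8.0]])

    # Element-wise operations
    elementwise_sum = a + b
    elementwise_product = a * b

    # Linear algebra: matrix multiplication
    matmul = a @ b

    # Reductions collapse a tensor to fewer dimensions
    total = a.sum()
    column_means = a.mean(dim=0)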

Gradients

In the realm of deep learning, gradients are indispensable for optimizing model parameters. The process of determining gradients helps identify how to update parameters to minimize the loss function. This is the essence of gradient descent, the engine propelling deep learning model training.

What are gradients and why are they important?

In deep learning, gradients are partial derivatives of a loss function relative to a model's parameters. The loss function measures the discrepancy between the model’s predictions and the ground truth labels for a given dataset. The goal is to minimize this loss function so the model learns to make predictions with higher accuracy.

Gradients are helpful in this context because they indicate the direction and magnitude of the changes necessary to minimize the loss function and improve accuracy. In essence, a gradient demonstrates how sensitive the loss function is to small parameter changes. If the gradient is positive for a specific parameter, increasing the parameter will increase the loss function, whereas decreasing the parameter will decrease the loss function. If the gradient is negative, the results of these same adjustments are inverted.

How to compute and use gradients in PyTorch

In machine learning, the iterative process of training a model involves repeatedly adjusting the model's parameters based on the input data and the desired output, in order to minimize a loss function. Backpropagation, short for backward propagation of errors, is an algorithm commonly used to compute the gradients of the loss function. Once gradients are calculated, an optimization algorithm such as gradient descent or one of its variants is leveraged to update the model's parameters. This process continues until a predefined threshold is met, such as reaching a certain number of epochs or level of validation loss. 
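
The sketch below shows one way this loop can look in PyTorch for a toy linear model; the data, learning rate and number of epochs are illustrative placeholders rather than recommended settings.

    import torch

    # Toy data and a single learnable parameter (illustrative values)
    x = torch.tensor([1.0, 2.0, 3.0])
    y_true = torch.tensor([2.0, 4.0, 6.0])
    w = torch.tensor(0.0, requires_grad=True)
    learning_rate = 0.1

    for epoch in range(20):
        y_pred = w * x                            # forward pass
        loss = ((y_pred - y_true) ** 2).mean()    # mean squared error
        loss.backward()                           # backpropagation computes w.grad
        with torch.no_grad():
            w -= learning_rate * w.grad           # gradient descent update
        w.grad.zero_()                            # clear gradients for the next step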

Autograd

Autograd streamlines the process of computing gradients for tensor operations, simplifying the training of deep learning models. By tracking tensor operations, autograd allows developers to focus on designing and implementing deep learning models without the need to manually calculate gradients.

PyTorch's automatic differentiation engine

In contrast to other deep learning frameworks that use static computation graphs, PyTorch's autograd feature builds a dynamic computation graph during the forward pass. This means that the graph is constructed on-the-fly as you perform tensor operations, which allows for more flexibility and ease of use, especially when working with variable-length inputs or complex control structures.

How to use autograd to compute gradients

To utilize autograd for gradient computation in PyTorch, create a tensor and set the requires_grad attribute to True. This informs PyTorch to track the tensor’s operations.

    import torch

    x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

Next, define a function involving the tensor and compute its value so that autograd can monitor the tensor operations, then call the backward() method on the output scalar to compute gradients.

    y = x * 2
    z = y.mean()
    z.backward()

Finally, the computed gradients are accessible via the grad attribute of the input tensor:

    print(x.grad)
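
Because z is the mean of 2x over three elements, each partial derivative is 2/3, so this prints tensor([0.6667, 0.6667, 0.6667]).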
  

For more details, explore the official PyTorch autograd tutorial.

Neural networks

Neural networks are computational models inspired by the structure and function of biological neural networks. They’re designed to recognize patterns and perform complex tasks by analyzing and learning from raw data. With PyTorch, building and training neural networks is easy.

Creating and training neural networks with PyTorch

In PyTorch, neural networks are defined using the torch.nn module, which provides pre-built layers and other utilities for building complex neural networks. To define a network, create a class that inherits from nn.Module and implement the forward method for the desired computation. You can use the following code to get started:

    import torch.nn as nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 20, 5)
            self.conv2 = nn.Conv2d(20, 20, 5)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            return F.relu(self.conv2(x))

To train a network using PyTorch, define a loss function and optimizer, and then iteratively update the model parameters using gradient descent.
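
A minimal sketch of such a training loop is shown below, using a stand-in nn.Linear model, cross-entropy loss, the SGD optimizer and random placeholder data in place of a real DataLoader.

    import torch
    import torch.nn as nn

    # Illustrative model: a single fully connected layer mapping 10 features to 2 classes
    model = nn.Linear(10, 2)
    criterion = nn.CrossEntropyLoss()            # loss function
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Placeholder data standing in for a real DataLoader
    inputs = torch.randn(16, 10)
    targets = torch.randint(0, 2, (16,))

    for epoch in range(5):
        optimizer.zero_grad()        # clear gradients from the previous step
        outputs = model(inputs)      # forward pass
        loss = criterion(outputs, targets)
        loss.backward()              # backpropagation
        optimizer.step()             # gradient descent update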

Modules and layers in PyTorch for building neural networks

The torch.nn module offers a variety of pre-built layers and activation functions that cover most of the functionality needed to build and train deep learning models.

A few of the more common modules and layers are:

  • Linear layer using nn.Linear

    Represents a fully connected layer (also known as a dense layer) that applies a linear transformation to the input data
  • Convolution layer using nn.Conv2d

    Represents a 2D convolutional layer that applies a convolution operation on input data, typically used for processing images or other grid-like data
  • Rectified linear unit using nn.ReLU

    Applies the element-wise ReLU function, defined as ReLU(x) = max(0, x)
  • Hyperbolic tangent using nn.Tanh

    Applies the element-wise hyperbolic tangent function defined as tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

Developers can combine these layers and functions in a modular fashion to create complex architectures. Additionally, nn.Module can be subclassed to define custom computations in the forward method.
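
As a rough sketch of this modular style, the example below chains a few of the layers listed above with nn.Sequential; the layer sizes and input shape are arbitrary.

    import torch
    import torch.nn as nn

    # Compose pre-built layers into a small model (arbitrary sizes)
    model = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Flatten(),
        nn.Linear(16 * 28 * 28, 10),
        nn.Tanh(),
    )

    # A batch of four 28x28 single-channel images
    dummy_input = torch.randn(4, 1, 28, 28)
    output = model(dummy_input)   # shape: (4, 10)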

Data loading and processing

Efficient data loading and processing is vital for deep learning projects. PyTorch provides valuable utilities and tools for handling various data types.

PyTorch’s utilities for loading and preprocessing data

To load and preprocess data, PyTorch has the torchvision package as well as the DataLoader class. The former contains utilities for handling image data and includes popular datasets, transformations and models, while the latter offers a convenient way to load data in parallel.
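
For example, the snippet below loads the CIFAR-10 dataset from torchvision with a simple transform and wraps it in a DataLoader; the batch size, normalization values and worker count are arbitrary choices, and download=True fetches the data on first use.

    import torchvision
    import torchvision.transforms as transforms
    from torch.utils.data import DataLoader

    # Convert images to tensors and normalize them
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
    ])

    # Download CIFAR-10 and wrap it in a DataLoader for batched, shuffled loading
    train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                             download=True, transform=transform)
    train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=2)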

How to create custom datasets and data loaders in PyTorch

Defining a custom dataset allows you to load and preprocess data from various sources, such as CSV files or image directories, and use it for training and testing machine learning models. To create a custom dataset from your data, create a class that inherits from torch.utils.data.Dataset. This class represents the custom dataset and must define the __len__ and __getitem__ methods, shown in the example below. You can then use the torch.utils.data.DataLoader class for loading.

    from torch.utils.data import Dataset, DataLoader

    class CustomDataset(Dataset):
        def __init__(self, data, labels, transform=None):
            self.data = data
            self.labels = labels
            self.transform = transform

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            sample = self.data[idx]
            label = self.labels[idx]
            if self.transform:
                sample = self.transform(sample)
            return sample, label

In this example, the custom dataset class takes two main arguments: data and labels, which are the input data and the corresponding labels, respectively. The optional transform argument can be applied to the data before it is returned by the __getitem__() method.

The __init__() method initializes the class by storing the input data, labels and optional transform. The __len__() method returns the length of the dataset, which is the number of samples in the data. The __getitem__() method retrieves the sample and its corresponding label at the given index, applies any transform specified by self.transform to the sample, and returns the transformed sample along with its label.

Next, create an instance of the custom dataset by providing the data and labels. The torchvision.transforms module provides any necessary data transformations.

    data = torch.randn(100, 3, 32, 32)
    labels = torch.randint(0, 10, (100,))
    custom_dataset = CustomDataset(data, labels)

Then, create a data loader by instantiating the torch.utils.data.DataLoader class and passing the custom dataset to it. You can also specify options such as the batch size, whether to shuffle and the number of worker processes.

    from torch.utils.data import DataLoader

    batch_size = 8
    shuffle = True
    num_workers = 2
    data_loader = DataLoader(custom_dataset, batch_size=batch_size,
                             shuffle=shuffle, num_workers=num_workers)

Keep in mind that this example assumes the data and labels are already in memory. In practice, you may also need to read and preprocess data from other sources, such as images, text or audio, which is typically handled inside the __getitem__() method of the Dataset class.
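
Once the data loader exists, batches can be consumed in a loop like the brief sketch below, which assumes the custom_dataset and data_loader defined above.

    for batch_idx, (samples, batch_labels) in enumerate(data_loader):
        # Each batch is a tuple of stacked samples and labels
        print(batch_idx, samples.shape, batch_labels.shape)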

Visualization

Visualization is invaluable for understanding data and model architectures as well as for following training progress. It's an essential tool in any data science practitioner's tool belt.

Tools in PyTorch for visualizing data and models

PyTorch doesn't ship a plotting library of its own, but it integrates seamlessly with popular visualization tools such as Matplotlib and TensorBoard (the latter through the torch.utils.tensorboard module). Matplotlib is a versatile plotting library for Python, while TensorBoard is a powerful visualization tool designed specifically for deep learning.

How to visualize training progress and results using PyTorch

To visualize training progress with PyTorch, simply log relevant metrics, such as loss and accuracy, at each iteration or epoch, and then, use TensorBoard or Matplotlib to plot metrics over time.

The TensorBoard tutorial in the PyTorch documentation, which builds on the 60 Minute Blitz, is an easy-to-follow guide to the integration. Meanwhile, integrating PyTorch with Matplotlib is also straightforward, since tensors are easily converted to NumPy arrays.
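
As a rough sketch, the snippet below logs a per-epoch loss value with torch.utils.tensorboard.SummaryWriter (which requires the tensorboard package to be installed); the loss values and log directory are placeholders.

    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="runs/experiment_1")

    # In a real training loop these values would come from the model
    for epoch, loss_value in enumerate([0.9, 0.6, 0.4, 0.3]):
        writer.add_scalar("Loss/train", loss_value, epoch)

    writer.close()
    # Then run `tensorboard --logdir runs` to view the curves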

Distributed training

Distributed training, essential for handling large-scale deep learning projects, leverages PyTorch's capabilities to train models across multiple GPUs or machines.

PyTorch's distributed training capabilities

PyTorch supports distributed training to harness the power of multiple GPUs or machines, significantly reducing training time. The torch.distributed package provides a variety of communication backends, such as gloo, nccl and mpi, enabling seamless distributed training across diverse hardware setups.

How to use PyTorch to train models on multiple GPUs or on a cluster

To train models using multiple GPUs, wrap the model with nn.DataParallel or nn.parallel.DistributedDataParallel. For cluster training, initialize the distributed process group with torch.distributed.init_process_group() and specify the desired backend.

This also requires passing a DistributedSampler to the DataLoader for data partitioning, which ensures that each worker processes a unique subset of the data. Lastly, adjust the training loop to synchronize gradients and update model parameters across all workers.
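
The condensed sketch below shows how these pieces fit together for single-machine, multi-GPU training with DistributedDataParallel, assuming the script is launched with torchrun so that environment variables such as LOCAL_RANK are set; the model and dataset are placeholders.

    import os
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

    # Initialize the process group (nccl backend for GPUs)
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and dataset
    model = nn.Linear(10, 2).to(local_rank)
    model = DDP(model, device_ids=[local_rank])
    dataset = TensorDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))

    # DistributedSampler gives each process a unique shard of the data
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)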

Deployment

Once a model is developed and trained, the final step is deploying it to a production environment and integrating it with other tools so it can work with new data and make predictions.

Options for deploying PyTorch models in production

Deploying PyTorch models can be accomplished in various ways; PyTorch's built-in torch.jit module is one option. It compiles models to TorchScript, a statically typed subset of Python that offers efficient execution in production environments. You can also deploy models using cloud platforms, such as AWS SageMaker, Google Cloud AI Platform or Microsoft Azure ML.
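
A brief sketch of the TorchScript route is shown below, using a simple linear model as a stand-in for a trained network and a hypothetical file name.

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)          # stand-in for a trained model
    model.eval()

    # Trace the model with an example input to produce a TorchScript module
    example_input = torch.randn(1, 10)
    scripted = torch.jit.trace(model, example_input)

    # Save the compiled model; it can be loaded later without the original Python code
    scripted.save("model_traced.pt")
    loaded = torch.jit.load("model_traced.pt")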

Converting PyTorch models to other formats for deployment

In some cases, it's necessary to convert PyTorch models to other formats for deployment. The Open Neural Network Exchange (ONNX) is a popular choice, since it provides interoperability with other frameworks and inference engines, such as TensorFlow or TensorRT. To ready a PyTorch model for this approach, use the torch.onnx.export() function.
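
The sketch below shows a minimal export, again with a simple linear model standing in for a trained network and a hypothetical file name.

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)               # stand-in for a trained model
    model.eval()

    dummy_input = torch.randn(1, 10)       # example input defining the input shape

    # Export to the ONNX format for use with other runtimes
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",
        input_names=["input"],
        output_names=["output"],
    )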

Aside from deployment and conversion options, deploying a neural network model involves several other considerations and optimizations, such as serving, hardware acceleration, monitoring, versioning and security. Follow best practices and consider factors such as performance, resource usage and scalability to ensure success.

Embracing the power of PyTorch

From its expressive tensors and autograd engine to distributed training and deployment capabilities, PyTorch offers a versatile and powerful ecosystem for deep learning practitioners. Its dynamic computation graph, user-friendly interface and seamless integration with the Python world have propelled it to the forefront of AI research and development.

Explore tech careers at Capital One

As PyTorch continues to evolve, its rich feature set and thriving community support should drive groundbreaking innovations across various industries. If you’re excited about harnessing the power of AI and deep learning to revolutionize the financial industry, consider exploring the wide range of tech careers available at Capital One. With Capital One Tech supporting your growth, you can unlock the full potential of using tools like PyTorch to shape the future of machine learning.


