Jump to content

PyTorch/Tensor

50% developed
From Wikibooks, open books for an open world

Tensor

[edit | edit source]

The basic object in PyTorch is tensor. Tensors are similar to numpy matrices with two important additions: they work with CUDA, and they can calculate gradients.

Tensors are created and manipulated similarly to numpy matrices:

>>> a = np.random.rand(10000, 10000).astype(np.float32)
>>> b = np.random.rand(10000, 10000).astype(np.float32)
>>> t = time.time(); c = np.matmul(a, b); time.time()-t
7.447854280471802
>>> a1 = torch.rand(10000, 10000, dtype=torch.float32) # note how torch.rand supports dtype
>>> b1 = torch.rand(10000, 10000, dtype=torch.float32)
>>> t = time.time(); c1 = torch.matmul(a1, b1); time.time()-t
7.758733749389648

All function like np.ones, np.zeros, np.empty and so on, as well as other main functions and arythmeric operators, also present in torch:

   >>> torch.ones(2,2)
   tensor([[1., 1.],
           [1., 1.]])
   >>> torch.ones(2,2, dtype=torch.int32)
   tensor([[1, 1],
           [1, 1]], dtype=torch.int32)
   >>> a=torch.ones(2,2) # or torch.ones((2,2)) which is the same
   >>> b=a+1
   >>> c=a*b
   >>> c.reshape(1,4) # or c.view(1,4) which is the same
   tensor(2., 2., 2., 2.)

For tensors, the function size is a function which returns torch.Size object, rather then a member which is a tuple. It is good, because torch.Size inherits tuple and has some additional operators defined:

>>> a=torch.ones(2,3,4)
>>> a.size()
torch.Size([2, 3, 4])
>>> a.size().numel()
24

The functions sum(), mean() and so on for tensors return not a number but a zero dimensional tensor. Tensor elements are also zero dimensional tensors rather than numbers:

   >>> a = torch.ones(2,2)
   >>> a.sum()
   tensor(4.)
   >>> a.sum().size()
   torch.Size([])
   >>> a.sum().dim() 
   0
   >>> a[0,0]
   tensor(1.)
To convert a zero dimensional tensor to a number, you should explicitly call the function item:
   >>> a.sum().item()
   4.0

Instead of numpy's astype, in torch there is a function to

   >>> a.to(torch.int16)
   tensor([[1, 1],
           [1, 1]], dtype=torch.int16)

The name is changed because the function to can do more than just change element types. It can also move data to and from CUDA, and it works for the wide range of torch datatypes, including neural networks.

Tensors and numpy matrices

[edit | edit source]

Since tensors and numpy matrices are so similar, it would be nice if we could convert them to each other. And we, indeed, can. It is as easy as cake. To convert tensor to matrix, just call numpy method. For the opposite, call torch.tensor constructor:

   >>> a=torch.ones(2,2, dtype=torch.float16)
   >>> a.numpy()
   array([[1., 1.],
          [1., 1.]], dtype=float16)
   >>> b=np.ones((2,2), dtype=np.float16)
   >>> torch.tensor(b)
   tensor([[1., 1.],
           [1., 1.]], dtype=torch.float16)

While you can use PyTorch without CUDA, it accelerates the computations by a factor of 10-20.

Before using CUDA, check whether it is available. Type:

   torch.cuda.is_available()

If it returned False, you may skip the rest of this section.

You may also check the versions of CUDA and cuDNN library:

   >>> torch.version.cuda
   '10.0'
   >>> torch.backends.cudnn.version()
   7401
   >>> torch.backends.cudnn.enabled
   True

Unlike numpy, tensors can be easily moved to and from CUDA memory. In CUDA, you can do almost whatever you can do out of it. If your computer is equipped with CUDA, and you installed the driver (NVIDIA CUDA 10.0 or higher), you can do the following:

cuda = torch.device('cuda')
a = torch.randn(10000, 10000, device=cuda)
b = torch.randn(10000, 10000, device=cuda)
t = time.time(); c = torch.matmul(a, b); print(time.time()-t)

On my computer, the time was 0.4 seconds, which is multiplications per second.

You can easily move tensors to and from CUDA memory with to method

>>> cuda = torch.device('cuda')
>>> cpu = torch.device('cpu')
>>> a = torch.ones(5,5)
>>> b = a.to(cuda) # move to cuda
>>> c = b.to(cpu) # move back to cpu
>>> a.device
device(type='cpu')
>>> b.device
device(type='cuda')
>>> c.device
device(type='cpu')

You cannot mix CUDA and CPU tensors in your expressions:

>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float

Autograd

[edit | edit source]

The autograd module implemented into PyTorch makes calculating gradients via backpropagation a piece of cake. You need to specify the requires_grad parameter ("requires" with -s, "grad" without), and call backward method.

>>> a=torch.ones(2,2, requires_grad=True)
>>> b=torch.eye(2,2, requires_grad=True)
>>> c = a*a*(b+1)
>>> d=c.sum() 
>>> d.backward() # calculate gradients
>>> a.grad # gradient of d with respect to a
tensor([[4., 2.],
        [2., 4.]])
>>> b.grad # gradient of d with respect to b
tensor([[1., 1.],
        [1., 1.]])

In-place operators

[edit | edit source]