PyTorch/Tensor
Tensor
The basic object in PyTorch is the tensor. Tensors are similar to numpy arrays, with two important additions: they can be stored and processed on CUDA devices, and they can calculate gradients.
Tensors are created and manipulated much like numpy arrays:
>>> import time
>>> import numpy as np
>>> import torch
>>> a = np.random.rand(10000, 10000).astype(np.float32)
>>> b = np.random.rand(10000, 10000).astype(np.float32)
>>> t = time.time(); c = np.matmul(a, b); time.time()-t
7.447854280471802

>>> a1 = torch.rand(10000, 10000, dtype=torch.float32)  # note how torch.rand supports dtype
>>> b1 = torch.rand(10000, 10000, dtype=torch.float32)
>>> t = time.time(); c1 = torch.matmul(a1, b1); time.time()-t
7.758733749389648
Creation functions such as np.ones, np.zeros and np.empty, as well as the other main functions and the arithmetic operators, are also present in torch:
>>> torch.ones(2,2)
tensor([[1., 1.],
        [1., 1.]])
>>> torch.ones(2,2, dtype=torch.int32)
tensor([[1, 1],
        [1, 1]], dtype=torch.int32)
>>> a=torch.ones(2,2)  # or torch.ones((2,2)) which is the same
>>> b=a+1
>>> c=a*b
>>> c.reshape(1,4)  # or c.view(1,4), which gives the same result here
tensor([[2., 2., 2., 2.]])
For tensors, size is a method that returns a torch.Size object, rather than an attribute that holds a tuple. This is convenient, because torch.Size inherits from tuple and defines some additional methods:
>>> a=torch.ones(2,3,4)
>>> a.size()
torch.Size([2, 3, 4])
>>> a.size().numel()
24
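Because torch.Size is a tuple subclass, the usual tuple operations work on it. The following is a minimal sketch of that behaviour; the variable names are chosen only for illustration. It also uses the shape attribute and the size(dim) form, which the text above does not mention.

```python
import torch

a = torch.ones(2, 3, 4)

# a.shape is an attribute with the same value that a.size() returns.
same = a.shape == a.size()

# torch.Size is a tuple subclass, so isinstance checks and unpacking work as usual.
is_tuple = isinstance(a.size(), tuple)
rows, cols, depth = a.size()

# size(dim) returns a plain Python int for a single dimension.
last = a.size(-1)

print(same, is_tuple, rows, cols, depth, last)
```

Code written for numpy that reads a.shape therefore tends to keep working with tensors.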
Functions such as sum() and mean() return a zero-dimensional tensor rather than a number. Indexing a single element also yields a zero-dimensional tensor rather than a number:
>>> a = torch.ones(2,2)
>>> a.sum()
tensor(4.)
>>> a.sum().size()
torch.Size([])
>>> a.sum().dim()
0
>>> a[0,0]
tensor(1.)
To convert a zero-dimensional tensor to a number, call the item method explicitly:
>>> a.sum().item()
4.0
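As a small sketch of the conversion described above, item can be applied both to a reduction result and to a single indexed element. The tolist method, which is not mentioned in the text, is shown as well because it performs the same conversion for tensors of any shape.

```python
import torch

a = torch.ones(2, 2)

total = a.sum().item()   # zero-dimensional tensor -> Python float
first = a[0, 0].item()   # a single element is converted the same way
nested = a.tolist()      # whole tensor -> nested Python lists

print(type(total).__name__, total, first, nested)
```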
Instead of numpy's astype, torch provides the method to:
>>> a.to(torch.int16)
tensor([[1, 1],
        [1, 1]], dtype=torch.int16)
The name is different because to can do more than just change element types. It can also move data to and from CUDA, and it works on a wide range of PyTorch objects, including neural network modules.
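The different uses of to can be sketched as follows. This is an illustration rather than a complete reference: the layer is a hypothetical example module, and the CPU device is used so that the code runs on any machine.

```python
import torch
import torch.nn as nn

a = torch.ones(2, 2)

# Change only the element type.
a16 = a.to(torch.int16)

# Set the device and the element type in one call.
a64 = a.to(torch.device('cpu'), dtype=torch.float64)

# nn.Module has a to method as well; it converts the module's parameters.
layer = nn.Linear(2, 2)          # hypothetical example layer
layer = layer.to(torch.float64)

print(a16.dtype, a64.dtype, layer.weight.dtype)
```

Note that a itself is not modified: for tensors, to returns a new tensor when a conversion is needed.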
Tensors and numpy arrays
Since tensors and numpy arrays are so similar, it is natural to want to convert between them, and this is easy to do. To convert a tensor to an array, call its numpy method. For the opposite direction, call the torch.tensor constructor:
>>> a=torch.ones(2,2, dtype=torch.float16)
>>> a.numpy()
array([[1., 1.],
       [1., 1.]], dtype=float16)
>>> b=np.ones((2,2), dtype=np.float16)
>>> torch.tensor(b)
tensor([[1., 1.],
        [1., 1.]], dtype=torch.float16)
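One detail worth knowing when converting is whether the data is copied. The sketch below compares torch.tensor, which copies, with torch.from_numpy, which is not mentioned above and shares memory with the array. The numpy method of a CPU tensor also shares memory with it.

```python
import numpy as np
import torch

b = np.ones((2, 2), dtype=np.float32)

copied = torch.tensor(b)       # copies the data
shared = torch.from_numpy(b)   # shares memory with b

b[0, 0] = 5.0                  # modify the numpy array in place

copied_val = copied[0, 0].item()   # still 1.0, the copy is independent
shared_val = shared[0, 0].item()   # 5.0, the change is visible

# Tensor.numpy() on a CPU tensor returns an array backed by the same memory.
t = torch.zeros(2)
view = t.numpy()
t[0] = 3.0

print(copied_val, shared_val, view[0])
```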
CUDA
PyTorch works without CUDA, but running on a CUDA-capable GPU can accelerate computations considerably, often by a factor of 10-20.
Before using CUDA, check whether it is available. Type:
torch.cuda.is_available()
If it returns False, you may skip the rest of this section.
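A common way to use this check, sketched here under the assumption that the same code should run with or without a GPU, is to select the device once and pass it wherever tensors are created:

```python
import torch

# Use CUDA when it is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

x = torch.ones(3, 3, device=device)
y = (x @ x).sum().item()   # every entry of x @ x is 3, so the sum is 27

print(device.type, y)
```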
You may also check the versions of the CUDA and cuDNN libraries:
>>> torch.version.cuda
'10.0'
>>> torch.backends.cudnn.version()
7401
>>> torch.backends.cudnn.enabled
True
Unlike numpy arrays, tensors can easily be moved to and from CUDA memory, and almost everything you can do with a tensor on the CPU you can also do on CUDA. If your computer has a CUDA-capable GPU and you have installed the driver (NVIDIA CUDA 10.0 or higher), you can do the following:
cuda = torch.device('cuda')
a = torch.randn(10000, 10000, device=cuda)
b = torch.randn(10000, 10000, device=cuda)
t = time.time(); c = torch.matmul(a, b); print(time.time()-t)
On my computer, the time was 0.4 seconds. Since the product of two 10000 × 10000 matrices takes 10^12 multiplications, that is about 2.5 × 10^12 multiplications per second.
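CUDA operations are launched asynchronously, so a timer that stops right after the call may not include the whole computation. The sketch below is one way to time a matrix product more carefully. The helper timed_matmul is a hypothetical name introduced for illustration, and the matrix size is reduced so that the code also finishes quickly on a CPU.

```python
import time
import torch

def timed_matmul(n, device):
    """Return the seconds taken by one n x n matrix product on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device.type == 'cuda':
        torch.cuda.synchronize()   # finish pending work before starting the clock
    start = time.perf_counter()
    torch.matmul(a, b)
    if device.type == 'cuda':
        torch.cuda.synchronize()   # wait until the product has actually been computed
    return time.perf_counter() - start

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
timed_matmul(256, device)              # warm-up run, result discarded
elapsed = timed_matmul(1024, device)   # reduced size, an assumption for quick runs
print(f'{elapsed:.4f} s on {device.type}')
```

The warm-up call is there because the first CUDA operation in a process usually includes one-time initialization.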
You can easily move tensors to and from CUDA memory with the to method:
>>> cuda = torch.device('cuda')
>>> cpu = torch.device('cpu')
>>> a = torch.ones(5,5)
>>> b = a.to(cuda)  # move to cuda
>>> c = b.to(cpu)  # move back to cpu
>>> a.device
device(type='cpu')
>>> b.device
device(type='cuda', index=0)
>>> c.device
device(type='cpu')
You cannot mix CUDA and CPU tensors in your expressions:
>>> a+b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: expected backend CPU and dtype Float but got backend CUDA and dtype Float
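The usual remedy for this error is to move one operand to the device of the other before combining them. A minimal sketch, which falls back to the CPU when CUDA is not available so that it runs anywhere:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

a = torch.ones(5, 5)                  # created on the CPU
b = torch.ones(5, 5, device=device)   # created on the selected device

# Move a to wherever b lives, then add.
total = a.to(b.device) + b

print(total.device.type, total.sum().item())
```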
Autograd
The autograd module built into PyTorch makes calculating gradients via backpropagation straightforward. Set the requires_grad parameter ("requires" with an s, "grad" without) when creating a tensor, and call the backward method on the result:
>>> a=torch.ones(2,2, requires_grad=True)
>>> b=torch.eye(2,2, requires_grad=True)
>>> c = a*a*(b+1)
>>> d=c.sum()
>>> d.backward()  # calculate gradients
>>> a.grad  # gradient of d with respect to a
tensor([[4., 2.],
        [2., 4.]])
>>> b.grad  # gradient of d with respect to b
tensor([[1., 1.],
        [1., 1.]])
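The gradients above can be checked by hand. Since d is the sum of a*a*(b+1) over all elements, the derivative with respect to a is 2*a*(b+1) and the derivative with respect to b is a*a. The sketch below repeats the computation and compares the autograd results with these formulas; the names expected_a and expected_b are introduced only for this check.

```python
import torch

a = torch.ones(2, 2, requires_grad=True)
b = torch.eye(2, 2, requires_grad=True)

d = (a * a * (b + 1)).sum()
d.backward()   # fills a.grad and b.grad

# Analytic derivatives, computed without recording them for autograd.
with torch.no_grad():
    expected_a = 2 * a * (b + 1)   # dd/da
    expected_b = a * a             # dd/db

ok_a = torch.allclose(a.grad, expected_a)
ok_b = torch.allclose(b.grad, expected_b)
print(ok_a, ok_b)
```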