# Bias trick in neural networks

### Turning two tensors (weights and biases) into one (weights with biases) for simpler computation in neural networks

The bias trick simplifies the linear operation `y = W * xi + b` so that it no longer requires adding a separate bias term (`b`): the bias is folded into the weights matrix `W`, and we perform only a multiplication instead of a multiplication and an addition.

Instead of keeping a tensor of weights (`W`) and a tensor of biases (`b`), we can append the biases to the tail of the weights tensor and add a `1` (the bias dimension, a constant) to the vector with the training data (`xi`).

We can visualize this in Python:

```
import numpy as np

# Define the weights matrix W
W = np.array([
    [0.2, -0.5, 0.1, 2.0],
    [1.5, 1.3, 2.1, 0.0],
    [0.0, 0.25, 0.2, -0.3],
])

# Define the input vector xi
xi = np.array([56, 231, 24, 2])

# Define the bias vector b
b = np.array([1.1, 3.2, -1.2])

# Combine W and b into a new matrix W_new by appending b as a column
W_new = np.hstack((W, b.reshape(-1, 1)))

# Augment xi with an extra 1 for the bias dimension
xi_augmented = np.append(xi, 1)

# Print the results
print("W:\n", W)
print("\nxi:\n", xi)
print("\nb:\n", b)
print("\nW_new:\n", W_new)
print("\nxi_augmented:\n", xi_augmented)
```

The output is

```
W:
[[ 0.2 -0.5 0.1 2. ]
[ 1.5 1.3 2.1 0. ]
[ 0. 0.25 0.2 -0.3 ]]
xi:
[ 56 231 24 2]
b:
[ 1.1 3.2 -1.2]
W_new:
[[ 0.2 -0.5 0.1 2. 1.1 ]
[ 1.5 1.3 2.1 0. 3.2 ]
[ 0. 0.25 0.2 -0.3 -1.2 ]]
xi_augmented:
[ 56 231 24 2 1]
```

Does this trick really work?

```
>>> original = np.dot(W, xi) + b
>>> original
array([-96.8 , 437.9 , 60.75])
>>> tricky = np.dot(W_new, xi_augmented)
>>> tricky
array([-96.8 , 437.9 , 60.75])
>>> original == tricky
array([ True, True, True])
```

Okay, but why does it work, anyway?

What’s your intuition, before you read the answer (unless you already know)?

…

…

The bias trick works because of how the dot product works.

In a dot product, each output element is computed by taking a row of the `W` matrix and the vector `xi`, multiplying the elements at corresponding positions, and summing the products.

So when we append the bias vector as the last column of the weights matrix, each bias element becomes the last element of one row of `W`.

These last elements are then multiplied by the last element of the new `xi` vector, which is `1`, so each bias is simply added to its row’s sum.

Essentially, we do the same computations, but without having to store the biases in a separate tensor.
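We can check this explanation numerically by splitting the augmented product into its weight part and its bias part (a minimal sketch reusing the tensors from the example above):

```
import numpy as np

W = np.array([
    [0.2, -0.5, 0.1, 2.0],
    [1.5, 1.3, 2.1, 0.0],
    [0.0, 0.25, 0.2, -0.3],
])
b = np.array([1.1, 3.2, -1.2])
xi = np.array([56, 231, 24, 2])

W_new = np.hstack((W, b.reshape(-1, 1)))
xi_augmented = np.append(xi, 1)

# The augmented product decomposes into a weight part and a bias part:
weight_part = np.dot(W_new[:, :-1], xi_augmented[:-1])  # = W @ xi
bias_part = W_new[:, -1] * xi_augmented[-1]             # = b * 1 = b

assert np.allclose(weight_part, np.dot(W, xi))
assert np.allclose(bias_part, b)
assert np.allclose(weight_part + bias_part, np.dot(W_new, xi_augmented))
```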

And that's it!
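As a side note, the same trick extends to a whole batch of inputs: stack the input vectors as rows of a matrix and append a column of ones, so a single matrix multiplication handles both weights and biases for every sample. A sketch with made-up batch data (the batch matrix `X` here is random illustration data, not from the original example):

```
import numpy as np

W = np.array([
    [0.2, -0.5, 0.1, 2.0],
    [1.5, 1.3, 2.1, 0.0],
    [0.0, 0.25, 0.2, -0.3],
])
b = np.array([1.1, 3.2, -1.2])
W_new = np.hstack((W, b.reshape(-1, 1)))

# A hypothetical batch of 5 input vectors, one per row
X = np.random.default_rng(0).uniform(0, 10, size=(5, 4))

# Append a column of ones: each row gets its constant bias dimension
X_augmented = np.hstack((X, np.ones((5, 1))))

batched = X_augmented @ W_new.T   # shape (5, 3), one output row per sample
reference = X @ W.T + b           # broadcasting adds b to every row

assert np.allclose(batched, reference)
```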

Btw I was too lazy to type the sample data for the tensors used here, so I took sample data from https://cs231n.github.io/linear-classify/ and asked ChatGPT to transform the tensors into Python code. CS231n is a great resource on neural networks, worth checking out!

Thanks for reading