# So you want to code a convolution in Python - a bite-sized intro to image processing with kernels

### Implementing 🌸 tinyconv with Kirby for fun and learning

Learning by doing is simply fun, so let’s play with convolutions a bit today

This is what you will build:

## Theoretical background

Grant Sanderson from 3Blue1Brown created an excellent video about convolution, so if you want to refresh the theory behind convolutions, this is the go-to resource:

## Let’s start!

Images are represented as pixels, and each pixel carries information about its color. That color is described by three parts (channels) - red, green, and blue (that’s why it’s called RGB). Each channel is a number in the range 0 to 255

For instance, this color is represented by 240 for the red channel, 128 for green, and 128 for blue

And this one consists of 60 red, 179 green, and 113 blue

So first things first, we want to represent an image as a 3-dimensional matrix (a table consisting of rows and columns), where each element (a cell of the table) contains all three color values that make up a pixel
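To make this concrete, here is a minimal sketch of such a 3-dimensional array - a hypothetical 2x2 image with hand-picked colors, not related to the image we’ll use below:

```python
import numpy as np

# A hypothetical 2x2 image: 2 rows x 2 columns x 3 channels (R, G, B)
tiny_image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],      # a red pixel and a green pixel
        [[0, 0, 255], [255, 255, 255]],  # a blue pixel and a white pixel
    ],
    dtype=np.uint8,
)

print(tiny_image.shape)  # (2, 2, 3) - rows, columns, channels
```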

We need some image to experiment with, so here’s Kirby - right-click and save this cute pixel art by alessamole12345 and put it next to your Python script, so you can use the image name as a `source_image_path`


To manipulate images in our code, let’s use NumPy arrays and Pillow:

```
import numpy as np
from PIL import Image
image = np.array(Image.open(source_image_path).convert("RGB"))
```

By the way - the full code is available for free on GitHub:

https://github.com/jmaczan/tinyconv

So if something is not clear enough in the chunks of code in this post, you can jump directly into the repository and check the complete implementation yourself!

Let’s check if the image is what we expect it to be:

```
>>> print(image.ndim, image.shape, image.size)
3 (128, 128, 3) 49152
```

This looks good - our image size is 128x128 pixels and it consists of 3 color values per pixel

A quick look at the first pixel of our image confirms that:

```
>>> image[0][0]
array([153, 217, 234], dtype=uint8)
```

Having a representation of our image in the code, we can move to coding a kernel

A kernel is a 2-dimensional matrix. The values it holds depend on what effect we want to achieve by applying the convolution to our image

Wikipedia has a list of sample kernels, so let’s begin with one of them called **Box blur**

The box blur kernel consists of a single value repeated across the matrix, and all the values have to sum to 1

Interestingly, kernels almost always have odd dimensions (3x3, 5x5, and so on). I think it’s because with an even side length, the kernel wouldn’t have a single central element, so a computation based on the surroundings of one pixel wouldn’t be possible
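A quick sanity check of both properties, as a sketch:

```python
import numpy as np

kernel = np.full((3, 3), 1 / 9)  # box blur: one value repeated

# All values sum to (approximately) 1, so overall brightness is preserved
assert np.isclose(kernel.sum(), 1.0)

# An odd side length gives a single central element at index size // 2
center = kernel.shape[0] // 2
print(center)  # 1 - the middle row/column of a 3x3 kernel
```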

Implementing this with NumPy is [unironically] a pleasure as always:

`kernel = np.full(9, 1 / 9).reshape(3, 3)`

What we did here is initialize a vector of 9 elements, each of value 1/9. Then we reshaped the vector into a 2D matrix consisting of 3 rows with 3 elements each

Let’s just see if nothing is messed up here:

```
>>> print(kernel)
[[0.11111111 0.11111111 0.11111111]
 [0.11111111 0.11111111 0.11111111]
 [0.11111111 0.11111111 0.11111111]]
```

Now that we have both an image and a kernel defined in our code, let’s go straight to coding the convolution

### Convolution implementation

We are going to implement the convolution by traversing the image pixel by pixel.

To compute a new value for each pixel, we extract a slice of the image of the same size as the kernel

Then we do the computation - the sum of the element-wise multiplication of the slice and the kernel.

The result of the computation is the new value of the pixel, which we store in the output image

```
def tinyconv(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    image_height, image_width, rgb_channels = image.shape
```

Type annotations for parameters and the return value are not obligatory in Python, but I prefer to add them, because they make it easier to catch bugs

We extract the size of the image from the `shape` property

Also, knowing the size of the kernel, we can figure out how many surrounding pixels we need to take into account when convolving a pixel

```
kernel_width = int((kernel.shape[0] - 1) / 2)
kernel_height = int((kernel.shape[1] - 1) / 2)
```

My friend Piotr asked me whether `kernel.shape[0] // 2` wouldn’t be enough, and it turned out that he’s right, so we can simplify our code

```
kernel_width = kernel.shape[0] // 2
kernel_height = kernel.shape[1] // 2
```

Now let’s copy the image, so we don’t modify the original one:

`output = np.copy(image)`

In a convolution, you can often meet the concept of padding. Padding means adding new values around your image, so when the kernel moves close to the edges, it doesn’t cover the area outside the image (it doesn’t try to refer to indices outside the matrix), which would likely blow up your program

But we won’t do padding. Instead, we narrow down the area we cover with the kernel.

The result is nearly identical, except that a one-pixel border around the image won’t be convolved - a small price for doing this our own way

To traverse the image, we will use two nested `for` loops

```
for vertical_index in range(kernel_height, image_height - kernel_height):
    for horizontal_index in range(kernel_width, image_width - kernel_width):
```

As you can see, we omit some pixels by not starting at index 0 and not ending at the index of the last pixel. We can think of it as padding inside the image itself

Again, this is not the standard way - if you want to do it right, check out some additional resources

Inside the loops, let’s extract the slice of the image corresponding to the pixel we compute the value for. This lets us multiply the slice by the kernel and then sum the elements to get our new output pixel

```
pixel_surrounding = image[
    vertical_index - kernel_height : vertical_index + kernel_height + 1,
    horizontal_index - kernel_width : horizontal_index + kernel_width + 1,
]
```

Let’s compute the output pixel now:

```
output[vertical_index, horizontal_index] = np.sum(
    pixel_surrounding * kernel
)
```

Since we use NumPy ndarrays instead of plain Python lists, multiplying two matrices element-wise is as simple as `pixel_surrounding * kernel`. Then we just sum up the elements and - there we have it
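To see why this blurs, take a hypothetical uniform 3x3 slice: with the box blur kernel, the multiply-and-sum is just the average of the neighborhood, so a flat area stays (almost exactly) unchanged, while areas with contrast get smoothed out:

```python
import numpy as np

pixel_surrounding = np.full((3, 3), 120)  # a flat 3x3 neighborhood
kernel = np.full((3, 3), 1 / 9)           # box blur

# Element-wise multiply, then sum: the average of the nine pixels
new_value = np.sum(pixel_surrounding * kernel)
print(new_value)  # approximately 120.0
```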

After returning the `output` array, we just need to save the new image to a file. Once again, we can use Pillow:

```
Image.fromarray(tinyconv(image, kernel)).convert("RGB").save(output_image_path)
```

It turns out that implementing a convolution is rather straightforward, even though it might sound like something quite mathy. At least it had that connotation to me

Let’s take a quick look at our output image:

Did Kirby lose a color in his life because of our code? It seems like we’re missing something

Do you have an idea what it could possibly be? I’ll leave some empty space for you, so as not to spoil this small riddle

…

….

…

Ready to read the answer?

So, you remember that every pixel consists of three channels

What is missing in our code is handling all three channels in the output:

```
for channel in range(rgb_channels):  # new: iterate over the channels
    for vertical_index in range(kernel_height, image_height - kernel_height):
        for horizontal_index in range(kernel_width, image_width - kernel_width):
            pixel_surrounding = image[
                vertical_index - kernel_height : vertical_index + kernel_height + 1,
                horizontal_index - kernel_width : horizontal_index + kernel_width + 1,
                channel,  # new: slice a single channel
            ]
            output[vertical_index, horizontal_index, channel] = np.sum(
                pixel_surrounding * kernel
            )
```

With this little change, Kirby comes back to life and becomes blurry at the same time - just as expected!

And there you have it - our job is done here 🎉

At this point, you can implement more kernels by yourself, like sharpening the image or edge detection - most (all?) of them differ only in kernel size and kernel matrix values
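For instance, here are two classic 3x3 kernels from Wikipedia’s list. Note that unlike the box blur, they contain negative values, so results can fall outside the 0-255 range - you may want to `np.clip` the computed value before storing it in a `uint8` array:

```python
import numpy as np

# Sharpen: sums to 1, so overall brightness is preserved
sharpen = np.array([
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
])

# Edge detection: sums to 0, so flat areas become black
edge_detection = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])

print(sharpen.sum(), edge_detection.sum())  # 1 0
```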

And once again, I’d like to invite you to check out a more polished version of the implementation, so you can compare it with your code or troubleshoot if needed:

**https://github.com/jmaczan/tinyconv**

In the `src/kernel` directory, you will find more implementations of common kernels. Feel free to fork the code and add your own! Your contributions are more than welcome

If you find my work useful, please consider leaving a GitHub star for visibility 🌟

### Time for Piotr’s optimizations!

We could finish at this point, but our code is far from optimal

The already-mentioned **Piotr** is a low-level C++ programmer with good intuition about code performance. Luckily, after reading our code, he shared some insights, so we can benefit from his remarks. Dzięki (thanks)!

#### Kernel shape, width and height

Originally computing the kernel size looked like this:

```
kernel_width = int((kernel.shape[0] - 1) / 2)
kernel_height = int((kernel.shape[1] - 1) / 2)
```

But with a minor additional assumption about the kernel shape (all kernels I’ve implemented in tinyconv are squares), we can simplify it to

`kernel_height = kernel_width = int(np.sqrt(kernel.shape[0]) / 2)`

To make it work, though, we need to store the kernel as a single vector - without reshaping it into a matrix, as we did before

`kernel = np.full(9, 1/9)`
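As a quick check of the simplification (note that it relies on the square root, `np.sqrt`, of the flat kernel’s length):

```python
import numpy as np

kernel = np.full(9, 1 / 9)  # a 3x3 box blur stored as a flat vector

# length 9 -> side 3 -> half-size 1
half_size = int(np.sqrt(kernel.shape[0]) / 2)
print(half_size)  # 1
```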

You might think it’s not really a big deal. But now that we’ve implemented this, we can go further and speed things up

#### Vector multiplication

When we transform a slice of the image from a matrix into a vector

```
pixel_surrounding = image[
    vertical_index - kernel_height : vertical_index + kernel_height + 1,
    horizontal_index - kernel_width : horizontal_index + kernel_width + 1,
    channel,
].flatten()
```

Then we can multiply it by our kernel vector

```
output[vertical_index, horizontal_index, channel] = np.matmul(
    pixel_surrounding, kernel
)
```

The execution time speedup is significant - the optimized version is almost twice as fast as the original one!
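If you want to see the difference yourself, here is a rough micro-benchmark sketch of just the inner step (the absolute numbers and the exact ratio will vary by machine, so treat them as indicative):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
slices = rng.random((10_000, 3, 3))  # pretend pixel surroundings
kernel_2d = np.full((3, 3), 1 / 9)
kernel_flat = np.full(9, 1 / 9)

# Original: element-wise multiply on 2D arrays, then sum
start = time.perf_counter()
results_2d = [np.sum(s * kernel_2d) for s in slices]
elementwise_time = time.perf_counter() - start

# Optimized: dot product of two flat vectors
start = time.perf_counter()
results_flat = [np.matmul(s.reshape(9), kernel_flat) for s in slices]
matmul_time = time.perf_counter() - start

# Both variants compute the same values
assert np.allclose(results_2d, results_flat)
print(f"elementwise: {elementwise_time:.4f}s, matmul: {matmul_time:.4f}s")
```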

A neat comment on HackerNews about CPU optimizations: https://news.ycombinator.com/item?id=33760117

### And now, that’s all for today

This is my first post on Substack, so if you enjoyed the journey, please share it with people you think may enjoy it as well

Many thanks and have a good time hacking!

Jędrzej