# So you want to code a convolution in Python - a bite-sized intro to image processing with kernels

### Implementing 🌸 tinyconv with Kirby for fun and learning

Learning by doing is simply fun, so let’s play with convolutions a bit today

This is what you will build:

## Theoretical background

Grant Sanderson from 3Blue1Brown created an excellent video about convolution, so if you want to refresh the theory behind convolutions, this is the go-to resource:

## Let’s start!

Images are represented as pixels, and each pixel carries information about its color. That color is described by three parts (channels) - red, green, and blue (that’s why it’s called RGB). Each channel is a number in the range 0 to 255

For instance, this color is represented by 240 for the red channel, 128 for green, and 128 for blue

And this one consists of 60 red, 179 green, and 113 blue

So first things first, we want to represent an image as a 3-dimensional matrix (a table consisting of rows and columns), where each element (a cell of the table) contains all three color values that make up a pixel
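To make this concrete, here is a minimal sketch of such a 3-dimensional array - a hypothetical 2x2 image with hand-picked colors, not related to the image we’ll use below:

```python
import numpy as np

# A hypothetical 2x2 image: 2 rows x 2 columns x 3 channels (R, G, B)
tiny_image = np.array(
    [
        [[255, 0, 0], [0, 255, 0]],      # a red pixel and a green pixel
        [[0, 0, 255], [255, 255, 255]],  # a blue pixel and a white pixel
    ],
    dtype=np.uint8,
)

print(tiny_image.shape)  # (2, 2, 3) - rows, columns, channels
```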

We need some image to experiment with, so here’s Kirby - right-click and save this cute pixel art by alessamole12345 and put it next to your Python script, so you can use the image name as a `source_image_path`


To manipulate images in our code, let’s use NumPy arrays and Pillow:

```
import numpy as np
from PIL import Image
image = np.array(Image.open(source_image_path).convert("RGB"))
```

By the way - the full code is available for free on GitHub:

https://github.com/jmaczan/tinyconv

So if something is not clear enough in the chunks of code in this post, you can jump directly into the repository and check the complete implementation yourself!

Let’s check if the image is what we expect it to be:

```
>>> print(image.ndim, image.shape, image.size)
3 (128, 128, 3) 49152
```

This looks good - our image size is 128x128 pixels and it consists of 3 color values per pixel

A quick look at the first pixel of our image confirms that:

```
>>> image[0][0]
array([153, 217, 234], dtype=uint8)
```

Having a representation of our image in the code, we can move to coding a kernel

A kernel is a 2-dimensional matrix. The values it holds depend on what effect we want to achieve by applying the convolution to our image

Wikipedia has a list of sample kernels, so let’s begin with one of them called **Box blur**

The box blur kernel consists of a single value repeated across the matrix, and all the values have to sum to 1

Interestingly, kernels almost always have odd dimensions (3x3, 5x5, and so on). I think it’s because with an even side length, the kernel wouldn’t have a single central element, so a computation based on the surroundings of one pixel wouldn’t be possible
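A quick sanity check of both properties, as a sketch:

```python
import numpy as np

kernel = np.full((3, 3), 1 / 9)  # box blur: one value repeated

# All values sum to (approximately) 1, so overall brightness is preserved
assert np.isclose(kernel.sum(), 1.0)

# An odd side length gives a single central element at index size // 2
center = kernel.shape[0] // 2
print(center)  # 1 - the middle row/column of a 3x3 kernel
```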

Implementing this with NumPy is [unironically] a pleasure as always:

`kernel = np.full(9, 1 / 9).reshape(3, 3)`

What we did here is initialize a vector of 9 elements, each of value 1/9. Then we reshaped the vector into a 2D matrix consisting of 3 rows with 3 elements each

Let’s just see if nothing is messed up here:

```
>>> print(kernel)
[[0.11111111 0.11111111 0.11111111]
 [0.11111111 0.11111111 0.11111111]
 [0.11111111 0.11111111 0.11111111]]
```

Now that we have both an image and a kernel defined in our code, let’s go straight to coding the convolution

### Convolution implementation

We are going to implement the convolution by traversing the image pixel by pixel.

To compute a new value for each pixel, we extract a slice of the image of the same size as the kernel

Then we do the computation - the sum of the element-wise multiplication of the slice and the kernel.

The result of the computation is the new value of the pixel, which we store in the output image

```
def tinyconv(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    image_height, image_width, rgb_channels = image.shape
```

Type annotations for parameters and the return value are not obligatory in Python, but I prefer to add them, because they make it easier to catch bugs

We extract the size of the image from the `shape` property

Also, knowing the size of the kernel, we can figure out how many surrounding pixels we need to take into account when convolving a pixel

```
kernel_width = int((kernel.shape[0] - 1) / 2)
kernel_height = int((kernel.shape[1] - 1) / 2)
```

My friend Piotr asked me whether `kernel.shape[0] // 2` wouldn’t be enough, and it turned out that he’s right, so we can simplify our code

```
kernel_width = kernel.shape[0] // 2
kernel_height = kernel.shape[1] // 2
```

Now let’s copy the image, so we don’t modify the original one:

`output = np.copy(image)`

In a convolution, you can often meet the concept of padding. Padding means adding new values around your image, so when the kernel moves close to the edges, it doesn’t cover the area outside the image (it doesn’t try to refer to indices outside the matrix), which would likely blow up your program

But we won’t do padding. Instead, we narrow down the area we cover with the kernel.

The result is nearly identical, except that a one-pixel border around the image won’t be convolved - a small price for doing this our own way

To traverse the image, we will use two nested `for` loops

```
for vertical_index in range(kernel_height, image_height - kernel_height):
    for horizontal_index in range(kernel_width, image_width - kernel_width):
```

As you can see, we omit some pixels by not starting at index 0 and not ending at the index of the last pixel. We can think of it as padding inside the image itself

Again, this is not the standard way - if you want to do it right, check out some additional resources

Inside the loops, let’s extract the slice of the image corresponding to the pixel we compute the value for. This lets us multiply the slice by the kernel and then sum the elements to get our new output pixel

```
pixel_surrounding = image[
    vertical_index - kernel_height : vertical_index + kernel_height + 1,
    horizontal_index - kernel_width : horizontal_index + kernel_width + 1,
]
```

Let’s compute the output pixel now:

```
output[vertical_index, horizontal_index] = np.sum(
    pixel_surrounding * kernel
)
```

Since we use NumPy ndarrays instead of plain Python lists, multiplying two matrices element-wise is as simple as `pixel_surrounding * kernel`. Then we just sum up the elements and - there we have it
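To see why this blurs, take a hypothetical uniform 3x3 slice: with the box blur kernel, the multiply-and-sum is just the average of the neighborhood, so a flat area stays (almost exactly) unchanged, while areas with contrast get smoothed out:

```python
import numpy as np

pixel_surrounding = np.full((3, 3), 120)  # a flat 3x3 neighborhood
kernel = np.full((3, 3), 1 / 9)           # box blur

# Element-wise multiply, then sum: the average of the nine pixels
new_value = np.sum(pixel_surrounding * kernel)
print(new_value)  # approximately 120.0
```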

After returning the `output` array, we just need to save the new image to a file. Once again, we can use Pillow:

```
Image.fromarray(tinyconv(image, kernel)).convert("RGB").save(output_image_path)
```

It turns out that implementing a convolution is rather straightforward, even though it might sound like something quite mathy. At least it had that connotation to me

Let’s take a quick look at our output image:

Did Kirby lose a color in his life because of our code? It seems like we’re missing something

Do you have an idea what it could possibly be? I’ll leave some empty space for you, so as not to spoil this small riddle

…

….

…

Ready to read the answer?

So, you remember that every pixel consists of three channels

What is missing in our code is handling all three channels in the output:

```
for channel in range(rgb_channels):  # new: iterate over the channels
    for vertical_index in range(kernel_height, image_height - kernel_height):
        for horizontal_index in range(kernel_width, image_width - kernel_width):
            pixel_surrounding = image[
                vertical_index - kernel_height : vertical_index + kernel_height + 1,
                horizontal_index - kernel_width : horizontal_index + kernel_width + 1,
                channel,  # new: slice a single channel
            ]
            output[vertical_index, horizontal_index, channel] = np.sum(
                pixel_surrounding * kernel
            )
```

With this little change, Kirby comes back to life and becomes blurry at the same time - just as expected!

And there you have it - our job is done here 🎉

At this point, you can implement more kernels by yourself, like sharpening the image or edge detection - most (all?) of them differ only in kernel size and kernel matrix values
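For instance, here are two classic 3x3 kernels from Wikipedia’s list. Note that unlike the box blur, they contain negative values, so results can fall outside the 0-255 range - you may want to `np.clip` the computed value before storing it in a `uint8` array:

```python
import numpy as np

# Sharpen: sums to 1, so overall brightness is preserved
sharpen = np.array([
    [ 0, -1,  0],
    [-1,  5, -1],
    [ 0, -1,  0],
])

# Edge detection: sums to 0, so flat areas become black
edge_detection = np.array([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
])

print(sharpen.sum(), edge_detection.sum())  # 1 0
```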

And once again, I’d like to invite you to check out a more polished version of the implementation, so you can compare it with your code or troubleshoot if needed:

**https://github.com/jmaczan/tinyconv**

In the `src/kernel` directory, you will find more implementations of common kernels. Feel free to fork the code and add your own! Your contributions are more than welcome

If you find my work useful, please consider leaving a GitHub star for visibility 🌟

### Time for Piotr’s optimizations!

We could finish at this point, but our code is far from optimal

The already-mentioned **Piotr** is a low-level C++ programmer with good intuition about code performance. Luckily, after reading our code, he shared some insights, so we can benefit from his remarks. Dzięki (thanks)!

#### Kernel shape, width and height

Originally computing the kernel size looked like this:

```
kernel_width = int((kernel.shape[0] - 1) / 2)
kernel_height = int((kernel.shape[1] - 1) / 2)
```

But with a minor additional assumption about the kernel shape (all kernels I’ve implemented in tinyconv are squares), we can simplify it to

`kernel_height = kernel_width = int(np.sqrt(kernel.shape[0]) / 2)`

To make it work, though, we need to store the kernel as a single vector - without reshaping it into a matrix, as we did before

`kernel = np.full(9, 1/9)`
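As a quick check of the simplification (note that it relies on the square root, `np.sqrt`, of the flat kernel’s length):

```python
import numpy as np

kernel = np.full(9, 1 / 9)  # a 3x3 box blur stored as a flat vector

# length 9 -> side 3 -> half-size 1
half_size = int(np.sqrt(kernel.shape[0]) / 2)
print(half_size)  # 1
```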

You might think it’s not really a big deal. But now that we’ve implemented this, we can go further and speed things up

#### Vector multiplication

When we transform a slice of the image from a matrix into a vector

```
pixel_surrounding = image[
    vertical_index - kernel_height : vertical_index + kernel_height + 1,
    horizontal_index - kernel_width : horizontal_index + kernel_width + 1,
    channel,
].flatten()
```

Then we can multiply it by our kernel vector

```
output[vertical_index, horizontal_index, channel] = np.matmul(
    pixel_surrounding, kernel
)
```

The execution time speedup is significant - the optimized version is almost twice as fast as the original one!
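If you want to see the difference yourself, here is a rough micro-benchmark sketch of just the inner step (the absolute numbers and the exact ratio will vary by machine, so treat them as indicative):

```python
import time
import numpy as np

rng = np.random.default_rng(0)
slices = rng.random((10_000, 3, 3))  # pretend pixel surroundings
kernel_2d = np.full((3, 3), 1 / 9)
kernel_flat = np.full(9, 1 / 9)

# Original: element-wise multiply on 2D arrays, then sum
start = time.perf_counter()
results_2d = [np.sum(s * kernel_2d) for s in slices]
elementwise_time = time.perf_counter() - start

# Optimized: dot product of two flat vectors
start = time.perf_counter()
results_flat = [np.matmul(s.reshape(9), kernel_flat) for s in slices]
matmul_time = time.perf_counter() - start

# Both variants compute the same values
assert np.allclose(results_2d, results_flat)
print(f"elementwise: {elementwise_time:.4f}s, matmul: {matmul_time:.4f}s")
```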

A neat comment on HackerNews about CPU optimizations: https://news.ycombinator.com/item?id=33760117

### And now, that’s all for today

This is my first post on Substack, so if you enjoyed the journey, please share it with people you think may enjoy it as well

Many thanks and have a good time hacking!

Jędrzej