ML endeavors begin
Tackling machine learning as a subject, from a builder's perspective
9th November 2025
As someone who has been a software developer and a builder for quite some time now, it's been humbling trying to get back into research and study. Lately, I've been very curious about how the AI tools I use every day really work behind the scenes. I did learn all about this back in college: I pursued computer science and took at least a couple of subjects on machine learning and artificial intelligence. Safe to say, most of it left my brain the moment exams were over, and my learning muscles have gotten a little rusty since, so I've been trying to stay consistent by learning at least something each day. I'm going to try to learn machine learning and break down the fundamentals as much as I can to help myself understand things better, and document it all too! Going forward, expect daily study sessions, a slow unravelling of what's really going on inside these complex machines, and fun problems to tackle. This is the perspective of a builder, a dev (and a designer at heart), trying to learn and research machine learning.
Learning/Material
I've consulted several threads online (and AI), and one of the best resources to get started with machine learning seems to be 3blue1brown's neural networks playlist. And... that's exactly what I did. I watched a couple of videos to kick-start this whole thing. And I kinda understood things? The videos made everything very clear and illustrated exactly what needed to be understood. But I really wanted to delve deeper, so I'm basing my learning on Michael Nielsen's book, Neural Networks and Deep Learning. It's available online as a website and, more importantly, it's free to read! I would highly encourage anyone with a strong enough background in mathematics to go check it out - it's wonderful and elegant.
So I started with the Hello World! of the machine learning world: the MNIST handwritten digit classification example. Classic. To say it's the hello world is an understatement, because you're not simply printing things on a screen anymore. You're going to be creating and training a neural network to identify and classify handwritten digits on a 28x28-pixel grid image. Sounds simple enough, right? That's what I thought. And boy was I wrong. There is just a lot of math behind how all of this works, and I'm going to try my best to break it down as much as I can (and as much as I understand).
Psst: Turns out, before I can actually get my hands dirty with coding/building the network, it's crucial to understand the math behind it. Okay, maybe not so much if you just want to create, but from a research/optimization standpoint, it's very important to know how the network works under the hood, so that you can later improve your system (for instance, through a process called fine-tuning). But more on actual coding later.
Neural Networks
What's fascinating about machine learning is how strikingly similar the terminology is to human biology. Of course, it's way different in its behaviour, but a neural network is simply a network of artificial neurons (the simplest kind being the perceptron). These are the building blocks of any neural network. And they have a simple job: take in inputs and spit out an output. Each one is essentially a tiny program, defined to compute an output given a set of inputs. Straightforward. We call the output of a neuron its activation. This activation is influenced by parameters known as weights and biases.
These neurons are arranged in "layers" within a neural network. Each input to a neuron comes from the previous layer - it's the activation of a neuron in that previous layer. But each input is also "weighted", in the sense that the weight determines how much the activation of that previous neuron matters for activating this one. And each neuron also has an inherent bias that sets a threshold before it becomes active. Think of a neuron $X$ that has 3 inputs, $x_1, x_2, x_3$, each with a corresponding weight attached to it, say $w_1, w_2, w_3$. Multiplying each input by its weight and adding everything up gives us the "weighted sum". The bias, now, is a value that nudges the neuron towards or away from activation - it dictates how likely this neuron is to fire. We can call it $b$.
But a neural network is made up of several of these neurons. We can collect all of these weights, inputs, and biases into vectors called $w$, $x$, and $b$ (and, once we're dealing with whole layers, into matrices). Putting all this together, we get a condition something like this:
$$wx + b > 0$$

This is the condition necessary for the neuron to activate. Take this example: say our inputs are -2, 1, and 0, the weights are 1, 2, and 3, and the bias of the neuron is 2. We would compute $(-2 \cdot 1) + (1 \cdot 2) + (0 \cdot 3) + 2$, which comes out to 2 - greater than 0, so the neuron activates! But we want to be dealing with fractional values as well, mostly to handle probabilities; we don't want the output to be binary. So we pass the weighted sum through a function called the sigmoid (or, as I would later come to know, modern-day neural nets often use a function called ReLU instead).
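Here's roughly what that looks like in code - a minimal sketch of the binary perceptron in plain Python, using the numbers from the example above:

```python
def perceptron(inputs, weights, bias):
    # Weighted sum: multiply each input by its weight, then add the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Binary activation: fire (1) if the sum is positive, else stay quiet (0).
    return 1 if z > 0 else 0

# Inputs -2, 1, 0 with weights 1, 2, 3 and a bias of 2:
# z = (-2*1) + (1*2) + (0*3) + 2 = 2, which is > 0, so the neuron fires.
print(perceptron([-2, 1, 0], [1, 2, 3], 2))  # prints 1
```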
The sigmoid function is defined as:

$$\sigma(z) = \frac{1}{1+e^{-z}}$$

What this means is: no matter how large or how negative the weighted sum $z$ is, we can squish it into a number between 0 and 1. You can see how, as $z$ tends towards $\infty$, the sigmoid tends towards... 1. And as $z$ tends towards $-\infty$, the sigmoid tends towards 0. This is just a beautiful way to ensure that the activation of a neuron always stays between 0 and 1!
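To see the squashing in action, here's a quick sketch (the specific inputs are just illustrative):

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1 / (1 + math.exp(-z))

print(sigmoid(10))   # ~0.99995 - large positive sums saturate towards 1
print(sigmoid(-10))  # ~0.00005 - large negative sums saturate towards 0
print(sigmoid(0))    # 0.5 - right on the threshold
print(sigmoid(2))    # ~0.88 - the weighted sum from our worked example
```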
Well, that covers the basics of how the activation of a neuron is determined. Simply put, it's this:

$$y = \sigma(wx + b) = \frac{1}{1+e^{-wx-b}}$$

where $w$ denotes the weight matrix, $x$ the input vector, and $b$ the bias vector. That's it - a simple, elegant way to represent the activations of a whole layer of neurons at once.
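And here's that same equation in vectorized form - a sketch using NumPy, with a made-up layer of 2 neurons taking 3 inputs (the shapes are the important part; the second neuron's numbers are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# A made-up layer: 2 neurons, each receiving the same 3 inputs.
# W has one row of weights per neuron; b has one bias per neuron.
W = np.array([[1.0, 2.0, 3.0],
              [0.5, -1.0, 0.0]])  # shape (2, 3)
b = np.array([2.0, 0.1])          # shape (2,)
x = np.array([-2.0, 1.0, 0.0])    # shape (3,)

# The whole layer's activations in one line: y = sigmoid(Wx + b)
y = sigmoid(W @ x + b)
print(y)  # two activations, each between 0 and 1
```

The first row of $W$ is our worked example from before, so the first activation comes out to about 0.88 - the same neuron, just computed for the whole layer at once.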