Diffstat (limited to 'content/theses/research')
-rw-r--r--   content/theses/research/_index.md                |   5 +
-rw-r--r--   content/theses/research/arc-agi/_index.md        |   5 +
-rw-r--r--   content/theses/research/arc-agi/approach.md      |  43 ++
-rw-r--r--   content/theses/research/arc-agi/introduction.md  |  78 +++
-rw-r--r--   content/theses/research/monitoring.md            | 237 ++++++
5 files changed, 368 insertions, 0 deletions
diff --git a/content/theses/research/_index.md b/content/theses/research/_index.md
new file mode 100644
index 0000000..7e23064
--- /dev/null
+++ b/content/theses/research/_index.md
@@ -0,0 +1,5 @@
---
title: research
weight: 10
bookCollapseSection: true
---
diff --git a/content/theses/research/arc-agi/_index.md b/content/theses/research/arc-agi/_index.md
new file mode 100644
index 0000000..89585e6
--- /dev/null
+++ b/content/theses/research/arc-agi/_index.md
@@ -0,0 +1,5 @@
---
title: arc-agi
weight: 20
bookCollapseSection: true
---
diff --git a/content/theses/research/arc-agi/approach.md b/content/theses/research/arc-agi/approach.md
new file mode 100644
index 0000000..3745b68
--- /dev/null
+++ b/content/theses/research/arc-agi/approach.md
@@ -0,0 +1,43 @@
---
title: approach
weight: 20
---
My idea is to use my [`alectors`](https://docs.apotheke.earth/alectors) library to parse the JSON as tokens, as standard LLM approaches do. Because it is a pure RL library, the result is a *Markov decision process*, not a *Markov chain*.

## observation space
The state space is simply the JSON object, wrapped in a prompt like
```python
prompt = f"""
You have to find the pattern. You are given some clues.
Given {obj['train'][0]['input']} you get {obj['train'][0]['output']}.
Given {obj['train'][1]['input']} you get {obj['train'][1]['output']}.
Now based on {obj['test'][0]['input']}, what is the output?
"""
```
(this is a simplified prompt, for brevity.)
{{% hint warning %}}
In theory, since we know that the maximum grid size is `30x30`, could we not write a wrapper and use a CNN, or even flatten the inputs and outputs and use simple traditional RL methods?
{{% /hint %}}

## action space
The maximum grid size is `30x30`, and each square can hold an integer from 0 to 9, inclusive. In total there are `30x30x10=9000` possible actions, which is a huge action space[^1]. We can use it as is and hope for the best (which will massively increase the memory needs of the network), or we can come up with a clever workaround.

Since we are not zero-shotting, and we will have multiple steps before a final output, we can simply add more steps rather than more possible actions. A basic approach is to split each decision into two distinct steps: 'pick a grid square' and 'pick an integer for the chosen grid square'. The agent would first choose a square from a flat array of all squares, and then pick a color. There are two ways to force the agent to actually pick a color in the second step: repeat the colors across the entire action space, or heavily penalise the agent for picking anything outside the ten integer actions during the 'pick an integer' step.

This still seems like a bad solution: the action space would remain too big (~900 at worst), and since only 10 of the actions are valid integers, the agent might have trouble exploring the action space effectively during integer picking. Also, due to the curse of dimensionality, one should aim to shrink the action space as much as possible.

It therefore makes sense to split each step into three ministeps. We would have a 30-dimensional action space with three distinct ministeps: pick the row, then the column, and finally the integer for that row/column.
```python
for ministep in ['row', 'column', 'color']:
    prompt_with_ministep = prompt + ministep
    action = agent.choose_action(prompt_with_ministep)
    # ...
```
(again, this is an oversimplification)
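To make the three-ministep loop a bit more concrete, here is a minimal, hypothetical sketch of a full episode built around it. Everything here is an assumption on my part: `agent.choose_action` stands in for whatever `alectors` exposes, the `" | pick row"` suffixes are placeholder ministep markers, and the terminal reward (fraction of correctly coloured cells) is just one obvious choice.
```python
import numpy as np

COLORS = 10  # each cell holds an integer from 0 to 9

def run_episode(agent, prompt, target):
    """Fill a canvas the size of the expected output, one cell per three ministeps,
    and return a terminal reward in [0, 1]."""
    target = np.asarray(target)
    canvas = np.zeros_like(target)
    for _ in range(target.size):
        # each ministep reuses the same 30-action head; we clamp into range here
        row = agent.choose_action(prompt + " | pick row") % target.shape[0]
        col = agent.choose_action(prompt + " | pick column") % target.shape[1]
        color = agent.choose_action(prompt + " | pick color") % COLORS
        canvas[row, col] = color
    # fraction of cells that match the expected output
    return (canvas == target).mean()
```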

[^1]: smaller than the vocab size that LLMs use (~3e5), but still too big to do anything with cheaply.
diff --git a/content/theses/research/arc-agi/introduction.md b/content/theses/research/arc-agi/introduction.md
new file mode 100644
index 0000000..ea7e06e
--- /dev/null
+++ b/content/theses/research/arc-agi/introduction.md
@@ -0,0 +1,78 @@
---
title: introduction
weight: 10
---

[ARC-AGI-2](https://arcprize.org) is a competition about pattern recognition for machine learning algorithms.

The base idea is to test whether current state-of-the-art models are capable of understanding and applying advanced patterns. This benchmark-competition is unique in that it is easy for humans to score extremely high, while it is very difficult for current models to perform at any acceptable rate.

## data
The dataset consists of grids divided into sub-squares, with a visual cue that determines the colors, the new positions, or some other kind of visual transformation.

> The example shown on the arc-agi website.
> The colors correspond to the number of holes.

This information is served as JSON, so the agents need to be able to parse JSON to 'visualise' the patterns, either by tokenizing and breaking down the data, or by parsing the JSON with a hard-coded method.

A typical JSON data object looks like
```json
{
    "train": [
        {
            "input": [
                [7, 9],
                [4, 3]
            ],
            "output": [
                [7, 9, 7, 9, 7, 9],
                [4, 3, 4, 3, 4, 3],
                [9, 7, 9, 7, 9, 7],
                [3, 4, 3, 4, 3, 4],
                [7, 9, 7, 9, 7, 9],
                [4, 3, 4, 3, 4, 3]
            ]
        },
        {
            "input": [
                [8, 6],
                [6, 4]
            ],
            "output": [
                [8, 6, 8, 6, 8, 6],
                [6, 4, 6, 4, 6, 4],
                [6, 8, 6, 8, 6, 8],
                [4, 6, 4, 6, 4, 6],
                [8, 6, 8, 6, 8, 6],
                [6, 4, 6, 4, 6, 4]
            ]
        }
    ],
    "test": [
        {
            "input": [
                [3, 2],
                [7, 8]
            ],
            "output": [
                [3, 2, 3, 2, 3, 2],
                [7, 8, 7, 8, 7, 8],
                [2, 3, 2, 3, 2, 3],
                [8, 7, 8, 7, 8, 7],
                [3, 2, 3, 2, 3, 2],
                [7, 8, 7, 8, 7, 8]
            ]
        }
    ]
}
```
In this example, the pattern is to tile the 2x2 input into a 6x6 output, alternating between the input and its mirrored copy.
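For the hard-coded parsing route mentioned above, a minimal sketch of loading one task file looks like this; the filename `task.json` is a placeholder for any of the puzzle files.
```python
import json

# Load one ARC task and pull the train/test grids out as plain lists of ints.
with open("task.json") as f:
    obj = json.load(f)

train_pairs = [(ex["input"], ex["output"]) for ex in obj["train"]]
test_inputs = [ex["input"] for ex in obj["test"]]

for grid_in, grid_out in train_pairs:
    # print the input -> output grid sizes for a quick sanity check
    print(f"{len(grid_in)}x{len(grid_in[0])} -> {len(grid_out)}x{len(grid_out[0])}")
```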

## current approaches
Most approaches rely on LLMs; the best scores are achieved by state-of-the-art reasoning models such as [OpenAI's o3](https://openai.com/index/introducing-o3-and-o4-mini/), with a ***3%*** performance rating.
These try to zero-shot the problem, i.e. they see the problem, might reason (depending on the model), and then immediately suggest a solution for the entire puzzle.

It is my belief that this approach is flawed. LLMs are Markov chains, and are frozen at inference. Hence they are not able to adjust their trajectories in the embedded vector space, nor is there any actual decision making.

Even "reasoning" models, which have had their trajectories finetuned via RL and are able to self-prompt to cover more of the embedded space as they autoregress, are still unable to reason.
diff --git a/content/theses/research/monitoring.md b/content/theses/research/monitoring.md
new file mode 100644
index 0000000..9d7e2c9
--- /dev/null
+++ b/content/theses/research/monitoring.md
@@ -0,0 +1,237 @@
---
title: monitoring
weight: 10
---

The easiest way to set up logging for AI experiments is to use `mlflow`, a ready-made `python` package.

## installation
To get started we can add `mlflow` to our project using a reasonable package manager like `poetry` or `uv`

```sh
$ poetry add mlflow
```

and then, inside our environment, run
```sh
$ mlflow server --host 127.0.0.1
```

This sets up a web server on `localhost:5000`, which is only accessible from the machine itself (for local monitoring).
If you want to make it accessible to other computers (locally via LAN, or via the internet) use `--host 0.0.0.0`. Just make sure that [you open the proper port in the firewall (by default port 5000)](/self-sufficiency/networking).

{{% hint info %}}
for example, to serve publicly on port 8889, we run
```sh
$ mlflow server --host 0.0.0.0 --port 8889
```
{{% /hint %}}

## docker-compose
In order to use docker and easily manage updates we can create a `docker-compose.yaml`
```yaml
services:
  mlflow:
    image: ghcr.io/mlflow/mlflow
    container_name: mlflow
    ports:
      - '5000:5000'
    environment:
      MLFLOW_TRACKING_URI: http://0.0.0.0:5000
    volumes:
      - ./mlflow:/mlflow/mlruns
    restart: always
    command: ["mlflow", "server", "--host", "0.0.0.0", "--port", "5000"]
```
This pulls the latest `mlflow` image from GitHub and sets it to always run, so we can access the service from anywhere on port `5000`.
{{% hint info %}}
if we want to serve it on port 8889 instead, we set the mapping under `ports:` to `'8889:5000'`
{{% /hint %}}
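Before kicking off a long experiment it is worth checking that the container is actually reachable from the training machine. A minimal smoke test, assuming the compose setup above and the default `5000` port mapping:
```python
import mlflow

# point the client at the dockerized server; swap localhost for the docker
# host's LAN or public IP if the script runs on another machine
mlflow.set_tracking_uri("http://localhost:5000")

# listing experiments fails fast if the port mapping or firewall is wrong
print(mlflow.search_experiments())
```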
## demo

For a ready-made demo, we will do a basic MNIST setup

```python
import mlflow

import torch as T
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

from torch.utils.data import DataLoader

from torchvision import datasets
from torchvision.transforms import ToTensor

# train on the GPU if one is available
device = T.device('cuda' if T.cuda.is_available() else 'cpu')

train_data = datasets.MNIST(
    root='data',
    train=True,
    transform=ToTensor(),
    download=True
    )

test_data = datasets.MNIST(
    root='data',
    train=False,
    transform=ToTensor(),
    download=True
    )

# `params` holds the hyperparameters; it is defined (and logged) in the
# parameter logging section below
loaders = {
    'train': DataLoader(
        train_data,
        batch_size=params['batch_size'],
        shuffle=True,
        num_workers=1
        ),

    'test': DataLoader(
        test_data,
        batch_size=params['batch_size'],
        shuffle=True,
        num_workers=1
        )
}
```
and set up an `ImageClassifier`
```python
class ImageClassifier(nn.Module):

    def __init__(self):
        super(ImageClassifier, self).__init__()

        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = self.conv2(x)
        x = self.conv2_drop(x)
        x = F.max_pool2d(x, 2)
        x = F.relu(x)
        x = x.view(-1, 320)
        x = self.fc1(x)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        # return raw logits; nn.CrossEntropyLoss applies log-softmax internally
        return x

model = ImageClassifier().to(device)
optimizer = optim.Adam(model.parameters(), lr=params['learning_rate'])
loss_func = nn.CrossEntropyLoss()
```

### train/test functions
Using the [official documentation](https://mlflow.org/docs/latest/tracking/), we can build a tracking experiment.

We will need two functions, `train` and `test`:
```python
def train(epoch):
    """
    Train the model on a single pass of the dataloader, and send the metrics to mlflow
    """
    model.train()
    for batch_idx, (data, target) in enumerate(loaders['train']):

        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)

        loss = loss_func(output, target)
        loss.backward()
        # fraction of correct predictions in this batch
        accuracy = (output.argmax(dim=1) == target).float().mean().item()

        optimizer.step()

        if batch_idx % 20 == 0:
            seen = batch_idx * len(data)
            total = len(loaders['train'].dataset)
            print(
                f"Train Epoch: {epoch}, "
                f"[{seen}/{total} ({100*seen/total:.0f}%)], "
                f"Loss: {loss.item():.4f}"
            )

            step = batch_idx // 20 * (epoch + 1)
            mlflow.log_metric("loss", f"{loss.item():2f}", step=step)
            mlflow.log_metric("accuracy", f"{accuracy:2f}", step=step)

def test(epoch):
    """
    Evaluate the model, and log results with mlflow
    """
    model.eval()

    loss = 0
    correct = 0

    with T.no_grad():
        for data, target in loaders['test']:
            data, target = data.to(device), target.to(device)
            output = model(data)
            loss += loss_func(output, target).item()
            pred = output.argmax(dim=1, keepdim=True)
            correct += pred.eq(target.view_as(pred)).sum().item()

    loss /= len(loaders['test'].dataset)
    accuracy = correct / len(loaders['test'].dataset)

    print(
        f"\nTest set: Average Loss: {loss:.4f}, "
        f"Accuracy: {correct}/{len(loaders['test'].dataset)} "
        f"({100*accuracy:.0f}%)\n"
    )

    mlflow.log_metric("eval_loss", f"{loss:2f}", step=epoch)
    mlflow.log_metric("eval_accuracy", f"{accuracy:2f}", step=epoch)
```

### parameter logging

In order to log the hyperparameters so we can reference them during finetuning, we first need to tell the script where our `mlflow` instance is running, and to do this we set
```python
mlflow.set_tracking_uri(uri="http://localhost:5000")

mlflow.set_experiment("MNIST mlflow demo")
```
{{% hint info %}}
`set_tracking_uri` points to the `url` we run `mlflow` at. This means that if we run it on `127.0.0.1`, we use `localhost` or `127.0.0.1`. If we set it up as `0.0.0.0`, and the experiment is run outside of the mlflow server (i.e. on another computer), we use the IP that points to the server machine: either the LAN IP provided by the router (if we are on a LAN), or the public IP of the server.

`set_experiment` sets the name of the experiment inside the mlflow instance, and is used for grouping and comparing runs.
{{% /hint %}}

Now we can define the hyperparameters and log them

```python
# batch, lr and epochs hold whatever values we picked for this run
params = {
    "batch_size": batch,
    "learning_rate": lr,
    "num_epochs": epochs
}
mlflow.log_params(params)
```

### the loop
We are now ready to let the experiment run.

The main training loop needs to run inside the `mlflow` [***context***](https://realpython.com/python-with-statement/)

```python
with mlflow.start_run():
    for epoch in range(params['num_epochs']):
        train(epoch)
        test(epoch)
```
and wait.
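As an optional extra (not part of the demo above), the trained weights can be stored next to the metrics so the run is easier to reproduce later; a minimal sketch using `mlflow.pytorch`:
```python
import mlflow.pytorch

# this would live inside the same `with mlflow.start_run():` block,
# after the epoch loop; the weights are saved under the run's artifacts
mlflow.pytorch.log_model(model, "model")
```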
