Diffstat (limited to '')
-rw-r--r--  content/theses/research/arc-agi/_index.md        5
-rw-r--r--  content/theses/research/arc-agi/approach.md     43
-rw-r--r--  content/theses/research/arc-agi/introduction.md 78
3 files changed, 126 insertions, 0 deletions
diff --git a/content/theses/research/arc-agi/_index.md b/content/theses/research/arc-agi/_index.md
new file mode 100644
index 0000000..89585e6
--- /dev/null
+++ b/content/theses/research/arc-agi/_index.md
@@ -0,0 +1,5 @@
+---
+title: arc-agi
+weight: 20
+bookCollapseSection: true
+---
diff --git a/content/theses/research/arc-agi/approach.md b/content/theses/research/arc-agi/approach.md
new file mode 100644
index 0000000..3745b68
--- /dev/null
+++ b/content/theses/research/arc-agi/approach.md
@@ -0,0 +1,43 @@
+---
+title: approach
+weight: 20
+---
+My idea is to use my [`alectors`](https://docs.apotheke.earth/alectors) library to parse the JSON as tokens, as standard LLM approaches do; but because it is a pure RL library, the setup is a *Markov decision process*, not a *Markov chain*.
+
+## observation space
+The observation is simply the JSON object, rendered into a prompt like
+```python
+prompt = f"""
+You have to find the pattern. You are given some clues.
+Given {obj['train'][0]['input']} you get {obj['train'][0]['output']}.
+Given {obj['train'][1]['input']} you get {obj['train'][1]['output']}.
+Now based on {obj['test'][0]['input']}, what is the output?
+"""
+```
+(this prompt is obviously simplified for brevity.)
+{{% hint warning %}}
+In theory, since we know that the maximum grid is `30x30`, could we not write a wrapper and use a CNN, or even flatten the inputs and outputs and use simple, traditional RL methods?
+{{% /hint %}}
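Such a wrapper could look like the following sketch (the function name, shapes, and the extra padding channel are my own assumptions, not part of `alectors`): pad every grid to `30x30` and one-hot encode the ten colors plus a padding channel, producing a fixed-size tensor that a CNN, or a flattened MLP, could consume.

```python
import numpy as np

def grid_to_tensor(grid, size=30, n_colors=10):
    """Pad a variable-size ARC grid to size x size and one-hot encode it.

    Channels 0..9 hold the colors; channel 10 marks padding, so the
    network can distinguish real cells from filler outside the grid.
    """
    tensor = np.zeros((n_colors + 1, size, size), dtype=np.float32)
    tensor[n_colors, :, :] = 1.0  # everything starts as padding
    for r, row in enumerate(grid):
        for c, color in enumerate(row):
            tensor[color, r, c] = 1.0
            tensor[n_colors, r, c] = 0.0  # real cell: clear the padding flag
    return tensor
```

Flattening such a tensor gives a fixed `11*30*30 = 9900`-dimensional input, which traditional RL methods can handle directly.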
+
+## action space
+The maximum grid size is `30x30`, and each square can hold an integer from 0 to 9, inclusive. In total there are `30x30x10 = 9000` possible actions, which is a huge action space[^1]. We can either use it as is and hope for the best (which massively increases the memory needs of the network), or come up with a clever workaround.
+
+Since we are not zero-shotting, and will take multiple steps before a final output, we can simply add more steps rather than more possible actions. A basic approach splits each move into two distinct sub-steps: 'pick a grid square', then 'pick an integer for the chosen square'. The agent would first choose one of the squares from a flat array, then pick a color. There are two ways to force the agent to pick a valid color: repeat the colors across the entirety of the action space, or heavily penalise the agent for picking an action outside the integer range during the 'pick integer' sub-step.
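The two-sub-step decoding can be sketched as follows (the function name and the flat indexing are illustrative assumptions; this shows the color-repetition option, where any color index simply wraps around):

```python
def decode_two_step(cell_action, color_action, width=30, n_colors=10):
    """Turn two sub-step actions into a (row, col, color) assignment.

    cell_action  : flat index over the width*width grid (0..899)
    color_action : index into the same action head; taking it modulo
                   n_colors repeats the colors across the whole space.
                   The alternative is to leave it raw and penalise any
                   color_action >= n_colors via the reward.
    """
    row, col = divmod(cell_action, width)
    color = color_action % n_colors
    return row, col, color
```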
+
+The above seems like a bad solution; the action space would still be too big (~900 at worst), and with at most 10 valid integers per grid square, the agent might struggle to explore the action space effectively during integer picking. Also, due to the curse of dimensionality, one should aim to shrink the action space as much as possible.
+
+It then makes sense to split each step into three mini-steps over a 30-dimensional action space: first pick the row, then the column, and finally the integer for that row/column.
+```python
+for ministep in ['row', 'column', 'color']:
+    prompt_with_ministep = prompt + ministep
+    action = agent.choose_action(prompt_with_ministep)
+    # ...
+```
+(again, this is an oversimplification)
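A slightly fuller, still hypothetical sketch of one cell assignment under this scheme, with a stub standing in for a real `alectors` agent (the interface is my assumption):

```python
def fill_cell(agent, prompt, grid):
    """Run the three mini-steps and write one cell of the grid."""
    choices = {}
    for ministep in ['row', 'column', 'color']:
        # one forward pass per mini-step, each with an amended prompt
        choices[ministep] = agent.choose_action(prompt + ministep)
    # a color action > 9 would be invalid here; as discussed above,
    # it could be penalised via the reward rather than masked out
    grid[choices['row']][choices['column']] = choices['color']
    return grid

class StubAgent:
    """Stand-in for an agent; replays a fixed list of actions."""
    def __init__(self, actions):
        self._actions = iter(actions)

    def choose_action(self, prompt):
        return next(self._actions)
```

For example, `fill_cell(StubAgent([2, 5, 7]), "prompt: ", grid)` writes color 7 at row 2, column 5.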
+
+[^1]: smaller than the vocabulary size LLMs use (~3e5), but still too big to do anything with cheaply.
diff --git a/content/theses/research/arc-agi/introduction.md b/content/theses/research/arc-agi/introduction.md
new file mode 100644
index 0000000..ea7e06e
--- /dev/null
+++ b/content/theses/research/arc-agi/introduction.md
@@ -0,0 +1,78 @@
+---
+title: introduction
+weight: 10
+---
+
+[ARC-AGI-2](https://arcprize.org) is a competition regarding pattern recognition for machine learning algorithms.
+
+The base idea is to test whether current state-of-the-art models are capable of understanding and applying abstract patterns. This benchmark-competition is unique in that humans find it easy to score extremely high, while current models struggle to perform at any acceptable rate.
+
+## data
+The dataset consists of grids divided into cells, with a visual cue that determines the colors, the new positions, or some other kind of visual reasoning.
+
+>![base arc-agi example](/images/arcagi-base.png)
+>The example shown on the arc-agi website.
+>The colors correspond to the number of holes.
+
+This information is served as JSON, so agents need to parse it to 'visualise' the patterns: either by tokenizing and breaking down the raw data, or by parsing the JSON with a hard-coded method.
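The hard-coded route is plain standard-library JSON parsing; a minimal sketch (the helper name and the assumption that test outputs may be absent are mine):

```python
import json

def load_task(path):
    """Parse one ARC task file into (input, output) pairs."""
    with open(path) as f:
        obj = json.load(f)
    train = [(ex['input'], ex['output']) for ex in obj['train']]
    # test outputs are hidden at evaluation time, hence the .get
    test = [(ex['input'], ex.get('output')) for ex in obj['test']]
    return train, test
```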
+
+A typical JSON data object looks like
+```json
+{
+ "train": [
+ {
+ "input": [
+ [7, 9],
+ [4, 3]
+ ],
+ "output": [
+ [7, 9, 7, 9, 7, 9],
+ [4, 3, 4, 3, 4, 3],
+ [9, 7, 9, 7, 9, 7],
+ [3, 4, 3, 4, 3, 4],
+ [7, 9, 7, 9, 7, 9],
+ [4, 3, 4, 3, 4, 3]
+ ]
+ },
+ {
+ "input": [
+ [8, 6],
+ [6, 4]
+ ],
+ "output": [
+ [8, 6, 8, 6, 8, 6],
+ [6, 4, 6, 4, 6, 4],
+ [6, 8, 6, 8, 6, 8],
+ [4, 6, 4, 6, 4, 6],
+ [8, 6, 8, 6, 8, 6],
+ [6, 4, 6, 4, 6, 4]
+ ]
+ }
+ ],
+ "test": [
+ {
+ "input": [
+ [3, 2],
+ [7, 8]
+ ],
+ "output": [
+ [3, 2, 3, 2, 3, 2],
+ [7, 8, 7, 8, 7, 8],
+ [2, 3, 2, 3, 2, 3],
+ [8, 7, 8, 7, 8, 7],
+ [3, 2, 3, 2, 3, 2],
+ [7, 8, 7, 8, 7, 8]
+ ]
+ }
+ ]
+}
+```
+In this example, the pattern is to tile the input into alternating (mirrored) pairs.
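For this specific task, the transformation can be written out directly; this is a per-task illustration of what the agent must discover, not a general solver:

```python
def tile_with_mirror(grid):
    """Stack the grid, its horizontal mirror, and the grid again,
    tiling each row three times across."""
    mirrored = [row[::-1] for row in grid]
    return [row * 3 for block in (grid, mirrored, grid) for row in block]
```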
+
+## current approaches
+Most approaches rely on LLMs; the best scores are achieved by state-of-the-art reasoning models such as [OpenAI's o3](https://openai.com/index/introducing-o3-and-o4-mini/), with a ***3%*** performance rating.
+These try to zero-shot the problem: they see the puzzle, might reason (depending on the model), and then immediately suggest a solution for the entire grid.
+
+It is my belief that this approach is flawed. LLMs are Markov chains, frozen at inference. Hence they cannot adjust their trajectories in the embedding space, nor is there any actual decision-making.
+
+Even "reasoning" models, whose trajectories have been finetuned via RL and which can self-prompt to cover more of the embedding space as they autoregress, are still unable to reason.