---
license: mit
tags:
- threshold-logic
- neuromorphic
- computer-architecture
- turing-complete
- loihi
- truenorth
- akida
---

# 8bit-threshold-computer

A Turing-complete CPU implemented entirely as threshold logic gates. Every gate, from Boolean primitives to arithmetic to control flow, is a single threshold neuron of the form:

```
output = 1 if (Σ wᵢ·xᵢ + b) ≥ 0 else 0
```

**Every weight in the file is in {-1, 0, 1}.** Biases are integers. Activations are the Heaviside step. Nothing else. The library was originally built with positional weights up to ±2³¹ for wide single-layer comparators; those have all been replaced with bit-cascaded multi-layer equivalents that use only ternary weights and small integer biases. Threshold-gate evaluation reduces to a popcount minus a popcount plus a bias, which is exactly what neuromorphic chips and FPGAs natively support.
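
The popcount reduction can be checked directly. The sketch below is illustrative only (the function names are not part of the repository): it evaluates a ternary gate once as a weighted sum and once as two popcounts plus a bias, and confirms that the two forms agree.

```python
def gate_weighted_sum(inputs, weights, bias):
    """Reference form: Heaviside step of the weighted sum plus bias."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


def gate_popcount(inputs, weights, bias):
    """Multiplier-free form, valid when every weight is in {-1, 0, 1}."""
    excite = sum(x for x, w in zip(inputs, weights) if w == 1)    # popcount of +1 inputs
    inhibit = sum(x for x, w in zip(inputs, weights) if w == -1)  # popcount of -1 inputs
    return 1 if excite - inhibit + bias >= 0 else 0


# AND gate with w = [1, 1], b = -2: both forms match a & b on every input.
for a in (0, 1):
    for b in (0, 1):
        assert gate_popcount([a, b], [1, 1], -2) == gate_weighted_sum([a, b], [1, 1], -2) == (a & b)
```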

The repository ships eighteen prebuilt configurations spanning three data-path widths (8, 16, 32 bits) and six memory sizes (0 B to 64 KB). The canonical file at the repo root is the largest of these: a 32-bit data path with a 64 KB address space and ~8.47 M parameters.

```
neural_computer.safetensors        32-bit data, 64 KB memory, ~8.47M params (canonical)
variants/neural_computer{8,16,32}.safetensors                    full memory (64 KB)
variants/neural_computer{8,16,32}_reduced.safetensors            4 KB memory
variants/neural_computer{8,16,32}_small.safetensors              1 KB memory
variants/neural_computer{8,16,32}_scratchpad.safetensors         256 B memory
variants/neural_computer{8,16,32}_registers.safetensors          16 B memory
variants/neural_alu{8,16,32}.safetensors                         pure ALU, no memory
```

---

## Quick start

```python
import torch
from safetensors.torch import load_file

tensors = load_file("neural_computer.safetensors")

def heaviside(x):
    return (x >= 0).float()

# AND gate: fires when both inputs are 1
w = tensors['boolean.and.weight']  # [2]
b = tensors['boolean.and.bias']    # [1]
for a, c in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    out = heaviside((torch.tensor([a, c], dtype=torch.float32) * w).sum() + b)
    print(f"AND({a}, {c}) = {int(out.item())}")
```

Run the full circuit verification suite against any variant:

```bash
python eval_all.py variants/                              # all 18 in one pass
python eval_all.py neural_computer.safetensors            # the canonical file
python eval_all.py --cpu-program variants/                # also run an assembled
                                                          # program through the
                                                          # threshold-gated CPU
```

`eval_all.py` reads each variant's manifest, runs a gate-level fitness suite (5,900–7,800 tests per variant covering Boolean, arithmetic, ALU, control, modular, error-detection, threshold, and IEEE 754 float circuits), and optionally executes a small assembled program through a manifest-sized threshold CPU plus a chained 16- or 32-bit ALU sequence on wider variants.

For an interactive walkthrough that exercises Boolean gates, the 8-bit ALU, mod-5 divisibility, and a CPU loop end-to-end:

```bash
python play.py                                            # 1 KB demo, runs in seconds
python play.py --model neural_computer.safetensors        # 64 KB, slower
```

For end-to-end CPU validation (Fibonacci, sum 1..N, bubble sort, self-modifying JMP, all eight conditional jumps, CALL stack semantics, MUL cross-checked against repeated ADD):

```bash
python test_cpu.py                                        # default: 1 KB, ~2 s
python test_cpu.py --model neural_computer.safetensors    # 64 KB canonical, ~100 s
python test_cpu.py --only fib,sum_n                       # subset of suite
```

Each program is assembled by a small Python assembler (`cpu_programs.py`) and run through the threshold-gated CPU; the driver verifies expected memory contents at HALT.

---

## Execution model

The CPU is a self-contained machine:

- **Pure tensor computation**: state in, state out
- **Frozen circuits**: integer weights, Heaviside activation
- **ACT execution**: internal loop until `HALT`
- **No external orchestration**: one forward pass equals one complete program execution

```
            ┌─────────────────────────────┐
            │      Initial State          │
            │  [PC|Regs|Flags|Memory...]  │
            └─────────────┬───────────────┘
                          ▼
            ┌─────────────────────────────┐
            │   Threshold Circuit Layer   │
            │  ┌───────────────────────┐  │
            │  │   Fetch: PC → Instr   │  │
            │  ├───────────────────────┤  │
            │  │   Decode: Opcode/Ops  │  │
            │  ├───────────────────────┤  │
            │  │   Execute: ALU/Mem    │  │
            │  ├───────────────────────┤  │
            │  │   Writeback: Results  │  │
            │  ├───────────────────────┤  │
            │  │   PC Update           │  │
            │  └───────────┬───────────┘  │
            │              │              │
            │         ┌────▼────┐         │
            │         │ HALTED? │         │
            │         └────┬────┘         │
            │       no ────┴──── yes      │
            │        │           │        │
            │        ▼           ▼        │
            │     [loop]      [exit]      │
            └─────────────┬───────────────┘
                          ▼
            ┌─────────────────────────────┐
            │       Final State           │
            └─────────────────────────────┘
```

### Instruction set

| Opcode | Mnemonic | Operation |
|--------|----------|-----------|
| 0x0 | ADD | R[d] = R[a] + R[b] |
| 0x1 | SUB | R[d] = R[a] - R[b] |
| 0x2 | AND | R[d] = R[a] & R[b] |
| 0x3 | OR | R[d] = R[a] \| R[b] |
| 0x4 | XOR | R[d] = R[a] ^ R[b] |
| 0x5 | SHL | R[d] = R[a] << 1 |
| 0x6 | SHR | R[d] = R[a] >> 1 |
| 0x7 | MUL | R[d] = R[a] * R[b] |
| 0x8 | DIV | R[d] = R[a] / R[b] |
| 0x9 | CMP | flags = R[a] - R[b] |
| 0xA | LOAD | R[d] = M[addr] |
| 0xB | STORE | M[addr] = R[s] |
| 0xC | JMP | PC = addr |
| 0xD | Jcc | PC = addr if cond (imm8[2:0]: 0=Z, 1=NZ, 2=C, 3=NC, 4=N, 5=P, 6=V, 7=NV) |
| 0xE | CALL | push PC; PC = addr |
| 0xF | HALT | stop execution |
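
The `Jcc` condition codes can be read as a lookup from `imm8[2:0]` to a test on the status flags. The following is a plain-Python sketch of that mapping, not the threshold-gate implementation; it assumes that `P` means the `N` flag is clear, and the `flags` dictionary is a hypothetical representation chosen for illustration.

```python
# Condition code -> predicate on the flags, following the Jcc row of the table.
JCC_CONDITIONS = {
    0: lambda f: f["Z"] == 1,  # Z:  zero
    1: lambda f: f["Z"] == 0,  # NZ: not zero
    2: lambda f: f["C"] == 1,  # C:  carry
    3: lambda f: f["C"] == 0,  # NC: no carry
    4: lambda f: f["N"] == 1,  # N:  negative
    5: lambda f: f["N"] == 0,  # P:  positive (assumed to mean N clear)
    6: lambda f: f["V"] == 1,  # V:  overflow
    7: lambda f: f["V"] == 0,  # NV: no overflow
}


def jcc_taken(imm8, flags):
    """Return True if a Jcc with this imm8 would branch, given the flag values."""
    return JCC_CONDITIONS[imm8 & 0b111](flags)


flags = {"Z": 1, "N": 0, "C": 0, "V": 0}
print(jcc_taken(0, flags))  # True  (Z is set)
print(jcc_taken(2, flags))  # False (C is clear)
```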

### State tensor layout

The **state tensor** uses MSB-first bit ordering: index 0 of each multi-bit field is the most-significant bit. So `R0[0]` is bit 7 of the architectural register, `R0[7]` is bit 0.

```
[ PC[N] | IR[16] | R0[8] R1[8] R2[8] R3[8] | FLAGS[4] | SP[N] | CTRL[4] | MEM[2^N][8] ]
```

`N` is the address width (configurable, 0–16). Flags are ordered `Z, N, C, V`. Control bits are ordered `HALT, MEM_WE, MEM_RE, RESERVED`.
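
As a rough illustration of the layout above, the sketch below computes field offsets for a given address width and reads one field back as an integer. It assumes the 8-bit register layout exactly as drawn and treats `N = 0` as "no memory field"; `state_layout` and `read_field` are hypothetical helper names, not functions from the repository.

```python
def state_layout(addr_bits):
    """Map field name -> (start index, width) for the 8-bit layout shown above."""
    mem_bytes = 0 if addr_bits == 0 else 2 ** addr_bits  # assumption: N = 0 means no memory
    fields = [("PC", addr_bits), ("IR", 16)]
    fields += [(f"R{i}", 8) for i in range(4)]
    fields += [("FLAGS", 4), ("SP", addr_bits), ("CTRL", 4), ("MEM", mem_bytes * 8)]

    layout, offset = {}, 0
    for name, width in fields:
        layout[name] = (offset, width)
        offset += width
    return layout, offset


def read_field(state_bits, layout, name):
    """Decode one MSB-first field (index 0 = most-significant bit) to an int."""
    start, width = layout[name]
    value = 0
    for bit in state_bits[start:start + width]:
        value = (value << 1) | int(bit)
    return value


layout, total_bits = state_layout(4)   # the 16 B "registers" profile
print(total_bits)                      # 192
print(layout["MEM"])                   # (64, 128)
```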

#### Bit ordering, one rule per scope

The state tensor's MSB-first convention does **not** propagate to subcircuit ports. Each subcircuit names its operand bits in its own scope:

| Scope | Convention | Example |
|---|---|---|
| State tensor | MSB-first (index 0 = MSB) | `R0[0]` is bit 7 of register R0 |
| Subcircuit external ports (`$a[i]`, `$b[i]`) | LSB-indexed (index 0 = LSB) | `$a[0]` is bit 0 of operand `a` |
| Ripple-carry full adders (`fa0..fa7`) | LSB-first (fa0 = LSB) | `fa0` consumes `$a[0]` and `$b[0]` |
| Instruction word | MSB-first (bit 15 = opcode high) | bit 15 is `opcode[3]` |

Worked example for `arithmetic.ripplecarry8bit`:

- Inputs: `$a[0]..$a[7]` and `$b[0]..$b[7]` where `$a[0]` is the LSB of `a`. To add `a = 0x05 = 0b00000101` and `b = 0x03`, drive `a[0]=1, a[1]=0, a[2]=1` (rest 0) and `b[0]=1, b[1]=1` (rest 0).
- Outputs: `fa0.ha2.sum.layer2`..`fa7.ha2.sum.layer2` are sum bits 0..7 (LSB to MSB), and `fa7.carry_or` is the final carry-out. The 8-bit result is `{fa7..fa0}` reading high-to-low.
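
The two conventions differ only by a reversal. The helpers below are a small sketch with illustrative names (they are not part of the repository) that convert an integer to the LSB-indexed port order and to the MSB-first state-tensor order, using the operands from the worked example.

```python
def to_port_bits(value, width=8):
    """LSB-indexed list for subcircuit ports: element i is bit i of value."""
    return [(value >> i) & 1 for i in range(width)]


def from_port_bits(bits):
    """Inverse of to_port_bits: rebuild the integer from LSB-indexed bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def to_state_bits(value, width=8):
    """MSB-first list for the state tensor: element 0 is the top bit."""
    return to_port_bits(value, width)[::-1]


print(to_port_bits(0x05))   # [1, 0, 1, 0, 0, 0, 0, 0]
print(to_port_bits(0x03))   # [1, 1, 0, 0, 0, 0, 0, 0]
print(to_state_bits(0x05))  # [0, 0, 0, 0, 0, 1, 0, 1]
```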

This is also how `safetensors2verilog`'s threshold-logic frontend exposes the ports of any extracted subcircuit. See the project's testbench at `tests/threshold_alu/run.py` for a worked end-to-end example, or use `python -m safetensors2verilog ... --inspect` to print the port contract for any extracted circuit.

### Instruction encoding (16-bit, MSB-first)

```
15..12  11..10  9..8  7..0
opcode  rd      rs    imm8
```

Interpretation:
- **R-type**: `rd = rd op rs` (imm8 ignored)
- **I-type**: `rd = op rd, imm8` (rs ignored)
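- **Address-extended**: `LOAD`, `STORE`, `JMP`, `Jcc`, and `CALL` consume the next word as a 16-bit big-endian address. `imm8` is otherwise reserved (`Jcc` uses `imm8[2:0]` as the condition code). When a jump is not taken, the PC advances 4 bytes, past both the instruction word and the address word.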
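
The field boundaries above translate into a few shifts and masks. This is a minimal sketch of packing and unpacking the 16-bit instruction word; `encode`, `decode`, and `ir_bits` are illustrative names and do not claim to match the assembler in `cpu_programs.py`.

```python
def encode(opcode, rd=0, rs=0, imm8=0):
    """Pack the four fields into a 16-bit word: opcode[15:12] rd[11:10] rs[9:8] imm8[7:0]."""
    return ((opcode & 0xF) << 12) | ((rd & 0x3) << 10) | ((rs & 0x3) << 8) | (imm8 & 0xFF)


def decode(word):
    """Split a 16-bit instruction word back into its fields."""
    return {
        "opcode": (word >> 12) & 0xF,
        "rd": (word >> 10) & 0x3,
        "rs": (word >> 8) & 0x3,
        "imm8": word & 0xFF,
    }


def ir_bits(word):
    """MSB-first bit list for the word: element 0 is bit 15 (opcode[3])."""
    return [(word >> (15 - i)) & 1 for i in range(16)]


word = encode(0x7, rd=1, rs=2)   # MUL with rd = 1, rs = 2
print(hex(word))                 # 0x7600
print(decode(word))              # {'opcode': 7, 'rd': 1, 'rs': 2, 'imm8': 0}
```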

### Circuit categories

| Category | Circuits | Examples |
|----------|----------|----------|
| Boolean | 9 | AND, OR, NOT, NAND, NOR, XOR, XNOR, IMPLIES, BIIMPLIES |
| Arithmetic | 18+ | half/full adder, ripple-carry (8/16/32-bit), comparators (8/16/32-bit), 3-operand adder, A+B×C and (A+B)×C expressions |
| ALU | 8/16/32-bit | shifts, multiply, divide, INC/DEC, NEG, ROL/ROR, bitwise |
| Combinational | 10+ | MUX (2:1, 4:1, 8:1), DEMUX, 3-to-8 decoder, 8-to-3 encoder, barrel shifter, priority encoder |
| Control flow | 16 | JMP, conditional jumps (JZ/JNZ/JC/JNC/JN/JP/JV/JNV), CALL, RET, PUSH, POP |
| Memory | 3 | N-bit address decoder, read mux, write cells (packed) |
| Modular | 11 | divisibility by 2–12 (multi-layer for non-powers-of-2) |
| Threshold | 13 | k-of-n gates, majority, minority, exactly-k |
| Pattern | 10 | popcount, leading/trailing ones, symmetry |
| Error detection | 11 | parity (XOR tree), checksum, CRC, Hamming |
| Float (IEEE 754) | half + single | pack/unpack, classify, normalize, ADD, MUL, DIV, EQ/LT/LE/GT/GE |
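
The Threshold row is where threshold logic is most economical: an "at least k of n" test is a single gate with all-ones weights and bias `-k`. The sketch below shows one textbook construction with ternary weights. It is not necessarily how the stored circuits are wired, and it assumes "majority" means strictly more than half of the inputs.

```python
from itertools import product


def gate(inputs, weights, bias):
    """Single threshold neuron with Heaviside activation."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0


def at_least_k(bits, k):
    """Fires when popcount(bits) >= k: weights all +1, bias -k."""
    return gate(bits, [1] * len(bits), -k)


def at_most_k(bits, k):
    """Fires when popcount(bits) <= k: weights all -1, bias +k."""
    return gate(bits, [-1] * len(bits), k)


def majority(bits):
    """Strict majority (assumed definition): more than half of the inputs are 1."""
    return at_least_k(bits, len(bits) // 2 + 1)


def exactly_k(bits, k):
    """Two layers: AND of the at-least-k and at-most-k gates."""
    return gate([at_least_k(bits, k), at_most_k(bits, k)], [1, 1], -2)


# Exhaustive check over all 256 8-bit input patterns.
for bits in product((0, 1), repeat=8):
    ones = sum(bits)
    assert at_least_k(list(bits), 3) == int(ones >= 3)
    assert exactly_k(list(bits), 3) == int(ones == 3)
    assert majority(list(bits)) == int(ones >= 5)
```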

### Tensor naming

```
{category}.{circuit}[.{layer}][.{component}].{weight|bias}

Examples:
  boolean.and.weight
  boolean.xor.layer1.neuron1.weight
  arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight
  modular.mod5.eq.k15.bit3.match.weight
  error_detection.paritychecker8bit.stage2.xor1.layer1.nand.bias
```

Memory circuits are stored as packed tensors so the safetensors header stays manageable (`memory.addr_decode.weight`, `memory.read.and.weight`, `memory.write.and_old.weight`, etc.).
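
Because the names are dot-separated, they can be split mechanically. The sketch below uses illustrative helper names (not part of the repository) to separate the category, the circuit path, and the `weight`/`bias` suffix, and to count tensors per category.

```python
from collections import Counter


def split_tensor_name(name):
    """Return (category, circuit_path, kind) for a name in the scheme above."""
    *path, kind = name.split(".")
    if kind not in ("weight", "bias") or len(path) < 2:
        raise ValueError(f"unexpected tensor name: {name!r}")
    return path[0], ".".join(path[1:]), kind


def count_by_category(names):
    """Count how many tensors each top-level category contributes."""
    return Counter(split_tensor_name(n)[0] for n in names)


names = [
    "boolean.and.weight",
    "boolean.and.bias",
    "arithmetic.ripplecarry8bit.fa7.ha2.sum.layer1.or.weight",
    "memory.addr_decode.weight",
]
print(split_tensor_name(names[2]))
# ('arithmetic', 'ripplecarry8bit.fa7.ha2.sum.layer1.or', 'weight')
print(count_by_category(names))
# Counter({'boolean': 2, 'arithmetic': 1, 'memory': 1})
```

With a loaded file, `count_by_category(tensors.keys())` gives a quick overview of what a variant contains, assuming every key follows the naming scheme.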

---

## Bit widths and memory profiles

The build tool emits one of 51 functionally distinct configurations: three data-path widths × seventeen address widths (0–16, where 0 means no memory).

**Bit widths** (`--bits`):

| Width | Range | Use case |
|-------|-------|----------|
| 8 | 0–255 | full CPU, legacy compatibility |
| 16 | 0–65,535 | extended arithmetic |
| 32 | 0–4,294,967,295 | practical arithmetic ranges |

**Memory profiles** (`-m`):

| Profile | Size | Addr bits | Filename suffix |
|---------|------|-----------|-----------------|
| `none` | 0 B | 0 | (uses `alu` instead of `computer`) |
| `registers` | 16 B | 4 | `_registers` |
| `scratchpad` | 256 B | 8 | `_scratchpad` |
| `small` | 1 KB | 10 | `_small` |
| `reduced` | 4 KB | 12 | `_reduced` |
| `full` | 64 KB | 16 | (none) |

Auto-generated filename: `neural_{alu|computer}{BITS}[_{MEMORY}].safetensors`. Custom address widths via `-a N` produce `_addrN`.

```bash
python build.py --bits 32 --apply all              # neural_computer32.safetensors
python build.py --bits 8 -m none --apply all       # neural_alu8.safetensors
python build.py --bits 16 -m small --apply all     # neural_computer16_small.safetensors
python build.py --bits 32 -a 6 --apply all         # neural_computer32_addr6.safetensors
```
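
The naming rule can be written out as a small function. This is a sketch reconstructed from the table and the four commands above, not the code in `build.py`; `variant_filename` is a hypothetical name.

```python
MEMORY_PROFILES = ("none", "registers", "scratchpad", "small", "reduced", "full")


def variant_filename(bits, memory="full", addr_bits=None):
    """Reproduce neural_{alu|computer}{BITS}[_{MEMORY}].safetensors."""
    if addr_bits is not None:              # custom width via -a N
        return f"neural_computer{bits}_addr{addr_bits}.safetensors"
    if memory not in MEMORY_PROFILES:
        raise ValueError(f"unknown memory profile: {memory!r}")
    if memory == "none":                   # pure ALU, no memory
        return f"neural_alu{bits}.safetensors"
    suffix = "" if memory == "full" else f"_{memory}"
    return f"neural_computer{bits}{suffix}.safetensors"


print(variant_filename(32))                # neural_computer32.safetensors
print(variant_filename(8, "none"))         # neural_alu8.safetensors
print(variant_filename(16, "small"))       # neural_computer16_small.safetensors
print(variant_filename(32, addr_bits=6))   # neural_computer32_addr6.safetensors
```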

To regenerate every named variant in one pass:

```bash
python build_all.py
```

This populates `variants/` with all 18 builds, quantizes each one to the smallest signed integer dtype that exactly represents its weights (~4× reduction in tensor data, with file size dominated by the safetensors header on the smaller profiles), and runs `eval.py` on each as a sanity check.

The quantizer is also available standalone:

```bash
python quantize.py path/to/file.safetensors           # in-place
python quantize.py variants/                          # whole directory
python quantize.py model.safetensors -o quantized.safetensors
python quantize.py file.safetensors --ternary         # push toward {-1, 0, 1} weights
python quantize.py file.safetensors --ternary --strict  # error if any weight is non-ternary
```

Every weight and bias tensor in the canonical model fits in `int8`. The eval pipeline promotes weights to `float32` on load, so integer storage is exact and transparent.
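
Choosing the smallest exact dtype is a range check. The sketch below illustrates the idea on plain Python integers; it is an assumption about how such a selection could work, not the implementation in `quantize.py`.

```python
SIGNED_RANGES = [
    ("int8", -2**7, 2**7 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("int64", -2**63, 2**63 - 1),
]


def min_signed_dtype(values):
    """Name of the narrowest signed integer type that holds every value exactly."""
    if any(v != int(v) for v in values):
        raise ValueError("non-integer value: integer storage would not be exact")
    lo, hi = min(values), max(values)
    for name, lo_limit, hi_limit in SIGNED_RANGES:
        if lo_limit <= lo and hi <= hi_limit:
            return name
    raise OverflowError("value does not fit in int64")


print(min_signed_dtype([-1, 0, 1]))   # int8
print(min_signed_dtype([0, 200]))     # int16
print(min_signed_dtype([2**31]))      # int64
```

The last case shows why positional weights of +2³¹ are costly to store: they fall just outside the `int32` range.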

**Ternary mode.** With `--ternary`, the quantizer also rewrites single-input `weight=±2` identity buffers (SHL/SHR/ROL/ROR bit gates, stack data buffers, RET address buffers, flag buffers) as `weight=±1` with bias adjusted to preserve the Heaviside output for binary inputs (for example, `H(2x - 1) ≡ H(x - 1)` for `x ∈ {0, 1}`). The canonical model has zero non-ternary weights as built; the comparators, modular detectors, and division stages that previously required positional weights up to ±2³¹ have all been bit-cascaded into multi-layer ternary equivalents in `build.py`. The metadata field `weight_quantization` records `ternary` (clean) or `ternary_partial` (some violations remain).
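
The buffer rewrite is safe because a single-input gate only ever sees `x = 0` or `x = 1`, so two outputs fully determine it. The following sketch searches for an equivalent `±1` gate and checks the identity quoted above; `ternarize_single_input` is a hypothetical helper, and the quantizer's actual procedure may differ.

```python
def H(z):
    """Heaviside step with H(0) = 1."""
    return 1 if z >= 0 else 0


def ternarize_single_input(weight, bias):
    """Find (w', b') with w' = sign(weight) that matches H(weight*x + bias) on x in {0, 1}."""
    new_weight = (weight > 0) - (weight < 0)
    target = (H(bias), H(weight + bias))               # outputs for x = 0 and x = 1
    candidates = [bias] + sorted(range(-4, 5), key=abs)  # prefer keeping the old bias
    for new_bias in candidates:
        if (H(new_bias), H(new_weight + new_bias)) == target:
            return new_weight, new_bias
    return None  # no equivalent found in the searched range


print(ternarize_single_input(2, -1))   # (1, -1): H(2x - 1) == H(x - 1)
print(ternarize_single_input(-2, 1))   # (-1, 0): H(-2x + 1) == H(-x)
```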

---

## Verification

| Category | Coverage | Notes |
|----------|----------|-------|
| Boolean gates | exhaustive | all 2^n input combinations |
| Arithmetic (8-bit) | strategic sampling | edge values + diagonal pairs; ~50 cases per circuit |
| Arithmetic (16/32-bit) | strategic sampling | extreme values + targeted bit patterns |
| ALU primitives (8/16/32-bit) | strategic sampling | edge inputs per operation |
| Control flow | exhaustive | all 2^3 input combinations per Jcc |
| Threshold k-of-n | exhaustive | all 256 8-bit input patterns |
| Modular (all moduli, 8-bit input) | exhaustive | every value in [0, 255] |
| Parity | exhaustive | every value in [0, 255] |
| Pattern recognition | exhaustive | every value in [0, 255] |
| Combinational logic | exhaustive | full input space per gate |
| CPU integration | program-level | seven assembled programs (Fibonacci, sum, sort, self-modifying JMP, all eight Jcc, CALL stack push, MUL vs repeated ADD) plus a division-by-repeated-subtraction program cross-checked against the DIV opcode and a bitwise pipeline (AND/OR/XOR/SHL/SHR) |

The 8-bit arithmetic and ALU tests use strategic sampling rather than the full 65,536-case sweep. Exhaustive coverage at 8 bits is feasible, but it is not necessary because the circuits are constructed gate by gate. The 16-bit and 32-bit arithmetic tests sample edge cases only; exhaustive coverage at those widths (2³² and 2⁶⁴ operand pairs for a two-input circuit) is infeasible without specialized hardware.

`eval_all.py` runs the unified suite. Exit code is the number of failing variants (0 means all pass). `test_cpu.py` runs the CPU program suite against a chosen variant.

---

## Threshold logic

A threshold gate computes a Boolean function by taking a weighted sum of binary inputs and comparing the result to a threshold; the output is 1 when the sum meets or exceeds the threshold and 0 otherwise. Equivalently, it is a neuron with Heaviside step activation, integer weights, and an integer bias.

Threshold gates are strictly more powerful than standard Boolean gates: a single threshold gate can compute any linearly separable Boolean function. That class includes AND, OR, NAND, NOR, and IMPLIES, as well as functions such as majority that require multiple levels of conventional gates. Functions that are not linearly separable (XOR, parity, mod-k for k not a power of two) require multiple layers.
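
Linear separability can be probed by brute force for two-input functions. The sketch below searches ternary weights and small integer biases for a single gate that matches a truth table. The search range is an arbitrary choice for illustration; for XOR the failure is not an artifact of the range, since no real-valued weights work either.

```python
from itertools import product


def find_single_gate(truth_table, biases=range(-2, 3)):
    """Return (w1, w2, b) realizing the table with one threshold gate, or None."""
    for w1, w2, b in product((-1, 0, 1), (-1, 0, 1), biases):
        if all((1 if w1 * x + w2 * y + b >= 0 else 0) == out
               for (x, y), out in truth_table.items()):
            return w1, w2, b
    return None


TABLES = {
    "AND":     {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1},
    "OR":      {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1},
    "NAND":    {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0},
    "NOR":     {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0},
    "IMPLIES": {(0, 0): 1, (0, 1): 1, (1, 0): 0, (1, 1): 1},
    "XOR":     {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0},
}

for name, table in TABLES.items():
    found = find_single_gate(table)
    print(f"{name:8s}", "single gate" if found else "needs multiple layers")
```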

Example gates:

```
AND: w=[1, 1], b=-2
  H(0+0-2) = 0     H(1+1-2) = 1

OR:  w=[1, 1], b=-1
  H(0+0-1) = 0     H(1+0-1) = 1

XOR: two layers (not linearly separable)
  layer 1: OR + NAND
  layer 2: AND of the two
```

A full adder is two half-adders plus a carry OR, around four threshold layers. An 8-bit ripple-carry adder is eight chained full adders, around 32 layers.
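
The construction can be followed end to end in a few lines. This sketch builds XOR, a half adder, a full adder, and an 8-bit ripple-carry adder from the AND and OR weights listed above plus a NAND gate (`w=[-1, -1], b=1`). It mirrors the structure described here but is written from scratch, so the gate ordering inside the stored tensors may differ.

```python
def gate(inputs, weights, bias):
    """Single threshold neuron with Heaviside activation."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0


def AND(a, b):  return gate([a, b], [1, 1], -2)
def OR(a, b):   return gate([a, b], [1, 1], -1)
def NAND(a, b): return gate([a, b], [-1, -1], 1)


def XOR(a, b):
    """Two layers: OR and NAND in layer 1, AND of the two in layer 2."""
    return AND(OR(a, b), NAND(a, b))


def half_adder(a, b):
    return XOR(a, b), AND(a, b)          # (sum, carry)


def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, OR(c1, c2)                # carry OR joins the two half-adder carries


def ripple_add8(a, b):
    """Chain eight full adders, LSB first; returns (8-bit sum, carry-out)."""
    carry, result = 0, 0
    for i in range(8):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_add8(0x05, 0x03))   # (8, 0)
print(ripple_add8(127, 128))     # (255, 0)
print(ripple_add8(255, 1))       # (0, 1)
```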

### History

Warren McCulloch and Walter Pitts introduced the threshold neuron in 1943, proving that networks of such neurons can compute any Boolean function. Their work preceded both the perceptron and modern neural networks and established the theoretical foundation for neural computation.

The 1960s saw substantial work on threshold logic synthesis. Saburo Muroga, Robert McNaughton, and Michael Dertouzos developed algebraic methods for determining whether a Boolean function can be implemented as a single threshold gate, and if so, how to compute the appropriate weights. The focus was on individual gates rather than complete systems.

Frank Rosenblatt's Mark I Perceptron (1957–1960) implemented threshold neurons in hardware using potentiometers for weights, but it was a pattern classifier that learned its weights through training; the final configurations were not published. Bernard Widrow's ADALINE and MADALINE (1960–1963) similarly used adaptive threshold elements with weights learned via the LMS algorithm.

Hava Siegelmann and Eduardo Sontag proved in the 1990s that recurrent neural networks are Turing-complete. The construction relied on a continuous (saturated-linear) activation with unbounded-precision state, not the discrete step function used in threshold logic. Other theoretical work on neural Turing machines and differentiable computers followed similar patterns: universality with continuous activations chosen to support gradient-based training.

### Neuromorphic hardware

Modern neuromorphic processors implement large arrays of configurable threshold-like neurons in silicon:

- **Intel Loihi** (2017): 128 neuromorphic cores with programmable synaptic weights, spike-based communication, and on-chip learning. Supports integer weights and configurable neuron dynamics.
- **IBM TrueNorth** (2014): one million neurons and 256 million synapses across a 4096-core array. Each neurosynaptic core implements 256 neurons with configurable weights and thresholds.
- **BrainChip Akida** (2021): edge-oriented event-based processing with integer weights.
- **SpiNNaker** (University of Manchester): ARM cores simulating spiking networks at scale.

Published work on these platforms has focused on neural network inference, sensory processing, and pattern recognition. A 2024 paper demonstrated basic logic gates, adders, and decoders on SpiNNaker and Dynap-SE1 and described that work as "a first step toward the construction of a spiking computer"; that implementation lacked instruction fetch, a program counter, memory, and control logic.

The weights in this repository implement a complete CPU: registers, ALU with 16 operations, status flags, conditional branching, subroutine calls, stack operations, and memory access. Every component is a threshold neuron with integer weights.

---

## Hardware compatibility

All weights are in {-1, 0, 1}, all activations are Heaviside step, and every gate is a single weighted sum followed by a sign test. This eliminates multipliers entirely: each gate evaluation is a popcount of `+1`-weighted inputs minus a popcount of `-1`-weighted inputs plus an integer bias. The circuits are intended to deploy directly on:

- **FPGA**: every gate maps to a small LUT cluster (or a popcount tree of LUT4/LUT6 + carry chain). Ternary weight storage compresses to 2 bits per weight; routing collapses to bit selection.
- **Intel Loihi**: integer weights and Heaviside threshold neurons are the native primitive. Ternary fits well within Loihi's 8-bit weight range.
- **IBM TrueNorth**: configurable threshold per neurosynaptic core; ternary weights and small biases are within the supported range.
- **BrainChip Akida**: edge-oriented integer-weight inference; ternary weights fit cleanly.
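
The 2-bits-per-weight figure follows from having three possible values. The sketch below packs ternary weights four to a byte and unpacks them again. The specific code assignment (`00` for 0, `01` for +1, `10` for -1) is an arbitrary choice for illustration and does not describe any particular FPGA or neuromorphic toolchain.

```python
ENCODE = {0: 0b00, 1: 0b01, -1: 0b10}          # hypothetical 2-bit code per weight
DECODE = {code: w for w, code in ENCODE.items()}


def pack_ternary(weights):
    """Pack weights from {-1, 0, 1} into bytes, four weights per byte."""
    packed = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        packed[i // 4] |= ENCODE[w] << (2 * (i % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` weights from a packed byte string."""
    return [DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(count)]


weights = [1, -1, 0, 1, 1, 1, 0, -1, -1]
packed = pack_ternary(weights)
print(len(weights), "weights ->", len(packed), "bytes")   # 9 weights -> 3 bytes
assert unpack_ternary(packed, len(weights)) == weights
```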

---

## LLM integration

Threshold circuits can be embedded into transformer MLP layers to give a language model exact arithmetic. Standard LLMs fail at arithmetic because they interpolate over the training distribution rather than compute. A 360M-parameter model trained on web text has seen `127 + 128 = 255` rarely, if at all, and guesses based on pattern matching.

The integration freezes the circuits and trains only the interface layers that:

1. Extract operands from token embeddings.
2. Route computation through the appropriate circuit.
3. Inject the result back into the residual stream.

The model learns *call dispatch*; the arithmetic is already solved.

### Architecture

```
x ──┬── MLP path ─────────────────┬── + ── output
    │                             │
    └── BitExtractor ── Circuit ──┴── BitInjector
                          │
                       Router (learned weighting)
```

Augmented MLP forward pass:

```python
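import torch.nn.functional as F

def forward(self, x):  # x: [batch, seq, d_model=960]
    # Standard SwiGLU MLP path
    mlp_out = self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

    # Circuit path: extract operand bits, run the frozen adder, re-embed the result
    a_bits, b_bits = self.bit_extractor(x)              # [batch, seq, 8] each
    result_bits, carry = self.circuits.add_8bit(a_bits, b_bits)
    flags = self.compute_flags(result_bits, carry)
    circuit_delta = self.bit_injector(result_bits, flags)

    # Learned router scales how much of the circuit output is added to the MLP output
    route_weights = self.router(x)                       # [batch, seq, 2] softmax
    return mlp_out + route_weights[..., 1:2] * circuit_delta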
```

### Target model

The reference integration uses Hugging Face's [SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct). See [`llm_integration/smollm2/SMOLLM2_ARCHITECTURE.md`](llm_integration/smollm2/SMOLLM2_ARCHITECTURE.md) for the full technical analysis.

| Property | Value |
|----------|-------|
| Parameters | 361.82 M |
| Hidden dimension | 960 (matches the extractor input) |
| Layers | 32 transformer blocks |
| Attention | 15 query heads, 5 KV heads (GQA) |
| MLP | SwiGLU (960 → 2560 → 960) |
| Position encoding | RoPE (theta = 100k, max 8192) |

Digits tokenize individually (`"47 + 86"` → `['4', '7', ' +', ' ', '8', '6']`, with digit token IDs `32 + digit_value`), which makes position-based operand extraction practical.
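
Given that mapping, the ground-truth operand for a run of digit tokens is straightforward to compute, which is useful for building extractor training targets. The sketch below relies only on the `32 + digit_value` rule stated above. The helper names are hypothetical, and the LSB-indexed bit order is an assumption borrowed from the subcircuit port convention rather than a documented property of the extractor.

```python
DIGIT_TOKEN_BASE = 32  # from the text: digit token ID = 32 + digit value


def tokens_to_operand(token_ids):
    """Turn a run of single-digit token IDs into the integer they spell."""
    value = 0
    for token_id in token_ids:
        digit = token_id - DIGIT_TOKEN_BASE
        if not 0 <= digit <= 9:
            raise ValueError(f"token {token_id} is not a digit token")
        value = value * 10 + digit
    return value


def operand_to_bits(value, width=8):
    """LSB-indexed target bits for an operand (assumed order, see lead-in)."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} bits")
    return [(value >> i) & 1 for i in range(width)]


a = tokens_to_operand([36, 39])   # tokens for '4', '7'
b = tokens_to_operand([40, 38])   # tokens for '8', '6'
print(a, b)                       # 47 86
print(operand_to_bits(a))         # [1, 1, 1, 1, 0, 1, 0, 0]
```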

### Gradient flow

Heaviside has zero gradient almost everywhere. The implementation uses a straight-through estimator:

```python
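import torch

class HeavisideSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # Exact step function in the forward pass
        return (x >= 0).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through: pass the incoming gradient through unchanged,
        # as if the forward pass had been the identity
        return grad_output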
```

At inference, Heaviside is the true step function; if the extractor identifies operands correctly, the circuit produces the correct result by construction.

### Baseline

SmolLM2-360M-Instruct on randomized 8-bit arithmetic (2,000 cases, operands uniform on [0, 255], generous answer extraction):

| Operation | Accuracy |
|-----------|----------|
| Addition | 35.92% |
| Subtraction | 17.72% |
| Multiplication | 1.25% |
| Greater than | 14.37% |
| Less than | 4.31% |
| Equality | 0.28% |
| **Overall** | **11.90%** (238/2000) |

Multiplication accuracy at 1.25% is essentially random over the output space. Comparison operations often echo the expression rather than evaluate it. Even addition fails roughly two-thirds of the time on full 8-bit operands. Performance degrades further as operand magnitude increases: edge cases like `127 + 128` are almost never correct.

The frozen threshold circuits reach 100% on the same task when given correctly formatted bit inputs (10,000 random cases, every operation). The integration challenge is therefore the extractor, not the arithmetic.

### Trainable parameters (SmolLM2, hidden_dim = 960)

| Component | Parameters | Description |
|-----------|------------|-------------|
| AttentionPooling | ~3.7 M | 4-head attention over the sequence |
| MultiHeadBitExtractor (× 2) | ~245 K each | 8 per-bit MLPs for A and B |
| OpRouter | ~246 K | 960 → 256 → 6 MLP |
| **Extractor total** | **~4.4 M** | full extraction module |

Alternative architectures: `PositionExtractor` (~1.5 M, position-specific, no attention), `DigitExtractor` (~1.2 M, predicts digits 0–9 instead of bits), `HybridExtractor` (digit lookup with MLP fallback for word numerals). With `--unfreeze_layers 4` an additional ~39.3 M trainable parameters open up in the top four transformer layers.

### Training

```bash
python train.py --mode router --epochs 100                          # sanity check
python train.py --mode llm --epochs 100 --batch_size 256            # frozen LLM
python train.py --mode llm --unfreeze_layers 4 --batch_size 4096    # fine-tune top layers
```

Loss components: BCE on output bits, BCE on extracted A and B bits (2× weight), and CE on operation classification. Curriculum runs 0–9 → 0–99 → 0–255. Optimizer is AdamW, lr 3e-4, gradient clipping 1.0.

---

## Repository layout

```
neural_computer.safetensors         canonical model (32-bit, 64 KB, ~8.47M params)
variants/                           18 prebuilt configurations
build.py                            generator (one safetensors per invocation)
build_all.py                        builds, quantizes, and verifies every named profile
quantize.py                         casts each tensor to its minimum signed integer dtype
eval.py                             gate-level fitness suite + reference CPU runtime
eval_all.py                         variant-agnostic gate-level harness
cpu_programs.py                     assembler + program suite for CPU-level validation
test_cpu.py                         runs the program suite against a chosen variant
play.py                             interactive demo
prune_weights.py                    GPU-batched weight reduction with conflict resolution
llm_integration/                    SmolLM2 extractor + circuit wrapper + training code
  ├── circuits.py                   FrozenThresholdCircuits (loads safetensors, exposes
  │                                 add_8bit / sub_8bit / mul_8bit / compare_*)
  ├── model.py                      Extractor variants, ArithmeticModel
  ├── train.py                      router / interface / llm training modes
  ├── fitness.py                    randomized fitness function
  ├── baseline.py                   vanilla SmolLM2 baseline measurement
  ├── trained/                      checkpointed extractor weights
  └── smollm2/
      ├── SMOLLM2_ARCHITECTURE.md   architecture analysis
      ├── analyze_smollm2.py        analysis script
      └── smollm2_analysis.json     analysis output
```

---

## Citation

```bibtex
@misc{8bit-threshold-computer,
  title={8bit-threshold-computer: A Turing-Complete Threshold Logic CPU},
  author={Norton, Charles},
  year={2026},
  howpublished={Hugging Face},
  url={https://huggingface.co/phanerozoic/8bit-threshold-computer}
}
```

---

## License

MIT

---

## References

1. McCulloch & Pitts (1943). *A Logical Calculus of the Ideas Immanent in Nervous Activity.*
2. Muroga (1971). *Threshold Logic and Its Applications.*
3. Siegelmann & Sontag (1995). *On the Computational Power of Neural Nets.*
4. Bengio et al. (2013). *Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation.*
5. Ma et al. (2024). *The Era of 1-bit LLMs* (BitNet b1.58).
6. HuggingFace (2024). *SmolLM2: Small Language Models* — [model card](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct).
7. Vaswani et al. (2017). *Attention Is All You Need.*
8. Su et al. (2021). *RoFormer: Enhanced Transformer with Rotary Position Embedding.*