I Taught an AI Agent to Speak OSPF: It's Now My Router's Neighbour

Won't You Be My Neighbour?

## The Network as a Conversation, Not a Configuration

For decades, we’ve treated networks as things to be **configured**. We push commands, pull outputs, parse CLI text, and hope our automation scripts survive the next OS upgrade.

**What if we’ve been thinking about this wrong?**

What if networks aren’t meant to be configured—they’re meant to be **conversed with**?

Think about OSPF for a moment. It’s not a configuration language. It’s a **conversation protocol**. Routers don’t configure each other—they talk to each other. They exchange beliefs about topology. They debate link costs. They converge on a shared truth about the network graph. When a link fails, they don’t wait to be polled—they announce it, and every peer updates their worldview in milliseconds.

**Routing protocols are conversations.** Distributed systems exchanging information, building consensus, and making decisions together.

So we asked: **What if an AI agent could join that conversation?**

---

## What if your AI agent didn’t just talk *to* your routers—what if it *was* a router?

For decades, network automation has followed the same pattern: build tools that sit *outside* the network, speaking to routers through intermediary protocols. SSH and screen scraping. NETCONF and YANG models. RESTCONF APIs. gRPC and gNMI. Even the cutting-edge Model Context Protocol (MCP) that everyone’s excited about.

Every single one treats the router as a black box with an API.

**We took a different approach: We built an AI agent that doesn’t talk *to* routers. It talks *with* them. As a peer.**

The agent runs RFC 2328 OSPF natively. It forms FULL neighbor adjacencies with production routers. It exchanges Link State Advertisements (LSAs), maintains a complete Link State Database (LSDB), runs Dijkstra’s shortest path first (SPF) algorithm, and participates as a first-class member of the OSPF control plane.

It’s not observing the network. **It IS the network.**

This isn’t automation that **controls** the network from above. This is intelligence that **participates** in the network as a peer. The agent doesn’t issue commands—it **listens**. It doesn’t scrape outputs—it **receives updates**. It doesn’t poll for state—it **maintains synchronized state**.

**The network isn’t configured anymore. It’s listened to.**

### Routers Already Know How to Talk

Close your eyes and imagine what’s happening in your network right now:

```
Router A: "Hello, I'm 10.10.10.10, and I can reach 10.255.255.10"
Router B: "Hello, I'm 10.20.20.20, and I heard you. I can reach 10.30.30.30"
Router A: "Thanks! Now I know I can reach 10.30.30.30 through you"
Router B: "And I can reach 10.255.255.10 through you!"
```

Hellos every 10 seconds. Every interface. Every router.

**This is OSPF.** Not a configuration language—a **conversation protocol.**

Now imagine a link fails:

```
Router A: "URGENT: I lost my link to 10.255.255.10!"   (floods LSA)
Router B: "I heard you! Recalculating my routes..."    (SPF calculation)
Router C: "I also heard! Updating my forwarding table..." (FIB update)
Router A: "Thanks, we're all synchronized now"         (convergence)
```

Milliseconds. No polling. No central controller. **Just peers talking, listening, and adapting together.**

This is how networks have always worked. Distributed systems exchanging information, building consensus, making decisions collaboratively. **Networks are conversations.**

**So why do we automate them with commands?**

---

## The Paradigm Shift: From Control to Participation

Let me show you what traditional network automation looks like:

```
┌─────────────┐
│  AI Agent   │
│  "Show me   │
│ the route"  │
└──────┬──────┘
       │
       │ SSH/NETCONF/RESTCONF/gNMI
       │ "Run show ip route"
       │ "Parse CLI output"
       │ "Hope the format doesn't change"
       ▼
┌─────────────┐
│   Router    │
│ (Black Box) │
└─────────────┘
```

Now look at what we built:

```
┌─────────────┐               ┌─────────────┐
│  AI Agent   │◄────OSPF─────►│   Router    │
│  Router ID  │  Hello (10s)  │  Router ID  │
│10.255.255.10│  LSA Flood    │ 10.10.10.10 │
│             │  SPF Sync     │             │
│  FULL/- ✓   │               │  FULL/- ✓   │
└──────┬──────┘               └──────┬──────┘
       │                             │
       └─────────────────────────────┘
                  Same LSDB
                Same Topology
           Same Protocol Language
```

The agent doesn’t issue commands. **It receives LSAs.** It doesn’t scrape outputs. **It runs SPF calculations.** It doesn’t query for state. **It maintains synchronized state.**

This isn’t automation 2.0. This is **network participation 1.0.**

---

## Control-Plane Literacy: Speaking the Network’s Native Language

Traditional network automation requires “translation layers” because we’ve never given AI agents **control-plane literacy**—the ability to speak routing protocols natively.

Think about what happens when you automate via CLI:

```
Human → Python script → SSH → CLI parser → "show ip ospf neighbor" → Text output → Regex → Hope
```

Every layer is a translation. Every translation loses information. Every parse is fragile.

**Now watch what happens when the agent speaks OSPF:**

```
Router → OSPF LSA → Agent (native protocol understanding) → Complete topology graph → Insights
```

No translation. No parsing. No information loss. **Just conversation.**

This is the difference between asking “show me your OSPF database” and **being part of the OSPF database**. Between polling “what changed?” and **being notified when things change**. Between commanding routers and **collaborating with them**.
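At the byte level, "native protocol understanding" starts with the fixed 20-byte LSA header defined in RFC 2328 Appendix A.4.1. An agent can decode it directly instead of scraping CLI text. The sketch below uses only the standard library. `LSAHeader` and `parse_lsa_header` are illustrative names, not the project's API, and the sample values mirror the FRR Router LSA from the lab setup.

```python
import socket
import struct
from dataclasses import dataclass

LS_TYPES = {1: "Router", 2: "Network", 3: "Summary", 4: "ASBR-Summary", 5: "AS-External"}


@dataclass
class LSAHeader:
    age: int                 # seconds since origination
    options: int
    ls_type: str
    link_state_id: str
    advertising_router: str
    seq: int                 # raw 32-bit LS sequence number
    checksum: int
    length: int              # header + body, in bytes


def parse_lsa_header(data: bytes) -> LSAHeader:
    """Decode the 20-byte OSPFv2 LSA header (network byte order)."""
    age, options, ls_type = struct.unpack_from("!HBB", data, 0)
    seq, checksum, length = struct.unpack_from("!IHH", data, 12)
    return LSAHeader(
        age=age,
        options=options,
        ls_type=LS_TYPES.get(ls_type, f"unknown({ls_type})"),
        link_state_id=socket.inet_ntoa(data[4:8]),
        advertising_router=socket.inet_ntoa(data[8:12]),
        seq=seq,
        checksum=checksum,
        length=length,
    )


# Example: a Router LSA header from 10.10.10.10 (checksum left as 0 here)
raw = struct.pack(
    "!HBB4s4sIHH", 1, 0x02, 1,
    socket.inet_aton("10.10.10.10"), socket.inet_aton("10.10.10.10"),
    0x80000006, 0, 48,
)
hdr = parse_lsa_header(raw)
print(hdr.ls_type, hdr.advertising_router, hex(hdr.seq))  # Router 10.10.10.10 0x80000006
```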

### Why This Changes Everything

When AI agents gain control-plane literacy, they don’t just get better data—they get **contextual understanding** that was previously impossible:

**1. Instant Topology Awareness**

Traditional: Poll every router, correlate outputs, infer topology

Protocol-Native: Receive LSAs, build topology graph automatically

**2. Real-Time Change Detection**

Traditional: Poll periodically, detect changes retroactively

Protocol-Native: Receive updates in milliseconds, understand impact immediately

**3. Root Cause Analysis**

Traditional: “OSPF neighbor down” (symptom)

Protocol-Native: “Lost bidirectional Hello communication with 10.10.10.50, affecting routes to 172.20.0.0/20, alternate path available via 10.10.10.45 (+5ms latency)” (complete context)

**4. Predictive Intelligence**

Traditional: React to failures after they propagate

Protocol-Native: See LSA sequence number gaps, detect flapping, predict convergence delays

The agent isn’t just reading state—it’s **experiencing the network** the same way routers do.

---

## The Value: Six Concrete Examples

Let’s move from philosophy to practicality. What does control-plane literacy actually buy you?

### Example 1: “Why Is Traffic Taking That Path?”

**Traditional Approach:**

```
Engineer: "Why isn't traffic using the direct link?"
Tool: SSH to 12 routers → Parse "show ip route" → Infer path → 15 minutes
```

**Protocol-Native Agent:**

```
Engineer: "Why isn't traffic using the direct link to 10.5.3.0/24?"

Agent: "Looking at my LSDB... I see two paths:
          - Direct via 10.10.10.50: metric 20
          - Alternate via 10.10.10.45: metric 15
        SPF selected the alternate path due to its lower metric.
        The direct link's metric was set to 20 by the LSA with
        sequence number 0x80000042 from router 10.10.10.50 at 14:32:15 UTC.
        Would you like to see the SPF tree or the LSA that changed the metric?"

Time: 100 milliseconds
```

The agent doesn’t need to SSH anywhere. **It already has the complete topology in memory because it received every LSA as it was flooded.**
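Answering "why this path?" from an LSDB comes down to running SPF while remembering each node's predecessor. The project uses NetworkX for this step. This sketch swaps in a hand-rolled `heapq` Dijkstra so it runs on the standard library alone. The topology and metrics are hypothetical and chosen to reproduce the conversation above: the direct link to 10.10.10.50 costs 20, while the detour through 10.10.10.45 costs 5 + 10.

```python
import heapq

# Hypothetical directed graph distilled from Router LSAs: node -> {neighbor: metric}.
# "local" is the agent's vantage point; the prefix hangs off 10.10.10.50 at cost 0.
LSDB_GRAPH = {
    "local": {"10.10.10.50": 20, "10.10.10.45": 5},
    "10.10.10.45": {"local": 5, "10.10.10.50": 10},
    "10.10.10.50": {"local": 20, "10.10.10.45": 10, "10.5.3.0/24": 0},
}


def spf(graph: dict, root: str) -> tuple[dict, dict]:
    """Dijkstra from `root`; returns (cost, predecessor) maps."""
    cost = {root: 0}
    prev = {}
    heap = [(0, root)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > cost[node]:
            continue  # stale entry: a cheaper path was already found
        for nbr, metric in graph.get(node, {}).items():
            candidate = dist + metric
            if candidate < cost.get(nbr, float("inf")):
                cost[nbr] = candidate
                prev[nbr] = node
                heapq.heappush(heap, (candidate, nbr))
    return cost, prev


def path_to(prev: dict, root: str, dest: str) -> list:
    """Walk predecessors back from `dest` to `root`."""
    hops = [dest]
    while hops[-1] != root:
        hops.append(prev[hops[-1]])
    return hops[::-1]


cost, prev = spf(LSDB_GRAPH, "local")
dest = "10.5.3.0/24"
print(" -> ".join(path_to(prev, "local", dest)), "cost", cost[dest])
# local -> 10.10.10.45 -> 10.10.10.50 -> 10.5.3.0/24 cost 15
```

The predecessor map is what turns a bare route into an explanation: it records which hop SPF preferred and at what cumulative metric.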

### Example 2: Planned Maintenance Without Fear

**Traditional Approach:**

```
Engineer: "I need to take down Router A for maintenance. What breaks?"
Tool: Build topology from CLI scrapes → Run simulation → Hope it's accurate
Risk: High (topology might be stale or incomplete)
```

**Protocol-Native Agent:**

```
Engineer: "What happens if I remove Router 10.10.10.50 from the topology?"

Agent: "Simulating removal of 10.10.10.50 from my LSDB...
        Affected routes: 47 prefixes
          - 12 lose all paths (single point of failure!)
          - 35 have alternate paths with these changes:
              * 10.5.3.0/24: +10ms via 10.10.10.45
              * 192.168.100.0/24: +5ms via 10.10.10.22
        Critical services impacted:
          - Database cluster at 10.5.3.50 (no alternate path)
          - Monitoring server at 10.5.3.100 (no alternate path)
        Expected convergence time: 3.2 seconds
        Recommendation: Add backup link to 10.10.10.45 before maintenance."

Time: 200 milliseconds
Risk: Low (simulation uses the live LSDB)
```

The agent can run **what-if scenarios** against the actual topology because it maintains a complete, synchronized graph.
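A what-if removal can be sketched as a reachability check on the graph with one router excluded. This is a simplification: it ignores metrics and the two-way connectivity check that real SPF performs. The topology below is hypothetical, and `what_if_remove` is an illustrative name.

```python
from collections import deque

# Hypothetical LSDB-derived graph: node -> {neighbor: metric}; prefixes are leaf nodes.
GRAPH = {
    "local": {"10.10.10.50": 20, "10.10.10.45": 5},
    "10.10.10.45": {"local": 5, "10.10.10.50": 10, "10.10.10.22": 10},
    "10.10.10.50": {"local": 20, "10.10.10.45": 10, "10.5.3.0/24": 0},
    "10.10.10.22": {"10.10.10.45": 10, "192.168.100.0/24": 0},
}
PREFIXES = ["10.5.3.0/24", "192.168.100.0/24"]


def reachable(graph: dict, root: str, removed: frozenset = frozenset()) -> set:
    """Breadth-first search from `root`, never entering a removed node."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, {}):
            if nbr not in seen and nbr not in removed:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def what_if_remove(graph: dict, root: str, router: str, prefixes: list) -> list:
    """Prefixes reachable today that would lose every path without `router`."""
    before = reachable(graph, root)
    after = reachable(graph, root, frozenset({router}))
    return [p for p in prefixes if p in before and p not in after]


print(what_if_remove(GRAPH, "local", "10.10.10.50", PREFIXES))  # ['10.5.3.0/24']
```

Prefixes that survive the removal still need a full SPF run to work out their new cost. This check only separates "single point of failure" from "has an alternate".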

### Example 3: Detecting Policy Drift

**Traditional Approach:**

```
Intended metric for link X: 10
Actual metric: 100 (someone changed it manually 3 months ago)
Detection: Never (unless you audit every router periodically)
```

**Protocol-Native Agent:**

```
Agent monitors LSAs: "Router 10.10.10.50 just flooded LSA 0x80000098
  Link to 10.10.10.22: metric 100
  Expected metric (per policy): 10
  Deviation detected!

  Alert: Policy drift on link 10.10.10.50→10.10.10.22
  Detected: Real-time (within 100ms of change)
  Last compliant LSA: 0x80000097 at 2026-01-10 09:15:42"
```

The agent sees **every topology change in real-time** and can validate against policy continuously.
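Once each Router LSA has been decoded, per-LSA drift checking is a small comparison. This is a minimal sketch, assuming the agent has already parsed each Router LSA into (neighbor, metric) pairs and that intended metrics live in a simple dictionary. `RouterLink`, `POLICY` and `check_drift` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterLink:
    neighbor: str  # Link ID of a point-to-point link: the neighbor's Router ID
    metric: int


# Hypothetical intended metrics, keyed by (advertising router, neighbor)
POLICY = {("10.10.10.50", "10.10.10.22"): 10}


def check_drift(adv_router: str, seq: int, links: list, policy: dict = POLICY) -> list:
    """Compare the links in one received Router LSA against the intended metrics."""
    alerts = []
    for link in links:
        expected = policy.get((adv_router, link.neighbor))
        if expected is not None and link.metric != expected:
            alerts.append(
                f"Policy drift on {adv_router}->{link.neighbor}: "
                f"metric {link.metric}, expected {expected} (LSA seq 0x{seq:08X})"
            )
    return alerts


for line in check_drift("10.10.10.50", 0x80000098, [RouterLink("10.10.10.22", 100)]):
    print(line)
# Policy drift on 10.10.10.50->10.10.10.22: metric 100, expected 10 (LSA seq 0x80000098)
```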

### Example 4: Intelligent Traffic Steering

**Traditional Approach:**

```
Engineer: "Steer traffic away from congested link"
Tool: SSH → Configure metric → Hope convergence works → Check 30 seconds later
```

**Protocol-Native Agent:**

```
Agent detects high utilization on link X
Agent generates temporary Router LSA with adjusted metric
Agent floods LSA to neighbors
Agent observes SPF recalculation across all peers
Agent validates traffic shifted to alternate path
Agent monitors impact (latency, packet loss)
Agent can automatically revert if problems detected

Time to detection → action → validation: <5 seconds
```

The agent can **participate in traffic engineering** because it’s a peer in the control plane, not an external observer.
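Any metric change the agent floods has to pass one test: peers only install an LSA if it is newer than the copy they already hold. RFC 2328 Section 13.1 defines that ordering. Routers also treat an LSA as owned by its Advertising Router, and a router that receives a newer-looking copy of its own LSA re-originates or flushes it (Section 13.4). In practice, then, the agent can only re-originate its own Router LSA. The sketch below implements the Section 13.1 comparison using the MaxAge and MaxAgeDiff constants from Appendix B. `LSAInstance` and `compare_instances` are illustrative names.

```python
from typing import NamedTuple

MAX_AGE = 3600       # seconds (RFC 2328 Appendix B)
MAX_AGE_DIFF = 900   # seconds (RFC 2328 Appendix B)


class LSAInstance(NamedTuple):
    age: int
    seq: int        # raw 32-bit LS sequence number as read from the header
    checksum: int


def signed32(value: int) -> int:
    """LS sequence numbers are signed: 0x80000001 is the lowest one in use."""
    return value - (1 << 32) if value & 0x80000000 else value


def compare_instances(a: LSAInstance, b: LSAInstance) -> int:
    """1 if `a` is newer, -1 if `b` is newer, 0 if they are the same instance."""
    sa, sb = signed32(a.seq), signed32(b.seq)
    if sa != sb:
        return 1 if sa > sb else -1
    if a.checksum != b.checksum:
        return 1 if a.checksum > b.checksum else -1
    a_max, b_max = a.age == MAX_AGE, b.age == MAX_AGE
    if a_max != b_max:
        return 1 if a_max else -1
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return 1 if a.age < b.age else -1
    return 0


# The agent bumps its sequence number when it re-originates with a new metric
old = LSAInstance(age=120, seq=0x80000042, checksum=0x4BC5)
new = LSAInstance(age=0, seq=0x80000043, checksum=0x1A2B)
print(compare_instances(new, old))  # 1
```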

### Example 5: Multi-Agent Intelligence

**Traditional Approach:**

```
Agent 1: Monitors via SNMP (polls every 60s)
Agent 2: Monitors via Syslog (reactive)
Agent 3: Monitors via NetFlow (sampled)

Correlation: Manual, delayed, incomplete
```

**Protocol-Native Multi-Agent:**

```
Agent A (OSPF peer in DC1): "I see Router X advertising new LSA with link down"
Agent B (OSPF peer in DC2): "Confirmed, I received the same LSA flood"
Agent C (BGP peer): "Seeing BGP route withdrawal from same router"
Agent D (IS-IS peer in transport network): "IS-IS adjacency with that router intact"

Correlation happens automatically because all agents speak native protocols.

Root cause identified in <1 second:
"OSPF-specific failure, not router failure. Likely interface or area config issue."
```

Multiple agents participating in different protocol domains can **correlate events across control planes** with precise timing and full protocol context.

### Example 6: Learning Without Burdening

**Traditional Approach:**

```
Training ML model on network topology:
- SSH to routers every 5 minutes
- Parse outputs (CPU load + time)
- Miss fast-changing events
- Models trained on stale data
```

**Protocol-Native Approach:**

```
Agent receives every LSA update as it happens
Agent maintains complete history of topology changes
Agent records the receive time of every event
Agent never polls; its only load on routers is one more OSPF adjacency

Agent can feed ML models with:
- Sub-second granularity topology changes
- Complete graph structure at every moment
- Minimal operational impact on production

Result: Better models, minimal production impact, real-time learning
```

The agent **learns continuously** without adding meaningful load to production infrastructure, because it's a peer receiving flooded updates, not a client making requests.

---

## The Technical Reality: Full OSPF Implementation in Python

### What We Built

Our agent, lovingly named “Won’t You Be My Neighbor” (after Mr. Rogers and OSPF neighbor relationships), is a complete OSPF implementation written in Python using Scapy for packet manipulation and NetworkX for graph algorithms.

**Core Features:**

- **RFC 2328 compliant state machine**: Transitions through Down → Init → 2-Way → ExStart → Exchange → Loading → FULL

- **Master/Slave negotiation**: Numerical Router ID comparison for Database Description exchange

- **LSA flooding and acknowledgment**: Proper reliable flooding with retransmission timers

- **Link State Database**: Full LSDB with LSA aging, sequence numbers, and MaxAge handling

- **SPF calculation**: Dijkstra's algorithm building a complete topology graph

- **Route injection**: Advertises its own /32 loopback as a stub network link; with a single point-to-point neighbor, the agent never becomes a transit path
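For illustration, here is a compact sketch of the neighbor state machine from RFC 2328 Section 10.3, limited to the events that matter on a point-to-point link. Per the RFC, a neighbor that should become adjacent (always the case on point-to-point links) moves from Init straight to ExStart on 2-WayReceived. 2-Way is the resting state for neighbors that stay non-adjacent. ExchangeDone goes straight to Full when nothing is left to request. `NbrState` and `next_state` are illustrative names, not the project's `ospf/neighbor.py` code.

```python
from enum import IntEnum


class NbrState(IntEnum):
    DOWN = 0
    INIT = 1
    TWO_WAY = 2
    EXSTART = 3
    EXCHANGE = 4
    LOADING = 5
    FULL = 6


def next_state(state: NbrState, event: str, needs_adjacency: bool = True,
               request_list_empty: bool = False) -> NbrState:
    """Subset of the RFC 2328 Section 10.3 neighbor FSM (point-to-point focus)."""
    if event in ("InactivityTimer", "KillNbr", "LLDown"):
        return NbrState.DOWN
    if event == "1-WayReceived" and state >= NbrState.TWO_WAY:
        return NbrState.INIT  # the neighbor no longer lists our Router ID
    if state == NbrState.DOWN and event == "HelloReceived":
        return NbrState.INIT
    if state == NbrState.INIT and event == "2-WayReceived":
        # Neighbors that want an adjacency go straight on to ExStart
        return NbrState.EXSTART if needs_adjacency else NbrState.TWO_WAY
    if state == NbrState.EXSTART and event == "NegotiationDone":
        return NbrState.EXCHANGE
    if state == NbrState.EXCHANGE and event == "ExchangeDone":
        return NbrState.FULL if request_list_empty else NbrState.LOADING
    if state == NbrState.LOADING and event == "LoadingDone":
        return NbrState.FULL
    return state  # events not modelled here leave the state unchanged


state = NbrState.DOWN
for event in ("HelloReceived", "2-WayReceived", "NegotiationDone", "ExchangeDone", "LoadingDone"):
    state = next_state(state, event)
    print(event, "->", state.name)
```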

### The Architecture

```
┌──────────────────────────────────────────────────────────────┐
│ Docker Container                                             │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ OSPF Agent (Python)                                      │ │
│ │  ┌──────────┐   ┌──────────┐   ┌──────────┐              │ │
│ │  │  Hello   │   │   DBD    │   │   LSA    │              │ │
│ │  │ Handler  │   │ Manager  │   │ Flooding │              │ │
│ │  └────┬─────┘   └────┬─────┘   └────┬─────┘              │ │
│ │       │              │              │                    │ │
│ │       └──────────────┼──────────────┘                    │ │
│ │                      │                                   │ │
│ │        ┌─────────────▼─────────────┐                     │ │
│ │        │       State Machine       │                     │ │
│ │        │      (Neighbor FSM)       │                     │ │
│ │        └─────────────┬─────────────┘                     │ │
│ │                      │                                   │ │
│ │        ┌─────────────▼─────────────┐                     │ │
│ │        │    Link State Database    │                     │ │
│ │        │ (Synchronized with peers) │                     │ │
│ │        └─────────────┬─────────────┘                     │ │
│ │                      │                                   │ │
│ │        ┌─────────────▼─────────────┐                     │ │
│ │        │      SPF Calculator       │                     │ │
│ │        │    (NetworkX Dijkstra)    │                     │ │
│ │        └───────────────────────────┘                     │ │
│ └──────────────────────────────────────────────────────────┘ │
│                              │                               │
│                              │ Raw OSPF packets              │
│                              │ (IP protocol 89)              │
│                              │                               │
│ ┌────────────────────────────▼─────────────────────────────┐ │
│ │ eth0: 172.20.0.2/20                                      │ │
│ └────────────────────────────┬─────────────────────────────┘ │
└──────────────────────────────┬───────────────────────────────┘
                               │
                               │ Layer 2 bridge (ospf-net)
                               │
┌──────────────────────────────▼───────────────────────────────┐
│ FRRouting Container                                          │
│ ┌──────────────────────────────────────────────────────────┐ │
│ │ eth0: 172.20.2.10/20                                     │ │
│ └────────────────────────────┬─────────────────────────────┘ │
│                              │                               │
│                       ospfd (FRR 8.4)                        │
│                    Router ID: 10.10.10.10                    │
│                                                              │
│ OSPF Neighbor: 10.255.255.10  State: FULL/-                  │
│ Routes Learned: 10.255.255.10/32 via 172.20.0.2 [110/11]     │
└──────────────────────────────────────────────────────────────┘
```

### The Setup

**Network Topology:**

```
Docker Network: ospf-net (172.20.0.0/20)
├── FRR Router: 172.20.2.10 (Router ID: 10.10.10.10)
└── Python Agent: 172.20.0.2 (Router ID: 10.255.255.10)

OSPF Configuration:
- Area: 0.0.0.0 (Backbone)
- Network Type: Point-to-Point
- Hello Interval: 10 seconds
- Dead Interval: 40 seconds
- Interface MTU: 1500 bytes
```
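The agent and FRR must agree on several of these values before any adjacency can form. RFC 2328 Section 10.5 lists the Hello fields a router compares. The area ID is checked earlier, during packet header processing (Section 8.2), and on point-to-point links the network mask is not compared. Interface MTU is only checked later, in Database Description packets. A minimal sketch, with `HelloParams` and `hello_mismatches` as illustrative names:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HelloParams:
    area_id: str
    network_mask: str
    hello_interval: int   # seconds
    dead_interval: int    # seconds
    e_bit: bool           # Options E-bit: area carries AS-external LSAs


def hello_mismatches(local: HelloParams, received: HelloParams, point_to_point: bool) -> list[str]:
    """Return the names of parameters that would stop an adjacency from forming."""
    problems = []
    if local.area_id != received.area_id:
        problems.append("area ID")
    if not point_to_point and local.network_mask != received.network_mask:
        problems.append("network mask")
    if local.hello_interval != received.hello_interval:
        problems.append("HelloInterval")
    if local.dead_interval != received.dead_interval:
        problems.append("RouterDeadInterval")
    if local.e_bit != received.e_bit:
        problems.append("E-bit")
    return problems


agent = HelloParams("0.0.0.0", "255.255.240.0", 10, 40, True)
frr = replace(agent, dead_interval=30)  # a deliberately mismatched peer
print(hello_mismatches(agent, frr, point_to_point=True))  # ['RouterDeadInterval']
```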

**Adjacency Formation:**

```
14:47:40.123 | Agent sends Hello (neighbors: [])
14:47:40.125 | FRR receives Hello → State: Init
14:47:40.130 | FRR sends Hello (neighbors: [10.255.255.10])
14:47:40.132 | Agent receives Hello → State: Init → 2-Way (bidirectional!)
14:47:40.135 | Agent decides: form adjacency (p2p network)
14:47:40.136 | Agent → State: ExStart
14:47:40.140 | Agent sends DBD (I|M|MS, seq=0x696a4b86)
14:47:40.145 | FRR responds DBD (M, I and MS clear, seq=0x696a4b86) → FRR is slave, agent is master
14:47:40.150 | Both → State: Exchange
14:47:40.155 | Exchange LSA headers via DBD packets
14:47:40.200 | Both → State: Loading (Agent needs FRR's Router LSA)
14:47:40.205 | Agent sends LS Request for FRR's LSA
14:47:40.210 | FRR sends LS Update with full Router LSA
14:47:40.215 | Agent acknowledges with LS Ack
14:47:40.220 | Agent → State: FULL ✓
14:47:40.225 | Agent floods its own Router LSA
14:47:40.230 | FRR acknowledges
14:47:40.235 | FRR → State: FULL ✓
```
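Master/Slave is settled in the first two DBD packets of the timeline. The three flags are bits in the DBD header (RFC 2328 Appendix A.3.3: I = 0x04, M = 0x02, MS = 0x01), and the router with the higher Router ID becomes master. The sketch below decodes the flags and applies the master's acceptance check from Section 10.6. `DDFlags`, `describe` and `master_accepts` are illustrative names.

```python
from enum import IntFlag


class DDFlags(IntFlag):
    MS = 0x01  # Master/Slave: set by the master
    M = 0x02   # More: further DBD packets follow
    I = 0x04   # Init: first packet of the exchange


def describe(flags: int) -> str:
    """Render DBD flags the way the timeline writes them, e.g. 'I|M|MS'."""
    f = DDFlags(flags)
    names = [name for name, bit in (("I", DDFlags.I), ("M", DDFlags.M), ("MS", DDFlags.MS)) if f & bit]
    return "|".join(names) or "none"


def master_accepts(master_seq: int, reply_flags: int, reply_seq: int) -> bool:
    """RFC 2328 10.6, master side (its Router ID is already known to be larger):
    leave ExStart once the slave's DBD has I and MS clear and echoes our DD seq."""
    f = DDFlags(reply_flags)
    return not (f & (DDFlags.I | DDFlags.MS)) and reply_seq == master_seq


print(describe(0x07))                                     # I|M|MS
print(master_accepts(0x696A4B86, DDFlags.M, 0x696A4B86))  # True
```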

### What the Agent Knows

From its LSDB, the agent now has **complete topology awareness**:

```
# Router LSA from 10.10.10.10 (FRR)
LSA(type=Router, id=10.10.10.10, adv=10.10.10.10, seq=0x80000006)
  Links:
    - P2P to 10.255.255.10 via 172.20.2.10 (metric 10)
    - Stub 172.20.0.0/20 (metric 10)

# Router LSA from 10.255.255.10 (Agent itself)
LSA(type=Router, id=10.255.255.10, adv=10.255.255.10, seq=0x80000002)
  Links:
    - P2P to 10.10.10.10 via 172.20.0.2 (metric 10)
    - Stub 10.255.255.10/32 (metric 1)
```

**SPF Calculation Result:**

```
Routing Table for 10.255.255.10
===============================================================
Destination       Cost   Next Hop       Path
---------------------------------------------------------------
10.10.10.10       10     10.10.10.10    [direct]
172.20.0.0/20     20     10.10.10.10    [via 10.10.10.10]
===============================================================
```

**What FRR Learns:**

```
frr# show ip route
O>* 10.255.255.10/32 [110/11] via 172.20.0.2, eth0, weight 1
```
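The [110/11] cost can be reproduced from the two Router LSAs above. SPF first builds the tree of routers over point-to-point links that both ends advertise, then adds stub networks on top (RFC 2328 Section 16.1). FRR's cost of 11 is the 10 on FRR's own link to the agent plus the 1 on the agent's stub. Below is a standard-library sketch, assuming the LSAs have already been decoded into the dictionary shape shown; `ROUTER_LSAS` and `ospf_routes` are illustrative names.

```python
import heapq

# Decoded Router LSAs from the lab (metrics as shown above)
ROUTER_LSAS = {
    "10.10.10.10": {
        "p2p": {"10.255.255.10": 10},    # neighbor Router ID -> metric
        "stubs": {"172.20.0.0/20": 10},  # prefix -> metric
    },
    "10.255.255.10": {
        "p2p": {"10.10.10.10": 10},
        "stubs": {"10.255.255.10/32": 1},
    },
}


def ospf_routes(lsas: dict, root: str) -> dict:
    """Stage 1: SPF tree over two-way p2p links. Stage 2: add stub networks."""
    dist = {root: 0}
    done = set()
    heap = [(0, root)]
    while heap:
        d, rid = heapq.heappop(heap)
        if rid in done:
            continue
        done.add(rid)
        for nbr, metric in lsas[rid]["p2p"].items():
            # Two-way check: the neighbor's own LSA must link back to rid
            if nbr not in lsas or rid not in lsas[nbr]["p2p"]:
                continue
            if d + metric < dist.get(nbr, float("inf")):
                dist[nbr] = d + metric
                heapq.heappush(heap, (dist[nbr], nbr))
    routes = {}
    for rid in done:
        for prefix, metric in lsas[rid]["stubs"].items():
            routes[prefix] = min(routes.get(prefix, float("inf")), dist[rid] + metric)
    return routes


frr_view = ospf_routes(ROUTER_LSAS, "10.10.10.10")
agent_view = ospf_routes(ROUTER_LSAS, "10.255.255.10")
print(frr_view["10.255.255.10/32"])  # 11, the cost in FRR's [110/11]
print(agent_view["172.20.0.0/20"])   # 20, as in the agent's routing table
```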

---

## The Debugging Journey: When Checksums Attack

Building this wasn’t straightforward. The most insidious bug? **The Fletcher-16 checksum.**

OSPF uses the Fletcher checksum (RFC 2328 Section 12.1.7) for LSA integrity. The check bytes come from the algorithm in ISO 8473 Annex C. It picks two bytes (X and Y) such that, once they are written into the checksum field, recalculating the Fletcher sums over the LSA (everything except the LS Age field) gives zero.

Here’s what we kept seeing in FRR’s logs:

```
Link State Update: LSA checksum error 4bc5/4bc5, ID=10.255.255.10
```

Both checksums matched (4bc5/4bc5), yet FRR rejected it! This meant the checksum was **internally consistent but didn’t validate correctly**.

The bug was subtle. Our formula was:

```python
x = ((l - p) * c0 - c1) % 255  # Wrong!
```

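The correct formula needs a `-1`. ISO 8473 Annex C numbers the checksum position from 1, while `p` here is a 0-based byte offset, and the `-1` converts between the two conventions: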

```python
x = ((l - p - 1) * c0 - c1) % 255  # Fixed!
```

That single `-1` is the difference between an LSA that validates and one that doesn’t. Once fixed, FRR immediately accepted our LSAs, installed them in its database, and computed routes.

**The lesson:** When implementing protocols from RFCs, every byte matters. Every formula matters. The devil is in the implementation details.
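For reference, here is a self-contained sketch of the ISO 8473 Annex C calculation applied to an OSPF LSA. It assumes the LSA is available as raw network-order bytes. `fletcher_sums`, `fletcher_check_bytes` and `ospf_lsa_checksum` are illustrative names, not the project's `ospf/packets.py` API. Two details matter here. The LS Age field is excluded, so the checksum field sits at offset 14 of the summed bytes rather than 16. That offset is 0-based, which is exactly where the `-1` comes from.

```python
import socket
import struct


def fletcher_sums(data: bytes) -> tuple[int, int]:
    """Running Fletcher sums C0 and C1, both mod 255."""
    c0 = c1 = 0
    for byte in data:
        c0 = (c0 + byte) % 255
        c1 = (c1 + c0) % 255
    return c0, c1


def fletcher_check_bytes(data: bytes, offset: int) -> bytes:
    """ISO 8473 Annex C check bytes X, Y for a 2-byte checksum field that
    starts at the 0-based index `offset` of `data`."""
    buf = bytearray(data)
    buf[offset:offset + 2] = b"\x00\x00"  # the field is summed as zero
    c0, c1 = fletcher_sums(buf)
    # ISO numbers the position from 1; with a 0-based offset that is (L - P - 1)
    x = ((len(buf) - offset - 1) * c0 - c1) % 255
    if x == 0:
        x = 255  # 0 and 255 are congruent mod 255; emit the non-zero form
    y = 510 - c0 - x  # y = -(c0 + x) mod 255, kept in 1..255
    if y > 255:
        y -= 255
    return bytes([x, y])


def ospf_lsa_checksum(lsa: bytes) -> bytes:
    """Return `lsa` with its LS checksum filled in. The sum skips the 2-byte
    LS Age, so the checksum field (byte 16 of the header) sits at offset 14."""
    return lsa[:16] + fletcher_check_bytes(lsa[2:], 14) + lsa[18:]


# Example: the agent's Router LSA with one stub link (10.255.255.10/32, metric 1)
rid = socket.inet_aton("10.255.255.10")
header = struct.pack("!HBB4s4sIHH", 1, 0x02, 1, rid, rid, 0x80000002, 0, 36)
body = struct.pack("!BBH", 0, 0, 1) + struct.pack(
    "!4s4sBBH", rid, socket.inet_aton("255.255.255.255"), 3, 0, 1
)
lsa = ospf_lsa_checksum(header + body)
print(fletcher_sums(lsa[2:]))  # (0, 0): the receiver's validation passes
```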

---

## Why This Matters: The Philosophical Shift

### 1. **Protocols Are Universal, APIs Are Vendor-Specific**

OSPF is OSPF. Whether you’re running Cisco IOS-XR, Juniper Junos, Nokia SR-OS, Arista EOS, or FRRouting—if it’s RFC 2328 compliant, our agent can peer with it. No vendor-specific API clients. No parsing different CLI outputs. No YANG model variations.

**One agent. Any OSPF speaker.** Because protocols are the **universal language** of networks.

Think about human language: If you speak English, you can have a conversation with anyone else who speaks English, regardless of their nationality, culture, or background. Routing protocols work the same way. OSPF is the shared language that enables routers from different vendors to exchange information and build consensus.

**APIs are dialects. Protocols are languages.**

### 2. **Real-Time vs. Polling: Conversation vs. Interrogation**

Traditional automation polls for state:

```
while True:
    ssh router "show ip ospf neighbor"
    parse output
    sleep 30 seconds
    repeat
```

This isn’t conversation—it’s **interrogation**. “Tell me your state. Now tell me again. And again.”

Our agent participates in **continuous conversation**:

```
Router A: "I've lost connectivity to link 10.5.3.0/24" (LSA flood)
Agent:    "I received your update and updated my topology" (LSA Ack)
Router B: "I also received A's update, recalculating SPF..."
Agent:    "My SPF calculation matches: alternate path via Router C"
```

When a link goes down, the agent knows **instantly** because the conversation is always active. No polling delay. No “check again in 30 seconds.” The network **tells** the agent what changed.

**Polling is asking. Protocol participation is listening.**

### 3. **Bidirectional Intelligence: From Observer to Participant**

Traditional automation **observes** networks:

```
Agent → "What's your state?" → Router
Agent ← "Here's my state" ← Router
(Agent makes decision externally)
```

Our agent **participates** in networks:

```
Agent ↔ "Exchange LSAs" ↔ Router
Agent ↔ "Build shared topology" ↔ Router
Agent ↔ "Converge on consensus" ↔ Router
(Agent and router make decisions together)
```

The agent can:

- **Learn routes** from neighbors (passive intelligence)

- **Inject routes** into the network (active influence)

- **Steer traffic** by adjusting metrics (collaborative optimization)

- **Participate in fast convergence** during failures (distributed resilience)

It’s not commanding the network—it’s **co-creating** the network’s understanding of itself alongside its peers.

**The agent isn’t outside looking in. It’s inside, participating.**

### 4. **No Credentials, No Access Control: Trust Through Protocol**

Traditional automation requires:

- Usernames and passwords

- SSH keys

- API tokens

- Role-based access control

- Audit logging for every command

- Attack surface: Credential theft, privilege escalation, API exploitation

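Our agent? **It just needs to be on the same Layer 2 network.** No credentials. No privileged access to the routers themselves (the agent's own host still needs raw-socket access). It participates in OSPF just like any other router. If needed, the protocol itself provides authentication: OSPFv2 defines simple-password and cryptographic MD5 authentication, and RFC 5709 adds HMAC-SHA.

For read-only network intelligence, this eliminates the credential-based attack surfaces listed above. The security burden moves to Layer 2 access control and OSPF authentication instead, because an unauthenticated speaker on the segment could join the same way. The agent doesn't "log in" to anything; it simply **joins the conversation** that's already happening.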

**Trust isn’t granted through credentials. Trust is established through protocol.**
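The cryptographic option deserves a concrete look, since it is what stops an arbitrary host on the segment from joining the conversation. Below is a sketch of OSPFv2 keyed-MD5 signing as described in RFC 2328 Appendix D. It assumes the caller has already built the packet (24-byte header plus body) and that keys shorter than 16 bytes are null-padded; `md5_sign` is an illustrative name. The cryptographic sequence number must not decrease from one packet to the next, and the appended digest is not counted in the OSPF length field.

```python
import hashlib
import struct

AUTYPE_CRYPTO = 2  # OSPFv2 AuType 2 = cryptographic authentication


def md5_sign(ospf_packet: bytes, key: bytes, key_id: int, crypto_seq: int) -> bytes:
    """Return the packet with keyed-MD5 authentication applied and the digest appended."""
    if len(key) > 16:
        raise ValueError("OSPF MD5 keys are at most 16 bytes")
    header = bytearray(ospf_packet[:24])
    header[12:14] = b"\x00\x00"                       # packet checksum is not computed
    header[14:16] = struct.pack("!H", AUTYPE_CRYPTO)  # AuType
    # Auth field: 0x0000, Key ID, Auth Data Len (16 for MD5), crypto sequence number
    header[16:24] = struct.pack("!HBBI", 0, key_id, 16, crypto_seq)
    packet = bytes(header) + ospf_packet[24:]
    digest = hashlib.md5(packet + key.ljust(16, b"\x00")).digest()
    return packet + digest


# Example: sign a (dummy) 44-byte Hello from Router ID 10.255.255.10 in area 0
hello = struct.pack(
    "!BBH4s4sHH8s", 2, 1, 44, bytes([10, 255, 255, 10]), bytes(4), 0, 0, bytes(8)
) + bytes(20)
signed = md5_sign(hello, b"secret", key_id=1, crypto_seq=42)
print(len(hello), len(signed))  # 44 60: the digest is not counted in the header length
```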

### 5. **From Configuration to Conversation**

This is the fundamental paradigm shift:

**Configuration mindset:**

- Routers are passive devices

- Automation pushes commands

- State is pulled via queries

- Changes are imposed externally

- Routers don't "know" about each other

**Conversation mindset:**

- Routers are active participants

- Automation exchanges information

- State is shared continuously

- Changes emerge from consensus

- Routers collaborate to build shared truth

OSPF already treats networks as conversations—routers exchanging beliefs, debating metrics, and converging on a shared understanding of topology. **We just let the AI agent join the conversation.**

When you stop treating the network as something to configure and start treating it as something to converse with, everything changes:

- You stop imposing state and start **observing how state emerges**

- You stop debugging by interrogation and start **debugging by listening**

- You stop controlling the network and start **collaborating with it**

**The network has always been a conversation. We just gave AI a seat at the table.**

---

## Implications for Network Operations

### SOC/NOC Use Cases

**Topology Monitoring:**

```python
# Real-time topology change detection.
# `topology`, `parse_lsa_links`, `spf`, `alert` and `notify_impact_analysis`
# stand for the agent's own components and are not defined here.
ROUTER_LSA = 1  # LS type 1 = Router-LSA (RFC 2328)

def on_lsa_update(lsa):
    if lsa.type != ROUTER_LSA:
        return
    old_links = set(topology.get_links(lsa.advertising_router))
    new_links = set(parse_lsa_links(lsa))
    if old_links != new_links:
        alert(f"Topology change: {lsa.advertising_router}")
        alert(f"  Removed: {old_links - new_links}")
        alert(f"  Added: {new_links - old_links}")
        # Instant SPF recalculation
        spf.calculate()
        notify_impact_analysis()
```

**Intelligent Alerting:**

```python
# Instead of "OSPF neighbor down", provide:
# "Router A lost connectivity to Router B, affecting paths to subnets X, Y, Z"
# `spf` and `calculate_convergence_time` are hypothetical agent-side helpers.
def analyze_failure(neighbor_id):
    affected_routes = spf.get_routes_via(neighbor_id)
    alternate_paths = spf.find_alternate_paths(affected_routes)
    return {
        "failed_neighbor": neighbor_id,
        "affected_routes": affected_routes,
        "alternate_paths": alternate_paths,
        "expected_convergence": calculate_convergence_time(),
    }
```

**Conversational Troubleshooting:**

```
Engineer: "Why isn't traffic taking the direct path to 10.5.3.0/24?"

Agent: "Let me check my LSDB...
        I see the direct path via 10.10.10.50 has metric 20,
        but there's an alternate path via 10.10.10.45 with metric 15.
        SPF chose the lower-metric path.
        The direct link was set to metric 20 by LSA from router 10.10.10.50
        at 14:32:15 UTC (sequence 0x80000042).
        Would you like to see the full SPF tree?"
```

### AIOps Integration

Imagine feeding OSPF topology data directly into AI models:

```python
# Real-time topology as graph embeddings.
# `agent`, `graph_neural_network`, `model` and `alert` are hypothetical
# components; this illustrates the idea rather than a working pipeline.
graph = agent.lsdb.to_networkx()
embeddings = graph_neural_network.encode(graph)

# Predict failures before they happen
prediction = model.predict_failure(embeddings)
if prediction.confidence > 0.85:
    alert(f"Predicted link failure: {prediction.link}")
    alert(f"Impact: {prediction.affected_flows}")
    alert(f"Suggested mitigation: {prediction.mitigation}")
```

The agent doesn’t need to SSH anywhere or parse anything. **It already has the complete topology in memory.**

---

## Beyond OSPF: The Future Vision

This approach isn’t limited to OSPF. The same principle applies to any routing protocol:

### BGP: The Ultimate Application

Imagine an AI agent as a BGP peer:

```
Agent as BGP Route Reflector Client:
- Receives full Internet routing table (900k+ routes)
- Maintains RIB and FIB in memory
- Can answer "what AS path to 8.8.8.0/24?" instantly
- Detects BGP hijacks in real-time (unexpected AS path changes)
- Participates in traffic engineering via community manipulation
```

**The killer app:** An AI that understands global Internet routing, can detect anomalies, and participates in policy enforcement—all by being a native BGP speaker.

### IS-IS: Multi-Level Topology

```
Agent in IS-IS Network:
- Participates in Level-1 and Level-2 flooding
- Understands area boundaries
- Can reason about optimal inter-area paths
- Detects suboptimal area designs
```

### EVPN: Overlay Intelligence

```
Agent as EVPN Peer:
- Maintains MAC/IP route table
- Understands VXLAN tunnel endpoints
- Can trace end-to-end overlay paths
- Detects MAC mobility storms
- Participates in anycast gateway scenarios
```

### Segment Routing: Path Engineering

```
Agent with SR-MPLS:
- Understands SID allocations
- Can calculate explicit paths with segment lists
- Participates in traffic steering
- Validates TE policies in real-time
```

---

## Distributed Intelligence: The Network Thinks Together

Here’s where it gets really interesting: **Routers are already doing distributed intelligence.**

Think about what happens when a link fails:

1. Router A detects failure locally

2. Router A floods LSA to all neighbors

3. Each neighbor recalculates SPF independently

4. All routers converge on the **same topology view**

5. Traffic reroutes without central coordination

**This is distributed consensus without a central authority.** No controller. No orchestrator. Just peers exchanging information and independently arriving at the same conclusion.

Now imagine AI agents participating in this process:

```
              ┌─────────────┐
              │  Router A   │
              │ (Hardware)  │
              └──────┬──────┘
                     │
      ┌──────────────┼──────────────┐
      │              │              │
   OSPF LSA       OSPF LSA       OSPF LSA
      │              │              │
      ▼              ▼              ▼
 ┌──────────┐   ┌──────────┐   ┌──────────┐
 │ Router B │   │ AI Agent │   │ Router C │
 │(Hardware)│   │ (Python) │   │(Hardware)│
 └────┬─────┘   └────┬─────┘   └────┬─────┘
      │              │              │
      └──────────────┴──────────────┘
             All have same LSDB
         All run same SPF algorithm
          All reach same conclusion
```

**The AI agent isn’t centralized intelligence—it’s distributed intelligence that happens to be implemented in Python instead of hardware.**

### What This Enables

**1. Heterogeneous Intelligence**

Traditional networks: All nodes are routers (similar capabilities)

Protocol-native networks: Mix routers (fast forwarding) + AI agents (deep analysis)

The routers do what they do best: fast packet forwarding

The agents do what they do best: pattern recognition, prediction, optimization

Both participate in the same control plane.

**2. Specialized Agents**

Because agents speak native protocols, you can deploy **specialized AI peers**:

```
Agent A: Anomaly detection specialist
  - Monitors LSA update patterns
  - Detects unusual flapping behavior
  - Identifies potential hardware failures before they cascade

Agent B: Traffic engineering specialist
  - Analyzes flow data + topology
  - Calculates optimal metric adjustments
  - Participates in proactive load balancing

Agent C: Security specialist
  - Monitors for unauthorized routers
  - Detects topology poisoning attempts
  - Validates LSA authenticity patterns

Agent D: Capacity planning specialist
  - Logs historical topology changes
  - Predicts growth patterns
  - Recommends infrastructure additions
```

All agents participate in the **same OSPF domain**, receiving the **same LSAs**, maintaining the **same topology view**—but each applies different AI models to the data.

**3. Emergent Behavior**

When multiple intelligent agents participate in the same protocol:

```
Router X: "Link down to Y" (LSA flood)
Agent A:  "Detecting pattern: X-Y link flaps every 2 hours" (anomaly)
Agent B:  "Analyzing: Temperature correlation with flap timing" (diagnosis)
Agent C:  "Recommending: Check X's interface for thermal issues" (action)
Router X: "Maintenance window scheduled" (human notified)
Agent D:  "Adjusting metrics preemptively to shift traffic" (mitigation)
Network:  "Converges to new stable state without X-Y link" (resilience)
```

**No central orchestrator.** Just distributed intelligence emerging from protocol participation.

### The Philosophy: Networks as Societies

If networks are conversations, then networks with AI agents are **societies**—collections of diverse participants (routers and agents) exchanging information, building consensus, and making collective decisions.

In a society:

- Some members provide infrastructure (routers)

- Some members provide intelligence (agents)

- All members communicate in a shared language (protocols)

- Decisions emerge from consensus, not central authority

- The whole is greater than the sum of its parts

**This is the future: Not networks with AI controllers. Networks with AI citizens.**

---

## The Bigger Picture: AI as Infrastructure

This isn’t just about network automation. It’s about a fundamental shift in how we think about AI and infrastructure.

**Traditional Model:**

```
AI → API → Infrastructure
      ↑
  Abstraction Layer
  (Loses information)
```

**New Model:**

```
AI = Infrastructure
(No abstraction, full information)
```

When AI speaks the native protocol language:

- **No information loss** through abstraction layers

- **Real-time intelligence** through protocol messages

- **Bidirectional influence** as a peer participant

- **Universal compatibility** through RFC standards

This is “vibe coding” meets network engineering. The agent's OSPF stack was built from RFC 2328, not from memorized Cisco IOS commands. It's **protocol-native AI.**

---

## Why Protocol Participation > APIs

Let’s compare approaches:

| Aspect | Traditional APIs | Protocol Participation |
|--------|------------------|------------------------|
| **Access Method** | SSH/NETCONF/REST | Native protocol (OSPF/BGP/IS-IS) |
| **State Sync** | Polling (seconds/minutes) | Event-driven (milliseconds) |
| **Information** | Filtered through CLI/API | Raw protocol data |
| **Vendor Support** | Varies by platform | RFC-compliant = universal |
| **Credentials** | Required | None (protocol auth only) |
| **Bidirectional** | Commands only | Full peer participation |
| **Real-time** | Limited (gNMI streaming telemetry aside) | Yes |
| **Topology Awareness** | Inferred from outputs | Native LSDB/RIB |

**The paradigm:**

- APIs let you **control** the network

- Protocol participation lets you **be** the network

---

## Getting Started: The Code

The full implementation is available at [GitHub link]. Key components:

**Core Files:**

- `ospf/packets.py`: Scapy packet definitions, Fletcher checksum

- `ospf/neighbor.py`: Neighbor state machine (Down→Init→2Way→ExStart→Exchange→Loading→Full)

- `ospf/hello.py`: Hello protocol handler

- `ospf/adjacency.py`: Database Description exchange

- `ospf/flooding.py`: LSA flooding and acknowledgment

- `ospf/lsdb.py`: Link State Database

- `ospf/spf.py`: SPF calculation (NetworkX)

- `wontyoubemyneighbor.py`: Main agent orchestration

**Dependencies:**

```text
scapy>=2.5.0     # Packet manipulation
networkx>=3.0    # Graph algorithms for SPF
```

**Running the Agent:**

```bash
# Build container
docker build -t ospf-agent .

# Run with FRR peer
docker run --rm -it --privileged --network ospf-net \
  -v $(pwd):/app ospf-agent:latest \
  python3 wontyoubemyneighbor.py \
    --router-id 10.255.255.10 \
    --area 0.0.0.0 \
    --interface eth0 \
    --source-ip 172.20.0.2 \
    --unicast-peer 172.20.2.10 \
    --network-type point-to-point
```

**Verification:**

```bash
# On the FRR router (vtysh)
show ip ospf neighbor
# Should show: 10.255.255.10 State: Full/-

show ip ospf database router 10.255.255.10
# Should show: Router LSA with 2 links

show ip route
# Should show: O>* 10.255.255.10/32 via 172.20.0.2
```

—

## Technical Deep Dive: Key Challenges Solved

### 1. Container Networking for Raw Protocols

OSPF uses IP protocol 89, not TCP/UDP. Getting raw socket access from containers required:

– `--privileged` mode for CAP_NET_RAW

– Custom packet socket handling to strip IP headers

– Point-to-point network type to avoid multicast complexity

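– Manual interface MTU configuration. OSPF carries the interface MTU in Database Description packets, and a mismatch can stall the adjacency before it reaches Full.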
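Here is one way to get OSPF packets off the wire. It assumes a Linux host and a plain `AF_INET` raw socket, which may differ from the packet-socket code the project actually uses. On Linux, reads from an IPv4 raw socket always include the IP header. The sketch strips that header using its IHL field, then decodes the fixed fields of the 24-byte OSPFv2 common header from RFC 2328 Appendix A.3.1. All function names are illustrative.

```python
import socket
import struct

OSPF_PROTO = 89  # IP protocol number assigned to OSPF


def open_ospf_socket() -> socket.socket:
    """Raw IPv4 socket for protocol 89 (needs CAP_NET_RAW, e.g. --privileged)."""
    return socket.socket(socket.AF_INET, socket.SOCK_RAW, OSPF_PROTO)


def strip_ipv4_header(datagram: bytes) -> bytes:
    """Linux includes the IPv4 header on raw-socket reads; drop it using IHL."""
    ihl = (datagram[0] & 0x0F) * 4  # IHL counts 32-bit words; convert to bytes
    return datagram[ihl:]


def parse_ospf_header(pkt: bytes) -> dict:
    """Decode the fixed fields of the 24-byte OSPFv2 header (RFC 2328 A.3.1)."""
    version, ptype, length, rid, area, checksum, autype = struct.unpack(
        "!BBH4s4sHH", pkt[:16])  # the 8-byte authentication field follows
    return {
        "version": version,
        "type": ptype,  # 1 Hello, 2 DB Description, 3 LS Request, 4 LS Update, 5 LS Ack
        "length": length,
        "router_id": socket.inet_ntoa(rid),
        "area_id": socket.inet_ntoa(area),
        "checksum": checksum,
        "autype": autype,
    }


# Usage (inside the privileged container):
#   sock = open_ospf_socket()
#   datagram, _ = sock.recvfrom(65535)
#   print(parse_ospf_header(strip_ipv4_header(datagram)))
```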

### 2. Master/Slave Negotiation

Router IDs must be compared **numerically**, not lexicographically:

```python
import socket
import struct

# Wrong: string comparison is lexicographic, character by character.
# "10.9.0.1" > "10.10.0.1" is True ('9' > '1'), yet 10.9.0.1 is the smaller ID.

# Right: convert each dotted-quad Router ID to an unsigned 32-bit integer
our_id_int = struct.unpack("!I", socket.inet_aton("10.255.255.10"))[0]
neighbor_id_int = struct.unpack("!I", socket.inet_aton("10.10.10.10"))[0]

we_are_master = our_id_int > neighbor_id_int  # True: 10.255.255.10 > 10.10.10.10
```
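Comparing Router IDs is only part of the ExStart negotiation. The sketch below applies the two decision rules from RFC 2328 Section 10.6 to a received Database Description packet. It assumes the caller has already parsed three things: the DD flags byte (I = 0x04, M = 0x02, MS = 0x01), the DD sequence number, and whether the packet carried any LSA headers. `exstart_negotiate` and `rid_to_int` are hypothetical helpers, not the project's API.

```python
import socket
import struct

# Flag bits in the Database Description packet (RFC 2328 Appendix A.3.3)
DD_I, DD_M, DD_MS = 0x04, 0x02, 0x01


def rid_to_int(router_id: str) -> int:
    """Dotted-quad Router ID -> unsigned 32-bit integer."""
    return struct.unpack("!I", socket.inet_aton(router_id))[0]


def exstart_negotiate(our_rid: str, nbr_rid: str, dd_flags: int, dd_seq: int,
                      our_seq: int, has_lsa_headers: bool):
    """Return ("slave", seq), ("master", seq) or (None, our_seq) if undecided."""
    ours, theirs = rid_to_int(our_rid), rid_to_int(nbr_rid)
    all_bits = DD_I | DD_M | DD_MS
    # Rule 1: an empty packet with I, M and MS all set from a higher Router ID
    # means we are slave and adopt the neighbor's DD sequence number.
    if (dd_flags & all_bits) == all_bits and not has_lsa_headers and theirs > ours:
        return "slave", dd_seq
    # Rule 2: I and MS clear, our own sequence number echoed back, and a lower
    # neighbor Router ID means the neighbor has accepted us as master.
    if not dd_flags & (DD_I | DD_MS) and dd_seq == our_seq and theirs < ours:
        return "master", our_seq
    return None, our_seq
```

Rule 1 is where the numeric comparison matters most. If the IDs are compared as strings, both routers can conclude they are master, and the adjacency never leaves ExStart.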

### 3. LSA Checksum Validation

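The LSA checksum (the Fletcher checksum used for ISO connectionless datagrams, as specified in RFC 2328 Section 12.1.7) must satisfy:

```
Recompute C0 and C1 over every byte the checksum covers (the whole LSA
except the 2-byte LS age field, checksum field included): both sums
must come out to 0 (mod 255).
```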

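Critical formula:

```python
x = ((L - P - 1) * c0 - c1) % 255  # The -1 is essential!
```

Where:

– L = number of bytes checksummed (the LSA length minus the 2-byte LS age field)

– P = zero-based position of the first checksum byte within those bytes (14 for an OSPF LSA); the -1 converts it to the 1-based position the ISO formula expects

– c0, c1 = Fletcher sums computed with the checksum field set to zero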
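Here is a pure-Python sketch that computes and verifies the LS checksum. It assumes the LSA arrives as raw bytes with the standard 20-byte header, where LS age occupies bytes 0-1 and the checksum bytes 16-17. The helper names are illustrative. The second check byte is Y = -(C0 + X) mod 255, which is what drives the verification sums to zero. The sketch stores a zero result as 255 (the two are equal mod 255), as widely used implementations do.

```python
import struct

CKSUM_POS = 14  # checksum offset once the 2-byte LS age is skipped (16 - 2)


def fletcher_sums(data: bytes) -> tuple[int, int]:
    """Running ISO 8473 Fletcher sums, both taken modulo 255."""
    c0 = c1 = 0
    for byte in data:
        c0 = (c0 + byte) % 255
        c1 = (c1 + c0) % 255
    return c0, c1


def lsa_checksum(lsa: bytes) -> int:
    """Compute the 16-bit LS checksum of a complete LSA (header + body)."""
    data = bytearray(lsa[2:])                    # LS age is not checksummed
    data[CKSUM_POS:CKSUM_POS + 2] = b"\x00\x00"  # checksum field counts as zero
    c0, c1 = fletcher_sums(data)
    length, pos = len(data), CKSUM_POS
    x = ((length - pos - 1) * c0 - c1) % 255     # -1: zero-based pos -> 1-based
    y = (-c0 - x) % 255                          # makes the final C0 sum zero
    x = x or 255                                 # store 0 as 255 (equal mod 255)
    y = y or 255
    return (x << 8) | y


def with_checksum(lsa: bytes) -> bytes:
    """Return a copy of the LSA with bytes 16-17 set to its checksum."""
    return lsa[:16] + struct.pack("!H", lsa_checksum(lsa)) + lsa[18:]


def lsa_checksum_ok(lsa: bytes) -> bool:
    """Re-sum everything after LS age; a valid LSA yields C0 = C1 = 0."""
    return fletcher_sums(lsa[2:]) == (0, 0)
```

LS age lies outside the checksummed range, so re-aging an LSA (or flooding it with a higher age) never invalidates its checksum.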

### 4. LSA Parsing with Scapy

Scapy’s RouterLSA parser had bugs parsing multiple links. We implemented a manual parser:

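```python
import socket
import struct
from dataclasses import dataclass, field


# Minimal stand-ins for the project's own RouterLink/RouterLSA classes (assumed shapes)
@dataclass
class RouterLink:
    link_id: str
    link_data: str
    link_type: int
    metric: int


@dataclass
class RouterLSA:
    links: list[RouterLink] = field(default_factory=list)


def parse_router_lsa_body(body_bytes: bytes) -> RouterLSA:
    """Parse a Router-LSA body, i.e. the bytes after the 20-byte LSA header."""
    flags_byte = body_bytes[0]  # V/E/B bits (unused here); byte 1 is reserved
    num_links = struct.unpack("!H", body_bytes[2:4])[0]
    offset = 4
    links = []
    for _ in range(num_links):
        link_id = socket.inet_ntoa(body_bytes[offset:offset + 4])
        link_data = socket.inet_ntoa(body_bytes[offset + 4:offset + 8])
        link_type = body_bytes[offset + 8]
        num_tos = body_bytes[offset + 9]
        metric = struct.unpack("!H", body_bytes[offset + 10:offset + 12])[0]
        links.append(RouterLink(link_id=link_id, link_data=link_data,
                                link_type=link_type, metric=metric))
        # Each link is 12 bytes, plus 4 bytes per (legacy) TOS entry
        offset += 12 + 4 * num_tos
    return RouterLSA(links=links)
```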
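The agent also needs the reverse direction when it originates its own Router-LSA. A minimal encoder for the same body layout might look like this. The layout is a 4-byte flags/link-count prefix, then 12 bytes per link with the TOS count set to zero. `build_router_lsa_body` and the dict-based link format are assumptions for illustration, not the project's code.

```python
import socket
import struct

# Router-LSA link types (RFC 2328 Appendix A.4.2)
LINK_TYPE_PTP, LINK_TYPE_TRANSIT, LINK_TYPE_STUB, LINK_TYPE_VIRTUAL = 1, 2, 3, 4


def build_router_lsa_body(links: list[dict], flags: int = 0) -> bytes:
    """Pack links (dicts with link_id/link_data/link_type/metric) into a body."""
    body = struct.pack("!BBH", flags, 0, len(links))  # flags, reserved, #links
    for link in links:
        body += struct.pack(
            "!4s4sBBH",
            socket.inet_aton(link["link_id"]),
            socket.inet_aton(link["link_data"]),
            link["link_type"],
            0,  # number of TOS entries: none
            link["metric"],
        )
    return body
```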

### 5. Preventing Transit Traffic

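The agent advertises its /32 as a **stub network** (link type 3) rather than as a link to another router. Its only router-to-router link is the single point-to-point link back to its neighbor, so in every router's SPF tree the agent is a leaf. Other routers can reach its /32, but no shortest path between two other routers can pass through it. The Router-LSA links look like this: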

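```python
# Link type codes from RFC 2328 Appendix A.4.2
LINK_TYPE_PTP = 1   # point-to-point connection to another router
LINK_TYPE_STUB = 3  # connection to a stub network

# Example values: Router ID and interface IP from the run command above;
# the neighbor's Router ID is a hypothetical placeholder.
our_router_id = "10.255.255.10"
our_interface_ip = "172.20.0.2"
neighbor_id = "10.10.10.10"

links = [
    # P2P link to neighbor (the only router-to-router edge SPF sees)
    {
        'link_id': neighbor_id,         # Neighbor's Router ID
        'link_data': our_interface_ip,  # Our interface IP on this link
        'link_type': LINK_TYPE_PTP,     # Point-to-point
        'metric': 10,
    },
    # Stub link for our /32 (a leaf prefix, no transit)
    {
        'link_id': our_router_id,
        'link_data': '255.255.255.255',  # /32 mask
        'link_type': LINK_TYPE_STUB,     # Stub = not transit
        'metric': 1,
    },
]
```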

With this layout, the agent holds the full topology in its LSDB without becoming a transit hop for traffic between other routers.
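The following Dijkstra sketch over Router-LSA links shows why the single point-to-point link makes the agent a leaf. It is a simplified form of RFC 2328 Section 16.1, written with `heapq` instead of NetworkX so it has no dependencies. It applies the two-way check: a point-to-point link counts only if the far end advertises a link back. It also treats stub networks as leaves that are never expanded. The LSDB layout, the second router `10.20.20.20`, and the `10.0.12.x` addresses are hypothetical.

```python
import heapq

LINK_PTP, LINK_STUB = 1, 3  # Router-LSA link types (RFC 2328 Appendix A.4.2)


def spf(lsdb: dict, root: str) -> tuple[dict, dict]:
    """Simplified SPF over Router-LSA links.

    lsdb maps router ID -> list of (link_type, link_id, link_data, metric).
    Returns ({router: (cost, first_hop)}, {"prefix/mask": (cost, first_hop)}).
    """
    routers = {root: (0, None)}
    prefixes = {}
    heap = [(0, root, None)]
    done = set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for link_type, link_id, link_data, metric in lsdb.get(node, []):
            new_cost = cost + metric
            if link_type == LINK_PTP:
                # Two-way check: the far end must list a P2P link back to this node
                links_back = any(t == LINK_PTP and lid == node
                                 for t, lid, _, _ in lsdb.get(link_id, []))
                if links_back and (link_id not in routers or new_cost < routers[link_id][0]):
                    next_hop = hop or link_id
                    routers[link_id] = (new_cost, next_hop)
                    heapq.heappush(heap, (new_cost, link_id, next_hop))
            elif link_type == LINK_STUB:
                # Stub networks are leaves: recorded, never expanded
                prefix = f"{link_id}/{link_data}"
                if prefix not in prefixes or new_cost < prefixes[prefix][0]:
                    prefixes[prefix] = (new_cost, hop)
    return routers, prefixes


# Hypothetical three-node LSDB: agent -- R1 (FRR) -- R2
AGENT, R1, R2 = "10.255.255.10", "10.10.10.10", "10.20.20.20"
lsdb = {
    AGENT: [(LINK_PTP, R1, "172.20.0.2", 10), (LINK_STUB, AGENT, "255.255.255.255", 1)],
    R1: [(LINK_PTP, AGENT, "172.20.2.10", 10), (LINK_PTP, R2, "10.0.12.1", 10)],
    R2: [(LINK_PTP, R1, "10.0.12.2", 10)],
}
routers, prefixes = spf(lsdb, R2)
print(prefixes)  # the agent's /32 is reached with R1 as first hop
```

From R2, the agent's /32 costs 21 (10 + 10 + 1) with R1 as first hop. The agent never appears as an intermediate hop because it has no second router link to continue through. If R1 stops listing its link back to the agent, the two-way check drops the agent from the tree entirely.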

—

## Lessons Learned

### 1. **RFCs Are Specifications, Not Suggestions**

Every detail in RFC 2328 matters. From the Fletcher checksum formula to the exact sequence of state transitions, shortcuts break interoperability.

### 2. **Protocols Are More Universal Than APIs**

Any RFC-compliant OSPF speaker should be able to peer with our agent, whether it runs Cisco, Juniper, Nokia, or FRR. Our interoperability testing used FRR. Protocols are the ultimate abstraction layer.

### 3. **Real-Time Protocol Participation > Polling**

LSA updates arrive in milliseconds. Convergence happens in seconds. Polling-based automation always lags by up to a full polling interval.

### 4. **Container Networking Enables Protocol Innovation**

Docker networks with direct Layer 2/3 access let us experiment with protocol-native agents without physical infrastructure.

### 5. **Intelligence Belongs in the Control Plane**

Observability tools sit above the network. Our agent sits **in** the network. The difference is profound.

—

## What’s Next?

This OSPF agent is just the beginning. The roadmap:

**Phase 2: BGP Agent**

– Establish BGP peering with route reflectors

– Maintain full RIB in memory

– Detect BGP hijacks via AS path anomalies

– Participate in traffic engineering

**Phase 3: Multi-Protocol Intelligence**

– Single agent speaking OSPF, BGP, and ISIS

– Cross-protocol correlation (IGP topology + BGP paths)

– Detect inconsistencies between protocols

– Unified network graph

**Phase 4: Autonomous Operations**

– Self-healing networks via route injection

– Predictive failure mitigation

– Intent-based traffic steering

– Zero-touch troubleshooting

**Phase 5: LLM Integration**

– Natural language queries against live LSDB

– Conversational network exploration

– Automated root cause analysis

– AI-generated configuration recommendations

—

## Join the Revolution

We’re building a community around protocol-native AI agents. If you believe that intelligence should **participate** in infrastructure, not just **observe** it, join us:

– **GitHub:** [Repository link]

– **Discord:** [Community link]

– **Blog:** [Technical deep-dives]

– **RFC Study Group:** Learning protocols for AI integration

**Contributing:**

– Implement BGP support

– Add ISIS/EVPN protocols

– Build observability dashboards

– Create AI/ML models for topology analysis

– Write protocol parsers for other routing protocols

—

## Conclusion: The Network That Thinks—Together

We started with a simple question: What if an AI agent could speak OSPF?

The answer revealed something bigger: **What if networks aren’t meant to be configured—they’re meant to be conversed with?**

For decades, we’ve built automation that treats routers as black boxes with APIs. We push commands, pull outputs, and hope our parsers survive the next firmware update. We’ve created elaborate abstraction layers, each one losing a little more context, a little more timing, a little more truth.

**We were speaking to routers in the wrong language.**

Routers don’t want to be configured—they want to **converse**. They already do it with each other, every day, through routing protocols. They exchange beliefs about topology. They debate the best paths. They converge on shared truth. They form **distributed consensus** without any central authority.

**We just taught AI to join that conversation.**

### What Changes

When AI agents gain control-plane literacy and participate as peers:

**Configuration becomes conversation:**

– Instead of: “Tell me your OSPF neighbors” (command)

– We have: “Here’s an LSA update about my connectivity” (information exchange)

**Polling becomes participation:**

– Instead of: Check every 30 seconds for changes (latency)

– We have: Receive updates in milliseconds as they happen (real-time)

**Control becomes collaboration:**

– Instead of: External system pushes changes to routers (top-down)

– We have: Peers exchange information and converge on consensus (distributed)

**Observation becomes presence:**

– Instead of: Agent queries routers about their state (external)

– We have: Agent experiences network events as they happen (internal)

### What We Built

Traditional automation treats infrastructure as something to control from the outside. We’re building something different: **intelligence that participates from the inside**, speaking the same language, seeing the same topology, and operating as an equal peer.

This is the future of network operations:

– No more abstraction layers losing information

– No more polling delays missing fast events

– No more vendor-specific APIs fracturing ecosystems

– No more SSH parsing breaking on updates

– No more credential management attack surfaces

Just **pure protocol intelligence**, participating in the control plane, with complete topology awareness and real-time state synchronization.

### What This Means

The router isn’t a black box anymore. **It’s a neighbor.**

The network isn’t a thing to configure anymore. **It’s a conversation to join.**

The AI isn’t controlling the network anymore. **It’s participating in the network.**

And when AI participates as a peer—listening, learning, and thoughtfully responding—it gains something that external automation can never have:

**The network’s perspective.**

Not filtered through CLIs. Not delayed by polling. Not abstracted through APIs. Just the raw, real-time conversation that routers have been having all along.

**We gave AI a seat at the table.**

And now, the network thinks together—routers and agents, hardware and software, protocol peers collaborating to build a shared understanding of the world.

This isn’t automation 2.0.

**This is distributed intelligence 1.0.**

—

## The Call to Action

Networks are already conversations. Routing protocols are already distributed intelligence. The infrastructure is already collaborative.

**We just haven’t been listening.**

What if we stopped trying to control networks from the outside and started participating in them from the inside?

What if our AI agents could:

– Speak BGP and understand global Internet routing?

– Participate in EVPN and trace overlay paths?

– Run ISIS and detect suboptimal area designs?

– Speak PCEP and calculate optimal TE paths?

What if instead of building another abstraction layer, we taught AI to speak the protocols that routers already use to talk to each other?

**The conversation is already happening. It’s time for AI to join.**

—

## Acknowledgments

Built with:

– Python 3.11

– Scapy (packet manipulation)

– NetworkX (graph algorithms)

– FRRouting (interoperability testing)

– Docker (container networking)

– RFC 2328 (OSPF specification)

– Countless hours debugging Fletcher checksums

Special thanks to Mr. Rogers for the inspiration: “Won’t you be my neighbor?” 🏡

—

**Date:** January 16, 2026

—

*“It’s a beautiful day in the neighborhood, a beautiful day for a neighbor. Would you be mine? Could you be mine? Won’t you be my neighbor?”*

— Fred Rogers (and now, OSPF routers everywhere)
