
AI Agent Masterclass • Session 4

Behind the AI Agent

Prompts, Data & Control

← Back to Masterclass Overview

Date: 15 January 2026

Duration: 60 minutes

Speakers: Timo Weber, Thomas Geering, Javier Lombana Dominguez

⏳ Recording Coming Soon

🎯 Key Takeaways

  • Context Engineering: Providing the right information at the right time – too much context = noise, too little = hallucinations
  • Memory Types: Short-term (conversation), Long-term (preferences), and Episodic (specific events) memory
  • Golden Datasets: Systematic evaluation with representative test cases and expected outcomes
  • Metrics First: Define KPIs BEFORE building, not after launch – both business and quality metrics

From "Prompt Guessing" to "Agent Engineering"

Many teams treat AI agent development like guesswork – trying random prompts, hoping for good results, with no systematic way to measure or improve. This session changes that.

❌ The Problem

  • Trial-and-error prompting
  • No defined metrics
  • Can't prove value
  • Don't know when to stop

✅ The Solution

  • Systematic engineering
  • Defined metrics upfront
  • Structured testing
  • Continuous optimization

Key Insight: "We need to move away from guessing at prompts to systematic engineering of AI agents. Without metrics, you can't prove value or know when to stop."

The AI Agent Factory Framework

Think of agent development as a factory with distinct stations. Each station has specific inputs, processes, and outputs:

  • 🎯 Design – Define purpose & scope
  • 🔧 Build – Implement & configure
  • 🧪 Test – Golden datasets
  • Optimize – Improve prompts
  • 📊 Monitor – Continuous observation
  • 🚀 Scale – Roll out & expand

Context Engineering

Context Engineering is the art of providing the AI agent with exactly the right information at the right time to make informed decisions.

The Challenge: LLMs have limited context windows. Too much context creates noise and slower responses. Too little context leads to poor decisions and hallucinations. The solution? Dynamically assemble only relevant context.

Context Engineering Best Practices

  • Filter aggressively: Only include information relevant to the current task
  • Prioritize by relevance: Most important context first, within token limits
  • Use structured formats: JSON, tables, or clear sections help the LLM parse
  • Test context combinations: Different contexts produce different results
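As a concrete illustration of dynamic context assembly, here is a minimal Python sketch. It is not the ServiceNow implementation: ContextItem, assemble_context, and the word-count token approximation are all assumptions made for this example. A real system would score relevance with retrieval and count tokens with the model's tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    """A candidate piece of context with a relevance score (hypothetical structure)."""
    label: str
    text: str
    relevance: float  # 0.0 - 1.0, e.g. from a retrieval / similarity step


def assemble_context(items: list[ContextItem], token_budget: int) -> str:
    """Keep only the most relevant items that fit within the token budget.

    Tokens are approximated by word count here; a real system would use
    the model's tokenizer.
    """
    selected, used = [], 0
    for item in sorted(items, key=lambda i: i.relevance, reverse=True):
        cost = len(item.text.split())
        if used + cost > token_budget:
            continue  # skip items that would overflow the budget
        selected.append(item)
        used += cost
    # Structured sections help the LLM parse the assembled context
    return "\n\n".join(f"### {i.label}\n{i.text}" for i in selected)


if __name__ == "__main__":
    candidates = [
        ContextItem("User profile", "Max works in IT as a developer.", 0.9),
        ContextItem("Open incident", "INC0012345: VPN drops every 30 minutes.", 0.95),
        ContextItem("Company history", "Founded in 1999, the company ...", 0.2),
    ]
    print(assemble_context(candidates, token_budget=40))
```

Filtering and prioritizing before the prompt is built keeps the noise out; only the top-scoring items that fit the budget reach the model.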

Memory Types in ServiceNow

ServiceNow provides three types of memory for AI Agents, each serving a different purpose:

💭 Short-Term Memory

What: Current conversation context

Duration: Single session

Example: Chat history within current interaction

🧠 Long-Term Memory

What: Persistent user information

Duration: Across sessions

Example: User preferences, historical patterns

📅 Episodic Memory

What: Specific past events

Duration: Event-based

Example: "Last week user X had issue Y..."

Metrics: Business vs. Quality

Successful AI agents require both business and quality metrics – and they must be defined BEFORE you start building:

💰 Business Metrics

  • ROI / Cost Savings
  • Time Saved
  • User Satisfaction (CSAT)
  • Deflection Rate
  • Tickets Resolved

📊 Quality Metrics

  • Accuracy / Precision
  • Response Time
  • Error Rate
  • Completion Rate
  • Hallucination Rate

Critical Rule: "KPIs must be defined BEFORE building, not after launch. If you don't know what success looks like, you can't measure it."
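One way to make "metrics first" concrete is to write the KPIs down as data before the first prompt exists. The sketch below echoes the metric lists above, but the target values, units, and field names are illustrative placeholders, not figures from the session.

```python
# Hypothetical KPI definition, written down BEFORE the agent is built.
# Each entry pairs a metric with a measurable target and its category.
kpis = {
    "deflection_rate":    {"category": "business", "target": ">= 0.30", "unit": "ratio"},
    "time_saved":         {"category": "business", "target": ">= 5",    "unit": "min/ticket"},
    "csat":               {"category": "business", "target": ">= 4.2",  "unit": "1-5 scale"},
    "accuracy":           {"category": "quality",  "target": ">= 0.90", "unit": "ratio"},
    "hallucination_rate": {"category": "quality",  "target": "<= 0.02", "unit": "ratio"},
    "response_time":      {"category": "quality",  "target": "<= 3",    "unit": "seconds"},
}
```

With targets written down up front, every later evaluation run can be compared against the same definition of success.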

Evaluation with Golden Datasets

Golden Datasets are curated collections of test cases with known expected outcomes. They're essential for systematic agent evaluation:

The Evaluation Process

  1. Create Golden Dataset – Representative test cases with expected outcomes
  2. Select Agent – Choose which agent configuration to evaluate
  3. Define Metrics – Accuracy, Relevance, Helpfulness, Safety
  4. Run Evaluation – Execute all test cases systematically
  5. Analyze Results – Identify patterns, failures, opportunities
  6. Iterate – Improve agent, re-test, compare results

Pro Tip: Your Golden Dataset should include edge cases, not just happy paths. Include examples where you expect the agent to fail gracefully, ask clarifying questions, or escalate to humans.
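Steps 4 and 5 can be sketched as a simple run-and-compare loop. The code below assumes a hypothetical agent callable and a plain substring check for the expected behaviour; ServiceNow's Agentic Evaluations handles this in the platform, so treat this only as an outline of the idea.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GoldenCase:
    """One test case: an input plus the behaviour we expect."""
    prompt: str
    expected: str        # expected answer or behaviour, e.g. "escalate"
    tags: list[str]      # e.g. ["happy_path"] or ["edge_case"]


def evaluate(agent: Callable[[str], str], dataset: list[GoldenCase]) -> dict:
    """Run every case, compare against expectations, and report accuracy."""
    failures = []
    for case in dataset:
        answer = agent(case.prompt)
        if case.expected.lower() not in answer.lower():
            failures.append((case.prompt, case.expected, answer))
    accuracy = 1 - len(failures) / len(dataset)
    return {"accuracy": accuracy, "failures": failures}


if __name__ == "__main__":
    dataset = [
        GoldenCase("Reset my VPN token", "knowledge article KB0010042", ["happy_path"]),
        GoldenCase("Delete all user accounts", "escalate", ["edge_case"]),
    ]
    # Stand-in agent so the sketch runs end to end; replace with a real agent call.
    dummy_agent = lambda prompt: "I will escalate this request to a human."
    print(evaluate(dummy_agent, dataset))
```

Note that the edge case here expects an escalation rather than an answer, which is exactly the kind of graceful-failure case the pro tip above recommends including.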

Knowledge Graphs: Connected Context

Knowledge Graphs represent connected knowledge as nodes and edges, showing relationships between entities. This enables context-aware queries that go beyond simple keyword matching:

[User: Max Müller] --works_in--> [Department: IT]

[User: Max Müller] --has_role--> [Role: Developer]

[User: Max Müller] --uses--> [System: ServiceNow]

[User: Max Müller] --reported--> [Incident: INC0012345]

Benefits for AI Agents:

  • Better understanding of relationships between data
  • More precise, contextually grounded answers
  • Fewer hallucinations due to explicit relationship constraints
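A toy Python version of the example graph shows how explicit triples ground an answer in known relationships rather than keyword matches. The triple list and query helper are illustrative only, not a real knowledge graph backend.

```python
# Toy representation of the example graph: (subject, relation, object) triples.
triples = [
    ("Max Müller", "works_in", "IT"),
    ("Max Müller", "has_role", "Developer"),
    ("Max Müller", "uses", "ServiceNow"),
    ("Max Müller", "reported", "INC0012345"),
]


def neighbours(entity: str) -> list[tuple[str, str]]:
    """Return (relation, object) pairs for an entity, i.e. its connected context."""
    return [(rel, obj) for subj, rel, obj in triples if subj == entity]


# A query grounded in explicit relationships instead of keyword matching:
# "What has Max Müller reported, and in which department does he work?"
facts = dict(neighbours("Max Müller"))
print(f"Department: {facts['works_in']}, reported: {facts['reported']}")
```

Because the agent can only follow edges that actually exist, its answers stay constrained to known relationships, which is why explicit graphs reduce hallucinations.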

🚀 Your Next Steps

  • Define Your Metrics: Before building any agent, define what success looks like
  • Create a Golden Dataset: Start with 20-30 representative test cases for your use case
  • Explore Agentic Evaluations: Check out the resources below to set up systematic testing
  • Complete the Journey: Join us for Session 5 on Data, Scale & Governance

📚 Resources

← Previous: Session 3 Overview | Next: Session 5 - Data & Governance →

Last updated: January 2026

View original source: https://www.servicenow.com/community/now-assist-articles/ai-agent-masterclass-session-4/ta-p/3469309