Beyond the Prompt: Training Agentic AI

At Stokes AI, we live and breathe data. We spend our days sourcing, annotating, and refining the information that powers the artificial intelligence revolution. From Reinforcement Learning from Human Feedback (RLHF) to Supervised Fine-Tuning (SFT), our goal has always been to provide AI developers with the diverse, high-quality training data they need. But a recent project has pushed the boundaries of our work and given us a front-row seat to the next evolution of AI: the rise of autonomous, agentic systems.

We recently had the incredible opportunity to help train a sophisticated agentic AI designed to manage a smart factory. This wasn't about teaching an AI to answer questions or generate reports. This was about teaching it to act. It’s an experience that has fundamentally reshaped my perspective on the future of AI and the critical importance of the human element in its development.

An AI That Sees and Acts

The setup was remarkable. We were working within a hyper-realistic digital simulation of a smart factory: an ecosystem of machines, sensors, robotic arms, and logistics systems, all generating data in real time. Our partner's agentic AI had administrative access to this environment. Its "tools" were not just for observation; they were commands. With the right tool call, the AI could power down a machine, summon a maintenance crew, reroute a conveyor belt, or re-prioritize the entire production schedule.

My role in this was to act as the human overseer, guiding the AI’s learning process. What’s crucial to understand is that I wasn’t spoon-feeding it information. I would provide simple, high-level prompts—the kind a real factory manager would give. It was the AI's job to do the rest. It had to proactively pull data from sensor logs, machine statuses, and error reports to understand the context, reason about the best course of action, and then execute its decisions.

The agent's job was to be the ultimate factory supervisor, ensuring stability, productivity, and safety. Our job was to be its teacher, its mentor, and its toughest critic.

Here’s a look at what that training actually looked like.

Scenario 1: The Vague Priority Order

My morning might begin with a simple, goal-oriented prompt for the AI:

  • My Prompt: "A priority shipment of Product Y is due in 4 hours."

I provide no other details. I don't tell it which machines are free or if supplies are low. The AI has to figure that out from the details of the simulated environment. A high-quality response, which we train it to achieve, involves a sophisticated chain of reasoning:

  1. Information Gathering: The first thing the agent does is read the environment. It pulls the live production schedule, checks the operational status of all machines capable of making Product Y, and queries the inventory system for the necessary raw materials.

  2. Analysis: It sees that the primary assembly line for Product Y is already at capacity running a different job. However, it identifies two smaller, secondary machines that are currently idle. It also reviews their maintenance logs and sees one has a pending, non-critical alert for lubrication.

  3. Action & Tool Calls: Based on its analysis, the AI makes a series of autonomous decisions:

    • Tool Call: schedule_maintenance(machine='B-12', type='lubrication', priority='low') - It addresses the maintenance issue on the idle machine before it becomes a problem.

    • Tool Call: add_to_queue(machine='B-11', product='Product Y', quantity=1500) - It immediately allocates the priority job to the other free machine.

    • Tool Call: generate_report(content="Priority order for Product Y is underway. Allocated to Machine B-11. Preemptively scheduled maintenance for Machine B-12 to prepare it for overflow capacity. No delays anticipated for the 4-hour deadline.")
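
To make that concrete, here is a minimal Python sketch of the same reasoning chain. The tool names mirror the calls above, but the print-only stubs, the Machine data class, and the "fewest pending alerts" heuristic are all illustrative assumptions on my part, not our partner's actual API:

```python
from dataclasses import dataclass, field

# Print-only stand-ins for the simulator's tool API (names mirror the
# scenario above; the real signatures are unknown and assumed here).
def schedule_maintenance(machine, type, priority):
    print(f"schedule_maintenance(machine={machine!r}, type={type!r}, priority={priority!r})")

def add_to_queue(machine, product, quantity):
    print(f"add_to_queue(machine={machine!r}, product={product!r}, quantity={quantity})")

def generate_report(content):
    print(f"generate_report(content={content!r})")

@dataclass
class Machine:
    id: str
    status: str                  # "idle" or "busy"
    supported: list[str]
    pending_alerts: list[str] = field(default_factory=list)

def handle_priority_order(machines, product, quantity):
    """Scenario 1 reasoning chain: gather context, analyze, then act."""
    # 1. Information gathering: which capable machines are free right now?
    idle = [m for m in machines if product in m.supported and m.status == "idle"]

    # 2. Analysis: run the job on the idle machine with the fewest pending
    #    alerts; treat the others as overflow capacity.
    idle.sort(key=lambda m: len(m.pending_alerts))
    target, backups = idle[0], idle[1:]

    # 3. Action & tool calls: clear minor alerts on the backups, queue the
    #    priority job, and report back.
    for m in backups:
        for alert in m.pending_alerts:
            schedule_maintenance(machine=m.id, type=alert, priority="low")
    add_to_queue(machine=target.id, product=product, quantity=quantity)
    generate_report(content=f"Priority order for {product} underway on {target.id}.")

handle_priority_order(
    [Machine("B-11", "idle", ["Product Y"]),
     Machine("B-12", "idle", ["Product Y"], ["lubrication"])],
    product="Product Y", quantity=1500,
)
```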

Scenario 2: The Subtle Decline

Sometimes, I'd give the AI a seemingly routine task to test its investigative abilities.

  • My Prompt: "Run a standard diagnostic on the Delta production line."

The agent's task is to define what "standard diagnostic" means. It doesn't just check for red lights and error codes.

  1. Information Gathering: The AI begins a comprehensive data pull: energy consumption, vibration sensor data, machine temperatures, and, most importantly, the output logs from the end-of-line computer vision quality control scanner.

  2. Analysis: Everything looks normal at first glance. No alarms are triggered. But a truly intelligent agent digs deeper. It analyzes the QC data for the products created by the Delta production line over the last 24 hours and detects a minuscule, 0.7% increase in microscopic surface scratches—not enough to fail a product, but a statistically significant upward trend.

  3. Correlation: The AI then cross-references the timeline of this quality degradation with other factory events. It finds the trend began just after a new batch of raw polymer was introduced to the Delta line.

  4. Action & Tool Calls: Its response is one of proactive prevention.

    • Tool Call: flag_for_review(item='Raw Polymer Batch #9C', reason='Correlated with minor increase in surface abrasion on Delta line.')

    • Tool Call: increase_qc_sensitivity(scanner='QC-Delta-01', parameter='abrasion', value='10%') - It makes the scanner temporarily more sensitive to gather more data on the issue.

    • Tool Call: generate_report(content="Delta line diagnostic complete. All systems nominal. However, I have detected a subtle upward trend in surface abrasions correlated with Raw Polymer Batch #9C. The batch has been flagged for human review. No immediate production impact.")
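
The phrase "statistically significant" is doing real work in that analysis. As a rough illustration of the kind of test involved, assuming made-up defect counts rather than real QC data, a one-sided two-proportion z-test can tell a genuine upward trend apart from noise:

```python
import math

def defect_rate_rose(baseline_defects, baseline_total,
                     recent_defects, recent_total, alpha=0.01):
    """One-sided two-proportion z-test: did the defect rate increase?"""
    p1 = baseline_defects / baseline_total
    p2 = recent_defects / recent_total
    pooled = (baseline_defects + recent_defects) / (baseline_total + recent_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_total + 1 / recent_total))
    z = (p2 - p1) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z), i.e. an increase
    return p2 > p1 and p_value < alpha

# Illustrative numbers only: scratches rising from 3.1% to 3.8% of scanned
# units over large samples is too consistent to be random noise.
print(defect_rate_rose(310, 10_000, 380, 10_000))  # True
```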

While many of our training exercises involve giving the AI a direct prompt, the ultimate goal is for the agent to manage the factory proactively, without needing a human to tell it where to look. By constantly monitoring the endless flow of data from sensors, logs, and schedules, the AI can act on opportunities and threats a human might miss.

Here are some examples of the agent operating completely on its own, based purely on what it observes in its simulated world.

Scenario 3: The Opportunistic Optimizer

  • Trigger: No human prompt. The AI constantly monitors both internal factory data and external data feeds, including the real-time electricity pricing from the local utility grid.

  • Information Gathering: At 2:00 PM, the AI ingests an updated electricity price forecast. It notes that due to high grid demand, the price per kilowatt-hour will be 35% higher than average between 3:00 PM and 6:00 PM, then drop to 50% below average from 10:00 PM to 4:00 AM. Simultaneously, it scans its own production queue.

  • Analysis: The AI identifies three large-batch, energy-intensive jobs currently scheduled for the afternoon: a series of long-run heat treatments for steel components. These jobs are marked as "standard priority" and are not due for completion for another 72 hours. The agent calculates that running these jobs during the peak price window will cost an estimated $12,400 in energy, whereas running them overnight would cost only $5,700.

  • Autonomous Action & Tool Calls: The AI requires no human approval for this optimization.

    • Tool Call: pause_queue_item(item_id='Job-HT-771')

    • Tool Call: pause_queue_item(item_id='Job-HT-772')

    • Tool Call: reschedule_queue_item(item_id='Job-HT-771', start_time='22:00')

    • Tool Call: reschedule_queue_item(item_id='Job-HT-772', start_time='22:30')

    • Tool Call: generate_log_entry(level='info', content='Rescheduled non-urgent heat treatment jobs to off-peak energy window for estimated cost savings of $6,700.')
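
The arithmetic behind that decision is simple enough to sketch. The prices, energy figures, and slack threshold below are illustrative assumptions (the narrative's dollar figures come from the simulation, not this toy model), and the tool stubs just print what they would do:

```python
# Print-only stand-ins for the scheduler tools named above (assumed).
def pause_queue_item(item_id):
    print(f"pause_queue_item(item_id={item_id!r})")

def reschedule_queue_item(item_id, start_time):
    print(f"reschedule_queue_item(item_id={item_id!r}, start_time={start_time!r})")

def generate_log_entry(level, content):
    print(f"generate_log_entry(level={level!r}, content={content!r})")

AVG_PRICE = 0.10                    # $/kWh, illustrative baseline
PEAK_PRICE = AVG_PRICE * 1.35       # 3:00-6:00 PM forecast: 35% above average
OFF_PEAK_PRICE = AVG_PRICE * 0.50   # 10:00 PM-4:00 AM forecast: 50% below average

def shift_flexible_jobs_off_peak(jobs, off_peak_starts):
    """Move non-urgent, energy-hungry jobs out of the peak-price window."""
    total_savings = 0.0
    for job, start in zip(jobs, off_peak_starts):
        if job["priority"] != "standard" or job["hours_to_deadline"] < 24:
            continue  # only reschedule jobs with genuine slack
        savings = job["kwh"] * (PEAK_PRICE - OFF_PEAK_PRICE)
        pause_queue_item(item_id=job["id"])
        reschedule_queue_item(item_id=job["id"], start_time=start)
        total_savings += savings
    generate_log_entry(
        level="info",
        content=f"Rescheduled jobs to off-peak window; est. savings ${total_savings:,.0f}.",
    )

shift_flexible_jobs_off_peak(
    [{"id": "Job-HT-771", "priority": "standard", "hours_to_deadline": 72, "kwh": 40_000},
     {"id": "Job-HT-772", "priority": "standard", "hours_to_deadline": 72, "kwh": 38_000}],
    off_peak_starts=["22:00", "22:30"],
)
```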

Scenario 4: The Prioritization Dilemma (Requires Human Decision)

  • Trigger: No human prompt. The AI’s internal scheduler ingests two new, separate orders that have been entered into the system by the sales department, both flagged with the highest "URGENT" priority.

  • Information Gathering: The first order is for 1,000 units of "Product A" for a long-standing, major client. The second is for 300 units of "Component B," a critical part needed by another factory within the same parent company to prevent a line-down situation there. The AI automatically scans machine availability, material inventory, and projected run times.

  • Analysis: The AI determines it's impossible to fulfill both URGENT orders simultaneously without one impacting the other. Both orders require time on the same high-precision CNC machine. It models two outcomes:

    • Option Alpha: Prioritize Product A. It will be completed on time. However, this will delay the shipment of Component B by an estimated 3 hours, causing a potential cascade of delays at the sister factory.

    • Option Beta: Prioritize Component B. It will ship on time, preventing the line-down situation. However, this will delay the high-priority shipment to the external client by 2.5 hours, potentially violating the service-level agreement (SLA).

  • The AI recognizes this isn't a purely logical or efficiency-based decision; it's a strategic business decision. It doesn't know the financial penalty of violating the SLA versus the cost of the sister factory's downtime.

  • Autonomous Action & Escalation:

    • Tool Call: alert_human_supervisor(recipient='operations_manager', subject='URGENT: Resource Conflict - Action Required', body='I have a scheduling conflict between two URGENT priority orders. Please advise on the course of action. Option Alpha: complete the Product A order on time. [Consequence: Component B will be delayed by 3 hours.] Option Beta: complete the Component B order on time. [Consequence: Product A will be delayed by 2.5 hours and may violate our client SLA.] Which option should I execute?')
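
The escalation rule itself can be sketched in a few lines: act autonomously only when every consequence can be priced, and defer otherwise. The cost-model shape and helper details below are assumptions for illustration:

```python
# Print-only stand-in for the escalation tool shown above (assumed).
def alert_human_supervisor(recipient, subject, body):
    print(f"--> {recipient}: {subject}\n{body}")

def choose_or_escalate(options, known_costs):
    """Act alone only if every consequence can be priced; else escalate."""
    if all(opt["consequence"] in known_costs for opt in options):
        # Pure optimization: autonomously pick the cheaper downside.
        return min(options, key=lambda o: known_costs[o["consequence"]])
    # Otherwise this is a strategic business decision: hand it to a human.
    body = "\n".join(
        f"{o['name']}: {o['summary']} [Consequence: {o['consequence']}]"
        for o in options
    ) + "\nWhich option should I execute?"
    alert_human_supervisor(recipient="operations_manager",
                           subject="URGENT: Resource Conflict - Action Required",
                           body=body)
    return None

options = [
    {"name": "Option Alpha", "summary": "Complete Product A order on time.",
     "consequence": "Component B delayed 3 hours (sister-factory line-down)"},
    {"name": "Option Beta", "summary": "Complete Component B order on time.",
     "consequence": "Product A delayed 2.5 hours (possible SLA violation)"},
]
choose_or_escalate(options, known_costs={})  # no cost model, so it escalates
```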

Scenario 5: The Ghost in the Machine (Requires Human Decision)

  • Trigger: No human prompt. The AI is performing its continuous, routine monitoring of all factory equipment.

  • Information Gathering: The agent detects highly anomalous readings from a single vibration sensor on a critical gearbox (Gearbox-C4). The readings are not just high; they are erratically spiking and dropping in a pattern that does not conform to any known failure signature in its database. The AI immediately checks all adjacent sensors—temperature, pressure, and rotational speed for Gearbox-C4—and they all report perfectly normal conditions.

  • Analysis: The AI is faced with a contradiction. The sensor's readings are nonsensical, suggesting a sensor malfunction. However, the AI runs a remote diagnostic self-test on the sensor itself, and the sensor reports back STATUS: HEALTHY. It now has two conflicting pieces of high-confidence data: a healthy sensor is providing impossible data.

    • Hypothesis 1: The sensor has a novel, undetectable fault. Its data is junk and should be ignored. The gearbox is fine.

    • Hypothesis 2: The sensor is working perfectly and is detecting a genuinely new and unknown type of physical phenomenon or impending failure mode that the AI has never been trained on.

  • Shutting down the line based on a potentially faulty sensor would cost thousands in lost production. Ignoring a sensor that might be giving a valid, critical warning could lead to catastrophic equipment failure. The AI has no precedent to help it choose.

  • Autonomous Action & Escalation:

    • Tool Call: flag_for_review(item='Sensor_Vibe_GBC4', reason='Data anomaly does not match known failure patterns and conflicts with self-test results.')

    • Tool Call: alert_human_supervisor(recipient='maintenance_lead', subject='Anomaly Detected on Gearbox-C4 - Guidance Needed', body='I have detected irreconcilable data from the vibration sensor on Gearbox-C4. The data is erratic, but the sensor passes its own diagnostic tests. I am unsure whether to trust the sensor and initiate an emergency shutdown, or to distrust the sensor and allow operations to continue. Please advise on the best course of action. Action 1: initiate an immediate, precautionary shutdown for physical inspection. [Risk: High probability of unnecessary downtime.] Action 2: ignore the sensor data and schedule a routine inspection for the next maintenance window. [Risk: Potential for catastrophic equipment failure if the sensor is correct.]')
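
Compressed into code, the decision logic might look something like the sketch below, with boolean flags standing in for what is surely much richer signature matching, and print-only stubs for the tools:

```python
# Print-only stand-ins for the escalation tools named above (assumed).
def flag_for_review(item, reason):
    print(f"flag_for_review(item={item!r}, reason={reason!r})")

def alert_human_supervisor(recipient, subject, body):
    print(f"alert_human_supervisor(recipient={recipient!r}, subject={subject!r})")

def assess_vibration_anomaly(readings_anomalous, matches_known_signature,
                             self_test_healthy):
    """Decide between acting autonomously and escalating to a human."""
    if not readings_anomalous:
        return "nominal"
    if matches_known_signature:
        return "run_known_failure_playbook"   # precedent exists: act alone
    if not self_test_healthy:
        return "schedule_sensor_replacement"  # faulty sensor: data is junk
    # Anomalous data + healthy sensor + no known signature: the two
    # hypotheses cannot be reconciled, so the agent defers to a human.
    flag_for_review(item="Sensor_Vibe_GBC4",
                    reason="Anomaly matches no known pattern and conflicts "
                           "with sensor self-test results.")
    alert_human_supervisor(recipient="maintenance_lead",
                           subject="Anomaly Detected on Gearbox-C4 - Guidance Needed",
                           body="Shutdown for inspection, or wait for the "
                                "next maintenance window?")
    return "awaiting_human_decision"

print(assess_vibration_anomaly(True, False, True))  # awaiting_human_decision
```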

My job is first to create the simulated environments, complete with all the logs and information coming from sensors and machines, sometimes paired with a human prompt. I then have the agent produce a response (which may include tool calls) and annotate the entire sequence. Did the agent make the logical choice? Did it use the right tools and issue the right commands? Did it act proactively on the maintenance log? Was its report clear and concise? This is how the AI learns judgment.

When the agent makes suboptimal decisions, I provide the correct ones. These corrected sequences are what its training data consists of, and they are what teach an agentic AI to reason logically over all the contextual data available for a decision.
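
For a sense of what that data might look like, here is a loose guess at the shape of one annotated example. Stokes AI's actual schema isn't public, so every field name and value is illustrative:

```python
# Illustrative only: a guess at the shape of one annotated training
# example, not Stokes AI's real schema.
annotated_example = {
    "environment": "factory_sim_snapshot_0412",  # sensor logs, schedules, statuses
    "prompt": "A priority shipment of Product Y is due in 4 hours.",
    "model_trajectory": [
        {"tool": "add_to_queue",
         "args": {"machine": "B-12", "product": "Product Y", "quantity": 1500}},
    ],
    "annotations": {
        "logical_choice": False,            # ignored B-12's pending lubrication alert
        "right_tools_and_commands": True,
        "proactive_on_maintenance": False,
        "report_clear_and_concise": False,  # no report was generated at all
    },
    "corrected_trajectory": [
        {"tool": "schedule_maintenance",
         "args": {"machine": "B-12", "type": "lubrication", "priority": "low"}},
        {"tool": "add_to_queue",
         "args": {"machine": "B-11", "product": "Product Y", "quantity": 1500}},
        {"tool": "generate_report",
         "args": {"content": "Priority order underway on Machine B-11."}},
    ],
}
```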

The analyses the agent is trained to perform, and the tools it learns to wield, yield the kind of insight that can save a company millions in recalls down the road and drastically boost a smart factory's efficiency and effectiveness. It's a level of analysis that would be incredibly difficult for a human supervisor to deliver in real time across an entire factory.

The Immense Responsibility of Building a Digital Mind

This project was a powerful reminder that with agentic AI, the stakes are astronomically higher. When an AI moves from generating text to controlling physical machinery, the margin for error evaporates. An LLM hallucination in a chatbot is an inconvenience; a hallucinated tool call in a smart factory could be a catastrophe.

This is why the quality of training is not just a feature; it's the bedrock of safety and trust. We are not merely teaching an AI to pass a test; we are instilling in it a form of judgment. The future will not be about just having the most powerful models, but about having the best-trained models.

This experience, a deep dive into the practical application of AI in a complex, real-world scenario, brings to mind my early fascination with optimization algorithms back in high school. That project, which used evolutionary algorithms for task allocation in IIoT, was a theoretical exploration of the very problem I now help solve in a tangible way. It's a journey from theory to a reality where our work at Stokes AI directly contributes to building more reliable, efficient, and safer automated systems.

The future of agentic AI is one of human-machine collaboration at a scale we’re only just beginning to comprehend. It’s a future where AI agents handle the complex, data-intensive optimization of our physical world, freeing up human ingenuity for grander challenges. And we are proud to be on the front lines, helping to build that future responsibly.
