A/B Testing

AgentCart lets you run A/B tests on product descriptions to find out which version performs better with AI agents. Different agents see different description variants in real UCP and MCP responses — this isn’t a simulation.

How it works

When a test is running, AgentCart splits incoming agent traffic between your description variants. The split is deterministic: the same agent platform from the same IP on the same day always sees the same variant. This ensures consistent behavior within a session without requiring cookies or agent-side state.
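
A deterministic split like the one described above can be sketched with a hash. This is a minimal illustration, not AgentCart's actual implementation: the function name `assign_variant`, the choice of SHA-256, the key layout, and the `control_share` parameter are all assumptions made for the example.

```python
import hashlib
from datetime import date


def assign_variant(test_id: str, agent_platform: str, ip: str, day: date,
                   control_share: int = 50) -> str:
    """Return "A" or "B"; identical inputs always produce the same answer.

    control_share is the percentage of traffic sent to Variant A
    (50, 70, or 80 for the splits offered in the UI).
    """
    # Hypothetical key: test + agent platform + IP + calendar day.
    key = f"{test_id}|{agent_platform}|{ip}|{day.isoformat()}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash onto a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "A" if bucket < control_share else "B"
```

Because the variant is derived from the request itself, nothing has to be stored on the agent side, and the assignment naturally changes when the day (or the IP) changes.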

Each time an agent fetches your catalog (via search or lookup), AgentCart records which variant was served. If that product later appears in a checkout session, the exposure is marked as a conversion.
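
The exposure and conversion bookkeeping can be pictured with a small in-memory log. This is only a sketch under assumptions: the `Exposure` and `ExposureLog` names are hypothetical, and so is the idea of matching a checkout to an exposure through an `agent_key`, since the documentation does not say how the two are linked.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Exposure:
    agent_key: str      # hypothetical identifier for the agent that was served
    product_id: str
    variant: str        # "A" or "B"
    converted: bool = False


class ExposureLog:
    def __init__(self) -> None:
        self.exposures: List[Exposure] = []

    def record(self, agent_key: str, product_id: str, variant: str) -> Exposure:
        """Record that a variant was served in a search or lookup response."""
        exposure = Exposure(agent_key, product_id, variant)
        self.exposures.append(exposure)
        return exposure

    def mark_checkout(self, agent_key: str, product_ids: Iterable[str]) -> int:
        """Mark the most recent unconverted exposure of each checked-out product.

        Returns the number of exposures marked as conversions.
        """
        remaining = set(product_ids)
        marked = 0
        for exposure in reversed(self.exposures):
            if (exposure.agent_key == agent_key
                    and exposure.product_id in remaining
                    and not exposure.converted):
                exposure.converted = True
                remaining.discard(exposure.product_id)
                marked += 1
        return marked
```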

Creating a test

1. Go to your store’s A/B Tests tab
2. Click New Test
3. Select a product: search by name from your catalog
4. Name your test, e.g. “Holiday description test” or “Technical vs. conversational”
5. Write your variants:
   - Variant A (Control): pre-filled with the current description (approved optimization or Shopify original)
   - Variant B: pre-filled with the suggested description if one exists, or blank
6. Choose a traffic split:

Split	When to use
50/50	Default. Equal exposure, fastest to reach significance.
70/30	When you want most traffic on the control while testing a new variant.
80/20	Conservative. Minimal risk to current performance.

7. Click Save as Draft to review later, or Start Test to begin immediately

Test lifecycle

State	What it means
Draft	Test is saved but not running. No traffic is split. You can edit variants and settings.
Running	Agents receive different variants based on the traffic split. Exposures are recorded.
Paused	Traffic splitting stops. All agents see the pre-test description. Data is preserved — you can resume.
Completed	Test is finished. Data is frozen. You can review results and pick a winner.

Actions:

- Start: begins the test and snapshots the current description for safe revert
- Pause: temporarily stops the test without losing data
- Resume: restarts a paused test
- Stop: ends the test permanently and moves it to Completed

Only one test can be active (draft or running) per product at a time.
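
The states and actions above form a small state machine. The sketch below is one way to model it. The `State` enum, `TRANSITIONS` table, and `apply_action` function are illustrative names, and allowing Stop from a paused test is an assumption, since the documentation only says that Stop ends the test.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"


# (action, current state) -> next state
TRANSITIONS = {
    ("start", State.DRAFT): State.RUNNING,
    ("pause", State.RUNNING): State.PAUSED,
    ("resume", State.PAUSED): State.RUNNING,
    ("stop", State.RUNNING): State.COMPLETED,
    ("stop", State.PAUSED): State.COMPLETED,  # assumption: a paused test can be stopped
}


def apply_action(state: State, action: str) -> State:
    """Return the next state, or raise ValueError for a disallowed action."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"cannot {action} a {state.value} test") from None
```

Completed has no outgoing transitions in this model, which matches the description of Stop as permanent.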

Reading results

The test detail view shows:

Metric	What it means
Impressions	Number of times each variant was served to an agent
Conversions	Number of times a product with that variant appeared in a checkout
Conversion rate	Conversions / Impressions for each variant
Lift	How much better (or worse) the variant performs compared to the control, as a percentage
Statistical confidence	How likely it is that the difference between variants is real and not due to chance

A daily chart shows impressions per variant over time so you can spot trends.
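
The conversion rate and lift metrics can be reproduced by hand. The sketch below assumes lift is the relative difference from the control, which is the usual reading of "compared to the control, as a percentage". The function names are illustrative.

```python
from typing import Optional


def conversion_rate(conversions: int, impressions: int) -> float:
    """Conversions / Impressions, or 0.0 when nothing has been served yet."""
    return conversions / impressions if impressions else 0.0


def lift_percent(control_rate: float, variant_rate: float) -> Optional[float]:
    """Relative change of the variant versus the control, in percent.

    Returns None when the control rate is zero, because the ratio is undefined.
    """
    if control_rate == 0:
        return None
    return (variant_rate - control_rate) / control_rate * 100.0
```

For example, a control with 40 conversions from 1,000 impressions (4%) and a variant with 50 from 1,000 (5%) gives a lift of about 25%.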

What “statistically significant” means

AgentCart uses a standard two-proportion z-test. When confidence reaches 95% or higher, the result is marked as significant. That means that if the two variants truly performed the same, a difference this large would show up by chance less than 5% of the time. Below that threshold, the difference might just be noise.
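
For readers who want to check the numbers themselves, here is a minimal sketch of a two-proportion z-test. It assumes a pooled standard error, a two-sided test, and that "confidence" means one minus the p-value. AgentCart's exact formula is not documented here, so treat those choices and the function name as assumptions.

```python
import math


def two_proportion_confidence(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return 1 - p for a two-sided, pooled two-proportion z-test."""
    if n_a == 0 or n_b == 0:
        return 0.0
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return 1 - p_value
```

With 40 versus 50 conversions on 1,000 impressions each, this gives a confidence of about 72%, well short of the 95% threshold. The same rates on 10,000 impressions each clear it comfortably, which is why low-traffic tests take longer to resolve.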

Applying a winner

After stopping a test:

1. Review the results in the test detail view
2. Click Apply Winner on the variant you want to keep
3. That variant’s description becomes the approved description for the product and is served to all agents going forward

If you stop a test without picking a winner, the product reverts to its pre-test description.
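
The outcome of stopping a test can be summarized in a few lines. This is a hypothetical sketch: `description_after_stop` and its parameters are illustrative names, with `pre_test` standing in for the description snapshotted when the test started.

```python
from typing import Dict, Optional


def description_after_stop(pre_test: str, variants: Dict[str, str],
                           winner: Optional[str] = None) -> str:
    """Return the description served to all agents once a test has been stopped.

    pre_test: the snapshot taken when the test started
    variants: mapping of variant label ("A", "B") to its description
    winner:   the label chosen with Apply Winner, or None if none was picked
    """
    if winner is None:
        # No winner picked: the product reverts to its pre-test description.
        return pre_test
    return variants[winner]
```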

Tips for running good tests

- Run for at least 7 days: shorter tests may not capture enough agent traffic or its weekly patterns
- Wait for significance: don’t pick a winner based on a small number of impressions
- Test one thing at a time: if you change both the tone and the content, you won’t know which made the difference
- Make meaningful differences: small wording tweaks are unlikely to produce measurable results. Test genuinely different approaches (e.g. feature-focused vs. benefit-focused, technical vs. conversational)
- Check traffic first: products that rarely appear in agent searches won’t generate enough data. Use the Agent Activity tab to identify high-traffic products