A/B Test Setup Skill: Plan Statistically Sound Experiments

Introduction

The ab-test-setup skill acts as an expert growth and experimentation consultant. It targets the common “let’s just launch and see what happens” pitfall by enforcing statistical rigor, structured hypotheses, and proper sample size calculations. The goal is A/B tests that support actionable, data-driven business decisions instead of false positives.

Concept

The skill protects statistical validity by requiring a structured hypothesis, a single changed variable, and a sample size fixed in advance before any experiment launches.

Setup and Usage

There are several ways to install the skill:

Method 1: In the OpenClaw or Hermes Agent chat window, directly tell the Agent: “Please help me install the ab-test-setup skill.” (Easiest)
Method 2: Visit the skillhub website, install the skillhub store first, and then install the corresponding skill. (Suitable for Chinese users)
Method 3: Visit the Skills.sh website, search for the skill name on the homepage, and use the provided command to install it. (Suitable for technical users)
Method 4: Visit the Clawhub website, search for the skill name, click the download button to get the zip file, extract it, and place it in the skills directory of OpenClaw.

Skill Workflow Analysis

Context Assessment: Reads existing context files such as .agents/product-marketing-context.md first, so it does not ask questions those files already answer. It then evaluates the testing environment, baseline metrics, and technical constraints.
Hypothesis Framing: Enforces a strict Mad Libs-style template: “Because [data], we believe [change] will cause [outcome] for [audience].” This filters out vague ideas that cannot be falsified before any traffic is spent on them.
Test Type and Sample Size Determination: Selects the appropriate test type (A/B, A/B/n, or multivariate testing, MVT) based on complexity, and uses built-in reference charts to fix the required sample size upfront from the expected lift.
Metrics Selection: Strictly categorizes metrics as Primary (business value), Secondary (context), or Guardrail (harm prevention), so that a win on the primary metric is interpreted in context and cannot hide damage elsewhere.
Variant Design and Traffic Allocation: Changes only one variable per variant and chooses a traffic split (e.g., 50/50, a conservative 90/10, or a gradual ramp-up) based on risk tolerance.
Anti-Peeking Discipline: Strongly advises against checking results before the target sample size is reached, because stopping a test the first time it looks significant inflates the false-positive rate.
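The Context Assessment step can be pictured as a lookup done before any questions are asked. The sketch below is a minimal illustration, not the skill's actual logic: the helpers `load_context` and `questions_to_ask` and the `REQUIRED_FIELDS` list are hypothetical, and a plain substring match stands in for however the agent really interprets the file.

```python
from pathlib import Path

# Hypothetical facts the skill might need before designing a test.
REQUIRED_FIELDS = ["primary metric", "baseline conversion rate", "weekly traffic", "audience"]


def load_context(path: str = ".agents/product-marketing-context.md") -> str:
    """Return the context file's text, or an empty string if it does not exist."""
    p = Path(path)
    return p.read_text(encoding="utf-8") if p.is_file() else ""


def questions_to_ask(context: str) -> list[str]:
    """Ask only about required fields the context file does not already mention."""
    lowered = context.lower()
    return [f"What is your {field}?" for field in REQUIRED_FIELDS if field not in lowered]
```

If the file already states the baseline conversion rate and weekly traffic, only the questions about the primary metric and audience remain.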
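For the Hypothesis Framing step, the template can be enforced mechanically, at least to the extent of refusing a hypothesis with an empty slot. The `Hypothesis` class below is a hypothetical sketch; a blank-field check catches missing parts but cannot judge whether an idea is actually falsifiable, which still needs a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """The four slots of the template; field names are illustrative."""
    data: str      # evidence that motivates the test
    change: str    # the single change being tested
    outcome: str   # the measurable result expected
    audience: str  # who is exposed to the change

    def __post_init__(self) -> None:
        # Reject blank slots: a hypothesis missing any part cannot be tested.
        for name in ("data", "change", "outcome", "audience"):
            if not getattr(self, name).strip():
                raise ValueError(f"hypothesis is missing '{name}'")

    def render(self) -> str:
        """Fill the template sentence from the four slots."""
        return (f"Because {self.data}, we believe {self.change} "
                f"will cause {self.outcome} for {self.audience}.")
```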
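The sample sizes behind the Test Type and Sample Size Determination step come from a power calculation. The sketch below uses the standard normal-approximation formula for a two-sided, two-proportion z-test; the function name and the defaults (alpha = 0.05, power = 0.80) are assumptions, and the skill's built-in reference charts may rest on different inputs.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed in EACH variant to detect the given relative lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p1 + p2) / 2                          # pooled rate under the null
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)
```

For a 5% baseline and a 20% relative lift (5% to 6%), this gives about 8,160 visitors per variant, and halving the detectable lift to 10% roughly quadruples the requirement. For A/B/n tests the figure applies to each arm, and comparing several arms against control usually calls for a stricter alpha.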
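The three metric roles in Metrics Selection map naturally onto a small data structure. In the hypothetical sketch below, each `Metric` carries its role, and `guardrail_breaches` flags guardrail metrics that moved in the harmful direction by more than a tolerance. The names and thresholds are illustrative, and comparing point estimates like this ignores noise; a real analysis would also test whether a guardrail change is statistically significant.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"      # the business outcome the decision rests on
    SECONDARY = "secondary"  # context explaining why the primary moved
    GUARDRAIL = "guardrail"  # must not get worse, whatever the primary does


@dataclass
class Metric:
    name: str
    role: Role
    higher_is_better: bool = True
    max_relative_drop: float = 0.0  # guardrails only; 0.05 tolerates a 5% worsening


def guardrail_breaches(metrics: list[Metric], control: dict[str, float],
                       treatment: dict[str, float]) -> list[str]:
    """Names of guardrail metrics that worsened beyond their tolerance."""
    breached = []
    for m in metrics:
        if m.role is not Role.GUARDRAIL:
            continue
        change = (treatment[m.name] - control[m.name]) / control[m.name]
        if not m.higher_is_better:
            change = -change  # e.g. for refund rate, an increase is harm
        if change < -m.max_relative_drop:
            breached.append(m.name)
    return breached
```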
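Variant Design and Traffic Allocation needs a split that keeps each user in the same variant on every visit. A common approach, assumed here rather than taken from the skill, is to hash the user ID together with the experiment name and map the hash onto cumulative traffic weights. `assign_variant` is a hypothetical helper.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, weights: dict[str, float]) -> str:
    """Deterministically map a user to a variant according to traffic weights."""
    if not weights:
        raise ValueError("at least one variant is required")
    total = sum(weights.values())
    # Salting with the experiment name keeps assignments independent across tests.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight / total
        if point < cumulative:
            return variant
    return variant  # guard against float rounding at the upper edge
```

Because the treatment arm is listed last and occupies the top of the [0, 1) range, ramping from 90/10 to 50/50 only moves users from control into treatment, never the other way.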
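The cost of peeking described under Anti-Peeking Discipline can be demonstrated with an A/A simulation, where no real difference exists. The sketch below makes a simplifying assumption: instead of simulating individual visitors, it draws the running z-statistic directly as a normalized random walk, which is exact for normally distributed data with known variance. `false_positive_rates` is a hypothetical helper.

```python
import math
import random


def false_positive_rates(n_experiments: int = 2000, n_looks: int = 10,
                         z_crit: float = 1.96, seed: int = 7) -> tuple[float, float]:
    """Return (peeking rate, final-look-only rate) of false positives."""
    if n_looks < 1:
        raise ValueError("need at least one look")
    rng = random.Random(seed)
    peek_hits = final_hits = 0
    for _ in range(n_experiments):
        running_sum = 0.0
        significant_at_any_look = False
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)  # one equal-sized batch of data
            z = running_sum / math.sqrt(k)      # z-statistic after k batches
            if abs(z) > z_crit:
                significant_at_any_look = True  # a peeker would stop and ship here
        peek_hits += significant_at_any_look
        final_hits += abs(z) > z_crit           # only the planned final look counts
    return peek_hits / n_experiments, final_hits / n_experiments
```

The final-look rate stays near the nominal 5%, while stopping at the first significant look produces several times as many false winners, even though every simulated test compares identical variants.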

Skill Design Evaluation

ab-test-setup skill Audit

Reference Links

SKILL.md