
Scripts in Skills

Scripts handle work that has clear right-and-wrong answers (validation, transformation, extraction, counting) so the LLM can focus on judgment, synthesis, and creative reasoning.

Without scripts, every operation in a skill runs through the LLM. That means:

  • Non-deterministic results. Ask an LLM to count tokens in a file three times and you may get three different numbers. Ask a script and you get the same answer every time.
  • Wasted tokens and time. Parsing a JSON file, checking if a directory exists, or comparing two strings are mechanical operations. Running them through the LLM burns context window and adds latency for no gain.
  • Harder to test. You can write unit tests for a script. You cannot write unit tests for an LLM prompt.

The pattern shows up everywhere: skills that try to LLM their way through structural validation are slower, less reliable, and more expensive than skills that offload those checks to scripts.

The design principle is intelligence placement: put each operation where it belongs.

| Scripts Handle | LLM Handles |
| --- | --- |
| Validate structure, format, schema | Interpret meaning, evaluate quality |
| Count, parse, extract, transform | Classify ambiguous input, make judgment calls |
| Compare, diff, check consistency | Synthesize insights, generate creative output |
| Pre-process data into compact form | Analyze pre-processed data with domain reasoning |

The test: Given identical input, will this operation always produce identical output? If yes, it belongs in a script. Could you write a unit test with expected output? Definitely a script. Requires interpreting meaning, tone, or context? Keep it as an LLM prompt.
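To make the test concrete, here is a deterministic operation that clearly belongs in a script: identical input always produces identical output, and you can assert the expected result in a unit test. A minimal sketch (the `count_headings` helper is hypothetical, not part of any skill):

```python
import re

def count_headings(text: str) -> int:
    """Count markdown headings -- same input always yields the same count."""
    return len(re.findall(r"^#{1,6}\s", text, flags=re.MULTILINE))

# Identical input, identical output: trivially unit-testable, so it is
# script work, not LLM work.
sample = "# Title\n\nbody text\n\n## Section\n"
assert count_headings(sample) == 2
```

By contrast, "is this heading well-written?" requires interpreting meaning, so it stays with the LLM.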

Skills must work across macOS, Linux, and Windows. Bash is not portable.

| Factor | Bash | Python |
| --- | --- | --- |
| macOS / Linux | Works | Works |
| Windows (native) | Fails or behaves inconsistently | Works identically |
| Windows (WSL) | Works, but can conflict with Git Bash on PATH | Works identically |
| Error handling | Limited, fragile | Rich exception handling |
| Testing | Difficult | Standard unittest/pytest |
| Complex logic | Quickly becomes unreadable | Clean, maintainable |

Even basic commands like sed -i behave differently on macOS and Linux, and piping, jq, grep, and awk all have cross-platform pitfalls that Python's standard library avoids entirely.
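As an illustration, the in-place substitution that `sed -i` handles inconsistently across platforms is a few lines of portable stdlib Python. A hypothetical sketch (the `replace_in_file` helper is illustrative, not part of any skill):

```python
from pathlib import Path

def replace_in_file(path: Path, old: str, new: str) -> None:
    """Portable equivalent of `sed -i 's/old/new/g' file` -- behaves
    identically on macOS, Linux, and Windows."""
    text = path.read_text(encoding="utf-8")
    path.write_text(text.replace(old, new), encoding="utf-8")
```

No `-i ''` vs `-i` flag differences, no shell quoting surprises, and the behavior is the same everywhere Python runs.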

Safe bash commands that work everywhere and remain fine to use directly:

| Command | Purpose |
| --- | --- |
| git, gh | Version control and GitHub CLI |
| uv run | Python script execution |
| npm, npx, pnpm | Node.js ecosystem |
| mkdir -p | Directory creation |

Everything beyond that list should be a Python script.

Python’s standard library covers most script needs without any external dependencies. Stdlib-only scripts run with plain python3, need no special tooling, and have zero supply-chain risk.

| Need | Standard Library |
| --- | --- |
| JSON parsing | json |
| Path handling | pathlib |
| Pattern matching | re |
| CLI interface | argparse |
| Text comparison | difflib |
| Counting, grouping | collections |
| Source analysis | ast |
| Data formats | csv, xml.etree |
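A single stdlib-only script often combines several of these modules. A hypothetical sketch that validates every JSON file under a directory using only `json` and `pathlib` (the `check_json_files` helper is illustrative):

```python
import json
from pathlib import Path

def check_json_files(root: Path) -> list[str]:
    """Return one error message per file under root that fails to parse
    as JSON; an empty list means everything validated."""
    errors = []
    for path in sorted(root.rglob("*.json")):
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            errors.append(f"{path}: {exc}")
    return errors
```

Wrapping this in an `argparse` CLI and exiting nonzero when errors are found turns it into a complete validation script with no dependencies to install.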

Only reach for external dependencies when the stdlib genuinely cannot do the job: tiktoken for accurate token counting, pyyaml for YAML parsing, jsonschema for schema validation. Each external dependency adds install-time cost, requires uv to be available, and expands the supply-chain surface. The BMad builders require explicit user approval for any external dependency during the build process.

Python scripts in skills use PEP 723 inline metadata to declare their dependencies directly in the file. Combined with uv run, this gives you npx-like behavior: dependencies are silently cached in an isolated environment, no global installs, no user prompts.

#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = ["pyyaml>=6.0"]
# ///
import yaml
# script logic here

When a skill invokes this script with uv run scripts/analyze.py, the dependency (pyyaml in this example) is automatically resolved. The user never sees an install prompt, never needs to manage a virtual environment, and never pollutes their global Python installation.

Without PEP 723, skills that need libraries like pyyaml or tiktoken would force users to run pip install, a jarring experience that makes people hesitate to adopt the skill.

Skills run in multiple environments: CLI terminals, desktop apps, IDE extensions, and web interfaces like claude.ai. Not all environments can execute Python scripts.

The principle: scripts are the fast, reliable path, but the skill must still deliver its outcome when execution is unavailable.

When a script cannot run, the LLM performs the equivalent work directly. This is slower and less deterministic, but the user still gets a result. The script’s --help output documents what it checks, making the fallback natural. The LLM reads the help to understand the script’s purpose and replicates the logic.
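One way to make the `--help` output carry that documentation is to list the checks in the parser's epilog, so the LLM can read them and replicate each one manually. A hypothetical sketch (the check descriptions are illustrative, not the actual behavior of scripts/scan-paths.py):

```python
import argparse

# Illustrative check list -- each entry doubles as fallback instructions
# the LLM can follow when script execution is unavailable.
CHECKS = [
    "all referenced paths are relative, not absolute",
    "no directory names contain spaces",
]

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="scan-paths.py",
        description="Validate path conventions in a skill directory.",
        epilog="Checks performed:\n  - " + "\n  - ".join(CHECKS),
        # Preserve the newlines in the epilog instead of reflowing them.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("target", help="directory to scan")
    return parser
```

Running the script with `--help` then prints the full check list, which is exactly what the LLM needs for graceful degradation.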

Frame script steps as outcomes in the SKILL.md, not just commands:

| Approach | Example |
| --- | --- |
| Good | "Validate path conventions (run scripts/scan-paths.py --help for details)" |
| Fragile | "Execute python3 scripts/scan-paths.py" with no context |

The good version tells the LLM both what to accomplish and where to find the details, enabling graceful degradation without additional instructions.

Look for these signal verbs in a skill’s requirements; they indicate script opportunities:

| Signal | Script Type |
| --- | --- |
| "validate", "check", "verify" | Validation |
| "count", "tally", "aggregate" | Metrics |
| "extract", "parse", "pull from" | Data extraction |
| "convert", "transform", "format" | Transformation |
| "compare", "diff", "match against" | Comparison |
| "scan for", "find all", "list all" | Pattern scanning |
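For example, the "compare" / "diff" signal maps directly onto the stdlib's difflib. A minimal sketch with a hypothetical `diff_report` helper:

```python
import difflib

def diff_report(expected: str, actual: str) -> str:
    """Produce a unified diff between two texts; an empty string means
    they match exactly."""
    return "\n".join(difflib.unified_diff(
        expected.splitlines(), actual.splitlines(),
        fromfile="expected", tofile="actual", lineterm=""))
```

A comparison script like this gives the LLM a compact, deterministic report to reason about instead of asking it to eyeball two files line by line.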

The builders guide you through script opportunity discovery during the build process. If you find yourself writing detailed validation logic in a prompt, it almost certainly belongs in a script instead.