
    Agentic Video Editing

    Open your .skbundle recording in ScreenKite, then prompt your AI agent (Claude Code, Codex, Gemini CLI, or any other agent that has ScreenKite's MCP tools). The agent handles two jobs: cutting the recording based on its transcript, and generating B-roll with scene layouts. You review and approve; the agent executes.

    For community workflows, prompts, and skill packs: github.com/ScreenKite/awesome-ai-video-editing


    Prompting Your Agent

    You don't write code. You write a sentence. The agent calls ScreenKite's CLI and MCP tools on your behalf.

    Claude Code

    # Start an interactive session in your project folder
    claude
    
    # Then type:
    Open ~/Desktop/Recording.skbundle and do a transcript cut. Plan the cuts first.
    
    # Or one-shot from the terminal
    claude "Open ~/Desktop/Recording.skbundle, transcribe the mic with ElevenLabs, plan all cuts before executing"
    

    Codex CLI

    codex "Open ~/Desktop/Recording.skbundle and do a transcript cut — plan first, then wait for my approval"
    
    # B-roll in one go
    codex "Open ~/Desktop/Recording.skbundle, transcribe and cut, then add medium-density B-roll with a centered layout"
    

    Gemini CLI

    gemini "Open ~/Desktop/Recording.skbundle. Transcribe the mic, plan the cuts, and show me the list before touching the timeline."
    

    What the agent actually calls

    Under the hood, every session starts with:

    # Open the project
    '/Applications/ScreenKite.app/Contents/MacOS/ScreenKite' agent project open \
      --path ~/Desktop/Recording.skbundle --json
    
    # Read project state
    '/Applications/ScreenKite.app/Contents/MacOS/ScreenKite' agent tool call \
      --name getProjectState --input-json '{"scope":"summary"}' --json
    

    You can run these commands yourself to inspect the project state at any point. Passing --json on every call makes the output machine-readable.
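    To avoid retyping the full binary path, a small shell function can wrap it. This is a sketch of our own, not something ScreenKite ships: the names SCREENKITE and sk are hypothetical, and the wrapper only forwards its arguments to the same `agent` subcommand used above.

```shell
# Hypothetical convenience wrapper (not shipped with ScreenKite): it only fills in
# the binary path and the "agent" subcommand, then forwards every other argument.
SCREENKITE='/Applications/ScreenKite.app/Contents/MacOS/ScreenKite'

sk() {
  "$SCREENKITE" agent "$@"
}

# The same two calls as above, via the wrapper:
#   sk project open --path ~/Desktop/Recording.skbundle --json
#   sk tool call --name getProjectState --input-json '{"scope":"summary"}' --json
```

    Appending `| jq .` to any of these calls pretty-prints the JSON, assuming jq is installed.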


    Skills

    Skills are pre-built prompt bundles that teach the agent the full workflow so you don't have to describe it from scratch. Install them once; reference them by name in any session.

    Install

    npx skills add ScreenKite/awesome-ai-video-editing
    

    Available skills

    use-screenkite-advanced-b-roll — Full pipeline: transcribe with ElevenLabs, pack to phrase view, proofread proper nouns, propose visual menu with density bundles, generate Hyperframes compositions in parallel, render to MP4, apply setSceneLayout DSL with magicMove transitions.

    claude "use the use-screenkite-advanced-b-roll skill on ~/Desktop/Recording.skbundle. Cute visuals, centered layout, medium density."
    

    video-use — Transcript-focused editing: transcribe, pack, plan cuts, confirm, execute. Also handles color grade, subtitles, and animation overlays via FFmpeg when working outside ScreenKite.

    claude "use the video-use skill. Transcribe ~/Desktop/Recording.skbundle and plan a cut."
    

    Invoking a skill in Claude Code

    If you have Claude Code open interactively, type the skill name as a slash command:

    /use-screenkite-advanced-b-roll
    

    The skill loads its instructions and prompts you for the recording path.


    Part 1 — Transcript-Driven Cuts

    What the agent does

    1. Transcribes your microphone track with ElevenLabs Scribe, producing word-level timestamps; the transcript is cached, so the same file is never uploaded twice
    2. Packs the raw JSON into a readable phrase view (phrases break on silences ≥ 0.5s)
    3. Proofreads every product name and proper noun via web search. Speech recognition (ASR) regularly mishears names (e.g. "ScreenKite" transcribed as "Screencast"), and a wrong name spreads into every downstream caption and visual
    4. Proposes a cut list with exact time ranges and a plain-English reason for each cut
    5. Waits for your approval before touching the timeline
    ⚠️

    Timeline cuts cannot be undone via the CLI. The agent always shows the full cut list and waits for your confirmation. Review it — especially the first and last boundaries.
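    Step 2's phrase view can be approximated in a few lines of awk. This is a minimal sketch under stated assumptions: the input format (one word per line as start, end, word, in seconds) and the function name pack_phrases are hypothetical, not ElevenLabs' or ScreenKite's actual JSON; only the 0.5s silence threshold comes from this guide.

```shell
# pack_phrases [GAP]  reads "start end word" lines (hypothetical format, seconds)
# and prints one "[start-end] text" line per phrase. A new phrase begins whenever
# the silence before a word is at least GAP seconds (default 0.5).
pack_phrases() {
  awk -v gap="${1:-0.5}" '
    # Print the phrase collected so far, if any.
    function flush() {
      if (text != "") printf "[%.2f-%.2f] %s\n", pstart, pend, text
    }
    {
      # A long enough silence since the previous word closes the current phrase.
      if (text != "" && $1 - pend >= gap) { flush(); text = "" }
      if (text == "") { pstart = $1; text = $3 } else { text = text " " $3 }
      pend = $2
    }
    END { flush() }
  '
}
```

    Piping `printf '0.06 0.40 Hello\n0.45 0.66 there.\n2.14 2.56 Hello.\n'` into pack_phrases yields two phrases, `[0.06-0.66] Hello there.` and `[2.14-2.56] Hello.`, because only the 1.48s gap reaches the threshold.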

    Sample prompt

    Open ~/Desktop/Recording.skbundle and transcribe + cut.
    Show me the cut plan first, don't touch the timeline yet.
    ElevenLabs key is in ~/.config/env/elevenlabs.env
    

    The agent returns something like:

    [000.06–000.66]  "HelloPro."          → false start
    [002.14–002.56]  "Hello."             → second false start
    --- CUT [0 → 2.98s] ---
    [011.84–012.18]  "Uh,"                → filler      CUT [8.72–10.00]
    [043.60–045.16]  "And let's see."     → transition  CUT [40.48–42.60]
    
    Result: 58.8s → 51.2s
    Apply these 3 cuts?
    

    Reply yes and the agent applies every cut in a single call: editTimeline(action: "cut", {ranges: [...]}).
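    The "Result" line in a cut plan is the original length minus the summed length of the cut ranges. A minimal sketch of that arithmetic, assuming non-overlapping ranges given as one start/end pair per line (the function name remaining_duration is hypothetical):

```shell
# remaining_duration TOTAL  reads "start end" cut ranges (seconds, non-overlapping)
# on stdin and prints how much of a TOTAL-second recording is left after the cuts.
remaining_duration() {
  awk -v total="$1" '
    { removed += $2 - $1 }                      # length of this cut range
    END { printf "%.2f\n", total - removed }    # what is left after all cuts
  '
}
```

    For example, cutting 0–2s and 10–11.5s from a 60s recording leaves 56.50s.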

    What gets cut

    • False starts — anything before the real first sentence (mic checks, repeated greetings)
    • Filler words — isolated "Uh," "Um," "Like" with sufficient silence on both sides
    • Transition phrases — "And let's see," "OK so," "Anyway" that pad between beats

    The agent never cuts mid-word: every cut edge sits 100–150 ms away from the nearest word boundary, and silences of 400 ms or longer are the preferred cut points.
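    The padding rule can be sketched as a helper that takes the end of the last word you keep and the start of the next word you keep, and returns a cut range that leaves a margin on both sides. The name pad_cut and the 0.12s default (inside the 100–150 ms range above) are assumptions, and the 400 ms silence preference is not modeled here.

```shell
# pad_cut PREV_END NEXT_START [PAD]  prints "start end" for a cut that removes
# everything between two kept words while leaving PAD seconds (default 0.12)
# untouched next to each of them. Returns status 1 if the gap is too short.
pad_cut() {
  awk -v a="$1" -v b="$2" -v pad="${3:-0.12}" 'BEGIN {
    cut_start = a + pad    # stay clear of the end of the last kept word
    cut_end   = b - pad    # stay clear of the start of the next kept word
    if (cut_end <= cut_start) exit 1    # gap too short to cut without clipping a word
    printf "%.2f %.2f\n", cut_start, cut_end
  }'
}
```

    pad_cut 11.00 12.50 prints 11.12 12.38; when the gap is too short for both margins, it prints nothing and returns a non-zero status.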


    Part 2 — Automatic B-Roll Generation

    After the cuts, the agent maps the transcript to beats and generates an animated visual for each beat with Hyperframes (HTML + GSAP, rendered to MP4). Each visual is placed as a scene layout in ScreenKite with a magicMove transition.

    Layout styles

    Corner PiP — screen recording fills the canvas, B-roll appears as a corner accent (40–42% width). Best for tutorials where the screen content is the main story.

    Centered B-roll — screen recording minimizes to top-left (~38%), B-roll plays centered (~56% width). Best for product intros where the visual should be prominent.

    # Corner PiP (default)
    claude "add B-roll with corner layout"
    
    # Centered
    claude "add B-roll — minimize the screen to top left, B-roll centered, medium density, cute visuals"
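    To picture the two layouts in pixels, the overlay's size is the canvas size scaled by its width fraction. A minimal sketch, assuming a 1920×1080 canvas, a 16:9 overlay (the generated B-roll MP4s are 1920×1080), and that the percentages refer to canvas width; overlay_size is a hypothetical name:

```shell
# overlay_size FRACTION [CANVAS_W] [CANVAS_H]  prints the overlay size in pixels
# for an overlay that is FRACTION of the canvas width (canvas defaults to 1920x1080).
overlay_size() {
  awk -v f="$1" -v w="${2:-1920}" -v h="${3:-1080}" 'BEGIN {
    # The 16:9 overlay keeps the canvas aspect ratio, so both sides scale by f.
    printf "%.0fx%.0f\n", w * f, h * f
  }'
}
```

    overlay_size 0.42 gives 806x454 for a corner PiP, overlay_size 0.56 gives 1075x605 for centered B-roll, and overlay_size 0.38 gives 730x410 for the minimized screen recording.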
    

    What the agent does

    1. Beat mapping — maps cut transcript phrases to beats: product name, key feature, workflow, CTA
    2. Density choice — proposes Sparse (4), Medium (7), or Dense (10); shows a slot menu; waits for your pick
    3. Parallel generation — dispatches one sub-agent per slot simultaneously; each writes a full 1920×1080 Hyperframes composition
    4. Serial renders — renders each slot to MP4 in sequence (running several Chrome instances in parallel corrupts frames)
    5. DSL application — calls setSceneLayout for each time window with your chosen layout
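    Steps 3 and 4 follow a common shell pattern: start every generator as a background job, wait for all of them, then render in a plain loop so only one render runs at a time. In this sketch generate_slot and render_slot are hypothetical stand-ins that write and check placeholder files; they are not ScreenKite or Hyperframes commands.

```shell
# Hypothetical stand-ins: neither function is a ScreenKite or Hyperframes command.
WORKDIR="${WORKDIR:-$(mktemp -d)}"
generate_slot() { printf '<!-- composition for slot %s -->\n' "$1" > "$WORKDIR/slot-$1.html"; }
render_slot()   { [ -f "$WORKDIR/slot-$1.html" ] && echo "rendered slot $1"; }

build_broll() {
  local n="$1" i
  # Step 3: start one background job per slot so all compositions are written at once.
  for i in $(seq 1 "$n"); do
    generate_slot "$i" &
  done
  wait   # block until every generator has finished
  # Step 4: render one slot at a time so only one renderer runs at any moment.
  for i in $(seq 1 "$n"); do
    render_slot "$i"
  done
}
```

    Running build_broll 7 would write seven placeholder compositions concurrently, then print "rendered slot 1" through "rendered slot 7" in order.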

    The visual contract

    Every generated visual follows these rules:

    • Full-frame content — the 1920×1080 MP4 is the PiP frame; content fills it edge-to-edge (placing a small card inside a mostly empty frame buries it in a corner-of-a-corner)
    • Entry → hold → no internal exit — visuals animate in (0–1.5s), settle into a readable hold, and stop. magicMove handles the exit. Internal fade-outs produce a broken double-exit.
    • Large typography — display text 160–220px, body 48–72px; at 40–56% width this stays legible on screen
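    The legibility rule comes down to one multiplication: a font size set in the 1920-wide composition shrinks by the same fraction as the PiP frame. A minimal sketch, assuming the result is measured against a 1920-wide export (effective_px is a hypothetical name):

```shell
# effective_px SIZE FRACTION  prints the on-screen size of text set at SIZE px in
# the 1920-wide B-roll composition when that composition is shown at FRACTION width.
effective_px() {
  awk -v px="$1" -v f="$2" 'BEGIN { printf "%.1f\n", px * f }'
}
```

    At the smallest settings, 48px body text at 40% width comes out at about 19.2px and 160px display text at 64.0px; 220px display text at 56% width comes out at 123.2px.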

    Density bundles

    Bundle   Slots   Spacing       Feel
    Sparse   4       ~13s apart    Clean, documentary
    Medium   7       ~7s apart     Balanced (default)
    Dense    10      ~5s apart     Explainer energy
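    The spacing column appears to be the cut length divided by the slot count. A minimal sketch of that arithmetic (slot_spacing is a hypothetical name):

```shell
# slot_spacing DURATION SLOTS  prints the average gap in seconds between B-roll
# slots when SLOTS visuals are spread evenly over a DURATION-second cut.
slot_spacing() {
  awk -v d="$1" -v n="$2" 'BEGIN { printf "%.1f\n", d / n }'
}
```

    For the 51.2s sample cut shown earlier, this gives 12.8s, 7.3s, and 5.1s for 4, 7, and 10 slots, in line with the table.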

    Sample prompt

    Recording is cut. Add B-roll:
    - Centered layout (screen top-left, B-roll center)
    - Medium density
    - Cute, warm visuals
    - All text in English
    

    Iterating on one slot

    Slot 3 should show a Swift logo instead of the Apple emoji.
    Re-render slot 3 and re-apply.
    

    The agent regenerates only that slot and re-applies its DSL window. Everything else stays.

    ⚠️

    When you re-apply a layout window that is shorter than the one it replaces, the leftover "tail" of the old window can keep playing. The agent clears these tails automatically. If you apply setSceneLayout manually and see B-roll running too long, call setSceneLayout with mode: "pictureInPicture" over the tail range to clear it.
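    The tail is the part of the old window that lies past the end of the new one. A minimal sketch that computes it, assuming both windows start at the same time and times are in seconds (tail_range is a hypothetical name; passing its output to setSceneLayout is left to you or the agent):

```shell
# tail_range OLD_END NEW_END  prints "start end" of the leftover tail when a new
# layout window ends before the old one did; returns status 1 if there is no tail.
tail_range() {
  awk -v old_end="$1" -v new_end="$2" 'BEGIN {
    if (new_end >= old_end) exit 1    # new window covers the old one: no tail
    printf "%.2f %.2f\n", new_end, old_end
  }'
}
```

    tail_range 18 15 prints 15.00 18.00, the range to cover with pictureInPicture; tail_range 15 18 prints nothing and returns a non-zero status because the new window covers the old one.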


    Putting It Together

    # 1. Start Claude Code in your project folder
    claude
    
    # 2. Transcript cut
    "Open ~/Desktop/Recording.skbundle. Transcribe and plan cuts. ElevenLabs key at ~/.config/env/elevenlabs.env"
    # → review cut list → "yes"
    
    # 3. B-roll
    "Add B-roll — centered layout, medium density, cute English visuals"
    # → review 7-slot beat menu → "Medium, looks good"
    # → agent generates in parallel, renders serially, applies DSL (~3 min)
    
    # 4. Spot-check
    "Show me slot 4 at 18s"
    # → scrub in ScreenKite
    
    # 5. Tweak if needed
    "Slot 4 — change the node diagram to use mint green for all nodes"
    

    Total hands-on time: under 5 minutes. Render time: ~2–3 minutes for 7 slots.

    For more workflows, sample prompts, and community skills: github.com/ScreenKite/awesome-ai-video-editing
