Text-based editing lets you cut, rearrange, and tighten your recording by working with the transcript instead of the timeline. Select a sentence and delete it, and the corresponding video and audio are removed automatically. Rearrange paragraphs and the timeline follows. It is the fastest way to turn a rough take into a polished edit.
How It Works
- Open a project that has a transcription (see Word-Level Generated Captions for setup).
- The transcript editor panel shows the full spoken text alongside the timeline.
- Select any text — a word, a sentence, or an entire paragraph.
- Delete the selection. ScreenKite removes the matching video and audio from the timeline and closes the gap automatically when ripple editing is on.
- To rearrange, cut a paragraph and paste it at a new position — the timeline clips move to match.
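ScreenKite's internal data model is not documented in this guide. The following is a hypothetical Python sketch of the general idea behind the steps above: with word-level timestamps, an edited transcript can be turned back into a list of timeline clips. `Word`, `timeline_from_transcript`, and the rule that clips are placed back to back (which mimics ripple editing being on) are illustrative assumptions, not ScreenKite's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    index: int    # position of the word in the original (unedited) transcript
    text: str
    start: float  # seconds into the source recording
    end: float


def timeline_from_transcript(edited):
    """Map an edited transcript (words in their new text order) to timeline clips.

    Runs of words that were neighbours in the original recording become one clip
    covering the source range [first.start, last.end]. Clips are placed back to
    back, so no gap is left where text was deleted (ripple editing assumed on).
    Returns a list of (timeline_start, source_start, source_end) tuples.
    """
    runs = []
    for w in edited:
        if runs and runs[-1][-1].index + 1 == w.index:
            runs[-1].append(w)  # still contiguous in the source: extend the clip
        else:
            runs.append([w])    # a deletion or a move starts a new clip

    clips, t = [], 0.0
    for run in runs:
        src_start, src_end = run[0].start, run[-1].end
        clips.append((t, src_start, src_end))
        t += src_end - src_start  # next clip starts where this one ends
    return clips


# Hypothetical three-word transcript.
words = [Word(0, "Hello", 0.0, 0.4), Word(1, "um", 0.5, 0.8), Word(2, "world", 0.9, 1.3)]

# Deleting "um" splits the recording into two clips joined end to end.
print(timeline_from_transcript([words[0], words[2]]))
# prints [(0.0, 0.0, 0.4), (0.4, 0.9, 1.3)]
```

Moving text works the same way in this sketch: reordering the `Word` list changes which source ranges are adjacent, and the clip order follows the text order.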
Setting Up Transcription
The first time you use text-based editing or any AI feature, ScreenKite shows a guided setup window that walks you through configuring a transcription provider. You can choose:
- Automatic: prefers ElevenLabs when an ElevenLabs API key is configured and falls back to a local WhisperKit model
- ElevenLabs: hosted word-level transcription
- Local (WhisperKit): on-device transcription, so no audio leaves your Mac
See Word-Level Generated Captions for detailed provider configuration.
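As a rough illustration of the Automatic option, the provider choice can be modeled as a small decision function. This is a hypothetical Python sketch, not ScreenKite code: `Provider` and `resolve_provider` are made-up names, and the sketch only covers the "is an API key configured?" check. Whether ScreenKite also falls back to WhisperKit when a hosted request fails is not modeled here.

```python
from enum import Enum
from typing import Optional


class Provider(Enum):
    AUTOMATIC = "automatic"
    ELEVENLABS = "elevenlabs"
    LOCAL_WHISPERKIT = "local"


def resolve_provider(choice: Provider, elevenlabs_api_key: Optional[str]) -> Provider:
    """Return the engine that would actually transcribe (hypothetical helper).

    An explicit choice is used as-is. Automatic prefers ElevenLabs when a
    non-empty API key is configured and otherwise uses local WhisperKit.
    """
    if choice is not Provider.AUTOMATIC:
        return choice
    if elevenlabs_api_key:  # None or "" counts as "no key configured"
        return Provider.ELEVENLABS
    return Provider.LOCAL_WHISPERKIT
```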
Common Workflows
Removing filler words
Select filler words such as "um," "uh," "like," and "you know" in the transcript and delete them. The video tightens up without manual timeline scrubbing.
Shortening a section
In a paragraph that runs long, select the sentences you don't need and delete them. The remaining text and video join back together, with no gap left on the timeline when ripple editing is on.
Reordering sections
Cut a paragraph from one position and paste it at another. The timeline clips move with the text, so playback follows the new order.
Related Guides
- Trimming & Splitting — traditional timeline editing
- Word-Level Generated Captions — transcription setup
- AI Chat Assistant — ask the AI to make transcript-based edits for you