Voice-to-résumé: a 5-minute walkthrough
By the SpeakResume team · 5 min read
A voice-built résumé is assembled by talking through five sections of a career, one at a time. The audio is transcribed in real time, extracted into structured fields, and rendered onto a live paper preview before the user moves to the next section.
What is a voice résumé?
A voice résumé is a résumé built by speaking instead of typing. The candidate describes each section of their career out loud, the audio is transcribed in real time, and a structured draft assembles on the page as each section is confirmed. The output is the same PDF a traditional résumé builder produces; only the input method changes.
SpeakResume is a voice-first résumé builder. There is no upload step, no LinkedIn import, and no manual typing for the body of the résumé. The product was built for people who would rather talk through five years of work than type it — which turns out to be most people once they try it. Recordings are processed in real time and discarded once a section is confirmed.
How does the voice-to-résumé conversion actually work?
The voice-to-résumé conversion works in two stages. The audio is transcribed by OpenAI Whisper, and the transcript is passed to Anthropic Claude, which extracts the fields specific to that section — titles, dates, bullets, skills — as validated JSON. The candidate reviews the structured draft, edits anything off, and confirms the section. A live preview updates the moment a section is confirmed.
Both stages happen in real time. There is no batch queue and no overnight processing. Each section is its own independent recording, and each recording is capped at three minutes — long enough to walk through one role thoroughly, short enough that nothing rambles into the next section. If a clarifying detail is missing, Claude returns at most two short follow-up questions, and the candidate can skip either of them.
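The two stages and the limits described above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not SpeakResume's implementation: the `transcribe` and `extract` callables stand in for the Whisper and Claude calls, and the names `process_recording`, `REQUIRED_FIELDS`, `fields`, and `follow_ups` are hypothetical, as is the per-section schema.

```python
import json
from typing import Callable

MAX_RECORDING_SECONDS = 180   # three-minute cap per recording (from the article)
MAX_FOLLOW_UPS = 2            # at most two clarifying questions (from the article)

# Hypothetical per-section schema; the real field set is not documented here.
REQUIRED_FIELDS = {
    "experience": {"title", "company", "start_date", "bullets"},
    "skills": {"skills"},
}


def process_recording(
    section: str,
    audio: bytes,
    duration_seconds: float,
    transcribe: Callable[[bytes], str],
    extract: Callable[[str, str], str],
) -> dict:
    """Run one section's recording through transcription, then extraction."""
    if duration_seconds > MAX_RECORDING_SECONDS:
        raise ValueError("recording exceeds the three-minute cap")

    transcript = transcribe(audio)          # stage 1: speech to text
    raw = extract(section, transcript)      # stage 2: text to a JSON string

    draft = json.loads(raw)                 # raises on malformed JSON
    fields = draft.get("fields", {})
    missing = REQUIRED_FIELDS.get(section, set()) - set(fields)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    # Enforce the follow-up limit even if the model returns more.
    follow_ups = list(draft.get("follow_ups", []))[:MAX_FOLLOW_UPS]
    return {"section": section, "fields": fields, "follow_ups": follow_ups}
```

Passing the two stages in as callables keeps the sketch independent of any particular SDK, and it means the validation and limits can be exercised with stub functions.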
What does a SpeakResume build walk through?
A SpeakResume build walks through five sections in a fixed order: an introduction that captures the candidate’s name and target role, work experience, education, skills, and an "other" section for anything that does not fit the first four — certifications, languages, awards, volunteering. The professional summary at the top of the résumé is generated automatically after the build is complete.
The sections are independent. Each one opens with a short recruiter-style prompt so the candidate is never staring at a blank field, and each one ends with a confirmable draft. The candidate can re-record any section up to three times if the first take did not land, and can edit the structured fields directly without re-recording at all. The order is fixed because résumés are read top-down, and the order reflects how a recruiter scans them.
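The fixed order and the re-record limit can be modeled as a small state tracker. The sketch below is illustrative only: the `Build` class, its method names, and the section identifiers are hypothetical, and it assumes the three re-records are counted on top of the first take.

```python
from dataclasses import dataclass, field
from typing import Optional

SECTION_ORDER = ("introduction", "experience", "education", "skills", "other")
MAX_RERECORDS = 3  # re-records allowed per section, on top of the first take


@dataclass
class Build:
    """Tracks progress through the fixed section order (hypothetical model)."""
    confirmed: dict = field(default_factory=dict)   # section -> structured draft
    rerecords: dict = field(default_factory=dict)   # section -> re-records used

    def current_section(self) -> Optional[str]:
        """Return the first unconfirmed section, or None when the build is done."""
        for name in SECTION_ORDER:
            if name not in self.confirmed:
                return name
        return None

    def rerecord(self, section: str) -> int:
        """Register a re-record and return how many remain for the section."""
        used = self.rerecords.get(section, 0)
        if used >= MAX_RERECORDS:
            raise RuntimeError(f"no re-records left for {section}")
        self.rerecords[section] = used + 1
        return MAX_RERECORDS - used - 1

    def confirm(self, section: str, draft: dict) -> None:
        """Confirm a draft; only the current section may be confirmed."""
        if section != self.current_section():
            raise ValueError(f"expected {self.current_section()}, got {section}")
        self.confirmed[section] = draft
```

Direct edits to the structured fields are deliberately not counted here, since only recordings are limited.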
What if you misspeak or want to redo a section?
A misspoken section can be re-recorded up to three times per section, or edited directly without re-recording. After the initial voice extraction, every free-text field is editable inline, and a small AI Writer popover accepts a short directive — "shorter," "more concrete numbers," "less corporate" — to rewrite the field without losing what the candidate originally said. Nothing is auto-applied; every change requires an explicit confirm.
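The "nothing is auto-applied" rule amounts to keeping each rewrite as a pending proposal until the candidate accepts it. A minimal sketch, assuming a hypothetical `EditableField` type and an injected `rewrite` callable in place of the actual AI Writer:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EditableField:
    """A free-text field whose rewrites stay pending until confirmed."""
    original: str                    # what the candidate originally said
    value: str                       # text currently shown on the résumé
    proposal: Optional[str] = None   # rewrite waiting for a decision

    def propose(self, directive: str, rewrite: Callable[[str, str], str]) -> str:
        """Ask the injected rewriter for a new version; do not apply it."""
        self.proposal = rewrite(self.value, directive)
        return self.proposal

    def confirm(self) -> None:
        """Apply the pending proposal; the original wording is kept."""
        if self.proposal is None:
            raise RuntimeError("nothing to confirm")
        self.value, self.proposal = self.proposal, None

    def discard(self) -> None:
        """Drop the pending proposal and leave the field unchanged."""
        self.proposal = None
```

Keeping `original` separate from `value` is one way to rewrite a field repeatedly without losing the candidate's own words.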
The re-record limit exists to keep the build moving, and it applies only to recordings, not to direct edits. In practice, most candidates do not use more than one re-record per section, because the edit-in-place flow is faster than re-speaking a whole role. A bad transcript line can be rewritten in a few seconds; a missing role can be appended by recording just that role into the existing experience list.
How is the result different from a typed résumé?
A voice-built résumé reads differently from a typed one because the source material is different. Spoken career stories include numbers, team sizes, and outcomes that candidates routinely omit when typing — because typing rewards brevity and speaking rewards detail. The structured extraction keeps the substance and removes the conversational fillers, so the final document is denser than what most candidates would have typed unprompted.
The other practical difference is time. A full first draft on SpeakResume takes about ten minutes, compared to the hour or two most candidates spend on a from-scratch typed résumé. Building and previewing the résumé are always free; the user pays only when ready to download the final PDF. Three plans unlock the download — a one-time weekly pass, a monthly subscription, and a one-time three-month pass.
Key takeaways
- A SpeakResume build walks through five sections: introduction, experience, education, skills, and other.
- Each section is capped at three minutes of recording and at most two clarifying follow-up questions.
- The live preview updates each time a section is confirmed, so the document assembles section by section.