Transcribe

The /transcribe endpoint converts medical audio recordings into structured clinical notes. It transcribes the audio with speech recognition, then formats the transcript into a clinical note according to the specified template.

Endpoint

POST /transcribe

Authentication

This endpoint requires authentication using your API key. Include your API key in the x-api-key header with all requests. See Authentication for more details.

Request

The request must be sent as multipart/form-data with the following parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The audio file to transcribe. Must be one of the supported formats and under 100MB. |
| note_type | String | No | The clinical note type that determines the appropriate template and sections. See Note Types for available options. Cannot be used with custom_template. |
| template | String | No | The specific template to use for the medical note. If provided, this overrides the default template for the selected note_type. See Note Templates for available options. Defaults to SOAP if not provided. Cannot be used with custom_template. |
| custom_template | String | No | A custom template in raw text or markdown format. Use this to define your own sections and dynamic fields. See Custom Templates for formatting guidelines. Cannot be used with note_type or template. |
| language | String | No | The language of the audio. See Language Support for available options. Defaults to en (English). |
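
The exclusivity rules in the table can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the helper name build_form_fields is hypothetical, and it only mirrors the restriction that custom_template cannot be combined with note_type or template.

```python
from typing import Dict, Optional


def build_form_fields(
    note_type: Optional[str] = None,
    template: Optional[str] = None,
    custom_template: Optional[str] = None,
    language: str = "en",
) -> Dict[str, str]:
    """Return the non-file form fields for POST /transcribe.

    Raises ValueError when custom_template is combined with note_type or
    template, mirroring the restriction in the parameter table.
    """
    if custom_template is not None and (note_type is not None or template is not None):
        raise ValueError("custom_template cannot be combined with note_type or template")

    fields = {"language": language}
    # Only send the parameters the caller actually set.
    for name, value in (
        ("note_type", note_type),
        ("template", template),
        ("custom_template", custom_template),
    ):
        if value is not None:
            fields[name] = value
    return fields
```

The returned dictionary can be passed as the data argument of requests.post alongside the file upload.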

Response

The response is a JSON object with the following structure:

{
  "note": {
    "title": "Medical Consultation Note",
    "sections": [
      {
        "key": "SUBJECTIVE",
        "title": "Subjective",
        "text": "Patient reports chest pain for the past 3 days..."
      },
      {
        "key": "OBJECTIVE",
        "title": "Objective",
        "text": "BP 140/90, HR 88, RR 18, Temp 98.6F..."
      },
      {
        "key": "ASSESSMENT",
        "title": "Assessment",
        "text": "1. Acute coronary syndrome - likely..."
      },
      {
        "key": "PLAN",
        "title": "Plan",
        "text": "1. Admit to cardiology service..."
      }
    ]
  },
  "dynamic_fields": {
    "visual_acuity_right": "20/20",
    "visual_acuity_left": "20/25"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| note | Object | The structured medical note generated from the transcription. |
| note.title | String | The title of the medical note. |
| note.sections | Array | An array of sections in the note. The specific sections depend on the template used. |
| note.sections[].key | String | The identifier for the section (e.g., "SUBJECTIVE", "OBJECTIVE"). |
| note.sections[].title | String | The display title for the section. |
| note.sections[].text | String | The content of the section. |
| dynamic_fields | Object | (Optional) Dynamic key-value pairs extracted from the transcript based on the custom template's dynamic fields specification. Only present when using custom templates with dynamic fields. |
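
As an illustration of how the response fields fit together, the sketch below flattens a parsed response into plain text. The function name note_to_text is hypothetical. It relies only on the fields listed in the table and treats dynamic_fields as optional.

```python
from typing import Any, Dict


def note_to_text(result: Dict[str, Any]) -> str:
    """Flatten a parsed /transcribe response into a plain-text note."""
    note = result["note"]
    lines = [note["title"], ""]
    for section in note["sections"]:
        lines.append(f"{section['title']}:")
        lines.append(section["text"])
        lines.append("")
    # dynamic_fields is optional, so fall back to an empty mapping.
    for key, value in result.get("dynamic_fields", {}).items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines).rstrip()
```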

Key Concepts

Note Types

The note_type parameter allows you to specify the type of clinical note you want to generate. Each note type is associated with a default template that structures the output in a way that's appropriate for that type of clinical documentation.

Available note types:

| Note Type | Description | Default Template |
| --- | --- | --- |
| PROGRESS_NOTE | Documentation of a patient's progress during treatment | SOAP |
| ADMISSION_NOTE | Documentation of a patient's initial hospital admission | APSO |
| ED_NOTE | Documentation of an emergency department visit | CHEDDAR |
| DISCHARGE_SUMMARY | Summary of a patient's hospital stay upon discharge | MULTIPLE_SECTIONS |
| CONSULT_NOTE | Documentation of a specialist consultation | APSO |
| NURSING_NOTE | Documentation by nursing staff | PIE |
| BEHAVIORAL_HEALTH_NOTE | Documentation of behavioral health services | BIRP |
| DIETITIAN_NOTE | Documentation by a dietitian | ADIME |

Note Templates

The template parameter allows you to specify the structure of the generated clinical note. If provided, it overrides the default template associated with the selected note_type.

Available templates:

| Template | Description | Sections |
| --- | --- | --- |
| SOAP | Subjective, Objective, Assessment, Plan | SUBJECTIVE, OBJECTIVE, ASSESSMENT, PLAN |
| APSO | Assessment, Plan, Subjective, Objective (prioritizes assessment and plan) | ASSESSMENT, PLAN, SUBJECTIVE, OBJECTIVE |
| MULTIPLE_SECTIONS | Comprehensive note with multiple detailed sections | CHIEF_COMPLAINT, HISTORY_OF_PRESENT_ILLNESS, PAST_MEDICAL_HISTORY, PHYSICAL_EXAM, ASSESSMENT, PLAN |
| CHEDDAR | Chief complaint, History, Examination, Differential, Decision, Action, Review | CHIEF_COMPLAINT, HISTORY, EXAMINATION, DIFFERENTIAL, DECISION, ACTION, REVIEW |
| DAP | Data, Assessment, Plan | DATA, ASSESSMENT, PLAN |
| PIE | Problem, Intervention, Evaluation | PROBLEM, INTERVENTION, EVALUATION |
| SBAR | Situation, Background, Assessment, Recommendation | SITUATION, BACKGROUND, ASSESSMENT, RECOMMENDATION |
| BIRP | Behavior, Intervention, Response, Plan | BEHAVIOR, INTERVENTION, RESPONSE, PLAN |
| DAR | Data, Action, Response | DATA, ACTION, RESPONSE |
| ADIME | Assessment, Diagnosis, Intervention, Monitoring, Evaluation | ASSESSMENT, DIAGNOSIS, INTERVENTION, MONITORING, EVALUATION |
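
To predict which template a request will use, the note type defaults can be mirrored in a small lookup. This is a sketch of the documented precedence, not the server's implementation. The function name effective_template is hypothetical, and the SOAP fallback when neither parameter is sent is an assumption based on the template parameter's stated default.

```python
from typing import Optional

# Default template per note type, copied from the Note Types table.
DEFAULT_TEMPLATES = {
    "PROGRESS_NOTE": "SOAP",
    "ADMISSION_NOTE": "APSO",
    "ED_NOTE": "CHEDDAR",
    "DISCHARGE_SUMMARY": "MULTIPLE_SECTIONS",
    "CONSULT_NOTE": "APSO",
    "NURSING_NOTE": "PIE",
    "BEHAVIORAL_HEALTH_NOTE": "BIRP",
    "DIETITIAN_NOTE": "ADIME",
}


def effective_template(note_type: Optional[str] = None, template: Optional[str] = None) -> str:
    """Return the template expected to apply to a request."""
    if template is not None:
        # An explicit template overrides the note type's default.
        return template
    if note_type is not None:
        return DEFAULT_TEMPLATES[note_type]
    # Assumption: with neither parameter set, the API falls back to SOAP.
    return "SOAP"
```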

Custom Templates

The custom_template parameter allows you to define your own template structure using a simple markdown-like format. This is ideal for specialty-specific documentation (e.g., ophthalmology, cardiology) or when you need sections that aren't covered by the predefined templates.

Template Format

Custom templates use a simple text format with markdown-style headings:

# Section Name 1
Description of what should be included in this section

# Section Name 2
Description of what should be included in this section

## Dynamic Fields
- field_name_1: Description of the field
- field_name_2: Description of the field

Rules:

  • Use # (level-1 heading) to define sections
  • Each section should have a description that tells Knidian what content to extract
  • Use ## Dynamic Fields to define structured data fields that should be extracted separately
  • Dynamic fields use the format - field_name: description
  • Section names are automatically converted to keys (e.g., "Chief Complaint" becomes "CHIEF_COMPLAINT")
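
The rules above can be applied programmatically when templates are assembled from configuration. The sketch below builds a template string from section and field descriptions. Both function names are hypothetical, and section_key only approximates the name-to-key conversion: the exact server-side rule (for example, how punctuation is handled) is not documented here.

```python
import re
from typing import Dict


def section_key(name: str) -> str:
    """Approximate the documented conversion, e.g. "Chief Complaint" -> "CHIEF_COMPLAINT"."""
    # Assumption: runs of non-alphanumeric characters collapse to one underscore.
    return re.sub(r"[^A-Za-z0-9]+", "_", name.strip()).strip("_").upper()


def build_custom_template(sections: Dict[str, str], dynamic_fields: Dict[str, str]) -> str:
    """Render sections and dynamic fields in the custom template format."""
    parts = [f"# {name}\n{description}" for name, description in sections.items()]
    if dynamic_fields:
        field_lines = [f"- {field}: {description}" for field, description in dynamic_fields.items()]
        parts.append("## Dynamic Fields\n" + "\n".join(field_lines))
    return "\n\n".join(parts) + "\n"
```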

Dynamic Fields

Dynamic fields allow you to extract specific structured data from the transcript as key-value pairs. This is particularly useful for specialty-specific measurements or values that you want to process separately from the main note text.

Common dynamic field names by specialty:

Ophthalmology:

  • visual_acuity_right, visual_acuity_left - Visual acuity measurements
  • iop_right, iop_left - Intraocular pressure measurements
  • sphere_right, sphere_left - Spherical prescription values
  • cylinder_right, cylinder_left - Cylindrical prescription values
  • axis_right, axis_left - Axis values for astigmatism correction

Cardiology:

  • ejection_fraction - Left ventricular ejection fraction
  • troponin - Troponin levels
  • bnp - B-type natriuretic peptide

General Medicine:

  • blood_pressure_systolic, blood_pressure_diastolic - Blood pressure readings
  • heart_rate - Heart rate in beats per minute
  • temperature - Body temperature
  • respiratory_rate - Breaths per minute
  • oxygen_saturation - SpO2 percentage

Note: These are recommended naming conventions. You can use any field names that make sense for your use case. Use snake_case for consistency.
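
Dynamic field values appear as strings in the example responses (for instance "15 mmHg" or "20/40"), so numeric processing needs a parsing step. The sketch below is one possible approach under that assumption. The helper parse_quantity is hypothetical, and the value format is not guaranteed by the API, so unparseable values return None rather than a guess.

```python
import re
from typing import Optional, Tuple

# A number, optionally followed by a unit such as "mmHg", "%", or "mg/dL".
_QUANTITY = re.compile(r"\s*([-+]?\d+(?:\.\d+)?)\s*([A-Za-z%/]*)\s*")


def parse_quantity(value: str) -> Optional[Tuple[float, str]]:
    """Split a value like "15 mmHg" into (15.0, "mmHg").

    Returns None for values that are not a single number with an optional
    unit, such as the visual acuity fraction "20/40".
    """
    match = _QUANTITY.fullmatch(value)
    if match is None:
        return None
    return float(match.group(1)), match.group(2)
```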

Example: Ophthalmology Template

# Chief Complaint
Patient's main concern regarding their vision

# History of Present Illness
Detailed history of the current eye problem

# Visual Acuity Assessment
Results of visual acuity testing for both eyes

# Examination Findings
Findings from the eye examination including anterior and posterior segment

# Assessment
Clinical assessment and diagnosis

# Plan
Treatment plan including medications, procedures, or follow-up

## Dynamic Fields
- visual_acuity_right: Right eye visual acuity measurement
- visual_acuity_left: Left eye visual acuity measurement
- iop_right: Right eye intraocular pressure in mmHg
- iop_left: Left eye intraocular pressure in mmHg
- sphere_right: Right eye spherical prescription
- sphere_left: Left eye spherical prescription

Example: Cardiology Template

# Chief Complaint
Patient's cardiac-related complaint

# History of Present Illness
Detailed cardiac history

# Cardiovascular Examination
Findings from cardiovascular exam including heart sounds, murmurs, etc.

# Diagnostic Results
Results from ECG, echocardiogram, stress test, or cardiac biomarkers

# Assessment
Cardiac diagnosis or impression

# Plan
Treatment plan including medications, procedures, or interventions

## Dynamic Fields
- ejection_fraction: Left ventricular ejection fraction percentage
- troponin: Troponin level
- bnp: BNP or NT-proBNP level
- heart_rate: Resting heart rate in bpm
- blood_pressure_systolic: Systolic blood pressure
- blood_pressure_diastolic: Diastolic blood pressure

Integration Example with Custom Template

cURL:

curl -X POST https://api.knidian.ai/transcribe \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@ophthalmology_visit.mp3" \
-F 'custom_template=# Chief Complaint
Patient'\''s main vision concern

# Visual Acuity
Visual acuity test results

# Examination
Eye examination findings

# Assessment
Diagnosis

# Plan
Treatment plan

## Dynamic Fields
- visual_acuity_right: Right eye vision
- visual_acuity_left: Left eye vision
- iop_right: Right eye pressure
- iop_left: Left eye pressure' \
-F "language=en"

Python:

import requests

custom_template = """# Chief Complaint
Patient's main vision concern

# Visual Acuity
Visual acuity test results

# Examination
Eye examination findings

# Assessment
Diagnosis

# Plan
Treatment plan

## Dynamic Fields
- visual_acuity_right: Right eye vision
- visual_acuity_left: Left eye vision
- iop_right: Right eye pressure in mmHg
- iop_left: Left eye pressure in mmHg
"""

url = "https://api.knidian.ai/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}

data = {
    "custom_template": custom_template,
    "language": "en",
}

# Open the audio in a context manager so the file handle is always closed.
with open("ophthalmology_visit.mp3", "rb") as audio:
    files = {"file": ("ophthalmology_visit.mp3", audio, "audio/mpeg")}
    response = requests.post(url, headers=headers, files=files, data=data)

# Raise on 4xx/5xx responses instead of parsing an error body as a note.
response.raise_for_status()
result = response.json()

# Access sections
for section in result["note"]["sections"]:
    print(f"{section['title']}:")
    print(section["text"])
    print()

# Access dynamic fields (only present when the template defines them)
if "dynamic_fields" in result:
    print("Dynamic Fields:")
    for key, value in result["dynamic_fields"].items():
        print(f"  {key}: {value}")

Response Example:

{
  "note": {
    "title": "Ophthalmology Consultation",
    "sections": [
      {
        "key": "CHIEF_COMPLAINT",
        "title": "Chief Complaint",
        "text": "Patient reports blurry vision in the right eye for the past two weeks."
      },
      {
        "key": "VISUAL_ACUITY",
        "title": "Visual Acuity",
        "text": "Right eye: 20/40, Left eye: 20/20 with correction"
      },
      {
        "key": "EXAMINATION",
        "title": "Examination",
        "text": "Anterior segment: Clear cornea bilaterally. Normal anterior chamber depth. No iris abnormalities. Posterior segment: Right eye shows mild posterior subcapsular cataract. Left eye normal."
      },
      {
        "key": "ASSESSMENT",
        "title": "Assessment",
        "text": "Early posterior subcapsular cataract, right eye"
      },
      {
        "key": "PLAN",
        "title": "Plan",
        "text": "Discussed cataract surgery options. Patient prefers to monitor for now. Follow-up in 6 months or sooner if vision worsens."
      }
    ]
  },
  "dynamic_fields": {
    "visual_acuity_right": "20/40",
    "visual_acuity_left": "20/20",
    "iop_right": "15 mmHg",
    "iop_left": "14 mmHg"
  }
}

Audio Processing

The /transcribe endpoint supports the following audio formats:

  • audio/mpeg (MP3)
  • audio/mp4 (M4A)
  • audio/wav (WAV)
  • audio/x-wav (WAV)
  • audio/vnd.wave (WAV)
  • audio/wave (WAV)
  • audio/x-pn-wav (WAV)
  • audio/webm (WEBM)
  • audio/ogg (OGG)
  • audio/flac (FLAC)
  • audio/x-flac (FLAC)
  • audio/aac (AAC)
  • audio/x-aac (AAC)

The maximum file size is 100MB. Larger files will be rejected with a 413 error.
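
Checking the file before uploading avoids a round trip that would end in a 413 or 415 error. The sketch below is a client-side approximation only. The helper check_audio and the extension mapping are assumptions (the API validates the MIME type, not the extension), and "100MB" is read here as 100 × 1000 × 1000 bytes because the exact unit is not specified.

```python
import os

# Assumption: "100MB" means 100 * 1000 * 1000 bytes (the more conservative reading).
MAX_BYTES = 100 * 1000 * 1000

# One canonical MIME type per extension, taken from the supported formats list.
EXTENSION_TO_MIME = {
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
    ".aac": "audio/aac",
}


def check_audio(path: str, size_bytes: int) -> str:
    """Return the MIME type to send for path, or raise ValueError."""
    extension = os.path.splitext(path)[1].lower()
    mime_type = EXTENSION_TO_MIME.get(extension)
    if mime_type is None:
        raise ValueError(f"Unsupported audio extension: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"File is {size_bytes} bytes; the limit is {MAX_BYTES}")
    return mime_type
```

A typical call would be check_audio(path, os.path.getsize(path)), with the returned MIME type passed in the files tuple of the upload.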

The service performs diarization, which means it identifies different speakers in the audio and labels them in the transcript (e.g., "Speaker 1", "Speaker 2"). This helps maintain the conversational context in the generated note.

Language Support

The /transcribe endpoint supports the following languages:

| Language Code | Language Name |
| --- | --- |
| en | English (default) |
| es | Spanish |
| pt | Portuguese |

The language parameter affects both the speech recognition model used for transcription and the language of the generated clinical note. For optimal results, ensure the specified language matches the language spoken in the audio.

Medical Content Validation

The API checks if the transcribed content contains relevant medical discussion. If the content is not medical in nature (e.g., a casual conversation unrelated to healthcare), the API will return a 400 error with a message indicating that the audio does not contain relevant medical content.

This validation helps ensure that the service is used for its intended purpose and prevents processing of non-medical audio.

Integration Examples

cURL

curl -X POST https://api.knidian.ai/transcribe \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@patient_consultation.mp3" \
-F "note_type=PROGRESS_NOTE" \
-F "language=en"

Python

import requests

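url = "https://api.knidian.ai/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}

data = {
    "note_type": "PROGRESS_NOTE",
    "language": "en",
}

# Open the audio in a context manager so the file handle is always closed.
with open("patient_consultation.mp3", "rb") as audio:
    files = {"file": ("patient_consultation.mp3", audio, "audio/mpeg")}
    response = requests.post(url, headers=headers, files=files, data=data)

# Raise on 4xx/5xx responses instead of printing an error body as a note.
response.raise_for_status()
print(response.json())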

Node.js

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
const path = require('path');

async function transcribeAudio() {
  const form = new FormData();
  form.append('file', fs.createReadStream(path.resolve('./patient_consultation.mp3')));
  form.append('note_type', 'PROGRESS_NOTE');
  form.append('language', 'en');

  try {
    const response = await axios.post('https://api.knidian.ai/transcribe', form, {
      headers: {
        ...form.getHeaders(),
        'x-api-key': 'YOUR_API_KEY'
      }
    });
    console.log(response.data);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
}

transcribeAudio();

Error Handling

| Status Code | Description |
| --- | --- |
| 400 | Bad Request - Invalid parameters or the audio does not contain relevant medical content |
| 401 | Unauthorized - Invalid or missing API key |
| 402 | Payment Required - Subscription limit reached or payment required |
| 413 | Payload Too Large - Audio file exceeds the 100MB limit |
| 415 | Unsupported Media Type - Unsupported audio format |
| 500 | Internal Server Error - An unexpected error occurred on the server |
| 503 | Service Unavailable - The service is temporarily unavailable |
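
A client can translate these status codes into a message and a retry decision. The sketch below is one possible mapping, with the descriptions taken from the table. The helper describe_status is hypothetical, and treating only 503 as retryable is an assumption rather than documented API guidance.

```python
from typing import Tuple

# Meanings condensed from the status code table.
STATUS_MEANINGS = {
    400: "Invalid parameters or audio without relevant medical content",
    401: "Invalid or missing API key",
    402: "Subscription limit reached or payment required",
    413: "Audio file exceeds the 100MB limit",
    415: "Unsupported audio format",
    500: "Unexpected server error",
    503: "Service temporarily unavailable",
}

# Assumption: only a temporary outage is worth retrying automatically.
RETRYABLE_STATUSES = {503}


def describe_status(status_code: int) -> Tuple[str, bool]:
    """Return (meaning, should_retry) for a response status code."""
    if 200 <= status_code < 300:
        return "Success", False
    meaning = STATUS_MEANINGS.get(status_code, "Undocumented status code")
    return meaning, status_code in RETRYABLE_STATUSES
```

With requests, this could be called as describe_status(response.status_code) before deciding whether to parse the body or resubmit the file.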

Conversation Transcription

The /transcribe/conversation endpoint allows you to convert medical audio recordings into a structured conversation format with speaker identification. This is ideal for creating chat-like interfaces displaying doctor-patient conversations.

Endpoint

POST /transcribe/conversation

Request

The request must be sent as multipart/form-data with the following parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| file | File | Yes | The audio file to transcribe. Must be one of the supported formats and under 100MB. |
| language | String | No | The language of the audio. See Language Support for available options. Defaults to en (English). |
| include_timestamps | Boolean | No | Include timestamps for each message. Defaults to true. |

Response

The response is a JSON object with the following structure:

{
  "conversation": {
    "messages": [
      {
        "id": "msg-1",
        "speaker": 0,
        "speaker_label": "Speaker 0",
        "text": "Good morning! What brings you in today?",
        "start_time": 0.5,
        "end_time": 2.3,
        "confidence": 0.95
      },
      {
        "id": "msg-2",
        "speaker": 1,
        "speaker_label": "Speaker 1",
        "text": "I've been having chest pain for the past three days.",
        "start_time": 2.8,
        "end_time": 5.1,
        "confidence": 0.92
      }
    ],
    "metadata": {
      "total_duration": 120.5,
      "total_messages": 15,
      "speaker_count": 2,
      "language": "en"
    }
  }
}

| Field | Type | Description |
| --- | --- | --- |
| conversation | Object | The structured conversation with messages and metadata. |
| conversation.messages | Array | Array of conversation messages in chronological order. |
| conversation.messages[].id | String | Unique identifier for the message. |
| conversation.messages[].speaker | Number | Speaker number assigned by the transcription service (0, 1, 2, etc.). |
| conversation.messages[].speaker_label | String | Human-readable speaker label (e.g., "Speaker 0", "Speaker 1"). |
| conversation.messages[].text | String | The text content of the message. |
| conversation.messages[].start_time | Number | Start time of the message in seconds (if include_timestamps is true). |
| conversation.messages[].end_time | Number | End time of the message in seconds (if include_timestamps is true). |
| conversation.messages[].confidence | Number | Average confidence score for the message (if include_timestamps is true). |
| conversation.metadata | Object | Metadata about the conversation. |
| conversation.metadata.total_duration | Number | Total duration of the audio in seconds. |
| conversation.metadata.total_messages | Number | Total number of messages in the conversation. |
| conversation.metadata.speaker_count | Number | Number of unique speakers detected. |
| conversation.metadata.language | String | Language of the conversation. |
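
The timestamp fields make simple per-speaker statistics possible. As a sketch, the hypothetical helper below sums speaking time per speaker label. It skips messages without timestamps, which is the expected shape when include_timestamps is false.

```python
from collections import defaultdict
from typing import Any, Dict


def talk_time_by_speaker(response: Dict[str, Any]) -> Dict[str, float]:
    """Sum (end_time - start_time) in seconds for each speaker_label."""
    totals: Dict[str, float] = defaultdict(float)
    for message in response["conversation"]["messages"]:
        # Timestamps are only present when include_timestamps is true.
        if "start_time" in message and "end_time" in message:
            totals[message["speaker_label"]] += message["end_time"] - message["start_time"]
    return dict(totals)
```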

Speaker Identification

The service uses automatic speaker diarization to identify different speakers in the conversation. Speakers are assigned numeric IDs (0, 1, 2, etc.) based on when they first speak in the audio. The service cannot determine which speaker is the doctor and which is the patient - this should be handled by your frontend application or through user input.
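
Because roles are not detected, an application typically asks the user which speaker is the clinician and applies that mapping to the messages. The following is a minimal sketch of that step. The helper label_roles and the role names are hypothetical, and unmapped speakers keep the label returned by the API.

```python
from typing import Any, Dict, List


def label_roles(messages: List[Dict[str, Any]], roles: Dict[int, str]) -> List[str]:
    """Format messages as "Role: text" using a user-supplied speaker-to-role map."""
    lines = []
    for message in messages:
        # Fall back to the API's own label when no role was assigned.
        name = roles.get(message["speaker"], message["speaker_label"])
        lines.append(f"{name}: {message['text']}")
    return lines
```

For example, label_roles(messages, {0: "Doctor", 1: "Patient"}) would label the two speakers from the sample response once the user has confirmed who spoke first.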

Integration Examples

cURL

curl -X POST https://api.knidian.ai/transcribe/conversation \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@doctor_patient_consultation.mp3" \
-F "language=en" \
-F "include_timestamps=true"

Python

import requests

url = "https://api.knidian.ai/transcribe/conversation"
headers = {"x-api-key": "YOUR_API_KEY"}

data = {
    "language": "en",
    "include_timestamps": "true",
}

# Open the audio in a context manager so the file handle is always closed.
with open("consultation.mp3", "rb") as audio:
    files = {"file": ("consultation.mp3", audio, "audio/mpeg")}
    response = requests.post(url, headers=headers, files=files, data=data)

# Raise on 4xx/5xx responses instead of parsing an error body.
response.raise_for_status()
conversation = response.json()

# Display the conversation
for message in conversation["conversation"]["messages"]:
    print(f"{message['speaker_label']}: {message['text']}")
    if "start_time" in message:
        print(f"  Time: {message['start_time']:.1f}s - {message['end_time']:.1f}s")

Node.js

const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');

async function transcribeConversation() {
  const form = new FormData();
  form.append('file', fs.createReadStream('./consultation.mp3'));
  form.append('language', 'en');
  form.append('include_timestamps', 'true');

  try {
    const response = await axios.post(
      'https://api.knidian.ai/transcribe/conversation',
      form,
      {
        headers: {
          ...form.getHeaders(),
          'x-api-key': 'YOUR_API_KEY'
        }
      }
    );

    // Display the conversation
    response.data.conversation.messages.forEach(message => {
      console.log(`${message.speaker_label}: ${message.text}`);
      if (message.start_time !== undefined) {
        console.log(`  Time: ${message.start_time}s - ${message.end_time}s`);
      }
    });
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
  }
}

transcribeConversation();

Usage Tracking

The API tracks the following metrics for each transcription request:

  • Request duration
  • Audio duration (in seconds)
  • Knidian's AI Engine processing time (for /transcribe endpoint)
  • Success/failure status

These metrics are used for billing purposes and to improve the service. They are associated with your API key and organization.