Transcribe
The /transcribe endpoint converts medical audio recordings into structured clinical notes. It transcribes the audio with speech recognition, then formats the transcript into a clinical note according to the specified template.
Endpoint
POST /transcribe
Authentication
This endpoint requires authentication using your API key. Include your API key in the x-api-key header with all requests. See Authentication for more details.
Request
The request must be sent as multipart/form-data with the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The audio file to transcribe. Must be one of the supported formats and under 100MB. |
| note_type | String | No | The clinical note type that determines the appropriate template and sections. See Note Types for available options. Cannot be used with custom_template. |
| template | String | No | The specific template to use for the medical note. If provided, this overrides the default template for the selected note_type. See Note Templates for available options. Defaults to SOAP if not provided. Cannot be used with custom_template. |
| custom_template | String | No | A custom template in raw text or markdown format. Use this to define your own sections and dynamic fields. See Custom Templates for formatting guidelines. Cannot be used with note_type or template. |
| language | String | No | The language of the audio. See Language Support for available options. Defaults to en (English). |
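Because custom_template cannot be combined with note_type or template, it can help to check the combination on the client before uploading a large file. The following is a minimal sketch of such a check; the helper name build_form_fields is hypothetical and not part of the API.

```python
from typing import Dict, Optional


def build_form_fields(
    note_type: Optional[str] = None,
    template: Optional[str] = None,
    custom_template: Optional[str] = None,
    language: str = "en",
) -> Dict[str, str]:
    """Return the non-file form fields, rejecting invalid combinations."""
    # custom_template is mutually exclusive with note_type and template.
    if custom_template is not None and (note_type is not None or template is not None):
        raise ValueError("custom_template cannot be used with note_type or template")

    fields = {"language": language}
    for name, value in (
        ("note_type", note_type),
        ("template", template),
        ("custom_template", custom_template),
    ):
        # Only send the parameters the caller actually set.
        if value is not None:
            fields[name] = value
    return fields
```

The returned dictionary can be passed as the data argument of requests.post alongside the file upload.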
Response
The response is a JSON object with the following structure:
{
  "note": {
    "title": "Medical Consultation Note",
    "sections": [
      {
        "key": "SUBJECTIVE",
        "title": "Subjective",
        "text": "Patient reports chest pain for the past 3 days..."
      },
      {
        "key": "OBJECTIVE",
        "title": "Objective",
        "text": "BP 140/90, HR 88, RR 18, Temp 98.6F..."
      },
      {
        "key": "ASSESSMENT",
        "title": "Assessment",
        "text": "1. Acute coronary syndrome - likely..."
      },
      {
        "key": "PLAN",
        "title": "Plan",
        "text": "1. Admit to cardiology service..."
      }
    ]
  },
  "dynamic_fields": {
    "visual_acuity_right": "20/20",
    "visual_acuity_left": "20/25"
  }
}
| Field | Type | Description |
|---|---|---|
| note | Object | The structured medical note generated from the transcription. |
| note.title | String | The title of the medical note. |
| note.sections | Array | An array of sections in the note. The specific sections depend on the template used. |
| note.sections[].key | String | The identifier for the section (e.g., "SUBJECTIVE", "OBJECTIVE"). |
| note.sections[].title | String | The display title for the section. |
| note.sections[].text | String | The content of the section. |
| dynamic_fields | Object | (Optional) Dynamic key-value pairs extracted from the transcript based on the custom template's dynamic fields specification. Only present when using custom templates with dynamic fields. |
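The fields above can be consumed with a couple of small helpers. This is a sketch only: sections_by_key and render_note are hypothetical names, and the sample dictionary is a shortened copy of the response structure shown above.

```python
from typing import Any, Dict


def sections_by_key(result: Dict[str, Any]) -> Dict[str, str]:
    """Map each section key to its text, keeping the order of the response."""
    return {section["key"]: section["text"] for section in result["note"]["sections"]}


def render_note(result: Dict[str, Any]) -> str:
    """Render the note as plain text: the title, then each section title and text."""
    note = result["note"]
    lines = [note["title"], ""]
    for section in note["sections"]:
        lines.append(f"{section['title']}:")
        lines.append(section["text"])
        lines.append("")
    return "\n".join(lines).rstrip()


sample = {
    "note": {
        "title": "Medical Consultation Note",
        "sections": [
            {"key": "SUBJECTIVE", "title": "Subjective", "text": "Patient reports chest pain..."},
            {"key": "PLAN", "title": "Plan", "text": "1. Admit to cardiology service..."},
        ],
    }
}

print(render_note(sample))
# dynamic_fields is optional, so read it with a default.
print(sample.get("dynamic_fields", {}))
```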
Key Concepts
Note Types
The note_type parameter allows you to specify the type of clinical note you want to generate. Each note type is associated with a default template that structures the output in a way that's appropriate for that type of clinical documentation.
Available note types:
| Note Type | Description | Default Template |
|---|---|---|
| PROGRESS_NOTE | Documentation of a patient's progress during treatment | SOAP |
| ADMISSION_NOTE | Documentation of a patient's initial hospital admission | APSO |
| ED_NOTE | Documentation of an emergency department visit | CHEDDAR |
| DISCHARGE_SUMMARY | Summary of a patient's hospital stay upon discharge | MULTIPLE_SECTIONS |
| CONSULT_NOTE | Documentation of a specialist consultation | APSO |
| NURSING_NOTE | Documentation by nursing staff | PIE |
| BEHAVIORAL_HEALTH_NOTE | Documentation of behavioral health services | BIRP |
| DIETITIAN_NOTE | Documentation by a dietitian | ADIME |
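If your client needs to know in advance which template a request will produce, the table can be mirrored locally. The sketch below assumes that an explicit template wins, that an omitted template falls back to the note type's default, and that SOAP applies when neither is given; the function name effective_template is hypothetical, and the server remains the source of truth.

```python
from typing import Optional

# Default template per note type, copied from the table above.
DEFAULT_TEMPLATES = {
    "PROGRESS_NOTE": "SOAP",
    "ADMISSION_NOTE": "APSO",
    "ED_NOTE": "CHEDDAR",
    "DISCHARGE_SUMMARY": "MULTIPLE_SECTIONS",
    "CONSULT_NOTE": "APSO",
    "NURSING_NOTE": "PIE",
    "BEHAVIORAL_HEALTH_NOTE": "BIRP",
    "DIETITIAN_NOTE": "ADIME",
}


def effective_template(note_type: Optional[str] = None, template: Optional[str] = None) -> str:
    """Predict the template a request would use (assumed precedence, see lead-in)."""
    if template is not None:
        return template  # an explicit template overrides the note type default
    if note_type is None:
        return "SOAP"  # documented default when nothing is specified
    if note_type not in DEFAULT_TEMPLATES:
        raise ValueError(f"unknown note_type: {note_type!r}")
    return DEFAULT_TEMPLATES[note_type]
```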
Note Templates
The template parameter allows you to specify the structure of the generated clinical note. If provided, it overrides the default template associated with the selected note_type.
Available templates:
| Template | Description | Sections |
|---|---|---|
| SOAP | Subjective, Objective, Assessment, Plan | SUBJECTIVE, OBJECTIVE, ASSESSMENT, PLAN |
| APSO | Assessment, Plan, Subjective, Objective (prioritizes assessment and plan) | ASSESSMENT, PLAN, SUBJECTIVE, OBJECTIVE |
| MULTIPLE_SECTIONS | Comprehensive note with multiple detailed sections | CHIEF_COMPLAINT, HISTORY_OF_PRESENT_ILLNESS, PAST_MEDICAL_HISTORY, PHYSICAL_EXAM, ASSESSMENT, PLAN |
| CHEDDAR | Chief complaint, History, Examination, Differential, Decision, Action, Review | CHIEF_COMPLAINT, HISTORY, EXAMINATION, DIFFERENTIAL, DECISION, ACTION, REVIEW |
| DAP | Data, Assessment, Plan | DATA, ASSESSMENT, PLAN |
| PIE | Problem, Intervention, Evaluation | PROBLEM, INTERVENTION, EVALUATION |
| SBAR | Situation, Background, Assessment, Recommendation | SITUATION, BACKGROUND, ASSESSMENT, RECOMMENDATION |
| BIRP | Behavior, Intervention, Response, Plan | BEHAVIOR, INTERVENTION, RESPONSE, PLAN |
| DAR | Data, Action, Response | DATA, ACTION, RESPONSE |
| ADIME | Assessment, Diagnosis, Intervention, Monitoring, Evaluation | ASSESSMENT, DIAGNOSIS, INTERVENTION, MONITORING, EVALUATION |
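The section keys in this table can be used to sanity-check a response before storing it. The sketch below assumes the API returns one section per listed key, which the table implies but does not guarantee; missing_sections is a hypothetical helper name.

```python
from typing import Any, Dict, List

# Expected section keys per template, copied from the table above.
EXPECTED_SECTIONS = {
    "SOAP": ["SUBJECTIVE", "OBJECTIVE", "ASSESSMENT", "PLAN"],
    "APSO": ["ASSESSMENT", "PLAN", "SUBJECTIVE", "OBJECTIVE"],
    "MULTIPLE_SECTIONS": [
        "CHIEF_COMPLAINT", "HISTORY_OF_PRESENT_ILLNESS", "PAST_MEDICAL_HISTORY",
        "PHYSICAL_EXAM", "ASSESSMENT", "PLAN",
    ],
    "CHEDDAR": [
        "CHIEF_COMPLAINT", "HISTORY", "EXAMINATION", "DIFFERENTIAL",
        "DECISION", "ACTION", "REVIEW",
    ],
    "DAP": ["DATA", "ASSESSMENT", "PLAN"],
    "PIE": ["PROBLEM", "INTERVENTION", "EVALUATION"],
    "SBAR": ["SITUATION", "BACKGROUND", "ASSESSMENT", "RECOMMENDATION"],
    "BIRP": ["BEHAVIOR", "INTERVENTION", "RESPONSE", "PLAN"],
    "DAR": ["DATA", "ACTION", "RESPONSE"],
    "ADIME": ["ASSESSMENT", "DIAGNOSIS", "INTERVENTION", "MONITORING", "EVALUATION"],
}


def missing_sections(template: str, result: Dict[str, Any]) -> List[str]:
    """Return the expected section keys that are absent from the response."""
    present = {section["key"] for section in result["note"]["sections"]}
    return [key for key in EXPECTED_SECTIONS[template] if key not in present]
```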
Custom Templates
The custom_template parameter allows you to define your own template structure using a simple markdown-like format. This is ideal for specialty-specific documentation (e.g., ophthalmology, cardiology) or when you need sections that aren't covered by the predefined templates.
Template Format
Custom templates use a simple text format with markdown-style headings:
# Section Name 1
Description of what should be included in this section
# Section Name 2
Description of what should be included in this section
## Dynamic Fields
- field_name_1: Description of the field
- field_name_2: Description of the field
Rules:
- Use # (level-1 heading) to define sections
- Each section should have a description that tells Knidian what content to extract
- Use ## Dynamic Fields to define structured data fields that should be extracted separately
- Dynamic fields use the format - field_name: description
- Section names are automatically converted to keys (e.g., "Chief Complaint" becomes "CHIEF_COMPLAINT")
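The rules above can be applied programmatically when templates are assembled from configuration rather than written by hand. This is a minimal sketch: build_custom_template and section_key are hypothetical helpers, and section_key only approximates the name-to-key conversion (uppercase with underscores), since the exact server-side rule is documented by a single example.

```python
import re
from typing import Dict


def section_key(name: str) -> str:
    """Approximate the section key for a name (assumed rule, not confirmed)."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").upper()


def build_custom_template(sections: Dict[str, str], dynamic_fields: Dict[str, str]) -> str:
    """Assemble a custom template from section descriptions and dynamic fields."""
    lines = []
    for name, description in sections.items():
        lines.append(f"# {name}")  # level-1 heading defines a section
        lines.append(description)  # tells the service what to extract
    if dynamic_fields:
        lines.append("## Dynamic Fields")
        for field, description in dynamic_fields.items():
            lines.append(f"- {field}: {description}")
    return "\n".join(lines) + "\n"


template = build_custom_template(
    {"Chief Complaint": "Patient's main vision concern", "Plan": "Treatment plan"},
    {"iop_right": "Right eye pressure in mmHg"},
)
print(template)
```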
Dynamic Fields
Dynamic fields allow you to extract specific structured data from the transcript as key-value pairs. This is particularly useful for specialty-specific measurements or values that you want to process separately from the main note text.
Common dynamic field names by specialty:
Ophthalmology:
- visual_acuity_right, visual_acuity_left - Visual acuity measurements
- iop_right, iop_left - Intraocular pressure measurements
- sphere_right, sphere_left - Spherical prescription values
- cylinder_right, cylinder_left - Cylindrical prescription values
- axis_right, axis_left - Axis values for astigmatism correction
Cardiology:
- ejection_fraction - Left ventricular ejection fraction
- troponin - Troponin levels
- bnp - B-type natriuretic peptide
General Medicine:
- blood_pressure_systolic, blood_pressure_diastolic - Blood pressure readings
- heart_rate - Heart rate in beats per minute
- temperature - Body temperature
- respiratory_rate - Breaths per minute
- oxygen_saturation - SpO2 percentage
Note: These are recommended naming conventions. You can use any field names that make sense for your use case. Use snake_case for consistency.
Example: Ophthalmology Template
# Chief Complaint
Patient's main concern regarding their vision
# History of Present Illness
Detailed history of the current eye problem
# Visual Acuity Assessment
Results of visual acuity testing for both eyes
# Examination Findings
Findings from the eye examination including anterior and posterior segment
# Assessment
Clinical assessment and diagnosis
# Plan
Treatment plan including medications, procedures, or follow-up
## Dynamic Fields
- visual_acuity_right: Right eye visual acuity measurement
- visual_acuity_left: Left eye visual acuity measurement
- iop_right: Right eye intraocular pressure in mmHg
- iop_left: Left eye intraocular pressure in mmHg
- sphere_right: Right eye spherical prescription
- sphere_left: Left eye spherical prescription
Example: Cardiology Template
# Chief Complaint
Patient's cardiac-related complaint
# History of Present Illness
Detailed cardiac history
# Cardiovascular Examination
Findings from cardiovascular exam including heart sounds, murmurs, etc.
# Diagnostic Results
Results from ECG, echocardiogram, stress test, or cardiac biomarkers
# Assessment
Cardiac diagnosis or impression
# Plan
Treatment plan including medications, procedures, or interventions
## Dynamic Fields
- ejection_fraction: Left ventricular ejection fraction percentage
- troponin: Troponin level
- bnp: BNP or NT-proBNP level
- heart_rate: Resting heart rate in bpm
- blood_pressure_systolic: Systolic blood pressure
- blood_pressure_diastolic: Diastolic blood pressure
Integration Example with Custom Template
cURL:
curl -X POST https://api.knidian.ai/transcribe \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@ophthalmology_visit.mp3" \
-F 'custom_template=# Chief Complaint
Patient'\''s main vision concern
# Visual Acuity
Visual acuity test results
# Examination
Eye examination findings
# Assessment
Diagnosis
# Plan
Treatment plan
## Dynamic Fields
- visual_acuity_right: Right eye vision
- visual_acuity_left: Left eye vision
- iop_right: Right eye pressure
- iop_left: Left eye pressure' \
-F "language=en"
Python:
import requests
custom_template = """# Chief Complaint
Patient's main vision concern
# Visual Acuity
Visual acuity test results
# Examination
Eye examination findings
# Assessment
Diagnosis
# Plan
Treatment plan
## Dynamic Fields
- visual_acuity_right: Right eye vision
- visual_acuity_left: Left eye vision
- iop_right: Right eye pressure in mmHg
- iop_left: Left eye pressure in mmHg
"""
url = "https://api.knidian.ai/transcribe"
headers = {"x-api-key": "YOUR_API_KEY"}
data = {
    "custom_template": custom_template,
    "language": "en"
}
# Open the audio in a context manager so the file handle is closed after the upload
with open("ophthalmology_visit.mp3", "rb") as audio:
    files = {"file": ("ophthalmology_visit.mp3", audio, "audio/mpeg")}
    response = requests.post(url, headers=headers, files=files, data=data)
result = response.json()
# Access sections
for section in result["note"]["sections"]:
    print(f"{section['title']}:")
    print(section['text'])
    print()
# Access dynamic fields
if "dynamic_fields" in result:
    print("Dynamic Fields:")
    for key, value in result["dynamic_fields"].items():
        print(f" {key}: {value}")
Response Example:
{
  "note": {
    "title": "Ophthalmology Consultation",
    "sections": [
      {
        "key": "CHIEF_COMPLAINT",
        "title": "Chief Complaint",
        "text": "Patient reports blurry vision in the right eye for the past two weeks."
      },
      {
        "key": "VISUAL_ACUITY",
        "title": "Visual Acuity",
        "text": "Right eye: 20/40, Left eye: 20/20 with correction"
      },
      {
        "key": "EXAMINATION",
        "title": "Examination",
        "text": "Anterior segment: Clear cornea bilaterally. Normal anterior chamber depth. No iris abnormalities. Posterior segment: Right eye shows mild posterior subcapsular cataract. Left eye normal."
      },
      {
        "key": "ASSESSMENT",
        "title": "Assessment",
        "text": "Early posterior subcapsular cataract, right eye"
      },
      {
        "key": "PLAN",
        "title": "Plan",
        "text": "Discussed cataract surgery options. Patient prefers to monitor for now. Follow-up in 6 months or sooner if vision worsens."
      }
    ]
  },
  "dynamic_fields": {
    "visual_acuity_right": "20/40",
    "visual_acuity_left": "20/20",
    "iop_right": "15 mmHg",
    "iop_left": "14 mmHg"
  }
}
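In the example response, dynamic field values arrive as strings, some with units ("15 mmHg") and some that are not plain numbers ("20/40"). If you need numeric values downstream, a tolerant parser is one option. The sketch below is based only on the example values shown here; the value format is not specified, and parse_measurement and numeric_fields are hypothetical helpers.

```python
import re
from typing import Dict, Optional, Tuple

# A single number with an optional unit made of letters, "%" or "/".
_MEASUREMENT = re.compile(r"\s*(-?\d+(?:\.\d+)?)\s*([A-Za-z%][A-Za-z%/]*)?\s*")


def parse_measurement(value: str) -> Optional[Tuple[float, str]]:
    """Split a value like '15 mmHg' into (15.0, 'mmHg'); return None otherwise."""
    match = _MEASUREMENT.fullmatch(value)
    if match is None:
        return None  # e.g. '20/40' is a ratio, not a single measurement
    return float(match.group(1)), match.group(2) or ""


def numeric_fields(dynamic_fields: Dict[str, str]) -> Dict[str, Tuple[float, str]]:
    """Keep only the dynamic fields that parse as a single number with a unit."""
    parsed = {}
    for key, value in dynamic_fields.items():
        result = parse_measurement(value)
        if result is not None:
            parsed[key] = result
    return parsed
```

Values that do not parse, such as visual acuity ratios, are left out so the caller can handle them as text.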
Audio Processing
The /transcribe endpoint supports the following audio formats:
- audio/mpeg (MP3)
- audio/mp4 (M4A)
- audio/wav (WAV)
- audio/x-wav (WAV)
- audio/vnd.wave (WAV)
- audio/wave (WAV)
- audio/x-pn-wav (WAV)
- audio/webm (WEBM)
- audio/ogg (OGG)
- audio/flac (FLAC)
- audio/x-flac (FLAC)
- audio/aac (AAC)
- audio/x-aac (AAC)
The maximum file size is 100MB. Larger files will be rejected with a 413 error.
The service performs diarization, which means it identifies different speakers in the audio and labels them in the transcript (e.g., "Speaker 1", "Speaker 2"). This helps maintain the conversational context in the generated note.
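A client-side pre-flight check can catch oversized or unsupported files before an upload that would end in a 413 or 415 error. The sketch below makes two assumptions that are labeled in the code: "100MB" is read conservatively as 100,000,000 bytes, and the file extension is mapped to one MIME type from the supported list. The helper name check_audio is hypothetical.

```python
import os

# Assumption: "100MB" interpreted conservatively as 100 * 1000 * 1000 bytes.
MAX_BYTES = 100 * 1000 * 1000

# Assumption: extension-to-MIME mapping chosen from the supported formats list.
MIME_BY_EXTENSION = {
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".wav": "audio/wav",
    ".webm": "audio/webm",
    ".ogg": "audio/ogg",
    ".flac": "audio/flac",
    ".aac": "audio/aac",
}


def check_audio(filename: str, size_bytes: int) -> str:
    """Return the MIME type to send, or raise ValueError if the file would be rejected."""
    extension = os.path.splitext(filename)[1].lower()
    mime_type = MIME_BY_EXTENSION.get(extension)
    if mime_type is None:
        raise ValueError(f"unsupported audio extension: {extension!r}")
    if size_bytes > MAX_BYTES:
        raise ValueError(f"file is {size_bytes} bytes; the limit is {MAX_BYTES}")
    return mime_type
```

In practice the size can come from os.path.getsize(path), and the returned MIME type can be used as the third element of the file tuple passed to requests.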
Language Support
The /transcribe endpoint supports the following languages:
| Language Code | Language Name |
|---|---|
| en | English (default) |
| es | Spanish |
| pt | Portuguese |
The language parameter affects both the speech recognition model used for transcription and the language of the generated clinical note. For optimal results, ensure the specified language matches the language spoken in the audio.
Medical Content Validation
The API checks if the transcribed content contains relevant medical discussion. If the content is not medical in nature (e.g., a casual conversation unrelated to healthcare), the API will return a 400 error with a message indicating that the audio does not contain relevant medical content.
This validation helps ensure that the service is used for its intended purpose and prevents processing of non-medical audio.
Integration Examples
cURL
curl -X POST https://api.knidian.ai/transcribe \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@patient_consultation.mp3" \
-F "note_type=PROGRESS_NOTE" \
-F "language=en"
Python
import requests
url = "https://api.knidian.ai/transcribe"
headers = {
    "x-api-key": "YOUR_API_KEY"
}
data = {
    "note_type": "PROGRESS_NOTE",
    "language": "en"
}
# Open the audio in a context manager so the file handle is closed after the upload
with open("patient_consultation.mp3", "rb") as audio:
    files = {"file": ("patient_consultation.mp3", audio, "audio/mpeg")}
    response = requests.post(url, headers=headers, files=files, data=data)
print(response.json())
Node.js
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
const path = require('path');
async function transcribeAudio() {
  const form = new FormData();
  form.append('file', fs.createReadStream(path.resolve('./patient_consultation.mp3')));
  form.append('note_type', 'PROGRESS_NOTE');
  form.append('language', 'en');
  try {
    const response = await axios.post('https://api.knidian.ai/transcribe', form, {
      headers: {
        ...form.getHeaders(),
        'x-api-key': 'YOUR_API_KEY'
      }
    });
    console.log(response.data);
  } catch (error) {
    console.error('Error:', error.response ? error.response.data : error.message);
  }
}
transcribeAudio();
Error Handling
| Status Code | Description |
|---|---|
| 400 | Bad Request - Invalid parameters or the audio does not contain relevant medical content |
| 401 | Unauthorized - Invalid or missing API key |
| 402 | Payment Required - Subscription limit reached or payment required |
| 413 | Payload Too Large - Audio file exceeds the 100MB limit |
| 415 | Unsupported Media Type - Unsupported audio format |
| 500 | Internal Server Error - An unexpected error occurred on the server |
| 503 | Service Unavailable - The service is temporarily unavailable |
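One way to act on these status codes is a small lookup that tells the caller whether a retry is worthwhile. The sketch below treats only 500 and 503 as retryable, which is a common convention and an assumption here rather than documented Knidian behavior; ErrorAdvice and advise are hypothetical names.

```python
from typing import NamedTuple


class ErrorAdvice(NamedTuple):
    retry: bool   # whether sending the same request again may succeed
    action: str   # what the caller should do next


# Retry flags are assumptions; descriptions follow the table above.
ADVICE = {
    400: ErrorAdvice(False, "check parameters and that the audio contains medical content"),
    401: ErrorAdvice(False, "check the x-api-key header"),
    402: ErrorAdvice(False, "check subscription limits or payment status"),
    413: ErrorAdvice(False, "reduce the audio file below the 100MB limit"),
    415: ErrorAdvice(False, "convert the audio to a supported format"),
    500: ErrorAdvice(True, "retry after a delay"),
    503: ErrorAdvice(True, "retry after a delay"),
}


def advise(status_code: int) -> ErrorAdvice:
    """Return handling advice for an error status code from the API."""
    return ADVICE.get(status_code, ErrorAdvice(False, "unexpected status; inspect the response body"))
```

A caller could check advise(response.status_code).retry before scheduling another attempt with a backoff delay.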
Conversation Transcription
The /transcribe/conversation endpoint allows you to convert medical audio recordings into a structured conversation format with speaker identification. This is ideal for creating chat-like interfaces displaying doctor-patient conversations.
Endpoint
POST /transcribe/conversation
Request
The request must be sent as multipart/form-data with the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | The audio file to transcribe. Must be one of the supported formats and under 100MB. |
| language | String | No | The language of the audio. See Language Support for available options. Defaults to en (English). |
| include_timestamps | Boolean | No | Include timestamps for each message. Defaults to true. |
Response
The response is a JSON object with the following structure:
{
  "conversation": {
    "messages": [
      {
        "id": "msg-1",
        "speaker": 0,
        "speaker_label": "Speaker 0",
        "text": "Good morning! What brings you in today?",
        "start_time": 0.5,
        "end_time": 2.3,
        "confidence": 0.95
      },
      {
        "id": "msg-2",
        "speaker": 1,
        "speaker_label": "Speaker 1",
        "text": "I've been having chest pain for the past three days.",
        "start_time": 2.8,
        "end_time": 5.1,
        "confidence": 0.92
      }
    ],
    "metadata": {
      "total_duration": 120.5,
      "total_messages": 15,
      "speaker_count": 2,
      "language": "en"
    }
  }
}
| Field | Type | Description |
|---|---|---|
| conversation | Object | The structured conversation with messages and metadata. |
| conversation.messages | Array | Array of conversation messages in chronological order. |
| conversation.messages[].id | String | Unique identifier for the message. |
| conversation.messages[].speaker | Number | Speaker number assigned by the transcription service (0, 1, 2, etc.). |
| conversation.messages[].speaker_label | String | Human-readable speaker label (e.g., "Speaker 0", "Speaker 1"). |
| conversation.messages[].text | String | The text content of the message. |
| conversation.messages[].start_time | Number | Start time of the message in seconds (if include_timestamps is true). |
| conversation.messages[].end_time | Number | End time of the message in seconds (if include_timestamps is true). |
| conversation.messages[].confidence | Number | Average confidence score for the message (if include_timestamps is true). |
| conversation.metadata | Object | Metadata about the conversation. |
| conversation.metadata.total_duration | Number | Total duration of the audio in seconds. |
| conversation.metadata.total_messages | Number | Total number of messages in the conversation. |
| conversation.metadata.speaker_count | Number | Number of unique speakers detected. |
| conversation.metadata.language | String | Language of the conversation. |
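As an illustration of working with these fields, the following sketch totals speaking time per speaker from start_time and end_time. It skips messages without timestamps, since those fields are only present when include_timestamps is true; talk_time_by_speaker is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any, Dict


def talk_time_by_speaker(conversation: Dict[str, Any]) -> Dict[int, float]:
    """Sum (end_time - start_time) in seconds for each speaker number."""
    totals: Dict[int, float] = defaultdict(float)
    for message in conversation["messages"]:
        # Timestamps are absent when include_timestamps is false.
        if "start_time" in message and "end_time" in message:
            totals[message["speaker"]] += message["end_time"] - message["start_time"]
    return dict(totals)
```

The function expects the object stored under the conversation key of the response, for example talk_time_by_speaker(response.json()["conversation"]).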
Speaker Identification
The service uses automatic speaker diarization to identify different speakers in the conversation. Speakers are assigned numeric IDs (0, 1, 2, etc.) in the order in which they first speak in the audio. The service cannot determine which speaker is the doctor and which is the patient; handle role assignment in your frontend application or through user input.
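Once the user has indicated who is who, the numeric speaker IDs can be relabeled on the client. A minimal sketch, assuming a roles mapping collected from user input; apply_roles is a hypothetical helper and does not modify the original messages.

```python
from typing import Any, Dict, List


def apply_roles(messages: List[Dict[str, Any]], roles: Dict[int, str]) -> List[Dict[str, Any]]:
    """Return copies of the messages with speaker_label replaced by the chosen role."""
    return [
        # Speakers without an assigned role keep their original label.
        {**message, "speaker_label": roles.get(message["speaker"], message["speaker_label"])}
        for message in messages
    ]


messages = [
    {"id": "msg-1", "speaker": 0, "speaker_label": "Speaker 0", "text": "Good morning! What brings you in today?"},
    {"id": "msg-2", "speaker": 1, "speaker_label": "Speaker 1", "text": "I've been having chest pain for the past three days."},
]
for message in apply_roles(messages, {0: "Doctor", 1: "Patient"}):
    print(f"{message['speaker_label']}: {message['text']}")
```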
Integration Examples
cURL
curl -X POST https://api.knidian.ai/transcribe/conversation \
-H "x-api-key: YOUR_API_KEY" \
-F "file=@doctor_patient_consultation.mp3" \
-F "language=en" \
-F "include_timestamps=true"
Python
import requests
url = "https://api.knidian.ai/transcribe/conversation"
headers = {
    "x-api-key": "YOUR_API_KEY"
}
data = {
    "language": "en",
    "include_timestamps": "true"
}
# Open the audio in a context manager so the file handle is closed after the upload
with open("consultation.mp3", "rb") as audio:
    files = {"file": ("consultation.mp3", audio, "audio/mpeg")}
    response = requests.post(url, headers=headers, files=files, data=data)
conversation = response.json()
# Display the conversation
for message in conversation["conversation"]["messages"]:
    print(f"{message['speaker_label']}: {message['text']}")
    if "start_time" in message:
        print(f" Time: {message['start_time']:.1f}s - {message['end_time']:.1f}s")
Node.js
const axios = require('axios');
const FormData = require('form-data');
const fs = require('fs');
async function transcribeConversation() {
  const form = new FormData();
  form.append('file', fs.createReadStream('./consultation.mp3'));
  form.append('language', 'en');
  form.append('include_timestamps', 'true');
  try {
    const response = await axios.post(
      'https://api.knidian.ai/transcribe/conversation',
      form,
      {
        headers: {
          ...form.getHeaders(),
          'x-api-key': 'YOUR_API_KEY'
        }
      }
    );
    // Display the conversation
    response.data.conversation.messages.forEach(message => {
      console.log(`${message.speaker_label}: ${message.text}`);
      if (message.start_time !== undefined) {
        console.log(` Time: ${message.start_time}s - ${message.end_time}s`);
      }
    });
  } catch (error) {
    console.error('Error:', error.response?.data || error.message);
  }
}
transcribeConversation();
Usage Tracking
The API tracks the following metrics for each transcription request:
- Request duration
- Audio duration (in seconds)
- Knidian's AI Engine processing time (for /transcribe endpoint)
- Success/failure status
These metrics are used for billing purposes and to improve the service. They are associated with your API key and organization.