Automate PII Redaction from Audio Files Using Node.js and AssemblyAI
In the age of data privacy, redacting Personally Identifiable Information (PII) from audio and video files is a crucial task for many applications. A recent tutorial by AssemblyAI outlines how to automate this process using Node.js and the AssemblyAI API.
Understanding PII and Its Importance
PII includes any data that can be used to identify an individual, such as names, phone numbers, and email addresses. Handling this information is governed by regulations like HIPAA, GDPR, and CCPA. Redaction matters wherever sensitive recordings are stored or shared, for example recorded phone conversations between a doctor and a patient.
Setting Up the Development Environment
To begin, ensure you have Node.js 18 or higher installed. Create a new project folder, navigate to it, and initialize a Node.js project:
mkdir pii-redaction
cd pii-redaction
npm init -y
Modify the package.json file to use ES Module syntax by adding "type": "module". Next, install the AssemblyAI JavaScript SDK:
npm install --save assemblyai
You'll need an AssemblyAI API key, which can be obtained from the AssemblyAI dashboard. Set this key as an environment variable on your system:
# Mac/Linux:
export ASSEMBLYAI_API_KEY=<YOUR_KEY>
# Windows:
set ASSEMBLYAI_API_KEY=<YOUR_KEY>
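If the variable is not set, the script only fails later, when the first API request is rejected. A minimal sketch of an early check follows; requireEnv is a hypothetical helper for illustration and is not part of the AssemblyAI SDK.

```javascript
// Hypothetical helper (not part of the AssemblyAI SDK): read a required
// environment variable and fail early with a clear message if it is missing.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing environment variable: ${name}`);
  }
  // Trim stray whitespace that can sneak in when a key is pasted into a shell.
  return value.trim();
}

// Possible usage in index.js, before constructing the client:
// const apiKey = requireEnv("ASSEMBLYAI_API_KEY");
```

The second parameter defaults to process.env and exists only so the check can be exercised without touching the real environment.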
Transcribing Audio with PII Redaction
With the environment set up, you can start transcribing audio files. Create a file named index.js and add the following code:
import { AssemblyAI } from 'assemblyai';

// Read the API key from the environment variable set earlier.
const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

const transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/architecture-call.mp3",
  redact_pii: true,
  // PII categories to redact from the transcript.
  redact_pii_policies: [
    "person_name",
    "phone_number",
  ],
  redact_pii_sub: "hash",
});

if (transcript.status === "error") {
  throw new Error(transcript.error);
}

console.log(transcript.text);
This script transcribes an audio file while redacting the specified PII categories, here names and phone numbers. With redact_pii_sub set to "hash", each redacted span in the transcript text is replaced with hash characters (#).
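A quick sanity check on the result can be sketched as follows, assuming redacted spans appear in the transcript text as runs of # characters (worth verifying against a real transcript). countRedactedSpans is a hypothetical helper, and the sample string is made up for illustration; it is not real API output.

```javascript
// Hypothetical helper: count runs of '#' in a transcript string.
// Assumes redacted PII is rendered as '#' characters in the text.
function countRedactedSpans(text) {
  const matches = text.match(/#+/g);
  return matches === null ? 0 : matches.length;
}

// Made-up sample text for illustration, not real API output.
const sample = "Hi, this is #### ####. You can reach me at ############.";
console.log(countRedactedSpans(sample)); // 3
```

A count of zero on a recording known to contain names or phone numbers would suggest that the redaction settings were not applied.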
Retrieving the Redacted Audio
To obtain the redacted audio, modify the code to include audio redaction settings:
import { AssemblyAI } from 'assemblyai';

const client = new AssemblyAI({ apiKey: process.env.ASSEMBLYAI_API_KEY });

const transcript = await client.transcripts.transcribe({
  audio: "https://storage.googleapis.com/aai-web-samples/architecture-call.mp3",
  redact_pii: true,
  redact_pii_policies: [
    "person_name",
    "phone_number",
  ],
  redact_pii_sub: "hash",
  // Also produce a redacted copy of the audio, in MP3 format.
  redact_pii_audio: true,
  redact_pii_audio_quality: "mp3",
});

if (transcript.status === "error") {
  throw new Error(transcript.error);
}

console.log(transcript.text);
This configuration ensures that the redacted audio is available in MP3 format. The redacted audio file can be downloaded using the following code:
import { writeFile } from "fs/promises";

// Look up the URL of the redacted audio for this transcript.
const { redacted_audio_url } = await client.transcripts.redactions(transcript.id);
// Download the file and write it to disk.
const redactedFileResponse = await fetch(redacted_audio_url);
await writeFile("./redacted-audio.mp3", redactedFileResponse.body);
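The download step above writes whatever the server returns, including an error page. A slightly more defensive variant might look like the sketch below. saveRedactedAudio is a hypothetical helper, not part of the AssemblyAI SDK, and it buffers the whole file in memory, which is an assumption that suits short recordings.

```javascript
import { writeFile } from "node:fs/promises";

// Hypothetical helper: download a file and write it to disk, failing loudly
// on a non-2xx response. fetchImpl is injectable so the function can be
// exercised without network access.
async function saveRedactedAudio(url, destination, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Download failed with HTTP status ${response.status}`);
  }
  // Buffer the whole body in memory; fine for short recordings.
  const bytes = Buffer.from(await response.arrayBuffer());
  await writeFile(destination, bytes);
  return bytes.length; // number of bytes written
}

// Possible usage, with redacted_audio_url obtained as in the snippet above:
// await saveRedactedAudio(redacted_audio_url, "./redacted-audio.mp3");
```

Checking response.ok before writing avoids saving an HTML or JSON error body under an .mp3 file name.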
Executing the Script
Run the script in your shell:
node index.js
If successful, the console will display the redacted transcript, and a redacted audio file will be saved to your disk. The tutorial also provides an example of an unredacted transcript for comparison.
Conclusion
By following this tutorial, developers can efficiently redact PII from audio and video files using AssemblyAI and Node.js. For more details, visit the AssemblyAI blog.