Orchestrating Multimedia Magic: How I Built Content Generation with Vizra ADK Workflows
In my recent work, I wasn't just generating text; I was building immersive multimedia experiences. Whether I was producing daily briefings, educational content, or dynamic updates, a single LLM prompt often fell short.
To create high-quality content that combined researched text, synthesized audio (via ElevenLabs), and generated imagery, I needed orchestration. Enter Vizra ADK Workflows.
Here is a deep dive into how I leveraged Vizra’s workflow patterns to turn a simple topic into a full multimedia package.
The Challenge: Coordination vs. Chaos
Generating multimedia requires distinct steps:
- Research & Writing: Ensuring factual accuracy (RAG) and engaging copy.
- Audio Synthesis: Converting text to speech using specific voice profiles.
- Visuals: Generating thumbnails or accompanying images.
Doing this linearly is slow. Doing it without structure is error-prone. I needed a system that could handle sequential logic for writing and parallel execution for asset generation.
The Solution: The Vizra Workflow
I utilized the Workflow facade provided by Vizra ADK to compose a pipeline that mixes sequential and parallel execution patterns.
1. The Architect: Sequential Planning
Everything starts with a script. I used a Sequential Workflow to ensure I had a solid foundation before generating expensive assets.
use Vizra\VizraADK\Facades\Workflow;
use App\Agents\Content\ResearcherAgent;
use App\Agents\Content\ScriptWriterAgent;
use App\Agents\Content\VoiceOverAgent;
use App\Agents\Content\ThumbnailGeneratorAgent;
public function generateContent(string $topic)
{
    // Step 1: Research and Write
    $scriptData = Workflow::sequential()
        ->then(ResearcherAgent::class)   // Uses Meilisearch vector store
        ->then(ScriptWriterAgent::class) // Uses Gemini Pro for reasoning
        ->run($topic);

    // ... pass to next stage
}

In this stage, the ResearcherAgent uses the VectorMemoryTool (backed by Meilisearch) to pull relevant context. The ScriptWriterAgent then formats this into a JSON structure containing a title, the script body, and a prompt for the image generator.
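For illustration, the structured output of the writing stage might look something like this. The final_script key is the one the parallel step consumes; the other field names and values here are my own placeholders:

```json
{
  "title": "Morning Briefing: Markets Open Higher",
  "final_script": "Good morning. In today's briefing we cover three stories...",
  "image_prompt": "A minimalist sunrise over a city skyline, editorial style"
}
```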
2. The Factory: Parallel Asset Generation
Once I had the script, I didn’t want to wait for the audio to finish before starting the image generation. Vizra’s Parallel Workflow allowed me to spin up multiple agents simultaneously.
I passed the output from the sequential step into a parallel block.
// ... inside generateContent
$assets = Workflow::parallel()
    ->agents([
        'audio'  => VoiceOverAgent::class,
        'visual' => ThumbnailGeneratorAgent::class,
    ])
    ->run($scriptData['final_script']);

return [
    'script'     => $scriptData,
    'audio_path' => $assets['audio'],
    'image_url'  => $assets['visual'],
];
}

3. The Agents & Tools
The magic happens inside the specialized agents. Here is how I configured them using the Vizra ADK structure.
The Voice Over Agent (ElevenLabs)
This agent is responsible for taking text and returning a path to an MP3 file. It utilizes a custom tool I built to interface with the ElevenLabs API.
namespace App\Agents\Content;
use Vizra\VizraADK\Agents\BaseLlmAgent;
use App\Tools\Audio\ElevenLabsTtsTool;
class VoiceOverAgent extends BaseLlmAgent
{
    protected string $name = 'voice_over_specialist';
    protected string $model = 'gpt-4o-mini'; // Fast, low cost for tool calling

    protected string $instructions = <<<'INSTRUCTIONS'
    You are an audio engineer.
    1. Receive the script text.
    2. Select the appropriate voice ID based on the content tone.
    3. Use the 'text_to_speech' tool to generate the audio.
    4. Return the file path provided by the tool.
    INSTRUCTIONS;

    protected array $tools = [
        ElevenLabsTtsTool::class,
    ];
}

The Tool Implementation
The ElevenLabsTtsTool handles the actual API call, keeping my agent logic clean.
namespace App\Tools\Audio;
use Vizra\VizraADK\Contracts\ToolInterface;
use Illuminate\Support\Facades\Http;
class ElevenLabsTtsTool implements ToolInterface
{
    public function definition(): array
    {
        return [
            'name' => 'text_to_speech',
            'description' => 'Converts text to audio using ElevenLabs',
            'parameters' => [
                'type' => 'object',
                'properties' => [
                    'text' => ['type' => 'string'],
                    'voice_id' => ['type' => 'string'],
                ],
                'required' => ['text'],
            ],
        ];
    }

    public function execute(array $arguments, $context, $memory): string
    {
        // Implementation calling ElevenLabs API...
        // Returns JSON with { "status": "success", "path": "..." }
    }
}

Why This Approach Wins
- Modularity: If I want to switch from ElevenLabs to OpenAI TTS, I just swap the tool in the VoiceOverAgent. The workflow remains untouched.
- Speed: By parallelizing the asset generation, I cut the total processing time by nearly 50%.
- Observability: Vizra's built-in tracing allows me to see exactly what the ResearcherAgent found in Meilisearch and why the ScriptWriterAgent made specific creative decisions.
- Maintainability: Each agent has a single responsibility. The ResearcherAgent doesn't know about audio files, and the VoiceOverAgent doesn't care about SEO keywords.
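To make the modularity point concrete: a drop-in replacement tool only has to expose the same definition() contract, so the agent's instructions never change. The sketch below is not production code — it assumes OpenAI's /v1/audio/speech endpoint, a config('services.openai.key') credential, and a placeholder storage path:

```php
namespace App\Tools\Audio;

use Vizra\VizraADK\Contracts\ToolInterface;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Storage;

class OpenAiTtsTool implements ToolInterface
{
    public function definition(): array
    {
        // Same schema as ElevenLabsTtsTool, so the VoiceOverAgent's
        // prompt and tool-calling behavior stay identical.
        return [
            'name' => 'text_to_speech',
            'description' => 'Converts text to audio using OpenAI TTS',
            'parameters' => [
                'type' => 'object',
                'properties' => [
                    'text' => ['type' => 'string'],
                    'voice_id' => ['type' => 'string'],
                ],
                'required' => ['text'],
            ],
        ];
    }

    public function execute(array $arguments, $context, $memory): string
    {
        // OpenAI's speech endpoint returns raw audio bytes in the body.
        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/audio/speech', [
                'model' => 'tts-1',
                'voice' => $arguments['voice_id'] ?? 'alloy',
                'input' => $arguments['text'],
            ]);

        $path = 'audio/' . uniqid('tts_') . '.mp3'; // placeholder path scheme
        Storage::put($path, $response->body());

        // Mirror the ElevenLabs tool's return shape.
        return json_encode(['status' => 'success', 'path' => $path]);
    }
}
```

With the contract preserved, swapping providers is a one-line change to the $tools array in VoiceOverAgent.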
Conclusion
Building complex AI features isn’t just about prompt engineering; it’s about architecture. By treating LLMs as specialized workers within a Vizra Workflow, I’ve turned a complex, multi-modal generation process into a reliable, maintainable pipeline.