924+ Multimodal AI Workflows
Largest set of 924 workflows using multimodal AI. Image, audio, video and document processing with GPT-4 Vision, Gemini, Whisper and ElevenLabs. From lead scoring to Spotify playlist generation.

Features
- Image analysis with GPT-4 Vision and Gemini
- Audio transcription with Whisper
- Speech generation with ElevenLabs
- Image creation with DALL-E and Midjourney
- AI video processing
- Lead scoring with multimodal analysis
- Voice bots and multimodal chatbots
- AI Spotify playlist generation
Full power of multimodal AI
This pack is the largest collection of multimodal workflows. 923 automations combining text, image, audio and video into intelligent solutions.
Image Analysis
GPT-4 Vision and Gemini analyze product photos, documents, charts and screenshots. From data extraction to visual categorization.
Audio Processing
Meeting transcription with Whisper, speech generation with ElevenLabs and voice bots for customer service. Audio as a full communication channel.
Image Generation
Create graphics with DALL-E and Midjourney directly from workflows. Automatic thumbnails, product shots and illustrations.
Video AI
YouTube video analysis, key moment extraction and automatic summaries. Creating shorts and reels with AI.
Creative Applications
From generating Spotify playlists based on text description to AI prompt A/B testing. Custom solutions for creative industries.