Many AI products add extra input modes without redesigning runtime semantics.
The result is predictable:
Teams lose consistency, and output quality drops.
For an AI code generator or AI app builder, multimodal input only works when every input resolves through the same execution model.
What builders actually need
Builders do not need a novelty input layer.
They need faster direction with reliable outcomes.
Useful command patterns share one property: each resolves to a concrete runtime operation.
The three-layer model we use
We design multimodal workflows in three layers.
Layer 1: Intent capture
Text, speech, direct edits, and structured actions become intent candidates, not immediate mutations.
This reduces destructive changes and allows confidence scoring.
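A minimal sketch of intent capture in TypeScript. All names here are illustrative assumptions, not the actual Dreams.fm API: the point is that every input mode produces a candidate with a confidence score instead of mutating state directly.

```typescript
// Every input mode produces an IntentCandidate rather than an
// immediate mutation. Types and thresholds are illustrative.
type InputMode = "text" | "speech" | "direct-edit" | "structured-action";

interface IntentCandidate {
  mode: InputMode;
  rawInput: string;   // what the user typed, said, or did
  action: string;     // best-guess normalized action
  confidence: number; // 0..1, used to gate auto-apply
}

// Capture never mutates: it only proposes.
function captureIntent(mode: InputMode, rawInput: string): IntentCandidate {
  // Direct edits and structured actions are unambiguous;
  // free-form text and speech are not.
  const confidence =
    (mode === "direct-edit" || mode === "structured-action") ? 0.95 : 0.6;
  return { mode, rawInput, action: rawInput.trim().toLowerCase(), confidence };
}

// Low-confidence candidates are queued for resolution and confirmation
// instead of being applied, which prevents destructive changes.
function needsConfirmation(c: IntentCandidate): boolean {
  return c.confidence < 0.8;
}
```

Because the capture step only proposes, confidence scoring becomes a natural gate rather than an afterthought.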
Layer 2: Intent resolution
Intent resolves against active scene context, selected nodes, and capability boundaries.
This is where a vague request like "make this cleaner" becomes a set of concrete, typed operations.
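One way to sketch resolution in TypeScript, assuming a tiny rule table standing in for a real resolver (the context shape, operation names, and rules are all hypothetical):

```typescript
// Resolution: a vague phrase plus active scene context yields
// concrete operations. All names are illustrative.
interface SceneContext {
  selectedNodeIds: string[];
  capabilities: Set<string>; // operations this surface allows
}

interface Operation {
  op: string;
  targets: string[];
}

function resolveIntent(phrase: string, ctx: SceneContext): Operation[] {
  // A toy rule table standing in for a real resolver model.
  const rules: Record<string, string[]> = {
    "make this cleaner": ["normalize-spacing", "dedupe-styles"],
  };
  const ops = rules[phrase] ?? [];
  // Capability boundaries filter out operations this context
  // cannot perform; selection provides the targets.
  return ops
    .filter((op) => ctx.capabilities.has(op))
    .map((op) => ({ op, targets: ctx.selectedNodeIds }));
}
```

The selection supplies the targets and the capability set trims the plan, so the same phrase resolves differently in different contexts by design.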
Layer 3: Runtime transform
Resolved intent becomes a typed transform in the runtime pipeline.
The result: every input mode converges on the same auditable pipeline.
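A typed transform can be sketched as a discriminated union applied through a single pure function. These types are assumptions for illustration, not the fmEngine schema:

```typescript
// Resolved intent becomes a typed transform applied through one
// reducer-style function, so every input mode ends in the same
// pipeline. Shapes are illustrative.
type Transform =
  | { kind: "set-prop"; nodeId: string; prop: string; value: string }
  | { kind: "remove-node"; nodeId: string };

interface SceneNode { id: string; props: Record<string, string> }
type Scene = Map<string, SceneNode>;

// Pure application: returns a new scene, never mutates the old one,
// which makes timeline replay trivial.
function applyTransform(scene: Scene, t: Transform): Scene {
  const next = new Map(scene);
  switch (t.kind) {
    case "set-prop": {
      const node = next.get(t.nodeId);
      if (node) {
        next.set(t.nodeId, { ...node, props: { ...node.props, [t.prop]: t.value } });
      }
      return next;
    }
    case "remove-node":
      next.delete(t.nodeId);
      return next;
  }
}
```

Keeping the transform typed and the application pure is what makes confidence gating, confirmation, and replay composable later.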
Why this matters for AI code generation
Without structured routing, code generation degrades quickly: ambiguous inputs turn into inconsistent mutations, and output quality drops. With structured routing, teams can accelerate iteration without losing that consistency.
Patterns that consistently work
Use scoped commands
Good: a command bound to the current selection, for example "tighten spacing in the selected section."
Bad: an unbounded request, for example "clean everything up," which forces the resolver to guess scope.
Require confirmation for high-impact transforms
For broad changes, show a concise action summary before apply.
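A minimal sketch of such a gate, assuming "high impact" simply means many nodes touched (a real product would also weigh irreversibility and scope; names and thresholds are hypothetical):

```typescript
// High-impact transforms produce a summary the user must approve
// before apply. Threshold and shapes are illustrative.
interface PendingChange {
  description: string;
  affectedNodeCount: number;
}

// "High impact" here is approximated by breadth alone.
function requiresConfirmation(change: PendingChange, threshold = 10): boolean {
  return change.affectedNodeCount >= threshold;
}

// The concise action summary shown before apply.
function summarize(change: PendingChange): string {
  return `${change.description} (${change.affectedNodeCount} nodes affected)`;
}
```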
Keep a visible timeline
Teams should always be able to inspect and replay what changed.
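One way to sketch inspect-and-replay is an append-only timeline over pure transform application. The class and its shape are assumptions for illustration:

```typescript
// An append-only timeline of applied transforms that can be
// inspected and replayed to reconstruct any prior state.
interface TimelineEntry<T> {
  seq: number;
  transform: T;
  appliedAt: number; // epoch ms
}

class Timeline<S, T> {
  private entries: TimelineEntry<T>[] = [];
  constructor(private apply: (state: S, t: T) => S) {}

  record(transform: T): void {
    this.entries.push({ seq: this.entries.length, transform, appliedAt: Date.now() });
  }

  // Inspection: the full, ordered history of what changed.
  inspect(): readonly TimelineEntry<T>[] {
    return this.entries;
  }

  // Replay from an initial state up to (and including) a sequence
  // number; with no bound, reproduces the current state.
  replay(initial: S, upToSeq = Infinity): S {
    return this.entries
      .filter((e) => e.seq <= upToSeq)
      .reduce((s, e) => this.apply(s, e.transform), initial);
  }
}
```

Because transforms are the only way state changes, replaying the log up to any point reproduces the state at that point.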
Use one command graph for all inputs
Input type should not create a different runtime path.
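The single-graph rule can be sketched as thin per-mode adapters feeding one dispatch function (all names hypothetical):

```typescript
// Every input mode funnels into one dispatch function, so there is
// no speech-only or text-only runtime path. Names are illustrative.
type Command = { name: string; args: string[] };

const executionLog: string[] = [];

// The single execution path shared by all input modes.
function dispatch(cmd: Command): void {
  executionLog.push(`${cmd.name}(${cmd.args.join(",")})`);
}

// Adapters translate each mode into the same Command shape;
// they never execute anything themselves.
const fromText = (s: string): Command => {
  const [name, ...args] = s.split(" ");
  return { name, args };
};
const fromSpeech = (transcript: string): Command => fromText(transcript.toLowerCase());
const fromClick = (targetId: string): Command => ({ name: "select", args: [targetId] });

dispatch(fromText("align left"));
dispatch(fromSpeech("Align Left")); // same command as the text path
dispatch(fromClick("node-1"));
```

Adding a new input mode then means adding an adapter, not a new runtime path.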
Accessibility and practical value
Speech support improves accessibility for users who rely on, or prefer, voice control.
But accessibility is not the only reason.
The core product value is performance: faster direction with reliable outcomes, regardless of input mode.
Where this fits in Dreams.fm
In Dreams.fm, multimodal commands are integrated with fmEngine runtime transforms.
That means a single request can update multiple parts of the product at once, all within one timeline and one state model.
Keyword strategy note
We describe this system in categories users already search, such as "AI code generator" and "AI app builder."
fmEngine remains a secondary term while category discovery grows.
Closing
Multimodal workflows only work when every input controls real product state with clear execution semantics.
If you are building an AI code generator, that is the bar.