A still from a video of Albert Einstein talking generated by ByteDance's OmniHuman-1. (Image Source: ByteDance)TikTok’s parent company – ByteDance recently unveiled a new AI tool that can generate lifelike videos of people talking, playing instruments and more using a single photo. Dubbed OmniHuman-1, ByteDance says the new tool “significantly outperforms existing methods, generating extremely realistic human videos based on weak signal inputs, especially audio.”
In a research paper published on arXiv, the company claimed that the new tool can work with images with any aspect ratio, irrespective of whether they are portraits, half-body or full-body images and deliver “lifelike and high-quality results across various scenarios.”
This is a step up compared to other AI models, many of which can only change facial expressions or make humans speak. On the OmniHuman-1 page hosted on Beehiiv, researchers shared several sample videos showing how the tool performs with examples showing hand and body movements from multiple angles, and animals in motion.
In a black and white video, OmniHuman-1 shows the known scientist Albert Einstein talking in front of a blackboard performing hand gestures and showing facial expressions. ByteDance says it trained the new tool on more than 18,700 hours of human videos, combining it with various types of inputs like text, audio and physical poses.
The researchers also suggest that OmniHuman-1 currently outperforms similar systems across multiple benchmarks. OmniHuman-1 isn’t the first image-to-video generator, but ByteDance’s new tool may have some advantage over its competitor since it is likely trained on videos from TikTok.