GPT-4, a major multimodal language model, is ready for prime time, although, contrary to reports circulating since Friday, it does not support text-to-video generation.
However, GPT-4 can accept image and text input and produce text output. In a number of domains, including documents with text and photos, charts or screenshots, GPT-4 exhibits the same capabilities as it does for text-only input, OpenAI explains on its website.
That feature, however, is in “research preview” and will not be available to the public.
OpenAI explained that GPT-4, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
For example, it passed a simulated bar exam with a score in the top 10% of test takers. By contrast, GPT-3.5's score was around the bottom 10%.
Leaps over past models
One early adopter of GPT-4 is Casetext, creator of CoCounsel, an artificial intelligence legal assistant that it says can pass both the multiple-choice and written sections of the Uniform Bar exam.
“GPT-4 surpasses the power of earlier language models,” Casetext co-founder and chief innovation officer Pablo Arredondo said in a statement. “The model’s ability to not only generate text, but to interpret it heralds nothing less than a new era in legal practice.”
“Casetext Advisor is changing the way we practice law by automating critical, time-consuming tasks and freeing our lawyers to focus on the most impactful aspects of the practice,” added Frank Ryan, president of global law firm DLA Piper Americas, in a press release.
OpenAI explained that it spent six months aligning GPT-4 using lessons from its adversarial testing program as well as ChatGPT, which resulted in its best results ever, though far from perfect, in terms of factuality, steerability, and refusing to go outside of guardrails.
It added that the GPT-4 training run was exceptionally stable. It was the company’s first large-scale model whose training performance it was able to accurately predict ahead of time.
“As we continue to focus on reliable scaling,” it reads, “we aim to hone our methodology to help us predict and prepare for future capabilities increasingly far in advance, something we see as critical for safety.”
Subtle distinctions
OpenAI noted that the difference between GPT-3.5 and GPT-4 can be subtle. The difference emerges when the complexity of a task reaches a sufficient threshold, it explained. GPT-4 is more reliable and creative and can handle more nuanced instructions than GPT-3.5.
GPT-4 is also more customizable than its predecessor. Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, OpenAI explained, developers (and soon ChatGPT users) can now define their AI’s style and task by describing those directions in a “system” message. System messages allow API users to significantly customize their users’ experience within bounds.
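In practice, that amounts to sending a message with the “system” role ahead of the user’s prompt when calling the chat API. The sketch below uses the openai Python library as it shipped around GPT-4’s launch; the API key, persona text, and prompt are placeholders invented for illustration.

    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder; real keys come from the OpenAI dashboard

    response = openai.ChatCompletion.create(
        model="gpt-4",  # GPT-4 API access was waitlisted at launch
        messages=[
            # The "system" message sets the assistant's style and task.
            {
                "role": "system",
                "content": "You are a concise research assistant. Answer in plain "
                           "English and say when you are unsure.",
            },
            # The "user" message carries the actual request.
            {"role": "user", "content": "Explain what a system message does."},
        ],
    )

    print(response["choices"][0]["message"]["content"])

Changing only the system message, say from a concise research assistant to a Socratic tutor, changes the tone and scope of every subsequent reply without altering the user prompts themselves.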
API users will have to wait to try the feature at first, however, as their access to GPT-4 will be limited to a waiting list.
OpenAI acknowledged that despite its capabilities, GPT-4 has the same limitations as previous GPT models. Most importantly, it is still not completely reliable. It “hallucinates” facts and makes reasoning errors.
Great care should be taken when using language model results, especially in high-stakes contexts, OpenAI warned.
GPT-4 can also be confidently wrong in its predictions, failing to double-check its work when it is likely to make a mistake, the company added.
T2V Missing
Anticipation for a new GPT release heightened over the weekend after a Microsoft executive in Germany suggested that text-to-video capability would be part of the final package.
“We will introduce GPT-4 next week, where we have multimodal models that will offer completely different capabilities, such as video,” Andreas Braun, Microsoft’s chief technology officer in Germany, said at a press event on Friday.
Text-to-video will be very disruptive, noted Rob Enderle, president and principal analyst at the Enderle Group, a consulting services firm in Bend, Ore.
“It could dramatically change the way movies and TV shows are made, the way news programs are made, by providing a mechanism for a high degree of user personalization,” he told TechNewsWorld.
Enderle noted that an initial use of the technology could be to create stories from script drafts. “As this technology matures, it will be closer to a finished product.”
Video Distribution
Content created with text-to-video apps is not yet mainstream, said Greg Sterling, co-founder of Near Media, a news, commentary and analysis website.
“But text-to-video could be disruptive in the sense that we’re going to see a lot more video content being created at very low cost or almost no cost,” he told TechNewsWorld.
“The quality and effectiveness of that video is another matter,” he continued. “But I suspect some of it will be good enough.”
He added that explainers and background information are good candidates for text-to-video.
“I could imagine some agencies using it to create videos for small and medium businesses to use on their websites or on YouTube for ranking purposes,” he said.
“It won’t do well, at least initially, with any branded content,” he continued. “Social media content is another use case. You’ll see creators on YouTube using it to build volume to get views and ad revenue.”
Not fooled by Deepfakes
As with ChatGPT, technology such as text-to-video comes with potential risks.
“The most dangerous use cases, like all such tools, are garden-variety scams impersonating people’s relatives or attacks on particularly vulnerable individuals or institutions,” observed Will Duffield, a policy analyst at the Cato Institute, a Washington think tank.
Duffield, however, discounted the idea of using text-to-video to create effective deepfakes.
“When we’ve seen well-resourced attacks, like the Russian deepfake of Zelenskyy last year, they fail because there’s enough context and expectation in the world to debunk a fake,” he explained.
“We have very clear ideas about who public figures are, what they are about, and what we can expect from them,” he continued. “So when we see media of them behaving in a way that doesn’t meet those expectations, we’re likely to be very critical or skeptical of it.”