“It’s something that, you know, we can’t really comment on right now,” OpenAI chief scientist Ilya Sutskever said when I spoke with members of the GPT-4 team via video call an hour after the announcement. “It’s pretty competitive out there.”
GPT-4 is a multimodal large language model, meaning it can respond to both text and images. Give it a photo of the contents of your fridge and ask what you can make, and GPT-4 will try to come up with recipes that use the ingredients pictured. It’s also great at explaining jokes, says Sutskever. “If you show it a meme, it can tell you why it’s funny.”
Access to GPT-4 will be available to users on a waitlist and to subscribers of the premium paid ChatGPT Plus service, in a limited, text-only capacity.
“Its continued improvements along many dimensions are remarkable,” says Oren Etzioni at the Allen Institute for AI. “GPT-4 is now the standard against which all foundation models will be evaluated.”
“A good multimodal model has been the holy grail of many large tech labs for the past few years,” says Thomas Wolf, co-founder of Hugging Face, the artificial intelligence startup behind the open-source large language model BLOOM. “But it has remained elusive.”
In theory, combining text and images could allow multimodal models to better understand the world. “It can overcome traditional weaknesses in language models, such as spatial reasoning,” says Wolf.
It is still not clear whether this is true of GPT-4. OpenAI’s new model does appear to be better at some basic reasoning than ChatGPT, solving simple puzzles such as summarizing blocks of text using words that start with the same letter. In my demo on the call, I watched GPT-4 summarize the announcement from OpenAI’s website using words that start with g: “Guardrails, guidance, and gains garnered. Gigantic, groundbreaking, and globally gifted.” In another demonstration, GPT-4 took in a tax document, answered questions about it, and gave reasons for its answers.