The era of transformer dominance in AI may be drawing to a close as the search for new architectures begins.
Transformers are the backbone of video-generating models like OpenAI’s Sora and text-generating models such as Anthropic’s Claude, Google’s Gemini, and GPT-4o. However, they face challenges with computational efficiency, particularly when processing and analyzing very large amounts of data.
Efforts are under way to address these challenges with new architectures such as Test-Time Training (TTT), which promises to process far more data while consuming less compute.
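To make the efficiency problem concrete, here is a rough back-of-envelope sketch (illustrative only, not from the article): self-attention compares every token in the context with every other token, so the number of pairwise interactions grows quadratically with context length.

```python
# Back-of-envelope: self-attention's pairwise token comparisons grow
# quadratically with context length. Figures are illustrative only;
# real models incur many additional costs beyond attention.
for context_len in (1_000, 10_000, 100_000, 1_000_000):
    interactions = context_len ** 2
    print(f"{context_len:>9,} tokens -> {interactions:>18,} token-pair interactions")
```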
The hidden state in transformers
The “hidden state” is a key component of transformers, effectively acting as their memory: a long, ever-growing list of entries representing everything the model has processed so far. That same design limits efficiency, because recalling information requires scanning back through the hidden state, which becomes increasingly expensive as it grows.
Innovations like TTT propose replacing the hidden state with an internal machine learning model of its own: instead of appending to an ever-growing list, the model encodes the data it processes into a fixed set of weights. Because that set of weights does not grow with the amount of data, TTT models can process far more data without a corresponding rise in computational requirements.
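The sketch below is a minimal illustration of that contrast, not the authors’ implementation; all names and the toy self-supervised update are assumptions chosen for clarity. A transformer-style cache stores every token it sees, while a TTT-style memory keeps only a small weight matrix that is nudged by a gradient step for each incoming token.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # token embedding size (illustrative)

# Transformer-style memory: a cache that grows with every token processed,
# so "recalling" something means scanning an ever-longer list.
kv_cache = []

# TTT-style memory: a small inner model (here a single linear map W) whose
# weights ARE the memory. Its size stays fixed no matter how many tokens arrive.
W = np.zeros((D, D))

def ttt_update(W, x, lr=0.1):
    """One test-time training step: nudge W so that W @ x better reconstructs x.
    This toy self-supervised update 'writes' the token into the weights."""
    pred = W @ x
    grad = np.outer(pred - x, x)   # gradient of 0.5 * ||W x - x||^2 w.r.t. W
    return W - lr * grad

def ttt_read(W, query):
    """Reading memory is a single matrix-vector product, whose cost does not
    depend on how many tokens have been absorbed."""
    return W @ query

# Stream some tokens through both kinds of memory.
for _ in range(1000):
    x = rng.standard_normal(D)
    kv_cache.append(x)        # transformer cache: storage grows linearly
    W = ttt_update(W, x)      # TTT: storage stays a fixed D x D matrix

print(f"cache entries: {len(kv_cache)}, TTT weight matrix: {W.shape}")
```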
Skepticism around TTT models
While TTT models show promise on efficiency, whether they can scale to match, let alone outperform, existing architectures like transformers has yet to be demonstrated and remains an open research question.
Other alternatives to transformers, such as state space models (SSMs), are also being explored by companies like Mistral, AI21 Labs, and Cartesia, underscoring a growing recognition that breakthroughs in AI architecture are needed.