
Introduction

In the rapidly evolving field of artificial intelligence, one challenge looms large: the computational cost of current transformer models, which have been the backbone of breakthroughs like large language models. These architectures, while revolutionary, struggle with efficiency as input sizes grow: because self-attention compares every token with every other token, compute and memory scale quadratically with sequence length, making longer contexts increasingly expensive to handle.
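As a minimal sketch of why this scaling hurts, the snippet below builds the score matrix at the heart of standard self-attention (QKᵀ scaled by √d). The function name and dimensions are illustrative, not from any particular model: the point is that the matrix has one entry per token pair, so doubling the sequence length quadruples its size.

```python
import numpy as np

def attention_scores(seq_len: int, d_model: int = 64) -> np.ndarray:
    """Return the (seq_len x seq_len) score matrix Q @ K.T / sqrt(d_model)
    computed by a standard self-attention layer (toy random Q and K)."""
    rng = np.random.default_rng(0)
    q = rng.standard_normal((seq_len, d_model))
    k = rng.standard_normal((seq_len, d_model))
    return q @ k.T / np.sqrt(d_model)

for n in (512, 1024, 2048):
    scores = attention_scores(n)
    # scores.size grows as n^2: each doubling of n quadruples the matrix.
    print(f"seq_len={n:>5}  score matrix {scores.shape}  entries={scores.size:,}")
```

Real implementations add softmax, a value projection, multiple heads, and batching, but none of that changes the quadratic number of pairwise scores.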










