  1. 4 EXPERIMENTS provide an investigation of the effectiveness of MAMBA. We first show that MAMBA is a high-performing meta-RL algorithm: compared to baselines it obtains high returns …

  2. MAMBA: an Effective World Model Approach for Meta …

    MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning Part of International Conference on Representation Learning 2024 (ICLR 2024) Conference

  3. MambaQuant: Quantizing the Mamba Family with Variance …

    Mamba is an efficient sequence model that rivals Transformers and demonstrates significant potential as a foundational architecture for various tasks. Quantization is commonly used in …

  4. Autoregressive Pretraining with Mamba in Vision

    The vision community has started to build with the recently developed state space model, Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual capability …

  5. …inspired its adoption in vision applications. Vision Mamba (Vim) (Zhu et al., 2024) utilizes Vim blocks composed of pure Mamba layers: each Vim block leverages both forward and …

  6. LongMamba: Enhancing Mamba's Long-Context Capabilities via …

    To address this significant shortfall and achieve both efficient and accurate long-context understanding, we propose LongMamba, a training-free technique that significantly enhances …

  7. Jamba: Hybrid Transformer-Mamba Language Models

    We study various architectural decisions, such as how to combine Transformer and Mamba layers, and how to mix experts, and show that some of them are crucial in large scale modeling.

  8. At its core, each Mamba block utilizes the selective SSM (S6) layer (Gu & Dao, 2023), which is specifically designed to handle sequential data by preserving structured state dynamics across …

  9. We propose a novel Mamba-based Koopman operator (MamKO) modeling method, which leverages matrices generated from the Mamba structure to model complex nonlinear systems. …

  10. Drama: Mamba-Enabled Model-Based Reinforcement Learning Is …

    Transformers, on the other hand, suffer from the quadratic memory and computational complexity of self-attention mechanisms, scaling as $O(n^2)$, where $n$ is the sequence length. To …
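
The quadratic-versus-linear scaling claim in the last result can be illustrated with a minimal NumPy sketch (not taken from any of the listed papers; the toy `attention_scores` and `linear_recurrence` functions are illustrative assumptions): self-attention materializes an n×n score matrix, while an SSM-style recurrence of the kind Mamba builds on walks the sequence once with a fixed-size state.

```python
import numpy as np

def attention_scores(x):
    # Self-attention compares every token with every other token,
    # so the score matrix has shape (n, n): O(n^2) memory and compute.
    n, d = x.shape
    q, k = x, x  # illustrative assumption: identity projections for Q and K
    return (q @ k.T) / np.sqrt(d)

def linear_recurrence(x, a=0.9):
    # A toy SSM-style scan: a fixed-size hidden state updated once per
    # token, so time grows as O(n) and state stays O(1) per channel.
    h = np.zeros(x.shape[1])
    outputs = []
    for t in range(x.shape[0]):
        h = a * h + x[t]
        outputs.append(h.copy())
    return np.stack(outputs)

x = np.random.randn(1024, 16)      # n = 1024 tokens, d = 16 features
print(attention_scores(x).shape)   # (1024, 1024) -> grows quadratically in n
print(linear_recurrence(x).shape)  # (1024, 16)   -> grows linearly in n
```

Doubling the sequence length quadruples the size of the attention score matrix but only doubles the work of the recurrence, which is the contrast the Drama abstract points to.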