Multiplexed traveling waves for encoding sequence position in State-Space models

Motivation

State-space models such as Mamba (paper, code) have struggled to generalize to long context lengths. However, recent work, Nu-wave Mamba ([paper](https://www.world-wide.org/cosyne-25/nu-wave-state-space-models-traveling-3803805f/)), has shown that interpreting Mamba through the lens of traveling waves can expand the generalization window. In our recent work, we drew connections between Rotary Positional Encodings ([paper]) and traveling waves, deriving the proper continuous form of the roll operator in Nu-waves. The goal of this project is to further investigate the role of traveling waves in context generalization, namely whether there are gains to be found in the Transformer architecture or in Mamba from properly multiplexed traveling waves.
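To make the roll-operator connection concrete, here is one standard way to write it (a sketch of the underlying math, not a reproduction of our derivation). The cyclic shift $(Sx)_n = x_{(n-1) \bmod N}$ is diagonalized by the DFT $F$, so fractional rolls form a one-parameter Lie group:

$$S^t = F^{-1}\,\mathrm{diag}\!\left(e^{-2\pi i k t / N}\right) F, \qquad k = 0,\dots,N-1.$$

Each Fourier mode thus picks up the phase of a traveling wave, which is the same family of rotations $e^{i\theta_k t}$ that RoPE applies per feature pair.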

Project

The idea is to take the H3 block (paper) and make the shift operation continuous using Lie algebras instead of the Nu-wave setup. Additionally or alternatively, we would like to experiment with the generalization properties of multiplexed traveling waves as positional encodings for transformers, using similar operations.
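As a minimal, illustrative sketch of what "making the shift continuous" can look like in code (the function name and the NumPy-based Fourier construction are our own choices here, not the project's prescribed implementation):

```python
import numpy as np

def continuous_roll(x, t):
    """Roll a length-N signal by a (possibly fractional) amount t.

    The integer roll S is diagonalized by the DFT, so its continuous
    extension S^t multiplies each Fourier mode by a traveling-wave
    phase; this is the one-parameter Lie group generated by log(S).
    """
    N = x.shape[-1]
    k = np.fft.fftfreq(N) * N                 # signed integer mode numbers
    phase = np.exp(-2j * np.pi * k * t / N)   # delay each mode by t samples
    return np.fft.ifft(np.fft.fft(x) * phase).real

x = np.eye(8)[0]                              # unit impulse at position 0
assert np.allclose(continuous_roll(x, 1), np.roll(x, 1))  # matches integer roll
print(continuous_roll(x, 0.5))                # band-limited "half roll"
```

At integer t this reproduces the discrete shift used in the H3 block exactly; at fractional t it interpolates between positions, which is what opens the door to PDE-style (traveling-wave) dynamics.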

This topic requires comfort with learning new math (particularly PDEs), but no deep background knowledge. Familiarity with complex numbers and their exponentials is beneficial.

Thesis

The main starting points of exploration are:

  • Can meaning be drawn from state-space models and traveling waves? Do they benefit from damping?
  • Do traveling waves benefit context generalization in transformers? (See the sketch after this list for one concrete form such an encoding could take.)
  • Can other PDEs be used to encode position?
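As one concrete starting point for the first two questions, a sketch of a damped traveling-wave positional encoding (the decay rate gamma and the frequency spectrum below are illustrative assumptions, not values fixed by the project):

```python
import numpy as np

def traveling_wave_pe(n_pos, d_model, gamma=0.01):
    """Sketch of a damped traveling-wave positional encoding.

    Each feature pair carries a wave exp(-gamma*omega_k*p) * exp(i*omega_k*p):
    the rotation is a RoPE/sinusoidal-style phase, the exponential decay is
    the damping asked about above.
    """
    p = np.arange(n_pos)[:, None]                 # positions (n_pos, 1)
    k = np.arange(d_model // 2)[None, :]          # wave index (1, d_model/2)
    omega = 1.0 / (10000.0 ** (2 * k / d_model))  # multiplexed frequencies
    decay = np.exp(-gamma * omega * p)            # faster waves damp faster
    pe = np.empty((n_pos, d_model))
    pe[:, 0::2] = decay * np.sin(omega * p)
    pe[:, 1::2] = decay * np.cos(omega * p)
    return pe

print(traveling_wave_pe(4, 8).shape)              # (4, 8)
```

Setting gamma=0 recovers a standard sinusoidal-style encoding, so the damping question can be studied as a one-parameter ablation.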

Contact

To apply, please email Chase van de Geijn stating your interest in this project and detailing your relevant skills. Please note that this project is better suited for students who already have some minimal coding background.

Neural Data Science Group
Institute of Computer Science
University of Goettingen