128KB SPI-RAM with caching support

2019/07/05

If you don’t need the additional GPIOs on the Arduino Mega, you can instead use the Teensy adapter for more memory and faster speeds.

This post covers adding 128 KBytes of RAM to the Arduino Mega over SPI, and some of what I learned along the way.

SPI-RAM (23LC1024, 128KBytes, SPI)

I chose the Arduino Mega because of its 5V GPIOs, but it has only 8KB of RAM. My goal was to keep the design simple (no level shifters), so I decided to solve the memory problem later on. That time has come now; I want to run operating systems and I want to design the RetroShield 68008.

Last option was simplest and cheapest:


Commands required to read/write from memory.


With sequential-mode (enabled by default), after you send the initial address, you can continue to clock data bytes back to back and the 23LC1024 will auto-increment the address pointer. This is very handy for page reads/writes during cache operations.
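To make the framing concrete, here is a minimal sketch of the header that starts a sequential transfer. It assumes the READ (0x03) and WRITE (0x02) opcodes from the 23LC1024 datasheet; `SpiHeader`, `make_header` and `bytes_on_wire` are hypothetical names for illustration, not functions from the RetroShield code.

```cpp
#include <stdint.h>

// Opcodes assumed from the 23LC1024 datasheet; verify against your copy.
const uint8_t CMD_READ  = 0x03;
const uint8_t CMD_WRITE = 0x02;

// Hypothetical helper type: opcode followed by a 24-bit address, MSB first.
struct SpiHeader { uint8_t bytes[4]; };

inline SpiHeader make_header(uint8_t cmd, uint32_t addr)
{
    SpiHeader h;
    h.bytes[0] = cmd;
    h.bytes[1] = (addr >> 16) & 0xFF;  // only bit 16 matters on a 128KB part
    h.bytes[2] = (addr >> 8) & 0xFF;
    h.bytes[3] = addr & 0xFF;
    return h;
}

// Bytes clocked for one sequential transfer of `len` data bytes:
// the 4-byte header is paid once, then data follows back to back.
inline uint32_t bytes_on_wire(uint32_t len) { return 4 + len; }
```

A 256-byte page therefore costs 260 bytes on the wire in plain SPI mode, while 256 separate single-byte reads would cost 1280.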


You can also switch to SQI (4-bit) mode. In this mode, each 8-bit byte is sent in two clock cycles. Note the dummy byte that has to be read after the address in a read transaction. You don’t have to do this for SQI write transactions.
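As a small illustration of what "8 bits in two clock cycles" means, the sketch below splits a byte into the two 4-bit values that would go out on the four data lines. It assumes the high nibble is sent first, as in the datasheet timing diagrams; the function names are hypothetical.

```cpp
#include <stdint.h>

// Nibble placed on the four data lines during the first clock (assumed: high nibble).
inline uint8_t sqi_first_nibble(uint8_t b)  { return b >> 4; }

// Nibble placed on the four data lines during the second clock.
inline uint8_t sqi_second_nibble(uint8_t b) { return b & 0x0F; }

// Rebuild a byte from the two nibbles sampled on consecutive clocks.
inline uint8_t sqi_join(uint8_t first, uint8_t second)
{
    return (uint8_t)(((first & 0x0F) << 4) | (second & 0x0F));
}
```

For example, 0xA5 goes out as 0xA on the first clock and 0x5 on the second.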


Implementing a Cache

As you may notice, even in 4-bit mode, reading or writing a single byte from SPI-RAM is expensive because we have to send 4 bytes (command + 24-bit address) for 1 byte of data. Doing this for every CPU read/write would be slow. We can implement a simple cache to speed things up.

A cache is basically a small, fast memory area that holds data fetched from slower memory, in the hope that the CPU will mostly access this fast memory instead of the slow one. If things go well, memory looks fast to the CPU on average.

Since this fast memory is much smaller than the slow memory, we bring data in as “blocks” or “pages” and keep track of which pages are currently in the cache.
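The overhead can be put into numbers with a rough clock count. This is only a sketch: it assumes 2 clocks per byte in SQI mode and one dummy byte on reads, and it ignores chip-select handling and software overhead. `sqi_read_clocks` is a hypothetical helper.

```cpp
#include <stdint.h>

// Clocks for one SQI read of `len` data bytes:
// command (1 byte) + address (3 bytes) + dummy (1 byte) + data, at 2 clocks per byte.
inline uint32_t sqi_read_clocks(uint32_t len)
{
    return 2 * (1 + 3 + 1 + len);
}
```

Under these assumptions a single-byte read takes 12 clocks, of which only 2 carry data, while a 256-byte page read takes 522 clocks, or roughly 2 clocks per byte.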

When the CPU accesses a memory location, we check whether the page containing that address is in the cache. If it is, we access the page in the cache (cache-hit). If it is not, we first bring the page from SPI-RAM into the cache and then complete the access (cache-miss). The expectation is that cache-hits will outnumber cache-misses.

There are a couple of optimization parameters for an effective cache:

Tuning these parameters will depend on the program you are running and its memory access pattern.
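How much the hit ratio matters can be estimated with the usual weighted average. The function below is a generic sketch, not something measured on the RetroShield; the costs can be in any unit (clocks, microseconds) as long as both use the same one.

```cpp
// Average cost per access for a given hit ratio (0.0 to 1.0).
// hit_cost:  cost of serving an access from the cache
// miss_cost: cost of a page fill plus the access itself
inline double average_access_cost(double hit_ratio, double hit_cost, double miss_cost)
{
    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost;
}
```

Because a miss moves a whole page over SPI, its cost is much larger than a hit, so even a few percent of misses can dominate the average.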

This is a good write-up that explains cache concepts.

On the Arduino, I initially chose a Direct-Mapped Cache because it is easy and a good starting point. A Direct-Mapped Cache uses part of the address bits to map to a fixed cache location and uses the rest of the address as a tag.

Let’s go over the code together:

////////////////////////////////////////////////////////////////////
// Cache for SPI-RAM
////////////////////////////////////////////////////////////////////

byte cacheRAM[16][256];
byte cachePage[16];

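So, cacheRAM holds 16 cache pages of 256 bytes each (4KB in total), and cachePage[n] records which memory page currently sits in cache page n.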

Let’s see how we read using cache:

inline __attribute__((always_inline))
byte cache_read_byte(word addr)           // 0x1234
{
  byte a = (addr & 0xFF00) >> 8;            // a = 0x12
  byte p = a >> 4;                          // p = 0x01
  byte n = a & 0x0F;                        // n = 0x02
  byte r = (addr & 0x00FF);                 // r = 0x34
  
  if (cachePage[n] == p)
  {
    // Cache Hit !!!
    return cacheRAM[n][r];
  }
  else
  {
    // Need to fill cache from SPI-RAM
    digitalWrite2f(LED2, HIGH);
    spi_read_byte_array_quad(0, addr & 0xFF00, 256, cacheRAM[n]);
    cachePage[n] = p;
    digitalWrite2f(LED2, LOW);
    return cacheRAM[n][r];
  }
}

Looking at the code above, we split the 16-bit address into three pieces: p, n, r. For example, an address of 0x1234 becomes p=0x1, n=0x2, r=0x34.

We use n to find the corresponding cacheRAM page, cacheRAM[n][...]. r points to the byte in that page, cacheRAM[n][r]. Last, we use p as the tag, which shows which address range is held in this cache page; it is saved as cachePage[n] = p.

If cachePage[n] == p, then we have the page in the cache, hence a cache-hit. Otherwise it is a cache-miss, which results in copying the page from SPI-RAM.
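The same split can be written as a small standalone helper, which makes it easy to try other addresses. `CacheAddr`, `split_addr` and `join_addr` are hypothetical names used only for this illustration.

```cpp
#include <stdint.h>

// Hypothetical helper type mirroring p, n and r from the code above.
struct CacheAddr {
    uint8_t tag;     // p: upper 4 bits of the address
    uint8_t index;   // n: which of the 16 cache pages
    uint8_t offset;  // r: byte within the 256-byte page
};

inline CacheAddr split_addr(uint16_t addr)
{
    CacheAddr c;
    c.tag    = (addr >> 12) & 0x0F;
    c.index  = (addr >> 8) & 0x0F;
    c.offset = addr & 0xFF;
    return c;
}

// Inverse of split_addr, handy for checking the bit layout.
inline uint16_t join_addr(CacheAddr c)
{
    return (uint16_t)(((uint16_t)c.tag << 12) | ((uint16_t)c.index << 8) | c.offset);
}
```

Note that 0x0234 and 0x1234 produce the same index (2) and offset (0x34) and differ only in the tag, so they compete for the same cache page.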
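To see how the access pattern affects the hit ratio, here is a tag-only model of the same 16-page direct-mapped layout. It is a sketch that runs on a PC with no SPI involved; `TagSim` and the helper functions are hypothetical, and 0xFF is used to mark an empty page, which is safe because real tags only range from 0 to 15.

```cpp
#include <stdint.h>

// Hypothetical tag-only model of the 16-page cache (no data, no SPI).
struct TagSim {
    uint8_t  tag[16];
    uint32_t hits;
    uint32_t misses;
};

inline void sim_reset(TagSim &s)
{
    for (int i = 0; i < 16; i++) s.tag[i] = 0xFF;  // 0xFF marks an empty page
    s.hits = 0;
    s.misses = 0;
}

inline void sim_access(TagSim &s, uint16_t addr)
{
    uint8_t p = (addr >> 12) & 0x0F;
    uint8_t n = (addr >> 8) & 0x0F;
    if (s.tag[n] == p) {
        s.hits++;
    } else {
        s.tag[n] = p;   // the new page evicts whatever was in page n
        s.misses++;
    }
}

// Misses for `count` consecutive addresses starting at `start`.
inline uint32_t misses_sequential(uint16_t start, uint16_t count)
{
    TagSim s;
    sim_reset(s);
    for (uint16_t i = 0; i < count; i++) sim_access(s, (uint16_t)(start + i));
    return s.misses;
}

// Misses when alternating between addresses a and b for `rounds` rounds.
inline uint32_t misses_alternating(uint16_t a, uint16_t b, uint16_t rounds)
{
    TagSim s;
    sim_reset(s);
    for (uint16_t i = 0; i < rounds; i++) {
        sim_access(s, a);
        sim_access(s, b);
    }
    return s.misses;
}
```

Reading 256 consecutive bytes inside one page costs a single miss. Alternating between 0x0234 and 0x1234 misses on every access because both map to page 2 with different tags, whereas alternating between 0x0234 and 0x1334 misses only twice.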

cache_write_byte is the same concept except we write the data to SPI-RAM immediately. (I will experiment with dirty caches later on).

inline __attribute__((always_inline))
void cache_write_byte(word addr, byte din)   // 0x1234
{
  byte a = (addr & 0xFF00) >> 8;            // a = 0x12
  byte p = a >> 4;                          // p = 0x01
  byte n = a & 0x0F;                        // n = 0x02
  byte r = (addr & 0x00FF);                 // r = 0x34

  if (cachePage[n] == p)
  {
    // Cache Hit !!!
    cacheRAM[n][r] = din;
    spi_write_byte_quad(0, addr, din);        // Write-thru cache :)
    return;
  }
  else
  {
    // Need to fill cache from SPI-RAM
    digitalWrite2f(LED1, HIGH);
    spi_write_byte_quad(0, addr, din);
    spi_read_byte_array_quad(0, addr & 0xFF00, 256, cacheRAM[n]);
    cachePage[n] = p;
    digitalWrite2f(LED1, LOW);
    return;
  }
}

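void cache_init()
{
  // Initialize cache from SPI-RAM. Each cache page starts with tag 0,
  // so load the matching pages (0x0000-0x0FFF); otherwise the first
  // reads in that range would "hit" on data that was never fetched.
  for(int p=0; p<16; p++)
  {
    cachePage[p] = 0;
    spi_read_byte_array_quad(0, (word)p << 8, 256, cacheRAM[p]);
  }
  Serial.println("RAM Cache - Initialized.");
}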
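For comparison with the write-through code above, here is a rough sketch of what a dirty (write-back) variant could look like. It is not the RetroShield implementation: the SPI-RAM is replaced by a plain array so the sketch runs on a PC, and every name in it (`backing`, `wb_select`, `wb_read_byte`, `wb_write_byte`, and so on) is hypothetical.

```cpp
#include <stdint.h>
#include <string.h>

// Stand-in for the SPI-RAM so the sketch runs without hardware (hypothetical).
static uint8_t  backing[65536];
static uint32_t page_writes = 0;   // counts page write-backs to the "SPI-RAM"

static void backing_read_page(uint16_t base, uint8_t *dst)
{
    memcpy(dst, &backing[base], 256);
}

static void backing_write_page(uint16_t base, const uint8_t *src)
{
    memcpy(&backing[base], src, 256);
    page_writes++;
}

static uint8_t wb_data[16][256];
static uint8_t wb_tag[16];
static bool    wb_valid[16];   // false until the page is first filled
static bool    wb_dirty[16];   // true when the page has unsaved writes

// Make sure cache page n holds the memory page containing addr.
// A modified page is written back before it is replaced.
static uint8_t wb_select(uint16_t addr)
{
    uint8_t p = (addr >> 12) & 0x0F;
    uint8_t n = (addr >> 8) & 0x0F;
    if (!wb_valid[n] || wb_tag[n] != p) {
        if (wb_valid[n] && wb_dirty[n]) {
            uint16_t old_base = (uint16_t)(((uint16_t)wb_tag[n] << 12) | ((uint16_t)n << 8));
            backing_write_page(old_base, wb_data[n]);
        }
        backing_read_page(addr & 0xFF00, wb_data[n]);
        wb_tag[n]   = p;
        wb_valid[n] = true;
        wb_dirty[n] = false;
    }
    return n;
}

uint8_t wb_read_byte(uint16_t addr)
{
    return wb_data[wb_select(addr)][addr & 0xFF];
}

void wb_write_byte(uint16_t addr, uint8_t din)
{
    uint8_t n = wb_select(addr);
    wb_data[n][addr & 0xFF] = din;   // only the cache is updated here
    wb_dirty[n] = true;              // SPI-RAM is updated when the page is evicted
}
```

The trade-off is that writes which hit the cache cost no SPI traffic, but a miss on a dirty page costs a full page write in addition to the page read.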

The code is checked in to the GitLab repository.

That’s all folks.