Domipheus Labs

Stuff that interests Colin ‘Domipheus’ Riley

Designing a CPU in VHDL, Part 13: Memory system and BIOS beginnings

Posted May 17, 2016, Reading time: 8 minutes.

This is part of a series of posts detailing the steps and learning undertaken to design and implement a CPU in VHDL. Previous parts are available here, and I’d recommend they are read before continuing.

Now we have text-mode HDMI/DVI-D output, it’s about time we started writing more code for TPU. However, we’ve not delved into too much detail yet about the memory subsystem – the part of the puzzle which reinterprets the various busses from the TPU module in VHDL and manages how data flows between different memories and/or mapped ‘registers’.

TPU memory interface

TPU has an address bus output, a data input bus and a data output bus. Generally CPUs have a single data bus and it’s bidirectional, but I opted for this current setup early on and have stuck with it.

For the most part, memory on the TPU ‘System on Chip’ is made up of Xilinx Block Rams. These are 2KB in size, and dual-ported, allowing them to be used as VRAM and TRAM (for an explanation of TRAM see the previous part in this series of posts). The rest of the memory subsystem is addressing logic for memory mapping the UART, switches, LEDs, and other peripheral I/O.

Because most memory blocks are 2KB, memory is divided into 2KB banks. The top five bits of the address bus select a bank, and each bank has its own chip select line.

-- Embedded ram
	MEM_CS_ERAM_1 <= '1' when (MEM_BANK_ID = X"0"&'0') else '0'; -- 0x00 bank
	MEM_CS_ERAM_2 <= '1' when (MEM_BANK_ID = X"0"&'1') else '0'; -- 0x08 bank
	MEM_CS_ERAM_3 <= '1' when (MEM_BANK_ID = X"1"&'0') else '0'; -- 0x10 bank
	MEM_CS_ERAM_4 <= '1' when (MEM_BANK_ID = X"1"&'1') else '0'; -- 0x18 bank
	
	-- system i/o maps
	MEM_CS_SYSTEM <= '1' when (MEM_BANK_ID = X"9"&'0') else '0'; -- 0x90 bank
	
	-- 4KB of font bitmap ram
	MEM_CS_FRAM_1 <= '1' when (MEM_BANK_ID = X"A"&'0') else '0'; -- 0xA0 bank
	MEM_CS_FRAM_2 <= '1' when (MEM_BANK_ID = X"A"&'1') else '0'; -- 0xA8 bank
	
	-- 4KB of text character ram
	MEM_CS_TRAM_1 <= '1' when (MEM_BANK_ID = X"B"&'0') else '0'; -- 0xB0 bank
	MEM_CS_TRAM_2 <= '1' when (MEM_BANK_ID = X"B"&'1') else '0'; -- 0xB8 bank

We have signals for the masked off address within any given bank, and the bank ID.

-- BRAM banks are 2KB: MEM_2KB_ADDR is the offset within a bank,
	-- MEM_BANK_ID is the 5-bit bank number
	MEM_2KB_ADDR <= MEM_O_addr and X"07FF";
	MEM_BANK_ID <= MEM_O_addr(15 downto 11);
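
To make the split concrete, here is a small simulation-only sketch that checks a few addresses against the bank IDs listed above. The entity and helper names (bank_decode_example, bank_id, bank_offset) are made up for illustration and are not part of the TPU sources.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking example, intended for a simulator only.
entity bank_decode_example is
end entity;

architecture sim of bank_decode_example is
  -- Same slice as MEM_BANK_ID: the top five address bits.
  function bank_id(addr : std_logic_vector(15 downto 0)) return std_logic_vector is
  begin
    return addr(15 downto 11);
  end function;

  -- Same mask as MEM_2KB_ADDR: the low eleven address bits.
  function bank_offset(addr : std_logic_vector(15 downto 0)) return std_logic_vector is
  begin
    return addr and X"07FF";
  end function;
begin
  process
  begin
    -- 0x9001 sits in the system I/O bank (X"9"&'0'), offset 1.
    assert bank_id(X"9001") = "10010" report "bad bank for 0x9001" severity failure;
    assert bank_offset(X"9001") = X"0001" report "bad offset for 0x9001" severity failure;

    -- 0x0FFF is the last byte of the second embedded RAM bank (X"0"&'1').
    assert bank_id(X"0FFF") = "00001" report "bad bank for 0x0FFF" severity failure;
    assert bank_offset(X"0FFF") = X"07FF" report "bad offset for 0x0FFF" severity failure;

    -- 0xB800 is the first byte of the second text RAM bank (X"B"&'1').
    assert bank_id(X"B800") = "10111" report "bad bank for 0xB800" severity failure;
    assert bank_offset(X"B800") = X"0000" report "bad offset for 0xB800" severity failure;

    wait; -- stop the process once all checks have run
  end process;
end architecture;
```

Each bank covers 0x800 bytes, which is why the bank comments in the chip select list step by 0x08 in the top byte of the address.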

The block RAM “ebram” entities are then connected like so:

ebram_1: ebram Port map ( 
    I_clk => cEng_clk_core,
    I_cs => MEM_CS_ERAM_1,
    I_we => MEM_WE,
    I_addr => MEM_2KB_ADDR,
    I_data => MEM_O_data,
    I_size => MEM_REQ_SIZE,
    O_data => MEM_DATA_OUT_ERAM_1
	);

You’ll notice the data output from the ebram goes to its own signal, MEM_DATA_OUT_ERAM_1. The actual signal that gets selected for input into the TPU core is chosen via a big asynchronous conditional:

-- select the correct data to send to tpu
	MEM_I_data <= INT_DATA when O_int_ack = '1' 	
	              else  MEM_DATA_OUT_ERAM_1 when MEM_CS_ERAM_1 = '1' 
	              else  MEM_DATA_OUT_ERAM_2 when MEM_CS_ERAM_2 = '1' 
	              else  MEM_DATA_OUT_ERAM_3 when MEM_CS_ERAM_3 = '1' 
	              else  MEM_DATA_OUT_ERAM_4 when MEM_CS_ERAM_4 = '1' 
	              else  MEM_DATA_OUT_ERAM_5 when MEM_CS_ERAM_5 = '1' 
	              else  MEM_DATA_OUT_ERAM_6 when MEM_CS_ERAM_6 = '1' 
	              else  MEM_DATA_OUT_ERAM_7 when MEM_CS_ERAM_7 = '1' 
	              else  MEM_DATA_OUT_ERAM_8 when MEM_CS_ERAM_8 = '1'
					  
	              else  MEM_DATA_OUT_FRAM_1 when MEM_CS_FRAM_1 = '1' 
	              else  MEM_DATA_OUT_FRAM_2 when MEM_CS_FRAM_2 = '1'
					  
	              else  MEM_DATA_OUT_TRAM_1 when MEM_CS_TRAM_1 = '1' 
	              else  MEM_DATA_OUT_TRAM_2 when MEM_CS_TRAM_2 = '1'
					  
	              else  MEM_DATA_OUT_VRAM_1 when MEM_CS_VRAM_1 = '1' 
	              else  IO_DATA;

We could implement all of this with bidirectional/tristate signals, but maybe that’s a discussion for another post. I’ve intentionally kept bidirectional communication to a minimum, as it can easily cause confusing situations.

So you can see it’s fairly easy to move things around and see how to attach block rams into the TPU ‘memory map’. But we also have I/O!

Memory mapped I/O

Part 9 showed how memory mapped I/O was handled, and that does not change at all. There is a process in the top-level module which monitors for memory requests at certain addresses, and manipulates the IO_DATA signal in the case of any memory reads. You can see above that the data into TPU selects IO_DATA when no memory selects or interrupts are active. If we add another peripheral, we simply edit the process handling this part of memory to update the relevant signals for any given address.

if MEM_O_addr = X"9000" and MEM_O_we = '1' then
    -- onboard leds
    IO_LEDS <= MEM_O_data(7 downto 0);
  end if;
  
  if MEM_O_addr = X"9001" and MEM_O_we = '0' then
    -- onboard switches
    IO_DATA <= X"000" & IO_SWITCH;
  end if;

Access Violations

You’ll notice that the chip selects above don’t map all banks to block rams just now. It would be useful to know about memory locations that are not currently mapped, and that should be a simple case of making sure a chip select line is active on a memory command. If a chip select is not active, we can assume the address is unmapped, and request an interrupt to the TPU core.

First, we need to OR together all those chip select lines into one all-seeing MEM_ANY_CS signal. Then, we check that signal in the I/O handler process, and if it is ever inactive during a memory operation, we know that we’re accessing unmapped memory.

MEM_proc: process(cEng_clk_core)
begin
  if rising_edge(cEng_clk_core) then
  
    if MEM_readyState = 0 then
      if MEM_O_cmd = '1' then
        
        if MEM_ANY_CS = '0' then
          -- a memory command with unmapped memory
          -- throw interrupt
          MEM_Access_error <= '1';
          MEM_Access_error_bank <= MEM_O_addr(15 downto 8);
        end if;
      
        ...snip... 

What’s missing from the code above is that, at the end of the memory command, we de-assert the MEM_Access_error signal. While the signal is asserted, any other process that sees MEM_Access_error active can use it to request an interrupt.
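
The excerpt does not show how MEM_ANY_CS is built. A minimal sketch, assuming the chip selects are gathered into a single vector, could look like the following. The any_cs_example entity, its port names and the default width of 14 are hypothetical, chosen only to keep the example self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical OR-reduction of chip select lines (names are assumptions).
entity any_cs_example is
  generic (
    NUM_CS : positive := 14  -- assumed number of chip select lines
  );
  port (
    I_cs     : in  std_logic_vector(NUM_CS - 1 downto 0); -- one bit per chip select
    O_any_cs : out std_logic                              -- '1' if any line is active
  );
end entity;

architecture rtl of any_cs_example is
begin
  process (I_cs)
    variable acc : std_logic;
  begin
    -- Start from '0' and OR in every chip select in turn.
    acc := '0';
    for i in I_cs'range loop
      acc := acc or I_cs(i);
    end loop;
    O_any_cs <= acc;
  end process;
end architecture;
```

In a design with individually named signals, a plain concurrent assignment that ORs each MEM_CS_* signal together does the same job. Either way, the system I/O chip select needs to be part of the reduction, otherwise every peripheral access would be flagged as unmapped.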

Memory Interrupt Process

This is what the memory interrupt process checks for, and acts upon.

exception_notifier: process (cEng_clk_core)
begin
  if rising_edge(cEng_clk_core) then
    if MEM_access_int_state = 0 and MEM_Access_error = '1' then
      -- unmapped access seen while idle: request an interrupt
      I_int <= '1';
      MEM_access_int_state <= 1;
      INT_DATA <= X"80" & MEM_Access_error_bank;
    elsif MEM_access_int_state = 1 and I_int = '1' and O_int_ack = '1' then
      -- core has acknowledged: drop the request
      I_int <= '0';
      MEM_access_int_state <= 2;
    elsif MEM_access_int_state = 2 then
      -- two cycles of latency before returning to idle
      MEM_access_int_state <= 3;
    elsif MEM_access_int_state = 3 then
      MEM_access_int_state <= 0;
    end if;
  end if;
end process;

This process checks each clock cycle for a memory access error, and if it notices one, it requests an interrupt and saves the top byte of the faulting address (which identifies the bank) into the lower half of the INT_DATA signal. This signal is what becomes the Interrupt Event Field, which is accessible in user code. We set the high byte of this to 0x80 to identify the interrupt type to the interrupt handler. The rest of the code in this process is simply following the exception workflow: it waits for the interrupt acknowledge, and then waits two further cycles of latency before completing.

With these changes, if an unmapped memory access occurs, our interrupt vector code is called, and when we issue the gief instruction to obtain the Event Field into a register, we’ll be able to know it’s a memory violation, and which bank the attempted access targeted.
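
As a worked example of the Event Field layout, the simulation-only sketch below builds the value the same way the process does and checks it for an access to 0x4000. That address is only assumed to be unmapped for the sake of the example, and the event_field_example entity and event_field helper are hypothetical names.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical self-checking example, intended for a simulator only.
entity event_field_example is
end entity;

architecture sim of event_field_example is
  -- Mirrors INT_DATA <= X"80" & MEM_Access_error_bank, where the bank
  -- byte is MEM_O_addr(15 downto 8).
  function event_field(addr : std_logic_vector(15 downto 0)) return std_logic_vector is
  begin
    return X"80" & addr(15 downto 8);
  end function;
begin
  process
    variable ev : std_logic_vector(15 downto 0);
  begin
    -- Assumed unmapped access at 0x4000.
    ev := event_field(X"4000");

    -- High byte identifies a memory violation, low byte carries the
    -- top byte of the faulting address.
    assert ev = X"8040" report "unexpected event field" severity failure;
    assert ev(15 downto 8) = X"80" report "wrong interrupt type" severity failure;
    assert ev(7 downto 0) = X"40" report "wrong bank byte" severity failure;

    wait; -- stop the process once all checks have run
  end process;
end architecture;
```

An interrupt handler can therefore compare the high byte against 0x80 to recognise a memory violation, then use the low byte to work out where the access landed.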

The beginnings of a BIOS

So, now that we have our text output, our UART, and a decent memory system, it’s time to start implementing a BIOS which we can leverage when building real TPU programs. So far, my bootloader contains several functions:

At the moment, when printing the bios header (using a custom glyph set for a TPU logo), the size of this code amounts to around 1.1KB – which is pretty massive when you think about it.

You can see a memory figure in the image above, and this is checked at runtime. The test iteratively searches through memory, reading every byte location, until an unmapped memory violation occurs. It assumes that the first contiguous block of RAM is the usable memory. With eight 2KB block RAMs connected to those first bank addresses, we have 16KB to play with.

This memory test really brought back old memories of how long you sometimes had to wait for the memory test to complete. The uitoa function relies heavily on divide/mod operations, and with software-only divide, things are slow. It’s a few seconds to work through that 16KB window. But, I quite like the fact it is slow enough that you can see the searches happen in real time.

With that, I was tempted to integrate a startup beep like old times. And, well, I’m going to do what @mmalex tells me to do in this instance!

The audio is a simple square wave through the headphone jack which I’d forgotten existed on the miniSpartan6+ board. I now have a memory-mapped register which controls its operation, allowing you to activate the left or right output channels. I’ll add the ability to control the tone later – for now, it’s just a cool bios beep!
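
For readers wondering what such a beep generator might look like, here is a minimal sketch of a fixed-tone square wave with per-channel enables. It is not the TPU implementation: the beep_example entity, its ports and the HALF_PERIOD generic are all assumptions, and the enable inputs stand in for bits of the memory-mapped register.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical fixed-tone square wave generator (all names are assumptions).
entity beep_example is
  generic (
    -- Clock cycles per half wave. With an assumed 50 MHz clock,
    -- 25000 cycles per half wave gives a 1 kHz tone.
    HALF_PERIOD : positive := 25000
  );
  port (
    I_clk      : in  std_logic;
    I_enable_l : in  std_logic; -- assumed register bit: left channel on
    I_enable_r : in  std_logic; -- assumed register bit: right channel on
    O_audio_l  : out std_logic;
    O_audio_r  : out std_logic
  );
end entity;

architecture rtl of beep_example is
  signal count : natural range 0 to HALF_PERIOD - 1 := 0;
  signal wave  : std_logic := '0';
begin
  -- Free-running divider: flip the wave every HALF_PERIOD clock cycles.
  process (I_clk)
  begin
    if rising_edge(I_clk) then
      if count = HALF_PERIOD - 1 then
        count <= 0;
        wave  <= not wave;
      else
        count <= count + 1;
      end if;
    end if;
  end process;

  -- Gate the shared wave onto each channel independently.
  O_audio_l <= wave and I_enable_l;
  O_audio_r <= wave and I_enable_r;
end architecture;
```

Controlling the tone later would mostly be a matter of replacing the HALF_PERIOD generic with a value loaded from another mapped register.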

Wrap Up

That pretty much explains the current memory subsystem in a bit more detail, and hopefully shows that TPU is now starting to really behave like a vintage computer. I aim to develop the BIOS more, and have another post up my sleeve talking about a new instruction, and further BIOS progress.

Thanks for reading, as always let me know your thoughts via twitter @domipheus.