ECEA 5362 - Softcore Processor Development Flow

  1. Soft processor designs allow more flexibility: you can change either the code or the HDL.
  2. You can migrate to future FPGA devices while keeping the same C code, potentially onto a lower-power/lower-cost device or onto an ASIC.
  3. Soft processors have JTAG code trace tools (multi/trace32/xmtc) to follow instruction and data traces.
  4. You can add operating systems/RTOSes (eCos/uClinux/FreeRTOS).
  5. Simulink/MATLAB can be used to develop software that runs on the softcore processor.
  6. Nios II comes in Economy (/e) and Fast (/f) variants. See the table at the end of this section for a comparison.
  7. 3rd-party cores are available: 8051 (CAST), 68000 (Digital Core Design), 80816 (iWave).
  8. PicoBlaze from Xilinx is a small and efficient 8-bit soft processor.
  9. Each vendor has a dedicated bus structure: Intel-Altera Nios II uses Avalon, Xilinx MicroBlaze uses AXI, and Microchip-Microsemi ARM Cortex-M1 uses AHB.
  10. Bus structure allows for custom/dedicated peripherals to be attached to the softcore

Nios II Comparison Table

Feature              Nios II/e   Nios II/f
Pipeline             1 stage     6 stages
Branch prediction    None        Dynamic
HW multiplication    Software    1-cycle HW
Inst/data cache      None        0.5-64 KB
MMU                  None        Available
Area                 540 LEs     1600 LEs
Frequency            195 MHz     140 MHz
Performance          18 MIPS     145 MIPS
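In the table, the /e core has the higher clock yet delivers far fewer MIPS. Dividing throughput by clock rate gives MIPS per MHz, which explains the gap. Dividing by area gives a rough efficiency per logic element. This is plain arithmetic on the table's numbers, not a new benchmark:

```latex
\begin{aligned}
\left.\frac{\text{MIPS}}{\text{MHz}}\right|_{e} &= \frac{18}{195} \approx 0.092
  \quad(\approx 10.8\ \text{clock cycles per instruction}),
&\left.\frac{\text{MIPS}}{\text{MHz}}\right|_{f} &= \frac{145}{140} \approx 1.04\\
\left.\frac{\text{MIPS}}{\text{LE}}\right|_{e} &= \frac{18}{540} \approx 0.033,
&\left.\frac{\text{MIPS}}{\text{LE}}\right|_{f} &= \frac{145}{1600} \approx 0.091
\end{aligned}
```

So the /f core does roughly 11 times more work per clock and about 2.7 times more work per LE. Vendor MIPS figures are often Dhrystone MIPS. If that is the case here, the per-MHz ratio is a DMIPS/MHz rating rather than a true instructions-per-clock count, which is why it can exceed 1 on a single-issue pipeline.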

Soft Processor Flows

  1. What are the design's computation requirements? How much memory bandwidth is needed: DDR3 or DDR4?
  2. Do you need a faster or smaller processor, different interfaces, hardware acceleration, or off-loading (e.g., DMA, Direct Memory Access)?
  3. The Nios II Platform Designer flow (formerly called Qsys) is used to make these design choices. It can create processors, peripherals, and memories. Once the design is compiled and loaded, the FPGA's logic cells and flip-flops implement the processor configuration. This configuration is volatile, so the FPGA must be reprogrammed at every power-on.
  4. Eclipse uses information from the .sopcinfo file to configure the processor's software project. system.h defines symbols that describe the hardware. The Executable and Linkable Format file (.elf) is the result of compiling C/C++ code. The hexadecimal (Intel-format) file (.hex) holds initialization data for on-chip memories. Flash memory programming data holds, e.g., boot code.
  5. List of available soft cores on the market (2020), compared below.
  6. Soft core comparison. Vendors: Nios II = Altera, Cortex-M1 = ARM, MicroBlaze and PicoBlaze = Xilinx, CoreABC and Core8051 = Microsemi, Mico32 and Mico8 = Lattice.
      Feature                  Nios II                         Cortex-M1  MicroBlaze            PicoBlaze  CoreABC  Core8051  Mico32    Mico8
      Datapath                 32 bits                         32 bits    32 bits               8 bits     8/16/32  8 bits    32 bits   8 bits
      Pipeline stages          1-6                             3          3-5                   1          1        1         6         1
      Frequency (MHz)          340                             200        332                   240        92       52        115       52.2
      Gate count               26k-72k                         2.6k-4k    30k-60k               500-2k     420-4k   3k-5k     20k-25k   2k-3k
      Register file            32 gen. + 32 special            16         32 gen. + 32 special  1          1        32        32        16 or 32
      Instruction word         32 bits                         16/32      32 bits               18 bits    8/16     8/16      32        8
      Instruction cache        Optional                        No         Optional              No         No       No        Optional  No
      Hardware multiply        Optional                        Yes        Optional              No         No       No        Yes       No
      Hardware floating point  Optional                        No         Optional              No         No       No        Optional  No
      OS support               eCos, uC/OS-II, Linux, uClinux  RTX        Linux                 --         --       --        --        --
  7. An FPGA-based processor system is equivalent to a microcontroller: a processor plus a combination of peripherals and memory on a single chip. It includes the processor core, on-chip peripherals, on-chip memory, and interfaces to off-chip memory. All such processor systems use a consistent instruction set and programming model.
  8. The Nios II core is a 32-bit RISC processor with a full 32-bit instruction set. It offers an optional memory management unit (MMU) for operating systems that require one, and an optional memory protection unit (MPU). The MPU and MMU are mutually exclusive.
  9. It integrates with the FPGA's SignalTap embedded logic analyzer, enabling real-time analysis of instructions and data inside the FPGA.
  10. The Nios II ALU can be extended with DSP functions or FIR filters. Adding these hardware functions increases on-chip resource usage.
  11. The Nios II core has thirty-two 32-bit general-purpose integer registers plus control registers. It may also have shadow register sets, which are transparent to application code and speed up context switches (for example, when entering an interrupt handler).
  12. reset - global hardware reset signal that forces the processor core to reset immediately.
  13. cpu_resetrequest - optional local reset signal that causes the processor to reset without affecting other components in the system.
  14. debug_reset_request - allows the JTAG debugger to reset the processor.
  15. debugreq - optional signal that temporarily suspends the processor for debugging purposes.
  16. reset_req - optional signal that prevents memory corruption by performing a memory handshake before reset.
  17. EIC (external interrupt controller) interface - speeds up interrupt handling in a complex system.
  18. EC - a non-vectored exception controller handles all exception types; each exception causes the processor to transfer execution to the exception address.
  19. IC (internal interrupt controller) - 32 level-sensitive interrupt request inputs, irq0-irq31, providing a unique input for each interrupt source.
  20. Separate instruction and data buses form a Harvard architecture. Both buses are implemented as master ports.
  21. The data master port connects to both memory and peripheral components; the instruction master port connects only to memory. They share a combined address map, so instructions and data are in the same address space.
  22. The instruction master port fetches the instructions executed by the processor. It is pipelined and never performs writes. Cache and TCM (tightly coupled memory) are supported.
  23. The data master port reads data from memory or a peripheral when the processor executes a load instruction, and writes data to memory or a peripheral when it executes a store instruction. Cache and TCM are supported.
  24. Caches are optional for both ports and, when present, are always enabled at runtime. Software has methods to bypass the cache. Cache management and cache coherency are handled by software.
  25. Tightly coupled memory (TCM) places memory closest to the CPU for fast access, while still occupying the normal address space.
  26. The MMU maps a 4 GB virtual address space to physical memory using 4 KB pages and frames. TLBs (translation lookaside buffers) accelerate address translation.
  27. The MPU sets read and write access permissions for data regions, and execute permissions for certain (instruction) regions.
  28. The Nios II core has a JTAG debug module, which lets a host PC control the core.
  29. JTAG debug instructions can operate on either the data bus or the instruction bus.
  30. The Nios II BSP (board support package) is compiled separately from the application. When you start a new project, a BSP is created for you. It controls settings such as stack size, stdin/stdout sources, system timers, and code size.
  31. .sof files are loaded directly into configuration RAM (CRAM) while the device is powered and are lost at power-off. .pof files are programmed into configuration flash memory (CFM), which configures the device at power-on.
  32. All major vendors offer IP cores: Intel-Altera, Xilinx, Microchip-Microsemi, and Lattice. You can instantiate IP cores in your HDL using their top-level port definitions, add them graphically as block diagram elements, or use a system design tool such as Qsys.
  33. Microsemi provides DirectCore (developed by Microsemi) and CompanionCore (3rd party) IP. The majority are available for free; CompanionCore IP usually costs money. Libero can be used to work with Microsemi cores.
  34. Intel-Altera cores are sometimes freely available and sometimes usable only in evaluation mode until paid for. Intel delivers IP through the IP Catalog in Quartus. 3rd-party IP cores are also available, and a core can be used directly from source if the source is included.
  35. Xilinx provides a lot of IP in Vivado, its FPGA design tool. The majority of its FPGA cores are free, and 3rd-party cores are also available.
  36. Lattice uses the Diamond design tool, where IP is generated through IPexpress. It offers modules and IP. Modules are basic configurable blocks that come with IPexpress; IP are more complex blocks that may not come with the tool. Both can be instantiated like any other module in Verilog/VHDL. Not all modules are compatible with a particular device, and 3rd-party IP cores may cost money.
  37. DO files are scripts that issue many simulator commands at once. They are plain text files, similar to Tcl scripts (and can also contain Tcl commands).
  38. Simulation concepts: the simulation cycle (stimulus/response), simulation timing, sensitivity lists, signal drivers, and resolution functions. VHDL code consists of numerous concurrent statements or processes. Stimulus changes at a specific simulation time, processes evaluate the resulting logic changes, and signal values change after a specific time has elapsed.
  39. At initialization (t = 0), every signal takes the leftmost value of its type: '0' for bit, FALSE for boolean, 'U' for std_logic/std_ulogic, and the most negative value for integer (integer'low, typically -2^31). Each simulation cycle then runs as follows. (1) Update phase: every signal with a transaction scheduled for the current time changes value; simulation time does not change. (2) Each process that is sensitive to a signal that changed in this cycle is executed. (3) The simulation time of the next cycle is determined. It is the earliest time at which any signal has a transaction scheduled or any process is scheduled to resume. If a zero-delay assignment is pending, it is instead a delta delay (an infinitesimal step) at the same simulation time.
  40. When a process is executed, its statements run sequentially. A process is triggered by an event on a signal in its sensitivity list, or resumes after a wait statement.
  41. For example, process (a, b, s) executes whenever any of a, b, or s has an event. A process with no sensitivity list starts running at t = 0 and suspends only at its wait statements.
  42. Correct sensitivity lists are important for making synthesis and simulation agree. Synthesis tools generally ignore the list, so an incomplete list makes the simulation behave differently from the synthesized hardware.
  43. A delta delay is the default propagation delay of a signal assignment when no delay is explicitly specified. It lets assignments take effect without advancing simulation time, which is how concurrency is emulated. The waveform window shows real time values such as 1 ns; you normally do not see delta delays there.
  44. # delays (e.g., #10) will not synthesize. Be careful when mixing blocking (=) and non-blocking (<=) assignments in a Verilog always block; it can be confusing what gets assigned where.
  45. The SignalTap II internal logic analyzer (ILA) can be used to examine the behavior of internal signals. Physical probes are often impractical because of the density of these devices. SignalTap stores captured signal data in device memory until it is read out and analyzed. The analyzer is synthesized into the design as an intermediary between the monitored logic and device memory. To use it, add it through Quartus, pick the signals you want, recompile, load the design onto the device, and run the analysis in the SignalTap tool.