



# First High-Energy RISC-V Instrumentation SOC (HERIS) for CEPC Readout ASICs

Reporter: Cui Yuxin

**Team:** Yan Qi, Chen Jiaolong, Luo Shoudong, Wang Hanwen, Zhang Yihan

On behalf of the CEPC Silicon Tracker & the HERIS Group

> Introduction

> R&D Process

> Future Plan

# Reconfigurable Intelligent SoC (System On Chip)

#### **Challenges**

- High cost and complexity of advanced process nodes
  - Expensive tape-out iterations for advanced nodes
  - Requires first-pass success, and the verification is also highly challenging.
- Limitations of traditional ASICs
  - Fixed functionality, weak configurability
  - Difficult to integrate on-chip algorithms

#### **New Design Paradigm**

- ➤ Introducing RISC-V based SoC architecture for hardware-software co-design
  - Programmability: Adapts to multiple applications via software configuration
  - Modularity & IP Reuse: Enhances design collaboration and efficiency
  - Can be integrated into existing ASIC chips

# RISC-V in High Energy Physics

To address future HEP experiment needs, CERN's RISC-V application focuses on two core directions:

- > Building highly reliable monitoring and control systems
- > Developing intelligent on-chip data processing











# RISC-V Applied to LATRIC

At the initial research stage, we selected the beginner-friendly Tiny-RiscV soft core, implementing a dynamically

configurable DAC module and other algorithms on OTK.



Schematic FPMROC



Schematic Tiny-RiscV



Schematic LATRIC

- Supports RV32IM instruction set
- Employs a three-stage pipeline (Fetch, Decode, Execute).
- Capable of running C programs
- Supports JTAG for online program updates via OpenOCD.
- Supports interrupts、FreeRTOS
- Easily portable to any FPGA platform.

—, Background Introduction

**R&D Process** 

三、Future Plan

# Tiny-RiscV Architecture Diagram

#### **HERIS**

- Core (Tiny RiscV)
  - RV32IM Core
  - 3 stage pipeline
  - CoreMark/MHz = 2.4
- JTAG Interface
  - OpenOCD support
  - GDB debugging support
- Peripherals
  - 4 KB ROM, 32 KB RAM
  - Supports multiple serial communication protocols, such as I2C, UART, and SPI
  - Integrated sensor data processing and UDP forwarding



- 55 nm process technology
- Frequency 50 MHz
- Size 1020 x 1196 um
- Design verification completed and submitted for tape-out in October
- Testing is scheduled to begin after the chips return, expected in January 2026

# Adding I2C Master Module

- Selected a mature I2C host from Opencores (https://opencores.org).
- Wrapped the host with an interface layer and mounted it on the bus.



#### Simulation result



#### 12C Module FPGA Verification

Used the onboard AT24C64 EEPROM memory chip for functional validation.

#### **C** Program



#### **ILA(Integrated Logic Analyzer) results**



#### **UART** output results



# Optimizing & Testing SPI Master Module

- Replaced the original SPI module with a more mature SPI Master, and added an interface wrapper
- Successfully verified module functionality on FPGA using the W25Q64 SPI Flash memory chip

W25Q64



**UART** output results



#### **ILA(Integrated Logic Analyzer) results**



# Adding ADC-Ethernet Module

Receives serial input data, processes it, and transmits it to the host PC via UDP protocol



#### On-board Validation



#### **Data reading**

Signal@CH1: 519



# **Optimizing Boot Method**

Tiny-RiscV (original using Block RAM as 32KB ROM)

- Boot only via JTAG, Lacks self-start capability, poor reliability and convenience.
- Large ROM area overhead



New Solution: Use SPI-based SD Card as external storage

- > ROM contains bootloader (reduced from 32KB to 4KB)
- On power-up, the bootloader loads the program from the SD card

#### Output from SD card test



# **Boot-Loader Design**

- Current RAM size: 32KB (30KB available for programs, top 2KB reserved for bootloader variables).
- Added UART output for debugging.

#### Linker Script

```
* bootloader.lds (Corrected Final Version)
 * Defines separate RAM regions to guarantee isolation. This version uses
 * direct calculation within the MEMORY block for maximum compatibility.
OUTPUT ARCH( "riscv" )
ENTRY(_start_bootloader)
/* Define constants for easy modification */
RAM TOTAL LENGTH = 16K:
BOOTLOADER RAM SIZE = 2K; /* Reserve 2KB at the top of RAM for the bootloader */
MEMORY
                    : ORIGIN = 0x000000000, LENGTH = 4K
  /* RAM available for the application. Its length is calculated directly. */
  app_ram (wxa!ri) : ORIGIN = 0x10000000, LENGTH = RAM_TOTAL_LENGTH - BOOTLOADER_RAM_SIZE
  /* RAM reserved for bootloader's private use. Its origin is calculated directly. */
  boot ram (wxa!ri) : ORIGIN = 0x10000000 + (RAM TOTAL LENGTH - BOOTLOADER RAM SIZE), LENGTH = BOOTLOADER RAM SIZE
SECTIONS
  /* Place code and read-only data in ROM */
   KEEP (*(SORT_NONE(.init)))
    *(.text .text.*)
    *(.rodata .rodata.*)
  /* Place the bootloader's .bss section into its reserved RAM region */
    . = ALIGN(4);
    bss start = .;
    *(.bss .bss.*)
    *(COMMON)
    . = ALIGN(4);
    __bss_end = .;
  } >boot_ram
  /* The stack pointer is initialized to the top of the bootloader's RAM region */
  _stack_start = ORIGIN(boot_ram) + LENGTH(boot_ram);
```

#### Startup Assembly File

```
1 // bootloader/bootloader_start.S
      .globl _start_bootloader
     start bootloader:
        // 1. Initialize the stack pointer to the top of the main RAM.
         // The linker script provides the ' stack start' symbol.
         la sp, _stack_start
        // Ontional: Clear the .hss section for the hootloader.
         // For a minimal bootloader, this step can often be skipped if you know
         // your variables don't rely on being zero-initialized. But it's good practice.
         la t0, __bss_start
        la t1, _bss_end
     .L clear bss boot:
         beq t0, t1, .L_clear_bss_boot_done
         sw zero. (t0)
         addi t0, t0, 4
         i .L clear bss boot
     .L_clear_bss_boot_done:
        // 2. Jump to the C entry point of the bootloader
22
23
         // If boot_main_c returns, hang here.
         j .L_hang
```

#### **Boot program**

```
10 // ... (配置宏定义保持不变) ...
11 #define APP START BLOCK ON SD CARD 0 // 使用方案A, 从块0读取
12 #define APP_RUN_ADDR_ON_CHIP_RAM 0x10000000
    #define ON CHIP RAM SIZE BYTES (14 * 1024)
    #define MY_SD_CARD_SPI_CHANNEL
    typedef void (*app_entry_t)(void);
     const app_entry_t app_entry = (app_entry_t)APP_RUN_ADDR_ON_CHIP_RAM;
     Explain code | Complete comments | Locate code bugs | Generate unit tests | Code Review | Close
     void boot_main_c() {
        // STEP 1: 初始化UART,这是我们唯一的调试窗口
        // 注意: 这里的uart init()必须能够独立工作,不依赖于主程序中的任何东西。
        // 发送一个启动信号,如果能在串口看到这个,说明Bootloader至少启动了
        DEBUG PRINTF("\n\n--- RISC-V Bootloader Started ---\n");
        DEBUG PRINTF("Initializing SD card on SPI channel %d...\n", MY_SD_CARD_SPI_CHANNEL);
        if (sd card init(MY SD CARD SPI CHANNEL) != 0) {
            DEBUG PRINTF("!!! FATAL: SD Card initialization failed. Halting. !!!\n");
            while(1); // 如果失败, 打印信息并死循环
        DEBUG PRINTF("SD Card initialized successfully.\n"):
        uint32_t num_blocks_to_read = (ON_CHIP_RAM_SIZE_BYTES + SD_BLOCK_SIZE - 1) / SD_BLOCK_SIZE;
        DEBUG_PRINTF("Target RAM size: %d bytes. Need to read %d blocks from SD card.\n", ON_CHIP_RAM_SIZE_BYTES, num_blocks_to_read);
        DEBUG PRINTF("Starting to copy data from SD:@0x%X to RAM:@0x%X...\n", APP START BLOCK ON SD CARD, APP RUN ADDR ON CHIP RAM);
        for (uint32_t i = 0; i < num_blocks_to_read; i++) {
            uint32_t source_block = APP_START_BLOCK_ON_SD_CARD + i;
            uint8_t* destination_address = (uint8_t*)APP_RUN_ADDR_ON_CHIP_RAM + (i * SD_BLOCK_SIZE);
            // 可以在这里加一个简单的进度指示
            if ((i % 4) == 0) { // 每4个块打印一个点
                DEBUG PRINTF(".");
            if (sd card read block(source block, destination address) != 0) {
                DEBUG PRINTF("\n!!! FATAL: Failed to read block %d. Halting. !!!\n", source block);
                while(1); // 读取失败, 打印信息并死循环
        DEBUG_PRINTF("\nData copy complete.\n");
        // STEP 5: 准备跳转
        DEBUG_PRINTF("Bootloader finished. Jumping to application at 0x%X...\n", app_entry);
        // 在跳转前可以加一个极短的延时,确保串口信息都发出去了
        for(volatile int i=0; i<10000; ++i);
        // 跳转
        app_entry();
        // 如果程序能执行到这里,说明跳转失败了
        DEBUG_PRINTF("!!! FATAL: Application returned to bootloader. Halting. !!!\n");
```

#### Hardware Code Modifications for Bootloader

- 1、Fetch Module: Controlled by internal signals rst\_n, flush\_i, stall. No direct inputs from external debug
- 2、JTAG Module Input: In jtag\_dm (Debug Module), based on RISC-V Debug Spec. Modified behavior when writing to dpc (Debug Program Counter) CSR.

```
modulas
 always @ (posedge clk or negedge rst_n) begin
     // 复位
     if (!rst n) begin
        pc <= `CPU RESET ADDR;
        pc prev <= 32'h0;
     // 冲刷
     end else if (flush i) begin
        pc <= flush addr i;
     // 暂停, 取上一条指令
     end else if (stall) begin
        pc <= pc prev;
     // 取下一条指令
     end else begin
        pc <= pc + 32'h4;
        pc prev <= pc;
     end
 end
```

```
end else begin

// when write dpc, we reset cpu here
if (data[15:0] == DPC) begin

//dm_reset_req <= 1'b1;

jtag_pc_we <= 1'b1;

jtag_pc_wdata <= data0;

dm_halt_req <= 1'b0;

dmstatus <= {dmstatus[31:12], 4'hc, dmstatus[7:0]};

end else if (data[15:0] < 16'h1020) begin

dm_reg_we <= 1'b1;

dm_reg_wdata <= data0;
end</pre>
```

When JTAG attempts to write to DPC, the current hardware's response is not to update the CPU's PC, but rather to assert a reset request signal

3. Reused existing flush mechanism (which flushes pipeline and restarts fetch from new address). JTAG module "borrows" this mechanism to update PC.

```
// When JTAG writes to PC, use the new address provided by JTAG; otherwise, use the normal jump address.
assign flush_addr_o = jtag_pc_we_i ? jtag_pc_wdata_i : jump_addr_i;

// Flush the pipeline on a normal jump, a CLINT stall, or a JTAG PC write.
assign flush_o = jump_assert_i | stall_from_clint_i | jtag_pc_we_i;
```

#### Post-modification: Normal startup via GDB is successful



# C Program Modifications & FPGA Validation

C programs compiled using Ids linker script and start\_s script, configured for the bootloader.

Method 1: ROM + SD Card



Debug output messages from successful ROM + SD Card boot shown

Method 2: GDB Debug Startup

```
For help, type "help".
# file: debug.gdb
                                       Type "apropos word" to search for commands related to "word"...
target remote localhost:3333
                                       Reading symbols from .\gpio...
                                       0x000000000 in ?? ()
monitor reset halt
                                       JTAG tap: riscv.cpu tap/device found: 0x1e200a6f (mfg: 0x537 (Wuhan Xun Zhan Electronic Technology), part: 0xe200, ver:
load
                                       0x1)
                                        Loading section .init, size 0x6c lma 0x10000000
set pc = 0x10000000
                                        Loading section .text, size 0x260 lma 0x1000006c
                                       Start address 0x10000000, load size 716
b main
                                       Transfer rate: 99 KB/sec, 358 bytes/write.
                                       Breakpoint 1 at 0x100001a8: file main.c, line 9.
```

#### **RAM Modification**

Modified RAM architecture to implement byte-select functionality

- Changed 32-bit RAM to 4 combined 8-bit RAMs.
- > Added a Verilog model for SD card simulation for pre- and post-layout verification.













**Pre-Simulation** 

**FPGA Validation** 

# **Digital IC Design Process**



# **Peripheral Post-Simulation**

• SPI Post-simulation is normal (The MISO signal shows 'X' (unknown state) because the SD program only uses a portion of the 32KB space; the remaining unused sections exhibit 'X')



Added the AT24C32 simulation model. Post-simulation output is normal



Timer & ADC-Eithernet: the simulation results are normal

```
=== Starting Timer Module Test Suite =====
                                                                                                                                            Verdi* : Begin traversing the scope (tinyriscv_soc_tb_sd), layer (0).
 unning Test: Basic Interrupt...
                                                                                                                                           *Verdi* : fsdbDumpon - All FSDB files at 0 ns.
                                                                                                                                                              200] Test starting...
                                                                                                                                                              200] --- Starting Split RAM Fast Load ---
Running Test: Register Read/Write...
                                                                                                                                                              200] Using base file path: source/testbench/my_adc/my_adc
 -> PASS
                                                                                                                                                              200] --- Split RAM Fast Load Complete --
Running Test: Counter Increment...
                                                                                                                                                              200] Reset released. CPU and automated ADC test start.
                                                                                                                                                             200] UART Monitor: Started. Watching tx_pin.
200] UART Monitor: CLK_FREQUENCY= 50000000, BAUD_RATE= 115200
 -> PASS
 unning Test: Polling Mode...
 -> PASS
tunning Test: Stop and Reset...
                                                                                                                                             stination IP set to: 192.168.185.240 (0xC0A8B9F0)
                                                                                                                                             P Config Reg (0x10) written with: 0x00030400
                                                                                                                                            DP Config Reg (0x10) read back: 0x00030400
Running Test: Byte Enables...
                                                                                                                                            DP Config Reg (0x10) written with: 0x00030400
                                                                                                                                           UDP Config Reg (0x10) written with: 0x00030400
UDP Config Reg (0x10) written with: 0x00030400
UDP Config Reg (0x10) written with: 0x00030400
UDP Config Reg (0x10) written with: 0[ 35000200] Simula
Sfinish called from file "source/testbench/Auto_test.v", line 207.
 -> WARNING: Byte-write to timer registers is not supported by hardware. Test skipped
  -> PASS
                                                                                                                                                                                                  350002001 Simulation finished after sending some packets.
Running Test: Boundary Value (VALUE=0)...
                                                                                                                                             inish at simulation time
                                                                                                                                                                                    35000200
 -> PASS
                                                                                                                                                    VCS Simulation Report
                                                                                                                                             me: 35000200 ns
 === Test Suite Finished =====
                                                                                                                                            PU Time: 1314.340 seconds;
                                                                                                                                                                                 Data structure size: 25.5Mb
Result: 7 / 7 tests passed.
                                                                                                                                             U time: 4.425 seconds to compile + .464 seconds to elab + .408 seconds to link + 1314.384 seconds in simulation
  L TESTS PASSED!
                                                                                                                                              di KDB elaboration done and the database successfully generated: 0 {\sf error}({\sf s}), 0 {\sf warning}({\sf s})
                                             △ ▼ -23,513,940 x lns Q Q № By: F ▼  Goto: G1 ▼
```

### **Core Post-Simulation**

➤ Used official RISC-V test suite,All instructions passed the tests, with some results shown below.

```
200] Using base file path: source/testbench/core_test/I-DELAY_SLOTS-01.elf
200] --- Split RAM Fast Load Complete ---
200] Reset released. CPU starts execution from ROM.
200] UART Monitor: Started. Watching tx_pin.
200] UART Monitor: CLK_FREQUENCY= 500000000, BAUD_RATE= 115200
300] CPU halted signal detected. Starting verification.
3310] CPU halted signal detected!
3510] Signature range found in RAM: Start Addr=0x10002000, End Addr=0x10002020
```

```
200] Using base file path: source/testbench/core_test/I-ENDIANESS-01.elf
200] --- Split RAM Fast Load Complete ---
200] Reset released. CPU starts execution from ROM.
200] UART Monitor: Started. Watching tx_pin.
200] UART Monitor: CLK_FREQUENCY= 50000000, BAUD_RATE= 115200
300] CPU halted signal detected. Starting verification.
2310] CPU halted signal detected!
2510] Signature range found in RAM: Start Addr=0x10002010, End Addr=0x10002030
```

#### **Batch Testing Process:**

- 1、Modified compile/link files to generate .bin files compatible with current SoC.
- 2. Split .bin files for different instructions into 4 sub-files for the RAMs.
- 3. Batch loaded and ran programs.
- 4. Verified data by checking specific memory addresses defined in source code.

Our SOC works as expected

—, Background Introduction

**R&D Process** 

**=**. Future Plan

# **Development Plan of HERIS-V2**

Migrate to PULP platform's Pulpissimo SoC (Already running on FPGA, passed some program tests)



- Radiation Hardening Design
  - ECC(Error Correction Code)
  - LockStep + ECC
  - Full TMR (Triple Modular Redundancy)

- Freely switchable cores: RI5CY and IBEX
- Autonomous I/O Subsystem (uDMA)
- New Memory Subsystem
- Support for Hardware Processing Engines
- New simple interrupt controller
- Rich peripherals
- Multi-level internal/external bus configuration



Cite by CERN AMAC ASIC

#### **Futrure Plan**

➤ Near-term Goal: Design test board, await chip return. Implement monitoring functions (e.g., temperature, voltage) similar to AMAC/TETRISC. Integrate with LATRIC for dynamic configuration (e.g., DAC)







81 signal, 35 other

AMAC by ATLAS

https://doi.org/10.1088/1748-0221/20/08/P08003

TETRISC SOC by IHP

Pads

https://doi.org/10.1016/j.microrel.2023.115173

- ➤ Long-term Goal: Leverage RISC-V's openness and customizability to build an intelligent edge data processing platform specifically for High-Energy Physics (HEP) experiments
- > Vision: Equip every CEPC ASIC with an "Intelligent Brain"