

## 从设计到流片: 国内高能物理首款RISC-V芯片工程实现

报告人: 崔宇鑫

合作导师: 严琪 研究员

研发成员: 崔宇鑫、陈娇龙、骆首栋

测试团队: 王翰文、张奕晗

时间: 2025-10-24

# 一、背景简介

二、研发历程

三、后续规划

## 面向高能物理的可重构智能SoC (System on Chip)

#### 一、背景与挑战(Why)

- ▶ 先进工艺成本高、设计复杂
  - 先进节点流片迭代价格昂贵
  - 要求首次设计成功,设计验证难度大
- ➤ 传统ASIC局限性明显
  - 功能固定、可配置性弱
  - 难以集成片上算法,更新与调试成本高
  - 无法满足高能物理实验的多样化与快速迭代需求

#### 二、新的设计范式(How)

- ➤ 引入RISC-V的SoC架构,实现软硬件协同
  - 可编程性:通过软件配置实现多应用适配
  - 模块化与IP复用:提升设计协同与效率
  - 标准化互联:确保不同模块兼容性,构建开放生态
  - 智能芯片SoC是ASIC发展的未来趋势





## Risc-V在高能物理的应用(CERN DRD 7.2 相关Risc-V芯片)

为了应对未来高能物理实验需求, CERN在Risc-V的应用主要聚焦于两个核心方向:

- 构建高可靠性的监控与控制系统。
- ➤ 发展智能化的片上数据处理。将Risc-V作为主控核心,结合AI硬件加速模块协同工作,可以在探测器前端 直接对海量数据进行智能分析和事件筛选,提升发现新物理现象的效率。









TriglaV



## RISC-V应用于LATRIC

研究的起步阶段,我们选用入门友好的Tiny-Riscv软核,在OTK上实现一个动态设置的的DAC模块和其他算法



FPMROC示意图



Tiny-RiscV架构图



LATRIC示意图

- 支持RV32IM指令集;
- 采用三级流水线,即取指,译码,执行;
- 可以运行C语言程序;
- 支持JTAG,可以通过openocd在线更新程序;
- 支持中断:
- 支持总线;
- 支持FreeRTOS;
- 支持通过串口更新程序:
- · 容易移植到任何FPGA平台;

# 一、背景简介

二、研发历程

三、后续规划

## 最终SOC架构

- ➤ 核心部分 (Tiny RiscV)
  - RV32IM 指令集
  - 三级流水线
  - CoreMark/MHz = 2.4
- ➤ JTAG 接口
  - 支持OpenOCD
  - 支持GDB 调试启动
- ▶ 外设
  - 4 KB ROM, 32 KB RAM
  - 支持I2C、UART、SPI
  - 集成ADC数据收集模块并通 过UDP协议传输到上位机





- 55 nm 工艺制程
- 工作频率 50 MHz
- 面积 1020 x 1196 um
- 工作电压 1.2 V
- 10月份第一版设计验证完成并已经提交流片
- 正式芯片测试预计于2026年1月回片后开展

## Risc-V程序运行示例

#### C程序源码

int a = 0; // 结果 int b = 10; // 操作数1 int c = 20; // 操作数2

void main() { a = b + c; // 核心操作 }



| 汇编指令                      | 机器码        | 说明                           |
|---------------------------|------------|------------------------------|
|                           |            |                              |
| la x5, a (伪指令)            | (展开为两条指令)  | la: 加载a的地址到x5寄存器 (作为数据段基址)   |
| lw x6, 4(x5) (Load Word)  | 0x0042A303 | 从x5+4地址 (b的位置) 加载值(10)到x6寄存器 |
| lw x7, 8(x5) (Load Word)  | 0x0082A383 | 从x5+8地址 (c的位置) 加载值(20)到x7寄存器 |
| add x6, x6, x7 (Add)      | 0x00730333 | 将x6和x7的值相加,结果(30)存回x6        |
| sw x6, 0(x5) (Store Word) | 0x0062A023 | 将x6寄存器的值(30)存到x5+0地址 (a的位置)  |





0x10008008:0x00730333

0x1000800C:0x0062A023

译码



执行

| 字段 | funct7       | rs2          | rs1          | funct3       | rd          | opcode     |
|----|--------------|--------------|--------------|--------------|-------------|------------|
| 位域 | [31:25] (7位) | [24:20] (5位) | [19:15] (5位) | [14:12] (3位) | [11:7] (5位) | [6:0] (7位) |
| 用途 | 功能码7 (区分指令)  | 源寄存器2        | 源寄存器1        | 功能码3 (区分指令)  | 目的寄存器       | 操作码 (指令类型) |
|    | 0000000      | 00111        | 00110        | 000          | 00110       | 0110011    |

## 添加I2C主机模块

I2C主机选择开源网站Opencores (https://opencores.org/)提供的较为成熟版本,并对主机做一层封装挂载于总线

## 主机状态转移图



#### 仿真程序图



## I2C模块FPGA验证

#### I2C使用开发板上自带的AT24C64 EEPROM 存储芯片验证模块功能

#### C程序



#### ILA调试结果



#### UART输出结果



## 优化并测试SPI主机模块

更换原有SPI模块为更成熟的SPI主机模块 (https://github.com/nandland/spi-master), 并添加一层封装,使用基于SPI协议的W25Q64 存储芯片,在FPGA上成功验证模块功能

#### W25Q64



#### UART输出结果



#### ILA结果



## 添加ADC-Eithernet 模块

#### 接受串行输入数据并进行处理后通过UDP协议传输至上位机



#### 上板验证

#### 初始化配置



## 优化启动方式

Tiny-RiscV(Bram)设计于FPGA中使用,Rom模块实际为Block-RAM,大小为32 KB。只能通过JTAG启动:

- ▶ 缺少自启动的方式,可靠性、便利性不足;
- ➤ ROM面积开销过大;
- 优势在于无需外部存储器;

考虑到器件的通用性和便携性,我们考虑基于SPI协议的SD卡作为外部存储器,ROM内为boot引导程序,上电后从SD卡中读取应用程序,大小由32 KB 改为 4 KB (非调试版本2 KB)



#### SD卡测试输出结果



## BOOT引导程序设计

## Boot引导程序: 当前RAM大小为16k,程序可用14k,最高位2k用于boot引导程序存放变量,加入UART输出信息用于调试

#### lds链接文件

```
* bootloader.lds (Corrected Final Version)
 * Defines separate RAM regions to guarantee isolation. This version uses
 * direct calculation within the MEMORY block for maximum compatibility.
OUTPUT ARCH( "riscv" )
ENTRY(_start_bootloader)
/* Define constants for easy modification */
RAM TOTAL LENGTH = 16K:
BOOTLOADER RAM SIZE = 2K; /* Reserve 2KB at the top of RAM for the bootloader */
MEMORY
                    : ORIGIN = 0x000000000, LENGTH = 4K
  /* RAM available for the application. Its length is calculated directly. */
  app_ram (wxa!ri) : ORIGIN = 0x10000000, LENGTH = RAM_TOTAL_LENGTH - BOOTLOADER_RAM_SIZE
  /* RAM reserved for bootloader's private use. Its origin is calculated directly. */
  boot ram (wxa!ri) : ORIGIN = 0x10000000 + (RAM TOTAL LENGTH - BOOTLOADER RAM SIZE), LENGTH = BOOTLOADER RAM SIZE
SECTIONS
  /* Place code and read-only data in ROM */
  .text :
   KEEP (*(SORT_NONE(.init)))
    *(.text .text.*)
    *(.rodata .rodata.*)
  /* Place the bootloader's .bss section into its reserved RAM region */
    . = ALIGN(4);
    bss start = .;
    *(.bss .bss.*)
    *(COMMON)
    . = ALIGN(4);
    __bss_end = .;
  } >boot ram
  /* The stack pointer is initialized to the top of the bootloader's RAM region */
  _stack_start = ORIGIN(boot_ram) + LENGTH(boot_ram);
```

#### start\_s汇编

```
1 // bootloader/bootloader start.S
      .globl _start_bootloader
     start bootloader:
        // 1. Initialize the stack pointer to the top of the main RAM.
         // The linker script provides the ' stack start' symbol.
         la sp, _stack_start
         // Ontional: Clear the .hss section for the hootloader.
         // For a minimal bootloader, this step can often be skipped if you know
         // your variables don't rely on being zero-initialized. But it's good practice.
         la t0, _bss_start
         la t1, __bss_end
      .L clear bss boot:
         beq t0, t1, .L_clear_bss_boot_done
         sw zero. (t0)
         addi t0, t0, 4
         i .L clear bss boot
      .L_clear_bss_boot_done:
22
         // 2. Jump to the C entry point of the bootloader
23
         jal ra, boot_main_c
25
         // If boot_main_c returns, hang here.
         j .L_hang
```

#### boot主程序

```
10 // ... (配置宏定义保持不变) ...
11 #define APP START BLOCK ON SD CARD 0 // 使用方案A, 从块0读取
    #define APP_RUN_ADDR_ON_CHIP_RAM 0x10000000
    #define ON CHIP RAM SIZE BYTES (14 * 1024)
    #define MY_SD_CARD_SPI_CHANNEL
     typedef void (*app_entry_t)(void);
     const app_entry_t app_entry = (app_entry_t)APP_RUN_ADDR_ON_CHIP_RAM;
     Explain code | Complete comments | Locate code bugs | Generate unit tests | Code Review | Close
     void boot_main_c() {
        // STEP 1: 初始化UART,这是我们唯一的调试窗口
        // 注意: 这里的uart init()必须能够独立工作,不依赖于主程序中的任何东西。
        // 发送一个启动信号,如果能在串口看到这个,说明Bootloader至少启动了
        DEBUG PRINTF("\n\n--- RISC-V Bootloader Started ---\n");
        DEBUG_PRINTF("Initializing SD card on SPI channel %d...\n", MY_SD_CARD_SPI_CHANNEL);
        if (sd card init(MY SD CARD SPI CHANNEL) != 0) {
            DEBUG PRINTF("!!! FATAL: SD Card initialization failed. Halting. !!!\n");
            while(1); // 如果失败, 打印信息并死循环
        DEBUG PRINTF("SD Card initialized successfully.\n"):
        // STEP 3: 计算需要读取的块数
        uint32_t num_blocks_to_read = (ON_CHIP_RAM_SIZE_BYTES + SD_BLOCK_SIZE - 1) / SD_BLOCK_SIZE;
        DEBUG_PRINTF("Target RAM size: %d bytes. Need to read %d blocks from SD card.\n", ON_CHIP_RAM_SIZE_BYTES, num_blocks_to_read);
        DEBUG PRINTF("Starting to copy data from SD:@0x%X to RAM:@0x%X...\n", APP START BLOCK ON SD CARD, APP RUN ADDR ON CHIP RAM);
        for (uint32_t i = 0; i < num_blocks_to_read; i++) {
           uint32_t source_block = APP_START_BLOCK_ON SD CARD + i;
            uint8_t* destination_address = (uint8_t*)APP_RUN_ADDR_ON_CHIP_RAM + (i * SD BLOCK SIZE);
            // 可以在这里加一个简单的进度指示
            if ((i % 4) == 0) { // 每4个块打印一个点
                DEBUG PRINTF(".");
            if (sd card read block(source block, destination address) != 0) {
                DEBUG PRINTF("\n!!! FATAL: Failed to read block %d. Halting. !!!\n", source block);
                while(1); // 读取失败, 打印信息并死循环
        DEBUG PRINTF("\nData copy complete.\n");
        // STEP 5: 准备跳转
        DEBUG_PRINTF("Bootloader finished. Jumping to application at 0x%X...\n", app_entry);
        // 在跳转前可以加一个极短的延时,确保串口信息都发出去了
        for(volatile int i=0; i<10000; ++i);
        // 跳转
        app_entry();
        // 如果程序能执行到这里,说明跳转失败了
        DEBUG PRINTF("!!! FATAL: Application returned to bootloader. Halting. !!!\n");
```

## Boot引导程序相关硬件代码修改

stall 这几个内部信号决定。没有任 何来自外部调试模块的直接输入。

```
always @ (posedge clk or negedge rst_n) begin
   // 复位
   if (!rst n) begin
       pc <= `CPU RESET ADDR;
       pc prev <= 32'h0;
   // 冲刷
   end else if (flush i) begin
       pc <= flush addr i;
   // 暂停, 取上一条指令
   end else if (stall) begin
       pc <= pc prev;
   // 取下一条指令
   end else begin
       pc <= pc + 32'h4;
       pc prev <= pc;
```

end

end

1、取指模块,由 rst n, flush i, 2、JTAG模块输入:在jtag dm(debug module),根据RISC-V调试规范, 这通常是通过写一个特殊的CSR寄存器,叫做 dpc (Debug Program Counter)。新增发出一个"PC写请求"信号 和新的PC值

```
end else begin
   // when write dpc, we reset cpu here
   if (data[15:0] == DPC) begin
       //dm reset req <= 1'b1; <
                               // <<-- 新增: 发起PC写请求
       itag pc we <= 1'b1;
       jtag pc wdata <= data0; // <<-- 新增: 要写入的PC值来自DATA0寄存器
       dm halt req <= 1'b0;
       dmstatus <= {dmstatus[31:12], 4'hc, dmstatus[7:0]};</pre>
   end else if (data[15:0] < 16'h1020) begin
       dm_reg_we <= 1'b1;
       dm reg wdata <= data0;
```

当JTAG尝试写DPC时, 当前硬件的反应不是去 更新CPU的PC,而是拉 高了一个复位请求信号

#### 修改后正常通过GDB启动

3、复用一条现有的、功能相似的通 路, (flush 的本意就是"冲刷流水线, 并从一个新的地址重新开始取指), 让JTAG模块"借用" flush 机制

```
// 当JTAG写入PC时,使用JTAG提供的新地址;否则,使用正常的跳转地址。
assign flush addr_o = jtag_pc_we_i ? jtag_pc_wdata_i : jump_addr_i;
// 当发生正常跳转、CLINT暂停或JTAG写入PC时,都需要冲刷流水线。
assign flush o = jump assert i | stall from clint i | jtag pc we i;
```



## C程序配套文件修改及FPGA上板验证

C程序根据引导程序的配置,使用配套的lds链接脚本和start\_s脚本编译,并通过SD卡和GDB两种启动方式测试

方式一: ROM+SD卡



ROM + SD卡正常启动输出的调试信息

方式二: GDB调试启动

```
For help, type "help".
# file: debug.gdb
                                       Type "apropos word" to search for commands related to "word"...
                                       Reading symbols from .\gpio...
target remote localhost:3333
                                       0x00000000 in ?? ()
monitor reset halt
                                       JTAG tap: riscv.cpu tap/device found: 0x1e200a6f (mfg: 0x537 (Wuhan Xun Zhan Electronic Technology), part: 0xe200, ver:
                                       0x1)
load
                                       Loading section .init, size 0x6c lma 0x10000000
set pc = 0x10000000
                                       Loading section .text, size 0x260 lma 0x1000006c
b main
                                       Start address 0x10000000, load size 716
                                       Transfer rate: 99 KB/sec, 358 bytes/write.
                                       Breakpoint 1 at 0x100001a8: file main.c, line 9.
```

## RAM修改

当前的架构需要实现字节选功能,即能只改变32位中的8位,而厂商提供的SRAM可能不支持该功能,会导致部分指令多一个周期,因此将32位RAM改为4个8位RAM组合,并新增SD仿真的Verilog模型用于前后仿验证











RISC-V Bootloader Started --tializing SD card on SPI channel 1.. 6683650000] SD\_MODEL: R1 response sent, returning to IDLE 6734110000] SD\_MODEL: R1 response sent, returning to IDLE. 6754010000] SD\_MODEL: R1 response sent, returning to IDLE. 6775270000] SD\_MODEL: R1 response sent, returning to IDLE. 6795170000] SD MODEL: R1 response sent, returning to IDLE. 6816430000] SD\_MODEL: R1 response sent, returning to IDLE 6836330000] SD\_MODEL: R1 response sent, returning to IDLE 6857590000] SD\_MODEL: R1 response sent, returning 6877490000] SD\_MODEL: R1 response sent, returning to IDLE. 6898750000] SD\_MODEL: R1 response sent, returning to IDLE. 6918650000] SD\_MODEL: R1 response sent, returning to IDLE. 6939910000] SD\_MODEL: R1 response sent, returning to IDLE 6959810000] SD\_MODEL: R1 response sent, returning to IDLE. Card on SPI CH-1 initialized successfully Card initialized successfully. arget RAM size: 30720 bytes. Need to read 60 blocks from SD card. tarting to copy data from SD:@0x0 to RAM:@0x10000000... Data copy complete. Bootloader finished. Jumping to application at 0x10000000. rinish called from file "rtl/testbench/tb\_sd.v", line 71. finish at simulation time 1500002000000° VCS Simulation Report me: 150000200000 ps 68.590 seconds; Data structure size: 0.2Mb time: .828 seconds to compile + .199 seconds to elab + .185 seconds to link + 68.632 seconds in simulation i KDB elaboration done and the database successfully generated: 0 error(s), 0 warning(s)

Vivado仿真 FPGA验证 VCS仿真

## 数字IC设计过程



### 外设后仿

· SPI后仿正常,MISO出现x态因为SD程序只使用了32KB中的一部分,其余未使用部分会出现x



• I2C: 添加AT24C32仿真模型,后仿输出正常



• Timer: 计时器后仿结果正常

```
==== Starting Timer Module Test Suite =====
Running Test: Basic Interrupt...
Running Test: Register Read/Write...
 -> PASS
Running Test: Counter Increment...
Running Test: Polling Mode...
 -> PASS
Running Test: Stop and Reset...
 -> PASS
Running Test: Byte Enables...
 -> WARNING: Byte-write to timer registers is not supported by hardware. Test skipped.
Running Test: Boundary Value (VALUE=0)...
  -> PASS
==== Test Suite Finished =====
Result: 7 / 7 tests passed.
ALL TESTS PASSED!
```

• ADC传输模块仿真结果正常

```
Verdi* : Begin traversing the scope (tinyriscv_soc_tb_sd), layer (0).
Verdi* : End of traversing.
 Verdi* : fsdbDumpon - All FSDB files at 0 ns.
                   200] Test starting..
                   200] --- Starting Split RAM Fast Load ---
200] Using base file path: source/testbench/my_adc/my_adc
                   200] --- Split RAM Fast Load Complete --
                   200] Reset released. CPU and automated ADC test start.
                   200] UART Monitor: Started. Watching tx pin.
                   200] UART Monitor: CLK FREQUENCY= 50000000, BAUD RATE=
   System Initialization --
 pard IP set to: 192.168.185.110 (0xC0A8B96E)
Destination IP set to: 192.168.185.240 (0xC0A8B9F0)
DP Config Reg (0x10) written with: 0x00030400
JDP Config Reg (0x10) read back: 0x00030400
JDP Config Reg (0x10) written with: 0x00030400
JDP Config Reg (0x10) written with: 0x00030400
JDP Config Reg (0x10) written with: 0x00030400
   Config Reg (0x10) written with: 0[
                                                         35000200] Simulation finished after sending some packets.
  inish called from file "source/testbench/Auto test.v", line 207.
  inish at simulation time
          VCS Simulation Report
ime: 35000200 ns
CPU Time: 1314.340 seconds;
                                       Data structure size: 25.5Mb
PU time: 4.425 seconds to compile + .464 seconds to elab + .408 seconds to link + 1314.384 seconds in simulation
Verdi KDB elaboration done and the database successfully generated: 0 error(s), 0 warning(s)
```



## 核心后仿批量测试

#### 批量测试流程:

- 1、修改编译链接文件,根据当前SOC设计重新将.S源码编译为.bin文件
- 2、将不同指令的.bin文件分割为4个RAM子文件
- 3、批量加载并运行程序
- 4、根据源码特定的地址索引寻找验证数据的起始和终止地址逐一验证结果
- ➤ Core后仿: 使用Riscv官方测试集

初步测试大多数指令都通过了测试, 部分失败,

部分失败源于程序结束后,RAM在输入存在x时会清空所有内容, 选择会适运行时间。结果正常加下

选择合适运行时间,结果正常如下

```
200] Using base file path: source/testbench/core_test/I-DELAY_SLOTS-01.elf
200] --- Split RAM Fast Load Complete ---
200] Reset released. CPU starts execution from ROM.
200] UART Monitor: Started. Watching tx_pin.
200] UART Monitor: CLK FREQUENCY= 50000000, BAUD RATE=
300] CPU halted signal detected. Starting verification.
3310] CPU halted signal detected!
3510] Signature range found in RAM: Start Addr=0x10002000, End Addr=0x10002020
200] Using base file path: source/testbench/core_test/I-ENDIANESS-01.elf
200] --- Split RAM Fast Load Complete ---
 200] Reset released. CPU starts execution from ROM.
 200] UART Monitor: Started. Watching tx_pin.
 200] UART Monitor: CLK_FREQUENCY= 50000000, BAUD_RATE=
300] CPU halted signal detected. Starting verification.
2310] CPU halted signal detected!
2510] Signature range found in RAM: Start Addr=0x10002010, End Addr=0x10002030
```

```
1 I-ADD-01: PASSED
 2 I-ADDI-01: PASSED
 3 I-AND-01: PASSED
 4 I-ANDI-01: PASSED
 5 I-AUIPC-01: PASSED
 6 I-BEO-01: PASSED
 7 I-BGF-01: PASSED
 8 I-BGEU-01: PASSED
 9 I-BLT-01: PASSED
10 I-BLTU-01: PASSED
11 I-BNE-01: PASSED
12 I-DELAY_SLOTS-01: FAILED (Signature mismatch)
13 I-EBREAK-01: PASSED
14 I-ECALL-01: PASSED
15 I-ENDIANESS-01: FAILED (Signature mismatch)
16 I-IO-01: PASSED
17 I-JAL-01: PASSED
18 I-JALR-01: PASSED
19 I-LB-01: PASSED
20 I-LBU-01: PASSED
21 I-LH-01: PASSED
22 I-LHU-01: PASSED
23 I-LUI-01: PASSED
24 I-LW-01: PASSED
25 I-MISALIGN_JMP-01: FAILED (Signature mismatch)
26 I-MISALIGN_LDST-01: FAILED (Signature mismatch)
27 I-NOP-01: PASSED
28 I-OR-01: PASSED
29 I-ORI-01: PASSED
30 I-RF size-01: PASSED
31 I-RF width-01: PASSED
32 I-RF x0-01: PASSED
33 I-SB-01: PASSED
34 I-SH-01: PASSED
35 I-SLL-01: PASSED
36 I-SLLI-01: PASSED
37 I-SLT-01: PASSED
38 I-SLTI-01: PASSED
39 I-SLTIU-01: PASSED
40 I-SLTU-01: PASSED
41 I-SRA-01: PASSED
42 I-SRAI-01: PASSED
43 I-SRL-01: PASSED
44 I-SRLI-01: PASSED
45 I-SUB-01: PASSED
46 I-SW-01: PASSED
47 I-XOR-01: PASSED
48 I-XORI-01: PASSED
```

# 一、背景简介

二、研发历程

三、后续规划

## 后续规划

➤ 更换SOC架构,使用PULP平台的pulpissimo (已在FPGA上正常运行并通过部分程序测试)



- ▶ 抗辐照设计
  - 时间冗余 + 重要模块三模冗余 + ECC纠错
  - LockStep + ECC
  - 完整三模冗余

- 可任意切换RI5CY和IBEX
- · 自主输入/输出子系统(uDMA)
- 新型内存子系统
- 支持硬件处理引擎(HWPEs)
- 新型简单中断控制器
- 丰富的外设和配套的SDK
- 内外部多级总线配置



## 后续规划

➤ 近期目标:设计测试版,等待回片,实现类似AMAC、TETRISC的温度、电压等监控,同时和LATRIC 整合实现动态配置例如DAC







AMAC by ATLAS

https://doi.org/10.1088/1748-0221/20/08/P08003

TETRISC SOC by IHP

https://doi.org/10.1016/j.microrel.2023.115173

▶ 远期目标:利用RISC-V的开放与可定制性,打造专为高能物理(HEP)实验设计的端侧智能数据处理平台。
未来帮助高能所设计的每块ASIC集成一颗"智能大脑"