

# The development of the CEPC common BEE

Jun Hu
On behalf of the CEPC Elec study team



## Content

- CEPC common BEE structure
- R&D efforts and results
- Summary

## Global framework of the CEPC Elec system

**Figure 11.1** 



#### Common BackEnd Electronics

- Configurable for individual subsystems
- Serving as a bridge between FEE and TDAQ
- Possibly based on the xTCA architecture
- All connections utilize optical fibers.



**Common Platform** 

## **Common BEE functionality**

- Receive raw data streams from FEE
- Control interface with FEE
- Data processing & Hit info generation
- Final trigger decision processing
- Data caching & Packaging
- Data transmission to DAQ
- DCS communication
- Clock synchronization & distribution



## Interface with FEE

- Adapt to the specific FEE data protocol
  - Compatible with GBT-FPGA protocol, simplifying the development
  - Provides 16 channels to FEE



64-bit downlink frame data format:

| {H(3),E | BDC(1),H(2),E | BDC(0),H(1),l | Data[31:0] | FEC[23:0] |
|---------|---------------|---------------|------------|-----------|

256-bit uplink frame data format:

| Field | H[1:0] | BDC[1:0] | UDC[1:0] | {8'b0,DownBDC[1:0]} | Data[191:0] | FEC[47:0] |
|-------|--------|----------|----------|---------------------|-------------|-----------|
| Width | 2 bits | 2 bits   | 2 bits   | 10 bits             | 192 bits    | 48 bits   |
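The uplink field widths (2 + 2 + 2 + 10 + 192 + 48 bits) add up to the 256-bit frame. The sketch below packs and unpacks such a frame in Python as an illustration only. The MSB-first field order and the helper names `pack_frame` and `unpack_frame` are assumptions made for this example, not the actual firmware layout.

```python
# Field widths of the 256-bit uplink frame, taken from the frame table.
# ASSUMPTION: fields are packed MSB-first in the order listed here.
UPLINK_FIELDS = [
    ("H", 2),
    ("BDC", 2),
    ("UDC", 2),
    ("DownBDC", 10),   # {8'b0, DownBDC[1:0]}: 8 zero bits plus a 2-bit value
    ("Data", 192),
    ("FEC", 48),
]

FRAME_BITS = sum(width for _, width in UPLINK_FIELDS)  # 256


def pack_frame(values, fields=UPLINK_FIELDS):
    """Pack a dict of field values into one integer, first field in the MSBs."""
    frame = 0
    for name, width in fields:
        value = values.get(name, 0)
        if value < 0 or value >= (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        frame = (frame << width) | value
    return frame


def unpack_frame(frame, fields=UPLINK_FIELDS):
    """Inverse of pack_frame: split an integer back into named fields."""
    out = {}
    for name, width in reversed(fields):
        out[name] = frame & ((1 << width) - 1)
        frame >>= width
    return out


example = {"H": 2, "BDC": 1, "UDC": 3, "DownBDC": 2, "Data": 0xABCDEF, "FEC": 0x1234}
assert unpack_frame(pack_frame(example)) == example
```

Keeping the layout in one table-like list makes it easy to swap in a different subsystem's frame definition without touching the packing logic.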

### **BEE boards number**

**Table 11.13** 

| Detector    | Max Data Rate/<br>Fiber (Gbps) | Fibers/<br>Module | Fibers      | BEEs     | Crates |
|-------------|--------------------------------|-------------------|-------------|----------|--------|
| VTX         | 8                              | 1–2               | 96          | 6        | 1      |
| TPC         | 0.1                            | 1                 | 496         | 32       | 4      |
| ITK-Barrel  | 2.88                           | 2–3               | 376         | 24       | 2      |
| ITK-EndCap  | 4.4                            | 2                 | 148         | 6        | 1      |
| OTK-Barrel  | 1.4                            | 1                 | 880         | 55       | 4      |
| OTK-EndCap  | 1.4                            | 1–2               | 544         | 34       | 4      |
| ECAL-Barrel | 4.8                            | 2 (4)             | 960 (1,920) | 60 (120) | 6 (12) |
| ECAL-EndCap | 4.8                            | 2 (4)             | 448 (896)   | 28 (56)  | 4 (8)  |
| HCAL-Barrel | 0.14                           | 1                 | 5,568       | 348      | 36     |
| HCAL-EndCap | 1.75                           | 1                 | 3,072       | 192      | 20     |
| Muon-Barrel | 0.01                           | 1                 | 24          | 2        | 1      |
| Muon-EndCap | 0.01                           | 1                 | 16          | 1        | -      |
| Total       | -                              | -                 | 12,628      | 788      | 83     |
| (Upgraded)  |                                |                   | (14,036)    | (876)    | (93)   |

The numbers in brackets correspond to the second operational phase, while the regular numbers refer to the baseline case.
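As a cross-check of the totals, the short sketch below sums the per-detector numbers from Table 11.13 and then swaps in the bracketed ECAL values for the second phase. The values are copied from the table. The dictionary names are illustrative, and the "-" crate entry for Muon-EndCap is treated as 0 here, which is an assumption.

```python
# (fibers, BEEs, crates) per detector, baseline values copied from Table 11.13.
BASELINE = {
    "VTX": (96, 6, 1),
    "TPC": (496, 32, 4),
    "ITK-Barrel": (376, 24, 2),
    "ITK-EndCap": (148, 6, 1),
    "OTK-Barrel": (880, 55, 4),
    "OTK-EndCap": (544, 34, 4),
    "ECAL-Barrel": (960, 60, 6),
    "ECAL-EndCap": (448, 28, 4),
    "HCAL-Barrel": (5568, 348, 36),
    "HCAL-EndCap": (3072, 192, 20),
    "Muon-Barrel": (24, 2, 1),
    "Muon-EndCap": (16, 1, 0),  # ASSUMPTION: "-" in the table counted as 0 crates
}

# Second-phase values (the bracketed numbers); only the ECAL rows change.
UPGRADED = {
    "ECAL-Barrel": (1920, 120, 12),
    "ECAL-EndCap": (896, 56, 8),
}


def totals(table):
    """Column-wise sum of (fibers, BEEs, crates)."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))


baseline_total = totals(BASELINE)
upgraded_total = totals({**BASELINE, **UPGRADED})
print(baseline_total)  # (12628, 788, 83)
print(upgraded_total)  # (14036, 876, 93)
```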

Given the low bandwidth requirements of the HCAL-Barrel and Muon subdetector systems, a dedicated low-cost BEE solution can be developed to significantly reduce hardware cost and power consumption.

## Interface with Trigger System



#### Data caching & packaging

- Large DDR memory buffers are deployed to cache over 1 ms of raw data.
- Upon a trigger decision, BEE identifies and extracts the relevant data from the buffer for packaging and readout.
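The caching-and-extraction behaviour described above can be sketched in software as a time-ordered buffer. This is only an illustrative model: the class name `TriggerBuffer`, the nanosecond timestamps, and the ±100 ns readout window are assumptions, and the real implementation lives in FPGA firmware with DDR memory.

```python
from collections import deque


class TriggerBuffer:
    """Time-ordered hit cache; hits older than depth_ns are evicted on push."""

    def __init__(self, depth_ns=1_000_000):  # 1 ms depth, as quoted in the text
        self.depth_ns = depth_ns
        self.hits = deque()  # (timestamp_ns, payload), appended in time order

    def push(self, timestamp_ns, payload):
        self.hits.append((timestamp_ns, payload))
        # Drop everything that has fallen out of the buffer depth.
        while self.hits and self.hits[0][0] < timestamp_ns - self.depth_ns:
            self.hits.popleft()

    def extract(self, trigger_ns, window_ns=100):
        """Return payloads within +/- window_ns of the trigger timestamp.

        ASSUMPTION: window_ns is a hypothetical readout window.
        """
        return [p for t, p in self.hits if abs(t - trigger_ns) <= window_ns]


buf = TriggerBuffer()
for t in (0, 500, 1_000, 1_050, 5_000):
    buf.push(t, f"hit@{t}")
print(buf.extract(trigger_ns=1_000))  # ['hit@1000', 'hit@1050']
```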

#### Data processing (On-demand)

- Advanced algorithms for noise filtering, signal alignment, linearization, data compression, etc.

#### Hit information generation

- Feature extraction: time, charge, position, counter, etc.
- Provides a 40 Gbps data transmission link to the Trigger System

#### Trigger Decision processing

- L1A (FT), BC0, and orbit signals

## **Interface with Timing System**

**Figure 11.17** 



#### TTC Distributor hardware

- Distribute Timing, Trigger, Control signals to sub-detector BEE.
- Receive feedback status (e.g., FULL, error) from BEE.
- Two operational modes available:
  - Master: Receive from backplane, distribute via fiber.
  - Slave: Receive via fiber, distribute to backplane.

## Clock synchronization module

- The system clock is recovered from the serial link via the BEE's CDR
- New hardware employs the White Rabbit (WR) principle
  - Achieves a clock with 3 ps TIE jitter resolution
  - Maintains 4 ps clock synchronization stability under temperature variations
  - ~30 ps RMS (70 ps peak-to-peak) accuracy across system power cycles









High-precision clock synchronization hardware node



Firmware design

## Interface with DAQ/DCS System

**Table 12.12** 

| Detector | Readout data rate after L1 trigger (Gbps), Higgs | Readout data rate after L1 trigger (Gbps), Low Lumi Z | BEE board number | Data rate per BEE board (Gbps), Low Lumi Z |
|----------|--------------------------------------------------|-------------------------------------------------------|------------------|--------------------------------------------|
| VTX      | 1.94                          | 6.14                                  | 6                   | 1.02                                               |
| TPC      | 26.4                          | 57.1                                  | 32                  | 1.78                                               |
| ITK      | 0.317                         | 0.756                                 | 30                  | 0.0252                                             |
| OTK_B    | 0.690                         | 1.62                                  | 55                  | 0.0295                                             |
| OTK_E    | 0.544                         | 1.15                                  | 34                  | 0.0338                                             |
| ECAL_B   | 4.13                          | 8.68                                  | 60                  | 0.145                                              |
| ECAL_E   | 7.10                          | 15.4                                  | 28                  | 0.55                                               |
| HCAL_B   | 0.0448                        | 0.204                                 | 348                 | < 0.01                                             |
| HCAL_E   | 1.54                          | 3.88                                  | 192                 | 0.0202                                             |
| Muon     | < 0.1                         | < 0.1                                 | 3                   | < 0.01                                             |
| Trigger  | -                             | -                                     | 103                 | -                                                  |
| Sum      | 42.7 (5.34 GB/s)              | 95.1 (11.9 GB/s)                      | 891                 |                                                    |



Average readout data rate to DAQ at a 120 kHz L1 trigger rate in Low Lumi Z mode

- The BEE and trigger system are designed with an xTCA architecture, so data readout to DAQ will be implemented using a network-based approach.
- To achieve high performance, a hardware-based TCP Offload Engine (TOE) will be integrated into the FPGA firmware.
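The per-board column of Table 12.12 is the Low Lumi Z rate divided by the BEE board count, and the GB/s figures in the "Sum" row follow from dividing Gbps by 8. The sketch below reproduces a few entries. Only a subset of rows is included, and the function names are illustrative.

```python
# Low Lumi Z readout rates after L1 trigger (Gbps) and BEE board counts,
# copied from Table 12.12 (subset of rows for brevity).
RATES_GBPS = {"VTX": 6.14, "TPC": 57.1, "ITK": 0.756, "ECAL_E": 15.4}
BOARDS = {"VTX": 6, "TPC": 32, "ITK": 30, "ECAL_E": 28}


def rate_per_board(detector):
    """Average rate one BEE board must ship to DAQ, in Gbps."""
    return RATES_GBPS[detector] / BOARDS[detector]


def gbps_to_gbytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8.0


for det in RATES_GBPS:
    print(f"{det}: {rate_per_board(det):.3g} Gbps per BEE board")

# Total Low Lumi Z rate from the "Sum" row of the table.
print(f"{gbps_to_gbytes_per_s(95.1):.1f} GB/s")  # 11.9 GB/s
```

Even the busiest entry (TPC, about 1.78 Gbps per board) sits well below a single 10 Gbps network link, which is consistent with the network-based readout described above.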

## **BEE R&D hardware/firmware**

**Figure 11.24** 



**Figure 11.23** 



Ziyue Yan. Study on Key Technologies of Readout Electronics for CEPC Vertex Detector Pre-Research. 2024. University of Chinese Academy of Sciences, Doctoral thesis.

- A cost-driven device selection: FPGA XC7VX690T
- Interfaces: 12 × SFP+ and 3 × QSFP
- Implements real-time, FPGA-based data processing for clustering, hit-point searching, and tracking algorithms.

## Fiber loopback test



Due to a layout issue, channel 0 cannot meet the requirement at 10 Gbps.

12-channel loopback test BER at 10 Gbps



The eye diagram of channel 1 at 10 Gbps rate

## Real-Time data processing algorithm

#### Algorithm for Vertex prototype

- Timestamp Matching: Identifies the set of points with identical timestamps (from the same event)
- Coordinate Sorting: Sorts the point set within each event based on spatial coordinates
- Cluster Edge Detection: Detects clusters within the sorted data and identifies their boundary points
- K-Means Clustering: Computes centroid coordinates for each cluster using machine learning optimization.
- Position Matching: Outputs matching results ("1" or "0") for centroid pairs across upper and lower layer clusters.
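The five steps above can be sketched in software as follows. This is a simplified illustration, not the firmware algorithm. The hit format `(timestamp, layer, x, y)`, the gap-based cluster splitting along x, and the matching tolerance are all assumptions. A plain mean is used for the centroid in place of the K-Means step.

```python
from collections import defaultdict


def group_by_timestamp(hits):
    """Timestamp matching: collect hits that share a timestamp into one event."""
    events = defaultdict(list)
    for ts, layer, x, y in hits:
        events[ts].append((layer, x, y))
    return events


def find_clusters(points, max_gap=1):
    """Coordinate sorting + edge detection: split where the x gap exceeds max_gap."""
    clusters, current = [], []
    for p in sorted(points):
        if current and p[0] - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append(p)
    if current:
        clusters.append(current)
    return clusters


def centroid(cluster):
    """Cluster centroid as a plain mean (stand-in for the K-Means step)."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def match(upper, lower, tol=2.0):
    """Position matching: 1 if the centroids agree within tol in x and y, else 0."""
    return int(abs(upper[0] - lower[0]) <= tol and abs(upper[1] - lower[1]) <= tol)


def process(hits):
    results = {}
    for ts, pts in group_by_timestamp(hits).items():
        per_layer = defaultdict(list)
        for layer, x, y in pts:
            per_layer[layer].append((x, y))
        upper = [centroid(c) for c in find_clusters(per_layer["upper"])]
        lower = [centroid(c) for c in find_clusters(per_layer["lower"])]
        results[ts] = [match(u, l) for u in upper for l in lower]
    return results


hits = [
    (7, "upper", 10, 10), (7, "upper", 11, 10), (7, "lower", 10, 11),
    (7, "upper", 40, 5), (8, "upper", 3, 3), (8, "lower", 30, 30),
]
print(process(hits))  # {7: [1, 0], 8: [0]}
```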



| Resources | Usage  | Available | Utilization % |
|-----------|--------|-----------|---------------|
| LUT       | 40942  | 433200    | 9.45          |
| LUTRAM    | 1018   | 174200    | 0.58          |
| FF        | 42466  | 866400    | 4.90          |
| BRAM      | 860.50 | 1470      | 58.54         |
| DSP       | 26     | 3600      | 0.72          |
| IO        | 11     | 350       | 3.14          |
| GT        | 7      | 48        | 14.58         |
| BUFG      | 24     | 32        | 75.00         |
| MMCM      | 5      | 20        | 25.00         |



Test setup in BSRF (Beijing Synchrotron Radiation Facility)

## **Algorithm results**



Focus on the Data Compression Ratio & Track Recognition Accuracy



| No.   | Time (min) | Offline analyzed tracks | Online filtered tracks | Track recognition accuracy | Compressed data volume (MB) | Original data volume (MB) | Data compression ratio |
|-------|------------|-------------------------|------------------------|----------------------------|-----------------------------|---------------------------|------------------------|
| 1     | 125      | 21206    | 20403           | 96.21 %     | 15.9       | 5171        | 325.22      |
| 2     | 152      | 22503    | 21749           | 96.65 %     | 26.6       | 8050        | 302.63      |
| 3     | 125      | 27524    | 26449           | 96.09 %     | 26.6       | 6176        | 232.18      |
| 4     | 124      | 25987    | 25279           | 97.28 %     | 28         | 6108        | 218.14      |
| 5     | 262      | 26110    | 25372           | 97.17 %     | 31.8       | 6305        | 198.27      |
| 6     | 156      | 23538    | 22808           | 96.90 %     | 53.5       | 7010        | 131.03      |
| 7     | 211      | 34919    | 33949           | 97.22 %     | 37.3       | 8590        | 230.29      |
| 8     | 146      | 25738    | 24835           | 96.49 %     | 39         | 6780        | 173.85      |
| 9     | 243      | 18412    | 17951           | 97.50 %     | 30.5       | 6226        | 204.13      |
| 10    | 272      | 29010    | 28265           | 97.43 %     | 52.2       | 7265        | 139.18      |
| 11    | 270      | 24737    | 24070           | 97.30 %     | 54.2       | 7590        | 140.04      |
| Total | 2086     | 279684   | 271130          | 96.94 %     | 395.6      | 75271       | 190.27      |

- 11 tests were conducted, lasting 2086 minutes (34.8 hours) in total.
- The overall data compression ratio is 190.3:1.
- The overall average online track recognition accuracy reached 96.9%.
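The overall figures follow directly from the per-run numbers in the results table: accuracy is online filtered tracks divided by offline analyzed tracks, and the compression ratio is original volume divided by compressed volume. A short check, with values copied from the table and illustrative variable names:

```python
# (offline tracks, online tracks, compressed MB, original MB) for the 11 runs,
# copied from the results table.
RUNS = [
    (21206, 20403, 15.9, 5171),
    (22503, 21749, 26.6, 8050),
    (27524, 26449, 26.6, 6176),
    (25987, 25279, 28.0, 6108),
    (26110, 25372, 31.8, 6305),
    (23538, 22808, 53.5, 7010),
    (34919, 33949, 37.3, 8590),
    (25738, 24835, 39.0, 6780),
    (18412, 17951, 30.5, 6226),
    (29010, 28265, 52.2, 7265),
    (24737, 24070, 54.2, 7590),
]

offline = sum(r[0] for r in RUNS)
online = sum(r[1] for r in RUNS)
compressed_mb = sum(r[2] for r in RUNS)
original_mb = sum(r[3] for r in RUNS)

accuracy = online / offline          # track recognition accuracy
ratio = original_mb / compressed_mb  # data compression ratio

print(f"{accuracy:.2%}")   # 96.94%
print(f"{ratio:.2f}:1")    # 190.27:1
```

Note that the overall values are computed from the summed totals, not as an average of the per-run ratios.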

Event selection and noise filtering effectively suppress background.

## Summary

- The baseline schemes for the Higgs and Low Lumi Z modes have been established.
- Some functionalities have been implemented through hardware and firmware development.
  - Preliminary research on FPGA-based signal processing algorithms has been conducted for specific sub-detectors.
  - Initial integration and testing with the DAQ system have been conducted.

Development of the advanced High Lumi Z mode continues, requiring further architectural refinement and testing.



# Thank you for your attention!



## **Backup scheme of the framework**

- The proposed framework is based on the estimated background rates of all sub-detectors.
- The background rates indicate that the data link capacity can handle Phase I operation (Higgs and Low Lumi Z) during the first ten years.
  - Shielding optimization is ongoing to suppress the background.
  - The High Lumi Z situation is not yet fully understood, but it falls in the 2<sup>nd</sup> ten years:
    - Replaceable detectors (e.g. VTX, ITK) can be upgraded with new chips featuring intelligent compression and an advanced trigger if needed.
    - Non-replaceable detectors (e.g. ECAL, HCAL) can be upgraded with more fiber channels.



The conventional trigger scheme can serve as a backup plan: with sufficient on-detector data buffering and a reasonable trigger latency, the overall data transmission rate can be kept under control.

Related to the IDRC recommendation:

R.2&3 (Trigger)

## Detailed design on common BEE

**Figure 11.58** (block diagram labels: DDR SO-DIMM, power management, FPGA, memory controller, DCS registers, DAQ/DCS, QSFP, TCP/IP + UDP TOE, data decode, algorithm + assembly, GTH transceiver, TTC protocol, ATCA backplane, SFP+, reference clock, jitter cleaner)

**Figure 11.59** 



Data aggregation and processing board Prototype for Vertex detector

The back-end Card structure

Trigger & Clock

- Routes data between the front-end optical links and the high-speed network of the DAQ system.
- Connects to the TTC to obtain the synchronized clock and global control, and fans out a high-performance clock to the front-end.


- Real-time data processing, such as trigger algorithm and data assembly.
- On-board large data storage for buffering.
- Preference for the Xilinx Kintex UltraScale series due to its cost-effectiveness and availability.

**Table 11.18**

|                 | KC705 (XC7K325T-2FFG900C) | XCKU040-2FFVA1156E | VC709 (XC7VX690T-2FFG1761C) | VCU108 (XCVU095-2FFVA2104E)       | XCKU115        |
|-----------------|---------------------------|--------------------|-----------------------------|-----------------------------------|----------------|
| Logic cells (k) | 326                       | 530                | 693                         | 1,176                             | 1,451          |
| DSP slices      | 840                       | 1,920              | 3,600                       | 768                               | 5,520          |
| Memory (Kbits)  | 16,020                    | 21,100             | 52,920                      | 60,800                            | 75,900         |
| Transceivers    | 16 (12.5 Gb/s)            | 20 (16.3 Gb/s)     | 80 (13.1 Gb/s)              | 32 (16.3 Gb/s) and 32 (30.5 Gb/s) | 64 (16.3 Gb/s) |
| I/O pins        | 500                       | 520                | 1,000                       | 832                               | 832            |
| Cost            | 2748 (650)                | 3882 (1500)        | 8094                        | 7770                              |                |

- A cost-driven device selection: FPGA XC7VX690T
- Interfaces: 12 × SFP+ (10 Gbps) and 3 × QSFP (40 Gbps)
- Implements real-time, FPGA-based machine learning for clustering, hit-point searching, and tracking algorithms