Welcome to the SMAesH Challenge

The SMAesH challenge is a side-channel analysis contest on SMAesH, a masked FPGA implementation of the AES. Using the public profiling dataset and the open-source hardware design, the goal is to mount a key-recovery attack using as few traces as possible.

The SMAesH challenge was the CHES2023 challenge.

Get started now!

The winners were announced at the CHES2023 rump session (slides), but the challenge continues: see the leaderboard and the new submission instructions!

N.B.: We maintain a list of all attacks, including those that are not proper submissions (e.g., do not follow the correct format or do not respect the rules): please send them to us.

Key features

Open-source target: an open-source AES implementation running on widespread SCA boards (CW305 and Sakura-G) with a reproducible acquisition setup.

Public datasets: 16 million traces with random key and plaintext, 16 million traces with fixed key and random plaintext, covering 2 AES rounds (~4,500 samples per trace).

Simple example attack (that works) as a starting point: you can easily start by improving it.

Profiling challenge: the profiling dataset for the Artix-7 target (CW305) contains the values of all the shares in the executions, while the one for the Spartan-6 (Sakura-G) contains only the unmasked values.

Attack success criterion: rank of the key below \(1 \text{BTC-H}\cdot\mathrm{s}\), defined as the number of blocks hashed by the Bitcoin mining network in 1 second (fixed to \(2^{68}\) for the duration of the challenge).

Efficient implementation: a good latency vs. area trade-off with a 32-bit-wide masked datapath (4 S-boxes in parallel).

Arbitrary-order masking: if you completely break the first-order design, there is more to come!

Attack ideas

The demo submission implements a textbook attack against the AES S-box output that should be easy to improve. We next share a few ideas of alternative strategies that could be used for this purpose:

  • Try to re-use the demo submission with fewer traces! This is a quick and efficient way to gain points for early candidates.
  • Exploit more leakage points: the demo targets the shares of the S-box output which lies in the combinatorial logic, but the masked states in the bitslice S-box or the output of MixColumns leak more.
  • Profile larger target intermediate values: for example, the masked states in the bitslice S-box are larger than 8 bits despite depending on only 8 key bits, and the output of MixColumns naturally depends on 32 key bits.
  • Perform multi-target and multivariate attacks: there are multiple leaking operations in the implementations, which can be exploited with advanced statistical attacks (e.g., analytical strategies or machine learning).
  • Try different profiling strategies: for a low number of shares, directly profiling with a machine-learning model without taking advantage of the knowledge of the shares could be possible.
  • Perform cross-dataset transfer learning: we provide more profiling power for the Artix-7 than for the Spartan-6.
  • Exploit the leakage of the key scheduling algorithm.

You can also have a look at the existing open-source attacks!

Timeline

  • 2023-05-08 Challenge launch with Artix-7 target, submission server opens.
  • 2023-07-03 Launch of the Spartan-6 target.
  • 2023-09-01 Submission server closes.
  • 2023-09-10 (at CHES) Award ceremony.
  • 2023-10-19 Full dataset public release. The challenge continues with self-evaluation! See the leaderboard.

Contact information

Organizers

This challenge is organized by the SIMPLE-Crypto Association, a non-profit organization created to develop open-source cryptographic implementations and to maintain them over time, currently with a strong focus on embedded implementations with strong physical security guarantees.

The contributors involved in this project are Gaëtan Cassiers, Charles Momin and François-Xavier Standaert.

Getting started

A dedicated evaluation framework has been developed for the challenge and is available on GitHub. This section explains the different steps to perform in order to run the provided demo attack. More details about the evaluation framework can be found in the dedicated Framework section.

Installing dependencies

The framework runs with Python >= 3.8 and requires the following Python tools:

  • venv, part of Python's standard library, but not included in some Python installations (e.g., on Ubuntu, you might have to apt install python3-venv to get it).
  • pip, also part of most Python installations (but on Ubuntu, apt install python3-pip is needed).

Additionally, the demonstration attack depends on

  • Yosys (version 0.25 tested, below 0.10 will likely not work)
  • Verilator (use version 5.006; many other versions are known not to work)
  • Make

CAUTION: we highly recommend installing Verilator from git and running it in-place (as recommended by the official documentation).

We highly recommend using the challenge in a Unix environment (on Windows, use WSL).

Cloning repo

First, clone the challenge framework repository:

git clone https://github.com/simple-crypto/SMAesH-challenge.git

Downloading the datasets

See this page for downloading the datasets.

In the following, we use the variable SMAESH_DATASET as the path to the directory where the downloaded dataset is stored (i.e., the path to the directory smaesh-dataset, which is the directory that contains the directories A7_d2 and S6_d2).

As a final step, format the dataset. This operation must be done only once per fixed-key dataset (it may take a few seconds). It generates a new manifest per dataset (manifest_split.json) that will be used by the framework's scripts to evaluate the attack.

# Create the venv for running framework's scripts
cd SMAesH-challenge
python3 -m venv venv-scripts
source venv-scripts/bin/activate # Activate it (adapt if not using bash shell)
pip install pip --upgrade 
pip install -r scripts/requirements.txt
# Run the split_dataset command (here for the Artix-7 only)
python3 scripts/split_dataset.py --dataset $SMAESH_DATASET/A7_d2/fk0/manifest.json 
# Leave the venv-scripts virtual environment
deactivate

Running our example attack: profiling, attack, evaluation

The following steps run the demo attack and evaluate it.

  1. First, we move to the cloned framework directory
    cd SMAesH-challenge
    
  2. Then, set up a Python virtual environment used for the evaluation, and activate it
    python3 -m venv venv-demo-eval
    source venv-demo-eval/bin/activate
    pip install pip --upgrade 
    
  3. Install the verime dependency (tool for simulating intermediate values in the masked circuit)
    pip install verime
    
  4. In the demo submission directory, build the simulation library with Verime
    cd demo_submission
    make -C values-simulations 
    
  5. Install the python package required to run the attack
    (cd setup && pip install -r requirements.txt)
    
  6. Run the evaluation in itself
    # Profiling step:
    # - uses the dataset vk0
    # - saves the templates in the current directory
    python3 quick_eval.py profile --profile-dataset $SMAESH_DATASET/A7_d2/vk0/manifest.json --attack-case A7_d2 --save-profile .
    
    # Attack step:
    # - uses the dataset fk0
    # - loads the profile located in the current directory
    # - performs the attack using 16777216 traces
    # - saves the resulting keyguess in the file './keyguess-file'
    python3 quick_eval.py attack --attack-dataset $SMAESH_DATASET/A7_d2/fk0/manifest_split.json --attack-case A7_d2 --load-profile . --save-guess ./keyguess-file --n-attack-traces 16777216
    
    # Evaluates the attack based on the result file produced
    # - loads the keyguess file generated with the attack
    # - use the key value of the dataset fk0 as a reference.
    python3 quick_eval.py eval --load-guess ./keyguess-file --attack-case A7_d2 --attack-dataset $SMAESH_DATASET/A7_d2/fk0/manifest_split.json
    

For the demo attack, the evaluation phase is expected to produce the following result on the standard output when the default configuration is used:

...
log2 ranks [59.79907288]
number of successes 1

which means that the attack reduces the rank of the correct key to about \( 2^{59.8} \).

By default, the profiling phase uses \( 2^{24} \) traces to build the models, which may result in a significant processing time (it takes about 45 minutes on the reference machine). The attack also runs in about the same time on that machine. Reducing the number of traces for both steps will reduce their execution time (at the expense of a worse key rank, of course).

Note: you can run multiple steps at once, as in python3 quick_eval.py profile attack eval ....

Next steps

It's your turn!

Both phases (profiling and attack) are implemented in the profile() and attack() functions in attack.py: tweak these functions to implement your revolutionary attack.

If you get the demo submission to run with fewer traces, you can also try to directly submit it!

The other pages of this website provide more detailed information on how to develop a submission. In particular:

  • Framework: how to use the challenge framework to develop, evaluate, package and send a new submission.
  • Rules: see how to get points, and what are the constraints on submitted attacks.
  • Target: acquisition setup used for the different targets.
  • Datasets: content of the datasets.

Have a look at our suggestions and at the SMAesH documentation to get ideas for improved attacks.

Targets

The targets for this challenge are all instantiations of SMAesH on FPGAs. SMAesH is a 32-bit masked implementation of the AES-128 encryption algorithm. For the challenge, we first instantiate it at the first security order (\(d=2\)).

We have two FPGA targets: the Chipwhisperer CW305 with an Artix-7 Xilinx FPGA and the Sakura-G with a Spartan-6 FPGA.

The challenge for the Artix-7 target is in a fully white-box profiled setting: the full implementation is open-source, including the bitstream. The profiling datasets for this target include the full seed randomness; therefore, the complete state of the FPGA is known for the profiling traces. Refer to Artix-7 for detailed explanations.

For the Spartan-6, the profiling setting is more constrained: only the SMAesH core is open-source, and the remaining part of the design is kept secret. The profiling datasets for this target only contain the values of the key and the plaintext. Of course, challenge participants can perform measurements on their own instance of SMAesH on a Sakura-G, or use the Artix-7 dataset as a starting point to build an attack. Refer to Spartan-6 for detailed explanations.

Artix-7

The Artix-7 measurement setup is based on the Chipwhisperer CW305 board from NewAE.

The FPGA bitstream used to perform the acquisitions has been generated using the Xilinx Vivado Toolset (v2022.1 64-bit) and the following modifications have been applied compared to the default toolflow parameters:

  • HDL annotation:
    • attribute DONT_TOUCH set for every module.
    • attribute KEEP_HIERARCHY set for every module.
  • Synthesis parameters:
    • flatten_hierarchy set to none
    • gated_clock_conversion set to off
    • bufg set to 12
    • directive set to Default
    • no_retiming checked
    • fsm_extraction set to auto
    • keep_equivalent_registers checked
    • resource_sharing set to off
    • no_lc checked
    • no_srlextract checked
  • Implementation parameters:
    • opt_design related:
      • is_enabled unchecked
    • phys_opt_design related:
      • is_enabled unchecked

The provided datasets contain power traces that have been acquired by measuring the voltage drop across the \( 100\,\mathrm{m}\Omega \) shunt resistor R27. The low-noise amplified signal point X4 is measured by a digital oscilloscope through an SMA connector. An external low-noise power supply (Keysight E36102B) is used in order to avoid the noise generated by the onboard (switching) power supply. In particular, a continuous DC voltage of 1V is provided through the dedicated banana jacks on the board (and the switch SW1 is configured accordingly).

The digital oscilloscope used is a PicoScope 6242E. The phases of the clocks used by the target FPGA and the oscilloscope are matched in order to reduce the level of noise induced by clock jitter. In particular, the onboard CDCE906 PLL module is configured to generate two clock signals based on the 12MHz onboard crystal. The first is the FPGA clock, running at 1.5625MHz, which is generated by PLL1 and fed to port N13 of the FPGA. The second is a 10MHz clock signal, generated by PLL0 and routed to the X6 SMA connector. The latter is forwarded to the PicoScope 10MHz clock reference input port. A single measurement channel (channel A) is used to perform the measurement, and the trigger signal is fed from the onboard test point TP1 to the oscilloscope AUX trigger port.

The power traces are sampled at 5GHz (resulting in 3200 samples per target clock cycle) using a vertical resolution of 10 bits. Two pre-processing steps are applied before storing the measurements: first, a re-alignment algorithm based on a maximum of correlation is used in order to improve the SNR. In particular, the shift maximising the correlation of each trace with a reference trace is computed and applied for each collected trace. Second, sequential time samples are aggregated (i.e., summed) in order to reduce the dataset storage size. The practical reduction ratio equals 16, resulting in an effective sampling frequency of 312.5MHz with a vertical resolution of 14 bits.
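
The following sketch illustrates, in a simplified way, the two pre-processing steps described above (correlation-based re-alignment against a reference trace, followed by aggregation of groups of 16 consecutive samples). It is only an indicative reconstruction, not the organizers' actual acquisition code.

import numpy as np

def realign(trace, reference, max_shift=50):
    # Shift the trace so that its correlation with the reference trace is
    # maximal (simplified version of the re-alignment step; the actual
    # pipeline may differ, e.g., window selection or interpolation).
    shifts = np.arange(-max_shift, max_shift + 1)
    corrs = [np.dot(np.roll(trace, s), reference) for s in shifts]
    best = shifts[int(np.argmax(corrs))]
    return np.roll(trace, best)

def aggregate(trace, ratio=16):
    # Sum groups of `ratio` consecutive samples (5 GHz / 16 = 312.5 MHz).
    n = (len(trace) // ratio) * ratio
    return trace[:n].reshape(-1, ratio).sum(axis=1)

# Example on synthetic data: a reference trace and a shifted, noisy copy
# (68000 raw samples aggregate down to the 4250 samples of the A7 traces).
rng = np.random.default_rng(0)
reference = rng.normal(size=68000)
raw_trace = np.roll(reference, 7) + 0.1 * rng.normal(size=68000)
processed = aggregate(realign(raw_trace, reference), ratio=16)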

The Vivado project used to generate the bitstream of the target FPGA is available on GitHub. The acquisition setup relies on a tweaked version of the ChipWhisperer CW305 firmware, which is also available on GitHub. In addition to adding genericity to the acquisition configuration, it has also been modified to be able to acquire multiple datasets while limiting the biases that can occur during long measurement campaigns.

Spartan-6

The Spartan-6 measurement setup is based on the Sakura-G board from the Satoh Lab.

The provided datasets contain power traces that have been acquired by measuring the voltage drop across a \( 2 \Omega \) shunt resistor placed at the JP2 connector. The amplified signal point J3 is measured by a digital oscilloscope through an SMA connector. An external low-noise power supply (Keysight E36102B) is used in order to provide a continuous DC voltage of 5V at the dedicated connector CN1/EXT5V (and the power switch is configured accordingly).

The digital oscilloscope used is a PicoScope 6242E. The phases of the clocks used by the target FPGA and the oscilloscope are matched in order to reduce the level of noise induced by clock jitter. In particular, the waveform generation feature of the oscilloscope is used to generate a clock signal of 1.5625MHz, which is forwarded to the board's target FPGA through an SMA connector. A single measurement channel (channel A) is used to perform the measurement, and the trigger signal is fed from the GPIO connected to the target FPGA.

The power traces are sampled at 1.25GHz (resulting in 800 samples per target clock cycle) using a vertical resolution of 12 bits. As a pre-processing step, sequential time samples are aggregated (i.e., summed) in order to reduce the dataset storage size. The practical reduction ratio equals 4, resulting in an effective sampling frequency of 312.5MHz with a vertical resolution of 14 bits. As a result, the Spartan-6 traces have a temporal configuration similar to that of the Artix-7 traces.

Dataset

For each target, we have acquired 3 datasets:

  • training dataset,
  • validation dataset,
  • test dataset.

The training and validation datasets are public, while the test dataset is used to evaluate the submissions and kept private by the organizers.

All datasets correspond to a correct usage of the SMAesH core: for each trace, the sharing of the key and of the plaintext is fresh. Moreover, we reseed the core before each trace with a fresh seed (the reseeding is not included in the trace). In the training dataset we use a fresh random key and a fresh random plaintext for each trace. The validation and test datasets are sampled identically and use a single fixed key (sampled at random) for the whole dataset, and a fresh random plaintext for each trace.

Dataset versions

  • v1 is the dataset used for the CHES 2023 SMAesH challenge. The fk1 part of that dataset is kept private.
  • v2 contains the same data as the v1 dataset, under a file structure more suitable for archival. All the parts of this dataset are public, including fk1.

Dataset parameters

The datasets contain the power measurements collected with the evaluation setups described in Targets, together with the input data used for each execution of the core. In particular, the following fields can be found:

Label          | Type  | Length           | Description
traces         | int16 | ns               | power trace measurement of ns time samples
umsk_plaintext | uint8 | 16               | unshared plaintext
umsk_key       | uint8 | 16               | unshared key
msk_plaintext  | uint8 | 16 \( \cdot \) d | plaintext sharing with d shares
msk_key        | uint8 | 16 \( \cdot \) d | key sharing with d shares
seed           | uint8 | 10               | PRNG seed of 80 bits

The datasets have different levels of granularity in terms of the data they contain. The following table summarizes which fields are provided with each dataset for all targets:

Field          | Training / Validation (A7_d2) | Training / Validation (S6_d2) | Test
traces         | ✓ | ✓ | ✓
umsk_plaintext | ✓ | ✓ | ✓
umsk_key       | ✓ | ✓ | –
msk_plaintext  | ✓ | – | –
msk_key        | ✓ | – | –
seed           | ✓ | – | –

Finally, the next table summarizes the size of each dataset (in terms of number of traces) and the length of its traces.

Dataset name     | Target                  | Role       | Number of traces | Length of traces
SMAesH-A7_d2-vk0 | Artix-7 (\( d = 2 \))   | profiling  | \( 2^{24} \)     | 4250
SMAesH-A7_d2-fk0 | Artix-7 (\( d = 2 \))   | validation | \( 2^{24} \)     | 4250
SMAesH-A7_d2-fk1 | Artix-7 (\( d = 2 \))   | test       | \( 2^{24} \)     | 4250
SMAesH-S6_d2-vk0 | Spartan-6 (\( d = 2 \)) | profiling  | \( 2^{24} \)     | 4400
SMAesH-S6_d2-fk0 | Spartan-6 (\( d = 2 \)) | validation | \( 2^{24} \)     | 4400
SMAesH-S6_d2-fk1 | Spartan-6 (\( d = 2 \)) | test       | \( 2^{24} \)     | 4400

Files organization and dataset reading

The dataset for the SMAesH challenge is composed of several datasets, which are grouped by target and by security order (denoted as a target instance). For each target instance, we provide the training and the validation dataset (respectively vk0 and fk0), and keep private the test dataset (fk1). Each dataset is described by a manifest file (manifest.json) that describes its content (including a file list and a way to check integrity) and is composed of several sub-directories (one per field stored in the dataset, containing that field's data). The data files use the NPY format.

The datasets are expected to be read with the tool provided in dataset.py, which was implemented specifically for this purpose. It provides top-level functions that load the data contained in a dataset in blocks of arbitrary size (see the definition of iter_ntraces in dataset.py and its usage in demo_submission/attack.py for more details).
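
As an indicative usage pattern, a block-wise read of the A7_d2 profiling dataset could look like the sketch below. The reader construction and the iter_ntraces signature are assumptions: check dataset.py and demo_submission/attack.py for the real API. The share-recombination check at the end also assumes a share-major layout of msk_key, which should be verified against the SMAesH documentation.

import numpy as np
from dataset import DatasetReader  # provided with the demo submission

# Hypothetical construction from the manifest path; see dataset.py for the
# actual constructor or factory function.
reader = DatasetReader("smaesh-dataset/smaesh-dataset-A7_d2-vk0/manifest.json")

# Iterate over the dataset in blocks of 10000 traces (assumed signature).
for block in reader.iter_ntraces(10000):
    traces = block["traces"]        # shape (n, ns), int16
    umsk_key = block["umsk_key"]    # shape (n, 16), uint8
    msk_key = block["msk_key"]      # shape (n, 16 * d), uint8
    # Sanity check for Boolean masking: XORing the d = 2 shares of each key
    # byte should give back the unmasked key (layout assumption, see above).
    recombined = np.bitwise_xor.reduce(msk_key.reshape(-1, 2, 16), axis=1)
    assert np.array_equal(recombined, umsk_key)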

The architecture convention described above will be followed when the SMAesH dataset is extended with new target cases. The dataset organisation for the different targets is depicted by the following tree:

smaesh-dataset/
+-- smaesh-dataset-A7_d2-vk0/
| + manifest.json
| +-- traces/
| +-- umsk_plaintext/
| +-- umsk_key/
| +-- msk_plaintext/
| +-- msk_key/
| +-- seed/
+-- smaesh-dataset-A7_d2-fk0/
| [...]
+-- smaesh-dataset-A7_d2-fk1/
| [...]
+-- smaesh-dataset-S6_d2-vk0/
| [...]
+-- smaesh-dataset-S6_d2-fk0/
| [...]
+-- smaesh-dataset-S6_d2-fk1/
| [...]

Hashes of datasets

SHA256 hashes of manifest.json files (dataset v1):

e067944fa0c630301c29f09cb53747bafd148af29032c87ff3a20a07e6401bc6 A7_d2-vk0/manifest.json
91db2ed958c7c00e07eaec07cec8818e30c0dfd79cfcb8bac856db41f5b485b9 A7_d2-fk0/manifest.json
08690d4152c2c6b979bd20cad489b5c99dafac7ad970fb60bcf91d67ea44be12 A7_d2-fk1/manifest.json
6af82b2c13eec7de974f3ec25756c470910c4aeca612988bad7d5bcb39a74f7a S6_d2-vk0/manifest.json
fd0469d839336f0f7fe644c97949c1dfee9eb145011213b3ef29b4e334c5753b S6_d2-fk0/manifest.json
90f2b82fc3ec788523e90ef9682864dd3682179d7b5f19f8439a583cc87eb5fe S6_d2-fk1/manifest.json

SHA256 hashes of manifest.json files (dataset v2):

6045582ea4de5545682579d08acc57b5c0f1ea4e73e898f5ca0128af643305a1  smaesh-dataset-A7_d2-fk0/manifest.json
52823b9d7ee325a7e1f257c3b23b3f9fb9a911f517c1169b3118ee81f5740855  smaesh-dataset-A7_d2-fk1/manifest.json
f7aef1456ce193ed2823dc0ba7c5dbe6b0c84cf6868ac8bdffcb60cea0e519cf  smaesh-dataset-A7_d2-vk0/manifest.json
c0d6ead05f9d5cad80bde5360b2c89d7686164afd0c775adac887095c080b307  smaesh-dataset-S6_d2-fk0/manifest.json
b90de1d3c9e040303ffaf44f44c713947da2fdeae53aca768f3397c5ef295990  smaesh-dataset-S6_d2-fk1/manifest.json
36ad6916dd5b4bd6c09c1152123d26c196721c82c68135dba5979b30384f8199  smaesh-dataset-S6_d2-vk0/manifest.json

Dataset download

The v2 dataset is available here. The v1 dataset is not publicly available anymore.

SHA256 hashes of the compressed files:

ea1d2f58939708c617f02040350c5d125ad78808c49e8dbb7f0790cd2a3d1c77  smaesh-dataset-A7_d2-fk0.tar.zstd
7ed17c5e08fb76d59e304ee1320f2acc10783da27b9d4ed8b3295f22944055a4  smaesh-dataset-A7_d2-fk1.tar.zstd
f8edcc26fbed4c6f96ccb7fbd34b7f12a2b8b4abb515c1eb5c949c88526ab9b5  smaesh-dataset-A7_d2-vk0.tar.zstd
1ad75f3b2f0a037711ea49ef0ad61a6d20d04e0c6b9f0a1a4697288420e314cf  smaesh-dataset-S6_d2-fk0.tar.zstd
a2f22abc9beffbae87e2970eccd8f345cf755c1eb2a8af9d37120f85d58d8a3d  smaesh-dataset-S6_d2-fk1.tar.zstd
deae6c33f6d5af04f91043eafbe21617eec895121f99c480e471ce3363c9afb6  smaesh-dataset-S6_d2-vk0.tar.zstd

The files are archived in the tar format and compressed with the zstd tool (typically available on Linux distributions as the zstd package).

Example decompression command:

zstdcat -T0 smaesh-dataset-A7_d2-vk0.tar.zstd | tar xv
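
Before decompressing, you can check the integrity of the downloaded archives against the hashes listed above, for instance with a small Python helper such as the following (only one archive is listed here as an example).

import hashlib

# Expected SHA256 hashes of the compressed v2 archives (copied from the list above).
EXPECTED = {
    "smaesh-dataset-A7_d2-vk0.tar.zstd":
        "f8edcc26fbed4c6f96ccb7fbd34b7f12a2b8b4abb515c1eb5c949c88526ab9b5",
}

def sha256sum(path, chunk_size=1 << 20):
    # Stream the file through SHA256 to avoid loading it fully in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

for name, expected in EXPECTED.items():
    status = "OK" if sha256sum(name) == expected else "MISMATCH"
    print(f"{name}: {status}")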

Framework

The Getting Started page explains the minimal set of commands needed to run the provided demonstration attack. By following these, you can ensure that all the dependencies are properly installed before starting to implement your own attack.

In this section, we give more details on the challenge's Python-based framework to develop and evaluate attacks, available on GitHub. The following sections guide a candidate through the implementation of a new submission. However, not all the details are covered, and we invite the reader to refer to the documentation found directly in the code for further explanation.

Next, we assume that the attack is written in python, but the framework may also work with other languages (see Beyond Python).

Contents

The framework is split in two parts:

  • The code running the attack, given as a demonstration attack (demo_submission directory). This includes:
    • Loading the datasets.
    • Simulating internal states of the circuit.
    • Running an optional profiling step.
    • Executing the attack itself.
    • Running a simplified evaluation for testing/development purposes.
  • The scripts (scripts directory) for
    • building a submission package,
    • checking that it is well-structured,
    • evaluating it.

Dependencies

The main dependencies for the framework are given in Getting Started.

Additionally, the fully reproducible submission evaluation depends on Apptainer (optional, see Submission).

Usage

Running attacks

There are multiple ways to run an attack in the framework, which vary in their ease of use, performance overhead and portability.

Quick eval Using the quick_eval.py Python script in the submission is the easiest way. It also has minimal overhead, and is therefore ideal for developing an attack. See Getting Started for usage instructions.

Test submission Since quick_eval.py is tightly integrated with a submission, it is easy to use and modify, but this tight integration is not always wanted: when evaluating a submission that is not our own, we would like a more standard interface. This is what the scripts/test_submission.py script provides. It has multiple modes (attack in a directory or a zip file, native or containerized execution). It should be mostly useful to validate your submission package. See Submission for usage instructions.

Implementing attacks

The implementation of an attack lies inside two python functions.

  • The profiling function (optional), see Profiling.
  • The attack function, see Attack.

Do you need more flexibility? You can change anything in the demo_submission directory, as long as the command-line interface remains the same. See also Beyond Python.

Submitting attacks

Once your attack works with quick_eval.py, see Submission for a step-by-step list on how to send it to the evaluation server.

Profiling

Within the framework

This (optional) phase creates profiles of the leakage that can be used afterwards in the attack phase. This profiling step can be implemented in the profile() method of the Attack class in attack.py.

It is defined as follows

def profile(self, profile_datasets: List[DatasetReader]):

where DatasetReader is defined in dataset.py. The function does not return anything, but must set the value of the instance variable self.profile_model (which can be any pickle-able data).

The computation of the values manipulated by internal wires of the target may be required during the profiling phase. While you can implement your own simulation procedure based on the SMAesH core architecture, we provide scripts to build a simulation library with Verime from the Verilog code of the target (see Target simulation). In the provided example attack, the profiling phase consists in building Gaussian templates (together with a dimensionality reduction) for every share of each byte after the first SubBytes layer. For that, we directly use the SCALib LDAClassifier and rely on the SNR to select the POIs from the traces (see attack.py for more details).
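
For intuition, the sketch below shows a strongly simplified, numpy-only version of this kind of profiling (SNR-based POI selection followed by pooled Gaussian templates). The actual demo relies on SCALib's SNR and LDAClassifier, so treat this as a conceptual outline rather than the demo implementation.

import numpy as np

def snr(traces, labels, nc=256):
    # SNR per time sample: variance of the class means divided by the mean
    # of the within-class variances.
    means = np.zeros((nc, traces.shape[1]))
    variances = np.zeros((nc, traces.shape[1]))
    for v in range(nc):
        sel = traces[labels == v]
        means[v] = sel.mean(axis=0)
        variances[v] = sel.var(axis=0)
    return means.var(axis=0) / variances.mean(axis=0)

def build_templates(traces, labels, n_pois=20, nc=256):
    # Keep the n_pois samples with the highest SNR and build one Gaussian
    # template per class value, with a pooled covariance matrix.
    pois = np.argsort(snr(traces, labels, nc))[-n_pois:]
    reduced = traces[:, pois].astype(np.float64)
    means = np.array([reduced[labels == v].mean(axis=0) for v in range(nc)])
    pooled_cov = np.cov((reduced - means[labels]).T)
    return {"pois": pois, "means": means, "cov_inv": np.linalg.inv(pooled_cov)}

# In profile(), one such model would be built per targeted share (with labels
# obtained from the Verime simulation of bytes_from_SB), and the collection of
# models stored in self.profile_model.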

To avoid re-computations, profiles are typically saved to files using the instance functions save_profile() and load_profile() (this is managed by quick_eval.py).

When you submit a submission to the evaluation server, this profiling phase will be run. There is a timeout of 4h for this run. If your profiling duration exceeds that limit, you can embed your profiles in the submission (see below).

Outside the framework

You can also develop your own profiling methodology and save its results to a file that you include in your submission package. This approach should be used, e.g., if your profiling is computationally intensive, to the point of exceeding the limits set in the rules.

When such a profile file is embedded into a submission package, the method to follow for regenerating this file must be documented in the submission package (see Submission).

Note that if your submission package exceeds 4 GB, it will not be accepted by the evaluation server. If this limit cannot be adhered to by your attack, we'd still like to be able to accept it in the challenge. Please contact the organizers, we may (at our discretion) arrange a way to bypass the 4 GB limit.

Target simulation

In order to analyse the circuit, it is necessary to know the internal values it handles. To this end, our strategy is to simulate the behaviour of the circuit and to recover the values that interest us. This solution avoids the need to write specific code for each targeted signal (which is time-consuming and can lead to errors). The Verime tool has been specifically developed for this purpose. In the following sections, we explain how it is used for the provided demo submission.

Identification of useful signals

For the demo attack, we consider that an adversary wants to perform a template attack against the SMAesH AES implementation. To this end, he seeks to model the power consumption of the implementation as a function of the share values manipulated after the first SubBytes layer (i.e., the bytes of the state exiting the Sboxes layer of the first round).

As explained in detail in the SMAesH documentation, these values are manipulated by the core at the end of a round execution. More particularly, the wires bytes_from_SB coming from the S-box instances hold the target values when cnt_cycles equals 7, 8, 9 and 10 (Figure 16 in the core's documentation). The adversary thus has to recover the values passing on these wires at these specific clock cycles in order to build his templates.

Verilog annotation for Verime

The first step is to annotate the HDL of the architecture with the verilator_me attribute in order to drive the operations performed by Verime. This annotation designates the signals whose values we wish to obtain.

Targeting the SMAesH architecture, this can be achieved by adding the verime attribute on the bytes_from_SB bus in the source file MSKaes_32bits_core.v (as shown next)

...
(* verime = "B_fromSB" *)
wire [8*d-1:0] bytes_from_SB [3:0];
...

The value of the wire bytes_from_SB will then be accessible through the label B_fromSB. Multiple internal values can be annotated with the verilator_me attribute, but the labels used for the different signals have to be distinct. In addition to wires, ports, registers and/or arrays of wires and registers can be annotated as well (please refer to the Verime documentation for more details).

Implementation of the C++ simulation wrapper

The next step is to implement the top-level interface of the simulated HW module. The goal of the latter is to define how the HW module is used during a single execution. In particular, the user has to implement the function run_simu with the following definition

void run_simu(
        SimModel *sm,
        Prober *p,
        char* data,
        size_t data_size
        )

where the structures SimModel and Prober are specific to Verime (accessible by using the statement #include "verime_lib.h"), data is the input data for a single execution (encoded as an array of bytes) and data_size the number of bytes provided. As explained in detail in the Verime documentation, the Verilated instance of the HW module can be accessed through the variable sm->vtop, which allows getting/setting the value of any signal at the top level. In addition to the features enabled by Verilator, Verime implements the two following additional functions in verime_lib.h

  • sim_clock_cycle(SimModel * sm): simulates a posedge clock cycle.
  • save_state(Prober * p): saves the values of the probed signals (i.e., the ones that are annotated with verilator_me).

The file simu_aeshpc_32bit.cpp implements a simple wrapper that stores the values of the probed signals at every clock cycle once an execution has started. Next, we detail each part of the file. First, the verime library is included and the value of the generic d under consideration is fetched

#include "verime_lib.h"

#ifndef D
#define D GENERIC_D
#endif
...

It has to be noted that the value of every generic used during the Verime process can be accessed in the C++ wrapper by referring to the macro GENERIC_$(capital_generic_name). Then, the function run_simu is implemented.
We start it by applying a reset of the core as follows

...
// Reset the AES core
sm->vtop->rst = 1;
sim_clock_cycle(sm);
sm->vtop->rst = 0;
sim_clock_cycle(sm);
...

These four lines simply set the core's reset signal during a single clock cycle and then clear it during the following clock cycle. Then, the reseed procedure of the core is executed by performing an input transaction on its randomness interface. In practice, the following lines are used

...
// Feed the seed
memcpy(&sm->vtop->in_seed,data,SEED_PRNG_BYTES);
sm->vtop->in_seed_valid = 1;
sm->vtop->eval();

while(sm->vtop->in_seed_ready!=1){
    sim_clock_cycle(sm);
}
sim_clock_cycle(sm);
sm->vtop->in_seed_valid = 0;
...

These lines naively implement the transaction. In more detail, the seed is copied from the data buffer to the dedicated randomness bus. Then, the control signal in_seed_valid is asserted and clock cycles are simulated until the signal in_seed_ready is also asserted. An additional clock cycle is simulated at the end of the while loop to complete the transaction. Finally, in_seed_valid is deasserted. The call to eval() is used to recompute the internal values resulting from combinatorial logic.

The next step consists in starting the execution using the provided plaintext and key, which is achieved by the following piece of code

...
// Prepare the run with input data
// Assign the plaintext sharing
memcpy(&sm->vtop->in_shares_plaintext,data+SEED_PRNG_BYTES,16*D); 
// Assign the key sharing 
memcpy(&sm->vtop->in_shares_key,data+SEED_PRNG_BYTES+16*D,16*D);

// Start the run
sm->vtop->in_valid = 1;
sm->vtop->eval();
while(sm->vtop->in_ready!=1){
    sim_clock_cycle(sm);
}
sim_clock_cycle(sm);
sm->vtop->in_valid = 0;
sm->vtop->eval();
...

First, the plaintext and key sharings are copied from the buffer to the input busses. Then, a transaction on the input interface is performed to feed the core with fresh inputs. Finally, we wait for the completion of the execution by simulating a clock cycle at each loop iteration until the signal out_valid is asserted. While waiting, the probed signals are saved at every clock cycle by calling save_state(p), as shown here

...
// Run until the end of the computation
while(sm->vtop->out_valid!=1){
    save_state(p);
    sim_clock_cycle(sm);    
}
save_state(p);
...

Building the python3 simulation package

The simulation package can be built from the annotated Verilog code and the corresponding simulation wrapper. The build process consists of two simple steps:

  1. Generating the package files using Verime.
  2. Building the python package using the Makefile generated by Verime.

The Makefile combines both steps in the target verime, and it suffices to use the latter to create the Python wheel. Basically, the first step consists in calling Verime with the appropriate arguments in order to set up the package. The tool analyzes the hardware architecture, identifies the annotated signals and creates C++ files in order to probe these signals with Verilator. Besides, it generates the Python environment used in the wheel-building process. As shown by its help message, Verime accepts the following parameters:

  -h, --help            show this help message and exit
  -y YDIR [YDIR ...], --ydir YDIR [YDIR ...]
                        Directory for the module search. (default: [])
  -g GENERICS [GENERICS ...], --generics GENERICS [GENERICS ...]
                        Verilog generic value, as -g<Id>=<Value>. (default: None)
  -t TOP, --top TOP     Path to the top module file, e.g. /home/user/top.v. (default: None)
  --yosys-exec YOSYS_EXEC
                        Yosys executable. (default: yosys)
  --pack PACK           The Verilator-me package name. (default: None)
  --simu SIMU           Path to the C++ file defining run_simu (default: None)
  --build-dir BUILD_DIR
                        The build directory. (default: .)
  --clock CLOCK         The clock signal to use. (default: clk)

In practice, the Makefile calls Verime with the following arguments under the target verime:

  • --ydir ./aes_enc128_32bits_hpc2 ./aes_enc128_32bits_hpc2/masked_gadgets ./aes_enc128_32bits_hpc2/rnd_gen ./aes_enc128_32bits_hpc2/sbox: used to point to the directories in which the SMAesH source files are located.
  • -g d=2: sets the value of the generic d at the top level of SMAesH.
  • --top ./aes_enc128_32bits_hpc2/aes_enc128_32bits_hpc2.v: specifies the top module path.
  • --pack aeshpc_32bit_d2_lib: defines the package name.
  • --build-dir aeshpc_new_32bit_d2_lib: indicates the directory used for the build process (in practice, a directory with the package name in the current directory).
  • --simu simu_aeshpc_32bit.cpp: indicates the path to the simu_aeshpc_32bit.cpp file.

After the Verime execution, the directory defined with --build-dir contains an automatically generated Makefile. The latter first uses Verilator to build a shared library, which is then used as an efficient backend simulator. Finally, the Python package is generated and the wheel aeshpc_32bit_d2_lib/aeshpc_32bit_d2_lib-*.whl is created. The following section explains how the provided example integrates it.

Basic usage of the simulation package

Once installed, the generated simulation package can be used to easily probe the annotated signals. In the following, we assume that the wheel generated in the previous step has been installed in the Python environment. The following piece of code shows how to use the generated package

import aeshpc_32bit_d2_lib as pred
import numpy as np

### Generate random input data byte.
# Amount of cases to simulate
n_cases = 100
# Amount of input byte for a single case
len_data = 10 + pred.GENERICS['d']*16 + pred.GENERICS['d']*16
# Random input byte
data_bytes = np.random.randint(0, 256, (n_cases, len_data), dtype=np.uint8)

### Simulate the cases
# Amount of probed state to allocate 
# (>= number of calls to save_state() in the C++ wrapper)
am_probed_state = 110
simulations_results = pred.Simul(
        data_bytes,
        am_probed_state
        )

### Recover the data for a specific cycle.
### Note that `bytes_from_SB` is an array of wires, so the index `__i` is
### appended to the verime signal name. Please check the value of
### pred.SIGNALS to get the names of all verime labels.

# Values of the state recovered for all simulated cases
sh_byte0_fromSB_clk7 = simulations_results["B_fromSB__0"][:,7,:]
sh_byte1_fromSB_clk8 = simulations_results["B_fromSB__1"][:,8,:]
sh_byte2_fromSB_clk9 = simulations_results["B_fromSB__2"][:,9,:]
sh_byte3_fromSB_clk10 = simulations_results["B_fromSB__3"][:,10,:]

The first lines generate the numpy 2D array data_bytes with random bytes. Each row of this array contains the input bytes of a single simulation case. In practice, each of these rows corresponds to the char * data array that will be used by the function run_simu() in the simulation wrapper. In this example, 100 independent random cases are generated, and each row contains the bytes representing the \( 80 \)-bit seed and the \( 128 d \)-bit plaintext and key of a single case. Note that the actual number of shares \( d \) is fetched from the value that has been passed to Verime during the build process by accessing the GENERICS metadata of the package.

Next, we use the package to simulate all the input cases. To this end, the package function Simul() takes two input parameters: the cases' input data (as a numpy array of bytes with a shape of (n_cases, len_data)) and the number of probed states to allocate. In more detail, the backend allocates memory to store each annotated signal a given number of times per simulated case. Each time the function save_state() is called, the values of the annotated signals are stored to the buffer. In our present example, the saving is done at every clock cycle, and a total of 106 saves are done for a single execution.

The simulation results for each case are stored in the variable simulations_results. In particular, the simulated values for a given signal can be accessed directly using the verime label corresponding to the signal. The simulation results are organised as 3D byte arrays of shape (n_cases, am_probed_state, bytes_len_sig), with

  • n_cases: the number of simulated cases.
  • am_probed_state: used for the memory allocation of the simulation. It corresponds to the maximum number of times save_state() can be called in the simulation wrapper. In particular, using index i in the second dimension recovers the values stored by the i-th call to save_state() performed in the simulation wrapper.
  • bytes_len_sig: the number of bytes required to encode the simulated signal.

As a result, the variables sh_byte0_fromSB_clk7, sh_byte1_fromSB_clk8, sh_byte2_fromSB_clk9 and sh_byte3_fromSB_clk10 hold 4 out of the 16 targeted values (i.e., the values of the wires bytes_from_SB[0], bytes_from_SB[1], bytes_from_SB[2] and bytes_from_SB[3] at clock indexes 7, 8, 9 and 10 respectively) when the input vectors stored in data_bytes are used as inputs of the core.

Integration in the example submission package

To ease the readability of the provided model/attack scripts, the file tap_config.py defines the TapSignal class. The latter defines a specific signal of interest and provides useful features to deal with the simulated values. In particular, each instance implements the simulation of the configured signal. Besides, when the configured target signal holds shared data, the user can choose to recover a specific share or the unmasked value held by the wire. The following parameters must be provided to each TapSignal instance:

Instance parameter | Type                      | Description
sig_name           | str                       | Verime label of the annotated signal.
cycle              | int                       | Clock index of interest (considering that the values of the annotated signals are saved at each clock cycle).
share_index        | obj                       | Share index to tap. The user has the choice between 'raw' (the raw value of the bus), None (the unmasked/recombined value of the bus) and \( i \leq d \) (only the value of the \( i \)-th share is recovered by the simulation).
tap_bits           | list of integers or range | The bit indexes of interest. The behaviour depends on the value of share_index: with 'raw', the indexes are bit indexes in the raw internal bus; with None, they are bit indexes of the unshared value; with \( i \), they are bit indexes of the configured share.
am_shares          | int                       | Number of shares used to encode the shared value.

In the demo submission, a TapSignal instance is generated for each share of each byte of the state after the first SubBytes layer (as done by the function generate_TC_from_SB() in tap_config.py). The tap signals are then used in the profiling phase in order to recover the trace labels when building the templates. As a final remark, the TapConfig is just a wrapper designed to ease the management of multiple TapSignal instances.
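
As an illustration, a TapSignal targeting the first share of the byte carried by B_fromSB__0 at clock index 7 could be instantiated as sketched below. The keyword names are assumed from the parameter table above; check tap_config.py for the actual constructor.

from tap_config import TapSignal

# Hypothetical instantiation: first share (share_index=0) of the full byte
# (bits 0..7) carried by B_fromSB__0 at clock index 7, for a d=2 design.
tap = TapSignal(
    sig_name="B_fromSB__0",
    cycle=7,
    share_index=0,
    tap_bits=range(8),
    am_shares=2,
)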

Attack

The attack phase is the only mandatory step that must be implemented in a submission. It takes as input a set of traces and computes subkey scores. It can also use Profiling data.

To this end, the method attack must be implemented in attack.py:

def attack(self, attack_dataset: DatasetReader) -> KeyGuess

where DatasetReader is defined in dataset.py and KeyGuess is defined in key_guess.py. When a profiling phase is used (as in the demo submission), the computed/loaded profile is stored in the variable self.profiled_model (the assignment of this variable is handled by quick_eval.py).

The dataset reader points to a fixed key dataset that contains only the power traces and the unmasked plaintext (see Datasets).

The KeyGuess consists of a split of the 128-bit key into subkeys (of at most 16 bits each, possibly non-contiguous), and, for each subkey, a list that gives a score for each of its possible values. The score of a key value is the sum of the scores of its subkeys, and it determines the rank. See also the documentation of KeyGuess.
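
For intuition, a purely illustrative attack skeleton is sketched below. The KeyGuess constructor arguments are assumptions based on the description above (a list of subkey bit indices and a list of score arrays); check key_guess.py and the demo attack.py for the actual interface.

import numpy as np
from dataset import DatasetReader  # provided by the framework
from key_guess import KeyGuess     # provided by the framework

def attack(self, attack_dataset: DatasetReader) -> KeyGuess:
    # Split the 128-bit key into 16 subkeys of 8 bits (bits 0..7, 8..15, ...).
    subkeys = [list(range(8 * i, 8 * (i + 1))) for i in range(16)]
    # One score per possible value of each subkey. Placeholder scores here;
    # a real attack would derive them from the traces (e.g., template
    # log-likelihoods). The score convention should be checked in the
    # KeyGuess documentation.
    scores = [np.zeros(256) for _ in subkeys]
    # Assumed constructor: subkey bit-index lists + per-subkey score arrays.
    return KeyGuess(subkeys, scores)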

In the demo submission, we use the SASCA implementation from SCALib to recover information about the 16 bytes of the key. While it works (as shown in Getting Started), the attack is not optimised and does not achieve good performance, but it is a starting point for the development of better attacks within our evaluation framework (see attack.py).

Submission

In this section, we explain how to prepare, package and test your submission.

At this point, we assume that your attack works with quick_eval.py (see Framework).

Submission directory

Put your submission in a directory that satisfies the following (the demo_submission directory is a good starting point).

  1. It must contain the file submission.toml. See demo_submission/submission.toml for an example and instructions.
  2. If your attack depends on Python packages, put your dependencies in setup/requirements.txt (you can generate it with pip freeze).
  3. It must contain the file setup/setup.sh with container setup instructions. If you only depend on Python packages, keep the one of demo_submission/; otherwise, add your custom build/install steps there (see Beyond Python for details).
  4. Ensure that the resources required by your submission are generated (e.g., profile files). The demo submission uses a library (Python wheel) built by Verime. It must thus be generated in demo_submission/setup/ and listed in the file demo_submission/setup/requirements.txt for the evaluation to work. To this end, the command
    # Run in venv-demo-eval environment.
    make -C demo_submission/values-simulations 
    
    generates the library wheel, copies it into the directory demo_submission/setup and updates the file demo_submission/setup/requirements.txt accordingly.
  5. If your submission includes non-source files (e.g., binary libraries or profiled models), it must contain a succinct README explaining how to re-generate those from source. It may also explain how your attack works.

First test (in-place, native)

Test your submission with the test_submission.py script.

# To run in the SMAesH-challenge directory, assuming the submission directory is still demo_submission.
# To run after activating the venv-scripts virtual environment (see "Getting Started").
python3 scripts/test_submission.py --package ./demo_submission --package-inplace --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET

If this does not work, it is time to debug your submission. To accelerate the debugging process, see the various command-line options of test_submission.py. In particular, --only allows you to run only some steps (e.g. --only attack).

Building and validating the submission package

The scripts/build_submission.py script generates a valid submission .zip file based on the submission directory. You can use the following command to generate the package archive for the demo submission.

# (To run in the venv-scripts environment.)
python3 scripts/build_submission.py --submission-dir ./demo_submission --package-file mysubmission.zip --large-files "setup/*.whl"

If you use "Outside the framework" profiling, you will likely have to add multiple parameters to --large-files, e.g., --large-files "setup/*.whl" "profiled_model.pkl". We try to keep submissions small (as it makes it easier to download them afterwards) by not including non-required large files.

Then, you can validate basic elements of its content with

python3 scripts/validate_submission.py mysubmission.zip

Final tests

Let us now test the content of mysubmission.zip.

python3 scripts/test_submission.py --package mysubmission.zip --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET

If this succeeds, we can move to the final test. To ensure a reproducible environment, submissions will be evaluated within a (docker-like) container runtime. The following test ensures that everything is functioning correctly inside the container (and in particular that your submission has no un-listed native dependencies -- the container is (before setup.sh runs) a fresh Ubuntu 23.04). It will also validate resource constraints (you may want to relax timeouts if you use a slower machine than the evaluation server).

  • Install the Apptainer container runtime.
  • Use test_submission.py in --apptainer mode:
python3 scripts/test_submission.py --package mysubmission.zip --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET --apptainer

If this works, congrats! Your submission is fully functional and your results will be easily reproduced! It only remains to test it on the test dataset.

If it does not work, note that the apptainer mode prints the commands it runs, so you can see what happens and debug more easily.

You may want to:

  • Use the --package-inplace mode of test_submission.py to avoid rebuilding the zip at every attempt.
  • Run only some steps of the submission with the --only option.
  • Clean up buggy state by deleting the workdir.
  • Run commands inside the container using apptainer shell.

Test and submit

Run

python3 scripts/test_submission.py --package mysubmission.zip --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET --apptainer --attack-dataset-name fk1

to run the evaluation against the test dataset.

Then, send the evaluation result, along with the submission zip file, to the organizers.

Remark: The resource limit rule is lifted for the post-CHES part of the challenge. However, please let us know if your submission requires significantly more computational resources than this limit.

Beyond Python

The challenge framework has been developed to facilitate the development of python-based submissions. It is however possible to develop submissions using other languages.

We suggest two main solutions for this.

  • Python C extensions. If you want to use native code that can interface with C, you can probably turn it into a Python module using CPython's C API.
  • Subprocess calls. It might be easier to make your actual implementation a standalone script or binary that is called as a subprocess from quick_eval.py (see the sketch below).
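
A minimal sketch of the subprocess approach is shown below; the binary name, its command-line flags and its output format are all hypothetical.

import json
import subprocess

def run_native_attack(trace_file: str) -> dict:
    # Call a hypothetical standalone attack binary shipped in the submission
    # package and parse its JSON output (e.g., scores per subkey).
    result = subprocess.run(
        ["./native_attack", "--traces", trace_file, "--output", "json"],
        capture_output=True,
        check=True,
        text=True,
    )
    return json.loads(result.stdout)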

Otherwise, you can use any other technique that works! What matters is that the final apptainer-based test of Submission succeeds.

Be sure to include all required installation steps in setup/setup.sh.

For native code, you can either:

  • Build it in setup/setup.sh: this is the most portable option, but requires to install the full compilation toolchain in the container.
  • Include the binary in the submission package. This might be easier, but be careful about native dependencies (or instruction sets -- the evaluation server has an AVX2-generation x86 CPU). We use this solution for the simulation library in the demo submission, as installing Verilator would make the container setup very annoying.

Challenge rules

The rules of the contest should be interpreted together with all the documentation material of the contest.

Contest Goal

  • The goal of the challenge is to design attacks that break the targets with a minimum number of traces for the online phase of the attack.
  • "Breaking a target" means extracting a key ranking such that the correct key is within enumeration power, fixed to \(2^{68}\).
  • You can play individually or in teams.
  • Multiple targets will be introduced over time.

Submissions

  1. The participants will submit implementations of their attacks and the attacks will be evaluated by the organizers.
  2. The format and execution interfaces for the submissions are explained in the documentation available on the challenge website.
  3. Attacks can be submitted at any time on the challenge website as a "submission package" file.
  4. A submission can contain attacks for multiple targets. Each attack will be processed independently.
  5. The whole challenge (attacks, evaluations, points and prizes) is run independently for each target.
  6. Each attack comes with a claim, which is the number of online traces needed for the attack.
  7. Attacks are made public (attack name, team name, submission date and claim) as soon as they are submitted. Submission packages are also public, after a 10 days embargo.
  8. Sample submissions can be sent to the organizers for testing correct execution of scripts on the evaluation machine.

Attack evaluation

Each submitted attack will be run on the corresponding private test dataset, restricted to the number of online traces claimed by the attack, using the challenge evaluation framework. The attack is successful if the upper bound on the estimated rank is below \(2^{68}\). The status of an attack (successful or not) is made public as soon as the evaluation is done.

Evaluation limits

  • The evaluation will be run on a computer with a Threadripper 3990X processor, a Nvidia A6000 GPU and 128GB of available RAM.
  • The execution time of an attack (excluding profiling) will be limited to 4h.

Grading system

TL;DR: You gain 1 point every hour your attack remains the best one.

Attack acceptance

Attacks will be evaluated at unspecified and irregular intervals (normally multiple times a week), in the order in which they have been submitted.

If a team submits a new attack less than 3 days after submitting its previous attack, and that previous attack has not been evaluated yet, then it will not be evaluated.

When the time comes to evaluate an attack, if its claim is more than 90% of the best successful attack evaluated so far, then it is not evaluated (i.e., a 10% improvement is required).

Non-generalizable attacks are not acceptable and will not be accepted. An attack is generalizable if, in addition to being successful, it has a high chance of being successful against other test datasets acquired in identical conditions. In particular, an attack that contains hard-coded information on the test dataset key is not generalizable.

Points

Points are continuously awarded for the best successful attack, at the rate of 1 point per hour.

The dates taken into consideration are the date/time of submission of the attack (not the time of evaluation). The accumulation of points stops when the submission server closes at the end of the challenge.

Prize

For each target:

  • a prize of 1000 € is awarded to the team with the most points,
  • a prize of 500 € is awarded to the team with the best attack at the end of the challenge.

The awarded teams will be asked to send a short description of their attacks. Teams cannot win more than one award.

Final remarks

  • Any time interval of 24 hours is a day.
  • You are allowed to use any means you consider necessary to solve the challenges, except attacking the infrastructure hosting the challenges or extracting unauthorized data from the evaluation environment.
  • The organisers reserve the right to change in any way the contest rules or to reject any submission, including with retroactive effect.
  • Submissions may be anonymous, but only winners accepting de-anonymization will get a prize. For this reason, only submissions with a valid correspondence email address will be eligible for prizes.
  • Submissions containing non-generalizable attacks are not acceptable and will not be accepted. An attack is considered "generalizable" if, in addition to being successful, there is a high probability that it will also be successful against other evaluation datasets acquired under similar conditions.

FAQ

How do I choose the claim of my attack?

You can test your attack on the fk0 dataset. If you did not train on it, then the number of traces needed to attack fk0 is a priori not different from the number needed to attack fk1.

That being said, for a given attack algorithm, some keys might be easier to break than others, and maybe fk1 is easier or harder than fk0 for your attack. Please do not overfit your attack: develop it while evaluating only on fk0, and when it works there, test it on fk1. You may make multiple attempts at fk1 while changing the number of attack traces, but your final number of traces should work for both fk0 and fk1.

I have another question.

Contact us

Leaderboard

This page lists the attacks against the SMAesH dataset.

CHES2023 challenge

These attacks were submitted for the CHES2023 challenge and follow the challenge rules.

Target | Authors                        | Attack                    | Traces
A7_d2  | Gaëtan Cassiers, Charles Momin | Demo                      | 16777215
A7_d2  | Thomas Marquet                 | Morningstar-1             | 6500000
A7_d2  | Thomas Marquet                 | Morningstar-1.3           | 5000000
A7_d2  | Valence Cristiani              | Angelo                    | 500000
A7_d2  | Valence Cristiani              | Raphaellio                | 390000
A7_d2  | Valence Cristiani              | Donatella                 | 290000
S6_d2  | Valence Cristiani              | Leonardddo                | 10000000
S6_d2  | Valence Cristiani              | Leonardda                 | 5000000
S6_d2  | Thomas Marquet                 | Morningstar-2.2-try-again | 2150400
S6_d2  | Thomas Marquet                 | Morningstar-2.5           | 1638400
S6_d2  | Thomas Marquet                 | Morningstar-2.5.2         | 1228800
S6_d2  | Thomas Marquet                 | Morningstar-xxx           | 901120

These attacks can be downloaded here.

Post-CHES challenge

After the CHES challenge, the test datasets have been released (as well as the profiling randomness for the S6_d2 dataset).

We invite everybody who works with the dataset to report their attacks to the challenge organizers (paper and/or code link). We aim to maintain here a list of all public attacks on the dataset. Ideally, attack code should work within the evaluation framework, in order to ease reproduction.

Following challenge rules

To qualify, an attack should have been trained only on the training and validation datasets, and evaluated on a test dataset (taking the first \(x\) traces of that dataset).

Target | Authors | Attack | Traces | Use prof. RNG seed

Other attacks

We list here the attacks that we are aware of, but that do not follow the challenge rules.

  • (None at the moment.)