Welcome to the SMAesH Challenge
The SMAesH challenge is a side-channel analysis contest on SMAesH, a masked FPGA implementation of the AES. Using the public profiling dataset and the open-source hardware design, the goal is to mount a key-recovery attack using as few traces as possible.
The SMAesH challenge was the CHES2023 challenge.
Get started now!
The winners were announced at the CHES2023 rump session (see the slides), but the challenge continues: see the leaderboard and the new submission instructions!
N.B.: We maintain a list of all attacks, including those that are not proper submissions (e.g., ones that do not follow the correct format or do not respect the rules): please send them to us.
Key features
Open-source target: an open-source AES implementation running on widespread SCA boards (CW305 and Sakura-G), with a reproducible acquisition setup.
Public datasets: 16 million traces with random key and plaintext, 16 million traces with fixed key and random plaintext, covering 2 AES rounds (~4,500 samples per trace).
Simple example attack (that works) as a starting point: you can easily start by improving it.
Profiling challenge: the profiling dataset for the Artix-7 target (CW305) contains the values of all the shares in the executions, while the one for the Spartan-6 (Sakura-G) contains only the unmasked values.
Attack success criterion: rank of the key below \(1 \text{BTC-H}\cdot\mathrm{s}\), defined as the number of blocks hashed by the Bitcoin mining network in 1 second (fixed to \(2^{68}\) for the duration of the challenge).
Efficient implementation: good latency vs. area trade-off with a 32-bit wide masked datapath (4 S-boxes in parallel).
Arbitrary-order masking: if you completely break the first-order design, there is more to come!
Attack ideas
The demo submission implements a textbook attack against the AES S-box output that should be easy to improve. We next share a few ideas of alternative strategies that could be used for this purpose:
- Try to re-use the demo submission with fewer traces! This is a quick and efficient way to gain points for early candidates.
- Exploit more leakage points: the demo targets the shares of the S-box output which lies in the combinatorial logic, but the masked states in the bitslice S-box or the output of MixColumns leak more.
- Profile larger target intermediate values: for example, the masked states in the bitslice S-box are larger than 8 bits even though they only depend on 8 key bits, and the output of MixColumns naturally depends on 32 key bits.
- Perform multi-target and multivariate attacks: there are multiple leaking operations in the implementations, which can be exploited with advanced statistical attacks (e.g., analytical strategies or machine learning).
- Try different profiling strategies: for a low number of shares, directly profiling with a machine learning model, without taking advantage of the shares' knowledge, may be feasible.
- Perform cross-dataset transfer learning: we provide more profiling power for the Artix-7 than for the Spartan-6.
- Exploit the leakage of the key scheduling algorithm.
You can also have a look at the existing open-source attacks!
Timeline
- 2023-05-08 Challenge launch with Artix-7 target, submission server opens.
- 2023-07-03 Launch of the Spartan-6 target.
- 2023-09-01 Submission server closes.
- 2023-09-10 (at CHES) Award ceremony.
- 2023-10-19 Full dataset public release. The challenge continues with self-evaluation! See the leaderboard.
Contact information
- Mailing list: announcements and challenge discussions. [send a mail] [subscribe] [list archive]
- Matrix channel for chatting with other participants, build teams, etc.
- Directly contact the organizers for private matters: info@simple-crypto.org.
Organizers
This challenge is organized by the SIMPLE-Crypto Association, a non-profit organization created to develop open-source cryptographic implementations and to maintain them over time, currently with a strong focus on embedded implementations with strong physical security guarantees.
The contributors involved in this project are Gaëtan Cassiers, Charles Momin and François-Xavier Standaert.
Getting started
A dedicated evaluation framework has been developed for the challenge and is available on GitHub. This section explains the steps required to run the provided demo attack. More details about the evaluation framework can be found in the dedicated Framework section.
Installing dependencies
The framework runs with Python >= 3.8 and requires the following Python tools:
- venv, part of Python's standard library, but not included in some Python installations (e.g., on Ubuntu, you might have to apt install python3-venv to get it).
- pip, also part of most Python installations (but on Ubuntu, apt install python3-pip is needed).
Additionally, the demonstration attack depends on:
- Yosys (version 0.25 tested; versions below 0.10 will likely not work)
- Verilator (use version 5.006; many other versions are known not to work)
- Make
CAUTION: we highly recommend installing Verilator from its git repository and running it in-place (as recommended by the official documentation).
We highly recommend using the challenge in a Unix environment (on Windows, use WSL).
Cloning repo
First, clone the challenge framework repository:
git clone https://github.com/simple-crypto/SMAesH-challenge.git
Downloading the datasets
See this page for downloading the datasets.
In the following, we use the variable SMAESH_DATASET as the path to the directory where the downloaded dataset is stored (i.e., the path to the directory smaesh-dataset, which is the directory that contains the directories A7_d2 and S6_d2).
As a final step, format the dataset. This operation must be done only once per fixed-key dataset (it takes a few seconds). It generates a new manifest per dataset (manifest_split.json) that will be used by the framework's scripts to evaluate the attack.
# Create the venv for running framework's scripts
cd SMAesH-challenge
python3 -m venv venv-scripts
source venv-scripts/bin/activate # Activate it (adapt if not using bash shell)
pip install pip --upgrade
pip install -r scripts/requirements.txt
# Run the split_dataset command (here for the Artix-7 only)
python3 scripts/split_dataset.py --dataset $SMAESH_DATASET/A7_d2/fk0/manifest.json
# Leave the venv-scripts virtual environment
deactivate
Running our example attack: profiling, attack, evaluation
The following steps run the demo attack and evaluate it.
- First, we move to the cloned framework directory
cd SMAesH-challenge
- Then, set up a Python virtual environment used for the evaluation, and activate it
python3 -m venv venv-demo-eval
source venv-demo-eval/bin/activate
pip install pip --upgrade
- Install the verime dependency (tool for simulating intermediate values in the masked circuit)
pip install verime
- In the demo submission directory, build the simulation library with Verime
cd demo_submission
make -C values-simulations
- Install the python package required to run the attack
(cd setup && pip install -r requirements.txt)
- Run the evaluation itself
# Profiling step:
# - uses the dataset vk0
# - saves the templates in the current directory
python3 quick_eval.py profile --profile-dataset $SMAESH_DATASET/A7_d2/vk0/manifest.json --attack-case A7_d2 --save-profile .
# Attack step:
# - uses the dataset fk0
# - loads the profile located in the current directory
# - performs the attack using 16777216 traces
# - saves the resulting key guess in the file './keyguess-file'
python3 quick_eval.py attack --attack-dataset $SMAESH_DATASET/A7_d2/fk0/manifest_split.json --attack-case A7_d2 --load-profile . --save-guess ./keyguess-file --n-attack-traces 16777216
# Evaluation step:
# - loads the key guess file generated by the attack
# - uses the key value of the dataset fk0 as a reference
python3 quick_eval.py eval --load-guess ./keyguess-file --attack-case A7_d2 --attack-dataset $SMAESH_DATASET/A7_d2/fk0/manifest_split.json
For the demo attack, the evaluation phase is expected to produce the following result on the standard output when the default configuration is used:
...
log2 ranks [59.79907288]
number of successes 1
which means that the attack reduces the rank of the correct key to approximately \( 2^{59.79} \).
By default, the profiling phase uses \( 2^{24} \) traces to build the models, which may result in a significant processing time (about 45 minutes on the reference machine). The attack runs in about the same time on that machine. Reducing the number of traces for both steps will reduce their execution time (at the expense of a worse key rank, of course).
Note: you can run multiple steps at once, as in python3 quick_eval.py profile attack eval ...
Next steps
It's your turn!
Both phases (profiling and attack) are implemented in the profile() and attack() functions in attack.py: tweak these functions to implement your revolutionary attack.
If you get the demo submission to run with fewer traces, you can also try to directly submit it!
The other pages of this website provide more detailed information on how to develop a submission. In particular:
- Framework: details how to use the challenge framework to develop, evaluate, package and send a new submission.
- Rules: see how to get points, and what are the constraints on submitted attacks.
- Target: acquisition setup used for the different targets.
- Datasets: content of the datasets.
Have a look at our suggestions and at the SMAesH documentation to get ideas for improved attacks.
Targets
The targets for this challenge are all instantiations of SMAesH on FPGAs. SMAesH is a 32-bit masked implementation of the AES-128 encryption algorithm. For the challenge, we first instantiate it at first-order security (\(d=2\)).
We have two FPGA targets: the Chipwhisperer CW305 with an Artix-7 Xilinx FPGA and the Sakura-G with a Spartan-6 FPGA.
The challenge for the Artix-7 target is in a fully white-box profiled setting: the full implementation is open-source, including the bitstream. The profiling datasets for this target include the full seed randomness, therefore the complete state of the FPGA is known for the profiling traces. Refer to Artix-7 for detailed explanations.
For the Spartan-6, the profiling setting is more constrained: only the SMAesH core is open-source, and the remaining part of the design is kept secret. The profiling datasets for this target only contain the values of the key and the plaintext. Of course, challenge participants can perform measurements on their own instance of SMAesH on a Sakura-G, or use the Artix-7 dataset as a starting point to build an attack. Refer to Spartan-6 for detailed explanations.
Artix-7
The Artix-7 measurement setup is based on the Chipwhisperer CW305 board from NewAE.
The FPGA bitstream used to perform the acquisitions has been generated using the Xilinx Vivado Toolset (v2022.1 64-bit) and the following modifications have been applied compared to the default toolflow parameters:
- HDL annotation:
  - attribute DONT_TOUCH set for every module.
  - attribute KEEP_HIERARCHY set for every module.
- Synthesis parameters:
  - flatten_hierarchy set to none
  - gated_clock_conversion set to off
  - bufg set to 12
  - directive set to Default
  - no_retiming checked
  - fsm_extraction set to auto
  - keep_equivalent_registers checked
  - resource_sharing set to off
  - no_lc checked
  - no_srlextract checked
- Implementation parameters:
  - opt_design related: is_enabled unchecked
  - phys_opt_design related: is_enabled unchecked
The provided datasets contain power traces acquired by measuring the voltage drop across the \( 100\,\mathrm{m}\Omega \) shunt resistor R27. The low-noise amplified signal point X4 is measured by a digital oscilloscope through an SMA connector. An external low-noise power supply (Keysight E36102B) is used in order to avoid the noise generated by the onboard (switching) power supply. In particular, a continuous DC voltage of 1V is provided through the dedicated banana jacks on the board (and the switch SW1 is configured accordingly).
The digital oscilloscope used is a PicoScope 6242E. The phases of the clocks used by the target FPGA and the oscilloscope are matched in order to reduce the level of noise induced by clock jitter. In particular, the onboard CDCE906 PLL module is configured to generate two clock signals based on the 12MHz onboard crystal. The first is the FPGA clock, running at 1.5625MHz, generated by PLL1 and fed to port N13 of the FPGA. The second is a 10MHz clock signal, generated by PLL0 and routed to the X6 SMA connector. The latter is then forwarded to the PicoScope 10MHz clock reference input port. A single measurement channel (channel A) is used to perform the measurement, and the trigger signal is fed from the onboard test point TP1 to the oscilloscope AUX trigger port.
The power traces are sampled at 5GHz (resulting in 3200 samples per target clock cycle) using a vertical resolution of 10 bits. Two pre-processing steps are applied before storing the measurements. First, a re-alignment algorithm based on maximum correlation is used in order to improve the SNR: the shift maximising the correlation of each trace with a reference trace is computed and applied to each collected trace. Second, sequential time samples are aggregated (i.e., summed) in order to reduce the dataset storage size. The reduction ratio equals 16, resulting in an effective sampling frequency of 312.5MHz with a vertical resolution of 14 bits.
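For intuition, here is a minimal numpy sketch of these two pre-processing steps applied to a single trace. It is purely illustrative (not the actual acquisition code); the function names and the max_shift parameter are assumptions.
import numpy as np

def align(trace, reference, max_shift=50):
    # Illustrative: find the shift maximising the correlation with a reference trace.
    shifts = range(-max_shift, max_shift + 1)
    corr = [np.dot(np.roll(trace, s), reference) for s in shifts]
    best = -max_shift + int(np.argmax(corr))
    return np.roll(trace, best)

def aggregate(trace, ratio=16):
    # Sum groups of `ratio` consecutive samples (here, 5 GHz -> 312.5 MHz).
    n = (len(trace) // ratio) * ratio
    return trace[:n].reshape(-1, ratio).sum(axis=1)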
The Vivado project used to generate the bitstream of the target FPGA is available on GitHub. The acquisition setup relies on a tweaked version of the ChipWhisperer CW305 firmware, which is also available on GitHub. In addition to adding genericity to the acquisition configuration, it has been modified to be able to acquire multiple datasets while limiting the biases that can occur during long measurement campaigns.
Spartan-6
The Spartan-6 measurement setup is based on the Sakura-G board from the Satoh Lab.
The provided datasets contain power traces acquired by measuring the voltage drop across a \( 2\,\Omega \) shunt resistor placed at the JP2 connector. The amplified signal point J3 is measured by a digital oscilloscope through an SMA connector. An external low-noise power supply (Keysight E36102B) is used to provide a continuous DC voltage of 5V at the dedicated connector CN1/EXT5V (and the power switch is configured accordingly).
The digital oscilloscope used is a PicoScope 6242E. The phases of the clocks used by the target FPGA and the oscilloscope are matched in order to reduce the level of noise induced by clock jitter. In particular, the waveform generation feature of the oscilloscope is used to generate a clock signal of 1.5625MHz, which is forwarded to the board's target FPGA through an SMA connector. A single measurement channel (channel A) is used to perform the measurement, and the trigger signal is fed from a GPIO connected to the target FPGA.
The power traces are sampled at 1.25GHz (resulting in 800 samples per target clock cycle) using a vertical resolution of 12 bits. As a pre-processing step, sequential time samples are aggregated (i.e., summed) in order to reduce the dataset storage size. The reduction ratio equals 4, resulting in an effective sampling frequency of 312.5MHz with a vertical resolution of 14 bits. As a result, the Spartan-6 traces have a temporal configuration similar to that of the Artix-7 traces.
Dataset
For each target, we have acquired 3 datasets:
- training dataset,
- validation dataset,
- test dataset.
The training and validation datasets are public, while the test dataset is used to evaluate the submissions and kept private by the organizers.
All datasets correspond to a correct usage of the SMAesH core: for each trace, the sharing of the key and of the plaintext is fresh. Moreover, we reseed the core before each trace with a fresh seed (the reseeding is not included in the trace). In the training dataset we use a fresh random key and a fresh random plaintext for each trace. The validation and test datasets are sampled identically and use a single fixed key (sampled at random) for the whole dataset, and a fresh random plaintext for each trace.
Dataset versions
- v1 is the dataset used for the CHES 2023 SMAesH challenge. The fk1 part of that dataset is kept private.
- v2 contains the same data as the v1 dataset, under a file structure more suitable for archival. All the parts of this dataset are public, including fk1.
Dataset parameters
The datasets contain the power measurements collected with the evaluation setups described in Targets, together with the input data used for each execution of the core. In particular, the following fields can be found:
| Label | Type | Length | Description |
|---|---|---|---|
| traces | int16 | ns | power trace of ns time samples. |
| umsk_plaintext | uint8 | 16 | unshared plaintext. |
| umsk_key | uint8 | 16 | unshared key. |
| msk_plaintext | uint8 | 16 \( \cdot \) d | plaintext sharing with d shares. |
| msk_key | uint8 | 16 \( \cdot \) d | key sharing with d shares. |
| seed | uint8 | 10 | PRNG seed of 80 bits. |
The datasets have different levels of granularity in terms of the data they contain. The following table summarizes which fields are provided with each dataset for all targets:
| Field | Training / Validation (A7_d2) | Training / Validation (S6_d2) | Test |
|---|---|---|---|
| traces | ☑ | ☑ | ☑ |
| umsk_plaintext | ☑ | ☑ | ☑ |
| umsk_key | ☑ | ☑ | |
| msk_plaintext | ☑ | | |
| msk_key | ☑ | | |
| seed | ☑ | | |
Finally, the next table summarizes the size of each dataset (in terms of number of traces).
| Dataset name | Target | Role | Number of traces | Length of traces |
|---|---|---|---|---|
| SMAesH-A7_d2-vk0 | Artix-7 (\( d = 2 \)) | profiling | \( 2^{24} \) | 4250 |
| SMAesH-A7_d2-fk0 | Artix-7 (\( d = 2 \)) | validation | \( 2^{24} \) | 4250 |
| SMAesH-A7_d2-fk1 | Artix-7 (\( d = 2 \)) | test | \( 2^{24} \) | 4250 |
| SMAesH-S6_d2-vk0 | Spartan-6 (\( d = 2 \)) | profiling | \( 2^{24} \) | 4400 |
| SMAesH-S6_d2-fk0 | Spartan-6 (\( d = 2 \)) | validation | \( 2^{24} \) | 4400 |
| SMAesH-S6_d2-fk1 | Spartan-6 (\( d = 2 \)) | test | \( 2^{24} \) | 4400 |
Files organization and dataset reading
The dataset for the SMAesH challenge is composed of several datasets, grouped by target and security order (denoted as a target instance). For each target instance, we provide the training and validation datasets (respectively vk0 and fk0), and keep the test dataset (fk1) private.
Each dataset is described by a manifest file (manifest.json) that describes its content (including a file list and a way to check integrity) and is composed of several sub-directories (one per field stored in the dataset, each containing that field's data).
The data files use the NPY format.
The datasets are expected to be read with the tool provided in dataset.py, specifically implemented for this purpose. It provides top-level functions that allow loading the data contained in a dataset in blocks of arbitrary size (see the definition of iter_ntraces in dataset.py and its usage in demo_submission/attack.py for more details).
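If you just want to peek at the raw data outside the provided reader, the fields are stored as standard NPY files inside the per-field sub-directories. The file name below is hypothetical (the actual chunk file names are listed in the manifest); prefer dataset.py and iter_ntraces for real use.
import numpy as np

# Hypothetical chunk file name; check the manifest for the real file list.
traces_chunk = np.load("smaesh-dataset-A7_d2-vk0/traces/chunk_0.npy")
print(traces_chunk.shape, traces_chunk.dtype)  # e.g., (n_traces, 4250) int16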
The convention described above will be followed when the SMAesH dataset is extended with new target cases. The dataset organisation for the different targets is depicted by the following tree:
smaesh-dataset/
+-- smaesh-dataset-A7_d2-vk0/
| + manifest.json
| +-- traces/
| +-- umsk_plaintext/
| +-- umsk_key/
| +-- msk_plaintext/
| +-- msk_key/
| +-- seed/
+-- smaesh-dataset-A7_d2-fk0/
| [...]
+-- smaesh-dataset-A7_d2-fk1/
| [...]
+-- smaesh-dataset-S6_d2-vk0/
| [...]
+-- smaesh-dataset-S6_d2-fk0/
| [...]
+-- smaesh-dataset-S6_d2-fk1/
| [...]
Hashes of datasets
SHA256 hashes of manifest.json files (dataset v1):
e067944fa0c630301c29f09cb53747bafd148af29032c87ff3a20a07e6401bc6 A7_d2-vk0/manifest.json
91db2ed958c7c00e07eaec07cec8818e30c0dfd79cfcb8bac856db41f5b485b9 A7_d2-fk0/manifest.json
08690d4152c2c6b979bd20cad489b5c99dafac7ad970fb60bcf91d67ea44be12 A7_d2-fk1/manifest.json
6af82b2c13eec7de974f3ec25756c470910c4aeca612988bad7d5bcb39a74f7a S6_d2-vk0/manifest.json
fd0469d839336f0f7fe644c97949c1dfee9eb145011213b3ef29b4e334c5753b S6_d2-fk0/manifest.json
90f2b82fc3ec788523e90ef9682864dd3682179d7b5f19f8439a583cc87eb5fe S6_d2-fk1/manifest.json
SHA256 hashes of manifest.json files (dataset v2):
6045582ea4de5545682579d08acc57b5c0f1ea4e73e898f5ca0128af643305a1 smaesh-dataset-A7_d2-fk0/manifest.json
52823b9d7ee325a7e1f257c3b23b3f9fb9a911f517c1169b3118ee81f5740855 smaesh-dataset-A7_d2-fk1/manifest.json
f7aef1456ce193ed2823dc0ba7c5dbe6b0c84cf6868ac8bdffcb60cea0e519cf smaesh-dataset-A7_d2-vk0/manifest.json
c0d6ead05f9d5cad80bde5360b2c89d7686164afd0c775adac887095c080b307 smaesh-dataset-S6_d2-fk0/manifest.json
b90de1d3c9e040303ffaf44f44c713947da2fdeae53aca768f3397c5ef295990 smaesh-dataset-S6_d2-fk1/manifest.json
36ad6916dd5b4bd6c09c1152123d26c196721c82c68135dba5979b30384f8199 smaesh-dataset-S6_d2-vk0/manifest.json
Dataset download
The v2 dataset is available here. The v1 dataset is not publicly available anymore.
SHA256 hashes of the compressed files:
ea1d2f58939708c617f02040350c5d125ad78808c49e8dbb7f0790cd2a3d1c77 smaesh-dataset-A7_d2-fk0.tar.zstd
7ed17c5e08fb76d59e304ee1320f2acc10783da27b9d4ed8b3295f22944055a4 smaesh-dataset-A7_d2-fk1.tar.zstd
f8edcc26fbed4c6f96ccb7fbd34b7f12a2b8b4abb515c1eb5c949c88526ab9b5 smaesh-dataset-A7_d2-vk0.tar.zstd
1ad75f3b2f0a037711ea49ef0ad61a6d20d04e0c6b9f0a1a4697288420e314cf smaesh-dataset-S6_d2-fk0.tar.zstd
a2f22abc9beffbae87e2970eccd8f345cf755c1eb2a8af9d37120f85d58d8a3d smaesh-dataset-S6_d2-fk1.tar.zstd
deae6c33f6d5af04f91043eafbe21617eec895121f99c480e471ce3363c9afb6 smaesh-dataset-S6_d2-vk0.tar.zstd
The files are compressed with the zstd tool (typically available on Linux distributions as the zstd package) and archived in the tar format.
Example decompression command:
zstdcat -T0 smaesh-dataset-A7_d2-vk0.tar.zstd | tar xv
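To double-check the integrity of the decompressed datasets against the hashes listed above, you can, for example, hash the manifest files in Python (equivalent to running sha256sum on each file):
import hashlib

def sha256_file(path):
    # Stream the file in 1 MiB chunks to avoid loading it entirely in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_file("smaesh-dataset-A7_d2-vk0/manifest.json"))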
Framework
The Getting Started page explains the minimal set of commands needed to run the provided demonstration attack. By following these, you can ensure that all the dependencies are properly installed before starting to implement your own attack.
In this section, we give more details on the challenge's Python-based framework to develop and evaluate attacks, available on GitHub. The following sections guide a candidate through the implementation of a new submission. However, not all details are covered, and we invite the reader to refer to the documentation found directly in the code for further explanation.
Next, we assume that the attack is written in python, but the framework may also work with other languages (see Beyond Python).
Contents
The framework is split in two parts:
- The code running the attack, given as a demonstration attack (demo_submission directory). This includes:
  - Loading the datasets.
  - Simulating internal states of the circuit.
  - Running an optional profiling step.
  - Executing the attack itself.
  - Running a simplified evaluation for testing/development purposes.
- The scripts (scripts directory) for:
  - building a submission package,
  - checking that it is well-structured,
  - evaluating it.
Dependencies
The main dependencies for the framework are given in Getting Started.
Additionally, the fully reproducible submission evaluation depends on Apptainer (optional, see Submission).
Usage
Running attacks
There are multiple ways to run an attack in the framework; they vary in ease of use, performance overhead and portability.
Quick eval Using the quick_eval.py Python script in the submission is the easiest. It also has minimal overhead, and is therefore ideal for developing an attack.
See Getting Started for usage instructions.
Test submission Since quick_eval.py is tightly integrated with a submission, it is easy to use and modify, but this tight integration is not always wanted: when evaluating a submission that is not our own, we would like a more standard interface. This is what the scripts/test_submission.py script provides. It has multiple modes (attack in a directory or a zip file, native or containerized execution). It is mostly useful to validate your submission package.
See Submission for usage instructions.
Implementing attacks
The implementation of an attack lies in two Python functions, profile() and attack(), in attack.py (see the Profiling and Attack sections).
Do you need more flexibility? You can change anything in the demo_submission directory, as long as the command-line interface remains the same.
See also Beyond Python.
Submitting attacks
Once your attack works with quick_eval.py, see Submission for a step-by-step list on how to send it to the evaluation server.
Profiling
Within the framework
This (optional) phase creates profiles of the leakage that can then be used in the attack phase.
This profiling step can be implemented in the profile() method of the Attack class in attack.py.
It is defined as follows:
def profile(self, profile_datasets: List[DatasetReader]):
where DatasetReader is defined in dataset.py. The function does not return anything, but must set the value of the instance variable self.profile_model (which can be any pickle-able data).
The computation of the values manipulated by internal wires of the target may be required during the profiling phase. While you can implement your own simulation procedure based on the SMAesH core architecture, we provide scripts to build a simulation library with Verime from the Verilog code of the target (see Target simulation). In the provided example attack, the profiling phase consists in creating Gaussian templates (together with a reduction of dimensionality) for every share of each byte after the first SubBytes layer. For that, we directly use the SCALib LDAClassifier and rely on the SNR to select the POIs from the traces (see attack.py for more details). A simplified sketch of this SNR-based profiling idea is shown below.
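The sketch below illustrates the underlying idea (SNR-based POI selection followed by per-value Gaussian templates) in plain numpy. It is not the demo code: the actual implementation relies on SCALib's SNR and LDAClassifier, and the variable and function names here are illustrative only.
import numpy as np

def snr_poi(traces, labels, nc=256, n_poi=20):
    # traces: (n, ns) array, labels: (n,) values of the targeted share byte.
    traces = traces.astype(np.float64)
    means = np.array([traces[labels == v].mean(axis=0) for v in range(nc)])
    varis = np.array([traces[labels == v].var(axis=0) for v in range(nc)])
    snr = means.var(axis=0) / varis.mean(axis=0)
    # Keep the indexes of the n_poi most leaking time samples.
    return np.argsort(snr)[-n_poi:]

def gaussian_templates(traces, labels, pois, nc=256):
    # One multivariate Gaussian (mean, covariance) per value of the share.
    red = traces[:, pois].astype(np.float64)
    return [(red[labels == v].mean(axis=0), np.cov(red[labels == v].T))
            for v in range(nc)]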
To avoid re-computations, profiles are typically saved to files using the instance functions save_profile() and load_profile() (this is managed by quick_eval.py).
When you submit a submission to the evaluation server, this profiling phase will be run. There is a timeout of 4h for this run. If your profiling duration exceeds that limit, you can embed your profiles in the submission (see below).
Outside the framework
You can also develop your own profiling methodology and save its results to a file that you include in your submission package. E.g., this approach should be used if your profiling is computationally intensive, to the point of exceeding the limits set in the rules.
When such a profile file is embedded into a submission package, the method to follow for regenerating this file must be documented in the submission package (see Submission).
Note that if your submission package exceeds 4 GB, it will not be accepted by the evaluation server. If your attack cannot adhere to this limit, we would still like to be able to accept it in the challenge: please contact the organizers, and we may (at our discretion) arrange a way to bypass the 4 GB limit.
Target simulation
In order to analyse the circuit, it is necessary to know the internal values it handles. To this end, our strategy is to simulate the behaviour of the circuit and to recover the values that interest us. This solution avoids the need to write specific code for each targeted signal (which is time-consuming and error-prone). The Verime tool has been specifically developed for this purpose. In the following sections, we explain how it is used for the provided demo submission.
Identification of useful signals
For the demo attack, we consider that an adversary wants to perform a template attack against the SMAesH AES implementation. To this end, he seeks to model the power consumption of the implementation as a function of the share values manipulated after the first SubBytes layer (i.e., the bytes of the state exiting the Sboxes layer of the first round).
As explained in detail in the SMAesH documentation, these values are manipulated by the core at the end of a round execution. More particularly, the wires bytes_from_SB coming from the sbox instances hold the target values when cnt_cycles equals 7, 8, 9 and 10 (Figure 16 in the core's documentation). The adversary thus has to recover the values passing on these wires at these specific clock cycles in order to build his templates.
Verilog annotation for Verime
The first step to do is to annotate the HDL of the architecture with the
verilator_me
attribute in order to drive the operations performed by Verime.
This annotation is necessary in order to designate the signals from which we
wish to obtain the value.
Targeting the SMAesH architecture, this can be achieved by adding the
verime
attribute on the bytes_from_SB
bus in the the
source file
MSKaes_32bits_core.v
(as shown next)
...
(* verime = "B_fromSB" *)
wire [8*d-1:0] bytes_from_SB [3:0];
...
The value of the wire bytes_from_SB will then be accessible through the label B_fromSB. Multiple internal values can be annotated with the verilator_me attribute, but the label used for each signal has to be unique. In addition to wires, ports, registers and/or arrays of wires and registers can be annotated as well (please refer to the Verime documentation for more details).
Implementation of the C++ simulation wrapper
The next step is to implement the top-level interface of the simulated HW module, whose goal is to define how the HW module is used during a single execution. In particular, the user has to implement the function run_simu with the following definition:
void run_simu(
SimModel *sm,
Prober *p,
char* data,
size_t data_size
)
where the structures SimModel and Prober are specific to Verime (accessible by using the statement #include "verime_lib.h"), data is the input data for a single execution (encoded as an array of bytes) and data_size is the number of bytes provided. As explained in detail in the Verime documentation, the Verilated instance of the HW module can be accessed through the variable sm->vtop, which allows accessing/setting the value of any signal at the top level. In addition to the features enabled by Verilator, Verime implements the two following additional functions in verime_lib.h:
- sim_clock_cycle(SimModel * sm): simulates a posedge clock cycle.
- save_state(Prober * p): saves the values of the probed signals (i.e., the ones annotated with verilator_me).
The file simu_aeshpc_32bit.cpp implements a simple wrapper that stores the values of the probed signals at every clock cycle once an execution has started. Next, we detail each part of the file. First, the Verime library is included and the value of the generic d is fetched:
#include "verime_lib.h"
#ifndef D
#define D GENERIC_D
#endif
...
Note that the value of every generic used during the Verime process can be accessed in the C++ wrapper by referring to the macro GENERIC_$(capital_generic_name). Then, the function run_simu is implemented. It starts by applying a reset of the core as follows:
...
// Reset the AES core
sm->vtop->rst = 1;
sim_clock_cycle(sm);
sm->vtop->rst = 0;
sim_clock_cycle(sm);
...
These four lines simply set the core's reset signal during a single clock cycle and then clear it during the following clock cycle. Then, the reseed procedure of the core is executed by performing an input transaction at its randomness interface. In practice, the following lines are used:
...
// Feed the seed
memcpy(&sm->vtop->in_seed,data,SEED_PRNG_BYTES);
sm->vtop->in_seed_valid = 1;
sm->vtop->eval();
while(sm->vtop->in_seed_ready!=1){
sim_clock_cycle(sm);
}
sim_clock_cycle(sm);
sm->vtop->in_seed_valid = 0;
...
and the latter naively implements the transaction. In more detail, the seed is copied from the data buffer to the dedicated randomness bus. Then, the control signal in_seed_valid is asserted and clock cycles are simulated until the signal in_seed_ready is also asserted. An additional clock cycle is simulated at the end of the while loop to complete the transaction. Finally, in_seed_valid is deasserted. The call to eval() is used to recompute the internal values resulting from combinatorial logic.
The next step consists in starting the execution using the provided plaintexts and key, which is achieved by the following piece of code
...
// Prepare the run with input data
// Assign the plaintext sharing
memcpy(&sm->vtop->in_shares_plaintext,data+SEED_PRNG_BYTES,16*D);
// Assign the key sharing
memcpy(&sm->vtop->in_shares_key,data+SEED_PRNG_BYTES+16*D,16*D);
// Start the run
sm->vtop->in_valid = 1;
sm->vtop->eval();
while(sm->vtop->in_ready!=1){
sim_clock_cycle(sm);
}
sim_clock_cycle(sm);
sm->vtop->in_valid = 0;
sm->vtop->eval();
...
First, the plaintext and the key sharings are copied from the buffer to the input buses. Then, a transaction on the input interface is performed to feed the core with fresh inputs. Finally, we wait for the completion of the execution by simulating a clock cycle at each loop iteration until the signal out_valid is asserted. While waiting, the probed signals are saved at every clock cycle by calling save_state(p), as shown here:
...
// Run until the end of the computation
while(sm->vtop->out_valid!=1){
save_state(p);
sim_clock_cycle(sm);
}
save_state(p);
...
Building of the python3 simulation package
The simulation package can be built from the annotated Verilog code and the corresponding simulation wrapper. The building process is done in two simple steps:
- Generating the package files using Verime.
- Building the python package using the Makefile generated by Verime.
The Makefile combines both steps in the target verime, and it suffices to use the latter to create the Python wheel. Basically, the first step consists in using Verime with the appropriate arguments in order to set up the package. The tool analyzes the hardware architecture, identifies the annotated signals and creates C++ files in order to probe these signals with Verilator. Besides, it generates the Python environment used in the wheel building process. As shown by its help message, Verime accepts the following parameters:
-h, --help show this help message and exit
-y YDIR [YDIR ...], --ydir YDIR [YDIR ...]
Directory for the module search. (default: [])
-g GENERICS [GENERICS ...], --generics GENERICS [GENERICS ...]
Verilog generic value, as -g<Id>=<Value>. (default: None)
-t TOP, --top TOP Path to the top module file, e.g. /home/user/top.v. (default: None)
--yosys-exec YOSYS_EXEC
Yosys executable. (default: yosys)
--pack PACK The Verilator-me package name. (default: None)
--simu SIMU Path to the C++ file defining run_simu (default: None)
--build-dir BUILD_DIR
The build directory. (default: .)
--clock CLOCK The clock signal to use. (default: clk)
In practice, the Makefile calls Verime with the following arguments under the target verime:
- --ydir ./aes_enc128_32bits_hpc2 ./aes_enc128_32bits_hpc2/masked_gadgets ./aes_enc128_32bits_hpc2/rnd_gen ./aes_enc128_32bits_hpc2/sbox: points to the directories in which the SMAesH source files are located.
- -g d=2: sets the value of the generic d at the top level of SMAesH.
- --top ./aes_enc128_32bits_hpc2/aes_enc128_32bits_hpc2.v: specifies the top module path.
- --pack aeshpc_32bit_d2_lib: defines the package name.
- --build-dir aeshpc_new_32bit_d2_lib: indicates the directory used for the building process (in practice, a directory with the package name in the current directory).
- --simu simu_aeshpc_32bit.cpp: indicates the path to the C++ simulation wrapper simu_aeshpc_32bit.cpp.
After the Verime execution, the directory defined with --build-dir contains an automatically generated Makefile. The latter first uses Verilator to build a shared library, which is then used as an efficient backend simulator. Finally, the Python package is generated and the wheel aeshpc_32bit_d2_lib/aeshpc_32bit_d2_lib-*.whl is created. The following section explains how the provided example integrates it.
Basic usage of the simulation package.
Once installed, the generated simulation package can be used to easily probe the annotated signals. In the following, we assume that the wheel generated in the previous step has been installed in the Python environment. The following piece of code shows how to use the generated package:
import aeshpc_32bit_d2_lib as pred
import numpy as np
### Generate random input data byte.
# Amount of cases to simulate
n_cases = 100
# Amount of input byte for a single case
len_data = 10 + pred.GENERICS['d']*16 + pred.GENERICS['d']*16
# Random input byte
data_bytes = np.random.randint(0, 256, size=(n_cases, len_data), dtype=np.uint8)
### Simulate the cases
# Amount of probed state to allocate
# (>= number of calls to save_state() in the C++ wrapper)
am_probed_state = 110
simulations_results = pred.Simul(
data_bytes,
am_probed_state
)
### Recover the data for a specific cycle
### Note that `bytes_from_SB` is a 2D wire array, so the index `__i` is added
### to the verime signal name. Please check the value of
### pred.SIGNALS to get the names of all verime labels.
# Value of the state recovered for all simulated cases
sh_byte0_fromSB_clk7 = simulations_results["B_fromSB__0"][:,7,:]
sh_byte1_fromSB_clk8 = simulations_results["B_fromSB__1"][:,8,:]
sh_byte2_fromSB_clk9 = simulations_results["B_fromSB__2"][:,9,:]
sh_byte3_fromSB_clk10 = simulations_results["B_fromSB__3"][:,10,:]
The first lines generate the numpy 2D array data_bytes with random bytes. Each row of this array contains the input bytes of a single simulation case. In practice, each of these rows corresponds to an array char * data that will be used by the function run_simu() in the simulation wrapper. In this example, 100 independent random cases are generated, and each row contains the bytes representing the \( 80 \)-bit seed and the \( 128 d \)-bit plaintext and key sharings of a single case. Note that the practical number of shares \( d \) is fetched from the value passed to Verime during the building process by accessing the GENERICS metadata of the package.
Next, we use the package to simulate all the input cases. To this end, the package function Simul() takes two input parameters: the cases' input data (as a numpy array of bytes with a shape of (n_cases, len_data)) and the number of probed states to allocate. In more detail, the backend allocates memory in order to store a given number of snapshots of each annotated signal per simulated case. Each time the function save_state() is called, the values of the annotated signals are stored in the buffer. In our example, the saving is done at every clock cycle, and a total of 106 saves is performed for a single execution.
The results of the simulation for each case are stored in the variable simulations_results. In particular, the simulated values for a given signal can be accessed directly using the Verime label corresponding to the signal. The simulation results are organised as a 3D byte array of dimension (n_cases, am_probed_state, bytes_len_sig), with:
- n_cases: the number of simulated cases.
- am_probed_state: used for the memory allocation of the simulation. It corresponds to the maximum number of times save_state() can be called in the simulation wrapper. In particular, using index i at the second dimension recovers the value of the i-th call to save_state() performed in the simulation wrapper.
- bytes_len_sig: the number of bytes required to encode the simulated signal.
As a result, the variables sh_byte0_fromSB_clk7, sh_byte1_fromSB_clk8, sh_byte2_fromSB_clk9 and sh_byte3_fromSB_clk10 hold 4 out of the 16 targeted values (i.e., the values of the wires bytes_from_SB[0], bytes_from_SB[1], bytes_from_SB[2] and bytes_from_SB[3], respectively, at clock indexes 7, 8, 9 and 10) when the input vectors stored in data_bytes are used at the input of the core.
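For instance, the d shares of a targeted byte can be recombined (XORed) into its unmasked value. The snippet below assumes that the last dimension of the simulation results holds the d shares of the byte, one byte per share; check bytes_len_sig and the core documentation for the exact layout.
import numpy as np

# Shares of byte 0 after the first SubBytes layer, at clock index 7.
shares = simulations_results["B_fromSB__0"][:, 7, :]   # shape (n_cases, d)
unmasked_sbox_byte = np.bitwise_xor.reduce(shares, axis=1)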
Integration in the example submission package
To ease the readability of the provided model/attack scripts, the file tap_config.py defines the TapSignal class. The latter allows defining a specific signal of interest and provides useful features to deal with the simulated values. In particular, each instance implements the simulation of the configured signal. Besides, when the configured target signal holds shared data, the user can select to recover a specific share or the unmasked value held by the wire. The following parameters must be provided to each TapSignal instance:
| Instance parameter | Type | Description |
|---|---|---|
| sig_name | str | Verime label of the annotated signal. |
| cycle | int | Clock index of interest (considering that the values of the annotated signals are saved at each clock cycle). |
| share_index | obj | Share index to tap. The user has the choice between a specific share index and the unmasked value (see tap_config.py for the accepted values). |
| tap_bits | list of integers or range | The bit indexes of interest. The behaviour depends on the value of share_index. |
| am_shares | int | Number of shares used to encode the shared value. |
In the demo submission, a TapSignal instance is generated for each share of each byte of the state after the first SubBytes layer (as done by the function generate_TC_from_SB() in tap_config.py). The tap signals are then used in the profiling phase in order to recover the trace labels when building the templates. As a final remark, the TapConfig class is just a wrapper designed to ease the management of multiple TapSignal instances. A purely illustrative instantiation is sketched below.
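The following sketch is purely illustrative; the parameter names follow the table above, but check tap_config.py (and generate_TC_from_SB()) for the actual constructor signature and the accepted share_index values.
from tap_config import TapSignal

# Hypothetical configuration: first share of the first S-box output byte,
# valid at clock index 7, with d = 2 shares.
ts_share0_byte0 = TapSignal(
    sig_name="B_fromSB__0",
    cycle=7,
    share_index=0,
    tap_bits=range(8),
    am_shares=2,
)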
Attack
The attack phase is the only mandatory step that must be implemented in a submission. It takes as input a set of traces and computes subkey scores. It can also use profiling data.
To this end, the method attack must be implemented in attack.py:
def attack(self, attack_dataset: DatasetReader) -> KeyGuess
where DatasetReader is defined in dataset.py and KeyGuess is defined in key_guess.py. When a profiling phase is used (as done in the demo submission), the computed/loaded profile is stored in the variable self.profiled_model (the assignment of this variable is handled by quick_eval.py).
The dataset reader points to a fixed key dataset that contains only the power traces and the unmasked plaintext (see Datasets).
The KeyGuess consists in a split of the 128-bit key into subkeys (of at most 16 bits each, which can be non-contiguous), and, for each subkey, a list that gives a score for each of its possible values. The score of a full key value is the sum of the scores of its subkeys, and it determines the rank. See also the documentation of KeyGuess. A toy illustration of this scoring principle is given below.
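The toy example below illustrates the scoring principle in plain numpy (it does not use the actual KeyGuess API): 16 byte-sized subkeys, each with one score per candidate value; the score of a full key is the sum of its subkey scores.
import numpy as np

rng = np.random.default_rng(0)
subkey_scores = rng.normal(size=(16, 256))   # e.g., log-likelihoods per key byte value

candidate_key = bytes(range(16))             # some 128-bit key candidate
key_score = sum(subkey_scores[i, b] for i, b in enumerate(candidate_key))
# Keys with a higher total score than the correct key increase its rank;
# the evaluation checks that this rank stays below 2^68.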
In the demo submission, we use the SASCA implementation from SCALib to recover information about the 16 bytes of the key. While it works (as shown in Getting Started), the attack is not optimised and does not achieve good performance, but it is a starting point for the development of better attacks within our evaluation framework (see attack.py).
Submission
In this section, we explain how to prepare, package and test your submission.
At this point, we assume that your attack works with quick_eval.py (see Framework).
Submission directory
Put your submission in a directory that satisfies the following (the demo_submission directory is a good starting point).
- It must contain the file submission.toml. See demo_submission/submission.toml for an example and instructions.
- If your attack depends on Python packages, put your dependencies in setup/requirements.txt (you can generate it with pip freeze).
- It must contain the file setup/setup.sh with container setup instructions. If you only depend on Python packages, keep the one of demo_submission/; otherwise, add your custom build/install steps here (see Beyond Python for details).
- Ensure that the resources required by your submission are generated (e.g., profile file, etc.). The demo submission uses a library (Python wheel) built by Verime. It must thus be generated in demo_submission/setup/ and listed in the file demo_submission/setup/requirements.txt for the evaluation to work. To this end, the command
# Run in venv-demo-eval environment.
make -C demo_submission/values-simulations
generates the library wheel, copies it into the directory demo_submission/setup and updates the file demo_submission/setup/requirements.txt accordingly.
- If your submission includes non-source files (e.g., binary libraries or profiled models), it must contain a succinct README explaining how to re-generate those from source. It may also explain how your attack works.
First test (in-place, native)
Test your submission with the test_submission.py script.
# To run in the SMAesH-challenge directory, assuming the submission directory is still demo_submission.
# To run after activating the venv-scripts virtual environment (see "Getting Started").
python3 scripts/test_submission.py --package ./demo_submission --package-inplace --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET
If this does not work, it is time to debug your submission. To accelerate the debugging process, see the various command-line options of test_submission.py. In particular, --only allows you to run only some steps (e.g., --only attack).
Building and validating the submission package
The scripts/build_submission.py script generates a valid submission .zip file based on the submission directory. You can use the following command to generate the package archive for the demo submission.
# (To run in the venv-scripts environment.)
python3 scripts/build_submission.py --submission-dir ./demo_submission --package-file mysubmission.zip --large-files "setup/*.whl"
If you use "Outside the framework" profiling, you will likely
have to add multiple parameters to --large-files
, e.g., --large-files "setup/*.whl" "profiled_model.pkl"
.
We try to keep submissions small (as it makes it easier to download them
afterwards) by not including non-required large files.
Then, you can validate basic elements of its content with
python3 scripts/validate_submission.py mysubmission.zip
Final tests
Let us now test the content of mysubmission.zip.
python3 scripts/test_submission.py --package mysubmission.zip --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET
If this succeeds, we can move to the final test. To ensure a reproducible environment, submissions will be evaluated within a (Docker-like) container runtime.
The following test ensures that everything is functioning correctly inside the container (and in particular that your submission has no un-listed native dependencies -- the container is, before setup.sh runs, a fresh Ubuntu 23.04). It will also validate resource constraints (you may want to relax timeouts if you use a slower machine than the evaluation server).
- Install the Apptainer container runtime.
- Use test_submission.py in --apptainer mode:
python3 scripts/test_submission.py --package mysubmission.zip --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET --apptainer
If this works, congrats! Your submission is fully functional and your results will be easily reproduced! It only remains to test it on the test dataset.
If it does not work, for debugging, note that the apptainer mode prints the commands it runs, so you can see what happens.
You may want to:
- Use the --package-inplace mode of test_submission.py to avoid rebuilding the zip at every attempt.
- Run only some steps of the submission with the --only option.
- Clean up buggy state by deleting the workdir.
- Run commands inside the container using apptainer shell.
Test and submit
Run
python3 scripts/test_submission.py --package mysubmission.zip --workdir workdir-eval-inplace --dataset-dir $SMAESH_DATASET --apptainer --attack-dataset-name fk1
to run the evaluation against the test dataset.
Then, send the evaluation result, along with the submission zip file, to the organizers.
Remark: The resource limit rule is lifted for the post-CHES part of the challenge. However, please let us know if your submission requires significantly more computational resources than this limit.
Beyond Python
The challenge framework has been developed to facilitate the development of python-based submissions. It is however possible to develop submissions using other languages.
We suggest two main solutions for this.
- Python C extensions. If you want to use native code that can interface with C, you can probably turn it into a Python module using CPython's C API.
- Subprocess calls. It might be easier to make your actual implementation a standalone script or binary that can be called as a subprocess from quick_eval.py (a minimal sketch is given after this list).
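A minimal sketch of such a subprocess call from Python is shown below; the binary name and its command-line interface are hypothetical.
import subprocess

result = subprocess.run(
    ["./my_attack_binary", "--traces", "traces.npy", "--out", "scores.npy"],
    check=True,           # raise if the attack binary fails
    capture_output=True,  # keep stdout/stderr for debugging
)
print(result.stdout.decode())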
Otherwise, you can use any other technique that works! What matters is that the final apptainer-based test of Submission succeeds.
Be sure to include all required installation steps in setup/setup.sh.
For native code, you can either:
- Build it in setup/setup.sh: this is the most portable option, but it requires installing the full compilation toolchain in the container.
- Include the binary in the submission package. This might be easier, but be careful about native dependencies (or instruction sets -- the evaluation server has an AVX2-generation x86 CPU). We use this solution for the simulation library in the demo submission, as installing Verilator would make the container setup very annoying.
Challenge rules
The rules of the contest should be interpreted together with all the documentation material of the contest.
Contest Goal
- The goal of the challenge is to design attacks that break the targets with a minimum number of traces for the online phase of the attack.
- "Breaking a target" means extracting a key ranking such that the correct key is within enumeration power, fixed to \(2^{68}\).
- You can play individually or in teams.
- Multiple targets will be introduced over time.
Submissions
- The participants will submit implementations of their attacks and the attacks will be evaluated by the organizers.
- The format and execution interfaces for the submissions are explained in the documentation available on the challenge website.
- Attacks can be submitted at any time on the challenge website as a "submission package" file.
- A submission can contain attacks for multiple targets. Each attack will be processed independently.
- The whole challenge (attacks, evaluations, points and prizes) is run independently for each target.
- Each attack comes with a claim, which is the number of online traces needed for the attack.
- Attacks are made public (attack name, team name, submission date and claim) as soon as they are submitted. Submission packages are also made public, after a 10-day embargo.
- Sample submissions can be sent to the organizers for testing correct execution of scripts on the evaluation machine.
Attack evaluation
Each submitted attack will be run for the corresponding private test dataset restricted to the number of online traces claimed by the attack, using the challenge evaluation framework. The attack is successful if the upper-bound on the estimated rank is below \(2^{68}\). The state of an attack (successful or not) is made public as soon as the evaluation is done.
Evaluation limits
- The evaluation will be run on a computer with a Threadripper 3990X processor, a Nvidia A6000 GPU and 128GB of available RAM.
- The execution time of an attack (excluding profiling) will be limited to 4h.
Grading system
TL;DR: You gain 1 point every hour your attack remains the best one.
Attack acceptance
Attacks will be evaluated at unspecified and irregular intervals (normally multiple times a week), in the order in which they have been submitted.
If a team submits a new attack less than 3 days after submitting its previous attack, and that previous attack has not been evaluated yet, then it will not be evaluated.
When the time comes to evaluate an attack, if its claim is more than 90% of the best successful attack evaluated so far, then it is not evaluated (i.e., a 10% improvement is required; for example, if the best successful claim so far is 1,000,000 traces, a new claim must be at most 900,000 traces).
Non-generalizable attacks are not accepted. An attack is generalizable if, in addition to being successful, it has a high chance of being successful against other test datasets acquired in identical conditions. In particular, an attack that contains hard-coded information on the test dataset key is not generalizable.
Points
Points are continuously awarded for the best successful attack, at the rate of 1 point per hour.
The dates taken into consideration are the date/time of submission of the attack (not the time of evaluation). The accumulation of points stops when the submission server closes at the end of the challenge.
Prize
For each target:
- a prize of 1000 € is awarded to the team with the most points,
- a prize of 500 € is awarded to the team with the best attack at the end of the challenge.
The awarded teams will be asked to send a short description of their attacks. Teams cannot win more than one award.
Final remarks
- Any time interval of 24 hours is a day.
- You are allowed to use any means you consider necessary to solve the challenges, except attacking the infrastructure hosting the challenges or extracting unauthorized data from the evaluation environment.
- The organisers reserve the right to change in any way the contest rules or to reject any submission, including with retroactive effect.
- Submissions may be anonymous, but only winners accepting de-anonymization will get a prize. For this reason, only submissions with a valid correspondence email address will be eligible for prizes.
- Submissions containing non-generalizable attacks are not accepted. An attack is considered "generalizable" if, in addition to being successful, there is a high probability that it will also be successful against other evaluation datasets acquired under similar conditions.
FAQ
How do I choose the claim of my attack?
You can test your attack on the fk0 dataset. If you did not train on it, then the number of traces needed to attack fk0 is a priori not different from the one needed to attack fk1.
That being said, for a given attack algorithm, some keys might be easier to break than others, and maybe fk1 is easier or harder than fk0 for your attack.
Please do not overfit your attack: develop it while evaluating only on fk0, and when it works there, test it on fk1.
You may make multiple attempts at fk1 while changing the number of attack traces, but your final number of traces should work for both fk0 and fk1.
I have another question.
Leaderboard
This page lists the attacks against the SMAesH dataset.
CHES2023 challenge
These attacks were submitted for the CHES2023 challenge and follow the challenge rules.
| Target | Authors | Attack | Traces |
|---|---|---|---|
| A7_d2 | Gaëtan Cassiers, Charles Momin | Demo | 16777215 |
| A7_d2 | Thomas Marquet | Morningstar-1 | 6500000 |
| A7_d2 | Thomas Marquet | Morningstar-1.3 | 5000000 |
| A7_d2 | Valence Cristiani | Angelo | 500000 |
| A7_d2 | Valence Cristiani | Raphaellio | 390000 |
| A7_d2 | Valence Cristiani | Donatella | 290000 |
| S6_d2 | Valence Cristiani | Leonardddo | 10000000 |
| S6_d2 | Valence Cristiani | Leonardda | 5000000 |
| S6_d2 | Thomas Marquet | Morningstar-2.2-try-again | 2150400 |
| S6_d2 | Thomas Marquet | Morningstar-2.5 | 1638400 |
| S6_d2 | Thomas Marquet | Morningstar-2.5.2 | 1228800 |
| S6_d2 | Thomas Marquet | Morningstar-xxx | 901120 |
These attacks can be downloaded here.
Post-CHES challenge
After the CHES challenge, the test datasets have been released (as well as the profiling randomness for the S6_d2 dataset).
We invite everybody who works with the dataset to report their attacks to the challenge organizers (paper and/or code link). We aim to maintain here a list of all public attacks on the dataset. Ideally, attack code should work within the evaluation framework, in order to ease reproduction.
Following challenge rules
To qualify, an attack should have been trained only on the training and validation datasets, and evaluated on a test dataset (taking the first \(x\) traces of that dataset).
| Target | Authors | Attack | Traces | Use prof. RNG seed |
|---|---|---|---|---|
Other attacks
We list here the attacks that we are aware of, but that do not follow the challenge rules.
- (None at the moment.)