Target simulation
In order to analyse the circuit, it is necessary to know the internal values it handles. To this end, our strategy is to simulate the behaviour of the circuit and to recover the values that interest us. This solution avoids the need to write specific code for each targeted signal (which is time consuming and can lead to errors). The verime tool has been specifically developped for this purpose. In the following sections, we explain how the latter is used for the demo submission provided.
Identification of useful signals
For the demo attack, we consider that an adversary wants to perform a template attack against the SMAesH AES implementation. To this end, he seeks to model the power consumption of the implementation as a function of the share values manipulated after the first SubBytes layer (i.e., the bytes of the state exiting the Sboxes layer of the first round).
As explained in details in the SMAesh
documentation, these values are
manipulated by the core at the end of a round execution. More particularly,
the wires bytes_from_SB
coming from the sboxes instances hold the target values when
cnt_cycles
equals to 7, 8, 9 and 10 (Figure 16 in the core's documentation). The adversary has
thus to recover the values passing on these wires at these specific clock
cycles in order to be able to build his templates.
Verilog annotation for Verime
The first step to do is to annotate the HDL of the architecture with the
verilator_me
attribute in order to drive the operations performed by Verime.
This annotation is necessary in order to designate the signals from which we
wish to obtain the value.
Targeting the SMAesH architecture, this can be achieved by adding the
verime
attribute on the bytes_from_SB
bus in the the
source file
MSKaes_32bits_core.v
(as shown next)
...
(* verime = "B_fromSB" *)
wire [8*d-1:0] bytes_from_SB [3:0];
...
The value of the wire bytes_from_SB
will then be accessible through
the label B_fromSB
. Multiple internal values can be annotated with the
verilator_me
attribute, but the labels used for each signals have to be
different. In addition to wires, ports, registers and/or array of wire and registers
can be annotated as well (please refer to the Verime documentation for more
details).
Implementation of the C++ simulation wrapper
The next step is to implement the top-level interface of the simulated HW
module. The goal of the later is to define how the HW module is used during a single execution.
In particular, the user has to implement the function run_simu
with the following definition
void run_simu(
SimModel *sm,
Prober *p,
char* data,
size_t data_size
)
where the structures SimModel
and Prober
are specific to Verime (accessible
by using the statement #include "verime_lib.h"
), data
is the input data for
a single execution (encoded as an array of bytes) and data_size
the amount of
bytes provided. As explained in details in the Verime documentation, the
Verilated instance of the HW module can be accessed under the variable
sm->vtop
, which allows the access/set the value of any signal at the
top-level. In addition to the features enabled by Verilator, Verime implements
the two following additional functions under verime_lib.h
sim_clock_cycle(SimModel * sm)
: simulates a posedge clock cycle.save_state(Prober * p)
: saves the values of the probed signals (i.e., the one that are annoted withverilator_me
).
The file
simu_aeshpc_32bit.cpp
implements a simple wrapper that stores the values of the probed signals at
every clock cycle once an execution started. Next, we detail each part of the
file. First, the verime library is included and the value of the generic d
that is considered is fetch
#include "verime_lib.h"
#ifndef D
#define D GENERIC_D
#endif
...
It has to be noted that the value of every generic that will be used during the
Verime process can be accessed in the C++ wrapper by refering to the macro
GENERIC_$(capital_generic_name)
. Then, we the function run_simu
is implemented.
We start the later by applying a reset of the core as follows
...
// Reset the AES core
sm->vtop->rst = 1;
sim_clock_cycle(sm);
sm->vtop->rst = 0;
sim_clock_cycle(sm);
...
These four lines simply sets the core's reset signal during a single clock cycle and then clears it during following clock cycle. Then, the reseed procedure of the core is executed by performing an input transaction at its randomness interface. In practice the following lines are used
...
// Feed the seed
memcpy(&sm->vtop->in_seed,data,SEED_PRNG_BYTES);
sm->vtop->in_seed_valid = 1;
sm->vtop->eval();
while(sm->vtop->in_seed_ready!=1){
sim_clock_cycle(sm);
}
sim_clock_cycle(sm);
sm->vtop->in_seed_valid = 0;
...
and the later naively implements the transaction. More into the details, the
seed is copied from the data buffer to the dedicated randomness bus. Then, the
control signal in_seed_valid
is asserted and several clock cycles are
simulated until the signal in_seed_ready
is also asserted.
An additional clock cycle is simulated ath the end of the while loop to complete the transaction.
Finally, in_seed_valid
is deasserted. The call to eval()
is
used to recompute the internal values resulting from combinatorial logic.
The next step consists in starting the execution using the provided plaintexts and key, which is achieved by the following piece of code
...
// Prepare the run with input data
// Assign the plaintext sharing
memcpy(&sm->vtop->in_shares_plaintext,data+SEED_PRNG_BYTES,16*D);
// Assign the key sharing
memcpy(&sm->vtop->in_shares_key,data+SEED_PRNG_BYTES+16*D,16*D);
// Start the run
sm->vtop->in_valid = 1;
sm->vtop->eval();
while(sm->vtop->in_ready!=1){
sim_clock_cycle(sm);
}
sim_clock_cycle(sm);
sm->vtop->in_valid = 0;
sm->vtop->eval();
...
First, the plaintext and the key sharing are copied from the buffer to the
input busses. Then, a transaction on the input interface is implemented to feed
the core with fresh inputs. Finally, we wait until the completion of the
execution by simulating a clock cycle at each loop iteration until the signal
out_valid
is asserted. While waiting, the probed signals are saved at every
clock cycle by calling save_state(p)
as shown here
...
// Run until the end of the computation
while(sm->vtop->out_valid!=1){
save_state(p);
sim_clock_cycle(sm);
}
save_state(p);
...
Building of the python3 simulation package
The simulation package can be built providing an annotated Verilog code and the corresponding simulation wrapper. The building process is done in two simple steps:
- Generating the package files using Verime.
- Building the python package using the Makefile generated by Verime.
The Makefile combines both steps in the target verime
and it suffices
to use the later to create the python wheel. Basically, the first step consists in using
Verime with the appropriate arguments in order to setup the package. The tool
will analyze the hardware architecture, identify the annoted signals and create
C++ files in order to probe these signals together with Verilator. Besides, it will generate all the python
environment used in the wheel building process. As shown by its helper, Verime
accepts the following parameters:
-h, --help show this help message and exit
-y YDIR [YDIR ...], --ydir YDIR [YDIR ...]
Directory for the module search. (default: [])
-g GENERICS [GENERICS ...], --generics GENERICS [GENERICS ...]
Verilog generic value, as -g<Id>=<Value>. (default: None)
-t TOP, --top TOP Path to the top module file, e.g. /home/user/top.v. (default: None)
--yosys-exec YOSYS_EXEC
Yosys executable. (default: yosys)
--pack PACK The Verilator-me package name. (default: None)
--simu SIMU Path to the C++ file defining run_simu (default: None)
--build-dir BUILD_DIR
The build directory. (default: .)
--clock CLOCK The clock signal to use. (default: clk)
In practice, the Makefile calls Verime with the following arguments under the target verime
:
--ydir ./aes_enc128_32bits_hpc2 ./aes_enc128_32bits_hpc2/masked_gadgets ./aes_enc128_32bits_hpc2/rnd_gen ./aes_enc128_32bits_hpc2/sbox
: used to point to the directories in which the SMAesH source files are located.-g d=2
: set the value of the genericd
at the top-level of SMAesH--top ./aes_enc128_32bits_hpc2/aes_enc128_32bits_hpc2.v
: specifies the top module path.--pack aeshpc_32bit_d2_lib
: defines the package name.--build-dir aeshpc_new_32bit_d2_lib
: used to indicates the directory used for the building process (in practice, a directory with the package name in the current directory).--simu simu_aeshpc_32bit.cpp
: indicates the path to the simu_aeshpc_32bit.cpp file.
After the Verime execution, the directory defined with --build-dir
contains an
automatically generated Makefile. The latter first uses Verilator in order to
build a shared library. The later will then be used as an efficient backend
simulator. Finally, the python package is generated and the wheel
aeshpc_32bit_d2_lib/aeshpc_32bit_d2_lib-*.whl
is created.
The following section explain how the provided example integrates the later.
Basic usage of the simulation package.
Once installed, the generated simulation package can be used to easily probe the annotated signal. It is considered next that the wheel generated in the previous step has been installed in the python environment. The following piece of code shows how to use the generated package
import aeshpc_32bit_d2_lib as pred
import numpy as np
### Generate random input data byte.
# Amount of cases to simulate
n_cases = 100
# Amount of input byte for a single case
len_data = 10 + pred.GENERICS['d']*16 + pred.GENERICS['d']*16
# Random input byte
data_bytes = np.random.randint([n_cases, len_data],dtype=np.uint8)
### Simulate the cases
# Amount of probed state to allocate
# (>= number of calls to save_state() in the C++ wrapper)
am_probed_state = 110
simulations_results = pred.Simul(
cases,
am_probed_state
)
### Recover the data for a specific cycle
### Note that `bytes_from_SB` being a 2D wires, the index `__i` is added
### to the verime signal name. Please check the value of
### pred.SIGNALS to get the names of all verime labels.
# Value of the state recover for all simulated cases
sh_byte0_fromSB_clk7 = simulations_results["B_fromSB__0"][:,7,:]
sh_byte1_fromSB_clk8 = simulations_results["B_fromSB__1"][:,8,:]
sh_byte2_fromSB_clk9 = simulations_results["B_fromSB__2"][:,9,:]
sh_byte3_fromSB_clk10 = simulations_results["B_fromSB__3"][:,10,:]
The first lines are generating the numpy 2D-array data_bytes
with random
bytes. Each row of this array contains the input bytes of a single simulation
case. In practice, each of these rows corresponds to an array char * data
that will be used by the function run_simu()
in the simulation wrapper. In
this example, 100 independant random cases are generated, and each row contains
the bytes representing the \( 80 \)-bits seed, the \( 128 d \)-bits
plaintext and key of a single case. Note that the practical amount of shares \( d
\) is fetch from the value that has been passed to Verime during the building
process by accessing to the GENERICS
metadata of the package.
Next, we use the package to simulate all the input cases. To this end, the package function Simul()
takes two input parameters:
the cases input data (as a numpy array of bytes with a shape of (n_cases, len_data)) and the amount of probed states to allocate.
More into the detail, the backend will allocate memory in order to store a given amount of times each annotated signal per case simulation.
Each time the function save_state()
is called, the value of the annoted signals are stored to the buffer. In our present example, the saving
is done at every clock cycle, and a total of 106 saves is done for a single execution.
The results of the simulation for each cases are stored in the variable
simulations_results
. In particular, the simulated values for a given signal
can be accessed directly using verime label corresponding to the signal.
The simulation results are organised as a 3D bytes arrays
of dimension (n_cases
, am_probed_state
, bytes_len_sig
), with
n_cases
: the amount of simulated cases.am_probed_state
: used for the memory allocation of the simulation. Correspond to the maximum amount of timesave_state()
can be called in the simulation wrapper. In particular, using the index i at the second dimension allows to recover the value of the i-th call tosave_state()
perfomed in the simulation wrapper.bytes_len_sig
: the amount of bytes required to encode the simulated signal.
It results that the variables sh_byte0_fromSB_clk7
, sh_byte1_fromSB_clk8
,
sh_byte2_fromSB_clk9
and sh_byte3_fromSB_clk10
hold 4 out of the 16 targeted values
(i.e., the values of the wires bytes_from_SB[0]
,bytes_from_SB[1]
,bytes_from_SB[2]
,bytes_from_SB[3]
respectively for the clock indexes 7, 8, 9 and 10)
when the input vectors stored in data_bytes
are used at the input of the core.
Integration in the example submission package
To ease the readibility of the model/attack scripts provided, the file
tap_config.py
defines the TapSignal
class. The latter allows to define a specific signal of
interest and provides useful features to deal with the simulated values. In
particular, each instance implements the simulation of the configured signal.
Besides, when the target signal configured holds a shared data, the user can
select to recover a specific sharing or the unmasked value hold by the wire.
The following parameters must be provided to each TapSignal instance
Instance parameter | Type | Description |
---|---|---|
sig_name | str | Verime label of the annotated signal |
cycle | int | Clock index of interest (considering that the values of the annotated signals are saved at each clock cycle). |
share_index | obj | Share index to tap. The user has the choice between
|
tap_bits | list of integers or range | The bits indexes of interest. The behaviour depends on the value of share_index
|
am_shares | int | Amount of shares used the encode the shared value. |
In the demo submission, a TapSignal
instance is generated for each shares of each bytes
of the state after the first SubByte layer (as done per the
function generate_TC_from_SB()
in tap_config.py). The tap signal are
then used in the profilling phase in order to recover the traces label when
building the templates. As a final remark, the TapConfig
us just a wrapper
designed in order to ease the management of multiple TapSignal
instances.