Puzzle link: Play on SudokuPad.
Background: Artificial intelligence (AI), often based on an artificial neural network, is omnipresent these days. The mathematical rules in this Sudoku capture how neural networks operate, including the ReLU activation function.
Rules: Normal Sudoku rules apply.
The Sudoku grid emulates the function of a three-layer neural network. Each 3x3 box represents one "neuron". Let i denote the neuron layer, with i = 1 in Sudoku columns 1-3 ("input layer"), i = 2 in columns 4-6 ("hidden layer"), and i = 3 in columns 7-9 ("output layer"). The network contains three "lines" of neurons. Let j denote the line, with j = 1 in Sudoku rows 1-3, j = 2 in rows 4-6, and j = 3 in rows 7-9.
Each neuron contains three "weights" w_ijk (marked with cages), with indices i and j as defined above and a third index k, where k = 1 for the top row, k = 2 for the middle row, and k = 3 for the bottom row in each box.
Cells directly to the left of each weight w_ijk (marked with squares), with values s_ijk, contain information on the sign of the adjacent weight as defined below.
Let x_ij denote "input" values to neurons.
The "signal" y_ij of the neuron in layer i and line j is calculated via y_ij = Sum_k {sign(s_ijk) * w_ijk * x_ik}, where "Sum_k" indicates a sum over three terms (k = 1,2,3) of the triple product in the curly brackets, and where we define (unconventionally) that sign(s) = +1 if s is even and sign(s) = -1 if s is odd.
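The signal formula above can be sketched in a few lines of Python. This is only an illustration: the function names `sign` and `signal` and the digits in the example are chosen here and are not taken from the puzzle or its solution.

```python
def sign(s: int) -> int:
    """Puzzle-specific sign: +1 if s is even, -1 if s is odd."""
    return 1 if s % 2 == 0 else -1


def signal(s, w, x):
    """y_ij = Sum_k sign(s_ijk) * w_ijk * x_ik for a single neuron.

    s: the three digits in the squares (k = 1, 2, 3)
    w: the three weights in the cages (k = 1, 2, 3)
    x: the three inputs to this layer (lines 1, 2, 3)
    """
    return sum(sign(sk) * wk * xk for sk, wk, xk in zip(s, w, x))


# Made-up digits: the signs are (+, -, +), so y = 3*1 - 8*2 + 1*4 = -9.
y = signal(s=(2, 5, 4), w=(3, 8, 1), x=(1, 2, 4))
print(y, abs(y))  # -9 9
```

In this made-up case the circle belonging to the neuron would hold 9, the absolute magnitude of the signal.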
For the input layer and the hidden layer, each signal is passed through a "Rectified Linear Unit (ReLU)" that acts as an "activation function" of the neuron, where ReLU(y) = y for non-negative y and ReLU(y) = 0 for negative y. The results provide the inputs for the subsequent layers via x_2j = ReLU(y_1j) and x_3j = ReLU(y_2j).
The circles in column 1 contain positive inputs x_1j and the circles in columns 4 and 7 contain the absolute magnitudes of the signals y_1j and y_2j, respectively, before applying the activation function, with j as defined above.
The three signals of the output layer, y_3j, are single-digit numbers and sum to zero.
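Taken together, the rules so far describe a forward pass through the network. The sketch below is one possible way to write it down, not the author's method. The names `forward` and `satisfies_rules` and the nested-list layout `s[i][j][k]`, `w[i][j][k]` (zero-based) are assumptions made here, and the numbers are made up and ignore the Sudoku constraints. The check that circled magnitudes lie between 1 and 9 is an inference from the fact that circles hold Sudoku digits.

```python
def sign(s: int) -> int:
    """Puzzle-specific sign: +1 if s is even, -1 if s is odd."""
    return 1 if s % 2 == 0 else -1


def relu(y: int) -> int:
    """ReLU(y) = y for non-negative y, 0 for negative y."""
    return max(y, 0)


def forward(x1, s, w):
    """Return [y_1, y_2, y_3], the three signals of each layer."""
    x = list(x1)
    signals = []
    for i in range(3):
        y = [sum(sign(s[i][j][k]) * w[i][j][k] * x[k] for k in range(3))
             for j in range(3)]
        signals.append(y)
        x = [relu(v) for v in y]  # inputs for the next layer
    return signals


def satisfies_rules(signals):
    """Circle magnitudes are digits; outputs are single-digit and sum to zero."""
    y1, y2, y3 = signals
    circles_ok = all(1 <= abs(v) <= 9 for v in y1 + y2)
    outputs_ok = all(abs(v) <= 9 for v in y3) and sum(y3) == 0
    return circles_ok and outputs_ok


# Made-up example values (hypothetical, not a valid Sudoku fill).
x1 = [1, 2, 3]
s = [[(2, 4, 1), (1, 2, 3), (2, 1, 2)],
     [(2, 2, 1), (1, 1, 2), (2, 3, 4)],
     [(2, 2, 1), (1, 2, 1), (3, 1, 2)]]
w = [[(1, 2, 1), (3, 1, 1), (2, 1, 1)],
     [(1, 5, 1), (1, 7, 1), (1, 9, 1)],
     [(4, 3, 1), (6, 2, 1), (8, 5, 2)]]

signals = forward(x1, s, w)
print(signals)                   # [[2, -4, 3], [-1, 1, 5], [-2, -3, 5]]
print(satisfies_rules(signals))  # True
```

Note how the negative signal -4 in the first layer shows up as 4 in its circle but feeds 0 into the second layer.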
Definition of Variables: The following image shows where the indexed variables occur in the grid.
Network structure: The following image illustrates which cells are mathematically connected. For example, input x_11 in the circle in row 2 column 1 is a multiplication partner of all those weights in column 3 connected by a red line. Vice versa, within neuron 1 in box 1, all three inputs connected with red, green, and blue lines are multiplied with the corresponding weights in the cages in column 3.
The magnitudes of the digits in column 2 themselves are not used; only their parity matters. Each of the three products enters the sum with a plus sign if the digit in the square is even and with a minus sign if it is odd. The signal is the sum of the three signed products, and its absolute magnitude is written into the circle pointed at by the small corner arrows in the cages. This works analogously for all other neurons/boxes.
The activation function is not denoted graphically. During the solve, keep in mind that only circled cells whose signal is positive pass their value on as input along the red, green, and blue lines into the next layer; a circle whose signal is negative contributes an input of 0.
Example (4x4): The following image provides a fully solved example on a 4x4 grid. It is analogous to the full-sized puzzle, just without the activation function and with a modified final condition in the rules. If you want to get accustomed to the puzzle mechanics, you can solve the 4x4 example for yourself here on SudokuPad.
Rules (4x4): Normal 4×4 Sudoku rules apply. The Sudoku grid emulates the function of neurons from an artificial neural network. Neuron 1 is contained in rows 1-2, and neuron 2 in rows 3-4. Each neuron contains two "weights" (marked with cages). Square-marked cells directly to the left of each weight with value s contain information on the sign of the adjacent weight, unconventionally defined as sign(s) = +1 if s is even and sign(s) = -1 if s is odd. The circles in column 1 contain input values, and the circles in column 4 the absolute magnitudes of output values. The outputs are calculated as sums over the products of inputs and signed weights as follows: R1C4 = R1C1 * sign(R1C2) * R1C3 + R3C1 * sign(R2C2) * R2C3 and R3C4 = R1C1 * sign(R3C2) * R3C3 + R3C1 * sign(R4C2) * R4C3. Here R1C4 and R3C4 on the left-hand sides stand for the signed outputs; the circles themselves show only their absolute magnitudes. The outputs are either both positive or both negative.
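The 4x4 rules can be checked mechanically. The sketch below is a hedged illustration: the function name `check_4x4` is an assumption made here, and the grid was constructed for this example to satisfy the stated rules. It is not necessarily the solution shown in the author's image.

```python
def sign(s: int) -> int:
    """Puzzle-specific sign: +1 if s is even, -1 if s is odd."""
    return 1 if s % 2 == 0 else -1


def check_4x4(g):
    """g[r][c] is the digit in row r+1, column c+1 of the 4x4 grid.

    Returns the two signed outputs and whether the circle and
    same-sign conditions hold (Sudoku validity is not checked here).
    """
    out1 = g[0][0] * sign(g[0][1]) * g[0][2] + g[2][0] * sign(g[1][1]) * g[1][2]
    out2 = g[0][0] * sign(g[2][1]) * g[2][2] + g[2][0] * sign(g[3][1]) * g[3][2]
    magnitudes_ok = abs(out1) == g[0][3] and abs(out2) == g[2][3]
    same_sign = out1 * out2 > 0
    return out1, out2, magnitudes_ok and same_sign


# Illustrative grid (hypothetical): a valid 4x4 Sudoku fill.
grid = [[1, 3, 4, 2],
        [4, 2, 1, 3],
        [2, 4, 3, 1],
        [3, 1, 2, 4]]

print(check_4x4(grid))  # (-2, -1, True)
```

Here the first output is 1 * (-1) * 4 + 2 * (+1) * 1 = -2 and the second is 1 * (+1) * 3 + 2 * (-1) * 2 = -1, so the circles in column 4 show 2 and 1 and both outputs are negative.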
Solution code: All digits of row 4 (from left to right) followed by column 9 (from top to bottom) without spaces.
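As a small illustration of the solution-code format, the sketch below builds the 18-digit string from a filled grid. The function name `solution_code` is an assumption made here, and the grid is a generic valid Sudoku pattern used as a placeholder, not the solution of this puzzle.

```python
def solution_code(grid):
    """Row 4 (left to right) followed by column 9 (top to bottom), no spaces."""
    row4 = grid[3]
    col9 = [row[8] for row in grid]
    return "".join(str(d) for d in row4 + col9)


# Placeholder grid: a standard shifted pattern that forms a valid Sudoku.
grid = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
        for r in range(9)]

print(solution_code(grid))  # 234567891936147258
```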
on 21. August 2024, 00:05 by Flash Groudon
Brilliant theme and execution!
The ruleset may initially appear daunting but it becomes relatively intuitive once you work through the reference example. It also leads to some really fun (and surprising) deductions as you work through the puzzle.
—————
Thank you so much for playing and your kind words! - TB