Cryptographic processor for fully homomorphic encryption (FHE) applications
US-11818244-B2 · Nov 14, 2023 · US
US12500762B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12500762-B2 |
| Application number | US-202318152773-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2023 |
| Priority date | Jul 6, 2022 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A parallel multiplier for the Saber algorithm comprises a coefficient memory, two parallel pre-adding circuits, three parallel multiplication circuits and a post-adding circuit. The coefficient memory, the two parallel pre-adding circuits, the three parallel multiplication circuits and the post-adding circuit adopt a divide-and-conquer strategy, the two parallel pre-adding circuits perform parallel computation, and the three parallel multiplication circuits perform parallel computation, such that the computation time of modulo multiplication is shorted; the modulo operation of non-prime numbers is realized by limiting the bit width, such that the constraint that the modulus is a prime number is avoided; and the Karatsuba algorithm is called once, such that extra circuit area expenditure is reduced. Thus, the parallel multiplier for the Saber algorithm is implemented by hardware, low in computation complexity, not limited by the constraint that the modulus is a prime number, and low in circuit area expenditure.
Opening claim text (preview).
What is claimed is: 1 . A parallel multiplier of a first computer terminal for establishing a cryptographic communication between the first computer terminal and a second computer terminal, characterized in that comprises: a coefficient memory; two parallel pre-adding circuits; three parallel multiplication circuits; and a post-adding circuit, wherein the coefficient memory has a coefficient input terminal, a clock input terminal and a coefficient output terminal, the coefficient input terminal of the coefficient memory, as a data input terminal of the parallel multiplier, is used for inputting coefficient data for modulo multiplication of two polynomials, the clock input terminal of the coefficient memory, as a clock input terminal of the parallel multiplier, is used for inputting a clock signal CLK, the two parallel pre-adding circuits are referred to as a first parallel pre-adding circuit and a second parallel pre-adding circuit respectively, the first parallel pre-adding circuit has two input ports and a data output port, the second parallel pre-adding circuit has two input ports and a data output port, the three parallel multiplication circuits are referred to as a first parallel multiplication circuit, a second parallel multiplication circuit and a third parallel multiplication circuit respectively, the first parallel multiplication circuit and the second parallel multiplication circuit each have an input port and an output port, the third parallel multiplication circuit has two input ports and an output port, the post-adding circuit has three input ports and an output port, the two input ports of the first parallel pre-adding circuit and the two input ports of the second parallel pre-adding circuit are connected to the coefficient output terminal of the coefficient memory, the input port of the first parallel multiplication circuit and the input port of the second parallel multiplication circuit are connected to an output port of the coefficient memory, the two input ports of the third parallel multiplication circuit are connected to the output port of the first parallel pre-adding circuit and the output port of the second parallel pre-adding circuit in a one-to-one correspondence manner, the output port of the first parallel multiplication circuit, the output port of the second parallel multiplication circuit and the output port of the third parallel multiplication circuit are connected to the three input ports of the post-adding circuit in a one-to-one correspondence manner, and the output port of the post-adding circuit, as an output terminal of the parallel multiplier, is used for outputting a final result OUT to transmit to the second computer terminal over a network as output data for the cryptographic communication, wherein in response to two polynomials are received at a input terminal of the parallel multiplier of the first computer terminal as input data for the cryptographic communication, the parallel multiplier multiplies coefficients of the polynomials specifically through the following steps: S1: loading the two polynomials into the coefficient memory, and denoting the two polynomials as a polynomial S and a polynomial A respectively, wherein the polynomial S comprises 256 coefficients, and a coefficient of an f th term (the f th coefficient) of the polynomial S is denoted as s f−1 , f=1, 2, . . . , 256, s f−1 is an integer, s f−1 ∈[−4, 4], a vector formed by the 256 coefficients of the polynomial S is (s 0 , s 1 . . . , s 255 ), a vector (s 128 , s 129 , . . . , s 255 ) formed by the first 128 coefficients of the polynomial S is denoted as S H , a vector (s 0 , s 1 , . . . , s 127 ) formed by the last 128 coefficients of the polynomial S is denoted as S L , an n th data in S L is denoted as S Ln , S Ln =S n−1 , n=1, 2, . . . , 128, an n th data in S H is denoted as S Hn , and S Hn =s n+127 , the polynomial A comprises 256 coefficients, each of the 256 coefficients has a bit width of 16 bits, 13 bits or 10 bits of the 16 bits are significant bits, other 3 bits or 6 bits of the 16 bits are used for data completion and coefficient alignment, the bit width of the data is set to 16 bits to ensure that 64 bits data of the polynomial A includes four consecutive coefficients, the coefficient of an f th term (the f th coefficient) of the polynomial A is denoted as a f−1 , a f−1 is an integer, a f−1 ∈[0, 8191], a vector formed by the 256 coefficients of the polynomial A is (a 0 , a 1 , . . . , a 255 ), a vector (a 128 , a 129 , . . . , a 255 ) formed by the first 128 coefficients of the polynomial A is denoted as A H , a vector (a 0 , a 1 , . . . , a 127 ) formed by the last 128 coefficients of the polynomial A is denoted as A L , an m th data in A L is denoted as A Lm , A Lm =a m−1 , m=1, 2, . . . , 128, and an m th data in A H is denoted as A Hm , and A Hm =a m+127 ; S2: through the output terminal of the coefficient memory, according to a preset time sequence under the control of the clock signal CLK, outputting A H and A L to the first parallel pre-adding circuit, outputting S H and S L to the second parallel pre-adding circuit, outputting A H and S H to the first parallel multiplication circuit, and outputting A L and S L to the second parallel multiplication circuit, processing A H and A L by the first parallel pre-adding circuit according to a formula (1) to obtain a result R A , which is output to the third parallel multiplication circuit through the output port of the first parallel pre-adding circuit: ra m−1 =( A Hm +A Lm )mod 8192 (1) wherein mod is a modulo operator, mod 8192 represents an 8192 modulo operation performed on (A Hm +A Lm ), ra m−1 is an m th data in R A , and R A includes 128 data (ra 0 , ra 1 , . . . , ra 127 ), processing S H and St by the second parallel pre-adding circuit according to a formula (2) to obtain a result R S , which is output to the third parallel multiplication circuit through the output port of the second parallel pre-adding circuit: rs n−1 =( S Hn +S Ln ) (2) wherein rs n−1 is an n th data in R S , and R S includes 128 data (rs 0 , rs 1 , . . . , rs 127 ); S3: processing A H and S H by the first parallel multiplication circuit through the following steps to obtain an output result P 0 , which is output to the post-adding circuit, wherein the step S3 comprises: S3.1: setting a round variable k and an intermediate vector T including 255 data, wherein T=(t 1_0 , t 1_1 , . . . , t 1_254 ), t 1_j is a (j+1) th data in T, and j=0, 1, 2, . . . , 254, and k and T are initialized to k=1 and t 1_j =0; S3.2: performing a k th round of shift accumulation, which specifically comprises: S3.2.1: setting an intermediate vector R k , and calculating each data in the intermediate vector R k according to a formula (3): r k_n−1 =A Hk ×S Hn (3) wherein r k_n−1 is an n th data in R k , and R k includes 128 data (r k_0 , r k_1 , . . . , r k_127 ); S3.2.2: setting an intermediate P 0 k including 255 data, P 0 k =(p k_0 , p k_1 , . . . , p k_254 ), where p k_j is a (j+1) th data in P 0 k , when k=1, p k_d =t 1_d +r k_d , p k_b =t 1_b , where d=0, 1, 2, . . . , 127, b=128, 129, . . . , 254, and the values of t 1_d and t 1_b are current latest values, when 2≤k<128, p k_0 =t 1_0 , . . . , P k_k−2 =t 1_k−2 , P k_k−1 =t 1_k−1 +r k_0 , p k_k =t 1_k +r k_1 , . . . , p k_k+126 +t 1_k+126 +r k_127 , p k_k+127 t 1_k+127 , . . . p k_254 =t 1_254 , where the value of t 1_j is a current latest value, when k=128, p k_e =t 1_e +r k_e , p k_g =t 1_g , where e=0, 1, 2, . . . , 126, g=127, 128, . . . , 254, and the values of t 1_e and t 1_g are current latest values; and S3.2.3: updating each data in the intermediate vector T, t 1_j =p k_j ; S3.3: determining whether the value of k is equal to 128,
details relating to polynomials generation, e.g. generation of irreducible polynomials · CPC title
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
using residue arithmetic · CPC title
involving Lattices or polynomial equations, e.g. NTRU scheme · CPC title
Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.