What technology area does this patent fall under?

Primary CPC classification G06F17/16. Mapped technology areas include Physics.

When was this patent published?

Publication date Tue Dec 16 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Parallel multiplier for saber algorithm

US12500762B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12500762-B2
Application number	US-202318152773-A
Country	US
Kind code	B2
Filing date	Jan 11, 2023
Priority date	Jul 6, 2022
Publication date	Dec 16, 2025
Grant date	Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A parallel multiplier for the Saber algorithm comprises a coefficient memory, two parallel pre-adding circuits, three parallel multiplication circuits and a post-adding circuit. The coefficient memory, the two parallel pre-adding circuits, the three parallel multiplication circuits and the post-adding circuit adopt a divide-and-conquer strategy, the two parallel pre-adding circuits perform parallel computation, and the three parallel multiplication circuits perform parallel computation, such that the computation time of modulo multiplication is shorted; the modulo operation of non-prime numbers is realized by limiting the bit width, such that the constraint that the modulus is a prime number is avoided; and the Karatsuba algorithm is called once, such that extra circuit area expenditure is reduced. Thus, the parallel multiplier for the Saber algorithm is implemented by hardware, low in computation complexity, not limited by the constraint that the modulus is a prime number, and low in circuit area expenditure.

First claim

Opening claim text (preview).

What is claimed is: 1 . A parallel multiplier of a first computer terminal for establishing a cryptographic communication between the first computer terminal and a second computer terminal, characterized in that comprises: a coefficient memory; two parallel pre-adding circuits; three parallel multiplication circuits; and a post-adding circuit, wherein the coefficient memory has a coefficient input terminal, a clock input terminal and a coefficient output terminal, the coefficient input terminal of the coefficient memory, as a data input terminal of the parallel multiplier, is used for inputting coefficient data for modulo multiplication of two polynomials, the clock input terminal of the coefficient memory, as a clock input terminal of the parallel multiplier, is used for inputting a clock signal CLK, the two parallel pre-adding circuits are referred to as a first parallel pre-adding circuit and a second parallel pre-adding circuit respectively, the first parallel pre-adding circuit has two input ports and a data output port, the second parallel pre-adding circuit has two input ports and a data output port, the three parallel multiplication circuits are referred to as a first parallel multiplication circuit, a second parallel multiplication circuit and a third parallel multiplication circuit respectively, the first parallel multiplication circuit and the second parallel multiplication circuit each have an input port and an output port, the third parallel multiplication circuit has two input ports and an output port, the post-adding circuit has three input ports and an output port, the two input ports of the first parallel pre-adding circuit and the two input ports of the second parallel pre-adding circuit are connected to the coefficient output terminal of the coefficient memory, the input port of the first parallel multiplication circuit and the input port of the second parallel multiplication circuit are connected to an output port of the coefficient memory, the two input ports of the third parallel multiplication circuit are connected to the output port of the first parallel pre-adding circuit and the output port of the second parallel pre-adding circuit in a one-to-one correspondence manner, the output port of the first parallel multiplication circuit, the output port of the second parallel multiplication circuit and the output port of the third parallel multiplication circuit are connected to the three input ports of the post-adding circuit in a one-to-one correspondence manner, and the output port of the post-adding circuit, as an output terminal of the parallel multiplier, is used for outputting a final result OUT to transmit to the second computer terminal over a network as output data for the cryptographic communication, wherein in response to two polynomials are received at a input terminal of the parallel multiplier of the first computer terminal as input data for the cryptographic communication, the parallel multiplier multiplies coefficients of the polynomials specifically through the following steps: S1: loading the two polynomials into the coefficient memory, and denoting the two polynomials as a polynomial S and a polynomial A respectively, wherein the polynomial S comprises 256 coefficients, and a coefficient of an f th term (the f th coefficient) of the polynomial S is denoted as s f−1 , f=1, 2, . . . , 256, s f−1 is an integer, s f−1 ∈[−4, 4], a vector formed by the 256 coefficients of the polynomial S is (s 0 , s 1 . . . , s 255 ), a vector (s 128 , s 129 , . . . , s 255 ) formed by the first 128 coefficients of the polynomial S is denoted as S H , a vector (s 0 , s 1 , . . . , s 127 ) formed by the last 128 coefficients of the polynomial S is denoted as S L , an n th data in S L is denoted as S Ln , S Ln =S n−1 , n=1, 2, . . . , 128, an n th data in S H is denoted as S Hn , and S Hn =s n+127 , the polynomial A comprises 256 coefficients, each of the 256 coefficients has a bit width of 16 bits, 13 bits or 10 bits of the 16 bits are significant bits, other 3 bits or 6 bits of the 16 bits are used for data completion and coefficient alignment, the bit width of the data is set to 16 bits to ensure that 64 bits data of the polynomial A includes four consecutive coefficients, the coefficient of an f th term (the f th coefficient) of the polynomial A is denoted as a f−1 , a f−1 is an integer, a f−1 ∈[0, 8191], a vector formed by the 256 coefficients of the polynomial A is (a 0 , a 1 , . . . , a 255 ), a vector (a 128 , a 129 , . . . , a 255 ) formed by the first 128 coefficients of the polynomial A is denoted as A H , a vector (a 0 , a 1 , . . . , a 127 ) formed by the last 128 coefficients of the polynomial A is denoted as A L , an m th data in A L is denoted as A Lm , A Lm =a m−1 , m=1, 2, . . . , 128, and an m th data in A H is denoted as A Hm , and A Hm =a m+127 ; S2: through the output terminal of the coefficient memory, according to a preset time sequence under the control of the clock signal CLK, outputting A H and A L to the first parallel pre-adding circuit, outputting S H and S L to the second parallel pre-adding circuit, outputting A H and S H to the first parallel multiplication circuit, and outputting A L and S L to the second parallel multiplication circuit, processing A H and A L by the first parallel pre-adding circuit according to a formula (1) to obtain a result R A , which is output to the third parallel multiplication circuit through the output port of the first parallel pre-adding circuit: ra m−1 =( A Hm +A Lm )mod 8192 (1) wherein mod is a modulo operator, mod 8192 represents an 8192 modulo operation performed on (A Hm +A Lm ), ra m−1 is an m th data in R A , and R A includes 128 data (ra 0 , ra 1 , . . . , ra 127 ), processing S H and St by the second parallel pre-adding circuit according to a formula (2) to obtain a result R S , which is output to the third parallel multiplication circuit through the output port of the second parallel pre-adding circuit: rs n−1 =( S Hn +S Ln ) (2) wherein rs n−1 is an n th data in R S , and R S includes 128 data (rs 0 , rs 1 , . . . , rs 127 ); S3: processing A H and S H by the first parallel multiplication circuit through the following steps to obtain an output result P 0 , which is output to the post-adding circuit, wherein the step S3 comprises: S3.1: setting a round variable k and an intermediate vector T including 255 data, wherein T=(t 1_0 , t 1_1 , . . . , t 1_254 ), t 1_j is a (j+1) th data in T, and j=0, 1, 2, . . . , 254, and k and T are initialized to k=1 and t 1_j =0; S3.2: performing a k th round of shift accumulation, which specifically comprises: S3.2.1: setting an intermediate vector R k , and calculating each data in the intermediate vector R k according to a formula (3): r k_n−1 =A Hk ×S Hn (3) wherein r k_n−1 is an n th data in R k , and R k includes 128 data (r k_0 , r k_1 , . . . , r k_127 ); S3.2.2: setting an intermediate P 0 k including 255 data, P 0 k =(p k_0 , p k_1 , . . . , p k_254 ), where p k_j is a (j+1) th data in P 0 k , when k=1, p k_d =t 1_d +r k_d , p k_b =t 1_b , where d=0, 1, 2, . . . , 127, b=128, 129, . . . , 254, and the values of t 1_d and t 1_b are current latest values, when 2≤k<128, p k_0 =t 1_0 , . . . , P k_k−2 =t 1_k−2 , P k_k−1 =t 1_k−1 +r k_0 , p k_k =t 1_k +r k_1 , . . . , p k_k+126 +t 1_k+126 +r k_127 , p k_k+127 t 1_k+127 , . . . p k_254 =t 1_254 , where the value of t 1_j is a current latest value, when k=128, p k_e =t 1_e +r k_e , p k_g =t 1_g , where e=0, 1, 2, . . . , 126, g=127, 128, . . . , 254, and the values of t 1_e and t 1_g are current latest values; and S3.2.3: updating each data in the intermediate vector T, t 1_j =p k_j ; S3.3: determining whether the value of k is equal to 128,

Assignees

Univ Wenzhou

Inventors

Classifications

H04L9/3026
details relating to polynomials generation, e.g. generation of irreducible polynomials · CPC title
G06F17/16Primary
Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title
G06F7/72
using residue arithmetic · CPC title
H04L9/3093
involving Lattices or polynomial equations, e.g. NTRU scheme · CPC title
H04L2209/125
Parallelization or pipelining, e.g. for accelerating processing of cryptographic operations · CPC title

Patent family

Related publications grouped by family.

View patent family 86420680

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12500762B2 cover?: A parallel multiplier for the Saber algorithm comprises a coefficient memory, two parallel pre-adding circuits, three parallel multiplication circuits and a post-adding circuit. The coefficient memory, the two parallel pre-adding circuits, the three parallel multiplication circuits and the post-adding circuit adopt a divide-and-conquer strategy, the two parallel pre-adding circuits perform para…
Who is the assignee on this patent?: Univ Wenzhou
What technology area does this patent fall under?: Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?: Publication date Tue Dec 16 2025 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

How to read this patent

Abstract

First claim

Assignees

Inventors

Classifications

Patent family

External sources

Related patents

Cryptographic processor for fully homomorphic encryption (FHE) applications

Method for multiplying polynomials for a cryptographic operation

Configurable lattice cryptography processor for the quantum-secure internet of things and related techniques

Key agreement devices and method

Executing a cryptographic operation

Frequently asked questions