SM4 acceleration processors, methods, systems, and instructions

US10419210B2 · US · B2

Patent metadata
Publication number: US-10419210-B2
Application number: US-201816025706-A
Country: US
Kind code: B2
Filing date: Jul 2, 2018
Priority date: Jul 22, 2014
Publication date: Sep 17, 2019
Grant date: Sep 17, 2019

Abstract

Official abstract text for this publication.

A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SM4 cryptographic rounds, and four 32-bit values. The processor also includes an execution unit coupled with the decode unit and the plurality of the packed data registers. The execution unit, in response to the instruction, is to store four 32-bit results of four immediately subsequent and sequential SM4 cryptographic rounds in a destination storage location that is to be indicated by the instruction.
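The data flow described in the abstract can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the patented circuit. The function names (`rotl32`, `linear_sub`, `sub_bytes`, `sm4_four_rounds`) are hypothetical. The four "32-bit values" are assumed to be round keys, as the claims state, and the rotation amounts are the ones recited in claim 1. The SM4 S-box table is passed in by the caller and is not reproduced here.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def linear_sub(b):
    """Linear substitution recited in claim 1:
    b ^ (b <<< 2) ^ (b <<< 10) ^ (b <<< 18) ^ (b <<< 24)."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)

def sub_bytes(word, sbox):
    """Apply a byte-wise substitution box to each of the four bytes of a word.
    `sbox` is a caller-supplied 256-entry table (assumption: the SM4 S-box)."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(word >> shift) & 0xFF] << shift
    return out

def sm4_four_rounds(state, round_keys, sbox):
    """Model one instruction: consume four prior round results and a group of
    round keys, return the four immediately following round results."""
    x0, x1, x2, x3 = state
    for rk in round_keys:
        # Each round mixes the three newest words with the round key,
        # substitutes, diffuses, and XORs into the oldest word.
        t = x1 ^ x2 ^ x3 ^ rk
        new = x0 ^ linear_sub(sub_bytes(t, sbox))
        x0, x1, x2, x3 = x1, x2, x3, new
    return (x0, x1, x2, x3)
```

Because SM4 uses 32 rounds, a full block encryption in this model would take eight such four-round steps, each fed with the next four round keys.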

First claim

Opening claim text (preview).

What is claimed is:

1. A processor comprising: a data cache; a data translation lookaside buffer (TLB) coupled to the data cache; a branch prediction unit; an instruction cache; an instruction TLB coupled to the instruction cache; an instruction fetch unit to fetch instructions, including an instruction; a level 2 (L2) cache coupled to the data cache, and coupled to the instruction cache; a plurality of registers to store single instruction, multiple data (SIMD) data, including a first register, and a second register, the first register to store a first source data that includes four source data elements to be encrypted with an SM4 cryptographic algorithm, the second register to store a second source data that includes four round keys, wherein the plurality of registers are dynamically allocated using register renaming; a decoder to decode the instruction, the instruction having a first field to specify the first register, and a second field to specify the second register; and an execution unit coupled to the decoder, and coupled to the plurality of registers, the execution unit including at least some circuitry, and, in response to the instruction, to generate and store a result in the first register, the result to include four result data elements that include the first source data encrypted by four corresponding encryption rounds of the SM4 cryptographic algorithm, wherein the execution unit is to generate each of the four result data elements to be consistent with an evaluation of a linear substitution function with a value for the corresponding encryption round, which is equal to the value logically XOR'd with the value rotated left by two bits logically XOR'd with the value rotated left by ten bits logically XOR'd with the value rotated left by eighteen bits logically XOR'd with the value rotated left by twenty-four bits.

2. The processor of claim 1, wherein the execution unit, in response to the decode of the instruction, is to generate each of the four result data elements by performing a mixer substitution for the corresponding encryption round, the mixer substitution including a linear substitution on a result of a non-linear substitution.

3. The processor of claim 1, wherein the second source data is 128-bits and is to have: a first round key for an encryption round i in bits [31:0]; a second round key for an encryption round i+1 in bits [63:32]; a third round key for an encryption round i+2 in bits [95:64]; and a fourth round key for an encryption round i+3 in bits [127:96].

4. The processor of claim 3, wherein the result is 128-bits and is to include: a first result data element for an encryption round i+4 in bits [31:0]; a second result data element for an encryption round i+5 in bits [63:32]; a third result data element for an encryption round i+6 in bits [95:64]; and a fourth result data element for an encryption round i+7 in bits [127:96].

5. The processor of claim 1, wherein the first source data is to include a first data element in bits [31:0], a second data element in bits [63:32], a third data element in bits [95:64], and a fourth data element in bits [127:96], wherein the second source data is to include a first round key in bits [31:0], a second round key in bits [63:32], a third round key in bits [95:64], and a fourth round key in bits [127:96], and wherein the result is to include a first result data element in bits [31:0] that is equal to the first data element logically exclusive OR'd (XOR'd) with a first output of a function evaluated with a first input, the first input equal to the second data element logically XOR'd with the third data element logically XOR'd with the fourth data element logically XOR'd with the first round key, the first output equal to a first value, which is equal to a substitution box applied to the first input, logically XOR'd with the first value rotated left by two bits logically XOR'd with the first value rotated left by ten bits logically XOR'd with the first value rotated left by eighteen bits logically XOR'd with the first value rotated left by twenty-four bits.

6. The processor of claim 5, wherein the result is further to include: a second result data element in bits [63:32] that is equal to the second data element logically XOR'd with a second output of the function evaluated with a second input, the second input equal to the third data element logically XOR'd with the fourth data element logically XOR'd with the first result data element logically XOR'd with the second round key, the second output equal to a second value of the substitution box applied to the second input logically XOR'd with the second value rotated left by two bits logically XOR'd with the second value rotated left by ten bits logically XOR'd with the second value rotated left by eighteen bits logically XOR'd with the second value rotated left by twenty-four bits; a third result data element in bits [95:64] that is equal to the third data element logically XOR'd with a third output of the function evaluated with a third input, the third input equal to the fourth data element logically XOR'd with the first result data element logically XOR'd with the second result data element logically XOR'd with the third round key, the third output equal to a third value of the substitution box applied to the third input logically XOR'd with the third value rotated left by two bits logically XOR'd with the third value rotated left by ten bits logically XOR'd with the third value rotated left by eighteen bits logically XOR'd with the third value rotated left by twenty-four bits; and a fourth result data element in bits [127:96] that is equal to the fourth data element logically XOR'd with a fourth output of the function evaluated with a fourth input, the fourth input equal to the first result data element logically XOR'd with the second result data element logically XOR'd with the third result data element logically XOR'd with the fourth round key, the fourth output equal to a fourth value of the substitution box applied to the fourth input logically XOR'd with the fourth value rotated left by two bits logically XOR'd with the fourth value rotated left by ten bits logically XOR'd with the fourth value rotated left by eighteen bits logically XOR'd with the fourth value rotated left by twenty-four bits.

7. The processor of claim 1, wherein the processor is a reduced instruction set computing (RISC) processor.

8. The processor of claim 1, wherein the decoder is also to decode a second instruction, the second instruction having a third field to specify a third register of the plurality of registers, a fourth field to specify a fourth register of the plurality of registers, and a fifth field to specify a destination register of the plurality of registers, the third register to store a third source data that includes four round keys corresponding to four prior key expansion rounds of the SM4 cryptographic algorithm, the fourth register to store a fourth source data that includes four key generation constants, and wherein the processor, in response to the decode of the second instruction, is to generate and store a second result in the destination register, the second result to include four round keys corresponding to four sequential key expansion rounds of the SM4 cryptographic algorithm that sequentially follow the four prior key expansion rounds.

9. A processor comprising: a plurality of registers to store single instruction, multiple data (SIMD) data, including a first register, and a second register, the first register to store a first source data that includes four source data elements to be encrypted with an SM4 cryptographic
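The second instruction in claim 8 (four prior round keys plus four key generation constants in, four following round keys out) can be modeled in the same spirit. The sketch below is hedged: the claim preview does not spell out the key expansion formula, so the code assumes the standard SM4 key schedule, in which each new key is the oldest key XOR'd with a substituted and diffused mix of the three newer keys and a constant, and the diffusion step is `b ^ (b <<< 13) ^ (b <<< 23)`. All function names are hypothetical, and the S-box table is again supplied by the caller.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32

def key_linear_sub(b):
    """Key-schedule diffusion (assumed from the SM4 standard, not the claim text)."""
    return b ^ rotl32(b, 13) ^ rotl32(b, 23)

def sub_bytes(word, sbox):
    """Apply a caller-supplied 256-entry substitution box to each byte of a word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= sbox[(word >> shift) & 0xFF] << shift
    return out

def key_constant(i):
    """Key generation constant CK[i] of the standard SM4 schedule:
    byte j of CK[i] is (4*i + j) * 7 mod 256, most significant byte first."""
    ck = 0
    for j in range(4):
        ck = (ck << 8) | (((4 * i + j) * 7) & 0xFF)
    return ck

def sm4_four_key_rounds(prior_keys, constants, sbox):
    """Model the second instruction: derive the next four round keys from the
    four prior ones and a group of key generation constants."""
    k0, k1, k2, k3 = prior_keys
    for ck in constants:
        t = k1 ^ k2 ^ k3 ^ ck
        new = k0 ^ key_linear_sub(sub_bytes(t, sbox))
        k0, k1, k2, k3 = k1, k2, k3, new
    return (k0, k1, k2, k3)
```

In this model, a caller would pass `tuple(key_constant(i + n) for n in range(4))` as the constants for the group of rounds starting at round `i`.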

Assignees

Intel Corp

Inventors

Classifications

  • H04L9/0637 (Primary)

    Modes of operation, e.g. cipher block chaining [CBC], electronic codebook [ECB] or Galois/counter mode [GCM] · CPC title

  • Details relating to cryptographic hardware or logic circuitry · CPC title

  • with splitting of the data block into left and right halves, e.g. Feistel based algorithms, DES, FEAL, IDEA or KASUMI · CPC title

  • G06F21/602 (Primary)

    Providing cryptographic facilities or services · CPC title

  • Special purpose encoding of instructions, e.g. Gray coding · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10419210B2 cover?
A processor of an aspect includes a plurality of packed data registers, and a decode unit to decode an instruction. The instruction is to indicate one or more source packed data operands. The one or more source packed data operands are to have four 32-bit results of four prior SM4 cryptographic rounds, and four 32-bit values. The processor also includes an execution unit coupled with the decode…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification H04L9/0637. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Sep 17, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).