Enhanced low precision binary floating-point formatting

US11775257B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11775257-B2
Application number	US-202016840847-A
Country	US
Kind code	B2
Filing date	Apr 6, 2020
Priority date	Jun 5, 2018
Publication date	Oct 3, 2023
Grant date	Oct 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms and low error rates for computing-intensive applications. The enhanced format can employ a specified definition for the lowest binade that enables the lowest binade to be used for zero and normal numbers; and a specified definition for the highest binade that enables it to be structured to have one data point used for a merged Not-a-Number (NaN)/infinity symbol and remaining data points used for finite numbers. The signs of zero and merged NaN/infinity can be “don't care” terms. The enhanced format employs only one rounding mode, which is for rounding toward nearest up.
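As a rough illustration of the layout the abstract describes, the Python sketch below decodes a 16-bit pattern with one sign bit, six exponent bits, and nine fraction bits. Two details are assumptions, because the abstract does not state them: the exponent bias of 31, and the choice of the all-ones fraction as the single merged NaN/infinity data point in the highest binade. The function name `decode_169` is hypothetical.

```python
import math

EXP_BITS = 6
FRAC_BITS = 9
EXP_MASK = (1 << EXP_BITS) - 1    # 0b111111
FRAC_MASK = (1 << FRAC_BITS) - 1  # 0b111111111
BIAS = 31                         # ASSUMPTION: bias is not given in the abstract
NAN_INF_FRACTION = FRAC_MASK      # ASSUMPTION: which data point is NaN/infinity


def decode_169(bits: int) -> float:
    """Decode a 16-bit 1/6/9 pattern into a Python float (illustrative only)."""
    sign = (bits >> (EXP_BITS + FRAC_BITS)) & 1
    exponent = (bits >> FRAC_BITS) & EXP_MASK
    fraction = bits & FRAC_MASK

    # Lowest binade: the all-zero fraction encodes zero; the sign is "don't care".
    if exponent == 0 and fraction == 0:
        return 0.0

    # Highest binade: one data point is the merged NaN/infinity symbol.
    if exponent == EXP_MASK and fraction == NAN_INF_FRACTION:
        return math.nan

    # Every other pattern is a normal number with an implicit leading 1,
    # including the remaining data points of the lowest and highest binades.
    magnitude = math.ldexp(1 + fraction / (1 << FRAC_BITS), exponent - BIAS)
    return -magnitude if sign else magnitude
```

Under the assumed bias, `decode_169(0x3E00)` yields 1.0, and `decode_169(0x0001)` yields a normal number just above 2^-31 rather than a subnormal, which is the behaviour the abstract attributes to the lowest binade.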

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a memory that stores computer-executable components; and a processor, operatively coupled to the memory, that executes computer-executable components, the computer-executable components comprising: a calculator component that facilitates operation on and calculation of binary floating-point numbers by the processor in accordance with a defined 16-bit floating-point number format, in connection with execution of a machine learning application, wherein the defined 16-bit floating-point number format utilizes greater than five bits in an exponent field, wherein the defined 16-bit floating-point number format utilizes a first binade to represent zero and normal numbers, wherein the first binade is associated with the exponent field having all zeros, and wherein a normal number of the normal numbers is a finite non-zero floating-point number with a magnitude greater than or equal to a minimum value that is determined as a function of a radix and a minimum exponent associated with the defined 16-bit floating-point number format, and wherein the defined 16-bit floating-point number format is applied to the machine learning application and results in reduced error rates and improved convergence time; an operation management component operatively coupled to the calculator component and the processor, wherein the operation management component: allocates a first portion of operations of the calculator component and associated data to a set of lower precision computation engines; and an enhanced format component that generates the defined 16-bit floating-point number format employed by the processor and the calculator component to calculate the binary floating-point numbers.

2. The system of claim 1, wherein the defined 16-bit floating-point number format utilizes six bits in the exponent field and facilitates the machine learning algorithm and deep learning training algorithms, and wherein the exponent field is adjacent a sign field comprising one bit of data representing a sign of the floating-point number.

3. The system of claim 2, wherein the calculator component generates an arbitrary value or symbol in the sign field of the defined 16-bit floating-point number format to reduce hardware complexity and based on a generation of a zero result for a binary floating-point number of the binary floating-point numbers.

4. The system of claim 1, wherein the operation management component also allocates a second portion of the operations of the calculator component and second associated data to a set of higher precision computation engines.

5. The system of claim 1, wherein the set of lower precision computation engines comprises computation engines comprising 16-bit floating-point units, and wherein the set of higher precision computation engines comprises computation engines comprising 32-bit floating-point units or 64-bit floating-point units.

6. The system of claim 1, wherein the defined 16-bit floating-point number format comprises a 1/6/9 format having a single sign bit, a six bit exponent and a nine bit mantissa, wherein the processor employs the defined 16-bit floating-point number format as an arithmetic computation format as well as a data-interchange format.

7. The system of claim 1, wherein the defined 16-bit floating-point number format defines a sign of zero as being a don't care term for selected ones of defined applications, the defined applications comprising deep learning applications or machine learning applications.

8. The system of claim 1, wherein a data point of the first binade has a fraction of all zeros and represents zero, and other data points of the first binade represent the normal numbers.

9. The system of claim 1, wherein the defined 16-bit floating-point number format utilizes a second binade associated with the exponent field having all ones, wherein the defined floating-point number format employs a reduced set of data points in the second binade to represent an infinity value and a not-a-number value, and wherein the reduced set of data points comprises less data points than a set of data points associated with an entirety of the second binade.

10. The system of claim 1, wherein, in accordance with the defined 16-bit floating-point number format, the calculator component represents a sign of a value of zero as a term that indicates that the sign does not matter with respect to the value of zero, wherein the processor generates an arbitrary value in a sign field of the defined 16-bit floating-point number format to represent the term, and wherein the generation of the arbitrary value utilizes less resources than a determination and a generation of a non-arbitrary value for the sign field.

11. The system of claim 1, wherein, in accordance with the defined 16-bit floating-point number format, the calculator component represents a not-a-number value and an infinity value together as a merged symbol, wherein the calculator component represents a sign of the merged symbol as a term that indicates that the sign does not matter with respect to the merged symbol, wherein the processor generates an arbitrary value in a sign field of the defined floating-point number format to represent the term, and wherein the generation of the arbitrary value utilizes less resources than a determination and a generation of a non-arbitrary value for the sign field.

12. The system of claim 1, wherein, in accordance with the defined 16-bit floating-point number format, the calculator component utilizes only one rounding mode to perform rounding values of the binary floating-point numbers, to facilitate enhancing efficiency of the system by reducing hardware utilized to execute the application and to operate on and calculate the binary floating-point numbers, and wherein the one rounding mode is a round-nearest-up mode.

13. A computer-implemented method, comprising: generating, by a system operatively coupled to a processor, respective numerical fields in a defined 16-bit floating-point number format, wherein the respective numerical fields comprise a sign field, an exponent field, and a mantissa field, wherein the defined 16-bit floating-point number format utilizes greater than five bits in the exponent field, and wherein the defined 16-bit floating-point number format utilizes a first binade to represent zero and normal numbers, wherein the first binade is associated with the exponent field having all zeros, and wherein a normal number of the normal numbers is a finite non-zero floating-point number with a magnitude greater than or equal to a minimum value that is determined as a function of a radix and a minimum exponent associated with the defined 16-bit floating-point number format; calculating, by the system, binary floating-point numbers in accordance with the defined 16-bit floating-point number format, in connection with execution of a deep learning application, wherein the defined 16-bit floating-point number format is applied to the deep learning application and results in reduced error rates and improved convergence time; and allocating, by the system, a first portion of operations of the calculator component and associated data to a set of lower precision computation engines.

14. The computer-implemented method of claim 13, wherein the defined 16-bit floating-point number format utilizes six bits in the exponent field that facilitates machine learning algorithms and deep learning training algorithms, and wherein the exponent field is adjacent a sign field comprising one bit of data representing a sign of the floating-point number.

15. The computer-implemented
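Claims 6 and 8 through 12 together describe how a value would be packed into the 1/6/9 format with a single round-nearest-up mode. The Python sketch below is one possible reading, not the patented implementation. It assumes an exponent bias of 31, assumes the merged NaN/infinity symbol is the all-ones exponent with an all-ones fraction, interprets "round-nearest-up" as rounding the magnitude to the nearest representable value with ties going to the larger magnitude, and assumes results below the smallest normal number flush to zero. None of these details are stated in the claim preview, and the name `encode_169` is hypothetical.

```python
import math

EXP_BITS, FRAC_BITS = 6, 9
BIAS = 31                                    # ASSUMPTION: not stated in the claims
EXP_MAX = (1 << EXP_BITS) - 1                # exponent field of all ones
FRAC_MAX = (1 << FRAC_BITS) - 1              # fraction field of all ones
NAN_INF = (EXP_MAX << FRAC_BITS) | FRAC_MAX  # ASSUMPTION: merged symbol pattern


def encode_169(x: float) -> int:
    """Pack a Python float into a 16-bit 1/6/9 pattern (illustrative only)."""
    if math.isnan(x) or math.isinf(x):
        return NAN_INF                       # sign is a "don't care"; 0 chosen here
    if x == 0.0:
        return 0                             # sign of zero is a "don't care"

    sign = 1 << (EXP_BITS + FRAC_BITS) if x < 0 else 0
    m, e = math.frexp(abs(x))                # abs(x) == m * 2**e with 0.5 <= m < 1
    scaled = m * (1 << (FRAC_BITS + 1))      # significand scaled into [512, 1024)

    # Single rounding mode: nearest, ties toward the larger magnitude (assumed).
    q = math.floor(scaled + 0.5)
    exponent = e - 1 + BIAS
    if q == 1 << (FRAC_BITS + 1):            # rounding carried into the next binade
        q >>= 1
        exponent += 1
    fraction = q - (1 << FRAC_BITS)

    # ASSUMPTION: flush to zero below the smallest normal number. The pattern
    # with exponent 0 and fraction 0 is reserved for zero itself (claim 8).
    if exponent < 0 or (exponent == 0 and fraction == 0):
        return 0
    bits = (exponent << FRAC_BITS) | fraction
    # ASSUMPTION: anything at or beyond the merged pattern saturates to it.
    if exponent > EXP_MAX or bits == NAN_INF:
        return NAN_INF
    return sign | bits
```

With these assumptions, a value exactly halfway between two neighbours, such as 1 + 1/1024, encodes to `0x3E01` rather than `0x3E00`, which is where this mode differs from the round-half-to-even default of IEEE 754.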

Assignees

IBM

Inventors

Classifications

G06F7/49915 · Primary
Mantissa overflow or underflow in handling floating-point numbers · CPC title
G06F7/49968 · Primary
Rounding towards positive infinity (G06F7/49957 takes precedence) · CPC title
G06F7/483 · Primary
Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers {(G06F7/4806, G06F7/4824, G06F7/49, G06F7/491, G06F7/544 take precedence)} · CPC title

Patent family

Related publications grouped by family.

View patent family 68692691

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11775257B2 cover?: Techniques for operating on and calculating binary floating-point numbers using an enhanced floating-point number format are presented. The enhanced format can comprise a single sign bit, six bits for the exponent, and nine bits for the fraction. Using six bits for the exponent can provide an enhanced exponent range that facilitates desirably fast convergence of computing-intensive algorithms a…
Who is the assignee on this patent?: IBM
What technology area does this patent fall under?: Primary CPC classification G06F7/49915. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 3, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Denormalization in multi-precision floating-point arithmetic circuitry

Very low precision floating point representation for deep learning acceleration

Very low precision floating point representation for deep learning acceleration

Dynamic, variable bit-width numerical precision on fpgas for machine learning tasks

Dynamic precision for neural network compute operations

Denormalization in multi-precision floating-point arithmetic circuitry

Tensor processing using low precision format