Distributed Multi-Client Control Of Performance Telemetry Subsystem In A Multi-Die Chip

US2025291692A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025291692-A1
Application number: US-202418747404-A
Country: US
Kind code: A1
Filing date: Jun 18, 2024
Priority date: Mar 17, 2024
Publication date: Sep 18, 2025
Grant date: (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computing system performance monitors provide on-chip control, selection, collection, coalescing and communication of behavior and other processing-indicating data of high performance single- and multi-die computing and processing systems, such as for use in multi-chip-module and/or multi-instanced graphics processing units (GPUs) and/or systems-on-chips (SOCs). Commands and data records can be forwarded between modules to abstract the processing system from profilers and other data report consumers. Quality of Service and security isolation for different command and data report streams is maintained.

First claim

Opening claim text (preview).

1. A graphics processor comprising: a first semiconductor die including a first control path circuit or processor that determines a first monitoring parameter and sends a forwarding command packet indicating the first monitoring parameter to a second semiconductor die; and the second semiconductor die including a second control path circuit or processor that determines a second monitoring parameter and determines a global monitoring parameter in response to the forwarding command packet and the determined second monitoring parameter.

2. The graphics processor of claim 1 wherein the first and second monitoring parameters are each temporal.

3. A method comprising: determining a first temporal region of interest local to a first semiconductor die; determining a second temporal region of interest local to a second semiconductor die; forwarding information indicating the first temporal region of interest from the first semiconductor die to the second semiconductor die; and determining a global temporal region of interest in response to the forwarded information and the determined second temporal region of interest.

4. The method of claim 3 wherein determining the first temporal region of interest is based on an engine start command and an engine stop command, the engine disposed on the first semiconductor die.

5. The method of claim 4 wherein determining the first temporal region of interest is also based on a further engine start command and a further engine stop command, the further engine also disposed on the first semiconductor die.

6. The method of claim 5 wherein determining the first temporal region of interest comprises selecting the first temporal region of interest relative to the engine start command, the engine stop command, the further engine start command and the further engine stop command.

7. The method of claim 6 wherein selecting comprises defining the global temporal region of interest between a first start command from any engine and a last stop command from any engine.

8. The method of claim 6 wherein selecting comprises defining the global temporal region of interest between a first start command from any engine and a first stop command from any engine.

9. The method of claim 3 wherein determining the global temporal region of interest comprises selecting the global temporal region of interest relative to the first temporal region of interest and the second temporal region of interest.

10. The method of claim 3 further including triggering to snapshot performance data or propagating command and control information to a first data generator on the first semiconductor die during the global temporal region of interest, and triggering to snapshot performance data or propagating command and control information to a second data generator on the second semiconductor die during the global temporal region of interest.

11. The method of claim 10 wherein at least one of the first data generator and the second data generator comprises a performance data monitor.

12. A processing system comprising: a first semiconductor die including a first control path circuit or processor that determines a first temporal region of interest local to the first die and forwards information indicating the first temporal region of interest to a second semiconductor die; and the second semiconductor die including a second control path circuit or processor that determines a second temporal region of interest local to the second semiconductor die and determines a global temporal region of interest in response to the forwarded information and the determined second temporal region of interest.

13. The processing system of claim 12 wherein the first control path circuit or processor determines the first temporal region of interest based on an engine start command and an engine stop command, the engine disposed on the first semiconductor die.

14. The processing system of claim 13 wherein the first control path circuit or processor determines the first temporal region of interest also based on a further engine start command and a further engine stop command, the further engine also disposed on the first semiconductor die.

15. The processing system of claim 13 wherein the first control path circuit or processor determines the first temporal region of interest by selecting the first temporal region of interest relative to the engine start command, the engine stop command, the further engine start command and the further engine stop command.

16. The processing system of claim 15 wherein selecting comprises defining the global temporal region of interest between a first start command from any engine and a last stop command from any engine.

17. The processing system of claim 15 wherein selecting comprises defining the global temporal region of interest between a first start command from any engine and a first stop command from any engine.

18. The processing system of claim 12 wherein the second control path circuit or processor selects the global temporal region of interest relative to the first temporal region of interest and the second temporal region of interest.

19. The processing system of claim 12 further including a first trigger that triggers a first data generator on the first semiconductor die to monitor a first engine on the first semiconductor die during the global temporal region of interest, and a second trigger that triggers a second data generator on the second semiconductor die to monitor a second engine on the second semiconductor die during the global temporal region of interest.

20. The processing system of claim 19 wherein at least one of the first data generator and the second data generator comprises a counter, a workload execution timeline data or a performance monitor.

21. A GPU comprising: a first virtualizer that enables a first tenant to use first fractional parts of a first die and a second die, and enables a second tenant to use second fractional parts of the first die and the second die, wherein at least some of the first fractional parts are distinct from the second fractional parts; a controller that enables the first tenant to issue first performance monitoring commands for the first fractional parts and enables the second tenant to issue second performance monitoring commands for the second fractional parts; and communication paths on the first die and the second die that keep the first monitoring commands and the second monitoring commands separate while communicating the first monitoring commands to the first fractional parts on the first die and the second die and communicating the second monitoring commands to the second fractional parts on the first die and the second die.
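
Illustrative sketch (not part of the patent text). The temporal region-of-interest selection recited in claims 3 to 9 can be pictured in software, although the claims describe hardware control path circuits or processors. The Python below is a minimal model under stated assumptions: the names `Region`, `local_region` and `global_region` are hypothetical, timestamps are plain integers, and each engine contributes exactly one (start, stop) pair. The two policies mirror the "first start to last stop" and "first start to first stop" options in claims 7 and 8. Combining the forwarded and local regions by taking their outer bounds is an assumption, since claim 9 only says the global region is selected "relative to" the two.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Region:
    """A temporal region of interest, in arbitrary timestamp units."""
    start: int
    stop: int


def local_region(engine_events: Iterable[Tuple[int, int]],
                 policy: str = "first_start_last_stop") -> Region:
    """Select a die-local region from per-engine (start, stop) timestamps.

    Hypothetical helper: one (start, stop) pair per engine on the die.
    """
    events = list(engine_events)
    if not events:
        raise ValueError("at least one engine start/stop pair is required")
    start = min(s for s, _ in events)      # first start command from any engine
    if policy == "first_start_last_stop":
        stop = max(e for _, e in events)   # last stop command from any engine
    elif policy == "first_start_first_stop":
        stop = min(e for _, e in events)   # first stop command from any engine
    else:
        raise ValueError(f"unknown policy: {policy}")
    return Region(start, stop)


def global_region(forwarded: Region, local: Region) -> Region:
    """Combine the region forwarded by the first die with the second die's own.

    Assumption: the global region spans the outer bounds of both regions.
    """
    return Region(min(forwarded.start, local.start),
                  max(forwarded.stop, local.stop))


# Die 0 has two engines; its local region is forwarded to die 1.
die0 = local_region([(10, 40), (15, 55)])
# Die 1 computes its own local region, then the global one.
die1 = local_region([(12, 30), (20, 70)])
combined = global_region(die0, die1)
print(die0, die1, combined)
```

In this model, die 1 plays the role of the second semiconductor die: it receives the first die's region as forwarded information and derives the global region, which could then gate when data generators on both dies are triggered (claims 10 and 19).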

Assignees

  • Nvidia Corp

Inventors

Classifications

  • Processor architectures; Processor configuration, e.g. pipelining · CPC title

  • System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package · CPC title

  • where the computing system component is a central processing unit [CPU] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025291692A1 cover?
Computing system performance monitors provide on-chip control, selection, collection, coalescing and communication of behavior and other processing-indicating data of high performance single- and multi-die computing and processing systems, such as for use in multi-chip-module and/or multi-instanced graphics processing units (GPUs) and/or systems-on-chips (SOCs). Commands and data records can be…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F15/7807. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 18, 2025 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).