Efficient performance monitoring of integrated circuit(s) having distributed clocks

US11144087B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11144087-B2
Application number  US-201916351319-A
Country             US
Kind code           B2
Filing date         Mar 12, 2019
Priority date       Aug 10, 2018
Publication date    Oct 12, 2021
Grant date          Oct 12, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Performance monitors are placed on computational units in different clock domains of an integrated circuit. A central dispatcher generates trigger signals to the performance monitors to cause the performance monitors to respond to the trigger signals with packets reporting local performance counts for the associated computational units. The data in the packets are correlated into a single clock domain. By applying a trigger and reporting system, the disclosed approach can synchronize the performance metrics of the various computational units in the different clock domains without having to route a complex global clock reference signal to all of the performance monitors.
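The trigger-and-report flow described in the abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the patented implementation: the class names, the `ticks_per_ref_tick` ratio, the unit names `sm0` and `l2`, and the list used as "memory" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReferencePacket:
    trigger_id: int       # identification of the trigger signal
    reference_clock: int  # reference clock value when the trigger was broadcast


@dataclass
class ReturnPacket:
    monitor: str
    trigger_id: int   # generated locally by counting received triggers
    local_clock: int  # value of the monitor's own local clock
    count: int        # local performance count since the previous trigger


class PerformanceMonitor:
    def __init__(self, name, ticks_per_ref_tick):
        self.name = name
        self.rate = ticks_per_ref_tick  # hypothetical local/reference clock ratio
        self.local_clock = 0
        self.count = 0
        self.triggers_seen = 0

    def advance(self, ref_ticks, events):
        # Simulate time passing in this clock domain while events are counted.
        self.local_clock += ref_ticks * self.rate
        self.count += events

    def on_trigger(self):
        # Report the local count in a return packet, then reset the count.
        self.triggers_seen += 1
        packet = ReturnPacket(self.name, self.triggers_seen,
                              self.local_clock, self.count)
        self.count = 0
        return packet


class CentralDispatcher:
    def __init__(self, monitors, memory):
        self.monitors = monitors
        self.memory = memory
        self.reference_clock = 0
        self.triggers_sent = 0

    def broadcast_trigger(self):
        # Save a reference packet, then collect one return packet per monitor.
        self.triggers_sent += 1
        self.memory.append(ReferencePacket(self.triggers_sent, self.reference_clock))
        for monitor in self.monitors:
            self.memory.append(monitor.on_trigger())


def correlate(memory):
    # Place every local count on the reference timeline by matching trigger IDs.
    ref = {p.trigger_id: p.reference_clock
           for p in memory if isinstance(p, ReferencePacket)}
    return [(ref[p.trigger_id], p.monitor, p.count)
            for p in memory if isinstance(p, ReturnPacket)]


memory = []
monitors = [PerformanceMonitor("sm0", 2), PerformanceMonitor("l2", 3)]
dispatcher = CentralDispatcher(monitors, memory)

for events_per_monitor in ([5, 1], [7, 4]):
    dispatcher.reference_clock += 100
    for monitor, events in zip(monitors, events_per_monitor):
        monitor.advance(100, events)
    dispatcher.broadcast_trigger()

print(correlate(memory))
```

Note that the monitors never see the reference clock in this sketch. Only the trigger reaches them, and the shared trigger ID is what lets the saved packets be lined up afterwards.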

First claim

Opening claim text (preview).

What is claimed is:

1. An integrated circuit device comprising: a plurality of clock domains each with a corresponding local clock; a plurality of performance monitors each operating in a different clock domain; and a central dispatcher coupled to a reference clock, the central dispatcher configured to: broadcast a trigger signal to the plurality of performance monitors, form a reference packet indicating when the central dispatcher broadcasted the trigger signal to the plurality of performance monitors, wherein the reference packet includes a first identification of the trigger signal and a reference clock value from the reference clock, and save the reference packet in a memory, wherein each of the performance monitors is adapted to: respond to the trigger signal by forming a return packet to be saved in the memory; and respond to the trigger signal by resetting one or more local performance counts; and wherein the clock domains in which the plurality of performance monitors operate are correlated based on the return packets saved in the memory and the reference clock value included in the reference packet saved in the memory.

2. The integrated circuit device of claim 1, further comprising a router configured to: perform data reduction on the return packets from the performance monitors to form at least one aggregated return packet; and save the at least one aggregated return packet in the memory.

3. The integrated circuit device of claim 1, wherein each return packet includes: a local clock value based on the corresponding local clock; a second identification of the trigger signal; and a local performance count.

4. The integrated circuit device of claim 3, wherein the first identification and the second identification are the same, the first identification being generated by the central dispatcher, and the second identification being generated by at least one of the performance monitors.

5. The integrated circuit device of claim 1, wherein the central dispatcher is further configured to receive a command.

6. The integrated circuit device of claim 5, wherein the command is one of a broadcast a trigger signal command, a start broadcasting trigger signals periodically command, or a stop periodic broadcasting of trigger signals command.

7. The integrated circuit device of claim 1, wherein the first identification of the trigger signal indicates a total number of trigger signals broadcast by the central dispatcher.

8. The integrated circuit device of claim 1, wherein the reference packet further includes a bookmark received by the central dispatcher.

9. The integrated circuit device of claim 1, wherein the reference packet further includes a local clock value for the central dispatcher.

10. The integrated circuit device of claim 1, wherein the trigger signal comprises a binary level or edge trigger that can be transmitted on a single wire.

11. The integrated circuit device of claim 1, wherein the integrated circuit device is a graphics processing unit.

12. A method comprising: broadcasting a trigger signal from a central dispatcher with access to a reference clock to a plurality of performance monitors each operating in a clock domain of a corresponding local clock; saving, in a memory, a reference packet indicating when the central dispatcher broadcasted the trigger signal to the plurality of performance monitors, the reference packet comprising a first identification of the trigger signal and a reference clock value based on the reference clock; and for each of the performance monitors, responding to the trigger signal by forming a return packet to be saved in the memory and responding to the trigger signal by resetting one or more local performance count values, wherein clock domains in which the plurality of performance monitors operate are correlated based on the return packet of each performance monitor saved in the memory and the reference clock value included in the reference packet saved in the memory.

13. The method of claim 12, wherein each return packet includes: a local clock value based on the corresponding local clock; a second identification of the trigger signal; and a local performance count.

14. The method of claim 13, wherein the first identification and the second identification are the same, the first identification being generated by the central dispatcher and the second identification being generated by at least one of the performance monitors.

15. The method of claim 12, further comprising: receiving a command, wherein the command is one of a broadcast a trigger signal command, a start broadcasting trigger signals periodically command, or a stop periodic broadcasting of trigger signals command.

16. The method of claim 12, wherein the first identification of the trigger signal indicates a total number of broadcasts of trigger signals.

17. The method of claim 12, wherein the reference packet further includes a bookmark.

18. The method of claim 12, wherein the reference packet further includes a local clock value for the central dispatcher.

19. The method of claim 12, wherein the trigger signal comprises a binary level or edge trigger that can be transmitted on a single wire.

20. The method of claim 12, wherein the performance monitors are in a central processing unit.
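As a rough illustration of how the saved packets might be used after the fact, the sketch below joins return packets to reference packets by trigger identification, reduces counts per trigger, and maps a local clock value onto the reference timeline. The claims do not specify any of these details: the tuple layouts, the function names, the choice of summation as the "data reduction", and the linear interpolation between triggers are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_counts(return_packets):
    """Sum local performance counts per trigger ID.

    Summation is an assumed form of data reduction, chosen for illustration.
    Each return packet is a (trigger_id, local_clock, count) tuple.
    """
    totals = defaultdict(int)
    for trigger_id, _local_clock, count in return_packets:
        totals[trigger_id] += count
    return dict(totals)


def clock_anchors(reference_packets, return_packets):
    """Pair each local clock value with the reference clock value of the
    trigger that has the same identification.

    reference_packets: (trigger_id, reference_clock) tuples.
    return_packets: (trigger_id, local_clock, count) tuples from ONE monitor.
    """
    ref_by_id = dict(reference_packets)
    return sorted((local_clock, ref_by_id[trigger_id])
                  for trigger_id, local_clock, _count in return_packets)


def local_to_reference(local_time, anchors):
    """Map a local clock value to the reference timeline by interpolating
    linearly between two neighbouring triggers (an assumed model)."""
    for (l0, r0), (l1, r1) in zip(anchors, anchors[1:]):
        if l0 <= local_time <= l1:
            return r0 + (local_time - l0) * (r1 - r0) / (l1 - l0)
    raise ValueError("local_time lies outside the range covered by the triggers")


# Hypothetical data: two triggers, one monitor whose clock runs at 2x reference.
reference_packets = [(1, 100), (2, 200)]
sm0_packets = [(1, 200, 5), (2, 400, 7)]
l2_packets = [(1, 300, 1), (2, 600, 4)]

anchors = clock_anchors(reference_packets, sm0_packets)
print(anchors)
print(local_to_reference(300, anchors))
print(aggregate_counts(sm0_packets + l2_packets))
```

More frequent triggers give more anchor points, so under this assumed model any drift between a local clock and the reference clock is bounded by the interval between consecutive triggers.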

Assignees

Nvidia Corp

Inventors

Classifications

  • G06F1/12 (Primary)

    Synchronisation of different clock signals {provided by a plurality of clock generators} · CPC title

  • H04L43/065 (Primary)

    related to network devices · CPC title

  • where the computing system is an embedded system, i.e. a combination of hardware and software dedicated to perform a certain function in mobile devices, printers, automotive or aircraft systems (testing or monitoring of control systems or parts thereof G05B23/02) · CPC title

  • involving deadlines, e.g. rate based, periodic · CPC title

  • by assessing time · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11144087B2 cover?
Performance monitors are placed on computational units in different clock domains of an integrated circuit. A central dispatcher generates trigger signals to the performance monitors to cause the performance monitors to respond to the trigger signals with packets reporting local performance counts for the associated computational units. The data in the packets are correlated into a single clock…
Who is the assignee on this patent?
Nvidia Corp
What technology area does this patent fall under?
Primary CPC classification G06F1/12. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 12, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).