Synchronous hardware event collection

US11232012B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11232012-B2
Application number: US-201916520558-A
Country: US
Kind code: B2
Filing date: Jul 24, 2019
Priority date: Mar 29, 2017
Publication date: Jan 25, 2022
Grant date: Jan 25, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method that includes monitoring execution of program code by first and second processor components. A computing system detects that a trigger condition is satisfied by: i) identifying an operand in a portion of the program code; or ii) determining that a current time of a clock of the computing system indicates a predefined time value. The operand and the predefined time value are used to initiate trace events. When the trigger condition is satisfied the system initiates trace events that generate trace data identifying respective hardware events occurring across the computing system. The system uses the trace data to generate a correlated set of trace data. The correlated trace data indicates a time ordered sequence of the respective hardware events. The system uses the correlated set of trace data to analyze performance of the executing program code.
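
The two trigger alternatives and the correlation step in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `TraceEvent` record, the function names, the component names, and the use of an exact equality test for the time trigger are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEvent:
    """Hypothetical record for one hardware event (not a structure from the patent)."""
    component: str  # illustrative component name, e.g. "core_a"
    timestamp: int  # time value read from the shared clock
    kind: str       # illustrative event label, e.g. "dma_read"


def trigger_satisfied(operands, current_time, trigger_operand, trigger_time):
    """Return True if either trigger alternative from the abstract holds.

    i)  the trigger operand appears in the inspected portion of program code, or
    ii) the clock indicates the predefined time value (equality is an assumption).
    """
    return trigger_operand in operands or current_time == trigger_time


def correlate(*per_component_events):
    """Merge per-component event lists into one time-ordered sequence."""
    merged = [event for events in per_component_events for event in events]
    # Ties on timestamp are broken by component name to keep the order stable.
    return sorted(merged, key=lambda e: (e.timestamp, e.component))
```

Calling `correlate(events_a, events_b)` on two per-component lists returns a single list ordered by timestamp, which is the kind of "time ordered sequence" the abstract describes as the input to performance analysis.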

First claim

Opening claim text (preview).

What is claimed is:

1. A method for collecting event data about neural network computations for a neural network having multiple neural network layers, the method comprising: executing program code to perform the neural network computations using components of a processor configured to implement the neural network; wherein the processor comprises a global time counter, and each of the components of the processor includes a respective local time counter; wherein the global time counter and the respective local time counters each includes at least one respective offset bit used for decreasing phase variations between the global time counter and one or more of the respective local time counters; wherein the program code includes a first time parameter as a trigger condition for triggering a trace event across two or more components of the processor during performance of the neural network computations; synchronizing, using the at least one respective offset bit of each respective local time counter in two or more of the components of the processor, respective current time values indicated by the respective local time counters based on a current time value indicated by the global time counter; determining, by comparing the synchronized respective current time values against the first time parameter in the program code, if the trigger condition for triggering the trace event is satisfied; in response to determining that the trigger condition is satisfied, triggering the trace event to generate event data for the two or more of the components of the processor; wherein the event data is synchronized based on the synchronized respective current time values according to the global time counter; and providing the event data to a host used to analyze the program code during performance of the neural network computations.

2. The method of claim 1, comprising: storing the event data in one or more trace buffers of a plurality of trace buffers disposed in a first component of the processor.

3. The method of claim 1, wherein the global time counter and the respective local time counters each includes respective fixed-size binary data comprising a 60-bit counter and a 4-bit offset, wherein the program code further includes a second time parameter; wherein the second time parameter indicates a later time than the first time parameter; and wherein the trigger condition includes a predefined time window associated with the global time counter, wherein the predefined time window includes a start time for triggering the trace event based on the first time parameter, and an end time for stopping the trace event based on the second time parameter.

4. The method of claim 1, wherein: executing the program code to perform the neural network computations comprises executing a sequence of operations to process vector elements of an inference workload using the neural network.

5. The method of claim 4, wherein: a subset of the event data describes a plurality of memory access operations that are executed by the processor to perform the neural network computations; and the sequence of operations comprises moving the vector elements from a first memory of a first component to a second memory of a second component of the processor.

6. The method of claim 5, wherein triggering the trace event to generate the event data for the two or more of the components of the processor comprises: generating the subset of the event data synchronized between the first component and the second component during the moving of the vector elements from the first memory of the first component to the second memory of the second component.

7. The method of claim 1, wherein synchronizing the respective current time values indicated by the respective local time counters based on the current time value indicated by the global time counter, further comprises: computing, for each of the respective local time counters, a difference between the respective current time value indicated by the local time counter and the current time value indicated by the global time counter; and determining if the difference satisfies a minimum value over a sampling period.

8. The method of claim 1, further comprising: embedding, by the host, the first time parameter in response to compiling the program code for execution at the processor; and loading, by the host, the compiled program code that includes the first time parameter embedded in the compiled program code.

9. An event collection system for collecting event data about neural network computations for a neural network having multiple neural network layers, the system comprising: one or more processing devices; and one or more non-transitory machine-readable storage devices for storing instructions that are executable by the one or more processing devices to cause performance of operations comprising: executing program code to perform the neural network computations using components of a processor configured to implement the neural network; wherein the processor comprises a global time counter, and each of the components of the processor includes a respective local time counter; wherein the global time counter and the respective local time counters each includes at least one respective offset bit used for decreasing phase variations between the global time counter and one or more of the respective local time counters; wherein the program code includes a first time parameter as a trigger condition for triggering a trace event across two or more components of the processor during performance of the neural network computations; synchronizing, using the at least one respective offset bit of each respective local time counter in two or more of the components of the processor, respective current time values indicated by the respective local time counters based on a current time value indicated by the global time counter; determining, by comparing the synchronized respective current time values against the first time parameter in the program code, if the trigger condition for triggering the trace event is satisfied; in response to determining that the trigger condition is satisfied, triggering the trace event to generate event data for the two or more of the components of the processor; wherein the event data is synchronized based on the synchronized respective current time values according to the global time counter; and providing the event data to a host used to analyze the program code during performance of the neural network computations.

10. The system of claim 9, wherein the operations comprise: storing the event data in one or more trace buffers of a plurality of trace buffers disposed in a first component of the processor.

11. The system of claim 9, wherein the global time counter and the respective local time counters each includes respective fixed-size binary data comprising a 60-bit counter and a 4-bit offset; wherein the program code further includes a second time parameter; wherein the second time parameter indicates a later time than the first time parameter; wherein the trigger condition includes a predefined time window associated with the global time counter; and wherein the predefined time window includes a start time for triggering the trace event based on the first time parameter, and an end time for stopping the trace event based on the second time parameter.

12. The system of claim 9, wherein: executing the program code to perform the neural network computations comprises executing a sequence of operations to process vector elements of an inference workload using the neural network.

13. The system of claim 1
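
The counter format in claims 3 and 11 (a 60-bit counter plus a 4-bit offset), the start/end time window, and the local-versus-global difference in claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the claimed hardware: the bit layout (offset in the low 4 bits), the half-open window, the use of the largest absolute difference as the skew measure, and all function names are hypothetical.

```python
COUNTER_BITS = 60  # counter width stated in claims 3 and 11
OFFSET_BITS = 4    # offset width stated in claims 3 and 11
COUNTER_MASK = (1 << COUNTER_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def pack(counter, offset):
    """Pack a counter and an offset into one 64-bit word.

    Assumption: the offset occupies the low 4 bits; the claims do not fix a layout.
    """
    return ((counter & COUNTER_MASK) << OFFSET_BITS) | (offset & OFFSET_MASK)


def unpack(word):
    """Split a packed word back into (counter, offset)."""
    return (word >> OFFSET_BITS) & COUNTER_MASK, word & OFFSET_MASK


def in_trace_window(now, start, end):
    """Return True while tracing should be active.

    start and end stand in for the first and second time parameters; treating
    the window as half-open (start <= now < end) is an assumption.
    """
    if end <= start:
        raise ValueError("the second time parameter must be later than the first")
    return start <= now < end


def max_skew(local_samples, global_samples):
    """Largest absolute local-vs-global difference over a sampling period."""
    return max(abs(l - g) for l, g in zip(local_samples, global_samples))
```

For example, `in_trace_window(now, start, end)` could be evaluated against each synchronized counter value to decide when to start and stop a trace, and `max_skew` gives one possible way to summarize how far a local counter drifts from the global counter across a set of samples.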

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11232012B2 cover?
A computer-implemented method that includes monitoring execution of program code by first and second processor components. A computing system detects that a trigger condition is satisfied by: i) identifying an operand in a portion of the program code; or ii) determining that a current time of a clock of the computing system indicates a predefined time value. The operand and the predefined time …
Who is the assignee on this patent?
Google LLC
What technology area does this patent fall under?
Primary CPC classification G06F11/3495. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 25, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).