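Open Power/Performance Analysis Tool (OPPAT)

Introduction

Open Power/Performance Analysis Tool (OPPAT) is a cross-OS, cross-architecture power and performance analysis tool.

Cross-OS: supports Windows ETW trace files and Linux/Android perf/trace-cmd trace files
Cross-architecture: supports Intel and ARM chip hardware events (using perf and/or PCM)

The project web page is https://patinnc.github.io

The source code repo is https://github.com/patinnc/oppat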

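I've added an operating system view (OS_VIEW) to the CPU block diagram feature. It is based on Brendan Gregg's pages such as http://www.brendangregg.com/linuxperf.html. Here is some sample data for running an old version of Geekbench, v2.4.2 (32-bit code), on an ARM 64-bit Ubuntu Mate v18.04.2 Raspberry Pi 3 B+ with 4 ARM Cortex A53 CPUs:

A video of the changes in the OS_VIEW and ARM A53 CPU block diagram while running Geekbench:
- There are some introductory slides that try to explain the OS_VIEW and cpu_diagram layout, then 1 slide showing the results for each of the 30 subtests
An Excel file of the data in the movie: the Geekbench Excel file from the movie
The HTML of the data in the movie... see Geekbench v2.4.2 on 4-core ARM Cortex A53 with OS_VIEW and CPU diagram.
The dashboard PNGs for all 30 phases, sorted by increasing instructions/sec... see ARM Cortex A53 Raspberry Pi 3 with CPU diagram 4-core chip dashboard running Geekbench.

Here is some sample data for running my spin benchmark (memory/cache bandwidth tests plus a 'spin' keep-the-CPU-busy test) on Raspberry Pi 3 B+ (Cortex A53) CPUs:

A video of the changes in the OS_VIEW and ARM A53 CPU block diagram while running spin:
- There are some introductory slides that try to explain the OS_VIEW chart and cpu_diagram layout, then a slide showing when (in seconds) each subtest is shown (so you can go to t = X secs to jump directly to that subtest), then one slide for each of the 5 subtests
An Excel file of the data in the movie: the Geekbench Excel file from the movie
The HTML of the data in the movie... see ARM Cortex A53 Raspberry Pi 3 with CPU diagram 4-core chip running spin benchmark.
The dashboard PNGs for all 5 phases, sorted by increasing instructions/sec... see ARM Cortex A53 Raspberry Pi 3 with CPU diagram 4-core chip dashboard running spin benchmark.

Here is some sample data for running Geekbench on a Haswell CPU:

A video of the changes in the OS_VIEW and Haswell CPU block diagram while running Geekbench:
- There are some introductory slides that try to explain the OS_VIEW chart and cpu_diagram layout, then a slide showing when (in seconds) each subtest is shown (so you can go to t = X secs to jump directly to that subtest), then one slide for each of the 50 subtests
An Excel file of the data in the movie: the Geekbench Excel file from the movie
The HTML of the data in the movie... see Intel Haswell with CPU diagram 4-CPU chip running Geekbench.
The dashboard PNGs for all 50 phases, sorted by increasing uops retired/sec... see Intel Haswell with CPU diagram 4-core chip dashboard running Geekbench.

Here is some data for running my 'spin' benchmark, with 4 subtests, on a Haswell CPU:

The 1st subtest is a read memory bandwidth test. The L2/L3/memory blocks are highly used and stalled during the test. The RAT uops/cycle is low since the RAT is mostly stalled.
The 2nd subtest is an L3 read bandwidth test. The memory BW is now low. The L2 and L3 blocks are highly used and stalled during the test. The RAT uops/cycle is higher since the RAT is less stalled.
The 3rd subtest is an L2 read bandwidth test. The L3 and memory BW are now low. The L2 block is highly used and stalled during the test. The RAT uops/cycle is even higher since the RAT is even less stalled.
The 4th subtest is a spin (just an add loop) test. The L2, L3 and memory BW are near zero. The RAT uops/cycle is about 3.3 uops/cycle, which is close to the 4 uops/cycle maximum.
A video, with analysis, of the Haswell CPU block diagram changes while running 'spin'.
An Excel file of the data in the movie: the Excel file from the spin movie
The HTML of the data in the movie... see Intel Haswell with CPU diagram 4-CPU chip running spin benchmark.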

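The Intel Haswell with CPU diagram data collection is for a 4-CPU Intel chip, Linux OS, HTML file with 50+ HW events collected via perf sampling, plus other collected data. cpu_diagram features:

Start with a block diagram SVG from wikichip.org (used with permission),
Look up the resource limits (such as max BW, max bytes/cycle on the various paths, min cycles/uop, etc.),
Compute metrics for the resource usage
Below is a table for the memory read bandwidth test. It shows the resource usage info (along with an estimate of whether the CPU is stalled due to the usage). The HTML table (but not the PNG) has popup info when you hover over fields. The table shows:
- The core is stalled on memory bandwidth at 55% of the maximum possible 25.9 GB/s BW. It is a memory BW test
- The SuperQueue (SQ) is full (core0 and 62.3% core1), so more L2 requests can't be handled
- The line fill buffers (FB) are full (30% and 51%), so lines can't be moved from L2 to L1d
- The result is that the backend is stalled (88% and 87%) with no uops retired.
- The uops seem to be coming from the Loop Stream Detector (since the LSD cycles/uop is about the same as the RAT uops/cycle).
- A screenshot of the Haswell CPU diagram memory BW table
Below is a table for the L3 read bandwidth test.
- Now the memory BW and the L3 miss bytes/cycle are about zero.
- The SQ is less stalled (since we aren't waiting on memory).
- The L2 transaction bytes/cycle is about 2x higher and about 67% of the max possible 64 bytes/cycle.
- The uops_retired_stalls/cycle has dropped to 66%, from 88% stalled in the mem BW test.
- The fill buffer stalls are now more than 2x higher. The uops are still coming from the LSD.
- A screenshot of the Haswell CPU diagram L3 BW table
Below is a table for the L2 read bandwidth test.
- The L2 miss bytes/cycle is much lower than in the L3 test.
- The uops_retired % stalled is now about half that of the L3 test, at 34%, and the FB stalls are about 17%.
- The uops are still coming from the LSD.
- A screenshot of the Haswell CPU diagram L2 BW table
Below is a table for the spin test (no loads, just an add loop).
- Now there are almost zero memory subsystem stalls.
- The uops are coming from the Decode Stream Buffer (DSB).
- The RAT retired_uops/cycle, at 3.31 uops/cycle, is close to the max possible 4.0 uops/cycle.
- The RAT retired_uops % stalled is very low at 8%.
- A screenshot of the Haswell CPU diagram spin table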
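
The percentages in these tables are ratios of a measured rate to the peak rate of a resource. The sketch below reproduces that arithmetic in Python with the figures quoted above. The helper name `utilization_pct` is made up for illustration and is not part of OPPAT.

```python
def utilization_pct(measured: float, peak: float) -> float:
    """Return measured/peak as a percentage of the resource's peak rate."""
    if peak <= 0:
        raise ValueError("peak must be positive")
    return 100.0 * measured / peak


# Memory BW test: 55% of the 25.9 GB/s peak is roughly 14.2 GB/s.
mem_bw_gbs = 0.55 * 25.9

# Spin test: 3.31 retired uops/cycle against the 4.0 uops/cycle peak.
spin_rat_pct = utilization_pct(3.31, 4.0)

print(f"memory BW ~ {mem_bw_gbs:.1f} GB/s, RAT utilization ~ {spin_rat_pct:.1f}%")
```

A value close to 100% on one resource, together with high stall percentages downstream, is what the tables use as a hint that the resource is the bottleneck.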

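Currently I only have cpu_diagram movies for Haswell and ARM A53 (since I don't have other systems to test on), but it should not be hard to add other block diagrams. Without a block diagram you still get all the charts, just not the cpu_diagram.

Below is one of the OPPAT charts. The 'cpu_busy' chart shows what is running on each CPU and the events that happen on each CPU. For example, the green circle shows a spin.x thread running on CPU 1. The red circle shows some of the events occurring on CPU 1. This chart is modeled after trace-cmd's kernelshark chart. More info on the cpu_busy chart is in the chart types section. The callout box shows the event data (including the callstack, if any) for the event under the cursor. Unfortunately the Windows screenshot doesn't capture the cursor. A screenshot of the CPU busy chart

Here are some sample HTML files. Most of the files cover a shorter interval but some cover the 'full' 8 second run. The files won't load directly from the repo, but they will load from the project web page: https://patinnc.github.io

Intel Haswell with CPU diagram 4-CPU chip, Linux OS, HTML file with 50+ HW events via perf sampling, or
Intel 4-CPU chip, Windows OS, HTML file with 1 HW event via xperf sampling, or
Full ~8 second Intel 4-CPU chip, Windows OS, HTML file with PCM and xperf sampling, or
Intel 4-CPU chip, Linux OS, HTML file with 10 HW events in 2 multiplexed groups.
ARM (Broadcom A53) chip, Raspberry Pi 3 Linux HTML file with 14 HW events (for CPI, L2 misses, mem BW, etc. in 2 multiplexed groups).
11 MB, full version of the above ARM (Broadcom A53) chip, Raspberry Pi 3 Linux HTML file with 14 HW events (for CPI, L2 misses, mem BW, etc. in 2 multiplexed groups).

Some of the above files are ~2 second intervals extracted from ~8 second long runs. Here is the full 8 second run:

The full 8 second Linux run sample HTML compressed file is here, for a more complete file. The file does JavaScript zlib decompression of the chart data, so you will see messages asking you to wait (about 20 seconds) during the decompress.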

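OPPAT data supported

Linux perf and/or trace-cmd performance files (both binary and text files),
- perf stat output is also accepted
- Intel PCM data,
- other data (imported using LUA scripts),
- So this should work with data from regular Linux or Android
- Currently, for perf and trace-cmd data, OPPAT requires both the binary and the post-processed text files, and there are some restrictions on the 'record' command line and on the 'perf script'/'trace-cmd report' command line.
- OPPAT could be made to use just the perf/trace-cmd text output files, but currently both the binary and the text files are required
Windows ETW data (collected by xperf and dumped to text) or Intel PCM data,
Arbitrary power or performance data, supported using LUA scripts (so you don't need to recompile the C++ code to import other data, unless the LUA performance becomes an issue)
Read the data files on Linux or Windows regardless of where the files originated (so read perf/trace-cmd files on Windows or ETW text files on Linux)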

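OPPAT visualization

Here are some full sample visualization HTML files: Windows sample HTML file or this Linux sample HTML file. If you are on the repo (not the github.io project website), you will have to download the file and then load it into your browser. These are standalone web files created by OPPAT which could be, for example, emailed to others or (as here) posted on a web server.

OPPAT viz works better in Chrome than in Firefox, primarily because zooming with the touchpad 2-finger scroll works better on Chrome.

OPPAT has 3 visualization modes:

The usual chart mechanism (where the OPPAT backend reads the data files and sends the data to the browser)
You can also create a standalone web page which is the equivalent of the 'regular chart mechanism' but can be exchanged with other users... the standalone web page has all the scripts and data built in, so it can be emailed to someone who can then load it in their browser. See the HTML files in sample_html_files referenced above and (for a longer version of lnx_mem_bw4) see the compressed file sample_html_files/lnx_mem_bw4_flyl.html.html
You can '--save' a data JSON file and then '--load' the file later. The saved JSON file has the data that OPPAT needs to send to the browser. This avoids re-reading the input perf/xperf files, but it won't pick up any changes made in charts.json. The full HTML file created with the --web_file option is only slightly bigger than the --save file. The --save/--load mode requires building OPPAT. See the sample 'saved' files in the sample_data_json_files subdir.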

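Viz general info

Chart all the data in a browser (on Linux or Windows)
Charts are defined in a JSON file
The browser interface is sort of like Windows WPA (navbar on the left).
- Below is shown the left navbar (left-side sliding menu).
- Charts are grouped by category (GPU, CPU, power, etc.)
  - Categories are defined and assigned in input_files/charts.json
- Charts can be all hidden or selectively displayed by clicking on the chart in the navbar.
- Hovering over a chart title in the left nav menu scrolls that chart into view
Data from one group of files can be plotted alongside a different group
- So you can, say, compare a Linux perf run vs a Windows ETW run
- The chart below shows Linux vs Windows power usage:
  - I only have access to battery power on both Linux and Windows.
  - Many sites have much better power data (voltage/current/power at the msec (or better) rate). It would be easy to incorporate these types of power data (such as from Kratos or Qualcomm MDPs), but I don't have access to the data.
- Or compare 2 different runs on the same platform
- A file group tag (file_tag) is prefixed to the title to distinguish the charts
  - A 'tag' is defined in the data dir's file_list.json file and/or in input_files/input_data_files.json.
  - input_files/input_data_files.json is a list of all the OPPAT data dirs (but the user has to maintain it).
- Charts with the same title are plotted one after the other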

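Chart features:

Hovering over a section of a line of the chart shows the data point for that line at that point
- This doesn't work for the vertical lines since they just connect 2 points... only the horizontal pieces of each line are searched for the data value
- Below is a screenshot of hovering over an event. This shows the relative time of the (CSwitch) event, some info such as process/pid/tid, and the line number in the text file so you can get more info.
- Below is a screenshot showing the callstack info (if any) for the event.
Zooming
- Unlimited zooming in to the nanosecond level and zooming back out.
  - There are probably orders of magnitude more points to plot than pixels, so more data is displayed as you zoom in.
  - Below is a screenshot showing zooming to the microsecond level. This shows the callstack for a sched_switch event where spin.x is blocked doing a memory mapping operation and going idle. The 'cpu busy' chart shows 'going idle' as blank.
- Charts can be zoomed individually, or charts with the same file_tag can be linked
  - Scroll to the bottom of the left navbar and click on 'Zoom/Pan: Unlinked'. This will change the menu item to 'Zoom/Pan: Linked'. This will zoom/pan all the charts in a file group to the most recent zoom/pan absolute time. It will take some time to redraw all the charts.
    - Initially each chart is drawn displaying all of the available data. If your charts come from different sources then T_begin and T_end (for charts from different sources) are probably different.
    - Once a zoom/pan operation is done and linking is in effect, all the charts in the file group will zoom/pan to the same absolute interval.
      - This is why the 'clock' used for each source has to be the same.
      - OPPAT could translate from one clock to another (such as between clock_gettime(CLOCK_MONOTONIC) and gettimeofday()), but that logic ...
    - Any flame graphs for an interval are always zoomed to the 'owning chart' interval regardless of the linking status.
- You can zoom in/out by:
  - Zoom in: mouse wheel vertically on the chart area. The chart zooms on the time in the center of the chart.
    - On my laptop this is scrolling 2 fingers vertically on the touchpad
  - Zoom in: click on the chart, drag the mouse to the right and release the mouse (the chart will zoom to the selected interval)
  - Zoom out: click on the chart, drag the mouse to the left and release the mouse. The chart zooms out in inverse proportion to how much of the chart you selected. That is, if you left-drag across almost the whole chart area then the chart zooms out ~2x. If you left-drag across just a small interval then the chart zooms out all the way.
  - Zoom out: on my laptop, do a touchpad 2-finger vertical scroll in the opposite direction of zoom in
- You have to be careful where the cursor is... you might inadvertently zoom a chart when you mean to scroll the list of charts. So I usually put the cursor on the left edge of the screen when I want to scroll the charts.
Panning
- On my laptop this is a 2-finger horizontal scroll motion on the touchpad
- Use the green box on the thumbnail below the chart
- Panning works at any zoom level
- A 'thumbnail' picture of the full chart is put below each chart, with a cursor that slides along the thumbnail so you can navigate around the chart when you are zooming/panning
- Below shows panning the 'cpu busy' chart to T=1.8-2.37 seconds. The relative time and absolute begin time are highlighted in the left red oval. The end time is highlighted in the right red oval. The relative position on the thumbnail is shown by the middle red oval.
Hovering on a chart legend entry highlights that line.
- Below is a screenshot where 'pkg' (package) power is highlighted
Clicking on a chart legend entry toggles the visibility of that line.
Double-clicking a legend entry makes only that entry visible/hidden
- Below is a screenshot where 'pkg' power was double-clicked so only the pkg line is visible.
- The above shows that the y-axis is adjusted to the min/max of the displayed variable(s). The 'not shown' lines are greyed out in the legend. If you hover over a 'not shown' line in the legend it will be drawn (while you are hovering on the legend item). You can get all the items to display again by double-clicking a 'not shown' legend entry. This will show all the 'not shown' lines but it will toggle off the line you just clicked... so single-click the item you just double-clicked. I know it sounds confusing.
If a legend entry is hidden and you hover over it, it will be displayed until you hover out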
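
Linked zooming only works if every source reports time on the same clock. One way to picture a conversion between a monotonic clock and wall-clock time is to sample both clocks together and apply the offset. This is a hypothetical Python sketch of that idea, not OPPAT's conversion logic; `monotonic_to_wall` and its parameters are illustrative names.

```python
import time


def monotonic_to_wall(mono_ts: float, mono_ref: float, wall_ref: float) -> float:
    """Map a monotonic timestamp to wall-clock time.

    mono_ref and wall_ref are the two clocks sampled at (nearly) the same
    instant; the difference between them is the conversion offset.
    """
    return mono_ts + (wall_ref - mono_ref)


# Sample both clocks back to back to get a reference pair.
mono_ref = time.monotonic()   # monotonic clock (CLOCK_MONOTONIC on Linux)
wall_ref = time.time()        # wall-clock seconds since the epoch

# An event stamped 2.5 s after the reference on the monotonic clock.
event_wall = monotonic_to_wall(mono_ref + 2.5, mono_ref, wall_ref)
```

The wall clock can be stepped (for example by NTP) while the monotonic clock cannot, so an offset like this is only approximate and is best captured close to the start of data collection.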

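Chart types:

'cpu_busy' chart: a kernelshark-like chart showing the CPU occupancy by pid/thread. See the kernelshark reference http://rostedt.homelinux.com/kernelshark/
- Below is a screenshot of the cpu busy chart. The chart shows, for each CPU, the process/pid/tid running at any point in time. The idle process is not drawn. For 'cpu 1' on the screenshot, the green oval is around the 'context switch' part of the chart. Above each CPU's context switch info, cpu busy shows the events that occur in the same file as the context switch event. The red oval on the 'cpu 1' line shows the event part of the chart.
- The chart is based on the context switch event and shows the thread running on each CPU at any given time
- The context switch event is the Linux sched:sched_switch or the Windows ETW CSwitch event.
- If there are more events than the context switch in the data file, then all the other events are represented as vertical dashes above the CPU.
- If the events have call stacks then the call stack is also shown in the popup balloon
Line charts
- The line charts are probably more accurately called step charts, since each event (so far) has a duration, and these 'durations' are represented by horizontal segments joined by vertical segments.
- The vertical parts of the step chart can fill the chart if the chart lines have a lot of variation
- You can select (in the left nav bar) to not connect the horizontal segments of each line... so the chart becomes a sort of 'scattered dash' chart. The horizontal dashes are the actual data points. When you switch from step chart to dash chart, the chart is not redrawn until there is some 'redraw' request (like zoom/pan or highlight (by hovering over a legend entry)).
  - Below is a screenshot of the cpu_idle power states using a line chart. The connecting lines blot out the information in the chart.
  - Below is a screenshot of the cpu_idle power states using a scattered dash chart. The chart now shows a horizontal dash for each data point (the width of the dash is the duration of the event). Now we can see more information, but this chart also shows a drawback of my charting logic: a lot of the data is at the max and min values of the chart and gets obscured.
Stacked charts
- Stacked charts can cause a lot more data to be generated than line charts. For example, drawing a line chart of when a particular thread is running depends only on that thread. Drawing a stacked chart of running threads is different: a context switch event on any thread changes all the other running threads... so if you have N CPUs, you get N-1 more things to draw per event for stacked charts.
Flame graphs. A flame graph is created for each perf event that has callstacks and is in the same file as the sched_switch/CSwitch event.
- Below is a screenshot of a typical default flame graph. Usually the default height of the flame graph chart isn't enough to fit the text into each level of the flame graph. But you still get the 'hover' callstack info.
- If you click on a layer of the chart, it expands higher so that the text fits. If you click on the lowest layer then it covers all the data for the 'owning chart' interval.
- Below is a screenshot of a zoomed flame graph (after clicking on one of the layers of the flame).
- The color of the flame graph matches the process/pid/tid in the legend of the cpu_busy chart... so it is not as pretty as a regular flame graph, but now the color of a 'flame' actually means something.
- The CPI (Clocks Per Instruction) flame graph chart colors the process/pid/tid by the CPI for that stack.
  - Below is an unzoomed CPI chart. The left-hand instance of spin.x (in light orange) has a CPI = 2.26 cycles/instruction. The 4 spin.x's on the right in green have a CPI = 6.465.
  - You have to have cycles, instructions, and cpu-clock (or sched_switch) callstacks
  - The width of the CPI 'flame' is based on the cpu-clock time.
  - The color is based on the CPI. A red to green to blue gradient at the top left of the chart shows the coloring.
  - Red is a low CPI (so lots of instructions per clock... I think of it as 'hot')
  - Blue is a high CPI (so few instructions per clock... I think of it as 'cold')
  - Below is a sample zoomed CPI chart showing the coloring and the CPI. The 'spin.x' threads have been deselected in the cpu_busy legend so they don't appear in the flame graph.
- The GIPS (Giga (billion) Instructions Per Second) flame graph chart colors the process/pid/tid by the GIPS for that stack.
  - Below is an unzoomed GIPS (giga/billion instructions per second) chart. The left-hand instance of spin.x (in light green) has GIPS = 1.13. The 4 spin.x's on the right in blue-green have GIPS = 0.377. The callstack in the balloon is for the spike to the left of the balloon.
  - In the above chart, note that the left-hand stack (for spin.x) gets a higher instructions/sec than the rightmost 4 instances of spin.x. This first instance of spin.x runs by itself (so it gets lots of memory BW), while the right 4 spin.x threads run in parallel and get a lower GIPS (since one thread can just about max out the memory BW).
  - You have to have instructions and cpu-clock (or sched_switch) callstacks
  - The width of the GIPS 'flame' is based on the cpu-clock time.
  - The color is based on the GIPS. A red to green to blue gradient at the top left of the chart shows the coloring.
  - Red is a high GIPS (so lots of instructions per second... I think of it as 'hot', doing lots of work)
  - Blue is a low GIPS (so few instructions per second... I think of it as 'cold')
  - Below is a sample zoomed GIPS chart showing the coloring and the GIPS. I've clicked on 'perf 3186/3186', so that flame is shown below.
- If you hide a process in the legend (click on the legend entry... it will be greyed out) then the process will not be shown in the flame graph.
- If you right-drag the mouse in the flame graph then that section will be zoomed
- Clicking on a 'flame' zooms to just that flame
- Left-dragging the mouse in the flame graph will zoom out
- Clicking on a lower level of the flame graph 'unzooms' to all the data for that level
- If you click on the 'all' lowest level, the chart zooms all the way back out
- When you click on a flame graph level, the chart is resized so that each level is high enough to display the text. To try to add some sanity to this resizing, I position the last level of the resized chart at the bottom of the visible screen.
- If you go back to the legend and hover over a hidden entry ...
- If you click on an upper level of a 'flame' then just that section is zoomed.
- If you zoom in/out or pan left/right on the 'parent' chart, the flame graph is redrawn for the selected interval
- By default the text of each level of the flame graph probably won't fit. If you click on the flame graph, the size will be expanded to enable drawing the text
- You can select (in the left nav bar) whether to group the flame graphs by process/pid/tid, by process/pid, or by just process.
- 'On CPU', 'off CPU' and 'run queue' flame graphs are drawn.
  - 'On CPU' is the callstack for what the thread was doing while it was running on the CPU... so the SampledProfile callstacks or perf cpu-clock callstacks indicate what the thread was doing when it was running on the CPU
  - 'Off CPU' shows, for threads that are not running, how long they waited and the callstack when they were swapped out.
    - Below is a screenshot of the off-CPU or wait-time chart. The popup shows (in the flame graph) that 'wait.x' is waiting on nanosleep.
    - The swap 'state' (and on ETW the 'reason') is shown as a level above the process. Usually most threads will be sleeping or not running, but this helps to answer the question: 'when my thread didn't run... what was it waiting on?'.
    - Showing the 'state' for the context switch lets you see:
    - whether a thread is waiting on, for example, a non-interruptible sleep (state==D on Linux... usually IO)
    - an interruptible sleep (state=S... frequently a nanosleep or futex)
  - 'Run queue' shows the threads which got swapped out while in a running or runnable state. So this chart shows saturation of the CPU if there are threads in a runnable state but not running.
    - Below is a screenshot of the run_queue chart. This chart shows the amount of time a thread didn't run because there weren't enough CPUs. That is, the thread was ready to run but something else was using the CPU. Each flame graph shows the total time covered by the graph. In the case of the run_queue chart, it shows ~159 msecs of wait time. So, given that spin.x has about 20 seconds of run time and 0.159 secs of 'wait' time, this doesn't seem too bad.
I didn't find a canvas-based chart library to use, so the charts are kind of crude... I didn't want to spend too much time creating the charts if there is something better out there. The charts have to use the HTML canvas (not SVG, d3.js, etc.) due to the amount of data.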
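
The CPI and GIPS flame graphs described above color each stack by a ratio of event counts. As a rough illustration of the two ratios (not OPPAT's code; the function names and the event counts are invented so that the results match the values quoted for the lone spin.x instance):

```python
def cpi(cycles: float, instructions: float) -> float:
    """Cycles per instruction for one callstack."""
    return cycles / instructions


def gips(instructions: float, cpu_seconds: float) -> float:
    """Billions of instructions per second of cpu-clock time."""
    return instructions / cpu_seconds / 1e9


# Invented counts, picked only to land on the CPI and GIPS quoted in the text.
lone_cpi = cpi(cycles=2.26e9, instructions=1.0e9)
lone_gips = gips(instructions=1.13e9, cpu_seconds=1.0)

print(f"CPI = {lone_cpi:.2f} cycles/instruction, GIPS = {lone_gips:.2f}")
```

A low CPI and a high GIPS both mean more work per unit of time, which is why both map to the 'hot' (red) end of the gradient.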

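Data collection for OPPAT

Collecting the performance and power data is very 'situational'. One person will want to run scripts; another will want to start measurements with a button, then start a video, and then end the collection with the press of a button. I have a script for Windows and a script for Linux which demonstrate how to:

Start data collection,
Run the workload,
Stop data collection,
Post-process the data (create a text file from the perf/xperf/trace-cmd binary data)
Put all of the data files in an output dir
Create a file_list.json file in the output dir (which tells OPPAT the names and types of the output files)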

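The steps for data collection using the scripts:

Build the spin.exe (spin.x) and wait.exe (wait.x) utilities
- From the OPPAT root dir:
- On Linux: ./mk_spin.sh
- On Windows: .\mk_spin.bat (from a Visual Studio cmd box)
- The binaries will be put in the ./bin subdir
Start by running the provided scripts:
- run_perf_x86_haswell.sh - for Haswell cpu_diagram data collection
  - On Linux, type: sudo bash ./scripts/run_perf_x86_haswell.sh
  - By default the script puts the data in dir ../oppat_data/lnx/mem_bw7
- run_perf.sh - you need to have trace-cmd and perf installed
  - On Linux, type: sudo bash ./scripts/run_perf.sh
  - By default the script puts the data in dir ../oppat_data/lnx/mem_bw4
- run_xperf.bat - you need to have xperf.exe installed.
  - On Windows, from a cmd box with admin privileges, type: .\scripts\run_xperf.bat
  - By default the script puts the data in dir ..\oppat_data\win\mem_bw4
- Edit the run script if you want to change the defaults
- In addition to the data files, the run script creates a file_list.json file in the output dir. OPPAT uses the file_list.json file to figure out the filenames and the types of the files in the output dir.
The 'workload' for the run script is spin.x (or spin.exe), which does a memory bandwidth test on 1 CPU for 4 seconds and then on all CPUs for another 4 seconds.
Another program, wait.x/wait.exe, is also started in the background. wait.cpp reads the battery info for my laptop. It works on my dual-boot Windows 10/Linux Ubuntu laptop. The sysfs file may have a different name on your Linux and is almost certainly different on Android.
On Linux, you could probably just generate a prf_trace.data and prf_trace.txt file using the same syntax as in run_perf.sh, but I haven't tried this.
If you are running on a laptop and want to get the battery power, remember to disconnect the power cable before you run the script.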
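
If you collect data with your own tooling rather than the run scripts, the file_list.json step could be sketched as below. This is an assumption-laden Python illustration: the layout and field names follow the snippets shown in the input_data_files.json section of this document, while `write_file_list` and the temporary output dir are hypothetical stand-ins, not part of OPPAT.

```python
import json
import tempfile
from pathlib import Path


def write_file_list(out_dir: Path, cur_dir: str, tag: str, files: list) -> Path:
    """Write a file_list.json describing the data files in out_dir."""
    records = [{"cur_dir": cur_dir}, {"cur_tag": tag}]
    # Each file entry refers back to the group tag via the %cur_tag% placeholder.
    records += [{**entry, "tag": "%cur_tag%"} for entry in files]
    path = out_dir / "file_list.json"
    path.write_text(json.dumps({"file_list": records}, indent=1))
    return path


out_dir = Path(tempfile.mkdtemp())  # stand-in for the real output dir
file_list = write_file_list(
    out_dir,
    cur_dir="%root_dir%/oppat_data/lnx/mem_bw4",
    tag="lnx_mem_bw4",
    files=[
        {"bin_file": "prf_trace.data", "txt_file": "prf_trace.txt", "type": "PERF"},
        {"bin_file": "tc_trace.dat", "txt_file": "tc_trace.txt", "type": "TRACE_CMD"},
    ],
)
```

Generating the file with a JSON library avoids the trailing-comma mistakes that are easy to make when editing these files by hand.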

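PCM data support

OPPAT can read and chart PCM .csv files.
Below is a snapshot of the list of charts created.
Unfortunately you have to apply a patch to PCM to create a file with an absolute timestamp so that OPPAT can process it.
- This is because the PCM csv file doesn't have a timestamp I can use to correlate with the other data sources.
I've added the PCM patch here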

Building OPPAT

On Linux, type make in the OPPAT root dir
- If everything works there should be a bin/oppat.x file
On Windows, you need to:
- Install the Windows version of GNU make. See http://gnuwin32.sourceforge.net/packages/make.htm or, for just the minimum required binaries, use http://gnuwin32.sourceforge.net/downlinks/downlinks/make.php
- Put this new 'make' binary in the path
- You need a current Visual Studio 2015 or 2017 C/C++ compiler (I used both the VS 2015 Professional and the VS 2017 Community compilers)
- Start the Windows Visual Studio x64 native cmd prompt box
- Type make in the OPPAT root dir
- If everything works there should be a bin\oppat.exe file
If you are changing the source code
- You need to install Perl
- On Linux, in the OPPAT root dir do: ./mk_depends.sh . This will create a depends_lnx.mk dependency file.
- On Windows, in the OPPAT root dir do: .\mk_depends.bat . This will create a depends_win.mk dependency file.
If you are going to run the sample run_perf.sh or run_xperf.bat scripts, then you need to build the spin and wait utilities:
- On Linux: ./mk_spin.sh
- On Windows: .\mk_spin.bat

Running OPPAT

Run the data collection steps above
- now you have data files in a dir (if you ran the default run_* scripts):
  - on Windows ..\oppat_data\win\mem_bw4
  - on Linux ../oppat_data/lnx/mem_bw4
- You need to add the created files to the input_files/input_data_files.json file.
Starting OPPAT reads all the data files and starts the web server
on Windows to generate the haswell cpu_diagram (assuming your data dir is ..\oppat_data\lnx\mem_bw7)

   bin\oppat.exe -r ..\oppat_data\lnx\mem_bw7 --cpu_diagram web\haswell_block_diagram.svg > tmp.txt

on Windows (assuming your data dir is ..\oppat_data\win\mem_bw4)

   bin\oppat.exe -r ..\oppat_data\win\mem_bw4 > tmp.txt

on Linux (assuming your data dir is ../oppat_data/lnx/mem_bw4)

   bin/oppat.x -r ../oppat_data/lnx/mem_bw4 > tmp.txt

Now connect your browser to localhost:8081
You can create a standalone HTML file with the '--web_file some_file.html' option. For example:

   bin/oppat.exe -r ../oppat_data/lnx/mem_bw4 --web_file tst2.html > tmp.txt

Then you can load the file into the browser with the URL address: file:///C:/some_path/oppat/tst2.html

Derived Events

'Derived events' are new events created from 1 or more events in a data file.

Say you want to use the ETW Win32k InputDeviceRead events to track when the user is typing or moving the mouse.
- ETW has 2 events:
  - Microsoft-Windows-Win32k/InputDeviceRead/win:Start
  - Microsoft-Windows-Win32k/InputDeviceRead/win:Stop
- So with the 2 above events we know when the system started reading input and we know when it stopped reading input
- But OPPAT plots just 1 event per chart (usually... the cpu_busy chart is different)
- We need a new event that marks the end of the InputDeviceRead and the duration of the event
The derived event needs:
- a new event name (in charts.json... see for example the InputDeviceRead event)
- a LUA file and routine in src_lua subdir
- 1 or more 'used events' from which the new event is derived
  - the derived events have to be in the same file
  - For the InputDeviceRead example, the 2 Win32k InputDeviceRead Start/Stop events above are used.
The 'used events' are passed to the LUA file/routine (along with the column headers for the 'used events') as the events are encountered in the input trace file
- In the InputDeviceRead lua script:
  - the script records the timestamp and process/pid/tid of a 'start' event
  - when the script gets a matching 'Stop' event (matching on process/pid/tid), the script computes a duration for the new event and passes it back to OPPAT
A 'trigger event' is defined in charts.json and if the current event is the 'trigger event' then (after calling the lua script) the new event is emitted with the new data field(s) from the lua script.
An alternative to the 'trigger event' method is to have the lua script indicate whether or not it is time to write the new event. For instance, the src_lua/prf_CPI.lua script writes a '1' to a variable named ' EMIT ' to indicate that the new CPI event should be written.
The new event will have:
- the name (from the charts.json evt_name field)
- the data from the trigger event (except the event name), with the new field(s) appended
I have tested this on ETW data and for perf/trace-cmd data
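
The Start/Stop pairing described for InputDeviceRead can be pictured with the short sketch below. OPPAT implements this in a LUA script; the Python here only illustrates the matching logic, and `StartStopPairer`, `on_event` and the sample events are hypothetical names and data.

```python
from typing import Dict, Optional, Tuple


class StartStopPairer:
    """Pair Start/Stop events per (process, pid, tid) and yield durations."""

    def __init__(self) -> None:
        self._open: Dict[Tuple[str, int, int], float] = {}

    def on_event(self, name: str, ts: float, process: str, pid: int, tid: int) -> Optional[float]:
        key = (process, pid, tid)
        if name.endswith("win:Start"):
            self._open[key] = ts          # remember when this thread started reading
            return None
        if name.endswith("win:Stop"):
            start = self._open.pop(key, None)
            if start is not None:
                return ts - start         # duration of the derived event
        return None                       # unmatched Stop or unrelated event


EVT = "Microsoft-Windows-Win32k/InputDeviceRead/win:"
pairer = StartStopPairer()
events = [
    (EVT + "Start", 1.0, "app.exe", 10, 11),
    (EVT + "Stop", 1.25, "app.exe", 10, 11),
    (EVT + "Stop", 2.0, "other.exe", 20, 21),  # Stop with no matching Start
]
durations = [pairer.on_event(*evt) for evt in events]
```

Only the matched Stop produces a value, which corresponds to the point at which the new event, stamped at the end of the read and carrying its duration, would be emitted.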

Using the browser GUI Interface

TBD

Defining events and charts in charts.json

TBD

Rules for input_data_files.json

The file 'input_files/input_data_files.json' can be used to maintain a big list of all the data directories you have created.
You can then select the directory by just specifying the file_tag like:
- bin/oppat.x -u lnx_mem_bw4 > tmp.txt # assuming there is a file_tag 'lnx_mem_bw4' in the json file.
The big json file requires you to copy the part of the data dir's file_list.json into input_data_files.json
- in the file_list.json file you will see lines like:

{ "cur_dir" : "%root_dir%/oppat_data/win/mem_bw4" },
{ "cur_tag" : "win_mem_bw4" },
{ "txt_file" : "etw_trace.txt", "tag" : "%cur_tag%", "type" : "ETW" },
{ "txt_file" : "etw_energy2.txt", "wait_file" : "wait.txt", "tag" : "%cur_tag%", "type" : "LUA" }

don't copy the lines like below from the file_list.json file:

{ "file_list" :[ 
  ]}

paste the copied lines into input_data_files.json. Pay attention to where you paste the lines. If you are pasting the lines at the top of input_data_files.json (after the {"root_dir":"/data/ppat/"} line), then you need to add a ',' after the last pasted line or else JSON will complain.
for Windows data files add an entry like below to the input_files/input_data_files.json file:
- yes, use forward slashes:

{ "root_dir" : "/data/ppat/" },
{ "cur_dir" : "%root_dir%/oppat_data/win/mem_bw4" },
{ "cur_tag" : "win_mem_bw4" },
{ "txt_file" : "etw_trace.txt", "tag" : "%cur_tag%", "type" : "ETW" },
{ "txt_file" : "etw_energy2.txt", "wait_file" : "wait.txt", "tag" : "%cur_tag%", "type" : "LUA" }

for Linux data files add an entry like below to the input_files/input_data_files.json file:

{ "root_dir" : "/data/ppat/" },
{ "cur_dir" : "%root_dir%/oppat_data/lnx/mem_bw4" },
{ "cur_tag" : "lnx_mem_bw4" },
{ "bin_file" : "prf_energy.txt", "txt_file" : "prf_energy2.txt", "wait_file" : "wait.txt", "tag" : "%cur_tag%", "type" : "LUA" },
{ "bin_file" : "prf_trace.data", "txt_file" : "prf_trace.txt", "tag" : "%cur_tag%", "type" : "PERF" },
{ "bin_file" : "tc_trace.dat", "txt_file" : "tc_trace.txt", "tag" : "%cur_tag%", "type" : "TRACE_CMD" },

Unfortunately you have to pay attention to proper JSON syntax (such as trailing ','s)
Here is an explanation of the fields:
- The 'root_dir' field only needs to be entered once in the json file.
  - It can be overridden on the oppat cmd line with the -r root_dir_path option
  - If you use the -r root_dir_path option it is as if you had set "root_dir":"root_dir_path" in the json file
  - the 'root_dir' field has to be on a line by itself.
- The cur_dir field applies to all the files after the cur_dir line (until the next cur_dir line)
  - the '%root_dir%' string in the cur_dir field is replaced with the current value of 'root_dir'.
  - the 'cur_dir' field has to be on a line by itself.
- the 'cur_tag' field is a text string used to group the files together. The cur_tag field will be used to replace the 'tag' field on each subsequent line.
  - the 'cur_tag' field has to be on a line by itself.
- For now there are four types of data files indicated by the 'type' field:
  - type:PERF These are Linux perf files. OPPAT currently requires both the binary data file (the bin_file field) created by the perf record cmd and the perf script text file (the txt_file field).
  - type:TRACE_CMD These are Linux trace-cmd files. OPPAT currently requires both the binary dat file (the bin_file field) created by the trace-cmd record cmd and the trace-cmd report text file (the txt_file field).
  - type:ETW These are Windows ETW xperf data files. OPPAT currently requires only the text file (I can't read the binary file). The txt_file is created with xperf ... -a dumper command.
  - type:LUA These files are all text files which will be read by the src_lua/test_01.lua script and converted to OPPAT data.
    - the 'prf_energy.txt' file is perf stat output with Intel RAPL energy data and memory bandwidth data.
    - the 'prf_energy2.txt' file is created by the wait utility and contains battery usage data in the 'perf stat' format.
    - the 'wait.txt' file is created by the wait utility and shows the timestamp when the wait utility began
      - Unfortunately 'perf stat' doesn't report a high resolution timestamp for the 'perf stat' start time

Limitations

The data is not reduced on the back-end so every event is sent to the browser... this can be a ton of data and overwhelm the browser's memory
- I probably should have some data reduction logic but I wanted to get feedback first
- You can clip the files to a time range: oppat.exe -b abs_beg_time -e abs_end_time to reduce the amount of data
  - This is a sort of crude mechanism right now. I just check the timestamp of the sample and discard it if the timestamp is outside the interval. If the sample has a duration it might actually have data for the selected interval...
- There are many cases where you want to see each event as opposed to averages of events.
- On my laptop (with 4 CPUs), running for 10 seconds of data collection runs fine.
- Servers with lots of CPUs or running for a long time will probably blow up OPPAT currently.
- The stacked chart can cause lots of data to be sent because each event on one line is now stacked on every other line.
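
The clipping caveat above (a sample stamped outside the interval may still carry data for it) is easier to see in code. Below is a hedged Python sketch, not OPPAT's implementation: both function names are invented, and it assumes that a sample's timestamp marks the end of the sample with the duration extending backward in time.

```python
def keep_by_timestamp(ts: float, t_beg: float, t_end: float) -> bool:
    """Crude clip: keep the sample only if its timestamp is inside the interval."""
    return t_beg <= ts <= t_end


def keep_by_overlap(ts: float, duration: float, t_beg: float, t_end: float) -> bool:
    """Keep the sample if any part of [ts - duration, ts] overlaps the interval."""
    start = ts - duration
    return start <= t_end and ts >= t_beg


# A 1.0 s sample ending at t=10.5 overlaps the interval [9.0, 10.0] ...
crude = keep_by_timestamp(10.5, 9.0, 10.0)
overlap = keep_by_overlap(10.5, 1.0, 9.0, 10.0)
print(crude, overlap)
```

The timestamp-only check drops this sample even though half of its duration falls inside the requested interval, which is the situation the caveat describes.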
Limited mechanism for a chart that needs more than 1 event on a chart...
- say for computing CPI (cycles per instruction).
- Or where you have one event that marks the 'start' of some action and another event that marks the 'end' of the action
- There is a 'derived events' logic that lets you create a new event from 1 or more other events
- See the derived event section
The user has to supply or install the data collection software:
- on Windows xperf
  - See https://docs.microsoft.com/en-us/windows-hardware/get-started/adk-install
  - You don't need to install the whole ADK... the 'select the parts you want to install' will let you select just the performance tools
- on Linux perf and/or trace-cmd
  - For perf, try:

sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`

  - For trace-cmd, see https://github.com/rostedt/trace-cmd
- You can do (AFAIK) everything in 'perf' that you can in 'trace-cmd' but I have found trace-cmd has little overhead... perhaps because trace-cmd only supports tracepoints whereas perf supports tracepoints, sampling, callstacks and more.
Currently for perf and trace-cmd data, you have to give OPPAT both the binary data file and the post-processed text file.
- Having some of the data come from the binary file speeds things up and is more reliable.
- But I don't want to do the symbol handling and I can't really do the post-processing of the binary data. Near as I can tell you have to be part of the kernel to do the post-processing.
OPPAT requires certain clocks and a certain syntax of 'convert to text' for perf and trace-cmd data.
- OPPAT requires clock_monotonic so that different file timestamps can be correlated.
- When converting the binary data to text (trace-cmd report or 'perf script') OPPAT needs the timestamp to be in nanoseconds.
- see scripts\run_xperf.bat and scripts/run_perf.sh for the required syntax
Given that there might be so many files to read (for example, run_perf.sh generates 7 input files), it is kind of a pain to add these files to the json file input_files/input_data_files.json.
- the run_xperf.bat and run_perf.sh generate a file_list.json in the output directory.
perf has so, so many options... I'm sure it is easy to generate some data which will break OPPAT
- The most obvious way to break OPPAT is to generate too much data (causing browser to run out of memory). I'll probably handle this case better later but for this release (v0.1.0), I just try to not generate too much data.
- For perf I've tested:
  - sampling hardware events (like cycles, instructions, ref-cycles) and callstacks for same
  - software events (cpu-clock) and callstacks for same
  - tracepoints (sched_switch and a bunch of others) with/without callstacks
Zooming using touchpad scroll on Firefox does not seem to work as well as it does on Chrome