Open Power/Performance Analysis Tool (OPPAT)
Table of Contents
- Introduction
- Supported data types
- OPPAT visualization
- Data collection for OPPAT
- PCM data support
- Building OPPAT
- Running OPPAT
- Derived events
- Using the browser GUI interface
- Limitations
Introduction
Open Power/Performance Analysis Tool (OPPAT) is a cross-OS, cross-architecture power and performance analysis tool.
- Cross-OS: supports Windows ETW trace files and Linux/Android perf/trace-cmd trace files
- Cross-architecture: supports Intel and ARM chip hardware events (using perf and/or PCM)
The project web page is https://patinnc.github.io
The source repo is https://github.com/patinnc/oppat
I've added an OS view (OS_VIEW) to the CPU block diagram feature. It is based on Brendan Gregg's pages such as http://www.brendangregg.com/linuxperf.html. Here is some sample data for a run of an older version of Geekbench v2.4.2 (32-bit code) on an ARM 64-bit Ubuntu Mate v18.04.2 Raspberry Pi 3 B+ with 4 ARM Cortex A53 CPUs:
- Video of the OS_VIEW and ARM A53 CPU block diagram changing while running Geekbench:

- There are some introductory slides that try to explain the OS_VIEW and CPU_DIAGRAM layout, then one slide showing the results for each of the 30 subtests
- Excel file of the data in the movie: Geekbench excel file for the movie
- HTML of the data in the movie... see Geekbench v2.4.2 on a 4-core ARM Cortex A53 with OS_VIEW and CPU diagram.
- Dashboard PNGs of all 30 stages sorted by increasing instructions/sec... see ARM Cortex A53 Raspberry Pi 3 with CPU diagram 4-core chip dashboard running Geekbench.
Here is some sample data for a run of my spin benchmark (memory/cache bandwidth tests and a "spin" keep-cpu-busy test) on the Raspberry Pi 3 B+ (Cortex A53) CPU:
- Video of the OS_VIEW and ARM A53 CPU block diagram changing while running spin:

- There are some introductory slides that try to explain the OS_VIEW charts and CPU_DIAGRAM layout, one slide showing when each subtest occurred (in seconds, so you can jump directly to t=x secs for that subtest), and then one slide for each of the 5 subtests
- Excel file of the data in the movie
- HTML of the data in the movie... see ARM Cortex A53 Raspberry Pi 3 with CPU diagram 4-core chip running the spin benchmark.
- Dashboard PNGs of all 5 stages sorted by increasing instructions/sec... see ARM Cortex A53 Raspberry Pi 3 with CPU diagram 4-core chip dashboard running the spin benchmark.
Here is some sample data for running Geekbench on a Haswell CPU:
- Video of the OS_VIEW and Haswell CPU block diagram changing while running Geekbench:

- There are some introductory slides that try to explain the OS_VIEW charts and CPU_DIAGRAM layout, one slide showing when each subtest occurred (in seconds, so you can jump directly to t=x secs for that subtest), and then one slide for each of the 50 subtests
- Excel file of the data in the movie
- HTML of the data in the movie... see Intel Haswell with CPU diagram 4-CPU chip running Geekbench.
- Dashboard PNGs of all 50 stages sorted by increasing uops retired/sec... see Intel Haswell with CPU diagram 4-core chip dashboard running Geekbench.
Here is some data for running my "spin" benchmark with 4 subtests on a Haswell CPU:
- The first subtest is a memory read bandwidth test. The L2/L3/memory blocks are heavily used and stalled during the test. RAT uops/cycle is low since the RAT is mostly stalled.
- The second subtest is an L3 read bandwidth test. Memory BW is now low. The L2 and L3 blocks are heavily used and stalled during the test. RAT uops/cycle is higher since the RAT stalls less.
- The third subtest is an L2 read bandwidth test. L3 and memory BW are now low. The L2 block is heavily used and stalled during the test. RAT uops/cycle is even higher since the RAT stalls even less.
- The fourth subtest is a spin (just an add loop) test. L2, L3 and memory BW are near zero. RAT uops/cycle is about 3.3 uops/cycle, which is close to the 4 uops/cycle max.
- Video of the Haswell CPU block diagram changing while profiling a run of "spin". See

- Excel file of the data in the movie: excel file for the spin movie
- HTML of the data in the movie... see Intel Haswell with CPU diagram 4-CPU chip running the spin benchmark.
The "Intel Haswell with CPU diagram" data collection is for a 4-CPU Intel chip, Linux OS, HTML file with 50+ HW events collected via perf sampling plus other collected data. The cpu_diagram feature:
- starts with a block diagram SVG from wikichip.org (used with permission),
- shows resource limits (such as max BW, max bytes/cycle on a path, min cycles/uop, etc.),
- computes metrics for resource usage
- Below is the table for the memory read bandwidth test, which shows the resource usage information (and an estimate of whether the CPU is stalled due to that usage). The HTML table (but not the PNG) has pop-up info when hovering over a field. The table shows:
- the core is stalled on memory bandwidth, at 55% of the max possible 25.9 GB/s BW. This is a memory BW test
- the superqueue (SQ) is full on core 0 and core 1 (62.3%), so it can't handle more L2 requests
- the line fill buffers (FB) are full (30% and 51%), so data can't move from L2 to L1D
- the result is back-end stalls (88% and 87%) with no uops retiring.
- the uops appear to be coming from the loop stream detector (LSD), since LSD uops/cycle is about the same as RAT uops/cycle.
- Screenshot of the Haswell CPU diagram memory BW table
- Below is the table for the L3 read bandwidth test.
- Now memory BW and L3 miss bytes/cycle are about zero.
- The SQ stalls less (since we aren't waiting on memory).
- L2 transaction bytes/cycle is about 2x higher, at about 67% of the max 64 bytes/cycle.
- UOPS_RETIRED stalls/cycle has dropped from 88% in the mem BW test to 66%.
- Fill buffer stalls are now more than 2x higher. Uops still come from the LSD.
- Screenshot of the Haswell CPU diagram L3 BW table
- Below is the table for the L2 read bandwidth test.
- L2 miss bytes/cycle is much lower than in the L3 test.
- UOPS_RETIRED % stalled is now about half of the L3 test at 34%, and FB stalls are also lower at about 17%.
- Uops still come from the LSD.
- Screenshot of the Haswell CPU diagram L2 BW table
- Below is the table for the spin test (no loads, just an add loop).
- Now there are almost zero memory subsystem stalls.
- Uops come from the decoded stream buffer (DSB).
- RAT retired_uops/cycle, at 3.31, is close to the max possible 4.0 uops/cycle.
- RAT retired_uops % stalled is very low at 8%.
- Screenshot of the Haswell CPU diagram spin table
Currently I only have CPU_DIAGRAM movies for Haswell and the ARM A53 (since I don't have other systems to test on), but it wouldn't be hard to add other block diagrams. You still get all the other charts, just not the CPU_DIAGRAM.
Below is one of the regular charts. The "CPU_BUSY" chart shows what is running on each CPU and which events occurred on each CPU. For example, the green circle shows a spin.x thread running on CPU 1. The red circle shows some events that occurred on CPU 1. This chart is modeled on trace-cmd's kernelshark chart. More info on the CPU_BUSY chart is in the chart types section. The callout box shows the event data (including the callstack, if any) for the event under the cursor. Unfortunately the Windows screenshot doesn't capture the cursor.
Here are some sample HTML files. Most of the files cover a shorter interval but some are the "full" 8 second runs. The files won't load directly from the repo but they will load from the project web page: https://patinnc.github.io
- Intel Haswell with CPU diagram, 4-CPU chip, Linux OS, HTML file with 50+ HW events collected via perf sampling, or
- Intel 4-CPU chip, Windows OS, HTML file with xperf sampling and 1 HW event, or
- full ~8 second Intel 4-CPU chip, Windows OS, HTML file with PCM and xperf sampling, or
- Intel 4-CPU chip, Linux OS, HTML file with 10 HW events in 2 multiplexed groups.
- ARM (Broadcom A53) chip, Raspberry Pi 3, Linux HTML file with 14 HW events (CPI, L2 misses, mem BW, etc. in 2 multiplexed groups).
- 11 MB full version of the above ARM (Broadcom A53) chip, Raspberry Pi 3, Linux HTML file with 14 HW events (CPI, L2 misses, mem BW, etc. in 2 multiplexed groups).
Some of the above files are ~2 second intervals extracted from the ~8 second long runs. Here is the full 8 second run:
- Full 8 second Linux run sample HTML compressed file, here for a more complete file. This file does a JavaScript zlib decompression of the chart data, so you will see a message asking you to wait (about 20 seconds) during the decompression.
Supported data types
- Linux perf and/or trace-cmd performance files (binary and text files),
- perf stat output is also accepted
- Intel PCM data,
- other data (imported using LUA scripts),
- so this should work with data from regular Linux or Android
- currently, for perf and trace-cmd data, OPPAT requires both the binary file and the post-processed text file, and there are some restrictions on the 'record' command line and on the 'perf script'/'trace-cmd report' command line.
- OPPAT could be made to use just the perf/trace-cmd text output file, but currently it requires both the binary and the text file
- Windows ETW data (collected by xperf and dumped to text) or Intel PCM data,
- arbitrary power or performance data, supported using LUA scripts (so you don't need to recompile the C++ code to import other data (unless LUA performance becomes an issue))
- the data files can be read on Linux or Windows regardless of where the files originated (so you can read perf/trace-cmd files on Windows or ETW text files on Linux)
OPPAT Visualization
Here are some complete sample visualization HTML files: a Windows sample HTML file or this Linux sample HTML file. If you are viewing this in the repo (not on the github.io project web site) you have to download the files and then load them into the browser. These are standalone web files created by OPPAT which can, for example, be emailed to someone else or (as done here) posted on a web server.
OPPAT works better in Chrome than in Firefox, mostly because zooming with a 2-finger touchpad scroll works better on Chrome.
OPPAT has 3 visualization modes:
- the usual charting mechanism (where the OPPAT back-end reads the data files and sends the data to the browser)
- you can also create a standalone web page which is equivalent to the "usual charting mechanism" but can be exchanged with other users... the standalone web page has all the scripts and data built in, so it can be emailed to someone and they can just load it into a browser. See the HTML files in sample_html_files referenced above and (for a longer version of lnx_mem_bw4) see the compressed lnx_mem_bw4 html file in sample_html_files
- you can '--save' the data to a json file and then '--load' the file later. The saved json file has the data OPPAT needs to send to the browser. This avoids re-reading the input perf/xperf files but won't pick up any changes made to charts.json. The full HTML file created with the --web_file option is only a little bigger than the --save file. The --save/--load mode requires building OPPAT. See the sample 'save' files in the sample_data_json_files subdir. (A sketch of these invocations follows this list.)
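Below is a rough sketch of how these three modes might be invoked. The --save/--load/--web_file option names come from the text above, but the exact argument syntax (whether --save and --load take a file path directly) and the saved_data.json name are my assumptions, not confirmed usage:

```
# 1) usual charting mode: read the data dir and start the web server
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 > tmp.txt

# 2) standalone web page: write the scripts and data into one self-contained HTML file
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 --web_file tst2.html > tmp.txt

# 3) save the browser-ready data to a json file, then load it later
#    (skips re-reading the perf/xperf files but won't pick up charts.json changes)
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 --save saved_data.json > tmp.txt
bin/oppat.x --load saved_data.json > tmp.txt
```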
VIZ general info
- all data is charted in the browser (on Linux or Windows)
- charts are defined in a JSON file
- the browser interface is somewhat like Windows WPA (navbar on the left side).
- the left navbar (slide-out menu on the left) is shown below.

- charts are grouped by category (GPU, CPU, power, etc.)
- categories are defined and assigned in input_files/charts.json
- charts can all be hidden or selectively displayed by clicking the chart in the navbar.
- hovering over a chart title in the left nav menu scrolls that chart into view
- data from one set of files can be charted next to another set
- so you can compare, say, a Linux perf run vs a Windows ETW run
- the chart below shows Linux vs Windows power usage:

- I only have battery power data available on Linux and Windows.
- Many sites have better power data (volts/current/power at msec (or faster) rates). Incorporating these kinds of power data (such as from Kratos or Qualcomm MDPs) would be easy but I don't have access to the data.
- or compare 2 different runs on the same platform
- a file group tag (file_tag) is prefixed to the title to distinguish the charts
- the 'tag' is defined in the data dir's file_list.json file and/or in input_files/input_data_files.json.
- input_files/input_data_files.json is a list of all your OPPAT data dirs (but the user has to maintain it).
- charts with the same title are drawn one after the other
Chart features:
Chart types:
Data collection for OPPAT
Collecting performance and power data is very "situational". One person will want to run a script, another needs to start the measurement with a button push, then start a video, then end the collection with another button push. I have a script for Windows and one for Linux that demonstrate:
- starting data collection,
- running a workload,
- stopping data collection
- post-processing the data (creating the text files from the perf/xperf/trace-cmd binary data)
- putting all the data files into an output dir
- creating a file_list.json file in the output dir (which tells OPPAT the names and types of the data files)
Steps to collect data using the scripts (a minimal sketch of the overall flow is shown after this list):
- build the spin.exe (spin.x) and wait.exe (wait.x) utilities
- from the OPPAT root dir:
- on Linux:
./mk_spin.sh - on Windows:
.\mk_spin.bat (from a Visual Studio cmd box) - the binaries will be put in the ./bin subdir
- start one of the provided run scripts:
- run_perf_x86_haswell.sh - for the Haswell CPU_DIAGRAM data collection
- on Linux, type:
sudo bash ./scripts/run_perf_x86_haswell.sh - by default the script puts the data into the dir ../oppat_data/lnx/mem_bw7
- run_perf.sh - you need trace-cmd and perf installed
- on Linux, type:
sudo bash ./scripts/run_perf.sh - by default the script puts the data into the dir ../oppat_data/lnx/mem_bw4
- run_xperf.bat - you need xperf.exe installed.
- on Windows, from a cmd box with administrator privileges, type:
.\scripts\run_xperf.bat - by default the script puts the data into the dir ..\oppat_data\win\mem_bw4
- edit the run scripts if you want to change the defaults
- besides the data files, the run scripts also create a file_list.json file in the output dir. OPPAT uses the file_list.json file to figure out the names and types of the files in the output dir.
- the 'workload' for the run scripts is spin.x (or spin.exe), which spins on 1 CPU for 4 seconds, then does a memory bandwidth test on all CPUs, then spins again for 4 seconds.
- another program, wait.x/wait.exe, is also started in the background. wait.cpp reads my laptop's battery info. It works on my dual-boot Windows 10/Linux Ubuntu laptop. The sysfs files probably have different names on your Linux system and are almost certainly different on Android.
- on Linux, you can probably generate just the prf_trace.data and prf_trace.txt files using the same syntax as in run_perf.sh, but I haven't tried it.
- if you are going to run on a laptop and want to capture battery power, remember to unplug the power cable before running the script.
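For reference, here is a minimal sketch of the flow the run scripts implement. This is not the actual run_perf.sh (which collects many more events and also runs trace-cmd and the wait utility); the event list, durations, and output dir here are illustrative, and the clock/timestamp options reflect the requirements described in the Limitations section (monotonic clock, nanosecond timestamps):

```
#!/bin/bash
# Illustrative sketch only -- see scripts/run_perf.sh for the real, supported syntax.
OUT=../oppat_data/lnx/my_run        # hypothetical output dir
mkdir -p $OUT

# 1) start data collection (a monotonic clock lets OPPAT correlate the files)
perf record -a -k CLOCK_MONOTONIC -e cycles,instructions -o $OUT/prf_trace.data -- sleep 8 &

# 2) run the workload while collection is active (the real scripts pass options to spin.x)
./bin/spin.x

# 3) data collection stops when the 8 second 'sleep' ends
wait

# 4) post-process: create the text file OPPAT needs, with nanosecond timestamps
perf script --ns -i $OUT/prf_trace.data > $OUT/prf_trace.txt

# 5) the real scripts also write a file_list.json into $OUT naming each file and its type
```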
PCM data support
- OPPAT can read and chart PCM .csv files.
- Below is a snapshot of the list of charts created.
- Unfortunately you have to patch PCM to create a file with absolute timestamps so that OPPAT can process it.
- This is because the PCM csv file doesn't have a timestamp I can use to correlate with the other data sources.
- I've added the PCM patch here
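For reference, collecting a PCM csv file typically looks something like the line below; the option spelling varies between PCM versions, so treat it as an assumption and check your pcm build's help output. The resulting file still needs the absolute-timestamp patch mentioned above before OPPAT can correlate it with the other data:

```
# assumed PCM invocation: sample once per second and write csv output
# (the output file name is arbitrary)
sudo ./pcm.x 1 -csv=pcm_data.csv
```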
Building OPPAT
- On Linux:
make in the OPPAT root dir - if everything works there should be a bin/oppat.x file (a combined Linux build example is shown after this section)
- On Windows, you need to:
- install a Windows version of GNU make. See http://gnuwin32.sourceforge.net/packages/make.htm or, for just the minimal required binaries, use http://gnuwin32.sourceforge.net/downlinks/make.php
- put this new 'make' binary in your PATH
- you need a current Visual Studio 2015 or 2017 C/C++ compiler (I've used both the VS 2015 Professional and the VS 2017 Community compilers)
- start a Windows Visual Studio x64 native cmd prompt box
- in the OPPAT root dir, enter
make - if everything works there should be a bin\oppat.exe file
- If you are going to change the source code:
- you need perl installed
- on Linux, in the OPPAT root dir do:
./mk_depends.sh . This will create a depends_lnx.mk dependency file. - on Windows, in the OPPAT root dir do:
.\mk_depends.bat . This will create a depends_win.mk dependency file.
- If you are going to run the sample run_perf.sh or run_xperf.bat scripts, then you need to build the spin and wait utilities:
- On Linux:
./mk_spin.sh - On Windows:
.\mk_spin.bat
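Putting the Linux build steps above together, a from-scratch build looks roughly like this (the clone URL and output locations come from the text above; the rest assumes a stock g++/make toolchain):

```
# build OPPAT and its helper utilities on Linux (sketch of the steps described above)
git clone https://github.com/patinnc/oppat
cd oppat
make                 # should produce bin/oppat.x
./mk_spin.sh         # builds the spin.x and wait.x workload utilities into ./bin
# only needed if you plan to modify the source (requires perl):
./mk_depends.sh      # regenerates the depends_lnx.mk dependency file
```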
Running OPPAT
- Run the data collection steps above
- now you have data files in a dir (if you ran the default run_* scripts):
- on Windows ..\oppat_data\win\mem_bw4
- on Linux ../oppat_data/lnx/mem_bw4
- You need to add the created files to the input_files/input_data_files.json file:
- Starting OPPAT reads all the data files and starts the web server
- on Windows to generate the haswell cpu_diagram (assuming your data dir is ..\oppat_data\lnx\mem_bw7)
bin\oppat.exe -r ..\oppat_data\lnx\mem_bw7 --cpu_diagram web\haswell_block_diagram.svg > tmp.txt
- on Windows (assuming your data dir is ..\oppat_data\win\mem_bw4)
bin\oppat.exe -r ..\oppat_data\win\mem_bw4 > tmp.txt
- on Linux (assuming your data dir is ../oppat_data/lnx/mem_bw4)
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 > tmp.txt
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 --web_file tst2.html > tmp.txt
- Then you can load the file into the browser with the URL address:
file:///C:/some_path/oppat/tst2.html
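So an end-to-end Linux session, using the default run_perf.sh output dir and the commands shown above, might look like the sketch below (opening the generated tst2.html with xdg-open at the end is just one way to view it):

```
# 1) collect data with the provided script (writes to ../oppat_data/lnx/mem_bw4 by default)
sudo bash ./scripts/run_perf.sh

# 2) read the data dir and write a standalone web page
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 --web_file tst2.html > tmp.txt

# 3) open the result in a browser (or enter the file:// URL manually)
xdg-open tst2.html
```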
Derived Events
'Derived events' are new events created from 1 or more events in a data file.
- Say you want to use the ETW Win32k InputDeviceRead events to track when the user is typing or moving the mouse.
- ETW has 2 events:
- Microsoft-Windows-Win32k/InputDeviceRead/win:Start
- Microsoft-Windows-Win32k/InputDeviceRead/win:Stop
- So with the 2 above events we know when the system started reading input and we know when it stopped reading input
- But OPPAT plots just 1 event per chart (usually... the cpu_busy chart is different)
- We need a new event that marks the end of the InputDeviceRead and the duration of the event
- The derived event needs:
- a new event name (in charts.json... see for example the InputDeviceRead event)
- a LUA file and routine in src_lua subdir
- 1 or more 'used events' from which the new event is derived
- the derived events have to be in the same file
- For the InputDeviceRead example, the 2 Win32k InputDeviceRead Start/Stop events above are used.
- The 'used events' are passed to the LUA file/routine (along with the column headers for the 'used events') as the events are encountered in the input trace file
- In the InputDeviceRead lua script:
- the script records the timestamp and process/pid/tid of a 'start' event
- when the script gets a matching 'Stop' event (matching on process/pid/tid), the script computes a duration for the new event and passes it back to OPPAT
- A 'trigger event' is defined in charts.json and if the current event is the 'trigger event' then (after calling the lua script) the new event is emitted with the new data field(s) from the lua script.
- An alternative to the 'trigger event' method is to have the lua script indicate whether or not it is time to write the new event. For instance, the src_lua/prf_CPI.lua script writes a '1' to a variable named 'EMIT' to indicate that the new CPI event should be written.
- The new event will have:
- the name (from the charts.json evt_name field)
- the data from the trigger event (except the event name) plus the new fields (appended)
- I have tested this on ETW data and on perf/trace-cmd data
Using the browser GUI Interface
TBD
Defining events and charts in charts.json
TBD
Rules for input_data_files.json
- The file 'input_files/input_data_files.json' can be used to maintain a big list of all the data directories you have created.
- You can then select the directory by just specifying the file_tag like:
- bin/oppat.x -u lnx_mem_bw4 > tmp.txt # assuming there is a file_tag 'lnx_mem_bw4' in the json file.
- The big json file requires you to copy the part of the data dir's file_list.json into input_data_files.json
- in the file_list.json file you will see lines like:
{ "cur_dir" : " %root_dir%/oppat_data/win/mem_bw4 " },
{ "cur_tag" : " win_mem_bw4 " },
{ "txt_file" : " etw_trace.txt " , "tag" : " %cur_tag% " , "type" : " ETW " },
{ "txt_file" : " etw_energy2.txt " , "wait_file" : " wait.txt " , "tag" : " %cur_tag% " , "type" : " LUA " }- don't copy the lines like below from the file_list.json file:
- paste the copied lines into input_data_files.json. Pay attention to where you paste the lines. If you are pasting the lines at the top of input_data_files.json (after the
{"root_dir":"/data/ppat/"}, then you need add a ',' after the last pasted line or else JSON will complain. - for Windows data files add an entry like below to the input_filesinput_data_files.json file:
- yes, use forward slashes:
{ "root_dir" : " /data/ppat/ " },
{ "cur_dir" : " %root_dir%/oppat_data/win/mem_bw4 " },
{ "cur_tag" : " win_mem_bw4 " },
{ "txt_file" : " etw_trace.txt " , "tag" : " %cur_tag% " , "type" : " ETW " },
{ "txt_file" : " etw_energy2.txt " , "wait_file" : " wait.txt " , "tag" : " %cur_tag% " , "type" : " LUA " }- for Linux data files add an entry like below to the input_filesinput_data_files.json file:
{ "root_dir" : " /data/ppat/ " },
{ "cur_dir" : " %root_dir%/oppat_data/lnx/mem_bw4 " },
{ "cur_tag" : " lnx_mem_bw4 " },
{ "bin_file" : " prf_energy.txt " , "txt_file" : " prf_energy2.txt " , "wait_file" : " wait.txt " , "tag" : " %cur_tag% " , "type" : " LUA " },
{ "bin_file" : " prf_trace.data " , "txt_file" : " prf_trace.txt " , "tag" : " %cur_tag% " , "type" : " PERF " },
{ "bin_file" : " tc_trace.dat " , "txt_file" : " tc_trace.txt " , "tag" : " %cur_tag% " , "type" : " TRACE_CMD " },- Unfortunately you have to pay attention to proper JSON syntax (such as trailing ','s)
- Here is an explanation of the fields:
- The 'root_dir' field only needs to be entered once in the json file.
- It can be overridden on the oppat cmd line with the -r root_dir_path option (see the example after this list)
- If you use the -r root_dir_path option it is as if you had set "root_dir":"root_dir_path" in the json file
- the 'root_dir' field has to be on a line by itself.
- The cur_dir field applies to all the files after the cur_dir line (until the next cur_dir line)
- the '%root_dir%' string in the cur_dir field is replaced with the current value of 'root_dir'.
- the 'cur_dir' field has to be on a line by itself.
- the 'cur_tag' field is a text string used to group the files together. The cur_tag field will be used to replace the 'tag' field on each subsequent line.
- the 'cur_tag' field has to be on a line by itself.
- For now there are four types of data files indicated by the 'type' field:
- type:PERF These are Linux perf files. OPPAT currently requires both the binary data file (the bin_file field) created by the perf record cmd and the perf script text file (the txt_file field).
- type:TRACE_CMD These are Linux trace-cmd files. OPPAT currently requires both the binary dat file (the bin_file field) created by the trace-cmd record cmd and the trace-cmd report text file (the txt_file field).
- type:ETW These are Windows ETW xperf data files. OPPAT currently requires only the text file (I can't read the binary file). The txt_file is created with the xperf ... -a dumper command.
- type:LUA These files are all text files which will be read by the src_lua/test_01.lua script and converted to OPPAT data.
- the 'prf_energy.txt' file is perf stat output with Intel RAPL energy data and memory bandwidth data.
- the 'prf_energy2.txt' file is created by the wait utility and contains battery usage data in the 'perf stat' format.
- the 'wait.txt' file is created by the wait utility and shows the timestamp when the wait utility began
- Unfortunately 'perf stat' doesn't report a high resolution timestamp for the 'perf stat' start time
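To summarize the two ways of pointing OPPAT at data described in this section and in the Running OPPAT section (treat this as a recap of the commands above, not new syntax; check the program's usage output for the authoritative options):

```
# pick a data set by its file_tag from input_files/input_data_files.json
bin/oppat.x -u lnx_mem_bw4 > tmp.txt

# or point at the data with -r, which acts like setting "root_dir" in the json file
bin/oppat.x -r ../oppat_data/lnx/mem_bw4 > tmp.txt
```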
Limitations
- The data is not reduced on the back-end, so every event is sent to the browser... this can be a ton of data and can overwhelm the browser's memory
- I probably should have some data reduction logic but I wanted to get feedback first
- You can clip the files to a time range:
oppat.exe -b abs_beg_time -e abs_end_time to reduce the amount of data
- This is a sort of crude mechanism right now. I just check the timestamp of the sample and discard it if the timestamp is outside the interval. If the sample has a duration it might actually have data for the selected interval...
- There are many cases where you want to see each event as opposed to averages of events.
- On my laptop (with 4 CPUs), running for 10 seconds of data collection runs fine.
- Servers with lots of CPUs or running for a long time will probably blow up OPPAT currently.
- The stacked chart can cause lots of data to be sent because each event on one line is stacked on every other line.
- Limited mechanism for a chart that needs more than 1 event on a chart...
- say for computing CPI (cycles per instruction).
- Or where you have one event that marks the 'start' of some action and another event that marks the 'end' of the action
- There is a 'derived events' logic that lets you create a new event from 1 or more other events
- See the derived event section
- The user has to supply or install the data collection software:
- on Windows xperf
- See https://docs.microsoft.com/en-us/windows-hardware/get-started/adk-install
- You don't need to install the whole ADK... the 'select the parts you want to install' will let you select just the performance tools
- on Linux perf and/or trace-cmd
sudo apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
- For trace-cmd, see https://github.com/rostedt/trace-cmd
- You can do (AFAIK) everything in 'perf' that you can in 'trace-cmd', but I have found trace-cmd has less overhead... perhaps because trace-cmd only supports tracepoints whereas perf supports tracepoints, sampling, callstacks and more.
- Currently for perf and trace-cmd data, you have to give OPPAT both the binary data file and the post-processed text file.
- Having some of the data come from the binary file speeds things up and is more reliable.
- But I don't want to do the symbol handling and I can't really do the post-processing of the binary data. As near as I can tell, you have to be part of the kernel to do the post-processing.
- OPPAT requires certain clocks and a certain syntax of 'convert to text' for perf and trace-cmd data.
- OPPAT requires clock_monotonic so that different file timestamps can be correlated.
- When converting the binary data to text (trace-cmd report or 'perf script') OPPAT needs the timestamp to be in nanoseconds.
- see scripts\run_xperf.bat and scripts/run_perf.sh for the required syntax
- given that there might be so many files to read (for example, run_perf.sh generates 7 input files), it is kind of a pain to add these files to the json file input_files/input_data_files.json.
- the run_xperf.bat and run_perf.sh scripts generate a file_list.json in the output directory.
- perf has so, so many options... I'm sure it is easy to generate some data which will break OPPAT
- The most obvious way to break OPPAT is to generate too much data (causing the browser to run out of memory). I'll probably handle this case better later, but for this release (v0.1.0) I just try to not generate too much data.
- For perf I've tested:
- sampling hardware events (like cycles, instructions, ref-cycles) and callstacks for same
- software events (cpu-clock) and callstacks for same
- tracepoints (sched_switch and a bunch of others) with/without callstacks
- Zooming using touchpad scroll on Firefox doesn't seem to work as well as it does on Chrome