Metrics exposed by Scaphandre
With the stdout exporter, you can see all metrics available on your machine with the --raw-metrics flag.
With the Prometheus exporter, all metrics have a HELP section provided on /metrics (or whatever suffix you chose to expose them).
Here are some key metrics that you will most probably be interested in:
scaph_host_power_microwatts: Aggregation of several measurements to estimate the power usage of the whole host, in microwatts (GAUGE). It may be the same as the RAPL PSYS measurement (see RAPL domains) if available, or a combination of the RAPL PKG and DRAM domains plus an estimation of the power usage of other hardware components.
scaph_process_power_consumption_microwatts{exe="$PROCESS_EXE",pid="$PROCESS_PID",cmdline="path/to/exe --and-maybe-options"}: Power consumption due to the process, measured at the topology level, in microwatts (GAUGE). PROCESS_EXE is the name of the executable and PROCESS_PID is the pid of the process.
For more details on the labels of that metric, see the section "Getting per process data with scaph_process_* metrics".
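As a rough illustration of how these gauges can be consumed outside of Prometheus, the sketch below parses a small sample of text-format exporter output and sums per-process power by the exe label. The sample lines, process names and values are made up, and the parser is deliberately simplified (it ignores optional timestamps, for instance), so treat it as a starting point rather than a complete implementation.

```python
import re
from collections import defaultdict

# Illustrative scrape output; the process names, pids and values are made up.
SAMPLE = '''\
# TYPE scaph_process_power_consumption_microwatts gauge
scaph_process_power_consumption_microwatts{exe="nginx",pid="101",cmdline="nginx -g daemon off;"} 1500000
scaph_process_power_consumption_microwatts{exe="nginx",pid="102",cmdline="nginx: worker process"} 500000
scaph_process_power_consumption_microwatts{exe="postgres",pid="200",cmdline="postgres -D /data"} 750000
scaph_host_power_microwatts 12000000
'''

# Simplified patterns: metric name, optional {labels}, then a single value.
SAMPLE_LINE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Yield (metric_name, labels, value) for each sample line of the text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match:
            name, raw_labels, value = match.groups()
            yield name, dict(LABEL.findall(raw_labels or "")), float(value)


def watts_by_exe(text):
    """Sum per-process power by the exe label, converted from microwatts to watts."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "scaph_process_power_consumption_microwatts":
            totals[labels.get("exe", "")] += value / 1_000_000
    return dict(totals)


print(watts_by_exe(SAMPLE))
```

Grouping by exe is only one choice; the same loop could group by pid or cmdline instead.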
Here are some lower-level metrics that you may want if you need to make more complex calculations and data processing:
scaph_host_energy_microjoules: Energy measurement for the whole host, as extracted from the sensor, in microjoules (COUNTER).
scaph_socket_power_microwatts{socket_id="$SOCKET_ID"}: Power measurement relative to a CPU socket, in microwatts (GAUGE). SOCKET_ID is the numerical id of the socket.
If your machine provides the RAPL PSYS domain (see RAPL domains), you can get the raw energy counter for PSYS/platform with scaph_host_rapl_psys_microjoules. Note that scaph_host_power_microwatts is based on this PSYS counter if it is available.
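Energy counters such as scaph_host_energy_microjoules or scaph_host_rapl_psys_microjoules only become a power figure once two readings are compared over time. The sketch below shows that arithmetic under simple assumptions: the function name and the readings are hypothetical, and a counter that goes backwards is treated as unusable rather than corrected. Since microjoules divided by seconds gives microwatts, no unit conversion is needed.

```python
def average_power_microwatts(energy_start_uj, time_start_s, energy_end_uj, time_end_s):
    """Average power between two readings of an energy counter.

    Energy is in microjoules and time in seconds, so the result is in microwatts.
    Returns None when the interval is empty or the counter went backwards
    (for example after a restart), since no meaningful rate can be derived then.
    """
    elapsed = time_end_s - time_start_s
    delta = energy_end_uj - energy_start_uj
    if elapsed <= 0 or delta < 0:
        return None
    return delta / elapsed


# Made-up readings: 5 J consumed over 2 s is an average of 2.5 W.
print(average_power_microwatts(1_000_000, 0.0, 6_000_000, 2.0))  # 2500000.0
```

In a Prometheus setup the same idea is usually expressed with a rate over the counter instead of manual subtraction.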
Since 1.0.0, the following host metrics are available as well:
scaph_host_swap_total_bytes: Total swap space on the host, in bytes.
scaph_host_swap_free_bytes: Swap space free to be used on the host, in bytes.
scaph_host_memory_free_bytes: Random Access Memory free to be used (not reused) on the host, in bytes.
scaph_host_memory_available_bytes: Random Access Memory available to be re-used on the host, in bytes.
scaph_host_memory_total_bytes: Random Access Memory installed on the host, in bytes.
scaph_host_disk_total_bytes: Total disk size, in bytes.
scaph_host_disk_available_bytes: Available disk space, in bytes.
Disk metrics have the following labels: disk_file_system, disk_is_removable, disk_type, disk_mount_point, disk_name
scaph_host_cpu_frequency: Global frequency of all the CPUs, in megahertz.
scaph_host_load_avg_fifteen: Load average over 15 minutes.
scaph_host_load_avg_five: Load average over 5 minutes.
scaph_host_load_avg_one: Load average over 1 minute.
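These host metrics are raw values, so ratios have to be derived by the consumer. As a minimal sketch, assuming you have already read scaph_host_memory_total_bytes and scaph_host_memory_available_bytes, the hypothetical helper below computes the share of memory in use; the figures in the example are made up.

```python
def memory_used_ratio(total_bytes, available_bytes):
    """Fraction of RAM in use, derived from the total and available byte counts."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 1.0 - available_bytes / total_bytes


GIB = 1024 ** 3
# Made-up host with 16 GiB installed and 4 GiB available.
print(memory_used_ratio(16 * GIB, 4 * GIB))  # 0.75
```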
If you hack on Scaphandre or just want to investigate its behavior, you may be interested in some internal metrics:
- scaph_self_memory_bytes: Scaphandre memory usage, in bytes
- scaph_self_memory_virtual_bytes: Scaphandre virtual memory usage, in bytes
- scaph_self_topo_stats_nb: Number of CPUStat traces stored for the host
- scaph_self_topo_records_nb: Number of energy consumption Records stored for the host
- scaph_self_topo_procs_nb: Number of processes monitored by Scaphandre
- scaph_self_socket_stats_nb{socket_id="SOCKET_ID"}: Number of CPUStat traces stored for each socket
- scaph_self_socket_records_nb{socket_id="SOCKET_ID"}: Number of energy consumption Records stored for each socket, with SOCKET_ID being the id of the socket measured
- scaph_self_domain_records_nb{socket_id="SOCKET_ID",rapl_domain_name="RAPL_DOMAIN_NAME"}: Number of energy consumption Records stored for a Domain, where SOCKET_ID identifies the socket and RAPL_DOMAIN_NAME identifies the RAPL domain measured on that socket
Getting per process data with scaph_process_* metrics
Here are the labels available on the scaph_process_power_consumption_microwatts metric, which you may need to extract the data you want:
exe: the name of the executable at the origin of the process. This is a good label to use when your application runs as one or only a few processes.
cmdline: the whole command line, with the executable path and its parameters (concatenated). You can filter on this label with the Prometheus =~ operator to match a regular expression pattern, which is very practical in many situations.
instance: a label generated by Prometheus that lets you filter the metrics by originating host. This is very useful when you monitor distributed services: you can not only sum the metrics for the same service across hosts, but also see which instance of that service consumes the most, or notice differences between hosts that may not have the same hardware, and so on.
pid: the process id, which is useful if you want to track a specific process and keep an eye on what is happening on the host, but is not so practical in a more general use case.
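To illustrate filtering on cmdline with a regular expression, the sketch below applies a pattern to a handful of made-up samples. It uses Python's re.fullmatch because Prometheus regex matchers are fully anchored; the sample data, paths and the select_by_cmdline helper are hypothetical, and Python's regex syntax is not identical to the one Prometheus uses.

```python
import re

# Made-up samples, as they might look after parsing the exporter output.
SAMPLES = [
    {"exe": "java", "pid": "310", "cmdline": "/usr/bin/java -jar /opt/shop/api.jar", "microwatts": 900_000.0},
    {"exe": "java", "pid": "311", "cmdline": "/usr/bin/java -jar /opt/batch/job.jar", "microwatts": 400_000.0},
    {"exe": "python3", "pid": "420", "cmdline": "python3 /opt/shop/worker.py", "microwatts": 250_000.0},
]


def select_by_cmdline(samples, pattern):
    """Keep the samples whose whole cmdline label matches the pattern."""
    regex = re.compile(pattern)
    return [s for s in samples if regex.fullmatch(s["cmdline"])]


# Rough equivalent of the PromQL selector:
#   scaph_process_power_consumption_microwatts{cmdline=~".*/opt/shop/.*"}
shop = select_by_cmdline(SAMPLES, r".*/opt/shop/.*")
print([s["pid"] for s in shop])             # ['310', '420']
print(sum(s["microwatts"] for s in shop))   # 1150000.0
```

Matching on cmdline here catches both the java and python3 processes of the same application, which filtering on exe alone would not.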
Since 1.0.0, the following per-process metrics are available as well:
scaph_process_cpu_usage_percentage: CPU time consumed by the process, as a percentage of the capacity of all the CPU cores
scaph_process_memory_bytes: Physical RAM usage by the process, in bytes
scaph_process_memory_virtual_bytes: Virtual RAM usage by the process, in bytes
scaph_process_disk_total_write_bytes: Total data written on disk by the process, in bytes
scaph_process_disk_write_bytes: Data written on disk by the process, in bytes
scaph_process_disk_read_bytes: Data read on disk by the process, in bytes
scaph_process_disk_total_read_bytes: Total data read on disk by the process, in bytes
Get container-specific labels on scaph_process_* metrics
The --containers flag enables Scaphandre to collect data about the Docker containers or Kubernetes pods running on the local machine. It then adds specific labels that make it easier to filter per-process power consumption metrics by the container each process runs in.
Generic labels help to identify the container runtime and scheduler used (based on the content of /proc/PID/cgroup):
container_scheduler: possible values are docker or kubernetes. If this label is not attached to the metric, it means that Scaphandre didn't manage to identify the container scheduler based on cgroups data.
The container_runtime label may then be attached. The only possible value for now is containerd.
container_id is the ID Scaphandre got from /proc/PID/cgroup for that container.
For Docker containers (if container_scheduler is set), available labels are:
container_names: a string containing the names attached to that container, according to the docker daemon
container_docker_version: version of the docker daemon
container_label_maintainer: content of the maintainer field for this container
For containers coming from a docker-compose file, there are a bunch of labels related to data coming from the docker daemon:
container_label_com_docker_compose_project_working_dir
container_label_com_docker_compose_container_number
container_label_com_docker_compose_project_config_files
container_label_com_docker_compose_version
container_label_com_docker_compose_service
container_label_com_docker_compose_oneoff
For Kubernetes pods (if container_scheduler is set), available labels are:
kubernetes_node_name: identifies the name of the kubernetes node Scaphandre is running on
kubernetes_pod_name: the name of the pod the container belongs to
kubernetes_pod_namespace: the namespace of the pod the container belongs to
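Once these labels are present, per-process power can be rolled up by container, pod or namespace. The sketch below is a minimal example of such a roll-up, assuming the samples have already been parsed into (labels, microwatts) pairs; the pod names, namespaces, values and the microwatts_by_label helper are all made up for illustration. Processes that carry no container label are grouped under a placeholder key.

```python
from collections import defaultdict

# Made-up (labels, microwatts) pairs for scaph_process_power_consumption_microwatts.
SAMPLES = [
    ({"exe": "nginx", "container_scheduler": "kubernetes",
      "kubernetes_pod_name": "web-0", "kubernetes_pod_namespace": "shop"}, 800_000.0),
    ({"exe": "redis-server", "container_scheduler": "kubernetes",
      "kubernetes_pod_name": "cache-0", "kubernetes_pod_namespace": "shop"}, 200_000.0),
    ({"exe": "prometheus", "container_scheduler": "kubernetes",
      "kubernetes_pod_name": "prom-0", "kubernetes_pod_namespace": "monitoring"}, 500_000.0),
    ({"exe": "sshd"}, 50_000.0),  # host process, no container labels attached
]


def microwatts_by_label(samples, label, missing="(none)"):
    """Sum power per value of the given label; unlabeled samples go under `missing`."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.get(label, missing)] += value
    return dict(totals)


print(microwatts_by_label(SAMPLES, "kubernetes_pod_namespace"))
```

The same helper works with kubernetes_pod_name, container_names or any of the docker-compose labels listed above.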