Abstract Wikipedia team/Observability/Metrics
Wikifunctions metrics are fed into our Grafana dashboard.
What Do I Look At?
CPU Usage
- For App Overview: 'Total CPU', row: 'Saturation'
- For Evaluator and Orchestrator: scroll down to their 'Saturation' rows
Heap and Memory Usage
- For App Overview: 'Total Memory', row: 'Saturation'
- For Evaluator and Orchestrator: scroll down to their 'Saturation' rows
- For the Orchestrator's NodeJS process:
  - panel: 'NodeJS heap allocation overview', row: 'NodeJS'
  - panels: 'NodeJS memory usage overview', 'NodeJS memory usage details'
Garbage Collection
- For the Orchestrator:
  - panels whose titles begin with 'NodeJS GC time', row: 'NodeJS'
Errors and Failures
- For the App: N/A
- For the Orchestrator: look at function_orchestrator_function_implementation_error_count under the row 'Function Calls'
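Because function_orchestrator_function_implementation_error_count is a Counter, its raw value only ever grows until the process restarts. The useful signal is how much it rises per interval. The TypeScript sketch below converts cumulative readings into per-interval increases and treats any drop as a counter reset. It assumes you already have timestamped readings of the counter, for example from repeated curl runs; the CounterSample shape is hypothetical. If the dashboard is backed by Prometheus (an assumption here), its rate() and increase() functions handle resets in a similar way, so this sketch is mostly useful for reasoning about raw readings.

```typescript
/** One reading of a cumulative counter (hypothetical shape). */
interface CounterSample {
  timeMs: number; // when the reading was taken, in milliseconds
  value: number; // cumulative counter value at that time
}

interface CounterInterval {
  endMs: number; // end of the interval
  increase: number; // how much the counter grew during the interval
  perSecond: number; // increase divided by the interval length in seconds
}

/**
 * Turn cumulative counter readings into per-interval increases.
 * A counter only goes down when it is reset (e.g. the process restarted),
 * so a drop is treated as a reset and the new value counts as the increase since then.
 */
function counterIncreases(samples: CounterSample[]): CounterInterval[] {
  const intervals: CounterInterval[] = [];
  for (let i = 1; i < samples.length; i++) {
    const prev = samples[i - 1];
    const cur = samples[i];
    const seconds = (cur.timeMs - prev.timeMs) / 1000;
    if (seconds <= 0) continue; // skip intervals of zero or negative length
    const increase = cur.value >= prev.value ? cur.value - prev.value : cur.value;
    intervals.push({ endMs: cur.timeMs, increase, perSecond: increase / seconds });
  }
  return intervals;
}
```

A sudden jump in increase or perSecond is what shows up as a spike on the dashboard. The raw counter line only bends upward.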
What Do I Look For?
- In the row 'Function Calls', check for spikes in function_orchestrator_function_duration_milliseconds and function_orchestrator_function_implementation_error_count. We often see heap/memory-exceeded log events from NodeJS around the same time as spikes in duration and/or implementation errors. We have not yet determined whether these events are directly correlated, which is why we watch for both.
- As mentioned on our Logging page, check how such spikes and the overall trend of each graph correspond to the events shown in LogStash.
- For the same reasons, also check the Orchestrator's 'Saturation' rows for spikes, particularly in CPU and memory usage.
(Additional info on Wikifunctions/Performance_observability)
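To make "spikes" more concrete, here is a rough heuristic. It is our own sketch and not part of the Wikifunctions tooling. A sample is flagged when it sits more than k standard deviations above the mean of the preceding window. The sketch then checks whether duration spikes and error spikes land close together. It assumes both series are aligned, meaning they are sampled at the same times. The defaults for windowSize, k and tolerance are arbitrary and should be tuned. A match only shows that the two series spiked together, not that one caused the other.

```typescript
/**
 * Flag indices whose value is unusually high compared with the preceding window:
 * a point is a spike when it exceeds the window mean by more than k standard deviations.
 */
function findSpikes(values: number[], windowSize = 10, k = 3): number[] {
  const spikes: number[] = [];
  for (let i = windowSize; i < values.length; i++) {
    const baseline = values.slice(i - windowSize, i);
    const mean = baseline.reduce((sum, v) => sum + v, 0) / windowSize;
    const variance =
      baseline.reduce((sum, v) => sum + (v - mean) * (v - mean), 0) / windowSize;
    if (values[i] > mean + k * Math.sqrt(variance)) spikes.push(i);
  }
  return spikes;
}

/**
 * Spike indices from series `a` that have a spike in series `b` within `tolerance` samples.
 * Assumes both series share the same start time and sampling interval.
 */
function coincidentSpikes(a: number[], b: number[], tolerance = 1): number[] {
  return a.filter((i) => b.some((j) => Math.abs(i - j) <= tolerance));
}
```

For example, run findSpikes over function-duration samples and over per-interval error increases, then pass both results to coincidentSpikes.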
Custom Metrics
The following metrics were added so that they can be tracked via logging to our LogStash. They are instantiated in lib/util.js in both the Evaluator and Orchestrator repositories.
(Debugging pro tip: to track these metrics locally, run curl http://localhost:9100/metrics in your terminal.)
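The /metrics output is plain text. This sketch assumes it follows the standard Prometheus text format, which the Counter/Gauge/Histogram/Summary types below suggest. In that format, each sample is a line like name{label="value"} 42, preceded by # HELP and # TYPE comment lines. A Summary or Histogram appears as several series, for example with _sum and _count suffixes. Piping the output through grep (curl http://localhost:9100/metrics | grep function_orchestrator) is often enough. When you need the values programmatically, a small parser like the TypeScript sketch below works. It handles escaped label values and +Inf/-Inf/NaN, but it is deliberately simplified and does not validate the format.

```typescript
/** One sample line from a Prometheus-style text exposition. */
interface MetricSample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

// Metric name, optional {label block}, then the value (an optional timestamp may follow).
const SAMPLE_LINE = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/;

function parseValue(raw: string): number {
  if (raw === "+Inf") return Infinity;
  if (raw === "-Inf") return -Infinity;
  return Number(raw); // Number("NaN") is NaN, as intended
}

// Undo the format's escapes: \\ -> \, \" -> ", \n -> newline.
function unescapeLabel(raw: string): string {
  return raw.replace(/\\(.)/g, (_m: string, c: string) => (c === "n" ? "\n" : c));
}

function parseLabels(body: string): Record<string, string> {
  const labels: Record<string, string> = {};
  const pairRe = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g; // new regex each call, lastIndex 0
  let pair: RegExpExecArray | null;
  while ((pair = pairRe.exec(body)) !== null) {
    labels[pair[1]] = unescapeLabel(pair[2]);
  }
  return labels;
}

/** Parse exposition text, keeping only samples whose name starts with `prefix`. */
function parseMetrics(text: string, prefix = ""): MetricSample[] {
  const samples: MetricSample[] = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    // Skip blank lines and # HELP / # TYPE comments.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const m = SAMPLE_LINE.exec(trimmed);
    if (!m || !m[1].startsWith(prefix)) continue;
    samples.push({ name: m[1], labels: parseLabels(m[2] ?? ""), value: parseValue(m[3]) });
  }
  return samples;
}
```

For example, parseMetrics(text, "function_orchestrator_") returns only the Orchestrator's custom metrics from the table below.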
function-evaluator:
Name | Type | Labels | Help (description) |
---|---|---|---|
function_evaluator_wasm_subprocess_count | Gauge | n/a | Number of currently running WASM subprocesses in Evaluator |
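Unlike a Counter, a Gauge such as function_evaluator_wasm_subprocess_count must be decremented again when a subprocess exits. Otherwise it drifts upward after every failure. The sketch below shows the usual pattern: increment before the work starts and decrement in a finally block. The Gauge class is a hypothetical in-memory stand-in with inc()/dec() methods, and trackSubprocess is an illustrative name. In the Evaluator the gauge itself is created in lib/util.js, and this sketch does not claim to show how it is actually updated.

```typescript
/** Minimal in-memory stand-in for a gauge (hypothetical). */
class Gauge {
  private value = 0;
  inc(): void {
    this.value += 1;
  }
  dec(): void {
    this.value -= 1;
  }
  get(): number {
    return this.value;
  }
}

/**
 * Run an async task (e.g. a WASM subprocess) while it is counted in the gauge.
 * `finally` guarantees the decrement even if the task rejects.
 */
async function trackSubprocess<T>(gauge: Gauge, run: () => Promise<T>): Promise<T> {
  gauge.inc();
  try {
    return await run();
  } finally {
    gauge.dec();
  }
}
```

While a task is in flight the gauge reads one higher, and it returns to its previous value whether the task succeeds or fails.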
function-orchestrator:
Name | Type | Labels | Help (description) |
---|---|---|---|
function_orchestrator_router_request_duration_seconds | Histogram | 'path', 'method', 'status' | request duration handled by router in seconds |
function_orchestrator_incomingrequestcount | Counter | n/a | function-orchestrator request count |
function_orchestrator_nonrequesterror | Counter | n/a | function-orchestrator non-request error count |
function_orchestrator_outgoingresponsecount | Counter | n/a | function-orchestrator response count |
function_orchestrator_function_duration_milliseconds | Summary | 'Z7_function_identity', 'isBuiltIn', 'implementationZID', 'requestId' | function call duration in milliseconds |
function_orchestrator_function_implementation_error_count | Counter | 'error', 'reqId' | function call implementation errors count |
function_orchestrator_function_execute()_count | Counter | 'reqId' | function execution counter |
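The duration Summary and the implementation error Counter would typically be updated together around each function call. The sketch below shows one way to do that, using the label names from the table above. SummaryRecorder, CounterRecorder, CallInfo and timedCall are hypothetical stand-ins, not the Orchestrator's actual code. Using the error message as the 'error' label value is a guess. Because the duration is recorded in finally, failed calls also contribute to the duration series. That is a design choice in this sketch, not necessarily what the Orchestrator does.

```typescript
type Labels = Record<string, string>;

/** Hypothetical stand-in for a labeled Summary: keeps every observation. */
class SummaryRecorder {
  readonly observations: { labels: Labels; value: number }[] = [];
  observe(labels: Labels, value: number): void {
    this.observations.push({ labels, value });
  }
}

/** Hypothetical stand-in for a labeled Counter: one running total per label set. */
class CounterRecorder {
  private readonly totals = new Map<string, number>();
  inc(labels: Labels): void {
    const key = labelKey(labels);
    this.totals.set(key, (this.totals.get(key) ?? 0) + 1);
  }
  get(labels: Labels): number {
    return this.totals.get(labelKey(labels)) ?? 0;
  }
}

// Order-independent key for a set of labels.
function labelKey(labels: Labels): string {
  return Object.keys(labels).sort().map((k) => `${k}=${labels[k]}`).join(",");
}

// Hypothetical description of one function call; field names are illustrative.
interface CallInfo {
  functionZid: string;
  implementationZid: string;
  isBuiltIn: boolean;
  requestId: string;
}

/** Run one call, timing it in milliseconds and counting it as an error if it throws. */
async function timedCall<T>(
  duration: SummaryRecorder,
  errors: CounterRecorder,
  info: CallInfo,
  call: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await call();
  } catch (err) {
    // Assumption: the error message is used as the 'error' label value.
    errors.inc({ error: err instanceof Error ? err.message : String(err), reqId: info.requestId });
    throw err;
  } finally {
    duration.observe(
      {
        Z7_function_identity: info.functionZid,
        isBuiltIn: String(info.isBuiltIn),
        implementationZID: info.implementationZid,
        requestId: info.requestId,
      },
      Date.now() - start,
    );
  }
}
```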
More Resources: