Grafana

The open-source platform for monitoring and observability.

Grafana allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. Create, explore, and share dashboards with your team and foster a data-driven culture:

  • Visualize: Fast and flexible client side graphs with a multitude of options. Panel plugins offer many different ways to visualize metrics and logs.
  • Dynamic Dashboards: Create dynamic & reusable dashboards with template variables that appear as dropdowns at the top of the dashboard.
  • Explore Metrics: Explore your data through ad-hoc queries and dynamic drilldown. Split view and compare different time ranges, queries and data sources side by side.
  • Explore Logs: Experience the magic of switching from metrics to logs with preserved label filters. Quickly search through all your logs or stream them live.
  • Alerting: Visually define alert rules for your most important metrics. Grafana will continuously evaluate and send notifications to systems like Slack, PagerDuty, VictorOps, OpsGenie.
  • Mixed Data Sources: Mix different data sources in the same graph! You can specify a data source on a per-query basis. This works even for custom data sources.

Get started

Unsure if Grafana is for you? Watch Grafana in action on play.grafana.org!

Documentation

The Grafana documentation is available at grafana.com/docs.

Contributing

If you're interested in contributing to the Grafana project:

Get involved

License

Grafana is distributed under AGPL-3.0-only. For Apache-2.0 exceptions, see LICENSING.md.

Owner
Grafana Labs
Grafana Labs is behind leading open source projects Grafana and Loki, and the creator of the first open & composable observability platform.
Comments
  • building alerting system for grafana

    Hi everyone, I recently joined raintank and I will be working with @torkelo, @mattttt , and you, on alerting support for Grafana.

    From the results of the Grafana User Survey it is obvious that alerting is the most commonly missed feature for Grafana. I have worked on/with a few alerting systems in the past (nagios, bosun, graph-explorer, etsy's kale stack, ...) and I'm excited about the opportunity in front of us: we can take the best of said systems, but combine them with Grafana's focus on a polished user experience, resulting in a powerful alerting system, well-integrated and smooth to work with.

    First of all, terminology sync:

    • alerting: executing logic (threshold checks or more advanced) to know the state of an entity. (ok, warning, critical)
    • notifications: emails, text messages, posts to chat, etc to make people aware of a state change
    • monitoring: this term covers everything about monitoring (data collection, visualizations, alerting) so I won't be using it here.

    I want to spec out requirements, possible implementation ideas and their pros/cons. With your feedback, we can adjust, refine and choose a specific direction.

    General thoughts:

    • integration with existing tools vs built-in: there are some powerful alerting systems out there (bosun, kale) that deserve integration. Many alerting systems are more basic (define expression/threshold, get notification when breached); for those, integration does not seem worth the pain (though I won't stop you).
      The integrations are a long term effort. I think the low hanging fruit ("meet 80% of the needs with 20% of the effort") can be met with a system that is more closely tied to Grafana, i.e. compiled into the grafana binary. That said, a lot of people confuse separation of concerns with "must be different services". If the code is sane, it'll be decoupled packages but there's nothing necessarily wrong with compiling them together. i.e. you could run:
      • 1 grafana binary that does everything (grafana as you know it + all alerting features) for simplicity
      • multiple grafana binaries in different modes (visualization instances and alerting instances) even highly available/redundant setups if you want to, using an external worker queue

    That said, we don't want to reinvent the wheel: we want alerting code and functionality to integrate well with Grafana, but if high-quality code is compatible, we should use it. In fact, I have a prototype that leverages some existing bosun code. (see "Current state")

    • polling vs stream processing: they have different performance characteristics, but they should be able to take the same or similar alerting rule definitions (thresholds, boolean logic, ..), they mostly are about how the actual rules are executed and don't change much about how rules are defined. Since polling is much simpler and should be able to scale fairly far this should IMHO be our initial focus.

    Current state

    The raintank/grafana version currently has an alerting package with a simple scheduler, an in-process worker bus as well as a rabbitmq based one, an alert executor and email notifications. It uses the bosun expression libraries, which give us the ability to evaluate arbitrarily complex expressions (use several metrics, use boolean logic, math, etc). This package is currently raintank-specific but we will merge a generic version of this into upstream grafana. This will provide an alert execution platform, but notably still missing are

    1. an interface to create and manage alerting rules
    2. state management (acknowledgements etc)

    these are harder problems, which I hope to tackle with your input.

    Requirements, Future implementations

    First off, I think bosun is a pretty fantastic system for alerting (not so much for visualization). You can make your alerting rules as advanced as you want, and it enables you to fine-tune over time and backtest on historical data, so you can get them just right. And it has a good state machine. In theory we could just compile bosun straight into grafana, and leverage bosun via its REST api instead of Golang api, but then we have less fine-grained control and for now I feel more comfortable trying out piece by piece (piece meaning golang package) and making the integration decision on a case by case basis. Though the integration may look different down the road based on experience and as we figure out what we want our alerting to look like.

    Either way, we don't just want great alerting. We want great alerting combined with great visualizations, notifications with context, and a smooth workflow where you can manage your alerts in the same place you manage your visualizations. So it needs to be nicely integrated into Grafana. To that end, there are a few things to consider:

    1. some visualized metrics (metrics plotted on graphs) are not alerted on
    2. some visualized metrics are alerted on:
      • A: with simple threshold checks: easy to visualize alerting logic
      • B: with more advanced logic: (e.g. look at standard deviation of the series being plotted, compare current median against historical median, etc): can't easily be visualized next to the input series
    3. some metrics used in alerting logic are not to be visualized

    Basically, there's a bunch of stuff you may want visualized (V), and a bunch of stuff you want alerts (A), and V and A have some overlap. I need to think about this a bit more and wonder what y'all think. There will definitely need to be 1 central place where you can get an overview of all the things you're alerting on, irrespective of where those rules are defined.

    There are a few more complications, which I'll explain through an example sketch of what alerting could look like: sketch

    let's say we have a timeseries for requests (A) and one for erroneous requests (B) and this is what we want to plot. we then use fields C,D,E to put stuff that we don't want to alert on. C contains the formula for ratio of error requests against the total.

    we may for example want to alert (see E) if the median of this ratio in the last 5min is more than 1.5 times what the ratio was in the same 5minute period last week, and also if the errors seen in the last 5min are worse than the errors seen since 2 months ago until 5min ago.

    notes:

    • some queries use different timeranges than what is rendered
    • in addition to processing by tsdb (such as Graphite's sum(), divide() etc which return series) we need to be able to reduce series to single numbers. fairly easy to implement (and in fact currently the bosun library does this for us)
    • we need boolean logic (bosun also gives us this)
    • in this example the expression only uses variables defined within the same panel, but it might make sense to include expressions of other panels/graphs.
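The notes above can be sketched in a few lines of Python. This is only an illustration of reducing series to single numbers and comparing them, not the bosun expression API; the function names, the sample counts, and the default factor of 1.5 are assumptions made for the example.

```python
from statistics import median


def ratio_series(errors, requests):
    """Point-wise error ratio (the role of field C); points with zero requests are skipped."""
    return [e / r for e, r in zip(errors, requests) if r > 0]


def should_alert(current_ratio, last_week_ratio, factor=1.5):
    """Reduce each series to its median, then apply a threshold comparison."""
    return median(current_ratio) > factor * median(last_week_ratio)


# Hypothetical per-minute counts for two 5-minute windows.
requests_now = [100, 120, 110, 90, 100]
errors_now = [10, 12, 11, 9, 10]
requests_last_week = [100, 100, 100, 100, 100]
errors_last_week = [5, 5, 5, 5, 5]

current = ratio_series(errors_now, requests_now)
previous = ratio_series(errors_last_week, requests_last_week)
alert = should_alert(current, previous)
```

Note that the two windows are fetched with different time ranges than the one rendered on the graph, which is the first point in the notes.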

    other ponderings:

    • do we integrate with current grafana graph threshold settings (which are currently for viz only, not for processing) ? if the expression is a threshold check, we could automatically display a threshold line
    • using the letters is a bit clunky, could we refer to the aliases instead? like #requests and #errors?
    • what if the expressions are stats.$site.requests and stats.$site.errors, and we want to have separate alert instances for every site (but only set up the rule once)? what if we only want it for a select few of the sites? what if we want different parameters based on which site? bosun actually supports all these features, and we could expose them, though we should probably build a UI around them.

    I think for an initial implementation every graph could have two fields, like so:

    warn: - expression
          - notification settings (email, http hook, ..)
    crit: - expression
          - notification settings


    where the expression is something like what I put in E in the sketch. for logic/data that we don't want to visualize, we just toggle off the visibility icon. grafana would replace the variables in the formulas, execute the expression (with the current bosun based executor). results (state changes) could be fed into something like elasticsearch and displayed via the annotations system.
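A minimal sketch of how the warn/crit pair could map onto the ok/warning/critical states from the terminology section, with a notification sent only on a state change. The class and function names are hypothetical, and the expressions are modelled as plain Python predicates over a single number rather than as bosun expressions.

```python
def evaluate_state(value, warn, crit):
    """Return the most severe state whose expression matches the value."""
    if crit(value):
        return "critical"
    if warn(value):
        return "warning"
    return "ok"


class AlertRule:
    """Hypothetical rule holding a warn and a crit expression plus a notifier."""

    def __init__(self, warn, crit, notify):
        self.warn = warn
        self.crit = crit
        self.notify = notify
        self.state = "ok"

    def update(self, value):
        """Evaluate the rule and notify only when the state changes."""
        new_state = evaluate_state(value, self.warn, self.crit)
        if new_state != self.state:
            self.notify(self.state, new_state, value)
            self.state = new_state
        return new_state


# Usage: collect state changes in a list instead of sending email.
changes = []
rule = AlertRule(
    warn=lambda v: v > 0.05,
    crit=lambda v: v > 0.10,
    notify=lambda old, new, value: changes.append((old, new)),
)
for error_ratio in [0.01, 0.07, 0.08, 0.20, 0.00]:
    rule.update(error_ratio)
```

The recorded state changes are the kind of result that could be stored and shown through the annotations system.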

    Thoughts? Do you have concerns or needs that I didn't address?

  • [Feature request] Add Alert support for singlestats

    Please include this information:

    • What Grafana version are you using? v4.0.2 (commit: v4.0.2)

    • What datasource are you using? Graphite

    • What OS are you running grafana on? Mac OS

    • What did you do? Went to a dashboard, clicked on a single stat and expected to find the "Alert" tab.

    • What was the expected result? I was expecting to see the "Alert" tab

    • What happened instead? The "Alert" tab was not present for the Singlestat. It is present for Graph though.

    If it relates to alerting

    • An image of the test execution data fully expanded.
  • Grafana 2.0: SQL Data source

    With the backend comes the possibility to have SQL data source.

    My thinking is that when you add the data source you specify:

    • db type (initially only mysql, postgres and sqlite3)
    • db connection details
    • a metric query template (basically a SQL query with params)
    • an annotation query template

    Maybe also an option to allow RAW SQL queries from the panel metric query interface.
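As an illustration of what a metric query template "with params" might look like, here is a small Python sketch using the standard library sqlite3 module. The table layout, column names, and parameter names are assumptions made for the example, not a proposed Grafana schema.

```python
import sqlite3

# Hypothetical metric query template; the named parameters are bound per panel query.
METRIC_QUERY_TEMPLATE = (
    "SELECT ts, value FROM metrics "
    "WHERE name = :name AND ts BETWEEN :from_ts AND :to_ts "
    "ORDER BY ts"
)


def run_metric_query(conn, name, from_ts, to_ts):
    """Bind the template parameters and return (timestamp, value) rows."""
    params = {"name": name, "from_ts": from_ts, "to_ts": to_ts}
    return conn.execute(METRIC_QUERY_TEMPLATE, params).fetchall()


# Usage with an in-memory database and made-up data points.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, ts INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [("cpu", 10, 0.5), ("cpu", 20, 0.7), ("mem", 10, 0.9), ("cpu", 99, 0.1)],
)
points = run_metric_query(conn, "cpu", 0, 50)
```

Binding parameters instead of concatenating strings keeps the template safe to reuse across panels; a raw SQL option would bypass that.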

    Any other ideas?

  • [Feature request] Multiple alerts per graph

    As per http://docs.grafana.org/alerting/rules/, Grafana plans to track state per series in future releases.

    • "If a query returns multiple series then the aggregation function and threshold check will be evaluated for each series. What Grafana does not do currently is track alert rule state per series." and
    • "To improve support for queries that return multiple series we plan to track state per series in a future release"

    But it seems like there can be use cases where we have graphs containing a set of metrics for which different sets of alerts are required. This is slightly different from "Support per series state change" ( https://github.com/grafana/grafana/issues/6041 ) because

    1. The action (notifications) can be different.
    2. Also, tracking separate states of an alert is not always preferred (as the end-user would need to know the details behind the individual states ) vs just knowing if alert is triggered.
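The documented behaviour quoted earlier in this request (aggregation and threshold check evaluated per series, but a single state per rule) can be sketched roughly as follows. This is an illustrative model, not Grafana's implementation; the function names are hypothetical, and it assumes the rule is considered alerting when any series breaches the threshold.

```python
def avg(points):
    """Aggregation function: arithmetic mean of a series."""
    return sum(points) / len(points)


def evaluate_rule(series_by_name, reducer, threshold):
    """Check every series, but collapse the outcome into one rule-level state."""
    per_series = {
        name: reducer(points) > threshold
        for name, points in series_by_name.items()
    }
    rule_state = "alerting" if any(per_series.values()) else "ok"
    return rule_state, per_series


# Usage with two made-up series on the same graph.
state, breaches = evaluate_rule(
    {"web-1": [1, 2, 3], "web-2": [10, 20, 30]},
    reducer=avg,
    threshold=5,
)
```

Only `state` would be tracked in this model; the per-series detail in `breaches` is what per-series state tracking, or separate alerts per graph, would expose.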

    Grafana version = 4.x

  • Localize time_format in graphs

    Hello dear developers team. Thank you for an awesome product, but I have one problem.

    There is currently a hardcoded US date format in the time_format function. The most annoying case is displaying month and day. When I see something like "2/3" I'm a bit confused. Is it "the second of March" or "the third of February"? The saddest thing is that I can't configure this behaviour.

    Unfortunately the simplest way (and maybe the most proper one) doesn't help here. I mean toLocaleString with additional options. You could return a different array of options instead of a hardcoded format pattern, and this method would convert the date in accordance with the right locale. But in our case there is a jquery plot and it requires a date format so it can convert the timestamp by itself.

    So, the second way is to make some kind of mapping locale -> format array. Example. It seems a bit ugly, but it could be the only working solution.
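The locale -> format mapping idea can be illustrated with a short Python sketch. The locale keys and strftime-style patterns below are assumptions chosen for the example; the real change would live in the JavaScript time_format function and use whatever pattern syntax the plotting library expects.

```python
from datetime import date

# Hypothetical mapping from locale to a month/day pattern.
DAY_FORMATS = {
    "en-US": "%m/%d",
    "en-GB": "%d/%m",
    "ru-RU": "%d.%m",
}


def format_day(day, locale, default="%m/%d"):
    """Format a date with the pattern for the locale, falling back to the US one."""
    return day.strftime(DAY_FORMATS.get(locale, default))


third_of_february = date(2015, 2, 3)
us_label = format_day(third_of_february, "en-US")
gb_label = format_day(third_of_february, "en-GB")
```

With such a table the same date is rendered month-first or day-first depending on the user's locale, and unknown locales keep the current behaviour.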

    Maybe I missed some obvious and better solutions. That's why I didn't create a pull request. =)

  • Alerting support for queries using template variables

    It would be pretty useful if grafana would support alerting for queries using template variables. The way I see it work it would be as follows:

    1. Generate queries for each template variable combination (discarding template variable for all)
    2. When generating queries, consider the frozen list if the template variable is set to never refresh, else update the template variable list
    3. Allow filtering (through regex or by providing a static value) for each template variable
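The steps above could look roughly like this in Python. It is a sketch under assumptions: the function name, the "$name" substitution syntax, and the literal "All" label for the all-values entry are illustrative, and refreshing the variable list (step 2) is left out.

```python
import re
from itertools import product


def expand_queries(template, variables, filters=None):
    """Build one concrete query per combination of template variable values."""
    filters = filters or {}
    names = sorted(variables)
    value_lists = []
    for name in names:
        # Step 1: drop the (hypothetical) "All" entry.
        values = [v for v in variables[name] if v != "All"]
        # Step 3: optional regex filter per variable.
        pattern = filters.get(name)
        if pattern is not None:
            values = [v for v in values if re.fullmatch(pattern, v)]
        value_lists.append(values)

    queries = []
    for combination in product(*value_lists):
        query = template
        for name, value in zip(names, combination):
            query = query.replace("$" + name, value)
        queries.append(query)
    return queries


queries = expand_queries(
    "stats.$site.$metric",
    {"site": ["All", "a", "b"], "metric": ["requests", "errors"]},
    filters={"metric": "req.*"},
)
```

A static value filter is the special case of a regex that matches exactly one value. The plain string replacement assumes no variable name is a prefix of another.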

    The current workaround is to use an invisible wildcard metric, but the problem I see with this approach is that it loses context.

  • Elasticsearch as timeseries datasource

    I'm aware of #158, but in this issue I want to discuss whether Elasticsearch is a good fit for storing timeseries data and whether grafana's graph panel should support fetching timeseries data from elasticsearch. I'm not suggesting that grafana should also implement features like log analytics.

    Elasticsearch offers the date_histogram aggregation that could be used to feed the graphs. When comparing the available aggregations elasticsearch has to offer to InfluxDB's aggregate functions, you can see that they are quite similar.

    One could argue that if InfluxDB already offers these features, why add 'yad' (yet another datasource)? Even though it looks like InfluxDB's development is moving fast, I think that elasticsearch is more mature, easier to scale, and easier to install and operate. Also, because it's based on Java it just runs anywhere, even Windows and Solaris for example.

    Besides, a lot of people already use elasticsearch to store the dashboards and add annotations to the graphs. If elasticsearch could also be used as a timeseries datasource, you would just have to install one database which again reduces time and cost to install and operate the system.

    Because you would store metrics in a json format, the ideas described in metric 2.0 could be incorporated.

    A metric document could look something like this:

    {
        "@timestamp" : "2013-07-20T09:43:58.000+0000",
        "name" : "bulk-request-timer",
        "tags" : {
            "application"  : "my awesome app",
            "host"  : "host-1",
            "datacenter"  : "dc1"
        },
        "values" : {
            "count" : 114,
            "max" : 109.681,
            "mean" : 5.439666666666667,
            "min" : 2.457,
            "p50" : 4.3389999999999995,
            "p75" : 5.0169999999999995,
            "p95" : 8.37175,
            "p98" : 9.6832,
            "p99" : 94.68429999999942,
            "p999" : 109.681,
            "stddev" : 9.956913151098842,
            "m15_rate" : 0.10779994503690074,
            "m1_rate" : 0.07283351433589833,
            "m5_rate" : 0.10101298115113727,
            "mean_rate" : 0.08251056571678642
        }
    }
    

    Finally, I think that these features should rather be added to grafana than to kibana. IMHO kibana's focus just isn't displaying metrics but analytics. For example, it is not possible to add multiple graphs to one panel. For instance, if you wanted to display multiple percentiles of the response time in one panel, that's just not possible. Also there is no such functionality like grafana's templated queries. That could be implemented in grafana by performing a terms aggregation e.g. on the host field to populate the dropdown with all available hosts.
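To make the idea concrete, here is a hedged Python sketch that builds the two request bodies discussed in this issue: a date_histogram aggregation to feed a graph, and a terms aggregation to populate a host dropdown. The field names follow the example document above. The function names are hypothetical, the exact-match filter on "name" assumes a non-analyzed field, and the interval parameter is called `fixed_interval` in recent Elasticsearch versions while older ones used `interval`.

```python
import json


def date_histogram_query(metric_name, value_field, interval="1m"):
    """Request body averaging one value field per time bucket."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the documents
        "query": {"term": {"name": metric_name}},
        "aggs": {
            "over_time": {
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": interval,  # "interval" on older versions
                },
                "aggs": {"value": {"avg": {"field": value_field}}},
            }
        },
    }


def host_values_query():
    """Request body listing the distinct hosts for a template dropdown."""
    return {"size": 0, "aggs": {"hosts": {"terms": {"field": "tags.host"}}}}


graph_body = date_histogram_query("bulk-request-timer", "values.p95")
dropdown_body = host_values_query()
payload = json.dumps(graph_body)  # what would be sent to a search endpoint
```

Plotting several percentiles in one panel would then mean adding more sub-aggregations (or more queries) next to `value`.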

    I think @Dieterbe already experimented using elasticsearch as a timeseries database for his graphite-ng project. What are your experiences and thoughts about my thoughts? @torkelo what do you think about all of that?

  • Feature Request (Grafana 2.0): LDAP / Active Directory authentication and import groups

    I need to authenticate users against an Active Directory backend.

    We also need to import all "memberOf" groups as main groups to organize dashboards.

    Would it be possible for every user to see / edit (depending on the group permissions) the dashboards contained in the groups they belong to?

  • Support for multiple organizations

    Update: Thank you for your feedback on this, we've decided to keep multi org support. Refer to the closing comment on this issue for more information.

    Organizations (orgs) were originally created to support a multi-tenancy hosting service. Meaning, to use a single Grafana server to provide a fully isolated Grafana experience to multiple companies.

    But orgs have mainly been used to help split a large installation up and create some separation between users & teams within a single company. In Grafana v5 we added teams & dashboard permissions to help manage a large single organization within Grafana, so that use case is no longer recommended.

    Multi-org support is currently a big usability problem, especially around user management. Removing multi-org support would make Grafana a lot easier to manage and set up for single org setups. But we realize this could be very disruptive and want to gather all feedback on use cases first.

    Whatever we decide, we plan to have some migration story ready when we remove multi-org support.

  • Support more datasources in alerting

    Since alerting is executed in the backend we have to reimplement support for the time series databases:

    • [x] Prometheus
    • [x] Influxdb
    • [x] Elasticsearch
    • [x] Cloudwatch
    • [x] OpenTsdb
  • Support for multiple series & bars (side by side) for same time point

    I have multiple data sources on one single graph. For example, I have 6 metrics on one single graph. Under chart options, when I select bars, they all show up on a single bar line. I want to show them as different bars. How do I do that? I have attached a sample image file.

    The problem with having all of them on one single line is that the last data source I add takes up all the space on the bar.


  • Add Snyk GitHub Action

    What this PR does / why we need it: This PR adds a short GH Action to generate and upload SBOM reports via Snyk.

    Which issue(s) this PR fixes:

    Fixes #

    Special notes for your reviewer:

    Secrets will need to be created in the Org before this action will work.

    FYSA @grafana/security-team

  • Update emotion monorepo to v11.9.0

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | @emotion/css (source) | 11.7.1 -> 11.9.0 |
    | @emotion/react | 11.8.2 -> 11.9.0 |


    Release Notes

    emotion-js/emotion (@emotion/css)

    v11.9.0

    Patch Changes

    emotion-js/emotion (@emotion/react)

    v11.9.0

    Patch Changes

    Configuration

    📅 Schedule: At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about these updates again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Update dependency webpack-dev-server to v4.9.0

    WhiteSource Renovate

    This PR contains the following updates:

    | Package | Change |
    |---|---|
    | webpack-dev-server | 4.8.1 -> 4.9.0 |


    Release Notes

    webpack/webpack-dev-server

    v4.9.0

    Compare Source

    Features
    Bug Fixes
    4.8.1 (2022-04-06)
    Bug Fixes

    Configuration

    📅 Schedule: At any time (no schedule defined).

    🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

    Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

    🔕 Ignore: Close this PR and you won't be reminded about this update again.


    • [ ] If you want to rebase/retry this PR, click this checkbox.

    This PR has been generated by WhiteSource Renovate. View repository job log here.

  • Snapshots: Add event tracking for snapshot creation

    What this PR does / why we need it:

    • adds event tracking for snapshot creation!

    Which issue(s) this PR fixes:

    Fixes #48573

    Special notes for your reviewer:

  • LibraryPanels: Library Panels with orphaned connections cannot be deleted

    What happened:

    As a result of #47460 (fixed in #49161), it's possible to have library panels connected to non-existent dashboards. These panels cannot be deleted from the UI and require manual database intervention. See https://github.com/grafana/grafana/issues/47460#issuecomment-1129901186

    What you expected to happen:

    The user should be able to delete a library panel after deleting all (visible) connections to it.

    How to reproduce it (as minimally and precisely as possible):

    Prior to https://github.com/grafana/grafana/pull/49161 being merged:

    • Create a dashboard with a library panel and export for external sharing.
    • Import the dashboard into the same grafana instance
      • The dashboard will import, but the panel will show an error for an unfound library panel.
    • Delete all panels using the library panel.
    • Try to delete the library panel
      • ⚠️ It'll error and say 'the element has connection' despite these connections not showing in the UI
Xiaomi Platform Tree for Snapdragon 660 Devices

This repository contains common device configuration for Xiaomi sdm660-based devices Copyright # # Copyright (C) 2018 The LineageOS Project # # Licens

Dec 17, 2021
Uberlog - Cross platform multi-process C++ logging system

uberlog uberlog is a cross platform C++ logging system that is: Small Fast Robust Runs on Linux, Windows, OSX MIT License Small Two headers, and three

Apr 29, 2022
log4cplus is a simple to use C++ logging API providing thread-safe, flexible, and arbitrarily granular control over log management and configuration. It is modelled after the Java log4j API.

% log4cplus README Short Description log4cplus is a simple to use C++17 logging API providing thread--safe, flexible, and arbitrarily granular control

May 19, 2022
Colorful Logging is a simple and efficient library allowing for logging and benchmarking.

Colorful-Logging "Colorful Logging" is a library allowing for simple and efficient logging as well for benchmarking. What can you use it for? -Obvious

Feb 17, 2022
View and log aoe-api requests and responses

aoe4_socketspy View and log aoe-api requests and responses Part 1: https://www.codereversing.com/blog/archives/420 Part 2: https://www.codereversing.c

Apr 28, 2022
Portable, simple and extensible C++ logging library

Plog - portable, simple and extensible C++ logging library Pretty powerful logging library in about 1000 lines of code Introduction Hello log! Feature

May 19, 2022
A DC power monitor and data logger

Hoverboard Power Monitor I wanted to gain a better understanding of the power consumption of my hoverboard during different riding situations. For tha

May 1, 2021
An ATTiny85 implementation of the well known sleep aid. Includes circuit, software and 3d printed case design

dodowDIY An ATTiny85 implementation of the well known sleep aid. Includes circuit, software and 3d printed case design The STL shells are desiged arou

Nov 7, 2021
A BSD-based OS project that aims to provide an experience like and some compatibility with macOS

What is Helium? Helium is a new open source OS project that aims to provide a similar experience and some compatibiilty with macOS on x86-64 sytems. I

May 13, 2022
A revised version of NanoLog which writes human readable log file, and is easier to use.

NanoLogLite NanoLogLite is a revised version of NanoLog, and is easier to use without performance compromise. The major changes are: NanoLogLite write

May 17, 2022
Receive and process logs from the Linux kernel.

Netconsd: The Netconsole Daemon This is a daemon for receiving and processing logs from the Linux Kernel, as emitted over a network by the kernel's ne

Apr 28, 2022
Minimalistic logging library with threads and manual callstacks


May 11, 2022
Compressed Log Processor (CLP) is a free tool capable of compressing text logs and searching the compressed logs without decompression.

CLP Compressed Log Processor (CLP) is a tool capable of losslessly compressing text logs and searching the compressed logs without decompression. To l

Mar 16, 2022
A Fast and Convenient C++ Logging Library for Low-latency or Real-time Environments

xtr What is it? XTR is a C++ logging library aimed at applications with low-latency or real-time requirements. The cost of log statements is minimised

Apr 28, 2022
Log.c2 is based on rxi/log.c with MIT LICENSE which is inactive now. Log.c has a very flexible and scalable architecture

log.c2 A simple logging library. Log.c2 is based on rxi/log.c with MIT LICENSE which is inactive now. Log.c has a very flexible and scalable architect

Feb 13, 2022
✔️The smallest header-only GUI library(4 KLOC) for all platforms

Welcome to GUI-lite The smallest header-only GUI library (4 KLOC) for all platforms. 中文 Lightweight ✂️ Small: 4,000+ lines of C++ code, zero dependenc

May 13, 2022
Example code for collecting weather data from an ESP32 and then uploading this data to InfluxDB in order to create a dashboard using Grafana.

InfluxGrafanaTutorial Example code for collecting weather data from an ESP32 and then uploading this data to InfluxDB in order to create a dashboard u

May 12, 2022