Hi, my name is Stephan. I have a Grafana VM running primarily to monitor our NetApp systems (29 of them), but I added InfluxDB and Telegraf to also monitor our VMware environment (1,000 VMs, 100 hosts). I have everything working except the Datastores usage capacity panel, which tells me there is no data. I changed the quick range from 5 minutes to 24 hours, but still no data. I think the table is timing out because of the number of datastores; there are 118 of them. Can anyone help me figure out why this is not working?
I'm running Telegraf 1.11.0, and this is what telegraf.log returns a couple of hundred times:
[agent] input "inputs.vsphere" did not complete within its interval
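To narrow it down, I can run the vSphere input once on its own in test mode, and query InfluxDB to see whether any datastore measurements arrive at all (I'm assuming the plugin writes measurement names starting with vsphere_datastore, which I haven't verified):

telegraf -config telegraf.conf -input-filter vsphere -test

influx -database vsphere -execute 'SHOW MEASUREMENTS WITH MEASUREMENT =~ /vsphere_datastore/'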
This is my telegraf.conf:
# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply surround
# them with ${}. For strings the variable must be within quotes (ie, "${STR_VAR}"),
# for numbers and booleans they should be plain (ie, ${INT_VAR}, ${BOOL_VAR})

# Global tags can be specified here in key="value" format.
[global_tags]
  dc = "us-east-1" # will tag all metrics with dc=us-east-1
  rack = "1a"
  ## Environment variables can be used as tags, and throughout the config file
  user = "$USER"
# Configuration for telegraf agent
[agent]
  ## Default data collection interval for all inputs
  interval = "60s"
  ## Rounds collection interval to 'interval'
  ## ie, if interval="10s" then always collect on :00, :10, :20, etc.
  round_interval = true

  ## Telegraf will send metrics to outputs in batches of at most
  ## metric_batch_size metrics.
  ## This controls the size of writes that Telegraf sends to output plugins.
  metric_batch_size = 20000

  ## Maximum number of unwritten metrics per output.
  metric_buffer_limit = 1000000

  ## Collection jitter is used to jitter the collection by a random amount.
  ## Each plugin will sleep for a random time within jitter before collecting.
  ## This can be used to avoid many plugins querying things like sysfs at the
  ## same time, which can have a measurable effect on the system.
  collection_jitter = "0s"

  ## Default flushing interval for all outputs. Maximum flush_interval will be
  ## flush_interval + flush_jitter
  flush_interval = "10s"
  ## Jitter the flush interval by a random amount. This is primarily to avoid
  ## large write spikes for users running a large number of telegraf instances.
  ## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
  flush_jitter = "1s"

  ## By default or when set to "0s", precision will be set to the same
  ## timestamp order as the collection interval, with the maximum being 1s.
  ##   ie, when interval = "10s", precision will be "1s"
  ##       when interval = "250ms", precision will be "1ms"
  ## Precision will NOT be used for service inputs. It is up to each individual
  ## service input to set the timestamp at the appropriate precision.
  ## Valid time units are "ns", "us" (or "µs"), "ms", "s".
  precision = ""

  ## Log at debug level.
  debug = false
  ## Log only error level messages.
  quiet = false

  ## Log file name, the empty string means to log to stderr.
  logfile = "/var/log/telegraf/telegraf.log"

  ## The logfile will be rotated after the time interval specified. When set
  ## to 0 no time based rotation is performed.
  logfile_rotation_interval = "1d"

  ## The logfile will be rotated when it becomes larger than the specified
  ## size. When set to 0 no size based rotation is performed.
  logfile_rotation_max_size = "0MB"

  ## Maximum number of rotated archives to keep, any older logs are deleted.
  ## If set to -1, no archives are removed.
  logfile_rotation_max_archives = 5

  ## Override default hostname, if empty use os.Hostname()
  hostname = ""
  ## If set to true, do not set the "host" tag in the telegraf agent.
  omit_hostname = false
###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
  ## The full HTTP or UDP URL for your InfluxDB instance.
  ##
  ## Multiple URLs can be specified for a single cluster, only ONE of the
  ## urls will be written to each interval.
  # urls = ["unix:///var/run/influxdb.sock"]
  # urls = ["udp://127.0.0.1:8089"]
  urls = ["http://127.0.0.1:8086"]

  ## The target database for metrics; will be created as needed.
  ## For UDP url endpoint database needs to be configured on server side.
  database = "vsphere"

  ## The value of this tag will be used to determine the database. If this
  ## tag is not set the 'database' option is used as the default.
  database_tag = ""

  ## If true, no CREATE DATABASE queries will be sent. Set to true when using
  ## Telegraf with a user without permissions to create databases or when the
  ## database already exists.
  skip_database_creation = false

  ## Name of existing retention policy to write to. Empty string writes to
  ## the default retention policy. Only takes effect when using HTTP.
  retention_policy = ""

  ## Write consistency (clusters only), can be: "any", "one", "quorum", "all".
  ## Only takes effect when using HTTP.
  write_consistency = "any"

  ## Timeout for HTTP messages.
  timeout = "5s"

  ## HTTP Basic Auth
  username = "xxx"
  password = "xxx"

  ## HTTP User-Agent
  user_agent = "telegraf"

  ## UDP payload size is the maximum packet size to send.
  udp_payload = "512B"

  ## Optional TLS Config for use on HTTP connections.
  # tls_ca = "/etc/telegraf/ca.pem"
  # tls_cert = "/etc/telegraf/cert.pem"
  # tls_key = "/etc/telegraf/key.pem"
  ## Use TLS but skip chain & host verification
  insecure_skip_verify = true

  ## HTTP Proxy override, if unset values the standard proxy environment
  ## variables are consulted to determine which proxy, if any, should be used.
  # http_proxy = "http://corporate.proxy:3128"

  ## Additional HTTP headers
  # http_headers = {"X-Special-Header" = "Special-Value"}

  ## HTTP Content-Encoding for write request body, can be set to "gzip" to
  ## compress body or "identity" to apply no encoding.
  content_encoding = "identity"

  ## When true, Telegraf will output unsigned integers as unsigned values,
  ## i.e.: "42u". You will need a version of InfluxDB supporting unsigned
  ## integer values. Enabling this option will result in field type errors if
  ## existing data has been written.
  influx_uint_support = false
###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

# Read metrics about cpu usage
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
  ## If true, compute and report the sum of all non-idle CPU states.
  report_active = false

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default stats will be gathered for all mount points.
  ## Set mount_points will restrict the stats to only the specified mount points.
  # mount_points = ["/"]
  ## Ignore mount points by filesystem type.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

# Get kernel statistics from /proc/stat
[[inputs.kernel]]
  # no configuration

# Read metrics about memory usage
[[inputs.mem]]
  # no configuration

# Get the number of processes and group them by status
[[inputs.processes]]
  # no configuration

# Read metrics about swap memory usage
[[inputs.swap]]
  # no configuration

# Read metrics about system load & uptime
[[inputs.system]]
  ## Uncomment to remove deprecated metrics.
  # fielddrop = ["uptime_format"]
###############################################################################
#                            SERVICE INPUT PLUGINS                            #
###############################################################################

## Realtime instance
[[inputs.vsphere]]
  interval = "60s"
  vcenters = [ "xxx" ]
  username = "xxx"
  password = "xxx"
  insecure_skip_verify = true
  force_discover_on_init = true
  ## Exclude all historical metrics
  datastore_metric_exclude = ["*"]
  cluster_metric_exclude = ["*"]
  datacenter_metric_exclude = ["*"]
  collect_concurrency = 5
  discover_concurrency = 5
## Historical instance
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "xxx" ]
  username = "xxx"
  password = "xxx"
  insecure_skip_verify = true
  force_discover_on_init = true
  host_metric_exclude = ["*"] # Exclude realtime metrics
  vm_metric_exclude = ["*"] # Exclude realtime metrics
  datastore_metric_include = ["*"]
  cluster_metric_include = ["*"]
  datacenter_metric_include = ["*"]
  max_query_metrics = 256
  collect_concurrency = 4
  discover_concurrency = 4
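Given the "did not complete within its interval" warnings, what I'm considering is giving the historical instance more time per request, roughly like the snippet below. The timeout option does exist in the vsphere input (it defaults to 60s), but the values here are just my guesses, not something I've confirmed fixes anything:

## Historical instance (sketch of what I'm considering; values are assumptions)
[[inputs.vsphere]]
  interval = "300s"
  vcenters = [ "xxx" ]
  username = "xxx"
  password = "xxx"
  insecure_skip_verify = true
  host_metric_exclude = ["*"] # Exclude realtime metrics
  vm_metric_exclude = ["*"]
  ## Allow each vCenter request more time than the default 60s,
  ## since 118 datastores may not fit in one collection cycle.
  timeout = "180s"
  max_query_metrics = 256
  collect_concurrency = 4
  discover_concurrency = 4

Does raising the timeout sound like the right direction, or is there something else in my config that would explain the missing datastore capacity data?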