Telegraf and missing CPU interrupts

As I’ve been playing around with Telegraf and Grafana, I’ve noticed CPU interrupts and context switches are not apart of the standard metric gathering.

We know vmstat shows this information, and can be shown it in a easy processable list form:

$ vmstat -s
 1016888 K total memory
 497920 K used memory
 184412 K active memory
 173296 K inactive memory
 518968 K free memory
 70276 K buffer memory
 86416 K swap cache
 522236 K total swap
 88460 K used swap
 433776 K free swap
 18175 non-nice user cpu ticks
 0 nice user cpu ticks
 17799 system cpu ticks
 172172 idle cpu ticks
 7214 IO-wait cpu ticks
 0 IRQ cpu ticks
 2412 softirq cpu ticks
 0 stolen cpu ticks
 227755 pages paged in
 986944 pages paged out
 2353 pages swapped in
 121572 pages swapped out
 458705 interrupts
 1467529 CPU context switches
 1461773910 boot time
 3456 forks

I decided to hack together a little exec plugin for Telegraf using a Python script,
running the script will get you standard out JSON:

$ /cpu.py
{
 "CPU context switches": 1487011,
 "IO-wait cpu ticks": 7375,
 "IRQ cpu ticks": 0,
 "K active memory": 190392,
 "K buffer memory": 70756,
 "K free memory": 512988,
 "K free swap": 433776,
 "K inactive memory": 173296,
 "K swap cache": 86560,
 "K total memory": 1016888,
 "K total swap": 522236,
 "K used memory": 503900,
 "K used swap": 88460,
 "boot time": 1461773910,
 "forks": 3607,
 "idle cpu ticks": 183663,
 "interrupts": 465536,
 "nice user cpu ticks": 0,
 "non-nice user cpu ticks": 18424,
 "pages paged in": 227767,
 "pages paged out": 990560,
 "pages swapped in": 2353,
 "pages swapped out": 121572,
 "softirq cpu ticks": 2418,
 "stolen cpu ticks": 0,
 "system cpu ticks": 17902
}

The script is a little hacky, but gets the job done for these test:

cpu.py

#!/usr/bin/env python
import commands
import json

output = {}
raw = commands.getstatusoutput('vmstat -s')
raw = raw[1].split('\n')

for row in raw:
    value = int(row.split()[0])
    key = ' '.join(row.split()[1:])
    output[key] = value

print json.dumps(output)

All that is left is to add this script to Telegraf as a plugin:

$ cat /etc/telegraf/telegraf.d/cpu.conf
[[inputs.exec]]
 command = "/cpu.py"
 data_format = "json"
 name_suffix = "_cpu"
 interval = "5s"

And now we have access to all the counters shown in vmstat:

Screen Shot 2016-04-27 at 9.59.11 AM


Share on Facebook Share on Twitter Share on Google+

Python traceroute style tool

Recently, while talking with a some techies I was asked to explain how traceroute works. I was quick to answer what results I expect back from the command, and how to understand that data, but for the life of me I was having issues recalling how TTL works at the foundation.

Later that week I spent some time reading up on it, and click, it all came back.

To reinforce this I decided to write some code. I decided to use Scapy in order to make
crafting network packets a breeze (shown below). You should be able to get Scapy right from PyPi:

$ sudo pip install scapy

tracehop.py

#!/usr/bin/env python


# avoid printing some ipv6 debug stuff from scapy.
import logging
logging.getLogger("scapy.runtime").setLevel(logging.ERROR)

from scapy.all import IP, ICMP, sr1

from sys import argv


def tracehop(host_or_ip):
  """
  Return the network hops used to arrive at destination.
  """
  # craft our IP portion of the packet with destination address.
  ip = IP(dst=host_or_ip)

  # use a blank ICMP packet.
  icmp = ICMP()

  # set our time to live to 1, this will get our first hop.
  ip.ttl = 1

  # we will loop until getting a echo-response.
  while True:

    # send the ICMP packet and expect one response.
    res = sr1(ip/icmp, verbose=False)

    # print out the responding host source address.
    print res['IP'].src

    # if our ICMP type is 0 we've got
    # the echo response and can break while loop.
    # http://www.nthelp.com/icmp.html
    if res['ICMP'].type == 0:
      break

    # if we did not break above we increase the ttl by
    # one, and let the while loop run once again.
    ip.ttl += 1


if __name__ == '__main__':
  tracehop(argv[1])

The above code should be documented well enough to answer most questions.

Python is unable to create network sockets without root privileges. I do not advise running any script you find on the internet as root until reading over it, and fully understanding what may happen to your system and why root privileges are needed.

With that out of the way lets get right to the usage and output:

$ sudo python tracehop.py nessy.info
192.168.0.1
10.82.16.1
68.4.12.108
68.4.11.68
68.1.1.167
129.250.194.165
129.250.4.42
129.250.3.237
129.250.4.150
129.250.3.27
129.250.5.32
129.250.204.118
198.199.99.238
104.236.149.49

We can compare traceroute’s results to ours, and make sure the hops are relatively similar:

$ traceroute -n nessy.info
traceroute to nessy.info (104.236.149.49), 30 hops max, 60 byte packets
 1 192.168.0.1 19.107 ms 19.470 ms 19.315 ms
 2 10.82.16.1 19.208 ms 28.023 ms 28.233 ms
 3 68.4.12.108 28.142 ms 28.034 ms 27.901 ms
 4 68.4.11.68 27.799 ms 27.683 ms 27.557 ms
 5 68.1.1.167 27.426 ms 27.330 ms *
 6 129.250.194.165 27.095 ms 11.672 ms 18.996 ms
 7 129.250.4.42 27.821 ms 27.727 ms 32.801 ms
 8 129.250.3.237 15.742 ms 20.731 ms 17.218 ms
 9 129.250.4.150 28.956 ms 28.876 ms 28.787 ms
10 129.250.3.27 31.914 ms 32.112 ms 28.468 ms
11 129.250.4.118 28.356 ms 129.250.5.32 31.783 ms 25.326 ms
12 129.250.203.82 23.522 ms 30.280 ms 129.250.204.118 26.967 ms
13 198.199.99.238 26.921 ms 198.199.99.234 31.626 ms 198.199.99.254 27.053 ms
14 104.236.149.49 26.340 ms 26.614 ms 27.644 ms

Then again we could just proof this using traceroute, the -m flag sets our TTL:

$ traceroute -m 1 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 1 hops max, 60 byte packets
 1 192.168.0.1 (192.168.0.1) 3.398 ms 3.592 ms 3.482 ms

And then hack together quickly a bash script:

$ for i in $(seq 1 30) ; do
     traceroute -m $i -n nessy.info | tail -n 1 | egrep "$i " | awk '{print $2}'
  done

192.168.0.1
10.82.16.1
68.4.12.108
68.4.11.68
68.1.1.167
129.250.194.165
129.250.4.42
129.250.3.237
129.250.4.150
129.250.3.121
129.250.4.118
129.250.204.118
198.199.99.254
104.236.149.49

And now the inner-workings of traceroute should be stuck in my head,
never to be tripped up again, or at least that is the plan ;)


Share on Facebook Share on Twitter Share on Google+

Process Elasticsearch JSON on the shell

Lets throw security out the window for a moment. Say we store user accounts with clear text passwords in Elasticsearch, what is the easiest way to use the results in a shell script? We can begin by creating two accounts, one for admin and one for john:

# curl -XPUT localhost:9200/site/people/1?pretty=True -d '
  {"name": "admin", "password": "secret", "admin": "true"}
'
{
  "_index" : "site",
  "_type" : "people",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
 "created" : true
}
# curl -XPUT localhost:9200/site/people/2?pretty=True -d '
  {"name": "john", "password": "password", "admin": "false"}
'
{
  "_index" : "site",
  "_type" : "people",
  "_id" : "1",
  "_version" : 2,
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
 "created" : false
}

Using curl this is very easy to query, we just use the id:

# curl localhost:9200/site/people/2?pretty=True
{
  "_index" : "site",
  "_type" : "people",
  "_id" : "2",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "name" : "john",
    "password" : "password",
    "admin" : "false"
  }
}

But what we really want is to get the values of the keys without needing to hack it with sed/awk/grep. We can install a small json parsing package called jq:

# apt-get install jq

Now we can easily pull the keys out:

# curl -s localhost:9200/site/people/2 | jq ._source.name
"john"

# curl -s localhost:9200/site/people/2 | jq ._source.password
"password"

# curl -s localhost:9200/site/people/2 | jq ._source.admin
"false"

Using the power of jq we could easily search Elasticsearch and perform comparisons for login, remember we are throwing security out the window here, this is merely for example:

# curl -s $ip:9200/site/people/_search?q=name:admin | jq -r '.hits.hits'
[
  {
    "_source": {
      "admin": "true",
      "password": "secret",
      "name": "admin"
    },
    "_score": 0.30685282,
    "_id": "1",
    "_type": "people",
    "_index": "site"
  }
]

We can assume this query only returns one user, and we are pretty sure the account exists, so lets grab the password key directly:

# curl -s $ip:9200/site/people/_search?q=name:admin | 
  jq -r '.hits.hits[0]._source.password'
secret

If the value is not present, and we attempt to access it, we will get a null result:

# curl -s $ip:9200/site/people/_search?q=name:jack | 
  jq -r '.hits.hits[0]._source.password'

null

And here it is all wrapped up in a poorly written shell script, I would not advise using this logic for anything more than toying with:

auth.sh

#!/bin/bash

read -p "Login username: " username
read -p "Login password: " password

account=`curl -s localhost:9200/site/people/_search?q=name:$username | 
  jq '.hits.hits[0]'`

if [ "$account" != "null" ] ; then

  account_username=`echo $account | jq -r ._source.name`

  if [ "$username" == "$account_username" ] ; then

    account_password=`echo $account | jq -r ._source.password`
 
    if [ "$password" == "$account_password" ] ; then
      echo "You are in!"
    else
      echo "Wrong username or password"
    fi

  fi

fi

And usage looks like this:

# ./auth.sh
Login username: admin
Login password: password
Wrong username or password
# ./auth.sh
Login username: john
Login password: password
You are in!

 


Share on Facebook Share on Twitter Share on Google+

Telegraf laptop battery plugin

Wanted to expand a little on my previous blog post Custom Telegraf Plugin, and decided to do a simple battery monitor. The end result looks something like this:

Screen Shot 2016-04-11 at 5.40.47 PM

I decided to read from the file /sys/class/power_supply/BAT0/capacity on my Ubuntu 14.04 machine, this file merely shows the current battery percent:

# cat /sys/class/power_supply/BAT0/capacity
62

All that is needed is a little Python script for converting this output to JSON, my script outputs like this:

# /battery.py
{"status": "Charging", "capacity": 63}

The code is very basic Python:

/battery.py

#!/usr/bin/env python
import json

with open('/sys/class/power_supply/BAT0/capacity') as f:
 capacity = f.read().split('\n')[0]
 capacity = int(capacity)

with open('/sys/class/power_supply/BAT0/status') as f:
 status = f.read().split('\n')[0]

results = dict(capacity=capacity,
 status=status)

print json.dumps(results)

All that is left is to add this script to our Telegraf configuration directory:

# cat /etc/telegraf/telegraf.d/battery.conf
[[inputs.exec]]
 command = "/battery.py"
 data_format = "json"
 name_suffix = "_battery"
 interval = "5s"

And there you have it, a simple battery tracker.


Share on Facebook Share on Twitter Share on Google+

Custom Telegraf Plugin

I just started looking to InfluxDB and Telegraf for collecting data on a Linux machine, then visualizing it with Grafana. I’ve historically used collectd, statsite, and graphite to accomplish the same sort of task, but wanted to see how some of the new software compares.

I’m running a Ubuntu 14.04 LTS virtual machine, so feel free to follow along.

I managed to install the packages from the InfluxDB ubuntu repositories:

$ cat /etc/apt/sources.list.d/influxdb.list
deb https://repos.influxdata.com/ubuntu trusty stable

After adding the repo, and their GPG key, update and install the packages:

$ sudo apt-get update
$ sudo apt-get install influxdb telegraf

At the time of this writing Grafana just released 3.0 beta, but the packages are not yet in the repositories. I went ahead and followed the debian documentation to install directly from .deb.

$ dpkg -l grafana
||/ Name Version Architecture Description
+++-======-=====================-=====-=======
ii grafana 3.0.0-beta21459801392 amd64 Grafana

Next I made sure my services were running for these three bad boys:

$ service influxdb status
$ service telegraf status
$ service grafana-server status

At this point you should be able to hit the Grafana on the Linux machine, the service listens on tcp port 3000, so something like the following url should work:

http://127.0.0.1:3000

On the login screen login using the following credentials:

username: admin
password: admin

Add your InfluxDB service as a Datasource in Grafana:

View post on imgur.com

Next go ahead and create a new dashboard by clicking the Home dropdown on the top toolbar. Next from the left green widget create a new Graph Panel:

View post on imgur.com

If Telegrapf is running and collecting you should be able to see a few options in your FROM select measurements Graph options:

View post on imgur.com

What we are going to do now is add a new measurement to this section by using the Telegraf exec plugin:

$ cat /etc/telegraf/telegraf.d/custom.conf
[[inputs.exec]]
 command = "/silly.py"
 data_format = "json"
 name_suffix = "_custom"
 interval = "10s"

This plugin will call a silly Python script that returns the current second /silly.py:

#!/usr/bin/env python
import json
from datetime import datetime

s = datetime.now().second
print json.dumps(dict(seconds=s))

Once we have these in place we can restart the Telegraf service and should start to see some metrics:

$ telegraf -config /etc/telegraf/telegraf.conf -test
$ service telegraf restart

Lets use this new measurement, and tweak the Display a little bit:

View post on imgur.com

Now just for fun lets alter the Interval of our plugin from 10 seconds to 1 seconds:

$ cat /etc/telegraf/telegraf.d/custom.conf
[[inputs.exec]]
 command = "/silly.py"
 data_format = "json"
 name_suffix = "_custom"
 interval = "1s"

And of course restart Telegraf:

$ telegraf -config /etc/telegraf/telegraf.conf -test
$ sudo service telegraf restart

After letting it run for about a minute you should get something like this:

View post on imgur.com

Told you it was Silly :)


Share on Facebook Share on Twitter Share on Google+