Hi All,
I know from experience how important it is for system and network admins to
have their infrastructure's performance data well organized. A graphical
representation of that data is an added advantage, and for production and
critical environments it is a must.
These requirements gave birth to monitoring systems. Today there are plenty of
commercial monitoring systems available, but they are pretty expensive.
What if I tell you that you can build your own monitoring system using purely
open source technologies?
Here it is.
Configuring/building a Performance Monitoring System using open source technologies
To build and configure a performance monitoring system, 3 main components are
needed.
1. Agent: collects and sends the performance metrics at a regular interval
2. Time series database: holds the data sent by the agent in a systematic manner
3. Web-based front end: queries the data from the DB and visualizes it in a human-friendly way
Though there are various options available in each of the above 3 categories,
in our case we used Telegraf as the agent, InfluxDB as the time series database
and Grafana as the front end. All 3 components are open source.
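To make the hand-off between agent and database more concrete, here is a minimal Python sketch of the InfluxDB line protocol, the text format Telegraf uses when it writes points to InfluxDB. This is not Telegraf's own code; the function names, the host tag and the sample values are made up for illustration.

```python
def escape_key(text):
    """Escape commas, equals signs and spaces in tag keys, tag values and field keys."""
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans, integers (i suffix), floats, quoted strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tag=value field=value timestamp."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape_key(k)}={format_field(v)}" for k, v in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


if __name__ == "__main__":
    # Hypothetical sample point; a real agent would read these values from the OS.
    print(to_line("cpu", {"host": "web 01"},
                  {"usage_idle": 97.5, "cores": 4},
                  1500000000000000000))
```

Each line is one point: the measurement name and tags identify the series, the fields carry the values, and the trailing number is the timestamp in nanoseconds.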
Reasons why we chose Telegraf, InfluxDB and Grafana.
1. Telegraf: It can be installed on Linux and Windows and its resource
consumption is negligible. It has built-in modules for various system and
application metrics such as system, Apache, MySQL, MongoDB, nginx and MSSQL,
targeting both Linux and Windows. It is also highly customizable, as we can
configure Telegraf to collect data from custom shell/batch/PowerShell scripts
(this many built-in modules and this kind of customization were not available
in the Telegraf alternatives we looked at).
2. InfluxDB: Before moving to InfluxDB I built the system using Elasticsearch's
ELK stack, but unfortunately its resource consumption was too high. InfluxDB,
by contrast, uses very few resources and supports SQL-like queries.
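As an illustration of what "SQL-like" means here, the sketch below builds an InfluxQL query and the URL for the /query endpoint of the InfluxDB 1.x HTTP API. I am assuming a 1.x server, a database named telegraf and the cpu measurement with its usage_idle field as written by Telegraf's cpu input; the host name and the helper function are hypothetical, so adjust them to your setup.

```python
from urllib.parse import urlencode


def build_query_url(base_url, database, measurement, field, hours=1, bucket="5m"):
    """Return an InfluxQL query and the matching InfluxDB 1.x /query URL."""
    influxql = (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f'WHERE time > now() - {hours}h GROUP BY time({bucket}), "host"'
    )
    # urlencode takes care of quoting the query text for use in a URL.
    params = urlencode({"db": database, "q": influxql})
    return influxql, f"{base_url}/query?{params}"


if __name__ == "__main__":
    # Hypothetical server address and database name.
    query, url = build_query_url(
        "https://influx.example.com:8086", "telegraf", "cpu", "usage_idle"
    )
    print(query)
    print(url)
```

The query averages idle CPU in 5-minute buckets per host over the last hour, which is the kind of statement Grafana sends to InfluxDB when it draws a panel.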
System Highlights
1. The agent's resource consumption is close to negligible
This Telegraf agent is configured to send Windows system metrics, file stat
metrics for one folder, and metrics from one custom-made PowerShell script. It
uses negligible CPU and approx. 12 MB of RAM.
2. The InfluxDB server's resource consumption is very low compared to its
alternatives
Currently 3 servers have been reporting their data to the InfluxDB server for
the last 10-12 days. InfluxDB is using negligible CPU, approx. 300 MB of RAM
and only 300 MB of disk for the database. We will keep monitoring its usage
once we add dozens of machines reporting to the InfluxDB server, to verify how
InfluxDB performs.
3. Data communication can be configured over HTTPS with SSL encryption
4. ACL for front end users
The front end can be configured with separate dashboards for every
organization/project, and we configure users to have access to their own
organization's/project's data only. Role-based access such as view only, edit
and admin can also be configured.
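For the HTTPS point above, here is a rough sketch of how a client could prepare an encrypted write to InfluxDB. It assumes the InfluxDB 1.x /write endpoint with HTTPS enabled on the server; the host, database, user name and password are placeholders, and in our setup Telegraf does this itself, so the code only illustrates what goes over the wire.

```python
import base64
import ssl
import urllib.request
from urllib.parse import urlencode


def build_write_request(base_url, database, lines, username=None, password=None):
    """Prepare a POST to the InfluxDB 1.x /write endpoint, refusing plain HTTP."""
    if not base_url.startswith("https://"):
        raise ValueError("refusing to send metrics over an unencrypted connection")
    url = f"{base_url}/write?{urlencode({'db': database, 'precision': 's'})}"
    body = "\n".join(lines).encode("utf-8")  # one line-protocol point per line
    request = urllib.request.Request(url, data=body, method="POST")
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def send(request):
    """Send the request, verifying the server certificate against the system CA store."""
    context = ssl.create_default_context()
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    # Placeholder values; nothing is sent until send() is called.
    req = build_write_request(
        "https://influx.example.com:8086", "telegraf",
        ["cpu,host=web01 usage_idle=97.5 1500000000"],
        username="telegraf", password="secret",
    )
    print(req.get_method(), req.full_url)
```

With a self-signed certificate, the default context will reject the connection unless the certificate is added to the trusted store, which is usually preferable to switching verification off.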
I have used this monitoring stack for monitoring server resources and Cisco
firewall resources. Below are the screenshots; a detailed blog post will be
available soon.
Hello,
Very nice article, however it's not clear which platform you built this on. All I see are exes for Telegraf and InfluxDB. Is Grafana an exe as well? If so, did you build this platform on Windows Docker or Windows Server?
All the components are running on Windows Server 2012.
Can you just provide the customised PowerShell scripts?
Any chance to share telegraf.conf?