Sunday, November 25, 2007

Event Correlation on a Budget

Log management and its wiser, older brother, event correlation, are processes that anyone in the security space is likely very familiar with. I've been dealing with them since day 0, but in the past year or more things have taken a more serious turn. Previously, logs were a last resort, and the people capable of wrangling them were much revered. Now there are plenty of standards, books, products and companies that attempt to make sense of your logs, and for good reason -- they are important. Logs will alert you to situations that most traditional monitoring systems are blind to. Proper log management is also a prerequisite if legal action ever becomes necessary. There is interesting shit in logs. Really. Look some time.

Let's be honest, though. Even wrangling the logs from your little desktop can be a complicated process -- it'll generate hundreds of log entries per day. A relatively unused server will generate upwards of a megabyte of logs per day. An active web, mail or shell server? Millions of entries, several gigabytes of logs in a single day. Now combine the logs from across your entire organization. Information overload.

There are plenty of products you can drop a pretty penny on that will, without a doubt, take you leaps and bounds beyond where you very likely sit right now. Some organizations have no log management. Some have centralized logging, but very few have anything further. If you are lucky, some hotshot has a script that tails the logs and looks for very specific error messages, which will save your tail.

I am a firm believer in the school of thought that before you go out and drop any sort of significant cash on a security product, you have to go out and really get your hands dirty. For me, that often means seeing what free solutions currently exist or, worst case, rolling your own.

In terms of free (as in beer) solutions, swatch, logwatch, SEC and OSSEC are among the top contenders, the latter two being the most powerful. Swatch lacks any real correlation abilities. Logwatch has some of these capabilities but suffers from essentially being a pile of large, horrifically ugly perl scripts that parse the logs. I've written many ugly perl scripts, and I fear for anyone who is not perl savvy and has to maintain a logwatch setup. SEC and OSSEC have very similar capabilities, though OSSEC is more targeted towards host-based intrusion detection (HIDS) by way of correlating security events within logs. It is a great approach, it is just not the solution that I decided to write about.

What follows is an abridged example of how I used SEC to get some much-needed event correlation up and running in an environment that sees anywhere between 500M and 50G of logs per day, depending on how you look at things and who you ask :). I say "abridged" because this ruleset is far from complete. In fact, if you take it as-is and set it loose on your logs, you will get a metric crapload of emails about things you probably already know about or otherwise don't care about. The reason here is two-fold. One, I don't want to give away all of my secrets. Two, I cannot tell you which log messages you should or should not care about. That is for you to learn and decide for yourself.

Save the snippet below as your SEC configuration file and then point SEC at some of the logs you are concerned with (for example, sec -conf=sec.conf -input=/var/log/syslog). It will give you a base from which you can:

  • Explicitly ignore certain messages
  • Alert on certain messages
  • Do minimal correlation on a per-host, per-service basis

Good luck and enjoy!
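One aside before the rules: every pattern below leans on the same `^.{14,15}\s+(\S+)` prefix to skip the syslog timestamp and capture the hostname. If you are adapting these rules to your own log format, sanity-check that prefix first. A quick Python sketch (the sample line is one of the sshd lines quoted in the comments below):

```python
import re

# The leading ".{14,15}" soaks up the syslog timestamp ("Nov 24 17:09:22"
# is 15 characters), then the first capture group grabs the hostname.
ssh_ok = re.compile(r'^.{14,15}\s+(\S+)\s+(sshd|SSHD)\[\d+\]: '
                    r'Accepted (password|publickey) .*')

line = ('Nov 24 17:09:22 dirtbag sshd[8819]: '
        'Accepted password for warchild from 192.168.0.6 port 53686 ssh2')

m = ssh_ok.match(line)
print(m.group(1), m.group(3))  # prints: dirtbag password
```

If that prints nothing but AttributeError, your syslog daemon is stamping lines differently than mine and the `.{14,15}` width needs adjusting.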

# ignore events that SEC generates internally
type=suppress
ptype=RegExp
pattern=^SEC_INTERNAL

# ignore syslog-ng "MARK"s
type=suppress
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+-- MARK --

# ignore cron,ssh session open/close
# Nov 23 00:17:01 dirtbag CRON[26568]: pam_unix(cron:session): session opened for user root by (uid=0)
# Nov 23 00:17:01 dirtbag CRON[26568]: pam_unix(cron:session): session closed for user root
# Nov 25 16:19:30 dirtbag sshd[13072]: pam_unix(ssh:session): session opened for user warchild by (uid=0)
# Nov 25 16:19:30 dirtbag sshd[13072]: pam_unix(ssh:session): session closed for user warchild
type=suppress
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(cron|CRON|sshd|SSHD)\[\d+\]: .*session (opened|closed) .*

# alert on root ssh
type=single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(sshd|SSHD)\[\d+\]: Accepted (password|publickey) for root from (\S+) .*
desc=$0
action=pipe '$0' /usr/bin/mail -s '[SEC] root $3 from $4 on $1' jhart


# ignore ssh passwd/pubkey success
#
# Nov 24 17:09:22 dirtbag sshd[8819]: Accepted password for warchild from 192.168.0.6 port 53686 ssh2
# Nov 25 16:19:30 dirtbag sshd[13070]: Accepted publickey for warchild from 192.168.0.100 port 57051 ssh2
type=suppress
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(sshd|SSHD)\[\d+\]: Accepted (password|publickey) .*



#############################################################################
# pile up all the su, sudo and ssh messages, alert when we see an error
# stock-pile all messages on a per-pid basis...
# create a session on the first one only, and pass it on
type=single
ptype=RegExp
continue=TakeNext
pattern=^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[([0-9]*)\]:.*
desc=$0
context=!$2_SESSION_$1_$3
action=create $2_SESSION_$1_$3 10;

# add it to the context
type=single
ptype=RegExp
continue=TakeNext
pattern=^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[([0-9]*)\]:.*
desc=$0
action=add $2_SESSION_$1_$3 $0;

# check for failures.  if we catch one, set the timeout to 15 seconds from now,
# and set the timeout action to report everything from this PID
type=single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[([0-9]*)\]:.*[Ff]ail(ed|ure).*
desc=$0
action=set $2_SESSION_$1_$3 15 (report $2_SESSION_$1_$3 /usr/bin/mail -s '[SEC] $2 Failure on $1' jhart)
#
##########

##########
# These two rules lump together otherwise uncaught messages on a per-host,
# per-message type basis.  The first rule creates the context which is set
# to expire and email its contents after 30 seconds.  The second rule simply
# catches all of the messages that match a given pattern and appropriately
# adds them to the context.
#
type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(\S+):.*$
context=!perhost_$1_$2
continue=TakeNext
desc=perhost catchall starter for $1 $2
action=create perhost_$1_$2 30 (report perhost_$1_$2 /usr/bin/mail -s '[SEC] Uncaught $2 messages for $1' jhart)

type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+(\S+):.*$
context=perhost_$1_$2
desc=perhost catchall lumper for $1 $2
action=add perhost_$1_$2 $0
#
###########


###########
# These two rules catch all otherwise uncaught messages on a per-host basis. 
# The first rule creates the context which is set to expire and email its
# contents after 30 seconds.  The second rule simply catches all of the messages
# that match a given pattern and appropriately adds them to the context.
#
type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+\S+:.*$
context=!perhost_$1
continue=TakeNext
desc=perhost catchall starter for $1
action=create perhost_$1 30 (report perhost_$1 /usr/bin/mail -s '[SEC] Uncaught messages for $1' jhart)

type=Single
ptype=RegExp
pattern=^.{14,15}\s+(\S+)\s+\S+:.*$
context=perhost_$1
desc=perhost catchall lumper for $1
action=add perhost_$1 $0
#
###########


###########
# These last two rules act similar to the above sets, the only exception being that
# they are designed to catch bogus syslog messages.
type=Single
ptype=RegExp
pattern=^.*$
context=!catchall
continue=TakeNext
desc=catchall starter
action=create catchall 30 (report catchall /usr/bin/mail -s '[SEC] Unknown syslog message(s)' jhart)

type=Single
ptype=RegExp
pattern=^.*$
context=catchall
desc=catchall lumper
action=add catchall $0
#
###########
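If the context mechanics in the su/sudo/ssh rules above feel opaque, here is a rough Python analogue -- an illustration only, not how SEC works internally, and it skips the timeout handling entirely. The names `feed` and `reports` are mine, not SEC's. The idea: buffer every message on a per-program, per-host, per-PID basis (mirroring the $2_SESSION_$1_$3 context names), and flush the whole buffer as a report only when a failure message shows up:

```python
import re
from collections import defaultdict

msg_re = re.compile(
    r'^.{14,15}\s+(\S+)\s+(sshd|sudo|su|unix_chkpwd)\S*\[(\d+)\]:(.*)')

contexts = defaultdict(list)  # (prog, host, pid) -> buffered raw lines
reports = []                  # stand-in for the /usr/bin/mail action

def feed(line):
    m = msg_re.match(line)
    if not m:
        return
    host, prog, pid, rest = m.groups()
    key = (prog, host, pid)
    contexts[key].append(line)  # the "add ... $0" action
    if re.search(r'[Ff]ail(ed|ure)', rest):
        # the "report" action: ship the whole session's worth of lines
        reports.append((key, contexts.pop(key)))

feed('Nov 25 16:19:29 dirtbag sshd[13072]: '
     'Invalid user admin from 192.168.0.6')
feed('Nov 25 16:19:31 dirtbag sshd[13072]: '
     'Failed password for invalid user admin from 192.168.0.6 port 53686 ssh2')

print(len(reports), len(reports[0][1]))  # prints: 1 2
```

The payoff is the same as in SEC: the eventual "[SEC] sshd Failure on dirtbag" mail contains every line that PID logged, not just the one failure line, which makes the report actually useful at 3am.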
