fluentd PostgreSQL hstore plugin
May be you already know the fluentd ,a very good log collector daemon.
fluentd sends logs with JSON format and have output plugins such as mongod or Amazon S3.
And also, I write the plugin it can output to the PostgreSQL hstore.
HStore is an extension of PostgreSQL which can store information with Key-Value. More about PostgreSQL hstore, see PostgreSQL document .
Here is the repository.
Install plugin
Just type gem install. easy.
% gem install fluent-plugin-pghstore
apache log
For example, if you want to apache access log which is collected using in_tail plugin to postgresql, write this config.
<source>
type tail
path /var/log/apache/access_log_sym
tag apache.access
format apache
</source>
<match apache.*>
type pghstore
database test
</match>
It generates,
tag | time | record
----------------+------------------------+---------------------------------------
{apache,access} | 2012-04-01 22:55:15+09 | "code"=>"200",
"host"=>"XXX.XXX.XXX.XXX", "path"=>"/", "size"=>"2608",
"user"=>"-", "agent"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X
10_6_8) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.83 Safari/535.11", "method"=>"GET",
"referer"=>"-"
fluetnd tag is splitted with "." and stored in the array. yes, PostgreSQL can manage array.
Ehe record column is hstore type. Above example is a little bit difficult to read, but all key and value is stored.
http
Next, add this config,
<source>
type http
port 9880
and type curl,
curl -F 'json={"log":"hoge"}' "http://localhost:9880/apache.curl"
then
tag | time | record
----------------+------------------------+---------------------------------------
{apache,access} | 2012-04-01 22:55:15+09 | "code"=>"200",
"host"=>"XXX.XXX.XXX.XXX",
{apache,curl} | 2012-04-01 23:28:44+09 | "log"=>"hoge"
Other schema data is inserted to the record column. hstore can add key dynamically, plugin-hstore can manage any type of input plugin.
What do you want to do?
After data inserted into the PostgreSQL, you can everything using SQL.
Getting UserAgent
SELECT
COUNT(*) AS c,
record->'agent'
FROM apache_log
GROUP BY record->'agent'
ORDER BY c;
Access count last 10 minitues
SELECT count(*) FROM apache_log WHERE time > (CURRENT_TIMESTAMP -
interval '10 min')
Statuc code last 10 minitues
SELECT
count(CASE WHEN record->'code' = '200' THEN 1 ELSE NULL END) AS
OK_200,
count(CASE WHEN record->'code' = '301' THEN 1 ELSE NULL END) AS
MOVED_301,
count(CASE WHEN record->'code' = '302' THEN 1 ELSE NULL END) AS
FOUND_302,
count(CASE WHEN record->'code' = '304' THEN 1 ELSE NULL END) AS
NOTMODIFIED_304,
count(CASE WHEN record->'code' = '401' THEN 1 ELSE NULL END) AS
UNAUTHORIZED_401
FROM apache_log
WHERE time > (CURRENT_TIMESTAMP - interval '10 min')
Limitation
However, nested JSON is not allowed such as...
'json={"log":"hoge", "nest":{"a":"hoge", "b":"hige"}}'
This is difficult because hstore itself is not allowed nest. If you really want to use such kind of structure, you may wait the JSON type in the PostgreSQL 9.2.
Since this plugin uses only one connection, there are possiblity to drop the under highly-loaded circumstance. fluentd itself has retry function, I think it never happen, but not sure.
Finally
mongodb is good but PostgreSQL is also good.