Handling production logs
By Mathieu Lecarme, Friday, 11 February 2011, 21:20
Logs, the big files that fill up your hard drives, can also tell you things, such as what happened before a crash or why the service was strangely slow.
You can add temporary logging for slow queries (MySQL and php-fpm both provide such a service). Be careful: logging is not your servers' main activity, and the real work has to get done without probing servers to death. Applications and hardware provide data, but never forget to use application-specific information: data that speaks to you, like page serving time, number of logged-in users, or growth of users' data.
The basic tools for understanding logs are tail, grep, wc, sort, uniq, and all that kind of pipeable stuff. Facebook provides titanic tools for handling iceberg-sized volumes of logs. There is room between the two. There are a few steps to understanding logs.
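As a point of comparison before going through those steps, a classic pipeline such as grep, then sort, then uniq -c can be sketched in a few lines of Python. This is only an illustration: the function name top_matches, the whitespace-separated layout, and the sample lines are assumptions, not something taken from a real server.

```python
from collections import Counter


def top_matches(lines, needle, field, limit=10):
    """Count the values of one whitespace-separated field
    among the lines that contain `needle` (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue  # the "grep" part
        parts = line.split()
        if field < len(parts):
            counts[parts[field]] += 1  # the "sort | uniq -c" part
    return counts.most_common(limit)  # the "sort -rn | head" part


# Made-up sample lines: client IP, method, path, status.
sample = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.1 GET /missing 404",
    "10.0.0.1 GET /other 404",
]

# Which clients trigger the most 404 responses?
for ip, hits in top_matches(sample, " 404", field=0):
    print(hits, ip)
```

The same function accepts an open file object instead of a list, so it can be pointed at a real log once the field positions are known.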
Reading
The first step is reading logs. Some logs come from plain old files, others from sockets, and they can be piped. Syslog is an elder Unix worker. Its protocol now looks quaint and naive, but it is a de facto standard, and rsyslogd is a nice product which can route some of its incoming logs to another server.
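To give an idea of how simple the syslog wire format is, here is a minimal sketch that splits the leading priority from a message, assuming the traditional BSD layout where a line starts with a number between angle brackets and that number equals facility times 8 plus severity. The function name split_priority is hypothetical, and the rest of the message is deliberately left unparsed.

```python
import re

# A traditional syslog message starts with "<PRI>", PRI being 1 to 3 digits.
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def split_priority(message):
    """Return (facility, severity, rest), or None if there is no <PRI> prefix."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    # PRI = facility * 8 + severity, so divmod recovers both numbers.
    facility, severity = divmod(int(match.group(1)), 8)
    return facility, severity, match.group(2)


facility, severity, rest = split_priority(
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed"
)
print(facility, severity, rest)
```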
Parsing
Most logs are one-line ASCII strings. Basic, but grep friendly. Some data is directly usable; other data needs processing, like resolving an IP address to a name or a geographic location, or guessing the web browser.
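As an illustration of parsing such one-line records, the sketch below turns a web server line into a dictionary. It assumes the common log format used by many web servers (client, identity, user, bracketed time, quoted request, status, size); the names CLF_RE and parse_line are made up for this example, and a real log may need a different pattern.

```python
import re

# Assumed layout: ip ident user [time] "request" status size
CLF_RE = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of fields, or None when the line does not match."""
    match = CLF_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no body was sent; store it as 0.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
record = parse_line(line)
print(record["ip"], record["status"], record["request"])
```

Steps such as reverse DNS, geolocation, or browser detection would then work on record["ip"] or on further fields, rather than on the raw string.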
Filtering
Log lines can be boring, or hidden like a needle in a haystack. The date is an important filter: what happened when the server was so slow?
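Filtering on a time window could look like the sketch below. It assumes, for simplicity, that every line starts with an ISO 8601 timestamp followed by a space; the function name in_window and the sample lines are invented for the example, and other timestamp formats would need their own parsing.

```python
from datetime import datetime


def in_window(lines, start, end):
    """Yield the lines whose leading timestamp falls in [start, end)."""
    for line in lines:
        stamp, _, _rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if start <= when < end:
            yield line


# Made-up sample: timestamp, request, serving time.
sample = [
    "2011-02-11T21:19:58 GET /index 0.2s",
    "2011-02-11T21:20:03 GET /search 4.8s",
    "2011-02-11T21:20:41 GET /search 5.1s",
    "2011-02-11T21:25:00 GET /index 0.2s",
]

# Keep only the minute during which the server felt slow.
slow_period = list(in_window(
    sample,
    datetime(2011, 2, 11, 21, 20),
    datetime(2011, 2, 11, 21, 21),
))
for line in slow_period:
    print(line)
```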
Storing
Logs are immutable, append-only data. It's a specific case, well handled by specific storage, and few will cry if you lose some data. Files are easy to handle, but an index can help you a lot. Logrotate helps you compress and erase old logs periodically. RRD is a perfect way to store data in a fixed-size store.
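The fixed-size idea behind a round-robin store can be sketched with a bounded queue: once the store is full, each new value pushes the oldest one out. This is not the RRD file format, only an in-memory illustration of the principle, and the class name RoundRobinStore is hypothetical.

```python
from collections import deque


class RoundRobinStore:
    """Keep only the most recent `slots` values (illustrative, in memory)."""

    def __init__(self, slots):
        # A deque with maxlen silently drops the oldest item when full.
        self.values = deque(maxlen=slots)

    def add(self, value):
        self.values.append(value)

    def average(self):
        if not self.values:
            return None
        return sum(self.values) / len(self.values)


store = RoundRobinStore(slots=3)
for serving_time in [1, 2, 3, 4, 5]:
    store.add(serving_time)

# Only the last three values survive, whatever the number of additions.
print(list(store.values), store.average())
```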
Querying
Now you can query and dig into these files. Graphing helps you visualize your data. You can find where the trouble happened, where the bottleneck is, how many users your service can handle, and how many servers you will have to buy. Such questions can be answered.
Logator tries to answer these questions. It is currently a work in progress. Ip2something is a fast geolocation tool which makes IPs more usable and logs more understandable. Node-logator is a prototype for graphing the location of connections in real time.
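A very small query of this kind is counting hits per minute and finding the busiest minute. The sketch below assumes lines that start with an ISO 8601 timestamp and draws a crude text graph; the function name hits_per_minute and the sample data are made up for the example.

```python
from collections import Counter


def hits_per_minute(lines):
    """Count lines per minute, using the leading ISO 8601 timestamp."""
    counts = Counter()
    for line in lines:
        stamp = line.split(" ", 1)[0]
        counts[stamp[:16]] += 1  # keep "YYYY-MM-DDTHH:MM", drop the seconds
    return counts


# Made-up sample lines.
sample = [
    "2011-02-11T21:19:58 GET /index",
    "2011-02-11T21:20:03 GET /search",
    "2011-02-11T21:20:41 GET /search",
    "2011-02-11T21:21:10 GET /index",
]

counts = hits_per_minute(sample)
busiest, hits = counts.most_common(1)[0]

# One bar per minute: a poor man's graph.
for minute in sorted(counts):
    print(minute, "#" * counts[minute])
print("busiest minute:", busiest, "with", hits, "hits")
```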

