ApacheCon
Clustered Logging with mod_log_spread - Theo Schlossnagle
Spread background: a multicast technology that creates a ring of hosts (a spread) that can each push packets to the entire ring. This results in a huge saving in network traffic, as the packets are not repeated for each host.
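A back-of-the-envelope model of that saving. It assumes an idealized LAN where one multicast send costs one packet no matter how many ring members are listening, and it ignores Spread's own membership and ordering traffic. The function names and the traffic numbers are made up for illustration:

```python
def packets_per_log_line(consumers: int, multicast: bool) -> int:
    """Packets a web server puts on the LAN to deliver one log line.

    Idealized model: unicast sends a separate copy to every consumer,
    while one multicast send reaches the whole ring. Spread's own
    membership/ordering traffic is deliberately ignored.
    """
    if consumers < 0:
        raise ValueError("consumers must be non-negative")
    if consumers == 0:
        return 0
    return 1 if multicast else consumers


def lan_packets_per_second(hits_per_sec: int, servers: int,
                           consumers: int, multicast: bool) -> int:
    """Log packets per second for the whole cluster under the same model."""
    return hits_per_sec * servers * packets_per_log_line(consumers, multicast)


if __name__ == "__main__":
    # Hypothetical cluster: 4 web servers at 500 hits/s each,
    # 3 consumers (2 loggers + 1 monitor).
    for mc in (False, True):
        print("multicast" if mc else "unicast",
              lan_packets_per_second(500, 4, 3, mc))
```

This prints 6000 packets/s for unicast and 2000 for multicast. Adding a fourth consumer raises the unicast figure to 8000 but leaves the multicast figure unchanged, which is the whole pitch.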
Aggregating logs is a hassle, but you've got to do it. You need all the data, ordered by time, and it's nice to group log entries into sessions so you can see user paths and behavior.
This is tough on clusters, as anyone who has grepped logs on cat01, 04, 06 knows ;)
Also, real-time metrics are never available, and they sure are nice.
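Merging per-host logs and grouping them into sessions is simple once every line carries a timestamp. The sketch below assumes each host's log is already sorted by time and that a session ends after 30 minutes of inactivity. Both assumptions, the `Hit` fields, and the function names are hypothetical:

```python
import heapq
from collections import defaultdict
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    ts: float      # Unix timestamp of the request
    host: str      # cluster member that served it, e.g. "cat01"
    client: str    # client identifier (IP address or cookie)
    url: str


def merge_logs(*per_host: Iterable[Hit]) -> list[Hit]:
    """Merge already time-sorted per-host logs into one time-ordered list."""
    return list(heapq.merge(*per_host, key=lambda h: h.ts))


def sessionize(hits: Iterable[Hit], gap: float = 1800.0) -> dict[str, list[list[str]]]:
    """Split each client's hits into sessions at gaps longer than `gap` seconds."""
    sessions: dict[str, list[list[str]]] = defaultdict(list)
    last_seen: dict[str, float] = {}
    for h in hits:
        if h.client not in last_seen or h.ts - last_seen[h.client] > gap:
            sessions[h.client].append([])  # start a new session for this client
        sessions[h.client][-1].append(h.url)
        last_seen[h.client] = h.ts
    return dict(sessions)
```

`heapq.merge` streams the inputs lazily, so the same approach works on files too large to sort in memory. The `list()` call here is only for convenience.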
Existing fixes: network-enabled syslog is not bad, but it generates lots of TCP traffic. Passive logging (sniffing, basically) loses packets, and SSL traffic makes it prohibitive.
mod_log_spread is closer to network-enabled syslog, but it uses Spread to multicast, so no extra packets are generated even as the number of cluster members/logging boxes/monitors/etc. increases.
Spread config tips: It actually uses two ports, the one specified and the next one, for example TCP/UDP 4913 and UDP 4914. Also, when listing the hosts in the Spread, you must list them in the same order on every member of the ring!
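Both tips are easy to get wrong across a dozen boxes, so a pre-deploy sanity check can help. This is a minimal sketch, not part of Spread. The helper names are hypothetical, the two-port rule is taken from the talk as stated above, and it assumes you have already pulled each member's host list out of its config file:

```python
def spread_ports(port: int) -> tuple[int, int]:
    """Ports to open for a Spread daemon configured on `port`:
    the configured port itself plus port + 1, per the tip above."""
    if not 1 <= port <= 65534:
        raise ValueError("port must leave room for port + 1")
    return port, port + 1


def check_host_order(configs: dict[str, list[str]]) -> list[str]:
    """Return members whose host list differs (in content or order) from the first.

    `configs` maps a member name to the hosts in the order its config lists them.
    """
    items = list(configs.items())
    if not items:
        return []
    reference = items[0][1]
    return [name for name, hosts in items[1:] if hosts != reference]
```

An empty result from `check_host_order` means every member lists the ring identically. Any name it returns is a box whose config needs reordering.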
Replace mod_log_config with mod_log_spread in Apache on your content servers.
Then add log-consumer boxes to the ring: loggers (spreadlogd is perfect and stable for the task, and supports customizable logging methods such as file, SQL, etc.) and, if desired, monitors.
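The "customizable logging methods" idea amounts to routing each message to one or more pluggable sinks. The Python below is only a hypothetical sketch of that pattern, not spreadlogd's code, configuration, or API. It assumes messages arrive as `(group, line)` pairs already read off the ring, and it uses SQLite as a stand-in for "sql":

```python
import sqlite3
from typing import Iterable, Protocol, TextIO


class LogSink(Protocol):
    """Anything that can accept one log line."""
    def write(self, line: str) -> None: ...


class FileSink:
    """Append lines to an already-open text file."""
    def __init__(self, fh: TextIO) -> None:
        self.fh = fh

    def write(self, line: str) -> None:
        self.fh.write(line + "\n")


class SqliteSink:
    """Insert lines into an SQLite table."""
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS access_log (line TEXT)")

    def write(self, line: str) -> None:
        self.conn.execute("INSERT INTO access_log (line) VALUES (?)", (line,))
        self.conn.commit()  # per-line commit for simplicity; batch in real use


class MemorySink:
    """Keep lines in a list, e.g. to feed a live monitor."""
    def __init__(self) -> None:
        self.lines: list[str] = []

    def write(self, line: str) -> None:
        self.lines.append(line)


def dispatch(messages: Iterable[tuple[str, str]],
             routes: dict[str, list[LogSink]]) -> None:
    """Hand each (group, line) message to every sink registered for its group."""
    for group, line in messages:
        for sink in routes.get(group, []):
            sink.write(line)
```

Adding a new logging method means writing one more class with a `write` method. The dispatcher does not change.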
Now that you have real-time aggregated statistics, you can join the Spread ring however you like, say with a super cute little Cocoa app that shows real-time hits per second by response code, cluster member, user, etc., at no extra cost to the logging system at all.
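The counters behind a hits-per-second display can be a rolling window. This is a minimal sketch, assuming the monitor sees a `(timestamp, key)` pair for each hit in non-decreasing time order. The 10-second window and the `HitRate` name are arbitrary, hypothetical choices:

```python
from collections import Counter, deque


class HitRate:
    """Rolling hits per second per key (e.g. response code) over `window` seconds."""

    def __init__(self, window: float = 10.0) -> None:
        self.window = window
        self.events: deque[tuple[float, str]] = deque()
        self.counts: Counter[str] = Counter()

    def add(self, ts: float, key: str) -> None:
        self.events.append((ts, key))
        self.counts[key] += 1
        self._expire(ts)

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window ending at `now`.
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def rates(self, now: float) -> dict[str, float]:
        """Average hits per second for each key over the current window."""
        self._expire(now)
        return {k: n / self.window for k, n in self.counts.items()}
```

Keying on the cluster member or the user instead of the response code gives the other breakdowns. One `HitRate` per dimension is enough.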
It is now advisable to write your own log handler, so you can include whatever neat information you want in your logs, letting you analyze the ratio of response codes, unequal file sizes across cluster members, user paths in real time, etc.
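Two of those analyses in sketch form. The function names and input shapes are hypothetical. The size check assumes you feed it static 200 responses only, since dynamic pages legitimately vary in size:

```python
from collections import Counter, defaultdict


def code_ratios(codes: list[int]) -> dict[int, float]:
    """Fraction of requests that got each HTTP status code."""
    total = len(codes)
    if total == 0:
        return {}
    return {code: n / total for code, n in Counter(codes).items()}


def size_mismatches(entries: list[tuple[str, str, int]]) -> dict[str, dict[str, set[int]]]:
    """URLs whose response sizes differ between cluster members.

    `entries` holds (member, url, bytes) tuples. Returns url -> {member: sizes seen}.
    """
    seen: dict[str, dict[str, set[int]]] = defaultdict(lambda: defaultdict(set))
    for member, url, nbytes in entries:
        seen[url][member].add(nbytes)
    mismatched = {}
    for url, by_member in seen.items():
        # Flag the URL if two members reported different sets of sizes.
        if len({frozenset(sizes) for sizes in by_member.values()}) > 1:
            mismatched[url] = {m: set(sizes) for m, sizes in by_member.items()}
    return mismatched
```

A URL that shows up in `size_mismatches` is a good hint that one box has a stale or half-deployed copy of the file.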
WAN solution: run a Spread ring for each data center, and have one box that is a member of both, forwarding the packets one way through one pipe.
This looks freaking amazing. I cannot get the image of real-time user paths updating in an ncurses display out of my mind: WebQ Home -> Summary: Bio 101 -> Edit Survey ...
Apache Performance - Rich Bowen
This was basically an overview of the tools available and the hot spots to look for in optimizing Apache.
Tools:
- ab - Apache Benchmark, comes with Apache and we know about this: it slams one document at a configurable concurrency for a configurable number of requests (N), and generates some stats, most significantly time per request.
- flood - Has more finesse, in that it is driven by XML profiles. You may define paths, including branching and random choices, which it follows at benchmark speeds. Really good, task-sensitive load testing, but the docs are somewhat weak.
- siege - Not sure if this is a security/performance tool or a script kiddie toy, but it does a really really good job of trying to break your website.
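ab's headline numbers are simple arithmetic over the run. Here is a sketch that recomputes them, following my understanding of ab's definitions, for a run like `ab -n 1000 -c 10 http://example.com/`. The URL, the timings, and the function name are all made up:

```python
def ab_summary(requests: int, concurrency: int, seconds: float) -> dict[str, float]:
    """Recompute ab-style stats for `requests` completed at `concurrency`
    in `seconds` of wall-clock time."""
    return {
        # Throughput of the whole run.
        "requests_per_second": requests / seconds,
        # "Time per request (mean)": how long one client waited per request.
        "ms_per_request": concurrency * seconds * 1000 / requests,
        # "(mean, across all concurrent requests)": wall time divided by requests.
        "ms_per_request_all": seconds * 1000 / requests,
    }
```

For 1000 requests at concurrency 10 finishing in 2 seconds, this gives 500 requests/s, 20 ms per request as seen by each client, and 2 ms per request across all clients. Raising `-c` without raising throughput shows up directly as a longer per-client wait.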
Hotspots:
- Don't forget MaxClients - it caps the number of simultaneous children and queues the rest, which is really efficient and can prevent running out of RAM.
- DNS - don't use it. Ever. Log IP addresses (HostnameLookups Off) and look them up later if you need to; the time it takes to look up a hostname for a log entry is time that child cannot spend serving a client.
- .htaccess files - don't use them unless you really need them. Skipping them saves many file stats/checks per request. Use a <Directory> section in the main config instead.
- Logs - no more than you need.
- Process Creation - prefork for chrissake if you can't thread. Don't forget to tune your ttl.
- Content Negotiation - I don't know what this is, but it's apparently expensive.
- Symlinks - Always allow symlinks (Options FollowSymLinks) for best performance, as this is one less check per file retrieval. SymLinksIfOwnerMatch is useful but even more expensive, at 3 filesystem operations per file retrieval.
- Modules - Trim everything you're not using. Standard builds/distro builds are usually bloated in a big way.
- mod_mmap/mod_file_cache are great for static content. mod_file 2.0 is a good compromise as it caches only the filehandles and will never serve a stale page.
- Proxying allows you to keep a lean front end server and run custom servers tailored to other content.
- mod_gzip/mod_deflate - compress the HTML before you send it! It's text! It collapses like crazy, but set a good threshold, so you don't fire up the compressor for only 10kb, etc. Deflate is usually superior.
- mod_throttle, mod_bandwidth, and mod_bwshare can limit bandwidth usage by individual clients, but seem to cut them off instead of actually throttling.
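For MaxClients, the usual rule of thumb is to divide the RAM you can give Apache by the resident size of one child, so the box never swaps. Here is a sketch of that arithmetic with made-up numbers. The function name is hypothetical, and you should measure your own child size with top or ps:

```python
def max_clients(total_ram_mb: int, reserved_mb: int, child_rss_mb: float) -> int:
    """Largest MaxClients that keeps every httpd child in physical RAM.

    `reserved_mb` is what the OS, database, etc. need; `child_rss_mb` is the
    resident size of one Apache child under real load.
    """
    if child_rss_mb <= 0:
        raise ValueError("child_rss_mb must be positive")
    available = total_ram_mb - reserved_mb
    return max(0, int(available // child_rss_mb))
```

With 2048 MB of RAM, 512 MB reserved, and 12 MB children, that works out to 128. Anything above 128 just means swapping under load, and the listen queue is the cheaper place for those extra connections to wait.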
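Here is where the .htaccess stats come from. With AllowOverride enabled, Apache checks for an .htaccess file in every directory from the root down to the one holding the requested file. The sketch below lists those candidates for one request. It assumes overrides are allowed all the way down from `/`, the function name is hypothetical, and the path is the example used in the Apache .htaccess tutorial:

```python
from pathlib import PurePosixPath


def htaccess_candidates(file_path: str) -> list[str]:
    """.htaccess files Apache would look for when serving `file_path`,
    assuming AllowOverride is enabled for every directory from '/' down."""
    parent = PurePosixPath(file_path).parent
    # parent.parents runs from the nearest ancestor up to '/'; reverse it.
    dirs = list(parent.parents)[::-1] + [parent]
    return [str(d / ".htaccess") for d in dirs]
```

A request for /www/htdocs/example/index.html costs four lookups before the file is even opened, and that cost repeats on every request. AllowOverride None in a <Directory> section removes all of them.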
Chat with Rich (speaker for a couple of good Perl sessions) on IRC: DrBacchus on #apache on irc.freenode.net
Keynote - Building Software - Some guy named Doc
This opened, continued, and closed with a very, very extended analogy between software and construction.
Boils down to the important distinction between software prisons and modular, commodity-based systems that compete by being more amenable.
I thought that notetaking for this event was over, but I'd just like to record this thought: "The net is our cathedral." Thank you, goodnight.
Authentication, Authorization and beyond... auth_ldap - Graham Leggett
Ugh, this doesn't look good - it opens with a deeply technical but uninteresting overview of the trouble caused by platform differences.
I lost focus and looked into some other things - Patrick's summary was "They're never going to do what we want them to." Okay.