Miles Crawford's Staff Page

MySQL founders introduce MySQL and its history.

This is cheerleading, and I understand that, but the real info content is low. For example, we are told that you should "Write code as good as you possibly can the first time" and "Use Modular Architecture." Eh, we know that.

They really do feel that their success comes from working with the community, to provide compilation on every platform, to support many languages as soon as possible, and to get feedback and bug work moving right along.

Connector/MXJ sounds neat though - a jar file that contains MySQL. Use it as a library and you can control a database server as a native Java object. Tee hee.

They're also suggesting that you write your own storage engine. It's easy, and there's no way a given engine is going to suit exactly the needs of your app.

Cute insert-read thing is possible with MyISAM: if you insert only at the end of the table, you get full-speed inserts and reads at the same time.

MySQL 5.0 is in beta 2. There are going to be lots of small, annoying bugs. They can only fix them if they know about 'em. Test it.

5.0 features: Procedures, Triggers, Views, XA (safe transactions across storage engines), big data dictionary (lots of meta info), precision math (56 digits of precision should be enough for most people, he says; you can recompile for more), strict mode (better warnings), federated storage (good clustering). Whoops, missed a few interesting ones.

Roadmap - Replication, clustering, hash and merge joins (hell yeah) and XPath.

MySQL Inc. is now handing out GUI tools with snazzy icons that help you browse queries and manage servers.

We're doing an in-depth example of migrating an Oracle database to MySQL with one of these tools. Not gonna happen. Looks pretty damned slick though, running on Windows. Hm. Oracle bashing. He did a quick intro to the general workbench one, which looked nice.

They're pitching the MySQL Network, a subscription support service.

Hot damn, we're gonna get CAKE with DOLPHINS on it for MySQL's 10th birthday!

RedHat Man Talking

Still doing a bit of cheerleading here - why does open source provide so many benefits?

Since we don't know where the next great innovation will come from or how it will change things, don't make investments that prevent you from taking advantage of whatever it may be.

Some interesting ideas about what it takes to succeed - you can usually get something 90% done with a few people, but the smoothing, the perfecting, the edge takes many, many, many more. Open development allows this to work.

He has some neat graphs about the efficiency of steam engines before and after the patent expired. Not sure how well that maps, but hey, why not? Go team.

This is way out there... He's talking about temperate zones in the Americas as some sort of explanation for the history of technological development. Here's the tie-in... There wasn't much of a tie-in.

Wow. Interesting numbers. If Microsoft keeps increasing security spending at the current rate, they will run out of money in 2008.

There's always someone better. Open source lets them come along. Writing bad code in the '80s is no excuse for spending 2 billion on security now.

Read Massive Change by Bruce Mau.

LiveJournal's Backend - A history of scaling

Brad Fitzpatrick

Started as a hobby, and just kept scaling.

Started with one server running Apache + Database, and then upgraded to a dedicated db server.

MySQL replication - all writes to the top, spread out the reads. This was done by extending the DBI to include get_read_dbh and get_write_dbh.
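
A rough sketch of that read/write split, in Python rather than their Perl. The two method names come from the talk; the HandlePool class, the host names, and the round-robin choice are my own assumptions, and plain strings stand in for real database handles.

```python
import itertools


class HandlePool:
    """Routes writes to the master and spreads reads across replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master when no replicas are configured.
        self._readers = itertools.cycle(replicas or [master])

    def get_write_dbh(self):
        # Every write goes to the top of the replication tree.
        return self.master

    def get_read_dbh(self):
        # Round-robin over the read replicas (an assumed policy).
        return next(self._readers)


# Hypothetical host names standing in for connection handles.
pool = HandlePool("master-db", ["replica-1", "replica-2"])
reads = [pool.get_read_dbh() for _ in range(4)]
print(pool.get_write_dbh(), reads)
```

The catch with any split like this is replication lag: a read right after a write may not see it unless that read is also sent to the master.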

Hmm. They partitioned the DB, assigning users to particular database clusters.

Use a dual-column primary key where you have an entity id and a namespace. Then set a read-only flag somewhere and use a tool to move things around. You need tools and harnesses to move partitioned data around!
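
Here is how I picture the move tool working, as a toy Python sketch. Everything in it is my assumption: dicts stand in for the clusters and the global lookup, and I'm reading the two-column key as (user id, per-user item id).

```python
# Global lookup: user id -> cluster name, plus the set of users being moved.
user_cluster = {1: "cluster_a", 2: "cluster_b"}
read_only = set()

# Each cluster keys its rows by (user id, per-user item id), so two users
# can both have an item 1 without colliding after a move.
clusters = {
    "cluster_a": {(1, 1): "first post", (1, 2): "second post"},
    "cluster_b": {(2, 1): "hello"},
}


def write_row(user_id, item_id, value):
    if user_id in read_only:
        raise RuntimeError("user is being moved; writes are blocked")
    clusters[user_cluster[user_id]][(user_id, item_id)] = value


def move_user(user_id, dest):
    src = user_cluster[user_id]
    read_only.add(user_id)  # block writes while the copy runs
    try:
        rows = {k: v for k, v in clusters[src].items() if k[0] == user_id}
        clusters[dest].update(rows)
        user_cluster[user_id] = dest  # repoint the global lookup
        for key in rows:
            del clusters[src][key]  # clean up the old copy
    finally:
        read_only.discard(user_id)


move_user(1, "cluster_b")
write_row(1, 3, "third post")  # lands on the new cluster
```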

DBI::Role - a library that allows them to do weighting and hot-replacement.
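
DBI::Role is Perl and I haven't looked at its internals, so this is only my guess at what weighting and hot-replacement could look like. The pick_host function, the host names, and the weights are all made up for illustration.

```python
import random


def pick_host(weights, dead=(), rng=random):
    """Pick a live host with probability proportional to its weight."""
    live = {h: w for h, w in weights.items() if h not in dead and w > 0}
    if not live:
        raise RuntimeError("no live database hosts")
    hosts = list(live)
    return rng.choices(hosts, weights=[live[h] for h in hosts], k=1)[0]


# Hypothetical role config: db1 should take about three times the traffic.
weights = {"db1": 3, "db2": 1, "db3": 1}

# Pretend db2 has been pulled out of rotation; no restart needed.
rng = random.Random(0)
picks = [pick_host(weights, dead={"db2"}, rng=rng) for _ in range(1000)]
print(picks.count("db1"), picks.count("db3"))
```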

There is one master that writes a global database for things like read-only flags and shit like that. This is given some redundancy by running it as a dual-master replication cluster.

They handle replication switching with the global masters, and numbering too, so there aren't any collisions.
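
The talk didn't spell out how the numbering works. One common way to keep two masters from colliding is to interleave their id sequences, which is what MySQL's auto_increment_increment and auto_increment_offset settings do. A sketch of the idea, assuming that approach (the id_sequence helper is hypothetical):

```python
import itertools


def id_sequence(offset, increment):
    """Yield offset, offset + increment, offset + 2 * increment, ..."""
    return itertools.count(start=offset, step=increment)


# Two masters, each stepping by 2 from a different starting point.
master_a = id_sequence(offset=1, increment=2)
master_b = id_sequence(offset=2, increment=2)

ids_a = [next(master_a) for _ in range(5)]
ids_b = [next(master_b) for _ in range(5)]
print(ids_a, ids_b)
```

Either master can hand out ids without asking the other, and the two sets never overlap.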

Use InnoDB all the time. MyISAM is good for logging and storage. InnoDB is faster and more reliable.

They use a special daemon that holds a connection open and log over that so there aren't too many connections.

They don't see much redundancy in replication, only scaling. A master-master setup is better for redundancy.

They agree that MySQL Cluster will be cool some day.

DRBD turns a pair of InnoDB machines into a cluster, using a floating IP to make it look like it's just one server. This includes a heartbeat that handles failover automatically. This sounds really good!

Caching - this is how they use memcached. Grr. Not enough time on this.

Their load balancer is home-grown, Perlbal. Sounds pretty fancy, it's a Perl daemon based on some fancy queueing libraries. It has ssh-in status info. Good balancing and dead node handling. It uses persistent connections to the mod_perl backend, and mod_perl can finish writing out large pages to a temp file, and pass a pointer to that file to Perlbal. Perlbal handles the rest of the file transfer while mod_perl gets busy again. Holy shit. It transparently hides 404s and 500s! That's insane.

MogileFS is a filesystem based on MySQL that does clustered filesystem redundancy. Yeah, they use HTTP transport to move this stuff around using Perlbal's PUT and DELETE methods. I'm not sure I even believe this...

mod_perl asks for a file, the tracker tells mod_perl where it is, and mod_perl goes to the storage nodes.
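
That lookup flow, as I understood it, in a toy Python sketch. The key name, the URLs, and the dicts standing in for the tracker and storage nodes are all hypothetical; the real thing goes over HTTP.

```python
# Tracker metadata: file key -> URLs of storage nodes holding a replica.
tracker = {
    "userpic:42": [
        "http://node1/dev1/42.fid",
        "http://node2/dev3/42.fid",
    ],
}

# Stand-in for the storage nodes. node1 is "down" (missing from the dict).
storage = {"http://node2/dev3/42.fid": b"image-bytes"}


def fetch(url):
    if url not in storage:
        raise ConnectionError(url)
    return storage[url]


def get_file(key):
    # Ask the tracker where the replicas live, then try each node in turn.
    for url in tracker.get(key, []):
        try:
            return fetch(url)
        except ConnectionError:
            continue  # that node is dead, try the next replica
    raise FileNotFoundError(key)


print(get_file("userpic:42"))
```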

Watch out for:

MyISAM - it sucks. Wow.

Apparently many disks lie about how soon or fast they write out data, so turn off disk write caching for all disks behind a RAID.

To save memory, they don't use persistent connections to MySQL (they do still keep connections to memcached open).

I asked about our memcached troubles: we saw it being slower to freeze/thaw our data than to pull it from the DB. Brad says don't serialize to memcached; store really granular things that don't need freezing. Also, if your database isn't loaded, don't bother with memcached, but it will help a lot when the DBs get bound.
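
What "granular" might look like for us, as a Python sketch. A dict stands in for memcached, and the key format, field names, and load_from_db helper are my own inventions: the point is one small scalar per key instead of one big frozen object.

```python
cache = {}     # stand-in for memcached
db_reads = []  # records every lookup that fell through to the database


def load_from_db(user_id, field):
    db_reads.append((user_id, field))
    row = {"name": "miles", "post_count": 12}  # made-up row
    return row[field]


def get_field(user_id, field):
    # One cache key per field, so there is no object to freeze or thaw.
    key = f"user:{user_id}:{field}"
    if key in cache:
        return cache[key]
    value = load_from_db(user_id, field)
    cache[key] = value
    return value


first = get_field(7, "post_count")   # miss: goes to the database
second = get_field(7, "post_count")  # hit: served from the cache
print(first, second, db_reads)
```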

Managing LAMP Stacks with OS Tools

Yazz Atlas

This is intended to be an overview of tools that can help.

Make sure your web servers are dumb boxes. Some places use diskless machines; this fellow recommends mounting everything important over NFS. One code push, which sounds nice.

His laptop sucks, and is not booting. Classic.

Communication Tools

Recommends TikiWiki, Jabber, and Request Tracker.

Security

Use sudo to keep an extensive root activity log. Use Nessus to do auditing. Osiris does some fancy system management security. Sweet! Sounds like the "ask me" system I ranted about wanting one day. Samhain is an intensely hideable IDS host monitor kinda thing. Stores its binary data in a gif. Hey, why not? Tripwire, obviously, is a more complete suite, but not scalable.

Monitoring

Nagios - recommended monitoring and notification suite. Supports distributed monitoring. Maybe we should do this instead of messing around with C&C's stuff if we can. Looks like a nice web-interface thing too, with logging.

Graphing - RRDtool does some nice graphing and logging stuff, related to MRTG.

Munin - graphing with the ability to send triggers to Nagios.

Cacti - Access controlled graphing.

Moodss - modular monitoring - you can write your own modules, and you can interoperate with Nagios as well.

Distributed State/Session made easy with MySQL

Phillip Morelock

He's an Evite employee.

Used their main database as a state engine.

So use session. Great if you have one server.

Distribute via the filesystem, which is slow and a hassle.

You can multicast it over the network via something like memcached (though his example doesn't do smart stuff like memcached does)

So let's use MySQL! Yay!

It's fast, so hey. Why not?

Use tmpfs to hold the session table! It's all memory no matter what.

They use the same concept as us, serialize the objects into a blob.

They also only store the results of expensive calculations to keep session small. Additionally, they delete old sessions every minute! This only leaves 300,000 sessions in the table at any given time.
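
A small sketch of that session table, with SQLite standing in for MySQL so it runs anywhere. The table layout, function names, and the 600-second max age are my assumptions; he didn't show the schema or give the timeout.

```python
import pickle
import sqlite3

db = sqlite3.connect(":memory:")  # stand-in for the MySQL session table
db.execute(
    "CREATE TABLE session (id TEXT PRIMARY KEY, data BLOB, touched REAL)"
)


def save_session(sid, obj, now):
    # Serialize the object into a blob, as described in the talk.
    db.execute(
        "REPLACE INTO session (id, data, touched) VALUES (?, ?, ?)",
        (sid, pickle.dumps(obj), now),
    )


def load_session(sid):
    row = db.execute(
        "SELECT data FROM session WHERE id = ?", (sid,)
    ).fetchone()
    return pickle.loads(row[0]) if row else None


def purge_sessions(now, max_age):
    # The periodic cleanup job: drop everything not touched recently.
    cur = db.execute(
        "DELETE FROM session WHERE touched < ?", (now - max_age,)
    )
    return cur.rowcount


save_session("old", {"cart_total": 10}, now=0)
save_session("new", {"cart_total": 25}, now=1000)
purged = purge_sessions(now=1200, max_age=600)
print(purged, load_session("new"))
```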

High concurrency in session makes InnoDB the ticket.

Scaling Panel

A panel. This was interesting but not useful. Not terribly technical.

Optimizing Subqueries and Joins

Inner joins allow MySQL to choose the table order; outer joins (LEFT, RIGHT) do not.

Multiply the row counts for each table in a join to see the total rows selected.
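
The arithmetic, with made-up numbers. The table names and per-table row counts below are hypothetical, the kind of figures you would read off the rows column of EXPLAIN.

```python
import math

# Hypothetical rows examined per table in a three-table join.
rows_per_table = {"users": 10, "posts": 200, "comments": 50}

# The join cost estimate is the product of the per-table counts.
estimated_rows = math.prod(rows_per_table.values())
print(estimated_rows)  # 10 * 200 * 50 = 100000
```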

Avoid outer joins unless you really need them!

join_buffer_size - what version of MySQL is this in? Ours - yikes!

Dependent subqueries, where the inner query references values from the outer, are very slow. Avoid them.
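
To make "dependent" concrete, here is my own example of one and a rewrite that aggregates once and joins. SQLite stands in for MySQL and the tables are made up; this only shows the two forms return the same rows, it says nothing about timing on a real server.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, created INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 100), (2, 1, 300), (3, 2, 200);
""")

# Dependent form: the inner query refers to u.id from the outer query,
# so it is conceptually re-run for every user row.
dependent = db.execute("""
    SELECT u.name,
           (SELECT MAX(p.created) FROM posts p WHERE p.user_id = u.id)
    FROM users u
    ORDER BY u.id
""").fetchall()

# Rewrite: compute the per-user maximum once, then join against it.
# Note the inner join drops users with no posts; the sample data has none.
rewritten = db.execute("""
    SELECT u.name, m.latest
    FROM users u
    JOIN (SELECT user_id, MAX(created) AS latest
          FROM posts GROUP BY user_id) m ON m.user_id = u.id
    ORDER BY u.id
""").fetchall()

print(dependent == rewritten)
```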

Use subqueries instead of temp tables where possible, because it ensures you get fresh data.

Order is important in where clauses - do as much narrowing as possible before you use subqueries.

No surprise to me, a subquery that produces a big list of ids to be used inside an IN (...) where clause is much faster than the join. WHY? Because the join has to do comparisons on ids and where clauses, whereas IN is just a list of straight primary key lookups. Still, seems dumb.
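
The two query shapes being compared, in a runnable form. Again SQLite stands in for MySQL and the tables and the cutoff value are mine, so this shows equivalence of results only, not the speed difference he measured.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # SQLite standing in for MySQL
db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, created INTEGER);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO posts VALUES (1, 1, 100), (2, 1, 300), (3, 2, 200), (4, 1, 400);
""")

# IN form: the subquery yields a list of ids, then users is hit by primary key.
in_rows = db.execute("""
    SELECT name FROM users
    WHERE id IN (SELECT user_id FROM posts WHERE created > 150)
    ORDER BY name
""").fetchall()

# Join form: ann matches two posts, so DISTINCT is needed to de-duplicate.
join_rows = db.execute("""
    SELECT DISTINCT u.name
    FROM users u JOIN posts p ON p.user_id = u.id
    WHERE p.created > 150
    ORDER BY u.name
""").fetchall()

print(in_rows, join_rows)
```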

MySQL Cluster features and roadmap

This all sits on top of NDB, the network database, developed by Ericsson.

This is not a terribly useful talk thus far - I can read the changelog for this. Hopefully we start talking about good uses for it, and/or estimates of when/if it's coming out of RAM.

Looks like they're planning on pulling non-indexed data down onto disk in MySQL 5.1, probably early 2006. Looks like indexed fields will always be in memory.

Replication also sounds clumsy; some improvements are coming down the line (5.1).