Getting a £60M/year ecommerce site back to life in 10 minutes

Real life story

You get back home after a good day at work. It’s Christmas time, so business is doing well and the servers are getting busy, but everything seems normal and working as expected (let’s say… inside the OK thresholds). But just when you are relaxed at home, maybe preparing some food or having a beer, you hear that scary sound: our friend Nagios, the systems overlord.

You grab your phone, hoping it’s just some service getting a bit busier than expected (OK, it’s Xmas! It can happen!), but you see something like: CRITICAL – your image servers are losing 60% of their packets on the network. PANIC. You think it cannot be possible; those servers were coping fine with the traffic. Maybe it’s just the connection to the monitoring server, so you open the website and… MORE PANIC. Every page takes more than 10 seconds to load and doesn’t even have all its images; Chrome is displaying some funny icon instead of that pair of jeans on that gorgeous model.

So first, you check Analytics… Yeah, we are getting some big traffic, but is it that bad? While you try to work out what is happening, the face of your CEO pops into your head, armed with a shotgun: he will definitely kill you and, if you survive, you will need a new job.

A quick check on the monitoring graphs: something is going on in the network, but it doesn’t seem to be that bad… Let’s call the Data Centre: oh, yes, the firewall is dying, and that old forgotten server is dying too (let’s face it, you had forgotten that old server; it was serving just images!!). The provider might be able to provision a new firewall, but it won’t be ready for another hour or more, and replacing it will be a contractual nightmare.

You have the Head of Web Development on the line. He is ready to move the images to another domain served by the main servers, but that will definitely cause more problems. We are running out of time, Xmas is here and the CEO is loading shells into his shotgun, so we switch it over quickly. The site is back but definitely slower, as we are taking power from the app servers just to serve images, putting pressure on the rest of the network, the SAN and the firewalls. Did I say that it was the biggest day of the year? Maybe the biggest day in the company’s history? That we needed maximum speed? Did I mention the shotgun?

So, let’s call the sky, Cloud Power!!!

Amazon saved my life: let’s create a new Amazon CloudFront distribution and configure the DNS… Come on CloudFront!! 5 minutes?!?! Really?? Just for setting up a complete ad-hoc CDN, deployed across around 30 POPs around the world? 5 minutes!! Yes, I’m a speed addict, but, thinking a bit more about it, 5 minutes is OK for setting all that up :).

All looks good. Hey, Head of Dev, the CDN is now ready :D, can you run some quick checks? All looks good to him too, so, please, send it to production!!

A minute later, we are serving our images from the CDN. The cache seems to be working nicely; let’s give it a few minutes to warm up… We experience an incredible jump in speed on the site! When you view 200 products per page, the product images now load even quicker than the rest of the site!! This looks very good.
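If you wonder what “getting warm” means, here is a minimal sketch of a pull-through cache, which is roughly how a CDN edge behaves: the first request for each image goes back to the origin, and every later request is served from the edge. This is a toy model, not CloudFront itself; the class, the function names and the image paths are made up for illustration.

```python
from collections import Counter


class PullThroughCache:
    """Toy edge cache: serve from memory, fetch from the origin on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.stats = Counter()

    def get(self, path):
        if path in self._store:
            self.stats["hit"] += 1
        else:
            # Cold entry: pay the trip to the origin once, then keep the copy.
            self.stats["miss"] += 1
            self._store[path] = self._fetch(path)
        return self._store[path]


def hit_ratio(cache):
    """Fraction of requests served without touching the origin."""
    total = cache.stats["hit"] + cache.stats["miss"]
    return cache.stats["hit"] / total if total else 0.0


origin_requests = []


def origin(path):
    """Hypothetical origin server: records every request it has to serve."""
    origin_requests.append(path)
    return b"bytes of " + path.encode()


edge = PullThroughCache(origin)
catalogue = [f"/img/product-{n}.jpg" for n in range(200)]  # made-up paths

for path in catalogue:  # first page view: the cache is cold
    edge.get(path)
cold = hit_ratio(edge)

for _ in range(9):  # nine more page views: the cache is warm
    for path in catalogue:
        edge.get(path)
warm = hit_ratio(edge)

print(f"cold: {cold:.0%}, warm: {warm:.0%}, origin hits: {len(origin_requests)}")
```

In this toy run the origin only ever sees one request per image, no matter how many visitors load the page, which is exactly the pressure we wanted to take off our own servers.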

Hearing the Head of Dev singing and saying “Ohh!! That’s f*** fast!! Look at that!!” is almost priceless, so your day starts heading towards a good ending; tomorrow, instead of starting the search for a new job, you will be like “Hey! Had fun last night?! Thanks guys!” :). So, you send a humble, low-profile email to the CEO, saying something like “Oh, yes, I was just passing by and, as the server was not coping as expected (ahem… burning like hell), we have deployed a complete world-wide CDN solution. Oh, yes, it’s OK, normal everyday Sysadmin job, nothing to worry about”, and you have just saved your life and your £60M business, and avoided that awful email asking for explanations of what happened last night and what solutions you are providing to avoid this kind of issue next time.

Now, it’s time to grab another beer (or maybe some single malt scotch?) and say “Prost! Thank you Amazon, I love The Cloud!!”, and start realizing that you have made an impressive infrastructure change in just 10 minutes.

Just another normal day at the Systems Department.

The Cloud is here; don’t be afraid of it. It will probably speed up your business, save money, and keep Sysadmins (and CEOs) happy.

 

Cloud & Bolt image from University of New England, Maine.

Amazed by Technology <3

Today’s communications and technology are amazing, and I feel really enthusiastic about present and future possibilities. It’s incredible how many pieces work together for seemingly simple things that we have completely absorbed and got used to, to the point that we cannot live without them, yet we don’t realize what is happening behind the screen. And I can assure you: it’s amazing :D!!!

Let’s take, for example, a simple WhatsApp message: just a small piece of text, a string for IT people, a sentence for the rest of us mortals, just some letters (and perhaps symbols), one after another. That sentence can contain lots of emotions, information and hope; it could mean a lot to the one who writes it and also to the one who receives and reads it.

You just open an app on your smartphone, touching a piece of glass/plastic that senses the tiny electrical change caused by your body (just by the tip of your finger!) and sends that information to some sort of processor and software that interprets your touches and draws a letter on the screen (again, some electrical pulses, creating light, enabling pixels, drawing that smiley face). After you keep touching your screen for a while, you have your sentence, your piece of information, a rendering of your emotions, a representation of your speech, ready to be sent to the other person, anywhere in the world.

That’s the moment: you might read it again to confirm that it says what you expect, and then just one more touch, just some electricity captured by your device, and again, millions of small things start to work. Your app (WhatsApp here) builds a small chunk of data containing your text, your information, and the information of the receiver (that girl / boy / woman / man / kid / friend / lover / family / mate / partner at the other end of the line) and sends it to the operating system of your phone. Again, your OS wraps your small text a couple more times, adding the information that enables the network to move your message between several devices, cables, connections and microwaves in order to deliver that sentence, that emotion, to the other party.

As smartphones are small computers, your message gets wrapped with the information that allows computers to talk to each other, probably using common protocols like HTTP, TCP, IP, Ethernet, etc. etc. etc. Your message keeps being wrapped and unwrapped as it is sent across our great network (THE INTERNET!!), jumping from router to router (yes, you know… thingies that receive a chunk of electrical pulses, convert it to ones and zeros, interpret it, read basic information like the MAC and IP addresses, put it on a queue, take it off the queue, process it, add some, remove some, put it on another queue, convert it again to binary code, translate it to electricity (or maybe to light?), put it on the cable (or optical fibre), and send it). It keeps jumping and jumping, around 15 times, until it gets to some server, or some antenna, that converts it to microwaves, hitting your mate’s phone on the other side of the world, breaking through the OS, the protocol, the app and the screen, and making that amazing, unexpected, even scary “beep” that draws the attention of your friend.
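For the IT people in the room, all that wrapping and unwrapping can be sketched in a few lines. This is a toy model only: real TCP, IP and Ethernet headers are compact binary structures, not JSON, and WhatsApp’s actual message format is not shown here. The function names, the addresses (taken from documentation and locally administered ranges) and the port are assumptions made up for illustration.

```python
import json


def wrap(payload: bytes, layer: str, **header) -> bytes:
    """Prefix the payload with a length-tagged header for one layer."""
    head = json.dumps({"layer": layer, **header}).encode()
    return len(head).to_bytes(2, "big") + head + payload


def unwrap(frame: bytes):
    """Peel off the outermost header and return (header, inner payload)."""
    size = int.from_bytes(frame[:2], "big")
    header = json.loads(frame[2:2 + size])
    return header, frame[2 + size:]


def send(text: str, sender: str, receiver: str) -> bytes:
    """App builds the message, then each layer wraps it once more."""
    data = json.dumps({"from": sender, "to": receiver, "text": text}).encode()
    data = wrap(data, "TCP", dst_port=443)                     # hypothetical port
    data = wrap(data, "IP", dst="203.0.113.7")                 # documentation address
    data = wrap(data, "Ethernet", dst_mac="02:00:00:00:00:01")  # made-up MAC
    return data


def receive(frame: bytes):
    """Unwrap in reverse order: the outermost layer comes off first."""
    seen = []
    for _ in range(3):
        header, frame = unwrap(frame)
        seen.append(header["layer"])
    return seen, json.loads(frame)


frame = send("Hello :)", "me", "you")
layers, message = receive(frame)
print(layers)           # order in which the receiver peels the layers
print(message["text"])  # the original sentence, intact
```

The sender wraps from the inside out and the receiver unwraps from the outside in, so the last header added is the first one removed. Each router on the way only needs to look at the outer layers to decide where to send the chunk next.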

At that moment, the device in the other party’s pocket vibrates, she/he grabs it, clicks the power button, slides her/his finger and BOOM, you just hit her/him straight in the middle of the head, right in the eyes. Her/his retina interprets the light and produces some electrical pulses that are pushed along the nerves, hitting the brain and creating amazing chemical & electrical interactions that produce a torrent of emotions inside her/him, activating the thinking process in order to generate some reply (if you’re lucky enough 🙂 ) to your small chunk of data.

And you, here or there, maybe in London, maybe in New York, Paris, Bucharest, Kuala Lumpur or Cape Town, just tap your screen, hit send, and you reach the other person in Munich, Edinburgh, Buenos Aires, Rio de Janeiro, San Francisco, Mexico or perhaps Sydney, without ever noticing that lots and lots of small parts and systems, created by humanity, are working together so you can transfer your emotions from one side of the world to the other, in just a small fraction of a second.

Technology is amazing. And that was just a bird’s-eye view. Are you amazed?