Coding with Jesse

Debugging a slow web app

[Image: A watercolour illustration of a robot slowly wading through a swamp]

I got an email today from one of my clients, letting me know that one of his web apps was down. He said he was getting an error and asked me to take a look.

In the end, I was able to fix it and get it running faster than ever. The cause turned out to be a huge surprise to me. I thought I'd outline the steps I went through here, to help others trying to solve similar problems in the future.

See for yourself

The obvious first step was to go see for myself. I loaded up the web site, wondering what kind of error I would find. The site initially loaded fine, but then a spinner appeared and seemed to get stuck. After a long while, the main content of the site failed to load, and an error appeared about failing to parse JSON.

I opened up dev tools and refreshed, keeping an eye on the network tab. Most things were loading fine, including static assets. But I noticed some fetches to the web site API that were taking a long time. They eventually failed with a 504 Gateway Timeout error.

The web site is behind a load balancer, and I know load balancers generally have a timeout limit of around one minute. So all I knew was that these API calls were taking longer than that. I could only assume they would eventually succeed and were simply slow, but I wasn't totally sure.

Try to reproduce

Fortunately, I already had a dev environment for the site set up locally. I didn't write the whole application, but I had recently made some performance improvements to parts of the site. I wasn't very familiar with the API side of things though.

Sure enough, the app started up fine locally, and the data all loaded correctly. So I figured it probably wasn't an issue with the code itself.

I started to wonder what could have happened to break the site all of a sudden, when it had worked fine in the past. Did the server auto-update some dependency that was breaking something? Was the server out of disk space? Was the database out of memory?

Getting close to the metal

My next step was to ssh into one of the servers to see what was going on there. Everything seemed okay. I ran free -m to check on the memory, but the RAM usage was fine. I ran df -h to check on disk usage, but none of the disks were full. Running top, the CPU usage looked fine as well. I was a bit stumped.

I turned to look at the database. This site is running on AWS, so I logged on to the RDS admin in the console and checked the graphs in the monitoring tab. Everything seemed fine there too. CPU wasn't too high, there were barely any connections, and the database wasn't out of space.

Still, I knew these API requests were hanging on something. I went back to the code and looked at the API endpoints in question, and all they did was make a database query. At this point I was pretty sure it was database-related somehow.

Going into the database

I decided to log in to the production database using the mysql command-line tool. The database isn't accessible to the public, so only the production web server has access. I'd never gone in there before, so I looked at the config file for the server-side application to find the credentials (hostname, username, password and database name).

This is a MySQL database (MariaDB, actually), so once I got in, the first thing I ran was SHOW PROCESSLIST to see if anything stood out. There were a ton of queries in there, and many of them had been running for more than a minute. One had been sitting there for almost an hour!

Optimizing queries

Finally, I found the problem. All the slow queries were working with a single table. There was a mix of SELECT and UPDATE statements, so I figured the table was probably missing indices, or something else that was making the queries run slowly.

I ran SHOW CREATE TABLE xyz to see the structure of the table. I was wrong: there were lots of keys on it. Knowing that MySQL will generally only use one key per query, my next guess was that there were actually too many keys on the table, and that it would benefit from having fewer multi-column keys instead, targeted at these particular queries.
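
For example, if most of the slow queries filtered on the same two columns, a single composite key could cover the whole WHERE clause, something like this (a hypothetical sketch, reusing the placeholder names from above):

ALTER TABLE xyz ADD INDEX idx_thing_status (thing_id, status);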

I tested my theory by hand-writing a simple query to see how slow it would be. It was slow, but it only took about a second, not a minute. So that wasn't it.

I wondered if I was missing something. SHOW PROCESSLIST shows a summary of the running queries, but it cuts them off. A quick DuckDuckGo search later, and I found out you can run SHOW FULL PROCESSLIST to see each query in its entirety.
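
For anyone who hasn't run into it, the difference is:

SHOW PROCESSLIST;      -- truncates each query (the Info column) to its first 100 characters
SHOW FULL PROCESSLIST; -- shows every running query in full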

It was then that I discovered what the problem was. The queries were written exclusively using LIKE comparisons instead of =, e.g.:

SELECT *
FROM xyz
WHERE thing_id LIKE '12345678'
AND status LIKE 'ok'

Even the update statements used LIKE:

UPDATE xyz
SET status = 'ok'
WHERE id LIKE '12345678'

I found it unconventional to say the least. But was that really causing the problem?

I changed my hand-written query to use LIKE instead of =, and sure enough, I had to Ctrl+C to abort it after a long time of waiting.

I realised that yes, of course, this would slow things down. MySQL must have been scanning the entire table, converting the IDs from numbers to strings, and doing pattern matching on each one. No wonder it was running so slowly!
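
You can confirm this kind of thing with EXPLAIN. An = comparison on an indexed numeric column can use the index, while comparing it against a string pattern forces MySQL to cast and scan. A sketch with the placeholder names from above (the exact output depends on the table):

EXPLAIN SELECT * FROM xyz WHERE thing_id = 12345678;
-- typically an index lookup: type = ref, key = <the thing_id index>

EXPLAIN SELECT * FROM xyz WHERE thing_id LIKE '12345678';
-- typically a full table scan: type = ALL, key = NULL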

Closer to a solution

I searched the code base for "LIKE" and found the cause. Buried in a custom query builder a past developer had assembled, all query parameters were compared using either LIKE or IN, including in UPDATE statements.

I'm not totally sure what the developer was thinking here. Were they making it so you could search on any field easily? I'm not sure, because I wasn't able to find an example where fields were actually searched on anywhere. We may never know.

The problem was in the code base

I was surprised the problem was actually in the code base itself. But it made sense to me: this isn't the kind of problem that would have shown up locally, or even when the site first launched. It would have grown slowly as the table grew, and apparently only became a major issue once the table had over 750k records in it.

The solution seemed straightforward to me. I modified the API endpoints that used this table, and rewrote the queries directly instead of using this query builder code. (Side note: I've never liked query builders, and this is an excellent example of why!)

I would have liked to modify the query builder to replace LIKE with =, but because I wasn't sure whether that functionality was needed elsewhere, I thought it best to leave it alone and migrate away from the query builder instead.
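
For illustration, the rewritten queries didn't need to be anything fancy. Here's a minimal sketch of the general shape, assuming a Node.js backend with the mysql2 client (an assumption on my part, as are the function name and the placeholder table and column names from above):

import mysql from 'mysql2/promise';

const pool = mysql.createPool({ /* credentials from the server config file */ });

// a plain parameterized query: = comparisons, so the indexes actually get used
export async function getThingsByStatus(thingId, status) {
    const [rows] = await pool.execute(
        'SELECT * FROM xyz WHERE thing_id = ? AND status = ?',
        [thingId, status]
    );
    return rows;
}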

Ship it!

The last step was to commit and push the code, and roll out an updated version of the system. Shortly after the new version went live, I logged in to the database again and ran SHOW PROCESSLIST. Nothing but a bunch of idle connections! Perfect!

I went over to the AWS admin panel, and sure enough, the "Read IOPS/second" chart had dropped from a steady 100 down to 0! That was a nice reassurance that things were massively improved.

The site wasn't just working again, it was faster than ever!

Lessons learned

There are always lessons to be learned from any outage. Here are a few I learned today:

  1. You should definitely not use LIKE in your database queries to match numeric, indexed IDs. I've never done this, but now the fact has been ingrained deep in my brain.

  2. You probably shouldn't write your own query builders. Again, I've never liked query builders so this just reaffirmed my belief.

  3. You should maybe think about testing your web apps with a very large amount of dummy data in the database sometimes. I've never done this, as it's slow and seems a bit excessive, but I think I may start trying this in the future, particularly on systems where I expect the tables to grow enormously over time.

  4. SHOW FULL PROCESSLIST is a thing. Okay, that's not so much a lesson as it is something I learned that wasn't already in my tool belt.

All in all, the whole process took about an hour, and I'm glad I was able to get things back up and running quickly. Beyond the lessons learned, I got to know the system a little better, and have a lot of new ideas of ways I can improve and speed things up in the future, so it was definitely a worthwhile experience!

Published on February 24th, 2023. © Jesse Skinner

Web apps that last

[Image: A castle made of sand]

When you're building a new web application, or even a new feature, how can you ensure that you're not creating a nightmare code base that will need to be rewritten completely in a few years?

Some people will say it's hopeless to even try to write code that will last. I've even heard people suggest that you should aim to rewrite all your code every few years. That sounds like a very expensive, wasteful strategy.

In two decades of building web apps, I've seen many codebases start as a shiny new prototype and grow into a huge system. In some cases, they've become old, ugly, painful legacy systems that teams are begging to rewrite and replace. (And often those rewrites will themselves grow into ugly, painful legacy systems!) But sometimes a codebase will remain more or less unchanged a decade later, running smoothly as ever.

I believe there are some decisions you can make when writing code that will help it to last longer, and withstand the test of time.

Change is inevitable

Probably the one thing you can be sure of is that change will come. The goals of a business will change, and the people within a business will change. There will inevitably be features added, and existing features will evolve and be repurposed. The names of the products will almost surely change. So is it even possible to write code that doesn't need to change?

I think the key is in the phrase "If it ain't broke, don't fix it". Code that is fulfilling its task, that is doing what it's supposed to do, and is bug-free, is code that will last a long time.

Nightmare code

To understand how to write code that will last, let's think about the opposite: a nightmare codebase that demands to be rewritten. The worst I've seen is a web server written as a giant single file with thousands of lines of code. A system built like a house of cards, where changing one thing will break everything. Code that is very difficult to read or understand. Code that literally gives developers nightmares.

Unfortunately, this is often the kind of code that comes out of throwing together a quick prototype. A hero developer stays up late one night and churns out a first draft like a stream-of-consciousness. The next morning, the business owner is delighted to see their dreams come to life. Everyone's happy.

Then, they ask to change just one thing. Add this little feature. And this other feature. And now this user needs this other thing. And could you just change that other thing quick?

Months later, and this rough draft has accidentally become the foundation for a web application that continues to grow, held together with digital duct tape.

So how do you prevent this nightmare from unfolding?

Do one thing, and do it well

Modularity is extremely important in writing code that will last. And a good module is a piece of code that does one thing, and does it well. That one thing might be interfacing with a single database table. Or it could be handling HTTP calls on a single URL and passing the data to and from other modules that talk to the database.

I find generally that it works best when each module has zero, one or two major dependencies. With zero dependencies, you have a set of functions that receive input data, process it in some way, and return results. With one dependency, you have a set of functions that act as an abstraction or interface to that dependency. With two dependencies, you're writing code that bridges the gap between the two, acting as an adapter or controller.

More than two major dependencies, and you should ask yourself if there's any way to split things up into smaller pieces that are responsible for fewer things.

A dependency might not always be a program you have to install. Another module in your system is also a dependency. So is your business logic. I think about dependencies as anything that your module "knows about". This could even be the shape of certain data structures that might not have explicit type definitions.

The fewer things a module knows about, the more likely the module will be able to persist unchanged over time, because there will be fewer reasons to change it.

When your web application is built with small, independent modules that only do one thing, the chances are much, much lower that any of those pieces will need to be rewritten. And the chance of the whole application needing to be rewritten all at once drops to nearly zero. Even if you later want to do a major redesign, you'll find it easier to copy over lots of these older, simple modules to reuse in the new system.

Finally, a tangible example

Let's say you need to send out a Forgot Password email. You could do the whole thing in one file, but I would prefer to split it up like this:

  1. A module that knows how to actually send an email using AWS SES or something, but doesn't know the recipient, subject or body of the email. function sendEmail(toAddress, subject, body), for example.

  2. A module that knows about the subject and body of the Forgot Password email, but doesn't know who it's sending to or what the reset URL will be. function sendForgotPasswordEmail(toAddress, resetUrl)

  3. A module for the user table in the database, that has a function to generate a reset code, but doesn't know how the reset code will be used or even whether an email will be sent out. function createResetCode(userEmail)

  4. A module that knows about the URL structure of the site, and has a function that can generate a password reset link from a reset code. function getResetUrlFromCode(code)

  5. A module that ties everything together. It takes an email address, calls createResetCode, uses the code to call getResetUrlFromCode, and then passes the recipient address and reset URL to sendForgotPasswordEmail. forgotPassword(email) (sketched in code after this list)

  6. A user interface widget with a form, a text field and a button, so the user can type in their email address and click Send password reset link. When the form is submitted, it tells the user to go wait for the email.

  7. A module that is responsible for the server-side password reset part of the system. It receives the form submission, pulls the email address from the form data, calls forgotPassword, and then sends a success status back to the browser.
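
To make the split concrete, here's a minimal sketch of module 5 in JavaScript. The file paths are hypothetical; the function names come straight from the list above:

// forgotPassword.js - module 5, the only piece that knows about the other modules
import { createResetCode } from './db/user.js'; // module 3
import { getResetUrlFromCode } from './urls.js'; // module 4
import { sendForgotPasswordEmail } from './email/forgot-password.js'; // module 2

export async function forgotPassword(email) {
    // ask the user module to generate and store a reset code
    const code = await createResetCode(email);

    // turn the code into a full password reset URL
    const resetUrl = getResetUrlFromCode(code);

    // send the email; this module never sees the subject, body or email provider
    await sendForgotPasswordEmail(email, resetUrl);
}

Notice how little this module knows: if the email provider, the URL structure or the database changes, this file doesn't have to.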

Here, only a few modules are likely to change. You'll probably see changes to the sendForgotPasswordEmail function, as well as the user interface widget. All the other modules I've outlined are very reusable, and highly unlikely to change, unless you change your email sending provider, or your database software, or something else major. Even in those situations, the code that needs to change is very isolated and easy to replace without affecting anything else.

You can even improve on this further, by having the contents of the email be database-driven, so that non-technical staff members can change the email templates themselves through an admin interface. But an architecture like this is a good starting point that makes those sorts of changes simpler to make.

A good start

If you get in the habit of writing more modular code, and splitting things up as early as possible, then the next time you're throwing together a quick prototype, you'll be able to lean on those principles in the process.

Instead of a giant ball of tangled dependencies and logic, you'll be building smaller, simpler, reusable components that can be used as solid building blocks. Some of these will be so useful and generic that you'll even be able to reuse them in completely different systems without changing them at all.

Published on February 19th, 2023. © Jesse Skinner

Trying to decide what to do next? Follow the light.

[Image: A tree growing towards the light]

Happy New Year! I've been trying to come up with a New Year's resolution, and it got me thinking about setting goals, finding and following your purpose, and how this ties into some books I read this year.

TL;DR: If you're trying to decide what to do next in your life, in which direction you should expand and grow, maybe it helps to think like a tree and go where the sunshine is.

The purpose of life

The most interesting book I read in 2022 was The Romance of Reality, where Bobby Azarian does an amazing job applying Darwinism to the universe.

He says that the universe itself is a self-organizing system, with a bias towards increasing order, complexity and awareness. The idea is that the process of evolution was around before life emerged. Life formed at the bottom of the ocean around thermal vents, where extreme heat met extreme cold. Eventually, the first forms of life emerged there to capture that wasted energy and put it to use.

Fast forward to the present, and now we have complex life everywhere we look, actively consuming any and all food and energy available and using it to maintain the structures of our bodies, our systems and our species.

As lifeforms, humans are sentient agents of the universe whose purpose is to use our awareness and intelligence in order to optimise the conversion of available energy into complexity and order.

It's not hard to see how true this is. So much of what we do boils down to consuming energy (food, fuel, heat) so that we can create more order (clean homes, growing families, bigger cities, information, content creation). Pretty much every job is related either directly or tangentially to this process, or optimizing the process.

We've even dug towards the centre of the earth and the centre of the atom in order to unlock and consume more and more available energy and use it to create increasingly complex systems and structures.

So how do you fit into all this? And how can you use this perspective and knowledge to live a good life?

Following your dreams

I just finished reading The Alchemist by Paulo Coelho. It's a story about following your dreams. It's about a shepherd boy in Spain who dreams about finding treasure at the pyramids in Egypt. Following the guidance of those he meets along the way, he goes on a quest to literally follow his dream and see where it leads.

I've always liked stories about following your dreams. I've always tried to follow my own dreams. Once upon a time, I was stuck in the proverbial office job, and dreamed of the day I could be working from home, setting my own hours, choosing work I found interesting. I dreamed about buying a house, getting married and having children.

I followed those dreams, and soon started freelancing. Several years later, I bought a house, got married, and now have the family and life I'd always dreamed about.

So now I'm looking to the future, wondering where to go from here. There are so many possibilities that it's hard to focus and hard to decide.

Thinking like a tree

We're all in search of our potential. It's not just about finding happiness, but also finding activities that won't burn us out, that are sustainable in every sense of the word.

A tree will put more energy into the branches that get more sunshine, because opportunity creates a void that must be filled. Nature abhors a vacuum.

Thinking about our careers, interests and opportunities, it's as if we are trees with branches growing out in many directions, and we are trying to decide whether we should grow in this direction or that.

Like a tree, we can feel the energy coming from each branch to decide whether it's getting more sunshine or less. Sometimes this process is described as market research, trying to establish whether there is demand for a particular enterprise. Is it time to hire an employee? Or write a book? Or start a non-profit? Or teach? Or make a video game? Or take time off and travel? Or just hunker down and work harder doing the same things as ever?

Energy can take the form of light, or heat, or money, but also inspiration, joy, excitement and motivation. Which activities will give you a turbo boost and allow you to grow and expand further? Which are a dead end? You can also see which of your branches are already giving you more energy back. And you can expand carefully, incrementally, to get more feedback, to see if these directions are the right directions for you.

They need not expand forever in any given direction. Maybe there's a ton of energy available to move in the direction of, for example, publishing one small video game. But maybe that's also where it stops, and going all-in on video game development would be a terrible mistake. Or, maybe it opens up a new opportunity, one nobody could see or feel from here?

Follow the light

Follow your passions? What does that even mean? Instead, follow the light.

What is shining brightest to you right now? Where are your branches expanding to, and which of those branches are shouting "Go this way!!" Optimize for excitement. Learn how to convert some of that sunshine into food, by bringing joy to others (aka "providing value") such that others will be happy to give you sunshine tokens (aka "money") for the joy you bring.

If something excites you, it'll likely excite others. Because you do not live alone in a desert. If you can capture a bit of that sunshine out of the air, you can make it available to the whole world. And when you do, more energy will flow back from the world to you, as if to say "Yes, keep going!"

Follow the light. Capture excitement out of the air and share it with the world.

Published on January 1st, 2023. © Jesse Skinner

Add Mastodon replies to your blog

You can now comment on blog posts on Coding with Jesse! I turned off comments years ago, because I was getting tons of spam. But recently, with my return to social media, I decided to integrate Mastodon to give people a way to comment on and interact with my articles.

Initially, I wasn't sure how I would accomplish this. Mastodon has a ton of servers, and Mastodon search can only search for hashtags, so how would I know whether someone commented on my article? And how would I integrate it into my website?

I looked around at how some other blogs were handling this, and came across Webmentions, and particularly, Webmention.io. It's a web standard for communicating interactions across web servers! Perfect!

I naively assumed that Mastodon would automatically hit my server with notifications any time someone favourited, boosted or replied to a post that contained a link to my site. But alas, Mastodon doesn't do that for privacy reasons (understandably).

Fortunately, I'm not the first one to run into this problem, and so there's a free service available that solves this problem. If you sign up for brid.gy, you can link your social media accounts, including Mastodon, and brid.gy will automatically send webmentions to your site whenever one of your posts contains a link to your site, and people reply to, boost or favourite your post. Essentially, your Mastodon posts become an anchor for all the interactions on your blog posts.

With these two services in hand, here's how you can integrate Mastodon into your website the way I did:

1. Sign in to webmention.io.

You need to sign in with your website URL, and your GitHub account. Also, your blog needs to link to that GitHub profile with either <a href="https://github.com/jesseskinner" rel="me"> or <link href="https://github.com/jesseskinner" rel="me">, to prove that you own the site.

2. Add webmention tags to your blog

When you sign in, go to https://webmention.io/settings. Under Setup, you'll see these two link tags:

<link rel="webmention" href="https://webmention.io/username/webmention" />
<link rel="pingback" href="https://webmention.io/username/xmlrpc" />

Copy these and paste them into the <head> on your blog. These will tell other services (like brid.gy) where they need to send webmentions for your posts.

3. Sign up for brid.gy

Go to fed.brid.gy and, if you're like me, you'll want to click on "Cross-post to a Mastodon account", so that it'll integrate with your existing Mastodon account.

4. Post a link to a blog post on Mastodon

Try linking to your most recent blog post on Mastodon. If you already did this some time ago, brid.gy will scan your posts looking for links. You can also feed it a URL to a specific Mastodon post so that it will discover it.

Brid.gy will periodically poll your account looking for new interactions on these posts, and will send any new favourites, boosts or replies to webmention.io.

Note that your post itself doesn't count as a webmention; only the interactions on that post do. But you can reply to your own post as a way to trigger a webmention.

When I was setting this up, I was logged into both brid.gy and webmention.io, clicking the "Poll now" button on brid.gy and eagerly looking for interactions to show up. You have to have some patience here as well, as both services have a bit of a delay.

Once you see some mentions show up on webmention.io, you're ready to render them onto your blog.

5. Add the webmentions onto your website

Here's the trickier part. You'll need to hit the webmention.io API and fetch the mentions for your blog post. You can do this server-side if you want. My blog is static, so I needed to do it client-side.

Since the results are paginated, you can only get back 100 at a time. I wrote this function to help me retrieve all the pages at once, and sort the results into chronological order:

async function getMentions(url) {
    let mentions = [];
    let page = 0;
    let perPage = 100;

    // keep fetching pages until we get back less than a full page
    while (true) {
        const results = await fetch(
            `https://webmention.io/api/mentions.jf2?target=${url}&per-page=${perPage}&page=${page}`
        ).then((r) => r.json());

        mentions = mentions.concat(results.children);

        // a short page means this was the last one
        if (results.children.length < perPage) {
            break;
        }

        page++;
    }

    // sort chronologically, falling back to the time webmention.io
    // received the mention when there's no published date
    return mentions.sort((a, b) =>
        (a.published || a['wm-received']) < (b.published || b['wm-received']) ? -1 : 1
    );
}

Then, I used the results of this to pull out four things: the favourites, boosts, replies, and also a link to the original post where I can send other visitors to my blog if they want to "Discuss this article on Mastodon". Here's how that looks:

let link;
let favourites;
let boosts;
let replies;

const mentions = await getMentions(url);

if (mentions.length) {
    link = mentions
        // find mentions that contain my Mastodon URL
        .filter((m) => m.url.startsWith('https://toot.cafe/@JesseSkinner/'))
        // take the part before the hash
        .map(({ url }) => url.split('#')[0])
        // take the first one
        .shift();

    // use the wm-property to make lists of favourites, boosts & replies
    favourites = mentions.filter((m) => m['wm-property'] === 'like-of');
    boosts = mentions.filter((m) => m['wm-property'] === 'repost-of');
    replies = mentions.filter((m) => m['wm-property'] === 'in-reply-to');
}

Of course, you should replace the link to my profile with the link to your own. I'm taking the first mention (after sorting chronologically) that is interacting with one of my posts, and linking to that post URL.

With those in hand, you'll have everything you need to render the replies, boosts and favourites to your blog. My approach was to render just the avatars of everyone who boosted or favourited my post, and all the replies.

One thing to watch out for is that the content of each reply is HTML. To be safe (paranoid), I'm running the HTML through sanitize-html to make sure nobody can inject sketchy HTML into my site.
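
If you take the same approach, the sanitizing step might look something like this (the allowed tags are just my guess at a sensible minimal set; in the jf2 format, each reply's body lives in content.html):

import sanitizeHtml from 'sanitize-html';

// strip a reply down to harmless markup before rendering it
const safeHtml = sanitizeHtml(reply.content.html, {
    allowedTags: ['p', 'br', 'a', 'span'],
    allowedAttributes: { a: ['href'] }
});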

6. Allow people to share posts without mentions

For any posts that don't have any mentions, I added a different button, "Share this on Mastodon". When you click it, it runs this code, which prompts you for your Mastodon server (inspired by Advent of Code's share functionality):

// wrapped in a function here so the early return works; `post` and `url`
// come from whatever scope your click handler lives in
function shareOnMastodon(post, url) {
    const server = prompt('Mastodon Instance / Server Name?');

    if (server) {
        // test if server looks like a domain
        if (!server.match(/^[^\s]+\.[^\s]+$/)) {
            alert('Invalid server name');
            return;
        }

        const text = `"${post.title}" by @[email protected]\n\n${url}`;

        window.open(`https://${server}/share?text=${encodeURIComponent(text)}`);
    }
}

Yay for Mastodon comments!

I'm really happy with how this turned out. To add some placeholders on my old blog posts, I posted links to some of the more recent ones, so the interactions would have a place to live. For the older posts, I'm just relying on the share functionality.

I'm considering implementing some server-side functionality to replace either webmention.io or brid.gy in the future, so that the mentions live in my database instead of relying on a third-party service that may disappear one day. I think I could also skip the webmentions process by associating a Mastodon post URL with each blog post, and then using the Mastodon API of my server to periodically check for interactions and replies. Or maybe it could log in to my server and listen for notifications. But for now, this works really well.
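
That last idea could be fairly simple, too. Mastodon has a public endpoint that returns the reply tree for a status, so a sketch like this (the status ID is made up) would fetch the replies directly:

// fetch all replies to a given Mastodon post, no webmentions involved
const statusId = '109571234567890123'; // hypothetical post ID

const { descendants: replies } = await fetch(
    `https://toot.cafe/api/v1/statuses/${statusId}/context`
).then((r) => r.json());

Favourites and boosts have similar endpoints (favourited_by and reblogged_by).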

So from now on, whenever I write a new blog post, like this one, I'll be sure to share it on Mastodon and give readers a place to ask questions or discuss the article. Check the end of this blog post to see how it all looks, and be sure to favourite, boost or reply to my Mastodon post so that you show up on the page as well!

Published on December 27th, 2022. © Jesse Skinner

Advent of Code 2022

I've been really enjoying working on this year's Advent of Code. If you haven't heard of it, it's a series of coding puzzles, two a day for 25 days, from December 1st to December 25th every year. It only started a few days ago, so it's not too late to catch up. Or, if you're reading this later, you can always go back and try it out at your leisure. But, it is a lot of fun to wait until midnight each day to see what the next puzzle is.

What's cool about it is that you can use any programming language you want. You just need to take an input file, run a calculation based on the puzzle instructions, and come up with an "answer", which is usually a number of some kind.

You can use your favourite language to try to come up with an answer as fast as possible, or you can use it as an opportunity to strengthen your skills in another language, even a language you've never used before and want to try out!

You need to log in with GitHub, Google, Reddit or Twitter, and then you can download an input file for each day. You'll need to read the input file in to your language of choice, and parse and process each line of the file.

If you're really fast, you can even get on the leaderboard. But doing that requires completing the puzzles in just a few minutes at midnight EST so that you're one of the first 100 people to do so. I'm definitely not fast enough to even bother trying!

So far, I've been using JavaScript with Node.js this year. My approach is to pipe in the input into my puzzle solution like this:

node solution.js < input.txt

To do this, I'm using an npm library called split that simply splits a stream into lines, to make it easier to work with. Here's a simple example that just counts the number of lines in a stream:

import split from 'split';

// keep some variables in the module scope to keep track of things
let count = 0;

// read the input file as a stream from stdin (Standard Input)
process.stdin

    // pipe it to the split library, to split it into lines
    .pipe(split())

    // receive a callback for each line in the stream
    .on('data', (line) => {
        // do something with each line, in this case just counting
        count++;
    })

    // receive a single callback when the file is done
    .on('end', () => {
        // do something at the end, eg. console.log the output
        console.log(count);
    });

If you're interested, there is an online community on Reddit where you can share your solution and join in the discussion to see what others have done each day.

I'll be sharing my progress on Mastodon at @[email protected], so you can follow me on there for updates and commentary.

I've also been pushing my solutions to Advent of Code up to GitHub, so feel free to see how I've approached it, if you're interested. But no cheating! 😉

Published on December 4th, 2022. © Jesse Skinner