Building & Debugging Custom Container Images for Lambda

Jun 07, 2021

So I've got a side-project that's one of those somewhat frustrating endeavours. It's just useful enough that I don't want to shut it down, but not important enough to justify much of my focus. Which in itself is fine. But inevitably something happens that means it suddenly requires somewhat urgent focus from me: a CVE in a dependency that might expose the data I've stored in it, or the decommissioning of a service I'm dependent on. It's then that I inevitably discover that the bulk of what I'm running is no longer officially supported. Heroku has stopped supporting the version of Ruby I'm using. The Postgres version I'm using is woefully out of date. All manner of local development tooling is apparently deprecated.

And so what seemed like it would be a relatively trivial version bump means upgrading all the things.

Taking ownership of erosion resistance

One of the propositions of Heroku was that it was "erosion resistant". Which, to their credit, they've done a commendable job of honouring; I've had joke apps still running there almost a decade later. But they don't make any promises about supporting a particular stack forever. Nor do I think they should. But I also don't love the upgrade-everything-at-once dynamic that happens when I do need to fix something.

It's been a long-standing desire to refactor some of these smaller apps to use custom container images on AWS Lambda. For my fairly low-traffic and sporadic loads I don't need permanently provisioned infrastructure. I like the idea of knowing the stack I'm using will be usable for as long as I'm willing to support it. And for my current problem there's a path to make my upgrade hell easier by isolating the offending function into its own function/container/service and leaving the rest of the app untouched.

The best laid plans…

A quick skim of some AWS documentation, blog post announcements, and some GitHub READMEs and I'm ready to go. This should be easy. Some minor refactoring and a few Docker builds later and we get:

START RequestId: 5a0e91fc-7ae2-480e-af2d-260de6cebf90 Version: $LATEST
time="2021-05-17T07:50:54.186" level=warning msg="Cannot list external agents" error="open /opt/extensions: no such file or directory"
time="2021-05-17T07:50:54.186" level=warning msg="Couldn't find valid bootstrap(s)" bootstrapPathsChecked="[aws_lambda_ric]"
time="2021-05-17T07:50:54.186" level=warning msg="First fatal error stored in appctx: Runtime.InvalidEntrypoint"

And here I'm stuck. For hours. Actually, days. I slept on it and came back no wiser as to what I'd done wrong. I get it, it needs a bootstrap script. But I've got one. In fact, in my fumbling frustrations I've got dozens, just in case I'd misconfigured or misnamed them. Every permutation and interpretation I could think of is in my container. It's still not working.

It's at this point I should probably abandon this plan and just go make the change on Heroku like I ordinarily would. But... that's not how I do things. It's time to pull this thing apart and understand it properly.

So, dear reader, here's hoping you get value from this bottom-up dive into how custom Docker images for Lambda work.

The Steve Austin approach

It's ok, this isn't going to cost $6M. But we do have the technology. We can rebuild this. So it's time to start with the most basic and bare-bones container possible, and incrementally step our way toward something closer to the developer ergonomics I'm after. Each step solidifies our understanding of how and why things work a certain way before adapting them to my particular needs.

We'll start by using the barebones shell examples from the AWS docs, which consists of a bootstrap script:

#!/bin/sh
set -euo pipefail

# Initialization - load function handler
source $LAMBDA_TASK_ROOT/"$(echo $_HANDLER | cut -d. -f1).sh"

# Processing
while true
do
  HEADERS="$(mktemp)"
  # Get an event. The HTTP request will block until one is received
  EVENT_DATA=$(curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next")
  # Extract request ID by scraping response headers received above
  REQUEST_ID=$(grep -Fi Lambda-Runtime-Aws-Request-Id "$HEADERS" | tr -d '[:space:]' | cut -d: -f2)
  # Run the handler function from the script
  RESPONSE=$($(echo "$_HANDLER" | cut -d. -f2) "$EVENT_DATA")
  # Send the response
  curl -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$RESPONSE"
done

The Lambda container uses the script above when it is instantiated to bootstrap itself and to fetch any work to do. The $_HANDLER environment variable is set at runtime and contains the name of the function that holds our code. The while true loop ensures we keep looping around to process events for as long as the container is alive. The curl command blocks until there's an event to process, so we won't DoS ourselves by looping constantly when there is nothing to do.

The Dockerfile that packages the bootstrap and our function together:

FROM alpine:3.7
RUN apk add curl
ENV LAMBDA_TASK_ROOT=/var/task
ENV LAMBDA_RUNTIME_DIR=/var/runtime
COPY bootstrap ${LAMBDA_RUNTIME_DIR}/bootstrap
RUN chmod 755 ${LAMBDA_RUNTIME_DIR}/bootstrap
COPY function.sh ${LAMBDA_TASK_ROOT}/function.sh
RUN chmod 755 ${LAMBDA_TASK_ROOT}/function.sh
CMD [ "function.handler" ]

function.sh

function handler () {
  EVENT_DATA=$1
  echo "$EVENT_DATA" 1>&2;
  RESPONSE="Echoing request: '$EVENT_DATA'"
  echo $RESPONSE
}
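
Before building anything you can sanity-check the handler in a local bash shell. This is just a quick sketch: it sources the file and calls the function the same way the bootstrap will.

. ./function.sh
handler '{"hello":"world"}'
# prints the event to stderr, then echoes the response:
# Echoing request: '{"hello":"world"}'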

Build the image:

docker build -t aws-lambda-custom-image-example .

Local testing

Install the Lambda Runtime Interface Emulator:

mkdir -p ~/.aws-lambda-rie && \
curl -Lo ~/.aws-lambda-rie/aws-lambda-rie https://github.com/aws/aws-lambda-runtime-interface-emulator/releases/latest/download/aws-lambda-rie && \
chmod +x ~/.aws-lambda-rie/aws-lambda-rie

Now you can mount the emulator into your container at runtime, and adjust the entrypoint to use it:

docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
--entrypoint /aws-lambda/aws-lambda-rie \
aws-lambda-custom-image-example /var/runtime/bootstrap function.handler

Test it:

curl -v -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'

You'll see a 200 response and the literal string from our function: Echoing request: '{}'. Change the payload (i.e., the {} in the -d '{}' argument) and you'll see whatever you send echoed back.
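
For example, posting a different payload (the body here is arbitrary JSON):

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"name": "world"}'
# Echoing request: '{"name": "world"}'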

Unpacking how and why this all works

Now that we've got a basic working implementation we can use it to understand all of the moving parts. Once there's a solid understanding of all of the fundamentals it'll be easier to rebuild it to do exactly what we need.

The Lambda Runtime Interface

The Lambda Runtime Interface is just an API contract your container needs to adhere to. The Lambda platform has a certain set of baseline expectations about how it can communicate with your container, and how your container will communicate back. If you don't honour the contract, nothing will work.

Our first contact with that contract is the URL we provided to the curl command. The localhost:9000 host and port combo is how we target the Docker container we're running, but what about that long path of /2015-03-31/functions/function/invocations? Where did that come from? Enter the magic of the Lambda Runtime Interface Emulator we downloaded, and specifically the aws-lambda-rie command. We mounted the emulator into the container at runtime as a new volume. We then updated the entrypoint of the container to be aws-lambda-rie, to which we passed two additional arguments: the name of our bootstrap script and the name of our function, /var/runtime/bootstrap and function.handler respectively. Standard naming conventions for Lambda handlers expect the handler to follow the format filename_without_extension.method_name. In our case we're calling the handler() method within the function.sh file. You'll see in the bootstrap script we append the .sh to make sure we source the correct file.
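
A quick sketch of how the bootstrap pulls that handler string apart, using the function.handler value from our CMD:

_HANDLER="function.handler"
echo "$_HANDLER" | cut -d. -f1   # => "function", so the bootstrap sources function.sh
echo "$_HANDLER" | cut -d. -f2   # => "handler", the shell function invoked with the event payload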

The aws-lambda-rie command is what does all the magic of making our function callable via HTTP and at the appropriate path.

It's also helping us manage and process events and state for our invocations. Within the bootstrap script you may have noticed we're actually making requests to get an initial event to process, and then sending a POST request with the result when we're done processing. Our container is handling those requests too (i.e., we're querying an API within the same container). That API is all magically available and does what it needs to thanks to the Lambda Runtime Interface Emulator.

On AWS these requests would go back out to the platform itself but for the purposes of local testing this is more than sufficient.
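
Spelled out, these are the two Runtime API calls the bootstrap makes on every iteration of its loop, against whatever host the AWS_LAMBDA_RUNTIME_API environment variable points at (the emulator locally, the Lambda platform in production):

# Long-poll for the next event; blocks until one is available
curl -sS -LD "$HEADERS" -X GET \
  "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next"

# Post the result for that specific invocation back
curl -X POST \
  "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" \
  -d "$RESPONSE"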

Retrying our custom Ruby container

So here's the Dockerfile for my own custom Ruby 2.5 container image, which in theory should work based on how I understood the documentation:

FROM alpine:3.7
ENV RUBY_MAJOR 2.5
ENV RUBY_VERSION 2.5.9
ENV RUBY_DOWNLOAD_SHA256 a87f2fa901408cc77652c1a55ff976695bbe54830ff240e370039eca14b358f0
ENV RUBYGEMS_VERSION 3.1.4
ENV LAMBDA_TASK_ROOT=/var/task
ENV LAMBDA_RUNTIME_DIR=/var/runtime
RUN mkdir -p /usr/local/etc \
  && { \
    echo 'install: --no-document'; \
    echo 'update: --no-document'; \
  } >> /usr/local/etc/gemrc
RUN set -ex \
  \
  && apk add --no-cache --virtual .ruby-builddeps \
    autoconf \
    bison \
    bzip2 \
    bzip2-dev \
    ca-certificates \
    coreutils \
    dpkg-dev dpkg \
    gcc \
    gdbm-dev \
    glib-dev \
    libc-dev \
    libffi-dev \
    libxml2-dev \
    libxslt-dev \
    linux-headers \
    make \
    ncurses-dev \
    openssl \
    openssl-dev \
    procps \
    readline-dev \
    ruby \
    tar \
    yaml-dev \
    zlib-dev \
    xz
RUN wget -O ruby.tar.xz "https://cache.ruby-lang.org/pub/ruby/${RUBY_MAJOR%-rc}/ruby-$RUBY_VERSION.tar.xz" \
  && echo "$RUBY_DOWNLOAD_SHA256 *ruby.tar.xz" | sha256sum -c - \
  \
  && mkdir -p /usr/src/ruby \
  && tar -xJf ruby.tar.xz -C /usr/src/ruby --strip-components=1 \
  && rm ruby.tar.xz \
  \
  && cd /usr/src/ruby \
  \
  # hack in "ENABLE_PATH_CHECK" disabling to suppress:
  # warning: Insecure world writable dir
  && { \
    echo '#define ENABLE_PATH_CHECK 0'; \
    echo; \
    cat file.c; \
  } > file.c.new \
  && mv file.c.new file.c \
  \
  && autoconf \
  && gnuArch="$(dpkg-architecture --query DEB_BUILD_GNU_TYPE)" \
  # the configure script does not detect isnan/isinf as macros
  && export ac_cv_func_isnan=yes ac_cv_func_isinf=yes \
  && ./configure \
    --build="$gnuArch" \
    --disable-install-doc \
    --enable-shared \
  && make -j "$(nproc)" \
  && make install \
  \
  && runDeps="$( \
    scanelf --needed --nobanner --recursive /usr/local \
      | awk '{ gsub(/,/, "\nso:", $2); print "so:" $2 }' \
      | sort -u \
      | xargs -r apk info --installed \
      | sort -u \
  )" \
  && apk add --virtual .ruby-rundeps $runDeps \
    bzip2 \
    ca-certificates \
    libffi-dev \
    openssl-dev \
    yaml-dev \
    procps \
    zlib-dev \
  && apk del .ruby-builddeps \
  && cd / \
  && rm -r /usr/src/ruby \
  \
  && gem update --system "$RUBYGEMS_VERSION"
ENV BUNDLER_VERSION 2.2.11
RUN gem install bundler --version "$BUNDLER_VERSION"
# install things globally, for great justice
# and don't create ".bundle" in all our apps
ENV GEM_HOME /usr/local/bundle
ENV BUNDLE_PATH="$GEM_HOME" \
    BUNDLE_BIN="$GEM_HOME/bin" \
    BUNDLE_SILENCE_ROOT_WARNING=1 \
    BUNDLE_APP_CONFIG="$GEM_HOME"
ENV PATH $BUNDLE_BIN:$PATH
RUN mkdir -p "$GEM_HOME" "$BUNDLE_BIN" \
  && chmod 777 "$GEM_HOME" "$BUNDLE_BIN"
RUN gem install aws_lambda_ric
WORKDIR ${LAMBDA_TASK_ROOT}
RUN mkdir -p ${LAMBDA_TASK_ROOT}
COPY app.rb ${LAMBDA_TASK_ROOT}
ENTRYPOINT ["/usr/local/bin/aws_lambda_ric"]
CMD ["app.App::Handler.process"]

And for the sake of completeness here's the app.rb that serves as our function:

module App
  class Handler
    def self.process(event:, context:)
      "Hello World!"
    end
  end
end

Towards the end of the Dockerfile we install aws_lambda_ric, which is the AWS Lambda Ruby Runtime Interface Client. Think of it as the production-side counterpart to the emulator: it implements the client half of the Runtime Interface (fetching the next event from the platform and posting results back), and it serves as a replacement for our bootstrap script, handling the polling loop for us.
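
The handler string we hand to aws_lambda_ric follows the same file-then-method convention we saw with the shell bootstrap. This is my reading of how it maps; the lookup itself is the gem's job:

# CMD ["app.App::Handler.process"] is passed straight to the entrypoint, and breaks down as:
#   app                   -> the handler file, app.rb, loaded from LAMBDA_TASK_ROOT
#   App::Handler.process  -> the method invoked with the event: and context: keyword arguments
aws_lambda_ric app.App::Handler.process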

All the way back at the start of this post it was this script that was causing our "Couldn't find valid bootstrap(s)" error, even though the script exists. Was it not setting the right values somewhere? Is there something else a bootstrap needs to do to announce that the container was indeed bootstrapped?

Let's use what we've learnt from the minimal approach to debug this thing. Time to run this container just like we did before, with the emulator mounted, and post an event to it. We'll launch the new container, with the new bootstrap (i.e., aws_lambda_ric instead of our script), and the new handler:

docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
--entrypoint /aws-lambda/aws-lambda-rie \
lambda-ruby2.5 \
/usr/local/bin/aws_lambda_ric app.App::Handler.process

Post an event to it just like before and... 502 Bad Gateway. Again. Looking at the logs, once again:

time="2021-06-07T06:51:55.195" level=error msg="Init failed" InvokeID= error="Couldn't find valid bootstrap(s): [/usr/local/bin/aws_lambda_ric]"
time="2021-06-07T06:51:58.162" level=warning msg="Couldn't find valid bootstrap(s)" bootstrapPathsChecked="[/usr/local/bin/aws_lambda_ric]"

Hrm. As we learnt before, the bootstrap is pretty simple: it just needs to sit there in a loop looking for work. Maybe I've been inferring more than I should have from the "valid bootstrap" terminology? Maybe this isn't about aws_lambda_ric being invalid, maybe it can't find that script at all! Time to connect to the container via an interactive shell and test that theory:

$ docker run --entrypoint="" -it lambda-ruby2.5 /bin/sh
$ find / -name "aws_lambda_ric"
/usr/local/bundle/bin/aws_lambda_ric
/usr/local/bundle/gems/aws_lambda_ric-1.0.2/lib/aws_lambda_ric
/usr/local/bundle/gems/aws_lambda_ric-1.0.2/bin/aws_lambda_ric

Yep! Definitely a user problem here! It turns out all of the various base images I'd tried had been installing aws_lambda_ric into /usr/local/bundle/bin/ (note the extra /bundle sub-directory). Update the ENTRYPOINT in my Dockerfile (and the path in the local test command, as shown below) and it works!
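
For reference, the corrected pieces look something like this, using the path the find command surfaced above:

# In the Dockerfile:
#   ENTRYPOINT ["/usr/local/bundle/bin/aws_lambda_ric"]

# And the local test run:
docker run -d -v ~/.aws-lambda-rie:/aws-lambda -p 9000:8080 \
  --entrypoint /aws-lambda/aws-lambda-rie \
  lambda-ruby2.5 \
  /usr/local/bundle/bin/aws_lambda_ric app.App::Handler.process

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{}'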

And there you have it. Use a Runtime Interface Client (aws_lambda_ric for Ruby; equivalents exist for other languages) and you can turn any custom image into a Lambda-compatible container.
