Creating a multi-language compiler system: Containerization (7/11)

May 18, 2021

Creating a multi-language compiler system: Containerization (7/11)

Filed under: Programming — admin @ 10:29 PM

Table of Contents:

This post will discuss how to containerize the multi-language compiler system. As far as functionality goes, it is not necessarily needed: the system as described up to this point is complete and will function as described. Containerization is just a nice to have since it provides a consistent environment for the compiler system to run it. Additionally, containerizing the system will allow for a more flexible architecture since each language can run in its own container. When combined with orchestration platforms like Kubernetes, the architecture can become even more powerful as these different containers can have replicas and autoscaling.

The system will be containerized via Docker; we will isolate each language into its own Dockerfile. These individual language Dockerfiles will extend a general purpose Dockerfile that will contain features common to all environments. You can think of this as a similar approach to the Bash scripts of the file watcher.

Top-level Dockerfile

This is the base Dockerfile that the various language-specific ones extend. This Dockerfile is responsible for:

Adding packages common to all images
Compiling the file watcher component code
Compiling the execution component code
Settings up the user and execution environments and directories

To provide a degree of isolation and some security, the execution component will run as a different, lower privileged, user than the file watcher. Read/write/execute privileges are lowered as well for the execution environment. This helps a bit from a security standpoint, although it is definitely not foolproof. The top-level Dockerfile is provided below:

FROM n0madic/alpine-gcc:9.2.0

RUN apk add --update --no-cache su-exec inotify-tools build-base busybox-suid sudo

# Setup user
ARG USER=user
ENV HOME=/home/${USER}
ENV EXEC=exec
ENV EXEC_HOME=/home/${EXEC}

ENV CODE_PATH=/home/${USER}/code
ENV EXEC_PATH=/home/${EXEC}/code

RUN mkdir ${HOME}
RUN mkdir ${CODE_PATH}

RUN mkdir ${EXEC_HOME}
RUN mkdir ${EXEC_PATH}

ADD agentshared ${CODE_PATH}

# Build file watcher code
RUN g++ -std=c++17 -o ${CODE_PATH}/agent -I ${CODE_PATH}/builtin/code/agent/thirdparty/cereal/include \
    -I ${CODE_PATH}/builtin/code/agent/thirdparty/thread_pools/include \ 
    ${CODE_PATH}/builtin/code/agent/src/*.cpp \ 
    ${CODE_PATH}/builtin/code/agent/src/Agent/Notify/*.cpp \ 
    -lstdc++fs -pthread

# Build executor code
RUN g++ -std=c++17 -o ${EXEC_PATH}/executor ${CODE_PATH}/builtin/code/executor/src/Source.cpp -lstdc++fs -pthread

RUN mv ${CODE_PATH}/builtin/scripts/startup.sh ${CODE_PATH}
RUN mv ${CODE_PATH}/builtin/scripts/shutdown.sh ${CODE_PATH}
RUN mv ${CODE_PATH}/builtin/config/config.json ${CODE_PATH}
RUN rm -rf ${CODE_PATH}/builtin

RUN adduser -D ${USER} && echo "$USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$USER && chmod 0440 /etc/sudoers.d/$USER
RUN adduser -D exec

RUN sudo passwd -d root
RUN sudo passwd -d ${USER}

RUN echo 'user ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers

RUN chown -R ${USER}:${USER} ${HOME}
RUN chmod -R 751 ${HOME}

RUN chown -R ${EXEC}:${EXEC} ${EXEC_HOME}
RUN chmod -R 555 ${EXEC_PATH}

The startup.sh and shutdown.sh scripts referenced in the Dockerfile as shell scripts that, as their name suggests, are invoked at startup and shutdown. The startup script is responsible for setting up the appropriate folders when a container is launched. Its content is shown below:

#!/bin/bash

CONTAINER_ID=$(basename $(cat /proc/1/cpuset))

create_folders () {
    mkdir ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/output/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/workspace/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/stdin/${CONTAINER_ID}
}

fix_config () {
    sed -i "s|\${CODE_PATH}|${CODE_PATH}|g" ${CODE_PATH}/config.json
    sed -i "s|\${UNIQUE_ID}|${CONTAINER_ID}|g" ${CODE_PATH}/config.json
    sed -i "s|\${LANGUAGE}|${LANGUAGE}|g" ${CODE_PATH}/config.json
    sed -i "s|\${SUPPORTED_LANGUAGES}|\"${SUPPORTED_LANGUAGES}\"|g" ${CODE_PATH}/config.json
    sed -i "s|\${IS_MULTITHREADED}|${IS_MULTITHREADED}|g" ${CODE_PATH}/config.json
}

start_agent () {
    ./agent config.json
}

launch_dotnet () {
    dotnet run
    sudo su-exec exec dotnet run
}

main () {
    launch_dotnet
    create_folders
    fix_config
    start_agent
}

main

Since multiple containers can be launched, each container needs its own isolated container environment. This is handled by the create_folders function. The fix_configs function is responsible for setting up the configuration that the file watcher will use. Once the appropriate folders have been created and the configuration substitutions made, the file watcher can be launched via the start_agent function.

The opposite of this process happens on a container shutdown. The created folders are deleted, any unprocessed input is relocated, and the file watcher is shut down. The code for this is shown below:

#!/bin/bash

CONTAINER_ID=$(basename $(cat /proc/1/cpuset))

kill_agent () {
    killall -9 agent
}

relocate_input () {
    mkdir ${CODE_PATH}/share/relocate/input/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/relocate/arguments/${CONTAINER_ID}
    mv ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}/* ${CODE_PATH}/share/relocate/input/${CONTAINER_ID}/
    mv ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}/* ${CODE_PATH}/share/relocate/arguments/${CONTAINER_ID}/
}

delete_folders () {
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/output/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/workspace/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/stdin/${CONTAINER_ID}
}

main () {
    relocate_input
    delete_folders
    kill_agent
}

main

Language-specific Dockerfiles

The language-specific Dockerfiles are much smaller since they contain only the additional functionality needed for a particular language environment. This language-specific functionality is usually just packages or runtimes that are needed for the compiler/interpreter to run. An example for the Java Dockerfile is shown below:

FROM compiler-base-alpine:latest

# Install Java
RUN apk add --update --no-cache openjdk11 --repository=http://dl-cdn.alpinelinux.org/alpine/edge/community

# Setup Java environment
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk
RUN export JAVA_HOME

ARG USER=user
ENV CODE_PATH=/home/${USER}/code

# Setup language(s)
ENV LANGUAGE=java
ENV SUPPORTED_LANGUAGES=java
ENV IS_MULTITHREADED=true

# Setup PATH
ENV PATH=/usr/lib/jvm/java-11-openjdk/bin:${PATH}

USER ${USER}
WORKDIR ${CODE_PATH}

CMD ["./startup.sh"]

This Dockerfile extends the base one by installing OpenJDK and setting up the environment variables and path that are needed for Java to be invoked from the command line. Dockerfiles for other languages follow a similar pattern. The script to build these Dockerfiles is provided below. This script builds the Dockerfile for each supported language, tags it as latest, and pushes it to the local Docker repository.

#!/bin/bash

languages="alpine c cpp cs java py"

for language in $languages; do
    echo "Building image for ${language}"
    sudo docker build -t compiler-base-${language} -f Dockerfiles/${language}/Dockerfile .
    echo "Tagging image for ${language}"
    sudo docker tag compiler-base-${language}:latest localhost:32000/compiler-base-${language}:latest
    echo "Pushing image for ${language}"
    sudo docker push localhost:32000/compiler-base-${language}:latest
done

At this point each language has been containerized and has its own defined environment for the compiler system to run it. The next post will cover how these Dockerfiles work in combination with Kubernetes to provide resiliency and scaling.

Comments (0)

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

RCE Endeavors 😅

May 18, 2021

Creating a multi-language compiler system: Containerization (7/11)

No Comments »

Leave a comment