Table of Contents:
- Introduction (1/11)
- Goals and Architecture (2/11)
- The inotify API (3/11)
- File Watcher, C++ (4/11)
- File Watcher, Bash (5/11)
- Execution Engine (6/11)
- Containerization (7/11)
- Kubernetes (8/11)
- Demo (9/11)
- Conclusion (10/11)
- System Setup (11/11)
- Source code
This post discusses how to containerize the multi-language compiler system. Strictly speaking, containerization is not required: the system as described up to this point is complete and functional. It is, however, a nice-to-have, since it provides a consistent environment for the compiler system to run in. It also allows for a more flexible architecture, since each language can run in its own container. When combined with an orchestration platform like Kubernetes, the architecture becomes even more powerful, as the different containers can be replicated and autoscaled.
The system will be containerized with Docker, and each language will be isolated in its own Dockerfile. These language-specific Dockerfiles extend a general-purpose base Dockerfile that contains the features common to all environments. This mirrors the approach taken with the Bash scripts in the file watcher post.
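To make the paths in the Dockerfiles and build script easier to follow, here is one plausible layout of the Docker build context. This layout is inferred from the paths referenced later in this post and may not match the repository exactly:

.
├── Dockerfiles/
│   ├── alpine/Dockerfile      # the top-level (base) Dockerfile
│   ├── java/Dockerfile        # one Dockerfile per language
│   └── ...
└── agentshared/
    └── builtin/
        ├── code/agent/        # file watcher source (C++)
        ├── code/executor/     # execution component source (C++)
        ├── scripts/           # startup.sh and shutdown.sh
        └── config/config.json # file watcher configuration template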
Top-level Dockerfile
This is the base Dockerfile that the various language-specific ones extend. This Dockerfile is responsible for:
- Adding packages common to all images
- Compiling the file watcher component code
- Compiling the execution component code
- Setting up the user and execution environments and directories
To provide a degree of isolation, the execution component runs as a different, lower-privileged user than the file watcher, and the read/write/execute permissions on the execution environment are restricted as well. This helps from a security standpoint, although it is by no means foolproof. The top-level Dockerfile is provided below:
FROM n0madic/alpine-gcc:9.2.0
RUN apk add --update --no-cache su-exec inotify-tools build-base busybox-suid sudo
# Setup user
ARG USER=user
ENV HOME=/home/${USER}
ENV EXEC=exec
ENV EXEC_HOME=/home/${EXEC}
ENV CODE_PATH=/home/${USER}/code
ENV EXEC_PATH=/home/${EXEC}/code
RUN mkdir ${HOME}
RUN mkdir ${CODE_PATH}
RUN mkdir ${EXEC_HOME}
RUN mkdir ${EXEC_PATH}
ADD agentshared ${CODE_PATH}
# Build file watcher code
RUN g++ -std=c++17 -o ${CODE_PATH}/agent -I ${CODE_PATH}/builtin/code/agent/thirdparty/cereal/include \
-I ${CODE_PATH}/builtin/code/agent/thirdparty/thread_pools/include \
${CODE_PATH}/builtin/code/agent/src/*.cpp \
${CODE_PATH}/builtin/code/agent/src/Agent/Notify/*.cpp \
-lstdc++fs -pthread
# Build executor code
RUN g++ -std=c++17 -o ${EXEC_PATH}/executor ${CODE_PATH}/builtin/code/executor/src/Source.cpp -lstdc++fs -pthread
# Move the startup/shutdown scripts and configuration template into place
RUN mv ${CODE_PATH}/builtin/scripts/startup.sh ${CODE_PATH}
RUN mv ${CODE_PATH}/builtin/scripts/shutdown.sh ${CODE_PATH}
RUN mv ${CODE_PATH}/builtin/config/config.json ${CODE_PATH}
RUN rm -rf ${CODE_PATH}/builtin
# Create the file watcher and executor users
RUN adduser -D ${USER} && echo "$USER ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/$USER && chmod 0440 /etc/sudoers.d/$USER
RUN adduser -D ${EXEC}
RUN sudo passwd -d root
RUN sudo passwd -d ${USER}
RUN echo 'user ALL=(ALL) NOPASSWD: ALL' >> /etc/sudoers
# Restrict permissions on the watcher and executor directories
RUN chown -R ${USER}:${USER} ${HOME}
RUN chmod -R 751 ${HOME}
RUN chown -R ${EXEC}:${EXEC} ${EXEC_HOME}
RUN chmod -R 555 ${EXEC_PATH}
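Before any of the language images can extend it, this base image needs to be built and tagged. A minimal sketch of that step, assuming the top-level Dockerfile lives at Dockerfiles/alpine/Dockerfile as in the build script at the end of this post:

# Build the base image; the language-specific Dockerfiles reference it as compiler-base-alpine:latest
sudo docker build -t compiler-base-alpine -f Dockerfiles/alpine/Dockerfile .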
The startup.sh and shutdown.sh scripts referenced in the Dockerfile are shell scripts that, as their names suggest, are invoked at container startup and shutdown. The startup script is responsible for setting up the appropriate folders when a container is launched. Its contents are shown below:
#!/bin/bash

# Unique identifier for this container, derived from the cgroup path
CONTAINER_ID=$(basename $(cat /proc/1/cpuset))

# Create the per-container folders under the shared directory
create_folders () {
    mkdir ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/output/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/workspace/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/${LANGUAGE}/stdin/${CONTAINER_ID}
}

# Substitute the environment-specific values into the file watcher configuration
fix_config () {
    sed -i "s|\${CODE_PATH}|${CODE_PATH}|g" ${CODE_PATH}/config.json
    sed -i "s|\${UNIQUE_ID}|${CONTAINER_ID}|g" ${CODE_PATH}/config.json
    sed -i "s|\${LANGUAGE}|${LANGUAGE}|g" ${CODE_PATH}/config.json
    sed -i "s|\${SUPPORTED_LANGUAGES}|\"${SUPPORTED_LANGUAGES}\"|g" ${CODE_PATH}/config.json
    sed -i "s|\${IS_MULTITHREADED}|${IS_MULTITHREADED}|g" ${CODE_PATH}/config.json
}

# Launch the file watcher with the prepared configuration
start_agent () {
    ./agent config.json
}

# Launch dotnet as the current user and again as the lower-privileged exec user
launch_dotnet () {
    dotnet run
    sudo su-exec exec dotnet run
}

main () {
    launch_dotnet
    create_folders
    fix_config
    start_agent
}

main
Since multiple containers can be launched, each container needs its own isolated set of folders; this is handled by the create_folders function. The fix_config function is responsible for filling in the configuration that the file watcher will use. Once the appropriate folders have been created and the configuration substitutions made, the file watcher is launched via the start_agent function.
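To make the substitution concrete, the short sketch below mimics what fix_config does on a sample placeholder line. The key names in the sample config.json are illustrative assumptions; only the ${...} placeholder tokens come from fix_config itself:

#!/bin/bash
# Write a sample config containing the same placeholders fix_config replaces
echo '{ "codePath": "${CODE_PATH}", "uniqueId": "${UNIQUE_ID}", "language": "${LANGUAGE}" }' > /tmp/config.json
CODE_PATH=/home/user/code
CONTAINER_ID=4f2a9c1d3b7e
LANGUAGE=java
# Substitute the placeholders in place, just like fix_config does
sed -i "s|\${CODE_PATH}|${CODE_PATH}|g; s|\${UNIQUE_ID}|${CONTAINER_ID}|g; s|\${LANGUAGE}|${LANGUAGE}|g" /tmp/config.json
cat /tmp/config.json
# { "codePath": "/home/user/code", "uniqueId": "4f2a9c1d3b7e", "language": "java" }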
The opposite of the startup process happens on container shutdown: any unprocessed input is relocated, the created folders are deleted, and the file watcher is stopped. The code for this is shown below:
#!/bin/bash

# Unique identifier for this container, derived from the cgroup path
CONTAINER_ID=$(basename $(cat /proc/1/cpuset))

# Stop the file watcher
kill_agent () {
    killall -9 agent
}

# Move any unprocessed input and arguments to the relocate area
relocate_input () {
    mkdir ${CODE_PATH}/share/relocate/input/${CONTAINER_ID}
    mkdir ${CODE_PATH}/share/relocate/arguments/${CONTAINER_ID}
    mv ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}/* ${CODE_PATH}/share/relocate/input/${CONTAINER_ID}/
    mv ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}/* ${CODE_PATH}/share/relocate/arguments/${CONTAINER_ID}/
}

# Remove this container's folders from the shared directory
delete_folders () {
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/input/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/output/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/workspace/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/arguments/${CONTAINER_ID}
    rm -rf ${CODE_PATH}/share/${LANGUAGE}/stdin/${CONTAINER_ID}
}

main () {
    relocate_input
    delete_folders
    kill_agent
}

main
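How shutdown.sh actually gets invoked depends on the deployment, and the next post covers the Kubernetes side of this. As a standalone illustration only, a hypothetical entrypoint wrapper could trap SIGTERM (which docker stop sends) and run the shutdown script before exiting; this wrapper is an assumption and is not part of the images described here:

#!/bin/bash
# Hypothetical entrypoint wrapper: run startup.sh in the background and invoke
# shutdown.sh when the container receives SIGTERM (e.g. from `docker stop`)
cleanup () {
    ./shutdown.sh
    exit 0
}
trap cleanup TERM

./startup.sh &
wait $!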
Language-specific Dockerfiles
The language-specific Dockerfiles are much smaller, since they contain only the additional functionality needed for a particular language environment. This is usually just the packages or runtimes needed for the compiler/interpreter to run. The Java Dockerfile is shown below as an example:
FROM compiler-base-alpine:latest
# Install Java
RUN apk add --update --no-cache openjdk11 --repository=http://dl-cdn.alpinelinux.org/alpine/edge/community
# Setup Java environment
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk
RUN export JAVA_HOME
ARG USER=user
ENV CODE_PATH=/home/${USER}/code
# Setup language(s)
ENV LANGUAGE=java
ENV SUPPORTED_LANGUAGES=java
ENV IS_MULTITHREADED=true
# Setup PATH
ENV PATH=/usr/lib/jvm/java-11-openjdk/bin:${PATH}
USER ${USER}
WORKDIR ${CODE_PATH}
CMD ["./startup.sh"]
This Dockerfile extends the base one by installing OpenJDK and setting up the environment variables and PATH needed for Java to be invoked from the command line. The Dockerfiles for the other languages follow a similar pattern. The script to build all of these images is provided below; it builds the image for each supported language, tags it as latest, and pushes it to the local Docker registry.
#!/bin/bash

# Images to build (the alpine entry is the shared base image)
languages="alpine c cpp cs java py"

for language in $languages; do
    echo "Building image for ${language}"
    sudo docker build -t compiler-base-${language} -f Dockerfiles/${language}/Dockerfile .
    echo "Tagging image for ${language}"
    sudo docker tag compiler-base-${language}:latest localhost:32000/compiler-base-${language}:latest
    echo "Pushing image for ${language}"
    sudo docker push localhost:32000/compiler-base-${language}:latest
done
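To sanity-check one of the freshly built images outside of an orchestrator, it can be run directly with Docker. The host path used for the shared directory below is an assumption; in a real deployment this would be whatever shared volume the containers mount:

# Hypothetical local smoke test of the Java image; /srv/compiler-share stands in for the shared volume
sudo docker run --rm -it \
    -v /srv/compiler-share:/home/user/code/share \
    localhost:32000/compiler-base-java:latest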
At this point each language has been containerized and has its own defined environment for the compiler system to run in. The next post will cover how these images work in combination with Kubernetes to provide resiliency and scaling.