RCE Endeavors

May 18, 2021

Creating a multi-language compiler system: File Watcher, C++ (4/11)

Filed under: Programming — admin @ 10:28 PM

These next few posts will go into detail about the file watcher component. As previously discussed, this component is responsible for watching for changes in the input directory and kick-starting the compilation and execution process. The file watcher will be implemented as a background process, written in C++, along with some helper Bash scripts.

Configuration

The runtime configuration for the file watcher will be a straightforward JSON file, with the deserialization functionality coming courtesy of the cereal library. This configuration contains everything that the file watcher process needs to perform its functionality: the paths to the various input, output, and intermediate directories and files, the set of supported languages, and whether to run in single-threaded or multi-threaded mode. The final configuration is shown below. From looking at it, you can see that there are several variables that are meant to be substituted. How and why this is done will be covered in a future post; a brief explanation of each field is provided below.

{
     "configuration": {
         "inputpath": "${CODE_PATH}/share/${LANGUAGE}/input/${UNIQUE_ID}",
         "outputpath": "${CODE_PATH}/share/${LANGUAGE}/output/${UNIQUE_ID}",
         "workspacepath": "${CODE_PATH}/share/${LANGUAGE}/workspace/${UNIQUE_ID}",
         "dependenciespath": "${CODE_PATH}/share/${LANGUAGE}/dependencies",
         "argumentspath": "${CODE_PATH}/share/${LANGUAGE}/arguments/${UNIQUE_ID}",
         "stdinpath": "${CODE_PATH}/share/${LANGUAGE}/stdin/${UNIQUE_ID}",
         "interactivetimeout": "600s",
         "relocatepath": "${CODE_PATH}/share/relocate/${UNIQUE_ID}",
         "bootstrappath": "${CODE_PATH}/share/bootstrap",
         "bootstrapscriptname": "bootstrap-common.sh",
         "supportedlanguages": [${SUPPORTED_LANGUAGES}],
         "ismultithreaded": ${IS_MULTITHREADED}
     }
 }

Hopefully most of these are pretty straightforward from their naming, or from having read the previous posts outlining the general architecture of the system. The inputpath is the folder where the user provides the source code file to compile, and correspondingly the outputpath is where the execution output is written. The workspacepath is the folder where the compilation and execution take place. As mentioned previously, this allows multiple compilation processes to run at the same time without fear of one interfering with another.

The dependenciespath has not been discussed yet; it is the path where the dependencies for each language reside. Some languages need more than just a source file to compile. Specifically, to compile C# on the command line under .NET Core, there must be a .csproj file present in the directory. When the compilation process begins, everything present in the dependenciespath for a language is copied to the workspace path. Of the languages supported by this particular system, only C# needs this; all of the others have an empty dependenciespath.

The argumentspath, stdinpath, and interactivetimeout all have to do with providing command-line input to a running executable. The argumentspath is the folder where the command-line arguments for the source code are stored. Likewise, the stdinpath is the folder where the input for the interactive session is stored. This file can change during the execution of a program, and its changes will be picked up and written to the stdin of the running process by the execution component. The interactivetimeout is the time limit for this interactive session, during which a user can provide input to the running process.

For resiliency, there is a relocatepath field, which is the directory where input files that have not yet been processed will go in the event of a crash. This is done so that they may be relocated to another active instance for processing. The next two fields, bootstrappath and bootstrapscriptname, are for the Bash scripts that are responsible for performing the core functionality: setting up the workspace folders, compiling the code, and invoking the execution component to run the executable and capture the output. The implementation of these Bash scripts will be covered in the next post.

Lastly, there are two hopefully self-explanatory fields, supportedlanguages and ismultithreaded, which contain the list of supported languages for the compiler system and whether to run in multi-threaded mode.

This configuration file has a corresponding object in the file watcher code. The NotifyConfiguration object is defined below:

struct NotifyConfiguration
{
    std::string m_inputPath;
    std::string m_outputPath;
    std::string m_workspacePath;
    std::string m_dependenciesPath;
    std::string m_argumentsPath;
    std::string m_stdinPath;
    std::string m_interactiveTimeout;
    std::string m_relocatePath;
    std::string m_bootstrapPath;
    std::string m_bootstrapScriptName;
    std::vector<std::string> m_supportedLanguages;
    bool m_isMultithreaded;
};

The code to read this file at runtime is pretty straightforward thanks to the ease of the cereal API (here T is the type being deserialized, e.g. NotifyConfiguration):

static std::shared_ptr<T> Read(const std::string& filePath, const std::string& name)
{
    std::ifstream fileStream(filePath);
    if (!fileStream.is_open())
    {
        std::cerr << "Failed to open " << filePath << std::endl;
        return nullptr;
    }

    T object;
    {
        // Scope the archive per cereal's recommended usage; deserialization
        // of the named object happens inside this block.
        cereal::JSONInputArchive iarchive(fileStream);
        iarchive(cereal::make_nvp(name.c_str(), object));
    }
  
    return std::make_shared<T>(object);
}

Event handlers

Listening for directory change events is what powers the entire compilation and execution process. The previous post covered the inotify API, and this implementation just expands on that a bit. Since multiple languages are supported, there needs to be a mapping between a language and an input folder:

std::unordered_map<std::string /*Language*/, std::unique_ptr<NotifyEventHandler> /*Handler*/> m_dispatchTable;

At runtime, each supported language will register a handler and add it to the map.

bool NotifyEventHandlerTopmost::AddHandler(std::unique_ptr<NotifyEventHandler> handler)
{
	const std::string& language = handler->Language();
	if (m_dispatchTable.find(language) != m_dispatchTable.end())
	{
		std::cerr << "Handler for " << language << " already exists." << std::endl;
		return false;
	}

	std::cout << "Adding handler for " << language << std::endl;
	m_dispatchTable.insert({ language, std::move(handler) });
	return true;
}

When an input file is added, its handler is found in the map, and if it exists, that handler subsequently gets invoked:

void NotifyEventHandlerTopmost::Handle(std::shared_ptr<NotifyEvent> event)
{
    const std::string& language = event->Language();
    auto handler = m_dispatchTable.find(language);
    if (handler == m_dispatchTable.end())
    {
        std::cerr << "No handler found for " << language << " files" << std::endl;
        return;
    }

    handler->second->Handle(event);
}

Under this current implementation, all languages share a common handler, which launches a child process to compile and execute the source code, and subsequently reads the captured output.

void NotifyEventHandlerLanguage::Handle(std::shared_ptr<NotifyEvent> event)
{
    NotifyChildProcess childProcess(event);
    childProcess.Launch();

    auto outputPath = event->OutputPath() + std::filesystem::path::preferred_separator + std::to_string(event->Index()) +
        std::filesystem::path::preferred_separator + event->FileName() + ".log";
    const NotifyExecutionResult resultObject{ childProcess.Result(), childProcess.Output() };
    NotifySerializer<NotifyExecutionResult>::Write(outputPath, "result", resultObject);
}

Child process

The child process code is rather minimal. Since most of the workflow is delegated to a Bash script, the only things the code needs to do are format the arguments, call the Bash script, and read the output. The Bash script takes in a fair number of arguments, whose usage and purpose will be explained in the next post.

void NotifyChildProcess::buildCommand(std::shared_ptr<NotifyEvent> notifyEvent)
{
	const auto argumentsFullPath = notifyEvent->ArgumentsPath() + std::filesystem::path::preferred_separator + notifyEvent->FileName() + ".args";
	const auto inputFileFullPath = notifyEvent->InputPath() + std::filesystem::path::preferred_separator + notifyEvent->FileName();
	const auto stdinFileFullPath = notifyEvent->StdinPath() + std::filesystem::path::preferred_separator + notifyEvent->FileName() + ".stdin";

	m_builtCommand.reserve(512);
	m_builtCommand = notifyEvent->BootstrapPath() + std::filesystem::path::preferred_separator + notifyEvent->BootstrapScriptName()
		+ " -f " + inputFileFullPath
		+ " -a " + argumentsFullPath
		+ " -s " + stdinFileFullPath
		+ " -t " + notifyEvent->InteractiveTimeout()
		+ " -i " + std::to_string(notifyEvent->Index())
		+ " -d " + notifyEvent->DependenciesPath()
		+ " -w " + notifyEvent->WorkspacePath()
		+ " -o " + notifyEvent->OutputPath()
		+ " -l " + notifyEvent->Language();
}

bool NotifyChildProcess::Launch()
{
	auto pipe = popen(m_builtCommand.c_str(), "r");
	if (!pipe)
	{
		perror("popen");
		std::cerr << "Could not execute " << m_builtCommand << std::endl;
		return false;
	}

	m_result = pclose(pipe);

	m_output = NotifyFile::ReadFile(OutputFilePath());

	return Success();
}

Multi-threading

The last feature to cover is how the events – the user adding a source file to the input directory – are processed: serially or in parallel. As mentioned above, this is controlled via the ismultithreaded configuration parameter. In the typical scenario, an event comes in, gets processed, a child process is launched, and the thread blocks until the child process terminates. This works fine, but can be improved by taking advantage of parallelism. There shouldn’t be any dependencies between different users’ input files, so there shouldn’t be anything stopping us from running this process in parallel*.

To run in parallel, the handler for the event is called on a separate thread. This is done with a third-party thread-pool library, which is responsible for instantiating and managing the pool. The event dispatch code is shown below:

void NotifyEventDispatcher::dispatchEvent(const inotify_event* pEvent)
{
	std::cout << "Read event for watch " << pEvent->wd << std::endl;

	if (pEvent->len == 0) // len is unsigned; zero means no file name follows the event
	{
		std::cerr << "No file name associated with event. Watch descriptor = " << pEvent->wd << std::endl;
		return;
	}

	if (pEvent->mask & IN_CLOSE_WRITE)
	{
		std::string fileName(pEvent->name);
		auto config = m_manager->Configuration();
		auto notifyEvent = std::make_shared<NotifyEvent>(config, fileName, pEvent->wd, pEvent->mask);

		if (config->m_isMultithreaded && m_threadPool != nullptr)
		{
			m_threadPool->enqueue([this](std::shared_ptr<NotifyEvent> notifyEvent)
				{ m_handler->Handle(notifyEvent); },
				notifyEvent);
		}
		else
		{
			m_handler->Handle(notifyEvent);
		}
	}
}

This concludes the C++ portion of the file watcher. The next post will cover the Bash script portion and detail how the input file gets compiled and executed.

* I haven’t fully evaluated this. Although the code itself can run fine multi-threaded, there may be compilers that aren’t friendly to having multiple instances run at the same time.
