Compiling C++ code dynamically at runtime under Linux
Mar 2021

C/C++ has no built-in facility to compile and run code dynamically. It can still be done, although the resulting code is not portable. See also stackoverflow dynamic function creation in C++. The examples on this page only work under Linux, and some of them only on an AMD/Intel x86-64 CPU.
Write own machine code
In principle this is straightforward:
- write machine code into a char array
- allocate memory which allows code execution via mmap
- copy the char array into the memory location via memcpy
- point a function pointer to the memory address
A minimalist example
The example below calculates the square of an input number.

    // sqr.cpp
    #include <cstdlib>      // EXIT_FAILURE etc
    #include <cstdio>       // printf(), perror() etc
    #include <cstring>      // memcpy()
    #include <sys/mman.h>   // mmap()

    int main(int argc, char** argv)
    {
        // machine code
        unsigned char opcode[] = {
            0xf2, 0x0f, 0x59, 0xc0,   // mulsd xmm0,xmm0
            0xc3                      // ret
        };

        // allocate memory which allows code execution
        // https://en.wikipedia.org/wiki/NX_bit
        void* codelocation = mmap(NULL, sizeof(opcode),
                                  PROT_READ|PROT_WRITE|PROT_EXEC,
                                  MAP_PRIVATE|MAP_ANON, -1, 0);
        if(codelocation==MAP_FAILED) {
            perror("mmap");
            return EXIT_FAILURE;
        }

        // copy machine code to executable memory location
        memcpy(codelocation, opcode, sizeof(opcode));

        // function pointer to point to that memory location
        double (*myfunc)(double);
        myfunc = reinterpret_cast<double(*)(double)>(codelocation);

        // read command line arguments and execute myfunc()
        double x = 0.0;
        if(argc>1) x = atof(argv[1]);
        double y = myfunc(x);
        printf("f(%f)=%f\n", x,y);

        return EXIT_SUCCESS;
    }

Compile and execute as follows:
    $ g++ -O2 -Wall sqr.cpp -o sqr
    $ ./sqr 7
    f(7.000000)=49.000000
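Memory that is writable and executable at the same time may be refused on some hardened systems. A possible variation, sketched here under that assumption and not part of the original example, is to map the memory writable first, copy the code, and only then switch it to executable with mprotect(). The helper name load_code() is made up for this illustration.

```cpp
#include <cstddef>      // size_t, NULL
#include <cstring>      // memcpy()
#include <sys/mman.h>   // mmap(), mprotect(), munmap()

// load_code() is a hypothetical helper, not part of the original example.
// It copies n bytes of machine code into freshly mapped memory and returns
// a pointer to memory that is executable but no longer writable,
// or NULL on failure.
void* load_code(const unsigned char* code, size_t n)
{
    // step 1: map the memory readable and writable, not yet executable
    void* mem = mmap(NULL, n, PROT_READ|PROT_WRITE,
                     MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if(mem == MAP_FAILED) return NULL;

    // step 2: copy the machine code
    memcpy(mem, code, n);

    // step 3: make it readable and executable, drop write access
    if(mprotect(mem, n, PROT_READ|PROT_EXEC) != 0) {
        munmap(mem, n);
        return NULL;
    }
    return mem;
}
```

The returned pointer can be cast to a function pointer exactly as in sqr.cpp, e.g. with reinterpret_cast<double(*)(double)>(load_code(opcode, sizeof(opcode))).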
Debugging
It is easy to make mistakes when writing machine code, so we need to be able to disassemble it. We can do this at runtime by writing the code buffer to an output file and calling objdump on it.
    // needs <cstdio> for fopen() etc and <cstdlib> for system()
    void print_asm(const void* buf, size_t n)
    {
        FILE* fp = fopen("/tmp/opcode.bin", "w");
        if(fp!=NULL) {
            fwrite(buf, n, 1, fp);
            fclose(fp);
        }
        // note: -mi386 decodes the bytes as 32-bit x86 code;
        // "-M intel,x86-64" would select 64-bit decoding instead
        system("objdump -D -M intel -b binary -mi386 /tmp/opcode.bin");
    }

The example harmon.cpp calculates partial sums of the harmonic series (∑ 1⁄i) using a function f() written in C++ and a function myfunc() written in machine code. It can also print its own code at runtime. Compile and execute as follows:
    $ g++ -Wall -O2 harmon.cpp -o harmon
    $ ./harmon 100
    f(100)=5.187378
    myfunc(100)=5.187378
    $ ./harmon -d
    Disassembly of myfunc()
    -----------------------

    /tmp/opcode.bin:     file format binary

    Disassembly of section .data:

    00000000 <.data>:
       0:   85 ff                   test   edi,edi
       2:   7e 22                   jle    0x26
       4:   66 0f 57 c0             xorpd  xmm0,xmm0
       8:   b8 01 00 00 00          mov    eax,0x1
       d:   f2 0f 2a c8             cvtsi2sd xmm1,eax
      11:   f2 0f 2a d7             cvtsi2sd xmm2,edi
      15:   f2 0f 10 d9             movsd  xmm3,xmm1
      19:   f2 0f 5e da             divsd  xmm3,xmm2
      1d:   f2 0f 58 c3             addsd  xmm0,xmm3
      21:   83 ef 01                sub    edi,0x1
      24:   75 eb                   jne    0x11
      26:   f3 c3                   repz ret

    Disassembly of f()
    ------------------

    /tmp/opcode.bin:     file format binary

    Disassembly of section .data:

    00000000 <.data>:
       0:   85 ff                   test   edi,edi
       2:   66 0f 57 c0             xorpd  xmm0,xmm0
       6:   7e 1f                   jle    0x27
       8:   f2 0f 10 15 08 03 00    movsd  xmm2,QWORD PTR ds:0x308
       f:   00
      10:   f2 0f 2a cf             cvtsi2sd xmm1,edi
      14:   66 0f 28 da             movapd xmm3,xmm2
      18:   83 ef 01                sub    edi,0x1
      1b:   f2 0f 5e d9             divsd  xmm3,xmm1
      1f:   f2 0f 58 c3             addsd  xmm0,xmm3
      23:   75 eb                   jne    0x10
      25:   f3 c3                   repz ret
      27:   f3 c3                   repz ret
      29:   0f 1f 80 00 00 00 00    nop    DWORD PTR [eax+0x0]
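The source of harmon.cpp is not reproduced on this page. Judging from the disassembly (a counter in edi that is decremented until it reaches zero), the C++ function f() presumably looks roughly like the sketch below. This is a reconstruction under that assumption, not the author's actual code.

```cpp
// Assumed shape of f(): sum of 1/i for i = n down to 1.
// The loop counts downwards to mirror the "sub edi,0x1 / jne" pattern
// seen in the disassembly; for n <= 0 the sum is empty and 0.0 is returned.
double f(int n)
{
    double sum = 0.0;
    for(int i = n; i > 0; i--) {
        sum += 1.0 / static_cast<double>(i);
    }
    return sum;
}
```

For n = 100 this gives approximately 5.187378, which matches the program output shown above.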
How do I learn to write machine code?
The simplest answer is: learn from your compiler. Since the compiler knows "everything" about converting the high-level language into machine code, it is easiest to tap into that knowledge:
- write some example function
    // myfunc.cpp
    double f(double x) {
        return x*x;
    }
- compile and then disassemble the code with e.g. gdb or objdump
    $ g++ -O2 -c myfunc.cpp
    $ gdb -batch -ex "file myfunc.o" -ex "set disassembly-flavor intel" -ex "disassemble/rs f"
    Dump of assembler code for function _Z1fd:
       0x0000000000000000 <+0>:     f2 0f 59 c0     mulsd  xmm0,xmm0
       0x0000000000000004 <+4>:     c3              ret
    End of assembler dump.
  This shows that the simple function return x*x; is converted into the assembly mulsd xmm0,xmm0, ret, which in machine code in hex representation is f2 0f 59 c0, c3. This also shows that the compiler places the first input argument into the register xmm0 and expects the return value in xmm0 as well. See calling convention for details.
- godbolt.org: in the output window tick the option "compile to binary"
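To look at how different argument types are passed, a function mixing an integer and a floating-point argument can be compiled and disassembled in the same way. The function scale() and the file name callconv.cpp below are made up for this purpose. The register names in the comment are what the System V x86-64 calling convention used on Linux prescribes; the exact instructions the compiler emits may differ between versions.

```cpp
// callconv.cpp (hypothetical file name)
// scale() is a made-up function, used only to study argument passing.
// Under the System V x86-64 calling convention the integer argument n
// is passed in edi, the double argument x in xmm0, and the double
// result is returned in xmm0.
// extern "C" keeps the symbol name as plain "scale" (no name mangling),
// which makes it easier to find in the disassembly.
extern "C" double scale(int n, double x)
{
    return n * x;
}
```

After g++ -O2 -c callconv.cpp, the gdb command from above with "disassemble/rs scale" can be used to check the generated code against these expectations.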
Language parser
This is more complex, but tools have been developed for language parsing, e.g. Lex and Yacc or Bison and Flex.

Let the compiler write machine code
Although it is good fun to write machine code, it is incredibly complex to generate machine code dynamically from a string of instructions in some language. It is considerably simpler to let the C/C++ compiler do that job for us:
- std::string code = "extern \"C\" double myfunc(double x) { ... }";
  – write some C/C++ code into a string (extern "C" is needed to avoid name mangling)
- output string to temporary file file.cpp
- system("g++ file.cpp -o file.so -shared -fPIC");
  – invoke external compiler, produces file.so
- void* dynlib = dlopen("./file.so", RTLD_LAZY);
  – load shared library dynamically (the path needs a slash, otherwise dlopen() searches the library path instead of the current directory)
- double (*myfunc)(double) = (double(*)(double)) dlsym(dynlib, "myfunc");
  – point function pointer to symbol exported by file.so; its type must match the signature of the compiled function
- double y = (*myfunc)(x);
  – execute the function
Simplest example
Below is a simple example without error handling:

    // sqr.cpp
    #include <cstdlib>      // system(), atof(), EXIT_SUCCESS
    #include <dlfcn.h>      // dynamic library loading
    #include <string>
    #include <iostream>
    #include <fstream>

    int main(int argc, char** argv)
    {
        std::string code =
            "extern \"C\" double myfunc(double x) { return x*x; }";

        // temporary output files
        std::string cppfile="/tmp/runtimecode.cpp";
        std::string libfile="/tmp/runtimecode.so";
        std::string logfile="/tmp/runtimecode.log";
        std::ofstream out(cppfile.c_str(), std::ofstream::out);
        out << code;
        out.close();

        // invoke external compiler
        // (system() runs /bin/sh, which need not understand the bash-only
        // "&>" redirection, so the portable "> file 2>&1" is used)
        std::string cmd = "g++ -Wall " + cppfile + " -o " + libfile
                          + " -O2 -shared -fPIC > " + logfile + " 2>&1";
        system(cmd.c_str());

        // dynamic library loading
        void* dynlib = dlopen (libfile.c_str(), RTLD_LAZY);

        // function pointer to symbol "myfunc" exported by the shared .so
        // library; the type must match the signature double myfunc(double)
        double (*myfunc)(double);
        myfunc = (double(*)(double)) dlsym(dynlib, "myfunc");

        // execute
        double x=0.0;
        if(argc>1) x = atof(argv[1]);
        double y=(*myfunc)(x);
        std::cout << "myfunc(" << x << ") = " << y << std::endl;

        return EXIT_SUCCESS;
    }

Compile and execute as follows:
    $ g++ -Wall -O2 sqr.cpp -o sqr -ldl
    $ ./sqr 1.5
    myfunc(1.5) = 2.25

We can inspect the dynamically created shared library:

    $ nm /tmp/runtimecode.so | grep myfunc
    0000000000000610 T myfunc        (T = in text section and global (exported))
    $ objdump -d -M intel /tmp/runtimecode.so
    ...
    0000000000000610 <myfunc>:
     610:   f2 0f 59 c0             mulsd  xmm0,xmm0
     614:   c3                      ret
    ...
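The simple example never calls dlclose(), which is harmless for a short program but leaves the library mapped until exit. One way to tie the lifetime of the library to a scope is to wrap the handle in a std::unique_ptr with dlclose() as deleter. The following is a minimal sketch of that idea (C++11 or later); the names library_handle, open_library() and lookup() are made up for illustration.

```cpp
#include <dlfcn.h>   // dlopen(), dlsym(), dlclose()
#include <memory>    // std::unique_ptr

// Owning handle for a shared library: dlclose() is called automatically
// when the handle goes out of scope. If dlopen() failed, the handle is
// empty and dlclose() is not called at all.
typedef std::unique_ptr<void, int(*)(void*)> library_handle;

// function type used in the examples on this page: double f(double)
typedef double (*func_t)(double);

// open_library() is a hypothetical helper name
library_handle open_library(const char* path)
{
    return library_handle(dlopen(path, RTLD_LAZY), dlclose);
}

// lookup() is a hypothetical helper name; it returns nullptr if the
// library is not loaded or the symbol does not exist
func_t lookup(const library_handle& lib, const char* name)
{
    if(!lib) return nullptr;
    return reinterpret_cast<func_t>(dlsym(lib.get(), name));
}
```

A function pointer obtained through lookup() must not be called after the corresponding library_handle has been destroyed, since the code it points to is unmapped by dlclose().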
Simple example with error handling
With added error handling, the program parse.cpp reads code from standard input, then dynamically compiles and executes it. Compile and run the parser as follows:

    $ g++ -Wall parse.cpp -o parse -ldl
    $ ./parse 100 < myfunc.txt
    compiling ...
    running ...
    myfunc(100) = 5.18738

where myfunc.txt contains the calculation of the harmonic series:

    $ cat myfunc.txt
    double sum=0.0;
    for(int i=n; i>0; i--) {    // n is the input to this function
        sum+=1.0/(double)i;
    }
    return sum;
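The source of parse.cpp is not shown here. The sketch below illustrates what the error handling could look like; it is an assumption, not the author's code. The helper name compile_function() is made up, and the wrapper signature extern "C" double myfunc(int n) is inferred from the variable n used in myfunc.txt. Each step (writing the file, compiling, loading, symbol lookup) is checked and reported separately.

```cpp
#include <cstdlib>    // system(), NULL
#include <dlfcn.h>    // dlopen(), dlsym(), dlerror()
#include <fstream>
#include <iostream>
#include <string>

typedef double (*func_t)(int);

// compile_function() is a hypothetical helper: it wraps a function body in
// extern "C" double myfunc(int n) { ... }, compiles it into a shared
// library and returns a pointer to the loaded function, or NULL on error.
func_t compile_function(const std::string& body)
{
    const std::string cppfile = "/tmp/runtimecode.cpp";
    const std::string libfile = "/tmp/runtimecode.so";
    const std::string logfile = "/tmp/runtimecode.log";

    std::ofstream out(cppfile.c_str());
    if(!out) {
        std::cerr << "cannot write " << cppfile << "\n";
        return NULL;
    }
    out << "extern \"C\" double myfunc(int n) {\n" << body << "\n}\n";
    out.close();

    // a non-zero return value of system() means the compilation failed
    std::string cmd = "g++ -Wall -O2 -shared -fPIC " + cppfile
                      + " -o " + libfile + " > " + logfile + " 2>&1";
    if(system(cmd.c_str()) != 0) {
        std::cerr << "compilation failed, see " << logfile << "\n";
        return NULL;
    }

    // the handle is deliberately never closed: the library has to stay
    // loaded for as long as the returned function pointer is in use
    void* dynlib = dlopen(libfile.c_str(), RTLD_LAZY);
    if(dynlib == NULL) {
        std::cerr << dlerror() << "\n";
        return NULL;
    }
    void* sym = dlsym(dynlib, "myfunc");
    if(sym == NULL) {
        std::cerr << dlerror() << "\n";
        return NULL;
    }
    return reinterpret_cast<func_t>(sym);
}
```

Because the sketch always writes to the same file names and dlopen() reuses a library that is already loaded under the same path, it is only suitable for compiling one function per process; unique temporary file names would be needed otherwise.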
Parsing classes
Classes can also be dynamically compiled and used as described above, except that we need to export a class creation function in the shared .so library:
    // parse_class.cpp
    #include "base.h"
    [...]
    // add necessary class maker-function
    code = code + "\n"
           + "extern \"C\" base* make_class() {\n"
           + " return (base*) new myclass();\n"
           + "}" ;
    [...]
    // loading symbol from library and assign to function pointer
    base* (*make_class)();    // function pointer
    make_class = reinterpret_cast<base*(*)()>(dlsym(dynlib, "make_class"));
    [...]
    // execute function
    std::shared_ptr<base> f = std::shared_ptr<base>(make_class());
    double y=(*f)(x);
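    // base.h
    #ifndef BASE_H
    #define BASE_H

    class base {
    public:
        // virtual destructor: objects of derived classes are deleted
        // through a base pointer (by the shared_ptr in parse_class.cpp)
        virtual ~base() {}
        virtual double operator()(double) const = 0;
    };

    #endif /* BASE_H */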
    $ g++ -Wall -std=c++11 parse_class.cpp -o parse_class -ldl
    $ ./parse_class 2.5 < myclass.txt
    compiling ...
    running ...
    myclass(2.5) = 8.25

where myclass.txt is

    $ cat myclass.txt
    #include "base.h"
    #include <cstdio>
    class myclass : public base {
    public:
        double operator()(double x) const {
            return x*x + 2.0*x - 3.0;
        }
    };
- Source: parse_class.cpp, base.h, myclass.txt