Parrot Virtual Machine/Run Core and Opcodes
Run Core
We discussed run cores briefly in an earlier chapter; here we examine them in much more depth. We will talk about opcodes and the special opcode compiler that converts them into standard C code, look at the different forms the compiler translates them into, and survey the run cores that execute them.
Opcodes
Opcodes are written in a special syntax that mixes C with a handful of dedicated keywords. The opcode compiler, `tools/dev/ops2c.pl`, converts them into the formats required by the different run cores.

The core opcodes for Parrot are all defined in `src/ops/`, in files with a `.ops` extension. Opcodes are divided into different files according to their purpose:
| Ops file | Purpose |
|---|---|
| `bit.ops` | Bitwise logical operations |
| `cmp.ops` | Comparison operations |
| `core.ops` | Basic Parrot operations, private internal operations, control flow, concurrency, events, and exceptions |
| `debug.ops` | Ops for debugging Parrot and HLL programs |
| `experimental.ops` | Ops which are being tested and which might not be stable. Do not rely on these ops |
| `io.ops` | Ops to handle input and output to files and the terminal |
| `math.ops` | Mathematical operations |
| `object.ops` | Ops to deal with object-oriented details |
| `obscure.ops` | Ops for obscure and specialized trigonometric functions |
| `pic.ops` | Private opcodes for the polymorphic inline cache. Do not use these |
| `pmc.ops` | Opcodes for dealing with and creating PMCs, plus common operations for array-like PMCs (push, pop, shift, unshift) and hash-like PMCs |
| `set.ops` | Ops to set and load registers |
| `stm.ops` | Ops for software transactional memory, the inter-thread communication system for Parrot. In practice, these ops are not used; use the STMRef and STMVar PMCs instead |
| `string.ops` | Ops for working with strings |
| `sys.ops` | Operations to interact with the underlying system |
| `var.ops` | Ops to deal with lexical and global variables |
Writing Opcodes
Ops are defined with the `op` keyword, and their bodies look much like C source code. Here is an example:

```
op my_op() { }
```

Alternatively, we can add the `inline` keyword:

```
inline op my_op() { }
```
We define the input and output parameters using the keywords `in` and `out`, followed by the type of the parameter. If an input parameter is used but not altered, you can declare it `inconst`. The types can be `PMC`, `STR` (strings), `NUM` (floating-point values), or `INT` (integers). Here is an example prototype:

```
op my_op(out NUM, in STR, in PMC, in INT) { }
```
This op takes a string, a PMC, and an integer, and produces a num. Notice that the parameters do not have names. Instead, they are referred to by number:

```
op my_op(out NUM, in STR, in PMC, in INT)
             ^       ^       ^       ^
             |       |       |       |
             $1      $2      $3      $4
```
Here is another example: an op that takes three integer inputs, adds them together, and stores the integer sum:

```
op sum(out INT, in INT, in INT, in INT) {
    $1 = $2 + $3 + $4;
}
```
`NUM` parameters are converted into ordinary C floating-point values, so they can be passed directly to functions that expect a `float` or `double`. Likewise, `INT` parameters are plain integer values and can be treated as such. PMCs and STRINGs, however, are complex structures. You cannot pass a Parrot STRING to a library function that expects a null-terminated C string. The following is wrong:
```
#include <string.h>

op my_str_length(out INT, in STR) {
    $1 = strlen($2);   /* WRONG! $2 is a Parrot STRING, not a char* */
}
```
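The correct approach is to go through Parrot's own string API, which understands the STRING structure and its encoding. A sketch, assuming the string subsystem exposes a length function along the lines of `Parrot_str_length` (check `src/ops/string.ops` for the exact call in your Parrot version):

```
inline op my_str_length(out INT, in STR) {
    $1 = Parrot_str_length(interp, $2);  /* encoding-aware string length */
}
```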
Advanced Parameters
The list of parameter qualifiers above was not entirely complete. Here is the full list of direction qualifiers that an op parameter can take:

| Direction | Meaning | Example |
|---|---|---|
| `in` | The parameter is an input | `op my_op(in INT)` |
| `out` | The parameter is an output | `op pi(out NUM) { $1 = 3.14; }` |
| `inout` | The parameter is both an input and an output | `op increment(inout INT) { $1 = $1 + 1; }` |
| `inconst` | The input parameter is a constant; it is not modified | `op double_const(out INT, inconst INT) { $1 = $2 + $2; }` |
| `invar` | The input parameter is a variable, such as a PMC | `op my_op(invar PMC)` |

In PIR, a literal argument such as the 5 in `$I0 = double_const 5` is a constant, and so matches an `inconst` parameter.
The type of a parameter can be one of the following:

| Type | Meaning | Example |
|---|---|---|
| `INT` | Integer value | `42` or `$I0` |
| `NUM` | Floating-point value | `3.14` or `$N3` |
| `STR` | String | `"Hello"` or `$S4` |
| `PMC` | PMC variable | `$P0` |
| `KEY` | Hash key | `["name"]` |
| `INTKEY` | Integer index | `[5]` |
| `LABEL` | Location in code to jump to | `jump_here:` |
Op naming and function signatures
You can have many ops with the same name, as long as they have different parameters. The two following declarations can coexist:

```
op my_op(out INT, in INT) { }
op my_op(out NUM, in INT) { }
```

The ops compiler converts these declarations into C function definitions similar to the following:

```c
INTVAL op_my_op_i_i(INTVAL param1) { }
NUMBER op_my_op_n_i(INTVAL param1) { }
```

Notice the `_i_i` and `_n_i` suffixes at the end of the function names? This is how Parrot keeps function names unique in the system to prevent compiler problems. It is also an easy way to look at a function and see what kinds of operands it takes.
Control Flow
An opcode can determine where control flow moves after it has finished executing. For most opcodes, the default behavior is to fall through to the next instruction in memory. However, there are many ways to alter control flow, some of them quite exotic. Several keywords can be used to obtain the address of an operation; we can then `goto` that instruction directly, or store the address and jump to it later.

| Keyword | Meaning |
|---|---|
| `NEXT()` | The address of the next opcode in memory |
| `ADDRESS(a)` | The opcode given by `a`, where `a` is of type `opcode_t *` |
| `OFFSET(a)` | The opcode at offset `a` from the current position; `a` is typically an `in LABEL` parameter |
| `POP()` | The address on top of the control stack. This feature is deprecated; eventually Parrot will be stackless internally |
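For example, an unconditional branch op can be written in terms of `OFFSET()`. This is a sketch of how such an op looks (Parrot's own `branch` lives in `core.ops`; check your version for the exact definition):

```
inline op branch(in LABEL) {
    goto OFFSET($1);   /* jump to the offset held in $1 */
}
```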
The Opcode Compiler
The opcode compiler is located at `tools/dev/ops2c.pl`, although most of its functionality lives in a number of library modules, such as `Parrot::OpsFile`, `Parrot::Ops2c::*`, and `Parrot::OpsTrans::*`.
We'll look at the individual runcores in the sections below. Suffice it to say that different runcores require the opcodes to be compiled into different formats for execution. The job of the opcode compiler is therefore relatively complex: it must read the opcode description files and output syntactically correct C code in several different formats.
Dynops: Dynamic Opcode Libraries
The ops we've discussed so far are the standard built-in ops, but they are not the only ops available: Parrot also allows dynamic op libraries to be loaded at runtime.

Dynops are dynamically loadable op libraries. They are written almost exactly like the built-in ops, but they are compiled separately into a library and loaded into Parrot at runtime using the `.loadlib` directive.
Run Cores
Runcores are the components that decode and execute the stream of opcodes in a PBC file. In the simplest case, a runcore is a loop that reads each bytecode value, gathers the operand data from the PBC stream, and passes control to the corresponding opcode routine.

There are several different runcores. Some are simple and practical; some use special tricks and compiler features to optimize for speed; some perform useful ancillary tasks such as debugging and profiling; and some serve no purpose beyond satisfying academic curiosity.
Basic Cores
- Slow Core
- In the slow core, each opcode is compiled into a separate function. Each opcode function takes two arguments: a pointer to the current opcode and the Parrot interpreter structure. All arguments to the opcode are parsed and stored in the interpreter structure for retrieval. This core is, as its name implies, very slow, but it is conceptually simple and very stable. For this reason, the slow core is used as the base for some of the specialty cores discussed later.
- Fast Core
- The fast core is exactly like the slow core, except it doesn't do the bounds checking and explicit context updating that the slow core does.
- Switched Core
- The switched core uses one gigantic C `switch { }` statement to handle opcode dispatch, instead of individual functions. The benefit is that no function call is needed per opcode, which reduces the number of machine-code instructions required to dispatch each one.
Native Code Cores
- JIT Core
- The JIT ("just-in-time") core translates bytecode into native machine code at runtime and executes that code directly.
- Exec Core
- The exec core uses the same native-code generation machinery ahead of time, to produce a native executable.
Advanced Cores
The two cores we discuss next rely on a feature of some compilers called computed goto. In standard ANSI C, labels are control-flow targets only and are not first-class data. Compilers that support computed goto, however, allow label addresses to be taken, stored in variables, and jumped to indirectly:

```c
void *my_label = &&THE_LABEL;
goto *my_label;
```

The computed goto cores compile all the opcodes into a single large function, with each opcode body behind a label. These labels are all stored in a large array:

```c
void *opcode_labels[] = { &&opcode1, &&opcode2, &&opcode3, ... };
```

Each opcode value can then be used as an index into this array:

```c
goto *opcode_labels[current_opcode];
```
- Computed Goto Core
- The computed goto core uses the mechanism described above to dispatch the various opcodes. After each opcode is executed, the next opcode in the incoming bytecode stream is looked up in the table and dispatched from there.
- Predereferenced Computed Goto Core
- In the predereferenced computed goto core, the bytecode stream is preprocessed to convert opcode numbers into their corresponding labels. The labels then no longer need to be looked up at dispatch time; each opcode can be jumped to directly, as if it were a label. Keep in mind that the dispatch mechanism runs after every opcode, and a large program may execute millions of opcodes, so even a small saving in the machine-code instructions between opcodes can make a big difference in speed.
Specialty Cores
- GC Debug Core
- Debugger Core
- Profiling Core
- Tracing Core