Parrot Virtual Machine/Run Core and Opcodes
Run Core
We discussed run cores briefly in an earlier chapter; here we examine them in much more depth. We will talk about opcodes and the special opcode compiler that converts them into standard C code, look at the different forms the compiler translates them into, and survey the run cores that execute them.
Opcodes
Opcodes are written in a special syntax that mixes C with a handful of dedicated keywords. The opcode compiler, `tools/dev/ops2c.pl`, converts them into the formats required by the different run cores.

The core opcodes for Parrot are all defined in `src/ops/`, in files with a `.ops` extension. Opcodes are divided into different files according to their purpose:
| Ops file | Purpose |
|---|---|
| `bit.ops` | Bitwise logical operations |
| `cmp.ops` | Comparison operations |
| `core.ops` | Basic Parrot operations, private internal operations, control flow, concurrency, events, and exceptions |
| `debug.ops` | Ops for debugging Parrot and HLL programs |
| `experimental.ops` | Ops which are being tested and which might not be stable. Do not rely on these ops |
| `io.ops` | Ops to handle input and output to files and the terminal |
| `math.ops` | Mathematical operations |
| `object.ops` | Ops to deal with object-oriented details |
| `obscure.ops` | Ops for obscure and specialized trigonometric functions |
| `pic.ops` | Private opcodes for the polymorphic inline cache. Do not use these |
| `pmc.ops` | Opcodes for dealing with and creating PMCs, plus common operations for array-like PMCs (push, pop, shift, unshift) and hash-like PMCs |
| `set.ops` | Ops to set and load registers |
| `stm.ops` | Ops for software transactional memory, the inter-thread communication system for Parrot. In practice, these ops are not used; use the STMRef and STMVar PMCs instead |
| `string.ops` | Ops for working with strings |
| `sys.ops` | Operations to interact with the underlying system |
| `var.ops` | Ops to deal with lexical and global variables |
Writing Opcodes
Ops are defined with the `op` keyword, and their bodies look much like C source code. Here is an example:

```
op my_op() { }
```

Alternatively, we can add the `inline` keyword:

```
inline op my_op() { }
```
We define the input and output parameters using the keywords `in` and `out`, followed by the type of the parameter. If an input parameter is used but not altered, you can declare it `inconst`. The types can be `PMC`, `STR` (strings), `NUM` (floating-point values), or `INT` (integers). Here is an example prototype:

```
op my_op(out NUM, in STR, in PMC, in INT) { }
```
This op takes a string, a PMC, and an integer, and produces a num. Notice that the parameters do not have names. Instead, they are referred to by number:

```
op my_op(out NUM, in STR, in PMC, in INT)
             ^       ^       ^       ^
             |       |       |       |
             $1      $2      $3      $4
```
Here is another example: an op that takes three integer inputs, adds them together, and stores the integer sum:

```
op sum(out INT, in INT, in INT, in INT) {
    $1 = $2 + $3 + $4;
}
```
`NUM` parameters are converted into ordinary C floating-point values, so they can be passed directly to functions that expect a `float` or `double`. Likewise, `INT` parameters are plain integer values and can be treated as such. PMCs and STRINGs, however, are complex structures. You cannot pass a Parrot STRING to a library function that expects a null-terminated C string. The following is wrong:
```
#include <string.h>

op my_str_length(out INT, in STR) {
    $1 = strlen($2);   /* WRONG! $2 is a Parrot STRING, not a char* */
}
```
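The correct approach is to go through Parrot's own string API, which understands the STRING structure and its encoding. A sketch, assuming the string subsystem exposes a length function along the lines of `Parrot_str_length` (check `src/ops/string.ops` for the exact call in your Parrot version):

```
inline op my_str_length(out INT, in STR) {
    $1 = Parrot_str_length(interp, $2);  /* encoding-aware string length */
}
```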
Advanced Parameters
The list of parameter qualifiers above was not entirely complete. Here is the full list of direction qualifiers that an op parameter can take:

| Direction | Meaning | Example |
|---|---|---|
| `in` | The parameter is an input | `op my_op(in INT)` |
| `out` | The parameter is an output | `op pi(out NUM) { $1 = 3.14; }` |
| `inout` | The parameter is both an input and an output | `op increment(inout INT) { $1 = $1 + 1; }` |
| `inconst` | The input parameter is a constant; it is not modified | `op double_const(out INT, inconst INT) { $1 = $2 + $2; }` |
| `invar` | The input parameter is a variable, such as a PMC | `op my_op(invar PMC)` |

In PIR, a literal argument such as the 5 in `$I0 = double_const 5` is a constant, and so matches an `inconst` parameter.
The type of a parameter can be one of the following:

| Type | Meaning | Example |
|---|---|---|
| `INT` | Integer value | `42` or `$I0` |
| `NUM` | Floating-point value | `3.14` or `$N3` |
| `STR` | String | `"Hello"` or `$S4` |
| `PMC` | PMC variable | `$P0` |
| `KEY` | Hash key | `["name"]` |
| `INTKEY` | Integer index | `[5]` |
| `LABEL` | Location in code to jump to | `jump_here:` |
Op naming and function signatures
You can have many ops with the same name, as long as they have different parameters. The two following declarations can coexist:

```
op my_op(out INT, in INT) { }
op my_op(out NUM, in INT) { }
```

The ops compiler converts these declarations into C function definitions similar to the following:

```c
INTVAL op_my_op_i_i(INTVAL param1) { }
NUMBER op_my_op_n_i(INTVAL param1) { }
```

Notice the `_i_i` and `_n_i` suffixes at the end of the function names? This is how Parrot keeps function names unique in the system to prevent compiler problems. It is also an easy way to look at a function and see what kinds of operands it takes.
Control Flow
An opcode can determine where control flow moves after it has finished executing. For most opcodes, the default behavior is to fall through to the next instruction in memory. However, there are many ways to alter control flow, some of them quite exotic. Several keywords can be used to obtain the address of an operation; we can then `goto` that instruction directly, or store the address and jump to it later.

| Keyword | Meaning |
|---|---|
| `NEXT()` | The address of the next opcode in memory |
| `ADDRESS(a)` | The opcode given by `a`, where `a` is of type `opcode_t *` |
| `OFFSET(a)` | The opcode at offset `a` from the current position; `a` is typically an `in LABEL` parameter |
| `POP()` | The address on top of the control stack. This feature is deprecated; eventually Parrot will be stackless internally |
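For example, an unconditional branch op can be written in terms of `OFFSET()`. This is a sketch of how such an op looks (Parrot's own `branch` lives in `core.ops`; check your version for the exact definition):

```
inline op branch(in LABEL) {
    goto OFFSET($1);   /* jump to the offset held in $1 */
}
```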
The Opcode Compiler
The opcode compiler is located at `tools/dev/ops2c.pl`, although most of its functionality lives in a number of library modules, such as `Parrot::OpsFile`, `Parrot::Ops2c::*`, and `Parrot::OpsTrans::*`.
We'll look at the individual runcores in the sections below. Suffice it to say that different runcores require the opcodes to be compiled into different formats for execution. The job of the opcode compiler is therefore relatively complex: it must read the opcode description files and output syntactically correct C code in several different formats.
Dynops: Dynamic Opcode Libraries
The ops we've discussed so far are the standard built-in ops, but they are not the only ops available: Parrot also allows dynamic op libraries to be loaded at runtime.

Dynops are dynamically loadable op libraries. They are written almost exactly like the built-in ops, but they are compiled separately into a library and loaded into Parrot at runtime using the `.loadlib` directive.
Run Cores
Runcores are the components that decode and execute the stream of opcodes in a PBC file. In the simplest case, a runcore is a loop that reads each bytecode value, gathers the operand data from the PBC stream, and passes control to the corresponding opcode routine.

There are several different runcores. Some are simple and practical; some use special tricks and compiler features to optimize for speed; some perform useful ancillary tasks such as debugging and profiling; and some serve no purpose beyond satisfying academic curiosity.
Basic Cores
- Slow Core
- In the slow core, each opcode is compiled into a separate function. Each opcode function takes two arguments: a pointer to the current opcode and the Parrot interpreter structure. All arguments to the opcode are parsed and stored in the interpreter structure for retrieval. This core is, as its name implies, very slow, but it is conceptually simple and very stable. For this reason, the slow core is used as the base for some of the specialty cores discussed later.
- Fast Core
- The fast core is exactly like the slow core, except it doesn't do the bounds checking and explicit context updating that the slow core does.
- Switched Core
- The switched core uses one gigantic C `switch { }` statement to handle opcode dispatch, instead of individual functions. The benefit is that no function call is needed per opcode, which reduces the number of machine-code instructions required to dispatch each one.
Native Code Cores
- JIT Core
- The JIT ("just-in-time") core translates bytecode into native machine code at runtime and executes that code directly.
- Exec Core
- The exec core uses the same native-code generation machinery ahead of time, to produce a native executable.
Advanced Cores
The two cores we discuss next rely on a feature of some compilers called computed goto. In standard ANSI C, labels are control-flow targets only and are not first-class data. Compilers that support computed goto, however, allow label addresses to be taken, stored in variables, and jumped to indirectly:

```c
void *my_label = &&THE_LABEL;
goto *my_label;
```

The computed goto cores compile all the opcodes into a single large function, with each opcode body behind a label. These labels are all stored in a large array:

```c
void *opcode_labels[] = { &&opcode1, &&opcode2, &&opcode3, ... };
```

Each opcode value can then be used as an index into this array:

```c
goto *opcode_labels[current_opcode];
```
- Computed Goto Core
- The computed goto core uses the mechanism described above to dispatch the various opcodes. After each opcode is executed, the next opcode in the incoming bytecode stream is looked up in the table and dispatched from there.
- Predereferenced Computed Goto Core
- In the predereferenced computed goto core, the bytecode stream is preprocessed to convert opcode numbers into their corresponding labels. The labels then no longer need to be looked up at dispatch time; each opcode can be jumped to directly, as if it were a label. Keep in mind that the dispatch mechanism runs after every opcode, and a large program may execute millions of opcodes, so even a small saving in the machine-code instructions between opcodes can make a big difference in speed.
Specialty Cores
- GC Debug Core
- Debugger Core
- Profiling Core
- Tracing Core