A-level Computing/CIE/Advanced Theory/System software

A-level Computing
Hardware	System Software	Security

Specification link

Purposes of an operating system

show understanding of how an OS can maximise the use of resources
describe the ways in which the user interface hides the complexities of the hardware from the user
show understanding of processor management: multiprogramming, including:
- the concept of multiprogramming and a process
- the process states: running, ready and blocked
- the need for low-level scheduling and high-level scheduling
- the concept of an interrupt
- how the kernel of the OS acts as the interrupt handler and how interrupt handling is used to manage low-level scheduling
show understanding of paging for memory management including:
- the concepts of paging and virtual memory
- the need for paging
- how pages can be replaced
- show how disk thrashing can occur

Virtual machine

show understanding of the concept of a virtual machine
give examples of the role of virtual machines
show understanding of the benefits and limitations of virtual machines

Translation software

show awareness of the need for different types of translation software
show understanding of how an interpreter can execute programs without producing a translated version
show understanding of the various stages in the compilation of a program: lexical analysis, syntax analysis, code generation and optimisation
show understanding of how the grammar of a language can be expressed using syntax diagrams or Backus-Naur Form (BNF) notation
show understanding of how Reverse Polish notation (RPN) can be used to carry out the evaluation of expressions

Purposes of an Operating System

OS Resources

The OS manages three main resources: the CPU, the memory, and the input/output (I/O) system. I/O takes a significantly longer time to access, so the OS must balance the use of these resources so that the CPU is not idle when waiting for I/O to take place.

User Interfaces

The user interface is how the user interacts with the computer and the OS. There are two main flavours of OS: command line interface (CLI) and graphical user interface (GUI).

CLI

GUI

Examples of a CLI and a GUI

Command Line Interface

A command line interface is a text-only interface that was common until around the 1990s. In order to use a CLI, the user needed to know the commands that were required to do what they want to be done, which had the disadvantage of being unintuitive.

Graphical User Interface

A graphical user interface is a more visual user interface. GUIs typically involve a system of Windows, Icons, Menus, and Pointers. GUIs are generally considered to be more intuitive and user-friendly than CLIs.

Processor Management

Processes

A process is a program in memory that is currently being executed or is waiting to be executed. Processes are controlled by process control blocks (PCBs).

PCBs keep track of:

Process ID: A unique identifier for the process.
State: Whether the process is running, ready, or blocked.
Privileges: What data, memory, and system functions the process has access to.
Register states: The values of each of the registers in the CPU as the process is being executed.
Priority: Whether the process is important enough to have priority in the scheduling algorithm.
Memory information: The amount of memory allocated for the process.
I/O information: A list of the I/O devices associated with the process.

Threads

A thread is a part of a process that is being executed. Processes can be single-threaded or multi-threaded. Multi-threaded processes are typically more efficient to execute, as different threads can run independently.

Threads can be run at the user level or at the kernel level. User-level threads are scheduled at the user level, so the kernel is not involved in thread scheduling. Kernel-level threads are scheduled at the kernel level, which makes it easier to access priveleged system functions.

Scheduling

Scheduling is the allocation of time to executing each process in the order that optimises the use of resources. A scheduling algorithm is an algorithm that determines which processes are run in what order.

Each process is in one of three states: ready, running, or blocked. Only one process can be running at a time. New processes always begin in the ready state.

To explain the scheduling algorithms, we will use an example of several processes being run. The first process, A, takes 10 cycles, the second process, B, takes 15 cycles, and the third process, C, takes 5 cycles. B and C arrive after A.

To do:
Illustrate the examples

First Come First Served

First Come First Served (FCFS) is a scheduling algorithm in which processes are done in the order that they arrived in the Ready queue.

In the example, this would mean process A is done first, process B next, and finally process C

Shortest Job First

Shortest Job First (SJF) is a scheduling algorithm which prioritises the shortest job in the queue.

In the example, process A would still go first, because B and C have not arrived, but it will be followed by process C, then process B.

Round Robin

A Round Robin algorithm allocates each time slice to processes in a repeating sequential order.

In the example with time slices of length 5 cycles, the time slices would go A, B, C, A, B, B.

Highest Response Ratio Next

With the highest response ratio next scheduling algorithm, the OS chooses the process with the highest response ratio. The response ratio is defined by $W+S \over S$ where W is the time the process has been waiting and S is the time the process will take to run.

Priority-based

A priority-based scheduling algorithm chooses the next process to be done based on priority criteria. If a process is more important, it will be executed earlier.

Interrupts

An interrupt is a signal that is sent to the OS to tell it to stop the current process and focus on a different task.

Interrupts are used to enforce time slices in pre·emptive scheduling algorithms.

Memory Management

The OS is not only responsible for determining how processes are to be given time, but also how processes are to be given memory. Thus, the OS needs some kind of method to determine how memory is to be allocated amongst processes.

Dynamic Partitioning

Dynamic partitioning is where the main memory is split into contiguous blocks which are exactly the right size for the process. This has the advantage of avoiding internal fragmentation but causes external fragmentation.

Paging

Paging is where the OS divides memory into equal-sized pages. Each process uses a certain amount of these pages.

The advantage of paging is that it can be used with virtual memory in order to increase the amount of memory that can be used at a given time. To do so, the main memory is divided into page-sized spaces called page frames. Pages can be loaded from the disk into these page frames. When a process is complete or not being executed, the pages can be loaded back to the disk.

To make sure that pages are being loaded in and out of memory efficiently, a page replacement algorithm needs to be used. A common page replacement algorithm is to put the least recently used page back into disk to make room for new pages.

Segmentation

Segmentation is like paging, except the memory is divided into blocks of varying, rather than fixed, size. Each segment is mapped to a section of main memory, similar to how pages are mapped to page frames in a paging system.

Virtual Memory

Virtual memory is where paging is used to expand the amount of memory available. In a virtual memory system, pages are loaded in and out of main memory as needed. Pages are given logical addresses which are mapped to page frames using a page map.

The main problem of virtual memory is that it causes disk thrashing. Disk thrashing is where pages are loaded in and out of the disk too often, which may damage the disk.

Virtual Machine

A virtual machine is a way of emulating an operating system or process within a different operating system. To do this, the host OS interfaces with the virtual machine, while the user interfaces with the host OS.

Virtual machines are either system virtual machines or process virtual machines. System virtual machines emulate an entire operating system, whereas process virtual machines emulate a program running within a different operating system. Process virtual machines can run inside of system virtual machines.

Translation Software

Types of Translation software

Compilers

A compiler is a program that converts the program source code into an executable file, which can then be distributed.

Interpreters

An interpreter is a program which reads the source code line-by-line and executes it without producing an executable file.

Assemblers

An assembler is a program which converts assembly code into an executable file.

The translation process

Lexical analysis

Lexical analysis is the process of breaking the source code into its lexemes. A lexeme is a word or symbol that appears in the program.

e.g. x = y + 1 consists of five lexemes: x, =, y, +, and 1.

Lexical analysis also examines what type of token each lexeme is. Tokens can be of the following types:

Identifier: A variable, like x.
Keyword: A word that is used to control the program, like if or while. Keywords cannot be used as identifiers.
Operators: Symbols that act on the variables, such as + and =.
Literal: Literal values that are used in the program, such as "Hello" or 1.
Separator: Symbols that separate one statement from another, such as semicolons or line breaks.
Comment: A note by the programmer that clarifies what the code does. Comments are ignored by the compiler.

The end goal of lexical analysis is to produce a list of tokens which can be parsed in syntax analysis.

e.g. The code x = y + 1 may be converted by lexical analysis to [(identifier,x),(operator,=),(identifier,y),(operator,+),(literal,1)]. This array of tokens is then parsed by syntax analysis.

Syntax analysis

Syntax analysis is the process of converting the string of tokens from lexical analysis into a parse tree. A parse tree is a data structure which stores the tokens according to the laws of precedence.

The laws of precedence, also called the order of operations, determines what operations in an expression are applied first. For instance, multiplication is applied before addition.

The full order of operations will typically resemble this:

Function calls, array access, field access
Unary operators
Multiplication & Division
Addition & Subtraction
Comparisions
Bitwise and, xor, then or
Logical and, then or
Assignment

Because operations with a lower precedence are applied later, operations with higher precedence tend to be at the bottom of the parse tree.

Backus-Naur Form (BNF)

Backus-Naur Form is a way of expressing the grammar of a programming language. BNF does so using a system of placeholders and alternatives.

Every place holder is defined as consisting of one of several alternatives. The symbol ::= means "is defined as". The symbol | means "or".

e.g. The following BNF definitions can be constructed based on the syntax diagrams to the right.

<digit> ::= 0|1|2|3|4|5|6|7|8|9
<variable> ::= x|y|z
<constant> ::= <digit>|<constant><digit>
<factor> ::= <constant>|<variable>|(<expression>)
<term> ::= <factor>|<term>*<factor>
<expression> ::= <term>|<expression>+<term>

Note that some definitions, such as the definition for <constant>, are recursive. This is useful as it allows us to create constants that are potentially infinite in length.

The advantage of using BNF as opposed to a syntax diagram is that BNF can be entered into and interpreted by a computer, whereas a syntax diagram can't be interpreted in the same way.

Optimisation

Optimisation is the process of making code more efficient. This means removing unnecessary steps and rephrasing complicated expressions.

e.g. The expression z = x^2 + 2*x*y + y^2 can be simplified to z = (x+y)^2

Optimisation is typically done in the back-end of the compiler, as it converts the program to assembly code and then an executable file.

Optimisation is beneficial because:

It reduces the amount of time that the program will take to run.
It reduces the amount of space that the program will take up in memory.

Reverse Polish Notation (RPN)

Reverse Polish Notation is a way of representing expressions such that it is easier to evaluate them with a stack. It is to be contrasted against the more common infix notation.

In infix notation, operands are placed on both sides of an operator, with the operator sandwiched between them. (e.g. 2+2) Whereas, in RPN, the operands are written first and the operator is placed at the end. (e.g. 2 2 +)

Converting from Infix notation to RPN with a parse tree

To convert an expression, such as (2+x)^(3-y), to RPN, it can be useful to make a parse tree.

A parse tree for the expression (2+x)^(3-y)

Evaluating RPN using a stack

Evaluating the RPN expression 3 10 5 + * with a stack

When evaluating an RPN expression with a stack, we need to follow a few rules:

When we encounter a variable or number, we push its value onto the stack.
When we encounter an operator, we pop the top values from the stack then apply the operation to them.
When the evaluation is complete, there should be one item left in the stack, which is the answer.