Memory - Part 1

If we define memory as a place where data is stored, there are many different levels of memory: processor registers, several levels of cache, main memory, and disk storage (which may itself include several levels of cache). As programmers however we tend to think of memory as an array of word sized (or byte sized) elements where our programs and data live and can be accessed directly by the processor. As we will see shortly this is not the only way we think of memory, and it may not be the most useful.

Flat memory

So we will start by considering memory as an array of bytes from 0 to MAXMEM. All of our programs and the data we are working with must fit somewhere into this array. Our programs will generate an address for every memory reference, either for the next instruction to execute or for some data to operate on.

Split memory space

It is almost universal that part of the memory is reserved for the operating system, and the rest is available to our programs. The operating system section sits either at the bottom or the top of memory, depending on the hardware and where interrupts are sent to be handled.

Multiple processes

We don't have anything much to say about flat memory if we only have one process running at a time. One thing we can say is that it would be nice if the running process couldn't knock the system over or corrupt it inadvertently. This can be achieved with the addition of a fence register holding the base address of the process. When the process is running in user mode or problem state all memory accesses are checked against this register to see if they are to a legitimate area of memory. This type of checking has to be done by hardware if it is going to be efficient.
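A rough sketch of the check this implies on every user-mode access, assuming for the example that the operating system sits at the bottom of memory (the fence value and the trap behaviour below are invented for illustration; real hardware performs this check as part of the memory access itself):

    /* A sketch (not real hardware) of the check a fence register implies:
       every user-mode access below the fence is rejected. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned long address_t;

    static address_t fence = 0x4000;        /* hypothetical base of the user area */

    static void check_access(address_t addr)
    {
        if (addr < fence) {                 /* below the fence: operating system area */
            fprintf(stderr, "trap: illegal access to %#lx\n", addr);
            exit(1);                        /* real hardware would trap to the OS */
        }
    }

    int main(void)
    {
        check_access(0x5000);               /* fine: above the fence */
        check_access(0x1000);               /* traps: inside the OS area */
        return 0;
    }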

Normally however we are concerned about having several processes in memory at once in a multiprogramming environment.

Partitions

The simplest way to allow this is to give each program a fixed place to run in memory. We divide memory up into fixed sized partitions and the compiler (or linker) allocates a partition for the program and hard-codes the correct addresses for this partition into the program. It is important to see when the actual addresses are bound to our program. Here the linker does the work, which means the addresses are stored with the program.

Then when we come to run the program we wait until the correct partition is clear and can then load and run the program.

Remember that we are still talking about a single flat address space. Every memory access is to a unique address in the range 0..MAXMEM.

Because each program had to run in a particular partition, the operating system maintained separate queues for each partition. As a partition became free, another program from the corresponding queue could be loaded and executed.

The major problem here was that it was easy for memory to be wasted. Imagine a whole collection of programs which must all run in one particular partition. Even if there are 5 or 6 other partitions which are all unused, only one program can be loaded and running at a time.

Another area of memory wastage with this system is that it is unlikely for a program to completely use its partition. Using the simplest implementation this means that quite a bit of memory is unused in each partition. This is another example of fragmentation. We first saw fragmentation when looking at storing files. Fragmentation is inevitable in almost all forms of memory management. This type of fragmentation within a structure (a partition in this case) is known as internal fragmentation.

Some of this wasted memory can be used by allowing more than one program to execute within a partition.

Relocatable loading

This led to relocating loaders, whereby the same program could be loaded into different absolute memory positions (and hence partitions). Of course with a relocating loader we don't have to limit ourselves to preset partitions. Instead we can load programs whenever we have enough contiguous memory space for them.

In this case the addresses are bound to the program as it is loaded into memory.

Placement algorithms

This still leaves us with a problem. Where do we put programs? There may be more than one place in memory which will take a particular program. Different methods have been tried. These same methods will surface again later in this section when we look at segment allocation methods, and so we will leave discussion of these techniques until then.

Swapping

Frequently a process currently in memory isn't able to continue immediately. In this case it makes more sense to allocate its memory space to another process. The stopped process is swapped out of main memory into a temporary storage area, known as swapping space.

Early time-sharing multiprogramming systems did all of their multiprogramming by swapping processes through the one address space. They were rather slow and couldn't handle many processes at a time.

Base & limit registers

With multiple processes running, we need to ensure that one process doesn't interfere with another. The first realistic solution to this problem was an extension of the fence register we saw earlier. We give each process a base register (sometimes called a "relocation" register), below which it is not allowed to access memory, and either a register holding the highest address it can access or a limit register with the length of the process.

The hardware must compare every memory address to see that it is within the legal range.
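A minimal sketch of what happens on each access, assuming a base register and a limit register holding the length of the process (the register values below are made up for the example):

    /* A sketch of base & limit checking: the process issues logical addresses
       from zero up; the hardware checks against the limit and adds the base. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef unsigned long address_t;

    static address_t base  = 0x20000;       /* where the process was loaded */
    static address_t limit = 0x08000;       /* length of the process */

    static address_t translate(address_t logical)
    {
        if (logical >= limit) {             /* outside the process's own space */
            fprintf(stderr, "trap: address %#lx beyond limit\n", logical);
            exit(1);
        }
        return base + logical;              /* relocate to the real address */
    }

    int main(void)
    {
        printf("logical 0x100 -> physical %#lx\n", translate(0x100));
        return 0;
    }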

We can now position programs anywhere within the available memory space, but more than this. We can now move programs around in memory. Since the memory addresses which the program uses and the "real" memory addresses are not the same we have added flexibility. Of course to move a program it must not be running while it is being moved. When it is in its new position, the base register just needs to be changed and the program can continue running.

This moving around is time consuming, and considering that a program may also need to request more memory while it is running, another approach is needed.

Overlays

Even with variable partitions it is obvious that sometimes there will be enough free memory in the system to satisfy a program, but not in a contiguous chunk (necessitating a move).

Other times we will want to run programs which are too big for any chunk of memory, e.g. when we want to run a program larger than all of the memory available to user programs. In this case programmers used overlays. An overlay is a section of program which is not needed all the time; it is only brought into memory when it is needed and taken out again when it is no longer needed. In this way programs can be almost as long as we like, as long as the biggest needed overlay can fit in memory. The program requests a chunk of memory large enough for the biggest overlay and the surrounding code and data.

With an overlay system there must always be a part of the program which stays in memory all the time. Something has to control the movement of the overlays and hold the data which is referred to throughout the program.

Overlay systems leave the difficult bits to the programmer. The programmer has to decide which sections of program will never need to be in memory at the same time. The programmer has to provide this sort of information to the compiler which generates the code to move the overlays about.
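As a rough sketch of the resident part of such a scheme (the region size, overlay numbers and load_overlay() stub are all invented; a real system would read the overlay's code and data from the program file on disk):

    /* A sketch of the resident part of an overlay scheme: one region of
       memory is reused for whichever overlay is currently needed. */
    #include <stdio.h>

    #define REGION_SIZE 4096

    static char region[REGION_SIZE];        /* the area all overlays share */
    static int  resident = -1;              /* which overlay is loaded right now */

    static void load_overlay(int n)         /* stands in for a read from disk */
    {
        region[0] = (char)n;
        printf("loading overlay %d into the shared region\n", n);
    }

    static void ensure_loaded(int n)
    {
        if (resident != n) {                /* fetch it only if it isn't here already */
            load_overlay(n);
            resident = n;
        }
    }

    int main(void)
    {
        ensure_loaded(1);                   /* phase one of the program */
        ensure_loaded(2);                   /* phase two replaces it */
        ensure_loaded(2);                   /* already resident: no load needed */
        return 0;
    }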

Apart from the difficulty of determining which sections of code can be included in an overlay there is a large problem with maintaining such code. Any changes can have multiple effects.

Locality of reference

We will pause for a moment to consider why overlays work. It turns out that in almost all programs if we look at their memory access over a short period of time (a window) we see that only a small amount of the program is being used. Each memory access is very probably going to be near another recent memory reference. This is known as the principle of locality of reference and it shouldn't be too surprising really.

Programs do not reference memory with a random distribution. It seems reasonably intuitive that when we are executing an instruction the next instruction we will want to execute will be the next one in memory. Even most branches are to somewhere reasonably close by, in loops for example.

Procedure calls ruin this a little, but only a little. When we return from a procedure we go back nearby to where we were a short time ago. And of course while we were handling the procedure we stayed pretty close to the same place as well.

If I refer to this memory address, it is highly likely I will refer to other memory addresses close by. Many studies have been done of programs and almost all programs exhibit a high degree of locality of reference.

The fact that data exhibits a high degree of locality of reference may come as more of a surprise. But studies have shown this too. Imagine dealing with large arrays: if we refer to a particular element in the array, it is likely that we will want to refer to its neighbours as well. Of course different algorithms and different orderings of our arrays (row ordered when we want to access them by columns, for example) can upset this in some circumstances, as the example below shows.
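C stores its arrays row by row, so the order of the loops decides whether consecutive accesses are neighbours in memory or a whole row apart (the array size here is arbitrary):

    /* Two ways of summing the same two-dimensional array.  C stores rows
       contiguously, so the first pair of loops touches consecutive addresses
       while the second jumps a whole row (N doubles) between accesses. */
    #include <stdio.h>

    #define N 1024

    static double a[N][N];

    int main(void)
    {
        double sum = 0.0;
        int i, j;

        for (i = 0; i < N; i++)             /* good locality: walk along each row */
            for (j = 0; j < N; j++)
                sum += a[i][j];

        for (j = 0; j < N; j++)             /* poor locality: walk down each column */
            for (i = 0; i < N; i++)
                sum += a[i][j];

        printf("%f\n", sum);
        return 0;
    }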

We will remember this principle and come back to it when we don't have enough memory for all of our programs to be in memory at once.

Pages and frames

Let's go back to the problem of shovelling programs around in memory in order to make a large enough section of contiguous memory for a new program (or an old one that wants more memory).

This was the motivation behind dividing memory up into chunks which we can fool the process into seeing as contiguous. In principle the memory system divides our processes up into chunks and scatters these fixed sized chunks throughout memory. We no longer have any external fragmentation problems because all of these chunks are exactly the same size. Processes just request how many chunks they need to load and run. As long as the total number of chunks doesn't exceed the number of available chunks the process can be loaded and run.
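To illustrate how simple allocation becomes once every chunk is the same size, here is a sketch using invented sizes and names: any free frame will do, so a simple free map is all the allocator needs.

    /* A sketch of frame allocation once memory is divided into equal sized
       frames: the frames handed out need not be contiguous. */
    #include <stdio.h>

    #define NUM_FRAMES 256

    static int frame_free[NUM_FRAMES];      /* 1 = free, 0 = in use */

    /* Give a process the n frames it asked for.  (A real allocator would
       hand back the partial allocation on failure.) */
    static int allocate_frames(int n, int frames[])
    {
        int found = 0, f;
        for (f = 0; f < NUM_FRAMES && found < n; f++)
            if (frame_free[f]) {
                frame_free[f] = 0;
                frames[found++] = f;        /* remembered for the page table */
            }
        return found == n;                  /* fails only if too few frames are left */
    }

    int main(void)
    {
        int frames[4], i;
        for (i = 0; i < NUM_FRAMES; i++)
            frame_free[i] = 1;              /* start with every frame free */
        if (allocate_frames(4, frames))
            for (i = 0; i < 4; i++)
                printf("page %d -> frame %d\n", i, frames[i]);
        return 0;
    }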

Rather than call these chunks of memory chunks, they are known as pages and frames.

Each program is divided up into a number of pages of memory.

These pages are mapped onto equal sized frames of real memory. These frames are not necessarily contiguous; they can be anywhere.

Process point of view

From the point of view of the process absolutely nothing has happened. Its memory is still a contiguous block from zero to whatever. It just seems to be able to run more often than it could with the same amount of memory on a system which doesn't use pages. (Since it is not being stopped and moved around all the time.)

(Actually many processes really want two or three contiguous chunks of memory - one for the code, initialised data and space for dynamically allocated memory and one for the activation records with local variables etc. These can start at either end of the address space and grow towards each other. We commonly want to have these two types of growable areas of memory.)

Adding memory to the process is easy too. If the process requests some more memory these new pages can be tacked on the end of the process's address space. (Or in the middle as described just above.)

System point of view

However, something has to make this magic work. In order for the process to see a contiguous address space from zero up there must be a translation process which takes the address the process is referring to and translates it to the corresponding address in the system's memory.

We refer to the address coming out of the process (and hence the processor) as the logical address; the memory management hardware (the MMU, or memory management unit) has to convert this logical address into a physical address in one of the frames of real memory.

With the base and limit registers we saw how the program could address memory from zero up, whereas the real base address was the value of the base register. Exactly the same approach can be taken here, except that in this case we need a value like the base register for every page the process uses.

Page tables

This information is kept in the page table. Page tables can be system wide or per process. The information in the page table should allow the hardware to convert a logical address from the process into a real memory address.

Commonly the topmost bits of the logical address are used to index into the page table to find the frame of real memory which the page maps to. The frame number is then concatenated with the bottom bits of the logical address (the offset within the page) to give the physical address.
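Here is a sketch of that translation, assuming purely for illustration 4 KB pages and a tiny per-process page table, so the bottom 12 bits of the address are the offset and the remaining bits select the page table entry:

    /* A sketch of page table translation.  Real hardware also checks that the
       page number is within the table and that the entry is valid. */
    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT 12                           /* 4 KB pages */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define NUM_PAGES  16                           /* a very small process */

    static uint32_t page_table[NUM_PAGES];          /* page number -> frame number */

    static uint32_t translate(uint32_t logical)
    {
        uint32_t page   = logical >> PAGE_SHIFT;        /* topmost bits */
        uint32_t offset = logical & (PAGE_SIZE - 1);    /* bottom bits, unchanged */
        uint32_t frame  = page_table[page];             /* look the frame up */
        return (frame << PAGE_SHIFT) | offset;          /* concatenate frame and offset */
    }

    int main(void)
    {
        page_table[2] = 97;                         /* say page 2 lives in frame 97 */
        printf("logical %#x -> physical %#x\n", 0x2ABCu, translate(0x2ABCu));
        return 0;
    }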

Speed of access

There are problems with page tables. First of all, if we are storing all the important information from them in memory we are effectively halving the rate at which we make real memory accesses. We must access real memory for the information from the table and then access real memory again for the process.

One way around this is to have an associative store which can generate the real memory frame number immediately. This is expensive, even if we only load the associative store with frame numbers for the currently executing process. Instead we frequently cache recently referenced frame numbers in such a store, falling back on the page tables in memory when we need to. This associative store is sometimes referred to as the TLB (translation lookaside buffer, a horrible name originating from IBM).

The textbook has a reasonably realistic calculation (things work faster than this now) which shows that if we can get a hit rate on the TLB of 98% (Intel claim this is common for their systems) then the memory access time is only 22% worse than it would be on an un-paged system. We have to be a little careful here, because we aren't talking about virtual memory yet. We are presuming that every page of every process is stored somewhere in the physical memory.
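To see the shape of such a calculation (the times below are assumptions chosen purely for illustration): suppose a TLB lookup takes 20 ns and a memory access 100 ns. A hit in the TLB costs 20 + 100 = 120 ns, a miss costs 20 + 100 + 100 = 220 ns because of the extra access to the page table, so with a 98% hit rate the average access is 0.98 × 120 + 0.02 × 220 = 122 ns, which is 22% worse than the 100 ns an un-paged access would take.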

The size of the page tables is also a problem. We will look again at this when we discuss virtual memory.

Fragmentation

There is no free lunch. Now that we have made a lot of gains there have to be some minuses. The external fragmentation we had between the memory used by different processes has gone; memory is now divided up into these perfectly aligned pages. Of course this doesn't force our processes into requiring exactly a whole number of pages of memory. If our processes get memory in one contiguous block then there will be an average of half a wasted page per process. If (as is likely) our process gets memory as two separate contiguous blocks which are conceptually growing towards each other we will be wasting an average of a whole page per process. This wastage within pages is known as internal fragmentation.

How big should a page be?

To minimise the internal fragmentation we can make our pages smaller. In fact if we make our pages byte-sized we don't have any internal fragmentation at all. Of course this is ludicrous…
