Introduction to CSc 310

 

 

What is assembly language?

 

Most of you have been writing programs in a high-level language (or HLL), such as C++ or Java. A program written in an HLL is typically compiled into a binary executable (a.out by default on Unix systems) before it can be executed by the computer system. The binary executable file is in machine language, which is very similar to assembly language. Machine language is the simple low-level language directly “understood” by the computer hardware. Assembly language is almost the same as machine language, but it is written as readable ASCII text rather than raw binary, to make it easier for human programmers to work with. (You can of course program directly in machine language as well, but it won’t be any fun at all.)

 

We think of HLL programs as being made up of statements. Assembly language programs are made up of instructions, which are much simpler than typical HLL statements. Hence, a typical HLL statement is compiled/translated into many assembly language instructions.

 

HLL programs are relatively portable; if you developed your C++ or Java source code on (for example) a Unix machine, it’s not too difficult to move the source files to (for example) your Windows PC and, with minimal changes, compile and run them.

 

Assembly language or machine language is specific to a family of computer systems that share the same design, called a computer architecture. For example, libra.sfsu.edu is a Sun Sparc system; it has a different architecture from your home PC, which may have (say) an Intel Pentium III CPU. An assembly language program you develop for a Sun Sparc, in Sparc assembly language, is very difficult to port to a PC.

 

Usually in software development, we directly compile our HLL programs to binary executables in machine language. It is also possible to first compile an HLL program to assembly language, and then translate the assembly language program to machine language. An assembler translates from assembly language to machine language; most compilers include or invoke an assembler as one of their steps.

 

Say you have a C++ program source file, bye.cc. To generate the assembly language source file, on Solaris systems, add the -S flag:

 

            g++ -S bye.cc

 

Instead of generating an a.out file, a bye.s file is generated. You can view this file in a text editor; it is in Sparc assembly language. (More on that later.)


Why do we study assembly language?

 

Assembly language is lower level than HLLs; hence it’s more difficult to work with. Assembly language programs are not portable. Most software development today is done in HLLs. So why do computer science departments in most major universities still teach some assembly language?

 

Obviously, there are many aspects of system software that require knowledge of assembly language. If you are writing a compiler, or writing device drivers or other software that talks directly to hardware, you have to know assembly language. Also, a significant amount of software development occurs on systems where the hardware is pushed to its limit in terms of performance; think high-resolution graphics and animation for video games, high-bandwidth real-time multimedia applications, and so on. In these systems, crucial parts of the code are usually hand-tuned assembly routines to maximize performance. There are still many niches for assembly language programming.

 

Just as importantly, assembly language is our key to studying computer systems hardware. A good background in hardware is very useful even to applications programmers. Think about this analogy: most car owners only know the basics of how to operate a car, but a race car driver knows a lot about how a car works, and can get the maximum performance out of a race car. Similarly, a good computer scientist needs a decent hardware background to be able to write the most efficient code and to understand how software interacts with hardware.

 

MIPS assembly language

 

As we discussed earlier, assembly language is specific to a computer system architecture. The assembly language we will use in this class is for a CPU architecture called MIPS. MIPS is generally considered one of the cleanest instruction sets ever designed. It was invented at Stanford University, and incorporated into CPUs built by a company also called MIPS. The MIPS CPU was used in Silicon Graphics workstations, which were used to create a lot of high-end Hollywood computer animation, as in films such as Jurassic Park. Today, MIPS is used mainly in embedded (non-PC) systems, such as the Sony PlayStation 2 and the Nintendo 64.

 

Since we use the Sun Sparc system libra for this class, you may ask why we are not using Sparc assembly language instead. Sparc assembly language actually has some odd quirks which can be difficult to work with (like register windows; ask me if you want more details). Instead, we use SPIM, a MIPS simulator (a program that simulates or pretends to be a MIPS CPU) on libra, and run our MIPS programs on top of it.

 

RISC and CISC

 

MIPS is a member of a family of architectures called RISC architectures. RISC stands for Reduced Instruction Set Computer; it represents an approach to designing instruction sets that are good interfaces between compilers and hardware. Most commercial CPUs today are RISC architectures: Sun’s Sparc, the PowerPC (used in Apple’s Macintoshes), Compaq’s Alpha (used in Compaq’s servers), and Intel’s embedded processor XScale. Even Intel’s IA64 systems, such as Itanium and McKinley, can be considered extensions to RISC architectures.

 

The older (pre-RISC) systems were usually CISCs (Complex Instruction Set Computers). In fact, just about the only commercial CPUs that are CISCs today are the old-style PCs! PC assembly language (called x86 assembly language) is rather ugly and difficult to work with; it is no longer a good match with today’s hardware technology. Indeed, from Intel’s Pentium Pro on, the CISC-style x86 instructions are actually translated by hardware into simple RISC-type operations before being executed. This internal translation was necessary to achieve higher performance. (For more details, see CSc 656.)

 

Looking ahead

 

In the next lecture, we will start with a quick review of integer representations and arithmetic. Please take a quick look through Chapter 1 of your lecture notes. See you all soon...

 

 

Bill Hsu