Assembly Tutorial
#1


Chapter 1: Introduction

This tutorial will teach you how to read/write basic PowerPC Assembly (a computer programming language) for the purpose of making cheat codes for Wii games. This tutorial is a supplement to my thread - 'How to Make your own Cheat Codes', which can be read HERE.

PowerPC Assembly is the Assembly Language for the Wii's CPU. Broadway is the name of the CPU.


Chapter 2: Registers

What are Registers? They are a set of data holding places within the CPU. These data holding places are what Code Creators utilize to make their ASM Codes. There are several types of Registers. First things first: there are 32 normal integer registers. These registers are referred to as the General Purpose Registers (GPR for short).

There are also 32 Floating Point Registers (FPR for short). They obviously use floating point values instead of normal integer values. The Count Register (CTR) is used to help make loops and the Link Register (LR) holds the address that is used to navigate to/from a subroutine.

Most Wii codes only use the GPR's, thus these Registers are the only ones that will be discussed in further detail for this beginner's Assembly tutorial.

Data within Registers:

Each GPR holds 32 bits of data. In the Dolphin Emulator, every register is displayed in Hexadecimal form with its entire length of data shown, so each GPR is shown as a full 32-bit value. 32 bits of data is referred to as a 'word'. Let's say one of the General Purpose Registers has the following value...

80EFCAB0

The 'word' of the register would be the entire 32 bits of data shown above. 16 bits of data is referred to as a 'halfword'. The 'upper 16 bits' of a register is the first half of the word value, while the 'lower 16 bits' is the second half of the word value.

80EF = Upper 16 bits
CAB0 = Lower 16 bits

8 bits of data is referred to as a 'byte'. So for the example above, if I was to say "the 3rd byte of the register",  then I am talking about the 'CA' byte value within 80EFCAB0.
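If it helps to see the word/halfword/byte split worked out, here is a small sketch. It uses Python purely as a calculator (Python is not part of the cheat code toolchain), and the variable names are made up for illustration.

```python
# Split the example register value into halfwords and bytes.
word = 0x80EFCAB0

upper_16 = (word >> 16) & 0xFFFF   # first half of the word
lower_16 = word & 0xFFFF           # second half of the word

# Bytes counted from the left: 1st, 2nd, 3rd, 4th.
word_bytes = word.to_bytes(4, "big")
third_byte = word_bytes[2]

print(f"upper={upper_16:04X} lower={lower_16:04X} 3rd byte={third_byte:02X}")
```

The 3rd byte comes out as CA, matching the example above.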


Chapter 3: Compiler Basics

The Dolphin Emulator will emulate Broadway to execute the game's ASM instructions. An instruction will modify the register(s), the game's memory, or both. For your ASM cheat codes, you will write your instructions in a compiler (the CodeWrite program mentioned in the 'How to Make your own Cheat Codes' thread). Strictly speaking, a program that turns ASM into byte code is called an assembler, but this tutorial sticks with the term 'compiler'.

Characters/symbol set:

When you write out ASM instructions in the compiler, various symbols are required for proper formatting. This will allow the compiler to interpret your ASM instructions and compile them correctly into a finished cheat code.

List of symbols:
. (period)
: (colon)
, (comma)
() (parentheses)
+ (plus)
- (minus)
_ (underscore)
# (hash tag)
x (not multiply, this is for writing Hex values)

Hex vs Decimal:

Compiled cheat codes are shown in Hex byte code. It's common sense you need to know Hex beforehand. There are plenty of quick simple tutorials you can find via Google if you need to learn the basics of Hex. For writing ASM instructions, there are certain elements of an instruction that you can write in Hex. However, the downside is all known PowerPC compilers will decompile a cheat code using decimal representation. If you are not sure what to use, then I recommend using decimal for byte data and using Hex for all other data.

When you write Hex values in the compiler, you must prepend those values with '0x'. FYI, Dolphin displays all Register values in Hex but they are NOT prepended with '0x', as it's already assumed the user knows those values are displayed in Hex form.


Chapter 4: Format for Writing ASM Instructions

Any General Purpose Register is written as rX. X = the register's number. The register number is in decimal form. The first register is Register 0, aka r0. The last is Register 31, aka r31. Fyi: Dolphin may display r1 as sp, and r2 as rtoc.

Most instructions have a Destination Register. The Destination Register is the Register that holds the result of an executed instruction, while the Source Register is the Register that is used to compute the result for the Destination Register. Some instructions will have one source register, while others will have two. The instructions covered in this tutorial never have more than one Destination Register.

Here is a basic example of an instruction with two source registers~

rD, rA, rB

rD = Destination Register
rA = 1st Source Register
rB = 2nd Source Register

Keep in mind this is not an actual instruction, or an exact correct format. This is just to show you a very very general view of any instruction that uses two source registers to compute a value for the destination register. Now let's look at an example of an instruction with just one source register..

rD, rA, VALUE

rD = Destination Register
rA = Source Register
VALUE = Immediate Value

Immediate Value is a decimal or hex value that CANNOT exceed 16 bits and it is NOT a register that contains a value. The use of Immediate Values allows Broadway to have instructions that can provide more flexibility with less register usage.

Before continuing further it's critical that you understand signed vs logical (unsigned) values.

What is signed & logical?
Values within a particular instruction (regardless of whether they are register values or immediate values) will either be signed or logical. Logical values are NEVER negative, while signed values CAN be negative.

The range of numbers in an entire register is 0x00000000 thru 0xFFFFFFFF. If an instruction treats its values as signed, then values 0x00000000 thru 0x7FFFFFFF are positive, while values 0x80000000 thru 0xFFFFFFFF are negative. If an instruction treats its values logically, then all values are positive.

The value of 0xFFFFFFFF in a register (if signed) is -1, and a value of 0xFFFFFFFE would be -2. This continues all the way down to 0x80000000, which is the lowest signed 32-bit number (-2147483648).

Since Immediate Values are 16-bits in size instead of 32-bits, their range of numbers will differ.

Signed 16 bit Immediate Value Range:
Negative values: 0xFFFFFFFF thru 0xFFFF8000 (decimal form -1 thru -32768)
Positive values: 0x0000 thru 0x7FFF (decimal form 0 thru 32767)

Logical 16 bit Immediate Value Range:
0x0000 thru 0xFFFF (decimal form 0 thru 65535)

You will notice right away that negative Immediate Values are not 16-bit in size in the range shown above. This 'trick' allows Broadway to have negative 16-bit values displayed in a 32-bit register. When writing these Immediate Values in the compiler, you must follow the ranges shown above or else a compiling error will occur. Keep in mind you can write the Immediate Values in decimal form if desired.
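Here is a rough sketch of the signed vs logical idea, again using Python only as a calculator. The function names as_signed32 and sign_extend16 are made up for this example; they are not compiler or Dolphin features.

```python
def as_signed32(value):
    """Read a 32-bit register value as a signed number."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def sign_extend16(imm):
    """Turn a 16-bit immediate into the 32-bit value a register would end up holding."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm

print(as_signed32(0xFFFFFFFF))          # read as signed
print(as_signed32(0x80000000))          # lowest signed 32-bit value
print(f"0x{sign_extend16(0x8000):08X}") # negative 16-bit value in a 32-bit register
print(f"0x{sign_extend16(0x7FFF):08X}") # positive 16-bit value stays as-is
```

Read logically, 0xFFFFFFFF is simply 4294967295. The same bits give -1 only when an instruction treats them as signed.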


Chapter 5: Integer ASM Instructions

At this point you should have a good understanding of the...
Registers
Symbols that can be used in ASM instructions
General Format/Layout of ASM instructions


Let's go over actual real world instructions that a person would use to make codes. Here is one of the most basic ASM instructions....

Add (adds two source registers to compute the value of the destination register)

add rD, rA, rB

The value of rA is added with the value of rB. rD will hold the result of the two values added together. Values are treated as signed. Whatever value was in rD beforehand gets erased and replaced with the new value after the instruction has executed.

Let's say we add the values of r4, and r25. The result of this value will be stored in r20. Our 'add' instruction would be this...

add r20, r4, r25

For a majority of instructions that use two source registers, you can swap them. So you can also write this as...

add r20, r25, r4

Imagine this as a basic math equation of 2 + 3 = 5. It doesn't matter if you swap the positions of 2 and 3, the result is always 5. You obviously can't change the spot where the destination register is within the instruction. Keep in mind certain instructions won't allow the swapping of source registers. Let's move onto another basic ASM instruction...

Add Immediate

addi r4, r30, 12

Notice the number 12. It doesn't have the letter 'r' before it. So we know 12 represents a 16 bit value instead of a source register. Values in this instruction are signed. This instruction adds together the value of r30 and the value of 12. The result will be stored in r4. For the addi instruction, you CANNOT swap the positions of 12 and r30! If you wanted to write this same instruction in Hex form, it would be like this..

addi r4, r30, 0xC

The '0x' must be put before any hex value, or the compiler will compile it as decimal or not compile it at all (throw an error). You can of course throw a minus (-) before your signed value to designate a negative number. So if we did.....

addi r4, r30, -12

This would be adding the value of r30 and negative 12. Thus we are actually subtracting 12 from the value in r30. For simplicity, you can use what are called simplified mnemonics. A simplified mnemonic is a 'shortcut'/'simplified' version of an ASM instruction.

The simplified mnemonic for addi r4, r30, -12 is...

subi r4, r30, 12

Subi stands for Subtract Immediate. Let's go over the most common simplified mnemonic of all....

Load Immediate

li r6, 0xFF

Values in this instruction are signed. As you can see there are no source registers in this simplified mnemonic. It is a shortcut for addi r6, 0, 0xFF. You will notice the 0 in the middle doesn't have an r in front of it...

Special note about r0:
In certain ASM instructions (such as addi), if r0 is used as the first source register, then it is treated as the literal number 0 rather than the value held in r0.

The 'li' instruction simply sets a register to the designated immediate value, which is 0xFF in our case. Therefore, after that instruction is executed, register 6 now has the value of 0x000000FF.

Example of li to load a register with a negative 16-bit signed immediate value~
li r7, 0xFFFFFFFC

This will set r7 to 0xFFFFFFFC. You can also write this as...

li r7, -4
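To double check the arithmetic for addi, subi and li, here is a minimal model. The functions addi and li below are Python stand-ins that I made up for the example, and the starting value of r30 is an assumption.

```python
MASK32 = 0xFFFFFFFF

def addi(ra_value, imm):
    """Model of addi: add a signed immediate and keep the result to 32 bits."""
    return (ra_value + imm) & MASK32

def li(imm):
    """Model of li rD, imm, which is the same as addi rD, 0, imm."""
    return addi(0, imm)

r30 = 0x00000020                      # assumed starting value

print(f"0x{addi(r30, 12):08X}")       # addi r4, r30, 12
print(f"0x{addi(r30, -12):08X}")      # addi r4, r30, -12 (same as subi r4, r30, 12)
print(f"0x{li(0xFF):08X}")            # li r6, 0xFF
print(f"0x{li(-4):08X}")              # li r7, -4
```

The last line shows why li r7, -4 and li r7, 0xFFFFFFFC are the same thing: -4 kept to 32 bits is 0xFFFFFFFC.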

Add Instruction using a register for both a Source Register and the Destination Register:

add r4, r4, r30

In the above instruction, the value of r4 (before execution of the instruction) plus the value of r30 will be placed in r4 once the instruction has executed. Thus, after the instruction has executed, whatever old value was in r4 is replaced by the new value.

Writing multiple instructions in the compiler:
Before we continue further here's a quick example of writing multiple instructions.

add r4, r4, r30
li r31, 1
addi r12, r31, 0xA

Each instruction takes up one 'line/row' in the compiler. You cannot put multiple instructions on one line. Once you have typed out an instruction, you must enter into a new line to write your next instruction. Obviously, this shows instructions start from the top and are executed in downward order. There can be exceptions for things such as loops & branches. However, generally speaking, instructions are executed in downward order.
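As a sanity check, the three instructions above can be traced one line at a time. This is only a sketch: the dictionary stands in for the registers and the starting values of r4 and r30 are assumptions, since in a real game they would hold whatever the game left there.

```python
MASK32 = 0xFFFFFFFF

# Assumed starting values for the registers involved.
regs = {"r4": 0x00000005, "r30": 0x00000003, "r31": 0, "r12": 0}

regs["r4"] = (regs["r4"] + regs["r30"]) & MASK32   # add r4, r4, r30
regs["r31"] = 1                                    # li r31, 1
regs["r12"] = (regs["r31"] + 0xA) & MASK32         # addi r12, r31, 0xA

for name, value in regs.items():
    print(f"{name} = 0x{value:08X}")
```

Because the instructions run from top to bottom, the addi on the last line sees the value that li just placed in r31.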


Chapter 6: Store, Load ASM Instructions

This chapter will demonstrate how to take register values and write them to memory, or take values from memory and write them to the registers. Let's take a look at one of the most basic store-type (write a register's value to memory) instructions...

Store Word

stw rD, VALUE (rA)

Note: Values in this instruction are signed.

This instruction will copy the word (entire value) of rD and write it to the memory location referenced by the value in rA + VALUE. VALUE is a 16 bit signed hex/decimal number. Note that for store instructions, the register written as rD here is the one being read from, not written to. With any store instruction, both rD & rA will not lose their data.

stw r3, 0x0020 (r28)

The word of r3 will be stored at the memory location (address) that is the value in r28 + 0x0020. The 0x0020 value is usually referred to as the 'offset'.

Please also note that the memory location of rA + VALUE is usually referred to as the Effective Address.

So let's say our value in r3 is 0x0000200A, and r28 is 0x80001500. First add the offset value to 0x80001500. We now have the effective address of 0x80001520. Let's say before the ASM instruction, the word at memory location 0x80001520 was 0xFFEF1023. After the instruction is executed, the word at memory location 0x80001520 is now 0x0000200A.

There are also sth (Store Halfword) and stb (Store Byte) instructions. The sth instruction will store the lower 16 bits of a register to memory, while the stb instruction will store the 4th (far right) byte of a register to memory.
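Here is a toy model of the store instructions, assuming memory is just a table of address -> byte. The store and read_word helpers are invented for this sketch, and the two addresses used for sth/stb are arbitrary. Broadway stores the most significant byte first (big-endian), which is what the "big" argument does.

```python
# Toy model of memory: a dict mapping address -> one byte.
memory = {}

def store(address, value, size):
    """Write the lowest `size` bytes of value to memory, most significant byte first."""
    data = (value & ((1 << (8 * size)) - 1)).to_bytes(size, "big")
    for i, byte in enumerate(data):
        memory[address + i] = byte

def read_word(address):
    """Read 4 bytes back as one word (untouched bytes count as 0)."""
    return int.from_bytes(bytes(memory.get(address + i, 0) for i in range(4)), "big")

r3 = 0x0000200A
r28 = 0x80001500

effective_address = (r28 + 0x0020) & 0xFFFFFFFF
store(effective_address, 0xFFEF1023, 4)   # the word that was there beforehand
store(effective_address, r3, 4)           # stw r3, 0x0020 (r28)
print(f"0x{effective_address:08X} now holds 0x{read_word(effective_address):08X}")

store(0x80001600, 0x80EFCAB0, 2)          # like sth: writes CA B0
store(0x80001610, 0x80EFCAB0, 1)          # like stb: writes B0
```

Notice r3 and r28 are only read, never changed, which matches the note that store instructions don't destroy register data.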

Load Word & Zero

lwz rD, VALUE (rA)

Note: Values in this instruction are signed.

This is simply the 'reverse' of stw. The word at memory location rA+VALUE will be copied into rD. Whatever was in rD beforehand is now completely erased.

lwz r31, 0 (r15)

For this lwz instruction, the offset is 0 (no offset). Therefore, nothing is added to r15 so the effective address is simply r15's value. Let's say r15 is 0x806553E4, and the word at that address in memory is 0x00000001. After the lwz instruction is executed, r31 is now 0x00000001. The previous data in r31 is erased. The word at the memory address 0x806553E4 is not erased, the data remains intact.

There are also lhz (Load Halfword & Zero) and lbz (Load Byte & Zero) instructions. The lhz instruction loads a halfword from memory into a register. Whatever value was in the register beforehand gets erased. This means every time a lhz instruction gets executed, the rD for the instruction will always end up with a value of 0x0000XXXX (XXXX being the halfword value that was loaded from memory).

The lbz instruction loads a byte from memory into a register. Whatever value was in the register beforehand gets erased. Thus, every time a lbz instruction gets executed, the rD for the instruction will always end up with a value of 0x000000XX (XX being the byte value that was loaded from memory).
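The load instructions can be modelled the same way. In this sketch, poke_word and load are helpers I made up, the word at 0x80001000 is a hypothetical extra value added just to show lhz and lbz, and the old value of r31 is an assumption.

```python
memory = {}

def poke_word(address, word):
    """Helper for this sketch only: place a 32-bit word in the toy memory."""
    for i, byte in enumerate(word.to_bytes(4, "big")):
        memory[address + i] = byte

def load(address, size):
    """Read `size` bytes from memory; the upper part of the result is always zero."""
    raw = bytes(memory.get(address + i, 0) for i in range(size))
    return int.from_bytes(raw, "big")

poke_word(0x806553E4, 0x00000001)   # the example from the text
poke_word(0x80001000, 0x80EFCAB0)   # hypothetical word for the lhz/lbz lines

r15 = 0x806553E4
r31 = 0x12345678                    # assumed old contents, about to be erased

r31 = load(r15 + 0, 4)              # lwz r31, 0 (r15)
lhz_result = load(0x80001000, 2)    # like lhz: register becomes 0x0000XXXX
lbz_result = load(0x80001000, 1)    # like lbz: register becomes 0x000000XX

print(f"lwz -> 0x{r31:08X}")
print(f"lhz -> 0x{lhz_result:08X}")
print(f"lbz -> 0x{lbz_result:08X}")
```

Loading never changes memory, so the word at 0x806553E4 is still 0x00000001 afterwards.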


Chapter 7: Writing an Entire Word Value to a Register from Scratch

You are probably wondering at this point how to write a whole word value from scratch to a Register. This is useful for establishing memory locations to later use for store-type and load-type ASM instructions. So let's say we want to write the value of 0x80E6FF30 to Register 22, how do we do this? Simple, with just two ASM instructions like this...

First we write the upper 16 bits. For example:

Load Immediate Shifted

lis r22, 0x80E6

NOTE: Values in this instruction are Signed.

Lis is an odd instruction. You will notice that even though the lis instruction treats its values as signed, the example above is using a 16-bit Immediate Value in the Logical Range. Without getting into endless technical detail that a beginner doesn't need yet, this is basically due to how PPC ASM Compilers are set up. Don't worry, as this 'breaking of the rules' only applies to the lis and addis instructions (lis is a simplified mnemonic of addis).

So just to recap, whenever you are writing lis instructions in your compiler, you can use logical number ranges, but remember lis treats its values as signed.

Load Immediate Shifted (lis) is similar to the Load Immediate (li) instruction but you are setting the upper 16 bits of a register instead of the lower 16 bits. Whenever any lis instruction is executed the lower 16 bits are always CLEARED (set to 0000)!

So at this point, r22 has a value of 0x80E60000. To write in the lower 16 bits without affecting the upper 16 bits, we use an instruction called Or Immediate.

Or Immediate

ori r22, r22, 0xFF30

NOTE: Values in this instruction are Logical

Now r22 will have our desired value of 0x80E6FF30. Simple to do! If you are wondering what exactly happens with the Or Immediate instruction and you are not familiar with Logical Operations (And, Or, Xor), I wouldn't concern yourself with it for now. Just remember to use lis and ori instructions if you need to set an entire word value into a register from scratch.
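If you want to check the lis + ori result yourself, the two steps can be sketched like this. The lis and ori functions are Python stand-ins invented for the example.

```python
MASK32 = 0xFFFFFFFF

def lis(imm16):
    """Model of lis rD, imm: imm lands in the upper 16 bits, lower 16 bits are cleared."""
    return ((imm16 & 0xFFFF) << 16) & MASK32

def ori(ra_value, imm16):
    """Model of ori rD, rA, imm: bitwise OR with a logical (unsigned) 16-bit value."""
    return (ra_value | (imm16 & 0xFFFF)) & MASK32

r22 = lis(0x80E6)            # lis r22, 0x80E6
print(f"0x{r22:08X}")
r22 = ori(r22, 0xFF30)       # ori r22, r22, 0xFF30
print(f"0x{r22:08X}")
```

Since the lower 16 bits are all zero after lis, OR-ing in 0xFF30 simply drops that value into the empty half. Using addi for the second step would not work here, because addi treats 0xFF30 as a negative signed value and the result would come out as 0x80E5FF30.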


Chapter 8: Branch, Compare ASM Instructions

Branch instructions are used as 'jumps' to skip over certain other instructions. Let's take a look at the simplest branch instruction...

Branch

b 0x8
li r3, 1
stw r3, 0 (r31)

The letter b is used for what is known as an unconditional branch. Unconditional meaning the branch is executed no matter what the conditions are. Think of it like a jump. The branch will skip/jump over a certain amount of instructions below, thus not executing said instructions. In the provided example, the 'li r3, 1' instruction would be skipped. Now, the '0x8' next to branch is the amount to 'jump/skip'. This 'jumping' value is signed by the way, meaning you can have branches that jump backwards. Since each instruction is 4 bytes in compiled length, a jump of 0x4 would be pointless as this would simply just go down to the next instruction below. Obviously, the larger the jump, the harder it would be to correctly calculate the amount to write for the branch instruction. Therefore, we use a trick called 'labels'.

Labels are just that, they are labels.

To allow the compiler to know you are using labels, you designate labels with two symbols. The underscore symbol and the colon symbol. To first establish a branch label name, you must implement an underscore somewhere in the name. Like this...

b the_label

You can name labels whatever you want as long as you do NOT use special characters like percent signs or dollar signs. You can implement the underscore symbol if you want like the example provided. Okay, you have set the label name, now all you need to do is put that same label name right before the first instruction that you want executed after the jump has occurred. Put in the label name and add a colon afterwards like this...

b the_label
li r3, 1

the_label:
stw r8, 0 (r31)
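To show what the compiler does with a label, here is a rough sketch of the offset calculation. This is NOT how CodeWrite is actually written (I don't know its internals). It is just the idea: every instruction is 4 bytes, a label takes up no space, and the branch amount is the label's position minus the branch's position.

```python
# Each entry is either a label ("name:") or one instruction (4 bytes when compiled).
source = [
    "b the_label",
    "li r3, 1",
    "the_label:",
    "stw r8, 0 (r31)",
]

# First pass: record the byte position of every label.
labels = {}
position = 0
for line in source:
    if line.endswith(":"):
        labels[line[:-1]] = position
    else:
        position += 4

# Second pass: branch amount = label position - position of the branch itself.
offsets = {}
position = 0
for line in source:
    if line.endswith(":"):
        continue
    if line.startswith("b "):
        target = line.split()[1]
        offsets[line] = labels[target] - position
    position += 4

print(offsets)
```

The branch works out to 8, so 'b the_label' here is the same thing as the 'b 0x8' from the earlier example.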

Now, the unconditional branch in the label example would be useless on its own. Why would you randomly skip over ASM instructions? Well, branches are needed if you want to create a subroutine. Think of your list of instructions like a road. When the game is performing the list of instructions one after another, think of that like traffic driving on the road. However, you can now put a fork in the road, and tell the traffic which route to take. The two routes will then later merge back together.

Let's dive into Conditional Branches. We need to create that 'fork' in the road. The easiest method to create that fork is conditional branching. Conditional branches are branches that only execute based on an 'if'. For example let's look at the 'branch if not equal' instruction...

Branch If Not Equal

bne the_label

li r8, 1

the_label:
stw r8, 0 (r31)

the_label will only be 'jumped to' if the conditional branch is true. In order to set up this 'if' for a conditional branch, we need to make a comparison. The most common instruction to establish a comparison is Compare Word Immediate.

Compare Word Immediate

cmpwi rD, VALUE

NOTE: Values in this instruction are signed

Value in rD is compared to VALUE as signed values.

cmpwi r10, 0xA

The signed value in r10 will be compared to the signed value of 0xA. We have thus created our 'if statement'. So now add in the rest of the instructions from earlier....

cmpwi r10, 0xA
bne the_label

li r8, 1

the_label:
stw r8, 0 (r31)

The value in r10 is compared to the value of 0xA. Then, if the value in r10 is NOT equal to 0xA, you will 'jump' to the_label, thus skipping the 'li r8, 1' ASM instruction. Let's look at another example using a different conditional branch...

Branch If Equal

cmpwi r10, 0xA
beq the_label

li r8, 1

b the_end

the_label:
stw r8, 0 (r31)

the_end:
stw r3, 0x0010 (r24)

As you can see not only are we using 'beq' now, we are adding an unconditional branch and a second label called the_end. You should quickly see why I added the unconditional branch. Remember the road analogy I used earlier... Let's follow the first route of the fork in the road (if r10 does equal 0xA)

If r10 equals 0xA, we jump to the_label. We then execute the first 'stw' instruction.... Now remember the traffic/road analogy, we now go right to the next ASM instruction below, the second 'stw' instruction. The label name itself is NOT a barrier in our 'road' in any way shape or form. The labels are just label names to calculate the branch offsets for the compiler so you don't have to do the work.

Now, let's instead take the second route of the fork in the road. If r10 is NOT equal to 0xA, we do NOT jump to the_label. We instead go straight down our road to the 'li' instruction. After that, we encounter our unconditional branch. This obviously means we take the branch/jump no matter what. We do this because why would we go to the_label when our r10 value was NOT equal to 0xA? That would make no sense. Therefore, we jump to the_end, thus skipping the first 'stw' instruction.
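The two routes of the beq example can be traced with a small sketch. The run function is made up for illustration; it doesn't execute any ASM, it only lists which instructions would be executed for a given value of r10.

```python
def run(r10):
    """Return the instructions executed, in order, for the beq example."""
    executed = ["cmpwi r10, 0xA", "beq the_label"]
    if r10 == 0xA:
        # Branch taken: we land on the_label and skip 'li' and 'b the_end'.
        executed.append("stw r8, 0 (r31)")
    else:
        # Branch not taken: fall through, then jump over the_label.
        executed.append("li r8, 1")
        executed.append("b the_end")
    # Both routes merge again at the_end.
    executed.append("stw r3, 0x0010 (r24)")
    return executed

print(run(0xA))
print(run(0x5))
```

Either way the last instruction executed is the 'stw' under the_end, which is the two routes of the road merging back together.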

Here is a list of commonly used conditional branch instructions.
beq - Branch If Equal
bne - Branch If Not Equal
bgt - Branch If Greater Than
blt - Branch If Less Than
bge - Branch If Greater Than Or Equal To
ble - Branch If Less Than Or Equal To

Let's go over another compare instruction really quick... 

Compare Word

cmpw rD, rA

NOTE: Values in this instruction are signed.

This will simply compare the signed values of two registers.

cmpw r4, r8
bgt the_label

In this example, if the value in r4 is greater than the value in r8, then the jump to the_label will be taken.
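Because cmpw compares signed values, a register holding 0xFFFFFFFF counts as -1, not as a huge number. Here is a quick sketch of that, with made-up function names and made-up register values.

```python
def as_signed32(value):
    """Read a 32-bit register value as a signed number."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value

def bgt_taken_after_cmpw(first_value, second_value):
    """True if 'bgt' would jump after 'cmpw' on these two register values."""
    return as_signed32(first_value) > as_signed32(second_value)

print(bgt_taken_after_cmpw(0x00000005, 0x00000003))   # 5 vs 3
print(bgt_taken_after_cmpw(0xFFFFFFFF, 0x00000001))   # -1 vs 1
```

The first call gives True and the second gives False, since -1 is not greater than 1 even though 0xFFFFFFFF looks bigger in Hex.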


Chapter 9: Illustration

Here's a picture I made to give you a visual guide of what ASM instructions do.... http://mkwii.com/pics/other/ASMpic.png


Chapter 10: Extra Stuff

Let's go over some more symbols that we haven't covered yet.

Period (.):

You can use the period to give a value its own unique label name. Btw, this has nothing to do with branch labels. Think of these like making definitions, or having 'macros'. The period is followed by the word 'set'. For example:

.set ITEM_MUSHROOM, 0x4

...some ASM here....

li r31, ITEM_MUSHROOM

This now allows the ASM writer to put ITEM_MUSHROOM any time they want to use the value of 0x4. It's a very basic 'macro' per se. It can come in handy if you are writing lengthy ASM.

Plus & Minus (+ and -):

The plus and minus symbols are used for conditional branches. Whenever a branch is done, you can help Broadway by supplying a 'hint'. The plus symbol stands for more-likely, while the minus symbol stands for less-likely. For example....

cmpwi r8, 0xC
bne+ the_label

The plus symbol next to the 'bne' will tell Broadway that the branch is more-likely to occur.

Hash Tag (#):

Whenever someone is writing very lengthy ASM, it can be handy to add notes that will let that someone know why he/she wrote those instructions. Here's an example of using hash tags to add notes/comments:

#Start assembly source

lis r4, 0x8000 #Set 1st half of the address to store the word to
stw r30, 0x157C (r4) #Store word to memory location 0x8000157C, the offset amount is used to complete 2nd half of address

#End assembly source
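The lis + offset trick in the commented example can be worked out with a small helper. This is a sketch under my own assumptions: split_address is a made-up name, and the second address is just an example value. The catch it handles is that the offset is signed, so when the 2nd half of the address is 0x8000 or higher, the offset has to be written as a negative number and the lis value goes up by one.

```python
def split_address(address):
    """Split an address into (value for lis, signed 16-bit offset)."""
    lower = address & 0xFFFF
    if lower >= 0x8000:
        lower -= 0x10000               # offset must fit the signed range
    upper = ((address - lower) >> 16) & 0xFFFF
    return upper, lower

for address in (0x8000157C, 0x80E6FF30):
    upper, lower = split_address(address)
    rebuilt = ((upper << 16) + lower) & 0xFFFFFFFF
    print(f"0x{address:08X}: lis value 0x{upper:04X}, offset {lower}, rebuilt 0x{rebuilt:08X}")
```

For 0x8000157C this gives the same lis value (0x8000) and offset (0x157C) used in the example above.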


Chapter 11: Conclusion & Credits

Alright, this should help get you started writing PowerPC ASM for your cheat codes. For more instruction examples, visit this thread HERE.

Credits:
IBM, Apple, and Motorola (creators of PowerPC ASM)
WiiBrew (a lot of information was gathered from there)
Star (taught me ASM)
#2
blyatful
#3
Wow, this guide is actually quite comprehensive and well-written. It helped me a lot in getting started on ASM and led me into learning Hex (which I didn't know beforehand, although it's actually pretty simple). Kudos
#4
Thank you for the kind words. Assembly by itself isn't that tough to learn, it's just very difficult coming up with code ideas from scratch and applying your ASM knowledge into making an actual cheat code.

I would suggest going through the codes forum and looking at the Source of basic ASM codes. Ones either written by me or Star. We put plenty of good comments in our Source to help others understand how the code(s) work.

EDIT:

Here is essentially the most basic ASM you can do. Writing a value in a register before that value gets stored.

http://mkwii.com/showthread.php?tid=848

I was looking at a value in the RAM Viewer. I noticed it would get written to whenever I did a certain action with my item. Therefore I set a Write BP (write breakpoint). I used my item, the value in memory got written with a new value, the Write BP triggered and the game paused. I saw in the Code view that the value in Register 31 was getting stored to a spot in memory.

This is easy to manipulate. As you can see in the Source, I simply load a custom value into Register 31 (replacing the legit value), and then include the game's default ASM to allow the game to store the new value to memory. Very simple.
#6
(06-18-2019, 12:52 PM)Cameron_MKW Wrote: Quick question, with signed values why does negative 1 display as 0xFFFFFFFF on Dolphin?

If a value is 0xFFFFFFFF, it CAN be -1 or 4294967295.

If the value is signed (not logical; most values are signed, btw), this will represent -1 in decimal form.

If the value is not signed (logical), it is 4294967295 in decimal form.

Another example:
Let's say a register has the value of 0xFFFFFFFE. If it's signed the value (in decimal) is -2

If you're working with an instruction that is using a 16 bit signed value (such as Load Immediate), then 0xFFFF8000 (-32768) is the lowest negative number that can be written. When 16 bit signed values are used, they are 'sign extended'. This basically means that for a negative value, the upper 16 bits (left hand side) of said register will automatically be set to 0xFFFF. For a positive value, they are set to 0x0000.

li r5, -2
li r5, -0x0002
li r5, 0xFFFFFFFE


All the above instructions are the same thing.
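Here is a quick sketch showing why those three spellings are the same. The parse_signed_imm function is something I made up to mimic the ranges described in the tutorial. It is not the compiler's real code, so treat the exact error behaviour as an assumption.

```python
def parse_signed_imm(text):
    """Parse a signed 16-bit immediate and return the 32-bit value the register ends up with."""
    value = int(text, 0)              # base 0 accepts "12", "-2", "0xC", "-0x0002"
    if value >= 0xFFFF8000:           # negative number written in 32-bit hex form
        value -= 0x100000000
    if not -32768 <= value <= 32767:
        raise ValueError(f"{text} does not fit a signed 16-bit immediate")
    return value & 0xFFFFFFFF

for text in ("-2", "-0x0002", "0xFFFFFFFE"):
    print(f"li r5, {text:<10} -> r5 = 0x{parse_signed_imm(text):08X}")
```

A value like 0xFFFF is rejected by this sketch because it falls outside the signed ranges listed in Chapter 4.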

--

If this is still confusing, the knowledge of signed vs logical isn't really needed til you start working with complicated comparison-type instructions. (like blt/bgt on a logical value)
#8
(06-19-2019, 08:34 AM)Cameron_MKW Wrote: Oh OK I think I get it now. Thank you

Forgot to mention this...

For signed values on a register... 0x00000000 thru 0x7FFFFFFF is positive, and 0x80000000 thru 0xFFFFFFFF is negative.