
Category: javascript

Writing a NES emulator in Javascript: Part 5 – Debugging the CPU

I’ll be honest with you: I’m really not used to working at this low a level. Bitwise math hasn’t been my strength, so it has been important to have some sort of reference to work from. The popular choice when working with NES emulators is the so-called “golden log” produced by Nintendulator, a particularly accurate NES emulator. By reading through the log and writing a similar output system for my own CPU, it’s possible to hand-check the accuracy of the opcodes. After doing this for about a week, however, I found it pretty tedious and decided I could speed things up a bit.

Automated Testing

A simple solution seemed to be to parse the nestest log into a JavaScript-friendly object, then run the CPU instruction by instruction and check each step against the reference. I opted to check just A, X, Y, P and SP, as I’m still not sure how the cycle count works (more on that in another article). The first step was to write a simple class that reads and parses the nestest log:

class NEStest {
  constructor() {
    this.code = [];
    var fs = require("fs");
    var nt = fs
      .readFileSync("nestest.log", "utf-8")
      .split("\n")
      .filter(x => x != "");
    nt.forEach(line => {
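      // columns are separated by two or more spaces:
      // PC, opcode bytes, disassembly, then the register fields
      let segments = line.split(/[ ]{2,}/);
      let line_hash = { flags: { PC: segments[0] } };
      // e.g. "A:00 X:00 Y:00 P:24 SP:FD" -> { A: "00", X: "00", ... }
      let flags = segments[3].split(/ /);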

      flags.forEach(flag_str => {
        var parts = flag_str.split(":");
        line_hash.flags[parts[0]] = parts[1];
      });

      this.code[parseInt(segments[0], 16)] = line_hash;
    });
  }
}
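One caveat with keying `this.code` by address: nestest calls some routines more than once, so the same PC appears on several log lines and later lines overwrite earlier ones. A minimal alternative sketch, assuming the Nintendulator line layout used in the test sample (PC first, then `A:`, `X:`, `Y:`, `P:` and `SP:` fields), keeps the entries in execution order so the n-th executed instruction can be compared with the n-th log line. `parseNestestLog` and `mismatches` are hypothetical names, not part of the project.

```javascript
// Matches the PC at the start of a Nintendulator-style log line and the
// A, X, Y, P and SP fields that follow the disassembly.
const LINE_RE =
  /^([0-9A-F]{4})\s.*?A:([0-9A-F]{2}) X:([0-9A-F]{2}) Y:([0-9A-F]{2}) P:([0-9A-F]{2}) SP:([0-9A-F]{2})/;

// Parse the log text into an ordered array of expected states. Keeping the
// entries in execution order means a revisited address does not overwrite
// an earlier entry, unlike a table keyed by PC.
function parseNestestLog(text) {
  return text
    .split(/\r?\n/) // tolerate Windows line endings
    .map((line) => LINE_RE.exec(line))
    .filter((m) => m !== null) // skip blank or unrecognised lines
    .map((m) => ({
      PC: parseInt(m[1], 16),
      A: parseInt(m[2], 16),
      X: parseInt(m[3], 16),
      Y: parseInt(m[4], 16),
      P: parseInt(m[5], 16),
      SP: parseInt(m[6], 16),
    }));
}

// Compare a register snapshot with an expected entry and return the names
// of the registers that differ (an empty array means the step matched).
function mismatches(registers, expected) {
  return ["PC", "A", "X", "Y", "P", "SP"].filter(
    (name) => registers[name] !== expected[name]
  );
}
```

With this shape, the test loop would check `mismatches(cpu.registers, log[step])` before executing each instruction and then advance `step`.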

I then added a test() function to the emulator class as an alternative to boot(). To make this work, CPU.execute() had to take an optional argument specifying how many instructions to execute; stepping the CPU one instruction at a time makes it possible to check each step against the log entries. The code compares against the nestest reference line by line until one of the registers doesn’t match the expected value, and then bails:

  test() {
    let test_suite = new NEStest();

    // put the CPU into the default state
    this.cpu.reset();

    let running = true;

    while (running) {
      /* executing with a numerical argument executes a limited number of instructions */
      this.cpu.execute(1);

      /* look up the comparison line in the nestest object */
      let test_line = test_suite["code"][this.cpu.registers.PC];

      if (!test_line) {
        /* there's no nestest.log entry with this address
        it's safe to assume something went really wrong */
        console.log("CPU address not expected!");
        process.exit();
      }

      Object.entries(this.cpu.registers).forEach(register => {
        if (
          parseInt(this.cpu.registers[register[0]]) !=
          parseInt(test_line.flags[register[0]], 16)
        ) {
          let output_message = `Registers ${register[0]} mismatch\n`;
          output_message += `Got ${this.cpu.registers[register[0]].toString(16)}, expected ${test_line.flags[register[0]]}`;

          console.log(output_message);
          console.log(this.cpu.registers);
          console.log(this.cpu.flags);
          running = false;
        }
      });
    }
  }

This has made it much easier to debug the CPU, as it produces output like this when there’s an implementation issue:

c812 Beginning execution at 51218			[A: 80 X: 0 Y: 0 P: ad SP: fb CYC: 374]
c812 LDA #$0			[A: 80 X: 0 Y: 0 P: ad SP: fb CYC: 374]
0
c814 Beginning execution at 51220			[A: 0 X: 0 Y: 0 P: 2f SP: fb CYC: 378]
c814 SEC			[A: 0 X: 0 Y: 0 P: 2f SP: fb CYC: 378]
c815 Beginning execution at 51221			[A: 0 X: 0 Y: 0 P: 2f SP: fb CYC: 381]
c815 PHP			[A: 0 X: 0 Y: 0 P: 2f SP: fb CYC: 381]
c816 Beginning execution at 51222			[A: 0 X: 0 Y: 0 P: 2f SP: fa CYC: 385]
c816 PLA			[A: 0 X: 0 Y: 0 P: 2f SP: fa CYC: 385]
c817 Beginning execution at 51223			[A: 3f X: 0 Y: 0 P: 2d SP: fb CYC: 390]
c817 AND #239			[A: 2f X: 0 Y: 0 P: 2d SP: fb CYC: 390]
Registers P mismatch
Got ad, expected 2D
{ PC: 51225, SP: 251, P: 173, A: 47, X: 0, Y: 0 }
{ carry: true,
  zero: false,
  interrupt_disable: true,
  decimal_mode: true,
  break_command: false,
  overflow: false,
  negative: true,
  decimal: false }
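Reading the output above, the mismatch is in the negative flag. Before the AND, A is 0x3F and P is 0x2D; 0x3F AND 0xEF (239) is 0x2F, whose bit 7 is clear, yet P comes back as 0xAD, which differs from 0x2D only in bit 7 (N). So N appears to be derived from something other than the result (the operand 0xEF does have bit 7 set). A minimal sketch of the rule, using the `registers`/`flags` shape of the CPU class; `setZeroNegative` and `andImmediate` are hypothetical names:

```javascript
// Set the zero and negative flags from an 8-bit result, as AND (and most
// loads and arithmetic instructions) must do on the 6502.
function setZeroNegative(flags, value) {
  flags.zero = (value & 0xff) === 0;
  flags.negative = (value & 0x80) !== 0; // bit 7 of the RESULT, not the operand
  return flags;
}

// AND #imm: combine the accumulator with the operand, then update Z and N.
function andImmediate(cpu, operand) {
  cpu.registers.A = cpu.registers.A & operand & 0xff;
  setZeroNegative(cpu.flags, cpu.registers.A);
}
```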

Writing a NES emulator in Javascript: Part 4 – Opcodes

In previous articles I talked about the initial process of getting the PRG ROM loaded and executing. Now comes the somewhat lengthy process of implementing all the opcodes that were available on the NES CPU. The fact that the NES brain is so well documented is a huge plus here, including the 6502 programming manual and this opcode list.

Despite what I mentioned in a prior article, I’m no longer using a giant switch statement for all the opcodes. They’re now stored in an object in a separate file, with the name of the opcode (as a constant) being the key. In another file (optable.js) I store information about the instructions, such as byte length and cycle count.

Implementing the instructions

The process I’ve been following for implementing the instructions is as follows:

  1. Have a good debugger output for instructions as they execute.
  2. Execute the instructions and crash when we encounter one that hasn’t been implemented yet, outputting the name of the unimplemented instruction. This can then be looked up in the reference doc and implemented.
    while (this.running == true) {
      let opcode = this.fetch(this.registers.PC);

      if (opcode === undefined) {
        /* fetch() returned nothing, so PC points at unmapped memory. Here be dragons.
           (Testing !opcode would also stop on 0x00, which is the BRK opcode.) */
        break;
      }

      /* Debugging code, causes the unimplemented opcode to be written out and the program
         to bail. Useful during development. */

      if (
        typeof optable[opcode] === "undefined" ||
        typeof optable[opcode].op === "undefined"
      ) {
        console.log(
          "Unimplemented opcode: " +
            opcode.toString(16) +
            " at " +
            this.registers.PC.toString(16)
        );
        this.running = false;
        continue;
      }
      /* call(this) runs the opcode function with 'this' set to the CPU */
      optable[opcode].op.call(this);

I’ve omitted some code, but you get the idea: each opcode is fetched from memory at the Program Counter (CPU.registers.PC) and then looked up in the optable module. The optable.js file contains entries such as this:

optable[0xEA] = {
  cycles: 2,
  bytes: 1,
  name: "NOP",
  op: opcodes["NOP"]
};

All the opcodes are contained within a file called ops.js, where each opcode function is stored as a value in an object keyed by its name (such as NOP or LDA_IND_X); optable.js then maps each opcode byte to one of these functions. Instead of writing extra code to decode the addressing mode, it seemed simpler to implement a separate function for each addressing mode. The functions are so small (often one line) that the duplication seemed simpler than writing an addressing mode decoder, although this may turn out to be a bad choice further down the line. So for an opcode like LDA in the (Indirect,X) addressing mode we’d have something like:

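  LDA_IND_X: function() {
    this.log(
      "LDA ($" + this.next_byte().toString(16) + ",X)",
      this.registers.PC
    );
    // (Indirect,X): add X to the zero-page operand, wrapping within page zero
    let ptr = (this.next_byte() + this.registers.X) & 0xff;
    // read the little-endian target address stored at ptr and ptr+1
    let addr = this.fetch(ptr) | (this.fetch((ptr + 1) & 0xff) << 8);
    this.registers.A = this.fetch(addr);
    // LDA sets the zero and negative flags from the loaded value
    this.flags.zero = this.registers.A === 0;
    this.flags.negative = (this.registers.A & 0x80) !== 0;
  },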
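For comparison with the one-function-per-mode approach, here is a minimal sketch of the addressing mode decoder alternative: each mode becomes a small resolver that returns an effective address, and LDA is written once. Everything here is hypothetical (a flat 64 KB `Uint8Array` instead of the real memory map, names like `resolve` and `lda`), and only a few modes are shown.

```javascript
// A flat 64 KB memory and just the registers/flags LDA needs.
function makeCpu() {
  return {
    mem: new Uint8Array(0x10000),
    registers: { PC: 0, A: 0, X: 0, Y: 0 },
    flags: { zero: false, negative: false },
  };
}

// The byte n positions after the opcode at PC (an operand byte).
function operand(cpu, n) {
  return cpu.mem[(cpu.registers.PC + n) & 0xffff];
}

// Read a little-endian pointer from page zero; the high byte wraps to $00.
function zeroPageWord(cpu, zp) {
  return cpu.mem[zp & 0xff] | (cpu.mem[(zp + 1) & 0xff] << 8);
}

// Each resolver returns the effective address of the operand.
const resolve = {
  immediate: (cpu) => (cpu.registers.PC + 1) & 0xffff,
  zeroPage: (cpu) => operand(cpu, 1),
  zeroPageX: (cpu) => (operand(cpu, 1) + cpu.registers.X) & 0xff,
  absolute: (cpu) => operand(cpu, 1) | (operand(cpu, 2) << 8),
  absoluteX: (cpu) =>
    ((operand(cpu, 1) | (operand(cpu, 2) << 8)) + cpu.registers.X) & 0xffff,
  // (zp,X): add X to the pointer address first, then dereference
  indexedIndirect: (cpu) => zeroPageWord(cpu, operand(cpu, 1) + cpu.registers.X),
  // (zp),Y: dereference the pointer first, then add Y
  indirectIndexed: (cpu) =>
    (zeroPageWord(cpu, operand(cpu, 1)) + cpu.registers.Y) & 0xffff,
};

// A single LDA shared by every addressing mode.
function lda(cpu, mode) {
  const value = cpu.mem[resolve[mode](cpu)];
  cpu.registers.A = value;
  cpu.flags.zero = value === 0;
  cpu.flags.negative = (value & 0x80) !== 0;
}
```

The trade-off is one extra lookup per instruction in exchange for writing each operation once; the opcode table would then store a (function, mode) pair per opcode byte.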

This approach works fine, but I needed some way to be able to test for emulation accuracy. In the next article I’ll discuss the testing system I set up to ensure fidelity in executing the NES code.

Writing a NES emulator in Javascript: Part 3 (booting the first code)

  Note: This is part of a series of posts on writing a NES emulator in Javascript. You can follow the project on github here.

We need a few things to get NES code booting, and one of them is a NES ROM. I’ll be using the nestest ROM, due to copyright restrictions surrounding the use of official Nintendo ROMs.

iNES format

The format used by most emulators for reading games is the iNES format, which contains the following sections:

Header (16 bytes)
Trainer, if present (0 or 512 bytes)
PRG ROM data (16384 * x bytes)
CHR ROM data, if present (8192 * y bytes)
PlayChoice INST-ROM, if present (0 or 8192 bytes)
PlayChoice PROM, if present (16 bytes Data, 16 bytes CounterOut) (this is often missing, see PC10 ROM-Images for details)
(source: nesdev wiki)

With the header being formatted as follows:

0-3: Constant $4E $45 $53 $1A (“NES” followed by MS-DOS end-of-file)
4: Size of PRG ROM in 16 KB units
5: Size of CHR ROM in 8 KB units (Value 0 means the board uses CHR RAM)
6: Flags 6
7: Flags 7
8: Size of PRG RAM in 8 KB units (Value 0 infers 8 KB for compatibility; see PRG RAM circuit)
9: Flags 9
10: Flags 10 (unofficial)
11-15: Zero filled
(source: nesdev wiki)

Since I really have no idea what to do with the CHR ROM data yet, it’s the PRG ROM data we’re interested in, as this contains the executable game code. The nestest ROM header looks like this (hexedit output):

00000000   4E 45 53 1A  01 01 00 00  00 00 00 00  00 00 00 00

The bytes 0x4E, 0x45 and 0x53 are the “NES” string, followed by 0x1A. Bytes 4 and 5 are both 1, which means the PRG ROM is 16 KB and the CHR ROM is 8 KB. Since there’s no trainer, the PRG ROM starts at 0x10 and continues for 16384 bytes. The quick and dirty ROM parser I wrote grabs the data and puts it in a Uint8Array:

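    // iNES header offsets and sizes (see the layout above)
    const PRG_ROM_SIZE_INDEX = 4;
    const CHR_ROM_SIZE_INDEX = 5;
    const PRG_ROM_START_INDEX = 16; // no trainer present
    const ROM_MULTIPLE_SIZE = 16384;
    this.rom = new Uint8Array(this.rom_buffer);
    this.prg_r1_size = this.rom[PRG_ROM_SIZE_INDEX];
    this.chr_size = this.rom[CHR_ROM_SIZE_INDEX];
    const ROM_SIZE = ROM_MULTIPLE_SIZE * this.prg_r1_size;
    this.prg_rom = new Uint8Array(this.rom_buffer.slice(PRG_ROM_START_INDEX, PRG_ROM_START_INDEX + ROM_SIZE));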
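A slightly more defensive version of the same parsing, as a stand-alone sketch: it checks the magic bytes, honours the optional 512-byte trainer, and also slices out the CHR ROM. `parseINes` is a hypothetical name; the trainer bit (bit 2 of Flags 6) comes from the nesdev wiki description of Flags 6, which the layout above doesn’t spell out.

```javascript
// Parse a whole .nes file (as a Uint8Array) following the iNES layout above.
function parseINes(bytes) {
  // bytes 0-3 must be "NES" followed by the MS-DOS end-of-file byte 0x1A
  const magic = [0x4e, 0x45, 0x53, 0x1a];
  if (!magic.every((b, i) => bytes[i] === b)) {
    throw new Error("not an iNES file");
  }
  const prgUnits = bytes[4]; // PRG ROM size in 16 KB units
  const chrUnits = bytes[5]; // CHR ROM size in 8 KB units (0 = CHR RAM)
  const hasTrainer = (bytes[6] & 0x04) !== 0; // Flags 6, bit 2
  const prgStart = 16 + (hasTrainer ? 512 : 0); // skip header (and trainer)
  const prgEnd = prgStart + prgUnits * 16384;
  const chrEnd = prgEnd + chrUnits * 8192;
  return {
    hasTrainer,
    // subarray() returns views onto the same buffer, without copying
    prgRom: bytes.subarray(prgStart, prgEnd),
    chrRom: bytes.subarray(prgEnd, chrEnd),
  };
}
```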

Will it boot?

The NES does some set-up on each reset that gets flags and registers in the right place to begin executing. One thing to remember is that the program ROM is mapped to a region in memory. From the NESdev wiki it seems that the PRG ROM is loaded at 0xC000. In the memory mapping function this is simulated as follows:

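fetch(addr) {
  if (addr >= 0 && addr <= 0x7FF) {
    [..]
  } else if (addr >= 0x4020 && addr <= 0xFFFF) {
    // 0xC000-0xFFFF (inclusive) maps onto the program ROM;
    // this.rom must hold just the PRG data, without the 16-byte header
    if (addr >= 0xC000) {
      return this.rom[addr - 0xC000];
    }
  }
}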

The fetch() function pulls a byte out of memory, and if the requested address is between 0xC000 and 0xFFFF (inclusive), it returns the corresponding byte from the program ROM.
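The nestest header above has 00 in bytes 6 and 7, i.e. mapper 0 (NROM). On NROM boards the PRG ROM window is $8000-$FFFF, and a 16 KB image like nestest’s appears twice, so $8000 and $C000 read the same byte. A minimal sketch of that mirroring, assuming an NROM cartridge with 16 KB or 32 KB of PRG ROM; `readPrg` is a hypothetical name:

```javascript
// Read a byte from NROM PRG ROM. The modulo makes a 16 KB image repeat
// across $8000-$BFFF and $C000-$FFFF; a 32 KB image fills the window once.
function readPrg(prgRom, addr) {
  if (addr < 0x8000 || addr > 0xffff) {
    return undefined; // outside the PRG ROM window
  }
  return prgRom[(addr - 0x8000) % prgRom.length];
}
```

Note the inclusive upper bound: the last byte of the reset/interrupt vectors lives at $FFFF.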

One small problem

Since the code is only for the CPU, the emulator will fail if the ROM attempts to use the PPU. There’s a small workaround: the ROM’s reset vector points to $C004, but nestest has an automated mode that starts at $C000 (thanks to this forum post). For this reason, it was necessary to hardcode the starting address as $C000 to get things rolling. Since I wrote the last part of this article, the structure of the code has changed, and the giant switch statement has been removed. Here’s what the code to get instructions executing looks like now:

  while (this.running == true) {

    let opcode = this.fetch(this.registers.PC);
    optable[opcode].op.bind(this).call();
    [...]
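For context on the hardcoded $C000: on reset the 6502 loads PC from the little-endian word stored at $FFFC/$FFFD, which is where a value like $C004 comes from. A minimal sketch of reading the vector with an optional nestest override; `read` stands in for any address-to-byte function (such as fetch()), and `readResetVector`/`initialPC` are hypothetical names:

```javascript
// Address of the 6502 reset vector (low byte; the high byte follows).
const RESET_VECTOR = 0xfffc;

// Combine the two vector bytes, low byte first.
function readResetVector(read) {
  return read(RESET_VECTOR) | (read(RESET_VECTOR + 1) << 8);
}

// Use the real vector normally, or $C000 for nestest's automated mode.
function initialPC(read, { nestestAutomation = false } = {}) {
  return nestestAutomation ? 0xc000 : readResetVector(read);
}
```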

And things are actually booting! This ended up being a hodgepodge of an article due to being written over a long period in which huge structural changes were happening to the code. In the next article I’ll discuss some of the changes I’ve made to improve execution, and how I went about implementing the opcode table.

Check out the current code here.

Writing a NES emulator in Javascript: Part 2 (starting on the CPU)

Let’s start with the most obvious choice: The CPU. The NES’s 2A03 CPU is a modified 6502 processor. There’s an interesting story behind the choice of processor:

“The Nintendo core processor was a 6502 designed with the patented technology scraped off. We actually skimmed off the top of the chip inside of it to see what it was, and it was exactly a 6502. We looked at where we had the patents and they had gone in and deleted the circuitry where our patents were.”

Although there were changes, the NES microprocessor ran 99% of the 6502 instruction set. “Some things didn’t work quite right or took extra cycles,”

One important difference is the removal of decimal mode, which was patented technology. The decimal flag still exists and can be set or cleared (SED/CLD), but it has no effect: ADC and SBC always operate in binary.

The focus of this implementation was to get code working, so no effort is being made to optimise anything. For instance, some of the bitwise functions are implemented using string operations and will be switched over later.

In order to test against a reference, I’m using nestest.nes which the docs describe as follows:

This here is a pretty much all inclusive test suite for a NES CPU. It was designed to test almost every combination of flags, instructions, and registers. Some of these tests are very difficult, and so far, Nesten and Nesticle failed it. Nintendulator passes, as does a real NES (naturally). I haven’t tested it with any more emulators yet.

I attempted to check the states of all flags after most instructions. For example, CPY and CMP shouldn’t affect the overflow flag, while SBC and ADC should. Likewise, all forms of wrapping ARE tested for- zeropage wrapping being the tests most emulators fail.

Starting the code

The log output for nestest.nes from Nintendulator is used as a reference. The CPU is represented by a class that contains the processor logic and attributes for the various parts of the CPU:

 

class CPU {

  constructor() {
    this.memory = new Memory;
    this.registers = {PC: 0, SP: 0, P:0, A:0, X: 0, Y: 0};
    this.flags = {carry: false, zero: false,
      interrupt_disable: true, decimal_mode: false,
      break_command: true, overflow: false, negative: false};
    this.running = false;
    this.cycles = 0;
    this.logger = new Logger;
    this.stack = new Array;
    this.log("CPU Initialized");
  }
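The class keeps the status bits in a `flags` object as well as a `P` register, while nestest.log reports P as a single byte, so the two need converting. A minimal sketch, assuming the standard 6502 bit layout (NV-BDIZC) and the flag names used in the constructor; `packStatus`/`unpackStatus` are hypothetical names. Bit 5 has no flag, and nestest.log always shows it set. The B bit only matters in copies pushed to the stack: in the debug output above, PHP at c815 pushes P = 0x2F and PLA pulls 0x3F, i.e. with bit 4 set.

```javascript
// Bit positions of the 6502 status register P.
const FLAG_BITS = {
  carry: 0x01,
  zero: 0x02,
  interrupt_disable: 0x04,
  decimal_mode: 0x08,
  break_command: 0x10,
  // 0x20 is unused; it is always shown as set
  overflow: 0x40,
  negative: 0x80,
};

// Pack a flags object like CPU.flags into a P byte.
function packStatus(flags) {
  let p = 0x20;
  for (const [name, bit] of Object.entries(FLAG_BITS)) {
    if (flags[name]) p |= bit;
  }
  return p;
}

// Unpack a P byte back into a flags object.
function unpackStatus(p) {
  const flags = {};
  for (const [name, bit] of Object.entries(FLAG_BITS)) {
    flags[name] = (p & bit) !== 0;
  }
  return flags;
}
```

Using the flags printed in the Part 5 debug output, `packStatus` gives 0xAD with `negative: true` and 0x2D once `negative` is false, matching the “Got ad, expected 2D” message.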

Memory access is split out into its own class, which contains functionality for getting and setting memory. The NES uses memory-mapped I/O: particular addresses are mapped to different hardware components rather than to RAM. The fetch() function of the memory class implements this mapping for reads:

 

  fetch(addr) {
    if (addr >= 0x0000 && addr <= 0x1FFF) {
      // 0000-07FF: 2 KB internal RAM
      //   0000-00FF zero-page region, 0100-01FF stack memory
      // 0800-1FFF: three mirrors of that RAM, so only the low 11 bits matter
      return this.ram[addr & 0x07FF];
    } else if (addr >= 0x2000 && addr <= 0x2007) {
      // NES PPU registers
    } else if (addr >= 0x2008 && addr <= 0x3FFF) {
      // Mirrors of $2000-2007 (repeating every 8 bytes)
    } else if (addr >= 0x4000 && addr <= 0x4017) {
      // NES APU and I/O registers
    } else if (addr >= 0x4018 && addr <= 0x401F) {
      // APU and I/O functionality
    } else if (addr >= 0x4020 && addr <= 0xFFFF) {
      // cartridge space; the PRG ROM is read from 0xC000 upwards
      if (addr >= 0xC000) {
        return this.rom[addr - 0xC000];
      }
    }
  }

We'll get to how the ROM part works in the next article.

I opted to implement the opcodes as a giant switch() statement, which, while not optimal, will get the job done. An alternative approach would have been the one Imran Nazar used: creating an object containing all the operations as values, with the opcodes as the keys. Here’s how the switch version starts:

  execute() {

    this.log("Beginning execution at " + this.registers.PC);
    this.running = true;

    while(this.running == true) {

      let opcode = this.memory.fetch(this.registers.PC);
      switch(opcode) {

      case ops.JSR: {
        // braces give the let bindings their own block scope
        let nb1 = this.memory.fetch(this.registers.PC+1);
        let nb2 = this.memory.fetch(this.registers.PC+2);
        let j = utility.Utility.merge_bytes(nb1, nb2);
        this.log("JSR " + j, this.registers.PC);
        // push the return address, then jump to the target
        // (on real hardware JSR pushes PC+2 and RTS adds 1 after pulling)
        let bytes = utility.Utility.split_byte(this.registers.PC+3);
        this.stack.push(bytes[0]);
        this.stack.push(bytes[1]);
        this.registers.SP -= 2;
        this.registers.PC = j;
        break;
      }
      case ops.TSX:
          [...]
      break;
      case ops.TXS:
          [...]
      break;
      case ops.TYA:
          [...]
      break;
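In this version the stack is a JavaScript array kept separate from memory, but on the 6502 it lives in RAM page 1 ($0100-$01FF), with SP as an 8-bit offset that grows downward. Since programs can read the stack through ordinary memory accesses, a memory-backed version may be worth sketching. `Stack`, `jsr` and `rts` are hypothetical names, and the default SP of 0xFD is an assumption matching the usual value after reset. The sketch also follows the hardware JSR convention: push PC+2 (high byte first), and have RTS add 1 after pulling.

```javascript
// The 6502 stack: page 1 of RAM, indexed by an 8-bit SP that grows downward.
class Stack {
  constructor(ram, sp = 0xfd) {
    this.ram = ram; // Uint8Array covering at least $0000-$01FF
    this.sp = sp;
  }
  push(byte) {
    this.ram[0x0100 + this.sp] = byte & 0xff;
    this.sp = (this.sp - 1) & 0xff; // wraps within page 1
  }
  pull() {
    this.sp = (this.sp + 1) & 0xff;
    return this.ram[0x0100 + this.sp];
  }
}

// JSR pushes the address of its own last byte (PC + 2), high byte first,
// and returns the new PC (the jump target).
function jsr(stack, pc, target) {
  const ret = (pc + 2) & 0xffff;
  stack.push(ret >> 8);
  stack.push(ret & 0xff);
  return target;
}

// RTS pulls the low then high byte and resumes at that address + 1.
function rts(stack) {
  const lo = stack.pull();
  const hi = stack.pull();
  return (((hi << 8) | lo) + 1) & 0xffff;
}
```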

In the next article I'll talk a little about how to load a ROM file and how to get the code booting.

Writing a NES emulator in JavaScript: Part 1

Warning: these may be some long articles

Introduction & Inspiration

I’ve always wanted to write an emulator. I find the concept fascinating, and I thought it was finally time to complete an emulation project. I went with the NES as the console and JavaScript as the language, because the NES is well documented, there are functional emulators out there that can provide debugging output, and I owned a NES myself.

 

About Emulators

If you’re unfamiliar with how emulators work (aside from letting you play videogames on your computer), the basic explanation is that the emulator is a software reimplementation of the original hardware. Instead of the NES hardware reading and acting on the instructions contained in the ROM on the cartridge, a program handles the CPU instructions, provides its own RAM, and maps graphical output instructions to visual output on your screen. This is incredibly complex and hard to get the hang of, but it’s also a lot of fun, and provides some interesting problems to solve.

The height of technological progress

For instance, the NES hardware is a little idiosyncratic, and often restrictive in what it allowed programmers to do. As a result, NES game programmers learned to exploit the hardware in very interesting ways, relying on quirks of the exact hardware implementation to produce the result they wanted. This means that an emulator that implements each component “perfectly” according to its specification, without those quirks, may not run games properly.

The Hardware

Let’s take a look at the NES hardware. The NESdev wiki has been an amazing resource so far, and it’s a great place to start from. In these really early stages, gathering an idea of the hardware to be implemented and a rough sketch of how things fit together would be useful.

The goal of the first phase was to get a rough outline of the code and get a few instructions working. In order to do this a few things need completing:

  1. Basic CPU emulation. What flags and registers does the CPU use? How does it calculate clock cycles for synchronization?
  2. Memory mapping. How do different parts of hardware map to addresses in memory? (This is the primary way that the CPU communicates with other hardware parts.)
  3. Basic ROM loading. Where is the executable data stored in the data from the cartridge?
  4. Ability to debug instructions. Let’s be real, I have no clue what I’m doing with a lot of this; I’m going to need something to check against when I get stuck.

The CPU

The NES CPU core is based on the 6502 processor and runs at approximately 1.79 MHz. Reading this PDF helped immensely. The 6502 has the following registers:

  • A Accumulator Register
  • X, Y Index Registers
  • SP Stack Pointer (sometimes written S)
  • PC Program Counter
  • P Processor Status (the flags)

At its simplest, a CPU is only doing a few things: fetch a byte from memory, decode the opcode, and take some action based on it. And then do it again. And again. Until there’s a reason not to. The CPU is where I’m going to start, and in the next post I’ll talk a bit about how that’s implemented.
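The fetch-decode-execute cycle described above fits in a few lines. This is a toy sketch, not the emulator’s code: it supports just three real 6502 opcodes (LDA immediate 0xA9, TAX 0xAA, INX 0xE8), ignores flags and cycles, and treats 0x00 (really BRK) as “halt”, which is a simplification for this example only. `run` is a hypothetical name.

```javascript
// Run a program loaded at address 0 until the (pretend) halt opcode 0x00.
function run(program) {
  const mem = new Uint8Array(0x10000);
  mem.set(program, 0);
  const r = { PC: 0, A: 0, X: 0 };

  for (;;) {
    const opcode = mem[r.PC]; // fetch
    switch (opcode) {         // decode, then execute
      case 0xa9: // LDA #imm: load the next byte into A
        r.A = mem[r.PC + 1];
        r.PC += 2;
        break;
      case 0xaa: // TAX: copy A into X
        r.X = r.A;
        r.PC += 1;
        break;
      case 0xe8: // INX: increment X, wrapping at 8 bits
        r.X = (r.X + 1) & 0xff;
        r.PC += 1;
        break;
      case 0x00: // stop here instead of emulating BRK
        return r;
      default:
        throw new Error(`unimplemented opcode ${opcode.toString(16)}`);
    }
  }
}

console.log(run([0xa9, 0x05, 0xaa, 0xe8, 0x00])); // LDA #5, TAX, INX, halt
```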

Conway’s Game of Life (JavaScript)

The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970.


Video recording of simulation, live demo can be found here.

Conway’s Game of Life is awesome for being the simplest example of artificial life. It uses a very simple set of rules to decide whether each cell in a grid should be alive or dead. But as with DNA or ant colonies (read about it!), a simple set of rules can result in an amazing array of diverse behaviour.

Conway’s Game of Life isn’t hard to implement, and seems like a great target for optimisation. The logic is pretty simple: we need a grid of cells. A lot of people use two-dimensional arrays for this, but I opted for a 1D array where the row and column are calculated through a translation function. Each cell contains a value of 0 for dead or 1 for alive.
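One subtlety with the 1D layout is what happens at the edges. A common choice is to wrap around so the grid behaves like a torus, but wrapping the flat index as a whole sends x = -1 to the end of the previous row (and y = -1 to a negative index). A minimal sketch that wraps each axis separately; `wrappedIndex` and `liveNeighbours` are hypothetical names, and the torus behaviour is an assumption rather than something the post specifies:

```javascript
// Translate (x, y) to an index into a flat, row-major array, wrapping each
// coordinate on its own so the left/right and top/bottom edges meet.
function wrappedIndex(x, y, width, height) {
  const wx = ((x % width) + width) % width; // handles negative x
  const wy = ((y % height) + height) % height;
  return wy * width + wx;
}

// Count the live neighbours of (x, y) in a flat Uint8Array of 0s and 1s.
function liveNeighbours(cells, x, y, width, height) {
  let count = 0;
  for (let dy = -1; dy <= 1; dy++) {
    for (let dx = -1; dx <= 1; dx++) {
      if (dx === 0 && dy === 0) continue; // skip the cell itself
      count += cells[wrappedIndex(x + dx, y + dy, width, height)];
    }
  }
  return count;
}
```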

The Code

class Grid {
  constructor() {
    this.grid = new Uint8Array(GRID_SIZE);
    this.next = new Uint8Array(GRID_SIZE);
    this.init();
  }

  dump() {
    return this.next;
  }
  swap() {
    this.grid = this.next;
    this.next = new Uint8Array(GRID_SIZE);
  }
  at(x, y) {
    return y * GRID_WIDTH + x;
  }
  init() {
    this.grid = fast_random(GRID_SIZE);
  }

The first part of the Grid class, the main meat of the code, is pretty simple. The Grid stores a current grid and a next grid, which is updated and then swapped with the current one during animation cycles. When the Grid is initialized it grabs GRID_SIZE bits using the fast_random function (more on this later).

Survival Function

For each cell within the “grid”, we need to calculate the number of neighbours to figure out its fate in the next generation. The rules for the game are as follows:

Any live cell with fewer than two live neighbours dies, as if caused by underpopulation.
Any live cell with two or three live neighbours lives on to the next generation.
Any live cell with more than three live neighbours dies, as if by overpopulation.
Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

 

  update() {
    for (let x = 0; x < GRID_WIDTH; x++) {
      for (let y = 0; y < GRID_WIDTH; y++) {
        let score = 0;
        const CURRENT_LOCATION = this.at(x, y);
        for (let i = -1; i < 2; i++) {
          for (let j = -1; j < 2; j++) {
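            if (i == 0 && j == 0) {
              // skip self
            } else {
              // wrap each axis separately so the edges meet (a torus);
              // the square grid uses GRID_WIDTH for both dimensions
              const nx = (x + i + GRID_WIDTH) % GRID_WIDTH;
              const ny = (y + j + GRID_WIDTH) % GRID_WIDTH;
              score += this.grid[this.at(nx, ny)];
            }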
          }
        }

Here we’re just looping over every cell of the grid, and for each of the cells adding up the 1s in the surrounding 8 cells. From this we can figure out whether it’s going to be alive or dead based on its current state and its neighbour count:

        switch (this.grid[CURRENT_LOCATION]) {
          case 0:
            if (score == 3) {
              this.next[CURRENT_LOCATION] = 1;
            } else {
              this.next[CURRENT_LOCATION] = 0;
            }
            break;
          case 1:
            if (score > 3 || score < 2) {
              this.next[CURRENT_LOCATION] = 0;
            } else {
              this.next[CURRENT_LOCATION] = 1;
            }
            break;
        }
      }

Pretty simple!

Rendering the Output

I wanted to write several potential renderers to test what gave the best performance. Each simply takes a grid as an array and draws it.

const renderer = RectangleRenderer;
const render = new renderer();
let grid = new Grid();

function iterate() {
  grid.update();
  render.render(grid.dump());
  grid.swap();
  requestAnimationFrame(iterate);
}

var ctx;
$(document).ready(function() {
ctx = $("#game-view")[0].getContext("2d");
iterate();
});

Initial Optimisations

OK, so, one problem: with a 256×256 grid, the initial start-up takes up to 9 seconds. Some quick JS profiling in Chrome indicated that the culprit was the init() function.

The init() function was seeding the grid by iterating over the entire grid’s X/Y coordinates and filling each cell with a random 0 or 1. To do this, Math.random() was being called once per cell (65,536 times for a 256×256 grid), as follows:

  cell = Math.floor(Math.random()*2)

As it turns out, this is extremely wasteful. In theory, each call to Math.random() can supply 32 random bits, which reduces the number of calls by a factor of 32. After implementing this, the init() function began to run almost instantaneously:

function random_byte() {
  // one Math.random() call yields a 32-bit integer; unpack it into 32 bits
  let el = Math.floor(Math.random() * 0x100000000);
  let bits = [];

  for (let i = 0; i < 32; i++) {
    bits.push(el & 1); // lowest bit
    el = el >>> 1;     // shift the next bit down
  }

  return bits;
}

function fast_random(size) {
  // fill a typed array directly instead of repeatedly concatenating arrays
  const cells = new Uint8Array(size);
  let filled = 0;

  while (filled < size) {
    for (const bit of random_byte()) {
      if (filled >= size) break;
      cells[filled++] = bit;
    }
  }

  return cells;
}

With this function added, the initial delay disappeared entirely for a 256x256 grid. In the next article I’m going to look at some of the algorithms used to speed up the Game of Life at the implementation level.

Ruby’s Array#map_with_index and each_with_index in Javascript

#each_with_index and #map.with_index are commonly used Ruby features that can easily be replicated in JavaScript.

Each with index

In Ruby we can do this:

x = ["a","b","c","d","e"]

x.each_with_index { |el,n| puts "#{n} is #{el}" }

Output:

0 is a
1 is b
2 is c
3 is d
4 is e

In Javascript it’s possible to do something very similar. The documentation for JavaScript’s Array.forEach function states that the three arguments passed to the callback function are:

callback is invoked with three arguments:

the element value
the element index
the array being traversed

Therefore we can do this:

let array = ["a","b","c","d","e","f"];

array.forEach((x,n) => (console.log(x + " is " + n)));
a is 0
b is 1
c is 2
d is 3
e is 4
f is 5
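Another option that reads even closer to Ruby’s block syntax is `Array.prototype.entries()`, which yields `[index, element]` pairs that can be destructured in a `for...of` loop. A small sketch; `eachWithIndex` is a hypothetical helper name, not a built-in:

```javascript
// Call fn(element, index) for each element, like Ruby's each_with_index.
function eachWithIndex(array, fn) {
  for (const [n, el] of array.entries()) {
    fn(el, n);
  }
}

const lines = [];
eachWithIndex(["a", "b", "c"], (el, n) => lines.push(`${n} is ${el}`));
console.log(lines.join("\n")); // 0 is a, 1 is b, 2 is c (one per line)
```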

Map with index

In Ruby it’s possible to do this:

x = ["a","b","c","d","e","f"]

z = x.map.with_index { |el,n| (el.ord + n).chr }

puts z

Output:

a
c
e
g
i
k

As with forEach, map passes the following arguments to its callback function:

currentValue

The current element being processed in the array.

index (optional)

The index of the current element being processed in the array.

array (optional)

The array map was called upon.

Which makes it possible to do this:

let array = ["a","b","c","d","e","f"];

let new_array = array.map((x,n) => String.fromCharCode(x.charCodeAt(0)+n));

console.log(new_array);

Output

[ 'a', 'c', 'e', 'g', 'i', 'k' ]

Genetic Algorithms: Evolving “Hello World” in Javascript

I’ve always been extremely interested in evolution as an optimisation algorithm. This post is part of a series of write-ups on experiments using genetic techniques.

If you’re just interested in seeing the end result, check out the demo here.

Evolution is often described as being a “random” process. Some lay understandings of evolution imply that it is not feasible, given the number of “random” mutations required to produce a particular change or result in an organism. The key point neglected in these evaluations is that while mutations are a random component of evolution, its main power is the ability to accumulate “useful” mutations over time. The algorithm presented here is a simple illustration of the power of these kinds of algorithms. Future articles will examine more advanced uses of these capabilities.

The problem

Generating the string “Hello, World!” (or any other) from random selections of letters. We start with an initial population of randomly generated strings and keep as survivors the candidates with the smallest string distance scores.

The problem space

Given a string of length N, a brute-force search would have to go through 26^N possibilities, leading to a very rapid combinatorial explosion. For the 13-character “Hello, World!” string, that means searching through 26^13 ≈ 2.4811529e+18 (that is, 2.48*10^18) different strings to find the one we’re looking for.

Candidate strings are generated using the following function, so each initial generation consists of randomly generated sequences of letters. Note that the alphabet doesn’t contain punctuation, even though the target string uses it. This works because mutation shifts character codes up and down, which can produce ASCII characters outside the initial alphabet.

function random_string(len) {

  return new Array(len)
             .fill(0)
             .map((_) => ALPHABET.charAt(get_random_int(0,ALPHABET.length)))
             .join("")
  }

Elements of Genetic Algorithms

Since this example is very simple, it becomes easy to create a set of tunable parameters to investigate the effects on the resulting evolutionary process. These parameters are specified at the top of the JS file:

// string we're targeting
const TARGET_STR = "this string was produced by evolution";
// maximum length of initially created strings
const RANDOM_STR_MAX_LEN = 50;
// mutation rate: 40 means that on average one gene in 40 will be mutated
const MUTATION_RATE = 40;
// the tokens to draw from when creating initial strings
const ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
// the amount to increase or decrease each ASCII code by during mutation
const MUTATION_STEPS = 2;
// penalty for being a different length from the target string
const PENALTY_FACTOR = 100;
// initial generation size
const INITIAL_GENERATION_SIZE = 100;
// number of candidates allowed to breed
const ALLOWED_TO_BREED = 40;
// number of candidates allowed to survive
const ALLOWED_SURVIVORS = 40;

Iteration

Each new generation is produced partially from the prior ones. Some of the best candidates will survive to the next round, and will also “breed” with other candidates to produce new candidates.
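The generation step itself is only described in prose here, so this is a minimal sketch of one way it could look. It assumes lower fitness scores are better (as with the distance score below), that the top `survivors` candidates pass through unchanged, and that parents are drawn at random from the top `breeders`. `nextGeneration` and its option names are hypothetical; fitness, breed and mutate are passed in so the block runs on its own.

```javascript
// Produce the next generation from the current population.
function nextGeneration(population, opts) {
  const { fitness, breed, mutate, survivors, breeders, size, random = Math.random } = opts;
  // rank a copy of the population from best (lowest score) to worst
  const ranked = [...population].sort((a, b) => fitness(a) - fitness(b));
  // the best few survive unchanged
  const next = ranked.slice(0, survivors);
  // only the top `breeders` candidates get to produce offspring
  const pool = ranked.slice(0, breeders);
  while (next.length < size) {
    const a = pool[Math.floor(random() * pool.length)];
    const b = pool[Math.floor(random() * pool.length)];
    next.push(mutate(breed(a, b)));
  }
  return next;
}
```

With the constants above, the call would pass `survivors: ALLOWED_SURVIVORS`, `breeders: ALLOWED_TO_BREED` and `size: INITIAL_GENERATION_SIZE`. The sort recomputes fitness on every comparison, so caching scores would be an easy improvement for larger populations.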

Fitness

After each round, “fitness” should be evaluated. This is the digital equivalent of the survival/replication aspect of real evolution. A candidate’s fitness score is used to judge whether or not it will survive to the next generation. The algorithm described here uses a numerical distance created by summing the differences between the character codes of the candidate and the target string, plus a penalty for any difference in length.

function fitness(str) {

  let smallest = Math.min(str.length, TARGET_STR.length);
  let score = 0;
  // penalise any difference in length
  score += Math.abs(str.length - TARGET_STR.length) * PENALTY_FACTOR;

  for(let x=0;x<smallest;x++) {
    score += Math.abs(str.charCodeAt(x) - TARGET_STR.charCodeAt(x));
  }

  return score;

}

Mutation

Mutation introduces new genetic material in each round. Interesting results occur if the mutation function is removed or altered: this can result in a single dominant solution very early on, with little or no new genetic material. For each child of the candidates in each generation, every codon has a 1 in MUTATION_RATE chance of mutating, which shifts that letter’s character code by a random amount. If MUTATION_STEPS is 2, the random shift will be between -1 and 1. If it’s 10, the random shift will be between -5 and 5.

function mutate(str) {

  return str
         .split("")
         .map((letter) => 
           (Math.round(Math.random()*MUTATION_RATE) == 2) 
           ? 
             String.fromCharCode(letter.charCodeAt(0)+Math.round(((Math.random()*MUTATION_STEPS)-(MUTATION_STEPS/2)))) 
             : 
             letter)
         .join("");

}

Breeding/Recombination

In order to retain the benefits of the optimal candidates, we cross-breed the best solutions. There are numerous ways to do this. One might be to simply take the odd numbered characters from one string, the even numbered characters from another string, and splice them together. Another could be to just use the first half of one string and the second half of the other. A solution often used in genetic algorithm research is creating “cut points” in the genomes of the two candidates. For instance, imagine two strings:

AAAAAAA
BBBBB
We select cut points 3 and 4 in string1, and 2 and 4 in string2; the resulting child ends up being:
  str1 0-3 AAA
  str2 2-4 BB
  str1 4-7 AAA
  = AAABBAAA

This is implemented as follows:

function breed(str1, str2) {

  let cp_1 = get_random_int(0, str1.length-1);
  let cp_2 = get_random_int(cp_1, str1.length-1);

  // cp_1 is in the first string, defined as the first cut-point
  // this means we will take genetic material from str1 from 0 -> cp_1
  // this will then be combined with the genetic material between cp_3 and cp_4 in str2
  // so we have str1[0,cp_1] + str2[cp_3,cp_4] + str1[cp_2,str1.length]

  let cp_3 = get_random_int(0, str2.length-1);
  let cp_4 = get_random_int(cp_3, str2.length-1);

  // the tail runs from cp_2 to the end of str1
  return str1.slice(0, cp_1) + str2.slice(cp_3, cp_4) + str1.slice(cp_2);

}

Put it all together and run it in the console and you end up with a generational history something like this:

 [lWTogtaMKlaXTHFozonCcqkpIrpESsEIqSSl, 916]
 [hduqCqurauqCqtrEsvrhprahDawErvrpvrpen, 349]
 [ieprBstrhtrDrtrEsvqbtaagDbwEavrputagn, 295]
 [hepsDrurirqCrdsCprrctaafBbxDavpptughn, 256]
 [iegsCquritgCrctBprpct#beBbxDcuonuthgn, 232]
 [uggtAqtrithBvbtBprpbtcbdBbxDcuonuthgn, 210]
 [uggs#rtrjnhAwasBprqctcid#ayCcvolurihn, 193]
 [ughs#rtrinh#xatAprpducdd#ayBdvnlutign, 180]
 [thhs#strinh#wat#prpcucdd#bxBdvnlutiin, 169]
 [this#strinh#wat#qroduced#byBdvnlutiin, 160]
 [uhis#strinh#wat#pqoduced#by#cvolutiin, 152]
 [this#string#xas8pqpduced#by#evoluthjn, 142]
 [this#string9was6qqoduced9by#evoluuijn, 134]
 [thhs9string7was8produced6by#evolutikn, 125]
 [this7strhng7was8prodtced5by#evolutiln, 122]
 [this7rtrgng5was2produced3bz#evolutimn, 113]
 [thhs6strgng3was2producfd2by8evolution, 105]
 [this6strgng1was2produced2by5evolution, 98]
 [thjs4string0was1producee0by3euolution, 91]
 [thjs3string0was1produced0by1fvolution, 87]
 [this1ssring#was#prodtced#cy#evplutjon, 80]
 [thit0string#was#prodtced#bz#evolution, 75]
 [this0string#was#prodtced#by#euolutjon, 69]
 [this#string#was#produced#by#evoluuion, 63]
 [this#ssring#was#produced#by#evolution, 59]
 [this#string#was#qroduced#by#fvolution, 55]
 [this#string#was#prodvced#by#evolution, 51]
 [this#string#was#psoduced#by#evolution, 46]
 [this#string#was#producdd#by#evolutipn, 38]
 [this#string#was#procuced#by#evplutioo, 26]
 [uhis#string#was#produced#by#evolutioo, 22]
 [this#stoing#was#produced#by evolution, 20]
 [shis#stting#was#produced#by evolution, 17]
 [tiis stting#was#produced bx evnlutiom, 11]
 [this#stting was#procuced bx evnlution, 9]
 [this#stsing was producee by evomution, 4]
 [this string wat producee by evolution, 2]

Feel free to try the live version here or check the code out from github to play with.