DEV Community

Sh Raj
Sh Raj

Posted on

Creating a C compiler in JavaScript

Creating a C compiler in JavaScript is a complex and ambitious project that involves several components, including lexical analysis, parsing, semantic analysis, and code generation. Below is a simplified and high-level example of how you might start building such a compiler. This example will focus on the lexical analysis (tokenization) and parsing stages, which are the first steps in compiling C code.

Step 1: Lexical Analysis (Tokenization)

The lexical analyzer (lexer) converts the input C code into a stream of tokens.

class Lexer {
  constructor(input) {
    this.input = input;
    this.tokens = [];
    this.current = 0;
  }

  tokenize() {
    while (this.current < this.input.length) {
      let char = this.input[this.current];

      if (/\s/.test(char)) {
        this.current++;
        continue;
      }

      if (/[a-zA-Z_]/.test(char)) {
        let start = this.current;
        while (/[a-zA-Z0-9_]/.test(this.input[this.current])) {
          this.current++;
        }
        this.tokens.push({ type: 'IDENTIFIER', value: this.input.slice(start, this.current) });
        continue;
      }

      if (/[0-9]/.test(char)) {
        let start = this.current;
        while (/[0-9]/.test(this.input[this.current])) {
          this.current++;
        }
        this.tokens.push({ type: 'NUMBER', value: this.input.slice(start, this.current) });
        continue;
      }

      switch (char) {
        case '+':
          this.tokens.push({ type: 'PLUS', value: '+' });
          this.current++;
          break;
        case '-':
          this.tokens.push({ type: 'MINUS', value: '-' });
          this.current++;
          break;
        case '*':
          this.tokens.push({ type: 'STAR', value: '*' });
          this.current++;
          break;
        case '/':
          this.tokens.push({ type: 'SLASH', value: '/' });
          this.current++;
          break;
        case '=':
          this.tokens.push({ type: 'EQUAL', value: '=' });
          this.current++;
          break;
        case ';':
          this.tokens.push({ type: 'SEMICOLON', value: ';' });
          this.current++;
          break;
        case '(':
          this.tokens.push({ type: 'LPAREN', value: '(' });
          this.current++;
          break;
        case ')':
          this.tokens.push({ type: 'RPAREN', value: ')' });
          this.current++;
          break;
        default:
          throw new TypeError('Unexpected character: ' + char);
      }
    }
    return this.tokens;
  }
}
Enter fullscreen mode Exit fullscreen mode

Step 2: Parsing

The parser converts the stream of tokens into an abstract syntax tree (AST).

class Parser {
  constructor(tokens) {
    this.tokens = tokens;
    this.current = 0;
  }

  parse() {
    let ast = {
      type: 'Program',
      body: []
    };

    while (this.current < this.tokens.length) {
      ast.body.push(this.parseStatement());
    }

    return ast;
  }

  parseStatement() {
    let token = this.tokens[this.current];

    if (token.type === 'IDENTIFIER' && this.tokens[this.current + 1].type === 'EQUAL') {
      return this.parseAssignment();
    }

    throw new TypeError('Unknown statement: ' + token.type);
  }

  parseAssignment() {
    let identifier = this.tokens[this.current];
    this.current++; // skip identifier
    this.current++; // skip equal sign

    let value = this.parseExpression();

    this.expect('SEMICOLON');

    return {
      type: 'Assignment',
      identifier: identifier.value,
      value: value
    };
  }

  parseExpression() {
    let token = this.tokens[this.current];

    if (token.type === 'NUMBER') {
      this.current++;
      return {
        type: 'Literal',
        value: Number(token.value)
      };
    }

    throw new TypeError('Unknown expression: ' + token.type);
  }

  expect(type) {
    let token = this.tokens[this.current];
    if (token.type !== type) {
      throw new TypeError('Expected ' + type + ' but found ' + token.type);
    }
    this.current++;
  }
}
Enter fullscreen mode Exit fullscreen mode

Step 3: Code Generation

Finally, the code generator converts the AST into the target language, which could be JavaScript or any other language.

class CodeGenerator {
  generate(node) {
    switch (node.type) {
      case 'Program':
        return node.body.map(statement => this.generate(statement)).join('\n');
      case 'Assignment':
        return `let ${node.identifier} = ${this.generate(node.value)};`;
      case 'Literal':
        return node.value;
      default:
        throw new TypeError('Unknown node type: ' + node.type);
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Putting It All Together

Here is how you might use the lexer, parser, and code generator:

const input = `x = 42;`;
const lexer = new Lexer(input);
const tokens = lexer.tokenize();
console.log('Tokens:', tokens);

const parser = new Parser(tokens);
const ast = parser.parse();
console.log('AST:', JSON.stringify(ast, null, 2));

const generator = new CodeGenerator();
const output = generator.generate(ast);
console.log('Output:', output);
Enter fullscreen mode Exit fullscreen mode

This will tokenize the input, parse it into an AST, and generate JavaScript code from the AST.

Note

This example is highly simplified and only handles a tiny subset of the C language. A full-fledged C compiler would require handling a much larger set of tokens, parsing complex expressions, statements, declarations, types, and generating more sophisticated code.

Top comments (0)