| 1. |
The equivalent binary code obtained after translation of source code. |
|
Answer» A lot of the answers are either too simple or too complex, so I will explain this as a guy who self-studied compiler construction.

Basically, what happens first is that your SOURCE code is fed to the compiler program. In a language like C, the compiler first strips all the comments from the source code by replacing each comment with a SPACE. This happens during a step called preprocessing.

After preprocessing, the compiler splits the preprocessed source code into individual sets of characters: words, numbers and their prefixes or suffixes, and other symbols or combinations of those symbols. Here’s a quick example: i += 10; After preprocessing, the compiler would take i += 10; and break it up like this:

i - identifier.
+= - add assignment operator.
10 - integer number.
; - semicolon.

This process of breaking up the source code into individual sets of characters is known as tokenization or lexical analysis. The part of a compiler that handles this is called a lexical analyzer, or lexer for short. Each set of characters is given a meaning describing what it represents; these are then called tokens. Going back to our example from above, i is an alphabetic word, so it’s checked to see if it’s a keyword of the language; otherwise it’s an identifier. Either way, it’s given a value that tells the compiler whether it’s a keyword or an identifier. 10 is given the token value that means it’s an integer. += is a combination of two symbols, but the combination has meaning, so it’s given its own value, which the compiler knows as “add assignment”. Same thing with the final symbol, the semicolon.

In the next step, the compiler passes these tokens to a second system called the Parser. The Parser’s job is to apply the language GRAMMAR to the tokens given by the lexer. Going back to our example again, the parser would match i += 10; against a grammar rule along the lines of: expression '+=' expression ';' In many languages, a variable used by itself counts as an expression, so i fills the first expression slot and the integer 10 fills the second.
In a grammar rule, symbols written in quotes, such as '+=' and ';', are required tokens: you must use that operator or symbol exactly where the parser requires it or you’ll GET a syntax error! If I were to omit the ; semicolon token where the parser requires it, it would give me a syntax error saying that it expected a semicolon token but got something else.

If the grammar of the source code is sound, the Parser uses the tokens produced by the lexer to build what is called an Abstract Syntax Tree. Put most simply, the Abstract Syntax Tree is a tree data structure that represents the syntactic structure of your source code. Here’s an example using the C programming language:

// foo.c
#include <stdint.h>

void foo() {
    uint32_t n = 1;
    while( n ) {
        if( n==1 )
            n += 5;
        else if( n==21 )
            break;
    }
}

The function foo above would produce an abstract syntax tree like the following (the condition of while( n ) is shown in its implicit form, n != 0):

module: foo.c
  function-definition: void foo()
    variable-declaration: uint32_t n = 1;
    while statement:
      main-expression: n != 0
        variable-expr: n
        relational-expr: !=
        integer-expr: 0
      code block:
        if-statement:
          main-expression: n==1
            variable-expr: n
            relational-expr: ==
            integer-expr: 1
          code block:
            main-expression: n += 5
              variable-expr: n
              compound-expr: +=
              integer-expr: 5
        else statement:
          code block:
            if-statement:
              main-expression: n==21
                variable-expr: n
                relational-expr: ==
                integer-expr: 21
              code block:
                break statement |
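To make the tokenization step from the answer a bit more concrete, here is a minimal sketch of a lexer in C that handles just enough to split i += 10; into its four tokens. The names TokKind, Token and tokenize are made up for this illustration, and a real lexer would recognize many more token kinds and would look identifiers up in a keyword table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for this sketch; real lexers have many more. */
typedef enum { TOK_IDENT, TOK_INT, TOK_ADD_ASSIGN, TOK_SEMI, TOK_UNKNOWN } TokKind;

typedef struct {
    TokKind kind;
    char text[32];   /* the matched characters (the lexeme) */
} Token;

/* Splits `src` into at most `max` tokens; returns how many were written. */
size_t tokenize(const char *src, Token *out, size_t max)
{
    size_t n = 0;
    assert(out != NULL);
    while (*src != '\0' && n < max) {
        if (isspace((unsigned char)*src)) { src++; continue; }  /* skip blanks */

        const char *start = src;
        TokKind kind;
        if (isalpha((unsigned char)*src) || *src == '_') {
            /* a word: letters, digits, underscores */
            while (isalnum((unsigned char)*src) || *src == '_') src++;
            kind = TOK_IDENT;          /* a keyword lookup would go here */
        } else if (isdigit((unsigned char)*src)) {
            while (isdigit((unsigned char)*src)) src++;
            kind = TOK_INT;
        } else if (src[0] == '+' && src[1] == '=') {
            src += 2;                  /* two symbols, one token */
            kind = TOK_ADD_ASSIGN;
        } else if (*src == ';') {
            src++;
            kind = TOK_SEMI;
        } else {
            src++;                     /* anything this sketch does not know */
            kind = TOK_UNKNOWN;
        }

        size_t len = (size_t)(src - start);
        if (len >= sizeof out[n].text) len = sizeof out[n].text - 1;  /* truncate long lexemes */
        memcpy(out[n].text, start, len);
        out[n].text[len] = '\0';
        out[n].kind = kind;
        n++;
    }
    return n;
}
```

Calling tokenize("i += 10;", tokens, 8) on an array of eight Token values fills in four tokens: an identifier, an add-assignment operator, an integer and a semicolon, matching the breakdown in the answer.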
|
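The abstract syntax tree from the answer can also be sketched as an actual data structure. The C below is only an illustration under my own assumptions: the Node layout and the helpers node_new, node_add, node_render, node_free and build_add_assign are hypothetical names, and the subtree for n += 5 is built by hand here, whereas a real parser would build it from the lexer's tokens.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout for this sketch: a label plus up to 3 children. */
#define MAX_CHILDREN 3

typedef struct Node {
    const char *label;                 /* e.g. "variable-expr: n" */
    struct Node *child[MAX_CHILDREN];
    int nchild;
} Node;

/* Allocates a node with no children; returns NULL if allocation fails. */
Node *node_new(const char *label)
{
    Node *n = calloc(1, sizeof *n);
    if (n != NULL) n->label = label;
    return n;
}

/* Attaches `kid` to `parent` and returns `parent`. */
Node *node_add(Node *parent, Node *kid)
{
    assert(parent != NULL && kid != NULL && parent->nchild < MAX_CHILDREN);
    parent->child[parent->nchild++] = kid;
    return parent;
}

/* Appends the tree to `buf`, one node per line, indented two spaces per level. */
void node_render(const Node *n, int depth, char *buf, size_t cap)
{
    size_t used = strlen(buf);
    if (used < cap)
        snprintf(buf + used, cap - used, "%*s%s\n", depth * 2, "", n->label);
    for (int i = 0; i < n->nchild; i++)
        node_render(n->child[i], depth + 1, buf, cap);
}

/* Frees a node and everything below it. */
void node_free(Node *n)
{
    if (n == NULL) return;
    for (int i = 0; i < n->nchild; i++) node_free(n->child[i]);
    free(n);
}

/* Builds the subtree for the statement `n += 5;` by hand. */
Node *build_add_assign(void)
{
    Node *stmt = node_new("main-expression: n += 5");
    node_add(stmt, node_new("variable-expr: n"));
    node_add(stmt, node_new("compound-expr: +="));
    node_add(stmt, node_new("integer-expr: 5"));
    return stmt;
}
```

Rendering the result of build_add_assign() into an empty buffer gives four lines: the statement itself followed by its three children, each indented by two spaces. Later compiler stages walk a tree like this the same way node_render does, visiting a node and then its children.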