Science  People  Locations  Timeline
Index: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Home > Context-free grammar


 Contents
In linguistics and computer science, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V → w

where V is a non-terminal symbol and w is a string consisting of terminals and/or non-terminals. The term "context-free" comes from the fact that the non-terminal V can always be replaced by w, regardless of in what context it occurs. A formal language is context-free if there is a context-free grammar that generates it.

Context-free grammars are powerful enough to describe the syntax of programming languages; in fact, the syntax of most programming languages are specified with the help of context-free grammars. On the other hand, context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar. See Earley parser for an example of such an algorithm. LR parser and LL parser are further methods to parse more restrictive subsets of context-free grammars. See also parsing expression grammar for another type of formal grammar particularly suited to programming languages.

BNF ( Backus-Naur Form) is the most common notation used to express context-free grammars.

1 Examples

1.1 Example 1

A simple context-free grammar is

S → aSb | ε

where | is used to separate different options for the same non-terminal and ε stands for the empty string. This grammar generates the language

which is not regular.

1.2 Example 2

Here is a context-free grammar for syntactically correct infix algebraic expressions in the variables x, y and z:

S → x | y | z | S + S | S - S | S * S | S/S | (S)

This grammar can for example generate the string "( x + y ) * x - z * y / ( x + x )".

1.3 Example 3

A context-free grammar for the language consisting of all strings over {a,b} which contain a different number of a's than b's is

S → U | V
U → TaU | TaT
V → TbV | TbT
T → aTbT | bTaT | ε

Here, T can generate all strings with the same number of a's as b's, U generates all strings with more a's than b's and V generates all strings with fewer a's than b's.

1.4 Other examples

Context-free grammars are not limited in application to mathematical ("formal") languages. The ancient Indian linguist Panini described Sanskrit using a context-free grammar. Recently, it has been suggested that a class of Tamil poetry called VenpaVenpa in Tamil) is a form of classical tamil poetry. Classical Tamil poetry has been classifed based upon the rules of metric prosody. Such rules form a context-free grammar. Every Venpa consists between two to twelve lines. Popular books written in Venpa is governed by a context-free grammar.

2 Derivations and Syntax trees

There are basically two ways to describe how in a certain grammar a string can be derived from the start symbol. The simplest way is to list the consecutive strings of symbols, beginning with the start symbol and ending with the string, and the rules that have been applied. If we introduce a strategy such as "always replace the left-most nonterminal first" then for context-free grammars the list of applied grammar rules is by itself sufficient. This is called the leftmost derivation of a string. For example, if we take the following grammar:

(1) S → S + S
(2) S → 1

and the string "1 + 1 + 1" then the left derivation of this string is the list [ (1), (1), (2), (2), (2) ]. Analogously the rightmost derivation is defined as the list that we get if we always replace the rightmost nonterminal first. In this case this would be the list [ (1), (2), (1), (2), (2)].

The distinction between leftmost derivation and rightmost derivation is important because in most parserA parser is a computer program or a component of a program that analyses the grammatical structure of an input, with respect to a given formal grammar, a process known as parsing. Parsers can be made both for natural languages and for programming languages the transformation of the input is defined by giving a piece of code for every grammar rule that is executed whenever the rule is applied. Therefore it is important to know whether the parser determines a leftmost or a rightmost derivation because this determines the order in which the pieces of code will be executed. See for an example LL parsers and LR parsers.

A derivation also imposes in some sense a hierarchical structure on the string that is derived. For example the structure of the string "1 + 1 + 1" would, according to the leftmost derivation, be:

S→S+S (1)
S→S+S+S (1)
S→1+S+S (2)
S→1+1+S (2)
S→1+1+1 (2)
{ { { 1 }S + { 1 }S }S + { 1 }S }S

where { ... }S indicates a substring recognized as belonging to S. This hierarchy can also be seen as a tree:

S /|\ / | \ / | \ S '+' S /|\ | / | \ | S '+' S '1' | | '1' '1'


This tree is called a concrete syntax tree (see also abstract syntax treeIn computer science, abstract syntax tree (AST) is a data structure representing something which has been parsed, often used as a compiler or interpreter's internal representation of a computer program while it is being optimized and from which code gener) of the string. In this case the presented leftmost and the rightmost derivation define the same syntax tree, however there is another (leftmost) derivation of the same string possible

S→ S + S (1)
S→ 1 + S (2)
S→ 1 + S + S (1)
S→ 1 + 1 + S (2)
S→ 1 + 1 + 1 (2)

and this defines the following syntax tree:

S /|\ / | \ / | \ S '+' S | /|\ | / | \ '1' S '+' S | | '1' '1'

If for certain strings in the language of the grammar there is more than one parsing tree then the grammar is said to be an ambiguous grammarIn computer science, a grammar is an ambiguous grammar if some string in the language can be generated in more than one way (i. it has more more than one parse tree or more than one leftmost derivation). A language is inherently ambiguous if it can only b. Such grammars are usually hard to parse because the parser cannot always decide which grammar rule it has to apply.



Read more »

Non User