This short article is for those students, programmers, and computer scientists who already have a basic idea of what a parser is and does but who want to know what that mysterious reference to “IELR(1)” means in the Bison parser generator manual.
The IELR(1) Parsing Algorithm
The IELR(1) parsing algorithm was developed in 2008 by Joel E. Denny as part of his Ph.D. research under the supervision of Brian A. Malloy at Clemson University. The IELR(1) algorithm is a variation of the so-called “minimal” LR(1) algorithm developed by David Pager in 1977, which itself is a variation of the LR(k) parsing algorithm invented by Donald Knuth in 1965. The IE in IELR(1) stands for inadequacy elimination (see last section).
The LR(1) part of IELR(1) stands for Left to right, Rightmost derivation with 1 lookahead token. LR(1) parsers are also called canonical parsers. This class of parsing algorithms employs a bottom-up, shift-reduce parsing strategy with a stack and state transition table determining the next action to take during parsing.
Historically, LR(1) algorithms have been disadvantaged by large memory requirements for their transition tables. Pager’s improvement was to develop a method of combining the transition states when the transition table is generated, significantly reducing the size of the table. Thus Pager’s algorithm makes LR(1) parsers competitive with other parsing strategies with respect to space and time efficiency. The phrase “minimal LR(1) parser” refers to the minimal size of the transition table introduced by Pager’s algorithm.
Limitations of Pager’s Algorithm
Minimal LR(1) algorithms produce the transition table based on a particular input grammar for the language to be parsed. Different grammars can produce the same language. Indeed, it is possible for a non-LR(1) grammar to produce an LR(1) parsable language. In practice, LR(1) parser generators accept non-LR(1) grammars with a specification for resolving conflicts between two possible state transitions (“shift-reduce conflicts”) to accommodate this fact. Denny and Malloy found that Pager’s algorithm fails to generate parsers powerful enough to parse LR(1) languages when provided certain non-LR(1) grammars even though the non-LR(1) grammar generates an LR(1) language.
Denny and Malloy show that this limitation is not merely academic by demonstrating that Gawk and Gpic, both widely used, mature software, perform incorrect parser actions.
Denny and Malloy studied the source of the deficiencies of Pager’s algorithm by comparing the transition table generated by Pager’s algorithm to the transition table of an equivalent LR(1) grammar and identified two sources of what they term inadequacies that appear in the transition table from Pager’s algorithm but not in the LR(1) transition table. Denny and Malloy’s IELR(1) (Inadequacy Elimination LR(1)) algorithm is an algorithm designed to eliminate these inadequacies when generating the transition table that is virtually identical in size to that of Pager’s algorithm.
This article is based on my answer to a question on StackExchange.