
# Finding All Wolfram Language Operators

In this second article, Part 1 of an n-part series on Defining the Wolfram Language, we start getting our hands dirty by hunting down every single operator in Mathematica along with each operator’s linguistic properties. To my knowledge, nobody outside of Wolfram has created such an exhaustive list before. The operator properties we hope to document are arity, affix, associativity, and precedence. In the next article, “Generalizing PEMDAS: What is an operator?,” we will define these linguistic properties and discuss their significance for programming languages generally. If these terms are foreign to you, do not worry: you will not need to know what they mean for this article, and you can read the next article before reading Part 1 without any confusion.

## Data Sources

It is surprisingly difficult to get accurate information about all Wolfram Language operators implemented in Mathematica. There are a few different sources of information about operators and their properties. I list all publicly known sources (as of August 2018) in the table below, the last two of which appear to be described nowhere else on the public internet. We will explore these sources in more detail in the remainder of this section.

| Source | Description |
| --- | --- |
| Official Documentation | The most common operators only. Usually no precedence or associativity. |
| WolframLanguageData[] | Only has info for the most common operators. For most of these, gives precedence rank, arity, and affix. Associativity info is apparent less often. |
| Precedence[] | Undocumented built-in function. When provided the long name of an operator, gives its precedence. Only works with some operators and is often incorrect. |
| UnicodeCharacters.tr | Gives explicit precedence, affix, and associativity for 338 operators that are a single Unicode character outside of the ASCII character set. Note that this system file uses a different precedence numbering scheme from the other sources, one that applies only to the notebook interface. |
| `` System`Convert`MLStringDataDump`$ExtractedOperators `` | Undocumented list of 272 single-character Unicode operators, all but one of which are in UnicodeCharacters.tr. A proper subset of the next source, but I include it for completeness. |
| `` System`Convert`MLStringDataDump`$Operators `` | Undocumented list of 340 operators (characters only). Includes most operators, even single- and multi-character ASCII operators. Excludes Association brackets, box operators, and obscure bracketing operators. |
| `` Internal`SymbolNameQ[] `` | Undocumented function that apparently identifies a string as a “symbol name.” Computing the complement of the set of Unicode characters for which SymbolNameQ evaluates to True gives a list of operators, including some single-character box operators found in no other list. |
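As a quick way to poke at two of the sources in the table, the sketch below queries WolframLanguageData and the undocumented Precedence function. It is only an illustration: the choice of Plus, Times, and Power is mine, and it assumes that Precedence accepts the symbol that names an operator. Because Precedence is undocumented, its behavior may differ between versions, so no expected output is shown.

```wolfram
(* List the properties that WolframLanguageData can report for a symbol. *)
WolframLanguageData["Properties"]

(* Undocumented: ask for the precedence associated with a few common
   operators, identified here by their long (symbolic) names.
   Assumption: Precedence accepts these symbols directly. *)
Precedence /@ {Plus, Times, Power}
```

Comparing the numbers returned for several operators against the documented precedence table is one way to see where this source agrees with the others and where it does not.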

The undocumented[3] system variable `` System`Convert`MLStringDataDump`$Operators `` is a list of 340 operators without any additional information. The list is only available in the notebook front end. The undocumented function `` System`Convert`MLStringDataDump`OperatorQ[] `` works by checking membership in this list. This is the only list, excluding the documentation and WolframLanguageData, that includes all single and multi-character ASCII operators. As with most other sources, no box operators are included. It also excludes Association brackets, obscure bracketing symbols, many LargeOp:Prefix operators, and a handful of other operators found in UnicodeCharacters.tr.

We have scoured the documentation, taken a deep dive into Mathematica system files, and explored undocumented features described nowhere else outside of this article save the offices of Wolfram. It may surprise you, then, that there are more operators than we have found so far.

### The Complement of `` Internal`SymbolNameQ `` Characters

Our strategy in this section is to make a list of every Unicode character in the character range used by Mathematica that is not a letter, letter-like form, or member of `` System`Convert`MLStringDataDump`$Operators ``, which we assign the shorter alias ops. Using the definition of LetterLikeQ[] along with the opaque built-in function `` Internal`SymbolNameQ[] ``, we can search the list of Unicode characters to find the complement of the set of characters we can already categorize. Our search criteria are as follows:

The following code produces a table of code/character pairs matching the criteria above:
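A minimal sketch of such a search, not the original listing, might look like the following. It assumes that the undocumented operator list holds strings, that scanning the Basic Multilingual Plane (minus the surrogate block) is a reasonable character range, and it uses only two filters: the character is not a valid symbol name and is not already in ops. The helper unclassifiedQ and the variable names codes and pairs are hypothetical. Because the LetterLikeQ test is left out, this version will also return digits, whitespace, and control characters that fuller criteria would exclude.

```wolfram
(* Alias for the undocumented operator list (front end only). *)
ops = System`Convert`MLStringDataDump`$Operators;

(* Assumption: scan code points 0x0000-0xFFFF, skipping surrogates. *)
codes = Join[Range[0, 16^^D7FF], Range[16^^E000, 16^^FFFF]];

(* Hypothetical helper: True when a one-character string is neither a
   valid symbol name nor a member of the known operator list. *)
unclassifiedQ[c_String] :=
  ! TrueQ[Internal`SymbolNameQ[c]] && ! MemberQ[ops, c];

(* Collect {hex code, character} pairs for the characters that remain. *)
pairs = Table[
   With[{c = FromCharacterCode[code]},
    If[unclassifiedQ[c],
     {"0x" <> IntegerString[code, 16, 4], c},
     Nothing]],
   {code, codes}];

(* Display the pairs as a two-column table. *)
Grid[pairs]
```

Wrapping the undocumented call in TrueQ means that any input for which the function does not return an explicit True is treated as "not a symbol name" and is kept for inspection, which errs on the side of listing too many characters.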

The result is a bit of a surprise: We get a seemingly random handful of ASCII characters, Association brackets, several of the LargeOp:Prefix operators that are missing in the other lists along with a small handful of other Unicode operators also from UnicodeCharacters.tr, and, most interestingly, single character versions of multicharacter box operators that are not in any other source. These box operators are:

The list also includes the nonoperator characters 0xf767: \[ErrorIndicator], 0x2043: \[SkeletonIndicator], the ASCII delete character 0x007f (127), the ASCII bell character 0x0007, and 0xf3ad, which is an unassigned noncharacter Unicode codepoint “for private use.”

Since every other Unicode character is either already identified as an operator or in a set known to exclude operators, any unidentified single character Unicode operators are guaranteed to be in this list. We have found them all.

### Other “Operators”

There are other lexemes—meaningful strings in the language—that could be considered operators but that do not appear outside of the documentation:

• String-related characters like " and escape characters \n, \t, etc.
• The character representation operators: \[name], \.nn, and \:nnnn.
• The number representation operators: ^^, *^, and `` ` ``, though the single `` ` `` character does appear in the previous subsection.
• The comment matchfix operator (*...*).

These lexemes are such basic units of the language that Mathematica may process them during, or even prior to, the lexical analysis phase of parsing, similar to how C/C++ compiler drivers preprocess #define, #include, and other preprocessor directives before sending the result to the compiler.

These sorts of language elements are not unique to Wolfram Language, of course, and language tool authors generally make choices about how to handle them based on their syntactic role within the language and what simplifies the software engineering task.

[1]: While this may be a minor complaint, the symbolic names of operators do not always correspond to the function of that name. For example, Infix is the symbolic name for the ternary operator a~f~b, which evaluates as f[a, b], whereas the Infix[] function only affects the display of f[a, b], writing it in terms of the Infix ternary operator. Meanwhile, the square brackets operator for function application f[e] is apparently nameless despite the existence of the Construct[] function.

[2]: I have borrowed this code with minimal modifications from Mathematica Stack Exchange user Mr. Wizard.

[3]: In fact, the only mention of this variable on the public internet appears to be this article.

#### Robert Jacobson

I am an assistant professor of mathematics at Roger Williams University. I write about mathematics, education, mathematics education, computer science, and whatever interests me.