alex
Alex is a tool for generating lexical analysers in Haskell
| LTS Haskell 24.16: | 3.5.4.0 | 
| Stackage Nightly 2025-10-25: | 3.5.4.0 | 
| Latest on Hackage: | 3.5.4.0 | 
Alex: A Lexical Analyser Generator
Alex is a tool for generating lexical analysers, also known as “lexers” and “scanners”, in Haskell. The lexical analysers implement a description of the tokens to be recognised in the form of regular expressions. It is similar to the tools “lex” and “flex” for C/C++.
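As a rough illustration of what such a description looks like, here is a minimal sketch of an Alex specification using the "basic" wrapper. The file name `Tokens.x`, the `Token` type, and its constructors are assumptions made for this example; they are not part of Alex itself.

```haskell
{
-- Tokens.x (hypothetical file name): header code copied into the generated module.
module Main (main) where
}

%wrapper "basic"

$digit = 0-9
$alpha = [a-zA-Z]

tokens :-

  $white+                          ;
  $digit+                          { \s -> TInt (read s) }
  $alpha [$alpha $digit \_]*       { \s -> TIdent s }

{
-- Token type chosen for this sketch only.
data Token
  = TInt Int
  | TIdent String
  deriving (Eq, Show)

-- Read standard input and print the list of tokens found in it.
main :: IO ()
main = do
  s <- getContents
  print (alexScanTokens s)
}
```

Running `alex Tokens.x` should produce `Tokens.hs`, which can then be compiled with GHC like any other Haskell module.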
Share and enjoy!
Documentation
Documentation is hosted on Read the Docs:
For basic information of the sort typically found in a read-me, see the following sections of the docs:
Changes
Changes in 3.5.4.0
- Fix issue #277: compatibility of generated code with {-# LANGUAGE ImpredicativeTypes #-}, thanks Nadia Yvette Chambers!
- Simplify cabal install (PR #272), thanks Antoine Leblanc!
- Document examples/words.x, thanks Piotr Justyna!
- Tested with GHC 8.0 - 9.12.2.
Andreas Abel, 2025-08-03
Changes in 3.5.3.0
- Fix critical bug in automaton minimizer (PR #270), thanks Antoine Leblanc!
- Tested with GHC 8.0 - 9.12.2.
Andreas Abel, 2025-04-06
Changes in 3.5.2.0
- Use byteSwap16# and byteSwap32# on big-endian architectures instead of handrolling the implementation (PR #260).
- More descriptive error in alexScan; inline alexScanUser (PR #262).
- Tested with GHC 8.0 - 9.12.1.
Andreas Abel, 2024-12-30
Changes in 3.5.1.0
- Drop generating output for GHC < 6.4.
- Use qualified imports in generated code (except for Prelude) (Issue #258).
- Suppress warnings tabs and unused-imports for generated code (Issue #255).
- Tested with GHC 8.0 - 9.8.2.
Andreas Abel, 2024-02-29
Changes in 3.5.0.0
- Add option --numeric-version.
- Remove deprecated -v as alias for --version.
- Add -v as placeholder for a future --verbose option.
- Make alex{G,S}etUserState available with the monadUserState-bytestring wrapper (Issue #220).
- Debugging lexer: print character in addition to its ASCII code (PR #252).
- Tested with GHC 8.0 - 9.8.1.
Andreas Abel, 2023-12-30
Changes in 3.4.0.1
- Address new x-partial warning of GHC 9.8.
- Alex 3.4.0.1 needs GHC 8.0 or higher to build. The code it generates is the same as 3.4.0.0, so it will likely work for older GHCs.
- Tested with GHC 8.0 - 9.8.1.
Andreas Abel, 2023-10-29
Changes in 3.4.0.0
- New wrappers to lex strict Text: strict-text, posn-strict-text, monad-strict-text and monadUserState-strict-text (PR #240). These complement the existing wrappers for String and ByteString.
- Tested with GHC 7.0 - 9.6.2.
Andreas Abel, 2023-06-20
Changes in 3.3.0.0
- Add an Ord instance to AlexPosn (Issue #233). This breaks developments that define their own (orphan) instance Ord AlexPosn. If this is the derived stock instance, the fix is to delete the orphan instance and require build-tool-depends: alex:alex >= 3.3.0.0.
- Switch to Haskell PVP versioning with four digits.
- Tested with GHC 7.0 - 9.6.1.
Andreas Abel, 2023-05-25
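To make the effect of the new Ord instance concrete, the sketch below uses a local stand-in for AlexPosn with the same shape as the type generated by the position-tracking wrappers (absolute offset, line, column). It is defined here only so the example compiles without running Alex; `laterPosn` is a hypothetical helper, not an Alex function.

```haskell
-- Local stand-in for the generated AlexPosn type:
-- absolute character offset, line number, column number.
data AlexPosn = AlexPn !Int !Int !Int
  deriving (Eq, Show, Ord)

-- Hypothetical helper: pick the later of two positions via the stock Ord instance.
laterPosn :: AlexPosn -> AlexPosn -> AlexPosn
laterPosn = max

main :: IO ()
main = do
  let start = AlexPn 0 1 1
      next  = AlexPn 12 2 5
  -- The derived instance compares the offset first, then line, then column.
  print (start < next)
  print (laterPosn start next)
```

A project that previously declared its own `instance Ord AlexPosn` would hit a duplicate-instance error against code generated by 3.3.0.0 or later, which is why the orphan has to be removed.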
Change in 3.2.7.4
- The user-supplied “epilogue” Haskell code is now put last in the generated file. This enables use of Template Haskell in the epilogue. (Issue #125.)
- Tested with GHC 7.0 - 9.6.1.
Andreas Abel, 2023-05-02
Change in 3.2.7.3
- Amend last change (3.2.7.2) so that Alex-generated code does not need LANGUAGE PatternGuards.
- Tested with GHC 7.0 - 9.6.1.
Andreas Abel, 2023-04-14
Change in 3.2.7.2
- Fix bug with out-of-bound access to alex_check array. (Surfaced with GHC’s JS backend, fixed by Sylvain Henry in PR #223.)
- Tested with GHC 7.0 - 9.6.1.
Andreas Abel, 2023-04-03
Change in 3.2.7.1
- Fix bug with repeated numeral characters outside of r{n,m} repetitions. This was a regression introduced in 3.2.7.
John Ericson, 2022-01-23
Changes in 3.2.7
- Allow arbitrary repetitions in regexps. Previously, the r{n,m} and related forms were restricted to single digit numbers n and m.
- DFA minimization used to crash on tokens of the form c* which produce automata with only accepting states. Considering the empty set of non-accepting states as an equivalence class caused minimization to crash with an exception.
- The small_base flag is removed. Extremely old GHCs will no longer build.
- A number of bug fixes and clearer diagnostics.
John Ericson, 2022-01-20
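The minimization fix can be pictured with a small sketch of how an initial partition might be built so that an empty class never enters the refinement. This is an illustration under assumed names (`State`, `initialPartition`), not Alex's actual implementation.

```haskell
import Data.List (partition)

-- States are plain integers in this sketch.
type State = Int

-- Initial partition for DFA minimization: accepting versus non-accepting states.
-- Empty classes are filtered out, so a DFA whose states are all accepting
-- (as for a token of the form c*) starts with a single class.
initialPartition :: (State -> Bool) -> [State] -> [[State]]
initialPartition isAccepting states =
  filter (not . null) [accepting, nonAccepting]
  where
    (accepting, nonAccepting) = partition isAccepting states

main :: IO ()
main = do
  -- All states accepting: only one class is produced.
  print (initialPartition (const True) [0, 1])
  -- Mixed case: two classes.
  print (initialPartition even [0, 1, 2, 3])
```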
Changes in 3.2.6:
- Support for GHC 9.2. The array access primops now use the fixed-sized numeric types corresponding to the width of the data accessed. Additionally, the primops to convert to and from fixed-sized numeric types have been given new names. 9.2 isn’t cut yet, so these changes are somewhat speculative. Unfortunately, GHC must use a released version of Alex (and Happy) at all times until further changes have been made, so we must make the release to actually implement these changes in GHC. If the final GHC 9.2 ends up being different, this release will be marked broken to make it less likely people use it by accident.
John Ericson, 2020-12-15
Changes in 3.2.5:
- Build fixes for GHC 8.8.x
Simon Marlow, 2019-11-04
Changes in 3.2.4:
- Remove dependency on QuickCheck
- Change the way that bootstrapping is done: see README.md for build instructions
Simon Marlow, 2018-03-29
Changes in 3.2.3:
- fix issue when using cpphs (#116)
Simon Marlow, 2017-09-08
Changes in 3.2.2:
- Manage line length in generated files [GH-84]
- Fix issue when an identifier with multiple single quotes, e.g. foo'', was used
- Allow omitting spaces around = in macro definitions
- Include pre-generated Parser.hs and Scan.hs in the Hackage upload, to make bootstrapping easier.
Simon Marlow, 2017-09-02
Changes in 3.2.1:
- Fix build problem with GHC; add new test tokens_scan_user.x
Simon Marlow, 2016-10-18
Changes in 3.2.0:
- Allow the token type and productions to be overloaded, and add new directives: %token, %typeclass, %action. See “Type Signatures and Typeclasses” in the manual.
- Some small space leak fixes
Simon Marlow, 2016-10-08
Changes in 3.1.7:
- Add support for the %encoding directive (allows controlling --latin1 from inside Alex scripts)
- Make code forward-compatible with in-progress proposals
- Suppress more warnings
Simon Marlow, 2016-01-08
Changes in 3.1.6:
- The sdist for 3.1.5 was mis-generated, causing it to ask for Happy when building.
Simon Marlow, 2015-11-30
Changes in 3.1.5:
- Generate less warning-laden code, and suppress other warnings.
- Bug fixes.
Simon Marlow, 2015-11-25
Changes in 3.1.4:
- Add Applicative/Functor instances for GHC 7.10
Simon Marlow, 2015-01-06
Changes in 3.1.3:
- Fix for clang (XCode 5)
Simon Marlow, 2013-11-28
Changes in 3.1.2:
- Add missing file to extra-source-files
Simon Marlow, 2013-11-11
Changes in 3.1.1:
- Bug fixes (#24, #30, #31, #32)
Simon Marlow, 2013-11-11
Changes in 3.1.0:
- necessary changes to work with GHC 7.8.1
Simon Marlow, 2013-09-16
Changes in 3.0 (since 2.3.5)
- Unicode support (contributed mostly by Jean-Philippe Bernardy, with help from Alan Zimmerman).
  - An Alex lexer now takes a UTF-8 encoded byte sequence as input. If you are using the “basic” wrapper or one of the other wrappers that takes a Haskell String as input, the string is automatically encoded into UTF-8 by Alex. If your input is a ByteString, you are responsible for ensuring that the input is UTF-8 encoded.
  - Alex source files are assumed to be in UTF-8, like Haskell source files. The lexer specification can use Unicode characters and ranges.
  - alexGetChar is renamed to alexGetByte in the generated code.
  - There is a new option, --latin1, that restores the old 8-bit behaviour.
- Alex now does DFA minimization, which helps to reduce the size of the generated tables, especially for lexers that use Unicode.
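Since the lexer consumes UTF-8 bytes rather than characters, it can help to see what that encoding looks like for a single character. The following is a minimal, hand-written sketch of UTF-8 encoding; the function name `utf8Encode` is chosen for this example and the code is not copied from Alex's generated output.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one character as its UTF-8 byte sequence (one to four bytes).
utf8Encode :: Char -> [Word8]
utf8Encode = map fromIntegral . go . ord
  where
    go :: Int -> [Int]
    go oc
      | oc <= 0x7f   = [oc]
      | oc <= 0x7ff  = [ 0xc0 + (oc `shiftR` 6)
                       , 0x80 + (oc .&. 0x3f) ]
      | oc <= 0xffff = [ 0xe0 + (oc `shiftR` 12)
                       , 0x80 + ((oc `shiftR` 6) .&. 0x3f)
                       , 0x80 + (oc .&. 0x3f) ]
      | otherwise    = [ 0xf0 + (oc `shiftR` 18)
                       , 0x80 + ((oc `shiftR` 12) .&. 0x3f)
                       , 0x80 + ((oc `shiftR` 6) .&. 0x3f)
                       , 0x80 + (oc .&. 0x3f) ]

main :: IO ()
main = do
  print (utf8Encode 'A')       -- single byte: [65]
  print (utf8Encode '\xE9')    -- U+00E9, two bytes: [195,169]
  print (utf8Encode '\x20AC')  -- U+20AC, three bytes: [226,130,172]
```

With a ByteString input, bytes like these are what the generated lexer expects to receive from alexGetByte, so input in another encoding needs converting first (or the --latin1 option).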
Release Notes for version 2.2
- Cabal-1.2 is now required.
- ByteString wrappers: use Alex to lex ByteStrings directly.
Release Notes for version 2.1.0
- Switch to a Cabal build system: you need a recent version of Cabal (1.1.6 or later). If you have GHC 6.4.2, then you need to upgrade Cabal before building Alex. GHC 6.6 is fine.
- Slight change in the error semantics: the input returned on error is before the erroneous character was read, not after. This helps to give better error messages.
Release Notes for version 2.0
Alex has changed a lot between versions 1.x and 2.0. The following is supposed to be an exhaustive list of the changes:
Syntax changes
- Code blocks are now surrounded by {...} rather than %{...%}.
- Character-set macros now begin with ‘$’ instead of ‘^’ and have multi-character names.
- Regular expression macros now begin with ‘@’ instead of ‘%’ and have multi-character names.
- Macro definitions are no longer surrounded by { ... }.
- Rules are now of the form <c1,c2,...> regex { code } where c1, c2 are startcodes, and code is an arbitrary Haskell expression.
- Regular expression syntax changes:
  - () is the empty regular expression (used to be ‘$’).
  - Set complement can now be expressed as [^sets] (for similarity with lex regular expressions).
  - The 'abc' form is no longer available, use [abc] instead.
  - ‘^’ and ‘$’ have the usual meanings: ‘^’ matches just after a ‘\n’, and ‘$’ matches just before a ‘\n’.
  - ‘\’ is now the escape character, not ‘^’.
  - The form "..." means the same as the sequence of characters inside the quotes, the difference being that special characters do not need to be escaped inside "...".
- Rules can have arbitrary predicates attached to them. This subsumes the previous left-context and right-context facilities (although these are still allowed as syntactic sugar).
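The syntax points above can be seen together in a small sketch of a specification that uses a character-set macro, a regular-expression macro, a quoted string, and a startcode. It assumes the "monad" wrapper shipped with current Alex versions; the `Token` type, `mkNum`, and the `comment` startcode name are choices made for this example.

```haskell
{
module Main (main) where
}

%wrapper "monad"

$digit  = 0-9
@number = $digit+

tokens :-

<0>        $white+    ;
<0>        "/*"       { begin comment }
<comment>  "*/"       { begin 0 }
<comment>  [.\n]      ;
<0>        @number    { mkNum }

{
-- Token type for this sketch only.
data Token = TNum Int | TEOF
  deriving (Eq, Show)

-- Action: the matched text is the first len characters of the remaining input.
mkNum :: AlexInput -> Int -> Alex Token
mkNum (_, _, _, s) len = return (TNum (read (take len s)))

-- The monad wrapper expects the user to say what to return at end of input.
alexEOF :: Alex Token
alexEOF = return TEOF

-- Scan a fixed string, collecting tokens until end of input.
main :: IO ()
main = print (runAlex "1 /* skip 2 */ 3" collect)
  where
    collect = do
      t <- alexMonadScan
      if t == TEOF then return [] else (t :) <$> collect
}
```

Here the rules prefixed with `<comment>` are only active after `begin comment` has switched the startcode, which is how a nested or alternative sub-grammar is usually simulated within a single file.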
Changes in the form of an Alex file
- Each file can now only define a single grammar. This change was made to simplify code generation. Multiple grammars can be simulated using startcodes, or split into separate modules.
- The API has been simplified, and at the same time made more flexible.
- You no longer need to import the Alex module.
Usage changes
The command-line syntax is quite different.
Implementation changes
- A more efficient table representation, coupled with standard table-compression techniques, is used to keep the size of the generated code down.
- When compiling a grammar with GHC, the -g switch causes an even faster and smaller grammar to be generated.
- Startcodes are implemented in a different way: each state corresponds to a different initial state in the DFA, so the scanner doesn’t have to check the startcode when it gets to an accept state. This results in a larger, but quicker, scanner.
