§Logos
Create ridiculously fast Lexers.
Logos has two goals:
- To make it easy to create a Lexer, so you can focus on more complex problems.
- To make the generated Lexer faster than anything you’d write by hand.
To achieve those, Logos:
- Combines all token definitions into a single deterministic state machine.
- Optimizes branches into lookup tables or jump tables.
- Prevents backtracking inside token definitions.
- Unwinds loops, and batches reads to minimize bounds checking.
- Does all of that heavy lifting at compile time.
§Example
```rust
use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")] // Ignore this regex pattern between tokens
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("fast")]
    Fast,

    #[token(".")]
    Period,

    // Or regular expressions.
    #[regex("[a-zA-Z]+")]
    Text,
}

fn main() {
    let mut lex = Token::lexer("Create ridiculously fast Lexers.");

    assert_eq!(lex.next(), Some(Ok(Token::Text)));
    assert_eq!(lex.span(), 0..6);
    assert_eq!(lex.slice(), "Create");

    assert_eq!(lex.next(), Some(Ok(Token::Text)));
    assert_eq!(lex.span(), 7..19);
    assert_eq!(lex.slice(), "ridiculously");

    assert_eq!(lex.next(), Some(Ok(Token::Fast)));
    assert_eq!(lex.span(), 20..24);
    assert_eq!(lex.slice(), "fast");

    assert_eq!(lex.next(), Some(Ok(Token::Text)));
    assert_eq!(lex.slice(), "Lexers");
    assert_eq!(lex.span(), 25..31);

    assert_eq!(lex.next(), Some(Ok(Token::Period)));
    assert_eq!(lex.span(), 31..32);
    assert_eq!(lex.slice(), ".");

    assert_eq!(lex.next(), None);
}
```
§Callbacks
Logos can also call arbitrary functions whenever a pattern is matched, which can be used to put data into a variant:
```rust
use logos::{Logos, Lexer};

// Note: callbacks can return `Option` or `Result`
fn kilo(lex: &mut Lexer<Token>) -> Option<u64> {
    let slice = lex.slice();
    let n: u64 = slice[..slice.len() - 1].parse().ok()?; // skip 'k'
    Some(n * 1_000)
}

fn mega(lex: &mut Lexer<Token>) -> Option<u64> {
    let slice = lex.slice();
    let n: u64 = slice[..slice.len() - 1].parse().ok()?; // skip 'm'
    Some(n * 1_000_000)
}

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")]
enum Token {
    // Callbacks can use closure syntax, or refer
    // to a function defined elsewhere.
    //
    // Each pattern can have its own callback.
    #[regex("[0-9]+", |lex| lex.slice().parse().ok())]
    #[regex("[0-9]+k", kilo)]
    #[regex("[0-9]+m", mega)]
    Number(u64),
}

fn main() {
    let mut lex = Token::lexer("5 42k 75m");

    assert_eq!(lex.next(), Some(Ok(Token::Number(5))));
    assert_eq!(lex.slice(), "5");

    assert_eq!(lex.next(), Some(Ok(Token::Number(42_000))));
    assert_eq!(lex.slice(), "42k");

    assert_eq!(lex.next(), Some(Ok(Token::Number(75_000_000))));
    assert_eq!(lex.slice(), "75m");

    assert_eq!(lex.next(), None);
}
```
Logos can handle callbacks with the following return types:
| Return type | Produces |
|---|---|
| `()` | `Ok(Token::Unit)` |
| `bool` | `Ok(Token::Unit)` or `Err(<Token as Logos>::Error::default())` |
| `Result<(), E>` | `Ok(Token::Unit)` or `Err(<Token as Logos>::Error::from(err))` |
| `T` | `Ok(Token::Value(T))` |
| `Option<T>` | `Ok(Token::Value(T))` or `Err(<Token as Logos>::Error::default())` |
| `Result<T, E>` | `Ok(Token::Value(T))` or `Err(<Token as Logos>::Error::from(err))` |
| `Skip` | skips matched input |
| `Filter<T>` | `Ok(Token::Value(T))` or skips matched input |
| `FilterResult<T, E>` | `Ok(Token::Value(T))` or `Err(<Token as Logos>::Error::from(err))` or skips matched input |
Callbacks can also be used to perform more specialized lexing in places
where regular expressions are too limiting. For specifics, see
`Lexer::remainder` and
`Lexer::bump`.
§Errors
By default, Logos uses `()` as the error type, which means that it
doesn't store any information about the error.
This can be changed by using the `#[logos(error = T)]` attribute on the enum.
The type `T` can be any type that implements `Clone`, `PartialEq`,
`Default`, and `From<E>` for each callback's error type `E`.
§Token disambiguation
The rule of thumb is:
- Longer beats shorter.
- Specific beats generic.
If any two definitions could match the same input, like `fast` and `[a-zA-Z]+`
in the example above, the longer and more specific definition, `Token::Fast`,
wins.
This is done by comparing the numeric priority attached to each definition. Every consecutive, non-repeating single byte adds 2 to the priority, while every range or regex class adds 1. Loops and optional blocks are ignored, while alternations count only their shortest alternative:
- `[a-zA-Z]+` has a priority of 1 (the lowest possible), because at minimum it matches a single byte against a class.
- `foobar` has a priority of 12.
- `(foo|hello)(bar)?` has a priority of 6, `foo` being its shortest possible match.
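To make the counting rules concrete, here is a small model of them over a hand-built pattern tree. `Pattern` and `priority` are hypothetical names for illustration, not Logos internals. One reading is assumed: `+` keeps a single mandatory copy of its body, which is why `[a-zA-Z]+` scores 1 rather than 0.

```rust
/// Hypothetical, simplified pattern tree used only to illustrate the
/// priority rules; this is not how Logos represents patterns internally.
enum Pattern {
    /// Literal bytes such as `foo`: each byte adds 2.
    Literal(&'static str),
    /// A range or class such as `[a-zA-Z]`: adds 1.
    Class,
    /// Sub-patterns matched in sequence: their priorities add up.
    Concat(Vec<Pattern>),
    /// `a|b|...`: only the lowest-priority (shortest) alternative counts.
    Alternation(Vec<Pattern>),
    /// `x?` or `x*`: may match nothing, so it contributes 0.
    Optional(Box<Pattern>),
    /// `x+`: assumed to count its body once (the mandatory first match).
    OneOrMore(Box<Pattern>),
}

fn priority(p: &Pattern) -> usize {
    match p {
        Pattern::Literal(s) => 2 * s.len(),
        Pattern::Class => 1,
        Pattern::Concat(parts) => parts.iter().map(priority).sum::<usize>(),
        Pattern::Alternation(alts) => alts.iter().map(priority).min().unwrap_or(0),
        Pattern::Optional(_) => 0,
        Pattern::OneOrMore(body) => priority(body),
    }
}
```

Under this model `fast` scores 8 and `[a-zA-Z]+` scores 1, matching the outcome in the first example.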
§Re-exports
`pub use crate::source::Source;`
§Modules
- `source`: This module contains a bunch of traits necessary for processing byte strings.
§Structs
- `Lexer`: The main struct of the crate, which allows you to read through a `Source` and produce tokens for enums implementing the `Logos` trait.
- `Skip`: Type that can be returned from a callback, informing the `Lexer` to skip the current token match. See also `logos::skip`.
- `SpannedIter`: Iterator that pairs tokens with their position in the source.
§Enums
- `Filter`: Type that can be returned from a callback, either producing a field for a token, or skipping it.
- `FilterResult`: Type that can be returned from a callback, either producing a field for a token, skipping it, or emitting an error.
§Traits
- `Logos`: Trait implemented for an enum representing all tokens. You should never have to implement it manually; use the `#[derive(Logos)]` attribute on your enum.
§Functions
- `skip`: Predefined callback that will inform the `Lexer` to skip a definition.
§Type Aliases
- `Span`: Byte range in the source.