Bill Allombert on Mon, 27 Oct 2025 19:15:06 +0100
Re: [EXTERN] Re: Re: Re: Definition of tokens in GP language
On Mon, Oct 27, 2025 at 04:52:18PM +0100, Hong-Phuc Bui wrote:
> Hi, I'm here again :)
>
> Thanks for sharing the information. It motivates me even more to write a
> lexer for Pygments. I am now reading both files: lang.l in the gp2c
> repository and the function pari_lex() in the file anal.c.
> If I understand correctly, I have two choices:
>
> 1) The function pari_lex() works fully correctly and can handle all corner
> cases in the GP language, but it is written by hand.
> => Porting it to Python is not as easy as I would wish (well, writing a
> lexer was never easy :)).
>
> 2) The lexer generated from lang.l can now also handle all corner cases,
> but it is not yet as rock-solid as pari_lex().
> => Porting it to Python, for example using PLY[1] or RegexLexer[2] with
> state management, may be easier, but the Python lexer may not handle all
> corner cases?

Well, hopefully I should be able to fix the lex parser if we find other bugs.

But how you define a token depends on how you want to use it: will you feed
the tokens to a parser, or will you use them directly as a basis for syntax
highlighting?

There are special constructs that are not handled as tokens for parsing
purposes but are semantically tokens. Sometimes <- is 'less minus' and
sometimes it is 'left_arrow', depending on whether there is a preceding |:

in [a|b<-c] the tokens are [ a | b < - c ], but this is to be understood as
[ a | b <- c ],
while in [a,b<-c] the tokens are [ a , b < - c ], and this really is just
[ a , b < - c ].

In the other direction, )-> is a single token, but for most purposes it
should be read as ) 'right_arrow', so that the ) matches the previous (.

Cheers,
Bill.
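
To make the state-management idea concrete, here is a minimal sketch of how
a Pygments RegexLexer could encode the rule above. Everything in it is a
hypothetical illustration, not PARI's actual grammar: the class name
MiniGPLexer, the states 'bracket' and 'selector', and the tiny token set are
made up; only the |/<- and )-> behaviour follows the description above.

from pygments.lexer import RegexLexer, include
from pygments.token import Name, Operator, Punctuation, Whitespace

class MiniGPLexer(RegexLexer):
    # Toy sketch: just enough state to show that '<-' becomes a single
    # arrow token only after a '|' inside brackets, and that ')->' is
    # lexed as one token.
    name = 'MiniGP'

    tokens = {
        'common': [
            (r'\s+', Whitespace),
            (r'\w+', Name),
            (r'\)->', Punctuation),            # ')->' is one token
            (r'[-+<>,()]', Operator),          # here '<-' is '<' then '-'
        ],
        'root': [
            (r'\[', Punctuation, 'bracket'),
            include('common'),
        ],
        'bracket': [                           # inside [ ... ], no '|' seen yet
            (r'\|', Operator, ('#pop', 'selector')),
            (r'\]', Punctuation, '#pop'),
            (r'\[', Punctuation, 'bracket'),
            include('common'),
        ],
        'selector': [                          # after the '|'
            (r'<-', Operator),                 # now a single 'left_arrow'
            (r'\]', Punctuation, '#pop'),
            (r'\[', Punctuation, 'bracket'),
            include('common'),
        ],
    }

if __name__ == '__main__':
    # [a|b<-c] yields one '<-' token; [a,b<-c] yields '<' then '-'.
    for tok, val in MiniGPLexer().get_tokens('[a|b<-c] [a,b<-c] (x)->x+1'):
        print(tok, repr(val))

For highlighting you probably want the arrow shown as one unit, which is
what the extra state buys; a parser-oriented lexer could instead always emit
< and - and let the grammar recombine them, closer to what pari_lex() is
described above as doing.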