Bill Allombert on Mon, 27 Oct 2025 19:15:06 +0100



Re: [EXTERN] Definition of tokens in GP language


On Mon, Oct 27, 2025 at 04:52:18PM +0100, Hong-Phuc Bui wrote:
> Hi, I'm here again :)
> 
> Thanks for sharing the information. It motivates me even more to write a lexer for Pygments.
> I'm now reading both files: lang.l in the gp2c repository and the function pari_lex() in anal.c.
> If I understand correctly, I have two choices:
> 
> 1) The function pari_lex() works fully correctly and can handle all corner cases of the GP language,
> but it's written by hand.
> => Porting it to Python is not as easy as I wish (well, writing a lexer was never easy :)).
> 
> 2) The lexer generated from lang.l can now also handle all corner cases, but is not yet as rock-solid as pari_lex().
> => Porting it to Python, for example using PLY[1] or RegexLexer[2] with
> state management, may be easier, but the Python lexer may not handle all
> corner cases?

Well, hopefully I should be able to fix the lex parser if we find other bugs.

But how you define a token depends on how you want to use it. Will you feed
the tokens to a parser, or will you use them directly as a basis for syntax highlighting?

There are special constructs that are not handled as tokens for parsing purposes
but are semantically tokens:

Sometimes <- is 'less' followed by 'minus', and sometimes it is 'left_arrow',
depending on whether there is a preceding |:

[a|b<-c] tokens are             [ a | b < - c ]
but this is to be understood as [ a | b <- c ]
while
[a,b<-c] tokens are             [ a , b < - c ]
and this really is just         [ a , b < - c ]
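
To illustrate the context dependence (this is only a sketch, not how pari_lex()
or lang.l actually work, and all names are made up), a highlighting-oriented
tokenizer in Python could keep one flag per open [ recording whether a | has
been seen, and merge < - into a single left-arrow token only in that case:

# Sketch of a context-aware tokenizer for the "<-" ambiguity.
# This is NOT pari_lex() or lang.l; purely illustrative.

def tokenize(src):
    tokens = []
    bar_seen = []                    # one flag per open "["
    i = 0
    while i < len(src):
        c = src[i]
        if c == '[':
            bar_seen.append(False)
            tokens.append('[')
        elif c == ']':
            if bar_seen:
                bar_seen.pop()
            tokens.append(']')
        elif c == '|':
            if bar_seen:
                bar_seen[-1] = True
            tokens.append('|')
        elif src.startswith('<-', i):
            if bar_seen and bar_seen[-1]:
                tokens.append('<-')         # left_arrow inside [ ... | ... ]
            else:
                tokens.extend(['<', '-'])   # 'less' then 'minus'
            i += 2
            continue
        elif not c.isspace():
            tokens.append(c)
        i += 1
    return tokens

print(tokenize('[a|b<-c]'))  # ['[', 'a', '|', 'b', '<-', 'c', ']']
print(tokenize('[a,b<-c]'))  # ['[', 'a', ',', 'b', '<', '-', 'c', ']']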

In the other direction, )-> is a single token, but for most purposes it should be
read as ) 'right_arrow', so that the ) matches the previous (.
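
As a sketch of that post-processing step (again illustrative, not PARI's code),
one can split the )-> token back into ) and -> so that bracket matching sees
the closing parenthesis:

# Illustrative only: split the single ")->" token into ")" and "->"
# so that the ")" can be paired with its opening "(".

def split_close_arrow(tokens):
    out = []
    for tok in tokens:
        if tok == ')->':
            out.extend([')', '->'])
        else:
            out.append(tok)
    return out

# "(x)->x^2" lexed with ")->" as one token:
print(split_close_arrow(['(', 'x', ')->', 'x', '^', '2']))
# ['(', 'x', ')', '->', 'x', '^', '2']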

Cheers,
Bill.