GTK+ Forums

Discussion forum for GTK+ and Programming. Ask questions, troubleshoot problems, view and post example code, or express your opinions.
All times are UTC




 Post subject: glib lexical scanner
Posted: Wed Sep 26, 2012 8:27 am
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
Dear friends,
Sorry for the two back-to-back posts, but this one is related to my previous post.
The problem is parsing. I am trying to parse a BibTeX file, which has this structure:

Quote:
@phdthesis{chow1983thesis,
author = "Chowdhury, D.",
institution = "Department of Physics, IIT, Kanpur",
location = "Kanpur",
publisher = "Department of Physics, IIT, Kanpur",
school = "Department of Physics, IIT, Kanpur",
title = "{The Spin Glass Transition}",
year = "1983"
}


I tried a few options, such as learning flex, and failed. Currently I am using a shell script to get the data, so the scheme is as follows:
grep & awk to select the data -> write the data to a file -> C reads the data into an array -> show it in a treeview, which is a strong contender for the worst way of getting things done.
I found that GLib's lexical scanner can do this, but I found only one example here, and I can't manage to adapt that example to read from a file. Please show me some way.


 Post subject: Re: glib lexical scanner
Posted: Wed Sep 26, 2012 11:06 pm
Never Seen the Sunlight

Joined: Thu Mar 24, 2011 2:10 pm
Posts: 328
Location: Sydney, Australia
OK, the link you gave is just a generic lexical scanner. If you want something more specialised, look at btparse (libbtparse0 in most Linux repositories, or search for btparse on the CPAN website). Perl is more widely used for lexical scanning, and you'll probably find the Perl-based versions have more capabilities.
My experience with BibTeX has taught me to be cautious, though. It has been around since the B.U. era (before UTF) and has not adapted well to the changeover, so it has some severe limitations that have forced people to customise their own BibTeX styles (citing journals with Cyrillic characters is a nightmare), and those may not sit well with the conventions the parser expects.


 Post subject: Re: glib lexical scanner
Posted: Wed Sep 26, 2012 11:13 pm
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
Yes,
btparse is surely an option.
But my goal is to use the general scanner because, if everything goes well, within a few days I may need to parse HTML files, and in that case I will need to handle a real scanner.

That is why I am thinking of using the general scanner, which may prove helpful in the near future.


 Post subject: Re: glib lexical scanner
Posted: Thu Sep 27, 2012 11:46 am
Never Seen the Sunlight

Joined: Wed Jul 23, 2008 10:31 am
Posts: 2406
Location: Slovenia
Hi.

I had never used GScanner before, so I took this opportunity to see what it's all about. What I came up with is this quite general BibTeX parser that can extract information from entries in almost no time.

It still parses a text string embedded in the application code, but since replacing that with reading from a file is trivial, I thought I would leave it as an exercise for the reader ;)

Here is the code:
Code:
#include <glib.h>
#include <string.h>

/* Test data */
static const gchar *ttest = "@phdthesis{chow1983thesis,\n"
                            "author = \"Chowdhury, D.\",\n"
                            "institution = \"Department of Physics, IIT, Kanpur\",\n"
                            "location = \"Kanpur\",\n"
                            "publisher = \"Department of Physics, IIT, Kanpur\",\n"
                            "school = \"Department of Physics, IIT, Kanpur\",\n"
                            "title = \"{The Spin Glass Transition}\",\n"
                            "year = \"1983\"\n"
                            "}";


static void
output_entry (GHashTable *table)
{
  GHashTableIter iter;
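  gpointer key, val;

  g_print ("Citation entry:\n");
  g_hash_table_iter_init (&iter, table);
  /* g_hash_table_iter_next() expects gpointer * arguments, so iterate with
     gpointer variables and cast to strings only when printing */
  while (g_hash_table_iter_next (&iter, &key, &val))
    g_print ("  %16s: %s\n", (const char *) key, (const char *) val);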
  g_print ("\n");
}

static guint
parse_entry (GScanner   *scanner,
             GHashTable *table)
{
  /* Entry starts with @ */
  g_scanner_get_next_token (scanner);
  if (scanner->token != '@')
    return G_TOKEN_ERROR;

  /* Now get identifier */
  g_scanner_get_next_token (scanner);
  if (scanner->token != G_TOKEN_IDENTIFIER)
    return G_TOKEN_ERROR;

  g_hash_table_insert (table, g_strdup ("type"),
                       g_strdup (scanner->value.v_identifier));

  /* Brace */
  g_scanner_get_next_token (scanner);
  if (scanner->token != G_TOKEN_LEFT_CURLY)
    return G_TOKEN_ERROR;

  /* ID */
  g_scanner_get_next_token (scanner);
  if (scanner->token != G_TOKEN_IDENTIFIER)
    return G_TOKEN_ERROR;

  g_hash_table_insert (table, g_strdup ("id"),
                       g_strdup (scanner->value.v_identifier));

  while (TRUE)
    {
      char *key, *val;

      g_scanner_get_next_token (scanner);
      if (scanner->token != G_TOKEN_COMMA)
        return G_TOKEN_ERROR;

      g_scanner_get_next_token (scanner);
      if (scanner->token != G_TOKEN_IDENTIFIER)
        return G_TOKEN_ERROR;

      key = g_strdup (scanner->value.v_identifier);

      g_scanner_get_next_token (scanner);
      if (scanner->token != '=')
        {
          g_free (key);
          return G_TOKEN_ERROR;
        }

      g_scanner_get_next_token (scanner);
      if (scanner->token != G_TOKEN_STRING)
        {
          g_free (key);
          return G_TOKEN_ERROR;
        }

      val = g_strdup (scanner->value.v_string);
      g_hash_table_insert(table, key, val);

      g_scanner_peek_next_token (scanner);
      if (scanner->next_token == G_TOKEN_RIGHT_CURLY)
        break;
    }

  /* Eat last curly brace and return */
  g_scanner_get_next_token (scanner);
  return G_TOKEN_NONE;
}


int
main (int    argc,
      char **argv)
{
  GScanner *scanner;
  GHashTable *table;
  guint ret;

  scanner = g_scanner_new (NULL);
  g_scanner_input_text (scanner, ttest, strlen (ttest));

  table = g_hash_table_new_full (g_str_hash, g_str_equal, g_free, g_free);
  do
    {
      g_hash_table_remove_all (table);
      ret = parse_entry (scanner, table);

      if (ret == G_TOKEN_ERROR)
        break;
      else
        output_entry (table);

      g_scanner_peek_next_token (scanner);
    }
  while (scanner->next_token != G_TOKEN_EOF &&
         scanner->next_token != G_TOKEN_ERROR);

  /* Finish parsing: release the scanner and the table */
  g_scanner_destroy (scanner);
  g_hash_table_destroy (table);

  return 0;
}
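
For the "read from a file" part of the original question, here is a minimal sketch in plain C. It assumes the whole .bib file fits in memory. The helper name read_whole_file is hypothetical and not part of GLib or of the code above. In a GLib program, g_file_get_contents() does the same job in one call, and g_scanner_input_file() accepts an open file descriptor instead of a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not a GLib function): reads an entire file into a
 * freshly allocated, NUL-terminated buffer.  On success it returns the
 * buffer and stores the number of bytes read in *len_out; on failure it
 * returns NULL.  The caller releases the buffer with free(). */
char *
read_whole_file (const char *path, size_t *len_out)
{
  FILE *fp;
  long size;
  char *buf;
  size_t got;

  assert (path != NULL && len_out != NULL);

  fp = fopen (path, "rb");
  if (fp == NULL)
    return NULL;

  /* Find the file size by seeking to the end, then go back to the start. */
  if (fseek (fp, 0, SEEK_END) != 0 || (size = ftell (fp)) < 0)
    {
      fclose (fp);
      return NULL;
    }
  rewind (fp);

  buf = malloc ((size_t) size + 1);
  if (buf == NULL)
    {
      fclose (fp);
      return NULL;
    }

  got = fread (buf, 1, (size_t) size, fp);
  fclose (fp);

  buf[got] = '\0';   /* terminate so the buffer is also usable as a string */
  *len_out = got;
  return buf;
}
```

In main() above, the buffer and length returned here would take the place of ttest and strlen (ttest) in the g_scanner_input_text() call. The buffer should stay allocated until the scanner is destroyed, because the scanner reads from it while tokenizing.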


Cheers,
Tadej


 Post subject: Re: glib lexical scanner
Posted: Thu Sep 27, 2012 12:53 pm
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
Tadej,
This is a huge help for me.
Thanks a lot.


 Post subject: Re: glib lexical scanner
Posted: Fri Feb 15, 2013 11:49 pm
GTK+ Guru

Joined: Sun Jul 08, 2012 3:14 pm
Posts: 107
Location: Coventry, UK
Hi Tadej and others,
Sorry for reopening such an old thread.
The code you posted works fine when an entry has the key = "value" structure, since "value" is identified as a string. It does not seem to work when the value is given in braces, i.e. something like
author = {Chowdhury, D.},

I tried to adjust your code and failed. Do you have any quick suggestion?
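
A note on why this happens, with one possible direction. parse_entry() in the code posted earlier in the thread requires G_TOKEN_STRING after the '=' sign, but a braced value starts with G_TOKEN_LEFT_CURLY and its contents then arrive as separate tokens, so the check fails. The sketch below is not a GScanner feature and not the approach of the posted code. It is plain C that works on the raw text, and the helper name extract_braced_value is hypothetical. It copies everything between a '{' and its matching '}', counting nested braces, and could serve in a pre-pass over the file or in a hand-written field reader.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: `start` must point at an opening '{'.  Copies the
 * text between that brace and its matching '}' into `out` (NUL-terminated),
 * keeping any nested braces verbatim.  Returns a pointer just past the
 * closing brace, or NULL if the braces are unbalanced or `out` is too small. */
const char *
extract_braced_value (const char *start, char *out, size_t out_size)
{
  const char *p;
  size_t n = 0;
  int depth = 1;   /* the opening brace at *start is already counted */

  assert (start != NULL && out != NULL && out_size > 0);
  if (*start != '{')
    return NULL;

  for (p = start + 1; *p != '\0'; p++)
    {
      if (*p == '{')
        depth++;
      else if (*p == '}' && --depth == 0)
        {
          out[n] = '\0';
          return p + 1;        /* matching brace found */
        }

      if (n + 1 >= out_size)   /* leave room for the terminator */
        return NULL;
      out[n++] = *p;
    }

  return NULL;                 /* input ended before the matching brace */
}
```

Called on the text "{Chowdhury, D.}," this leaves Chowdhury, D. in the output buffer and returns a pointer to the trailing comma. A value such as {{The Spin Glass Transition}} keeps its inner pair of braces.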

