Tech Talk

The Benefits of yacc

By Tony Sumner

I have a friend in the Department of Linguistic Science who asked me for some advice on a new project that she has just begun. Her professor has been collecting data on a number of languages and the first task was to design and create a database of this data. The data did not concern itself with the things that might come to mind, like the grammar and vocabulary, but with the structure of the syllables that go to make up the words of the language. My first thought, incidentally, was that it was hardly worth bothering with a database since there cannot be that many languages altogether and it was a surprise to me to learn that the number of languages in the world probably exceeds 100,000.

A syllable is made up of consonants and vowels and the consonants may be further divided into categories like sibilants, fricatives, etc. The idea is to store a description of all the syllables that each language uses in terms of the primitive elements and then interrogate the database to find out which languages allow any given syllable. For example the syllable “trst” is allowed in Serbo-Croat but not in French (Serbs and Croatians enjoy rolling their r's even more than the French do). To illustrate the procedure the following is a specification of the syllables of an artificial language I shall call Ptui. It is much too simple to be a real language but it has enough complexity to be interesting.

C: c p n t
V: a o I
Syllable struct: (C) V (C)

The brackets in the description of the syllable structure enclose optional elements; thus a syllable must have a vowel and it may begin and/or end with a consonant. You can see that "cat" and "cop" are possible syllables in Ptui but "cant" is not. There are not many possible syllables but the Ptui can say things like “catonap i pono nipa conit op nacotan”. I like to think of the syllable “o” in Ptui expressing a polite skepticism. The team has collected about 150 languages so far and they will be collecting many more. One of them has just returned from the US where he has been studying American Indian languages. They asked me to advise on how the data might be analyzed. It seemed to me that what was wanted here was a parser and so I began to think of getting yacc to do most of the heavy work. yacc is a program for parsing sentences made up of words, which is much the same as here except that the syllable plays the role of the sentence and the consonants, vowels, etc. correspond to the words. The name yacc is derived from “yet another compiler-compiler” and I suppose it must be one of many though I am not familiar with any others, yacc is capable of generating compilers for full-scale computer languages like Fortran or C so the analysis of Ptui is trivial by comparison but it provides a good vehicle for an article on how to derive the structures that yacc needs. You must supply two things: a lexical analyzer and a set of grammar rules for the language. yacc then generates a parser, which is a program you can run to parse any sentence; the resulting parse may be handed to another process such as a code generator but in this case all we are asking the parser to do is to say whether the "sentence" is legal, i.e., whether a given syllable is possible in Ptui. The lexical analyzer is a function, written in C, that reads each character and decides what it is, that is, whether it is a consonant or a vowel or illegal. 

The grammar rules describe legal syllables; here the grammar rules are merely a re- statement of the syllable structure, in this case (C)V(C). The complete yacc program needs two more components - a main function, which just calls the parser yyparse(), and an error function yyerror() to put out error messages. The program for Ptui is as follows:

%token C V;
%%
line: pre V post
pre: /* empty */ | C
post: /* empty */ | C
%%
#include <stdio.h>

int yylex()
{
  int ch;
  while((ch=getchar())==' ')
    ;
  switch (ch)
  {
     case 'c': case 'p': case 'n': case 'f':

      
return (C);
     case 'a': case 'o': case 'i':
       return (V);
     case '\n':
       return 0;
     default:
       printf("err: %c\n", ch);
       return -1;
  }
}

void yyerror(char *s)
{
  printf("%s not OK\n\n",s);
}

void main() { yyparse(); }

The first part, from “%token” to “%%” contains the declarations of the primitive elements, in this case just consonant and vowel. The declaration of single-character tokens may be omitted but I have included them for completeness; when we start analyzing languages seriously we shall be using symbols from the International Phonetic Alphabet to represent complex sounds and some of these will be multi-character tokens. The next part of the program, up to the next “%%”, contains the grammar rules; these say that a syllable always has a vowel, that it may begin with a consonant and that it may end with a consonant, all of which is just another way of stating the syllable structure.

The function yylex() is the lexical analyzer. What it does is read one character from the input stream and return the corresponding token or -1 if the character is not recognized or zero at the end of the syllable. Suppose the program [grammar - ed.] is in a file ptui.y. To create the parser you run yacc with:

yacc ptui.y

that creates a file y.tab.c that you compile with

cc y.tab.c -o ptui

and the object file ptui then contains the parser. Try it with “cat” for example and it just exits, meaning that this syllable is allowed. Try “cant” and the reply is “not allowed”. The next task for the research team is to write a program that generates a yacc program like the one above for every language for which they have data. The program will have to read the lexicon and the syllable structure and generate the grammar rules and the lexical analyzer. Currently I am using awk for this task but that is another story (c.f. CVu vol 3 issue 2). The Free Software Foundation has produced a version of yacc called Bison and you can get the full source code plus an .EXE file to run on any Intel-powered PC and a superb reference manual in both plain and TEX formatting which explains all about Bison with many examples.

Software

For Windows users, you can get Cygwin tools free from Cygnus (now RedHat) (http://sourceware.cygnus.com/cygwin/)

Linux and most Unix users already have all the software.

References

lex & yacc 2nd Ed. by John R. Levine, Tony Mason & Doug Brown, O’Reilly, ISBN 1-56592-000-7 (http://www.oreilly.com/catalog/lex/index.html)

Bison v1.29, Free Software Foundation (http://www.gnu.org/manual/bison-1.25/html_mono/bison.html)

[Editor’s Comment

Tony Sumner originally wrote this article for CVu several years ago. However, it struck me as relevant today for a number of reasons.

First, I am currently working on a project that offers the user a number of choices: for example, a CPU, a Storage Type, and a Disk Controller. The number of CPUs, Storage Types, and Disk Controllers are going to grow during the lifetime of this product. The product needs to verify that valid combinations are being used. Thus, we are faced with a potential for a maintenance headache due to the possible exponential growth in combinations. Obviously, a tool like yacc would have been valuable if it had been used. Why something like yacc was not used is a surprise since most computer science graduates must have worked with LALR parsers at some point in their academic careers. Maybe this article will twig their memories.

It should also be noted that yacc is copyrighted software that belonged to AT&T. As such, the Free Software Foundation wrote a replacement for yacc, called bison. Bison accepts yacc grammars and some extensions, but it uses slightly difference file naming conventions. Running grammar file, ABC.y, through bison produces file ABC.ytab.c. Yacc defaults to always producing file y.yacc.c. Other than that, you should not have any other compatibility issues. There is also another yacc-like product called byacc. Unfortunately, I have little information on it. It is available under cygwin.

I also ran into a second anomaly while looking for yacc-lookalikes. None of the major PC software sites had any yacc-related software. It is as if the knowledge of syntax analysis and LALR parsers that we have developed over the years has dropped into a black hole.]

Historic Note

In researching for this issue of ACCent, I discovered that the March 1992 issue of CVu talked about Linux v0.12. That version ran on a minimum of a 386sx with 1MB of memory and a 20MB disk partition. Ray Bellis, the reporter for CVu, knew a good thing when he saw it.

In the January 1992 issue of CVu, Dr. Keith Standish wrote about QNX, the topic of this month’s ACCU meeting in Silicon Valley.

Editorial

By Reg. Charney

Open versus Closed Source

This week, the true meaning of the Open Source Software movement was driven home to me. I was working on some code dealing with Microsoft’s registry under Unicode. The documentation for the API routine did not answer some basic questions, and the test programs did not work as described. When the supplied documentation is inadequate, you waste hours or days trying to check out what really happens. As long as Microsoft’s code is closed, we are not their partners in making their platform more valuable, we are their victims. debugging their code in the dark and wasting my precious time. Not only that, but when I submit the bug, I have no guarantee that it will ever be fixed, much less fixed in a timely fashion. Thus, dealing with closed source like Microsoft’s means that we are at the mercy of their agendas and priorities, not yours and not mine.

Later in the week, I came across some source that I could view, only to discover some really strange things were happening (See the lead technical article). Again, however, in a closed system, there is only very inefficient ways of fixing bugs such as I found. Under OSS, bugs would be fixed in days or weeks. In the world of Microsoft, it will take months or years. They are trapped in a paradigm of their own making. Unfortunately, we are trapped in the same world.

JavaOne 2000

The fifth annual Java One conference was held again in San Francisco. Again, it was large and well attended. Estimates place the number of attendees at more than 20,000. There was also a full calendar of seminars and courses. Given all that, it did not have the “buzz” of the conference last year. It was missing something – a sense of excitement. For a more detailed technical comparison of this year’s conference to last year’s conference, see the Trends section.

Welcome to VA Linux

We want to welcome VA Linux as our newest sponsors. They have been a major force in the Linux and Open Source Software movement. We are very proud and pleased to have the endorse our efforts to reach this community with news and information that it can use.

Letters to the Editor

You know you are alive when people react to you. So, I really appreciated the two emails we received, one from the UK and one from a colleague, both with differing views on last month’s articles.

Mixing Storage Types Bug

Hi Reg,

Nice magazine -- I look forward to future issues. And congratulations on gaining the sponsor; I hope they're the first of many.

In your article Mixing Storage Types Bug, I think you might have added a longer explanation contrasting the _three_ ways that memory is allocated -- globally (existing for the duration of the program, or at least the main() function), locally (existing only within a limited scope), and dynamically (lifetime specifically managed by code).

The problem in the example is not that the use of "global and local storage has been mixed" up, but that dynamic memory demands special handling, unlike the other two. The issue is touched on in the next to last paragraph, which points out the improper use of delete to free storage which wasn't allocated by new, but it could have been emphasized more, particularly since there was no discussion of why new had been used when the constructor and copy constructor were added to struct S.

Basically, this boils down to a matter of class design, rather than syntax. The language exists to enable you to express a functional design. The creator of a class needs to think about what concepts are being embodied in the code and what entity "owns" and manages the data in the class. If the class has a "has a" relationship with the data member, then the lifetime of the data is the same as its containing object, and it's appropriate for an instance to allocate and release memory to hold the data. But there are other sorts of relationship -- "holds a", "uses a", "is aware of" -- where the two objects may well have different lifetimes, and thus any allocated memory needs to be managed by some other entity. Of course, when you cut things down to the smallest possible example, the meaningful context gets lost.

The real problem is in f(), which has been written with certain assumptions which don't match those of struct S in Listing 2. A good argument can be made that external functions should not be mucking about with internal members of a class -- and after all, when you come down to it, a struct and a class are the same thing (and a union is almost the same thing, which surprises some people, but that's a subject for a different article). If struct S kept its data private and offered a public set_pi() function, then f() wouldn't be able to cause the damage.

NOTHING in the above should be construed as an offer to write an article for ACCent, so don't ask -- I have to write a talk for Francis [Glassborow, ACCU editor-in-Chief - ed.] and JaCC before I do anything else. But you can put it in a letters column if you want to; edit as necessary.

See you in Toronto! [at the next X3J16 ANSI/ISO C++ Standards meeting. - ed.]

Cheers,
Lois Goldthwaite
<loisg_AT_ospace.demon.co.uk>

[Editor: Thanks for this non-article ;-). From my example of struct S, it is clear that little thought was given to ownership – a very important concept. The use of struct to bypass the need to provide access functions contributed to the problems I encountered. I would posit that most programming is done this way. This “programming on the cheap and nasty with little thought about design – just to get the job done” leads to a lot of wasted time and effort.]

"Downtrend" in job market

I am speculating that if a company can’t fill but 1 opening out of 20, what is the use of listing a 21st identical job opportunity?

I’m hearing about a trend where people are hired based on their skills with no particular task pending for them. When they show up it’s basically “Here's all the stuff we want to do but don’t have time for – do what you feel you’d make the best impact on.” A colleague was saying that the people with the hiring authority were scientists, not coders, and everything they wanted to have done was way over their heads.

Erich Fickle

Adventures Abound

Inroads to Free Software Development

by David Burley, Marble Horse Free Software Group

Copyright © 2000 David Burley, All Rights Reserved. 

[David offers suggestions for anyone wondering how to get started helping with Free Software. He wrote this editorial for www.freshmeat.net. I have reprinted it here because of its relevance to our mission as a forum for Open Source Software and to encourage others to get involved. – ed.]

Inspired by a comment from Waldo L. Jaquith to Jacob Moorman's The Importance of Non-Developer Supporters in Free Software, this document simply describes a few ways to get involved in projects. It is aimed to be as general as possible so as to be applicable to all projects. 

Many understand the developmental model used by free software development groups and GNU/Linux distributions. However, few understand how to properly get involved. This problem has perplexed many and reduces the number of individuals who are able to become involved in free software development and testing.

Making Contact

The first step before starting in on any project is making contact with the existing maintainer of the project. Communication is the key to reducing redundant efforts, ensuring that work isn't done twice and maximizing the value of the work of individuals. This step may yield the discovery of a dormant project which needs help. It may also bring to one's attention that they can join an effort another person has started. Be sure to evaluate the skills needed for any undertaking and decide on how to help based upon those skills. Nearly any undertaking will be a learning experience.

[Look to sites like http://sourceforge.net or www.freshmeat.net for open source projects - ed.]

Testing 

The most fundamental job a novice may perform for a free software project is to test and review or audit the released software and documentation. It is most helpful to the maintainers of the projects to have a third party review their work. Reporting back the results of such tests, and submitting changes to documentation according to the projects problem reporting mechanism, can help ensure a higher quality release. Project maintainers may not respond or acknowledge the work appropriately as they should [1], however, one should not become discouraged; it is unfortunate, but this extremely important position tends to get ignored often.

Documentation 

The next level of support one can lend to a project is documentation. There are literally thousands of free software programs and projects in existence. There is consistently a lack of high-quality documentation available for these projects. Documentation comes in many forms; the most common are: manual pages (or man pages), Linux Documentation Project (LDP) HOWTOs, user manuals and guides. Documentation should be written in a consistent format which is useful to its target audience. Keep in mind that documentation meant to be read by technically-oriented individuals may also be of use to a more wide-scale audience and should be written in such a way that all can understand. Feel free to refer the reader to outside sources of information.

Creating proper documentation takes a lot of time and skill. Remember all the processes learned in English Class and put them to use. Get a reference book and put it to good use. Keep in mind that documentation written may be translated to other languages or read by persons whose native tongue is not English. Clarity is of the utmost importance.

It is good practice to submit updates to documentation when you find mistakes, bad grammar, or old information (broken URLs, incorrect company contact information, etc.). It is also a good time to review the documentation thoroughly when one is reading it.

Write on topics close to the heart (write what you know). Good documentation takes a lot of time to write. When one is intimately involved with the subject matter, they are more apt to pay attention to detail. It is not only important to list how to do something, but also why it works and what the command does. Making an end user more informed is the point of the document and thus no issue should be left untouched. Start the document by describing what the program does and document the command line arguments used by the program. This first document is the most crucial part of program documentation; it is the start of a full fledged user manual. A quick start guide is another good starting point for a manual. They tend to save the reader a lot of time and get to the point. Rome wasn't built in a day and neither will a useful user manual.

Programming 

There are many tasks which may be performed by a novice programmer. Most of these projects involve a minimal set of programming skills and basic documentation skills. One must first get a hold of the latest developmental version of a piece of software. Next, use the software and review its visual appeal and error messages. Document the issues and make a list of what could be made to look better or reworded to make more sense. Consider possible ways to remediate the issues and attempt to fix them. Although one may not be able to solve all the issues they documented, they can post a patch along with a listing of the issues resolved and a list of issues which remain to be solved. 

Adding additional error output and debugging code can be of great use to a project, big or small. Such additions don't tend to take a lot of time but are more adept to be ignored by the program's maintainer. Good software provides consistent, easy-to-read messages which are both of practical benefit to the end-user and useful to the developer during the troubleshooting process. Submit changes in the form of a patch to the author and they will usually be happy to accept them.

Skilled developers have their work cut out for them. As to be expected, the majority of work needed in free software development is coding. Grab the latest developmental code, take a peek at the wish list or TODO list, and hack away. 

Submit patches back to the project maintainer and watch the project become the program that all in the world use for years to come.

Public Image 

Those who are webmonkey's can help create a better looking website for a project. Most project maintainers do not have time to spend writing PHP or HTML to make a good web page. However, a web presence tends to be one's first look at a project and thus is a way for it to gain outside interest. Get out a copy of The Gimp, a favorite text editor and hack up a web presence for the project. Don't forget to make it easy for the project maintainers to update the site with new information.

Lastly, whenever getting involved in a project, join its mailing lists and newsgroups. Post messages when an update is made with a URL to the patch or documentation. Be alive and active in the project and get noticed. Free software does not grow on trees and needs your help. No longer can one say, “but I don't know how to get involved.” Select an appropriate starting point and get to work! 

*[1] See Jacob Moorman's “The Importance of Non-Developer Supporters in Free Software” for more information on the very important related topic.

David Burley is a Core Developer for the Marble Horse Free Software Group. He actively engages in documentation, coding, and advocacy of Free Software projects. David is also a second year Computer Engineering major at the University of Cincinnati.

Sponsorship Packages

ACCent is aimed at programmers interested in Open Source Software and programming languages like C, C++, Java, Python and Perl. It offers three benefits to sponsors: it appeals to a highly focused group of technically capable people; it gives the sponsor good publicity and public relations by associating the sponsor with a well-known and well-respected non-profit organization like the ACCU; and the monthly issues mean that the sponsor gets a form of corporate advertising for the price of sponsorship. 

The ACCU is an international non-profit organization aimed at improving the professionalism of its programming members. Its web site is www.accu.org. We currently have about a 1,000 members worldwide and are in the early stages of building a local Silicon Valley chapter, which is one of the reasons for the newsletter. Also, to misquote a politician, "All programming is local." – so our newsletter is published with the needs of Silicon Valley programmers in mind. 

There are three levels of annual sponsorship: Gold, Silver and Bronze. The benefits are based on sponsorship level. 

Bronze - includes the following five (5) benefits:

  1. Listed as a Bronze-level sponsor on the front cover of the newsletter
  2. Entitled to a black and white 1/4 page ad in each issue, when we go to an 8 page version of the newsletter(*).
  3. Will be entitled to 50 free reprints of the the newsletter each month.
  4. Sponsor's logo and link will appear on our ACCU U.S. chapter’s online version of the newsletter(**).
  5. We currently distribute the newsletter in electronic form using PDF format. These electronic versions of the newsletter are produced in full color in two sizes: American letter size (8.5"x11"); and in A4 size (8.271"x11.698") for distribution in Europe.

Silver - includes the following five (5) benefits:

  1. Listed as a Silver-level sponsor on the front cover of the newsletter
  2. Entitled to a black and white 1/2 page ad in each issue, when we go to an 8 page version of the newsletter(*).
  3. Will be entitled to 100 free reprints of the the newsletter each month.
  4. Sponsor's logo and link will appear on our ACCU U.S. chapter online version of the newsletter(**).
  5. We currently distribute the newsletter in electronic form using PDF format. These electronic versions of the newsletter are produced in full color in two sizes: American letter size (8.5"x11"); and in A4 size (8.271"x11.698") for distribution in Europe.

Gold - includes the following seven (7) benefits:

  1. Listed as a Gold-level sponsor on the front cover of the newsletter
  2. Entitled to a two-color 1/2 page ad in each issue, when we go to an 8 page color version of the newsletter(*).
  3. Will be entitled to 100 free reprints of the the newsletter each month.
  4. Will be listed on the ACCU's U.S. chapter's public banners.
  5. Sponsor's logo and link will appear on our ACCU U.S. chapter online version of the newsletter(**).
  6. We currently distribute the newsletter in electronic form using PDF format. These electronic versions of the newsletter are produced in full color in two sizes: American letter size (8.5"x11"); and in A4 size (8.271"x11.698") for distribution in Europe.
  7. When we go to two-color printing, the second color will be free to Gold sponsors.

(*) We currently produce a 6 page newsletter, but will go to 8 pages when we gain sponsors to amortize the cost of the newsletter.
(**) The online version of the newsletter is planned to start with our June 2000 issue.

Contact:

Reginald B. Charney, Editor
1330 Trinity Dr.
Menlo Park, CA 94025
Tel: 650-233-9082

Book Review

Learning Python
by Mark Lutz & David Ascher
O’Reilly, ISBN 1-56592-464-9

Recommendation ***

I have used this book as an introduction to Python. I found it coherent, well organized, and easy to read. Since it was aimed at beginners, it covers the basics. and avoids a lot of the complexity every mature language has. It starts with an overview and most importantly, how to get a free copy of Python using the net. Each subsequent chapter covers some aspect of the language and usually ends with a section on Gotchas, a Summary, and Exercises. All were worthwhile. As a bit of a language lawyer, I really appreciated the Gotchas section that covered the parts of the language that did not fit into a nice neat package. Overall the book was broken into three parts: the core language; the tools, frameworks and applications built on the first part; and a series of appendices covering resource, platform specific details and solutions to the exercises. It also had a comprehensive index. 

How I used this book is interesting. Due to an unforeseen emergency, I found that I had to give a course on Python in five days. To do this, I needed to create a slides covering most basic aspects of the language. I found that this book’s format made this task easier than I had expected. The really interesting part was that as I stepped through the book, I was able to use the Python interpreter I had downloaded.to check everything I had learned and produce slides simultaneously. In effect, I was using the slides as my notes for learning the language. Sometimes , I noted something that was incorrect, but I could later go back and correct the mistake, test the correction and produce an example, as needed. It made the task of creating a series of lecture notes fun. It is my hope that these notes and slides will eventually be available on our website.

Lastly, I learned some valuable lessons while doing the notes for this course. I needed to learn what it took to learn things quickly and well. At the risk of sounding like a salesman, I hope to share these lessons with you in future issues.

Reviewed by Reg. Charney

Trends

By Reg. Charney

JavaOne 2000 Conference

As I mentioned earlier in the Op Ed. column, I did not get the “buzz” this year at this conference as I did at last year’s conference. Unfortunately, the figures below don’t give a clear reason why. The feeling is more subjective than quantitative, which is funny since this column tries to be as objective as possible. Perhaps, it is that I was looking for advances in Java technology, especially in the area of tools and development aids. 

For B2B aficionados, the show was a great success. Topics and seminars in this area abounded. Also, this conference may have heralded the coming age of Java-enabled devices in one form or other. The biggest surprise was in the marketplace for Java-based businesses. There was almost no presentations on leveraging the Java technology into an IPO. It seems that the naiveté of dot com’s is finally wearing off. 

There seems to be an understanding that there needs to be some value or business reson d’etre beside the applications being written in Java. Maybe, we are maturing?

What’s Hot, What’s Not

Cimmetry Systems, Inc. (www.cimmetry.com) had a Web-based viewing, markup and collaboration tool for dealing with graphics in over 200 formats. This product would be great for engineering firms and environments where sharing drawings was important. Evergreen Internet (www.evergreen.com) sells open source B2B and B2C components based on EJB/CORBA/RMI). Infomatec (www.infomatec.com) offers a compressed Linux OS for with a Java MVM for very compact appliances. Also, as noted in the charts, embedded Java and wireless devices are major areas of interest. While not strictly Java related, Sun’s free StarOffice 5.2 is really effective. It handled DOC files that Microsoft’s Office 2000 hung while opening.