By Allan Kelly
Last month I introduced character encodings, Unicode, UTF-x and
UCS-y (UTF stands for UCS Transformation Format, where UCS stands for
Universal Multiple-Octet Coded Character Set). This month I want to
pick up where I left off, talking about characters and platforms.
Although UTF-8 encoding may be more compact, it can also be more
difficult to work with, because the number of bytes needed to
represent a character varies from one character to another. When
data is compressed, the size difference will probably disappear, as
text data is typically highly redundant. Indeed, a typical wide
string in English on a Solaris-Sparc machine (UTF-32) will have three
zero bytes for every non-zero byte.
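To make the variable width concrete, here is a minimal sketch (not from the article) that encodes a single code point as UTF-8 using the standard bit layout. The function name `EncodeUtf8` is hypothetical, and the code assumes its input is a valid Unicode scalar value; it does no validation of surrogates or out-of-range values.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8.
// ASSUMPTION: cp is a valid scalar value (not a surrogate, cp <= 0x10FFFF).
std::string EncodeUtf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                      // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                              // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

An English 'A' costs one byte here (four in UTF-32), while the euro sign U+20AC costs three. That variability is what makes indexing into UTF-8 text awkward.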
However, wide character schemes suffer from endian issues. On a
little-endian Intel machine running NT, a wide letter ‘A’
is stored as the two bytes 0x41 0x00, while on
a big-endian Sparc Solaris machine the same letter ‘A’
is stored as the four bytes 0x00 0x00 0x00 0x41.
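One portable way to sidestep this is never to write wide characters straight from memory, but to serialize each code unit in an explicit byte order. The sketch below is illustrative only; `ToUtf16LE` and `ToUtf32BE` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helpers: serialize code units in a fixed byte order,
// independent of the host CPU's endianness.
std::vector<std::uint8_t> ToUtf16LE(char16_t unit) {
    return { static_cast<std::uint8_t>(unit & 0xFF),   // low byte first
             static_cast<std::uint8_t>(unit >> 8) };   // then high byte
}

std::vector<std::uint8_t> ToUtf32BE(char32_t unit) {
    return { static_cast<std::uint8_t>(unit >> 24),    // high byte first
             static_cast<std::uint8_t>((unit >> 16) & 0xFF),
             static_cast<std::uint8_t>((unit >> 8) & 0xFF),
             static_cast<std::uint8_t>(unit & 0xFF) };  // low byte last
}
```

Because the shifts operate on values rather than on memory layout, the same bytes come out whether the code runs on NT/Intel or Solaris/Sparc.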
Technically, the encoding used by NT is UTF-16LE, while the Solaris
encoding is UTF-32BE. For an NT application this may not be a
problem, as NT is Intel-only these days; but while you may decide to
ignore Solaris on Intel, remember that some Linux boxes will be big-endian
and some little-endian.
This becomes a particular problem when you want to exchange
data between machines: you must ensure that all of them use the same
encoding and endianness.
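One common convention for agreeing on an encoding (not one the article prescribes) is to prefix exchanged text with a byte order mark, U+FEFF, whose serialized bytes reveal both the encoding and the endianness. Here is a sketch of a reader-side check; the `Encoding` enum and `DetectBom` are hypothetical names, and text without a mark is simply reported as unknown.

```cpp
#include <cstddef>
#include <cstdint>

enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of a buffer for a byte order mark (U+FEFF).
// UTF-32LE must be tested before UTF-16LE, since both begin with FF FE.
Encoding DetectBom(const std::uint8_t* p, std::size_t n) {
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
        return Encoding::Utf32BE;
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
        return Encoding::Utf32LE;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
        return Encoding::Utf8;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
        return Encoding::Utf16BE;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
        return Encoding::Utf16LE;
    return Encoding::Unknown;  // no mark: the parties must agree some other way
}
```

The mark only helps if every writer emits it. Agreeing on one encoding for all exchanged data is still the simpler rule.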
There is another big difference in how machines handle
characters, and it comes from the API a machine uses. NT
programmers can use the Win32 API in either narrow or wide mode, and
the filing system supports Unicode filenames.
However, on Solaris and other Unices, the API is more limited and
only supports narrow characters. Actually, even in the NT world you
may be limited to narrow characters, because there are really two
APIs: the Win32 API, which has wide support, and the C API, which,
like Unix, demands narrow characters.
In fact, Microsoft extends the C API with a host of _underscored
and prefixed functions: there are no fewer than four versions of
strlen (strlen, wcslen, _mbslen and _mbstrlen).
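The distinction these variants wrestle with is bytes versus characters. The portable sketch below (the name `CountUtf8CodePoints` is hypothetical, and it assumes well-formed UTF-8) shows how `strlen` and a character count diverge.

```cpp
#include <cstddef>
#include <cstring>  // std::strlen, for comparison

// Count code points in a NUL-terminated UTF-8 string by skipping
// continuation bytes (those of the form 10xxxxxx).
// ASSUMPTION: the input is valid UTF-8.
std::size_t CountUtf8CodePoints(const char* s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;  // a lead byte or an ASCII byte starts a new character
    }
    return count;
}
```

For "café" encoded in UTF-8 as "caf\xC3\xA9", `std::strlen` reports 5 (bytes) while the counter reports 4 (characters).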
Needless to say, if you are looking for portable code you have to
make some decisions as to what you are going to support.
Luckily, the C++ standard does bring some sense to this situation,
but at the cost of increasing the amount you have to learn. The
first thing you notice is that most string and character handling
code actually takes the character type as a template parameter.
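In the same spirit, your own helpers can take the character type as a template parameter, so one definition serves both narrow and wide strings. A small illustrative sketch; `CountChar` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Works for std::string, std::wstring, std::u16string and so on, because
// CharT is deduced from std::basic_string<CharT, Traits, Alloc>.
template <typename CharT, typename Traits, typename Alloc>
std::size_t CountChar(const std::basic_string<CharT, Traits, Alloc>& s, CharT c) {
    return static_cast<std::size_t>(std::count(s.begin(), s.end(), c));
}
```

Calling `CountChar(std::string("banana"), 'a')` and `CountChar(std::wstring(L"banana"), L'a')` both use the same source, mirroring how `std::basic_string` itself is written.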
The next thing to notice is the locale mechanism whereby the
program knows what language to speak, how to format dates,
currencies and so on.
Locale has come to mean different things in different
environments: C provides a locale API tied to the current process,
so a Solaris process can choose to change its locale, but the whole
process has just one locale. NT extends this to individual threads,
each of which can change its locale, so within one process you can
have multiple locales.
In C++, locale is not an OS feature but a language feature. You
can create locale objects and use them to manipulate characters; in
effect, a single thread can have multiple locales!
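For instance, two streams in the same thread can be imbued with different locales. To avoid depending on named locales that may not be installed on a given machine, this sketch builds its second locale from the classic "C" locale plus a custom `numpunct` facet; `CommaDecimal` and `FormatWith` are hypothetical names.

```cpp
#include <locale>
#include <sstream>
#include <string>

// A numpunct facet that uses a comma as the decimal separator.
struct CommaDecimal : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
};

// Format a number using whatever locale the caller supplies.
std::string FormatWith(const std::locale& loc, double value) {
    std::ostringstream os;
    os.imbue(loc);  // affects this stream only, not the process
    os << value;
    return os.str();
}
```

With `std::locale comma(std::locale::classic(), new CommaDecimal);`, `FormatWith(std::locale::classic(), 1234.5)` yields "1234.5" while `FormatWith(comma, 1234.5)` yields "1234,5". Neither call touches the process-wide C locale set by `setlocale`.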
I said at the start of last month’s piece that this was intended
to be an introduction to the topic. I don’t claim to have all the
answers, nor indeed the answers for your project. However, there are a
few snippets of advice I can offer:
- Ensure your team has a clear idea of what you are doing with
characters
- Work with your language features as much as you can: modern
C++ is good here, but it is not perfect
- Keep your data in a consistent encoding
- Define an API string type which matches the character encoding
your OS wants, then provide functions to convert strings to and
from it; in other words, hide the data type behind an abstraction
- Name functions in terms of their result, not their internals:
e.g. MakeWide rather than NarrowToWide
- Use overloaded functions to provide a multiplicity of
functions, and include redundant features so developers don’t have
to stop and think about the data type (these can be optimised
later): e.g. MakeWide(const std::string&), MakeWide(const char*),
MakeWide(const wchar_t*)
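The last two tips might look something like this. It is a sketch only: it assumes narrow strings hold ASCII or Latin-1, so each byte maps directly to the wchar_t of the same value. A real project would call its platform's converter instead (for example `mbstowcs`, or `MultiByteToWideChar` on Win32).

```cpp
#include <string>

// Named for the result (a wide string), not the conversion performed.
// ASSUMPTION: narrow input is ASCII/Latin-1, so each byte widens directly.
std::wstring MakeWide(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (char c : s)
        out += static_cast<wchar_t>(static_cast<unsigned char>(c));
    return out;
}

std::wstring MakeWide(const char* s) {
    return MakeWide(std::string(s));  // forward to the std::string overload
}

// Redundant overload: the input is already wide, but callers need not care.
std::wstring MakeWide(const wchar_t* s) {
    return std::wstring(s);
}
```

Because the conversion lives in one place, swapping the Latin-1 assumption for a proper converter later touches only these functions.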
By Reg. Charney
I almost started out by saying what we were not going to do in
this new year. That is, we were not going to whine, not going to
lament the job situation, not belabor Microsoft’s continued
crushing of opportunity and innovation, and not bemoan other perceived
ills. We are done with that: 2001 was a bad year by most
measures. We’re putting all that behind us.
I also realized that the ACCU and the local Silicon Valley
chapter had a lot planned for 2002. In fact, we have more planned
for this year than we did for 2001.
First, the ACCU is going to become more organized. We plan to
have a small conference here in the Valley, place and time yet to be
determined. Also, we should be in a position to offer a series of
quick one-day weekend courses on various subjects in which many of
our members are expert.
Second, we plan to have some great speakers. As mentioned on
the front page, Bjarne Stroustrup, the inventor of C++, will be
speaking to us on February 12th. We also expect other significant
speakers later in the year.
I am also pleased that more of our members are becoming involved
in running the chapter and in contributing to this newsletter. As
for the newsletter, we are going to seek advertisers more
actively. I believe that we are the only newsletter of this kind in
the Valley; in fact, I believe that we are also one of
the oldest newsletters in the Valley. We are now entering our third
year of publication.
Databases and Tools
I have been looking at open source databases and tools, like
report writers, forms designers, and SQL generators. While I will
report more fully in a later issue, a GUI front end for PostgreSQL
called PgAccess (http://ns.flex.ro/pgaccess)
has really impressed me. I mention it now because a few people
have asked me about such a tool.
Software Craftsmanship by Pete McBreen, Addison-Wesley, ISBN 0-201-73386-2
I like concise books. I read this one over a weekend and a few
days of commuting on the 101 express bus to Palo Alto. McBreen
comments on the often ineffective software development process.
He explores the term “software engineering” and its common
practice: the waterfall model/cycle and the resulting team/corporate
structure, whose shortcomings lead to expensive software that is
sometimes buggy and late. The author gives ample references, both to
classic works and to online material.
According to McBreen, “software engineering” has tried to
apply the lessons from the industrialization of physical production
to software development. Labor is divided between groups of people,
and after the analysts and designers have figured out how to
structure the solution, hordes of (often) average coders implement
the resulting specifications. Often this solution ends up being
legacy software because the maintainers don’t have the big picture
and resist change for fear of breaking something. The problem is
that the engineering process was developed almost 30 years ago for
large-scale, multi-year projects, and the world has changed
since then. A lot of development is now done by small teams with
short product cycles. McBreen proposes that we start viewing the
development process differently. In his view, software and the
people who develop it are capital. Developing software is as much a
social process and a learning process as it is a technical process.
It is a craft (science and art combined). To get zero-defect,
useful, timely applications, we should use small teams of software
craftsmen, journeymen and apprentices. This analogy comes from the
traditional world of craftsmanship (blacksmiths in particular),
where craftsmen want to do quality work, stand behind it, and be
recognized for it. They stake their reputation on their work and as
such focus on quality and timeliness. Software craftsmen have
learned the intricacies of software development, including analysis
and design, and take on journeymen, who participate and learn from
the master. Journeymen in turn take on apprentices to train as their
successors. Once a small team of such masters, journeymen and
apprentices is built and has delivered an application, it stays
together to keep the application alive and to make sure it evolves
and continues to be valuable. The team spreads its knowledge about
the whole system to every member, minimizing reliance on a single
team member. Such a team will consistently produce great software
applications because each member strives to improve skill and
reputation.
McBreen uses eXtreme Programming and Open Source projects to
support his view. His mission is to return the focus to the people
who develop software and to put formal processes where they belong.
He includes tips on how to pursue software craftsmanship in a
company, but not nearly enough. Another book on the subject would be
most welcome, Pete! The book is a true pleasure to read. Developers
will be left longing to work on a team of craftsmen. Managers will
gain insight into how to build great teams of developers. My hope is
that this book will start a new wave in how we approach software
development, so that we can put the fun back where it belongs: into
our everyday jobs.
—Oluf Nissen
The Unified Modeling Language User Guide by Booch et al., Addison-Wesley, ISBN 0-201-57168-4.
I give this book on UML a pass. It is not as crisp as Fowler and
Scott's UML Distilled, nor as witty as Booch's earlier
Object-Oriented Analysis and Design. I found it difficult to look up
concepts and to follow the numerous cross-references. It took a lot
of time to read, even when looking up short subjects. It would best serve
an intermediate reader, and it covers a great deal of territory.
I started reading the book with two objectives: 1) to find out what
the dashed lines mean in an object diagram, and 2) to find out how to
present software architecture in a top-down, general-to-particular
manner. The first objective was attained when I found that a dashed
line is a “dependency” (p. 160), that a dependency means many
things, including “creation”, “trace”, “refinement” and
“bind” (p. 61), and that it can be understood as the old Booch
“using” relationship (pp. 53, 137). One object reaches
another object with a dashed line and “uses” it.
The second objective was harder, since I was looking for a
decomposition into systems and subsystems, and this does not appear
in the book until a great deal of work is done. Chapters 12
(Packages) and 31 (Systems and Models) give much of the answer. I
didn’t find it very intuitive to put subsystem decomposition so
late in the modeling process, so I changed things around and decided
to model “systems” with object diagrams, using “systems”
rather than classes. I could then model system interactions with
sequence diagrams of “systems”. This worked much better.
At this point I became aware of a third objective: finding out
how to move from UML diagrams to code. This is no small matter.
Perhaps this is because code is essentially procedural, as the central
processor runs one instruction at a time. I have not come to a
satisfactory solution for this problem. However, it is possible to
translate sequence diagrams easily into pseudo-code. One can write
useful pseudo-code at the system level and the detail level. A
little coordination between diagram and pseudo-code, and one can
have a satisfactory, tenth view (p. 24) into the software design.
I have been critical so far, but I stand in awe of Booch,
Rumbaugh, and Jacobson. I will point out several of the many fine
sections in the book: the discussion of components in chapter 25;
the exception hierarchy on p. 285; the explanation that use-case
scenarios drive the UML model (p. 33), and the supporting Figure
2-20 on p. 31; the exposition of activity diagrams in Chapter 19,
which update the old flowchart methodology; etc. etc. There is much
to use.
—Daniel Bonbright
By Ali Çehreli
Last month I promised to include Windows XP jobs in this month's
charts. But the numbers are still too small: only 6 jobs were posted
in October, and 3 in each of November and December. It looks
like we'll have to wait a while longer for XP numbers to stand out
from the other Windows platforms.
Nothing has changed since last month. Once again, Linux and
Windows 2000 among the platforms, and ASIC among the technologies,
have been the trendiest. All three are becoming less trendy,
though.
The only promising aspect of this month's data is that the decline
is slowing (Figures 1 and 2). Both figures indicate that we are at
least at a local minimum.


As a proud Silicon Valleyite, I always like to talk about my
first-hand experiences and some local hearsay. Last month, I wrote
about the company I worked for laying off employees, most of them
H-1B holders. The good news is that all but two of them found jobs
within a few weeks. Similarly, all of the laid-off employees of an
optical networking startup found jobs in a very short time. Some
people say that we are in a circulation period.