Tech Talk

SciPy '02 Report

by Walter Vannini

The first "Python for Scientific Computing Workshop" (www.scipy.org) was held at Caltech on September 5 and 6. As one of the few attendees from the commercial world, I got quite an education in the concerns of the scientific world, and in how Python is helping out.

There were roughly 70 attendees, mostly from government organizations, universities, and research institutions. The largest contingents were from the astronomy and bioinformatics communities, although many other fields, such as particle physics, were represented.

One of the big selling points of Python, beyond the fact that it's open source, is its use as a glue language. There are many existing C and Fortran based applications that have been developed, and they need to be used in a coordinated way. I got the impression that many organizations had tried the scripting language Tcl originally, but found that it didn't scale. Also, there are huge compiled libraries of C and Fortran source that provide very useful services. Open source tools like SWIG, the "Simplified Wrapper and Interface Generator", and f2py, the "Fortran to Python Interface Generator", can be used to make the C and Fortran code available to Python applications. There is a large Fortran community in the scientific world, and f2py is making Python a useful scripting language for that community. Another big plus is that Python is easy to use and understand, yet powerful. Since most of the scientific users are not primarily programmers, this is an important advantage.

The unifying theme of the two-day workshop/conference was "SciPy", an open source Python library that includes modules for signal processing, integration, special functions, and of course graphics and plotting. It's built on top of the Numeric module, and is currently supported by Enthought Inc (www.enthought.com/). Enthought was a major organizer of the conference [1], and three representatives of the company (Eric Jones, David Morrill, Travis Vaught) attended. They gave several excellent presentations, covering a wide variety of topics: the Numeric module, parallel computing, community development, and Chaco. Chaco is a toolkit for plotting that Enthought (primarily David Morrill) originally developed for a client, and has now made available to the open source community.

SciPy and Numeric make it easy to do many of the things that the scientific community wants to do. Many attendees see SciPy with Numeric as an open source alternative to Matlab, and the ongoing work by the SciPy community is making this a reality. One of the most productive members of that community, Travis Oliphant of Brigham Young University, gave an in-depth tutorial introducing SciPy, and later gave another presentation describing what SciPy still needs. As well as more functionality, more documentation is seen as very important. Travis invited us to help out.

In the scientific world, the speed and optimization of numeric computations are often a priority, so compiled code is wrapped not just to reuse a huge amount of existing Fortran code, but also to selectively replace slow portions of Python code. Pat Miller, from the Lawrence Livermore National Laboratory, described some experimental techniques he's working on to directly optimize Python code. If he's successful, even more of the scientific community will have a reason to switch to Python.

There were several talks given by the people at Art Olson's Molecular Graphics Laboratory at Scripps (www.scripps.edu/pub/olson-web). Michel Sanner described some of the molecular visualization tools that are being developed at Scripps using Python. As well as viewing molecular structures via PyOpenGL to gain insight, researchers at Scripps enjoy actually holding models to gain further understanding. Attendees got to handle some of the physical molecular models manufactured by the 3D printer from Z Corporation (www.zcorp.com). Although the models cost roughly $20 each, a low-end printer costs about thirty thousand dollars. I'm looking forward to prices dropping a couple of orders of magnitude.

As well as the in-depth tutorials and presentations, there were lightning talks on a variety of uses of Python and SciPy. Along with the expected bioinformatics and astronomical applications, there were applications involving financial analysis, weather research, brain-machine interfacing, and quantum chemistry. There was a report on a metal casting application, and an example of its use involving two uranium hemispheres. A satellite image processing application was described. One of its requirements, based on national security considerations, was that it had to process images with file sizes of 2 GB in minutes. This is ongoing work, and changes to the Python Imaging Library (PIL) will probably need to be made to properly handle files of that size.

There were many opportunities to chat with people during breaks. When I was asked what I did with Python I replied that I was a Python enthusiast, and that I used Python as an administrative tool and to help automate parts of C++ programming (and as a handy command line calculator [2]). But, as a contract programmer, all the paying projects I've found are C++ based, not Python based. Another developer I spoke with told me that he was in a similar position, except that he was a Ruby enthusiast who could only find paying Python projects to work on.

It looks like there will be a SciPy '03, but the dates haven't been set yet.

SciPy Status

The scientific Python (SciPy) package already has:

  • graphics and plotting
  • integration
  • special functions
  • signal processing
  • image processing
  • genetic algorithms
  • ordinary differential equation (ODE) solvers
  • unconstrained optimization
  • parallel programming tools
  • Fast Fourier transform
  • interpolation
  • statistical functions
  • linear algebra and blas routines based on LAPACK
  • simulated annealing
  • input/output modules

The SciPy community still wants the following features:

  • unit test cases for existing modules
  • documentation
  • constrained optimization
  • nonlinear conjugate gradient
  • Krylov subspace iterative solvers
  • PDE solvers
  • Computational geometry utilities
  • wavelets
  • more input/output modules

A "hot list" will soon be available at www.scipy.org.


[1] The conference was also hosted by The National Biomedical Computation Resource and The Center for Advanced Computing Research. Michel Sanner and Michael Aivazis of these institutes played key roles in making the conference a reality.

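[2] E.g., using Python 2.2 (and later 2.x releases):

python -c "print (10**36)/998999"

prints a 31-digit number whose digits, read in groups of three from the right, are the first eleven Fibonacci numbers (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89).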
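The calculator trick in the footnote above works because 1000/998999 is the Fibonacci generating function x/(1 - x - x^2) evaluated at x = 1/1000, so each Fibonacci number lands in its own three-digit slot of the decimal expansion. The following is a minimal sketch of the same idea for Python 3, where integer division is written `//`; the function name `fibonacci_groups` is a hypothetical helper introduced for illustration and is not part of the original one-liner.

```python
def fibonacci_groups(power=36, divisor=998999):
    """Split 10**power // divisor into three-digit groups.

    Hypothetical helper for illustration. With the default arguments the
    groups are the leading Fibonacci numbers, because the decimal
    expansion of 1/998999 stores one Fibonacci number per three digits.
    """
    quotient = 10**power // divisor           # floor division, as in Python 2's "/"
    digits = str(quotient)
    padded_width = -(-len(digits) // 3) * 3   # round length up to a multiple of 3
    digits = digits.zfill(padded_width)       # left-pad so groups align from the right
    return [int(digits[i:i + 3]) for i in range(0, len(digits), 3)]


if __name__ == "__main__":
    print(10**36 // 998999)
    print(fibonacci_groups())
```

The pattern only holds while each Fibonacci number fits in three digits; once the terms reach four digits they carry into the neighboring group, so a larger power would need a divisor with wider slots (for example 99989999 for four-digit groups).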

Editorial

By Reg. Charney

Sun ONE Conference

In my last editorial, I reported on my impressions of LinuxWorld West. Typical of this economic climate, the show was smaller than in previous years, but the quality of attendees was “higher” in terms of being decision makers for the companies that they represent. Also, management seems to be more willing than ever to investigate and invest in Linux. At the same time, I also got a positive feeling about IBM’s commitment to Linux.

I wanted to bring this up because the contrast to what I saw at the Sun ONE conference was pretty stark. Historically, Sun has been a Unix shop. Their Solaris operating system is very well respected, in many cases setting the standard for Unix variants. They have also dominated the Unix server world. Included in Sun’s stable of software is StarOffice, the unofficial Unix answer to Microsoft’s Office suite. At this conference, Sun introduced its first Linux-based computer, the LX50. It is a low-end x86-based 1U server that is designed to run a variant of Linux. The Linux version is based on Red Hat 7.2 with significant changes.

Ok, Sun has joined the Intel and Linux camps, but how committed are they to Intel or Linux? My impression is that Sun has no real commitment to either. At the moment, it is a marketing necessity. I have based my conclusion on several things. While the LX50 is an impressive machine, it is only one machine, not a line of machines. There was no indication that the LX50 was the first of a series. Based on the success of the LX50, Sun may come out with other models, but there is no momentum present. The second reason that I question Sun’s commitment to the Open Source movement is the license that comes with StarOffice 6.0. Many companies bundle Open Source programs and legitimately charge for the bundle. However, the software is still covered by the GPL. The license for Sun’s StarOffice is very restrictive.

Humor

Managerium

New Heavy Element Discovered: Managerium

A major research institution has recently announced the discovery of the heaviest element yet known to science. This new element has been tentatively named “Managerium.” Managerium has 1 neutron, 125 assistant neutrons, 75 deputy neutrons, and 111 assistant deputy neutrons, giving it an atomic mass of 312.

These 312 particles are held together by forces called morons, which are surrounded by vast quantities of lepton-like particles called peons. Since Managerium has no electrons, it is inert. However, it can be detected as it impedes every reaction with which it comes into contact.

A minute amount of Managerium causes one reaction to take over 4 days to complete when it would normally take less than a second.

Managerium has a normal half-life of 3 years; it does not decay but instead undergoes a reorganization in which a portion of the assistant neutrons and deputy neutrons exchange places. In fact, Managerium’s mass will actually increase over time, since each reorganization causes some morons to become neutrons, forming isodopes.

This characteristic of moron-promotion leads some scientists to speculate that Managerium is formed whenever morons reach a certain quantity in concentration. This hypothetical quantity is referred to as “Critical Morass.”

You will know it when you see it.

Trends

By Reg. Charney

The Slowdown Continues

Without a clear picture of where the economy is going, we continue to have contraction. (See Figure #1.) The number of job openings continues to shrink, and the demand for Internet-specific skills seems to be the worst hit. At this point I would also like to note that I have been collecting these figures from various Web sites, with www.dice.com being one of the principal sites. Because we depend on the Web, there is a level of distortion in these figures.

I believe that the Net is the medium of last resort for posting job openings. Thus, the numbers we see here are a reflection of hiring practices for jobs that can’t be filled easily, quickly or locally. At the moment, it is a buyer’s market and employers are being deluged with applicants. This makes it easier to hire using the preferred techniques, such as hiring friends of current employees and candidates recommended by sources that employers respect. Also, with the ready supply of talent that is now available, employers can afford to be picky about where they advertise for jobs. This is used to control the volume of applicants they consider. For example, posting a job on the internal job site rather than on the Web can make a difference of one or two orders of magnitude in the number of responses that the HR department needs to deal with. It is also my personal opinion that most HR departments have never been comfortable with the Net as a means of recruiting.

Figure #2: Job Openings for Device Driver Developers in SV

Total   UNIX   Linux   Windows   Win2K   WinNT   WinXP
1,201    432     173       323      12     141       2
1,133    401     169       358      10     154       1
  978    336     141       294      14     131       9
  922    309     124       308      15     117      16
  934    348     124       292      10     106      17
  861    312     131       273      13      97      14
  760    286     117       252      12      93      11
  843    298     117       268      11      86      10
  748    274     113       242      11      79       8
  555    207      71       185      16      62      15
   76     72      18        43       2      11       6

Figure #2 represents actual numbers of job openings for device driver developers in Silicon Valley since we started tracking the appearance of Windows XP. Relative to the total number of job openings in the Valley, only device driver jobs for Windows XP and NT have increased. That is, this month's 6/76 (about 7.9%) is an increase over last month's 15/555 (about 2.7%). Unfortunately, this month only represents 6 actual jobs. Thus, with so few openings, it is hard to say the result shown in Figure #3 is a trend.
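The Windows XP comparison in the paragraph above is a simple ratio of openings to total openings. As a minimal sketch, the arithmetic can be checked in Python using only the four numbers quoted in the text; the helper name `share` is hypothetical and introduced here purely for illustration.

```python
def share(openings, total):
    """Return openings as a percentage of total, rounded to one decimal place."""
    return round(100.0 * openings / total, 1)


# Figures quoted in the text: Windows XP openings out of all openings.
last_month = share(15, 555)
this_month = share(6, 76)

if __name__ == "__main__":
    print("Windows XP share last month: %.1f%%" % last_month)
    print("Windows XP share this month: %.1f%%" % this_month)
    print("Share increased:", this_month > last_month)
```

With a base of only 76 openings, a single additional or missing job moves the share by more than a percentage point, which is why the text is cautious about calling this a trend.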