By Reg Charney
I recently needed to append a series of files to a 50MB output file. It is possible to combine these files using some editors, the cat command, or redirection, but most of these tools have a significant drawback: they require 50MB or more of memory or temporary disk space. This was a problem for me because I did not have the extra 50MB to spare.
I modeled this program on the Unix mv and cp commands. A list of input files is followed by a target file. The input files are appended to the target file. If the target file does not exist, it is created. If an input file does not exist or cannot be read, it is ignored. I also decided to write this program in C++ using object-oriented techniques.
There are a few “ugly” things about most implementations (you may have others ;-). One of them is the need to expose the buffer and its size as shown in the main loop pseudo-code below:
    open the output file
    for each input file {
        open input file
        iSZ = size of input file
        allocate char[iSZ] buffer
        read file into buffer
        write output from buffer
        close input file
        free buffer
    }
This doesn’t appeal, since it exposes implementation details at too high a level. A second problem is less obvious. What happens if one of the input files is also the output file? Do we get an infinitely large file? The third problem concerns the size of the files. The fragment above is a simple implementation that assumes an input file of any size can be read into memory. Of course, this is naïve. Ideally, this should not be a concern at the highest level of the program.
    #include <stdio.h>
    #include <fstream.h>
    #include <memory>

    const int MAXBUF = 1024;

    class FBuf { ... };
    class OFile : public FBuf { ... };
    class IFile : public FBuf { ... };

    int main(int argc, char* argv[])
    {
        OFile oF(argv[argc-1]);
        // loop over each argument, except the last argument
        ...
        return 0;
    }
We can use the fact that this application must read and write the input files in order. That is, only one input file is opened at a given time. Thus, it is valid to use a static buffer pointer and size for all files. Also, since both the current input file and the output file can use the same buffer, we have defined OFile and IFile as descending from FBuf. FBuf contains two protected static data members. Thus, every instance of IFile and the instance of OFile share the same buffer pointer and the size of the current input file. This eliminates the need to expose the implementation in the main function.
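The shared-buffer design described above can be sketched roughly as follows. This is a minimal illustration, not the original listing: the class names FBuf, IFile and OFile come from the article, but the member names (buf, len, ok, read, write) and the helper appendFiles are assumptions, and a std::vector stands in for a raw buffer pointer. It reads each input whole and does not open the target for exclusive use.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Base class: one buffer shared by every file object (member names assumed).
class FBuf {
protected:
    static std::vector<char> buf;  // contents of the current input file
    static std::size_t len;        // number of valid bytes in buf
};
std::vector<char> FBuf::buf;
std::size_t FBuf::len = 0;

// Input side: fills the shared buffer from one file.
class IFile : public FBuf {
    std::ifstream in;
public:
    explicit IFile(const std::string& name) : in(name, std::ios::binary) {}
    bool ok() const { return in.is_open(); }
    void read() {
        buf.assign(std::istreambuf_iterator<char>(in),
                   std::istreambuf_iterator<char>());
        len = buf.size();
    }
};

// Output side: drains the shared buffer, appending to the target file.
class OFile : public FBuf {
    std::ofstream out;
public:
    explicit OFile(const std::string& name)
        : out(name, std::ios::binary | std::ios::app) {}
    bool ok() const { return out.is_open(); }
    void write() {
        assert(len <= buf.size());
        out.write(buf.data(), static_cast<std::streamsize>(len));
        len = 0;
    }
};

// Hypothetical driver: append every readable input to the target.
// Returns the number of inputs appended, or -1 if the target cannot be opened.
int appendFiles(const std::vector<std::string>& inputs,
                const std::string& target) {
    OFile oF(target);
    if (!oF.ok()) return -1;
    int appended = 0;
    for (const std::string& name : inputs) {
        IFile iF(name);
        if (!iF.ok()) continue;  // unreadable inputs are ignored
        iF.read();
        oF.write();
        ++appended;
    }
    return appended;
}
```

Note that the driver never mentions the buffer or its size; both travel between IFile and OFile through the static members of FBuf.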
The second potential problem occurs if one of the input files is also the output file. While the current implementation may read in the whole file before trying to write the output, we could have a problem if the input file is read in pieces because of size constraints. The normal way of preventing input and output files from overlapping is to compare the full UNC of each input file against the UNC of the output file. However, there is a simpler solution in this case. The output file is opened for exclusive use. Thus, any attempt to open the output file for input will fail. In this case, this is fine.
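Exclusive opening depends on the platform, so it is not sketched here. For comparison, the path-comparison approach mentioned above might look like this with the C++17 std::filesystem library, which postdates this article. The function names sameFile and dropTarget are hypothetical. std::filesystem::equivalent asks the file system whether two names resolve to the same object, which is more robust than comparing path strings.

```cpp
#include <cassert>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True when both names refer to the same existing file-system object.
// Any error (for example, a name that does not exist) counts as "not the same".
bool sameFile(const std::string& a, const std::string& b) {
    std::error_code ec;
    bool same = fs::equivalent(a, b, ec);
    return !ec && same;
}

// Hypothetical helper: keep only the inputs that are not the target itself.
std::vector<std::string> dropTarget(const std::vector<std::string>& inputs,
                                    const std::string& target) {
    assert(!target.empty());
    std::vector<std::string> kept;
    for (const std::string& name : inputs)
        if (!sameFile(name, target)) kept.push_back(name);
    return kept;
}
```

If the target does not exist yet, no input can be the same file, so every input is kept.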
Lastly, we may encounter files that are too large to fit in memory. Since the details with respect to file size are hidden in FBuf, IFile, and OFile, the main program can basically ignore these details. The concession to the possibility of large files is the inner loop that will read and write as long as there is input in the input file. The constant MAXBUF can be set to any value. If the size of the file is less than MAXBUF, the whole file will be read in just once.
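The inner loop described above could be sketched as follows. The constant MAXBUF is the article's; the function name appendChunked and its parameters are assumptions made for illustration. The loop reads at most one chunk at a time and writes exactly the number of bytes actually read, so a file smaller than the chunk size is handled in a single pass.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

const std::size_t MAXBUF = 1024;  // same name as the article's constant

// Append one input file to an already open output stream, one chunk at a time.
// Returns the number of bytes copied; an unreadable input copies nothing.
std::size_t appendChunked(const std::string& name, std::ostream& out,
                          std::size_t chunk = MAXBUF) {
    assert(chunk > 0);
    std::ifstream in(name, std::ios::binary);
    if (!in) return 0;  // missing or unreadable input is ignored
    std::vector<char> buf(chunk);
    std::size_t total = 0;
    for (;;) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();  // bytes delivered by this read
        if (got <= 0) break;                // nothing left in the input
        out.write(buf.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Memory use stays fixed at the chunk size, however large the input file is.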
By Reg. Charney
To a hammer everything looks like a nail. As technologists, we believe any problem can be solved using technology. Unhappily, this is not true. We need to be aware of our environment. That is why Dan Gillmor's presentation on Intellectual Property rights was so important. As he outlined, not only will you need to solve problems, but you may now need to solve them in unique and/or non-optimum ways if IP issues keep locking up innovative or trivial ways of doing things. Not only will your manager control your code production, so will the company lawyers. They don’t know anything about programming and couldn't care less. Also, think of all the paperwork you will need to do to protect your code as part of the company's IP portfolio. But you don’t need to worry: all this is not technical! You don’t need to get involved, join the Electronic Frontier Foundation (www.eff.org), or join groups like DVD-DISCUSS (http://eon.law.harvard.edu/archive/dvd-discuss/) to see what you can do to protect yourself and those you love.
Unless you have been living in a cave for the last couple of years, you know that your rights to “fair use” of CDs and DVDs that you have purchased for your personal use in your own home have been restricted by a court that believes that antiquated revenue models from media corporations take precedence over your “fair use” rights. To the rescue comes art! See the sites listed in http://web.lemuria.org/DeArt/Sep/ for art examples. Art also comes to the rescue to allow you to play DVDs on any player, including your computer. See how at http://www.theregister.co.uk/content/54/25274.html.
We are always looking for articles. If you are interested in writing about your scripts or applications in other languages, we would be happy to consider publishing them.
Standard C++ IOStreams and Locales by Angelika Langer and Klaus Kreft, Addison-Wesley, 2000, ISBN 0-201-18395-1
This book is aimed at advanced C++ programmers. Half of the book is an extremely detailed examination and explanation of IOStreams, internationalization and locales. The other half is a reference guide.
The authors state their goal as to “focus on the underlying concepts and the more advanced programming techniques that IOStreams and locales support.” They succeed at this, covering these topics thoroughly, providing both a good overview and a wealth of low level details. I was glad to see eight pages devoted to “The Stream State”, an important topic for IOStreams users, but one that is often omitted. They provide many useful tips not seen elsewhere, and point out traps and pitfalls.
I was able to find only one minor error that was not already listed in the errata on the authors' web site. For such a complex and detailed book, this is quite impressive.
The index was shorter than I expected in a book of this scope, but in practice I was able to find what I needed in the index, so it seems sufficient.
I don't know of any other book or reference material that covers IOStreams in nearly this complete and thorough a manner.
However, locales are also covered in Appendix D of Stroustrup’s “The C++ Programming Language” (also available from Stroustrup’s web site), which is a 64 page tutorial description of locales. Advanced usage of locales is sufficiently complex that anyone attempting it should refer to both this book and to Stroustrup's appendix.
The writing is precise, yet readable. This book is essential for all interested in advanced usage of IOStreams or of locales.
— Wayne Vucenic
A colleague of mine once programmed on-board satellite controllers. On the project was a mechanical engineer whose job it was to track the weight of the payload. Every week he would ask “how much does the software weigh.” And every week my friend would say “Nothing, go away.” Then one week the payload petty officer arrived with a large box of Hollerith cards, indignantly proclaiming “These cards weigh over eight pounds!” My friend said “You don't understand. The software isn't the cards, it's the holes.” The still-mystified mechanical engineer left, never to ask about the weight of software again.
— from Greg Colvin’s email signature
May I have the serenity to accept the things I cannot change, the strength to change the things I cannot accept, and the cunning to hide the bodies of those who got in my way.
— D. C. Sessions’ email signature
[Ed. If you come across signatures that you think are humorous, send me (editor@accu-usa.org) a line. Other readers may enjoy them too.]
By Ali Çehreli
The total number of available jobs nationwide has increased, but the number of jobs in the valley, as well as the number of software jobs both nationwide and in the valley, has dropped (Figure 1). Even though the number of jobs for almost all programming languages has dropped in the valley, C++ came out as the most in demand because its drop was smaller (Figure 2).


Among our data points, SmallTalk and Lisp/Clos have both increased their figures from no jobs to only one job this month. Delphi has been another language with good performance: up to 8 jobs from the last month's 4. Obviously these figures are too small to be considered phenomenal.