By Reg Charney
Requirements
I recently needed to append a series of files to a 50MB output
file. It is possible to combine such files using some editors, the
cat command, or shell redirection, but most of these tools have a
significant drawback: they require 50MB or more of memory or
temporary disk space. This was a problem for me because I did not
have the extra 50MB to spare.
Feature Set
I modeled this program on the Unix mv and cp commands. A list of
input files is followed by a target file. The input files are
appended to the target file. If the target file does not exist, it
is created. If an input file does not exist or cannot be read, it is
ignored. I also decided to write this program in C++ using
object-oriented techniques.
Problems to Overcome
There are a few “ugly” things about most implementations (you
may have others ;-). One of them is the need to expose the buffer
and its size as shown in the main loop pseudo-code below:
open the output file
for each input file
{
    open input file
    iSZ = size of input file
    allocate char[iSZ] buffer
    read file into buffer
    write output from buffer
    close input file
    free buffer
}
This doesn’t appeal since it exposes implementation details at
too high a level. A second problem is less obvious. What happens if
one of the input files is also the output file? Do we get an
infinitely large file? The third problem concerns the size of the
files. The fragment above is a simple implementation that assumes
an input file of any size can be read into memory. Of course, this
is naïve. Ideally, this should not be a concern at the highest
level of the program.
#include <stdio.h>
#include <fstream.h>
#include <memory>

const int MAXBUF = 1024;    // largest chunk read at one time

// Buffer pointer and size shared by all input and output files
class FBuf {
protected:
    static char *bp;
    static int bSZ;
};
char* FBuf::bp = 0;
int FBuf::bSZ = 0;
class OFile : public FBuf {
    ofstream oF;
    char* oFN;
public:
    OFile(char* oFN_) : oFN(oFN_)
    {
        // open for append, with no sharing (exclusive use)
        oF.open(oFN,
                ios::app|ios::binary,
                filebuf::sh_none);
        if (!oF)
            throw oFN;
    }
    ~OFile()
    {
        oF.close();
    }
    // write the shared buffer to the output file, then free it
    void write()
    {
        if (oF.is_open())
        {
            oF.write(bp, bSZ);
            delete [] bp;
            bp = 0;
            if (!oF)
                throw oFN;
        }
    }
};
class IFile : public FBuf {
    char* iFN;
    ifstream iF;
    int fSZ;    // size of the input file
    int iP;     // position reached in the input file
public:
    IFile(char* iFName)
        : iFN(iFName),
          fSZ(0), iP(0)
    {
        iF.open(iFN,
                ios::in|ios::binary);
        if (!iF)
            throw iFN;
        // seek to the end to find the file size, then rewind
        iF.seekg((streampos)0,
                 ios::end);
        fSZ = iF.tellg();
        iF.seekg((streampos)0,
                 ios::beg);
        if (!iF || fSZ == 0)
            throw iFN;
        bSZ = (fSZ < MAXBUF) ?
              fSZ : MAXBUF;
    }
    ~IFile()
    {
        delete [] bp;
        bp = 0;
        iP = 0;
        iF.close();
    }
    // true while there is unread input
    operator bool()
    {
        return (iP < fSZ);
    }
    // read the next chunk into a newly allocated shared buffer
    char* read()
    {
        try
        {
            int oldP = iP;
            int newP = iP + bSZ;
            iP = (newP < fSZ) ? newP : fSZ;
            bSZ = iP - oldP;
            if (bSZ)
            {
                bp = new char[bSZ];
                if (!bp)
                    throw iFN;
                iF.read(bp, bSZ);
            }
        }
        catch (char *s)
        {
            printf("Error processing input file "
                   "%s\n", s);
            throw;
        }
        catch (...)
        {
            ; // ignore all else
        }
        return bp;
    }
    int sz()
    {
        return iF.gcount();
    }
};
int main(int argc, char* argv[])
{
    // check cmd line args
    if (argc < 3)
    {
        printf("\n\nUsage: append in-file1 "
               "[, in-file2 ]... "
               "output-file\n\n");
        return 1;
    }
    OFile oF(argv[argc-1]);
    // loop over each argument, except the last arg
    for (int i=1;i<argc-1;i++)
    {
        try
        {
            IFile iF(argv[i]);
            while (iF)
            {
                iF.read();
                oF.write();
            }
        }
        catch (char *s)
        {
            printf("Error in file '%s'\n", s);
        }
        catch (...) // catch all
        {
            continue;
        }
    }
    return 0;
}
Problem Solutions
We can use the fact that this application must read and write the
input files in order. That is, only one input file is open at any
given time. Thus, it is valid to use a static buffer pointer and
size for all files. Also, since both the current input file and the
output file can use the same buffer, we have defined OFile and IFile
as descending from FBuf. FBuf contains two protected static data
members. Thus, every instance of IFile and the instance of OFile
share the same buffer pointer and the size of the data currently in
the buffer. This eliminates the need to expose the implementation in
the main function.
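As a minimal sketch of this sharing, separate from the program above: the names SharedBuf, Reader, and Writer are hypothetical, and a std::vector stands in for the raw pointer and size. Because the member is static, a chunk stored through one derived object is visible through the other.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical names, for illustration only.
class SharedBuf {
protected:
    static std::vector<char> buf;   // one buffer for every derived object
};
std::vector<char> SharedBuf::buf;

class Reader : public SharedBuf {
public:
    // Stand-in for reading a chunk of a file: fill the shared buffer.
    void fill(const char* data, std::size_t n) { buf.assign(data, data + n); }
};

class Writer : public SharedBuf {
public:
    // Stand-in for writing the chunk: report how many bytes were
    // pending, then empty the shared buffer.
    std::size_t drain()
    {
        std::size_t n = buf.size();
        buf.clear();
        return n;
    }
};
```

A Reader and a Writer created independently still see the same bytes, so the calling code never has to pass a buffer between them.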
The second potential problem occurs if one of the input files is
also the output file. While the current implementation may read in
the whole file before trying to write the output, we could have a
problem if the input file is read in pieces because of size
constraints. The normal way of preventing input and output files
from overlapping is to compare the full UNC path of each input file
against that of the output file. However, there is a simpler
solution in this case. The output file is opened for exclusive use.
Thus, any attempt to open the output file for input will fail. Here
that is fine, because an input file that cannot be opened is simply
skipped.
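For comparison, the path-comparison approach might be sketched with the C++17 std::filesystem library, which postdates this program. The helper name same_file is hypothetical; std::filesystem::equivalent is the standard call that reports whether two paths resolve to the same file.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Returns true when both paths name the same existing file, even if
// the paths are spelled differently. Returns false if either path is
// missing or cannot be examined.
bool same_file(const fs::path& a, const fs::path& b)
{
    std::error_code ec;                    // avoid exceptions on bad paths
    bool same = fs::equivalent(a, b, ec);
    return !ec && same;
}
```

A caller could skip any input file for which same_file(input, output) returns true.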
Lastly, we may encounter files that are too large to fit in
memory. Since the details with respect to file size are hidden in
FBuf, IFile, and OFile, the main program can basically ignore these
details. The only concession to the possibility of large files is
the inner loop, which reads and writes as long as there is input
left in the input file. The constant MAXBUF can be set to any
positive value. If the size of the file is less than MAXBUF, the
whole file will be read in a single pass.
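The inner loop can also be sketched in standard C++ without the shared static buffer. This is an illustration under assumptions, not the program above: append_file and its chunk parameter are hypothetical names, and the sketch does not open the output exclusively, so by itself it does not guard against an input file that is also the output file.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Appends the contents of in_name to out_name, reading at most
// `chunk` bytes at a time. Returns the number of bytes copied.
std::size_t append_file(const std::string& in_name,
                        const std::string& out_name,
                        std::size_t chunk = 1024)
{
    if (chunk == 0)
        return 0;
    std::ifstream in(in_name, std::ios::binary);
    if (!in)
        return 0;                        // unreadable input is skipped
    std::ofstream out(out_name, std::ios::binary | std::ios::app);
    if (!out)
        return 0;

    std::vector<char> buf(chunk);
    std::size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = in.gcount();   // short on the last chunk
        if (got <= 0)
            break;
        out.write(buf.data(), got);
        if (!out)
            break;
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

Memory use is bounded by the chunk size regardless of how large the input file is, which is the same property MAXBUF gives the program above.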
By Reg. Charney
To a hammer everything looks like a nail. As technologists, we
believe any problem can be solved using technology. Unhappily, this
is not true. We need to be aware of our environment. That is why Dan
Gillmor's presentation on Intellectual Property rights was so
important. As he outlined, not only will you need to solve problems,
but you may now need to solve them in unique and/or non-optimum ways
if IP issues keep locking up innovative or trivial ways of doing
things. Not only will your manager control your code production, so
will the company lawyers. They don’t know anything about
programming and couldn't care less. Also, think of all the paperwork
you will need to do to protect your code as part of the company's IP
portfolio. But you don’t need to worry — all this is not
technical! You don’t need to get involved, join the Electronic
Frontier Foundation (www.eff.org)
or join groups like DVD-DISCUSS (http://eon.law.harvard.edu/archive/dvd-discuss/)
to see what you can do to protect yourself and those you love.
Unless you have been living in a cave for the last couple of
years, you know that your rights to “fair use” of CDs and DVDs
that you have purchased for your personal use in your own home have
been restricted by a court that believes that antiquated revenue
models from media corporations take precedence over your “fair use”
rights. To the rescue comes art! See the sites listed in http://web.lemuria.org/DeArt/Sep/
for art examples. Art also comes to the rescue to allow you to play
DVDs on any player, including your computer. See how at http://www.theregister.co.uk/content/54/25274.html.
We are always looking for articles. If you are interested in
writing about your scripts or applications in other languages, we
would be happy to consider publishing them.
Standard C++ IOStreams and Locales
by Angelika Langer and Klaus Kreft, Addison-Wesley, 2000,
ISBN 0-201-18395-1
This book is aimed at advanced C++ programmers. Half of the book
is an extremely detailed examination and explanation of IOStreams,
internationalization and locales. The other half is a reference
guide.
The authors state their goal as to “focus on the underlying
concepts and the more advanced programming techniques that IOStreams
and locales support.” They succeed at this, covering these topics
thoroughly, providing both a good overview and a wealth of low level
details. I was glad to see eight pages devoted to “The Stream
State”, an important topic for IOStreams users, but one that is
often omitted. They provide many useful tips not seen elsewhere, and
point out traps and pitfalls.
I was able to find only one minor error that was not already
listed in the errata on the authors' web site. For such a complex
and detailed book, this is quite impressive.
The index was shorter than I expected in a book of this scope,
but in practice I was able to find what I needed in the index, so it
seems sufficient.
I don't know of any other book or reference material that covers
IOStreams in nearly this complete and thorough a manner.
However, locales are also covered in Appendix D of Stroustrup’s
“The C++ Programming Language” (also available from Stroustrup’s
web site), which is a 64-page tutorial description of locales.
Advanced usage of locales is sufficiently complex that anyone
attempting it should refer to both this book and to Stroustrup's
appendix.
The writing is precise, yet readable. This book is essential for
all interested in advanced usage of IOStreams or of locales.
— Wayne Vucenic
A colleague of mine once programmed on-board satellite
controllers. On the project was a mechanical engineer whose job
was to track the weight of the payload. Every week he would ask “How
much does the software weigh?” And every week my friend would say
“Nothing, go away.” Then one week the payload petty officer
arrived with a large box of Hollerith cards, indignantly proclaiming
“These cards weigh over eight pounds!” My friend said “You
don't understand. The software isn't the cards, it's the holes.”
The still-mystified mechanical engineer left, never to ask about the
weight of software again.
— from Greg Colvin’s email signature
May I have the serenity to accept the things I cannot change, the
strength to change the things I cannot accept, and the cunning to
hide the bodies of those who got in my way.
— D. C. Sessions’ email signature
[Ed. If you come across signatures that you think are humorous,
send me (editor@accu-usa.org)
a line. Other readers may enjoy them too.]
By Ali Çehreli
The total number of available jobs nationwide has increased, but
the jobs in the valley, as well as the software jobs both nationwide
and in the valley, have dropped (Figure 1). Even though the number of
jobs for almost all programming languages has dropped in the
valley, C++ came out as the most in demand because it dropped less
than the others (Figure 2).


Among our data points, Smalltalk and Lisp/CLOS have both
increased their figures from no jobs to only one job this month.
Delphi has been another language with good performance: up to 8 jobs
from last month's 4. Obviously these figures are too small to be
considered phenomenal.