```
heaven:~> tar xzvf foo-1.2.3.tar.gz
heaven:~> ls foo-1.2.3/
bin     configure   doc      include  Makefile  src
build   COPYING     etc      INSTALL  MANIFEST  test
config  COPYRIGHTS  HISTORY  lib      README    UPDATE
heaven:~>
```
If you haven’t used *nix much, this is a typical list of the files and directories that come with a program’s source distribution. Most programs have a README file. Other common files include ChangeLog, NEWS, and AUTHORS. Also, some programs use different names for the same type of file, such as LICENSE, COPYING, and COPYRIGHT.
I was curious to see how common each file is, so I looked at 412 programs that ship with Red Hat Linux 9 and calculated some basic statistics. Here’s how often each file appears, grouped by type:
Filename | Percent of projects with this file | Percent of projects with this type of file |
---|---|---|
README | 73% | 75% |
MANUAL | 1% | |
USAGE | 0% | |
COPYING | 49% | 59% |
LICENSE | 5% | |
LICENCE | 1% | |
License | 0% | |
COPYRIGHT | 3% | |
Copyright | 2% | |
ChangeLog | 41% | 56% |
CHANGES | 9% | |
Changelog | 1% | |
CHANGELOG | 1% | |
Changes | 0% | |
changelog | 0% | |
NOTES | 1% | |
RELNOTES | 1% | |
VERSION | 1% | |
RELEASE | 0% | |
NEWS | 39% | 42% |
ANNOUNCE | 2% | |
WHATSNEW | 0% | |
WhatsNew | 0% | |
announce | 0% | |
AUTHORS | 33% | 42% |
THANKS | 5% | |
CREDITS | 3% | |
MAINTAINERS | 0% | |
TODO | 24% | 24% |
ToDo | 0% | |
INSTALL | 12% | |
Install | 0% | |
BUGS | 5% | 7% |
PROBLEMS | 1% | |
Problems | 0% | |
TROUBLESHOOTING | 0% | |
FAQ | 4% | 4% |
HACKING | 2% | 2% |
HISTORY | 1% | 1% |
PROJECTS | 1% | 1% |
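One way to gather counts like these is sketched below in Python. It assumes every tarball has already been unpacked into its own directory under a common root, and it looks only at top-level names, matched case-sensitively so that ChangeLog and Changelog are counted separately. The grouping in `FILE_TYPES` is an assumption that follows the rows of the table (the smaller groups are left out for brevity), and `survey`, `list_projects`, and `report` are hypothetical helper names, not part of any existing tool.

```python
import os
from collections import Counter

# Assumed grouping of equivalent names, following the rows of the table above.
# Smaller groups (TODO, INSTALL, BUGS, ...) are left out to keep this short.
FILE_TYPES = {
    "readme": ["README", "MANUAL", "USAGE"],
    "license": ["COPYING", "LICENSE", "LICENCE", "License", "COPYRIGHT", "Copyright"],
    "changelog": ["ChangeLog", "CHANGES", "Changelog", "CHANGELOG", "Changes",
                  "changelog", "NOTES", "RELNOTES", "VERSION", "RELEASE"],
    "news": ["NEWS", "ANNOUNCE", "WHATSNEW", "WhatsNew", "announce"],
    "authors": ["AUTHORS", "THANKS", "CREDITS", "MAINTAINERS"],
}


def survey(projects):
    """Count names and types over `projects`, a list of sets of top-level file names."""
    name_counts = Counter()
    type_counts = Counter()
    for files in projects:
        for type_name, names in FILE_TYPES.items():
            present = [n for n in names if n in files]
            name_counts.update(present)
            if present:
                # A project counts once per type, even if it ships several names.
                type_counts[type_name] += 1
    return name_counts, type_counts, len(projects)


def list_projects(root):
    """Collect the top-level file names of each project directory under `root`."""
    projects = []
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path):
            # Membership tests against os.listdir output are case-sensitive.
            projects.append(set(os.listdir(path)))
    return projects


def report(projects):
    """Print the percent of projects with each name and with each type."""
    name_counts, type_counts, total = survey(projects)
    if total == 0:
        print("no projects found")
        return
    for type_name, names in FILE_TYPES.items():
        with_type = type_counts[type_name]
        print(f"{type_name}: {100 * with_type / total:.0f}% of projects")
        for n in names:
            pct = 100 * name_counts[n] / total
            # Share of the projects having this type that use this particular name.
            share = 100 * name_counts[n] / with_type if with_type else 0
            print(f"  {n:<12}{pct:4.0f}%   {share:3.0f}% of this type")
```

For example, `report(list_projects("sources"))`, where `sources` is a hypothetical directory holding one unpacked tree per program. Counting each project at most once per type is what makes the second column of the table a separate figure rather than a plain sum of the rows above it.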
It’s not surprising that README is by far the most common file. I was surprised, though, at the number of different names for the same type of file, especially for license and changelog files. Still, it’s reassuring that the most common names, COPYING and ChangeLog respectively, account for most of their types: by the figures above, about 83% of projects with a license file use COPYING (49% out of 59%), and about 73% of projects with a changelog-type file use ChangeLog (41% out of 56%). For license files specifically, COPYING is the GNU standard. (Personally, I prefer the more straightforward LICENSE.)
Judging from this lineup, a de facto standard set of files would include README, COPYING, ChangeLog, NEWS, and for larger projects, AUTHORS.
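Also, note that the percentages for the individual names within a type don’t always sum exactly to that type’s total. This is due to rounding, and because a project that ships two names of the same type (say, both COPYING and LICENSE) counts only once toward that type.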
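To check a single unpacked tree against the de facto set suggested above, a small Python sketch is enough. The names `CORE_FILES`, `LARGE_PROJECT_FILES`, `missing_files`, and `check_tree` are hypothetical, and the check assumes exact, case-sensitive file names at the top level of the tree.

```python
import os

# The de facto set suggested above; AUTHORS is only expected of larger projects.
CORE_FILES = ["README", "COPYING", "ChangeLog", "NEWS"]
LARGE_PROJECT_FILES = ["AUTHORS"]


def missing_files(present, large=False):
    """Return the expected names that are absent from the set `present`."""
    wanted = CORE_FILES + (LARGE_PROJECT_FILES if large else [])
    return [name for name in wanted if name not in present]


def check_tree(project_dir, large=False):
    """Check the top level of an unpacked source tree; names must match exactly."""
    return missing_files(set(os.listdir(project_dir)), large)
```

Applied to the foo-1.2.3 listing at the top of this post, `check_tree("foo-1.2.3")` would report `['ChangeLog', 'NEWS']`, assuming that listing is complete. Because the match is exact, a tree that ships LICENSE instead of COPYING is flagged as missing COPYING; a looser check could accept any name from the same type.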
This was inspired by Eric Raymond’s new book *The Art of Unix Programming*, which discusses best practices for releasing Open Source software. The “Good distribution-making practice” section of his Software Release Practice HOWTO is also very relevant.