Things-softs-and-all: December 2009

Tuesday, December 29, 2009

Python

PART I PYTHON MIND, BEGINNER'S MIND



		Hour 1 What Is Python?
		Hour 2 The Python Interpreter
		Hour 3 Basic Arithmetic with Python
		Hour 4 Variables and Control Flow
		Hour 5 Basic Data Types I: The Numeric Data Types
		Hour 6 Basic Data Types II: Sequence and Dictionary
		Hour 7 Functions and Modules
		Hour 8 Useful Miscellany



		Hour 1 What Is Python?



		Beautiful is better than ugly. —Tim Peters



		The first hour of this book introduces Python: what it is, its history, and what it is and isn't good for. You won't even need a computer for this part unless you do the exercises that call for reading a paper on a Web site.



		Why Program? Why Program in Python?



		If all you want to do with a computer is balance your checkbook, you do not need to know how to program. Better tools are available—such as pencil, paper, and calculator. And if all you use your computer for is word processing and page layout, again, you don't need to learn programming; lots of programs are available that do what you want extremely well.



		But if no software is available that does what you need, or if what exists is unsatisfactory, the only answer is to roll your own. This simple principle has probably led to more programming breakthroughs, and better software, than any other. Linux is the perfect example; Linus Torvalds, unhappy with existing implementations of UNIX for PCs, decided to write his own version. Today, Linux is popular enough to worry Bill Gates and Microsoft.



		UNIX was developed at AT&T's laboratory in Murray Hill, New Jersey, in the early '70s. A powerful multiuser operating system, it was the brainchild of Dennis Ritchie, Brian Kernighan, and Ken Thompson, who had spare time and a spare computer that no one else wanted to use. Even this unwanted computer was much too expensive for home hobbyists until the late '80s. Those of us who used UNIX in our everyday jobs looked at the feeble operating systems available for PCs and just laughed. We were spoiled by our "big iron." In 1987, however, Andrew Tanenbaum developed a very small UNIX-like operating system that would run on home PCs; he called it Minix. Linus Torvalds, working on Minix, later wrote a more portable and more useful Minix-inspired system of his own called Linux, which has become at least as capable as commercial versions and runs even on very inexpensive home PCs. The major difference between Minix and Linux, in the early days, was that the licensing for Minix was more restrictive than that for Linux. The difference today is that thousands of Linux hobbyists are out there, and nearly everything that you could want to do has at least been started by someone else.



		Scientists, especially, often have needs for software that doesn't exist and frequently write their own to further their own research agendas. Although I'm not a scientist, I do have a research agenda; I find the Mayan calendar fascinating, and I spent years writing C programs to help me pursue this interest. When I found Python, I rapidly abandoned the code I'd already written and reimplemented everything using Python. The programs and libraries I ended up with are cleaner, simpler, smaller, and much more powerful, and I was able to build everything I needed in far less time than the original code took.



		Many other people find themselves having to learn at least some programming in order to automate repetitive, boring tasks. An example would be a few small programs that collect a team's weekly reports from a special directory or folder, check that everyone has updated their report, perform some simple processing to combine the individual reports into one, and print the result or email it to the team leader. I had to do something like this early in my programming career, and I succeeded, but with a great deal of hackery, using several different scripting languages. If Python had been available at the time, I could have done it in less time, with fewer lines of code, and in a single programming language.
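A minimal sketch of the report-collecting job described above, written for a modern Python 3 rather than the 1.5.2 this book targets. The folder layout (one `<member>.txt` file per person) and the function name `combine_reports` are hypothetical; emailing the result is left out.

```python
from pathlib import Path


def combine_reports(report_dir, team):
    """Combine each team member's weekly report into one text.

    Hypothetical convention: each member saves a plain-text file named
    <member>.txt in report_dir. Returns (combined_text, missing_members).
    """
    folder = Path(report_dir)
    sections = []
    missing = []
    for member in team:
        report = folder / (member + ".txt")
        if not report.is_file():
            missing.append(member)  # no report filed yet this week
            continue
        body = report.read_text().strip()
        sections.append("== " + member + " ==\n" + body)
    return "\n\n".join(sections), missing
```

Printing the combined text, or handing it to the standard library's smtplib module to mail the team leader, would complete the job.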



		Here is a list of several such repetitive tasks that I've had to deal with over the years; many of them can now be done satisfactorily in Python:



		• Collecting reports, processing them into a larger one



		• Checking URLs in a Web document for connectedness



		• Periodically making backups of important files and directories



		• Sending an automatic report by email to fool your boss into thinking you're really accomplishing something



		• Automatically drawing PERT charts from much simpler input



		• Making a list of every file in a particular directory tree, and doing different things with each file based on its suffix, or extension



		• Making lists of files in a particular order, which can be used in other programs



		• Keeping track of your video collection
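As one example from the list above, here is a sketch of walking a directory tree and doing different things with each file based on its extension. It assumes a modern Python 3; the function names and the two actions in the hypothetical `ACTIONS` table are placeholders for whatever real work you need done.

```python
import os
from collections import defaultdict


def files_by_suffix(top):
    """Walk the tree under `top` and group file paths by extension.

    Files with no extension are grouped under the empty string.
    """
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(top):
        dirnames.sort()  # visit subdirectories in a stable, sorted order
        for name in sorted(filenames):
            suffix = os.path.splitext(name)[1].lower()  # ".PY" and ".py" match
            groups[suffix].append(os.path.join(dirpath, name))
    return dict(groups)


# Hypothetical per-suffix actions; files with any other suffix are skipped.
ACTIONS = {
    ".py": lambda path: print("Python source:", path),
    ".txt": lambda path: print("Text file:", path),
}


def process_tree(top):
    for suffix, paths in files_by_suffix(top).items():
        action = ACTIONS.get(suffix)
        if action is None:
            continue
        for path in paths:
            action(path)
```

Keeping the actions in a dictionary means that handling a new kind of file is one new entry, not another branch in a chain of if statements.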



		In the past, writing your own software for special purposes meant learning a great deal of complicated and arcane syntax before even the simplest programs could be written. FORTRAN, an early but still popular language, is well suited to scientific programming because it has many useful mathematical features, but its syntax is, well, non-obvious. C, another language you've probably heard of, has many adherents because of the power it gives to the programmer, but it is not at all difficult to write tricky, almost unreadable programs with it. C programmers admit that the language encourages bad programming habits, but "FORTRAN enforces them." Python, in contrast, enforces (or at least encourages) good programming habits, and it attempts to shorten the learning curve so that the time spent learning details of the language is reduced as much as possible.



		The following are some small working programs in FORTRAN, C, and Python; they don't do much, but the traditional beginner's program in any language merely prints the phrase "Hello, World" so that you can see it.



		FORTRAN:



		PROGRAM HELLO
		PRINT *, "Hello World"
		END PROGRAM HELLO



		C:



		#include <stdio.h>

		int main(void)
		{
		    printf("Hello World\n");
		    return 0;
		}



		Python:



		print "Hello World"
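The one-line Python program above uses the print statement of Python 1.x and 2.x. In Python 3, which appeared long after this book, print became a built-in function, so a modern interpreter needs parentheses. (For a single string, this parenthesized form also runs under the older versions.)

```python
# Python 3: print is a function call
print("Hello World")
```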



		I believe that Python is very nearly the perfect first computer language; the syntax is relatively simple and unadorned, and instead of "many ways to do a thing," there is usually one obvious best way, or only a few good ways. Important features, some of them extremely useful for scientific programming, are built into the standard Python distribution instead of being add-on packages that must be tracked down and that require considerable expertise to install. For graphical programming with tkinter, you do need an extension package called Tcl/Tk, but the Windows installation package will automatically add this for you. Despite the language's basic simplicity, it allows complex and sophisticated ideas to be expressed in an intuitive way because it applies, systematically and rigorously, a concept called object-oriented programming (OOP).



		In the hours that follow, I will try to give you a solid understanding of the basics of Python programming. The first third of the book covers the most basic elements of Python. The middle third covers objects from the ground up because objects are of fundamental importance in using the full power of the language. The final third of the book covers Python's portable graphical user interface, tkinter. Also in the final third, we'll cover a bit of Web common gateway interface (CGI) programming, just to give you a hint of how useful such skills can be.



		In this book, I assume you know nothing about programming. Without the preconceptions and unnecessary information acquired from other programming languages, you will have a distinct advantage, and I hope that it will become evident that programming computers is very often much easier than it is made out to be. I do, however, assume that you can use computers and have a basic knowledge of programs such as word processors, text editors, and the command-line interface for your particular platform (DOS or your favorite shell such as ksh, csh, or bash on UNIX).



		Although I've talked about many rational reasons to program, the final point I want to make is that building programs that work is fun. There's nothing quite like the thrill of putting together some basic instructions and seeing the result run exactly the way you imagined on a computer, a machine that some people see as obstreperous and frustrating. Programming in Python, for me, has never been just a job. I hope that after you finish this book, you will have just as much fun programming as I do.



		The History of Python



		Python was developed in late 1989 by Guido van Rossum over a Christmas vacation when his research lab was closed and he had nowhere to go. He drew features from many other languages, such as ABC, Modula-3, C (at least the less controversial features), and several others. He was fond of watching Monty Python's Flying Circus on television, and when it came time to name the language, he chose Python. After use and experimentation among a small group of friends and colleagues, Python was released publicly in 1991 under a very permissive license. Unlike some other languages, Python is not only completely free, it places almost no restrictions on its use. Programs developed in the language are not required to be released into the public domain, programmers are not required to submit changes back to Guido, and programs written in Python can be sold to and by anyone without licensing fees.



		In addition to the language syntax itself, the decision to make Python freely available has been a major factor in its adoption worldwide. Other languages may have more users, but few languages can boast such a passionate user community as Python. Python may be a young programming language, but these passionate users gather for international conferences at least once a year, and sometimes more often.



		The user community has created an informal organization dedicated to supporting and expanding the use of Python: the Python Software Activity (PSA), which has about 300 individual and 30 corporate members. Individual members pay good money to support Python; corporate members pay more. Membership is strictly voluntary; no one ever has to pay a penny to use Python for any reason, so it is remarkable that so many have contributed to the language's growth.



		In the early years, one question frequently asked was, "What if Guido gets hit by a bus?" The community worried that if Guido died, so would Python. In 1998, the Python Consortium was founded as a means for ensuring the survival and growth of Python. Corporate members of the consortium pay a substantial sum of money to Guido to work on Python (and nothing else), provide other useful services to the Python community, and appoint a successor for the time he might be unwilling or unable to direct the future course of Python.



		It seems certain that Python will not just survive but will move into the twenty-first century with vigor. Guido describes himself as a "conservative programmer" who is determined that Python will change at his direction and only in directions that he thinks are necessary. A primary advantage of this outlook is that programs written in early versions of Python will continue to run unchanged, for the most part, in future versions. I began using Python when the version was 1.3, and it is now at 1.5.2; all the code I wrote still runs, without change, on the latest version. Version 1.5.2 is the basis for the examples and code snippets in this book; the next version, 1.6, is due out later in 1999 or early 2000 and should run, without change, all code given in this book. Sometime in late 2000 or early 2001, version 2.0 should go into beta, and almost all the code in this book should still run perfectly. Although the code here was written using 1.5.2, it has been tested on earlier versions, where possible. If you encounter any problems due to version differences, check the book's Web site for any corrections already noted. If you don't find a correction or revision at the Web site, write to the author with a full description of the problem.



		goto Considered Harmful



		In 1968, Edsger W. Dijkstra, one of the greats of programming, wrote a letter to the editor of Communications of the ACM in which he argued that the goto statement, a feature of virtually all programming languages at the time, had an adverse effect on programmers' thinking.



		A goto is an instruction in a computer language that tells the computer to go to another place in the program and execute the instructions found there. When those instructions are finished, the programmer must work out where the computer should be told to go to next, usually with yet another goto. Programming without gotos is called structured programming; in a structured program it is usually quite clear to the programmer and to readers what the program is doing at any one time. When a program uses goto, its actions must be painstakingly traced, with a greater chance of error.
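Python has no goto at all. As an illustration (the function and its task are hypothetical, and the code assumes Python 3), here is a search that a goto-era program might have written as a jump to a FOUND label, expressed instead with a structured loop, break, and the loop's else clause:

```python
def report_first_negative(numbers):
    """Describe where the first negative number is, if there is one.

    Goto style might read: IF X < 0 GOTO FOUND ... FOUND: ...
    Here every path through the code is visible from its indentation.
    """
    for index, value in enumerate(numbers):
        if value < 0:
            message = "first negative at index %d" % index
            break  # structured exit: control continues just after the loop
    else:
        # The else clause runs only if the loop finished without a break.
        message = "no negative numbers"
    return message
```

The only place break can send control is the end of the loop it sits in, which is exactly the kind of limit Dijkstra argued for.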



		In 1968, the vast majority of programs were written in what is now known as spaghetti code, a style marked by lots of gotos, no modularization, and few subroutines. We'll discuss subroutines, or procedures and functions, in later lessons, as well as modularization.



		I was writing COBOL programs in 1968. Like everyone else, I wrote spaghetti code; that stuff was darn hard to read even for its author, and nearly impossible for anyone else. The only structured programming constructs were of limited utility; they were hard to use, and documentation was hard to get. Often, the only way to determine the logic of a program was by referring to something called a flowchart, a specialized diagram with particular symbols for input, output, and decisions. If the author of the program didn't draw a flowchart, the only way to get one was to feed the program to another program that read the first one and drew the diagram automatically. Few programs were written from flowcharts drawn beforehand, even though that was supposed to be the correct way to write a program. Speaking from personal experience, I can truthfully say that goto does, indeed, encourage you to make a mess of your program.



		COBOL is one of the earliest programming languages; it helped popularize the use of computers back in the era of large mainframe machines, such as the IBM 360. It was invented by Admiral Grace Hopper and colleagues. Admiral Hopper is known for the saying, "It is easier to obtain forgiveness than permission." COBOL was notable for being written in human-readable, understandable words, not specialized numbers meaningful only to computers. Today, it is considered hopelessly wordy. I considered including a "Hello, World" program to demonstrate just how wordy, but it was too long.



		Both Benjamin Whorf and Noam Chomsky believe that language structures thinking; you can think only the thoughts that you have words to express. Very simply, if your language contains no future tense, you find it difficult (if not impossible) to think about events not in the past or present. Computer programming languages are good evidence for this viewpoint; if no "words" or methods exist in a computer language to express something you want to do, it is hard to think about the things you're not allowed to do. Early programming languages were monolithic constructs, perpetrated (rather than created) to further the proprietary aims of the companies that invented them. It was almost impossible to add new features to the languages because you didn't have access to the source of the language or to the creators of the language. In addition, no provision had been made to allow programmers to add anything to the language that hadn't been thought of by the designers.



		Extensibility became a goal for language designers; when Guido designed and built Python, he made it very easy to add features and modules to the language. He also left out the goto statement, providing instead many useful structured programming techniques, and he designed the language around a new set of principles: object-oriented design. Early software engineering theorists maintained that the data on which a program operated was paramount; define how your data was structured, they said, and the methods you used to manipulate it would be self-evident. Later, objects, software entities that combined both the structured data and the manipulative methods, would become even more important in programming. Guido made it easy to use objects in Python, much easier than in most other languages. Some languages grafted objects onto a primarily linear core, but Python was designed from the outset to have an object orientation. You can think about objects in Python, and we'll spend several hours on mastering the skills and techniques of object-oriented programming in the second third of the book. You'll find that when you can think about objects, it is extremely simple to implement them in Python.
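To make "objects combine data and the methods that manipulate it" concrete, here is a deliberately tiny class, written for Python 3. The class name `Tally` and its methods are hypothetical; classes get a proper treatment in the middle third of the book.

```python
class Tally:
    """A tiny object: the data (a running total) and the methods that
    manipulate it are bundled together in one class."""

    def __init__(self, start=0):
        self.total = start  # the object's data

    def add(self, amount=1):
        """Increase the total and return the new value."""
        self.total = self.total + amount
        return self.total

    def reset(self):
        """Set the total back to zero."""
        self.total = 0
```

Each Tally object carries its own total, so two tallies made from the same class never interfere with each other.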



		What Python Is and Is Not Good At



		Python is an excellent language for many purposes. Python programs can take the place of shell scripts, sometimes reducing significantly the number of lines required to perform a job. Quite a few C++ programmers out there use Python for prototyping, which means that instead of writing out in laborious detail the specifications for a program, they build a prototype in Python that could include a graphical user interface (GUI). Because Python is so well designed, prototyping with it takes less time than writing or drawing a full specification (sometimes dramatically less time). Programmers who do this sort of thing also say that they end up with a better end product because Python encourages clear and elegant thinking. Even when the final shipping version of a product must be written in C++, they say that using Python first results in a much smaller, much better-designed result.
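As an example of a job often handed to a shell script, the "periodically making backups" task from the earlier list might look like this in Python 3 (not the 1.5.2 this book uses). The function name and the archive naming scheme are hypothetical, and running it periodically would be left to a scheduler such as cron.

```python
import shutil
import time
from pathlib import Path


def backup(source_dir, dest_dir):
    """Write a timestamped .zip archive of source_dir into dest_dir.

    Returns the path of the archive that was created.
    """
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    base = Path(dest_dir) / ("backup-" + stamp)
    # make_archive appends the ".zip" extension itself and returns the full name.
    return shutil.make_archive(str(base), "zip", root_dir=source_dir)
```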



		However, a few areas exist where Python does not shine. Operating on very large text files using complicated regular expressions (RE) with Python generally takes much longer than it would with Perl, for example. Although the differences between the two languages in RE handling can often be minimized by using some simple optimization techniques, it's often true that Perl is better suited for a few tasks than Python. In general, however, the speed at which a programmer can build a tool or prototype more than makes up for the difference in execution speed. The time between idea and implementation in most languages accounts for a great deal of the high cost of programs, but with Python this expensive gap can be shortened so much that, according to Frank Stajano, programmers can have "executable ideas." Writing down your ideas in Python clarifies them and will often give you a production-quality program in a very short time.
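One commonly used optimization of this kind (a Python 3 sketch, not necessarily the specific techniques the author has in mind) is to compile a pattern once and read a large file a line at a time. The log format and the names `ERROR_LINE` and `find_errors` are hypothetical.

```python
import re

# Compile the pattern once, instead of having re fetch it from its
# internal cache on every call inside the loop.
ERROR_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}) ERROR (.*)$")


def find_errors(lines):
    """Yield (date, message) pairs for each line that matches ERROR_LINE.

    `lines` can be an open file object, so a very large file is read one
    line at a time rather than loaded into memory all at once.
    """
    match = ERROR_LINE.match  # bind the method once, not once per line
    for line in lines:
        m = match(line)
        if m:
            yield m.group(1), m.group(2)
```

With a file opened in a `with` block, `for date, message in find_errors(log_file):` processes the whole file without ever holding it all in memory.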



		Programs that must act in real time are probably not suitable for complete implementation in Python. Interpreted languages, in general, are too slow for the sort of instant response expected from such programs. For example, Python would be a poor choice for a voice mail engine that was required to serve several hundred telephone lines at once. However, several options can improve the reaction time of such an engine. First, crucial areas of the code can be written in a low-level language such as C, and the Python engine can treat those C sections the same way that it treats built-ins. The string module is written in C; therefore, operations on strings in Python are extremely quick. This ability to add C code is what makes Python extensible, bringing the speed of low-level languages to Python. Second, low-level languages can embed Python, making it easy to call Python functions from C (or similar languages). This embeddability makes the power of Python available to the low-level language. These two properties make Python an extremely practical glue language, one that enables existing parts to work together as a unified whole.
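The extension modules described above are C code compiled against Python's C API, which is beyond this first hour. For a quick taste of Python calling compiled C code, a modern Python 3 offers a different and simpler mechanism in its standard library, the ctypes module. A minimal sketch, assuming a Unix-like system where the C library can be located; the wrapper name `c_strlen` is hypothetical.

```python
import ctypes
import ctypes.util

# Assumes a Unix-like system where find_library can locate the C library
# (for example libc.so.6 on Linux); on other platforms it may return None.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s)
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of `data` as computed by the C library's strlen()."""
    return _libc.strlen(data)
```

Declaring argtypes and restype tells ctypes how to convert Python values to and from C; without them it has to guess, which is a common source of crashes.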



		Summary



		In this first hour, you've learned what programming is, why you might want to know how to program, and why Python is a good choice for your first programming language. You've learned some of the history of the language and a bit of the history of programming languages in general, and you've been exposed to one of the most influential ideas of computer science: the language you choose to program in affects what you can program and how it is structured. Finally, we covered some areas where Python is not necessarily the best choice and some ways to make up for Python's weaknesses in those areas.



		Workshop



		*Q&A*



		Q Is Python portable?



		A Python can be made to run on nearly every platform; pre-compiled binaries are available for most operating systems. The only major operating system that doesn't support it is NetWare, and at least one person has admitted publicly that he's working on the port.



		Q Can I use Python as a CGI language?



		A Yes. All that is required is that the machine running the Web server must both support and allow Python programs. You may run into resistance from Web server administrators, however, who might not want to add another CGI language to their systems.



		Q Is Python secure for CGI programming?



		A It's a safer language for this purpose than Perl, but not as safe as Java.



		Q Will Python pave the way to fame and fortune?



		A Not necessarily, but it sure won't hurt you to learn it.



		Q Are any major companies using it for production programs?



		A Yes, indeed. Companies that rely on it include NASA, Yahoo, Red Hat, Infoseek, and Industrial Light and Magic. We also know that some other big players in the computer industry use it but are reluctant to admit it because they feel that Python gives them a real competitive edge. Building things faster than other companies makes them very responsive to customers and very likely to get repeat business.
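Returning to the CGI question in the Q&A above: a CGI program simply writes a header line, a blank line, and the page body to standard output, and the Web server passes the result to the browser. A minimal Python 3 sketch; the function name is hypothetical, and whether a given server will run it depends on its configuration, as the answer notes.

```python
def cgi_response(body, content_type="text/html"):
    """Build a minimal CGI response: a header line, a blank line, then the body."""
    return "Content-Type: " + content_type + "\n\n" + body


if __name__ == "__main__":
    # The Web server runs this script and relays whatever it prints.
    print(cgi_response("<html><body><p>Hello from Python</p></body></html>"), end="")
```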



		*Quiz*



		1. Good reasons to pick Python as your first programming language are



		a. Power, speed, monolithic



		b. Flexible, extensible, embeddable



		c. Elegance, clarity, simplicity



		d. Real-time, powerful regular expressions, similarity to C



		2. Who first proposed that using goto in programs led to unstructured programs?



		a. Niklaus Wirth



		b. Edsger Dijkstra



		c. Benjamin Whorf



		d. Benjamin Charles E. Thompson, Jr.



		3. Who invented Python?



		a. Tim Peters



		b. Ivan Van Laningham



		c. Guido van Rossum



		d. Edsger Dijkstra



		*Answers*



		1. Good reasons to pick Python are b and c; it's flexible, extensible, embeddable, elegant, clear, and simple.



		2. Edsger Dijkstra considered the goto statement harmful because, as he put it, it was "too much an invitation to make a mess of one's program."



		3. Guido van Rossum is the creator of Python, despite what Tim Peters may claim and what I would wish. However, Tim has contributed greatly to the development of Python (in addition to being a very funny guy), whereas I consider myself lucky to use it.



		Exercises



		Visit http://www.python.org/, the home base for all things Python. As this book was being written, another extremely valuable on-line resource appeared: the Vaults of Parnassus, which can be found at http://www.vex.net/parnassus/. This is a rich site with full searching capability, and it is full of links to software written in Python all over the world.



		Join either the Python mailing list or the Python-tutor mailing list. You can find out more and can join at http://www.python.org/psa/MailingLists.html.



		Read Edsger Dijkstra's short letter about goto, which can be found on the ACM Web site at



		http://www.acm.org/classics/oct95/



		It's tough going; for a little relief, try Edward Hall's excellent book based on Whorf's theories, The Silent Language.



		Make a list of repetitive tasks you hate doing and would like a computer to do for you. As you go through this book, refer back to the list to see if you can think of solutions. Try to build at least one of these timesavers before you get to the end. If you write only one program and it saves you even a few minutes per week, you've paid for this book.



		If you are the determined, completist sort, you could read Shunryu Suzuki's Zen Mind, Beginner's Mind. Another fun read (well, I think it's fun!) is Richard Wexelblat's History of Programming Languages. Complete bibliographic information for all books mentioned in the text is in Appendix A, "References." Of course, if you are a determined completist, you should consider writing a book.
