Botwiki:Threading
From Botwiki
When creating a bot that interacts with users or other systems over a network connection, one often faces the problem that lag greatly reduces the performance. This can be avoided by w:multitasking, a process that became widely used during the 90s.
Multitasking on a UNIX system can be done two ways: forking and threading. The main difference between those two, is the use of the memory. Two threads within one process share one and the same address space, and have thus access to the same variables. When forking is used on the other hand, the process is split into two, and both processes do not share the same address space. This means that for communication, some kind of interprocess communication must be used. For more information on forking, see the Wikipedia article on forking.
Contents |
[edit] Locking
Contrary to two processes, multiple threads share the same address space. This might cause problems one or more threads write to a variable, while another tries to read from it. Data corruption and unexpected crashes may follow.
To solve this, one needs to make sure that one and only one thread has access to a variable at the same time. This is done using locks. In Python, locks are available in the threading module. If a thread needs to access a shared variable, it should first acquire a lock, read the variable and release it as soon as possible.
[edit] Dead locks
A common error in programming is a dead lock. There can be two causes of this: a thread that has acquired a lock may have unexpectedly terminated without releasing the lock, or it may be a design error, which will generally only manifest itself under rare conditions.
Example of a dead lock caused by an exception:
import thread, threading, time lock = threading.Lock() shared_variable = "some string" def read(): time.sleep(1) lock.acquire() print 'Our variable is', shared_variable lock.release() def write(): lock.acquire() # Casting to an int will raise a ValueError shared_variable = int(shared_variable) lock.release() thread.start_new_thread(write, ()) thread.start_new_thread(read, ())
In this example, the write() function will exit prematurely with a ValueError, without ever releasing the lock. Then the read() function will wait until eternity to read the variable. The solution to this is a try .. finally' clause, which will release the lock any way:
def write(): lock.acquire() try: # Casting to an int will raise a ValueError shared_variable = int(shared_variable) finally: lock.release()
[edit] When not to use threads
Don't use threads to be able to edit to a site very fast. Not only will the server admins get angry, also the pywikipedia framework has a builtin, thread-safe edit throttle, although that does not seem to work always.
[edit] Thread-safety and pywikipedia
The author has checked the following methods and classes for thread-safety: Site, Page.get, Page.put.
[edit] Using the threadpool module
Sometimes it is necessary to have jobs performed by another thread, so that the current thread can continue. As explained above, starting a thread for each job is a bad idea. Therefore the threadpool module for Python has been developed. This module implements a thread pool which allows scripts that require performing concurrent jobs, an efficient and thread safe way to do this.
The advantages of this is that the overhead of starting a thread is gone. Also the number of threads can be very easily limited. By the use of events, no thread will run into a so called "busy while loop". This module also implements a method to exit all threads as soon as they have finished their jobs.
BlogMarks
del.icio.us
digg
Fark
Furl
Newsvine
reddit
Segnalo
Simpy
Slashdot
smarking
Spurl
Wists
