Python Web Application Deployment Frustrations

by Jeremy Jones

As I've posted before, my wife wanted me to build her a website. Initially, I planned on building it using Plain Old HTML. It was going to be a plain storefront and customers would phone in orders. Then she decided that it would be more convenient if they could upload their images to us rather than email them. CGI would work perfectly for that. Then, we thought that maybe a store catalog and integrated shopping cart would be cool. I started digging into PHP for that. I shied away from TurboGears because I thought hosting would be a problem. After looking around, I decided that hosting was a non-issue, so I built her site in TurboGears.

I settled on Dreamhost for hosting because of price and FastCGI support. FastCGI is one of a handful of methods for deploying TurboGears in a hosted environment. FastCGI has been a source of frustration for me during this process and I don't expect the frustration to go away any time soon. It just seems really quirky.

I finished my wife's site yesterday. We did a final walkthrough of the site and I did a few finishing changes. I then began the "deployment to production" process last night. I followed the instructions on the "Installing TurboGears on Dreamhost" wiki.

Thus begins my frustration. Copy my files over. Not a problem. Modify the tg_fastcgi.fcgi script. Not a problem. Make a couple of changes to my TurboGears config file. Not a problem. Drop in a .htaccess file. Not a problem. Test that tg_fastcgi.fcgi runs properly from the command line. Not a problem. Point my browser at my site and get it to kick off the FastCGI process(es). Hmmm. It looked like it was trying to start something. I saw CPU utilization increase, but not on any process I had access to view. Then after what seemed like forever, as if by magic, there were maybe a dozen tg_fastcgi.fcgi processes running. That was liberating. The site was running. And it was pretty snappy, too.

There didn't appear to be any obvious problems. Except that whenever I needed to change something, I had to ``killall`` the tg_fastcgi.fcgi processes so the change would take effect. FastCGI is apparently more finicky about starting up right after you've just killed it. I again saw some unknown process eat a little CPU and then there were entries appearing in my log file that looked like this:

[Wed Jan 18 08:09:17 2006] [error] [client ] FastCGI: incomplete headers (0 bytes) received from server "/home/(my account)/(my domain)/tg_fastcgi.fcgi"

And then, after a while, it just magically came up.

Again, no obvious problems. Except when I added an item to my shopping cart. When I went to view my cart, there was nothing there. Then, when I clicked "View Cart" again, there was my item. Click again and it's gone. Click again and it's there again. Round and round we go. I've created a magical disappearing-reappearing shopping cart! Cool! Wait. Not cool. Customers won't like that. Neither would my wife. I figured that the problem may be caused by the multiple tg_fastcgi.fcgi processes not sharing session data properly. Aarggghh. I switched over from using RAM as session storage to file-based session storage. The problem immediately went away.
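
The disappearing cart is easy to reproduce in miniature. The sketch below is not TurboGears or CherryPy code: `Worker` and `dispatch` are hypothetical names that model two FastCGI processes, each holding its own in-memory session dictionary, with requests alternating between them. The round-robin order is an assumption made to keep the result predictable; the real web server decides which process gets each request.

```python
import itertools


class Worker:
    """Stands in for one tg_fastcgi.fcgi process using RAM-based sessions."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # private to this process, never shared

    def add_to_cart(self, session_id, item):
        self.sessions.setdefault(session_id, []).append(item)

    def view_cart(self, session_id):
        return list(self.sessions.get(session_id, []))


def dispatch(workers):
    """Hypothetical round-robin dispatcher (assumption, see above)."""
    return itertools.cycle(workers)


workers = [Worker("fcgi-1"), Worker("fcgi-2")]
next_worker = dispatch(workers)

# The "add to cart" request lands on the first process.
next(next_worker).add_to_cart("session-abc", "birth announcement")

# Repeated "View Cart" requests alternate between the two processes,
# so the cart looks empty on every other click.
views = [next(next_worker).view_cart("session-abc") for _ in range(4)]
for cart in views:
    print(cart)
```

Only the process that handled the "add" request knows about the item, which is why moving the session data somewhere all processes can read (files, in my case) made the symptom go away.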

Then I started getting 500 errors and entries in my log file that look like this:

server.log: self._lockFile(lockFilePath)
server.log: File "/home/(my account)/lib/lib/python2.4/site-packages/CherryPy-2.1.1-py2.4.egg/cherrypy/lib/filter/sessionfilter.py", line 345, in _lockFile
server.log: raise SessionDeadlockError()
server.log:SessionDeadlockError

And 500 errors in the browser. And an unusable website. There appears to have been a bug entered against CherryPy whose fix was supposed to resolve this problem. Maybe I hit a corner case. I don't know. But it looks like another session-oriented issue. Maybe FastCGI isn't playing nicely with the session storage files.

So, I have a web application which is difficult to modify quickly because FastCGI doesn't appear to have a nice "restart" option. (If someone knows of one, I'd appreciate you posting it here. I found a reference to issuing a ``killall -USR1``, but I really don't want to try that right now. The server is running OK for the moment.) It seemingly randomly spews 500 errors and has session deadlocking issues. There is also sometimes a significant lag during the first request after there have been no requests for a while. The site has been (mostly) fun to build. Deployment has been a beast, though.

I'm not blaming Dreamhost or TurboGears or FastCGI or CherryPy or anything else. I'm just venting a bit. It's good to do that every once in a while. I guess tonight I'll start trying to find solutions to the relevant problems.

21 Comments

guet2
2006-01-18 11:38:47
fcgi on dreamhost
Try


killall -USR1 dispatch.fcgi


This will restart all the fcgi processes the next time they're called, so it'll be slow on the first call after this, then quick again. In theory this shouldn't disturb anyone using the site.


killall -9 dispatch.fcgi


This will force all the processes to immediately exit, potentially dropping sessions, so try doing a tail on the logs first to check no one is using the site.


These are for rails on dreamhost but I imagine the same works for python fcgi. Perhaps you could store your sessions in a database table instead?

guet2
2006-01-18 11:40:18
fcgi on dreamhost
oops, forgot to change the name of the fcgi process, but you get the idea!
jmjones
2006-01-18 11:52:38
fcgi on dreamhost
I've actually tried the ``killall -USR1`` since blogging about this and I've had mixed results. Sometimes, the app comes right back up. Other times, it takes literally minutes for it to come back up.
RickCopeland
2006-01-18 12:31:49
I had similar issues, but the following...
seems to do the trick (in prod.cfg):



# SESSIONS
sessionFilter.on = True
sessionFilter.storageType="File"
sessionFilter.storagePath="/apps/session"
sessionFilter.storageFileDir="/apps/session"


I'm not sure whether I need both "storagePath" and "storageFileDir", but I haven't noticed any problems since configuring it like this.

jmjones
2006-01-18 12:39:20
I had similar issues, but the following...
Here's my session area:



sessionFilter.on=True
sessionFilter.storageType="File"
sessionFilter.storagePath="(my home dir)/app_deploy/session/"


I don't think you need both. This resolved the disappearing reappearing problem, but I still have the session deadlock problem.

evenview
2006-01-18 16:49:06
some links
Here are some comments about FastCGI and Apache regarding unreliable sockets ...
http://www.vmunix.com/mark/blog/archives/2006/01/02/fastcgi-scgi-and-apache-background-and-future/


This talks about mod_fastcgi and sockets. Sounds ugly.
http://rfc1437.de/page/django-apache-and-fcgi/


Maybe these instructions using mod_scgi instead will help.
http://simon.bofh.ms/cgi-bin/trac-django-projects.cgi/wiki/DjangoScgi


Will your host let you run lighttpd and have Apache be only a proxy? Then this will help:
http://www.cleverdevil.org/computing/34/deploying-turbogears-with-lighttpd-and-scgi


Rick Thomas


tazzzzz
2006-01-19 04:48:04
mod_fastcgi
I've generally heard that fastcgi on Apache is not very well supported and that it works a lot better on lighttpd.
jmjones
2006-01-19 05:17:26
mod_fastcgi
I'm discovering that hosting is maybe as big of an issue as I thought it was when I was considering doing PHP. FastCGI should work OK for what I'm doing with it. I know it's dated and doesn't appear to be well maintained, but my simple app shouldn't be this flaky running under it. Now, to be fair, it has been pretty stable since yesterday late afternoon. Doh. Spoke too soon. I clicked another browser tab to it and....deadlock error...again.
RickCopeland
2006-01-19 06:35:48
I had similar issues, but the following...
It's possible you still have old instances of the FCGI server hanging around. You might try killing all the instances of the FCGI app and then let them come back up. I use "pkill tg_fastcgi.fcgi; ps -A" repeatedly until there's nothing left. Once they're gone, they can take a few minutes to reappear. (I believe Apache gets tired of respawning them and backs off for ~5 minutes between respawning attempts after a while.)
jmjones
2006-01-19 06:48:35
I had similar issues, but the following...
I've been having better success with ``killall -USR1 tg_fastcgi.fcgi``. I typically keep giving that command until they're all dead. And when I'm changing code, this is the way to go. But I haven't been changing code since I blogged about this. And for non-deployment issues, I don't think I should need this at all. I shouldn't have to kill all my processes because the server starts randomly throwing errors. This session deadlock one is a beaut. I get inconsistent behavior if I use memory session storage; I get random lockups if I use file session storage.
RickCopeland
2006-01-20 08:25:45
I had similar issues, but the following...
Well, the inconsistency is straightforward: tg_fastcgi is spawned in multiple processes (no shared memory), so they each have their own copy of the memory-based session dictionary. When a request comes in, it's randomly assigned to one of the running FCGI processes, and you'll get the inconsistency problem.


As for the deadlock error, I haven't seen that in a long time on my server. I can't really say I know what fixed it, but I don't see it anymore. Have you tried deleting all your session files? (Not that any of this *should* be necessary...)

jmjones
2006-01-20 10:11:03
I had similar issues, but the following...
That's exactly what I surmised was the problem with inconsistency and memory-based sessions which is why I switched to file-based.


I never deleted my session files in order to get it to unlock. I got to where I always deleted them after I had killed all the fastcgi processes.


Your situation is something that terrifies me. It's the same issue I had with the random "NoneType is not callable" with Kid templates and it's the same issue I'm having now with the deadlock problem. It's not reproducible. And it's random. And it appears to clear up for a while, then all of a sudden ... bam! You have a randomly occurring bad behavior on your hands. That's much worse than something crashing in a repeatable fashion. That's something I can't take a chance on.

speno
2006-01-22 18:26:55
what's the business?
Forget the technical details, I'm curious as to what kind of business your wife is starting. Please tell us!
jmjones
2006-01-23 05:25:15
what's the business?
Ssshhhhhh......it's a secret :-) It's going to probably wind up being fairly diverse, but initially she's designing and having printed 5x7 custom birth/adoption announcements. She'll also probably do custom holiday cards at the end of this year. It's kind of hard to describe with words. The parents send her a number of digital images (mostly of the baby :-), she does her graphical design magic on them and creates a really good looking (IMHO) card to send to friends and family announcing the new baby. Maybe I'll post a link when the site's up...if that's not seen as a conflict of interest or something.


So, for the boring technical details you don't want :-), the product catalog I've mentioned contains general designs that she's worked up and any example images she wants to tie to each design. The shopping cart (admittedly overkill for this site, but, hey..) allows customers to drop in a design with custom information such as the baby's birth date, time, weight, etc - stuff you'd want on a birth announcement. The payment system passes the customer off to PayPal so they can handle credit card processing - 'cause that's something I don't want to mess with! And I've already begun working on phase 2, which is integrating this site into her CafePress store(s).


Back to the non-technical details, at CafePress, she has some designs that go along with the birth theme that she's selling on TShirts, buttons, magnets, etc.


Hope that wasn't too much information! Thanks for asking!

RickCopeland
2006-02-01 10:49:01
Source of deadlock error (I think)
I know you've moved on to Django, but I just noticed this blurb on the CherryPy website for "what's new in 2.1.2" and thought you'd be interested:
...
3. The sessionfilter is now more stable, especially the file backend (the use of session.acquire_lock is now better documented in the book. If your site is used in an environment where you have multiple concurrent requests from the same browser then acquire_lock should be called by methods that write to the session).
...
The short of it is that the deadlock only occurs when you're doing what both you and I apparently were doing, e.g. making multiple simultaneous requests on the same session (from the same browser).
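
To make that failure mode concrete, here is a minimal sketch of an exclusive lock file with a timeout. It is an illustration only, not CherryPy's sessionfilter code: `acquire_lock` and `release_lock` are hypothetical helpers, the exception name is borrowed from the traceback in the post, and the timeout values are arbitrary. It is written for current Python rather than the 2.4 used at the time.

```python
import os
import tempfile
import time


class SessionDeadlockError(Exception):
    """Raised when the lock file cannot be obtained in time."""


def acquire_lock(lock_path, timeout=0.2, poll=0.01):
    """Create lock_path exclusively, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another request holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise SessionDeadlockError(lock_path)
            time.sleep(poll)


def release_lock(lock_path):
    os.remove(lock_path)


session_dir = tempfile.mkdtemp()
lock_path = os.path.join(session_dir, "session-abc.lock")

acquire_lock(lock_path)        # request 1 (first browser tab) takes the lock
try:
    acquire_lock(lock_path)    # request 2 (second tab) arrives too early
    second_request = "ok"
except SessionDeadlockError:
    second_request = "deadlock"
finally:
    release_lock(lock_path)    # request 1 finishes and releases

acquire_lock(lock_path)        # once released, the next request succeeds
release_lock(lock_path)
print(second_request)
```

With one request per session at a time, the second `acquire_lock` call never has to wait; two overlapping requests on the same session are what push it past the timeout.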
AlanFord
2006-02-17 23:02:50
my frustrations
Hi, I found your post very helpful. Before I go on whining, I want to ask you a question: how is the deployment of Django app going? Having same problems? How's the speed?


I am now in a similar situation to the one you were in at the beginning of this year. I am trying to deploy my TurboGears app on DreamHost. Everything worked out as advertised on the TG Wiki, and I was thrilled to see my app running.


And then the misery began.


If you don't interact with the webapp for a while, it takes a lot of time to load. More than a minute. Unacceptable. I saw that the server starts up actually. And it's not only if you don't interact with it. It happened also after just a few minutes playing around.


It's generally very slow to respond. I didn't profile to see where the bottlenecks are, but even simple operations (one query, with few records in the database) take several seconds to execute.


The app has a lot of redirects: part of it is implemented using the command pattern, so eventually all commands get redirected to an appropriate URL. After the command finishes, it redirects back to the main page. Bad idea. Redirects don't always work in this setup, and they make it slower. I think the problem is that they go through FastCGI instead of being resolved by CherryPy.


And sessions - I had similar problems with them.


I am considering switching to Django, so I will welcome your feedback. I also thought of RoR, but I am afraid of the learning curve. I know it would be fun, but I think it would just take too long, especially since I only have theoretical knowledge of Ruby (and I have been programming in Python for a while).


jmjones
2006-02-18 03:48:01
my frustrations
I'm glad you found my post helpful. You sound very much like me, from the pain you've experienced to considering Django and RoR. I'll try to answer your questions as best as I can.


First, the first stage of the Django deployment is complete. We have some more features in the works, so it's by no means a "done site". The process of deployment was a snap. But then again, so was the TG app. I am happy to say that I am having no problems, neither the problems I had with TG, nor new problems cropping up. The speed is quite good, I think. As you mention, FastCGI does take some time to come back up if idle for a while, but I've mostly alleviated that with a simple checker script that runs on my server at home and hits a test URL on the app. You can see for yourself how it performs by going to http://pitterpatprints.com/. The pages are slow to load the first time due to the heavily graphical nature of the site, but after that, it's pretty snappy. http://pitterpatprints.com/store/products/ and http://pitterpatprints.com/store/product/some_product_id/ are almost totally database driven, so that will show you something about DB response time.
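
For anyone who wants to try the same trick, a checker along these lines would do. This is a sketch rather than the script I actually run: `check` is a hypothetical helper, the URL is whatever cheap page you pass on the command line, and the 60-second timeout is an arbitrary choice.

```python
import sys
import time
import urllib.error
import urllib.request


def check(url, fetch=urllib.request.urlopen, timeout=60):
    """Request `url` once and return (ok, seconds_elapsed)."""
    started = time.monotonic()
    try:
        with fetch(url, timeout=timeout) as response:
            ok = response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP errors, refused connections, and timeouts.
        ok = False
    return ok, time.monotonic() - started


if __name__ == "__main__":
    # Each command-line argument is treated as a URL to hit.
    for url in sys.argv[1:]:
        ok, elapsed = check(url)
        print("%s %s in %.1fs" % ("OK" if ok else "FAILED", url, elapsed))
```

Scheduling it every few minutes from cron keeps the FastCGI processes from going idle, and the elapsed time in the output shows when a slow cold start still happened.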


Hope this helps. Feel free to post here with questions or email me directly. You should be able to find my email address on my author's page or on the main page of this blog entry.

AlanFord
2006-02-18 13:41:37
my frustrations
I checked out your website - good work in a short time. Indeed it's just the first page that takes some time to load, and the others go fast enough. I wonder though how this would all work on a high traffic web site.


I will try the trick with a crontab to keep my FastCGI alive.


There is still some brushing up to do with these frameworks - you still need to jump through a lot of hoops to get the basic things running smoothly.


How does the deployment work with Django - do you also have to kill off the FastCGI processes for it to pick up your changes?

jmjones
2006-02-18 18:21:09
my frustrations
Thanks. I'm no designer, which is what I really ought to have been for this site. But it's functional and not too terribly bad on the eyes.


For Django, you do have to kill the fastcgi processes in order for them to pick up your changes. But, you should be able to issue a ``killall -USR1 your_fastcgi_process_here`` and it doesn't totally kill all of them. It just makes them aware that they need to reload when the next request comes in. At least that's how I understand it.
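
As I understand the idea, it works something like the sketch below. This is not the actual FastCGI handler code from Django or anything else; `ReloadableWorker` is a hypothetical class showing one way a process can note a USR1 signal without dropping the request it is serving, then exit so a fresh process (with the new code) gets started for a later request.

```python
import signal


class ReloadableWorker:
    """Hypothetical request loop that reloads lazily after SIGUSR1."""

    def __init__(self):
        self.reload_requested = False
        self.handled = []

    def install(self):
        # Call from the main thread of the worker process (Unix only).
        signal.signal(signal.SIGUSR1, self.request_reload)

    def request_reload(self, signum, frame):
        # Only set a flag; never interrupt a request that is in flight.
        self.reload_requested = True

    def handle(self, request):
        """Serve one request; return False if the process should then exit."""
        self.handled.append(request)
        return not self.reload_requested


worker = ReloadableWorker()
worker.handle("GET /store/products/")        # normal request, keep running
worker.request_reload(signal.SIGUSR1, None)  # what killall -USR1 would trigger
keep_running = worker.handle("GET /cart/")   # served in full, then exit
print(keep_running)
```

The demo calls the handler directly instead of sending a real signal, so it can run anywhere `signal.SIGUSR1` is defined; in a real worker, `install()` would register it.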

AlanFord
2006-02-19 12:21:43
my frustrations
I find the deployment to be a real problem - I cannot accept that I need to bring the web site down for 15 minutes every time I want to deploy a new version.


This is one of the reasons why I like web applications - that all your users have one and only one version of your code, and that you can easily deploy new versions any time.

jon_perez
2006-02-22 16:17:58
Anyone tried Spyce?
One Python web solution that I believe has been given short shrift is Spyce. If you are coming from PHP/JSP/ASP but already know Python, Spyce is the most natural transition you could hope to make.


Plus Spyce has some of the best features of JSP (custom tags) and ASP (active handlers) plus some unique features of its own. I daresay many people might find MVC to be a turn-off after trying out Spyce's facilities.



http://en.wikipedia.org/wiki/Spyce
http://spyce.sf.net