Monday, September 27, 2010

I'm Knee Deep in GDB...

Learning things I never wanted to know...

Update:
I'm working with a Stackless framework which uses the asynchronous features of Psycopg2. On top of this, I've been using SQLAlchemy... unfortunately I have recently noticed that after a few hundred or so requests, the application simply hangs. After analyzing a core dump, it seems that something is attempting to acquire a lock... and deadlocks. It is very deterministic, as it always happens at the same point in my unit tests.

I've narrowed down the issue to something inside SQLAlchemy... I attempted to fix it but SQLAlchemy is so full of dark magic that I'm really quite lost. I guess I was expecting a bit too much to think that regular SQLAlchemy would would correctly with an asynchronous driven psycopg2...

12 comments:

lorg said...

That's one of the best kinds of learning :)

Sparks said...

Well? Don't just leave us hanging there! :D

Amin's Weblog said...

me too. eager to know if you got them working properly.

flow said...

"SQLAlchemy is so full of dark magic that I'm really quite lost"---which pretty much sums it up for me. i've put a lot of work into understanding and using ORMs, but finally given up. nowadays i believe ORMs are simply wrong, and i'm pretty much back to hand-written SQL again. yes, it sucks, and it can't be the solution in the end, but it still is way better than dealing with all the conceptual overhead of ORMs. all i want is a hut by the seaside, yet what i get with SQLAlchemy is a grand villa with pool and helipad. and a really fat user's manual. way too heavy for my taste.

mike bayer said...

simon - care to define "magic" ? One person's "magic" is another's idiomatic Python.


flow - that's fine that ORMs (and non-ORM SQL abstraction layers, which SQLA also provides) are not for you. Though as far as ORMs being "wrong" for any situation, rest assured ORMs once understood and used correctly can save hundreds if not thousands of man-hours on any project. We've had over 11K downloads in the last 20 days so clearly a good number of folks are getting some use out of the product.

Simon Wittber said...

Hi Michael. First, as a looong time SA user, I value and appreciate the software, so please don't think I'm dissin' SA :-)

By dark magic, I mean... SA does many things for me, that I never usually need to think twice about. Using SA itself is a breeze, thanks to the excellent documentation. However when I dive into the SA source code... wow there is so much stuff in there, doing so many things, my head spins.

As someone else said, "Any sufficiently advanced technology is indistinguishable from magic." And that is what SA is. Advanced Technology.

flow said...

@mike bayer """Though as far as ORMs being "wrong" for any situation, rest assured ORMs once understood and used correctly can save hundreds if not thousands of man-hours on any project."""

this is the benefit i had been hoping for, but it always stayed out of my reach. of course when i say "wrong" this is meant as a very personal assessment, like japanese natto smells and tastes "wrong", but that's just me. and yes, the way i am hand-assembling SQL statements these days does have a "wrong" smell to me, and i readily admit that. it is just so much conceptually simpler and it works. the one benefit of the barebones approach is that you get a nice collection of nicely-formatted SQL statements that you can pass around for other people to use. i mean, on the one hand, SQLA strives to make development independent of underlying RDBMS used, which is good; but then, in order to understand and re-use application code that queries a DB with SQLA, wouldn't other projects be forced to use SQLA themselves? IMHO that somewhat diminishes the cross-project code reusability. i have no very good solution to that conundrum.

mike bayer said...

If you're trying to integrate an asychronous execute/fetch scheme, you'd pretty much be dealing with engine/base.py. You'd probably need to rip out most of base.py and turn it into some system that accepts many execute() calls across the whole application and returns results back to synchronous clients, i.e. the Session and its related agents. I've yet to take the time to figure out working with asynchronous systems since its never been necessary (I use multiprocessing instead), but I would be incredulous if using async effectively only requires a "fix" - its tremendously different.

mike bayer said...

flow: There is a small but constant...ahem..."flow" of users who seem like they spent some time with SQLA, and come out of it with an anti-ORM stance. Since I have your attention though, are you aware of the SQL Expression Tutorial ? This is a single page that walks through the SQL expression language, a plain Pythonic abstraction over SQL and is designed for the non-ORM user, or for someone who wants to build their own object-relational layer with the benefit of a SQL abstraction. This is what reddit did - the way they're using their database is entirely not at all what an ORM would provide.

I understand the non-ORM use case extremely well since I spent years avoiding all ORMs of any kind. This is why I wrote the SQL expression language and table metadata system first, that's what I needed for most of my ad-hoc object management frameworks in the past.

alexandre-fayolle said...

Last time I had such kind of deadlock, it was a mixture of python-level lock and SQL Table lock:

thread1 enters python critical section (protected by threading.Lock), executes SQL instruction that creates a database lock (such as UPDATE table set col = col+1) and exits critical section.

thread2 enters critical section, tries to execute same sql query, but is blocked by the lock because thread2 uses a different connection

thread1 wants to enter critical, but thread2 holds the lock. Deadlock, because thread1 will never be able to commit() which would release the db-level lock.

Took some time to hunt down...

flow said...

(1/2) mike, i went back into the SQL expression language tutorial and started reading from top. indeed, it doesn't look all that fearsome, and i remembered why i used to be such an avid fan of ORMs back then. ok there's stuff like metadata, engines, connections, metadata binding, and all kinds of methods you can have to call on some of the objects that represent connections, or tables, or engines at some point in time. when reading that, i find it 'somewhat complicated, but probably manageable'; back when i actually did it, however, i remember i became more and more jaded with that. i always tried to abstract the nitty-gritty OOP API details away from me, and shoved them into convenience methods that 'just worked', so i could concentrate on my data.

i've walked down that way, which leads away from classical OOP with their 'smart objects' (object = data structures + methods); now i live in a world where i deal with data (just that, data---dictionaries, strings, numbers, lists...) and libraries (no state, a bit smart, a teeny bit of simple inheritance, otherwise just methods).

so nowadays code like addresses.c.email_address.like('%@msn.com') makes me shudder a bit. it reminds me of yegge's piece, "Execution in the Kingdom of Nouns", where instead of making someone "get the garbage bag from under the sink", we end up with "RidersGuild.getRiderNotificationSubscriberList().getBroadcaster().run( new BroadcastMessage(StableFactory.getNullHorseInstance()));". sqlalchemy is sure simpler than that, but IMHO SQL.like( email_address, '%@msn.com' ) still feels more natural than the above example. sometimes, i still get python's 'foo,bar'.split(',') vs. ','.join(['foo','bar',]) the wrong way around. there are good reason both split and join are string methods, but you wouldn't exactly use this particular example to convince people OOP is the advanced concept that will power tomorrow's space rockets.

flow said...

(2/2) To my surprise, i found that the SQL expression tutorial states: "Going from what one understands to be a textual SQL expression into a Python construct which groups components together in a programmatic style can be hard." yeah, i found it oftentimes so hard i ended up with great numbers of SQL snippets littered throughout the code, because i felt unable or unwilling to figure out how to say that in SQLA instead of in SQL proper.

the tutorial shows how to use literal SQL, but then (quite rightly) warns: "We’ve now gained the ability to construct very sophisticated statements. We can use all kinds of operators, table constructs, text, joins, and aliases. The point of all of this, as mentioned earlier, is not that it’s an “easier” or “better” way to write SQL than just writing a SQL statement yourself; the point is that it’s better for writing programmatically generated SQL which can be morphed and adapted as needed in automated scenarios."

what i am doing right now is pasting together short stretches of SQL templates that get the tablenames and so on interpolated into them. i guess sqlalchemy works much like this under the hood. the next logical step for me is probably to build a library so i can more easily, and with a greater deal of resilience against malicious data, build statements; it might turn out to look like SQL.select( SQL.from( SQL.as( 'main', 'a' ), SQL.as( 'main', 'b' ), ), SQL.where( SQL.and( SQL.equals( SQL.field( 'a', 'user_id' ), SQL.field( 'b', 'user_id' ) ), SQL.equals( SQL.field( 'a', 'user_id' ), 'Mary' ), ... and so on. add a little formatting and maybe use WHERE etc instead of SQL.where and so on, and the code could become an SQL look-alike. i like the conceptual simplicity of that approach. guess it was simply the perceived complexity of ORMs together with my growing OOP-abstinence that made me stop using sqlalchemy.

Popular Posts