Ruby Conference 2006 – Day 1 Evening (Friday, 20 October 2006)
Sunday, 22 October 2006
Posted by austin in: Ruby PDF, RubyConf
Dinner tonight was at Old Chicago with Hal Fulton, Ara Howard, Patrick Hurley, Tim Pease, and various others whose names I can’t remember offhand. Great dinner, and I was able to fully explain why the Ruby extension situation on Windows is so bad. I also started talking about the big problem that I have with Transaction::Simple and haven’t figured out how to solve in a general way (details below). They weren’t quite understanding it, so before the matz Roundtable started, I showed them a test case that I had come up with while talking with Francis Cianfrocca (who is behind EventMachine and the implementation of Net::LDAP).
The matz Roundtable was pretty short; not too many questions were asked this year, and the discussion didn’t continue for an hour as it did the year before. I was shot down when asking for “become” behaviour (related to the Transaction::Simple bug). After the Roundtable, I managed to snag matz to talk about the problem which led me to request this. I showed him the test case:
#!/usr/local/bin/ruby
require 'rubygems'
require 'transaction/simple'
class Child
  attr_accessor :parent
end

class Parent
  include Transaction::Simple
  attr_reader :children

  def initialize
    @children = []
  end

  # Attach the child to this parent and record it.
  def <<(child)
    child.parent = self
    @children << child
  end
end
parent = Parent.new
puts "parent.object_id: #{parent.object_id}"
parent << Child.new
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
puts "starting transaction"
parent.start_transaction
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
puts "aborting transaction"
parent.abort_transaction
puts "aborted transaction"
puts "parent.object_id: #{parent.object_id}"
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
producing the output:
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
starting transaction
parent.children[1].parent.object_id: 3265800
aborting transaction
aborted transaction
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265500
parent.children[1].parent.object_id: 3265800
This bug affects PDF::Writer’s table generation and contributes significantly to the high memory usage. What’s happening is that when you call Parent#start_transaction, Transaction::Simple creates a transaction checkpoint with Marshal::dump. When you call Parent#rewind_transaction or Parent#abort_transaction, the object is restored from that checkpoint. This restoration is extremely robust except for this one item: the restored children refer to the unmarshalled copy of the parent rather than to the parent itself, which is why parent.children[0].parent.object_id differs after the abort. What we really need is something like:
self = Marshal::restore(checkpoint)
Obviously, that won’t work, and this leads to the problem that is illustrated above. After a long discussion with Tim Pease, Patrick Hurley, and matz, we came up with a workaround that can work for the example bug and for PDF::Writer. It’s not super-efficient, though. Essentially, I will modify Transaction::Simple to have callback methods for post-processing after a transactional operation. Something like this:
class Parent
  def post_restore_hook
    @children.map! { |child|
      child.parent = self unless self.object_id == child.parent.object_id
      child
    }
  end
end
parent = Parent.new
puts "parent.object_id: #{parent.object_id}"
parent << Child.new
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
puts "starting transaction"
parent.start_transaction
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
puts "aborting transaction"
parent.abort_transaction
parent.post_restore_hook # would be called automatically in the real case
puts "aborted transaction"
puts "parent.object_id: #{parent.object_id}"
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
With the hook in place, the intent is that every object ID printed after the abort, including parent.children[0].parent.object_id, matches the original parent.object_id again.
This isn't great: it doesn't feel very Ruby to me, but it does get the job done. It's also not very efficient. After thinking about this for the better part of an hour, matz has suggested that there might be a very ugly hack that’s possible that he’ll look at for me, which may be able to implement everything in Transaction::Simple.
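The mechanism and the callback workaround described above can be reproduced without the library. The following is a minimal sketch, assuming a checkpoint is just a Marshal dump of the object whose instance variables are copied back on abort. The TinyTransaction module and the HookedParent class are hypothetical names invented for this illustration; this is not Transaction::Simple's actual code.

```ruby
# Hypothetical stand-in for the checkpoint/restore idea.
# NOT the real Transaction::Simple implementation.
module TinyTransaction
  def start_transaction
    @__checkpoint = Marshal.dump(self)
    self
  end

  def abort_transaction
    restored = Marshal.load(@__checkpoint)
    @__checkpoint = nil
    # Copy the restored state back into the live object.
    restored.instance_variables.each do |name|
      instance_variable_set(name, restored.instance_variable_get(name))
    end
    # Assumed callback name, taken from the workaround in the post.
    post_restore_hook if respond_to?(:post_restore_hook)
    self
  end
end

class Child
  attr_accessor :parent
end

class Parent
  include TinyTransaction
  attr_reader :children

  def initialize
    @children = []
  end

  def <<(child)
    child.parent = self
    @children << child
    self
  end
end

# Same as Parent, plus the hook that re-points children at the live object.
class HookedParent < Parent
  def post_restore_hook
    @children.each { |child| child.parent = self }
  end
end

plain = Parent.new
plain << Child.new
plain.start_transaction
plain << Child.new
plain.abort_transaction
puts plain.children.size                       # 1
puts plain.children[0].parent.equal?(plain)    # false: points at the unmarshalled copy

hooked = HookedParent.new
hooked << Child.new
hooked.start_transaction
hooked << Child.new
hooked.abort_transaction
puts hooked.children[0].parent.equal?(hooked)  # true: the hook repaired the back-reference
```

The hook has to walk every child after every rewind or abort, which is where the inefficiency mentioned above comes from.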




Comments
The need here appears to be for a transaction which incorporates the entire data structure, not simply the “parent” object (which appears to be the way that Transaction::Simple likes to work). This is analogous to a database transaction that, when aborted, only reverts one table when multiple tables were actually modified.
I’ve never used “transaction-simple” before, so I’m not sure what methods it provides. But it seems that there should be a way to marshal the entire data structure and then restore it (a deep-copy versus shallow-copy).
BTW: Thanks for writing PDF::Writer. I love it! :)
Transaction::Simple does incorporate the entire data structure (it uses Marshal::dump on self). The only thing it doesn’t/can’t incorporate is the current object ID and that’s what gets disconnected. I’ll post more about this in a little while as I get caught up on my blogging for the conference.
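A small sketch of that point, using only Ruby's standard Marshal module (the classes are trimmed versions of the ones in the post, without Transaction::Simple): a Marshal round trip keeps the whole structure, including the circular parent/child references, but the result is a new object with a new identity.

```ruby
class Child
  attr_accessor :parent
end

class Parent
  attr_reader :children

  def initialize
    @children = []
  end

  def <<(child)
    child.parent = self
    @children << child
    self
  end
end

original = Parent.new
original << Child.new

# Deep copy via Marshal: the cycle parent -> child -> parent survives.
copy = Marshal.load(Marshal.dump(original))

puts copy.children[0].parent.equal?(copy)          # true: the copy is internally consistent
puts copy.equal?(original)                         # false: but it is a different object
puts original.children[0].parent.equal?(original)  # true: the original is untouched
```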
OK. I misunderstood the problem earlier.
Just to make sure that Marshal worked the way I thought it did, I tried this:
—————————————-
parent = Parent.new
parent
My previous post got cut short. I think it was the ‘
Try again. Sorry. :)
————————————
parent = Parent.new
parent
I suspect that “abort_transaction” restores a marshaled copy of itself (into another object) and then iterates over that object’s attributes, copying them back to the original object(?).
Perhaps a solution would be to instead make the existing object simply a proxy (or wrapper) for the restored object, forwarding all of its method calls to the restored object. This seems a bit more ruby-ish to me.
Feel free to correct any bad assumptions I’ve made.
Thanks…
The only way to do that would be to actually make it so that you’re always working on a proxy object. That’s not what I had wanted to do, but it is going to be one option I provide in Transaction::Simple 1.4 (I’m also providing callback hooks). However, matz does recognise that this is a legitimate case and we are looking for a “nice” way to change something in Ruby 1.9 to allow for exactly this case.
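The proxy option discussed in these comments might look roughly like the sketch below. It is built on SimpleDelegator from Ruby's standard delegate library; the TransactionProxy class and its method names are hypothetical and are not the API of Transaction::Simple 1.4. Because the proxy swaps in the whole unmarshalled object on abort, the children and the object they point back to stay consistent with each other.

```ruby
require 'delegate'

class Child
  attr_accessor :parent
end

class Parent
  attr_reader :children

  def initialize
    @children = []
  end

  def <<(child)
    child.parent = self
    @children << child
    self
  end
end

# Hypothetical proxy: callers always hold the proxy, never the target.
class TransactionProxy < SimpleDelegator
  def start_transaction
    # ::Marshal is written with a leading :: because delegators
    # descend from BasicObject.
    @checkpoint = ::Marshal.dump(__getobj__)
    self
  end

  def abort_transaction
    return self unless @checkpoint
    # Replace the target wholesale instead of copying state back into it.
    __setobj__(::Marshal.load(@checkpoint))
    @checkpoint = nil
    self
  end
end

proxy = TransactionProxy.new(Parent.new)
proxy << Child.new
proxy.start_transaction
proxy << Child.new
proxy.abort_transaction

puts proxy.children.size                                   # 1
puts proxy.children[0].parent.equal?(proxy.__getobj__)     # true
```

The trade-off is the one raised in the reply above: any code that kept a direct reference to the old target, rather than to the proxy, is left holding a stale object, so everything has to go through the proxy all the time.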