Sunday, May 31, 2009

Serialization for D part 5 of N

I figured out why the hard-link stuff wasn't working as I was expecting; I had the annotation on the referring type rather than the referenced type. Fixing that wasn't much of a problem and cleaned up some other code in the process.

Well, with that, I think I'm done with all the easy parts.

The next bit I'm planning on tackling is 3rd party types. That is, how to serialize types that you don't control and can't add stuff to.

I don't really like any of the options I've come up with but I think I've got something that will work. If anyone has any better ideas, I'm interested in hearing them.

The Compile Time Option


It would be nice to make it just a call to a function like Serialize(value,sink); but that only works as long as all the overloads are in the same module.


module t1;
int Baz(int){return 1;}


module t2;
int Baz(float){return 2;}


import t1;
import t2;
import std.stdio;

void main() { writef("%d, %d\n", Baz(cast(int)0), Baz(cast(float)0)); }


m.d(5): Error: t1.Baz(int) at t1.d(2) conflicts with t2.Baz(float) at t2.d(2)
m.d(5): Error: undefined identifier Baz

Several other ideas also run afoul of this issue, forcing me to conclude that there is no way to cooperatively resolve overloads at compile time. That is, at some single point in the code, the user needs to fully state every source of overloading. I really don't want this.

The Run Time Option


The next thought is to use run time lookup (an AA keyed on T.stringof or T.mangleof) to locate the processing functions. I already use this to extract derived types. In that case I at least know that everything in question is an Object, but for this case I don't. Another thing I don't like here is that I either need to do unsafe casts on function pointers, data pointers or return types. I think I can make sure this is valid if all access goes through template accessors, but I'd rather not have to. And finally, this option is too hard to test because you can't be sure it's right until you exercise every type, including those in third party libs that might change on you.
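For illustration, here is a minimal sketch of what that run time lookup might look like, written against current D2 syntax. The names RawSerializer, registry, registerSerializer, serialize and serInt are all made up for this example and are not part of the library. It shows the unsafe function pointer cast hidden behind template accessors, and why a missing entry only shows up when that type is actually exercised.

```d
import std.stdio;

// Hypothetical type-erased handler: receives a pointer to the value.
alias RawSerializer = void function(const(void)* value);

// Hypothetical registry, keyed on the mangled type name.
RawSerializer[string] registry;

// Template accessor for registration. The cast is unsafe in general;
// it is only valid because serialize!T is the sole caller for this key
// and always passes a pointer to a T.
void registerSerializer(T)(void function(const(T)*) fn)
{
    registry[T.mangleof] = cast(RawSerializer) fn;
}

// Template accessor for lookup. Returns false when nothing is registered
// for T, which is only discovered at run time.
bool serialize(T)(ref const T value)
{
    auto fn = T.mangleof in registry;
    if (fn is null)
        return false;
    (*fn)(&value);
    return true;
}

// Example handler for int.
void serInt(const(int)* v)
{
    writefln("int:%d", *v);
}

void main()
{
    registerSerializer!int(&serInt);

    int x = 42;
    assert(serialize(x));   // a handler for int was registered

    double d = 1.5;
    assert(!serialize(d));  // nothing registered for double
}
```

The double case is the testing problem in miniature: nothing complains until some code path actually tries to serialize that type.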

The Boot Time Option


The option I think I'll go with sort of fakes the run time option but with better compile time support. The plan is to have a ThirdPartyAccess(T) template that defines global function pointer variables for serializing and deserializing. These function pointers are accessed freely at run time to do the work. As with the derived type setup, they will be populated at program startup by a static constructor that is generated when the user defines which functions to use for a given type:


module TheirTypes;

import SomeTypeLib;

void SerSomeType(SomeType, Sink!(char)) { ... }
SomeType DeserSomeType(Source!(char)) { ... }

mixin WorkWith!(SomeType, SerSomeType, DeserSomeType);

I still don't like this because I can't figure out a clean way to be sure each and every function pointer is populated exactly once. The best I have come up with is a -debug switch that turns on check code in a static destructor to verify that all the pointers are non-null. This has the odd side effect of making it a good idea to put in a quick start-n-stop feature to exercise this code. At least, as with static constructor ordering, this will fail reliably when things are wrong.
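To make the scheme above a bit more concrete, here is a rough single-file sketch in current D2 syntax. It is only a guess at the shape of the thing, not the actual implementation: SomeType is a stand-in struct, a plain string stands in for Sink!(char) and Source!(char), and the bodies of ThirdPartyAccess and WorkWith are assumptions. The assert in the static constructor catches a double registration; it does not address the "never registered" case discussed above.

```d
import std.conv : to;

// Stand-in for a third party type we can't modify (hypothetical).
struct SomeType { int id; }

// One pair of global function pointers per type T.
// A string plays the role of both sink and source in this sketch.
template ThirdPartyAccess(T)
{
    void function(T, ref string) ser;
    T function(string) deser;
}

// Mixing this in generates a static constructor that fills in the
// pointers for T at program startup.
mixin template WorkWith(T, alias Ser, alias Deser)
{
    static this()
    {
        // Catches a second registration for the same type.
        assert(ThirdPartyAccess!(T).ser is null, "type registered twice");
        ThirdPartyAccess!(T).ser = &Ser;
        ThirdPartyAccess!(T).deser = &Deser;
    }
}

// User supplied functions for the third party type.
void serSomeType(SomeType v, ref string sink) { sink ~= to!string(v.id); }
SomeType deserSomeType(string src) { return SomeType(to!int(src)); }

mixin WorkWith!(SomeType, serSomeType, deserSomeType);

void main()
{
    // By the time main runs, the static constructor has populated the pointers.
    string buf;
    ThirdPartyAccess!(SomeType).ser(SomeType(7), buf);
    assert(buf == "7");
    assert(ThirdPartyAccess!(SomeType).deser(buf).id == 7);
}
```

Anything that reads ThirdPartyAccess!(T).ser for a type nobody mixed WorkWith in for would get a null pointer, which is exactly the hole the -debug check is meant to cover.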

The Link Time Option


Now, I think I can make the boot time option work but I don't like it that much better than the run time option so I'll sketch out a feature I'd like to have for this use case. For lack of a better name I'll call it extern templates. The idea is stolen from C's extern global variable. In effect, let the user use whatever variable they want, define them wherever they want and let the linker sort it all out:


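module a;
import std.stdio;

// Proposed semantics: each instantiation only declares V; the definition lives elsewhere.
template Foo(T) { extern int V; }

void SomeFn()
{
    writef("%d, %d\n", Foo!(int).V, Foo!(float).V); // prints "5, 6"
}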



module m1;
import a;

Foo!(int).V = 5;


module m2;
import a;

Foo!(float).V = 6;

In the example code, the template allows references to a variable, and the code in m1 and m2 actually provides the definitions. When compiled, m1.obj and m2.obj would contain the symbols and a.obj would only contain external references to them. I'm sure there are some interesting corner cases that make this not so nice (and the definition syntax is downright ugly) but it sure would help in some cases.
