Sunday, May 31, 2009

Serialization for D part 5 of N

I figured out why the hard-link stuff wasn't working as I was expecting; I had the annotation on the referring type rather than the referenced type. Fixing that wasn't much of a problem, and in the process it cleaned up some other code.

Well, with that, I think I'm done with all the easy parts.

The next bit I'm planning on tackling is 3rd party types. That is, how to serialize types that you don't control and can't add code to.

I don't really like any of the options I've come up with, but I think I've got something that will work. If anyone has any better ideas, I'm interested in hearing them.

The Compile Time Option


It would be nice to make it just a call to a function like Serialize(value, sink); but that only works as long as all the overloads are in the same module.


module t1;
int Baz(int){return 1;}


module t2;
int Baz(float){return 2;}


import t1;
import t2;
import std.stdio;

void main() { writef("%d, %d\n", Baz(cast(int)0), Baz(cast(float)0)); }


m.d(5): Error: t1.Baz(int) at t1.d(2) conflicts with t2.Baz(float) at t2.d(2)
m.d(5): Error: undefined identifier Baz

Several other ideas also fall afoul of this issue, forcing me to conclude that there is no way to cooperatively resolve overloads at compile time. That is, at some single point in the code, the user needs to fully state every source of overloading. I really don't want that.

The Run Time Option


The next thought is to use run time lookup (an AA keyed on T.stringof or T.mangleof) to locate the processing functions. I already use this to extract derived types. In that case I at least know that everything in question is an Object, but for this case I don't. Another issue I don't like here is that I either need to do unsafe casts on function pointers, data pointers or return types. I think I can make sure this is valid if all access is through template accessors, but I'd rather not have to. And finally, this option is too hard to test, because you can't be sure it's right until you exercise every type, including those in third party libs that might change on you.
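To make the objection concrete, here is a minimal sketch of what that run time lookup would look like (the registry scheme and all names are mine, not actual library code; note the unsafe casts):

```d
import std.stdio;
import std.string;

// The registry holds untyped function pointers keyed by the
// type's .mangleof string.
void*[char[]] registry;

char[] serInt(int v)     { return format("<int>%d</int>", v); }
char[] serFloat(float v) { return format("<float>%s</float>", v); }

char[] serialize(T)(T value)
{
    // The unsafe cast in question: nothing checks that the
    // stored pointer really has this signature.
    auto fn = cast(char[] function(T)) registry[T.mangleof];
    return fn(value);
}

void main()
{
    registry[int.mangleof]   = cast(void*) &serInt;
    registry[float.mangleof] = cast(void*) &serFloat;
    writef("%s\n", serialize(42)); // prints "<int>42</int>"
}
```

If the entry for a type was registered with the wrong signature, the cast in serialize silently produces garbage, which is exactly the testing problem described above.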

The Boot Time Option


The option I think I'll go with sort of fakes the run time option but with better compile time support. The way I'm planning on working this is to have a ThirdPartyAccess(T) template that defines global function pointer variables for serializing and deserializing. These function pointers are accessed freely at run time to do the work. As with the derived type setup, they will be populated at program startup by static constructors generated when the user defines what function to use for a given type:


module TheirTypes;

import SomeTypeLib;

void SerSomeType(SomeType v, Sink!(char) sink) { ... }
SomeType DeserSomeType(Source!(char) src) { ... }

mixin WorkWith!(SomeType, SerSomeType, DeserSomeType);

I still don't like this because I can't figure out a clean way to be sure each and every function pointer is populated exactly once. The best I have come up with is a -debug switch that turns on check code in a static destructor to verify all the pointers are non-null. This has the odd side effect of making it a good idea to put in a quick start-n-stop feature to exercise this code. At least, with static constructor ordering, this will fail reliably when things are wrong.
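A minimal, self-contained sketch of the shape I have in mind (ThirdPartyAccess and WorkWith match the names above, but the simplified signatures and everything else here are illustrative only):

```d
import std.stdio;
import std.string;

// Global function-pointer slot for a given type, filled in
// at program start-up.
template ThirdPartyAccess(T)
{
    char[] function(T) serializeFn;
}

// Mixing this in generates a static constructor that fills
// the slot for T when the program starts.
template WorkWith(T, alias Ser)
{
    static this() { ThirdPartyAccess!(T).serializeFn = &Ser; }
}

char[] SerSomeType(int v) { return format("<int>%d</int>", v); }
mixin WorkWith!(int, SerSomeType);

void main()
{
    // By the time main runs, the pointer has been populated.
    writef("%s\n", ThirdPartyAccess!(int).serializeFn(42));
}
```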

The Link Time Option


Now, I think I can make the boot time option work, but I don't like it that much better than the run time option, so I'll sketch out a feature I'd like to have for this use case. For lack of a better name I'll call it extern templates. The idea is stolen from C's extern global variables. In effect, let the user use whatever variables they want, define them wherever they want, and let the linker sort it all out:


module a;
template Foo(T) { extern int V; }

void SomeFn()
{
writef("%d, %d\n", Foo!(int).V, Foo!(float).V); // prints "5, 6"
}



module m1;
import a;

Foo!(int).V = 5;


module m2;
import a;

Foo!(float).V = 6;

In the example code, the template allows references to a variable and the code in m1 and m2 actually provides the definition. When compiled, m1.obj and m2.obj would contain the symbols and a.obj would only contain external references to them. I'm sure there are some interesting corner cases that make this not so nice (and the definition syntax is downright ugly) but it sure would help in some cases.

Thursday, May 28, 2009

Oh, Man I want one of these!!

This "in house" print on demand system prints books faster than some places can make you coffee (and they call it the "Espresso machine")

Wednesday, May 27, 2009

Serialization for D part 4 of N

There isn't much added this time. The biggest thing is support for repeated objects (think hard links). Aside from that, it's mostly cleanup and renaming (breaking changes, sorry) and a few tweaks to the Source/Sink interfaces.

One of these days, I really need to get around to improving the comments...

source link

Monday, May 25, 2009

Serialization for D part 3 of N: with Code!

I have a reasonably working version posted here. It's very much alpha and I haven't documented it much at all.

What works is user defined types (structs and classes) where every member is to be serialized. The one fun case that works is derived classes (that wasn't as hard as I thought it would be) as long as the base class is also serializable.

A quick tutorial



First, invocation:

void main()
{
    // something to work with
    SS ss = SS(42, 2.717, new DS(1));

    // A data sink
    Si c = new Si();

    //------- Run the serializer
    marshal(c, ss);

    // show the data
    writef("%s\n", c.get());

    // make a Source from that data
    So s = new So(c.get());

    //------- deserialize stuff.
    SS sd = SS.DemarshalMe(s);

    // show the results
    writef("%s\n", sd);
}


To serialize something, you need a data sink to dump it into, and to deserialize something you need a source to pull from. In this example, I use a basic sink that just stores into a buffer and a source that pulls from one.

Next, the data types.

/// A struct to serialize
struct SS
{
mixin Serializable!();

int foo;
float bar;
CS cs;

char[] toString() { return format("<%s,%s,%s>", this.tupleof); }
}

/// a base class to serialize
class CS
{
mixin Serializable!();

int i = 5; float j = 3.1415926;

this(){}
this(int) { i++; j++; }

char[] toString() { return format("<%s,%s>", this.tupleof); }
}

/// a derived class to serialize
class DS : CS
{
mixin Serializable!();

real f = 2.717;

this(){}
this(int) { f++; super(0); }

char[] toString()
{ return format("<%s,%s>", super.toString()[1..$-1], this.tupleof); }
}

Friday, May 22, 2009

Serialization for D part 2 of N

Well I did a quick hack and got a really basic version going. It only handles XML (with some ugly formatting restrictions) as the format and is restricted to structs, classes that are used like structs, and basic numeric types.

The up side is that the API/usage is just like I was expecting and I didn't run into any problems. The down side is that I was kinda expecting it to take longer than this (all hyped up and nothing to do).

The next bit I'm going to work on is loosening up the XML formatting, but that should be easy. After that I'm going to dive into making polymorphism work.
On that note, I've thought of a few more issues I will need to deal with:

Public and Private Base Class Members

The off the cuff solution is to call base.Serialize either before or after dealing with all the direct members. But that doesn't address how to figure out what the base class members are.
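A quick sketch of how that off-the-cuff approach could look (my own illustration, not the library's code; it relies on the fact that this.tupleof covers only a class's direct members, which is exactly why the super call is needed):

```d
import std.string;

// Each class walks only its direct members and delegates the
// rest to the base class.
class Base
{
    int a = 1;
    char[] Serialize()
    {
        char[] r;
        foreach (m; this.tupleof)   // Base's direct members only
            r ~= format("%s ", m);
        return r;
    }
}

class Derived : Base
{
    int b = 2;
    char[] Serialize()
    {
        char[] r = super.Serialize();  // base members first
        foreach (m; this.tupleof)      // then the direct members
            r ~= format("%s ", m);
        return r;
    }
}

void main()
{
    auto d = new Derived;
    d.Serialize();   // yields "1 2 "
}
```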

Template Types.

The primary problem here will be making sure the correct type is deserialized, a.k.a. error checking. For internal nodes of struct types this shouldn't be a problem because there it is all static, so if it's correct at the top level, it's correct throughout.

Template Types + Polymorphism

This adds another wrinkle to that problem. The wrinkle pops up where a template class derives from a non template class. This can result in trying to deserialize a type that wasn't instantiated in the code doing the deserialization.

Unions

This one is going to be nasty. I think I'll just ignore it for now. Later some sort of callback might make it work. As I said, not now.

Reading between the lines.

I just ran across this news article. The upshot of it is that a 13 year old kid in Minnesota who is suffering from Hodgkin's lymphoma attempted to refuse traditional treatment for it (possibly for religious reasons), apparently with the approval of his mother. Somehow it ended up with a court ruling that he could not refuse treatment because his doctors said he will die without it. After that, he and his mother fled for Mexico.

Now the fleeing the country bit might be illegal depending on the custody situation, but who cares. The biggest piece of the story, in my opinion, is that a court decided that it has the authority to overrule someone on what happens to their own body! (And this is barely even mentioned until halfway down the page.)

Now I'll grant that there might be more to it; for instance: dad says do the treatment, mom says do what the kid wants, and the court goes with dad. Even if that is true, from the way it's written it seems that the court ruled based on what it thought was best for the kid, not based on who it thought should decide. And from the way it's written, it seems the author thinks that is just fine.

Wednesday, May 20, 2009

Finally, some rational thought in mainstream media!

This guy, who seems to know what he's talking about most of the time, got quoted by CNN about post 9/11 airport security.

As he puts it, the mentality seems to be "cover your ass". Nobody in a position to do anything is willing to say "no" for fear that something will happen and they won't be able to say they did everything they could.

This sort of reminds me (by contrast) of the time I was talking to a philosophy master's student, and after a bit of back and forth I asked him the question "Would you rather have security measures that make you feel safe, that you know are utterly useless, or ones that don't make you feel safe and actually make you safer?" To my utter shock, he asked for the first.

Serialization for D part 1.1 of N: comments

This is as much notes to myself as anything:


  • Thanks to Chad for pointing out that I need to consider what license I will be using. (I'm planning on being very permissive there.)
  • Several other people have pointed me at their own work. I guess I'll have to check what licenses they use. OTOH, while I really want to make a good usable library, this is just as much a "let's have fun with templates" project.
  • Tim Matthews pointed out another source/sink I will want to consider: databases. Being able to use a byte array would cover that one I think.

Tuesday, May 19, 2009

Serialization for D part 1 of N

I'm planning on starting work on a template based serialization library for D. The objective is to be able to, with a single line or two of code per type, generate the functions needed to stuff complex types into a file, network socket, byte buffer or whatever. At this point I'm still puzzling out what the interface will be.
  1. What will using this library with a type look like?
  2. What will invoking the code look like?
  3. What format will be generated?
  4. What will the source or destination for the data look like?
  5. How to work with polymorphic types?
  6. What types of limitations will it impose?
Taking each of these in the order I thought of them:

Usage

Ideally I want using the library to be as simple as possible; a single line of code would be best. For built in types this is easy, as the system should "just work". User defined types will be the more interesting case. I expect I will want to build several different solutions for different cases. For example, in the case where everything should just be slurped up or spit out, it should be as simple as a no-argument mixin:
class C
{
    int i;
    someClass c;
    mixin Serializable!();
}
Other cases, for instance where some values need to be omitted or where proper use of constructors needs to be observed, could use other approaches.
  • Process only the members on a given list
  • Process only the members not on a given list.
  • Deserialize by calling a constructor on the given values
  • Tag nodes on the way out and resolve references so that multiply referenced objects come through correctly resolved.
  • Various combinations of the above

One other bit that might work is a module level "make everything serializable" template that does default serialization of every type in the module. The problem with this is that to make it work, the system as a whole will need to deal with both external and internal definitions of serialization functions. On the other hand, this will be needed anyway for built in types and maybe 3rd party libraries.

One minor point is how to structure everything. Because there is a lot of redundancy in the different combinations, some way to reuse code is needed. Something like the recursive self mixin trick might do well.

Invocation

The hard part here is the naming (one of the two most difficult tasks in programming). For now I'm just going to go with the simple solution:

MyType myType;
myType = MyType.Deserialize(source);
myType.Serialize(sink);
If anyone has a better idea what to name them I'm open to suggestions.

The other side is how to structure adding serialization for external types. Built in types are easy: a simple template function that gets called if T.Serialize doesn't exist (or the other way around). The harder case is 3rd party types, as (last I checked) getting templates to overload from several sources is hard.
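The easy half of that could be sketched like this (a simplified illustration under my own names; the real dispatch would pass a sink rather than return a string):

```d
import std.stdio;
import std.string;

// Use the member function when the type provides one, otherwise
// fall back to a generic path for built in types.
char[] serialize(T)(T value)
{
    static if (is(typeof(value.Serialize())))
        return value.Serialize();
    else
        return format("%s", value);
}

struct S
{
    int i;
    char[] Serialize() { return format("<S>%d</S>", i); }
}

void main()
{
    S s; s.i = 7;
    writef("%s\n", serialize(42)); // built in path, prints "42"
    writef("%s\n", serialize(s));  // member function path
}
```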

Format

For this there are a few interesting classes of options:

  1. Machine specific binary
  2. Machine neutral binary
  3. Text: XML/JSON/YAML/CSV

For starters, I'm going to do just XML as I can read it so debugging will be easier. After that, I might go for a machine specific binary format and try to abstract out the rendering parts to reduce redundancy.

Data Source/Sink

This is one point I'm not sure on. What should be the primitive source and sink? A Phobos or Tango stream would be one option, but it would be nice to be Phobos/Tango agnostic. Also, might other systems be desirable? I think I'll try to keep the implementation dependent code for this loosely coupled and well contained so switching later is easier.

Derived Types

One major problem will be how to handle polymorphic instances. The best I can think of is to have them embed a type tag into the output stream and then use some sort of tag-to-delegate map to deserialize the stream. This has some interesting implications for the operation; for instance with XML, it indicates that the outermost tag of a type should, at least in part, be serialized by the child type but deserialized by the parent type. One option is, given these types

 class C { }
class D : C { int i; }
class E : C { float x;}

struct S { C ofTypeC; }

an instance of type S is serialized as:

 <S><ofTypeC typeis="D"><i>42</i></ofTypeC></S>

This would have S.Serialize output up to the attribute and then hand off via a virtual Serialize call to the member. On the other hand, S.Deserialize would consume the full ofTypeC tag and then use the attribute to call the correct Deserialize function. This clearly ends up adding some annoying overhead, so some way to detect (or indicate) that no derived types will be used (and a check to make sure that is so) would be nice.
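The tag-to-delegate map itself could look something like this (an illustrative sketch with made-up names; a real version would register each class from its own module rather than in one spot):

```d
import std.stdio;

class C { char[] tag() { return "C"; } }
class D : C { char[] tag() { return "D"; } }

// Map from the embedded type tag to a factory for that type.
C function()[char[]] factories;

static this()
{
    factories["C"] = function C() { return new C; };
    factories["D"] = function C() { return new D; };
}

void main()
{
    // As if the stream had said typeis="D": read the tag,
    // look up the factory, build the right derived type.
    C obj = factories["D"]();
    writef("%s\n", obj.tag()); // prints "D"
}
```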

Limitations

One thing I'm not going to handle is matching references for aliasing and slicing of arrays. Just because something is a slice of something else on one side doesn't make it so on the other.

Aside from that, I don't know what limitations I'm going to impose, but I'm sure I'll find out as I go along.

Thursday, May 14, 2009

Automatic Discretization of a 2D Geometric Obstacle Map

The following is a report I put together for a Technical writing class about some work I did for an "AI for mechanical engineer" class.

Abstract:

The problem of routing pipes and wires around real world obstructions falls under the category of path finding and as such tends to be a computationally complex problem. It can become even more complex than the classical single path problem if many paths need to be found and the choices for them interact, for instance through space constraints, electronic interference or crosstalk. One approach to reducing the computational load is to reduce the geometric map to an abstract discrete graph and select and evaluate options categorically. This report considers and evaluates techniques for doing this conversion.

Introduction

A number of system layout problems can be broken down into two steps: positioning the major components, and routing the connections between the components. In order for the results from the first step to be automatically evaluated, the second has to be automated. This problem is known as the path finding problem and is known to be, in general, of exponential complexity. One way to make this problem more tractable is to convert the obstacle map into a much simpler mathematical graph, resulting in a much smaller exponential complexity problem. This allows more options to be considered and more complex problems to be handled.

I have considered two solutions to this problem and, after producing partial solutions to both, determined that one was significantly better and continued that solution to a fully working proof of concept.

The final results only demonstrate a concrete implementation of the concept showing that it is reasonable to compute and that it generates reasonable results. The implementation doesn’t attempt to be more than reasonably efficient.

Discussion

Options Considered

I considered two basic operating principles. The first works by finding local maxima of the closest-obstacle function, that is, locating positions that can't be moved away from any nearby obstacle without moving closer to another one. Intuitively this is like inflating a balloon until it fills some region and calling its center a node. My implementation located these positions using simple hill climbing, but particle swarm optimization or simulated annealing might provide better performance.

The second solution worked by locating "pinch points" and treating them as region boundaries. This can be thought of as defining rooms by picking what will be counted as a door. The pinch points (doors) were identified by selecting from the lines of closest approach between pairs of obstacles.

General Results

The balloon solution was simpler to implement and, based on preliminary results, tended to generate reasonable results. To generate reliable results over the full map, however, it required many starting locations. This caused it to run very slowly, hampering its usefulness. This might be overcome if some type of dynamic density function could be used to select more starting locations in smaller passages and fewer in open areas.
The pinch point solution was more complicated to implement but also generated reasonable results and ran many times faster. Based on these results, I chose to finish implementing only this solution.

Restrictions
The final algorithm assumes an input map consisting of non-overlapping, convex polygons. The implementation imposes a few more restrictions:


  1. The line of closest approach between any two polygons must not share an endpoint with the line for any other pair of polygons.
  2. The map must be bounded by a known set of polygons, in a known order.


In practice these might be unnecessarily restrictive; for instance, some concave polygons might work and some cases of shared line endpoints will work, but the exact conditions where this is allowed are harder to define. Also, improvements in the implementation could allow further relaxation of these requirements.

Methods



Figure 1: The example input map

I tested the program on an example map that was selected to provide a variety of features and scales. These tests helped in the development of the program by giving concrete results and by revealing a number of flaws (that were fixed) and limitations (that need to be avoided) during the development process.



Figure 2: Minimal separation lines

The algorithm operates in several stages. The first stage is fed the initial map and finds the shortest line between each pair of obstacles. Only lines that don't cross inside another obstacle are kept.


Figure 3: Region boundary selections resulting in a planar graph.

This set is then filtered to remove crossing lines, with a preference for keeping the shortest lines. This results in a planar graph with the obstacles as nodes and the lines as edges.
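The filtering step just described can be sketched as follows (a simplified illustration, not the project's actual code; the greedy shortest-first strategy is one way to realize the "preference for keeping the shortest lines"):

```d
struct Pt  { double x, y; }
struct Seg { Pt a, b; }

// 2D cross product of (b-a) and (c-a); the sign tells which
// side of line a-b the point c lies on.
double cross(Pt a, Pt b, Pt c)
{ return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x); }

// True if the two segments properly cross each other.
bool crosses(Seg s, Seg t)
{
    return cross(s.a, s.b, t.a) * cross(s.a, s.b, t.b) < 0
        && cross(t.a, t.b, s.a) * cross(t.a, t.b, s.b) < 0;
}

double len2(Seg s)
{
    double dx = s.b.x - s.a.x, dy = s.b.y - s.a.y;
    return dx * dx + dy * dy;
}

// Greedy filter: consider the shortest links first and keep
// each one only if it crosses nothing already kept.
Seg[] filterLinks(Seg[] links)
{
    // selection sort by length, shortest first (simple, not fast)
    for (int i = 0; i < links.length; i++)
        for (int j = i + 1; j < links.length; j++)
            if (len2(links[j]) < len2(links[i]))
            { Seg t = links[i]; links[i] = links[j]; links[j] = t; }

    Seg[] kept;
    foreach (l; links)
    {
        bool ok = true;
        foreach (k; kept)
            if (crosses(l, k)) { ok = false; break; }
        if (ok)
            kept ~= l;
    }
    return kept;
}
```

The result is crossing-free by construction, which is what makes the later wall following stage possible.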


Figure 4: Map with node selection

The interior regions are then identified from this. My implementation does this using a wall following technique. Because the connecting edges don't all connect to a single point, this is somewhat complicated and somewhat of a kluge. To ensure that all interior regions are found, the edges are scanned and each one is traversed exactly once in each direction (this can also be thought of as on both sides), skipping over cases where the edge has been traversed from a previous starting point. This requires either that the exterior edges all be marked from the start as traversed or that the wall following method correctly follow the outside as well.


Figure 5: Final graph with nodes and connections

Once the interior regions are identified, the connectivity graph can be found by connecting any two nodes that are bounded by the same link.

Complexity Analysis


  1. Finding all links and filtering on overlap with polygons: O(n^3), maybe closer to O(n^2)
     The number of links is O(n^2) and each needs to be checked against each polygon, giving the upper bound. Better data structures could reduce the cost of the intersection test, but not to a constant cost.

  2. Filtering links for overlap: O(n^3), maybe closer to O(n^2)
     Each of the O(n^2) links needs to be checked against each previously accepted link. Given that the result is a planar graph, only O(n) links can be accepted, giving the upper bound. Again, with better data structures this might go even lower, but not to a constant.

  3. Region selection: O(n)
     Each of the O(n) links will be traversed twice, and each traversal has a cost (for finding the next link) proportional to the O(1) average number of edges per node.
  4. Connect the regions: O(n)
     This can be done in constant time per link.


In the worst case, the solution is O(n^3), though this might drop to nearer O(n^2). Given that this is a precursor to an O(b^d) operation and is cutting back on both b and d, even the worst case result is a clear win for any reasonably sized problem.

Conclusion

My work has shown that the basic algorithm of segmenting the map along lines of closest approach is sound. However, it also reveals some weaknesses in my implementation. The largest weakness is that the wall following solution is not robust enough to deal with the edges of the map. As this part was not my primary interest, I didn't put much work into it. For a production implementation, it would be worthwhile to spend some time exploring better options.

Wednesday, May 13, 2009

Hm... this sounds familiar

Jon Skeet: Coding Blog -- Language proliferation

Sounds like MS is creating support for a large number of languages on .NET. They are still GPLs, but it's heading toward where I think they should go.

I think I started with the wrong label on this one; what I'd like to see is not a pile of domain specific languages (DSLs) but special purpose languages (SPLs).

Friday, May 8, 2009

Turnabout's fair play

Quoting from a letter to Scientific American by Oscar Estévez about Creationism, Intelligent Design (ID) and the theory of evolution:

The questions they [students] should ask are, Does ID make predictions? And can those predictions be tested? If the answers to both are negative, they themselves can conclude that ID is "only a nonscientific theory."


I'll ask the same in return:

Does the theory of evolution make predictions? And can those predictions be tested? If the answers to both are negative, you can conclude that the theory of evolution is "only a nonscientific theory."

I challenge Mr. Estévez to point to a prediction made by the theory of evolution that can be tested and where the test will show it to be either true or false.

Note that there are a number of predictions made by the theory of evolution that can be shown to be true but can't be shown to be false. For example, failing to demonstrate the creation of new species can always be discounted on the grounds of "We just didn't run the experiment long enough" or "we haven't found the right fossil yet".

Wednesday, May 6, 2009

Pilotless airplanes and security implications.

This article brings up some thoughts from Israel about making airliners without a cockpit. The tone is a bit condescending (I don't think the author likes the idea, and I'd be hesitant to fly on one myself) but down at the bottom is a comment that is downright caustic, starting with:

I hope those science engineers had figured out the motives of Al Queda for taking control of one of these unmanned planes.


and continuing with concerns about someone getting remote access to the airplane and using it for mischief. Now, that is one thing I'm not too concerned about. For one, the flight program on an unmanned airliner would be one of the most carefully designed programs ever written. If you can count on anything, it would be that it can't be overridden remotely. As for other attacks, in this day and age, if you can access the avionics software of a plane, I'll bet you can lock out all the cockpit controls and fly it into whatever you want, so I don't think there is any more risk cropping up here than we have right now. Frankly, I suspect physical takeover of a manned plane is more likely.