FOray
FOray Development: FOTree Design Notes
Timing

There are two major epochs in FOTree processing: 1) parse time, and 2) use time. One of the biggest design decisions is how to divide the work between these two. FOray has chosen to essentially defer as much work as possible until use time (late binding), for the following reasons:
Some of these reasons are based more on gut feel than on hard facts or examples, and they may frankly be wrong. The decision to do late binding is not written in stone, but reflects our best thinking to date. A comparison between FOray's approach and some of the others is probably in order.

One drawback to late binding is that it may duplicate some processing, by forcing values to be computed more often than with early binding. We think that in most cases the effect is insignificant, especially when weighed against the tradeoff of extra memory consumption (which also uses processing time to allocate and garbage-collect). However, this bears more research. Also, in some cases it may make sense to cache the result of a use-time computation rather than compute the value repeatedly. The computation of table column widths is an example where FOray currently uses this approach. It may be possible to make a more general solution available as an option in the future, but that is a low priority at the moment.

Data Structure

If you are going to modify (as opposed to simply use) FOray FOTree, it is very important that you understand how the data is structured. Although FOTree attempts to follow the XSL-FO standard in the way that its classes are organized, the standard itself is relatively complex. There is a significant amount of detail, and there are several axes that must be handled in order to store and retrieve the data in a predictable manner. There are probably many ways to handle the job accurately. Here is an outline of the hierarchy of data in the FOray FOTree:

    StreamRenderer (implements FOTreeControl)
    |
    |--FOTreeBuilder
    |
    |--Namespace
    |  |
    |  |--Classes for converting elements
    |     and attributes into FO objects
    |     and properties
    |
    |--Root
       |
       |--PageSequence
          |
          |--etc.

The structure of the FO objects and the classes that represent them is fairly easy to follow, but the storage and retrieval of the property values is much more complex.
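To make the parse-time/use-time split discussed under Timing more concrete, here is a minimal sketch of late binding with an optional cache. It is not FOray code: the class `LateBoundObject`, its method names, and the choice of millipoints as the resolved unit are all assumptions made for illustration. The raw text is stored untouched at parse time, and the conversion runs only when a value is requested.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for an FO object. It is not FOray's FObj; it only
 * illustrates storing raw values at parse time and resolving them at use time.
 */
class LateBoundObject {
    /** Raw attribute text, stored untouched at parse time. */
    private final Map<String, String> rawValues = new HashMap<>();
    /** Optional cache of use-time results, keyed by property name. */
    private final Map<String, Integer> cache = new HashMap<>();
    /** Counts how often the raw text was actually converted. */
    private int computations = 0;

    /** Parse time: remember the raw text and drop any stale cached result. */
    void setRaw(String name, String rawValue) {
        rawValues.put(name, rawValue);
        cache.remove(name);
    }

    /** Use time, uncached: converts a raw "pt" length to millipoints on every call. */
    int lengthInMillipoints(String name) {
        String raw = rawValues.get(name);
        if (raw == null || !raw.endsWith("pt")) {
            throw new IllegalArgumentException("No 'pt' length stored for " + name);
        }
        computations++;
        double points = Double.parseDouble(raw.substring(0, raw.length() - 2));
        return (int) Math.round(points * 1000.0);
    }

    /** Use time, cached: converts at most once until the raw value changes. */
    int cachedLengthInMillipoints(String name) {
        return cache.computeIfAbsent(name, this::lengthInMillipoints);
    }

    /** Number of conversions performed so far. */
    int computations() {
        return computations;
    }
}
```

Calling `lengthInMillipoints` twice converts the text twice, which is the duplicated work that late binding risks. Calling `cachedLengthInMillipoints` twice converts it once, at the cost of one extra map entry per cached value. That is the same trade the notes describe for table column widths.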
The first principle to grasp is that although the API exposes only methods that return refined, computed traits, the data is stored internally as raw property values. It is of primary importance to distinguish between the following three concepts:
The abstract Java superclass Property is the container object for each FO property. A collection of Property instances is stored in the PropertyList, which is attached to a PropertyManager, which is in turn attached to the FObj instance. Each Property instance stores its parent FObj, its type (stored as an integral value) and its value. The integral type is important for at least two reasons. First, there is no one-to-one relationship between the Property subclasses and the property types. The subclasses are used more for processing and programming convenience than for distinguishing between property types. Second, some performance efficiencies can be achieved by avoiding casting and instanceof operations when an object is searching for a specific property. Each property can have exactly one value, which will be an instance of a PropertyValue subclass. PropertyValue instances can be:
One of the complexities of the standard is that certain functions and expressions are not designated in terms of the XSL-FO datatypes, but instead as Numeric. We have chosen to handle this complexity with a Java interface called Numeric, which provides certain methods that are suitable to such functions and expressions. For example, IntegerDT, NumberDT, and LengthDT all implement this interface.

Namespaces

Namespaces are handled through the Namespace abstract class. Subclasses of Namespace are responsible for converting elements and attributes in the namespace into instances of FObj and Property subclasses. There are some standard tools available to do that, but using these tools is not required.

Before parsing begins, Namespace instances are registered with the FOTreeBuilder. During parsing, the list of registered namespaces is consulted, and the appropriate Namespace instance is then called upon to do the actual conversion to FObj and Property instances.

FObj subclasses must identify which Namespace they belong to. For the standard namespaces, this does not require any extra memory, as the Namespace instance can easily be obtained from the Namespace registry. However, for non-standard namespaces (i.e. those that you might add), you may need to cache the Namespace instance in the FObj subclass instance itself.

Property instances do not explicitly know what namespace they are in. However, this information is implied in the propertyType variable. Although there is no formal mechanism or enforcement, the propertyType values are segregated by namespace. Here are the ranges assigned to each of the standard namespaces:

At the moment, namespace clashes are not a big issue at the attribute level. The only non-FO attribute that is supported is the "xml:lang" that is specified by the XSL-FO standard as an FO shorthand. However, we think that the infrastructure is robust enough to handle much more complexity within the scheme outlined above.
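The register-then-dispatch flow described above for namespaces can be sketched as follows. This is an illustration under stated assumptions, not FOray's API: `SketchNamespace`, `SketchTreeBuilder`, and their methods are hypothetical names, and a plain String stands in for the FObj that a real Namespace subclass would create.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical counterpart of the abstract Namespace class. */
abstract class SketchNamespace {
    private final String uri;

    SketchNamespace(String uri) {
        this.uri = uri;
    }

    String uri() {
        return uri;
    }

    /** Converts an element name into an object; a String stands in for an FObj here. */
    abstract String createObject(String localName);
}

/** Example subclass for the XSL-FO namespace. */
class FoSketchNamespace extends SketchNamespace {
    FoSketchNamespace() {
        super("http://www.w3.org/1999/XSL/Format");
    }

    @Override
    String createObject(String localName) {
        return "fo:" + localName;
    }
}

/** Hypothetical counterpart of the tree builder: holds the namespace registry. */
class SketchTreeBuilder {
    private final Map<String, SketchNamespace> registered = new HashMap<>();

    /** Called before parsing begins. */
    void register(SketchNamespace namespace) {
        registered.put(namespace.uri(), namespace);
    }

    /** Called during parsing: finds the owning namespace and delegates the conversion. */
    String startElement(String uri, String localName) {
        SketchNamespace namespace = registered.get(uri);
        if (namespace == null) {
            throw new IllegalArgumentException("Unregistered namespace: " + uri);
        }
        return namespace.createObject(localName);
    }
}
```

In this sketch the builder knows nothing about individual elements. Supporting an additional namespace means registering one more `SketchNamespace` subclass, which mirrors the extension point the notes describe for non-standard namespaces.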
The propertyType variable is a short, so there are 65,536 possibilities. Please never use values in the range between -100 and 0, or in the standard ranges above. Non-standard namespaces (i.e. those not directly supported by FOray) are encouraged to use negative values to avoid conflict with future standard namespaces that might be added to FOray. It is possible that enforcement of these ranges may be added in the future. This can easily be done by passing the Namespace instance to the Property constructor, and requiring the Namespace subclasses to report the range of propertyType values that they are claiming. These ranges can then be checked for conflict as they are registered.

Challenges in Creating an Independent FO Tree

The overriding challenge in creating an independent FO Tree is that fully resolving many FO Tree values depends on information from outside the raw FO Tree data. One area where this seems to be true is fonts. The FOray model wants FOTree not to depend on layout or rendering information, so that it can be reused in several render contexts. Since the availability of fonts may differ from one render context to another, font resolution should ideally be deferred until layout. However, some FOTree attributes depend on information from the resolved font. (The baseline and alignment properties in the Area Alignment property section are good examples.) There are several potential ways to handle this:
Validation

The order in which an FO object and its properties are created is important, especially when validation is considered. Here is the order in which the key events occur:
It may turn out to be useful to fire events at other places along the way. Note that the FObj.start() method runs in pre-traversal order and that the FObj.end() method runs in post-traversal order. From all of this, the optimal times for various validation tasks on an FObj instance are as follows:
fo:marker and fo:retrieve-marker

Essentially we are asked to graft a marker's content into a retrieve-marker location. This presents some interesting challenges. The brute-force approach is simply to copy the marker content each time it is needed by a new retrieve-marker. This interferes with some of FOray's goals, especially leanness of memory use and the ability to round-trip the FOTree. So we are left with these major challenges:
fo:marker inheritance

One solution considered is to lock the marker with a retrieve-marker instance, then release it when done. However, this is inefficient. The AreaTree must be involved: for every trait computation, it must search within itself to see whether it is in a marker-generated area, then get the retrieve-marker, then lock the marker, then remember to undo it all when done. The solution implemented is to pass something through the FOTree that tells it how to do the substitution when it is needed. The FOContext interface provides a method that can return the appropriate RetrieveMarker instance to use when a Marker instance is found in the FOTree. AreaNode implements this.

fo:marker line-breaking

Line-breaking presents a challenge, because it has its own abstraction of the data it needs. Some extra work was required for FOContext information to be passed through the line-breaking system, used within that system, and then passed on to the processes that create Area instances. The solution chosen was to add subinterfaces for LineText and LineNonText to the FOTree system (FOLineText and FOLineNonText). These subinterfaces have methods that can be used to wrap real data items up with their context information, so that the context-aware values are used by the line-breaking system, and can then be unwrapped on the other side for Area instantiation.

To Do

Known code deficiencies are recorded in the source code itself, and tagged with the string "TODO". General features that we would like to add include the following:
Resolved Issues

Compound properties can be created either with a short form or a complete form, or both. See Section 5.11 of the standard for an example using space-before. Coding either one is pretty straightforward, but handling the situation where both can occur in any order makes the code much more complex, requiring some method of keeping track of whether a component of the property was explicitly set, or whether an initial value was created. All of this can be avoided by ensuring that any short form is processed before any complete form, which can be accomplished by processing the attributes in alphabetical order. To do so, we have chosen the expedient of a virtual sort of the attributes before they are processed. The SAX-generated attribute list itself is untouched, but a separate integer array is created which holds the order in which the attribute list elements should be processed. The sort itself occurs in int[] FOTreeBuilder.sortAttributes(Attributes attlist).

Open Issues

This section contains issues that we have essentially had to bypass in order to keep moving, but which we did not want to lose track of. They represent known deficiencies in the current FOray FOTree implementation.
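As an aside on the virtual attribute sort described under Resolved Issues, the idea can be sketched as follows. This is not the body of FOTreeBuilder.sortAttributes, which is not reproduced in these notes. The class `AttributeOrder` is hypothetical, and sorting by qualified name with plain String comparison is an assumption. Only the shape of the signature (an `Attributes` list in, an `int[]` processing order out) comes from the text.

```java
import java.util.Arrays;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical helper illustrating a virtual (index-only) attribute sort. */
final class AttributeOrder {
    private AttributeOrder() {
    }

    /**
     * Returns the attribute indexes in alphabetical order of qualified name.
     * The attribute list itself is never modified.
     */
    static int[] sortAttributes(Attributes attlist) {
        Integer[] boxed = new Integer[attlist.getLength()];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Sort the indexes, comparing the names they point at.
        Arrays.sort(boxed, (a, b) -> attlist.getQName(a).compareTo(attlist.getQName(b)));
        int[] order = new int[boxed.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = boxed[i];
        }
        return order;
    }
}

/** Small demonstration; the attribute values are made up. */
class AttributeOrderDemo {
    public static void main(String[] args) {
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "space-before.optimum", "space-before.optimum", "CDATA", "12pt");
        atts.addAttribute("", "font-size", "font-size", "CDATA", "10pt");
        atts.addAttribute("", "space-before", "space-before", "CDATA", "6pt");
        for (int index : AttributeOrder.sortAttributes(atts)) {
            System.out.println(atts.getQName(index) + " = " + atts.getValue(index));
        }
    }
}
```

Because a string that is a prefix of another compares as smaller, the short form "space-before" always sorts ahead of the complete form "space-before.optimum". That ordering is the property the processing code relies on.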
Issues with the XSL-FO Standard

Remarks in this section apply to the XSL-FO 1.1 Working Draft.