Working on the ADVANCE Flow Engine for the
ADVANCE 7th Framework EU project we came across the need to have type comparison between XML schemas.
One reason is to allow the user to wire different but co- and contravariant input and output types instead of the classic type invariant approach. This co- and contravariance is a common property of most modern programming languages (e.g., Java 5+, C# 4.0+). Because the blocks communicate via XML, and the type of XML is described via XSD, we need a way to compare two XSDs: are they equal? Does A.XSD extend B.XSD?
The second reason is that we need to communicate with external web services and ourselves need to provide web service entry points. Having a co- and contravariant type defined instead of a classical fixed XSD should be a benefitial in the long term.
Therfore, our small group at
SZTAKI has figured out a way to think about and compare types described in XSD. See a prototype implementation at
the SourceForge site of the ADVANCE project.
I'll try to describe the idea for the schema comparison. Note that XSD has some support for restrictions and extensions on complexContent elements, but our approach does not rely on this kind of functionality, as many XSD type definitions are somewhat independently written. Our approach is based on the actual element-attribute-content structure of the schemas to be compared.
RelationsIn order to compare two things, we need a comparison method which may return the following results (see XRelation enum):
- NONE: the two things are not related
- EQUAL: the two things are the same
- EXTENDS: the first thing extends the second thing
- SUPER: the first thing is a super(class) of the second thing, or in other direction, the second thing extends the first thing (opposite as EXTENDS).
We will use these cases and count them at various levels to decide if a complicated type or capability (described later) corresponds to which relation after all.
Type is a set of CapabilitiesThe first abstraction we take is that we define an XML type (XType class) as a set of capabilities (XCapability class).
We should think about the XML elements, attributes and sometimes the content as a capabilities. This way, an element's type is the set of its attributes, child elements and content.
CapabilitiesCapabilities, like fields in a Java class can have name and a type, where type can be a simple type (int, string, etc.) or another complex type (and type are just a set of capabilities).
In addition, XML has the notion for cardinality, e.g. an element may occur in several forms (zero or once, once, zero or many times, once or many times, zero times), which property is also appended to the capability. Therefore, each capability is potentially a List of a simple or complex type: a monad (?!).
XSD to XTypeThe XSchema utility class features a simple XSD parser which turns the schema into an object graph of XType and XCapability objects. Of course, the XSD definition is very complicated and we are sure the simple parser does not cover all of its aspects in terms of where and how things can be described in XSD (in place, as a global type, a type among the node parents, etc.). Our parser currently expects schemas which descibe such XMLs, where an element may have attributes, a string-like content XOR other elements (e.g., mixed content is not handled).
Such XSD may be as follows:
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
<xs:element name="class1">
<xs:complexType>
<xs:sequence>
<xs:element name="field1" type="xs:string"/>
<xs:element name="field2" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
Comparing typesWe can define the type comparisons as basically a set theoretic contains-relation (with some special cases).
Type A extends Type B if
- Type B's capabilities are all present within in Type A's capabilities and
- each of those capability pairs, CA and CB, CA extends CB
If any of the capabilities has a SUPER relation, the two types are automatically considered not-correlated.
The super direction is just the test for Type B extends Type A.
The XType.compareTo(XType) method returns the relations above.
Comparing capabilitiesHow can we compare capabilities against each other? How do we now a capability CA is present in Type B?
The simplistic approach to compare the names of the capabilities (like field names), then compare their simple or complex types (recursively).
However, sometimes what we would think as capabilities are named differently in different locations: Just look at Java's size naming: array.length, String.length(), Collection.size(), getLength(), getSize(), size(), count() etc.
We can somewhat solve this problem by introducing aliases. If you look at the XName class, you'll find a name and a set of Strings which may be other aliases for the very same capability. If you are writing one of the XSDs, you may attach an <annotation> tag with a custom content which lists these aliases:
<element name='size' type='int'>
<annotation><advance:capability-aliases
xmlns:advance='http://www.advance-project.eu/schemas/xmltypesystem'>
length, count</advance:capability-aliases></annotation>
</element>
(As of writing this post, the XSchema parser does not look for such kind of additional data.)
However, the second problem rising is that capabilities named the same actually mean a different thing. Therefore, a secondary type dimension is attached to the XName construct dubbed Semantic Token (represented by an XSemantics class and a similar extra annotation).
Even if two names look like the same, their associated semantics might be incompatible, or one's semantics is actually an extension to the other's. Using the same comparison mechanism for types, capabilities and naming, this kind of extra information should help to determine the relations.
Of course, if your XSD types name similar things similarly, you don't need these optional annotations.
XSchema parser and test programThe XSD parser and test program XSchema gives a simple sample how to use this new XML-based type system.
The example has two types, where type2.xsd has an extra field defined for its class1 element.
Root element troubleYou may recognize, that the parser returns a type, which is basically a virtual outer type, which has a single capability named as the root node and describing its complex type.
This may be problematic in cases where the root node can be named as anything, and only ITS content should matter when comparing.
I think there is no problem, because you may start the A.compareTo(B) at this single capability's complexType node:
A.capabilities.get(0).complexType.compareTo(B.capabilities.get(0).complexType)The second smaller issue is with a root element which is only a simple type. In this case, you need to compare the valueType fields of the first capabilities.
We may later on specify convenience methods to do these things.
Is this type-thingy new?We hope so.
We applied the category theoretic approaches of duality and monads to develop this solution, which approaches themselves descend from last century mathematics and are used in programming language type systems.
Our opinion is that yes, this is can be considered as (computer) scientifically new.
But you might know better...