
Chapter 3  The SchemaValidator and SchemaType Classes

3.1  The SchemaValidator Class

The public and protected operations of the SchemaValidator class are shown in Figure 3.1. First, I will discuss the public operations, and then the protected one.


Figure 3.1: The SchemaValidator class
// Access with #include <config4cpp/SchemaValidator.h>

class Configuration {
public:
    enum Type {CFG_NO_VALUE       = 0, // bit masks
               CFG_STRING         = 1, // 0001
               CFG_LIST           = 2, // 0010
               CFG_SCOPE          = 4, // 0100
               CFG_VARIABLES      = 3, // 0011 = STRING | LIST
               CFG_SCOPE_AND_VARS = 7  // 0111 = STRING | LIST | SCOPE
    };
    ...
};
class SchemaValidator {
public:
    enum ForceMode {DO_NOT_FORCE, FORCE_OPTIONAL, FORCE_REQUIRED};

    SchemaValidator();
    void wantDiagnostics(bool value);
    bool wantDiagnostics();
    void parseSchema(const char ** schema, int schemaSize)
                                        throw(ConfigurationException); 
    void parseSchema(const char ** nullTerminatedSchema)
                                        throw(ConfigurationException); 
    void validate(
        const Configuration *    cfg,
        const char *             scope,
        const char *             localName,
        bool                     recurseIntoSubscopes,
        Configuration::Type      typeMask,
        ForceMode                forceMode = DO_NOT_FORCE) const
                                        throw(ConfigurationException); 
    void validate(
        const Configuration *    cfg,
        const char *             scope,
        const char *             localName,
        ForceMode                forceMode = DO_NOT_FORCE) const
                                        throw(ConfigurationException); 
protected:
    void registerType(SchemaType * type) throw(ConfigurationException);
};

3.1.1  Public Operations

The overloaded wantDiagnostics() operation enables you to get and set a boolean property, the default value of which is false. If you set this to true, then detailed diagnostic messages will be printed to standard output during calls to parseSchema() and validate(). These diagnostic messages may be useful when debugging a schema.

The parseSchema() operation parses a schema definition and stores it in an efficient internal format. The schema can be specified as an array of strings plus the size of that array, or as a null-terminated array of strings. The parseSchema() operation will throw an exception if the parser encounters a problem, such as a syntax error, when parsing the schema.

After you have created a SchemaValidator object and used it to parse a schema, you can then call validate() to validate (a scope within) a configuration file. If you want, you can call validate() repeatedly, perhaps to validate multiple configuration files. The validate() operation merges the scope and localName parameters to form the fully-scoped name of the scope (within the cfg object) to be validated.

The recurseIntoSubscopes parameter specifies whether validate() should validate only entries in the scope, or recurse down into sub-scopes to validate their entries too.

The typeMask parameter is a bit mask that specifies which types of entries should be validated. For example, CFG_VARIABLES specifies that variables (but not scopes) should be validated.
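The mask arithmetic can be illustrated with a small stand-alone sketch. The enum values below are copied from Figure 3.1. The entryIsSelected() helper is hypothetical (it is not a Config4Cpp operation) and only shows how a bitwise AND could decide whether an entry's type falls within a mask.

```cpp
// Stand-alone sketch: this enum mirrors Configuration::Type in Figure 3.1.
enum Type {
    CFG_NO_VALUE       = 0,
    CFG_STRING         = 1,  // 0001
    CFG_LIST           = 2,  // 0010
    CFG_SCOPE          = 4,  // 0100
    CFG_VARIABLES      = 3,  // 0011 = CFG_STRING | CFG_LIST
    CFG_SCOPE_AND_VARS = 7   // 0111 = CFG_STRING | CFG_LIST | CFG_SCOPE
};

// Hypothetical helper (not part of Config4Cpp): returns true if an entry
// of type entryType is selected by typeMask.
inline bool entryIsSelected(Type entryType, Type typeMask)
{
    return (static_cast<int>(entryType) & static_cast<int>(typeMask)) != 0;
}
```

With these definitions, entryIsSelected(CFG_LIST, CFG_VARIABLES) is true, whereas entryIsSelected(CFG_SCOPE, CFG_VARIABLES) is false.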

By default, validate() respects use of the @optional and @required keywords in the schema. However, if you specify FORCE_OPTIONAL for the forceMode parameter, then validate() will act as if all identifiers in the schema have the @optional keyword. Conversely, FORCE_REQUIRED makes validate() act as if all identifiers without a "uid-" prefix in the schema have the @required keyword.

There are two versions of the validate() operation. The version with four parameters uses true for the recurseIntoSubscopes parameter and CFG_SCOPE_AND_VARS for the typeMask parameter.

3.1.2  Using registerType() in a Subclass

Later, in Section 3.2, I will explain how you can implement new schema types. If you implement new schema types, then you will need to write a subclass of SchemaValidator to register those new schema types. Figure 3.2 illustrates how to do this.


Figure 3.2: A subclass of SchemaValidator
#include <config4cpp/SchemaValidator.h>
using config4cpp::SchemaType;
using config4cpp::SchemaValidator;

// Each new schema type is a subclass of SchemaType (see Section 3.2)
class SchemaTypeDate : public SchemaType { ... };
class SchemaTypeHex  : public SchemaType { ... };

class ExtendedSchemaValidator : public SchemaValidator
{
public:
    ExtendedSchemaValidator()
    {
        registerType(new SchemaTypeDate());
        registerType(new SchemaTypeHex());
    }
};

Registration of new schema types is trivial: the constructor of the subclass simply calls registerType() to register one instance of each of the new schema types.

Once you have implemented the ExtendedSchemaValidator class to register new schema types, your applications need only create an instance of ExtendedSchemaValidator (instead of SchemaValidator) to be able to make use of those new schema types.

3.2  The SchemaType Class

The SchemaValidator class performs very little of the validation work itself. Instead, it delegates most of this work to other classes, each of which is a subclass of SchemaType (shown in Figure 3.3). There is a separate subclass of SchemaType for each schema type. For example, the Config4Cpp library contains SchemaTypeBoolean, which implements the boolean schema type, SchemaTypeInt, which implements the int schema type, and so on.


Figure 3.3: The SchemaType class
// Access with #include <config4cpp/SchemaValidator.h>
// or          #include <config4cpp/SchemaType.h>

class SchemaType {
public:
    SchemaType(
        const char *               typeName,
        const char *               className,
        Configuration::Type        cfgType);
    virtual ~SchemaType();

    const char * typeName()  const;
    const char * className() const;
    Configuration::Type cfgType()   const;
protected:
    virtual void checkRule(
        const SchemaValidator *        sv,
        const Configuration *          cfg,
        const char *                   typeName,
        const StringVector &           typeArgs,
        const char *                   rule) const
                                   throw(ConfigurationException) = 0; 
    virtual void validate(
        const SchemaValidator *        sv,
        const Configuration *          cfg,
        const char *                   scope,
        const char *                   name,
        const char *                   typeName,
        const char *                   origTypeName,
        const StringVector &           typeArgs,
        int                            indentLevel) const
                                   throw(ConfigurationException); 
    virtual bool isA(
        const SchemaValidator *        sv,
        const Configuration *          cfg,
        const char *                   value,
        const char *                   typeName,
        const StringVector &           typeArgs,
        int                            indentLevel,
        StringBuffer &                 errSuffix) const;

    SchemaType * findType(
        const SchemaValidator *         sv,
        const char *                    name) const;
    void callValidate(
        const SchemaType *             target,
        const SchemaValidator *        sv,
        const Configuration *          cfg,
        const char *                   scope,
        const char *                   name,
        const char *                   typeName,
        const char *                   origTypeName,
        const StringVector &           typeArgs,
        int                            indentLevel) const
                                   throw(ConfigurationException); 
    bool callIsA(
        const SchemaType *             target,
        const SchemaValidator *        sv,
        const Configuration *          cfg,
        const char *                   value,
        const char *                   typeName,
        const StringVector &           typeArgs,
        int                            indentLevel,
        StringBuffer &                 errSuffix) const;
};

3.2.1  Constructor and Public Accessors

When the constructor of a subclass of SchemaType calls its parent constructor, the parameters specify the name of the schema type, the name of the class that implements it, and the configuration entry’s type, which is one of: CFG_STRING, CFG_LIST or CFG_SCOPE. You can see an example of this in Figure 3.4.


Figure 3.4: Example constructor of a subclass of SchemaType
SchemaTypeInt::SchemaTypeInt()
    : SchemaType("int", "config4cpp::SchemaTypeInt",
                 Configuration::CFG_STRING)
{
    // Nothing else to do in the constructor
}

Parameter values passed to the parent constructor are made available via the typeName(), className() and cfgType() operations shown in Figure 3.3.

The SchemaValidator class invokes registerType() to register an instance of each of the predefined schema types and, as previously shown in Figure 3.2, a subclass of SchemaValidator can invoke registerType() to register instances of additional schema types.

3.2.2  The checkRule() Operation

The SchemaValidator class invokes the checkRule() operation of an object representing a schema type when that type is encountered in a schema rule. I will illustrate this through the schema shown in Figure 3.5.


Figure 3.5: Example schema
1  const char * schema[] = {
2      "timeout = durationMilliseconds",
3      "fonts = list[string]",
4      "background_colour = enum[grey, white, yellow]",
5      "log = scope",
6      "log.dir = string",
7      "@typedef logLevel = int[0,3]",
8      "log.level = logLevel"
9  };

When parsing the first line of the schema, SchemaValidator invokes checkRule() on the object representing the durationMilliseconds schema type. When parsing the next line in the schema, the SchemaValidator invokes checkRule() on the object representing the list schema type, and so on.

Among the parameters passed to checkRule() is typeArgs (of type StringVector), which contains the arguments, if any, for the type. This parameter will be empty for the rules in lines 2, 5 and 6 of Figure 3.5. For the rule in line 3, typeArgs will contain one string ("string"); and for the rule in line 4, it will contain three strings ("grey", "white" and "yellow"). You might think that typeArgs should be empty for the rule in line 8. However, the logLevel type used in line 8 was defined in line 7 to be int[0,3]. Because of this, when checkRule() is called for the rule in line 8, typeArgs will contain two strings ("0" and "3").
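The following stand-alone sketch models how the contents of typeArgs relate to the text of a rule. It is not the parser used by Config4Cpp: the TypeSpec struct and the trim(), parseTypeSpec() and resolveTypeSpec() functions are hypothetical names introduced for illustration, and the sketch ignores quoting and nested brackets. It also does not model which type name is reported for a type introduced by @typedef; only the arguments are of interest here.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed type reference such as "int[0,3]".
struct TypeSpec {
    std::string              name;  // for example, "int"
    std::vector<std::string> args;  // for example, {"0", "3"}
};

// Remove leading and trailing spaces from a string.
static std::string trim(const std::string & s)
{
    std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos) { return ""; }
    std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// Split "name[arg1, arg2]" into a name and a list of arguments.
static TypeSpec parseTypeSpec(const std::string & text)
{
    TypeSpec spec;
    std::string::size_type open = text.find('[');
    if (open == std::string::npos) {
        spec.name = trim(text);          // no arguments at all
        return spec;
    }
    spec.name = trim(text.substr(0, open));
    std::string::size_type close = text.rfind(']');
    std::string inner = trim(text.substr(open + 1, close - open - 1));
    if (inner.empty()) { return spec; }  // "name[]" has no arguments
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = inner.find(',', start);
        spec.args.push_back(trim(inner.substr(start, comma - start)));
        if (comma == std::string::npos) { break; }
        start = comma + 1;
    }
    return spec;
}

// Resolve a type reference against a table of @typedef definitions, so
// that a reference to "logLevel" picks up the arguments of "int[0,3]".
static TypeSpec resolveTypeSpec(
    const std::string &                        text,
    const std::map<std::string, std::string> & typedefs)
{
    TypeSpec spec = parseTypeSpec(text);
    std::map<std::string, std::string>::const_iterator it =
        typedefs.find(spec.name);
    if (it != typedefs.end() && spec.args.empty()) {
        return parseTypeSpec(it->second);
    }
    return spec;
}
```

In this model, "enum[grey, white, yellow]" produces three arguments, "durationMilliseconds" produces none, and "logLevel" produces the two arguments "0" and "3" once the typedef table maps logLevel to "int[0,3]".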

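The implementation of checkRule() must determine whether the strings in typeArgs are valid, and throw an exception containing a descriptive error message if not. For example, the hex type discussed later in this section accepts either no arguments or a single integer argument that must be 1 or greater.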

Deciding whether the typeArgs parameter contains acceptable strings is the primary purpose of checkRule(). Most of the other parameters are provided to help checkRule() make that decision and to format an informative exception message if necessary.

One of the demonstration applications provided with Config4Cpp is called extended-schema-validator. That demo contains a class called SchemaTypeHex that implements a hex (hexadecimal integer) schema type. That class’s implementation of checkRule() is shown in Figure 3.6. A bold font indicates how the operation makes use of parameters.


Figure 3.6: Implementation of SchemaTypeHex::checkRule()
void SchemaTypeHex::checkRule(
    const SchemaValidator *  sv,
    const Configuration *    cfg,
    const char *             typeName,
    const StringVector &     typeArgs,
    const char *             rule) const throw(ConfigurationException)
{
    StringBuffer             msg;
    int                      len;
    int                      maxDigits;

    len = typeArgs.length();
    if (len == 0) {
        return;
    } else if (len > 1) {
        msg << "schema error: the '" << typeName << "' type should "
            << "take either no arguments or 1 argument (denoting "
            << "max-digits) in rule '" << rule << "'";
        throw ConfigurationException(msg.c_str());
    }
    try {
        maxDigits = cfg->stringToInt("", "", typeArgs[0]);
    } catch(const ConfigurationException & ex) {
        msg << "schema error: non-integer value for the 'max-digits' "
            << "argument in rule '" << rule << "'";
        throw ConfigurationException(msg.c_str());
    }
    if (maxDigits < 1) {
        msg << "schema error: the 'max-digits' argument must be 1 or "
            << "greater in rule '" << rule << "'";
        throw ConfigurationException(msg.c_str());
    }
}

The only parameter not used in the body of the operation is sv, which is of type SchemaValidator. That parameter is used by the checkRule() operation in the list, table and tuple types when invoking findType() to determine if items in typeArgs are names of types.

3.2.3  The isA() and validate() Operations

Subclasses of SchemaType should implement the isA() and validate() operations. However, the default implementation of isA() is suitable for list-based types, and the default implementation of validate() is suitable for string-based types. Because of this, a subclass of SchemaType needs to implement only one of these two operations.

3.2.3.1  String-based Types: isA()

If you are providing schema support for a string-based type, then you must implement the isA() operation. Among the parameters passed to this operation is a string called value; the isA() operation should return true if value can be parsed as the schema type. For example, the SchemaTypeInt::isA() operation returns true for "42" and returns false for "hello, world".

If isA() returns false, then the operation can optionally set the errSuffix parameter (which is of type StringBuffer) to be a descriptive message that explains why the string is not suitable. This message will be appended to an exception message.

Figure 3.7 illustrates how isA() might be implemented for a schema type that denotes hexadecimal integers. A bold font indicates how the operation makes use of parameters. This implementation of isA() contains two straightforward checks. First, it checks whether value consists of hexadecimal digits. Second, if typeArgs specifies a maximum number of digits, then isA() checks if this limit has been exceeded.


Figure 3.7: Implementation of isA() for a hex type
bool SchemaTypeHex::isA(
    const SchemaValidator *  sv,
    const Configuration *    cfg,
    const char *             value,
    const char *             typeName,
    const StringVector &     typeArgs,
    int                      indentLevel,
    StringBuffer &           errSuffix) const
{
    if (!isHex(value)) {
        errSuffix << "the value is not a hexadecimal number";
        return false;
    }
    if (typeArgs.length() == 1) {
        //--------
        // Check if there are too many hex digits in the value
        //--------
        int maxDigits = cfg->stringToInt("", "", typeArgs[0]);
        if ((int)strlen(value) > maxDigits) {
            errSuffix << "the value must not contain more than "
                      << maxDigits << " digits";
            return false;
        }
    }
    return true;
}

bool SchemaTypeHex::isHex(const char * str)
{ ... } // implementation will be shown later in this chapter

3.2.3.2  List-based Types: validate()

Config4* has three built-in, list-based schema types: list, tuple and table. Each of these schema types takes arguments, for example:

const char * schema[] = {
    "@typedef money = units_with_float[\"£\", \"$\", \"€\"]",
    "fonts      = list[string]",
    "point      = tuple[float,x, float,y]",
    "price_list = table[string,product, money,price]"
};

Each of those list-based schema types implements validate() in a similar way, so I will discuss only the implementation for the table schema type, using the definition of price_list in the above example.

If you want to implement schema support for a list-based type, then you should implement the validate() operation in a manner similar to that described above. I recommend that you examine the source code of the SchemaTypeList, SchemaTypeTable or SchemaTypeTuple class for concrete details.
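As a rough, library-independent illustration of what a table-style check involves, the sketch below validates a flat list of strings against a sequence of column checks. It is not the implementation used by SchemaTypeTable. It assumes that a table is stored as a flat list whose length must be a multiple of the number of columns, and that each cell is checked against the type of its column. The IsAFunc typedef and the isAnyString(), isDecimalInt() and validateTable() functions are hypothetical names; the function pointers play the role that isA() plays for a real string-based schema type.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a column's string-based type check.
typedef bool (*IsAFunc)(const std::string & value);

// Accepts every value, like a column of type string.
static bool isAnyString(const std::string &) { return true; }

// Accepts non-empty strings that consist only of decimal digits.
static bool isDecimalInt(const std::string & value)
{
    if (value.empty()) { return false; }
    for (std::size_t i = 0; i < value.size(); i++) {
        if (value[i] < '0' || value[i] > '9') { return false; }
    }
    return true;
}

// Check a flat list of strings against a sequence of column checks.
// Returns true if the list is acceptable; otherwise returns false and
// stores an explanation in errMsg.
static bool validateTable(
    const std::vector<std::string> & items,
    const std::vector<IsAFunc> &     columns,
    std::string &                    errMsg)
{
    if (columns.empty() || items.size() % columns.size() != 0) {
        errMsg = "the number of items is not a multiple of the "
                 "number of columns";
        return false;
    }
    for (std::size_t i = 0; i < items.size(); i++) {
        IsAFunc check = columns[i % columns.size()];  // column of item i
        if (!check(items[i])) {
            errMsg = "bad value ('" + items[i] + "') in a table cell";
            return false;
        }
    }
    return true;
}
```

In a real subclass of SchemaType, the findType() and callIsA() operations declared in Figure 3.3 appear to be intended for this kind of delegation to the column types; that is an inference from their signatures rather than a description of the library's source code.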

3.3  Adding Utility Operations to a Schema Type

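The infrastructure within Config4Cpp to support a built-in data type is split over three classes: a SchemaType<Type> class, which implements schema validation for the type; the SchemaValidator class, which registers an instance of that SchemaType<Type> class; and the Configuration class, which provides the lookup<Type>(), is<Type>() and stringTo<Type>() operations.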

In this chapter, I have explained how you can provide schema validation support for a new type by writing a SchemaType<Type> class and registering it in a subclass of SchemaValidator. However, I have not yet explained how you can write a subclass of Configuration to implement the lookup<Type>(), is<Type>() and stringTo<Type>() operations.

The Configuration class is an abstract base class, and its static create() operation creates an instance of a hidden, concrete subclass. This enforces a separation between the public API and the implementation details of Config4*. Most of the time, this separation is beneficial. However, it has a drawback: you cannot write a subclass of Configuration to add additional operations, such as lookup<Type>(), is<Type>() and stringTo<Type>().

A good way to work around this drawback is to define the desired functionality as static operations in the SchemaType<Type> class. For example, if you are writing a class called SchemaTypeHex (for hexadecimal integers), then you can implement lookupHex(), isHex(), and stringToHex() as static operations in the SchemaTypeHex class. This is illustrated in Figure 3.8.


Figure 3.8: Utility operations in the SchemaTypeHex class
class SchemaTypeHex : public SchemaType
{
public:
    SchemaTypeHex()
        : SchemaType("hex", "SchemaTypeHex", Configuration::CFG_STRING)
    { }
    virtual ~SchemaTypeHex() { }

    static bool isHex(const char * str)
    {
        int                   i;
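        for (i = 0; str[i] != '\0'; i++) {
            // Cast to unsigned char: isxdigit() is undefined for
            // negative values other than EOF
            if (!isxdigit((unsigned char)str[i])) { return false; }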
        }
        return i > 0;
    }

    static int lookupHex(
        const Configuration * cfg,
        const char *          scope,
        const char *          localName) throw(ConfigurationException)
    {
        const char * str = cfg->lookupString(scope, localName);
        return stringToHex(cfg, scope, localName, str);
    }

    static int lookupHex(
        const Configuration * cfg,
        const char *          scope,
        const char *          localName,
        int                   defaultVal) throw(ConfigurationException)
    {
        if (cfg->type(scope, localName)
            == Configuration::CFG_NO_VALUE)
        {
            return defaultVal;
        }
        const char * str = cfg->lookupString(scope, localName);
        return stringToHex(cfg, scope, localName, str);
    }

    static int stringToHex(
        const Configuration * cfg,
        const char *          scope,
        const char *          localName,
        const char *          str,
        const char *          typeName = "hex")
                                        throw(ConfigurationException)
    {
        unsigned int          value;
        StringBuffer          msg;
        StringBuffer          fullyScopedName;
    
        int status = sscanf(str, "%x", &value);
        if (status != 1) {
            cfg->mergeNames(scope, localName, fullyScopedName);
            msg << cfg->fileName() << ": bad " << typeName
                << " value ('" << str << "') specified for '"
                << fullyScopedName << "'";
            throw ConfigurationException(msg.c_str());
        }
        return (int)value;
    }
protected:
    ... // checkRule() and isA() were shown earlier in this chapter
};

With this technique, application code can call cfg->lookup<Type>() for built-in types, but must call SchemaType<Type>::lookup<Type>() for other types. For example:

try {
    logFile = cfg->lookupString(scope, "log.file");
    timeout = cfg->lookupDurationMilliseconds(scope, "idle_timeout");
    addr    = SchemaTypeHex::lookupHex(cfg, scope, "base_address");
} catch(const ConfigurationException & ex) {
    cerr << ex.c_str() << endl;
}
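One point to be aware of is that sscanf() with the "%x" format, as used by stringToHex() in Figure 3.8, stops at the first character that is not a hexadecimal digit. A string such as "12zz" is therefore converted successfully (to 0x12) unless it has already been rejected by schema validation. If you want the conversion itself to be stricter, then one possible approach is sketched below using the standard strtoul() function. The parseHexStrict() helper is hypothetical and is not part of Config4Cpp or its demos.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical helper (not part of Config4Cpp): parse str as a
// hexadecimal integer. Rejects empty strings, strings with trailing
// characters and values that overflow unsigned long. Note that
// strtoul() itself still accepts leading whitespace, a sign and an
// optional "0x" prefix.
static bool parseHexStrict(const char * str, unsigned long & result)
{
    if (str == 0 || *str == '\0') { return false; }
    char * end = 0;
    errno = 0;
    unsigned long value = std::strtoul(str, &end, 16);
    if (end == str || *end != '\0' || errno == ERANGE) {
        return false;   // nothing parsed, trailing text, or overflow
    }
    result = value;
    return true;
}
```

A stringToHex() operation could call such a helper instead of sscanf() and throw a ConfigurationException when it returns false.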
