A Bottom-up Approach to discovering Governance Issues

by Rick Jelliffe

I've been making some presentations this week on XML Governance. The aspect of governance I have in mind in particular is the promotion of evidence-based management, with governance involving higher-level management asking lower-level management "What objective evidence do you have that you are taking care of issue X?" The trouble is that it is very difficult to come up with a good list of Xs.

So the approach I am suggesting is that as well as the top-down approach, there also can be a bottom-up approach where you invert the question so that we ask "Given that we have these technical artifacts (e.g. XML), what information can we extract from them and what issues can it be used as evidence for?" In this way, we come up with a list of the issues for which there can be objective evidence, and management can cherry-pick the issues which are useful.

One case history I gave was of a markup operation that installed a context-based full-text indexing system. They started using it in an entirely different way from the way we had expected: they went through the list of words that had been marked up as keywords and then looked for every instance of each word where it wasn't marked up as a keyword. This allowed them much better consistency.

But taking my bottom-up suggestion, it also can be used as a governance input: the technologists first report "It is possible to get evidence (a measure or metric) of the words that have not been marked up correctly" which then allows the managers to trace to or add the business requirement "All keywords should be marked up" which in turn leads us to the governance requirement "How do you prove that this business requirement that all keywords should be marked up is being met?"
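
As a minimal sketch of that kind of measure (not the indexing system that operation actually used), the following Python counts, for every word that is marked up as a keyword somewhere in a document, how often the same word appears outside keyword markup. The element name `keyword`, the sample document and the function name are assumptions made for illustration, and only single-word keywords with no nested markup are handled.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document; <keyword> is an assumed element name.
SAMPLE = """<doc>
  <para>The <keyword>schema</keyword> constrains the document.</para>
  <para>Each schema is validated; the <keyword>parser</keyword> reports errors.</para>
  <para>A parser may also use a schema.</para>
</doc>"""


def unmarked_keyword_hits(xml_text, keyword_tag="keyword"):
    """Count occurrences of known keywords that are NOT marked up as keywords."""
    root = ET.fromstring(xml_text)

    # Every word that is marked up as a keyword at least once.
    keywords = {
        el.text.strip().lower()
        for el in root.iter(keyword_tag)
        if el.text and el.text.strip()
    }
    counts = dict.fromkeys(keywords, 0)

    def scan(text):
        # Tally keyword words found in a run of unmarked text.
        if not text:
            return
        for word in re.findall(r"\w+", text.lower()):
            if word in counts:
                counts[word] += 1

    for el in root.iter():
        if el.tag != keyword_tag:
            scan(el.text)  # text before the element's first child
        scan(el.tail)      # text after the element, which belongs to its parent
    return counts


if __name__ == "__main__":
    for word, hits in sorted(unmarked_keyword_hits(SAMPLE).items()):
        print(f"{word}: {hits} unmarked occurrence(s)")
```

A count of zero for every word would be the objective evidence for "All keywords should be marked up"; anything else is a list of places to go and look.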

Things like the Extensibility Manifesto may help formulate useful issues for governance, but it is top-down. And the trouble with top-down is that sometimes tracing from issue to evidence peters out or stalls on the way down: a fine-sounding abstract requirement that is unmeasurable. Now this is, of course, the basis of many of my company's tools and my work on validation and metrics: concentrating on the possibilities that XML allows for evidence gathering, and then trying to progress this upstream to management questions and the governance issues.

Roger Costello has been posting a series of "best practice" papers on Schematron over at XML-DEV recently. While these are very important to think about and to gather intelligence about, in a sense best practices represent a kind of middle-out or context-free approach, which I think can be criticized because abstract statements of principle move away from the worlds of evidence (at the low level) and governance (at the high level). For example, at the moment there is a discussion on whether it is better to embed Schematron schemas in XSD schemas or to have separate documents: a good question. (I have posted to say "well what about XSD types inside Schematron rules too?")

But my main comment was that whether a constraint is bundled with the grammar or kept independently should perhaps follow organizational lines: database people can look after static kinds of storage requirements, and analyst people can look after the business-rules checking. It may be that dividing constraints between schema documents should be based on who is looking after them. Now this would correspond to a management requirement "A separation of concerns should be implemented to reduce the intra-organization dependencies on data and applications." And the relevant governance question would be "How do you prove that you have a separation of concerns in your data and schemas?" And the evidence would be to trace from each constraint in a schema to the driver for that requirement, and to show that a particular schema only has traces to requirements set by a single organizational entity.
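
To make that evidence concrete, here is a rough sketch in Python. It assumes the traces have already been recorded somewhere as (schema document, constraint id, owning organizational unit) triples; the file names, constraint ids, unit names and the `separation_report` function are all hypothetical. For each schema it reports which units its constraints trace to, which constraints have no trace at all, and whether the schema passes the single-owner test.

```python
from collections import defaultdict

# Hypothetical trace records: (schema document, constraint id, owning unit).
# An owner of None means the constraint could not be traced to any requirement.
TRACES = [
    ("storage.xsd", "c1", "database"),
    ("storage.xsd", "c2", "database"),
    ("rules.sch", "r1", "analysts"),
    ("rules.sch", "r2", "database"),
    ("rules.sch", "r3", None),
]


def separation_report(traces):
    """Summarize, per schema, the owners its constraints trace to."""
    owners = defaultdict(set)
    untraced = defaultdict(list)
    for schema, constraint, owner in traces:
        owners[schema]  # touch the key so a schema with no owners still appears
        if owner is None:
            untraced[schema].append(constraint)
        else:
            owners[schema].add(owner)

    report = {}
    for schema, units in owners.items():
        missing = untraced.get(schema, [])
        report[schema] = {
            "owners": sorted(units),
            "untraced": missing,
            # Separated only if every constraint traces to the same single unit.
            "separated": len(units) == 1 and not missing,
        }
    return report


if __name__ == "__main__":
    for schema, row in separation_report(TRACES).items():
        print(schema, row)
```

With the sample data, storage.xsd traces only to the database group, while rules.sch mixes two owners and has one constraint with no driver at all, so it fails the test.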

So in summary the bottom-up approach starts with a technical artifact (e.g. XML), then finds out what it can evidence ("What can I measure?"), then extracts potential management requirements which could be analyzed using that evidence, which then suggests possible governance questions. The bottom-up approach never descends into airy-fairy handwaving or impossible-to-implement abstractions: it seems a practical approach. The result will be partial: if you start with XML as the artifact you will get "XML Governance" issues raised. And the issue of "What can I measure?" leads directly into the worlds of complex validation (e.g. Schematron) and the need to develop good metrics in general.


2007-07-23 11:17:33
Here is one to consider. Even if we can trace it to a single entity, if we set up the constraints without due attention to the relationships, we still have trouble.

When sending reports for aggregation for some form of analysis based on events, typically you need a timeline. In the abstract formulation you have an 'event' tracking system. An analyst might pull out a report and start checking for datetime fields. But which datetime field is the actual event datetime? Is it the time of report filing? Is it the time of encounter (in disease systems, literally when the patient is first seen and diagnosed)? If the report time is used as the event time, what is the effect of that on threshold signalling for alerts?

And so on. A report is often a compendium of separate events stamped and labeled but sometimes not adequately documented, and worse, represented in, say, a mapping system without adequately proving that the events plotted have a causal relationship if the datetimes are confused. In fact, a single report may actually be presenting an n-dimensional set of timelines with little or no correlation other than that they are in the same PDF.

If de-identification processes are applied (removing all trace of the actual person for the sake of privacy), this can become more confused as the statistics are rolled up. Examples of systems where this can be seen are NIBRS and possibly some health reporting systems for syndromic surveillance. In a launch-on-warning philosophy, this is bad juju.

Organizational bundling is the right approach until one encounter bleeds across several organizations and the de-identification blanks that out (the case of several responder systems reporting the same event separately or failing to discover it is one event as in the case where one event cascades out to cause multiple events of different types all called in separately). Then the event model has to account for bottom-up bundling into the groups and a rectification process is initiated potentially across multiple real-time command and control systems.

Rick Jelliffe
2007-07-23 14:43:19
Len: But the bottom-up approach is based on what can be evidenced from an artifact: so you probably would expect de-identified documents to trace to different managerial and governance issues than documents with more data. I am not trying to say that XML-based inference of management and governance issues will generate a complete set of issues, just an evidence-based set.
2007-07-24 06:07:12
And I agree with that. I'm not being clear.

What I notice is that bundling and filtering change that evidence by breaking chains and possibly hiding superstitious or mistaken associations. It is a case where the evidence itself is misleading. This would be just as true if the representation were datasets, CSV, etc. Co-occurrence values help to discover these but it is better if these are introduced early as the bottom is created.

Hmmm... are co-occurrence constraints an example of top-down or bottom-up design or neither? Often we are not designing a complete system. We are creating a piece of it and that tends to be bottom-up. It is as I try to hook up to near-neighbor systems that I often discover the collisions of semantics and relationships. In this case, it was the imposition of two different abstract-layer entity sets that I have to merge somehow that made me take a second hard look at the event types and where they instantiate in the timelines, and realize that reasoning over these is better in one abstraction than the other but not really well documented in either. In this case systems of datasets designed for statistical inferencing are combined with systems for real-time control and notification. I haven't thought that through to determine if that is a special case or one that occurs when any two abstract classes are merged.

If the abstract classes (say HL7 systems) are mapped into event type abstractions, there is a collision of governance goals.