Database self-documenting Wiki

From Artesano
Jump to: navigation, search

Some background

When it comes to documentation IT-people tend to be lazy. This is quite true if they happen to be software developers or database designer and administrators or sysadmins. Java developers have the advantage of having JavaDoc at their disposal. This enables them to enter comments directly in the source code and then generate a html documentation. It also requires discipline, but it's a considerable improvement from no documentation at all or the need to scan the source code to find the information. Unfortunately database engineers don't have such a utility and often start typing information in documents or other centralized information system, such as a wiki.

In my case I started using the capabilities of psql to describe objects and copied the results in the wiki. That is a good approach if you have a stable environment and few changes. Nevertheless it can happen too quickly and too easily that some changes remain undocumented, such as new columns in a table, new constraints or indexes on a table or renaming of objects. For that reason I was looking for a simpler and more efficient solution. This is the topic of this article.

Description

The idea is to use 2 Extensions available for MediaWiki to extract the information directly from the database. The information from the database is pre-formatted and delivered through a function. This enables to modify the way information is displayed for each object in a single place (the function's body). Additionally you can have a user for the wiki that can only use this function, adding a bit of security to the system.

Minimal Wiki version: 1.17+

This extension implements mechanisms to retrieve data from a variety of data sources, including databases. Below in the section on Implementation I will describe the capabilities that I use to retrieve the description of database objects directly from a PostgreSQL database. If you want to use ExternalData for other data sources or if you need other features, then follow the link above to the detailed description of the extension.

Minimal Wiki version: 1.19+

This extension is not strictly necessary. It disables wiki caching for a specific page, rebuilding the html each time the page is refreshed, either clicking on the browser's refresh button or simply landing on the page. Notice that this does not disable client side caching, if any is in place.

In stable environments, where changes are seldom you can easily avoid using this extension and refresh the cache manually when you know that changes have been made adding ?action=purge to the link to the page to be re-cached.

The functions in the downloadable scripts are work in progress (let me know if you improve them). I will describe them in more detail as soon as the self-documenting capability will be available for functions as well. For the moment let's just focus on the function that is actually used in this use case (note that the following description is generated using one of the functions described).


Function "public.create_wiki_doc(p_title integer, p_schema_names text[], p_object_names text[])"

This is a short form of the same 4 arguments function for tables and views. It creates a set of wiki markup texts from a set of PostgreSQL tables and views.

Owner: Charles Clavadetscher (charles)
Schema: public
Language: plpgsql
Executable: Public

Signature

public.create_wiki_doc(
 p_title integer,
 p_schema_names text[],
 p_object_names text[]
)

Return value

Type Description
SETOF text

Arguments

Position Name Type Description
1 p_title integer
2 p_schema_names text[]
3 p_object_names text[]



The function loops over the schema names and in each of these loops, loops through the object names. Each time a match is found it calls another function in the script to generate a wiki formatted text for that specific object in that specific schema that is returned to the main loop. At the end you end up with a set of wiki descriptions. The function creating the wiki markup takes care of some problems related to the way how MediaWiki interprets and displays some special characters. In particular:

  • || is transformed to | | to not be confused with the delimiter between columns in a wiki table row.
  • E'\n' that can be entered to achieve comments on more lines is transformed to <br/>.

Implementation

  • Install the extensions on your wiki as explained in their instructions (follow links above).
  • Create a user in your database able to connect to the database that you want to document. This user should be able to execute the function public.create_wiki_doc(INTEGER,TEXT[],TEXT[]). If you modified the script and installed the functions in another schema, then make sure that you give the necessary USAGE permission on that schema to your user.
  • Run the function creating script (create_wiki_doc.sql) in the database you want to integrate with your wiki.
  • Configure the database connection in LocalSettings.php (the configuration file of MediaWiki).This will boil down to creating an entry of this type (you can use whatever you want instead of "wikireader", just keep it in mind, you will need it later).
$edgDBServer['wikireader'] = "your_server";
$edgDBServerType['wikireader'] = "postgres";
$edgDBName['wikireader'] = "your_database";
$edgDBUser['wikireader'] = "your_user";
$edgDBPass['wikireader'] = "the_password_of_your_user";

Now everything is set up to create a page with automated documentation for your database.

In the page where you want to place your documentation you simply add the following entry:

 __NOCACHE__
 
 Any wiki text...
 
 {{#get_db_data:
 db=wikireader
 |from=public.create_wiki_doc(3,'{list_of_schemas}','{list_of_tables_and_views}');
 |data=documentation=create_wiki_doc
 }}
 {{#for_external_table:<nowiki />
 {{{documentation}}}
 }}
 
 Any wiki text...

This will retrieve the documentaion of the listed objects on the fly.

Note that if you need to write text between DB calls you must remove the collected data before the next call using

 {{#clear_external_data:}}

Example: You can see the rendering of this part of the wiki source in the section Example.

==Example==

You are supposed to take a look at the wiki source for this section.

{{#get_db_data:
db=wikireader
|from=public.create_wiki_doc(3,'{library}','{lib_book,lib_abstracts}')
|data=documentation=create_wiki_doc
}}
{{#for_external_table:<nowiki />
{{{documentation}}}
}}

{{#clear_external_data:}}

Any wiki text...

{{#get_db_data:
db=wikireader
|from=public.create_wiki_doc(3,'{library}','{lib_language,lib_medium_type}')
|data=documentation=create_wiki_doc
}}
{{#for_external_table:<nowiki />
{{{documentation}}}
}}

Will replace the code with the descriptions of the objects and display it to the user.

Parameters:

db   : The name that you used in the configuration for the database connection in LocalSettings.php (in our example wikireader).
from : Here you should only decide the level to the titles (in this case 3) and which objects should be displayed.
data : Here you don't need to be creative.

Now if you also are disciplined enough to enter comments wherever necessary on the objects in your database, then your documentation will include them. Here for example how table lib_book could be documented. Notice the usage of newlines in the description of column lib_book.medium_type.

COMMENT ON TABLE library.lib_book IS 'The books and other items of my private library.';
COMMENT ON COLUMN library.lib_book.book_id IS 'A unique ID for a book. Notice that this refers to the object, i.e. 2 specimen of the same book will have 2 IDs.';
COMMENT ON COLUMN library.lib_book.title IS 'The title of the book. Always in Latin characters. Foreign language books using a non latin alphabet are translitterated.';
COMMENT ON COLUMN library.lib_book.notes IS 'Free notes on the book. This field is searched using full text search.';
COMMENT ON COLUMN library.lib_book.language_name IS 'The language in which the book is printed.';
COMMENT ON COLUMN library.lib_book.medium_type IS E'For books always "book".\nOther medium types:\n- Comix\n- CD\n- DVD\n- Map';
COMMENT ON COLUMN library.lib_book.location IS 'Well meant, but my flat is small enough to find everything without a specific location.';
COMMENT ON COLUMN library.lib_book.lent_to IS 'The person who lent the item in the format "first_name last_name <email>"';
COMMENT ON COLUMN library.lib_book.lent_since IS 'The date the item was lent.';
COMMENT ON COLUMN library.lib_book.last_notified IS 'The last reminder date to the lender.';
COMMENT ON COLUMN library.lib_book.notification_count IS 'The number of notifications sent so far.';
COMMENT ON COLUMN library.lib_book.next_notification IS 'The date for the next notification to the lender.';
COMMENT ON INDEX lib_book_book_id_idx IS NULL;
COMMENT ON CONSTRAINT lib_book_language_name_fkey ON library.lib_book IS 'Make sure that only items in languages available in the list can be added.';
COMMENT ON CONSTRAINT lib_book_medium_type_fkey ON library.lib_book IS 'Make sure that only medium types available in the list can be added.';

Example

You are supposed to take a look at the wiki source for this section.


Table "library.lib_book"

The books and other items of my private library.

Owner: Charles Clavadetscher (charles)

Columns

Column name Type Modifiers Description
book_id integer not null default nextval('library.lib_book_book_id_seq'::regclass) A unique ID for a book. Notice that this refers to the object, i.e. 2 specimen of the same book will have 2 IDs.
title character varying(250) not null The title of the book. Always in Latin characters. Foreign language books using a non latin alphabet are translitterated.
notes text Free notes on the book. This field is searched using full text search.
language_name character varying(50) The language in which the book is printed.
medium_type character varying(50) For books always "book".
Other medium types:
- Comix
- CD
- DVD
- Map
location character varying(30) Well meant, but my flat is small enough to find everything without a specific location.
lent_to character varying(300) The person who lent the item in the format "first_name last_name <email>"
lent_since date The date the item was lent.
last_notified date The last reminder date to the lender.
notification_count integer default 0 The number of notifications sent so far.
next_notification date The date for the next notification to the lender.

Indexes

Index name Index type Definition Description
lib_book_book_id_idx PRIMARY KEY btree (book_id)
book_note_trgm_idx gist (notes gist_trgm_ops)
lent_to_trgm_idx gist (lent_to gist_trgm_ops)
title_trgm_idx gist (title gist_trgm_ops)

Foreign keys

Foreign key name References Description
lib_book_language_name_fkey FOREIGN KEY (language_name) REFERENCES library.lib_language(language_name) Make sure that only items in languages available in the list can be added.
lib_book_medium_type_fkey FOREIGN KEY (medium_type) REFERENCES library.lib_medium_type(medium_name) Make sure that only medium types available in the list can be added.

Referenced by

Table name Foreign key name Definition
library.lib_author_book lib_author_book_book_id_fkey FOREIGN KEY (book_id) REFERENCES library.lib_book(book_id)



Any wiki text...


Table "library.lib_language"

Owner: Charles Clavadetscher (charles)

Columns

Column name Type Modifiers Description
language_name character varying(50) not null

Indexes

Index name Index type Definition Description
lib_language_language_name_idx PRIMARY KEY btree (language_name)

Referenced by

Table name Foreign key name Definition
library.lib_book lib_book_language_name_fkey FOREIGN KEY (language_name) REFERENCES library.lib_language(language_name)

Table "library.lib_medium_type"

Owner: Charles Clavadetscher (charles)

Columns

Column name Type Modifiers Description
medium_name text not null

Indexes

Index name Index type Definition Description
lib_medium_type_medium_name_idx PRIMARY KEY btree (medium_name)

Referenced by

Table name Foreign key name Definition
library.lib_book lib_book_medium_type_fkey FOREIGN KEY (medium_type) REFERENCES library.lib_medium_type(medium_name)



Comments on functions, arguments and return values

As of the version 9.5 of PostgreSQL it is possible to create comments on functions but not on their arguments and return values. While waiting for this feature to be implemented you can use the following custom functions to insert and retrieve entries in the pg_catalog table pg_description:


Function "public.comment_on_function_argument(p_proname text, p_position integer, p_comment text)"

A function to create comments on function arguments and return value.

Owner: Charles Clavadetscher (charles)
Schema: public
Language: plpgsql
Executable: Public

Signature

public.comment_on_function_argument(
 p_proname text,
 p_position integer,
 p_comment text
)

Return value

Type Description
boolean

Arguments

Position Name Type Description
1 p_proname text
2 p_position integer
3 p_comment text

Function "public.param_description(objoid oid, posnum integer)"

A function to retrieve the comment on a function argument or the return value.

Owner: Charles Clavadetscher (charles)
Schema: public
Language: sql
Executable: Public

Signature

public.param_description(
 objoid oid,
 posnum integer
)

Return value

Type Description
text

Arguments

Position Name Type Description
1 objoid oid
2 posnum integer



You can find this function in comment_on_function_components.sql.

Example: These entries will produce the output that you see above.

COMMENT ON FUNCTION public.comment_on_function_argument(text,integer,text) IS 'A function to create comments on function arguments and return value.';
SELECT public.comment_on_function_argument('public.comment_on_function_argument(text,integer,text)',-1,'A boolean value to indicate if the description entry could be done.');
SELECT public.comment_on_function_argument('public.comment_on_function_argument(text,integer,text)',1,'The function signature: function_name(comma separated list of data types).');
SELECT public.comment_on_function_argument('public.comment_on_function_argument(text,integer,text)',2,'The 1-based position of the paramenter for the comment. Use -1 for the return value.');
SELECT public.comment_on_function_argument('public.comment_on_function_argument(text,integer,text)',3,'The comment itself.');

Notice that if you only use the function name without the argument list, the documenting function will deliver the description of all overloaded functions with that name. You can see this feature in action in the following section.

List of create_wiki_doc functions

Function "public.create_wiki_doc(p_title integer, p_schema_names text[], p_object_names text[])"

This is a short form of the same 4 arguments function for tables and views. It creates a set of wiki markup texts from a set of PostgreSQL tables and views.

Owner: Charles Clavadetscher (charles)
Schema: public
Language: plpgsql
Executable: Public

Signature

public.create_wiki_doc(
 p_title integer,
 p_schema_names text[],
 p_object_names text[]
)

Return value

Type Description
SETOF text

Arguments

Position Name Type Description
1 p_title integer
2 p_schema_names text[]
3 p_object_names text[]

Function "public.create_wiki_doc(p_title integer, p_type text, p_schema_names text[], p_object_names text[])"

Create a set of wiki markup texts from a set of PostgreSQL objects. Notice that you must qualify or cast at least one of the p_schema_names or p_object_names to TEXT[] in order to enable PostgreSQL to pick this function instead of the base one.
Example: use array['schema_name1','schema_name2',...¨] or '{schema_name1,schema_name2}'::TEXT[] instead of a pure string representation of the array like '{schema_name1,schema_name2}'.

Owner: Charles Clavadetscher (charles)
Schema: public
Language: plpgsql
Executable: Public

Signature

public.create_wiki_doc(
 p_title integer,
 p_type text,
 p_schema_names text[],
 p_object_names text[]
)

Return value

Type Description
SETOF text

Arguments

Position Name Type Description
1 p_title integer
2 p_type text
3 p_schema_names text[]
4 p_object_names text[]

Function "public.create_wiki_doc(p_title integer, p_type text, p_schema_name text, p_object_name text)"

Create an wiki markup text from a PostgreSQL object. This is the base function. Be careful when using NULL values for schema and objects. You may end up with a very long page.

Owner: Charles Clavadetscher (charles)
Schema: public
Language: plpgsql
Executable: Public

Signature

public.create_wiki_doc(
 p_title integer,
 p_type text,
 p_schema_name text,
 p_object_name text
)

Return value

Type Description
text

Arguments

Position Name Type Description
1 p_title integer
2 p_type text
3 p_schema_name text
4 p_object_name text



Additional documentation

I prepared a slide set for presentation on this topic that has some additional information to the one you can read here. Feel free to download it.