The API interface from SPIP to SQL

The functions to call and important things to know about the SQL interface when writing SPIP extensions.

First of all, it’s important to remember that Structured Query Language only became standardised very slowly, and that each progressive implementation addressed the shortcomings of the specification in its own particular way. The early versions of SPIP only recognised a single implementation of SQL, which allowed it to focus on accessing SQL through a single function, spip_query, which had only one single argument, that argument simply being the query itself. Porting SPIP to use different implementations of SQL has meant that this model had to be abandoned; that original function still remains available for backwards compatibility, but its use should be actively discouraged from now on. It is vastly preferable to use the following set of functions described here instead, which additionally succeed in neutralising the clear majority of code insertion attacks on functional SPIP sites.

The first port of SPIP to another database server other than MySQL3 was carried out for PostGreSQL version 8.2. It was immediately followed by a double port to SQLite 2 and 3. These ports are based on replacing the single sql_query function by as many functions as there are SQL instructions, (select, update etc), each having as many arguments as the SQL instruction has applicable clauses (where, sort by, limit, etc.), supplemented with an additional optional argument specifying the SQL database. The SQL operands (particularly dates and hexadecimal numbers) remain written in MySQL syntax, with the interface functions taking care of rewriting such arguments whenever the need arises. Some supplementary functions are offered too: some are indispensable (access to successive rows in a select result set), and some are short-hand functions (counts, accessing a row that we know is unique, etc). Some generalised data are similarly offered to the programmer: the character sets in use, the presence of an LDAP server, etc.

The double port for SQLite did not require any revision of the function set that was first defined for the single PostGres port, which has led to that interface remaining consistent over time just as it was first defined. Nonetheless, such tool sets are inevitably forced to evolve sooner or later, as witnessed by the new interface offered with SPIP 2.0, which now includes a manager of its own versions, allowing SPIP to employ an SQL connection with several versions of the interface simultaneously. In this way, any SPIP extensions which respect this new interface are guaranteed to continue working with future versions of SPIP.

As a final note, the decision to develop this architecture was largely imposed by the absence of an abstraction library that was as freely available to be used as SPIP itself is: the current PHP extensions on offer only address standardising calls to the basic functions, which is only the relatively minor visible tip of the iceberg that the application developer is confronted with when working with non-uniform databases.

This article is comprised of three sections. The first two are intended for developers who want to write SPIP extensions; they present the overall architecture and then the functions that are available for use. The third section discusses some implementation details, and is intended for SPIP contributors who want to port SPIP to other SQL platforms or who wish to improve upon the existing ports. Throughout the article, we use the term SQL server to designate an SQL implementation used by SPIP, even if the these implementations are not strictly speaking servers, but rather a set of service protocols.

The overall architecture

First up, we can consider that the SPIP API interface for SQL can be reduced down to the following single function defined in the ecrire/base/abstract_sql.php file:

sql_serveur($ins_sql, $serveur='', $continue=false)

This function starts, if it hasn’t been called previously, by connecting to the server specified as the second argument. This argument is often omitted when calling the function, which then implies a reference to the SQL implementation chosen when the current SPIP site was installed, and memorised by SPIP in a connection file (see the article about Databases within SPIP). If that’s not the case, then the argument must explicitly indicate the name of the connection file to be used, with the .php file extension omitted from the file name. The result returned is another function, which performs the type of action requested by the command string passed as the first argument (e.g. select or fetch) on the SQL database specified by the connection file. The third argument specifies what should be done whenever such a function can not be returned. If it is absent or equal to false, a fatal error will be triggered. Otherwise, there are two situations which can be distinguished. If the connection specified is unknown or inoperative, the value false will be returned. If not, then a data structure will be returned describing the connection (this will be described later in the last section), which makes it possible to both check to see if a function exists without risking a fatal error, and to collect several pieces of information before further usage.

This minimalist vision of the interface allows anything to be done, but with a rather opaque syntax. Suppose for example that $db is the name of a database on the primary server, we do not select it with

$f = sql_serveur('selectdb');
$f($db);

To clarify the syntax, there is a set of functions that sequences the two instructions above for most common scenarios. For example, there is

sql_selectdb($nom, $serveur='') 

Which makes it possible to rewrite the previous operation more simply as:

sql_selectdb($db)

Generally speaking, SPIP’s API interface to the SQL servers is a set of functions (f) uniformly named sql_f, each with its last optional argument being the SQL server name. They each call sql_serveur to obtain a function f that they apply to their arguments, including the server name. All functions with their names constructed in such a way are reserved for the SPIP APIs for the SQL servers.

Viewed from an object oriented perspective, this set of functions represents the methods of the sql object, but we don’t write sql_f instead of sql->f, and it is not necessary to instantiate a class. The presence of the server name in all of the calls makes it possible to simulate operations involving the self object.

This set of functions therefore normally uses sql_serveur. Note here also the function that has been part of SPIP for a long time called:

spip_connect($serveur='')

which simply open the connection to the server, and is therefore equivalent to

sql_serveur('', $serveur, true)

and which returns false if the server is unavailable, and otherwise a data description detailing the server’s available features when it is.

Available functions

The functions in the SQL API can be classified into several groups, each of which will be described in turn by a table listing their arguments. For detailed examples, we recommend viewing the SPIP code itself directly.

The first group of functions concerns reading data from the SQL tables. The unavoidable functions are:

-  sql_select which has arguments being the normal SQL clauses for that instruction, and which returns a Select resource;

-  sql_fetch, almost always used in a loop, and which retrieves the successive rows from a Select result; it returns an array that is indexed by the field names of that resource;

-  sql_free which tells the server to release a resource.

There are some other functions which offer frequently used combinations of these operations:

-  sql_fetsel takes the same arguments as sql_select, which applies its arguments to that same function and then performs an sql_fetch call on the result returned; this is very practical for queries which have a result set of only a single row;

-  sql_getfetsel takes the same arguments as sql_select, which applies its arguments to that same function and then sql_fetch on the returned result, and then then finally extracts the field from the array that is returned which matches the name of the field provided as the first argument (the list of fields consists of just one); this is very practical for queries which have a result set of only one row with only one column;

-  sql_allfetsel takes the same arguments as sql_select, which applies its arguments to that same function and then sql_fetch on the returned resource so long as it has returned a non-empty array; sql_allfetsel actually returns the array of all of the arrays returned by sql_fetch (warning - don’t overload memory restrictions with this function) ;

-  sql_countsel takes the same arguments as sql_select except the first one, and which applies a COUNT(*) instead with the other arguments and returns the number calculated by the database query; this is very practical to find out the number of rows that the Select query would return as written.

-  sql_count which returns the number of rows in a Select resource, just as if you had added COUNT(*) into your query in the first place;

-  sql_get_select takes the same arguments as sql_select , but returns the SQL code for the query without actually executing it. This function can be useful for the requirements of certain plugins or to make it simpler to construct SQL views.

The first four of these functions terminate by calling sql_free, and are therefore preferable wherever possible to calling the three separate select-fetch-free functions, the last of which is easily overlooked.

The table below details the contents and order of the arguments expected by each of these functions.

Function Arguments
sql_select
  1. list of fields: string or array
  2. list of tables : string or array
  3. Where clause: string or array
  4. Groupby clause: comma-separated string or array
  5. Orderby clause: comma-separated string or array
  6. Limit clause: one or two comma-separated integers
  7. Having clause: string or array
  8. server
sql_fetch
  1. resource
  2. server
sql_free
  1. resource
  2. server
sql_count
  1. resource
  2. server
sql_countsel
  1. list of tables: string or array
  2. Where clause: string or array
  3. Groupby clause: string or array
  4. Orderby clause: comma-separated string or array
  5. Limit clause: one or two comma-separated integers
  6. Having clause: string or array
  7. server
sql_fetsel
  1. list of fields: string or array
  2. list of tables: string or array
  3. Where clause: string or array
  4. Groupby clause: comma-separated string or array
  5. Orderby clause: comma-separated string or array
  6. Limit clause: one or two comma-separated integers
  7. Having clause: string or array
  8. server
sql_allfetsel
  1. list of fields: string or array
  2. list of tables: string or array
  3. Where clause: string or array
  4. Groupby clause: comma-separated string or array
  5. Orderby clause: comma-separated string or array
  6. Limit clause: one or two comma-separated integers
  7. Having clause: string or array
  8. server
sql_getfetsel
  1. field name: string
  2. list of tables: string or array
  3. Where clause: string or array
  4. Groupby clause: string or array
  5. Orderby clause: comma-separated string or array
  6. Limit clause: one or two comma-separated integers
  7. clause Having: string or array
  8. server
sql_get_select
  1. list of fields: string or array
  2. list of tables: string or array
  3. Where clause: string or array
  4. Groupby clause: comma-separated string or array
  5. Orderby clause: comma-separated string or array
  6. Limit clause: one or two comma-separated integers
  7. Having clause: string or array
  8. server

In the functions listed above, if the Select clause is supplied as an array, its elements will be concatenated and comma-separated. For arrays used for the Where and Having clauses, its elements must be character strings (sub-arrays with prefixed syntaxes are also acceptable, but are reserved for the compiler). The strings will be combined with a universal join (i.e. they will be concatenated with logical "AND"s as separators).

The From clause is a string (arrays are again reserved here for the SPIP compiler). Warning: if it’s necessary to reference the tables in the other clauses, then you must systematically define aliases in this clause and reference them in the other clauses. As such, you would write:

sql_countsel('spip_articles AS a, spip_rubriques AS r', "a.id_secteur=r.id_rubrique AND r.titre='mysector')

or

sql_countsel('spip_articles AS a JOIN spip_rubriques AS r ON a.id_secteur=r.id_rubrique", "r.titre='mysector'")

whereas the following code with table aliases would simply just not work:

sql_countsel('spip_articles, spip_rubriques', "spip_articles.id_rubrique=spip_rubriques.id_secteur AND spip_rubriques.titre='mysector'")

A second group of functions is comprised of those that modify table contents. These functions are delicate to define since the syntax of the values to be entered into the tables changes from one type of SQL server to the next (notably the dates). For this reason, these functions must have the description of the table to be modified, so that it can know the types of the values expected by the SQL server. SPIP automatically retrieves these data (at the time the table is created), but it is still possible to supply an arbitrary description instead if you so wish (the pen-ultimate argument for these functions, therefore optional and actually seldom of any practical use).

SPIP therefore offers an insert function, sql_insertq, and an update function, sql_updateq, which accept an array of field=>value pairs and takes care of quoting the values depending on their type (using the sql_quote function detailed further below). SPIP also offers sql_insertq_multi for making inserts of several rows by accepting an array of field=>value arrays. For updates where the new values depend on the previous ones (such as with cpt=cpt+1), you can instead use sql_update where the values will be used literally, but you must be extremely careful to intercept any possibility of code injection attacks. There is similarly the sql_replace, function which makes an update to a row that corresponds to a primary key, or which will insert the values as a new row if that row did not previously exist, whereas the sql_replace_multi function works for multiple updates or inserts in the same fashion. Last, but not least, sql_delete will delete the rows from a table that match a given Where clause.

Function Arguments
sql_updateq
  1. table
  2. field=>value to be quoted array
  3. Where clause
  4. description
  5. server
sql_update
  1. table
  2. field=>value array
  3. Where clause
  4. description
  5. server
sql_insertq
  1. table
  2. field=>value to be quoted array
  3. description
  4. server
sql_insertq_multi
  1. table
  2. array of field=>value to be quoted arrays
  3. description
  4. server
sql_replace
  1. table
  2. field=>value to be quoted array
  3. description
  4. server
sql_replace_multi
  1. table
  2. array of field=>value to be quoted arrays
  3. description
  4. server
sql_delete
  1. table
  2. where
  3. server

An unusual group of functions specifically works on operands which will likely be used before those listed above: they do not actually interact with the server, but they return strings that depend on it:

-  sql_quote accepts either a string or a number, and returns a number of the argument was a number or a string representing a integer, otherwise it will return the original string enclosed in apostrophes and with any internal apostrophes protected depending on the particular syntax used by the server (a \ inserted beforehand for some servers, and a second apostrophe for some others);

-  sql_hex accepts a string of hexadecimal characters and returns its representation on the targeted SQL server;

-  sql_in constructs a call to the IN operator and processes any hexadecimal values that might appear in it;

-  sql_test_int is a predicate that returns Vrai (true) if the SQL type passed is an integer;

-  sql_test_date is a predicate that returns Vrai (true) if the SQL type passed is a valid date;

-  sql_multi applies an SQL expression to a field containing a multi block (see 2124{}) to extract from it the portion that matches the indicated language; the reason for performing this operation in SQL is essentially so that we can simultaneously request a sort on that column.

Function Arguments
sql_quote
  1. value
  2. server
sql_hex
  1. value
  2. server
sql_in
  1. column
  2. values
  3. vrai if negation is desired
  4. server
sql_multi
  1. column
  2. language
  3. server
sql_test_date
  1. type
  2. server
sql_test_int
  1. type
  2. server

An important group is comprised of functions that manipulate database and table declarations. For historical reasons, the first version of this interface almost literally adopted the syntax of MySQL3 and should certainly be reviewed some time, especially to make join declarations more visible. The sql_create, sql_alter, sql_showtable and sql_drop_table functions are used for creating, modifying, viewing and deleting a table. The sql_create_view and sql_drop_view functions are used to create or delete a view. The sql_listdbs, sql_showbase and sql_selectdb functions are used to view the available databases, their constituent contents and to select one of them. Note that not all internet hosts necessarily authorise these kind of actions: SPIP will try to work out what it can, and will record its efforts in the spip.log file.

Function Arguments
sql_create
  1. table name
  2. array of (column name) => (SQL type) pairs and their default values
  3. array of (index name) => column(s) pairs
  4. vrai (true) if auto-increment
  5. vrai (true) if temporary
  6. server
sql_alter
  1. MYSQL Alter query
  2. server
sql_showtable
  1. RegExp
  2. vrai (true) if the table is declared by SPIP
  3. server
sql_drop_table
  1. table name
  2. vrai (true) if the exists conditions should be inserted
  3. server
sql_create_view
  1. view name
  2. field selection query (maybe created with sql_get_select)
  3. server
sql_drop_view
  1. view name
  2. vrai (true) if the exists condition should be inserted
  3. server
sql_listdbs
sql_selectdb
  1. database name
  2. server
sql_showbase
  1. RegExp
  2. server

Two functions are made available to control the character encoding when communicating with the server:

-  sql_set_charset, requests that the specified coding be applied;

-  sql_get_charset, asks if a particular character encoding is available on the server.

Function Arguments
sql_get_charset
  1. RegExp
  2. server
sql_set_charset
  1. encoding
  2. server

The last group of functions offer some tools for managing queries and tables; their equivalently named counterparts in the official SQL documentation have been used literally. However, not that sql_explain is used implicitly by the SPIP debugger, directly available using the administration buttons in the public zone whenever you ask SPIP for the loop’s execution plan (or the plan for an entire template file).

Function Arguments
sql_optimize
  1. query
  2. server
sql_repair
  1. table
  2. server
sql_explain
  1. query
  2. server
sql_error
  1. query
  2. server
sql_errno
sql_version

Not in any other logical grouping, the catch-all function sql_query, which really ought to have historically been called spip_query; which ought not to be used now in any circumstance.

Implementing other ports

This section is dedicated to those who wish to port SPIP to other SQL servers, or who need more technical information, especially as regards managing the interface versions. The functions in the ecrire/base/abstract_sql.php file detailed above simply offer a uniform interface to the various servers, but do not themselves perform any processing. It is, in fact, in the ecrire/base/connect_sql.php file that all the real work is carried out.

The essential function is spip_connect which opens the connection to the SQL server indicated by its argument (which is, if omitted, the primary server) and taking into account any connections that have already been made. This opening consists of including a connection file created during the installation of SPIP by the scripts stored in the install directory. A connection file essentially just amount to applying the spip_connect_db function to the values supplied during the installation.

The spip_connect_db function specifically receives the type of SQL server as an argument. This type must be the name of a file existing in the req directory. This file is loaded and must define all of the interface functions defined in the previous section, as well as the req_type_dist function which will be immediately applied to the same arguments as spip_connect_db, except for the type. It is this function which must effectively establish the connection.

Porting SPIP to other SQL servers therefore consists of defining this set of functions and to store them in the req directory.

The interface versions manager relies on the second argument to spip_connect, which specifies the version, the current version being taken as the default. All of the interface functions are defined in the abstract_sql file, are named sql_X, and are the only such functions to be named that way. They all connect by making a call to a variant of spip_connect , the first argument of which is the interface version number. In the case where the abstract_sql file will require a revision, it will be renamed as abstract_sql_N, and the following sed command will be applied (N designates the version number):

s/\(sql_[A-Za-z_0-9 ]*\)/\1_N/

By similarly applying this script to the SPIP extensions based on that version, we can make it possible to call its functions, which will be loaded without any name collisions, since the sed will have prefixed the name of the old ones with the version number. You will only need to add an include instruction referencing the abstract_sql_N file. For the porting work, you will need to rename the files in the req directory at the same time, and write the appropriate new versions.

The coexistence of several versions of the interface during the execution of SPIP relies on the structure describing the server. In reality, this is an array containing:

-  link, resource specifying the connection;

-  db, the database name;

-  prefixe, the table prefix name;

-  ldap, the name of a file which may exist to describe an LDAP server.

The other entries are the available version numbers, and their value is the array of functions implementing that version of the interface.

The other functions in the connect_sql function essentially relate to management of the versions and the processing of a few particular scenarios for declaring the standard tables used in SPIP.

The replacement of spip_query in the previous code for SPIP by this set of functions was partially carried out using automatic Sed scripts. These scripts are part of the libraries for successive versions of the code, and can therefore be retrieved by the developer to be applied to any SPIP extensions that need to be migrated to the new architecture. Here is a comprehensive list of those that have been created up until the date this article was published:

9916 9918 9919 10394 10492 10493 10497 10501 10504 10508 10509 10519 10520 10707 10742

Important warning: these scripts do not uniformly apply to just any use of spip_query, and it is essential to carefully examine them to make sure that they correctly apply to the code being migrated.

Author Mark Published : Updated : 24/07/23

Translations : català, English, Español, français