Documentation for Alájar 1.8.3

Author: Max Attar Feingold

Introduction

Alájar is a small, portable, componentized HTTP server that implements a subset of the HTTP protocol and provides a foundation upon which powerful web-based applications can be built. The objective of Alájar is to make it easy for developers to generate high-performance interactive HTML content.

Under the Alájar programming model, compiled web application code is executed in-process by dynamically linked libraries that implement the standard C++ Alájar interfaces. In this way the application controls exactly what the web server will send back to the client.

Features

Alájar can operate both as a standard web server and as a framework for developers to write custom web applications. Through the creation of simple text configuration files, web administrators can create virtual domains (hereafter referred to as page sources) under which incoming requests from clients can either be treated as simple HTTP requests or be passed along to application code in order to generate a custom response.

Using this scheme, Alájar allows interactive HTML pages to be built on the fly in a fast and scalable manner. It should be noted that both Java applets and JavaScript / Dynamic HTML can be used with Alájar, as they are client side technologies that are merely delivered by the server through text or binary files. The services offered by Alájar include the following:

  • SSL-encrypted connections
  • GET requests: Alájar can operate as a simple web server, allowing clients to request HTML files, text documents, images, compressed archives and multimedia files.
  • POST requests: Alájar will parse the results of POST operations (both textual form submissions and file uploads) and pass them through to the corresponding page source.
  • HEAD, TRACE and all the other standard HTTP methods (with the exception of PUT and DELETE).
  • Basic and Digest HTTP authentication: each page source can choose to require authentication from clients in order to access its content. The actual authentication decisions are delegated to the page sources themselves.
  • Form parsing and handling, including forms submitted via GET query strings.
  • Cookie parsing.
  • Request logging.
  • Statistics collection.
  • File caching.
  • A configurable dynamic thread pool.
  • Configuration file access.
  • Report files to which diagnostic information can be written.

Server requirements

Minimal system requirements for a basic Alájar server are:
  • A supported operating system
  • About 25MB of dedicated RAM, in addition to the requirements of any active page sources

Configuration

A configuration file called Alajar.conf must exist in the same directory as the Alájar executable. This file must contain the following parameters (the values given are examples). Order does not matter, spaces before and after the '=' sign are forbidden, and directory names should not end with a slash:
  • HttpPort=80
  • EnableHttpsPort=1
  • HttpsPort=443
  • HttpsPublicKeyFile=server.cer
  • HttpsPrivateKeyFile=server.pfx
  • FileCache=1
  • ApproxNumFilesInFileCache=2000
  • MemoryCache=1
  • InitNumThreads=5
  • MaxNumThreads=30
  • DefaultThreadPriority=Normal
  • CounterPath=Counters
  • LogPath=Logs
  • PageSourcePath=PageSources
  • ConfigPath=Config
  • ReportPath=Reports
  • StatisticsPath=Statistics

The directories specified must exist on the computer running Alájar. Path names can be absolute or relative to the path of the Alájar executable.

  • EnableHttpPort is 1 if HTTP traffic should be enabled; 0 if not.
  • HttpPort is the TCP port to be used for HTTP traffic.
  • EnableHttpsPort is 1 if HTTPS traffic should be enabled; 0 if not.
  • HttpsPort is the TCP port to be used for HTTPS traffic.
  • HttpsPublicKeyFile is the public key file for HTTPS traffic. On Windows, this should be a .cer file using DER encoding. If OpenSSL is used, this should be an OpenSSL .pem file.
  • HttpsPrivateKeyFile is the private key file for HTTPS traffic. On Windows, this should be a .pfx file using PKCS12 encoding. If OpenSSL is used, this should be an OpenSSL .key file.
  • FileCache is 1 if the file cache is to be used; 0 if not
  • ApproxNumFilesInFileCache is a guess of the maximum number of files that the file cache will contain at any one time.
  • MemoryCache is 1 if internal memory caches are to be used by Alájar; 0 if not. In general, this option should only be disabled in order to facilitate tracking down in-process memory leaks.
  • InitNumThreads is the base number of worker threads that are used by the thread pool.
  • MaxNumThreads is the maximum number of worker threads that can be used by the thread pool.
  • DefaultThreadPriority is the default priority of thread pool threads. It can be set to one of the following values:
    • Lowest
    • Lower
    • Normal
    • Higher
    • Highest
  • CounterPath is the directory in which the SSI counter files are placed
  • LogPath is the directory in which the logs are placed
  • PageSourcePath is the directory in which page source DLLs are placed
  • ConfigPath is the directory in which page source configuration files are placed. A file called Default.conf must be present to specify the properties of the default page source
  • ReportPath is the directory in which report files are stored
  • StatisticsPath is the directory in which statistics files are stored
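
The Key=Value format described above is simple enough to check by hand. The following is a minimal sketch of how one line could be validated and split. It is not Alájar's actual parser, and the function name parseConfLine is hypothetical.

```cpp
#include <string>

// Hypothetical helper, not part of Alájar: splits one "Key=Value" line.
// Returns false if the line has no '=', has an empty key, or has a space
// directly before or after the '=' sign (which the format forbids).
// An empty value (e.g. "DenyIPAddressAccess=") is accepted.
bool parseConfLine(const std::string& line, std::string& key, std::string& value) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos || eq == 0) {
        return false;
    }
    if (line[eq - 1] == ' ') {
        return false;
    }
    if (eq + 1 < line.size() && line[eq + 1] == ' ') {
        return false;
    }
    key = line.substr(0, eq);
    value = line.substr(eq + 1);
    return true;
}
```

Splitting at the first '=' keeps any later '=' characters as part of the value, which seems the safer choice for values such as URLs.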

The configuration files in the ConfigPath directory must contain the following parameters (the values provided are examples):

  • 401File=Errors/401.html
  • 403File=Errors/403.html
  • 404File=Errors/404.html
  • 500File=Errors/500.html
  • 501File=Errors/501.html
  • Override401=0
  • Override403=0
  • Override404=0
  • Override500=0
  • AllowDirectoryBrowsing=1
  • BasePath=www_root
  • UseDefaultFile=1
  • DefaultFile=index.html
  • UsePageSourceLibrary=1
  • PageSourceLibrary=Admin.dll
  • PageSourceClsId=8B631302-8CFA-11d3-A240-0050047FE2E2
  • OverrideGet=1
  • OverridePost=1
  • UseAuthentication=Digest
  • DigestAuthenticationNonceLifetime=30
  • UseLogging=1
  • UseCommonLogFormat=1
  • DenyIPAddressAccess=
  • OnlyAllowIPAddressAccess=127.0.0.1@192.168.0.*
  • DenyUserAgentAccess=Mozilla/2.0 (Compatible; AOL-IWENG 3.0; Win16)
  • OnlyAllowUserAgentAccess=
  • FilterGets=1
  • FilterGetExtensions=gif;jpg
  • FilterGetAllowReferers=http://localhost*;https://localhost*;

These parameters are used to configure the default page source, which is the one used when no other page source matches the request URI. Each page source must have a file with the same set of parameters in the ConfigPath directory (*.conf) in order to be registered with Alájar upon startup.

  • Name is the name of the page source, the name of the "virtual directory" that will be requested by clients. Page source names are case insensitive
  • xxxFile is the file to be sent to the client when error xxx occurs.
  • Overridexxx is 1 if xxx errors should be overridden.
  • AllowDirectoryBrowsing is 1 if directory browsing should be allowed and directory indexes generated; 0 if not
  • BasePath is the root path of the page source where its files are placed. I.e., http://server/pagesource/file.html maps to BasePath/file.html on the local disk
  • UseDefaultFile is 1 if the server should look for the default file before testing whether or not directory browsing is allowed; 0 if not
  • DefaultFile is the default file to be loaded when a directory is requested, if UseDefaultFile is 1
  • UsePageSourceLibrary is 1 if a page source DLL should be used; 0 if not
  • PageSourceLibrary is the name of the DLL (with no extension: .dll is assumed) to be associated with the page source. The DLL should be placed in the PageSourcePath directory. If UsePageSourceLibrary is 0 then this parameter is ignored.
  • PageSourceClsid is the universally unique identifier associated with the class that implements the page source. Multiple page sources can be implemented in the same library by using different CLSIDs.
  • OverrideGet is 1 if GET operations should be overridden by the page source library; 0 if not
  • OverridePost is 1 if POST operations should be overridden by the page source library; 0 if not
  • UseAuthentication determines the type of HTTP authentication used by the page source. It can be set to one of the following values:
    • None
    • Basic
    • Digest
  • DigestAuthenticationNonceLifetime is the number of seconds that nonces should live when Digest authentication is enabled.
  • UseLogging is 1 if requests to this page source should be logged; 0 if not
  • UseCommonLogFormat is 1 if the page source's log should be written using the standard Common Log Format, 0 if the default Alájar format is preferred. This value is ignored if UseLogging is 0.
  • Threading is Free if the page source can be accessed by multiple threads simultaneously, Single if only one thread should access the page source code at the same time. This value is ignored unless UsePageSourceLibrary is set to 1
  • DenyIPAddressAccess is a list of IP addresses that should be blocked from the web server. Any request coming from these addresses will be denied access to server resources. The IP addresses should be separated by @ characters, and can use * as a wildcard. For example, DenyIPAddressAccess=127*@192* will deny access to all addresses beginning with 127 and 192. The wildcard terminates parsing, so 127.*.0.1 will also block 127.0.0.2.
  • OnlyAllowIPAddressAccess specifies a list of IP addresses that are granted access to the server to the exclusion of all others. OnlyAllowIPAddressAccess overrides DenyIPAddressAccess, so any excluded addresses in the DenyIPAddressAccess list will be ignored. The syntax for OnlyAllowIPAddressAccess is the same as for DenyIPAddressAccess.
  • DenyUserAgentAccess is a list of user agents (web browsers) that should be blocked from the web server. Any request coming from these user agents will be denied access to server resources. The syntax for DenyUserAgentAccess is the same as for DenyIPAddressAccess.
  • OnlyAllowUserAgentAccess specifies a list of user agents that are granted access to the server to the exclusion of all others. OnlyAllowUserAgentAccess overrides DenyUserAgentAccess, so any excluded user agents in the DenyUserAgentAccess list will be ignored. The syntax for OnlyAllowUserAgentAccess is the same as for DenyIPAddressAccess.
  • FilterGets is 1 if GET operations should be filtered - i.e. restricted by extension and referers; 0 if not. This option is useful primarily to avoid deep linking to images within a web site.
  • FilterGetExtensions is a semi-colon separated list of file extensions whose GET operations should be filtered.
  • FilterGetAllowReferers is a semi-colon separated list of valid referers that can be used for the files being filtered. Entries may end in a wildcard character (*).
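
The access-list rules above (entries separated by @, a * wildcard that terminates parsing, and OnlyAllow overriding Deny) can be sketched as follows. This is an illustration of the documented behavior, not Alájar's implementation; the function names matchesPattern, matchesList and isAllowed are hypothetical, and treating an empty OnlyAllow list as "no restriction" is an assumption.

```cpp
#include <string>

// Hypothetical helper: true if `address` matches one pattern.
// Everything after the first '*' in the pattern is ignored, so
// "127.*.0.1" behaves like "127.*".
bool matchesPattern(const std::string& pattern, const std::string& address) {
    const std::size_t star = pattern.find('*');
    if (star == std::string::npos) {
        return pattern == address;
    }
    // Compare only the prefix that precedes the wildcard.
    return address.compare(0, star, pattern, 0, star) == 0;
}

// Hypothetical helper: true if `address` matches any entry of an
// '@'-separated list such as "127.0.0.1@192.168.0.*".
bool matchesList(const std::string& list, const std::string& address) {
    std::size_t begin = 0;
    while (begin <= list.size()) {
        std::size_t end = list.find('@', begin);
        if (end == std::string::npos) {
            end = list.size();
        }
        if (end > begin && matchesPattern(list.substr(begin, end - begin), address)) {
            return true;
        }
        begin = end + 1;
    }
    return false;
}

// Assumed combination rule: a non-empty OnlyAllow list takes over
// completely and the Deny list is then ignored.
bool isAllowed(const std::string& denyList, const std::string& onlyAllowList,
               const std::string& address) {
    if (!onlyAllowList.empty()) {
        return matchesList(onlyAllowList, address);
    }
    return !matchesList(denyList, address);
}
```

The same matching could be reused for the user agent lists, since the text states that they share the syntax of DenyIPAddressAccess.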

Logging and reporting

Daily log and report files are written on a per-page source basis into the server's log directory.

Statistics

Daily statistics files are written on a global basis into the server's log directory.

Implementation details

The heart of Alájar is the HttpServer object, which contains the core functionality of the web server. An HttpServer object contains a FileCache object that provides it with file I/O, a ThreadPool object to which a main socket delegates client requests, and a hash table of PageSource objects.

The FileCache provides a standard GetFile() / Release() interface for its clients. The backend is a hash table containing CachedFile objects corresponding to the files in the cache. The FileCache relies on the OS for memory mapped I/O and page replacement policies.

The HttpServer object uses two special classes to communicate with the page sources: HttpRequest and HttpResponse. An HttpRequest object is assigned to each new client connection; it contains logic that, given an open socket, decodes the HTTP headers from the client and makes the information available to the page source. HttpResponse objects contain the logic that handles the server responses: they allow page sources to specify the nature of the response that the client should receive. Responses can be HTTP error codes, URL redirects, files or data buffers with an associated mime type. Response objects also contain lists of cookies to be associated with the client or deleted.

Execution of Alájar proceeds as follows: an HttpServer object is created and started, after which the main server thread is spawned and the main function blocks until a termination signal is received.

When the server starts it configures itself according to the Alajar.conf configuration file and the page source configuration files found in the Config directory, after which it opens a socket and listens on the specified port number. When an incoming connection is received, the task of responding is passed on to the ThreadPool, which stores the task in a FIFO queue. Eventually a worker thread will pick up the task, and obtain an HttpRequest object that will decode the HTTP request and hand it off to an HttpResponse object. If the request falls under the control of a particular page source (as determined by the first directory name referenced in the request's URI), then that page source is called and its response is passed along to the client. Otherwise, the request is handled like a normal HTTP request by the default page source, the response being either a file or an error message.
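
The routing rule described above (the first directory name in the URI selects the page source, with the default page source as fallback) can be sketched like this. The helper names and the "default" return value are hypothetical, and the sketch assumes registered names are stored in lower case because page source names are case insensitive.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: returns the first path segment of a request URI
// in lower case, e.g. "/Admin/stats.html" -> "admin".
std::string firstSegment(const std::string& uri) {
    if (uri.empty() || uri[0] != '/') {
        return "";
    }
    // The segment ends at the next '/' or at the start of a query string.
    const std::size_t end = uri.find_first_of("/?", 1);
    std::string name = (end == std::string::npos) ? uri.substr(1)
                                                  : uri.substr(1, end - 1);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return name;
}

// Hypothetical helper: picks the page source that should handle the URI.
// `registered` holds lower-case page source names; anything that does not
// match falls through to the default page source.
std::string selectPageSource(const std::set<std::string>& registered,
                             const std::string& uri) {
    const std::string name = firstSegment(uri);
    if (!name.empty() && registered.count(name) != 0) {
        return name;
    }
    return "default";
}
```

With "admin" registered, a request for /Admin/stats.html would go to the admin page source, while /index.html would be served by the default page source.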

When data is POST'ed by a client, the server passes the data on to the appropriate page source for processing. If a file is submitted via a file requester form, then the file is saved as a temporary file and the name is passed to the page source. If no page source can be identified, then the server discards the submitted data and returns a standard error message.

It is important to observe that if a page source is configured to be free-threaded, there are no guarantees that the same thread will consistently call into the same page source for all requests. Consequently, no data should be stored by the page source in thread local storage and no thread specific operations should be performed. However, if the page source is configured to be single threaded, it is guaranteed that it will be called into with the same thread.
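
As an illustration of the free-threaded case, shared state can live in the page source object itself and be protected for concurrent access, rather than being kept in thread local storage. The class below is a hypothetical sketch using only the C++ standard library; it is not taken from the Alájar SDK.

```cpp
#include <atomic>

// Hypothetical shared state for a free-threaded page source: a hit
// counter that any worker thread may update. Because it is a member of
// the page source (not thread local), every thread sees the same value.
class HitCounter {
public:
    HitCounter() : m_hits(0) {}

    // Atomically adds one hit and returns the new total.
    long Increment() { return ++m_hits; }

    // Returns the current total.
    long Get() const { return m_hits.load(); }

private:
    std::atomic<long> m_hits;
};
```

An atomic is enough for a single counter; a page source with richer shared data would more likely guard it with a mutex.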

Development

Download the SDK to obtain the headers and libs needed to develop for Alájar.

Alájar provides a plug-in architecture for web applications in which custom code can be invoked when the client requests a resource from a particular URL. Alájar plug-ins are called page sources, and are implemented in page source DLLs. Multiple page sources can be implemented in one DLL. Each page source is a C++ object that implements the IPageSource interface, as defined in Alajar.h. The sample Admin and Almonaster page sources that ship with Alájar provide good examples of development methodology.

Every page source DLL must provide the following export function:

extern "C" int CreateInstance(const Uuid& uuidClsid, const Uuid& uuidIid, void** ppObject);

The Uuid type referenced above is a universally unique identifier, defined in Osal's IObject.h. The uuidClsid argument is the Uuid specified in the page source's configuration file. If a library implements several page sources, this function will be called once for each different page source. The uuidIid argument is the Uuid of the interface that Alájar expects the class referenced by uuidClsid to implement. At present, this argument will always be IID_IPageSource and will refer to the IPageSource interface as defined in Alajar.h.

The actual definition of IID_IPageSource is in Alajar.lib.

If the page source recognizes the class id and interface id, it returns an instance of the interface in ppObject and returns OK. Otherwise, it returns an error code such as ERROR_FAILURE.
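
A minimal sketch of such an export is shown below. So that it compiles on its own, it declares stand-ins for Uuid, OK, ERROR_FAILURE, IPageSource and IID_IPageSource; in a real page source these come from Alajar.h and Osal's IObject.h, and their actual definitions (including how Uuids are compared) may differ. HelloPageSource and CLSID_HelloPageSource are hypothetical names with placeholder values.

```cpp
#include <cstdint>

// --- Stand-ins for SDK declarations (assumptions, see lead-in). ---
struct Uuid {
    std::uint32_t data[4];
};
inline bool operator==(const Uuid& a, const Uuid& b) {
    for (int i = 0; i < 4; ++i) {
        if (a.data[i] != b.data[i]) {
            return false;
        }
    }
    return true;
}
const int OK = 0;
const int ERROR_FAILURE = -1;
struct IPageSource {
    virtual ~IPageSource() {}
};
const Uuid IID_IPageSource = {{1, 2, 3, 4}};            // placeholder value

// --- Hypothetical page source class and its class id. ---
const Uuid CLSID_HelloPageSource = {{10, 20, 30, 40}};  // placeholder value
class HelloPageSource : public IPageSource {};

// The export looked up by the server in every page source DLL.
extern "C" int CreateInstance(const Uuid& uuidClsid, const Uuid& uuidIid,
                              void** ppObject) {
    if (ppObject == nullptr) {
        return ERROR_FAILURE;
    }
    *ppObject = nullptr;
    // Hand out an instance only for a class id and interface id we know.
    if (uuidClsid == CLSID_HelloPageSource && uuidIid == IID_IPageSource) {
        *ppObject = static_cast<IPageSource*>(new HelloPageSource());
        return OK;
    }
    return ERROR_FAILURE;
}
```

A library that implements several page sources would add one branch per class id. How the returned object's lifetime is managed is not covered by this sketch.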

The IPageSource interface contains the following methods:

int OnInitialize(IHttpServer* pHttpServer, IPageSourceControl* pControl);

int OnFinalize();

int OnGet(IHttpRequest* pHttpRequest, IHttpResponse* pHttpResponse);

int OnPost(IHttpRequest* pHttpRequest, IHttpResponse* pHttpResponse);

int OnError(IHttpRequest* pHttpRequest, IHttpResponse* pHttpResponse);

int OnBasicAuthenticate(IHttpRequest* pHttpRequest, bool* pbAuthenticated);

int OnDigestAuthenticate(IHttpRequest* pHttpRequest, bool* pbAuthenticated);

const char* GetAuthenticationRealm (IHttpRequest* pHttpRequest);

OnGet and OnPost are called when the corresponding events are generated by a client request. All available information about the request is provided through the IHttpRequest interface, and all information about what the web server's response should be is provided to the server via the IHttpResponse interface. The value returned should be OK if all is well and ERROR_FAILURE if an internal error occurred (the report can be used to provide more information concerning the error).

OnInitialize is called when the web server is starting up, while OnFinalize is called when the web server is shutting down. These two functions allow page sources (which may contain complex data structures and file I/O schemes) to execute construction and destruction code when appropriate. The interface pointers provided in OnInitialize should be copied and stored by the page source, and there is no need to AddRef and Release them. The same rules as with OnGet apply in terms of return values.

OnError is called when an HTTP error is generated by a client request, even when returned by the OnGet or OnPost methods. The same rules as with OnGet apply in terms of return values.

OnBasicAuthenticate and OnDigestAuthenticate are called when authentication is requested by a page source and a client requires authentication in order to proceed. If the page source accepts the user's credentials, it should set the *pbAuthenticated field to true and return OK.
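
To make the return value convention concrete, here is a sketch of an OnGet handler. The IHttpRequest and IHttpResponse declarations below are reduced stand-ins with assumed signatures for SetMimeType and WriteText (the real ones are in Alajar.h and may differ), and HelloPageSource and RecordingResponse are hypothetical classes. The recording stub exists only so the handler can be tried outside the server.

```cpp
#include <string>

// --- Stand-ins with assumed signatures (see lead-in). ---
const int OK = 0;
const int ERROR_FAILURE = -1;

struct IHttpRequest {
    virtual ~IHttpRequest() {}
};
struct IHttpResponse {
    virtual ~IHttpResponse() {}
    virtual int SetMimeType(const char* pszMimeType) = 0;
    virtual int WriteText(const char* pszText) = 0;
};

// Hypothetical page source; only the OnGet handler is shown.
class HelloPageSource {
public:
    int OnGet(IHttpRequest* pHttpRequest, IHttpResponse* pHttpResponse) {
        if (pHttpRequest == nullptr || pHttpResponse == nullptr) {
            return ERROR_FAILURE;   // internal error
        }
        if (pHttpResponse->SetMimeType("text/html") != OK) {
            return ERROR_FAILURE;
        }
        // Dynamically generated content goes into the response object.
        return pHttpResponse->WriteText("<html><body>Hello</body></html>");
    }
};

// Test stub that records what the handler wrote.
struct RecordingResponse : IHttpResponse {
    std::string mimeType;
    std::string body;
    int SetMimeType(const char* pszMimeType) override {
        mimeType = pszMimeType;
        return OK;
    }
    int WriteText(const char* pszText) override {
        body += pszText;
        return OK;
    }
};
```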

The IHttpRequest interface

The IHttpRequest interface is used by page sources to obtain information about the client's request. The following methods can be used:
  • GetMethod returns the HTTP method requested: GET, POST, PUT, HEAD, or TRACE.
  • GetVersion returns the HTTP version requested: HTTP09, HTTP10, or HTTP11.
  • GetUri returns the URI requested (e.g., /directory/subdirectory/file.html).
  • GetFileName returns the name of the requested file, resolved to a full path on the local machine.
  • IsFileNameCanonical returns true if the requested file name could be resolved to a full path by the server; false if not.
  • GetBrowserName returns the name of the browser that the client used to make the request.
  • GetClientIP returns the IP address of the client.
  • GetReferer returns the request's referer field.
  • GetNumForms returns the total number of forms submitted by the client
  • GetForm returns an IHttpForm interface pointer representing the form submitted with the given name or index, or NULL if the form was not submitted.
  • GetFormNames returns an array of all submitted form names, or NULL if no forms were submitted
  • GetFormBeginsWith returns an IHttpForm interface pointer representing the first form submitted with the given string as a valid prefix in its name, or NULL if no such form could be found.
  • GetNumCookies returns the total number of cookies submitted by the client
  • GetCookieNames returns an array of all submitted cookie names, or NULL if no cookies were submitted
  • GetCookie returns an ICookie interface representing the cookie submitted with the given name or index, or NULL if the cookie was not submitted.
  • GetCookieBeginsWith returns an ICookie interface representing the first cookie submitted with the given string as a valid prefix in its name, or NULL if no such cookie could be found.
  • GetAuthenticationUserName returns the login name provided by the client, if authentication is being used.
  • BasicAuthenticate and DigestAuthenticate are used to check the password provided by the client against the password stored by the page source.
  • GetHeaders returns the headers sent by the client; the string returned by this method is not guaranteed to be the complete set of headers submitted, including all forms and cookies.

The IHttpResponse interface

The IHttpResponse interface is used by page sources to define the response that will be sent back to the client. The following methods can be used:
  • GetStatusCode returns the status code the response will use when it is sent.
  • SetStatusCode is called to specify the response's HTTP status code. The allowed codes are:
    • HTTP_200 OK
    • HTTP_301 Moved Permanently (use the SetRedirect call instead)
    • HTTP_400 Bad Request
    • HTTP_401 Not Authorized
    • HTTP_403 Forbidden
    • HTTP_404 Not Found
    • HTTP_500 Internal Server Error
    • HTTP_501 Not Implemented
    • HTTP_503 Service Not Available
  • GetStatusCodeReason returns the reason for the HTTP status code the response will use when it is sent.
  • SetStatusCodeReason is called to specify the reason for the response's HTTP status code. The allowed reasons are:
    • HTTP_REASON_NONE
    • HTTP_REASON_IPADDRESS_BLOCKED
    • HTTP_REASON_USER_AGENT_BLOCKED
    • HTTP_REASON_GET_REFERER_BLOCKED
    • HTTP_REASON_STALE_NONCE
  • SetMimeType specifies the MIME type of the response if it is a dynamically generated response.
  • SetFile specifies the name of the file to be used as response content.
  • SetRedirect specifies the URI to be used for a redirection. This function also sets the status code to HTTP_301.
  • SetNoBuffering specifies that the response should be streamed to the client and not buffered until complete.
  • Clear removes all existing response data. This can only be called if buffering is still enabled.
  • Flush sends all existing buffered data. This can only be called if buffering has been disabled.
  • WriteData and WriteText can be used to add dynamically generated data to the response.
  • WriteTextFile and WriteDataFile can be used to send a text or binary file in response to a request, respectively.
  • CreateCookie and DeleteCookie can be used to create or update and delete cookies, respectively.
  • AddHeader can be used to add a header to the response. This can only be called before buffering is disabled.
  • AddCustomLogMessage can be used to add a custom text field to the server's log. This will only show up if the common logging format has been disabled.
  • GetCustomLogMessages returns the custom log messages that were added to the response.

Aside from these constraints, page sources can more or less do whatever they want. However, some care must be taken not to corrupt memory or generate unhandled exceptions, since the page source code runs in the same process as the Alájar executable and other page sources.

The IHttpServer interface

The IHttpServer interface is used by page sources to obtain information about the server that they are operating under. The following methods can be used:
  • GetHttpPort returns the HTTP port on which the server is accepting requests.
  • GetHttpsPort returns the HTTPS port on which the server is accepting requests.
  • GetHostName returns the server's host name.
  • GetIPAddress returns the IP address that the server machine is using.
  • GetServerName returns the name and version of the HTTP server.
  • GetNumThreads returns the number of threads in the server's thread pool.
  • GetStatistics returns a structure containing server statistics.
  • GetNumPageSources returns the number of page sources registered with the server.
  • EnumeratePageSources returns an IPageSourceEnumerator interface pointer that can be used to enumerate the various page sources registered with the server.
  • GetReport returns the web server's report file.
  • GetConfigFile returns the web server's configuration file.
  • GetFileCache returns the web server's file cache.
  • Start starts the web server.
  • Shutdown shuts down the web server.
  • Restart restarts the web server.

The IHttpForm interface

The IHttpForm interface is used to obtain information about a given form submission:
  • GetNumForms returns the number of forms with this name submitted by the client.
  • GetForm returns a sub form corresponding to the given index. Multiple forms with the same name may be submitted by a client.
  • GetName returns the name of the form
  • GetValue returns the value submitted in the form as a string
  • GetIntValue returns the value submitted in the form as an integer
  • GetFloatValue returns the value submitted in the form as a floating point number
  • GetTimeValue returns the value submitted in the form as a UTCTime type
  • GetType returns the type of the form. There are three possible types:
    • SIMPLE_FORM
    • FILE_FORM
    • LARGE_SIMPLE_FORM
  • GetFileName returns the name of the file on the local machine if the form is of type FILE_FORM

Tools

To facilitate the development of simple applications, Alájar provides a text preprocessor called Asfpp, which takes as input a text file in a specific format and produces C++ code as output. To illustrate this procedure, a file with the following content:
<% #include <stdio.h> %>
 <html>
 <% for (int i = 0; i < 100; i ++) { %>
     <p><% Write (i);
 } %>
 </html>
... produces the following C++ code:
#include "Alajar.h"
#include <stdio.h>
#define Write pHttpResponse->WriteText

int RenderTest (IHttpRequest* pHttpRequest, IHttpResponse* pHttpResponse, Context* pContext)
{
    pHttpResponse->WriteData ("\n\n<html>\n\n");
    for (int i = 0; i < 100; i ++) {
        pHttpResponse->WriteData ("\n<p>");
        Write (i);
    }
    pHttpResponse->WriteData ("\n\n</html>");
}
The result, when compiled and executed by a page source, is an HTML file containing the following text:
<html>

<p>0
<p>1
<p>2
...
<p>99
</html>
The ASF format was designed to resemble other popular scripting language formats and facilitate the conversion of code from these to Alájar. The <% and %> markers are used to encapsulate C++ code, while the rest of the text is the HTML to be sent to the IHttpResponse object. Asfpp is included in the standard Alájar distribution.
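
The core idea of the translation (text outside the markers becomes WriteData calls, code inside the markers is copied through) can be illustrated with a toy translator. This is a simplified sketch and not Asfpp's actual algorithm: it does not generate the surrounding function or hoist #include lines, and the function names escapeForLiteral and translateAsf are hypothetical.

```cpp
#include <string>

// Escapes a run of text so that it can sit inside a C++ string literal.
std::string escapeForLiteral(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '\\': out += "\\\\"; break;
            case '"':  out += "\\\""; break;
            case '\n': out += "\\n";  break;
            case '\r': break;                 // drop carriage returns
            default:   out += c;      break;
        }
    }
    return out;
}

// Toy translator: emits one line per text run or code run.
std::string translateAsf(const std::string& source) {
    std::string out;
    std::size_t pos = 0;
    while (pos < source.size()) {
        const std::size_t open = source.find("<%", pos);
        const std::size_t textEnd =
            (open == std::string::npos) ? source.size() : open;
        if (textEnd > pos) {
            // Text outside the markers is sent to the response as-is.
            out += "pHttpResponse->WriteData (\"" +
                   escapeForLiteral(source.substr(pos, textEnd - pos)) +
                   "\");\n";
        }
        if (open == std::string::npos) {
            break;
        }
        const std::size_t close = source.find("%>", open + 2);
        if (close == std::string::npos) {
            // Unterminated marker: treat the rest of the input as code.
            out += source.substr(open + 2) + "\n";
            break;
        }
        // Code between the markers is copied through unchanged.
        out += source.substr(open + 2, close - open - 2) + "\n";
        pos = close + 2;
    }
    return out;
}
```

Because the embedded code is copied verbatim, a construct such as a for loop can be opened in one code run and closed in a later one, as in the example above.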

Administration

Alájar includes a page source called admin that provides basic administrative functionality. Access to /admin is password protected via Alájar's built-in support for basic authentication. Through the admin page source, administrators can:
  • View a complete statistical breakdown of the requests the server has received since startup
  • View page source logs and report files
  • View the server report file
  • Restart and shut down page sources
  • Restart and shut down the server
  • Change server config file parameters
  • Change page source config file parameters

Last edited Oct 11, 2011 at 5:05 AM by mfeingol, version 19
