PHP XZ (LZMA) Extension

God I love acronyms, don’t I? It certainly seems that way from my posts. Anywho, I was recently trolling bugs.php.net and stumbled upon someone looking for an XZ compression/decompression extension for PHP. It seemed like the kind of thing I could sink my teeth into, so I decided to take a stab at it. I’m actually kind of shocked there wasn’t already one. There are extensions (built-in nowadays) for several other compression formats (zlib, bz2, etc.) but none for xz/lzma2. I’ve hammered out the beginnings of the extension and look forward to making it better. This might actually be the first thing I’ve done that might come in handy to a select few PHP users out there who need XZ streams. :)

 

You can check it out at https://github.com/payden/php-xz. Installation instructions are in the README.

 

Cheers.

libwebsock SSL example

I am generally pretty interested in how people end up here on my site. That being said, I’ve recently noticed quite a few requests coming in on analytics with the keywords “libwebsock ssl example.” This article is for you all. :) It occurred to me that I don’t really have any recent examples of the current API and using SSL. There is the github API page which briefly goes over the functions that are meant to be called by client code. I actually use libwebsock for the chat on this site and a cheap GoDaddy certificate for SSL on it. So, I’ll show you how I have that set up in my chat program. Because GoDaddy uses a certificate chain or bundle of certificates, it was necessary for me to allow specifying a chainfile. There are two API functions for telling libwebsock to use SSL. libwebsock_bind_ssl and libwebsock_bind_ssl_real are their names. The actual work is done in libwebsock_bind_ssl_real. libwebsock_bind_ssl is just a convenience function that calls libwebsock_bind_ssl_real with chainfile = NULL. Anyway, I have three files related to my SSL certificate on the box: ubiety.net.key, ubiety.net.crt, and sf_bundle.crt. I listen for IPv4 requests as well as IPv6, so this is the block of code in my chat program that uses libwebsock to abstract away the WebSockets stuff:

 
  libwebsock_bind_ssl_real(ctx, "50.116.45.75", "443", "ubiety.net.key", "ubiety.net.crt", "sf_bundle.crt");
  libwebsock_bind_ssl_real(ctx, "2600:3c02::f03c:91ff:feae:96f3", "443", "ubiety.net.key", "ubiety.net.crt", "sf_bundle.crt");
 

That’s it! You just pass in the libwebsock_context you created with libwebsock_init(), the IP address and port you want to bind to, the key file, the certificate file, and the chain file. If you don’t need a chain file, either use libwebsock_bind_ssl or just pass NULL for chainfile. Hope this helps. Always feel free to email/comment with questions regarding libwebsock. Cheers.

PHP SAPI Documentation

First, I’d like to preface this by saying that PHP is a complicated piece of software and that I am by no means an expert when it comes to PHP internals. If you find something wrong with this article, please let me know! My current employer introduced me to using nginx with PHP-FPM over FastCGI. I fast became a fan of nginx because of its small footprint and, in my opinion, cleaner, more modular design and implementation. I quickly switched to using nginx/PHP-FPM as my main means for deploying PHP web applications. I recently became interested in the FastCGI protocol itself and how exactly PHP-FPM works. This led me to investigate more with regard to PHP internals and the different methods one may use to get PHP code to run in a meaningful manner. This article will attempt to give a high-level overview of what the SAPI layer is, why it’s important, a few common SAPI modules, and the very basics of how it works. The PHP developers recognized the need for PHP code to be run from different contexts and environments somewhat early and implemented a layer to allow the environment to be easily swapped out while still using the same core Zend Engine and PHP core code. This layer is called the SAPI layer or Server Application Programming Interface. This is a perfect example of good programming practice to allow flexibility and extensibility through modular design. For example, the PHP developers probably had no idea upon initial design of this layer that it would one day be used to process PHP scripts in a separate process given information from a webserver over FastCGI and then pass the results back to the webserver over this same FastCGI transport. However, due to this modularity of design, such a SAPI module exists and is called PHP-FPM. I would wager that almost all distributions of PHP come with at least two of the SAPI module implementations though you may not know them by that name.
Two of the most popular PHP SAPI modules are the CLI (command line interface) and mod_php for Apache. The CLI SAPI, of course, allows one to run a binary, usually named ‘php’, and execute PHP scripts passed in as arguments from the command line, with the output rendered to the standard output (stdout) and standard error (stderr) streams on the console. The mod_php SAPI allows the Apache webserver to service requests for specific PHP resources, have PHP read the PHP file, tokenize and execute it, and capture the output (usually HTML of course) to be sent back to the web client. The mod_php SAPI is also able to communicate certain information about the request from the webserver (query string, POST data, server host name, remote client IP address, etc.) to PHP so PHP can use this information during the processing of the PHP script. All of this is accomplished through the SAPI layer. So, now that I think we have a good idea of what it is and why it is important, let’s take a little look at how the basics work. I would like to introduce the sapi_module_struct. This struct is a bit large and probably rather daunting at first but it’s really not all that bad once we get to know it. I’d also like to say that if you are not a developer looking to develop your own SAPI module or just a curious fellow/gal you should probably go on about your business at about this point. :) The following comes straight out of the PHP source distribution under the path ‘main/SAPI.h’.

 
 
struct _sapi_module_struct {
        char *name;
        char *pretty_name;

        int (*startup)(struct _sapi_module_struct *sapi_module);
        int (*shutdown)(struct _sapi_module_struct *sapi_module);

        int (*activate)(TSRMLS_D);
        int (*deactivate)(TSRMLS_D);

        int (*ub_write)(const char *str, unsigned int str_length TSRMLS_DC);
        void (*flush)(void *server_context);
        struct stat *(*get_stat)(TSRMLS_D);
        char *(*getenv)(char *name, size_t name_len TSRMLS_DC);

        void (*sapi_error)(int type, const char *error_msg, ...);

        int (*header_handler)(sapi_header_struct *sapi_header, sapi_header_op_enum op, sapi_headers_struct *sapi_headers TSRMLS_DC);
        int (*send_headers)(sapi_headers_struct *sapi_headers TSRMLS_DC);
        void (*send_header)(sapi_header_struct *sapi_header, void *server_context TSRMLS_DC);

        int (*read_post)(char *buffer, uint count_bytes TSRMLS_DC);
        char *(*read_cookies)(TSRMLS_D);

        void (*register_server_variables)(zval *track_vars_array TSRMLS_DC);
        void (*log_message)(char *message TSRMLS_DC);
        double (*get_request_time)(TSRMLS_D);
        void (*terminate_process)(TSRMLS_D);

        char *php_ini_path_override;

        void (*block_interruptions)(void);
        void (*unblock_interruptions)(void);

        void (*default_post_reader)(TSRMLS_D);
        void (*treat_data)(int arg, char *str, zval *destArray TSRMLS_DC);
        char *executable_location;

        int php_ini_ignore;
        int php_ini_ignore_cwd; /* don't look for php.ini in the current directory */

        int (*get_fd)(int *fd TSRMLS_DC);

        int (*force_http_10)(TSRMLS_D);

        int (*get_target_uid)(uid_t * TSRMLS_DC);
        int (*get_target_gid)(gid_t * TSRMLS_DC);

        unsigned int (*input_filter)(int arg, char *var, char **val, unsigned int val_len, unsigned int *new_val_len TSRMLS_DC);

        void (*ini_defaults)(HashTable *configuration_hash);
        int phpinfo_as_text;

        char *ini_entries;
        const zend_function_entry *additional_functions;
        unsigned int (*input_filter_init)(TSRMLS_D);
};

Okay, like I said, there’s a bit much to swallow here. Let’s just hit on a few key members and callbacks in this article and hopefully we can get more specific and explore further in future articles. First, name and pretty_name are pretty self explanatory. I will say that pretty_name is what gets displayed in phpinfo() as Server API. It seems to me that the startup and shutdown callbacks in the struct are pretty much wrappers to php_module_startup and php_module_shutdown in most cases and are fairly simple in their implementation. The startup callback needs to be called after sapi_startup() and the shutdown callback needs to be called before sapi_shutdown(). I still need to dig a little deeper to discover what all goes on at this stage. Next are the activate and deactivate callback functions. Activate is called during php_request_startup() which is run once at the beginning of each request. Deactivate is called at the end of each request and will run before we call php_request_shutdown(). We use these hooks to perform any work that needs to happen before each request and after each request executes. For example, in the deactivate callback of my simple FastCGI SAPI, I use this hook to send out my FCGI_END_REQUEST over the transport and close the connection which signals to the webserver that there is no further output for this request. ub_write is a very important callback which gets called during execution of the PHP script when PHP wants to write data out. ub_write in the CLI SAPI, in most cases, simply writes the data passed into the callback to stdout. In contrast, ub_write in the PHP-FPM SAPI enqueues the data to be sent over the wire in a FCGI_STDOUT record back to the webserver. This callback is where we decide how to handle the data that PHP is outputting. The send_headers callback gets passed a pointer to a sapi_headers_struct where one might loop over the headers and perform certain modifications or additions to the headers while writing them out.
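To make the ub_write contrast described above a little more concrete, here is a minimal sketch of the same idea in plain C. Everything in it is an assumption made for illustration: mini_sapi, buf_append, cli_ub_write, fcgi_ub_write and engine_echo are made-up names, not PHP’s real types or functions, and the real callback has a different signature. Both variants collect bytes in a small buffer so the result can be inspected (a real CLI SAPI would write to stdout), and the FastCGI variant hard-codes request id 1. The 8-byte record header layout and the FCGI_STDOUT type value come from the FastCGI specification.

```c
#include <stddef.h>
#include <string.h>

#define FCGI_VERSION_1 1
#define FCGI_STDOUT    6   /* record type defined by the FastCGI specification */

/* Hypothetical, cut-down stand-in for sapi_module_struct: one output hook. */
typedef struct mini_sapi mini_sapi;
struct mini_sapi {
    const char *name;
    /* Returns the number of payload bytes accepted. */
    size_t (*ub_write)(mini_sapi *self, const char *str, size_t len);
    unsigned char out[512];   /* where this sketch collects the "sent" bytes */
    size_t out_len;
};

/* Append raw bytes to the module's buffer; returns 0 if they do not fit. */
static size_t buf_append(mini_sapi *s, const void *data, size_t len) {
    if (len > sizeof s->out - s->out_len)
        return 0;
    memcpy(s->out + s->out_len, data, len);
    s->out_len += len;
    return len;
}

/* CLI-style hook: pass the bytes through unchanged. */
size_t cli_ub_write(mini_sapi *self, const char *str, size_t len) {
    return buf_append(self, str, len);
}

/* FastCGI-style hook: frame the bytes in one FCGI_STDOUT record. */
size_t fcgi_ub_write(mini_sapi *self, const char *str, size_t len) {
    unsigned char hdr[8];
    if (len > 0xFFFF || len + sizeof hdr > sizeof self->out - self->out_len)
        return 0;
    hdr[0] = FCGI_VERSION_1;
    hdr[1] = FCGI_STDOUT;
    hdr[2] = 0;                            /* request id, high byte */
    hdr[3] = 1;                            /* request id, low byte (assumed 1) */
    hdr[4] = (unsigned char)(len >> 8);    /* content length, high byte */
    hdr[5] = (unsigned char)(len & 0xFF);  /* content length, low byte */
    hdr[6] = 0;                            /* padding length */
    hdr[7] = 0;                            /* reserved */
    buf_append(self, hdr, sizeof hdr);
    buf_append(self, str, len);
    return len;
}

/* The "engine" side only ever calls through the pointer. */
size_t engine_echo(mini_sapi *sapi, const char *text) {
    return sapi->ub_write(sapi, text, strlen(text));
}
```

The point of the design is that engine_echo never needs to know which environment it is running in; swapping the function pointer is enough to change where the output goes.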
The read_cookies callback simply returns a string which gets parsed out and populated in the $_COOKIE PHP superglobal. In my simple FastCGI SAPI and in PHP-FPM we pull this from FCGI_PARAMS that has been sent across with the key HTTP_COOKIE and return it. The read_post callback gets called (usually multiple times) getting passed a buffer to fill and the number of bytes available in the buffer. In PHP-FPM this is drained from FCGI_STDIN records which the webserver sends from the POST data posted by the web client and written out to the buffer accordingly. This is then parsed by PHP and populated in the $_POST superglobal. As you can see, all of these callbacks provided by the SAPI layer allow us to control how we pass certain data to PHP and how we handle the output generated by PHP. Again, this allows us to easily swap out the context or environment in which we run PHP code and easily harness the power of PHP in many different ways. This has been just a brief introduction to how some of these things tie together to get the job done. All of this information was gleaned from other SAPI modules and inspecting PHP core code. I still have a long way to go in understanding all the nuances and subtleties of the SAPI layer and look forward to bringing you more information about developing PHP SAPI modules. Also, it has come to my attention that there is absolutely no documentation on how to go about writing one of these SAPI modules (other than the source code of existing ones) and I’d like to change that. I will be working on some tutorials for jumping into SAPI module development as I progress in my own endeavors into this area. Keep an eye out for them! Thanks for reading.
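The “called multiple times with a buffer to fill” behaviour of read_post is easier to see in code. The following is only a sketch under stated assumptions: body_source, demo_read_post and demo_collect_body are hypothetical names, the request body is a plain in-memory string rather than FCGI_STDIN records, and the loop stands in for whatever PHP core actually does when it drains the callback.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the request body a webserver has sent us. */
typedef struct {
    const char *data;
    size_t len;
    size_t pos;   /* how much has been handed out so far */
} body_source;

/* Callback in the spirit of read_post: fill `buffer` with at most
 * `count_bytes` bytes and report how many were written (0 = drained). */
size_t demo_read_post(body_source *src, char *buffer, size_t count_bytes) {
    size_t left = src->len - src->pos;
    size_t n = left < count_bytes ? left : count_bytes;
    memcpy(buffer, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Caller side: keep asking until the callback returns 0. `chunk` is the
 * buffer size offered on each call; `calls` counts the invocations. */
size_t demo_collect_body(body_source *src, char *dest, size_t dest_size,
                         size_t chunk, int *calls) {
    char buffer[64];
    size_t total = 0;
    size_t n;

    *calls = 0;
    if (chunk == 0 || chunk > sizeof buffer)
        chunk = sizeof buffer;
    for (;;) {
        n = demo_read_post(src, buffer, chunk);
        (*calls)++;
        if (n == 0)
            break;                 /* source drained */
        if (n > dest_size - total)
            break;                 /* destination full: stop (sketch only) */
        memcpy(dest + total, buffer, n);
        total += n;
    }
    return total;
}
```

With a 7-byte body such as a=1&b=2 and a 4-byte chunk, the callback ends up being called three times: once for 4 bytes, once for 3, and a final time that returns 0 to signal the end.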

Netcat flexibility

I thought I’d take a moment to write a quick blurb about a wonderfully flexible and useful utility I use somewhat frequently: netcat. Netcat is a relatively easy to use network utility. It allows quick creation of network connections. It’s most useful to me when used in conjunction with other programs to quickly tack on network capabilities. Have you ever had to copy a directory tree quickly from one machine to another on a LAN? Most people would choose scp or rsync over ssh for something like this. What about when the tree size is really quite large and you would like to squeeze every ounce of speed out of the copy? If encryption is not important (e.g. on a LAN), why waste the CPU cycles on it? One trick to avoid encryption for these copy tasks is to use netcat with tar. For example, say I had a somewhat large directory structure on my work box and I wanted to copy the raw bytes across the network to my laptop using netcat. On the laptop, after changing directories to where I want the tree extracted, I would fire up netcat to listen on an arbitrary port and to pipe any incoming data to tar for extraction. The command to do this might look like:

 
nc -l -p 3333 | tar -xvf -
 

On my work computer, if in my current directory I had a sub-directory named ‘tree’, I would execute the following command:

 
tar -cvf - tree | nc <laptop_ip> 3333
 

This command tells tar to create an archive of the directory ‘tree’ and output the resultant archive to stdout (-). We then pipe that output to netcat which sends it across the wire to <laptop_ip> on port 3333. Netcat on the laptop receives the archive byte stream and passes that byte stream to the running instance of tar (tar -xvf -) which will take the byte stream and extract that archive to the current directory. Done deal. I use this quite often for LAN copies and I feel I must repeat that this is probably not a good idea if you are copying sensitive data across the internet. Use rsync over ssh or plain old scp for those instances. I’m sure there are many more cases where you will find netcat to be the right utility for the job. I find that many utilities I use on a Linux system are very good at a singular purpose. I find it extremely elegant that Linux (and other *NIX variant) systems were designed to allow this kind of modularity and easy cooperation between thousands of programs by a simple concept such as piping. It makes me happy. :)
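For the curious, the piping that ties tar and netcat together is a small amount of C underneath. The sketch below is not netcat or shell source; pipe_demo is a made-up function that uses the standard POSIX calls pipe(), fork(), read() and write() to show one process producing bytes and another consuming them, which is what the shell arranges for each `|`.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `msg` from a child process to the parent through a pipe.
 * Returns the number of bytes received (out is NUL-terminated), or -1. */
ssize_t pipe_demo(const char *msg, char *out, size_t out_size) {
    int fds[2];
    pid_t pid;
    ssize_t n;
    ssize_t total = 0;

    if (out_size == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the producer, playing the role of tar writing to stdout. */
        close(fds[0]);
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        close(fds[1]);
        _exit(0);
    }

    /* Parent: the consumer, playing the role of nc reading from stdin. */
    close(fds[1]);
    while ((size_t)total < out_size - 1 &&
           (n = read(fds[0], out + total, out_size - 1 - (size_t)total)) > 0)
        total += n;
    close(fds[0]);
    waitpid(pid, NULL, 0);

    out[total] = '\0';
    return total;
}
```

Neither side knows or cares what the other program is; each just reads or writes a file descriptor, which is why small single-purpose tools combine so easily.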

Until later.

Fun in a modern browser

Recently, I’ve been playing around with more of the HTML5 APIs that modern browsers provide and I must say, I am impressed.  It seems the line between web application and native application is being blurred bit by bit.  I was almost a little taken aback to discover we can now read arbitrary binary files (only after the user has selected one via a file input type, of course) and inspect the binary data using nothing but JavaScript and the new APIs I just mentioned.  As a test, and to familiarize myself a bit more, I just wrote a little MP3 ID3 tag reader using nothing but a bit of HTML and some JavaScript.  I use a file input type in HTML and bind to the onchange event.  When onchange is fired the first time, I create a new Web Worker.  On the first and each subsequent onchange event, I also post the File object that represents the file the user selected to the Web Worker.  The Web Worker receives the File object, creates a new FileReaderSync and reads the File into a new ArrayBuffer.  It should be noted that FileReaderSync is only available inside a Web Worker because it is synchronous and blocks while reading the file.  When the file has been read into the ArrayBuffer, I create a new DataView for this buffer and proceed to pull the appropriate bytes out for the various ID3 attributes based on the known offsets of where the ID3 tag should be.  I then create a simple object with the tag attributes and post it back to the main UI thread, where I fire off the ID3Reader’s onload event.  None of this was terribly complicated and the interfaces are pretty easy to use.  All in all, I’m pretty excited about the possibilities this opens up for web developers.  Now, I just need to force myself to learn some OpenGL ES and start playing with 3D graphics acceleration.  :)  Anywho, you can check out the source for my musings at https://github.com/payden/html5-mp3-id3-reader.
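The “known offsets” part is the heart of it, so here is a rough sketch of that step. It is written in C rather than JavaScript and it assumes the ID3v1 layout (a 128-byte block at the very end of the file starting with “TAG”); the post above doesn’t say which ID3 version the reader handles, so treat that as my assumption. The names id3v1_tag, copy_field and id3v1_parse are made up for this example.

```c
#include <stddef.h>
#include <string.h>

#define ID3V1_SIZE 128   /* an ID3v1 tag occupies the last 128 bytes of the file */

typedef struct {
    char title[31];
    char artist[31];
    char album[31];
    char year[5];
    unsigned char genre;
} id3v1_tag;

/* Copy a fixed-width field and trim trailing padding (NULs or spaces). */
static void copy_field(char *dst, const unsigned char *src, size_t width) {
    memcpy(dst, src, width);
    dst[width] = '\0';
    while (width > 0 && (dst[width - 1] == ' ' || dst[width - 1] == '\0'))
        dst[--width] = '\0';
}

/* Returns 1 and fills `tag` if the buffer ends with an ID3v1 tag, else 0. */
int id3v1_parse(const unsigned char *file, size_t file_len, id3v1_tag *tag) {
    const unsigned char *p;

    if (file_len < ID3V1_SIZE)
        return 0;
    p = file + file_len - ID3V1_SIZE;
    if (memcmp(p, "TAG", 3) != 0)
        return 0;

    copy_field(tag->title,  p + 3,  30);
    copy_field(tag->artist, p + 33, 30);
    copy_field(tag->album,  p + 63, 30);
    copy_field(tag->year,   p + 93, 4);
    /* bytes 97..126 hold the comment field, skipped in this sketch */
    tag->genre = p[127];
    return 1;
}
```

In the browser version the same offsets would be read through a DataView over the ArrayBuffer instead of through pointer arithmetic.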

 

Busy busy busy

Well, I haven’t had a chance much lately to do any further work on my Ubiety project. I finally got around to making some more changes tonight. I also did some work getting everything IPv6 compatible in libwebsock. It should handle IPv6 address bindings and IPv6 clients just as well as IPv4 ones. I also purchased my SSL certificate for the instance of Ubiety that will run on my VPS. This means traffic to and from the server is encrypted now. The URI used by your browser for connecting to the Ubiety server is wss://ubiety.net. There are both A and AAAA (IPv6) records for ubiety.net, so if you’re surfing IPv6 style, you should have no problems connecting. The work I just did involved persistence while browsing a blog. The way this works is the messages for a channel (website/blog) are recorded on the server side in a history buffer. When a client connects or reconnects, the time of their connection is checked against the channel history and any messages *after* their connection are packaged up in a WebSocket message and sent to the client. The messages are popped into the messages div on arrival. It may be a crude way to do things, but it works for now I guess. The problem is that browsers don’t keep a WebSocket connection alive across page loads. I can understand why, but it does make persistent applications problematic. I also added a quick window state flag that is stored server side so window state is remembered across page views. I still need to work on handling connection issues and retrying the connection if problems arise. Anywho, I will keep plugging away at this thing until I’m satisfied with it. It’s become a sort of pet project for me. :)
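A history buffer like the one described above could be sketched roughly as follows. This is not the ubiety-server code: the type and function names (chat_message, channel_history, history_push, history_since), the fixed capacity of 8, and the choice to drop the oldest message when full are all assumptions for illustration. The replay simply returns every message newer than a cutoff time supplied by the caller.

```c
#include <stddef.h>
#include <time.h>

#define HISTORY_MAX 8   /* arbitrary capacity for this sketch */

typedef struct {
    time_t sent_at;
    const char *text;
} chat_message;

typedef struct {
    chat_message items[HISTORY_MAX];   /* oldest first */
    size_t count;
} channel_history;

/* Record a message, discarding the oldest one when the buffer is full. */
void history_push(channel_history *h, time_t sent_at, const char *text) {
    size_t i;

    if (h->count == HISTORY_MAX) {
        for (i = 1; i < HISTORY_MAX; i++)
            h->items[i - 1] = h->items[i];
        h->count--;
    }
    h->items[h->count].sent_at = sent_at;
    h->items[h->count].text = text;
    h->count++;
}

/* Collect pointers to every message newer than `since`, oldest first.
 * Returns how many were stored in `out` (at most `out_max`). */
size_t history_since(const channel_history *h, time_t since,
                     const chat_message **out, size_t out_max) {
    size_t i;
    size_t n = 0;

    for (i = 0; i < h->count && n < out_max; i++) {
        if (h->items[i].sent_at > since)
            out[n++] = &h->items[i];
    }
    return n;
}
```

On a reconnect, the server would call something like history_since with the relevant timestamp and bundle the returned messages into a single WebSocket message for that client.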

Cheers.

Ubiety Progress

I’ve been spending some time here and there working on a little side-project rooted in my old AjaxChat plug-in for WordPress. I’ve always disliked having to poll constantly to implement a chat system on a web page. Now that WebSockets have become more standard and there are drop-in Flash fallbacks, I’ve become interested in writing code in this area. :) I have a server side WebSocket library written in C which I’ve dubbed libwebsock. Ubiety is the name of the new WordPress plug-in that implements a chat system for blogs using WebSockets instead of polling. The WebSockets server runs on my VPS and will be a centralized place the messages pass through. I chose this architecture for a few reasons. I was using Pusher as a service through which the chat messages would be sent. The problem with this route is that you have to sign up for an account with Pusher and enter in some settings to get the plug-in working in WordPress. I’d like to make using the plug-in as simple as possible. You just drop the directory in your `wp-content/plugins` and click on activate in your administration panel. Done. I’d also like to consider the possibility of messaging between blogs in the future. This could easily be done with a centralized chat server. I’m also considering putting the source of the ubiety-server (the centralized WebSockets server running on my VPS which links against libwebsock) out so anyone could run one. I don’t want to do this just yet for various reasons. While I’ve got a version of Ubiety running on my site here (lower right corner), I have yet to tag a release. It’s just not ready yet. I will continue testing here on my blog and get some things straightened out. There are a few issues in particular I need to hammer out:

 
  • Persistence while browsing a blog
  • Client-side error reporting if connection can’t be established or is interrupted
  • Purchasing of an SSL certificate for ubiety.net so all communications will be encrypted
 

Anywho, I just wanted to start testing out my new plug-in here on my blog. If you’ve played with my old AjaxChat plug-in, I think you will agree that using WebSockets is *much* faster than the polling method used in AjaxChat. Let me know what you think in the comments. Thanks.

My take on Google App Engine

It sucks. I want to just leave it right there and not expend one more ounce of energy on anything to do with GAE, but, unfortunately, I feel I must justify that statement. I’ve been working on a GAE Java project for about 5 months now. I’d just like to say that this is my first and my last project that involves GAE in any way, shape or form. I’d also like to point out that without the deficiencies in Google App Engine, I would not be writing this entry. I had to change one line of code and deploy yet again. This means I’ve got a good twenty minutes of writing before I find out whether or not my change had any effect. Yes, it seriously takes around 20 minutes for me to push out a one-line change in the code to production. You might say: “Well, that’s fine. That’s why we have the development environment.” Yes, that is fine. The problem arises when you have some code that works perfectly on your development environment and for one reason or another fails in production. The task which I’m currently working on involves using the URLFetch service to perform a login to a 3rd party site and scrape some data out of the HTML. It seems to me that when using the URLFetch classes in the development environment, cookies are stored and presented for the next request automatically. This is not the case in production. It doesn’t bother me either way, but at least be consistent about it, Google. I also completely understand that it’s a very difficult problem to solve and that having a few instances where the development environment isn’t up to scratch should be expected. However, this is not the first time I’ve had code work wonderfully in the development environment and then bang my head for hours while I tweak little bits of code and deploy them to my ‘test’ production environment.
The development cycle in these cases typically goes like this: change a few lines of code, wait 20 minutes while it’s deployed, find out it doesn’t work as expected, curse and bang your head against the wall, rinse and repeat. There have also been plenty of other things that have led me to not enjoy working with Google App Engine. One is the lack of flexibility. The white-listing of classes has also been a nuisance, but I guess that at least makes some sense. Another big problem is the lack of enforcing a type on High-Replication data store puts. It would have helped very much to know what type my data was stored as and what type it is when I retrieve it. It seems to me the abstraction at this level caused more problems for me than it solved. My disdain for App Engine really boils down to the following points:

 
  • Slow deployment
  • Just because it works in the dev environment doesn’t mean it will work in production
  • Lack of High-Replication data store type enforcement
  • Lack of support for certain 3rd party Java classes
 

Long story short, if I ever have an employer who wants to use App Engine for anything in the future, I am going to lobby against it vehemently. Almost nothing that I have ever done with App Engine has been an enjoyable experience or worked the way I expected. Maybe some of that is my problem, and that is why this is *my* take on Google App Engine. I’d certainly be interested to hear the opinions of others who have used GAE for their projects and encourage you to comment if you have. Thanks!

libwsclient functional

I’ve spent some more time working out the basic functionality of libwsclient. As of right now, one can open a connection to a WebSocket server using the standard WebSocket URI, define some callbacks to be fired (onopen, onclose, onmessage), and send text messages across the wire to the server using the library. Right now, any messages that are sent would have to be done in one of the callbacks based on certain conditions. I’m noodling the idea of allowing text to be passed into the running program somehow and having libwsclient receive the data and send it across the wire for you. I might implement this using UNIX sockets and a helper program. For example, something like:

 
echo "Some text to send" | wsclient -s somesock.sock
 

Of course, the file somesock.sock would be defined in your program and made available through an API call. I might come up with some better way to do this. I’ll have to think on it a bit. Below is some simple code that echoes what it receives to standard out. The WebSocket endpoint in the example is mtgox.com’s streaming API. It will output some data about Bitcoin trading prices. It’s probably not all that interesting data to everyone, but it will show that the library works. :)

 
#include <stdio.h>
#include <stdlib.h>
#include <wsclient/wsclient.h>

int onmessage(wsclient *c, libwsclient_message *msg) {
    printf("Received (%llu): %s\n", msg->payload_len, msg->payload);
    return 0;
}
int main(int argc, char **argv) {
    wsclient *client = libwsclient_new("ws://websocket.mtgox.com/mtgox");
    if(!client) {
        fprintf(stderr, "Unable to initialize new WS client.\n");
        exit(1);
    }
    libwsclient_onmessage(client, &onmessage);
    libwsclient_run(client);
    libwsclient_finish(client);
    return 0;
}
 

More updates to come!

My version of stricmp for C

Well, here I am, banging away at my new client WebSocket library. According to RFC 6455, there are several instances when parsing the response headers from a WebSocket server that I must make case-insensitive comparisons. Alas, standard C does not define a case-insensitive version of its string comparison function, strcmp (POSIX systems do provide strcasecmp in <strings.h>, but it is not part of ISO C). My initial solution involved copying the strings, bitwise OR’ing 0x20 (binary 00100000) with each character to convert it to lowercase (if it wasn’t already) and then calling strcmp and returning the result. I wasn’t happy with my solution, so I went and queried my neighborhood Google server for some more information. I ended up stumbling upon the source for GNU libc’s strcmp and decided to modify it slightly for my case-insensitive purposes. I can’t get anywhere near the speeds of strcmp, but I suspect that a case-insensitive comparison is bound to be slower no matter what. My second version was indeed noticeably faster than my first, so I thought I’d throw the code up here. The only difference really is AND’ing the characters in the strings with the negation of one left-shifted five times. This has the effect of turning lowercase letters into uppercase while leaving a null byte alone. This is why I went to uppercase instead of to lowercase: if I turned on the lowercase/uppercase bit no matter what, it would change non-printable characters and in particular change a null byte into something else. Not good. Here’s the code:

 
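int stricmp(const char *s1, const char *s2) {
        unsigned char c1, c2;
        /* Clearing bit 5 maps ASCII 'a'..'z' onto 'A'..'Z'. */
        const unsigned char flipbit = (unsigned char)~(1 << 5);
        do {
                c1 = (unsigned char)*s1++;
                c2 = (unsigned char)*s2++;
                /* Fold letters only: masking every byte would turn ' ' into
                   '\0' and end the comparison early at the first space. */
                if(c1 >= 'a' && c1 <= 'z')
                        c1 &= flipbit;
                if(c2 >= 'a' && c2 <= 'z')
                        c2 &= flipbit;
                if(c1 == '\0')
                        return c1 - c2;
        } while(c1 == c2);
        return c1 - c2;
}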
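To see the bit trick in isolation, here is a tiny sketch with two made-up helpers, clear_bit5 and set_bit5 (the names are mine, not part of any library). In ASCII, ‘a’ is 0x61 and ‘A’ is 0x41, so the two cases of a letter differ only in bit 5 (0x20).

```c
/* Clear bit 5 (0x20): an ASCII lowercase letter becomes its uppercase twin. */
unsigned char clear_bit5(unsigned char c) {
    return (unsigned char)(c & (unsigned char)~(1u << 5));
}

/* Set bit 5 (0x20): an ASCII uppercase letter becomes its lowercase twin. */
unsigned char set_bit5(unsigned char c) {
    return (unsigned char)(c | (1u << 5));
}
```

Clearing the bit leaves a null byte at 0, while setting it would turn a null byte into a space (0x20), which is exactly the problem with folding to lowercase this way. It is worth remembering that the mask does not know what a letter is: clearing bit 5 on a space also yields 0, and digits and some punctuation shift too, so the trick is only safe on bytes already known to be letters.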
 