Wednesday, July 25, 2007

The Windows Branch

Well, I've practically completed porting the existing Traffic Analyzer code over to Windows. One or two sacrifices had to be made, however. Most notably, I had to cope with the lack of support for certain address conversion functions such as inet_aton, which are normally declared in the arpa/inet.h header. Still, it's compiled (under MinGW, screw you Visual Studio!) and seems to work, so that's cool.
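For the curious, here is a minimal sketch of the kind of stand-in that can cover for a missing inet_aton. It is not the code from the repository: the name parse_ipv4 is hypothetical, it only accepts strict dotted-quad text, and it returns the address in host byte order rather than filling in a struct in_addr.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for inet_aton(): parses strict dotted-quad text
// ("a.b.c.d", decimal components only) into a host-order 32-bit value.
// Unlike the real inet_aton(), it rejects shorthand forms such as "127.1"
// and hex or octal components.
bool parse_ipv4(const char* text, std::uint32_t& out) {
    assert(text != nullptr);
    std::uint32_t addr = 0;
    for (int part = 0; part < 4; ++part) {
        if (*text < '0' || *text > '9') return false;  // need at least one digit
        unsigned value = 0;
        while (*text >= '0' && *text <= '9') {
            value = value * 10 + static_cast<unsigned>(*text - '0');
            if (value > 255) return false;             // component out of range
            ++text;
        }
        addr = (addr << 8) | value;
        if (part < 3) {
            if (*text != '.') return false;            // components are dot-separated
            ++text;
        }
    }
    if (*text != '\0') return false;                   // trailing junk
    out = addr;                                        // only written on success
    return true;
}
```

Since the output is only written on success, a caller can safely reuse the same variable across several attempts.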

Additionally, the packet capture and processing features of the Traffic Analyzer were practically completed last week when I did a partial re-write. Packet processing is now done within the constructors of the three primary classes: Session, ServerResponse and ClientCommand. Sessions encapsulate ClientCommand objects, which in turn encapsulate their corresponding ServerResponse objects. Generic information is extracted from server responses, such as the number of rows affected by a query, error codes, warnings and server status codes. Client Commands could be elaborated upon, but that's not a major issue and can be added later when integrating the RuleFilter class.
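As a rough illustration of the "generic information" part, here is a sketch of pulling the affected-row count, status flags and warning count out of an OK response. It assumes the 4.1-style OK payload layout (a 0x00 marker, two length-encoded integers, then two little-endian 16-bit fields) and that the 4-byte packet header has already been stripped. OkSummary, read_lenenc and parse_ok_payload are hypothetical names, not the ServerResponse code itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct OkSummary {               // hypothetical holder for the decoded fields
    std::uint64_t affected_rows;
    std::uint64_t last_insert_id;
    std::uint16_t status_flags;
    std::uint16_t warnings;
};

// Reads a MySQL length-encoded integer at buf[pos] and advances pos.
// Returns false if the buffer is truncated or the prefix byte is not
// a valid integer prefix.
bool read_lenenc(const unsigned char* buf, std::size_t len,
                 std::size_t& pos, std::uint64_t& out) {
    if (pos >= len) return false;
    const unsigned char first = buf[pos++];
    if (first < 0xfb) { out = first; return true; }   // value fits in one byte
    std::size_t width = 0;
    if (first == 0xfc) width = 2;
    else if (first == 0xfd) width = 3;
    else if (first == 0xfe) width = 8;
    else return false;                                // 0xfb and 0xff are not integers
    if (len - pos < width) return false;
    out = 0;
    for (std::size_t i = 0; i < width; ++i)           // little-endian on the wire
        out |= static_cast<std::uint64_t>(buf[pos + i]) << (8 * i);
    pos += width;
    return true;
}

// Decodes the fixed part of an OK payload (assumed 4.1-style layout).
bool parse_ok_payload(const unsigned char* buf, std::size_t len, OkSummary& ok) {
    assert(buf != nullptr);
    std::size_t pos = 0;
    if (len < 1 || buf[pos++] != 0x00) return false;  // not an OK packet
    if (!read_lenenc(buf, len, pos, ok.affected_rows)) return false;
    if (!read_lenenc(buf, len, pos, ok.last_insert_id)) return false;
    if (len - pos < 4) return false;
    ok.status_flags = static_cast<std::uint16_t>(buf[pos] | (buf[pos + 1] << 8));
    ok.warnings = static_cast<std::uint16_t>(buf[pos + 2] | (buf[pos + 3] << 8));
    return true;
}
```

The optional human-readable message that can follow these fields is ignored here.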

I'm looking into constructing classes with the MySQL C API so that information can be dumped into the MySQL Auditing Server and configuration data can be downloaded from it. This will be pretty straightforward, in my opinion, and should port to Windows pretty easily. I expect this portion of the coding process to be completed over the long weekend.

Saturday, July 14, 2007

A Strong Handshake...

After many gruelling hours I've managed to put together reliable methods for decoding the contents of both the Server Handshake packets and Client Credentials packets. Because both packet types contain variable-length strings representing details such as the server's version and the client's username (and because I want to use the MySQL C API), I had to stay away from C++ strings and remain in the realm of null-terminated C-strings. Of course, this meant I had to introduce measures for preventing memory leaks, since we're talking about variable-length (dynamic!) strings.

That was quite a headache given the amount of complex pointer-passing I'm doing, with segfaults rearing their heads left, right and center along the way. However, now that I've got some solid methods for decoding, creating and destroying these data types, I'm confident that subsequent work on decoding command and response packets will go far more smoothly!
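To give a flavour of the problem, here is a minimal sketch of copying one null-terminated field (say, the server version) out of a captured payload without running off the end of the buffer. It swaps in std::unique_ptr<char[]> to own the copy, which is one way of avoiding leaks and not necessarily what the project code does; read_cstring is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Copies the NUL-terminated field that starts at buf[pos] into a freshly
// allocated C-string and advances pos past the terminator.
// Returns an empty pointer if no terminator is found before the end of
// the buffer, so a truncated capture cannot cause an over-read.
std::unique_ptr<char[]> read_cstring(const unsigned char* buf, std::size_t len,
                                     std::size_t& pos) {
    assert(buf != nullptr);
    if (pos >= len) return nullptr;
    const void* end = std::memchr(buf + pos, '\0', len - pos);
    if (end == nullptr) return nullptr;               // unterminated field
    const std::size_t n = static_cast<std::size_t>(
        static_cast<const unsigned char*>(end) - (buf + pos));
    std::unique_ptr<char[]> out(new char[n + 1]);
    std::memcpy(out.get(), buf + pos, n + 1);         // copy includes the terminator
    pos += n + 1;
    return out;                                       // freed automatically by the owner
}
```

The result of get() is an ordinary char* that can still be handed to C-style functions, while the buffer is released when the owning pointer goes out of scope.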


We'll be having a conference call tomorrow and I hope that I'll have the command decoding methods completed before then so that I'll have a nice lump of progress to demonstrate :)

Friday, July 6, 2007

The woes of session tracking...

After a relaxing holiday in France with some friends I now return to coding! I can feel my tan wearing off already as I enclose myself in my room, fingers tapping away on the keyboard, screen glaring its artificial light onto skin which for a few sweet days had basked in the glow of real sunlight!

But that's for losers.

Anyway, the problem right now is the size of the steps I'm taking. The initial traffic analyzer design was simple enough because it didn't attempt to interpret the MySQL packet payloads in any significant way. It discarded unprintable characters and dumped the rest to the console or to a file. Right now, however, I've assigned myself the task of actually interpreting the data I'm receiving and this is not quick or easy.

The problem lies in the passive nature of the system. As with the design of any complex piece of software, you have to think long and hard about what can go wrong. In the case of a packet capture-based system where the input is variable and comes from unpredictable sources this is very much the rule of thumb. I am designing and implementing classes which will differentiate between connections and store data about those connections in discrete storage systems for later processing.

Due to the nature of MySQL packet payloads, only the initial connection handshakes really hold data about who is logging on to the server and what the server's capabilities are. Subsequent packets are aimed at a specific socket which is handled by a MySQL thread spawned for the given session, with all related session data assembled during the initial handshake. The difficulty in aping this system is high, especially when you consider that the auditing system may be executed after the initial handshake packets have already been sent. Little contingencies like that have to be planned for.
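One way to picture the late-start contingency is a session table keyed on the two endpoints, where an entry created from a mid-stream packet is simply marked as having missed its handshake. This is only a sketch under that assumption: SessionKey, TrackedSession and SessionTable are hypothetical names, and deciding whether a packet is a handshake is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>
#include <utility>

struct SessionKey {                  // one TCP connection, identified by its endpoints
    std::uint32_t client_ip;
    std::uint16_t client_port;
    std::uint32_t server_ip;
    std::uint16_t server_port;
};

bool operator<(const SessionKey& a, const SessionKey& b) {
    return std::tie(a.client_ip, a.client_port, a.server_ip, a.server_port) <
           std::tie(b.client_ip, b.client_port, b.server_ip, b.server_port);
}

struct TrackedSession {
    bool saw_handshake;              // false if capture began after the login exchange
    unsigned long packets;           // packets attributed to this session so far
};

class SessionTable {
public:
    // Finds or creates the session for a packet. A session first seen via a
    // non-handshake packet is still tracked, just without login details.
    TrackedSession& on_packet(const SessionKey& key, bool is_handshake) {
        std::map<SessionKey, TrackedSession>::iterator it = sessions_.find(key);
        if (it == sessions_.end()) {
            it = sessions_.insert(
                std::make_pair(key, TrackedSession{is_handshake, 0})).first;
        } else if (is_handshake) {
            it->second.saw_handshake = true;
        }
        ++it->second.packets;
        return it->second;
    }

    std::size_t size() const { return sessions_.size(); }

private:
    std::map<SessionKey, TrackedSession> sessions_;
};
```

Sessions flagged with saw_handshake == false can still have their queries recorded, only without a username or server capabilities attached.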

So right now I'm putting together a fairly complex set of classes which will identify each packet's related session, understand the nature of those packets in terms of their purpose and extract relevant data from them.

Friday, June 15, 2007

Traffic Analyzer - Phase 1

Well, the foundations for the Traffic Analyzer module were finally laid today; the basic classes of the module are now up in the SVN repository.

So far I've only constructed the classes for Unix and not Windows. I've created a separate branch for the Windows code, but the associated directory is currently empty. Of course, there may be source files which will end up being common to both branches. However, I don't think that maintaining a more complex directory structure in order to have both branches share common source files is something that's currently necessary (or something I currently know how to do!).

So right now the Traffic Analyzer can detect packet traffic and deconstruct packets sequentially using some clever pointer logic and special structs; nothing new there. There are a few command line arguments that the user can use to customize how the program runs and I've even included the old ConfigReader class from previous projects which can be used to read and parse standard .conf files or user-specified configuration files.
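The pointer logic in question is the usual walk through the headers of a captured frame. The sketch below does it with plain byte offsets instead of the structs mentioned above, so it compiles without any platform headers. It assumes an untagged Ethernet II frame carrying IPv4 and TCP, and it ignores fragmentation and stream reassembly; TcpPayload and locate_tcp_payload are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct TcpPayload {
    const unsigned char* data;   // first byte after the TCP header
    std::size_t length;          // payload bytes according to the IP total length
    std::uint16_t src_port;
    std::uint16_t dst_port;
};

// Steps over the Ethernet, IPv4 and TCP headers of one captured frame.
// Returns false for anything that is not a well-formed IPv4/TCP frame.
bool locate_tcp_payload(const unsigned char* frame, std::size_t len, TcpPayload& out) {
    assert(frame != nullptr);
    const std::size_t kEthHeader = 14;
    if (len < kEthHeader + 20) return false;                      // room for minimal IP header
    if (frame[12] != 0x08 || frame[13] != 0x00) return false;     // EtherType is not IPv4
    const unsigned char* ip = frame + kEthHeader;
    if ((ip[0] >> 4) != 4) return false;                          // IP version
    const std::size_t ip_header = static_cast<std::size_t>(ip[0] & 0x0f) * 4;
    if (ip_header < 20) return false;
    if (ip[9] != 6) return false;                                 // protocol is not TCP
    const std::size_t ip_total = (static_cast<std::size_t>(ip[2]) << 8) | ip[3];
    if (ip_total < ip_header + 20 || kEthHeader + ip_total > len) return false;
    const unsigned char* tcp = ip + ip_header;
    const std::size_t tcp_header = static_cast<std::size_t>(tcp[12] >> 4) * 4;
    if (tcp_header < 20 || ip_header + tcp_header > ip_total) return false;
    out.src_port = static_cast<std::uint16_t>((tcp[0] << 8) | tcp[1]);
    out.dst_port = static_cast<std::uint16_t>((tcp[2] << 8) | tcp[3]);
    out.data = tcp + tcp_header;
    out.length = ip_total - ip_header - tcp_header;
    return true;
}
```

Using the IP total length rather than the captured length means any Ethernet padding on short frames is not mistaken for payload.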

The problem right now is threefold:
  1. How do I properly parse the packet payload? I don't understand enough about the special characters I'm currently seeing padding the SQL queries to know what they mean, how I might use them or whether I can discard them.
  2. Will there be much of a headache in porting this code to Windows? I've got some knowledge of WinPcap but what else is there to consider? My experience with Windows program development has been so blissfully limited up until this point.
  3. How much of a pain in the ass is the MySQL API going to be to master? It doesn't look that bad but I expect there's going to be buckets of code needed in order to sanitize every little thing that's entered into the Auditing database. Even getting around this with the help of the MySQL++ libraries wasn't easy last time!

In any case, the next steps involve sorting out the above 3 issues (after entering them on the issues wiki of course!) and killing any bugs found along the way. Overall I'm quite happy with the code I've submitted thus far. I've tried to keep it as modular as possible, and I think the distinct purpose of each class is pretty obvious and easy to change. So far so good! We'll see what kind of a mess this all degenerates into once we've progressed down the line a bit further :D
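A likely explanation for the odd bytes in question 1 is protocol framing: in the MySQL client/server protocol each packet starts with a 3-byte little-endian payload length and a 1-byte sequence number, and a client command payload starts with a command byte (0x03 for a plain query) followed by the statement text with no terminator. The sketch below decodes that framing, assuming a whole packet sits at the start of the captured TCP payload; CommandPacket and parse_command_packet are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

struct CommandPacket {               // hypothetical decoded view of one client packet
    std::uint32_t payload_length;    // from the 3-byte little-endian header field
    std::uint8_t sequence_id;        // 4th header byte
    std::uint8_t command;            // first payload byte, e.g. 0x03 for a query
    std::string text;                // remaining payload bytes (the SQL, for a query)
};

// Splits the 4-byte packet header and the command byte from the statement text.
// Returns false if the buffer does not hold the whole packet.
bool parse_command_packet(const unsigned char* buf, std::size_t len, CommandPacket& out) {
    assert(buf != nullptr);
    if (len < 5) return false;                        // header plus command byte
    const std::uint32_t n = static_cast<std::uint32_t>(buf[0]) |
                            (static_cast<std::uint32_t>(buf[1]) << 8) |
                            (static_cast<std::uint32_t>(buf[2]) << 16);
    if (n < 1 || len - 4 < n) return false;           // empty or truncated payload
    out.payload_length = n;
    out.sequence_id = buf[3];
    out.command = buf[4];
    out.text.assign(reinterpret_cast<const char*>(buf + 5), n - 1);
    return true;
}
```

Statements split across several TCP segments would need reassembly before this applies, which the sketch does not attempt.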

Saturday, May 26, 2007

Our First Project Meeting

Well I have to say, Skype is a piece of crap. Between the abysmal sound quality/compression artifacting and the compatibility issues with my Gentoo install we had a nightmare trying to communicate, eventually resorting to text chat.

I think we clarified a few ideas during this session, mostly spec-related but with one or two interesting ideas cropping up.

One notable thought was local auditing: how do we watch local administrators' queries? We can't capture packets from their session since they're working on the server itself, so what can we do? This may involve a whole other branch of code that allows us to send local server events to the auditing server for storage and flagging. I'm not entirely sure how we're supposed to sit between the user and the MySQL server, especially if the user is a domain/local administrator and has some determination not to get caught!

Another interesting thought was rules: flagging certain queries depending on their content, preferably based on some modular set of parameters that we can change on the fly. I'm thinking up a nice multi-threaded model for this at the moment. I'll have to go and look up pthread again to get the specifics on mutex usage for this one, but I'm pretty certain it's doable. A polling model may be necessary in order to detect the addition of new parameters/rules, however, if we're to assume that they're added through the admin interface to the auditing database itself. I think there are some MySQL trigger-like options that could be used here.
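As a very rough sketch of that model: one thread periodically reloads the rules while the capture thread checks queries against them, with a mutex guarding the shared list. The example uses std::mutex as a stand-in for the pthread mutex calls, and plain case-sensitive substring matching as a placeholder for whatever the rule parameters end up being; RuleSet and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

class RuleSet {
public:
    // Called by the (hypothetical) polling thread after it has re-read the
    // rules from the auditing database. Swaps the whole list in one step.
    void replace(std::vector<std::string> fresh) {
        std::lock_guard<std::mutex> lock(mutex_);
        patterns_.swap(fresh);
    }

    // Called by the capture thread for each decoded query.
    // Returns true if any rule pattern occurs in the query text.
    bool flags(const std::string& query) const {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < patterns_.size(); ++i) {
            if (query.find(patterns_[i]) != std::string::npos) return true;
        }
        return false;
    }

private:
    mutable std::mutex mutex_;            // guards patterns_
    std::vector<std::string> patterns_;   // placeholder rule representation
};
```

Building the new list outside the lock and swapping it in keeps the time the capture thread can be blocked short.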

Definitely looking forward to the next meeting and to getting some serious coding done. These distributed-type apps are always a laugh, and I'm looking forward to getting my teeth into the MySQL C API specifically, since I've only really used the MySQL++ library up until now and that's a bit limited.

I'm hoping that I'll be able to get involved with the design of the admin interface at some point, and that Umair will throw some ideas my way regarding the query interpretation module, adding a nice bit of overlap to our work. I think an interface based on Ajax design principles would be ideal, and not very difficult to implement if we were to write the PHP database-query functions first and the actual display afterwards. We'll see though...

Thursday, May 24, 2007

Entry the First

Well, there hasn't been any actual coding work done yet; this is just the project setup phase. This blog is connected to the Google Summer of Code 2007 Project "MySQL Auditing Software". The project page can be found at this link.

The project team is composed of myself, Umair Mehmood and the project mentor, Sheeri Kritzer. We'll be holding a conference call this Friday on Skype in order to exchange notes and design ideas.

I already have a fair idea of how this software can be designed. Based on discussions with the project team members and research done into similar products, I believe that a design based on passive packet capture is a good way to go. The resulting captured data can then be interpreted and reformed in whatever way we see fit before being passed on to the database servers.