Visualizing the WordPress 3.0 contributors

After I noted that I wanted to start congratulating new WordPress contributors on Twitter, Ozh Richard suggested I make a word cloud, as had been done in some previous releases.

So, based on a Trac report I made for demetris so he could compile the list of contributors, I generated these word clouds. These are based on changesets 12456 to 14319 (thus, as of this morning). Of 1864 commits, 677 had props given, for a total of 720 props (some commits had more than one). Patches were contributed by 170 people so far, the most ever (or so I’m told).

It was embarrassingly easy. I did a tab-delimited export of the report, grepped out what I didn’t want, and manually scanned the list for misspellings. Took me maybe 15 minutes on the flight to WordCamp San Francisco (I’m also in the air while posting this). Then I used Wordle for the cool visualization, and TagCrowd for the more functional one. (TagCrowd is also what Peter Westwood used in one of the clouds linked above.)

Frumph asked how a contributor is defined in this context, so let me do that. When the core team commits code, we mention the authors of the patch in the commit message by awarding “props,” such as “props nacin.” (You know, like giving “kudos.” Same deal.) We don’t give them to ourselves, but if there are no props listed, then you can assume we wrote the code (or forgot the prop).

A disclaimer: These may not be accurate, with the reasons ranging from oddly formatted commit messages all the way to issues with my compiling.

Another disclaimer: Yes, I’m sorry my name is so big. I really am. I contributed a lot of code before becoming a committer, and this does actually exclude commits by the core team, including my own. (We made more than 1,000 commits on our own.) So my name reflects about six weeks of patches, from late December to early February. In 3.1, my name will be much smaller. 🙂

I’d hope this goes without saying, but props are not why I contribute to WordPress. I don’t keep a tally. This is just a cool visualization that shows the sheer breadth of the number of contributors, plus who some of the larger contributors are. Also, quantity does not equal quality.

Without further ado, the pretty Wordle:

And from TagCrowd (click for a much larger size):

I’m attending WordCamp San Francisco 2010

I’ll be in San Francisco this weekend at WordCamp SF. It’s officially a one-day conference, but I’ll be there for meetings and work sessions into the next week:
WordCamp San Francisco 2010
Saturday, May 1: WordCamp
Sunday, May 2: Developer Conference
Monday, May 3 and Tuesday, May 4: Code Sprint!

The full schedule is on the WordCamp website. Here’s what it says about the code sprint, which I imagine may be one of the highlights of the trip:

A number of WordPress core developers will be working at Pier 38, the Automattic Lounge, from 9am onward to work on patching as many bugs as possible for the 3.0 release.

I’ll have a chance to meet with many people I’ve gotten to know well over the last few months. One goal will be to meet with my Google Summer of Code mentors, Andy Skelton and Beau Lebens, and hammer out the scope of the project. Another will be to solve this nasty bug on the plane ride on Friday.

My new Macbook Pro arrived Monday (13″, 2.53 GHz, 4 GB RAM, 250 GB HD), so I’ve been getting my development environment set up and ready to go.

I’ll probably be way too active on Twitter this weekend. Oh, also, I’ll be at the genius bar at one point Saturday. If you’re going to WCSF, find me and say hello!

5 Ways to Debug WordPress

Many plugin and theme authors don’t take full advantage of some really helpful debugging tools in WordPress. Here’s a quick run-down of five cool tools for debugging:

1. WP_DEBUG

define( 'WP_DEBUG', true );

It’s no secret I love this constant and everything it stands for. Define it in wp-config.php and you’ll start seeing PHP notices and also WordPress-generated debug messages, particularly deprecated function usage.

(Added June 27, 2010: You may wish to check out my Log Deprecated Notices plugin.)

There are also WP_DEBUG_DISPLAY, which controls whether these messages are shown on the page, and WP_DEBUG_LOG, which logs them to a wp-content/debug.log file. I’ve added some inline documentation that describes both well. Some use WP_DEBUG on a live site and just make sure it gets logged rather than displayed.
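For a live site, the log-only setup might look like the following in wp-config.php. This is a minimal sketch, assuming the three constants behave as the inline documentation describes and that wp-content is writable by the web server.

```php
<?php
// Sketch for wp-config.php: record debug messages without showing them to visitors.
define( 'WP_DEBUG', true );          // turn on notices and deprecated-function messages
define( 'WP_DEBUG_LOG', true );      // write them to wp-content/debug.log
define( 'WP_DEBUG_DISPLAY', false ); // keep them out of the rendered page
```

These lines need to sit above the point where wp-config.php loads wp-settings.php, since that is when WordPress reads them.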

WP_DEBUG will often reveal potential problems in your code, such as unchecked indexes (empty() and isset() are your friend) and undefined variables. (You may even find problems in WordPress itself, in which case you should file a bug report.)

2. SCRIPT_DEBUG

In the admin, WordPress minifies and concatenates JavaScript and CSS. But WP also comes with the “development” versions of these files, in the form of .dev.js and .dev.css. To use these instead:

define( 'SCRIPT_DEBUG', true );

3. SAVEQUERIES

The WordPress database class can be told to store query history:

define( 'SAVEQUERIES', true );

When this is defined, $wpdb->queries stores an array of queries that were executed, along with the time it takes to execute them.

The database class has additional error and debugging tools, which are documented on the Codex (though when in doubt, check the source).

4. The ‘all’ and ‘shutdown’ hooks

There’s an ‘all’ hook that fires for all actions and filters. Example usage:

// A closure (PHP 5.3+) in place of create_function(), which was removed in PHP 8.
add_action( 'all', function () { var_dump( current_filter() ); } );

You’ll be surprised how many hooks get executed on every page load. Good for troubleshooting and identifying the right hook.

There’s also a ‘shutdown’ hook you can use in combination with, say, SAVEQUERIES, and write the query information to the database. It’s the last hook to run.
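Combining the two might look something like this. It is only a sketch: the `my_` function names are made up, each entry in $wpdb->queries is assumed to start with the SQL string followed by the elapsed time in seconds, and it writes to the PHP error log rather than the database to keep the example short.

```php
<?php
/**
 * Turn the saved query history into log lines.
 * Assumes each entry is array( $sql, $elapsed_seconds, ... ).
 */
function my_format_saved_queries( array $queries ) {
	$lines = array();
	$total = 0.0;
	foreach ( $queries as $query ) {
		$elapsed = (float) $query[1];
		$total  += $elapsed;
		$lines[] = sprintf( '%.5fs  %s', $elapsed, $query[0] );
	}
	// Final line is a summary of the whole request.
	$lines[] = sprintf( '%d queries, %.5fs total', count( $queries ), $total );
	return $lines;
}

// Hypothetical callback: send every formatted line to the PHP error log.
function my_log_saved_queries() {
	global $wpdb;
	foreach ( my_format_saved_queries( $wpdb->queries ) as $line ) {
		error_log( $line );
	}
}

// Only hook in when running inside WordPress with SAVEQUERIES enabled.
if ( function_exists( 'add_action' ) && defined( 'SAVEQUERIES' ) && SAVEQUERIES ) {
	add_action( 'shutdown', 'my_log_saved_queries' );
}
```

Keeping the formatting separate from the hook makes it easy to swap error_log() for an insert into a table of your own.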

5. Core Control

There are plenty of great developer-oriented plugins out there, but I’m not sure any list would be complete without Dion Hulse’s Core Control plugin. It consists of five modules covering Filesystem methods, HTTP methods, HTTP logging, Cron tasks, and upgrades. A must-have.


This list is by no means exhaustive, just some quick hits to get you started. What tools do you use?

WordPress serializes options and meta for you

When tracking down a potential bug last week, I noticed that many plugin authors were making the same mistake and were making their lives much more difficult in the process. The issue was related to the serialization of data (here’s the PHP manual entry). In the most basic use case, serialization is a way to store arrays and objects directly in the database, which can only store numbers, text, and dates. Serialization takes an array and turns it into a serialized string. For example:

$data = array( 'apple', 'banana', 'orange' );
echo serialize( $data );
// Result is a string we can unserialize into an array:
// a:3:{i:0;s:5:"apple";i:1;s:6:"banana";i:2;s:6:"orange";}

WordPress has a few helper functions that we use instead of serialize() and unserialize() — maybe_serialize() and maybe_unserialize(). The first only serializes data that needs to be serialized — arrays and objects — and the second only unserializes data that is already serialized. (We have a lot of handy functions like these.) At some point in 3.0, something changed, and it caused an error for plugins using get_post_meta(). Matt Martz and I tracked this down to a change in maybe_serialize():

It comes out of a change to maybe_serialize() in r13673, which for a long while serialized already serialized data, and now no longer does. We’ll probably revert this. [Which I did in r14074.]

This shouldn’t have broken plugins however, at least not in this case. But here’s what the plugin was doing:

// $post_id is the ID of the post the meta belongs to.
update_post_meta( $post_id, '_my_plugin_meta', serialize( array( 'foo', 'bar' ) ) );
unserialize( get_post_meta( $post_id, '_my_plugin_meta', true ) );

The unserialize and serialize bits are unnecessary. The post, comment and user meta functions, and the functions for options and transients (and site meta) all transparently serialize and unserialize data for you. Thus, this works:

update_post_meta( $post_id, '_my_plugin_meta', array( 'foo', 'bar' ) );
get_post_meta( $post_id, '_my_plugin_meta', true );

I explained what was going on in #12930. Thanks to Ipstenu for raising the ticket, as we would have received a lot of complaints due to the change:

More or less, that means that you’re serializing the data, then update_option is serializing serialized data, then get_option is unserializing it once, and unserialize is unserializing it again. r13673 breaks this, as update_option doesn’t serialize the data a second time any more, causing the plugin’s unserialize() to attempt to perform a second unserialize() on data that was only serialized once.
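The mismatch can be reproduced in plain PHP. The two helpers below are hypothetical stand-ins for the storage layer, not the WordPress source; they only mimic the behavior after r13673, where a string that is already serialized gets stored as-is.

```php
<?php
// Hypothetical stand-in for writing a value: serialize only arrays and objects.
function my_store( $value ) {
	return ( is_array( $value ) || is_object( $value ) ) ? serialize( $value ) : $value;
}

// Hypothetical stand-in for reading a value: unserialize if possible.
function my_load( $stored ) {
	$value = @unserialize( $stored );
	// unserialize() returns false on failure, so fall back to the raw string.
	return ( false === $value && 'b:0;' !== $stored ) ? $stored : $value;
}

$data = array( 'foo', 'bar' );

// Letting the API do the work: array in, array out.
$clean = my_load( my_store( $data ) );

// The plugin's pattern: serialize by hand, then store.
$from_plugin = my_load( my_store( serialize( $data ) ) );

var_dump( $clean === $data );
var_dump( is_array( $from_plugin ) );
```

In both cases the value comes back as an array after a single load, so the plugin's extra unserialize() call has no serialized string left to work on.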

In this case, the change was accidental, and we already went through this once nearly two years ago (see #7347, r8100, r8372, and others). But when plugin developers use the API incorrectly or make bad assumptions, it becomes significantly more difficult for us to improve WordPress, as we are very mindful of plugins we may break, even if the plugin is “Doing It Wrong.” So please, don’t make it harder for us to make it easier for you.

Overhauling roles and capabilities, part 2

This is a follow-up to my initial overview of the roles and capabilities system in WordPress. I’d check out Part I first.

I’ve previously explained the complexities of the roles and capabilities system, but I haven’t adequately argued why they are too complex.

Numerous developers have weighed in at Trac ticket #10201, but I want to touch on this anyway. More or less, the current system is fine if you want to know what the current user can do. We load up their roles and merge the capabilities that make up those roles, add any other capabilities the user was granted, and remove any revoked capabilities. Then, we check if the capability we’re checking is in that list of capabilities.
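The per-user check described above can be sketched in a few lines of plain PHP. The function names, the array shapes, and the sample roles are all made up for illustration; core stores and resolves this differently.

```php
<?php
/**
 * Build a user's effective capability list.
 * $role_caps is role name => list of capabilities (hypothetical shape);
 * $granted and $revoked are the per-user additions and removals.
 */
function my_effective_caps( array $role_caps, array $user_roles, array $granted, array $revoked ) {
	$caps = array();
	foreach ( $user_roles as $role ) {
		if ( isset( $role_caps[ $role ] ) ) {
			$caps = array_merge( $caps, $role_caps[ $role ] );
		}
	}
	$caps = array_merge( $caps, $granted ); // add user-specific capabilities
	$caps = array_diff( $caps, $revoked );  // drop anything explicitly revoked
	return array_values( array_unique( $caps ) );
}

function my_user_can( array $caps, $cap ) {
	return in_array( $cap, $caps, true );
}

// Illustrative data only.
$role_caps = array(
	'author' => array( 'read', 'edit_posts', 'publish_posts' ),
	'editor' => array( 'read', 'edit_posts', 'publish_posts', 'edit_others_posts' ),
);
$caps = my_effective_caps( $role_caps, array( 'author' ), array( 'upload_files' ), array( 'publish_posts' ) );
```

This is cheap for one user. The cost shows up when the same work has to be repeated for every user on the site.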

The problem is apparent when you want to know the answer to this question: How many users have capability X? Querying for abstract capability X strikes some as an edge case at first, so a better question might be: how many users should be considered at least an author?

To answer this question, we have to load up each user individually and build their derived capability list, by loading up their roles, add-on capabilities, and revoked capabilities. It should be fairly apparent that on a site with many users, the capabilities system can be a performance issue, to say the least.

Let’s remove user-level capabilities from the equation. Now, we can load up all of the roles, figure out which have capability X, and figure out which users have those roles. Clearly a huge performance benefit, as we’re running one query (to fetch the role definitions), unserializing its result, looping through the roles and checking for a cap, then querying the users for that role.

If only it were that easy. We store a user’s roles and capabilities in a serialized array in usermeta. That means we have to again leave SQL with every user’s usermeta value, unserialize it, and check if the user has the role. Core does employ a hack in a few spots for performance reasons, by using LIKE "%editor%" against serialized data. This very unpoetic code is the epitome of the overly complex capabilities system.
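To see why matching against serialized data is a hack, here is a rough plain-PHP imitation of that LIKE clause. The `my_like_editor()` helper and the 'subeditor' role are invented for the example, and the array shape is a simplified assumption about what ends up in usermeta.

```php
<?php
// Roughly what a serialized role array looks like in the meta_value column.
$editor = serialize( array( 'editor' => true ) );

// 'subeditor' is a made-up role name, used only to show a false positive.
$subeditor = serialize( array( 'subeditor' => true ) );

// Rough equivalent of: meta_value LIKE '%editor%'
function my_like_editor( $meta_value ) {
	return false !== strpos( $meta_value, 'editor' );
}

var_dump( $editor );
var_dump( my_like_editor( $editor ), my_like_editor( $subeditor ) );
```

The substring match cannot tell one role name from another that merely contains it, and a leading wildcard generally prevents the database from using an index for the comparison.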

But, we’ll no longer be storing capabilities, or multiple roles. Thus, we can now have our usermeta value be a simple text value with the name of the role. Two example queries:

SELECT * FROM $wpdb->users u INNER JOIN $wpdb->usermeta m ON u.ID = m.user_id WHERE m.meta_key = 'wp_capabilities' AND m.meta_value = 'Editor';

SELECT user_id FROM $wpdb->usermeta WHERE meta_key = 'wp_capabilities' AND meta_value = 'Editor';

Now that is poetic code, and it would scale much better in an environment with many users. (When I say many, I mean many, many thousands.) And I’ll point out here that there have been suggestions for a new table, to get this out of usermeta entirely.

The proposed overhaul is controversial, which explains its postponement for multiple releases, but appears to be gaining traction for 3.1. As the wp-hackers thread progressed, I suggested that the proposed overhaul could always end up being watered down a bit before entering core as a compromise, notwithstanding rather solid support from the core developers for the proposal on the table.

That said, I cannot imagine that user-specific capabilities would remain in core. If there is a compromise, it would end up being multiple roles, because we could still avoid serialized data, by using the same meta key multiple times, with different meta values. The one thing I don’t believe we would want to do is take an overly complicated system and oversimplify it, especially given the growth and scale of WordPress as a CMS. If we can sanely implement multiple roles in the new schema, keeping them may be something to consider.

The counter-argument is that we have a chance to simplify the capabilities API to make it manageable, and we should take advantage of that. We could sanely build a core plugin for role management. Multiple roles are not supported in the user interface, nor would they be supported in such a plugin, leaving them as the only piece of the API that would not be exposed.

That brings me to a final question: Is anyone actually using multiple roles? Because I’m noticing that WP_User::remove_role() was broken and WP_User::add_role() had problems as well. Both were fixed in the 3.0 development cycle after having issues for a few versions. Makes you wonder.