Posted: Wed Mar 07, 2007 11:00 pm Post subject: Helpful Tip To Speed Up Forums
- Index page
By default phpBB collects all topics that are new to see which forum have been updated since your last visit. When user post count approached 50,000 posts a day this process took more than 10 seconds to load. Eventually the process was dropped, and a cheat was put in-place.
This was taken out:
Code:
if ( $userdata['session_logged_in'] )
{
$sql = "SELECT t.forum_id, t.topic_id, p.post_time
FROM " . TOPICS_TABLE . " t, " . POSTS_TABLE . " p
WHERE p.post_id = t.topic_last_post_id
AND p.post_time > " . $userdata['user_lastvisit'] . "
AND t.topic_moved_id = 0";
if ( !($result = $db->sql_query($sql)) )
{
message_die(GENERAL_ERROR, 'Could not query new topic information', '', __LINE__, __FILE__, $sql);
}
The cheat is the one line right after $forum_data[] = $row;
Code:
$sql = "SELECT f.*, p.post_time, p.post_username, u.username, u.user_id
FROM (( " . FORUMS_TABLE . " f
LEFT JOIN " . POSTS_TABLE . " p ON p.post_id = f.forum_last_post_id )
LEFT JOIN " . USERS_TABLE . " u ON u.user_id = p.poster_id )
ORDER BY f.cat_id, f.forum_order";
if ( !($result = $db->sql_query($sql)) )
{
message_die(GENERAL_ERROR, 'Could not query forums information', '', __LINE__, __FILE__, $sql);
}
It just end up checking to see if the latest post is posted after the user's last visit time. Quick and dirty change.
- Users Online
The index page also scans the users plus session tables for active users. When online users reaches close to a thousand this check becomes costly in time. This isn't even a hack, I ended up having an external script on a cron job to run the exact query on a per minute basis. Result is stored in a hash table, and phpBB then queries the hash table for the list of on-line users.
This code was taken out
Code:
$user_forum_sql = ( !empty($forum_id) ) ? " WHERE session_page = " . intval($forum_id) : '';
$sql = "SELECT u.username, u.user_id, u.user_allow_viewonline, u.user_level, s.session_logged_in, s.session_ip FROM ".USERS_TABLE." u, ".SESSIONS_TABLE." s WHERE u.user_id = s.session_user_id AND s.session_time >= ".( $board_config['time_now'] - 300 ) . " $user_forum_sql ORDER BY u.username ASC, s.session_ip ASC";
And replaced with a query that grabs everything from a hash table:
Perhaps this is one of the more interesting hack and more useful to everyone. When fetching a particular page within a topic, there is a large query that fetches the user info, post info, and post contents. I had the believe that we should always join queries to avoid latency so I never touched that query. As it turns out, when a thread gets bigger and bigger, a large joined query with the where and sort by clause will take a ton of memory to process. If we're looking for the last page of a topic with 1000 replies, MYSQL grabs the post text and user data from all the posts, then splits out the last 15. (or whatever posts each page shows)
So, instead of one query containing:
Code:
$sql = "SELECT u.username, u.user_id, u.user_posts, u.user_from, u.user_website, u.user_email, u.user_icq, u.user_aim, u.user_yim, u.user_regdate, u.user_msnm, u.user_viewemail, u.user_rank, u.user_sig, u.user_sig_bbcode_uid, u.user_avatar, u.user_session_time, u.user_allow_viewonline, u.user_allowsmile, p.*, pt.post_text, pt.post_subject, pt.bbcode_uid
FROM " . POSTS_TABLE . " p, " . USERS_TABLE . " u, " . POSTS_TEXT_TABLE . " pt
WHERE p.post_id in p.topic_id = $topic_id $limit_posts_time
AND pt.post_id = p.post_id
AND u.user_id = p.poster_id
ORDER BY p.post_time $post_time_order
LIMIT $start, ".$board_config['posts_per_page'];
Now start off with
Code:
$p_array = array();
$sql = "SELECT p.post_id FROM " . POSTS_TABLE . " p WHERE p.topic_id = $topic_id $limit_posts_time LIMIT $start, " . $board_config['posts_per_page'];
if ( !($result = $db->sql_query($sql)) )
{
message_die(GENERAL_ERROR, "Could not obtain post index information.", '', __LINE__, __FILE__, $sql);
}
while (list($p_id) = $db->sql_fetchrow($result)) {
$p_array[] = $p_id;
}
$post_index = implode(",",$p_array);
A list of post_id are obtained, now I slip it into the large query
Code:
$sql = "SELECT u.username, u.user_id, u.user_posts, u.user_from, u.user_website, u.user_email, u.user_icq, u.user_aim, u.user_yim, u.user_regdate, u.user_msnm, u.user_viewemail, u.user_rank, u.user_sig, u.user_sig_bbcode_uid, u.user_avatar, u.user_session_time, u.user_allow_viewonline, u.user_allowsmile, p.*, pt.post_text, pt.post_subject, pt.bbcode_uid
FROM " . POSTS_TABLE . " p, " . USERS_TABLE . " u, " . POSTS_TEXT_TABLE . " pt
WHERE p.post_id in ($post_index)
AND pt.post_id = p.post_id
AND u.user_id = p.poster_id
ORDER BY p.post_time $post_time_order";
And all is well. The difference is not noticeable on small systems. The latency in having 2 queries might actually make itself slower in some cases. Though on a forum with 3.3M posts, the difference is 5 minutes vs 1 minute on a topic with 70,000 replies. There is a performance increase even on topics with as little as a few hundred replies. Overall it saves memory, which made everyone happy.
_________________ Join me on FaceBook = symon field
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum