En:PhpBB

From YaCyWiki
Revision as of 1 January 2008, 13:13, by MiTreD

phpBB 2.x appends a session ID parameter, "sid=", to its links in order to keep track of visitor sessions.

YaCy should learn to strip phpBB's sid= parameter when crawling, so that

http://www.rechenkraft.net/phpBB/index.php?sid=988f7cf9b9491ca5c258ca359fc67e85

simply becomes

http://www.rechenkraft.net/phpBB/index.php

I've never seen anyone use ?sid= or &sid= to actually select or switch between content.
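On the crawler side, stripping the parameter means dropping one query parameter while keeping all the others. The following is a minimal sketch in PHP, to match the code on this page (YaCy itself is written in Java); strip_sid() is a hypothetical helper, not part of YaCy, and it assumes the session ID always appears as a query parameter named exactly sid.

```php
<?php
// Hypothetical helper: remove phpBB's "sid" session parameter from a URL.
// Assumption: the session ID is a query parameter named exactly "sid";
// all other parameters are kept in their original order.
function strip_sid(string $url): string
{
    // Split off a fragment (#...) so it can be re-attached unchanged.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    $qPos = strpos($url, '?');
    if ($qPos === false) {
        return $url . $fragment; // no query string, nothing to strip
    }

    $base = substr($url, 0, $qPos);
    $query = substr($url, $qPos + 1);

    // Keep every "name=value" pair except the one named "sid".
    $kept = [];
    foreach (explode('&', $query) as $pair) {
        if ($pair === '') {
            continue; // skip empty pieces such as a trailing "&"
        }
        $name = explode('=', $pair, 2)[0];
        if ($name !== 'sid') {
            $kept[] = $pair;
        }
    }

    // Drop the "?" entirely when no parameters remain.
    return $base . ($kept ? '?' . implode('&', $kept) : '') . $fragment;
}

echo strip_sid('http://www.rechenkraft.net/phpBB/index.php?sid=988f7cf9b9491ca5c258ca359fc67e85'), "\n";
```

Other parameters, such as t= in viewtopic.php?t=32328, are kept, so distinct topics still map to distinct URLs while the per-visit session ID no longer produces duplicates.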

phpBB owners' solution

Use the "enhance-google-indexing" MOD, http://www.phpbb.com/phpBB/viewtopic.php?t=32328

The entire modification is this:

#-----[ OPEN  ]------------------------------------------
includes/sessions.php

#-----[ FIND ]------------------------------------------
global $SID;

if ( !empty($SID) && !preg_match('#sid=#', $url) )

#-----[ REPLACE WITH ]------------------------------------------

global $SID, $HTTP_SERVER_VARS;

if ( !empty($SID) && !preg_match('#sid=#', $url) 
&& !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'Googlebot') 
&& !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'slurp@inktomi.com;'))

#
#-----[ SAVE/CLOSE ALL FILES ]------------------------------------------
#
# EoM

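To stop sending session IDs to YaCy as well, just add

&& !strstr($HTTP_SERVER_VARS['HTTP_USER_AGENT'] ,'yacybot') 

to the same condition, before its final closing parenthesis, and the phpBB board is all set.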
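For illustration, the combined condition (the MOD's checks plus yacybot) can be written as a small standalone function. This is a sketch only: should_append_sid() and its parameters are hypothetical, since the MOD edits the check inline in includes/sessions.php; the user agent is passed in as an argument so the logic can be tried outside phpBB.

```php
<?php
// Hypothetical helper mirroring the MOD's if-condition, with 'yacybot' added.
// $sid: the session string phpBB would append (e.g. "sid=..."),
// $url: the link being built, $userAgent: the client's User-Agent header.
function should_append_sid(string $sid, string $url, string $userAgent): bool
{
    // Signature strings from the MOD, plus YaCy's crawler name.
    $botSignatures = ['Googlebot', 'slurp@inktomi.com;', 'yacybot'];

    if (empty($sid) || preg_match('#sid=#', $url)) {
        return false; // no session ID, or the URL already carries one
    }
    foreach ($botSignatures as $signature) {
        // Case-sensitive substring test, like strstr() in the MOD.
        if (strpos($userAgent, $signature) !== false) {
            return false; // known crawler: leave the URL without sid=
        }
    }
    return true;
}

var_dump(should_append_sid('sid=abc', 'index.php', 'yacybot (freeworld/global)'));
```

Keeping the signatures in one list makes it easier to exempt further crawlers than chaining more strstr() calls, but the behaviour is the same as the inline condition.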

forum discussion

There is a discussion in the German YaCy-Forum about detecting and removing session IDs:
