---+ TWPC, aka the TWiki Public Cache
Colas Nahaboo http://colas.nahaboo.net

This readme is an "under the hood" working document. The official page is at
http://twiki.org/cgi-bin/view/TWiki/PublicCacheAddOn

Mercurial repo: http://hg.colas.nahaboo.net/twiki-colas/twpc

SVN via http://develop.twiki.org/~twiki4/cgi-bin/view/Bugs/Item5551

   * pccr Cache Reader, shell version. Slower
   * pccr.c Cache Reader, C version, for speed
   * pcbd Cache Builder, called by pccr on cache misses
   * pccl Cache Cleaner, run by crontab to clear the cache after edits
   * pcad ADmin commands, web based
   * pcal Log analyzer, to determine the best twpc settings from past usage
   * pcge script to build all pages, called by pcad
   * PublicCacheAddOn.txt user/admin documentation, as a TWiki page
   * PublicCachePlugin.pm PublicCachePlugin.txt Perl module to trigger cache
     invalidation on topic change
   * README_TWPC.txt this file, internal dev info
   * install uninstall make-pc-config: installation management
   * make-distrib make-hg-revision: build system for dev

Generated files by install:
   * vief is a copy of the original TWiki bin/view, used to build cached pages.
     bin/view is replaced by pccr.
     A backup is made in pc-view-backup, just in case...
   * pc-config a "compilation" of lib/LocalSite.cfg settings
   * pc-options keeps track of the options last used on install
   * the cache resides in working/public_cache/cache
   * inside, there is one folder per web, with the same name
   * and files whose base name is the topic name, with extensions:
      * .tx uncompressed plain version (including CGI HTTP header)
      * .gz same, compressed (including CGI HTTP header)
      * .nc nocache: do not attempt to cache it
      * .lk lock file: the cache is being (re)built by a process
   * at cache root, directory _tmp holds temporary files used to build caches,
     named process_id + extension:
      * .raw raw output of TWiki, then uncompressed cache
      * .mod modified output
      * .gz compressed cache
   * at cache root, directory _changers contains the IPs (one file per IP,
     named as the IP) of editors. The file has the modification time of last
   * at cache root, directory _expire contains empty web/topic files whose
     date indicates the time at which the cache for this page should be
     removed by pccl (the files thus have a date in the near future)
   * cache clear is done by moving cache to cache.a_number and removing it
     30 seconds later, to avoid the race conditions and errors that removing a
     directory from under the feet of build processes could cause
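
The clear-by-rename step described in the last item can be sketched in shell as follows. This is only an illustration under stated assumptions: the function name clear_cache, the numeric suffix (starting from the shell PID), the recreation of an empty cache directory and the configurable grace argument are hypothetical and not taken from pccl or pcad.

```shell
# Sketch: clear the cache by renaming it, then delete the old copy later.
# clear_cache and its arguments are illustrative names, not part of twpc.
clear_cache() {
    local cache_dir="$1" grace="${2:-30}"
    local n="$$"                          # assumption: suffix starts at our PID

    [ -d "$cache_dir" ] || return 0       # nothing to clear
    # Pick an unused name of the form cache.a_number
    while [ -e "$cache_dir.$n" ]; do n=$((n + 1)); done

    # A rename within one filesystem is atomic: builders that already
    # opened files keep working inside the renamed directory.
    mv "$cache_dir" "$cache_dir.$n" || return 1
    mkdir -p "$cache_dir"                 # assumption: start with an empty cache

    # Remove the old copy in the background once the grace period is over.
    ( sleep "$grace"; rm -rf "$cache_dir.$n" ) >/dev/null 2>&1 &
}
```

For example, clear_cache working/public_cache/cache would keep the renamed copy for the default 30 seconds before deleting it.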
   * in the same dir as the TWiki log files (data/)
   * if -q was not given, cache hits are logged in the normal TWiki logs
     with user agent "cached,gzip" or "cached"
   * twpc-debug.txt logs lots of misc info for debugging, only if -v was
   * twpc-warnings.txt logs abnormal, but not fatal, conditions:
      * LOCK_TIMEOUT pcbd waited too long and decided to break the lock
      * LOCK_MISSING some race condition occurred
      * NOT_BUILT_ERR building attempt resulted in an error other than

In case of twpc update:
   * if view is pccr, that means we have a working twpc install
   * if there is no pccr file, or view is not pccr, we have a normal/updated TWiki
   * we copy all files, mv view to vief, copy pccr to view
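
As a rough sketch of the check above, the two cases could be told apart by comparing bin/view with pccr. The function names twpc_state and activate_twpc, the byte-for-byte comparison with cmp, and the output words are assumptions made for illustration; the install script may well detect this differently.

```shell
# Sketch: is bin/view already the cache reader?
# twpc_state and activate_twpc are hypothetical helpers, not twpc commands.
twpc_state() {
    local bin_dir="$1"
    if [ -f "$bin_dir/pccr" ] && cmp -s "$bin_dir/view" "$bin_dir/pccr" 2>/dev/null; then
        echo installed      # view is pccr: a working twpc install
    else
        echo plain          # no pccr, or view differs: normal/updated TWiki
    fi
}

# For the "plain" case: keep the original view as vief, put pccr in its place.
activate_twpc() {
    local bin_dir="$1"
    mv "$bin_dir/view" "$bin_dir/vief" && cp -p "$bin_dir/pccr" "$bin_dir/view"
}
```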
Debug messages tracing various steps in data/twpc-debug.txt: (warning: this
   * HIT file: cache hit (HIT_GZ for gzipped)
   * BYPASS_QS url: cache ignored as we have a query string ?x=y in the URL
   * BYPASS_NC url: cache ignored as the URL was marked as not cacheable
   * BUILT url: cache built for url
   * NOT_BUILT_ERR url: error in getting the URL, marking it as not cacheable
   * NOT_BUILT_AUTH url: URL read-protected, marking it as not cacheable
   * WAITED n url: waited n seconds for a previous build
   * MISS: cache miss, followed by either BUILT or NOT_BUILT
   * WAIT id n url: waits for lock for n seconds
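
To get a quick feel for the hit/miss ratio, the tags listed above can be tallied with a few lines of shell. The exact line format of twpc-debug.txt is not described here, so this sketch assumes only that the tag appears as a whitespace-separated word (optionally followed by a colon) on each line; tally_tags is a hypothetical helper, not a twpc command.

```shell
# Sketch: count debug tags in twpc-debug.txt.
# Assumption: each log line carries its tag as a separate word.
tally_tags() {
    awk '
        {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^(HIT|HIT_GZ|BYPASS_QS|BYPASS_NC|BUILT|NOT_BUILT_ERR|NOT_BUILT_AUTH|WAITED|MISS|WAIT):?$/) {
                    tag = $i
                    sub(/:$/, "", tag)    # drop a trailing colon, if any
                    count[tag]++
                    break                 # only the first tag on a line counts
                }
        }
        END { for (t in count) print t, count[t] }
    ' "$@" | LC_ALL=C sort
}
```

Running tally_tags data/twpc-debug.txt prints one "TAG count" pair per line, sorted by tag name.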
   * in a pccr web request, we may end up calling another URL on the same TWiki
     via wget: we could thus deadlock the server if all
     the requests are stuck this way.
     Advise the user to raise the number of Apache children. However, this should
     never happen in practice, and Apache will time out eventually anyway.
   * the link in view to edit?t=%GMTIME{"$epoch"} would normally render the pages
     uncacheable (they would get dirty every second), but it appears that browsers
     do not cache as soon as there is a query string, so we don't bother
     to provide this functionality
   * install/update/uninstall clears the whole cache; we don't try to
     determine which pages are really dirty. Better safe than sorry.

with --compressed, curl requests a gzip-compressed response

i=1000; while let 'i-->0'; do curl --compressed -s http://wikidev.nahaboo.org/TWiki/TWikiVariables >/dev/null & done
PCCR ALGORITHM VERSIONS
   * v1 header is in file. tries in order ?query, .gz, .tx, .nc
   * v2 when editing, our IP is marked as a "changer"
      * views from this IP bypass the cache
      * after a timeout "cleargrace" (default 17 mn) with no more edits from
        this IP, the cache is reset, if all editors have also not edited for
        at least "cleargracemin" (default 3 mn)
   * v3 introduced the PCACHEEXPTIME TWiki tag
   * v4 used the PUBLIC_CACHE_EXPIRE TWiki var
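
The v1 lookup order combined with the v2 changer bypass can be sketched as a small shell function. It is a simplified illustration, not the actual pccr code: the name pccr_lookup, its arguments and its output words are hypothetical, and it assumes that the .gz file is only served to clients that accept gzip and that the mere presence of a file in _changers marks an IP as a current changer.

```shell
# Sketch of the v1 lookup order plus the v2 changer bypass.
# Prints one of: "bypass", "gz FILE", "tx FILE", "nocache", "miss".
pccr_lookup() {
    local cache="$1" web="$2" topic="$3" query="$4" ip="$5" gzip_ok="$6"
    local base="$cache/$web/$topic"

    if [ -n "$query" ]; then
        echo bypass                 # ?x=y in the URL: cache is ignored
    elif [ -n "$ip" ] && [ -e "$cache/_changers/$ip" ]; then
        echo bypass                 # v2: views from an editor IP skip the cache
    elif [ "$gzip_ok" = yes ] && [ -f "$base.gz" ]; then
        echo "gz $base.gz"          # compressed copy, header included
    elif [ -f "$base.tx" ]; then
        echo "tx $base.tx"          # plain copy, header included
    elif [ -e "$base.nc" ]; then
        echo nocache                # marked as not cacheable
    else
        echo miss                   # this is where pcbd would be called
    fi
}
```

For instance, pccr_lookup working/public_cache/cache TWiki TWikiVariables "" 192.0.2.1 yes prints the path of the .gz file when a compressed copy exists and 192.0.2.1 is not a changer.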
   * can it be installed and manage the cache without being active?
   * see if other modules can store their cache in the twpc dir
   * can trigger external command on cache clear?
   * ? obey if-modified-since
   * should work on sites with .pl extensions
   * pcbd could cd to cache first, to avoid half building things if a cache
     clear happens in mid-build
   * pcad clear should be callable from cli,
      * Plugin should use it directly, optionally use wget for mod_perl
      * scripts could call it to trigger a change (write) e.g. blog-generate
   * document how other modules/scripts could use the cache
   * pcge -v should not list private pages?
   * just after login we are redirected to vief
   * logs, uncacheable pages, expires. some terse stats moved in menu
   * stats menu then holds more detailed stats: stats per web
     decoding it from wget
   * make-distrib should
   * deploy Todo & Implementation .txt pages as wiki pages
   * option -s space-efficient: only store the gzipped version, unzip on
     demand. For C, use zlib to inflate.
   * generational cache: if we know we are doomed, where to build new pages?
   * pccr: if a changer, use cache=cache_changers, including pcbd calls
   * plugin: on write, clear cache_changers, create a new
   * on changers expire, clear cache, mv cache_changers as cache
   * pcad command to clean all cache pages older than ...
   * C version: make 2 versions, one checking for changer IP and one not
     make PublicCachePlugin install the first, and cache clear the 2nd
     variant: change a byte in executable binary

   * ? see if we can get the mime-type header from the View.pm patch instead of
   * ? option to let logged-in people pass through the cache (how to detect them?)
   * ? put twpc files into a dir other than bin/? cgi/? (but what about view?)
   * ? optional expire header
   * ? background crawling process to add & refresh an expire header on the
     cached pages, for the "ok now the site is final" moment
   * ? make an apache-based pccr, with rewrite rules? see:
      * http://mail-archives.apache.org/mod_mbox/httpd-users/200701.mbox/%3C1C80FD8A7D2B2745B0396F4D2D0565B401AE4C6D@apwmsg01.alc.ca%3E
   * ? make a proper generic TrackChangesPlugin and use it: can call hooks,
     logs unix style: linenum isodate who action web.topic IP [attachment]
     list all actions (call to writeLog). convert script. per day?
   * ? check we could force cache in one language / localisation?
   * ? cache directive in html comments in pages? (to set Expire per page)