Mirror servers
This page is about planning, creating and maintaining a mirror server network for load balancing the GeoGebra website and deployment. Unfortunately, this page has become somewhat confusing. If you are new to the topic, please go to MirrorServersFAQ instead.
List of possible mirror servers
Currently the following institutions and people plan to contribute their hardware and knowledge to such a network:
- Seoul National University, Republic of Korea, Myung-Soo Kim <mskim@…> and Han Cho <hancho@…>
- Beijing Normal University, China, Cao Yiming <caoym@…>
- Southern Illinois University, USA, Lingguo Bu <lgbu@…>
- University of Luxembourg, Luxembourg, Yves Kreis <yves@…>
- Johannes Kepler University, Austria, Markus Hohenwarter <markus@…>
- University of Szeged, Hungary, Zoltan Kovacs <zoltan@…>
Resolving http overload by using static mirror servers
We already use Google Code for the installer files (http://code.google.com/p/geogebra/downloads/list), but for the *.jar files we need to have a set of dedicated servers. (Alternatively, the Google Code server may also be used for load balancing, see below.)
The servers will be aliases for jars.geogebra.org by using DNS round robin. Currently the DNS list configuration can be changed by Yves.
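To see which servers are currently behind the round-robin name, the published address records can be listed. The following is only a minimal sketch using the Python standard library; the helper names are made up for illustration, and it assumes that jars.geogebra.org resolves to one IPv4 address per mirror.

```python
import socket


def unique_ipv4_addresses(addrinfo):
    """Collect distinct IPv4 addresses from socket.getaddrinfo() results, keeping order."""
    seen = []
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET and sockaddr[0] not in seen:
            seen.append(sockaddr[0])
    return seen


def mirror_addresses(hostname="jars.geogebra.org", port=80):
    """Resolve every A record currently published for the round-robin name."""
    info = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    return unique_ipv4_addresses(info)
```

Calling mirror_addresses() needs network access. A client normally just connects to one of the returned addresses, which is how the load gets spread.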
Prerequisites
Currently the JAR files are served after the following http requests initiated by a client:
- Webstart. A JNLP file (which is offered for download) controls which additional files are to be downloaded from a given web directory (<codebase> tag). (We could speed up webstart by using a different directory, e.g. jnlp/. This is required so as not to interfere with applet start, see below.) Recent Java clients can request .jar.pack.gz files instead of the original .jar files, but all files must be in the same directory. See http://download.oracle.com/javase/6/docs/technotes/guides/jweb/tools/pack200.html#pack200JNLP for details.
- Applet start. The HTML file controls what to download from a given web directory, as above, but the web server can inspect the Accept-Encoding header of the HTTP request sent by the client web browser. With this technique (described at http://download.oracle.com/javase/1.5.0/docs/guide/deployment/deployment-guide/pack200.html) the client can announce that it is capable of uncompressing .pack.gz files, and the original .jar files can be transferred in the highly compressed .jar.pack.gz format instead. Currently this is done with .var files on the server side on www.geogebra.org in the following way: Apache is configured to look for *.jar.var files first by using the AliasMatch directive from mod_alias, then a .var file describes which file is to be served if the Accept-Encoding header contains pack200-gzip and which if not (see http://httpd.apache.org/docs/current/content-negotiation.html about content negotiation). As the first step in making this substitution work, mod_alias is used in the sites/ configuration directory of Apache2:
<IfModule mod_alias.c>
  AliasMatch /webstart/4.0/((unsigned/)?[^/]*).jar "/Library/WebServer/GeoGebra/webstart/4.0/$1.jar.var"
  AliasMatch /webstart/4.2/((unsigned/)?[^/]*).jar "/Library/WebServer/GeoGebra/webstart/4.2/$1.jar.var"
  AliasMatch /webstart/5.0/((unsigned/)?[^/]*).jar "/Library/WebServer/GeoGebra/webstart/5.0/$1.jar.var"
</IfModule>
This should be unified into something like this:
<IfModule mod_alias.c>
  AliasMatch /webstart/([^/]*)/((unsigned/)?[^/]*).jar "/Library/WebServer/GeoGebra/webstart/$1/$2.jar.var"
</IfModule>
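To check what the unified rule would do with a given request, the same regular expression can be tried out in Python. This is only an approximation of Apache's behaviour (Python's re module stands in for Apache's regex engine, and the function name is made up); the pattern and the target directory are copied from the rule above.

```python
import re

# Pattern and target directory copied from the unified AliasMatch rule.
ALIAS_PATTERN = re.compile(r"/webstart/([^/]*)/((unsigned/)?[^/]*).jar")
DOCROOT = "/Library/WebServer/GeoGebra/webstart"


def var_file_for(url_path):
    """Return the .var file the rule would select, or None if the rule does not apply."""
    match = ALIAS_PATTERN.search(url_path)
    if match is None:
        return None
    version, name = match.group(1), match.group(2)
    return f"{DOCROOT}/{version}/{name}.jar.var"
```

For example, /webstart/5.0/unsigned/geogebra.jar is mapped to the webstart/5.0/unsigned/geogebra.jar.var file, while a request below packed/ is not touched by the rule. Note that the dot before jar is not escaped in the original pattern, so it matches any character there.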
To implement this on a mirror server as well, Apache must be configured the same way, using the two Apache modules mod_alias and mod_negotiation.
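The decision that the .var file encodes can be summarized in a few lines of Python. This is a deliberately simplified sketch, not a reimplementation of Apache's content negotiation: it ignores quality values, and the function name is an assumption made for illustration.

```python
def choose_variant(jar_name, accept_encoding):
    """Pick the file to serve for jar_name based on the Accept-Encoding request header.

    Returns (file_name, content_encoding); content_encoding is None for the plain JAR.
    """
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    if "pack200-gzip" in offered:
        return jar_name + ".pack.gz", "pack200-gzip"
    return jar_name, None
```

A client sending "Accept-Encoding: pack200-gzip, gzip" would get geogebra.jar.pack.gz, any other client the original geogebra.jar.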
To serve the *.jar files in the highly compressed format (for .../packed subdirectories) the Java client on the user's side must be at least version 6 update 10. Earlier clients will still get the unpacked versions for webstart (see the wiki page for Unsigned GeoGebra Applets), but Java 5 clients also use the smaller file for applet start.
This means that (currently) 25 MB of data must be copied to each mirror server during each deployment. Assuming that the bandwidth is 10 Mbps towards each mirror server (and that the copying can be done in parallel), 20 seconds are needed to update all mirror servers to the newest version. This is still acceptable. (During the update the *.jar files may be inconsistent, as files from the two versions are mixed. Currently the same behavior occurs on each deployment even with only one server, since we moved to the Linz development server.)
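The 20 second figure follows from converting megabytes to megabits. A small sketch of the calculation (the function name is made up, and protocol overhead is ignored):

```python
def transfer_seconds(size_megabytes, bandwidth_mbps):
    """Seconds needed to copy size_megabytes over a link of bandwidth_mbps (megabits/s)."""
    return size_megabytes * 8 / bandwidth_mbps


print(transfer_seconds(25, 10))  # 25 MB = 200 Mbit, so 20.0 seconds at 10 Mbps
```

Because the mirrors are updated in parallel, the total time stays at this value as long as the sending server has enough upstream bandwidth for all of them.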
Test suite on Google Code
Since there will be no more new versions of 3.2, its latest JAR files are final and can be hosted on Google Code. (Google Code does not support automated deletion and overwriting of existing files.) The following files have been uploaded to Google Code (from version 3.2.47.0, signed sets from packed/ and unpacked/):
-rw-rw-r-- 1 422244 2011-06-03 19:47 geogebra_cas.jar
-rw-rw-r-- 1 197078 2011-06-03 19:48 geogebra_cas.jar.pack.gz
-rw-rw-r-- 1 339615 2011-06-03 19:47 geogebra_export.jar
-rw-rw-r-- 1 100352 2011-06-03 19:48 geogebra_export.jar.pack.gz
-rw-rw-r-- 1 520453 2011-06-03 19:47 geogebra_gui.jar
-rw-rw-r-- 1 233350 2011-06-03 19:48 geogebra_gui.jar.pack.gz
-rw-rw-r-- 1 26371 2011-06-03 19:47 geogebra.jar
-rw-rw-r-- 1 17666 2011-06-03 19:48 geogebra.jar.pack.gz
-rw-rw-r-- 1 2473 2011-09-21 10:58 geogebra.jnlp
-rw-rw-r-- 1 756300 2011-06-03 19:47 geogebra_main.jar
-rw-rw-r-- 1 256886 2011-06-03 19:48 geogebra_main.jar.pack.gz
-rw-rw-r-- 1 887627 2011-06-03 19:47 geogebra_properties.jar
-rw-rw-r-- 1 685279 2011-06-03 19:48 geogebra_properties.jar.pack.gz
The JNLP file is the same as on www.geogebra.org, but the codebase is changed to
<!-- Specialized for Google Code -->
<jnlp spec="1.0+" codebase="http://geogebra.googlecode.com/files/" href="geogebra.jnlp">
Now the link http://geogebra.googlecode.com/files/geogebra.jnlp uses the files uploaded to Google Code, and if the Java RE client supports it, the packed files are downloaded. This also provides a kind of mirror for the old 3.2 version.
Suggested setup and logic for a mirror server
Idea 1: One way communication (pulling)
A simple web server (Apache2) on a simple Linux operating system should work. To check periodically for new versions, we suggest a script with an endless loop (sleeping 5 seconds between checks).
To check for new versions on the main server (158.64.76.83, geogebra.uni.lu) the mirror server should check the following URL: http://www.geogebra.org/webstart/4.2/unpacked/version.txt. (The same method is used by the GeoGebra application as well.) Of course, "4.2" can be replaced by the newest version.
The wget command can be used for downloading both the newest *.jar files and the version.txt file. When the version in version.txt is newer than the local version, the http://www.geogebra.org/webstart/4.2/filelist.txt file must be downloaded. This contains a list of the needed files. (No files are needed from the debug/ directory.) The required files must be downloaded and made available for http download as simple http://PUT.MIRROR.SERVER.IP.HERE/webstart/4.2/directory/filename.jar files. We are going to provide a simple Linux script which does this job.
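The pull logic described above can be sketched in Python as follows. This is not the planned Linux script, only an illustration under stated assumptions: version.txt is assumed to contain a dotted version number such as 3.2.47.0, filelist.txt is assumed to list one relative path per line, and all function names are made up.

```python
import urllib.request

BASE_URL = "http://www.geogebra.org/webstart/4.2"


def parse_version(text):
    """Turn a dotted version string such as '3.2.47.0' into a comparable tuple."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(remote_version, local_version):
    """True if the main server offers a newer version than the local copy."""
    return parse_version(remote_version) > parse_version(local_version)


def wanted_files(filelist_text):
    """Assumed format: one relative path per line; debug/ entries are skipped."""
    paths = [line.strip() for line in filelist_text.splitlines() if line.strip()]
    return [path for path in paths if not path.startswith("debug/")]


def fetch(url):
    """Download one URL and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def check_once(local_version, fetch=fetch):
    """Return (new_version, {path: content}) if an update exists, else None."""
    remote = fetch(BASE_URL + "/unpacked/version.txt").decode("ascii").strip()
    if not needs_update(remote, local_version):
        return None
    listing = fetch(BASE_URL + "/filelist.txt").decode("utf-8")
    return remote, {path: fetch(f"{BASE_URL}/{path}") for path in wanted_files(listing)}
```

A mirror would call check_once() in a loop, sleep 5 seconds between calls, and write the returned files below its own webstart/4.2/ directory. Writing to a temporary directory first and renaming afterwards would shorten the period during which old and new files are mixed.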
The strengths of this method are that only one-way communication is required and no change is needed in the deployment logic. So it is quite quick to set up a mirror server this way. Yves always has to add a new entry to the DNS list for a new mirror server. Assuming 5 seconds of sleeping and 10 Mbps of bandwidth between every 2 servers, the full update currently takes about 1 minute: 20 seconds for copying the new files from the development server to the main server, 5 seconds for sleeping, 20 seconds for synchronizing. This can be quicker if the mirror server looks for the *.jar files directly on the development server (then the result is 25 seconds). However, if there are many mirror servers, it is a bit ugly that all of them send http requests every 5 seconds.
Idea 2: Two way communication (pushing)
A simple web server (Apache2) on a simple Linux operating system is needed, and in addition an ssh service and the rsync application are required. Apache2 has mod_alias and mod_negotiation built in by default on Debian Linux 6, but the AliasMatch directive must be set.
On each deploy the development server in Linz (140.78.96.5, geogebra.idm.jku.at) copies all required files to the mirror servers.
The strength of this method is that the update time will be only 20 seconds (assuming 10 Mbps of bandwidth between every 2 servers), and far fewer http requests are needed. The deployment logic must be modified somewhat for this (it also has to check Yves's DNS list for all mirror servers, which is possible by using the host command, for example). However, we need login accounts on each mirror server, and some more software has to be installed.
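The push step could look roughly like the following sketch. It assumes that rsync over SSH is used for copying, that the deployed files live in /var/www/webstart/ on the development server (a placeholder), and that each mirror offers the geogebra account and the /var/www/webstart directory suggested on this page. The function names are made up.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def rsync_command(host, source_dir="/var/www/webstart/", user="geogebra",
                  target_dir="/var/www/webstart/", port=22):
    """Build the rsync call that copies source_dir to one mirror over SSH."""
    return [
        "rsync", "-az", "--delete",   # archive mode, compression, remove stale files
        "-e", f"ssh -p {port}",       # SSH transport on the given port
        source_dir, f"{user}@{host}:{target_dir}",
    ]


def push_to_mirrors(hosts, run=subprocess.call):
    """Push to all mirrors in parallel; returns {host: exit_code}."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        codes = list(pool.map(lambda host: run(rsync_command(host)), hosts))
    return dict(zip(hosts, codes))
```

The list of hosts would come from the round robin DNS entry. Note that --delete removes files on the mirror that no longer exist on the development server, so it should only be used if the mirror directory holds nothing else.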
Crazy JNLP files
It seems that the JNLP files must be created very carefully so that a central JNLP file can be used while the JARs come from various mirrors. Here is a working solution:
<?xml version="1.0" encoding="utf-8"?>
<!-- JNLP File for GeoGebra WebStart Application -->
<jnlp spec="1.0+" codebase="http://www.geogebra.org/webstart/4.2" href="geogebra.jnlp">
<information>
<title>GeoGebra</title>
<vendor>GeoGebra Inc.</vendor>
<homepage href="http://www.geogebra.org/"/>
<description>Dynamic Mathematics for Everyone</description>
<icon href="http://IMAGESERVER/webstart/4.2/geogebra32.gif" width="32" height="32"/>
<icon href="http://IMAGESERVER/webstart/4.2/geogebra64.gif" width="64" height="64"/>
<offline-allowed/>
<shortcut online="true">
<desktop/>
<menu submenu="GeoGebra"/>
</shortcut>
<association mime-type="application/vnd.geogebra.file" extensions="ggb" description="GeoGebra File"/>
<association mime-type="application/vnd.geogebra.tool" extensions="ggt" description="GeoGebra Tool"/>
<related-content href="http://www.geogebra.org/">
<title>www.geogebra.org</title>
<description>www.geogebra.org</description>
</related-content>
<related-content href="http://www.geogebra.org/forum/">
<title>GeoGebra User Forum</title>
<description>GeoGebra User Forum</description>
<icon href="http://IMAGESERVER/webstart/4.2/forum.gif" width="16" height="16"/>
</related-content>
<related-content href="http://www.geogebra.org/en/wiki/">
<title>GeoGebraWiki (International)</title>
<description>GeoGebraWiki (International)</description>
<icon href="http://IMAGESERVER/webstart/4.2/wiki.jpg" width="16" height="16"/>
</related-content>
</information>
<information locale="de">
<description>Dynamische Mathematik für Alle</description>
<offline-allowed/>
<related-content href="http://www.geogebra.org/de/wiki/">
<title>GeoGebraWiki (Deutsch)</title>
<description>GeoGebraWiki (Deutsch)</description>
<icon href="http://IMAGESERVER/webstart/4.2/wiki.jpg" width="16" height="16"/>
</related-content>
</information>
<security>
<all-permissions/>
</security>
<update check="background" policy="prompt-update"/>
<resources>
<property name="jnlp.packEnabled" value="true"/>
<j2se version="1.5.0+" max-heap-size="1000m" href="http://java.sun.com/products/autodl/j2se"/>
<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra.jar" main="true"/>
<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra_main.jar"/>
<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra_gui.jar"/>
<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra_properties.jar"/>
<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra_export.jar"/>
<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra_cas.jar"/>
</resources>
<application-desc main-class="geogebra.GeoGebra"/>
</jnlp>
I put "JARSMIRROR" instead of "jars.geogebra.org" everywhere. "IMAGESERVER" can be either "www.geogebra.org" or "jars.geogebra.org", as needed. (It depends on whether we want to host the image files ourselves or leave them to the mirror servers.)
Important: the "href" attribute must be present in the "jnlp" tag. Without it a very strange Java exception occurs on the client side on the first download of the JNLP file. (The second, third, ... downloads work, however.) The "codebase" attribute is also needed, even though the full (absolute) URLs, which point elsewhere, must be given later for the resources as well.
Now the JNLP file can be put at e.g. http://www.geogebra.org/webstart/4.2/geogebra-42.jnlp. Since the JNLP file is not used for anything other than webstart, we can use the superstart webstart in its place from now on.
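Filling in the two placeholders of the JNLP file above is a plain text substitution. A minimal sketch (the function name and the default values are only examples):

```python
def render_jnlp(template, jars_mirror="jars.geogebra.org", image_server="www.geogebra.org"):
    """Replace the JARSMIRROR and IMAGESERVER placeholders in a JNLP template."""
    return template.replace("JARSMIRROR", jars_mirror).replace("IMAGESERVER", image_server)


line = '<jar href="http://JARSMIRROR/webstart/4.2/jnlp/geogebra.jar" main="true"/>'
print(render_jnlp(line))
```

The same function can produce a JNLP file bound to a single mirror by passing that mirror's host name as jars_mirror.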
Consensus
Currently (2011-09-12, 2011-09-27) we (Markus, Mike, Zoltan) think that a mirror server should work the way described in Idea 2.
What to do if you decided to help us by offering a mirror server
First of all: Thank you for that! We really appreciate high bandwidth mirroring for the JAR files. Here are some details on what to do then.
Detailed information on setting up a mirror server
(This is only a recommendation; you can use any compatible system which gives results similar to those described below.)
- Install Debian Linux 6.
- Install the following packages: apache2, openssh-server, rsync. (rsync is not used yet, but please install it for future use.)
- Edit /etc/apache2/sites-available/default and add the following line before the line containing </VirtualHost> (which should be the last line):
AliasMatch /webstart/([^/]*)/((unsigned/)?[^/]*).jar "/var/www/webstart/$1/$2.jar.var"
- For the same file, add the following 4 lines at the end of the <Directory /var/www/> section (just before </Directory>):
<Files *.pack.gz>
  AddEncoding pack200-gzip .jar
  RemoveEncoding .gz
</Files>
- Now please restart the Apache2 server by entering service apache2 restart (or /etc/init.d/apache2 restart).
- Create the user geogebra by typing adduser geogebra and set a password. Then, as root, please enter mkdir /var/www/webstart; chown geogebra /var/www/webstart.
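Once JAR files have been copied to the mirror, the Apache setup from the steps above can be checked by asking for a JAR the way a pack200-capable client would. The following is only a sketch: the function names are made up, the default path is a placeholder for any JAR that actually exists on the mirror, and it assumes the server answers HEAD requests with the same headers as GET requests.

```python
import urllib.request


def is_packed(headers):
    """True if the response headers announce a pack200-gzip body."""
    return (headers.get("Content-Encoding") or "").strip().lower() == "pack200-gzip"


def check_mirror(host, path="/webstart/4.2/geogebra.jar"):
    """Send a HEAD request announcing pack200 support and inspect the answer."""
    request = urllib.request.Request(
        f"http://{host}{path}",
        headers={"Accept-Encoding": "pack200-gzip"},
        method="HEAD",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return is_packed(response.headers)
```

If check_mirror() returns False although the .var and .pack.gz files are in place, the AliasMatch line or the <Files> block is probably not active.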
Further requests after you set up your mirror server
Please provide us the following details of your installation:
- The IP address of your host machine.
- The login account for SSH (by default this is geogebra, if you followed the steps above). Please provide a port number (22 by default) and/or SSH private keys, passphrases and passwords if needed (normally not needed if you followed the recommendation above). We are going to copy our own RSA public key to your server (which has no passphrase).
- The planned bandwidth for HTTP (download from your server) and SSH (upload to your server).
- The filesystem directory for the /webstart folder (by default this is /var/www/webstart if you followed the steps above).
Please always inform us about planned downtime (if possible, at least a week in advance). Thank you in advance for your help!
Questions to be discussed
As of 2011-11-04, 2 test mirror servers are set up:
- 140.78.96.5 (ServerLinzDev), member of the round robin DNS list as well
- 59.64.59.60 (in Beijing, China), not member of the DNS list yet
- Currently 4.2 webstart is mirrored to Linz as well. Is the 4.2 webstart stable enough to add mirroring for 4.0? (See statistics here with visitor/access credentials: http://geogebra.idm.jku.at/munin/geogebra/geogebra/index.html, http://geogebra.idm.jku.at/webalizer. The Webalizer statistics for response code 404 show that #1554 was an issue, but the Linz server seems to work properly.)
- By using jars.geogebra.org for serving the JARs for every JNLP file we can cause a slowdown if the China server cannot be reached within a reasonable time (e.g. 35 seconds from the UK, 55 seconds from Austria for the full download). Shouldn't we create geogebra-42-china.jnlp, geogebra-42-linz.jnlp and geogebra-42-luxembourg.jnlp for those people who want to be sure that they can run the webstart the fastest way? We can even add some extra logic by using MaxMind's GeoLite Country and offer the closest server at http://www.geogebra.org/cms/*/download. By using different JNLPs for different regions, we can still manage the JNLP files on our own. However, for applet start, we should describe how to use different mirror servers for different regions to keep applet loading as fast as possible.
- (Clear your webstart cache first if you already tried this link before.) Please test your connection to China: http://www.geogebra.org/webstart/4.2/geogebra-china.jnlp
- (Clear your webstart cache first if you already tried this link before.) Please test your connection to Linz: http://www.geogebra.org/webstart/4.2/geogebra-linz.jnlp
- (Clear your webstart cache first if you already tried this link before.) Please test your usual connection to Luxembourg: http://www.geogebra.org/webstart/4.2/geogebra-luxembourg.jnlp
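The "closest server" idea from the questions above could be reduced to a lookup table once the visitor's country is known. The sketch below leaves out the GeoLite Country lookup itself and starts from an ISO country code; the table entries and the fallback to geogebra-42.jnlp are assumptions made for illustration, not an agreed mapping.

```python
# Hypothetical mapping from ISO country codes to the proposed regional JNLP files.
REGIONAL_JNLP = {
    "CN": "geogebra-42-china.jnlp",
    "AT": "geogebra-42-linz.jnlp",
    "LU": "geogebra-42-luxembourg.jnlp",
}
DEFAULT_JNLP = "geogebra-42.jnlp"  # central file using the round robin DNS name


def jnlp_for_country(country_code):
    """Return the JNLP file to offer to a visitor from the given country."""
    return REGIONAL_JNLP.get((country_code or "").upper(), DEFAULT_JNLP)
```

A real table would have to assign every country to its nearest mirror; visitors whose country cannot be determined simply get the default file.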
Attachments
- GoogleCode.png (186.9 KB), added by zoltan 8 months ago. Google Code serves *.jar.pack.gz files automatically when the Java RE client is recent enough

