Prerequisites
This post assumes you're running Ubuntu Linux (or at least a Debian-based distribution) and that you have both the Apache HTTP Server (httpd) and the Pentaho BI Server installed.
Apache HTTP Server
If you haven't got apache installed, this is your line:
$ sudo apt-get install apache2
You can then control the apache2 HTTP server using the apache2ctl script. For instance, to start it, do:
$ sudo apache2ctl start
Once it's started you can navigate to its homepage to verify that it is running:
http://localhost/
You can stop it by running
$ sudo apache2ctl stop
If you're changing apache's configuration, you need to restart it to allow changes to take effect using this command:
$ sudo apache2ctl restart
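apache2ctl also has a configtest subcommand that checks the configuration without touching the running server. As a minimal sketch, the check can be made an explicit step before the restart. The name safe_restart is a hypothetical helper, not something that ships with Apache; it takes the control command as its arguments so that sudo can be included.

```shell
# safe_restart CMD...: run "CMD configtest" and, only if that succeeds,
# run "CMD restart". CMD is whatever you use to control Apache.
safe_restart() {
    if "$@" configtest; then
        "$@" restart
    else
        echo "configuration test failed; not restarting" >&2
        return 1
    fi
}
```

Usage might look like this: safe_restart sudo apache2ctl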
Java
Pentaho relies on Java. If not installed already you can get it like this:
$ sudo apt-get install openjdk-6-jdk
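To find out whether Java is already there before installing anything, something along these lines should do. It is only a sketch: have_command is a hypothetical helper name, and it only checks that a java executable is on the PATH, not that its version suits your Pentaho release.

```shell
# have_command NAME: succeed if NAME resolves to something runnable on the PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    # java prints its version on stderr, so merge it into stdout
    java -version 2>&1 | head -n 1
else
    echo "java not found; install it with apt-get as shown above"
fi
```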
Pentaho BI Server
If you haven't got the Pentaho BI Server, download the latest version from sourceforge, and unpack the archive in some location you find convenient. (For development purposes I simply keep and run it in a subdirectory of my home directory)
You can start the Pentaho BI Server by cd-ing into the biserver-ce directory and then running:
$ ./start-pentaho.sh
You can then navigate to its homepage:
http://localhost:8080/pentaho/Home
(Simply navigating to http://localhost:8080 will automatically redirect you there too.)
It can be useful to monitor the log while it's running:
$ tail -f tomcat/logs/catalina.out
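The server can take a while to come up, so besides watching the log you may want to poll the homepage until it answers. A rough sketch, assuming curl is installed; wait_for_http is a hypothetical helper name, and the defaults of 30 attempts at 2 second intervals are arbitrary choices:

```shell
# wait_for_http URL [TRIES] [DELAY]: poll URL until it answers with an HTTP
# status from 200 to 399, or give up after TRIES attempts, DELAY seconds apart.
wait_for_http() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        # -s: quiet, -o /dev/null: discard the body, -w: print only the status code
        code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || code=000
        if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
            echo "$url answered with HTTP $code"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    echo "$url did not answer after $tries attempts" >&2
    return 1
}
```

For the BI server that would be: wait_for_http http://localhost:8080/pentaho/Home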
If you want to change something in Pentaho's configuration, you need to stop the server and then start it again with ./start-pentaho.sh. To stop it, run:
$ ./stop-pentaho.sh
Configuring Proxy support for Apache
Boris Kuzmanovic wrote an excellent post on setting up proxy support for Apache. My summary (with adjustments) follows below.
First, change the apache configuration to load the required proxy modules:
$ sudo a2enmod proxy
$ sudo a2enmod proxy_http
Then, edit any site definitions to use the proxy. I just modified the default site definition:
$ sudo geany /etc/apache2/sites-enabled/000-default
Inside the <VirtualHost> section, I added these snippets immediately above the </VirtualHost> line that ends the section:
<Location /pentaho/>
ProxyPass http://localhost:8080/pentaho/
ProxyPassReverse http://localhost:8080/pentaho/
SetEnv proxy-chain-auth
</Location>
<Location /pentaho-style/>
ProxyPass http://localhost:8080/pentaho-style/
ProxyPassReverse http://localhost:8080/pentaho-style/
SetEnv proxy-chain-auth
</Location>
After making these changes, we need to restart apache:
$ sudo apache2ctl restart
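If the proxy does not seem to work after the restart, one thing to check is whether the modules really got loaded. apache2ctl -M lists the loaded modules, one per line, in the form "proxy_http_module (shared)". A small sketch that searches such a listing; module_loaded is a hypothetical helper name and it reads the listing from standard input:

```shell
# module_loaded NAME: read an "apache2ctl -M" listing on stdin and succeed
# if it contains a line for NAME_module.
module_loaded() {
    grep -q "^[[:space:]]*${1}_module[[:space:]]"
}
```

It could be used like this: sudo apache2ctl -M | module_loaded proxy_http && echo "proxy_http is loaded"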
Requests for these two <Location> paths are now effectively tunneled to the respective locations on the Pentaho BI Server, and the responses are passed back the same way.
Using mod_proxy_ajp instead of proxy_http
While the regular HTTP proxy simply works, there is a better, more tightly integrated solution. The regular HTTP proxy handles HTTP requests received by the Apache Httpd server by sending a new, equivalent HTTP request through to the Tomcat server. Likewise, Tomcat's HTTP response is then sent back as a new, equivalent HTTP response to the source of the original request.
So, that's twice a transport over HTTP.
Things can be improved by routing the incoming HTTP request to the tomcat server using a binary protocol called the AJP (Apache JServ) protocol. (For a detailed comparison, see this excellent comparison between HTTP/HTTPS and AJP.)
Fortunately, the steps to set up an AJP proxy are almost identical to those for setting up a regular HTTP proxy. First, enable the AJP proxy module:
$ sudo a2enmod proxy
$ sudo a2enmod proxy_ajp
(Note that the proxy module was already enabled as part of setting up the regular HTTP proxy. The line is repeated here for completeness, but it is not necessary if you completed the steps for the regular HTTP proxy. You can enable either or both of the proxy_http and proxy_ajp modules; both require the proxy module.)
Then we edit the site configuration again to use the proxy. Since the locations /pentaho/ and /pentaho-style/ were already used, we first comment those out:
#<Location /pentaho/>
# ProxyPass http://localhost:8080/pentaho/
# ProxyPassReverse http://localhost:8080/pentaho/
# SetEnv proxy-chain-auth
#</Location>
#<Location /pentaho-style/>
# ProxyPass http://localhost:8080/pentaho-style/
# ProxyPassReverse http://localhost:8080/pentaho-style/
# SetEnv proxy-chain-auth
#</Location>
Then we add equivalent lines going via the AJP proxy:
ProxyPass /pentaho ajp://localhost:8009/pentaho
ProxyPassReverse /pentaho ajp://localhost:8009/pentaho
ProxyPass /pentaho-style ajp://localhost:8009/pentaho-style
ProxyPassReverse /pentaho-style ajp://localhost:8009/pentaho-style
(The bit that goes ajp://localhost:8009 refers to the AJP service that Tomcat runs on port 8009 by default.)
Again we have to restart the Apache service for the changes to take effect:
$ sudo apache2ctl restart
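If requests through the AJP proxy fail, it may be worth checking that something is actually listening on port 8009. A sketch, assuming the ss tool from iproute2 is available and that ss -ltn prints the local address and port in the fourth column; port_listening is a hypothetical helper name and it reads the listing from standard input:

```shell
# port_listening PORT: read "ss -ltn" output on stdin and succeed if some
# LISTEN line has a local address ending in ":PORT".
port_listening() {
    awk -v suffix=":$1" '
        $1 == "LISTEN" && $4 ~ (suffix "$") { found = 1 }
        END { exit !found }
    '
}
```

For the AJP port that would be: ss -ltn | port_listening 8009 && echo "AJP port is open"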
Acknowledgements
Thanks to Paul Stöllberger, Pedro Alves and Tom Barber for valuable feedback and background information regarding AJP.
9 comments:
Cool Roland! i remember having to set this up for my website + tomcat + Redmine.... it wasn't easy :)
Love your blog! I've been doing some research into BI Technology and this has really helped. Thank you!
I am setting up a new Pentaho DI 5.3 Installation. The Server is installed in Linux Box. Dev Team have Installed Pentaho Client Tools in Windows 7 Virtual Machine. Created Central Repository in Client Machine connecting to Server
and all development are done in Repository.
This is the issue I am facing:- I have a File Location in UNIX Server /xxx/xxx where I will get and place all files from FTP Server. How do I connect to this Unix Location from Windows Client?
I tried reading the files using Text File Input giving the Unix Location directly and using Kettle Variables, But getting Error:- "Could not list the contents of "file:///C:/xxx/xxx" because it is not a folder." Does not recognize it as a Unix location.
I know I can use SSHGet and write a shell script, what are the other options that I have? && If I write a shell script, how will I give the location of script (if I am placing the script in UNIX Server /xxx/xxx).
Could you pls help me with this?
Hi Vignesh V!
If you're running PDI 5.3, you're an enterprise customer. You should be able to contact Pentaho support and they'll be glad to advise you with any of your issues.
HTH,
Roland
Hi Roland,
How to Achieve the same in for Pentaho BI server installed in Windows OS.
Hi Rejo,
Well, all the steps are essentially the same - you'll just have to find equivalent software for Windows. Also, instead of managing Apache through the apachectl script, you'd probably use the Windows services dialog. And instead of bash commands, you'd use the Windows command shell.
You really should be able to figure it out by reapplying the information in my blog to Windows, with maybe a little help from Google search.
Good luck!
I set up a reverse proxy with apache2 (HTTPS) and Pentaho 7.1. Everything works perfectly, but when I try to log in with basic auth it keeps looping back to the login dialog.
@Anonymous, that's a pity. Nowadays, nginx is my favorite webserver. Proxy is real easy there - see:
https://docs.nginx.com/nginx/admin-guide/web-server/reverse-proxy/
Not sure if it solves your problem, but give it a try...
It's 2020, and the AJP Proxy was still the answer for most proxy problems i had with Pentaho, thank you so much!