Wednesday, May 04, 2016

ODXL - A generic Data Export Layer for SAP/HANA based on OData

I'm very pleased to be able to announce the immediate availability of the Open Data Export Layer (ODXL) for SAP/HANA!

Executive summary

ODXL is a framework that provides generic data export capabilities for the SAP/HANA platform. ODXL is implemented as a xsjs Web service that understands OData web requests, and delivers a response by means of a pluggable data output handler. Developers can use ODXL as a back-end component, or even as a global instance-wide service to provide clean, performant and extensible data export capabilities for their SAP/HANA applications.

Currently, ODXL provides output handlers for comma-separated values (csv) as well as Microsoft Excel output. However, ODXL is designed so that developers can write their own response handlers and extend ODXL to export data to other output formats according to their requirements.

ODXL is provided by Just BI to the SAP/HANA developer community as open source software under the terms of the Apache 2.0 License. This means you are free to use, modify and distribute ODXL. For the exact terms and conditions, please refer to the license text.

The source code is available on github. Developers are encouraged to check out the source code and to contribute to the project. You can contribute in many ways: we value any feedback, suggestions for new features, filing bug reports, or code enhancements.

If you require professional support for ODXL, please contact Just-BI for details.

What exactly is ODXL?

ODXL started as an in-house project at the Just-BI department of custom development. It was borne from the observation that the SAP/HANA web applications that we develop for our customers often require some form of data export, typically to Microsoft Excel. Rather than creating this type of functionality again for each project, we decided to invest some time and effort to design and develop this solution in such a way that it can easily be deployed as a reusable component. And preferably, in a way that feels natural to SAP/HANA xs platform application developers.

What we came up with, is a xsjs web service that understands requests that look and feel like standard OData GET requests, but which returns the data in some custom output format. ODXL was designed to make it easily extensible so that developers can build their own modules that create and deliver the data in whatever output format suits their requirements.

This is illustrated in the high-level overview below:



For customers of Just-BI, there is an immediate requirement to get Microsoft Excel output. So, we went ahead and implemented output handlers for .xlsx and .csv formats, and we included those in the project. This means that ODXL supports data export to the .xlsx and .csv formats right out of the box.

However, support for any particular output format is entirely optional and can be controlled by configuration and/or extension:
  • Developers can develop their own output handlers to supply data export to whatever output format they like.
  • SAP/HANA Admins and/or application developers can choose to install only those output handlers they require, and configure how Content-Type headers and OData $format values map to output handlers.

So ODXL is OData? Doesn't SAP/HANA suppport OData already?

The SAP/HANA platform provides data access via the OData standard. This facility is very convenient for object-level read- and write access to database data for typical modern web applications. In this scenario, the web application would typically use asynchronous XML Http requests, and data would be exchanged in either Atom (a XML dialect) or JSON format.

ODXL's primary goal is to provide web applications with a way to export datasets in the form of documents. Data export tasks typically deal with data sets that are quite a bit larger than the ones accessed from within a web application. In addition, an data export document might very well compromise multiple parts - in other words, it may contain multiple datasets. The typical example is exporting multiple lists of different items from a web application to a workbook containaing multiple spreadsheets with data. In fact, the concrete use case from whence ODXL originated was the requirement to export multiple datasets to Microsoft Excel .xlsx workbooks.

So, ODXL is not OData. Rather, ODXL is complementary to SAP/HANA OData services. That said, the design of ODXL does borrow elements from standard OData.

OData Features, Extensions and omissions

ODXL GET requests follow the syntax and features of OData standard GET requests. Here's a simple example to illustrate the ODXL GET request:
GET "RBOUMAN"/"PRODUCTS"?$select=PRODUCTCODE, PRODUCTNAME& $filter=PRODUCTVENDOR eq 'Classic Metal Creations' and QUANTITYINSTOCK gt 1&$orderby=BUYPRICE desc&$skip=0&$top=5
This request is build up like so:
  • "RBOUMAN"/"PRODUCTS": get data from the "PRODUCTS" table in the database schema called "RBOUMAN".
  • $select=PRODUCTCODE, PRODUCTNAME: Only get values for the columns PRODUCTCODE and PRODUCTNAME.
  • $filter=PRODUCTVENDOR eq 'Classic Metal Creations' and QUANTITYINSTOCK ge 1: Only get in-stock products from the vendor 'Classic Metal Creations'.
  • $orderby=BUYPRICE desc: Order the data from highest price to lowest.
  • $skip=0&$top=5: Only get the first five results.
For more detailed information about invoking the odxl service, check out the section about the sample application. The sample application offers a very easy way to use ODXL for any table, view, or calculation view you can access and allows you to familiarize yourself in detail with the URL format.

In addition, ODXL supports the OData $batch POST request to support export of multiple datasets into a single response document.

The reasons to follow OData in these respects are quite simple:
  • OData is simple and powerful. It is easy to use, and it gets the job done. There is no need to reinvent the wheel here.
  • ODXL's target audience, that is to say, SAP/HANA application developers, are already familiar with OData. They can integrate and use ODXL into their applications with minimal effort, and maybe even reuse the code they use to build their OData queries to target ODXL.
ODXL does not follow the OData standard with respect to the format of the response. This is a feature: OData only specifies Atom (an XML dialect) and JSON output, whereas ODXL can supply any output format. ODXL can support any output format because it allows developers to plug-in their own modules called output handlers that create and deliver the output.

Currently ODXL provides two output handlers: one for comma-separated values (.csv), and one for Microsoft Excel (.xlsx). If that is all you need, you're set. And if you need some special output format, you can use the code of these output handlers to see how it is done and then write your own output handler.

ODXL does respect the OData standard with regard to how the client can specify what type of response they would like to receive. Clients can specify the MIME-type of the desired output format in a standard HTTP Accept: request header:
  • Accept: text/csv specifies that the response should be returned in comma separated values format.
  • Accept: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet specifies that the response should be returned in open office xml workbook format (Excel .xlsx format).
Alternatively, they can specify a $format=<format> query option, where <format> identifies the output format:
  • $format=csv for csv format
  • $format=xlsx for .xlsx format
Note that a format specified by the $format query option will override any format specified in an Accept:-header, as per OData specification.

ODXL admins can configure which MIME-types will be supported by a particular ODXL service instance, and how these map to pluggable output handlers. In addition, they can configure how values for passed for the $format query option map to MIME-types. ODXL comes with a standard configuration with mappings for the predefined output handlers for .csv and .xlsx output.

On the request side of things, most of OData's features are implemented by ODXL:
  • The $select query option to specify which fields are to be returned
  • The $filter query option allows complex conditions restricting the returned data. OData standard functions are implemented too.
  • The $skip and $top query options to export only a portion of the data
  • The $orderby query option to specify how the data should be sorted
ODXL currently does not offer support for the following OData features: The features that are currently not supported may be implemented in the future. For now, we feel the effort the implement them and adequately map their semantics to ODXL may not be worth the trouble. However, an implementation can surely be provided should there be sufficient interest from the community.

Installation

Use ODXL presumes you already have a SAP/HANA installation with a properly working xs engine. You also need HANA Studio, or Eclipse with the SAP HANA Tools plugin installed. The steps are a little bit different, depending on whether you just want to use ODXL, or whether you want to actively develop the ODXL project.

Here are the steps if you just want to use ODXL, and have no need to actively develop the project:
  1. In HANA Studio/Eclipse, create a new HANA xs project. Alternatively, find an existing HANA xs project.
  2. Find the ODXL repository on github, and download the project as a zipped folder. (Select a particular branch if you desire so; typically you'll want to get the master branch)
  3. Extract the project from the zip. This will yield a folder. Copy its contents, and place them into your xs project directory (or one of its sub directories)
  4. Activate the new content.
After taking these steps, you should now have a working ODXL service, as well as a sample application. The service itself is in the service subdirectory, and you'll find the sample application inside the app subdirectory.

The service and the application are both self-contained xs applications, and should be completely independent in terms of resources. The service does not require the application to be present, but obviously, the application does rely on being able to call upon the service.

If you only need the service, for example, because you want to call it directly from your own sample application, then you don't need the sample application. You can safely copy only the contents of the service directory and put those right inside your project directory (or one of its subdirectories) in that case. But even then, you might still want to hang on to the sample application, because you can use that to generate the web service calls that you might want to do from within your application.

If you want to hack on ODXL then you might want to fork or clone the ODXL github repository. If you do this inside a SAP/HANA xs project, or if you create a project pointing to that location, you can then deploy that to SAP/HANA and use that to send pull requests in case you want to contribute your changes back into the project.

Getting started with the sample application

To get up and running quickly, we included a sample web application in the ODXL project. The purpose of this sample application is to provide an easy way to evaluate and test ODXL.

The sample application lets you browse the available database schemas and queryable objects: tables and views, including calculation views (or at least, their SQL queryable runtime representation). After making the selection, it will build up a form showing the available columns. You can then use the form to select or deselect columns, apply filter conditions, and/or specify any sorting order. If the selected object is a calculation view that defines input parameters, then a form will be shown where you can enter values for those too.

In the mean while, as you're entering options into the form, a textarea will show the URL that should be used to invoke the ODXL service. If you like, you can manually tweak this URL as well. Finally, you can use one of the download links to immediately download the result corresponding to the current URL in either .csv or .xlsx format.

Alternatively, you can hit a button to add the URL to a batch request. When you're done adding items to the batch, you can hit the download workbook button to download as single .xlsx workbook, containing one worksheet for each dataset in the batch.

What versions of SAP/HANA are supported?

We initially built and tested ODXL on SPS9. The initial implementation used the $.hdb database interface, as well as the $.util.Zip builtin.

We then built abstraction layers for both database access and zip support to allow automtic fallback to the $.db database interface, and to use a pure javascript implementation of the zip algorithm based on Stuart Knightley's JSZip library. We tested this on SPS8, and everyting seems to work fine there.

We have not actively tested earlier SAP/HANA versions, but as far as we know, ODXL should work on any earlier version. If you find that it doesn't, then please let us know - we will gladly look into the issue and see if we can provide a solution.

Why Open Source? What's the Business Model? What's the catch?

For Just BI, Open Source software is not a business model, but a development model. While some companies build a successful business model around selling custom code code, this is currently not Just-BI's primary goal. Rather, Just-BI is a consulting company that focuses mainly on Business Intelligence solutions around the SAP ecosystem. Our areas of expertise include Business Objects, SAP BW, SAP HANA, as well as custom BI (web) Helping customers by providing solutions for their business problems is Just-BI's primary concern - not selling code.

However, we do acknowledge that sometimes, custom code plays an essential role in building a business solution for our customers. In these cases, we will gladly help our customers to design, build and deploy such solutions. But even in these cases we will try to look for standard component toolkits, like SAP UI5, or frameworks like Angular as a basis for our work.

The urge to standardize on familiar, well known toolkits and libraries hardly needs justification. In the end, customers don't have the end goal of acquiring and owning too many custom coded solutions, because today's hot new custom solution is tomorrow's legacy. The more a customer relies on custom code, the harder it will become to maintain and to move forward.

Sometimes, a particular building block that we need for applications may not be publicly available already. If such a building block is sufficiently generic (i.e., not bound to any particular customer) then we have every reason to want that to become a standard. For a generic and reusable component like ODXL, we believe that an open source model is the right way to do that.

We think that an open source development model will help maintain and advance ODXL. By using an open source release and development model, we have potentially more eyes to scrutinize our code, find bugs, suggest features, etc. In addition we hope our customers will feel more confident to embrace an open source solution, since they need not be locked into only our company for support and ongoing development.

How to Contribute

If you want to, there are many different ways to contribute to ODXL.
  1. If you want to suggest a new feature, or report a defect, then please use the github issue tracker.
  2. If you want to contribute code for a bugfix, or for a new feature, then please send a pull request. If you are considering to contribute code then we do urge you to first create an issue to open up discussion with fellow ODXL developers on how to best scratch your itch
  3. If you are using ODXL and if you like it, then consider to spread the word - tell your co-workers about it, write a blog, or a tweet, or a facebook post.
Thank you in advance for your contributions!

Finally

I hope you enjoyed this post! I hope ODXL will be useful to you. If so, I look forward to getting your feedback on how it works for you and how we might improve it. Thanks for your Time!

4 comments:

Anonymous said...
This comment has been removed by a blog administrator.
Unknown said...

Roland,

Have recently purchased both of our books and was wondering - are you looking at updating these to show what has changed now that Pentaho is up to version 6?

Regards
Tom

rpbouman said...

Hi Tom!

thank you for you interest in the Pentaho books. It is true that both books would benefit from an update. Unfortunately, it is not up to us authors to decide - that is the publisher's prerogative. So far, Wiley has not expressed an interest in an update.

Unknown said...

thanks

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...