Monday, December 02, 2019

Building a UI5 Demo for SAP HANA Text Analysis: Part 4

This is the last of a series of blogposts describing a simple web front end tool to explore SAP HANA's Text Analysis features on documents uploaded by the user. As a reminder, the following overview outlines all the posts in the series: In the previous post we presented the sample application, explained its functionality, and concluded by pointing to the github repository and the installation instructions so that you may run the application on your own HANA system.

In this post, we'll explain in detail how the upload functionality works from the UI5 side of things

Uploading Files and Binary data to HANA .xsodata services uing UI5

In the first installment of the series, options and concerns were discussed on the topic of loading the (binday) document content into the SAP HANA database. We chose to use an OData service. In this installment, we'll go into fairly deep detail how to implement a file upload feature backed by a HANA .xsodata service using the UI5 front-end framework.

Some notes on the UI5 Application Implementation

Before we discuss any particular details of the implementation of the UI5 application, it is necessary to point out that this particular application is demoware. Many typical patterns of UI5 application development were omitted here: there is no internationalization, and no modules or dependency injection with sap.ui.define(). There is not even a MVC architecture, so no XML views or controllers; no Component configuration, and no application descriptor (manifest.json).

Instead, the application consists of just a single index.html, which contains 2 <script> tags:
<script 
  src="https://sapui5.hana.ondemand.com/1.71.5/resources/sap-ui-core.js"
  id="sap-ui-bootstrap"
  data-sap-ui-libs="
    sap.ui.core,
    sap.ui.layout,
    sap.ui.unified,
    sap.ui.table,
    sap.ui.commons, 
    sap.m
  "
  data-sap-ui-theme="sap_bluecrystal"
>
</script>
<script src="index.js" type="text/javascript"></script>
The first one bootstraps ui5, and the second one loads index.js, which contains the implementation.

The main reason for this rather spartan approach is that the primary goal of me and my colleagues Arjen and Mitchell was to quickly come up with a functional prototype that demonstrates the file upload feature. Although I have grown used to a more orhtodox UI5 boilerplate, it was a distraction when it came to just quickly illustrating an upload feature. Once we built the upload feature, I wanted to see how easy it would be to augment it and make it somewhat useful application, and I was kind of interested to experience how it would be to carry on using this unorthodox, pure-javascript approach.

There's much more that could be said about this approach but it's another topic. So for now: if you're new to UI5 and want to learn more: don't take this application as an example, it's atypical. And if you are an experienced UI5 developer: now you have the background, let's move on to the essence.

Communication with the backend using an OData Model

Before we get to the topic of building and controlling the upload feature, a couple of words should be said about how UI5 applications can communicate with their (xsodata) backend.

In UI5, we needn't worry about the exact details of doing the backend call directly. Rather, UI5 offers an object that provides javascript methods that take care of this. This object is the model. In our application, the model is an instance of the sap.ui.model.odata.v2.ODataModel, which we instantiated somewhere in the top of our index.js:
var pkg = document.location.pathname.split('/').slice(1, -2);
var odataService = [].concat(pkg, ['service', 'ta']);  

/**
 * OData
 */
var modelName = 'data';
var model = new sap.ui.model.odata.v2.ODataModel('/' + odataService.join('/') + '.xsodata', {
  disableHeadRequestForToken: true
});
It's not necessary to go over the model instantiation in detail - for now it is enough to know that upon instantiation, the model is passed the uri of the .xsodata service we already built. We obtain the url in the code preceding the model instantiation by taking the url of the current webpage and building a path to service/ta.xsodata relative to that location:
var pkg = document.location.pathname.split('/').slice(1, -2);
var odataService = [].concat(pkg, ['service', 'ta']);  

Uploading a file: high level client-side tasks

From a functional point of view, the web app (client) there's two distinct tasks to be considered:
  • Building the user interface so the user can select the file to upload.
  • Loading the file contents into the database.
The first high-level task is strictly a matter of user interaction and is more or less independent from how the second high level task is implemented. For the second high-level task, we already have the backend in place - this is the OData service we built in the second installment of this blogpost series. What remains is how to do this from within UI5.

But already, we can break down this task in two subtasks:
  • Extracting the content from the chosen file. Once the user has chosen a file, they have only identified the thing they want to upload. The web app does not need to parse or understand the file content, but it does need to extract the data (file content) so it can send it to the server.
  • Sending the right request to the backend. The request will somehow include the contents extracted from the file, and it will have such a form that the server understands what to do with those contents - in this case, store it in the a table for text analysis.

A UI5 File Upload control

For the file upload user interaface, we settled on the sap.ui.unified.FileUploader control. Here's the relevant code from index.js that instantiates the control:
var fileUploader = new sap.ui.unified.FileUploader({
  buttonText: 'Browse File...',
  change: onFileToUploadChanged
  busyIndicatorDelay: 0
});
The sap.ui.unified.FileUploader control is presented to the user as a input field and a button to open a file chooser. This lets the user browse and pick a file from their client device.

In addition, the sap.ui.unified.FileUploader control provides events, configuration options and methods to validate the user's choice, and to send the file off to a server. For example, you can set the uploadUrl property to specify where to send the file to, and there's an upload() method to let the control do the request.

As it turns out, most of this addition functionality did not prove to be very useful for the task at hand, because the request we need to make is quite specific, and we didn't really find a clear way of configuring the control to send just the right request. Perhaps it is possible, and then we would be most obliged to learn how.

What we ended up doing instead is to only use the file choosing capabilities of the sap.ui.unified.FileUploader control. To keep track of the user's choice, we configured a handler for the change event, which gets called whenever the user chooses a file, or cancels the choice.

The handler does a couple of things:
  • Determine wheter a file was chosen. If not, the Upload confirmation button gets disabled so the user can only either retry choosing a file, or close the upload dialog.
  • If a file is chosen, a request is sent to the backend to figure out if the file already exists.
  • Depending upon whether the file already exists, the state of the upload dialog is set to inform the user of what action will be taken if they confirm the upload.
Let's go over these tasks in detail. First, validating the user's choice by checking if the user did in fact choose a file:
var fileToUpload;
var fileToUploadExists;
function onFileToUploadChanged(event){
  fileToUpload = null;
  fileToUploadExists = false;
  var files = event.getParameter('files');
  if (files.length === 0) {
    initFileUploadDialog();
    return;
  }
  fileToUpload  = files[0];
  
  ...more code here...
}
Note that we set up the fileToUpload variable to keep track of the user's choice. We need to keep track of it somewhere, since the choosing of the file and the upload are separate tasks with regards to the UI: choosing the file happens when the user hits the Browse button provided by the sap.ui.unified.FileUploader control, wheras the upload is triggered by hitting the confirm button of the upload dialog.

When the user is done choosing the file, the sap.ui.unified.FileUploader will fire the change event, and our handler onFileToUploadChanged() gets called and passed the event as an argument. This event provides access to the FileList object associated with the file chooser:
  var files = event.getParameter('files');
Note: the FileList is not part of UI5. Rather, it is one of a number of brower built-in objects, which together form the Web File API. We would have loved to obtain the FileList or the File object from our sap.ui.unified.FileUploader control directly by using a getter or something like that, but at the time we found no such method, and settled for a handler in the change event.

Once we have the FileList, we can check whether the user selected any files, and either disable the upload confirmation button (if no file was selected), or assign the chosen file to our fileToUpload variable so we can refer to it when the upload is confirmed:
function onFileToUploadChanged(event){

  ...

  if (files.length === 0) {
    initFileUploadDialog();
    return;
  }
  fileToUpload = files[0];

  ....

}
If we pass the check, our variable fileToUpload will now contain the File object reflecting the user's choice. (Note that this object too is not a UI5 object, it's also part of the Web File API.)

Note that in theory, the list of files associated with the sap.ui.unified.FileUploader could have contained more than one file. But the default behavior is to let the user choose only one file. You can override that behavior by setting the sap.ui.unified.FileUploader's multiple property to true. Because we know that in this case, there can be at most only one file, we only need to check whether there is a file or not - there's no need to consider muliple files.

Checking whether the File was already Uploaded

Once we know for sure the user has chosen a file, it remains to be determined what should be done with it should the user decide to confirm the upload. To help the user decide whether they should confirm, we send a request to the backend to find out if the file was already uploaded:
function onFileToUploadChanged(event){

  ...

  fileToUpload  = files[0];
  fileUploader.setBusy(true);
  model.read('/' + filesEntityName, {
    filters: [new sap.ui.model.Filter({
      path: fileNamePath,
      operator: sap.ui.model.FilterOperator.EQ,
      value1: fileToUpload.name
    })],
    urlParameters: {
      $select: [fileNamePath, 'FILE_LAST_MODIFIED']
    },            
    success: function(data){

      ...update state depending upon whether the file exists...

    },
    error: function(error){

      ...update state to inform the user of an error...

    }
  });

  ....

}
The model provides a read() method which can be used to query the backend OData service. The first argument to the read() method is the so-called path, which identifies the OData EntitySet we want to query. In this case, we are interested in the Files EntitySet, as this corresonds to our CT_FILE database table in our backend. Because we use the name of the Files EntitySet in a lot of places, we stored it in the filesEntityName variable. So, our path becomes:
'/' + filesEntityName
Apart from the path, the read() method takes a second argument, which is an object of query options. We'll highlight the few we need here.

Because we only want to know whether the backend already has a file with the same name as the one the user just selected , we add a parameter to restict the search. This is done with the filters option:
    filters: [new sap.ui.model.Filter({
      path: fileNamePath,
      operator: sap.ui.model.FilterOperator.EQ,
      value1: fileToUpload.name
    })],
The filters option takes an array of sap.ui.model.Filter objects. When we instantiate the sap.ui.model.Filter object, we pass an object with the following configuration options:
  • path - this should get a value that refers to a property defined by the OData entity type of this Entity Set. It corresponds to the name of a column of our database table. In this case, it is set to fileNamePath, which is a variable we initialized with 'FILE_NAME', i.e., the name of the column in the CT_FILE table that holds the name of our files.
  • value1 - this should be the literal value that we want to use in our filter. In this case, we want to look for files with the same name as the file chosen by the user, so we set it to the name property of the File object that the user selected - fileToUpload.name
  • operator - this should be one of the values defined by the sap.ui.model.FilterOperator object, which defines how the given filter value should be compared to the value of the column. In this case the operator is sap.ui.model.FilterOperator.EQ, which stands for an equals comparison. By using this operator, we demand that the value of the column should be exactly the same as the name of the chosen file.
There is one other option specified that affects the request:
    urlParameters: {
      $select: [fileNamePath, 'FILE_LAST_MODIFIED']
    },            
This specifies for which columns we want to retrieve the values from the backend. It may be omitted, but in that case, all columns would be returned. Often this will not be a problem, but in this case, we really want to prevent the server from returning the values for the FILE_CONTENT column. Always retrieving the file contents would be an unnessary burden for both the front- and the backend so we actively suppress the default behavior. The only columns requested here are FILE_NAME and FILE_LAST_MODIFIED. The latter is currently unused but might come in handy to provide even more information to the user so they can better decide whether they want to re-upload an existing file.

The remaining options in the call to the model's read() method have nothing todo with the request, but are callback functions for handling the result of the read request. The error callback gets called if there is some kind of issue with the request itself - maybe the backend has gone away, or maybe the structure of the service changed. The success callback is called when the read request executes normally, and any results are then passed to it as argument. This is even true if no results are found - the callback then simply gets passed an empty list of results.
In our example, the main purpose of the success callback is to flag whether the file already exists, and to update the state of the file uploader accordingly to inform the user. The existence of the file is flagged by assigning the fileToUploadExists variable, and we will see its significance in the next section where we discuss the implementation of the upload of the file contents.

Handling the Upload

We've just seen exactly how the UI5 application can let the user choose a file, and we even used our model to check whether the chosen file is already present in the backend. Once these steps are done, we now have successfully initialized two variables, fileToUpload and fileAlreadyExists. This is all we need to handle the upload.

In the application, the user initiates the upload by clicking the Confirmation Button of the uploadDialog. This then triggers the button's press event, where we've attached the function uploadFile as handler.

So, in this handler, we must examine the value of the fileAlreadyExists variable and take the appopriate action:
  • If fileAlreadyExists is false, we should tell our model to add a new item. This is done by calling the createEntry()-method
  • If fileAlreadyExists is true, we should tell our model to update the existing item. This is done by calling the update()-method

The path argument

Both methods take a path as first argument to indicate where the new item should be added, or which item to update.

When adding a new item, the path is simply the path of the Files Entity Set with the model:
'/' + filesEntityName
(Note: this is exactly the same as the path we used in the read() call to figure out whether the file already exists.)

The path for updating an existing item also starts with the path of the Entity Set, but includes the key to identify the item that is to be updated. Lucky for us, the sap.ui.model.odata.v2.ODataModel model provides the createKey() method which constructs such a path, including the key part, based on the values of the properties that make up the key. So, the code to construct the path for the update method becomes:
'/' + model.createKey(filesEntityName, {"FILE_NAME": fileToUpload.name})
(For more detailed information about how OData keys and paths work, see ODAta Uri Conventions, in particular the section on "Adressing Entries".)

The payload

In addition to the path, we also need to pass the data, which is sometimes referred to as the payload. While the path tells the model where to add or update an item, the payload specifies what should be added or updated.

Now, even though the UI documentation is not very specific about how to construct to payload, we have used the createEntry() and update() methods of the sap.ui.model.odata.v2.ODataModel in the past without any problems. It is normally quite intuitive and hasslefree: you simply specify an Object, and specify keys that match the property names of the target entity set, and assign JavaScript values, just as-is. So, if we disregard the FILE_CONTENT field for a moment, the payload for the Files entity set could be something like this:
var payload = {
  "FILE_NAME": fileToUpload.name,
  "FILE_TYPE": fileToUpload.type,
  "FILE_LAST_MODIFIED": new Date(fileToUpload.lastModified),
  "FILE_SIZE": fileToUpload.size,
  "FILE_LAST_UPLOADED": new Date(Date.now())
};
Let's compare this to the data types of the corresponding properties in the entity type of the entity set:
<EntityType Name="FilesType">
  <Key>
    <PropertyRef Name="FILE_NAME"/>
  </Key>
  <Property Name="FILE_NAME" Type="Edm.String" Nullable="false" MaxLength="256"/>
  <Property Name="FILE_TYPE" Type="Edm.String" Nullable="false" MaxLength="256"/>
  <Property Name="FILE_LAST_MODIFIED" Type="Edm.DateTime" Nullable="false"/>
  <Property Name="FILE_SIZE" Type="Edm.Int32" Nullable="false"/>
  <Property Name="FILE_CONTENT" Type="Edm.Binary" Nullable="false"/>
  <Property Name="FILE_LAST_UPLOADED" Type="Edm.DateTime" Nullable="false"/>
</EntityType>
(Note: this entity type is taken from the $metadata document of our service.)

So, in short - there is a pretty straightforward mapping between the JavaScript runtime values and the Edm Type System (see: "6. Primitive Data Types") used by OData: JavaScript Strings may be assigned to Edm.Strings, JavaScript Date objects may be assigned to Edm.DateTimes, and JavaScript Numbers may be assigned to Edm.Int32s.

This is less trivial than one might think when one considers what happens here: one the one hand, we have the types as declared by OData service, which are Edm Types. Then, we have to consider the content type used to transport the payload in the HTTP request: OData services may support several content types, and by default OData supports both application/atom+xml and application/json.So, when starting with a payload as a JavaScript runtime object, this first needs to be converted to an equivalent representation in one of these content types (UI5 uses the JSON representation) by the client before it can be sent off to the OData service in a HTTP request.

It bears repeating that this is not a simple, standard JSON serialization, since the type system used by the JSON standard only knows how to represent JavaScript Strings, Numbers, Booleans (and arrays and objects containing values of those types). The native JSON type system is simply too minimal to represent all the types in the Edm Type system used by OData, hence the need for an extra JSON representation format. The sap.ui.model.odata.v2.ODataModel does a pretty remarkable job in hiding all this complexity and making sure things work relatively painless.

Representing FILE_CONTENT in the payload

Now for the FILE_CONTENT property. In the entity type, we notice that the data type is Edm.Binary. What would be the proper JavaScript runtime type to construct the payload?

We just mentioned that normally, the mapping from JavaScript runtime types is usually taken care of by the sap.ui.model.odata.v2.ODataModel. So we might be tempted to simply pass the File object itself directly as value for the FILE_CONTENT property. But when we call either the createEntry or update method with a payload like this:
var payload = {
  "FILE_NAME": fileToUpload.name,
  "FILE_TYPE": fileToUpload.type,
  "FILE_LAST_MODIFIED": new Date(fileToUpload.lastModified),
  "FILE_SIZE": fileToUpload.size,
  "FILE_LAST_UPLOADED": new Date(Date.now()),
  "FILE_CONTENT": fileToUpload
};
we get an error in the response:
The serialized resource has an invalid value in member 'FILE_CONTENT'.
So clearly, the sap.ui.model.odata.v2.ODataModel needs some help here.

One might assume that the problem has to do with the File object being a little bit too specific for UI5 - after all, a File object is not just some binary value, but it is a sublclass of the Blob object, which has all kinds of file-specific properties of itself. However, assigning a proper, plain Blob object in the payload yields exactly the same result, so that's not it either.

Instead of continuing to experiment with different values and types, we took a step back and took a look at the OData specification to see if we could learn a bit more about the Edm.Binary type. In the part about the JSON representation (See: "4. Primitive Types") we found this:
Base64 encoded value of an EDM.Binary value represented as a JSON string
It seems to suggest the whatever thing that represents the Edm.Binary value need to be Base64 encoded, which yields a string value at runtime, and this string may then be serialized to a JSON string. So, if we could make a Base64 encoded string value of our binary value, we could assign that in the payload. (We already saw that sap.ui.model.odata.v2.ODataModel will turn JavaScript String values to a JSON representation so we don't have to do that step ourselves.)

Fortunately, it's easy to create Base64 encoded values. The browser built-in function btoa() does this for us.

However, we're not there yet, as the spec starts with a binary value, and JavaScript does not have a binary type (and hence, no binary values).

We then took a look at the specification to find out exactly what a Edm.Binary value is. We found something in the section about Primitive Datatypes on how to create literal Edm.Binary Values:
binary'[A-Fa-f0-9][A-Fa-f0-9]*' OR X '[A-Fa-f0-9][A-Fa-f0-9]*' 
NOTE: X and binary are case sensitive. 
Spaces are not allowed between binary and the quoted portion. 
Spaces are not allowed between X and the quoted portion. 
Odd pairs of hex digits are not allowed.

Example 1: X'23AB' 
Example 2: binary'23ABFF'
At this point the thinking was that we could take the bytes that make up the binary value and convert it to its hexadecimal string representation, single-quote it the resulting hex string, and finally prepend either X or binary to it. At runtime, this would then be a JavaScript string value reprenting an Edm.Binary literal, which we could then turn into its Base64 encoded value, and send assign to the payload.

When we went this route, the error message went away, and sure enough, documents started to show up in our backend table. Unfortunately, the documents ended up there as Edm.Binary literals, that is, as strings that are an accurate Edm.Binary literal representation of the document but otherwise useless.

At this point the solution was clear though - just leave out the intermediate step of converting the original value to an Edm.Binary literal.

The uploadFile function

Remember, at this point we have the File object stored in the fileToUpload variable, and a flag fileToUploadExists is set to true or false depending upon whether the file is already stored in the backend table. This is code we ended up with for uploading the file:
function uploadFile(){
  var fileReader = new FileReader();
  fileReader.onload = function(event){
    var binaryString = event.target.result;
    var payload = {
      "FILE_NAME": fileToUpload.name,
      "FILE_TYPE": fileToUpload.type,
      "FILE_LAST_MODIFIED": new Date(fileToUpload.lastModified),
      "FILE_SIZE": fileToUpload.size,
      "FILE_CONTENT": btoa(binaryString),
      "FILE_LAST_UPLOADED": new Date(Date.now())
    };
    if (fileToUploadExists) {
      model.update(
        '/' + model.createKey(filesEntityName, {
          "FILE_NAME": fileToUpload.name
        }), 
        payload
      );
    }
    else {
      model.createEntry('/' + filesEntityName, {
        properties: payload
      });
    }
    model.submitChanges({
      success: function(){
        closeUploadDialog();                
      }
    });
  };
  fileReader.readAsBinaryString(fileToUpload);
}
As explained earlier, uploading the file breaks down into 2 subtasks, and this handler takes care of both:
  • First, we use the FileReader to read the contents of the File object
  • Then, we send it to the backend. To do that, construct the path and the payload, and call either the createEntry or the update method based on whether the file already exists, passing the path and the payload.

Using the FileReader to read the contents of a File object

First, we need to read the contents of the file. We do that using a FileReader, which is also part of the Web File API. To get the contents of a File object, we can call one of the FileReader's read methods.

The FileReader'sread methods do not return the contents of the file directly: the Web File API is mostly asynchronous. Instead, we have to attach an event handler to the FileReader which can respond to the FileReader's events. In this case we overrided the FileReader's onload() method, which gets called when the FileReader is done reading a File. (Instead of the override, we could also have attached a handler with addEventListener but it really doesn't matter too much how the handler is attached.)

Once set up, we can now call a read() method and wait for the reader to call our onload() handler.

So the general structure to read the file is as follows:
function uploadFile(){
  var fileReader = new FileReader();
  fileReader.onload = function(event){

    var binaryString = event.target.result;

    ...do something with the file contents...

  };
  fileReader.readAsBinaryString(fileToUpload);
}


We already mentioned the FileReader provides a number of different read methods, and the chosen method determines the type of the value that will be available in event.target.result by the time the load handler is called. Today, the FileReader provides:To figure out which method we should use, we should consider how our backend expects to receive the data. Or rather, how our sap.ui.model.odata.v2.ODataModel wants us to pass the data so it can do the appropriate call to the backend. In a previous section we already explained our struggle to figure out how to represent a Edm.Binary value in the payload, and based on those findings, readAsBinaryString() is the appropriate method. With this read method, the FileReader turns each individual byte of the file contents in to a JavaScript character, much like the fromCharCode()-method of the String object would do. The resulting value is a JavasScript binary string: each character represents a byte.

Note that this is very different from what the readAsText() method would do: that would attempt to read the bytes as if they are encoded characters in UTF-8 encoding, in other words it would result in a character string, not a binary string.

After obtaining the file contents as binary string, we can apply the Base64 encoding and assign it to the payload:
    var payload = {
      "FILE_NAME": fileToUpload.name,
      "FILE_TYPE": fileToUpload.type,
      "FILE_LAST_MODIFIED": new Date(fileToUpload.lastModified),
      "FILE_SIZE": fileToUpload.size,
      "FILE_CONTENT": btoa(binaryString),
      "FILE_LAST_UPLOADED": new Date(Date.now())
    };

Summary

This concludes the last installment in this blog series. In this post we learned how to use: And by putting these elements together we created a File upload feature for a UI5 application, backed by a HANA OData service.

Odds and Ends

It's nice that we finally found out how to write the file contents to our OData Service. But something does not feel quite right. Although we happened to find a way to write the binary data that satisfies both the sap.ui.model.odata.v2.ODataModel as well as the SAP HANA .xsodata service that backs it, we still haven't found any official documentation, either from the OData specification or from SAP that confirms that this is really the correct way. We would hope that SAP HANA's .xsodata implementation is a faithful implementation of the standard, but for the Edm.Binary type, I'm just not 100% sure. If anybody could chime in and confirm this, and preferably point me to something in the OData specification that confirms this, then I would be most grateful.

1 comment:

Nerfx said...

Interesting information, thank you for sharing. Btw, merry christmas and happy new year. May God bless you and your family. Greetings...

DuckDB Bag of Tricks: Reading JSON, Data Type Detection, and Query Performance

DuckDB bag of tricks is the banner I use on this blog to post my tips and tricks about DuckDB . This post is about a particular challenge...