Roland Bouman's blog - Programming - Databases - Analytics

<h1>UI5 Tips: Persistent UI State</h1>
<i>January 3, 2024</i>
This tip provides a way to centrally manage UI state, and to persist it - automatically, and without requiring intrusive custom code sprinkled throughout your apps.
<h1>The UI State</h1>
Many UI5 controls and widgets allow some aspect of their appearance or behavior to be changed by the user. For example, a panel may be collapsed or expanded, a tab may be selected, column widths in a data grid may be adjusted, and so on. We call all of this the UI state.
When the user restarts the app, the UI state is normally reset: properties that were explicitly set are reinitialized to those values, and properties that were not explicitly assigned get some default, which may be either a constant or a calculated value, depending on how the component is coded.
A reset of the UI state may not always be desirable. For example, if the user has to go through multiple clicks and selections before they arrive at a certain item inside the application that interests them, it will be frustrating to have to repeat that sequence the next time they open the application. Fortunately, for these use cases, UI5 offers <a href="https://sapui5.hana.ondemand.com/#/topic/e5200ee755f344c8aef8efcbab3308fb" rel="nofollow">routing and navigation</a>, which lets the user find content inside the application by navigating to a particular URL.
However, not all ui state is about navigation. For example, the user may collapse a panel to get a bit more screen real estate, or resize the width of a column in a data grid, or toggle the state of a checkbox that controls some application-wide setting. These cases are clearly not navigational in nature, but have to do with layout and presentation. It would be confusing for the user to control this by visiting a particular url. Rather, we'd like the application to be able to retain the ui state exactly as the user left it.
In this tip we will explain a way to achieve this that does not require any application-specific code: it can all be handled centrally and automatically.
<h1>Sample Application</h1>
The sample application for this tip is in the <a href="https://github.com/just-bi/ui5tips/tree/main/uistate"><code>uistate</code></a> directory. Simply expose the contents of the directory with your webserver and use your browser to navigate to <code>index.html</code>. A screenshot is shown below:
<img src="https://github.com/just-bi/ui5tips/raw/main/uistate/images/screenshot.png?raw=true" alt="Screenshot of the UI State App" />
<h2>Sample Application Features</h2>
The application has the following features:
<ul>
<li>A <a href="https://github.com/just-bi/ui5tips/tree/main/uistate/components/mainpage">Main page</a> with a splitter (<a href="https://openui5.hana.ondemand.com/api/sap.ui.layout.Splitter" rel="nofollow"><code>sap.ui.layout.Splitter</code></a>)</li>
<li>On the left, a <a href="https://github.com/just-bi/ui5tips/tree/main/uistate/components/sidebar">Sidebar</a> with a <a href="https://openui5.hana.ondemand.com/api/sap.ui.table.Table" rel="nofollow"><code>sap.ui.table.Table</code></a> showing the Name and Country of a list of companies.</li>
<li>On the right, a <a href="https://github.com/just-bi/ui5tips/tree/main/uistate/components/detailpage">Detail page</a>.</li>
</ul>
Users can click on a company in the sidebar to select it, and then the company will be shown in more detail in the Detail Page.
The Detail Page has some features of its own:
<ul>
<li>In the top, there's a <a href="https://openui5.hana.ondemand.com/api/sap.m.Panel" rel="nofollow"><code>sap.m.Panel</code></a> which shows the Company Name as title. The Panel is <a href="https://openui5.hana.ondemand.com/api/sap.m.Panel#methods/getExpandable" rel="nofollow"><code>expandable</code></a> and <a href="https://openui5.hana.ondemand.com/api/sap.m.Panel#methods/getExpanded" rel="nofollow"><code>expanded</code></a> by default. Inside the panel we can see the company's phone number.
Below the Panel, there's an <a href="https://openui5.hana.ondemand.com/api/sap.m.IconTabBar" rel="nofollow"><code>sap.m.IconTabBar</code></a> with 2 tabs:</li>
<li><a href="https://github.com/just-bi/ui5tips/blob/main/uistate/components/detailpage/DetailsIconTabFilter.fragment.xml">Details</a>, which shows the address of the currently selected company. This tab is also selected by default.</li>
<li><a href="https://github.com/just-bi/ui5tips/blob/main/uistate/components/detailpage/DepartmentsIconTabFilter.fragment.xml">Departments</a>, which shows the departments of the selected company.</li>
</ul>
<h2>Sample Application Demo</h2>
To test the application, try the following sequence of actions:
<ol>
<li>Use the browser to navigate to the index.html page. The sidebar should show the list of Companies, but no company will be selected yet. You can click any row in the sidebar to select a company; if you do, its details will be shown in the detail page. For this demonstration it doesn't matter whether you select one or not.</li>
<li>In the sidebar, the Name column is not wide enough to show the full company name <code>Euismod Ac Fermentum Corp.</code>. Adjust the width of the column by dragging the right end of its header to the right until the full company name is visible.</li>
<li>Also in the sidebar, the Country column is not wide enough to show the full name of the country <code>Congo, the Democratic Republic of the</code>. Adjust the width of that column too so the full name is visible.</li>
<li>After adjusting the column widths in steps 2 and 3, the sidebar will have a horizontal scrollbar at the bottom, as the data grid is now wider than the position of the splitter grip. Drag the splitter grip to the right so both columns of the sidebar are visible and the sidebar's horizontal scrollbar disappears again.</li>
<li>The Panel is expanded and the company's phone number is visible inside it. Click the button to the left of the panel header title to collapse it.</li>
<li>The Details tab is selected by default. Click the Departments tab instead.</li>
</ol>
After all these actions, the application should now look more like this:
<img src="https://github.com/just-bi/ui5tips/raw/main/uistate/images/screenshot-uistate-changed.png?raw=true" alt="Application after changing the UI State" />
If you now refresh the browser window (or even close the browser altogether) and then revisit the application, you will notice that the selection is lost. However, the column widths, the position of the splitter grip, the collapsed state of the panel, and the selected tab have all been preserved.
(You can restore the UI to the original state by pressing the Undo button in the top of the main page.)
<h1>Using the <code>LocalStorageJSONModel</code> to manage UI State</h1>
The UI State behavior is the result of binding all the relevant properties of the UI to a <a href="https://github.com/just-bi/ui5tips/wiki/LocalStorageJSONModel"><code>LocalStorageJSONModel</code></a>.
The model is declared in the sample application's <a href="https://github.com/just-bi/ui5tips/blob/main/uistate/manifest.json"><code>manifest.json</code></a>, so it is instantiated automatically when the application starts and becomes available throughout the application as <code>uistate</code>.
<div>
<pre> <b>"uistate"</b>: {
<b>"type": "ui5tips.utils.LocalStorageJSONModel"</b>,
"dataSource": "uistateTemplate"
}
</pre>
</div>
The model is initialized with the <code>uistateTemplate</code> data source:
<div>
<pre> <b>"uistateTemplate"</b>: {
<b>"uri": "data/uistateTemplate.json"</b>,
"type": "JSON"
}
</pre>
</div>
The datasource grabs its data from <a href="https://github.com/just-bi/ui5tips/blob/main/uistate/data/uistateTemplate.json"><code>data/uistateTemplate.json</code></a>:
<div>
<pre>{
<b>"autoSaveTimeout": 1000</b>,
"storagePrefix": "uistate",
<b>"template": {</b>
"appSettings": {
"sidebar": {
"splitterSize": "431px",
"columns": {
"name": {
"width": "190px"
},
"country": {
"width": "190px"
}
}
},
"detailpage": {
"panel": {
"expanded": true
},
"tabContainer": {
"selectedTab": "details"
}
}
}
<b>}</b>
}
</pre>
</div>
The options for the <code>LocalStorageJSONModel</code> are described in <a href="https://github.com/just-bi/ui5tips/wiki/LocalStorageJSONModel#instantiating-the-localstoragejsonmodel-from-the-manifestjson">the <code>LocalStorageJSONModel</code> wiki page</a>.
For the sample app, the <code>autoSaveTimeout</code> option is the most relevant. Here it is set to <code>1000</code>, which means that whenever the state of the model changes, there is a 1000 ms (1 second) waiting period, after which the data from the model is persisted in the browser's local storage.
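The effect of <code>autoSaveTimeout</code> is essentially a debounce. The sketch below is plain JavaScript with hypothetical names - it is not the actual <code>LocalStorageJSONModel</code> code - but it shows the idea: every change restarts a timer, and persistence only happens once the UI has been quiet for the configured number of milliseconds.

```javascript
// Hypothetical sketch of the autoSaveTimeout behavior: debounce writes
// so a rapid burst of changes results in a single persist call.
function makeAutoSaver(persist, autoSaveTimeout) {
  var timer = null;
  return function onModelChanged(data) {
    if (timer !== null) {
      clearTimeout(timer);
    }
    timer = setTimeout(function () {
      timer = null;
      // only the latest state is written, e.g. to local storage
      persist(JSON.stringify(data));
    }, autoSaveTimeout);
  };
}
```

With a timeout of 1000, dragging the splitter grip around generates many property changes, but only one write happens once the user pauses.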
The template represents the default initial state of the UI. Please refer to the <a href="https://github.com/just-bi/ui5tips/wiki/LocalStorageJSONModel#template-and-model-initialization"><code>LocalStorageJSONModel</code> wiki page</a> for a detailed discussion of the template and model initialization.
<h1>Managing Binding</h1>
The <a href="https://github.com/just-bi/ui5tips/wiki/LocalStorageJSONModel#data-binding"><code>LocalStorageJSONModel</code> wiki page</a> has some general remarks and clarifications about UI5 data binding. But it may not be entirely clear how to practically organize it to manage UI state. After all, even a simple example like the sample application we discuss here already has 5 distinct UI properties that the user can change.
<h2>Template Structure</h2>
One could, in principle, put each and every UI property in one big property bag, and this may be the right choice if the application remains really simple. But as an application gains more features, views, and functionality, one may prefer a data structure that mimics the structure of the application, and that is the approach we have taken in this example.
If you look at <a href="https://github.com/just-bi/ui5tips/blob/main/uistate/data/uistateTemplate.json"><code>data/uistateTemplate.json</code></a>, you'll notice that the <code>template</code> contains an <code>appSettings</code> key, which itself has two keys: one to maintain all settings for the <code>sidebar</code>, and one for all settings for the <code>detailpage</code>:
<div>
<pre> "template": {
"appSettings": {
"sidebar": {
...
},
"detailpage": {
...
}
}
}
</pre>
</div>
Each of these keys is assigned an object, which may itself have further hierarchical structure, depending on the UI control tree.
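Individual settings in such a nested structure are addressed by slash-separated paths. The tiny helper below is illustrative only - it is not part of UI5 or the sample app - but it shows how a model path maps onto the nested template object:

```javascript
// Illustrative only: look up a value in a nested state object using a
// slash-separated path, the way model paths address the template above.
function getByPath(state, path) {
  return path.split("/").filter(Boolean).reduce(function (node, key) {
    return node === undefined ? undefined : node[key];
  }, state);
}

// A fragment of the template from data/uistateTemplate.json:
var template = {
  appSettings: {
    sidebar: { splitterSize: "431px" },
    detailpage: { panel: { expanded: true } }
  }
};
getByPath(template, "/appSettings/sidebar/splitterSize"); // "431px"
```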
<h2>Template Structure and UI Tree Structure</h2>
Once we decide to structure the template hierarchically, it may be tempting to faithfully mimic the actual container/component structure of the UI in the model structure. In our experience, however, this is neither necessary nor productive: the UI tree is only to some extent a reflection of the functional organization of an application's parts.
A simple example from the sample application illustrates this. Let's take a look at the structure of the template that manages the settings for the <code>detail</code> page:
<div>
<pre> "detailpage": {
"panel": {
"expanded": true
},
"tabContainer": {
"selectedTab": "details"
}
}</pre>
</div>
Let's compare this to the UI tree of the detail page, which is defined in <a href="https://github.com/just-bi/ui5tips/blob/main/uistate/components/detailpage/DetailPage.view.xml"><code>DetailPage.view.xml</code></a>:
<div>
<pre> <<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">FixFlex</span> <span class="pl-e">binding</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>detailpage}<span class="pl-pds">"</span></span>>
<<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">fixContent</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">Panel</span>
<span class="pl-e">expandable</span>=<span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span>
<span class="pl-e">expandAnimation</span>=<span class="pl-s"><span class="pl-pds">"</span>false<span class="pl-pds">"</span></span>
<span class="pl-e">binding</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>panel}<span class="pl-pds">"</span></span>
<span class="pl-e">expanded</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>expanded}<span class="pl-pds">"</span></span>
<span class="pl-e">headerText</span>=<span class="pl-s"><span class="pl-pds">"</span>{companies>CompanyName}<span class="pl-pds">"</span></span>
>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">content</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">Text</span> <span class="pl-e">text</span>=<span class="pl-s"><span class="pl-pds">"</span>Phone: {companies>Phone}<span class="pl-pds">"</span></span>/>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">content</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">Panel</span>>
</<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">fixContent</span>>
<<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">flexContent</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">IconTabBar</span>
<span class="pl-e">stretchContentHeight</span>=<span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span>
<span class="pl-e">applyContentPadding</span>=<span class="pl-s"><span class="pl-pds">"</span>false<span class="pl-pds">"</span></span>
<span class="pl-e">expandable</span>=<span class="pl-s"><span class="pl-pds">"</span>false<span class="pl-pds">"</span></span>
<span class="pl-e">binding</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>tabContainer}<span class="pl-pds">"</span></span>
<span class="pl-e">selectedKey</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>selectedTab}<span class="pl-pds">"</span></span>
>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
<<span class="pl-ent">core</span><span class="pl-ent">:</span><span class="pl-ent">Fragment</span> <span class="pl-e">fragmentName</span>=<span class="pl-s"><span class="pl-pds">"</span>ui5tips.components.detailpage.DetailsIconTabFilter<span class="pl-pds">"</span></span> <span class="pl-e">type</span>=<span class="pl-s"><span class="pl-pds">"</span>XML<span class="pl-pds">"</span></span> />
<<span class="pl-ent">core</span><span class="pl-ent">:</span><span class="pl-ent">Fragment</span> <span class="pl-e">fragmentName</span>=<span class="pl-s"><span class="pl-pds">"</span>ui5tips.components.detailpage.DepartmentsIconTabFilter<span class="pl-pds">"</span></span> <span class="pl-e">type</span>=<span class="pl-s"><span class="pl-pds">"</span>XML<span class="pl-pds">"</span></span> />
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">IconTabBar</span>>
</<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">flexContent</span>>
</<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">FixFlex</span>></pre>
</div>
While we refer to it as the detail page, the actual UI component used to implement it is a <a href="https://openui5.hana.ondemand.com/api/sap.ui.layout.FixFlex" rel="nofollow"><code>sap.ui.layout.FixFlex</code></a>. But it might just as well have been another type of container, like, say, a <a href="https://openui5.hana.ondemand.com/api/sap.m.Page" rel="nofollow"><code>sap.m.Page</code></a>. The tabs are another example: in this sample application we chose the <a href="https://openui5.hana.ondemand.com/api/sap.m.IconTabBar" rel="nofollow"><code>sap.m.IconTabBar</code></a>, and in the future we might change that to the <a href="https://openui5.hana.ondemand.com/api/sap.m.TabContainer" rel="nofollow"><code>sap.m.TabContainer</code></a>. While these are functionally similar components, details such as property names and aggregation names may differ.
During normal application development, changing and rewriting the UI, and swapping out particular containers for other types of containers, is quite common. Often, most of the functional aspects are retained and expanded, even though the details of the implementation and the choice and structure of UI components may change considerably. If our template mimicked the UI structure too closely, we would have to modify the template along with it, often without any real benefit. So the recommendation is to organize the template according to functionality, not the exact details of the UI tree.
<h2>Managing Binding Paths with Element Binding</h2>
Once you settle on a hierarchical template structure, the problem arises of how to deal with the paths. For example, consider the selected tab in the detail page:
<div>
<pre> "template": {
"appSettings": {
"detailpage": {
"tabContainer": {
"selectedTab": "details"
}
}
}
}
</pre>
</div>
If we had to write the full path for this in a data binding, we would get <code>uistate>/appSettings/detailpage/tabContainer/selectedTab</code>. Obviously, in a realistic application we would have many such properties, and the UI code would soon become littered with these very long paths.
<a href="https://sapui5.hana.ondemand.com/1.36.6/docs/guide/91f05e8b6f4d1014b6dd926db0e91070.html" rel="nofollow">Element binding</a> is a UI5 feature that lets you bind a particular path from the model to a UI container or control, thus establishing a scope. Within that scope, you can use relative paths, which UI5 resolves against the path bound at the higher level.
To understand this feature fully it's best to first look at the component that sits at the top of the UI tree - <a href="https://github.com/just-bi/ui5tips/blob/main/uistate/components/app/App.view.xml"><code>App.view.xml</code></a>:
<div>
<pre> <<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">App</span>
<span class="pl-e">id</span>=<span class="pl-s"><span class="pl-pds">"</span>app<span class="pl-pds">"</span></span>
<b><span class="pl-e">binding</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>/appSettings}<span class="pl-pds">"</span></span></b>
>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">pages</span>>
<<span class="pl-ent">mvc</span><span class="pl-ent">:</span><span class="pl-ent">XMLView</span> <span class="pl-e">viewName</span>=<span class="pl-s"><span class="pl-pds">"</span>ui5tips.components.mainpage.MainPage<span class="pl-pds">"</span></span>/>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">pages</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">App</span>></pre>
</div>
Note how the <a href="https://openui5.hana.ondemand.com/api/sap.ui.core.Element#methods/getElementBinding" rel="nofollow"><code>binding</code> property</a> on the <a href="https://openui5.hana.ondemand.com/api/sap.m.App" rel="nofollow"><code>sap.m.App</code></a> is bound to <code>uistate>/appSettings</code>. This means that relative bindings to the <code>uistate</code> model on that component itself, as well as on any components nested within it, will be resolved against <code>uistate>/appSettings</code>.
<a href="https://github.com/just-bi/ui5tips/blob/main/uistate/components/mainpage/MainPage.view.xml">MainPage.view.xml</a> is nested inside the <code>sap.m.App</code>. If we look at that:
<div>
<pre> <<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">Page</span> <span class="pl-e">title</span>=<span class="pl-s"><span class="pl-pds">"</span>UIState App<span class="pl-pds">"</span></span>>
...
<<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">Splitter</span>>
<<span class="pl-ent">mvc</span><span class="pl-ent">:</span><span class="pl-ent">XMLView</span> <span class="pl-e">id</span>=<span class="pl-s"><span class="pl-pds">"</span>sideBar<span class="pl-pds">"</span></span> <span class="pl-e">viewName</span>=<span class="pl-s"><span class="pl-pds">"</span>ui5tips.components.sidebar.SideBar<span class="pl-pds">"</span></span>>
<<span class="pl-ent">mvc</span><span class="pl-ent">:</span><span class="pl-ent">layoutData</span>>
<<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">SplitterLayoutData</span> <b><span class="pl-e">size</span>=<span class="pl-s"><span class="pl-pds">"</span>{uistate>sidebar/splitterSize}<span class="pl-pds">"</span></span></b>/>
</<span class="pl-ent">mvc</span><span class="pl-ent">:</span><span class="pl-ent">layoutData</span>>
</<span class="pl-ent">mvc</span><span class="pl-ent">:</span><span class="pl-ent">XMLView</span>>
<<span class="pl-ent">mvc</span><span class="pl-ent">:</span><span class="pl-ent">XMLView</span> <span class="pl-e">id</span>=<span class="pl-s"><span class="pl-pds">"</span>detailPage<span class="pl-pds">"</span></span> <span class="pl-e">viewName</span>=<span class="pl-s"><span class="pl-pds">"</span>ui5tips.components.detailpage.DetailPage<span class="pl-pds">"</span></span>/>
</<span class="pl-ent">layout</span><span class="pl-ent">:</span><span class="pl-ent">Splitter</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">Page</span>></pre>
</div>
You might notice one property bound to the <code>uistate</code> model here: the <code>size</code> property of the <code>sap.ui.layout.SplitterLayoutData</code> object, which is bound to <code>uistate>sidebar/splitterSize</code>. This binding also refers to the <code>uistate</code> model, but since it does not start with a <code>/</code>, it is a relative path. UI5 will resolve it by going up the UI tree, where it finds the scope for the <code>uistate</code> model established by the element binding in <code>App.view.xml</code>.
In <code>App.view.xml</code>, the binding was to <code>uistate>/appSettings</code>. If we resolve <code>uistate>sidebar/splitterSize</code> against that, the effective path will become <code>uistate>/appSettings/sidebar/splitterSize</code>.
If you look back at the earlier example of <a href="https://github.com/just-bi/ui5tips/blob/main/uistate/components/detailpage/DetailPage.view.xml"><code>DetailPage.view.xml</code></a>, you'll notice that its top-level <code>sap.ui.layout.FixFlex</code> component was bound to <code>uistate>detailpage</code>, effectively causing all relative bindings to the <code>uistate</code> model inside it to be resolved against <code>uistate>/appSettings/detailpage</code>.
Element binding does not only let you use shorter paths: using it consistently also makes it much easier to maintain the structure of the template. Whenever there is a radical change of structure, you should be able to rewire the element bindings without having to change each and every property binding individually.
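The resolution rule described above can be summarized in a few lines of plain JavaScript. This is illustrative only - UI5's actual resolution logic handles more cases, such as named models and binding contexts - but it captures the behavior seen in the sample app:

```javascript
// Illustrative sketch of relative path resolution: absolute paths
// (starting with "/") ignore the context path established by an
// element binding; relative paths are appended to it.
function resolvePath(contextPath, path) {
  if (path.charAt(0) === "/") {
    return path;
  }
  return contextPath + "/" + path;
}

resolvePath("/appSettings", "sidebar/splitterSize");
// "/appSettings/sidebar/splitterSize"
```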
<h1>Finally</h1>
Did you like this tip? Do you have a better tip? Feel free to post a comment and share your approach to the same or similar problem.
Want more tips? Find other posts with the <a href="https://blogs.sap.com/tag/ui5tips/">ui5tips tag</a>!

<h1>UI5 Tips: Persisting JSONModel data using browser Storage</h1>
<i>January 3, 2024</i>
In this UI5 tip, we'll take a look at integrating UI5's <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.model.json.JSONModel" rel="nofollow"><code>sap.ui.model.json.JSONModel</code></a> with the <a href="https://openui5.hana.ondemand.com/api/module:sap/ui/util/Storage" rel="nofollow"><code>sap.ui.util.Storage</code> utility</a>.
Our immediate use case for this was to allow easy and transparent persistence of UI State, which <a href="https://github.com/just-bi/ui5tips/wiki/Persistent-UI-State">has its own dedicated tip</a>. In this tip, we describe how the actual persistence is implemented.
<h1>Sample application: a Shopping List</h1>
To illustrate just the storage model, we developed a tiny Shopping List application. You can run it yourself by downloading the contents of the <a href="https://github.com/just-bi/ui5tips/tree/main/localstoragejsonmodel"><code>localstoragejsonmodel</code> folder</a> and exposing them with your webserver.
This is what the application looks like:
<img src="https://github.com/just-bi/ui5tips/raw/main/localstoragejsonmodel/images/screenshot.png?raw=true" alt="Screenshot of the Shopping List sample application to illustrate the local storage model." />
<h2>Sample Application Features</h2>
<ul>
<li>A Products list (left), and a Shopping list (right). Users can browse the Products list and see name and price. The Products list has a row action button with a shopping cart icon. If the product is already on the shopping list, the shopping cart appears full. Hitting the row action button will add the product to the Shopping List.</li>
<li>In the Shopping List, users see the product name, item price, quantity, and item total. The items in the shopping list also have a row action to remove the item from the shopping list.</li>
</ul>
The Shopping List also has a toolbar with some buttons that control the Shopping List Data:
<ul>
<li>The Save button will save the current contents of the shopping list to the local storage</li>
<li>The Undo button will restore the current contents of the shopping list with whatever data was stored in the local storage</li>
<li>The Submit button represents the action of actually placing an order for the shopping list. It will also clear the shopping list and save.</li>
<li>The Clear button will empty the current contents of the shopping list, but without saving the state to the local storage.</li>
</ul>
<h2>Sample Application Demo</h2>
To test the application, try the following sequence of actions:
<ol>
<li>Open the application. Initially, the Shopping list should be empty.</li>
<li>In the Products list, add a Product to the shopping list by hitting the shopping cart button. The item should be added to the Shopping list.</li>
<li>Refresh the browser window. When the application reloads, you'll notice that the shopping list is empty - that's expected, since you didn't save the shopping list.</li>
<li>Now, repeat step 2 and add some products to the shopping list. Hit the Save button.</li>
<li>Refresh the browser again. Now, when the application reloads, the products you added in step 4 should reappear automatically in the list.</li>
</ol>
This demonstrates that the application is capable of persisting the saved shopping list data. Instead of refreshing the window, you can also completely close the browser, or even reboot your machine. When you revisit the application - with the same browser - you'll notice that the data is still there.
In addition to persistence, the application provides a simple, one-level undo action. Whenever you modify the list, either by adding a new item, removing an item, or changing the quantity of an item, both the Save and the Undo button become enabled. We already demonstrated the Save button. Hitting the Undo button restores the contents of the shopping list to whatever was available in the Storage, that is, the previously saved state.
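The one-level undo boils down to keeping a serialized snapshot of the last saved state. A minimal sketch of the idea (a hypothetical helper, not the sample app's actual code):

```javascript
// Hypothetical one-level undo: "save" snapshots the current data as a
// JSON string, "undo" restores the data from that snapshot.
function UndoBuffer(initialData) {
  this.current = initialData;
  this._saved = JSON.stringify(initialData);
}
UndoBuffer.prototype.save = function () {
  this._saved = JSON.stringify(this.current);
};
UndoBuffer.prototype.undo = function () {
  this.current = JSON.parse(this._saved);
  return this.current;
};
```

In the sample app, the snapshot lives in the browser's local storage rather than in memory, which is why Undo still works after a page reload.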
You can use your browser to inspect the local storage. It might look something like this:
<img src="https://github.com/just-bi/ui5tips/raw/main/localstoragejsonmodel/images/Inspect-localstorage.png?raw=true" alt="Screenshot of the Shopping List sample application to illustrate the local storage model." />
In the remainder of this tip we will discuss how these features were built by combining two classes in the UI5 framework: the <code>sap.ui.util.Storage</code> utility and the <code>sap.ui.model.json.JSONModel</code>.
<h1><code>sap.ui.util.Storage</code> utility</h1>
The <a href="https://openui5.hana.ondemand.com/api/module:sap/ui/util/Storage" rel="nofollow"><code>sap.ui.util.Storage</code> utility</a> offers a UI5 API to access the browser's standard HTML5 <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Storage_API" rel="nofollow">Web Storage API</a>. It's a basic, no-nonsense wrapper for managing modest amounts of data through key/value access.
Of course, you can use the <code>sap.ui.util.Storage</code> utility directly and code your own logic to control exactly when you want to retrieve and store some data. While there is nothing against that approach, we envisioned something that also works when using <a href="https://openui5.hana.ondemand.com/topic/e5310932a71f42daa41f3a6143efca9c" rel="nofollow">models and data binding</a>.
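Using the utility directly boils down to prefixed key/value reads and writes. The sketch below imitates that with a plain wrapper - a hypothetical class, not UI5 code - with the storage backend injected so it also runs outside a browser; real code would load <code>sap/ui/util/Storage</code> and let it use the browser's local storage:

```javascript
// Hypothetical sketch of a prefixed key/value wrapper in the spirit of
// sap/ui/util/Storage. The backend is anything with the Web Storage
// getItem/setItem interface, e.g. window.localStorage in the browser.
function SimpleStorage(backend, prefix) {
  this._backend = backend;
  this._prefix = prefix + "-";
}
SimpleStorage.prototype.put = function (key, value) {
  // values are serialized to JSON so objects round-trip intact
  this._backend.setItem(this._prefix + key, JSON.stringify(value));
};
SimpleStorage.prototype.get = function (key) {
  var raw = this._backend.getItem(this._prefix + key);
  return raw === null ? undefined : JSON.parse(raw);
};
```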
This may need a little bit of explanation.
<h1>UI5 models</h1>
Models are a way to achieve managed data access. A model manages a particular collection of data, and can be shared across multiple elements of the application, or even be accessible to all elements of the application.
For example, both the Product List and the Shopping List are each managed by their own model, which are declared in the application's <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/manifest.json"><code>manifest.json</code></a>:
<div>
<pre> ...,
"models": {
"products": {
"type": "sap.ui.model.json.JSONModel",
"dataSource": "products"
},
"shoppingList": {
"type": "ui5tips.utils.LocalStorageJSONModel",
"dataSource": "shoppingListTemplate"
}
},
...
</pre>
</div>
The model can be observed by listening to its events, which allows different parts of the application to react whenever something interesting happens to the state of the model - i.e., when its data is manipulated. For example, in <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/components/mainpage/MainPage.controller.js"><code>MainPage.controller.js</code></a>, event handlers are attached to the Shopping List model's <code>dirtyStateChange</code> and <code>propertyChange</code> events, which in turn control certain aspects of the screen logic, such as enabling and disabling the Save and Undo buttons:
<div>
<pre> ...,
initShoppingListModelHandlers: function(){
var shoppingListModel = this.getShoppingListModel();
shoppingListModel.attachDirtyStateChange(function(event){
this.dirtyStateChanged(event.getParameters());
}, this);
shoppingListModel.attachPropertyChange(function(event){
var path = event.getParameter('path');
var context = event.getParameter('context');
var itemsPath = '/items';
if (path === itemsPath || context && context.getPath() === itemsPath) {
this.itemsChanged(shoppingListModel.getProperty(itemsPath));
}
}, this);
},
...
</pre>
</div>
<h1><a id="user-content-data-binding" class="anchor" href="#data-binding" aria-hidden="true"></a>Data Binding</h1>
In UI5, data binding is a mechanism that lets you declaratively construct objects and change their properties based on the state of a model. The declarative aspect means that no explicit coding is involved. For example, rather than setting up an event handler that contains explicit code to respond to changes in the state of the model, you can use a special syntax in design-time property assignments that ensures the runtime property value will be assigned directly from some part of the data in the model.
Some examples include:
<ul>
<li>The actual data in both the Product List and Shopping List. Both are implemented using a <a href="https://openui5.hana.ondemand.com/api/sap.ui.table.Table" rel="nofollow"><code>sap.ui.table.Table</code> control</a>, which only supports adding rows through data binding. For example, take a look at <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/components/mainpage/ShoppingList.fragment.xml"><code>ShoppingList.fragment.xml</code></a> to see how it gets its rows from the Shopping List model:</li>
</ul>
<div>
<pre> ...
<table:Table
id="shoppingList"
title="Shopping List"
editable="true"
selectionMode="None"
enableBusyIndicator="true"
visibleRowCountMode="Auto"
rowActionCount="1"
<b>rows="{
path: 'shoppingList>/items'
}"</b>
>
...
</pre>
</div>
(This example basically says: create a row in the table for each item in the shopping list model.)
<ul>
<li>The Product List's row action shows a full or empty shopping cart, depending upon whether the product is already in the shopping list. This is achieved in <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/components/mainpage/Products.fragment.xml"><code>Products.fragment.xml</code></a> with databinding, which passes the current product and the items from the shopping list to the controller's <code>getShoppingCartRowActionIconSource</code> formatter function:</li>
</ul>
<div>
<pre> ...
<table:RowActionItem
binding="{shoppingList>/items}"
icon="{
parts: [
{path: 'products>'},
{path: 'shoppingList>'}
],
formatter: '.getShoppingCartRowActionIconSource'
}"
text="Add to Cart"
press="onCartButtonPressed"
/>
...
</pre>
</div>
(This example says: call the <code>getShoppingCartRowActionIconSource</code> method to obtain an icon, depending on the current product from the product model and all the items in the shopping list model.)
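Such a formatter is just a plain function that receives the bound parts as arguments and returns a value. Below is a minimal sketch of what a formatter like this could look like. Note that this is an illustration, not the code from the sample app: the property name <code>productId</code> is a made-up assumption, and the actual <code>getShoppingCartRowActionIconSource</code> may use different icons and fields:

```javascript
// Sketch of a multi-part formatter (assumption: the real formatter in the
// sample app may differ). It receives the current product and the shopping
// list items, and returns a sap-icon URI: a full cart if the product is
// already on the list, an empty cart otherwise.
function getShoppingCartRowActionIconSource(product, shoppingListItems) {
  if (!product || !shoppingListItems) {
    return 'sap-icon://cart';                 // fall back to the empty cart
  }
  // "productId" is a hypothetical key property used here for illustration.
  var inCart = shoppingListItems.some(function(item) {
    return item.productId === product.productId;
  });
  return inCart ? 'sap-icon://cart-full' : 'sap-icon://cart';
}
```

Because the formatter is bound to both parts, UI5 re-evaluates it whenever either the product row or the shopping list items change, so the icon stays in sync automatically.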
<ul>
<li>In <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/components/mainpage/ShoppingList.fragment.xml"><code>ShoppingList.fragment.xml</code></a>, the Submit and Clear buttons are enabled depending on whether the shopping list has any items:</li>
</ul>
<div>
<pre> ...
<m:contentMiddle>
<m:Button
id="approvalButton"
icon="sap-icon://cart-approval"
tooltip="Send Order"
<b>enabled="{= ${shoppingList>/items}.length > 0 && ${shoppingList>/items/0} !== undefined }"</b>
press="onApproveButtonPressed"
/>
</m:contentMiddle>
<m:contentRight>
<m:Button
id="clearAllButton"
icon="sap-icon://clear-all"
tooltip="Clear Shoppinglist"
<b>enabled="{= ${shoppingList>/items}.length > 0 && ${shoppingList>/items/0} !== undefined}"</b>
press="onClearButtonPressed"
/>
</m:contentRight>
...
</pre>
</div>
(This example uses a slightly different binding syntax, called <a href="https://sapui5.hana.ondemand.com/#/topic/daf6852a04b44d118963968a1239d2c0.html">expression binding</a>, to enable the button only if there is at least one item in the shopping list.)
While all these features could also have been implemented by explicit coding, data binding allows a lot of this to be defined completely declaratively in the view, with much less code, and denoted in a way that transparently and unambiguously ties the data to the relevant item in the UI.
Now - there is no shame (I think) in not immediately embracing ui5's data binding. Some areas can be complex and unintuitive at first. But by using it more often, you start experiencing the benefits, and - just as important - you learn about the limitations. This post is not an in-depth article on ui5 data binding. The point is that, at some point, you learn to use it in such a way that it becomes one of the most important factors in how you design ui5 applications, as well as in how different parts of the application communicate with each other.
So, we consider using ui5 models and data binding as a given. And if you find you have a need for the kind of client-side persistence capabilities offered by the Web Storage API, then you are probably not interested in that as an isolated way of storing some bits of data. Instead, you're going to want to have a normal, regular ui5 model that incorporates these persistence features.
<h1><a id="user-content-a-sapuimodeljsonjsonmodel-backed-by-sapuiutilstorage" class="anchor" href="#a-sapuimodeljsonjsonmodel-backed-by-sapuiutilstorage" aria-hidden="true"></a>A <code>sap.ui.model.json.JSONModel</code> backed by <code>sap.ui.util.Storage</code></h1>
We decided to take the <code>sap.ui.model.json.JSONModel</code> as a base, and extend it to add a few methods that allow the model's data to be stored and retrieved from <code>sap.ui.util.Storage</code>.
The reason for this approach is that it makes our model behave exactly like the standard ui5 <code>sap.ui.model.json.JSONModel</code>. In particular, all behavior with regard to data binding will be exactly the same as with the <code>sap.ui.model.json.JSONModel</code>.
In theory, it would also be possible to extend the abstract <a href="https://openui5.hana.ondemand.com/api/sap.ui.model.ClientModel" rel="nofollow"><code>sap.ui.model.ClientModel</code></a>, but it turns out that implementing reliable data binding is not as easy as it seems. Or I should say: I took a naive shot at it, and failed. While it might be very instructive to try it in earnest, I decided that at this point I am more interested in having a working solution than in learning all the ui5 internals required to successfully implement data binding.
The result is the <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/utils/LocalStorageJSONModel.js"><code>LocalStorageJSONModel</code></a>.
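Before diving into the details, the core idea can be sketched in plain JavaScript. This is a simplified illustration, not the actual implementation: a <code>Map</code> stands in for <code>sap.ui.util.Storage</code>, and a minimal class stands in for <code>sap.ui.model.json.JSONModel</code>:

```javascript
// Simplified sketch of the approach: take a JSON model as a base and add
// save/load/delete methods backed by a storage. (Assumption: the real
// LocalStorageJSONModel extends sap.ui.model.json.JSONModel and uses
// sap.ui.util.Storage; this sketch only illustrates the shape of the idea.)
class SimpleJSONModel {
  constructor(data) { this._data = data || {}; }
  getData() { return this._data; }
  setData(data) { this._data = data; }
}

class LocalStorageJSONModelSketch extends SimpleJSONModel {
  constructor(settings, storage) {
    super(settings.template);
    // Default the storage key prefix to the class name, as described below.
    this._storagePrefix = settings.storagePrefix || 'LocalStorageJSONModel';
    this._storage = storage;
  }
  saveToStorage() {
    this._storage.set(this._storagePrefix, JSON.stringify(this.getData()));
  }
  loadFromStorage() {
    const stored = this._storage.get(this._storagePrefix);
    if (stored !== undefined) this.setData(JSON.parse(stored));
  }
  deleteFromStorage() {
    this._storage.delete(this._storagePrefix);
  }
}
```

Because the persistence lives in a handful of extra methods, everything else - including data binding - behaves exactly as it does for a plain JSON model.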
<h2><a id="user-content-instantiating-the-localstoragejsonmodel-from-the-manifestjson" class="anchor" href="#instantiating-the-localstoragejsonmodel-from-the-manifestjson" aria-hidden="true"></a>Instantiating the <code>LocalStorageJSONModel</code> from the <code>manifest.json</code></h2>
The sample application creates the <code>LocalStorageJSONModel</code> implicitly by declaring it in the <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/manifest.json"><code>manifest.json</code></a>:
<div>
<pre> "shoppingList": {
<b>"type": "ui5tips.utils.LocalStorageJSONModel"</b>,
"dataSource": "shoppingListTemplate"
}
</pre>
</div>
It gets initialized with a <code>dataSource</code> called <code>shoppingListTemplate</code> which is also declared in the <code>manifest.json</code>:
<div>
<pre> "shoppingListTemplate": {
<b>"uri": "data/shoppingListTemplate.json"</b>,
"type": "JSON"
}
</pre>
</div>
The <code>dataSource</code> refers to configuration data stored in <a href="https://github.com/just-bi/ui5tips/blob/main/localstoragejsonmodel/data/shoppingListTemplate.json"><code>data/shoppingListTemplate.json</code></a>; its contents are:
<div>
<pre>{
"autoSaveTimeout": -1,
"storagePrefix": "shoppingList",
"template": {
"items": [
]
}
}
</pre>
</div>
This data is passed as the first argument to the <code>LocalStorageJSONModel</code> constructor. Its properties are:
<ul>
<li><code>int autoSaveTimeout</code>: (optional) an integer specifying the number of milliseconds to wait after the last change to the model before automatically saving the model's data to the persistent storage. If this is <code>0</code> or less, data is not automatically persisted.</li>
<li><code>string storagePrefix</code>: (optional) a string that is used to prefix the key under which the model's data will be stored in storage. The
<a href="https://openui5.hana.ondemand.com/api/module:sap/ui/util/Storage#constructor" rel="nofollow"><code>sap.ui.util.Storage</code> constructor</a> takes a <code>storagePrefix</code>, and the <code>LocalStorageJSONModel</code> uses its own class name for that. But if you have several of these models in one application, you can keep them apart by specifying a specific <code>storagePrefix</code> here.</li>
<li><code>object template</code>: (optional) an object that will be used as template data for the model.</li>
</ul>
<h2><a id="user-content-instantiating-the-localstoragejsonmodel-directly" class="anchor" href="#instantiating-the-localstoragejsonmodel-directly" aria-hidden="true"></a>Instantiating the <code>LocalStorageJSONModel</code> directly</h2>
Of course, you can also import the class into your own ui5 classes (for example, a controller) and call its constructor to create an instance:
<div>
<pre>sap.ui.define([
"sap/ui/core/mvc/Controller",
<b>"ui5tips/utils/LocalStorageJSONModel"</b>
],
function(
Controller,
LocalStorageJSONModel
){
"use strict";
var controller = Controller.extend("ui5tips.components.app.App", {
onInit: function(){
var localStorageModel = <b>new LocalStorageJSONModel</b>({
"autoSaveTimeout": -1,
"storagePrefix": "myApp",
"template": {
...data...
}
});
this.getView().setModel(localStorageModel, 'localStorageModel');
}
});
return controller;
});
</pre>
</div>
<h1><a id="user-content-key-methods" class="anchor" href="#key-methods" aria-hidden="true"></a>Key Methods</h1>
The most important methods provided by <code>LocalStorageJSONModel</code> are:
<ul>
<li><code>loadFromStorage(template)</code>: populates the model with the data persisted in the storage. If the <code>template</code> argument is specified, then the data from the storage is patched with the data in the template. (For more details, see the next section about model initialization and the template). In the sample application, the Undo button action is implemented by calling <code>loadFromStorage()</code>:</li>
</ul>
<div>
<pre> onUndoButtonPressed: function(){
var shoppingListModel = this.getShoppingListModel();
<b>shoppingListModel.loadFromStorage()</b>;
}
</pre>
</div>
<ul>
<li><code>saveToStorage()</code>: stores the model data to the browser storage. In the sample application, the Save button action is implemented by calling the <code>saveToStorage()</code> method.</li>
</ul>
<div>
<pre> onSaveButtonPressed: function(){
var shoppingListModel = this.getShoppingListModel();
shoppingListModel.saveToStorage();
}
</pre>
</div>
<ul>
<li><code>deleteFromStorage()</code>: permanently removes the data from the local storage. Use this if you're sure the application will not need any of the currently stored data anymore.</li>
<li><code>isDirty()</code>: returns a boolean indicating whether the current model state differs from the stored state. Note that you can also use the <code>dirtyStateChange</code> event to get notified of changes in the dirty state.</li>
</ul>
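To illustrate the idea behind <code>isDirty()</code>, here is a plain-JavaScript sketch of one possible way to implement such a check: compare the serialized model data against the last stored snapshot. This is an assumption for illustration only - the actual model may track its dirty state differently:

```javascript
// Dirty-check sketch by snapshot comparison (assumption: the real model
// may implement this differently). A plain object stands in for the
// browser's localStorage.
const storage = {};

function saveToStorage(key, data) {
  storage[key] = JSON.stringify(data);
}

function isDirty(key, data) {
  // Dirty when the current model data no longer matches the stored snapshot.
  return storage[key] !== JSON.stringify(data);
}
```

With this scheme, saving naturally clears the dirty state, because the snapshot is refreshed to match the model.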
<h2><a id="user-content-template-and-model-initialization" class="anchor" href="#template-and-model-initialization" aria-hidden="true"></a>Template and Model initialization</h2>
As part of model initialization, whatever data the browser had associated with the <code>storagePrefix</code> is retrieved.
If a <code>template</code> is specified, the data retrieved from the storage is patched with the template, and the resulting data structure is immediately saved back to the storage. This provides a basic way to evolve the structure of the model and pre-populate it with defaults.
The patching of the data occurs non-destructively: only those paths in the template that do not exist already in the stored data structure will be added.
If you need it, you can always apply more advanced patching schemes after instantiation, but in many cases, this built-in behavior will suffice to update and upgrade the model structure as your application grows and gets more features.
You can use the following methods to work with the template:
<ul>
<li><code>getTemplateData()</code>: retrieve the template passed to the constructor.</li>
<li><code>resetToTemplate()</code>: repopulates the model with the template. Any data stored in the model will be lost.</li>
<li><code>updateDataFromTemplate(data, template)</code>: utility method that is used to patch the <code>data</code> argument with the <code>template</code> argument. It returns an object that represents the merge of the <code>data</code> argument and the <code>template</code> argument.</li>
</ul>
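To make the non-destructive patching concrete, here is a plain-JavaScript sketch of such a merge. It illustrates the behavior described above - existing values win, missing paths are added - and is not the actual implementation; the real <code>updateDataFromTemplate</code> may handle more cases:

```javascript
// Sketch of a non-destructive patch: paths present in the template but
// missing from the data are added; existing values in the data are kept.
// (Assumption: the real updateDataFromTemplate may handle more cases.)
function updateDataFromTemplate(data, template) {
  if (data === undefined) {
    // Path missing in the stored data: take a deep copy of the template value.
    return JSON.parse(JSON.stringify(template));
  }
  const isPlainObject = function(value) {
    return value !== null && typeof value === 'object' && !Array.isArray(value);
  };
  if (isPlainObject(data) && isPlainObject(template)) {
    // Recurse so nested paths from the template are added as well.
    for (const key of Object.keys(template)) {
      data[key] = updateDataFromTemplate(data[key], template[key]);
    }
  }
  return data;   // existing values always win
}
```

This is what lets you add new default settings to the template as the application evolves, without wiping out data the user already stored.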
<h1><a id="user-content-events" class="anchor" href="#events" aria-hidden="true"></a>Events</h1>
The <code>LocalStorageJSONModel</code> provides these events:
<ul>
<li><code>dirtyStateChange</code>: this event has two parameters, <code>isDirty</code> to indicate whether the model is now dirty and <code>wasDirty</code>, indicating whether the model was dirty prior to the latest change.
The sample application uses this event to determine whether to enable or disable the Save and Undo buttons:</li>
</ul>
<div>
<pre> shoppingListModel.attachDirtyStateChange(function(event){
this.dirtyStateChanged(event.getParameters());
}, this);
</pre>
</div>
and
<div>
<pre> dirtyStateChanged: function(parameters){
var isDirty = parameters.isDirty;
this.byId('saveButton').setEnabled(isDirty);
this.byId('undoButton').setEnabled(isDirty);
},
</pre>
</div>
The following method can be used to keep track of the model state:
<ul>
<li><code>attachDirtyStateChange(data, handler, listener)</code>: attach a <code>handler</code> function to get notified of changes in the dirty state. If some change is made that causes a difference between the stored data and the model data, this event is fired and the <code>handler</code> is called in the scope of the <code>listener</code>, and gets passed the application-specific payload <code>data</code>.</li>
</ul>
<h2>Autosave</h2>
The sample application controls when the model data will be persisted to local storage by calling <code>saveToStorage()</code> explicitly. But there are also use cases where you simply want the storage to always reflect the state of the model, or at least, track it as closely as possible. The <a href="https://github.com/just-bi/ui5tips/wiki/Persistent-UI-State">persistence of UI state</a> is such a case, and for these scenarios the <code>LocalStorageJSONModel</code> supports an automatic save feature.
Autosave works by monitoring the state of the model, and then saving to the storage whenever a change is detected. While saving to storage should generally be pretty fast, it is a blocking operation. So rather than always explicitly persisting after a change occurs, we simply buffer the change events with the <a href="https://github.com/just-bi/ui5tips/wiki/bufferedEventHandler"><code>bufferedEventHandler</code></a> and persist the data to storage some time after the occurrence of the last change event.
To use the autosave feature, simply pass a positive value for the <code>autoSaveTimeout</code> property when you instantiate the model. Alternatively, you can get or set the value of the <code>autoSaveTimeout</code> property after model construction by calling the <code>getAutoSaveTimeout()</code> and <code>setAutoSaveTimeout()</code> methods respectively. To disable autosave, simply set the property to zero or a negative value.
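The buffering idea can be sketched as a simple debounce in plain JavaScript. This is a simplified illustration; the sample application uses the <code>bufferedEventHandler</code> utility instead:

```javascript
// Debounce sketch: every change (re)starts a timer, so the actual save runs
// only once, some time after the last change. (Assumption: the real autosave
// is built on the bufferedEventHandler utility; this only shows the idea.)
function createAutoSaver(saveFn, autoSaveTimeout) {
  let timer = null;
  return {
    notifyChange() {
      if (autoSaveTimeout <= 0) return;        // autosave disabled
      if (timer !== null) clearTimeout(timer); // restart the countdown
      timer = setTimeout(() => { timer = null; saveFn(); }, autoSaveTimeout);
    },
    flush() {                                  // persist a pending change now
      if (timer !== null) { clearTimeout(timer); timer = null; saveFn(); }
    }
  };
}
```

A burst of rapid changes then results in a single write to storage, which keeps the blocking storage operation off the hot path.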
<h1>Finally</h1>
Did you like this tip? Do you have a better tip? Feel free to post a comment and share your approach to the same or similar problem.
Want more tips? Find other posts with the <a href="https://blogs.sap.com/tag/ui5tips/">ui5tips tag</a>!
<h1>UI5 Tips: Change expand/collapse icons for Tree, Panel and TreeTable using only CSS</h1>
UI5 offers a couple of widgets that can expand and collapse. To do that, these controls render a button with an icon that indicates the current state, and which the user can click to toggle the state.
The standard icons that UI5 renders for the expand/collapse button are navigation arrows, which some of our users disliked. In this tip, you'll learn how you can replace them with more appropriate icons using only a few lines of CSS. No javascript code is involved.
If you want to check out this tip yourself, download the app from the <a href="https://github.com/just-bi/ui5tips/tree/main/expandcollapse"><code>expandcollapse</code> directory</a> and expose it to your webserver. You can then navigate to <a href="https://github.com/just-bi/ui5tips/blob/main/expandcollapse/index.html"><code>index.html</code></a> to see the sample app in effect.
<h1><a id="user-content-ui5-exandablecollapsible-controls" class="anchor" href="#ui5-exandablecollapsible-controls" aria-hidden="true"></a>UI5 expandable/collapsible Controls</h1>
First, let's take a look at the standard UI5 controls.
<h2><a id="user-content-panel" class="anchor" href="#panel" aria-hidden="true"></a>Panel</h2>
The <a href="https://openui5.hana.ondemand.com/api/sap.m.Panel" rel="nofollow"><code>sap.m.Panel</code></a> has an <a href="https://openui5.hana.ondemand.com/api/sap.m.Panel#controlProperties" rel="nofollow"><code>expandable</code> property</a>. If <code>true</code>, the Panel renders a button that the user can use to hide and show the contents of the panel. A screenshot is shown below:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/UI5%20Sample%20-%20Panel.png?raw=true" alt="An expandable sap.m.Panel" />
(This screenshot is taken from UI5's <a href="https://openui5.hana.ondemand.com/entity/sap.m.Panel/sample/sap.m.sample.PanelExpanded" rel="nofollow">Panel - Expand / Collapse sample</a>)
<h2><a id="user-content-tree" class="anchor" href="#tree" aria-hidden="true"></a>Tree</h2>
The <a href="https://openui5.hana.ondemand.com/api/sap.m.Tree" rel="nofollow"><code>sap.m.Tree</code></a> is a classical way of presenting hierarchically organized items like a folder structure. A screenshot is shown below:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/UI5%20Sample%20-%20Tree.png?raw=true" alt="A sap.m.Tree control" />
(This screenshot is taken from UI5's <a href="https://openui5.hana.ondemand.com/entity/sap.m.Tree/sample/sap.m.sample.Tree" rel="nofollow">Tree - Basic sample</a>)
<h2><a id="user-content-treetable" class="anchor" href="#treetable" aria-hidden="true"></a>TreeTable</h2>
The <a href="https://openui5.hana.ondemand.com/api/sap.ui.table.TreeTable" rel="nofollow"><code>sap.ui.table.TreeTable</code></a> is just like a regular data grid table (<a href="https://openui5.hana.ondemand.com/api/sap.ui.table.Table" rel="nofollow"><code>sap.ui.table.Table</code></a>), but with an added functionality to hierarchically organize the rows in the table, and with the ability to expand or collapse rows according to the hierarchy. A screenshot is shown below:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/UI5%20Sample%20-%20TreeTable.png?raw=true" alt="A sap.ui.table.TreeTable" />
(This screenshot is taken from UI5's <a href="https://openui5.hana.ondemand.com/entity/sap.ui.table.TreeTable/sample/sap.ui.table.sample.TreeTable.JSONTreeBinding" rel="nofollow"><code>sap.ui.table.TreeTable</code> JSONTreeBinding sample</a>)
<h1><a id="user-content-a-look-at-the-icons" class="anchor" href="#a-look-at-the-icons" aria-hidden="true"></a>A look at the icons</h1>
Let's take a look at the standard icons that UI5 renders for the expand/collapse button:
<ul>
<li>When collapsed, the icon is the <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-right-arrow" rel="nofollow">navigation-right-arrow</a> icon. This is what it looks like:</li>
</ul>
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/right-arrow.png?raw=true" alt="UI5 Right arrow icon" />
<ul>
<li>When expanded, it's the <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-down-arrow" rel="nofollow">navigation-down-arrow</a> icon. This is what it looks like:</li>
</ul>
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/down-arrow.png?raw=true" alt="UI5 Down arrow icon" />
<h1><a id="user-content-propsed-icons" class="anchor" href="#propsed-icons" aria-hidden="true"></a>Proposed Icons</h1>
While I don't really have a problem with these icons, some of our users had trouble recognizing the collapse/expand functionality for Panels. We looked around a bit in <a href="https://openui5.hana.ondemand.com/test-resources/sap/m/demokit/iconExplorer/webapp/index.html" rel="nofollow">the UI5 Icon explorer</a> and decided we'd rather use these icons instead:
<ul>
<li><a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=expand&search=expand" rel="nofollow">expand</a></li>
</ul>
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/expand.png?raw=true" alt="UI5 Expand icon" />
<ul>
<li><a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=expand&search=collapse" rel="nofollow">collapse</a></li>
</ul>
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/collapse.png?raw=true" alt="UI5 Collapse icon" />
Going by their names, it's a bit of a mystery to me why UI5 didn't use them in the first place. But anyway, now we have this tip to explain how you can change them.
<h1><a id="user-content-css-to-change-the-icons" class="anchor" href="#css-to-change-the-icons" aria-hidden="true"></a>CSS to change the icons</h1>
We prepared a separate CSS file for each of the aforementioned UI5 controls, and included them into the app via the <a href="https://github.com/just-bi/ui5tips/blob/main/expandcollapse/manifest.json"><code>manifest.json</code></a>:
<div>
<pre><span class="pl-s">"resources"</span>: <span class="pl-kos">{</span>
<span class="pl-s">"css"</span>: <span class="pl-kos">[</span>
<span class="pl-kos">{</span> <span class="pl-s">"uri"</span>: <span class="pl-s">"css/ui5-customization-m.Panel.css"</span> <span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-kos">{</span> <span class="pl-s">"uri"</span>: <span class="pl-s">"css/ui5-customization-m.TabContainer.css"</span> <span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-kos">{</span> <span class="pl-s">"uri"</span>: <span class="pl-s">"css/ui5-customization-m.Tree.css"</span> <span class="pl-kos">}</span><span class="pl-kos">,</span>
<span class="pl-kos">{</span> <span class="pl-s">"uri"</span>: <span class="pl-s">"css/ui5-customization-ui.tree.TreeTable.css"</span> <span class="pl-kos">}</span>
<span class="pl-kos">]</span>
<span class="pl-kos">}</span></pre>
</div>
<h2><a id="user-content-how-ui5-renders-icons" class="anchor" href="#how-ui5-renders-icons" aria-hidden="true"></a>How UI5 renders icons</h2>
Before we discuss how to apply the CSS to change the icons, it's useful to understand how UI5 icon rendering works.
In general, UI5 uses icon fonts. The UI5 framework loads a <code>library.css</code> stylesheet, which has a <code>@font-face</code> rule like this:
<div>
<pre><span class="pl-k">@font-face</span> {
<span class="pl-c1">font-family</span><span class="pl-kos">:</span> <b><span class="pl-s">"SAP-icons"</span></b>;
<span class="pl-c1">src</span><span class="pl-kos">:</span> <span class="pl-en">url</span>(<span class="pl-s">'../base/fonts/SAP-icons.woff2'</span>) <span class="pl-en">format</span>(<span class="pl-s">'woff2'</span>)<span class="pl-kos">,</span>
<span class="pl-en">url</span>(<span class="pl-s">'../base/fonts/SAP-icons.woff'</span>) <span class="pl-en">format</span>(<span class="pl-s">'woff'</span>)<span class="pl-kos">,</span>
<span class="pl-en">url</span>(<span class="pl-s">'../base/fonts/SAP-icons.ttf'</span>) <span class="pl-en">format</span>(<span class="pl-s">'truetype'</span>)<span class="pl-kos">,</span>
<span class="pl-en">local</span>(<span class="pl-s">'SAP-icons'</span>);
<span class="pl-c1">font-weight</span><span class="pl-kos">:</span> normal;
<span class="pl-c1">font-style</span><span class="pl-kos">:</span> normal
}</pre>
</div>
This binds the name <code>SAP-icons</code> to the font resource, and ensures that whenever an HTML element is assigned the <code>font-family: "SAP-icons"</code> CSS property, it will render whatever text it contains with glyphs from that font.
Now, when using the UI5 JavaScript API, you don't ever have to deal with these details at this level. Rather, if you need to assign an icon explicitly, for example when using a <a href="https://openui5.hana.ondemand.com/api/sap.ui.core.Icon" rel="nofollow"><code>sap.ui.core.Icon</code></a> control, you can assign a custom icon URI using the <code>sap-icon</code> protocol, which maps more or less reasonable icon names to the glyphs that depict the desired icons. (You can read more about the <code>sap-icon</code> URI protocol in <a href="https://sapui5.hana.ondemand.com/#/topic/776f7352807e4f82b18176c8fbdc0c56" rel="nofollow">the Icon topic of the SAP UI5 walkthrough</a>.)
Apart from these explicitly assigned icons, the renderer classes of various UI5 controls write out the required HTML for icons that are simply fixed parts of the control. Let's call these structural icons. For example, there is no property that allows you to change the icon that a <code>sap.m.Panel</code> uses for its expand/collapse button - that's just part of how the Panel happens to be coded - it's part of its structure.
As we will see in the following sections, the <code>font-family</code> is just the underlying medium that allows the UI5 framework to render icons. The details of how a particular control renderer renders its structural icons can still vary a bit, and we'll need to figure out how a particular control renders its icons before we can change them.
<h1><a id="user-content-how-sapuitabletreetable-renders-the-collapseexpand-icons" class="anchor" href="#how-sapuitabletreetable-renders-the-collapseexpand-icons" aria-hidden="true"></a>How <code>sap.ui.table.TreeTable</code> renders the collapse/expand icons</h1>
The <code>sap.ui.table.TreeTable</code> renderer takes a straightforward approach to rendering the collapse/expand icons. If you open one of the <a href="https://openui5.hana.ondemand.com/entity/sap.ui.table.TreeTable/sample/sap.ui.table.sample.TreeTable.JSONTreeBinding" rel="nofollow">standard UI5 TreeTable samples</a>, and right click the expand/collapse icon to inspect it (for example, with the Chrome developer tools), then you might see something like this:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/TreeTableIconStandard.png?raw=true" alt="Inspecting the sap.ui.table.TreeTable expand/collapse icon with chrome developer tools" />
The <code>sap.ui.table.TreeTable</code> renderer has written a <code>&lt;span&gt;</code> element with a <code>sapUiTableTreeIcon</code> class:
<div>
<pre><span class="pl-kos"><</span><span class="pl-ent">span</span>
<span class="pl-c1">class</span>="<span class="pl-s">
<b>sapUiTableTreeIcon </b>
<b>sapUiTableTreeIconNodeClosed</b>
</span>"
<span class="pl-c1">title</span>="<span class="pl-s">Expand Node</span>"
<span class="pl-c1">role</span>="<span class="pl-s">button</span>"
<span class="pl-c1">aria-expanded</span>="<span class="pl-s">false</span>"
<span class="pl-kos">></span><span class="pl-kos"></</span><span class="pl-ent">span</span><span class="pl-kos">></span></pre>
</div>
The span does not actually contain any text - rather, a css <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/::before" rel="nofollow"><code>::before</code> pseudo-element</a> is used for that. This is also where it is bound to the <code>"SAP-icons"</code> font, using the <code>font-family</code> property - this ensures the element will render glyphs from the icon font:
<div>
<pre>.<span class="pl-c1">sapUiTableTreeIcon</span>::<span class="pl-ent">before</span> {
<b><span class="pl-c1">font-family</span><span class="pl-kos">:</span> <span class="pl-s">"SAP-icons"</span>;</b>
<span class="pl-c1">font-size</span><span class="pl-kos">:</span> <span class="pl-c1">.75<span class="pl-smi">rem</span></span>;
<span class="pl-c1">color</span><span class="pl-kos">:</span> <span class="pl-pds"><span class="pl-kos">#</span>0854a0</span>;
}</pre>
</div>
The actual text content that determines the icon is controlled through another rule, using another css class, which uses <a href="https://developer.mozilla.org/en-US/docs/Web/CSS/content" rel="nofollow">the css <code>content</code> property</a> to write out the character that renders the appropriate icon from the font.
When collapsed, it's:
<div>
<pre>.<span class="pl-c1">sapUiTableTreeIcon</span>.<b><span class="pl-c1">sapUiTableTreeIconNodeClosed</span></b>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-s">'\e066'</span></b>;
}</pre>
</div>
(You may recall that <code>\e066</code> is the character that corresponds to the <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-right-arrow" rel="nofollow">navigation-right-arrow icon</a>.)
When expanded, it's:
<div>
<pre>.<span class="pl-c1">sapUiTableTreeIcon</span>.<b><span class="pl-c1">sapUiTableTreeIconNodeOpen</span></b>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-s">'\e1e2'</span></b>;
}</pre>
</div>
(You may recall that <code>\e1e2</code> is the character that corresponds to the <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-down-arrow" rel="nofollow">navigation-down-arrow icon</a>.)
This way, the <code>sap.ui.table.TreeTable</code> only needs to change the style class from <code>sapUiTableTreeIconNodeClosed</code> to <code>sapUiTableTreeIconNodeOpen</code> on the <code>&lt;span&gt;</code>, depending on the expanded/collapsed state of the row: the css magic takes care of rendering the right icon.
<h2><a id="user-content-changing-the-expandcollapse-icons-for-the-sapuitabletreetable" class="anchor" href="#changing-the-expandcollapse-icons-for-the-sapuitabletreetable" aria-hidden="true"></a>Changing the expand/collapse icons for the <code>sap.ui.table.TreeTable</code></h2>
As we have just witnessed, the <code>sap.ui.table.TreeTable</code> uses separate classes for the collapse and expand icons. This makes it really quite simple to change the icons. We only have to write our own rules for the <code>sapUiTableTreeIconNodeOpen::before</code> and <code>sapUiTableTreeIconNodeClosed::before</code> classes to mask the default ones, and assign the proper value for the <code>content</code> property:
<div>
<pre><span class="pl-c">/**</span>
<span class="pl-c">* sap.ui.table.TreeTable: better icons for expanded</span>
<span class="pl-c">*/</span>
.<span class="pl-c1">sapUiTableTreeIcon</span>.<b><span class="pl-c1">sapUiTableTreeIconNodeOpen</span></b>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-s">'\e1d9'</span></b>;
}
<span class="pl-c">/**</span>
<span class="pl-c">* sap.ui.table.TreeTable: better icons for collapsed</span>
<span class="pl-c">*/</span>
.<span class="pl-c1">sapUiTableTreeIcon</span>.<b><span class="pl-c1">sapUiTableTreeIconNodeClosed</span></b>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-s">'\e1da'</span></b>;
}</pre>
</div>
(You will find similar rules in the <a href="https://github.com/just-bi/ui5tips/blob/main/expandcollapse/css/ui5-customization-ui.tree.TreeTable.css"><code>ui5-customization-ui.tree.TreeTable.css</code></a> provided by this ui5tip.)
The only thing we need to take care of when applying this stylesheet is that it is loaded after the UI5 framework loads the CSS specific to the <code>sap.ui.table.TreeTable</code> control: if our CSS is loaded before the framework's CSS, then our rules will be masked by the framework's, and we want it exactly the other way around.
To ensure that the framework's CSS for the <code>sap.ui.table.TreeTable</code> control is loaded before our custom CSS, simply include the <code>sap.ui.table</code> library in the <code>data-sap-ui-libs</code> property of the <code><script></code> element you use to load UI5 (see the <a href="https://github.com/just-bi/ui5tips/blob/main/expandcollapse/index.html"><code>index.html</code></a> for this tip):
<div>
<pre><span class="pl-kos"><</span><span class="pl-ent">script</span>
<span class="pl-c1">id</span>="<span class="pl-s">sap-ui-bootstrap</span>"
<span class="pl-c1">src</span>="<span class="pl-s">https://openui5.hana.ondemand.com/1.87.0/resources/sap-ui-core.js</span>"
<span class="pl-c1">data-sap-ui-theme</span>="<span class="pl-s">sap_belize</span>"
<span class="pl-c1">data-sap-ui-libs</span>="<span class="pl-s">sap.m, <b>sap.ui.table</b></span>"
<span class="pl-c1">data-sap-ui-bindingSyntax</span>="<span class="pl-s">complex</span>"
<span class="pl-c1">data-sap-ui-compatVersion</span>="<span class="pl-s">edge</span>"
<span class="pl-c1">data-sap-ui-preload</span>="<span class="pl-s">async</span>"
<span class="pl-c1">data-sap-ui-resourceroots</span>='<span class="pl-s">{</span>
<span class="pl-s"> "ui5tips": "./"</span>
<span class="pl-s"> }</span>'
<span class="pl-kos">></span><span class="pl-kos"></</span><span class="pl-ent">script</span><span class="pl-kos">></span></pre>
</div>
That's it! The screenshot below shows what the TreeTable looks like in this tip's sample app:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/TreeTable-modified.png?raw=true" alt="sap.ui.table.TreeTable with modified collapse/expand icons" />
<h1><a id="user-content-how-sapmpanel-renders-the-collapsexpand-icons" class="anchor" href="#how-sapmpanel-renders-the-collapsexpand-icons" aria-hidden="true"></a>How <code>sap.m.Panel</code> renders the collaps/expand icons</h1>
Let's take a look at how the <a href="https://openui5.hana.ondemand.com/api/sap.m.Panel" rel="nofollow"><code>sap.m.Panel</code></a> renders its collapse/expand icon. We can again open <a href="https://openui5.hana.ondemand.com/entity/sap.m.Panel/sample/sap.m.sample.PanelExpanded" rel="nofollow">UI5's own <code>sap.m.Panel</code> sample</a> and use our browser's development tools to inspect the page's HTML code:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/PanelIconStandard.png?raw=true" alt="Inspecting sap.m.Panel's standard expand/collapse icon" />
Just like the <code>sap.ui.table.TreeTable</code> we discussed in the previous section, the <code>sap.m.Panel</code> renders a <code><span></code> element for the icon, which is assigned a CSS class to mark it as the icon, and which is bound to the icon font face:
<div>
<pre><span class="pl-kos"><</span><span class="pl-ent">span</span>
<b><span class="pl-c1">data-sap-ui-icon-content</span>="<span class="pl-s"></span>"</b>
<span class="pl-c1">class</span>="<span class="pl-s">
<b>sapUiIcon</b>
sapUiIconMirrorInRTL
sapMBtnCustomIcon
sapMBtnIcon
sapMBtnIconLeft</span>"
<span class="pl-c1">style</span>="<span class="pl-s">font-family: 'SAP\2dicons';</span>"
<span class="pl-kos">></span><span class="pl-kos"></</span><span class="pl-ent">span</span><span class="pl-kos">></span></pre>
</div>
And, just like for the <code>sap.ui.table.TreeTable</code>, there is a CSS rule that selects the <code>::before</code> pseudo-element and uses the <code>content</code> property to insert the appropriate character that corresponds to the glyph:
<div>
<pre>.<span class="pl-c1">sapUiIcon</span>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-en">attr</span>(data-sap-ui-icon-content);</b>
<span class="pl-c1">speak</span><span class="pl-kos">:</span> none;
<span class="pl-c1">font-weight</span><span class="pl-kos">:</span> normal;
<span class="pl-c1">-webkit-font-smoothing</span><span class="pl-kos">:</span> antialiased;
}</pre>
</div>
There are also some notable differences with respect to the <code>sap.ui.table.TreeTable</code> example.
In this case, there are no separate classes corresponding to the collapsed/expanded state of the Panel. Instead, the <code>content</code> property of the <code>.sapUiIcon::before</code> pseudo-element uses the value of the element's <code>data-sap-ui-icon-content</code> attribute: it will render whatever text that attribute contains.
If you check the code for the <code><span></code>, you'll note that the <code>data-sap-ui-icon-content</code> attribute has been assigned some text, which is rendered as a so-called .notdef glyph, both in the developer tools and here on the page. (The <a href="https://www.high-logic.com/fontcreator/manual11/recommendedglyphs.html" rel="nofollow">.notdef glyph</a> is the "boxed question mark".)
You can copy the text from the <code>data-sap-ui-icon-content</code> attribute in the browser tools and paste it into a hex editor, or into a JavaScript string, to figure out what its character code is, for example:
<div>
<pre><span class="pl-c">// decimal: 57839</span>
<span class="pl-s">""</span><span class="pl-kos">.</span><span class="pl-en">charCodeAt</span><span class="pl-kos">(</span><span class="pl-c1">0</span><span class="pl-kos">)</span>
<span class="pl-c">// hex: 0xE1EF</span>
<span class="pl-kos">(</span><span class="pl-s">""</span><span class="pl-kos">.</span><span class="pl-en">charCodeAt</span><span class="pl-kos">(</span><span class="pl-c1">0</span><span class="pl-kos">)</span><span class="pl-kos">)</span><span class="pl-kos">.</span><span class="pl-en">toString</span><span class="pl-kos">(</span><span class="pl-c1">16</span><span class="pl-kos">)</span></pre>
</div>
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/charcode.png?raw=true" alt="Inspecting the value of the data-sap-ui-icon-content attribute" />
It turns out that this corresponds to UI5's <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=slim-arrow-down" rel="nofollow">slim-arrow-down</a> icon, which is similar in appearance to the navigation-down-arrow icon we saw earlier.
If you collapse the panel and inspect it again, you'll notice that the value of the <code>data-sap-ui-icon-content</code> attribute is now the character <code>0xE1ED</code>, which corresponds to UI5's <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=slim-arrow-right" rel="nofollow">slim-arrow-right</a> icon.
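For reference, the round trip between the hex escapes used in the CSS and the decimal values can be checked in any JavaScript console:

```javascript
// Round trip between the icon code points and their decimal/hex notations.
// 0xE1EF is slim-arrow-down (expanded), 0xE1ED is slim-arrow-right (collapsed).
var expandedChar = String.fromCharCode(0xE1EF);
var collapsedChar = String.fromCharCode(0xE1ED);
console.log(expandedChar.charCodeAt(0));               // 57839
console.log(collapsedChar.charCodeAt(0).toString(16)); // "e1ed"
```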
<h2><a id="user-content-changing-the-expandcollapse-icons-for-the-sapmpanel" class="anchor" href="#changing-the-expandcollapse-icons-for-the-sapmpanel" aria-hidden="true"></a>Changing the expand/collapse icons for the <code>sap.m.Panel</code></h2>
Now, it's clear that we cannot simply mask the existing classes in the same way we did in the <code>sap.ui.table.TreeTable</code> case. The reason is that here the icon is driven directly by an attribute value, not by a change of style class.
Since the icon is so clearly driven by the value of the attribute, your initial hunch might be to somehow change the value that is written out to the HTML. But this would involve rewriting or overriding the <code>sap.m.Panel</code> or its renderer, and we're not quite prepared to do that just to change the icon.
But, there is a way.
What we can do is write rules that match the <code><span></code> depending on the value of its <code>data-sap-ui-icon-content</code> attribute. And if we can match a CSS selector based on the attribute value, we can simply write out a <code>content</code> property with the desired character instead. This works as long as we know in advance what values the attribute will have, which is of course the case here: there will only be two different values, corresponding to the collapsed and expanded state of the panel.
This is what it looks like in <a href="https://github.com/just-bi/ui5tips/blob/main/expandcollapse/css/ui5-customization-m.Panel.css"><code>ui5-customization-m.Panel.css</code></a>:
<div>
<pre><span class="pl-c">/*</span>
<span class="pl-c"> sap.m.Panel better expanded button. </span>
<span class="pl-c"> The value in the predicate for data-sap-ui-icon-content may not render correctly,</span>
<span class="pl-c"> but this is decimal 57839, or 0xE1EF, which corresponds to UI5's "slim-arrow-down" icon</span>
<span class="pl-c"> (https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=slim-arrow-down)</span>
<span class="pl-c">*/</span>
<span class="pl-ent">div</span>.<span class="pl-c1">sapMPanel</span>.<span class="pl-c1">sapMPanelExpandable</span> <span class="pl-c1">></span> <span class="pl-ent">div</span> <span class="pl-c1">></span> <b><span class="pl-ent">span</span>[<span class="pl-c1">data-sap-ui-icon-content</span><span class="pl-c1">=</span>]</b>.<span class="pl-c1">sapUiIcon</span>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-s">'\e1d9'</span></b>;
}
<span class="pl-c">/*</span>
<span class="pl-c"> sap.m.Panel better collapse button</span>
<span class="pl-c"> The value in the predicate for data-sap-ui-icon-content may not render correctly,</span>
<span class="pl-c"> but this is decimal 57837, or 0xE1ED, which corresponds to UI5's "slim-arrow-right" icon</span>
<span class="pl-c"> (https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=slim-arrow-down)</span>
<span class="pl-c">*/</span>
<span class="pl-ent">div</span>.<span class="pl-c1">sapMPanel</span>.<span class="pl-c1">sapMPanelExpandable</span> <span class="pl-c1">></span> <span class="pl-ent">div</span> <span class="pl-c1">></span> <b><span class="pl-ent">span</span>[<span class="pl-c1">data-sap-ui-icon-content</span><span class="pl-c1">=</span>]</b>.<span class="pl-c1">sapUiIcon</span>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <b><span class="pl-s">'\e1da'</span></b>;
}</pre>
</div>
Note that the <code>span[data-sap-ui-icon-content=].sapUiIcon::before</code> selector is the essential bit that allows us to react to a specific icon value. The selector part before it ensures the rule will only apply to the expand/collapse button of a Panel, and not to some other control's icon.
And, here's what it looks like in the sample app:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/Panel-modified.png?raw=true" alt="sap.m.Panel with modified expand/collapse icons." />
<h1><a id="user-content-how-sapmtree-renders-the-collapseexpand-icons" class="anchor" href="#how-sapmtree-renders-the-collapseexpand-icons" aria-hidden="true"></a>How <code>sap.m.Tree</code> renders the collapse/expand icons</h1>
The <code>sap.m.Tree</code> uses exactly the same mechanism to render the icons as the <code>sap.m.Panel</code> does - the character that corresponds to the appropriate icon glyph is written to a <code>data-sap-ui-icon-content</code> attribute, and the value of the attribute is rendered against the icon font's font face. The only difference with the Panel is that the <code>sap.m.Tree</code> uses the <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-right-arrow" rel="nofollow">navigation-right-arrow</a> and <a href="https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-down-arrow" rel="nofollow">navigation-down-arrow</a> icons, just like the <code>sap.ui.table.TreeTable</code> did.
Apart from that, we also need to ensure the first part of the selectors is specific to the <code>sap.m.Tree</code>, similar to what we did for the <code>sap.m.Panel</code>.
This is what the CSS looks like in <a href="https://github.com/just-bi/ui5tips/blob/main/expandcollapse/css/ui5-customization-m.Tree.css">ui5-customization-m.Tree.css</a>:
<div>
<pre><span class="pl-c">/**</span>
<span class="pl-c"> sap.m.TreeItem : better icons for collapsed </span>
<span class="pl-c"> The value in the predicate for data-sap-ui-icon-content may not render correctly,</span>
<span class="pl-c"> but this is decimal 57446, or 0xE066, which corresponds to UI5's "navigation-right-arrow" icon</span>
<span class="pl-c"> (https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-right-arrow)</span>
<span class="pl-c">*/</span>
<span class="pl-ent">li</span>.<span class="pl-c1">sapMTreeItemBase</span> <span class="pl-c1">></span> <span class="pl-ent">span</span>[<span class="pl-c1">data-sap-ui-icon-content</span><span class="pl-c1">=</span>].<span class="pl-c1">sapMTreeItemBaseExpander</span>.<span class="pl-c1">sapUiIcon</span>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <span class="pl-s">'\e1da'</span>;
}
<span class="pl-c">/**</span>
<span class="pl-c"> sap.m.TreeItem : better icons for expanded</span>
<span class="pl-c"> The value in the predicate for data-sap-ui-icon-content may not render correctly,</span>
<span class="pl-c"> but this is decimal 57826, or 0xE1E2, which corresponds to UI5's "navigation-down-arrow" icon</span>
<span class="pl-c"> (https://sapui5.hana.ondemand.com/sdk/test-resources/sap/m/demokit/iconExplorer/webapp/index.html#/overview/SAP-icons/?tab=grid&icon=navigation-down-arrow)</span>
<span class="pl-c">*/</span>
<span class="pl-ent">li</span>.<span class="pl-c1">sapMTreeItemBase</span> <span class="pl-c1">></span> <span class="pl-ent">span</span>[<span class="pl-c1">data-sap-ui-icon-content</span><span class="pl-c1">=</span>].<span class="pl-c1">sapMTreeItemBaseExpander</span>.<span class="pl-c1">sapUiIcon</span>::<span class="pl-ent">before</span> {
<span class="pl-c1">content</span><span class="pl-kos">:</span> <span class="pl-s">'\e1d9'</span>;
}</pre>
</div>
And this is what the Tree looks like in the sample app:
<img src="https://github.com/just-bi/ui5tips/raw/main/expandcollapse/images/Tree-modified.png?raw=true" alt="A sap.m.Tree with modified expand/collapse icons." />
<h1>Finally</h1>
Did you like this tip? Do you have a better tip? Feel free to post a comment and share your approach to the same or similar problem.
Want more tips? Find other posts with the <a href="https://blogs.sap.com/tag/ui5tips/">ui5tips tag</a>!
<h1>UI5 Tips: Buffering Events to avoid a request-storm</h1>
Standard UI5 event handling will usually go a long way. Yet sometimes, certain user actions cause ui5 objects to generate a lot of similar events within a short period of time, and it is often not useful to handle each and every one of them: only the last event needs handling.
A very common scenario is doing a search in response to the <a href="https://openui5.hana.ondemand.com/#/api/sap.m.SearchField%23events/liveChange" rel="nofollow"><code>liveChange</code> event</a>: if you'd attach a handler to handle the <code>liveChange</code> event, and do the backend query from there, then a backend request would be sent for each keystroke while the user is typing in the search field. This causes a storm of requests that the backend must somehow handle. But most of these requests will be for naught, as the user is only interested in the result of the query that matches the last complete search term they typed.
So, rather than firing a query to the backend for each and every keystroke, it makes more sense to buffer these events, and react only to the last one. The <code>bufferedEventHandler</code> utility helps you to do just that in a generic and reusable way.
This ui5tip describes the <a href="https://github.com/just-bi/ui5tips/blob/main/bufferedeventhandler/utils/bufferedEventHandler.js">bufferedEventHandler</a> utility. It is available on github under terms of the Apache 2.0 License. There's also a <a href="https://github.com/just-bi/ui5tips/tree/main/bufferedeventhandler">sample application</a> so you can try it out yourself.
<h1><a id="user-content-the-bufferedeventhandler-sample-app" class="anchor" href="#the-bufferedeventhandler-sample-app" aria-hidden="true"></a>The BufferedEventHandler sample app</h1>
The <a href="https://github.com/just-bi/ui5tips/tree/main/bufferedeventhandler"><code>bufferedEventHandler</code> sample application</a> illustrates the scenario from the introduction. It consists of a single page showing mockup company data in a <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.table.Table" rel="nofollow"><code>sap.ui.table.Table</code></a>. A screenshot is shown below:
<img src="https://github.com/just-bi/ui5tips/raw/main/bufferedeventhandler/images/BufferedEventHandlerApp-schreenshot.png?raw=true" alt="Screenshot of the BufferedEventHandler sample app." />
At the top left of the grid, there's a <a href="https://openui5.hana.ondemand.com/#/api/sap.m.SearchField" rel="nofollow"><code>sap.m.SearchField</code></a> labeled "Search in Name". The user can type some search term into the searchfield, and the grid will automatically refresh and show only the rows for which the CompanyName has a case-insensitive match with the entered search term.
While the search happens <strong>automatically</strong>, it does not happen <strong>immediately</strong> as the search term changes at every keystroke. Rather, about 1 second after the user stops typing, the data grid is filtered.
At the top right of the grid, there's a <a href="https://openui5.hana.ondemand.com/#/api/sap.m.ProgressIndicator" rel="nofollow"><code>sap.m.ProgressIndicator</code></a> labeled "Event buffer Timeout". The progress indicator reflects how much time has passed since the last keystroke. When the progress indicator reaches 100%, the filter action is executed.
<h1><a id="user-content-the-bufferedeventhandler-utility" class="anchor" href="#the-bufferedeventhandler-utility" aria-hidden="true"></a>The bufferedEventHandler Utility</h1>
To buffer events we provide a <code>bufferedEventHandler</code> utility object with just one <code>bufferEvents</code> function. You can find this in the <a href="https://github.com/just-bi/ui5tips/blob/main/bufferedeventhandler/utils/bufferedEventHandler.js"><code>bufferedEventHandler</code> file</a> in the <a href="https://github.com/just-bi/ui5tips/blob/main/bufferedeventhandler/utils"><code>utils</code> directory</a>.
To use it, we need to import it into the source file where we want to use it. This will usually be in a ui5 controller and in the sample app we do this in <a href="https://github.com/just-bi/ui5tips/blob/main/bufferedeventhandler/components/mainpage/MainPage.controller.js"><code>MainPage.controller.js</code></a>:
<pre>sap.ui.define([
"sap/ui/core/mvc/Controller",
"sap/ui/table/Column",
"sap/m/Text",
"sap/ui/model/Filter",
"sap/ui/model/FilterOperator",
"sap/ui/model/FilterType",
<b>"ui5tips/utils/bufferedEventHandler"</b>
],
function(
Controller,
Column,
Text,
Filter,
FilterOperator,
FilterType,
<b>bufferedEventHandler</b>
){
"use strict";
var controller = Controller.extend("ui5tips.components.mainpage.MainPage", {
...
});
return controller;
});
</pre>
We can now refer to the <code>bufferedEventHandler</code> utility through the local variable that is also called <code>bufferedEventHandler</code>.
The controller uses the <code>bufferedEventHandler</code> utility in the <code>initSearchField()</code> method. This is called from the controller's standard <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.core.mvc.Controller%23methods/onInit" rel="nofollow"><code>onInit()</code> lifecycle method</a>, which is called just once for the <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.core.mvc.Controller" rel="nofollow"><code>Controller</code></a> instance:
<pre> ...
onInit: function() {
this.initSearchField();
},
initSearchField: function(){
var searchField = this.byId('searchField');
<b>bufferedEventHandler.bufferEvents</b>(
// event provider
searchField,
// timeInterval
1000,
// eventId
'liveChange',
// data
null,
// handler
this.doSearch,
// listener
this,
// progressHandler
this.searchFieldProgress,
// progressUpdateInterval
50
);
},
...
</pre>
<h1><a id="user-content-the-bufferevents-method" class="anchor" href="#the-bufferevents-method" aria-hidden="true"></a>The <code>bufferEvents()</code> Method</h1>
The meat of the <code>initSearchField()</code> method is the call to the <code>bufferEvents</code> method of the <code>bufferedEventHandler</code> utility. This method has the following arguments:
<ul>
<li><code>eventProvider</code>: the 1st argument should be the object that emits the events - in our example this is the <code>sap.m.SearchField</code>. This object should be a subclass of <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.base.EventProvider" rel="nofollow"><code>sap.ui.base.EventProvider</code></a>. (<code>bufferEvents</code> will throw an error if it's not!)</li>
<li><code>timeInterval</code>: the 2nd argument is the timeout, in milliseconds. This is the amount of time that should pass between the occurrence of the last event and the call to the actual handler of the event. If a new event occurs during the wait period, the timeout is reset, and a new waiting period is started. In the example, we use a <code>timeInterval</code> of <code>1000</code> - that is, we will wait 1000 milliseconds (1 second) before handling the last event.</li>
</ul>
Choosing the <code>timeInterval</code> is a balancing act. In the case of the example, where the events are generated in response to user actions, the <code>timeInterval</code> should not be too short, as the user should be given enough time to type a meaningful search term before the actual query kicks in. But if the <code>timeInterval</code> is too long, the application may appear unresponsive, and the user may try to retype their search term, which will only postpone the reaction even more. (There's more about this in the section about the ProgressIndicator.)
The next 4 arguments of <code>bufferEvents</code> correspond to <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.base.EventProvider%23methods/attachEvent" rel="nofollow"><code>sap.ui.base.EventProvider</code>'s <code>attachEvent()</code> method</a>:
<ul>
<li><code>eventId</code>: a string that identifies the event to listen to. In our example this is <code>'liveChange'</code>. Some ui5 objects (for example, <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.base.ManagedObject" rel="nofollow"><code>sap.ui.base.ManagedObject</code>s</a>, which includes all <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.core.Control" rel="nofollow"><code>sap.ui.core.Control</code>s</a>) describe the events they expose through their <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.base.Object%23methods/getMetadata" rel="nofollow">metadata</a>. In these cases, <code>bufferEvents</code> will verify whether the passed <code>eventId</code> is in fact exposed by the object, and it will throw an error in case it isn't. <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.base.EventProvider" rel="nofollow"><code>EventProvider</code>s</a> that do not expose their events through metadata can still be used with the <code>bufferedEventHandler</code>, but then you'll need to make sure yourself that the value for <code>eventId</code> is valid, as <code>bufferEvents</code> has no way of checking it.</li>
<li><code>data</code>: an optional argument to pass any "extra" data that the event handler might need. In the example, we pass <code>null</code> as we have no need for any additional data.</li>
<li><code>handler</code>: this should be the callback function that will be called upon to actually handle the event. The callback function will receive an instance of <a href="https://openui5.hana.ondemand.com/#/api/sap.ui.base.Event" rel="nofollow"><code>sap.ui.base.Event</code></a> as its single argument, which typically provides access to all relevant information pertaining to the event. In the example, we pass <code>this.doSearch</code>, which is a method of the controller that will perform the actual filtering of the data grid.</li>
<li><code>listener</code>: this is an optional argument which you can use to specify the scope in which the handler will be called. Typically the handler will not be completely standalone, but it will refer to a <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/this" rel="nofollow"><code>this</code> object</a>, one way or another. If the handler function is not already bound (for example, by using the function's <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Function/bind" rel="nofollow"><code>bind()</code> method</a>), then you should pass whatever object should act as <code>this</code> for the handler function via the <code>listener</code> argument. In the example, we simply use <code>this</code> which refers to the controller instance itself. This makes sense as the handler function is also a method of the controller. (Remember: we passed <code>this.doSearch</code> as handler.)</li>
</ul>
In the call to <code>bufferEvents</code>, these arguments will be used to create an actual handler for the event, and to automatically attach it to the <code>eventProvider</code> for the specified <code>eventId</code>. But rather than calling the passed <code>handler</code> directly, it will start a <a href="https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setTimeout" rel="nofollow">javascript timeout</a> for a duration of the passed <code>timeInterval</code>. If the timeout was already initiated, it is cleared, thus canceling the previous event and initiating a new waiting period.
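The core of this mechanism is a classic debounce. Here is a minimal standalone sketch - not the actual <code>bufferedEventHandler</code> code, and the timer functions are made injectable here purely so the sketch can be exercised without a browser:

```javascript
// Minimal sketch of the buffering logic (illustrative, not the ui5tips source).
// Every incoming event cancels the pending timeout and starts a new one, so
// the wrapped handler fires only after timeInterval ms of event silence.
function makeBufferedHandler(handler, timeInterval, setTimeoutFn, clearTimeoutFn) {
  setTimeoutFn = setTimeoutFn || setTimeout;      // injectable for testing
  clearTimeoutFn = clearTimeoutFn || clearTimeout;
  var timeoutId = null;
  return function(event) {
    if (timeoutId !== null) {
      clearTimeoutFn(timeoutId);  // a newer event arrived: drop the stale one
    }
    timeoutId = setTimeoutFn(function() {
      timeoutId = null;
      handler(event);             // only the last event is actually handled
    }, timeInterval);
  };
}
```

Attaching such a wrapper for the <code>liveChange</code> event is essentially what <code>bufferEvents</code> automates, on top of the <code>eventId</code> validation and progress callbacks.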
<h1><a id="user-content-monitoring-wait-progress" class="anchor" href="#monitoring-wait-progress" aria-hidden="true"></a>Monitoring wait progress</h1>
The final 2 arguments to <code>bufferEvents</code> are optional, and may be used by the application to monitor the waiting period between the occurrence of the last event and the time when the handler will actually be called:
<ul>
<li><code>progressHandler</code>: when passed, this should be a callback function which is to be called at the start and during the waiting period. If the <code>progressHandler</code> callback is called, it will be called using the <code>listener</code> as scope. The callback will be passed a floating point number between <code>0</code> and <code>1</code>, indicating the fraction of the time that has passed between the last event and now. If a <code>progressHandler</code> is specified, it is always called at least once and passed <code>0</code> whenever a new waiting period is initiated. In this example we passed <code>this.searchFieldProgress</code>, which is a method of the controller that updates the <a href="https://openui5.hana.ondemand.com/#/api/sap.m.ProgressIndicator" rel="nofollow"><code>sap.m.ProgressIndicator</code></a> that sits in the right top of the data grid.</li>
<li><code>progressUpdateInterval</code>: this should be an integer, indicating the number of milliseconds between the calls to the <code>progressHandler</code>. In our example it is 50, which means we will get <code>1000 / 50 = 20</code> updates during the waiting period, which ensures a smooth and regular update of the <code>ProgressIndicator</code> control.</li>
</ul>
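To make the arithmetic concrete, here is an illustrative helper (my own, not part of the utility) that lists the sequence of fractions the <code>progressHandler</code> would see over one waiting period with the sample's settings:

```javascript
// Illustrative only: the fractions passed to progressHandler over one
// waiting period, from the initial 0 up to and including 1.
function progressFractions(timeInterval, progressUpdateInterval) {
  var fractions = [];
  for (var t = 0; t <= timeInterval; t += progressUpdateInterval) {
    fractions.push(t / timeInterval);
  }
  return fractions;
}
var f = progressFractions(1000, 50);
console.log(f.length);    // 21: the initial 0 plus 20 updates
console.log(f[1], f[20]); // 0.05 1
```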
<h1><a id="user-content-the-progressindicator" class="anchor" href="#the-progressindicator" aria-hidden="true"></a>The ProgressIndicator</h1>
The sample application provides a <a href="https://openui5.hana.ondemand.com/#/api/sap.m.ProgressIndicator"><code>sap.m.ProgressIndicator</code></a> to indicate when the entered search term will be used to filter the data.
A progress indicator may not be necessary in case the <code>timeInterval</code> is so short that it will appear to the user as if the event is handled immediately. But when the <code>timeInterval</code> exceeds <code>200</code> or <code>250</code> milliseconds, most users will start to experience a noticeable lag.
Now, there is a strange psychological phenomenon at work here: while the user is still typing their search term, they will be happy that the backend query is not fired yet - it would make them feel rushed if the grid was constantly being updated while they were typing. But once the user is done typing their search term, they want the result as quickly as possible. Obviously, the software cannot read the user's mind (yet!), so once the user stops typing, the application needs to let the user know it has acknowledged their input, and that it is 'working on it'.
Hence the need for a progress indicator: by having a visual indicator that "something's happening", the user will be assured the application has acknowledged their input, and this will make the wait period before actually handling the event more acceptable.
If the wait is sufficiently short, a simple busyIndicator might do the trick, but since the <code>progressHandler</code> gets passed an exact estimate of how much longer the user will need to wait, our progress indicator can communicate this to the user. This will make the application's behavior more predictable and hopefully more satisfying to use.
Of course, it is not absolutely necessary to use the <a href="https://openui5.hana.ondemand.com/#/api/sap.m.ProgressIndicator"><code>sap.m.ProgressIndicator</code></a> to give this kind of feedback to the user. It's just that for this sample, this was the easiest, most straightforward illustration of this principle. You can use the <code>progressHandler</code> callback to do anything you like to fit your need.
<h1><a id="user-content-detaching" class="anchor" href="#detaching" aria-hidden="true"></a>Detaching</h1>
The <code>bufferEvents</code> method will create and attach a handler to the eventProvider. <code>bufferEvents</code> will also return that generated handler so you can detach it explicitly from the <code>eventProvider</code> if you need to. As a convenience, the returned handler provides its own <code>detach()</code> method for this purpose:
<pre>var bufferedEventHandlerInstance = bufferedEventHandler.bufferEvents(...);
...
<b>bufferedEventHandlerInstance.detach()</b>;
</pre>
(Note that in a typical scenario, the eventHandler and the eventProvider will almost certainly be in the same scope and lifecycle, so there is rarely a need to explicitly do this.)
<h1>Other Use Cases</h1>
The live search scenario may not always be a convincing use case. For example, if the query is done against a client model rather than a remote backend system, then it might not actually be a problem to re-issue the query for every keystroke. But there are other scenarios that benefit from event buffering. We will encounter one such case in the tip about <a href="https://github.com/just-bi/ui5tips/wiki/Persistent-UI-State">Persisting UI State</a>.
<h1>Finally</h1>
Did you like this tip? Do you have a better tip? Feel free to post a comment and share your approach to the same or similar problem.
Want more tips? Find other posts with the <a href="https://blogs.sap.com/tag/ui5tips/">ui5tips tag</a>!rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-19433416576026356842024-01-03T12:17:00.002+01:002024-01-03T12:27:58.987+01:00UI5 Tips: Manipulating the sap.m.TabContainer close buttons with custom CSSHere's a ui5tip to show how you can change the look and feel of the <a href="https://openui5.hana.ondemand.com/#/api/sap.m.TabContainer" rel="nofollow"><code>sap.m.TabContainer</code></a> with a minimal amount of custom CSS. If you want to try this for yourself, be sure to <a href="https://github.com/just-bi/ui5tips/tree/main/tabcontainer">check out the sample application from github</a>.
<h1>The sap.m.TabContainer</h1>
The <a href="https://openui5.hana.ondemand.com/#/api/sap.m.TabContainer" rel="nofollow"><code>sap.m.TabContainer</code></a> provides a simple, no-nonsense widget to build a tabbed user interface (check out <a href="https://openui5.hana.ondemand.com/entity/sap.m.TabContainer" rel="nofollow">the samples</a>). Tabs can be added via <a href="https://openui5.hana.ondemand.com/api/sap.m.TabContainer#aggregations" rel="nofollow">the items aggregation</a>, which should contain a collection of <a href="https://openui5.hana.ondemand.com/#/api/sap.m.TabContainerItem" rel="nofollow"><code>sap.m.TabContainerItem</code></a>'s.
While this control generally suits my needs, it has one feature I find problematic: each tab always has a close button, which appears as a little 'X' icon on the right side of the tab. If the user clicks it, it will actually 'close' the tab, that is: the respective <code>sap.m.TabContainerItem</code> will be removed from the TabContainer.
See the screenshot below to see what the default looks like (close buttons highlighted in red):<img src="https://github.com/just-bi/ui5tips/raw/main/tabcontainer/images/DefaultTabContainer_closebutton.png?raw=true" alt="default sap.m.TabContainer with close buttons on each tab." />
<h1><a id="user-content-suppressing-the-close-action" class="anchor" href="#suppressing-the-close-action" aria-hidden="true"></a>Suppressing the close action</h1>
The openui5 samples show how you can suppress that behavior: you can write an event handler for <a href="https://sapui5.hana.ondemand.com/#/api/sap.m.TabContainer%23events/itemClose" rel="nofollow">the <code>itemClose</code> event</a>, and then call <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.base.Event%23methods/preventDefault" rel="nofollow">the <code>preventDefault()</code> method</a> on the event:
(In the view xml:)
<pre><<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainer</span> <b><span class="pl-e">itemClose</span>=<span class="pl-s"><span class="pl-pds">"</span>onTabContainerItemClose<span class="pl-pds">"</span></span></b>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
...
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
...
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainer</span>>
</pre>
(In the controller javascript:)
<pre> <b>onTabContainerItemClose</b>: <span class="pl-k">function</span><span class="pl-kos">(</span><span class="pl-s1">event</span><span class="pl-kos">)</span><span class="pl-kos">{</span>
<span class="pl-s1">event</span><span class="pl-kos">.</span><b><span class="pl-en">preventDefault</span><span class="pl-kos">(</span><span class="pl-kos">)</span></b><span class="pl-kos">;</span>
<span class="pl-kos">}</span>
</pre>
Obviously, it would be strange if we always prevented the tab from being closed: suppressing the default close action only makes sense in a context where the user is supposed to be able to close the tab at all. In such a case, the event handler could pop up a dialog asking the user whether they really meant to close the tab or want to keep it open.
But the use case I frequently encounter is that the tab should not be closeable in the first place. While suppressing the close action would ensure the tab is never closed, it would confuse and anger the user, as the close button itself would still be there, inviting users to perform an action that can never be fulfilled.
<h1><a id="user-content-using-the-other-tab-widget" class="anchor" href="#using-the-other-tab-widget" aria-hidden="true"></a>Using the other Tab widget</h1>
One might suggest using the <a href="https://sapui5.hana.ondemand.com/#/api/sap.m.IconTabBar" rel="nofollow"><code>sap.m.IconTabBar</code></a> widget instead of the <code>sap.m.TabContainer</code>. The <code>sap.m.IconTabBar</code> takes <a href="https://sapui5.hana.ondemand.com/#/api/sap.m.IconTabFilter" rel="nofollow"><code>sap.m.IconTabFilter</code></a>'s in its items collection, and these do not have a close button.
Now, in some cases the <code>sap.m.IconTabBar</code>/<code>sap.m.IconTabFilter</code> may suit your needs and then you're fine. However I find that it has a number of other drawbacks (which I won't get into here).
Besides, the <code>sap.m.IconTabBar</code> introduces a similar problem, but the other way around: whereas we cannot get rid of the close button in the <code>sap.m.TabContainer</code>, we can never have a close button in the <code>sap.m.IconTabBar</code>. What we really want is a property that lets us control whether a tab has a close button or not.
<h1><a id="user-content-css-to-the-rescue" class="anchor" href="#css-to-the-rescue" aria-hidden="true"></a>CSS to the rescue</h1>
In the previous section we argued that we'd really like to be able to control for each individual tab whether it has a close button at all, for example, by setting a property.
To add a property one would normally have to extend a ui5 control, and attach some code so that the property setting can somehow influence the behavior of the control - in this case, control whether or not the close button will be displayed. While this is probably possible (I haven't tried it for this case), it does seem like an extraordinary measure for such a humble request.
I found that a similar effect can be achieved by <a href="https://sapui5.hana.ondemand.com/1.32.7/docs/guide/723f4b2334e344c08269159797f6f796.html" rel="nofollow">applying some custom CSS</a> in combination with standard ui5 features. That's what this entire sample is about.
With this tip you can:
<ul>
<li>hide all close buttons for an entire <code>sap.m.TabContainer</code></li>
<li>hide the close button on an individual <code>sap.m.TabContainerItem</code></li>
<li>show the close button on an individual <code>sap.m.TabContainerItem</code> in case the close buttons are hidden by default on the <code>sap.m.TabContainer</code></li>
</ul>
All this functionality requires the inclusion of some css. In the sample this is all isolated in a single <a href="https://github.com/just-bi/ui5tips/blob/main/tabcontainer/css/ui5-customization.css"><code>ui5-customization.css</code> file</a>, which is included into the application by <a href="https://sapui5.hana.ondemand.com/1.32.7/docs/guide/723f4b2334e344c08269159797f6f796.html" rel="nofollow">declaring it in the manifest.json</a>.
<h1><a id="user-content-hiding-all-close-buttons" class="anchor" href="#hiding-all-close-buttons" aria-hidden="true"></a>Hiding all close buttons</h1>
To hide all the close buttons for all <code>sap.m.TabContainerItem</code> in the items collection of a particular <code>sap.m.TabContainer</code>, simply add the <code>noCloseButtons</code> style class:
<pre><<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainer</span>
<b><span class="pl-e">class</span>=<span class="pl-s"><span class="pl-pds">"</span>noCloseButtons<span class="pl-pds">"</span></span></b>
>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
<span class="pl-c"><span class="pl-c"><!--</span> note: close button will be hidden by default for each m:TabContainerItem <span class="pl-c">--></span></span>
...
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainer</span>>
</pre>
This works because the <code>class</code> property in the ui5 xml view is rendered to the html dom directly. So the html elements that ui5 creates to implement the TabContainer widget will be selectable with a class selector in css, and this is how we can relatively simply influence the look of our TabContainer through css.
In our <code>ui5-customization.css</code> file, this is how we use <code>noCloseButtons</code> class to hide the buttons:
<pre><span class="pl-ent">div</span>.<span class="pl-c1">sapMTabContainer</span><b>.<span class="pl-c1">noCloseButtons</span></b> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStripContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStrip</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSTabsContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSTabs</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStripItem</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSItemCloseBtnCnt</span> {
<b><span class="pl-c1">visibility</span>: hidden;</b>
}
</pre>
(Note the initial selector, <code>div.sapMTabContainer.</code><strong><code>noCloseButtons</code></strong>, and the chain of <code>></code> child selectors, which target the actual bit of html that renders the close button; it is hidden simply by setting the css <code>visibility</code> property to <code>hidden</code>.)
In the sample application, you can see this behavior in action in the <a href="https://github.com/just-bi/ui5tips/blob/main/tabcontainer/components/app/App.view.xml">App.view.xml</a>. This contains the code for the outermost tabcontainer. A screenshot is shown below, and as you can see, both tabs ("Hide individually" and "Show individually") lack a close button:
<img src="https://github.com/just-bi/ui5tips/raw/main/tabcontainer/images/TabContainer_noCloseButtons.png?raw=true" alt="Hiding the close buttons on a sap.m.TabContainer by applying the noCloseButtons css class." />
<h1><a id="user-content-hiding-an-individual-close-button" class="anchor" href="#hiding-an-individual-close-button" aria-hidden="true"></a>Hiding an individual close button</h1>
If we can add a custom css class to <code>sap.m.TabContainer</code> to hide all close buttons, then surely it should be possible to follow the same approach for an individual <code>sap.m.TabContainerItem</code>, right? Yes, it should, <a href="https://github.com/SAP/openui5/issues/2946">but sadly, we cannot</a>. (The reason is that <code>sap.m.TabContainer</code> is a subclass of <code>sap.ui.core.Control</code>, which provides an <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.core.Control%23methods/addStyleClass" rel="nofollow"><code>addStyleClass()</code> method</a>, whereas <code>sap.m.TabContainerItem</code> is a subclass of <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.core.Element" rel="nofollow"><code>sap.ui.core.Element</code></a>, which does not have such a method.)
Now, let's take a step back and think about how we used the css style class on the <code>sap.m.TabContainer</code> to hide all the close buttons. By setting the custom css style class on the <code>sap.m.TabContainer</code>, the html dom was changed to include the custom class, and we could then use that in a css selector.
So even if we cannot apply a css style class to a <code>sap.m.TabContainerItem</code>, might there be another way to influence how ui5 writes the html dom, so that we may target it with a selector in our custom css? It turns out that such a mechanism exists, in the form of <a href="https://sapui5.hana.ondemand.com/1.36.6/docs/guide/1ef9fefa2a574735957dcf52502ab8d0.html" rel="nofollow">ui5 custom data</a>.
The <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.core.Element%23aggregations" rel="nofollow">custom data aggregation</a> is provided by <code>sap.ui.core.Element</code> and thus available to its subclasses, including <code>sap.m.TabContainerItem</code>. A <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.core.CustomData" rel="nofollow">custom data item</a> is an arbitrary key/value pair, and by setting its <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.core.CustomData%23constructor" rel="nofollow"><code>writeToDom</code> property</a>, ui5 will render it to the html dom as a <a href="https://developer.mozilla.org/en-US/docs/Learn/HTML/Howto/Use_data_attributes" rel="nofollow">html data attribute</a>.
To see what it looks like in our sample, take a look at <a href="https://github.com/just-bi/ui5tips/blob/main/tabcontainer/components/app/TabContainerItemWithHiddenCloseButton.fragment.xml#L45"><code>TabContainerItemWithHiddenCloseButton.fragment.xml</code></a>, which uses it to hide the close button in the second <code>sap.m.TabContainerItem</code>, in an otherwise normal <code>sap.m.TabContainer</code>:
<pre><<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>
<span class="pl-e">id</span>=<span class="pl-s"><span class="pl-pds">"</span>item2<span class="pl-pds">"</span></span>
<span class="pl-e">name</span>=<span class="pl-s"><span class="pl-pds">"</span>No Close Button<span class="pl-pds">"</span></span>
>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">customData</span>>
<b><<span class="pl-ent">core</span><span class="pl-ent">:</span><span class="pl-ent">CustomData</span> <span class="pl-e">writeToDom</span>=<span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span> <span class="pl-e">key</span>=<span class="pl-s"><span class="pl-pds">"</span>noCloseButton<span class="pl-pds">"</span></span> <span class="pl-e">value</span>=<span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span>/></b>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">customData</span>>
...
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
</pre>
Because the <code>CustomData</code>'s <code>writeToDom</code> property is set to <code>true</code>, ui5 will render a html data attribute into the html dom that looks something like this:
<div>
<pre>data-noclosebutton='true'</pre>
</div>
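Note that the <code>noCloseButton</code> key ends up lowercased in the attribute name. The mapping can be sketched as follows (the helper function here is purely our own illustration, not a ui5 API):

```javascript
// Sketch of how a CustomData key/value pair with writeToDom=true ends up
// as a html data attribute: the key is lowercased and prefixed with
// "data-", and the value is written as the attribute value.
// (customDataToDomAttribute is a hypothetical helper, not a ui5 API.)
function customDataToDomAttribute(key, value) {
  return {
    name: "data-" + key.toLowerCase(),
    value: String(value)
  };
}

// The CustomData from the fragment above...
const attribute = customDataToDomAttribute("noCloseButton", true);
// ...yields: { name: "data-noclosebutton", value: "true" }
```

This lowercasing matters when writing the css selector: the attribute to match is <code>data-noclosebutton</code>, not <code>data-noCloseButton</code>.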
And in our <a href="https://github.com/just-bi/ui5tips/blob/main/tabcontainer/css/ui5-customization.css#L41"><code>ui5-customization.css</code> file</a>, the following rule is intended to pick that up and hide the close button:
<pre><span class="pl-ent">div</span>.<span class="pl-c1">sapMTabContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStripContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStrip</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSTabsContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSTabs</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStripItem</span>[<b><span class="pl-c1">data-noclosebutton</span><span class="pl-c1">=</span><span class="pl-s">'true'</span></b>] <span class="pl-c1">></span> .<span class="pl-c1">sapMTSItemCloseBtnCnt</span> {
<b><span class="pl-c1">visibility</span>: hidden;</b>
}
</pre>
As you can see, it is very similar to the rule we used to hide all close buttons on any <code>sap.m.TabContainer</code> having the <code>noCloseButtons</code> class, except that now the class is missing and instead we use a css attribute selector based on the data attribute:
<pre>.<span class="pl-c1">sapMTabStripItem</span>[<span class="pl-c1">data-noclosebutton</span><span class="pl-c1">=</span><span class="pl-s">'true'</span>]
</pre>
The screenshot below shows what it looks like in the app. Note that the tab named "Default" has the close button as usual, but the one named "No Close Button" - the one with the custom data attribute - does not show a close button:
<img src="https://github.com/just-bi/ui5tips/raw/main/tabcontainer/images/TabContainerItem_noCloseButton.png?raw=true" alt="Individual TabContainerItem with a hidden close button." />
<h1><a id="user-content-showing-an-individual-close-button" class="anchor" href="#showing-an-individual-close-button" aria-hidden="true"></a>Showing an individual close button</h1>
The final hack in this sample combines the style class and the custom data attribute. CSS allows us to write a selector that takes both the presence of the css style class and the presence of a html data attribute into account. We can put this to good use if we want a <code>sap.m.TabContainer</code> that hides all close buttons by default, but undoes the hiding of the close button for specific <code>sap.m.TabContainerItem</code>'s, based on the value of a data attribute (which is in turn controlled by the ui5 Custom Data feature).
You can see this in action in the <a href="https://github.com/just-bi/ui5tips/blob/main/tabcontainer/components/app/TabContainerItemWithHiddenCloseButtons.fragment.xml"><code>TabContainerItemWithHiddenCloseButtons.fragment.xml</code> file</a> of the example:
<pre><<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainer</span>
<b><span class="pl-e">class</span>=<span class="pl-s"><span class="pl-pds">"</span>noCloseButtons<span class="pl-pds">"</span></span></b>
>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span> <span class="pl-e">name</span>=<span class="pl-s"><span class="pl-pds">"</span>Default<span class="pl-pds">"</span></span>>
...
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span> <span class="pl-e">name</span>=<span class="pl-s"><span class="pl-pds">"</span>Show Close Button<span class="pl-pds">"</span></span>>
<<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">customData</span>>
<b><<span class="pl-ent">core</span><span class="pl-ent">:</span><span class="pl-ent">CustomData</span> <span class="pl-e">key</span>=<span class="pl-s"><span class="pl-pds">"</span>noCloseButton<span class="pl-pds">"</span></span> <span class="pl-e">value</span>=<span class="pl-s"><span class="pl-pds">"</span>false<span class="pl-pds">"</span></span> <span class="pl-e">writeToDom</span>=<span class="pl-s"><span class="pl-pds">"</span>true<span class="pl-pds">"</span></span> /></b>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">customData</span>>
..
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainerItem</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">items</span>>
</<span class="pl-ent">m</span><span class="pl-ent">:</span><span class="pl-ent">TabContainer</span>>
</pre>
Again the second <code>sap.m.TabContainerItem</code> has a <code>CustomData</code> item with the key <code>noCloseButton</code>, but now the value is <code>false</code>, so as to override the effect of the <code>noCloseButtons</code> style class applied to the <code>sap.m.TabContainer</code>.
In our <a href="https://github.com/just-bi/ui5tips/blob/main/tabcontainer/css/ui5-customization.css#L57"><code>ui5-customization.css</code> file</a>, the following rule is intended to pick that up and show the close button:
<pre><span class="pl-ent">div</span>.<span class="pl-c1">sapMTabContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStripContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStrip</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSTabsContainer</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTSTabs</span> <span class="pl-c1">></span> .<span class="pl-c1">sapMTabStripItem</span>[<b><span class="pl-c1">data-noclosebutton</span><span class="pl-c1">=</span><span class="pl-s">'false'</span></b>] <span class="pl-c1">></span> .<span class="pl-c1">sapMTSItemCloseBtnCnt</span> {
<b><span class="pl-c1">visibility</span>: visible;</b>
}
</pre>
The screenshot below shows what it looks like when you run the sample application:
<img src="https://github.com/just-bi/ui5tips/raw/main/tabcontainer/images/TabContainerItem_noCloseButton_false.png?raw=true" alt="Show the close button in an individual TabContainerItem, overriding the effect of the noCloseButtons style class of the sap.m.TabContainer." />
<h1>Finally</h1>
Did you like this tip? Do you have a better tip? Feel free to post a comment and share your approach to the same or similar problem.
Want more tips? Find other posts with the <a href="https://blogs.sap.com/tag/ui5tips/">ui5tips tag</a>!rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-10526430861262424812024-01-03T12:15:00.000+01:002024-01-03T12:15:45.461+01:00UI5 Tips: Adding a Splash-Screen / Loading indicatorUI5 apps always take a little time to load. If you don't take any precautions, the user will be looking at a blank screen while the app is loading.
It is fairly simple to add a splash screen and loading indicator to improve the user experience. This will give your app a more professional appearance, and it will cost minimal effort.
This post shows you exactly how to do this. You might also be interested in checking out <a href="https://github.com/just-bi/ui5tips/tree/main/loadingscreen">the sample app from github</a> so you can run it yourself.
<h1><a id="user-content-running-the-app" class="anchor" href="#running-the-app" aria-hidden="true"></a>Running the app</h1>
The loadingscreen sample app is in the <a href="../tree/main/loadingscreen">loadingscreen</a> subdirectory of this repository. Simply expose the folder and all its contents with a webserver, and navigate to its index.html.
<h1><a id="user-content-how-does-it-work" class="anchor" href="#how-does-it-work" aria-hidden="true"></a>How does it work?</h1>
In various tutorials and the <a href="https://sapui5.hana.ondemand.com/#/topic/3da5f4be63264db99f2e5b04c5e853db" rel="nofollow">ui5 walkthrough</a>, the <code><body></code> is usually left empty. This is for a good reason - any content that is in the body would simply show up on the page, as is shown in the walkthrough's <a href="https://sapui5.hana.ondemand.com/#/topic/2680aa9b16c14a00b01261d04babbb39" rel="nofollow">"Hello world!" example</a>. But there is an exception: if an element has an <code>id</code> attribute with the value <code>busyIndicator</code>, then this content will be hidden after the ui5 bootstrap code is finished and the app content is placed inside the body.
In the loadingscreen sample, the index.html page has a static <code><div></code> element with an <code>id</code> attribute with the value <code>busyIndicator</code> as static content in the <code><body></code> element. Inside that <code><div></code> we can place anything we like. In the example, the div contains a header and a footer with static text to indicate that the application is loading. Between the header and the footer is an image of the ui5 logo, and a css animation, which superficially resembles the ui5 busy indicator animation:
<pre><body class="sapUiBody" id="content">
<!-- Loading splash screen -->
<div id="busyIndicator" style="text-align: center; font-family: Sans, Arial">
<!-- static header text -->
<h3>My UI5 App is loading</h3>
<!-- static image of the ui5logo -->
<img src="images/openui5-logo.png" class="logo"/>
<!--
loader animation
CC0 licensed code used with permission from
https://loading.io/css/
-->
<div class="lds-ellipsis">
<div></div>
<div></div>
<div></div>
<div></div>
</div>
<!-- end of loader animation -->
<!-- static footer text -->
<center><h5>This may take a few moments...</h5></center>
</div>
</body>
</pre>
The css animation requires some css, and this is included simply as a static css resource by including an appropriate <code><link></code> element in the <code><head></code> of the page:
<pre><head>
<!-- css required for the loading screen css animation -->
<link id="animation" rel="stylesheet" type="text/css" href="css/progress-animation.css"/>
</head>
</pre>
In this case, we pulled the css animation from the excellent site <a href="https://loading.io/css/" rel="nofollow">https://loading.io/css/</a>, which provides many different free css animations.
Note that any css required by the loading screen must really be included statically via the <code><link></code> or <code><style></code> element. The standard ui5 mechanism to <a href="https://sapui5.hana.ondemand.com/#/topic/723f4b2334e344c08269159797f6f796" rel="nofollow">include css by declaring it in the manifest.json</a> is no good as it will be loaded as part of the ui5 bootstrap, and the whole idea of the loading screen is to show something <strong>before</strong> the ui5 bootstrap even starts.
<h1><a id="user-content-what-does-it-look-like" class="anchor" href="#what-does-it-look-like" aria-hidden="true"></a>What does it look like?</h1>
This is what it looks like when the app is loading:
<img src="https://github.com/just-bi/ui5tips/raw/main/loadingscreen/images/loadingscreen.png?raw=true" alt="Screenshot of the loading screen" />
The text, logo image, and the loader animation are all static content and will show during UI5 bootstrap.
<h1><a id="user-content-next-steps" class="anchor" href="#next-steps" aria-hidden="true"></a>Next Steps</h1>
Obviously, this loading screen is only an example to show how you can make it work. The entire design of the loading screen is up to you.
Just remember to keep it light and quick: the whole reason to include a loading screen in the first place was to give the user something to look at while ui5 is bootstrapping. If the loading screen itself requires a lot of resources, it defeats its purpose. For this reason, you might consider including any css directly using the <code><style></code> element, rather than relying on a network request to load external css with the <code><link></code> element.
<h1>Finally</h1>
Did you like this tip? Do you have a better tip? Feel free to post a comment and share your approach to the same or similar problem.
Want more tips? Find other posts with the <a href="https://blogs.sap.com/tag/ui5tips/">ui5tips tag</a>!rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-52078364607608262002021-02-22T02:49:00.004+01:002021-02-22T02:49:54.157+01:00Year-to-Date on Synapse Analytics 5: Using Window FunctionsFor one of our <a href="https://www.just-bi.nl/" target="_justbi">Just-BI</a> customers we implemented a Year-to-Date calculation in a Azure Synapse Backend.
We encountered a couple of approaches, and in this series I'd like to share some sample code and discuss the merits and drawbacks of each approach.
<br/>
<br/>
<b>TL;DR</b>: <a href="#windowfunction">A Year-to-Date solution based on a <code><b style="color:magenta">SUM</b>()</code> window function</a> is simple to code and maintain as well as efficient to execute.
This as compared to a number of alternative implementations, namely a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">self-<code style="color: blue">JOIN</code></a> (combined with a <code style="color: blue">GROUP BY</code>), a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery</a>, and a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code style="color: blue">UNION</code></a> (also combined with a <code style="color: blue">GROUP BY</code>).
<br/>
<br/>
Note: this is the 5th post in a series.
<ul>
<li>For sample data and setup, please <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-1.html">see the 1st post</a> in this series. </li>
<li>For a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">solution based on a self-<code><b style="color:blue">JOIN</b></code> and <code><b style="color:blue">GROUP BY</b></code></a>, please find <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">the 2nd post</a> in this series.</li>
<li>For a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">solution based on a subquery</a>, please find <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">the 3rd post</a> in this series.</li>
<li>For a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html">solution based on a <code><b style="color:blue">UNION</b></code></a>, please find <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html">the 4th post</a> in this series.</li>
</ul>
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL Engines and RDBMS-es.)
<br/>
<br/>
<h3><a name="windowfunction">Using window functions</a></h3>
<br/>
Nowadays, many SQL engines and virtually all major RDBMSes support <a href="https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15" target="mssql">window functions</a> (sometimes called analytic functions).
A window function looks like a classic aggregate function. In some respects it also behaves like one, but at the same time there are essential differences.
<br/>
<br/>
<h4>Aggregate functions</h4>
<br/>
Consider the following example:<br>
<pre>
<b style="color: blue">select</b> <b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">as</b> SumOfSalesAmount
, <b style="color: magenta">count</b>(*) <b style="color: blue">as</b> RowCount
<b style="color: blue">from</b> SalesYearMonth
</pre>
The example uses two <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/aggregate-functions-transact-sql?view=sql-server-ver15" target="mssql">aggregate functions</a>, <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/sum-transact-sql?view=sql-server-ver15" target="mssql"><code style="color:magenta">SUM()</code></a> and <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/count-transact-sql?view=sql-server-ver15" target="mssql"><code style="color:magenta">COUNT()</code></a>. It returns a result like this:
<br/>
<br/>
<code><table style="font-family: courier,monospace;" border="1" cellpadding="5" cellspacing="5">
<thead>
<tr>
<th>SumOfSalesAmount</th>
<th>RowCount</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">109,846,381.43</td>
<td style="text-align: right">38</td>
</tr>
</tbody>
</table></code>
<br/>
Two things are happening here: <ul>
<li>Even though there are multiple rows in the <code>SalesYearMonth</code> table, the result consists of just one row. In other words, a collection of source rows has been <i>aggregated</i> into fewer (in this case, only one) result rows.</li>
<li>The functions have calculated a value based on some aspect of the individual rows in the source collection. In the case of <code><span style="color:magenta">SUM</span>(SalesAmount)</code>, the value of the <code>SalesAmount</code> column of each individual row was added to obtain a total. In the case of <code><span style="color:magenta">COUNT</span>(*)</code>, each row was counted, adding up to the total number of rows.</li>
</ul>
Because the previous example uses aggregate functions, we cannot also select any non-aggregated columns. For example, while <code>SalesYear</code> and <code>SalesMonth</code> are present in the individual underlying rows, we cannot simply select them, because they do not exist in the result row, which is an aggregate.
<br/>
<br/>
<h4>Window functions</h4>
<br/>
Now, <code style="color:magenta">SUM()</code> and <code style="color:magenta">COUNT()</code> also exist as window functions. Consider the following query:
<pre>
<b style="color: blue">select</b> SalesYear
, SalesMonth
, SalesAmount
, <b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">over</b>() <b style="color: blue">as</b> TotalOfSalesAmount
, <b style="color: magenta">count</b>(*) <b style="color: blue">over</b>() <b style="color: blue">as</b> RowCount
<b style="color: blue">from</b> SalesYearMonth
</pre>
You might notice that the last two expressions in the <code><b style="color: blue">SELECT</b></code>-list look almost identical to the aggregate functions in the previous example.
The difference is that in this query, the function call is followed by an <code><b style="color:blue">OVER()</b></code>-clause.
Syntactically, this is what distinguishes ordinary aggregate functions from window functions.
<br/>
<br/>
Here is its result: <br/>
<br/>
<code><table style="font-family: courier,monospace;" border="1" cellpadding="5" cellspacing="5">
<thead>
<tr>
<th>SalesYear</th>
<th>SalesMonth</th>
<th>SalesAmount</th>
<th>TotalOfSalesAmount</th>
<th>RowCount</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">2011</td>
<td style="text-align: right">5</td>
<td style="text-align: right">503,805.92</td>
<td style="text-align: right">109,846,381.43</td>
<td style="text-align: right">38</td>
</tr>
<tr>
<td style="text-align: right">2011</td>
<td style="text-align: right">6</td>
<td style="text-align: right">458,910.82</td>
<td style="text-align: right">109,846,381.43</td>
<td style="text-align: right">38</td>
</tr>
<tr>
<td colspan="5" style="text-align: center">...more rows...</td>
</tr>
<tr>
<td style="text-align: right">2014</td>
<td style="text-align: right">6</td>
<td style="text-align: right">49,005.84</td>
<td style="text-align: right">109,846,381.43</td>
<td style="text-align: right">38</td>
</tr>
</tbody>
</table></code><br/>
Note that we now get all the rows from the underlying <code>SalesYearMonth</code> table: no aggregation has occurred.
But the window functions do return a result that is identical to the one we got when using them as aggregate functions, and they do so for each row of the <code>SalesYearMonth</code> table.
<br/>
<br/>
It's as if for each row of the underlying table, the respective aggregate function was called over all rows in the entire table.
Conceptually this is quite like the construct we used in the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a>.
The following example illustrates this:<pre>
<b style="color: blue">select</b> SalesYear
, SalesMonth
, SalesAmount
, (
<b style="color: blue">select</b> <b style="color: magenta">sum</b>(SalesAmount)
<b style="color: blue">from</b> SalesYearMonth
) <b style="color: blue">as</b> TotalOfSalesAmount
, (
<b style="color: blue">select</b> <b style="color: magenta">count</b>(*)
<b style="color: blue">from</b> SalesYearMonth
) <b style="color: blue">as</b> RowCount
<b style="color: blue">from</b> SalesYearMonth
</pre>
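This equivalence is easy to verify with a quick script. The following is a minimal sketch using Python's built-in <code>sqlite3</code> module (it assumes an SQLite build with window-function support, version 3.25 or later); the miniature <code>SalesYearMonth</code> table and its numbers are made up for illustration, not the figures shown above:

```python
import sqlite3

# A miniature, hypothetical stand-in for the SalesYearMonth table
# (made-up numbers, not the actual figures from the article).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth(SalesYear int, SalesMonth int, SalesAmount real)"
)
conn.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, 5, 100.0), (2011, 6, 200.0), (2012, 1, 50.0)],
)

# Window-function version: SUM() OVER () and COUNT(*) OVER () repeat the
# grand total and the row count on every row.
window_rows = conn.execute("""
    select SalesYear, SalesMonth, SalesAmount
         , sum(SalesAmount) over () as TotalOfSalesAmount
         , count(*) over ()         as RowCount
    from SalesYearMonth
    order by SalesYear, SalesMonth
""").fetchall()

# Subquery version: one scalar subquery per expression yields the same rows.
subquery_rows = conn.execute("""
    select SalesYear, SalesMonth, SalesAmount
         , (select sum(SalesAmount) from SalesYearMonth) as TotalOfSalesAmount
         , (select count(*)         from SalesYearMonth) as RowCount
    from SalesYearMonth
    order by SalesYear, SalesMonth
""").fetchall()

print(window_rows == subquery_rows)  # the two queries return identical rows
```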
<h4>The <i>window</i> and the <code><b style="color:blue">OVER()</b></code>-clause</h4>
<br/>
Thinking about window functions as a shorthand for a subquery helps to understand how they work and also explains their name:
a window function returns the result of an aggregate function on a particular subset of the rows in the query scope.
This subset is called the <i>window</i> and it is defined by the <code><b style="color: blue">OVER</b>()</code>-clause.
<br/>
<br/>
The parentheses after the <b style="color: blue">OVER</b>-keyword can be used to define which rows are considered part of the window.
When left empty (as in the example above), all rows are considered.
<br/>
<br/>
<h4>Controlling the <i>window</i> using the <code><b style="color:blue">PARTITION BY</b></code>-clause</h4>
<br/>
If you compare the previous example with our prior <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a>, you'll notice that here, we do not have a <code><b style="color:blue">WHERE</b></code>-clause to tie the subquery to the current row of the outer query. That's why our result is calculated over the entire table, rather than with respect to the current year and preceding months, as in our prior <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a>.
This is equivalent to the empty parentheses following the <b style="color: blue">OVER</b>-keyword in the corresponding window-function example.
<br/>
<br/>
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a> we wrote a <code><b style="color:blue">WHERE</b></code>-clause to specify a condition that ties the rows of the subquery to the current row.
For window functions, we can control which rows make up the window by writing a <code><b style="color:blue">PARTITION BY</b></code>-clause inside the parentheses following the <b style="color: blue">OVER</b>-keyword.
<br/>
<br/>
The <code><b style="color:blue">PARTITION BY</b></code>-clause does not let you specify an arbitrary condition, like we could in a subquery.
Instead, the relationship between the current row and rows in the window must be expressed through one or more attributes for which they share a common value. The following example may illustrate this:
<pre>
<b style="color: blue">select</b> SalesYear
, SalesMonth
, SalesAmount
, <b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">over</b>(<b style="color: blue">partition by</b> SalesYear) <b style="color: blue">as</b> YearTotalOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
</pre>
In the example above,
<code><b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">over</b>(<b style="color: blue">partition by</b> SalesYear)</code>
means: calculate the total of <code>SalesAmount</code> over all rows where the value of <code>SalesYear</code> is equal to the value of the <code>SalesYear</code> in the current row.
<br/>
<br/>The equivalent query using subqueries would be:<pre>
<b style="color: blue">select</b> OriginalSales.SalesYear
, OriginalSales.SalesMonth
, OriginalSales.SalesAmount
, (
<b style="color: blue">select</b> <b style="color: magenta">sum</b>(YearSales.SalesAmount)
<b style="color: blue">from</b> SalesYearMonth <b style="color: blue">as</b> YearSales
<b style="color: blue">where</b> YearSales.SalesYear = OriginalSales.SalesYear
) <b style="color: blue">as</b> YearTotalOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth <b style="color: blue">as</b> OriginalSales
</pre>
The result is shown below:
<br/>
<br/>
<code><table style="font-family: courier,monospace;" border="1" cellpadding="5" cellspacing="5">
<thead>
<tr>
<th>SalesYear</th>
<th>SalesMonth</th>
<th>SalesAmount</th>
<th>YearTotalOfSalesAmount</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right">2011</td>
<td style="text-align: right">5</td>
<td style="text-align: right">503,805.92</td>
<td style="text-align: right">12,641,672.21</td>
</tr>
<tr>
<td style="text-align: right">2011</td>
<td style="text-align: right">6</td>
<td style="text-align: right">458,910.82</td>
<td style="text-align: right">12,641,672.21</td>
</tr>
<tr>
<td colspan="4" style="text-align: center">...more rows...</td>
</tr>
<tr>
<td style="text-align: right">2014</td>
<td style="text-align: right">6</td>
<td style="text-align: right">49,005.84</td>
<td style="text-align: right">20,057,928.81</td>
</tr>
</tbody>
</table></code>
(Note that <code>12,641,672.21</code> is the sum of the <code>SalesAmount</code> for <code>SalesYear 2011</code>; <code>20,057,928.81</code> is the total for <code>2014</code>.)
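The behavior of <code><b style="color:blue">PARTITION BY</b></code> can likewise be sketched with Python's <code>sqlite3</code> module (SQLite 3.25+ assumed); the miniature data below is hypothetical:

```python
import sqlite3

# Hypothetical miniature data: two years with a couple of months each.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth(SalesYear int, SalesMonth int, SalesAmount real)"
)
conn.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, 5, 100.0), (2011, 6, 200.0), (2012, 1, 50.0), (2012, 2, 70.0)],
)

# PARTITION BY SalesYear: each row is paired with the total of its own year.
rows = conn.execute("""
    select SalesYear, SalesMonth, SalesAmount
         , sum(SalesAmount) over (partition by SalesYear) as YearTotalOfSalesAmount
    from SalesYearMonth
    order by SalesYear, SalesMonth
""").fetchall()

year_totals = [row[3] for row in rows]
print(year_totals)  # 2011 rows carry 300.0; 2012 rows carry 120.0
```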
<br/>
<br/>
<h4>A partition for the preceding months?</h4>
<br/>
It's great that the <code><b style="color:blue">PARTITION BY</b></code>-clause allows us to specify a window for the relevant year, but that window is still too wide:
we want it to contain only the rows from the current year, and of those, only the current month and its preceding months.
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a> this was easy, as we could write whatever condition we wanted in the <code><b style="color:blue">WHERE</b></code>-clause.
So we wrote:<pre>
<b style="color: blue">where</b> SalesYtd.SalesYear = SalesOriginal.SalesYear
<b style="color: blue">and</b> SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
</pre>
Specifying <code>SalesYear</code> in the window functions' <code><b style="color:blue">PARTITION BY</b></code>-clause is equivalent to the first part of the subquery's <code><b style="color:blue">WHERE</b></code>-clause condition.
<br/>
<br/>
It's less clear what our partition expression should look like to select all months preceding the current month.
It's not impossible though.
We can, for instance, write an expression that marks whether the current <code>SalesMonth</code> is equal to or less than a <i>specific</i> month. For example:
<pre>
<span style="color: teal">-- every month up to and including june is 1, all months beyond june are 0</span>
<b style="color: blue">case</b>
<b style="color: blue">when</b> SalesMonth <= <b style="color: red">6</b> <b style="color: blue">then</b> <b style="color: red">1</b>
<b style="color: blue">else</b> <b style="color: red">0</b>
<b style="color: blue">end</b>
</pre>
If we can write such an expression, then of course, we can also use it in a <code><b style="color:blue">PARTITION BY</b></code>-clause, like so:
<pre>
<b style="color:magenta">sum</b>(SalesAmount) <b style="color:blue">over</b> (
<b style="color:blue">partition by</b>
SalesYear
, <b style="color: blue">case</b>
<b style="color: blue">when</b> SalesMonth <= <b style="color: red">6</b> <b style="color: blue">then</b> <b style="color: red">1</b>
<b style="color: blue">else</b> <b style="color: red">0</b>
<b style="color: blue">end</b>
)
</pre>
Let's try and think what this brings us.
<br/>
<br/>
Suppose the value of <code>SalesMonth</code> is <span style="color:red;">6</span> (june), or less.
The <b style="color:blue;">CASE</b>-expression would return <span style="color:red;">1</span>, and the window function would take into account all rows for which this is the case.
So january, february, march and so on, up to june, would all get the total of those six months: that is, the YTD value for june.
<br/>
<br/>
On the other hand, if <code>SalesMonth</code> is larger than <span style="color:red;">6</span>, the <b style="color:blue;">CASE</b> expression evaluates to <span style="color:red;">0</span>.
So all months beyond june (that is: july, august, and so on, up to december) form a partition as well, and for those months, the sum over that partition is returned.
<br/>
<br/>
Now, it's not really clear what the outcome means when the month is beyond june. But it doesn't really matter: what is important is that we now know how to calculate the correct YTD value for a given month.
And, what we did for june, we can do for any other month.
So, once we have the YTD expressions for each individual month, we can set up yet another <b style="color:blue">CASE</b>-expression to pick the right one according to the current <code>SalesMonth</code>.
<br/>
<br/>
Putting all that together, we get:
<pre>
<b style="color: blue">select</b> SalesYear
, SalesMonth
, SalesAmount
, <b style="color: blue">case</b> SalesMonth
<span style="color: teal">-- january</span>
<b style="color: blue">when</b> <b style="color: red">1</b> <b style="color: blue">then</b> SalesAmount
<span style="color: teal">-- february</span>
<b style="color: blue">when</b> <b style="color: red">2</b> <b style="color: blue">then</b>
<b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">over</b>(
<b style="color: blue">partition by</b>
SalesYear
, <b style="color: blue">case when</b> SalesMonth <= <b style="color: red">2</b> <b style="color: blue">then</b> <b style="color: red">1</b> <b style="color: blue">else</b> <b style="color: red">0</b> <b style="color: blue">end</b>
)
...more cases for the other months...
<span style="color: teal">-- december</span>
<b style="color: blue">when</b> <b style="color: red">12</b> <b style="color: blue">then</b>
<b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">over</b>(
<b style="color: blue">partition by</b>
SalesYear
, <b style="color: blue">case when</b> SalesMonth <= <b style="color: red">12</b> <b style="color: blue">then</b> <b style="color: red">1</b> <b style="color: blue">else</b> <b style="color: red">0</b> <b style="color: blue">end</b>
)
<b style="color: blue">end as</b> YtDOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
</pre>
Like with the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html" ><code><b style="color:blue">UNION</b></code>-solution</a>, we are taking advantage of our knowledge of the calendar, which allows us to create these static expressions.
We would not be able to do this in a general case, or where the number of distinct values is very large. But for 12 months, we can manage.
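The construction above can be verified with a short script, again using Python's <code>sqlite3</code> module on made-up data (SQLite 3.25+ assumed). To avoid typing out the repetitive branches, the sketch generates the 11 <code><b style="color:blue">CASE</b></code>-branches programmatically:

```python
import sqlite3

# Hypothetical data: one year, all 12 months, SalesAmount = 100 * month,
# so the correct YTD for month m is 100 * (1 + 2 + ... + m).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth(SalesYear int, SalesMonth int, SalesAmount real)"
)
conn.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, m, 100.0 * m) for m in range(1, 13)],
)

# Generate the 11 repetitive CASE branches rather than typing them out:
# each branch sums over the partition (year, month <= m).
branches = "\n".join(
    f"when {m} then sum(SalesAmount) over ("
    f"partition by SalesYear, "
    f"case when SalesMonth <= {m} then 1 else 0 end)"
    for m in range(2, 13)
)
rows = conn.execute(f"""
    select SalesYear, SalesMonth, SalesAmount
         , case SalesMonth
             when 1 then SalesAmount
             {branches}
           end as YtdOfSalesAmount
    from SalesYearMonth
    order by SalesMonth
""").fetchall()

ytd = [row[3] for row in rows]
print(ytd)  # 100.0, 300.0, 600.0, ... up to 7800.0 for december
```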
<br/>
<br/>
While it's nice to know that this is possible, there is a much, much nicer way to achieve the same effect - the frame specification.
<br/>
<br/>
<h4>Frame Specification</h4>
<br/>
The frame specification lets you specify a subset of rows <i>within</i> the partition.
The way you can specify the frame feels a bit odd (to me at least), as it is specified in terms of the current row's position in the window.
Hopefully the following example will make this more clear:
<pre>
<b style="color: blue">select</b> SalesYear
, SalesMonth
, SalesAmount
, <b style="color: magenta">sum</b>(SalesAmount) <b style="color: blue">over</b>(
<b style="color: blue">partition by</b> SalesYear
<b style="color: blue">order by</b> SalesMonth
<b style="color: blue">rows between unbounded preceding</b>
<b style="color: blue">and current row</b>
) <b style="color: blue">as</b> SalesYtd
<b style="color: blue">from</b> SalesYearMonth
</pre>
We already discussed the <code><b style="color: blue">PARTITION BY</b></code>-clause; all the clauses after it are new.
<br/>
<br/>
The <code><b style="color: blue">ORDER BY</b></code>-clause sorts the rows within the window, in this case by <code>SalesMonth</code>.
We need the rows to be ordered because of how the frame specification works: it lets you pick rows by position, relative to the current row.
The position of the rows is undetermined unless we sort them explicitly, so if we want to pick rows reliably, we need the <code><b style="color: blue">ORDER BY</b></code>-clause to guarantee the order.
<br/>
<br/>
The frame specification follows the <code><b style="color: blue">ORDER BY</b></code>-clause.
There are a number of possible options here, but I will only discuss the one used in the example.
In this case, it almost explains itself: we want the current row, and all rows that precede it.
Since we ordered by <code>SalesMonth</code>, this means all the rows that chronologically precede it.
As this selection applies within the current partition, we will only encounter months that fall within the current year.
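The frame specification is easy to try out interactively. Below is a minimal sketch with Python's <code>sqlite3</code> module (SQLite 3.25+ assumed) on hypothetical two-year data, showing the running total restarting per year:

```python
import sqlite3

# Hypothetical data spanning two years.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth(SalesYear int, SalesMonth int, SalesAmount real)"
)
conn.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, 1, 10.0), (2011, 2, 20.0), (2011, 3, 30.0),
     (2012, 1, 5.0), (2012, 2, 15.0)],
)

# The frame clause turns SUM() into a running total within each year.
rows = conn.execute("""
    select SalesYear, SalesMonth, SalesAmount
         , sum(SalesAmount) over (
               partition by SalesYear
               order by SalesMonth
               rows between unbounded preceding and current row
           ) as SalesYtd
    from SalesYearMonth
    order by SalesYear, SalesMonth
""").fetchall()

print([row[3] for row in rows])  # [10.0, 30.0, 60.0, 5.0, 20.0]
```

Note how the running total restarts at the partition boundary: the 2012 rows start again from 5.0.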
<br/>
<br/>
So here we have it: a YTD calculation implemented using a window function.
It's about the same amount of code as the subquery-solution, but more declarative, as we do not need to spell out the details of a condition.
On the other hand, it is also less flexible than a subquery; in general, though, one should expect the window function to perform better than the equivalent subquery.
<br/>
<br/>
<h3><a name="generalized">Generalizing the solutions</a></h3>
<br/>
So far all our examples were based on the <code>SalesYearMonth</code> table, which provides <code>SalesYear</code> and <code>SalesMonth</code> as separate columns.
One might wonder what it would take to apply these various methods to a realistic use case.
<br/>
<br/>
For example, it is likely that in a real dataset, the time would be available as a single column of a <code>DATE</code> or <code>DATETIME</code> data type.
A single date column potentially affects the YTD calculation in a number of ways: <ul>
<li>
Year: The YTD is calculated over a period of a year, and almost all the solutions we described use the <code>SalesYear</code> column explicitly to implement that logic.
</li>
<li>
Preceding rows: To calculate the YTD for a specific row, there has to be a clear definition of which rows are in the same year but precede it.
In our examples we could use the <code>SalesMonth</code> column for that, but this might be a bit different in a realistic case.
</li>
<li>
Lowest granularity: The lowest granularity of the <code>SalesYearMonth</code> table is the month level, and we calculated the YTD values at that level.
(If we wanted to be precise, we would have to call that year-to-month.)
</li>
</ul>
Apart from the time aspect, the definition of the key affects all solutions that generate "extra" rows and require a <code><b style="color:blue">GROUP BY</b></code> to re-aggregate to the original granularity.
<br/>
<br/>
<h4>The Year</h4>
<br/>
The <code><b style="color:blue">ON</b></code>-condition of the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code><b style="color:blue">JOIN</b></code>-solution</a>
and
the <code><b style="color:blue">WHERE</b></code>-condition of the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a> both rely on a condition that finds other rows in the same year,
and the <a href="#window">window function-solution</a> uses the year in its <code><b style="color:blue">PARTITION BY</b></code>-clause.
<br/>
<br/>
It is usually quite simple to extract the year from a date, date/time or timestamp.
In Synapse Analytics or MS SQL one can use the
<a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/datepart-transact-sql?view=sql-server-ver15" target="mssql"><code><b style="color:magenta">DATEPART</b></code></a>
or
<a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/year-transact-sql?view=sql-server-ver15" target="mssql"><code><b style="color:magenta">YEAR</b></code></a> function to do this.
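For a quick illustration outside MS SQL: the sketch below uses Python's <code>sqlite3</code> module, where <code>strftime('%Y', ...)</code> serves as a stand-in for T-SQL's <code><b style="color:magenta">YEAR</b></code>/<code><b style="color:magenta">DATEPART</b></code> (SQLite has neither); the date value is made up:

```python
import sqlite3

# DATEPART()/YEAR() are T-SQL; SQLite extracts the year from an
# ISO date string with strftime('%Y', ...) instead.
conn = sqlite3.connect(":memory:")
sales_year = conn.execute(
    "select cast(strftime('%Y', '2011-05-31') as integer)"
).fetchone()[0]
print(sales_year)  # 2011
```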
<br/>
<br/>
The <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code><b style="color:blue">UNION</b></code>-solution</a> has no direct dependency on the year.
<br/>
<br/>
<h4>The preceding rows</h4>
<br/>
The need to find the preceding rows applies to all solutions that use the year to find the rows to apply the YTD calculation on.
In our samples, this could all be solved using the <code>SalesMonth</code> column.
<br/>
<br/>
Again, it is the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code><b style="color:blue">JOIN</b></code>-solution</a> and the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery-solution</a> that used it in their conditions,
whereas the <a href="#window">window function-solution</a> uses it in its <code><b style="color:blue">ORDER BY</b></code>-clause.
<br/>
<br/>
In this case, the fix is more straightforward than with the year: instead of the month column, these solutions can simply use the date or date/time column directly.
No conversion or datepart extraction is required.
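Putting the two generalizations together, here is a sketch of a YTD over a single date column, using Python's <code>sqlite3</code> module (SQLite 3.25+ assumed); the <code>Sales</code> table, its column names, and the data are hypothetical:

```python
import sqlite3

# Hypothetical sales table keyed by a single ISO date column instead of
# separate year and month columns.
conn = sqlite3.connect(":memory:")
conn.execute("create table Sales(SalesDate text, SalesAmount real)")
conn.executemany(
    "insert into Sales values (?, ?)",
    [("2011-01-15", 10.0), ("2011-02-15", 20.0),
     ("2011-03-15", 30.0), ("2012-01-15", 5.0)],
)

# Partition on the extracted year; order on the date column directly.
# (strftime('%Y', ...) is SQLite's stand-in for T-SQL's YEAR().)
rows = conn.execute("""
    select SalesDate, SalesAmount
         , sum(SalesAmount) over (
               partition by strftime('%Y', SalesDate)
               order by SalesDate
               rows between unbounded preceding and current row
           ) as SalesYtd
    from Sales
    order by SalesDate
""").fetchall()

print([row[2] for row in rows])  # running totals restart at the year boundary
```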
<br/>
<br/>
<h4>Lowest granularity</h4>
<br/>
The granularity is of special concern to the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code><b style="color:blue">UNION</b></code>-solution</a>.
The solution relies on an exhaustive and static enumeration of all possible future dates within the year.
At the month level, this already required a lot of manual code.
<br/>
<br/>
Below the month, the next level would be day.
While it would in theory be possible to extend the solution to that level, it already borders on the impractical at the month level.
<br/>
<br/>
<h4>The Key</h4>
<br/>
The key definition affects both the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code><b style="color:blue">JOIN</b></code>-solution</a> and the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code><b style="color:blue">UNION</b></code>-solution</a>,
as both require a <code><b style="color:blue">GROUP BY</b></code> over the key. rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-85323826489911712072021-02-22T02:36:00.003+01:002021-02-22T02:50:57.117+01:00Year-to-Date on Synapse Analytics 4: Using UNION and GROUP BYFor one of our <a href="https://www.just-bi.nl/" target="_justbi">Just-BI</a> customers we implemented a Year-to-Date calculation in an Azure Synapse backend.
We encountered a couple of approaches, and in this series I'd like to share some sample code and discuss some of the merits and drawbacks of each approach.
<br/>
<br/>
<b>TL;DR</b>: <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-5.html">A Year-to-Date solution based on a <code><b style="color:magenta">SUM</b>()</code> window function</a> is simple to code and maintain as well as efficient to execute.
This as compared to a number of alternative implementations, namely a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">self-<code style="color: blue">JOIN</code></a> (combined with a <code style="color: blue">GROUP BY</code>), a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery</a>, and a <a href="#union"><code style="color: blue">UNION</code></a> (also combined with a <code style="color: blue">GROUP BY</code>).
<br/>
<br/>
Note: this is the 4th post in a series.
<ul>
<li>For sample data and setup, please <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-1.html">see the 1st post</a> in this series. </li>
<li>For a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">solution based on a self-<code><b style="color:blue">JOIN</b></code> and <code><b style="color:blue">GROUP BY</b></code></a>, please find <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">the 2nd post</a> in this series.</li>
<li>For a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">solution based on a subquery</a>, please find <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">the 3rd post</a> in this series.</li>
</ul>
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL Engines and RDBMS-es.)
<br/>
<br/>
<h3><a name="union">Using a <code style="color:blue">UNION</code></a></h3>
<br/>
We mentioned how the <code style="color:blue">JOIN</code>-solution relates each row of the main set to a subset of "extra" rows, over which the YTD value is then calculated by aggregating over the key of the main set using a <code style="color:blue">GROUP BY</code>.
<br/>
<br/>
It may not be immediately obvious, but we can also use the SQL <code style="color:blue">UNION</code> (or rather, <code style="color:blue">UNION ALL</code>) operator to generate such a related subset.
Just like with the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code style="color:blue">JOIN</code>-solution</a>, this can then be aggregated using <code style="color:blue">GROUP BY</code>.
An example will help to explain this:<pre>
<b style="color: blue">select</b> SalesYear
, SalesMonth
, <b style="color: magenta">sum</b>(SumOfSalesAmount) <b style="color: blue">as</b> SumOfSalesAmount
, <b style="color: magenta">sum</b>(YtdOfSumOfSalesAmount) <b style="color: blue">as</b> YtdOfSumOfSalesAmount
<b style="color: blue">from</b> (
<b style="color: blue">select</b> SalesYear
, SalesMonth
, SumOfSalesAmount
, SumOfSalesAmount <b style="color: blue">as</b> YtdOfSumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">union all</b>
<span style="color: teal">-- JANUARY</span>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">1</span> <span style="color: teal">--</span> february
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">1</span>
<b style="color: blue">union all</b>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">2</span> <span style="color: teal">--</span> march
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">1</span>
<b style="color: blue">union all</b>
... and so on, all for JANUARY ...
<b style="color: blue">union all</b>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">11</span> <span style="color: teal">--</span> december
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">1</span>
<b style="color: blue">union all</b>
<span style="color: teal">-- FEBRUARY</span>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">1</span> <span style="color: teal">--</span> march
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">2</span>
<b style="color: blue">union all</b>
... and so on, for the rest of FEBRUARY,
and then again for MARCH, APRIl, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER...
<span style="color: teal">-- NOVEMBER</span>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">1</span> <span style="color: teal">--</span> december
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">11</span>
) Sales
<b style="color: blue">group by</b> SalesYear
, SalesMonth
</pre>
<h4>Duplicating metric-data so it contributes to the following months</h4>
<br/>
In the first part of the <code>UNION</code> we simply provide the entire resultset from <code>SalesYearMonth</code>, reporting the <code>SumOfSalesAmount</code>-metric as-is, but also copying it into <code>YtdOfSumOfSalesAmount</code>.
The other parts of the <code>UNION</code> selectively duplicate the data for the <code>SumOfSalesAmount</code>-metric into <code>YtdOfSumOfSalesAmount</code>, so that it contributes to the <code>YtdOfSumOfSalesAmount</code> of all following months.
<br/>
<br/>
We start by grabbing january's data by applying the condition that demands that the <code>SalesMonth</code> equals <span style="color: red">1</span>:<pre>
<span style="color: teal">-- JANUARY</span>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">1</span> <span style="color: teal">--</span> february
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">1</span>
<b style="color: blue">union all</b>
... repeat to duplicate january's data into february, march, and so on all the way up to december...
</pre>
This is done a total of 11 times, each time adding 1, 2, 3, and so on, all the way up to 11, to the <code>SalesMonth</code> attribute.
This ensures january's data, as captured by the condition in the <code style="color:blue">WHERE</code> clause, is reported also in february (<code>SalesMonth + <span style="color: red">1</span></code>), march (<code>SalesMonth + <span style="color: red">2</span></code>), and so on, all the way up to december (<code>SalesMonth + <span style="color: red">11</span></code>).
<br/>
<br/>
After the string of <code>UNION</code>s for january appear more parts to duplicate the data also for february and all following months:<pre>
<span style="color: teal">-- FEBRUARY</span>
<b style="color: blue">select</b> SalesYear
, SalesMonth + <span style="color: red">1</span> <span style="color: teal">--</span> march
, <b style="color: red">null</b>
, SumOfSalesAmount
<b style="color: blue">from</b> SalesYearMonth
<b style="color: blue">where</b> SalesMonth = <span style="color: red">2</span>
<b style="color: blue">union all</b>
... repeat to duplicate february's data into march, april, and so on all the way up to december...
</pre>
Again, february's data is selected by applying the condition <pre><b style="color: blue">where</b> SalesMonth = <span style="color: red">2</span></pre>, and this now happens 10 times, again adding a number to <code>SalesMonth</code> so the data is duplicated into march, april, may, and so on, all the way up to december: in other words, all months following february.
<br/>
<br/>
What we thus did for january and february is repeated for march, april, and so on, for all months up to november.
November is the last month we need to do this for: november's data still needs to be copied into december, but december itself is the last month, so its data does not need to be duplicated anywhere.
<br/>
<br/>
While it may seem wasteful to duplicate all this data, in that respect it really is not that different from the other solutions we've seen so far.
It's just that now it's really in your face, because there is a pretty direct correspondence between the SQL code and the data sets that are being handled.
The <code style="color:blue">JOIN</code> and subquery solutions handle similar amounts of data; it's just achieved with far less code, and in a far more implicit manner.
<br/>
<br/>
<h4>Original metric is retained</h4>
<br/>
Note that the original metric is also computed correctly, because the extra parts of the union duplicate the data only into the YTD column.
The union parts that duplicate the data to the subsequent months select a <code style="color: red">NULL</code> for the original metric.
So the data for the original metric is never duplicated, and it thus retains its normal value.
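The mechanics are easiest to see at a smaller scale. The sketch below uses Python's <code>sqlite3</code> module on a hypothetical single year with only three months, so only three duplicate-forward branches are needed (versus 66 for a full year):

```python
import sqlite3

# Miniature of the UNION ALL approach: one hypothetical year, three months.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table SalesYearMonth(SalesYear int, SalesMonth int, SumOfSalesAmount real)"
)
conn.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, 1, 10.0), (2011, 2, 20.0), (2011, 3, 30.0)],
)

rows = conn.execute("""
    select SalesYear, SalesMonth
         , sum(SumOfSalesAmount)      as SumOfSalesAmount
         , sum(YtdOfSumOfSalesAmount) as YtdOfSumOfSalesAmount
    from (
        select SalesYear, SalesMonth, SumOfSalesAmount
             , SumOfSalesAmount as YtdOfSumOfSalesAmount
        from SalesYearMonth
        union all   -- january's data, duplicated into february and march
        select SalesYear, SalesMonth + 1, null, SumOfSalesAmount
        from SalesYearMonth where SalesMonth = 1
        union all
        select SalesYear, SalesMonth + 2, null, SumOfSalesAmount
        from SalesYearMonth where SalesMonth = 1
        union all   -- february's data, duplicated into march
        select SalesYear, SalesMonth + 1, null, SumOfSalesAmount
        from SalesYearMonth where SalesMonth = 2
    ) Sales
    group by SalesYear, SalesMonth
    order by SalesMonth
""").fetchall()

for row in rows:
    print(row)  # original metric intact, YTD accumulating month by month
```

The <code>NULL</code>s in the duplicated rows are ignored by <code>SUM()</code>, so the original metric survives the <code>GROUP BY</code> unchanged while the YTD column accumulates.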
<br/>
<br/>
<h4>Drawbacks to the <code style="color:blue">UNION</code>-solution</h4>
<br/>
The main drawback of the <code style="color:blue">UNION</code>-solution is its maintainability.
A lot of code is required, far more than for any of the methods we have seen so far.
Although the individual patterns are simple (a condition to select one month's data, plus an offset to project that data onto a future month), it is surprisingly easy to make a little mistake somewhere.
<br/>
<br/>
We just argued that this solution is not so much different from the <code style="color:blue">JOIN</code> solution, but that remark only pertains to how the calculation is performed.
The <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code style="color:blue">JOIN</code>-solution</a> generates the data it operates upon dynamically and declaratively; the <code style="color:blue">UNION</code> solution does this statically and explicitly.
This is also why it is impossible to generalize this approach for any arbitrary <code style="color:blue">JOIN</code>: YTD is a special case, because we know exactly how often we should duplicate the data, as this is dictated by the cyclical structure of our calendar.
<br/>
<br/>
<h3>Next installment: Solution 4 - window functions</h3>
<br/>
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-5.html">next installment</a> we will present and discuss a solution based on window functions.rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-73290240485753267422021-02-22T02:25:00.007+01:002021-02-22T02:53:01.676+01:00Year-to-Date on Synapse Analytics 3: Using a SubqueryFor one of our <a href="https://www.just-bi.nl/" target="_justbi">Just-BI</a> customers we implemented a Year-to-Date calculation in an Azure Synapse backend.
We encountered a couple of approaches and in this series I'd like to share some sample code, and discuss some of the merits and drawbacks of each approach.
<br/>
<br/>
<b>TL;DR</b>: <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-5.html">A Year-to-Date solution based on a <code><b style="color:magenta">SUM</b>()</code> window function</a> is simple to code and maintain as well as efficient to execute.
This as compared to a number of alternative implementations, namely a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">self-<code style="color: blue">JOIN</code></a> (combined with a <code style="color: blue">GROUP BY</code>), a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery</a>, and a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code style="color: blue">UNION</code></a> (also combined with a <code style="color: blue">GROUP BY</code>).
<br/>
<br/>
Note: this is the 3rd post in a series.
<ul>
<li>For sample data and setup, please <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-1.html">see the 1st post</a> in this series. </li>
<li>For a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">solution based on a self-<code><b style="color:blue">JOIN</b></code> and <code><b style="color:blue">GROUP BY</b></code></a>, please find <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">the 2nd post</a> in this series.</li>
</ul>
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL Engines and RDBMS-es.)
<br/>
<br/>
<h3><a name="subquery">Using a subquery</a></h3>
<br/>
We can also think of YTD calculation as a separate query that we perform for each row of the <code>SalesYearMonth</code> table.
While this does imply a row-by-row approach, we can still translate this easily to pure SQL by creating an expression in the <code style="color:blue">SELECT</code>-list, which uses a subquery to calculate the YTD value for the current row:<pre>
<b style="color: blue">select</b> SalesOriginal.SalesYear
, SalesOriginal.SalesMonth
, SalesOriginal.SalesAmount
, (
<b style="color: blue">select</b> <b style="color:magenta">sum</b>(SalesYtd.SalesAmount)
<b style="color: blue">from</b> SalesYearMonth <b style="color: blue">as</b> SalesYtd
<b style="color: blue">where</b> SalesYtd.SalesYear = SalesOriginal.SalesYear
<b style="color: blue">and</b> SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
) <b style="color: blue">as</b> SalesYtd
<b style="color: blue">from</b> SalesYearMonth <b style="color: blue">as</b> SalesOriginal
</pre>
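As a quick sanity check of this pattern, here's a sketch that runs the correlated subquery on a tiny, made-up dataset. Python's sqlite3 module stands in for Synapse here purely for portability (an assumption on my part); the SQL pattern itself is plain ANSI SQL:

```python
import sqlite3

# In-memory stand-in for the Synapse table (tiny, invented amounts).
con = sqlite3.connect(":memory:")
con.execute("""
    create table SalesYearMonth (
        SalesYear    integer
      , SalesMonth   integer
      , SalesAmount  decimal(20,2)
      , primary key (SalesYear, SalesMonth)
    )
""")
con.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, 5, 100.25), (2011, 6, 50.25), (2011, 7, 25.25), (2012, 1, 10.25)]
)

# The correlated subquery computes the YTD value per row of SalesOriginal.
rows = con.execute("""
    select SalesOriginal.SalesYear
    ,      SalesOriginal.SalesMonth
    ,      SalesOriginal.SalesAmount
    ,      ( select sum(SalesYtd.SalesAmount)
             from   SalesYearMonth as SalesYtd
             where  SalesYtd.SalesYear   = SalesOriginal.SalesYear
             and    SalesYtd.SalesMonth <= SalesOriginal.SalesMonth
           ) as SalesYtd
    from SalesYearMonth as SalesOriginal
    order by SalesOriginal.SalesYear, SalesOriginal.SalesMonth
""").fetchall()
```

The YTD column accumulates within 2011 and resets at the first month of 2012, matching the iterative definition from the first post.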
There's a similarity with the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code style="color:blue">JOIN</code>-solution</a>, in that we use the <code>SalesYearMonth</code> table twice, but in different roles.
In the <code style="color:blue">JOIN</code>-solution, they appeared on either side of the <code style="color:blue">JOIN</code> keyword, and we used the aliases <code>SalesOriginal</code> and <code>SalesYtd</code> to be able to keep them apart.
In the subquery approach, the distinction between these two different instances of the <code>SalesYearMonth</code> table is more explicit: the main instance of the <code>SalesYearMonth</code> table occurs in the <code style="color:blue">FROM</code>-clause, and the one for the YTD calculation occurs in the <code style="color:blue">SELECT</code>-list.
<br/>
<br/>
Also similar to the <code style="color:blue">JOIN</code> solution is the condition to tie the set for the YTD calculation to the main query using the <code>SalesYear</code> and <code>SalesMonth</code> columns.
Such a subquery is referred to as a <i>correlated</i> subquery.
<br/>
<br/>
As for any differences with the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code style="color:blue">JOIN</code> solution</a>:
In the condition, the only difference is the left/right placement of <code>SalesOriginal</code> and <code>SalesYtd</code>, which merely reflects their order of appearance in the query and is functionally completely equivalent.
The most striking difference between the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code style="color:blue">JOIN</code> solution</a> and the subquery is the absence of the <code style="color:blue">GROUP BY</code>-list in the latter.
<br/>
<br/>
<h4>Drawbacks of the subquery</h4>
<br/>
Given how much we had to complain about the <code style="color:blue">GROUP BY</code>-list in the <code style="color:blue">JOIN</code> solution, it might seem that the subquery solution is somehow "better".
However, a solution with a correlated subquery in general tends to be slower than a <code style="color:blue">JOIN</code> solution.
Whether this is actually the case depends on many variables, and you'd really have to check it against your SQL engine and datasets.
<br/>
<br/>
Another drawback of the subquery solution becomes clear when we want to calculate the YTD for multiple measures.
Our example only has one <code>SalesAmount</code> measure, but in this same context we can easily imagine that we also want to know about price, discount amounts, tax amounts, shipping costs, and so on.
<br/>
<br/>
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html"><code style="color:blue">JOIN</code> solution</a>, we would simply add any extra measures to the select list, using <code style="color: magenta">MAX()</code> (or <code style="color: magenta">MIN()</code> or <code style="color: magenta">AVG()</code>) to obtain the original value, and <code style="color: magenta">SUM()</code> to calculate its respective YTD value:
As long as it's over the same set, the <code style="color:blue">JOIN</code>, its condition, and even the <code style="color:blue">GROUP BY</code>-list would remain the same, no matter for how many different measures we would add a YTD calculation.
<br/>
<br/>
This is very different in the subquery case.
Each measure for which you need a YTD calculation would get its own subquery.
Even though the condition would be the same for each such YTD calculation, you would still need to repeat the subquery code - one for each YTD measure.
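To make the duplication concrete, here's a sketch (again using sqlite3 as a stand-in, with an invented <code>TaxAmount</code> column as the second measure) where each YTD measure drags in its own copy of the subquery:

```python
import sqlite3

# Hypothetical second measure, TaxAmount, next to SalesAmount.
con = sqlite3.connect(":memory:")
con.execute("""
    create table SalesYearMonth (
        SalesYear   integer
      , SalesMonth  integer
      , SalesAmount real
      , TaxAmount   real
      , primary key (SalesYear, SalesMonth)
    )
""")
con.executemany(
    "insert into SalesYearMonth values (?, ?, ?, ?)",
    [(2011, 5, 100.0, 10.0), (2011, 6, 50.0, 5.0)]
)
rows = con.execute("""
    select m.SalesYear
    ,      m.SalesMonth
    ,      ( select sum(y.SalesAmount)  -- subquery #1
             from SalesYearMonth as y
             where y.SalesYear = m.SalesYear and y.SalesMonth <= m.SalesMonth
           ) as SalesYtd
    ,      ( select sum(y.TaxAmount)    -- subquery #2: same condition, repeated verbatim
             from SalesYearMonth as y
             where y.SalesYear = m.SalesYear and y.SalesMonth <= m.SalesMonth
           ) as TaxYtd
    from SalesYearMonth as m
    order by m.SalesYear, m.SalesMonth
""").fetchall()
```

Two measures, two near-identical subqueries; a third measure would mean a third copy.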
<br/>
<br/>
<h3>Next installment: Solution 3 - using a <b style="color:blue">UNION</b></h3>
<br/>
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html">next installment</a> we will present and discuss a solution based on a <b style="color:blue">UNION</b> and a <b style="color:blue">GROUP BY</b>.rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-12200249065919944212021-02-22T02:13:00.005+01:002021-02-22T02:52:16.125+01:00Year-to-Date on Synapse Analytics 2: Using a self-JOIN and GROUP BYFor one of our <a href="https://www.just-bi.nl/" target="_justbi">Just-BI</a> customers we implemented a Year-to-Date calculation in an Azure Synapse backend.
We encountered a couple of approaches and in this series I'd like to share some sample code, and discuss some of the merits and drawbacks of each approach.
<br/>
<br/>
<b>TL;DR</b>: <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-5.html">A Year-to-Date solution based on a <code><b style="color:magenta">SUM</b>()</code> window function</a> is simple to code and maintain as well as efficient to execute.
This as compared to a number of alternative implementations, namely a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html#selfjoin">self-<code style="color: blue">JOIN</code></a> (combined with a <code style="color: blue">GROUP BY</code>), a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery</a>, and a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code style="color: blue">UNION</code></a> (also combined with a <code style="color: blue">GROUP BY</code>).
<br/>
<br/>
Note: this is the 2nd post in a series. For sample data and setup, please <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-1.html">see the first post</a> in this series.
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL Engines and RDBMS-es.)
<br/>
<br/>
<h3><a name="selfjoin">Using a self-<code style="color:blue">JOIN</code></a></h3>
<br/>
The recipe for the set-oriented approach can be directly translated to SQL:
<pre>
<b style="color: blue">select</b> SalesOriginal.SalesYear
, SalesOriginal.SalesMonth
, <b style="color: magenta">max</b>(SalesOriginal.SalesAmount) <b style="color: blue">as</b> SalesAmount
, <b style="color: magenta">sum</b>(SalesYtd.SalesAmount) <b style="color: blue">as</b> SalesYtd
<b style="color: blue">from</b> SalesYearMonth <b style="color: blue">as</b> SalesOriginal
<b style="color: blue">inner join</b> SalesYearMonth <b style="color: blue">as</b> SalesYtd
<b style="color: blue">on</b> SalesOriginal.SalesYear = SalesYtd.SalesYear
<b style="color: blue">and</b> SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
<b style="color: blue">group by</b> SalesOriginal.SalesYear
, SalesOriginal.SalesMonth
</pre>
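As a quick check, the query above can be run against a tiny, made-up dataset; here's a sketch using Python's sqlite3 module as a stand-in for Synapse (an assumption for portability; the SQL is the same pattern):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table SalesYearMonth (
        SalesYear    integer
      , SalesMonth   integer
      , SalesAmount  decimal(20,2)
      , primary key (SalesYear, SalesMonth)
    )
""")
con.executemany(
    "insert into SalesYearMonth values (?, ?, ?)",
    [(2011, 5, 100.25), (2011, 6, 50.25), (2012, 1, 10.25)]
)

# Self-join: each row pairs with itself and all earlier months of the same
# year; GROUP BY collapses those pairs back to one row per (year, month).
rows = con.execute("""
    select SalesOriginal.SalesYear
    ,      SalesOriginal.SalesMonth
    ,      max(SalesOriginal.SalesAmount) as SalesAmount
    ,      sum(SalesYtd.SalesAmount)      as SalesYtd
    from SalesYearMonth as SalesOriginal
    inner join SalesYearMonth as SalesYtd
    on  SalesOriginal.SalesYear   = SalesYtd.SalesYear
    and SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
    group by SalesOriginal.SalesYear, SalesOriginal.SalesMonth
    order by SalesOriginal.SalesYear, SalesOriginal.SalesMonth
""").fetchall()
```

The YTD column accumulates within 2011 and resets at the first month of 2012.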
<h4>The self-<code style="color:blue">JOIN</code></h4>
<br/>
In our discussion of the set-oriented approach we mentioned combining the rows from the table with each other to produce all different combinations.
In the code sample above, the <code style="color:blue">JOIN</code>-clause takes care of that aspect.
<br/>
<br/>
As you can see, the <code>SalesYearMonth</code> table appears twice: on the left hand and on the right hand of the <code style="color:blue">JOIN</code>-keyword, but using different aliases: <code>SalesOriginal</code> and <code>SalesYtd</code>.
It is a so-called <i>self-join</i>.
<br/>
<br/>
Even though both aliases refer to an instance of the same <code>SalesYearMonth</code> base table, each has a very different role.
We can think of the one with the <code>SalesOriginal</code> alias as really the <code>SalesYearMonth</code> table itself.
The <code>SalesYtd</code> alias refers to an instance of the <code>SalesYearMonth</code> table that, for any given row from <code>SalesOriginal</code>, represents a subset of rows that chronologically precedes the row from <code>SalesOriginal</code>.
<br/>
<br/>
The <code style="color:blue">ON</code>-clause that follows controls which combinations should be retained: for each particular row of <code>SalesOriginal</code> we only want to consider rows from <code>SalesYtd</code> from the same year, which is why the first predicate in the <code style="color:blue">ON</code>-clause is:<pre>SalesOriginal.SalesYear = SalesYtd.SalesYear</pre>
Within that year, we only want to consider rows that precede it chronologically, and that explains the second predicate: <pre>SalesOriginal.SalesMonth >= SalesYtd.SalesMonth</pre>
<h4><code style="color:blue">GROUP BY</code> and <code style="color:magenta">SUM()</code></h4>
<br/>
It is important to realize that the <code style="color:blue">JOIN</code> is only half of the solution.
<br/>
<br/>
While the <code style="color:blue">JOIN</code> takes care of gathering and combining all related rows necessary to compute the YTD value,
the actual calculation is done by the <code style="color:magenta">SUM()</code> function in the <code style="color:blue">SELECT</code>-list, and the <code style="color:blue">GROUP BY</code> defines which rows should be taken together to be summed.
<br/>
<br/>
In summary: <ul>
<li>the <code style="color:blue">JOIN</code> generates new rows by combining rows from its left-hand table with the rows from its right-hand table, bound by the condition in the <code style="color:blue">ON</code>-clause.</li>
<li>The <code style="color:blue">GROUP BY</code> partitions the rows into subsets having the same combinations of values for <code>SalesYear</code> and <code>SalesMonth</code>.</li>
<li>The <code style="color:magenta">SUM()</code> aggregates the rows in each <code>SalesYear, SalesMonth</code> partition, turning its associated set of rows into one single row, while adding the values of the <code>SalesAmount</code> column together.</li>
</ul>
Note that the columns in the <code style="color:blue">GROUP BY</code> list are qualified by the <code>SalesOriginal</code> alias - and not <code>SalesYtd</code>.
Also note that the <code style="color:blue">GROUP BY</code> columns form the key of the original <code>SalesYearMonth</code> table - together they uniquely identify a single row from the <code>SalesYearMonth</code> table.
This is not a coincidence: it expresses precisely that <code>SalesOriginal</code> really has the role of being just itself - the <code>SalesYearMonth</code> table.
<br/>
<br/>
<h4>What about the other columns?</h4>
<br/>
The <code style="color:blue">GROUP BY</code> affects treatment of the non-key columns as well.
In this overly simple example, we had only one other column - <code>SalesOriginal.SalesAmount</code>.
<br/>
<br/>
(Note that this is different from <code>SalesYtd.SalesAmount</code>, which we aggregated using <code style="color:magenta">SUM()</code> to calculate the YTD value.)
<br/>
<br/>
Since <code>SalesOriginal.SalesAmount</code> comes from the <code>SalesOriginal</code> instance of the <code>SalesYearMonth</code> table, we can reason that after the <code style="color:blue">GROUP BY</code> on the key columns <code>SalesYear</code> and <code>SalesMonth</code>, there must be exactly one <code>SalesAmount</code> value for each distinct combination of <code>SalesYear</code> and <code>SalesMonth</code>.
In other words, <code>SalesAmount</code> is functionally dependent on <code>SalesYear</code> and <code>SalesMonth</code>.
<br/>
<br/>
Some SQL engines are smart enough to realize this and will let you refer, in the <code style="color:blue">SELECT</code>-list, to any expression that is functionally dependent upon the expressions in the <code style="color:blue">GROUP BY</code>-list.
Unfortunately, Synapse and MS SQL Server are not among these, and if we try, we will get an error:
<br/>
<pre>
Msg 8120, Level 16, State 1, Line 11
Column 'Sales.SalesAmount' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
</pre>
The error message suggests we can do two things to solve it:<ul>
<li>either we aggregate by wrapping the <code>SalesOriginal.SalesAmount</code>-expression into some <a target="mssql" href="https://docs.microsoft.com/en-us/sql/t-sql/functions/aggregate-functions-transact-sql?view=sql-server-ver15">aggregate function</a></li>
<li>or we expand the <code style="color:blue">GROUP BY</code>-list and add the <code>SalesOriginal.SalesAmount</code>-expression there.</li>
</ul>
To me, neither feels quite right.
<br/>
<br/>
<code>SalesAmount</code> is clearly intended as a measure, and it feels weird to treat it the same as the attributes <code>SalesYear</code> and <code>SalesMonth</code>.
So adding it to the <code style="color:blue">GROUP BY</code>-list feels like the wrong choice. Besides, it also makes the code less maintainable, as each such column will now appear twice: once in the <code style="color:blue">SELECT</code>-list, where we need it no matter what, and once again in the <code style="color:blue">GROUP BY</code>-list, just to satisfy the SQL engine.
<br/>
<br/>
So, if we don't want to put it in the <code style="color:blue">GROUP BY</code>-list, we are going to need to wrap it in an aggregate function.
We just mentioned that <code>SalesAmount</code> is a measure and therefore that does not sound unreasonable.
However, we have to be careful which one we choose.
<br/>
<br/>
Normally, one would use <code>SalesAmount</code> as an additive measure, and <code style="color:magenta">SUM()</code> would be the obvious aggregate for it.
But here, in this context, <code style="color:magenta">SUM()</code> is definitely the wrong choice!
<br/>
<br/>
All we want to do is to "get back" whatever value we had for <code>SalesAmount</code> - in other words, a value unaffected by the whole routine of join-and-then-aggregate, which we did only to calculate the YTD value.
The "extra" rows generated by the <code>JOIN</code> are only needed to do the YTD calculation and should not affect any of the other measures.
Using <code style="color:magenta">SUM()</code> would simply add the <code>SalesAmount</code> just as many times as there are preceding rows in the current year, which simply does not have any meaningful application.
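A small sketch makes the inflation visible (sqlite3 as a stand-in; every month carries the same invented amount of 100 so the multiplication is easy to spot):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table SalesYearMonth (
        SalesYear integer, SalesMonth integer, SalesAmount real
      , primary key (SalesYear, SalesMonth)
    )
""")
con.executemany("insert into SalesYearMonth values (?, ?, ?)",
                [(2011, 1, 100.0), (2011, 2, 100.0), (2011, 3, 100.0)])

# After the self-join, month n pairs with n rows, so SUM() returns n * 100,
# while MAX() still reports the original 100.
rows = con.execute("""
    select SalesOriginal.SalesMonth
    ,      sum(SalesOriginal.SalesAmount) as WrongSalesAmount
    ,      max(SalesOriginal.SalesAmount) as SalesAmount
    from SalesYearMonth as SalesOriginal
    inner join SalesYearMonth as SalesYtd
    on  SalesOriginal.SalesYear   = SalesYtd.SalesYear
    and SalesOriginal.SalesMonth >= SalesYtd.SalesMonth
    group by SalesOriginal.SalesYear, SalesOriginal.SalesMonth
    order by SalesOriginal.SalesMonth
""").fetchall()
```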
<br/>
<br/>
What we want instead is to report back the original <code>SalesAmount</code> for any given <code>SalesYear, SalesMonth</code> combination.
We just reasoned that there will be just one distinct <code>SalesOriginal.SalesAmount</code> value for any combination of values in <code>SalesOriginal.SalesYear, SalesOriginal.SalesMonth</code>,
and it would be great if we had an aggregate function that would simply pick the <code>SalesOriginal.SalesAmount</code> value from any of those rows.
To the best of my knowledge, no such aggregate function exists in MS SQL Server or Synapse Analytics.
<br/>
<br/>
We can use <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/max-transact-sql?view=sql-server-ver15" target="mssql"><code style="color:magenta">MAX()</code></a> or <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/min-transact-sql?view=sql-server-ver15" target="mssql"><code style="color:magenta">MIN()</code></a>, or even <a href="https://docs.microsoft.com/en-us/sql/t-sql/functions/avg-transact-sql?view=sql-server-ver15" target="mssql"><code style="color:magenta">AVG()</code></a>.
While this would all work and deliver the intended result, it still feels wrong as it seems wasteful to ask the SQL engine to do some calculation on a set of values while it could pick just any value.
<h3>Next installment: Solution 2 - using a subquery</h3>
<br/>
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">next installment</a> we will present and discuss a solution based on a subquery.rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-69425156949443197852021-02-22T02:03:00.006+01:002021-02-22T02:51:40.353+01:00Year-to-Date on Synapse Analytics 1: BackgroundFor one of our <a href="https://www.just-bi.nl/" target="_justbi">Just-BI</a> customers we implemented a Year-to-Date calculation in an Azure Synapse backend.
We encountered a couple of approaches and in this series I'd like to share some sample code, and discuss some of the merits and drawbacks of each approach.
<br/>
<br/>
(While our use case deals with Azure Synapse, most of the code will be directly compatible with other SQL Engines and RDBMS-es.)
<br/>
<br/>
<b>TL;DR</b>: <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-5.html">A Year-to-Date solution based on a <code><b style="color:magenta">SUM</b>()</code> window function</a> is simple to code and maintain as well as efficient to execute.
This as compared to a number of alternative implementations, namely a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">self-<code style="color: blue">JOIN</code></a> (combined with a <code style="color: blue">GROUP BY</code>), a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-3.html">subquery</a>, and a <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-4.html"><code style="color: blue">UNION</code></a> (also combined with a <code style="color: blue">GROUP BY</code>).
<br/>
<br/>
<h3>In this Installment</h3>
<ul>
<li><a href="#ytd">A definition of what Year-to-Date (YTD) is</a></li>
<li><a href="#context">Some background</a> on why we are interested in calculating the YTD using Synapse Analytics / MS SQL Server</li>
<li>How to calculate YTD using an <a href="#iteration">iterative</a> and a <a href="#set">set-oriented</a> approach</li>
<li><a href="#sample-data">A sample table and dataset</a> which will be used in the next installments to demonstrate our sample code.</li>
</ul>
<h3><a name="context">Context</a></h3>
<br/>
Our customer is using an Azure Data Lake to store data from all kinds of source systems, including its SAP ERP system. Azure Synapse Analytics sits on top of the Data Lake and is used as analytics workhorse, but also to integrate various data sets present in the data lake. Front-end BI tools, such as Microsoft PowerBI, can then connect to Synapse and import or query the data from there.
<br/>
<br/>
In many cases, the datamarts presented by Synapse are pretty straightforward.
Calculations and derived measures needed to build dashboards and data visualizations can typically be developed rather quickly inside the Power BI data model.
Once the front-end development has stabilized, one can consider refactoring the solution, moving parts away from the front-end and pushing them down to the backend for performance or maintainability.
<br/>
<br/>
(There are all kinds of opinions regarding data architecture and on when to put what where. We do not pretend to have the final answer to that, but the current workflow allows us to very quickly deliver solutions that can be used and verified by the users. At present I do not think we could achieve the same productivity if we would demand that everything be designed and built on the backend right from the get go.)
<br/>
<br/>
So, today we were refactoring some of the logic in a PowerBI model, including a Year-to-Date calculation.
The solution we ended up implementing to solve it seems to work rather nicely so I figured to share it.
<br/>
<br/>
<h3><a name="ytd">Year-to-Date value</a></h3>
<br/>
What's a year-to-date (YTD) value? Basically, it's the cumulative value of a metric over time, which resets once a year.
In other words, the year to date value is the per-year total of the value achieved up to the current date.
<br/>
<br/>
This is best explained with an example. Consider the following dataset:
<br/>
<br/>
<code><table style="font-family: courier,monospace;" border="1" cellpadding="5" cellspacing="5">
<thead>
<tr>
<th>Date</th>
<th>Value</th>
<th>YTD Value</th>
</tr>
</thead>
<tbody>
<tr><td>2012-01-10</td><td style="text-align: right">35,401.14</td><td style="text-align: right">35,401.14</td></tr>
<tr><td>2012-01-20</td><td style="text-align: right">15,012.18</td><td style="text-align: right">50,413.32</td></tr>
<tr><td>2012-02-01</td><td style="text-align: right">25,543.71</td><td style="text-align: right">75,957.03</td></tr>
<tr><td>2012-02-10</td><td style="text-align: right">32,115.41</td><td style="text-align: right">108,072.43</td></tr>
<tr><td>2012-02-20</td><td style="text-align: right">17,688.07</td><td style="text-align: right">125,760.50</td></tr>
<tr><td>2012-03-01</td><td style="text-align: right">10,556.53</td><td style="text-align: right">136,317.03</td></tr>
<tr><td>...</td><td style="text-align: right">...</td><td style="text-align: right">...</td></tr>
<tr><td>2013-01-01</td><td style="text-align: right">19,623.90</td><td style="text-align: right">19,623.90</td></tr>
<tr><td>2013-01-10</td><td style="text-align: right">8,351.18</td><td style="text-align: right">27,975.08 </td></tr>
<tr><td>2013-01-20</td><td style="text-align: right">20,287.65</td><td style="text-align: right">48,262.73</td></tr>
<tr><td>2013-02-01</td><td style="text-align: right">33,055.69</td><td style="text-align: right">81,318.42</td></tr>
</tbody>
</table></code>
<br/>
In the table above we have dates from two years - <code>2012</code> and <code>2013</code> - and for each date a <code>Value</code>.
<br/>
<br/>
For the first date encountered within a year, the <code>YTD Value</code> is equal to the <code>Value</code> itself;
for each subsequent <code>Date</code>, the <code>YTD Value</code> is maintained as a running total of the values that appeared at the earlier dates.
<br/>
<br/>
So, <code>2012-01-10</code> is the first date we encounter in <code>2012</code> and therefore its <code>YTD Value</code> is equal to the <code>Value</code> at that date (<code>35,401.14</code>).
The next date is <code>2012-01-20</code> and its <code>Value</code> is <code>15,012.18</code>; therefore its <code>YTD Value</code> is <code>50,413.32</code>, which is <code>15,012.18 + 35,401.14</code>.
The accumulation continues until we reach the last date of <code>2012</code>.
<br/>
<br/>
At <code>2013-01-01</code>, the first date of the next year, the <code>YTD Value</code> resets to be equal to the <code>Value</code>, and then on the subsequent dates of <code>2013</code>, the <code>YTD Value</code> again accumulates the current <code>Value</code> by adding it to the preceding <code>YTD Value</code>.
<br/>
<br/>
<h4>How to use YTD</h4>
<br/>
You can use YTD values to analyze how actual results are developing over time as compared to a plan or prediction.
By comparing the calculated YTD of a measure to a projected value (for example, a sales target), we can see how far off we are at any point in time.
<br/>
<br/>
If you gather these comparisons for a couple of moments in time, you can get a sense of the pace at which the actual situation is deviating from or converging to the target or projected situation.
These insights allow you to intervene in some way: maybe you need to adjust your planning, or change your expectations. Or maybe you need to adjust your efforts in order to more closely approximate your target.
<br/>
<br/>
<h3><a name="iteration">Thinking about YTD as iteration</a></h3>
<br/>
From the way we explained what a year-to-date value is, you might think about it as an <i>actual</i> "rolling sum".
By that I mean, you might think about it as an iterative problem, that you solve by going through the rows, one by one.
In pseudocode, such a solution would do something like:
<pre>
<b>declare</b> year, ytd
<b>loop through</b> rows:
<b>if</b> year <b>equals</b> row.year <b>then</b>
<b>assign</b> ytd + row.value <b>to</b> ytd
<b>else</b>
<b>assign</b> row.value <b>to</b> ytd
<b>assign</b> row.year <b>to</b> year
<b>end if</b>
<b>end loop through</b> rows
</pre>
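For reference, here is the pseudocode turned into a small runnable Python function (the function name and tuple layout are just illustrative choices). It assumes the rows arrive sorted by year and then month, which is exactly the hidden requirement discussed next:

```python
def ytd_running_totals(rows):
    """rows: iterable of (year, month, value) tuples, sorted by year, month.
    Yields (year, month, value, ytd)."""
    current_year = None
    ytd = 0
    for year, month, value in rows:
        if year == current_year:
            ytd += value          # same year: accumulate
        else:
            ytd = value           # new year: reset the running total
            current_year = year
        yield year, month, value, ytd

result = list(ytd_running_totals(
    [(2011, 5, 100.0), (2011, 6, 50.0), (2012, 1, 10.0)]
))
```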
While this approach would apparently give you the desired result, it does not help you to solve the problem in SQL directly.
Pure SQL does not let you iterate rows like that, and it also does not let you work with variables like that.
<br/>
<br/>
Even with the iterative approach there is a hidden problem: the reset of the <code>ytd</code> variable and the update of the <code>year</code> variable occur whenever <code>row.year</code> differs from the current value of the <code>year</code> variable. This will only work properly if the rows of one particular year are next to each other (for example, when the rows are ordered by year prior to iteration).
The same applies within the year: the rows need to be sorted in chronological order, as the YTD value should reflect how much of the value was accumulated at that date within that year.
<br/>
<br/>
It may seem like a waste of time to think about an approach that is of no use to solving the problem.
But this simple iterative approach provides a very simple recipe for quickly checking whether an actual solution behaves as expected.
We'll use it later to verify some results.
<br/>
<br/>
<h3><a name="set">A set-oriented approach</a></h3>
<br/>
To implement it in SQL we have to think in a set-oriented way.
Conceptually, we can think about it as if we combine each row in the set with all of the other rows, forming a cartesian product,
and then retain only those combinations that have identical values for <code>year</code>, but a smaller or equal value for the <code>month</code>.
<br/>
<br/>
This way, each row will combine with itself, and with all the other rows that chronologically precede it within the same year.
The YTD value is then obtained by aggregating the rows over the <code>year</code> and <code>month</code> values, summing the <code>value</code> column to produce the YTD value.
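The recipe above can be expressed quite literally in a few lines of Python (a sketch: <code>itertools.product</code> plays the role of the cartesian product, and a dict keyed on year and month plays the role of the grouping):

```python
from itertools import product

rows = [(2011, 5, 100.0), (2011, 6, 50.0), (2012, 1, 10.0)]  # (year, month, value)

# Cartesian product of the row set with itself, keeping only pairs with the
# same year where the second row's month is smaller than or equal to the first's.
pairs = [(a, b) for a, b in product(rows, rows)
         if a[0] == b[0] and b[1] <= a[1]]

# Aggregate over (year, month) of the "original" row, summing the values of
# the rows it combined with - that sum is the YTD value.
ytd = {}
for a, b in pairs:
    key = (a[0], a[1])
    ytd[key] = ytd.get(key, 0) + b[2]
```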
<br/>
<br/>
<h3><a name="sample-data">Sample Data</a></h3>
<br/>
To play around a bit with the problem in SQL, let's set up a simple table:
<pre>
<b style="color: blue">create table</b> SalesYearMonth (
SalesYear <b style="color: blue">int</b>
, SalesMonth <b style="color: blue">int</b>
, SalesAmount <b style="color: blue">decimal</b>(<code style="color: red">20</code>,<code style="color: red">2</code>)
, <b style="color: blue">primary key</b>(SalesYear, SalesMonth)
);
</pre>
And, here's some data:
<pre>
<b style="color: blue">insert into</b> SalesYearMonth (
SalesYear
, SalesMonth
, SalesAmount
) <b style="color: blue">values</b>
(<code style="color: red">2011</code>,<code style="color: red">5</code>,<code style="color: red">503805.92</code>)
,(<code style="color: red">2011</code>,<code style="color: red">6</code>,<code style="color: red">458910.82</code>)
,(<code style="color: red">2011</code>,<code style="color: red">7</code>,<code style="color: red">2044600.00</code>)
,(<code style="color: red">2011</code>,<code style="color: red">8</code>,<code style="color: red">2495816.73</code>)
,(<code style="color: red">2011</code>,<code style="color: red">9</code>,<code style="color: red">502073.85</code>)
,(<code style="color: red">2011</code>,<code style="color: red">10</code>,<code style="color: red">4588761.82</code>)
,(<code style="color: red">2011</code>,<code style="color: red">11</code>,<code style="color: red">737839.82</code>)
,(<code style="color: red">2011</code>,<code style="color: red">12</code>,<code style="color: red">1309863.25</code>)
,(<code style="color: red">2012</code>,<code style="color: red">1</code>,<code style="color: red">3970627.28</code>)
,(<code style="color: red">2012</code>,<code style="color: red">2</code>,<code style="color: red">1475426.91</code>)
,(<code style="color: red">2012</code>,<code style="color: red">3</code>,<code style="color: red">2975748.24</code>)
,(<code style="color: red">2012</code>,<code style="color: red">4</code>,<code style="color: red">1634600.80</code>)
,(<code style="color: red">2012</code>,<code style="color: red">5</code>,<code style="color: red">3074602.81</code>)
,(<code style="color: red">2012</code>,<code style="color: red">6</code>,<code style="color: red">4099354.36</code>)
,(<code style="color: red">2012</code>,<code style="color: red">7</code>,<code style="color: red">3417953.87</code>)
,(<code style="color: red">2012</code>,<code style="color: red">8</code>,<code style="color: red">2175637.22</code>)
,(<code style="color: red">2012</code>,<code style="color: red">9</code>,<code style="color: red">3454151.94</code>)
,(<code style="color: red">2012</code>,<code style="color: red">10</code>,<code style="color: red">2544091.11</code>)
,(<code style="color: red">2012</code>,<code style="color: red">11</code>,<code style="color: red">1872701.98</code>)
,(<code style="color: red">2012</code>,<code style="color: red">12</code>,<code style="color: red">2829404.82</code>)
,(<code style="color: red">2013</code>,<code style="color: red">1</code>,<code style="color: red">2087872.46</code>)
,(<code style="color: red">2013</code>,<code style="color: red">2</code>,<code style="color: red">2316922.15</code>)
,(<code style="color: red">2013</code>,<code style="color: red">3</code>,<code style="color: red">3412068.97</code>)
,(<code style="color: red">2013</code>,<code style="color: red">4</code>,<code style="color: red">2532265.91</code>)
,(<code style="color: red">2013</code>,<code style="color: red">5</code>,<code style="color: red">3245623.76</code>)
,(<code style="color: red">2013</code>,<code style="color: red">6</code>,<code style="color: red">5081069.13</code>)
,(<code style="color: red">2013</code>,<code style="color: red">7</code>,<code style="color: red">4896353.74</code>)
,(<code style="color: red">2013</code>,<code style="color: red">8</code>,<code style="color: red">3333964.07</code>)
,(<code style="color: red">2013</code>,<code style="color: red">9</code>,<code style="color: red">4532908.71</code>)
,(<code style="color: red">2013</code>,<code style="color: red">10</code>,<code style="color: red">4795813.29</code>)
,(<code style="color: red">2013</code>,<code style="color: red">11</code>,<code style="color: red">3312130.25</code>)
,(<code style="color: red">2013</code>,<code style="color: red">12</code>,<code style="color: red">4075486.63</code>)
,(<code style="color: red">2014</code>,<code style="color: red">1</code>,<code style="color: red">4289817.95</code>)
,(<code style="color: red">2014</code>,<code style="color: red">2</code>,<code style="color: red">1337725.04</code>)
,(<code style="color: red">2014</code>,<code style="color: red">3</code>,<code style="color: red">7217531.09</code>)
,(<code style="color: red">2014</code>,<code style="color: red">4</code>,<code style="color: red">1797173.92</code>)
,(<code style="color: red">2014</code>,<code style="color: red">5</code>,<code style="color: red">5366674.97</code>)
,(<code style="color: red">2014</code>,<code style="color: red">6</code>,<code style="color: red">49005.84</code>);
</pre>
This setup is slightly different from the original problem statement. Instead of a column with a <code style="color:blue">DATE</code> data type, we have separate <code>SalesYear</code> and <code>SalesMonth</code> columns.
This is fine - it doesn't change the problem or the solution in any way.
<br/>
<br/>
In fact, this setup allows us to think about the essential elements of the problem without having to worry about the details of getting to that point.
Once we've done that, we can apply the approach to a more realistic case.
<br/>
<br/>
<h3>Next installment: Solution 1 - a self-<code><b style="color:blue">JOIN</b></code></h3>
<br/>
In the <a href="https://rpbouman.blogspot.com/2021/02/year-to-date-on-synapse-analytics-2.html">next installment</a> we will present and discuss a solution based on a self-<code><b style="color:blue">JOIN</b></code> and a <code><b style="color:blue">GROUP BY</b></code>.
<h1>Building a UI5 Demo for SAP HANA Text Analysis: Part 4</h1>
This is the last of a series of blogposts describing a simple web front-end tool to explore SAP HANA's Text Analysis features on documents uploaded by the user.
As a reminder, the following overview outlines all the posts in the series:
<ul>
<li><a href="http://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text.html">Part 1 - an Overview: SAP HANA Text Analysis on Documents uploaded by an end-user</a></li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html">Part 2 - Hands on: Building the backend for a SAP HANA Text Analysis application</a></li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_28.html">Part 3 - Presenting: A UI5 front-end to upload documents and explore SAP HANA Text Analytics features</a></li>
<li>Part 4 - Deep dive: How to upload documents with OData in a UI5 Application</li>
</ul>
In the <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_28.html">previous post</a> we presented the sample application, explained its functionality, and concluded by pointing to <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo" target="github">the github repository</a> and the installation instructions so that you may run the application on your own HANA system.
<br/>
<br/>
In this post, we'll explain in detail how the upload functionality works from the UI5 side of things.
<h2>Uploading Files and Binary data to HANA <code>.xsodata</code> services using UI5</h2>
In the <a href="http://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text.html">first installment of the series</a>, options and concerns were discussed on the topic of loading the (binary) document content into the SAP HANA database. We chose to use an OData service. In this installment, we'll go into fairly deep detail about how to implement a file upload feature backed by a HANA .xsodata service using the UI5 front-end framework.
<h3>Some notes on the UI5 Application Implementation</h3>
Before we discuss any particular details of the implementation of the UI5 application, it is necessary to point out that this particular application is demoware.
Many typical patterns of UI5 application development were omitted here: there is no <a href="https://sapui5.hana.ondemand.com/1.36.6/docs/guide/df86bfbeab0645e5b764ffa488ed57dc.html" target="ui5">internationalization</a>, and no <a href="https://sapui5.hana.ondemand.com/#/topic/f665d0de4dba405f9af4294de824b03b" target="ui5">modules</a> or dependency injection with <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui/methods/sap.ui.define" target="ui5"><code>sap.ui.define()</code></a>. There is not even a MVC architecture, so no <a href="https://sapui5.hana.ondemand.com/#/topic/1409791afe4747319a3b23a1e2fc7064" target="ui5">XML views</a> or <a href="https://sapui5.hana.ondemand.com/#/topic/50579ddf2c934ce789e056cfffe9efa9" target="ui5">controllers</a>; no <a href="https://sapui5.hana.ondemand.com/#/topic/4cfa60872dca462cb87148ccd0d948ee" target="ui5">Component configuration</a>, and no <a href="https://sapui5.hana.ondemand.com/#/topic/8f93bf2b2b13402e9f035128ce8b495f" target="ui5">application descriptor</a> (<code>manifest.json</code>).
<br/>
<br/>
Instead, the application consists of just a single <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo/blob/master/web/index.html" target="github"><code>index.html</code></a>, which contains 2 <code><script></code> tags:<pre><code><script
src=<span style="color:red">"https://sapui5.hana.ondemand.com/1.71.5/resources/sap-ui-core.js"</span>
id=<span style="color:red">"sap-ui-bootstrap"</span>
data-sap-ui-libs=<span style="color:red">"
sap.ui.core,
sap.ui.layout,
sap.ui.unified,
sap.ui.table,
sap.ui.commons,
sap.m
"</span>
data-sap-ui-theme=<span style="color:red">"sap_bluecrystal"</span>
>
</script>
<script src=<span style="color:red">"index.js"</span> type=<span style="color:red">"text/javascript"</span>></script></code></pre>
The first one <a href="https://sapui5.hana.ondemand.com/#/topic/a04b0d10fb494d1cb722b9e341b584ba.html" target="ui5">bootstraps ui5</a>, and the second one loads <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo/blob/master/web/index.js" target="github"><code>index.js</code></a>, which contains the implementation.
<br/>
<br/>
The main reason for this rather spartan approach is that the primary goal of my colleagues Arjen and Mitchell and me was to quickly come up with a functional prototype that demonstrates the file upload feature. Although I have grown used to a more orthodox UI5 boilerplate, it was a distraction when it came to just quickly illustrating an upload feature. Once we built the upload feature, I wanted to see how easy it would be to augment it and make it a somewhat useful application, and I was kind of interested to experience how it would be to carry on using this unorthodox, pure-javascript approach.<br/><br/>There's much more that could be said about this approach, but that's another topic. So for now: if you're new to UI5 and want to learn more, don't take this application as an example - it's atypical. And if you are an experienced UI5 developer: now you have the background, let's move on to the essence.
<h3>Communication with the backend using an OData Model</h3>
Before we get to the topic of building and controlling the upload feature, a couple of words should be said about how UI5 applications can communicate with their (xsodata) backend.<br/><br/>
In UI5, we needn't worry about the exact details of doing the backend call directly. Rather, UI5 offers an object that provides javascript methods that take care of this. This object is the <b>model</b>. In our application, the model is an instance of the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel" target="ui5"><code>sap.ui.model.odata.v2.ODataModel</code></a>, which we instantiated somewhere <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo/blob/master/web/index.js#L7" target="github">in the top of our <code>index.js</code></a>:<pre><code><span style="color:blue">var</span> pkg = document.location.pathname.split(<span style="color:red">'/'</span>).slice(<span style="color:red">1</span>, <span style="color:red">-2</span>);
<span style="color:blue">var</span> odataService = [].concat(pkg, [<span style="color:red">'service'</span>, <span style="color:red">'ta'</span>]);
<span style="color:grey">/**
* OData
*/</span>
<span style="color:blue">var</span> modelName = <span style="color:red">'data'</span>;
<b><span style="color:blue">var</span> model = <span style="color:blue">new</span> sap.ui.model.odata.v2.ODataModel(<span style="color:red">'/'</span> + odataService.join(<span style="color:red">'/'</span>) + <span style="color:red">'.xsodata'</span></b>, {
disableHeadRequestForToken: <span style="color:blue">true</span>
});</code></pre>It's not necessary to go over the model instantiation in detail - for now it is enough to know that upon instantiation, the model is passed the uri of <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-xsodata-service-definition"> the <code>.xsodata</code> service we already built</a>. We obtain the url in the code preceding the model instantiation by taking the url of the current webpage and building a path to <code>service/ta.xsodata</code> relative to that location:<pre><code><span style="color:blue">var</span> pkg = document.location.pathname.split(<span style="color:red">'/'</span>).slice(<span style="color:red">1</span>, <span style="color:red">-2</span>);
<span style="color:blue">var</span> odataService = [].concat(pkg, [<span style="color:red">'service'</span>, <span style="color:red">'ta'</span>]); </code></pre>
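To make the slicing concrete, here is a minimal standalone sketch of the same derivation, using a hypothetical page path in place of <code>document.location.pathname</code>:

```javascript
// Hypothetical page path; in the app this comes from document.location.pathname
var pathname = '/my/package/web/index.html';

// split yields ['', 'my', 'package', 'web', 'index.html'];
// slice(1, -2) drops the leading empty string, the web folder and the file name
var pkg = pathname.split('/').slice(1, -2);               // ['my', 'package']

// append the service package and the service name
var odataService = [].concat(pkg, ['service', 'ta']);

// reassemble into the service uri that is passed to the ODataModel
var serviceUri = '/' + odataService.join('/') + '.xsodata';
console.log(serviceUri);                                  // /my/package/service/ta.xsodata
```

Deriving the uri this way means an app deployed under a different package automatically addresses the service relative to its own location.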
<h3>Uploading a file: high level client-side tasks</h3>
From a functional point of view, there are two distinct tasks for the web app (client) to consider:<ul>
<li>Building the user interface so the user can select the file to upload.</li>
<li>Loading the file contents into the database.</li>
</ul>
The first high-level task is strictly a matter of user interaction and is more or less independent of how the second high-level task is implemented.
For the second high-level task, we already have the backend in place - this is <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-xsodata-service-definition">the OData service we built in the second installment of this blogpost series</a>. What remains is how to do this from within UI5.
<br/>
<br/>But already, we can break down this task into two subtasks:
<ul>
<li>Extracting the content from the chosen file. Once the user has chosen a file, they have only identified the thing they want to upload. The web app does not need to parse or understand the file content, but it does need to extract the data (file content) so it can send it to the server.</li>
<li>Sending the right request to the backend. The request will somehow include the contents extracted from the file, and it will have such a form that the server understands what to do with those contents - in this case, store it in a table for text analysis.</li>
</ul>
<h3>A UI5 File Upload control</h3>
For the file upload user interaface, we settled on the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.unified.FileUploader" target="ui5"><code>sap.ui.unified.FileUploader</code></a> control. Here's the relevant code from <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo/blob/master/web/index.js#L257" target="github"><code>index.js</code></a> that instantiates the control:<pre><code><span style="color:blue">var</span> fileUploader = <span style="color:blue">new</span> <b>sap.ui.unified.FileUploader</b>({
buttonText: <span style="color:red">'Browse File...'</span>,
  <b>change: onFileToUploadChanged</b>,
busyIndicatorDelay: <span style="color:red">0</span>
});
</code></pre>
The <code>sap.ui.unified.FileUploader</code> control is presented to the user as an input field and a button to open a file chooser. This lets the user browse and pick a file from their client device.
<br/>
<br/>
In addition, the <code>sap.ui.unified.FileUploader</code> control provides events, configuration options and methods to validate the user's choice, and to send the file off to a server. For example, you can set the <code>uploadUrl</code> property to specify where to send the file to, and there's an <code>upload()</code> method to let the control do the request.
<br/>
<br/>
As it turns out, most of this additional functionality did not prove to be very useful for the task at hand, because the request we need to make is quite specific, and we didn't really find a clear way of configuring the control to send just the right request. Perhaps it is possible, and if so, we would be most obliged to learn how.
<br/>
<br/>
What we ended up doing instead is to only use the file choosing capabilities of the <code>sap.ui.unified.FileUploader</code> control. To keep track of the user's choice, we <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo/blob/master/web/index.js#L204" target="github">configured a handler</a> for the <code>change</code> event, which gets called whenever the user chooses a file, or cancels the choice.
<br/>
<br/>
The handler does a couple of things: <ul>
<li>Determine whether a file was chosen. If not, the Upload confirmation button gets disabled so the user can only either retry choosing a file, or close the upload dialog.</li>
<li>If a file is chosen, a request is sent to the backend to figure out if the file already exists.</li>
<li>Depending upon whether the file already exists, the state of the upload dialog is set to inform the user of what action will be taken if they confirm the upload. </li>
</ul>
Let's go over these tasks in detail. First, validating the user's choice by checking if the user did in fact choose a file:<pre><code><b><span style="color:blue">var</span> fileToUpload;</b>
<span style="color:blue">var</span> fileToUploadExists;
<b><span style="color:blue">function</span> onFileToUploadChanged(event)</b>{
fileToUpload = null;
fileToUploadExists = false;
<b><span style="color:blue">var</span> files = event.getParameter(<span style="color:red">'files'</span>)</b>;
<span style="color:blue">if</span> (files.length === 0) {
initFileUploadDialog();
return;
}
fileToUpload = files[0];
...more code here...
}
</code></pre>
Note that we set up the <code>fileToUpload</code> variable to keep track of the user's choice. We need to keep track of it somewhere, since the choosing of the file and the upload are separate tasks with regard to the UI: choosing the file happens when the user hits the Browse button provided by the <code>sap.ui.unified.FileUploader</code> control, whereas the upload is triggered by hitting the confirm button of the upload dialog.
<br/>
<br/>
When the user is done choosing the file, the <code>sap.ui.unified.FileUploader</code> will fire the <code>change</code> event, and our handler <code>onFileToUploadChanged()</code> gets called and passed the event as an argument. This event provides access to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/FileList" target="mdn"><code>FileList</code></a> object associated with the file chooser:<pre><code> <b><span style="color:blue">var</span> files = event.getParameter(<span style="color:red">'files'</span>)</b>;</code></pre>
Note: the <code>FileList</code> is not part of UI5. Rather, it is one of a number of browser built-in objects, which together form the <a href="https://www.w3.org/TR/FileAPI/" target="w3">Web File API</a>. We would have loved to obtain the <code>FileList</code> or the <code>File</code> object from our <code>sap.ui.unified.FileUploader</code> control directly by using a getter or something like that, but at the time we found no such method, and settled for a handler on the <code>change</code> event.
<br/>
<br/>
Once we have the <code>FileList</code>, we can check whether the user selected any files, and either disable the upload confirmation button (if no file was selected), or assign the chosen file to our <code>fileToUpload</code> variable so we can refer to it when the upload is confirmed: <pre><code><span style="color:blue">function</span> onFileToUploadChanged(event){
...
<b><span style="color:blue">if</span> (files.length === 0)</b> {
initFileUploadDialog();
<b>return;</b>
}
<b>fileToUpload = files[0];</b>
....
}</code></pre>If we pass the check, our variable <code>fileToUpload</code> will now contain the <a href="https://developer.mozilla.org/en-US/docs/Web/API/File" target="mdn"><code>File</code></a> object reflecting the user's choice. (Note that this object too is not a UI5 object, it's also part of the Web File API.)
<br/>
<br/>
Note that in theory, the list of files associated with the <code>sap.ui.unified.FileUploader</code> could have contained more than one file. But the default behavior is to let the user choose only one file. You can override that behavior by setting the <code>sap.ui.unified.FileUploader</code>'s <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.unified.FileUploader/methods/getMultiple" target="ui5"><code>multiple</code></a> property to <code><span style="color:blue">true</span></code>. Because we know that in this case there can be at most one file, we only need to check whether there is a file or not - there's no need to consider multiple files.
<h3>Checking whether the File was already Uploaded</h3>
Once we know for sure the user has chosen a file, it remains to be determined what should be done with it should the user decide to confirm the upload. To help the user decide whether they should confirm, we send a request to the backend to find out if the file was already uploaded: <pre><code><span style="color:blue">function</span> onFileToUploadChanged(event){
...
<b>fileToUpload = files[0];</b>
fileUploader.setBusy(<span style="color:blue">true</span>);
<b>model.read('/' + filesEntityName</b>, {
filters: [<span style="color:blue">new</span> sap.ui.model.Filter({
path: fileNamePath,
operator: sap.ui.model.FilterOperator.EQ,
<b>value1: fileToUpload.name</b>
})],
urlParameters: {
$select: [fileNamePath, <span style="color:red">'FILE_LAST_MODIFIED'</span>]
},
success: <span style="color:blue">function</span>(data){
...update state depending upon whether the file exists...
},
error: <span style="color:blue">function</span>(error){
...update state to inform the user of an error...
}
});
....
}</code></pre>
The model provides a <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/read" target="ui5"><code>read()</code> method</a> which can be used to query the backend OData service. The first argument to the <code>read()</code> method is the so-called <code>path</code>, which identifies the OData EntitySet we want to query. In this case, we are interested in the <code>Files</code> EntitySet, as this corresponds to our <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-CT_FILE-table"><code>CT_FILE</code> database table</a> in our backend. Because we use the name of the <code>Files</code> EntitySet in a lot of places, we stored it in the <code>filesEntityName</code> variable. So, our path becomes: <pre><code><span style="color:red">'/'</span> + filesEntityName</code></pre>
Apart from the path, the <code>read()</code> method takes a second argument, which is an object of query options. We'll highlight the few we need here.
<br/>
<br/>
Because we only want to know whether the backend already has a file with the same name as the one the user just selected, we add a parameter to restrict the search. This is done with the <code>filters</code> option:<pre><code>    filters: [<b><span style="color:blue">new</span> sap.ui.model.Filter</b>({
path: fileNamePath,
operator: sap.ui.model.FilterOperator.EQ,
<b>value1: fileToUpload.name</b>
})],
</code></pre>The <code>filters</code> option takes an array of <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.Filter" target="ui5"><code>sap.ui.model.Filter</code></a> objects. When we instantiate the <code>sap.ui.model.Filter</code> object, we pass an object with the following configuration options:<ul>
<li><code>path</code> - this should get a value that refers to a property defined by the OData entity type of this Entity Set. It corresponds to the name of a column of our database table. In this case, it is set to <code>fileNamePath</code>, which is a variable we initialized with <code><span style="color:red">'FILE_NAME'</span></code>, i.e., the name of the column in the <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-CT_FILE-table"><code>CT_FILE</code></a> table that holds the name of our files.</li>
<li><code>value1</code> - this should be the literal value that we want to use in our filter. In this case, we want to look for files with the same name as the file chosen by the user, so we set it to the <a href="https://developer.mozilla.org/en-US/docs/Web/API/File/name" target="mdn"><code>name</code> property of the <code>File</code> object</a> that the user selected - <code>fileToUpload.name</code></li>
<li><code>operator</code> - this should be one of the values defined by the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.FilterOperator" target="ui5"><code>sap.ui.model.FilterOperator</code></a> object, which defines how the given filter value should be compared to the value of the column. In this case the operator is <code>sap.ui.model.FilterOperator.EQ</code>, which stands for an equals comparison. By using this operator, we demand that the value of the column should be exactly the same as the name of the chosen file.</li>
</ul>
There is one other option specified that affects the request: <pre><code> urlParameters: {
$select: [fileNamePath, <span style="color:red">'FILE_LAST_MODIFIED'</span>]
},
</code></pre>This specifies for which columns we want to retrieve the values from the backend. It may be omitted, but in that case, all columns would be returned. Often this will not be a problem, but in this case, we really want to prevent the server from returning the values for the <code>FILE_CONTENT</code> column. Always retrieving the file contents would be an unnecessary burden for both the front-end and the backend, so we actively suppress the default behavior. The only columns requested here are <code>FILE_NAME</code> and <code>FILE_LAST_MODIFIED</code>. The latter is currently unused, but might come in handy to provide even more information to the user so they can better decide whether they want to re-upload an existing file.
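Putting the pieces together, the <code>read()</code> call shown above boils down to a single GET request against the <code>Files</code> entity set. The following standalone sketch (the service path and file name are made up for illustration; this is what the model derives for you, not code you would write) constructs the query url corresponding to the filter and <code>$select</code> options:

```javascript
// Sketch only: mimics the query url the ODataModel derives from the read()
// options; in application code the model builds and sends this itself.
var serviceUri = '/my/package/service/ta.xsodata';   // hypothetical service path
var fileToUploadName = 'contract.pdf';               // hypothetical chosen file

// the EQ filter becomes an OData $filter expression: FILE_NAME eq 'contract.pdf'
var filterExpression = "FILE_NAME eq '" + fileToUploadName + "'";

var url = serviceUri + '/Files'
        + '?$filter=' + encodeURIComponent(filterExpression)
        + '&$select=' + ['FILE_NAME', 'FILE_LAST_MODIFIED'].join(',');

console.log(url);
```

Note how the <code>$select</code> list is what keeps the bulky <code>FILE_CONTENT</code> column out of the response.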
<br/>
<br/>
The remaining options in the call to the model's <code>read()</code> method have nothing to do with the request, but are callback functions for handling the result of the read request. The <code>error</code> callback gets called if there is some kind of issue with the request itself - maybe the backend has gone away, or maybe the structure of the service changed. The <code>success</code> callback is called when the read request executes normally, and any results are then passed to it as an argument. This is even true if no results are found - the callback then simply gets passed an empty list of results.<br/>In our example, the main purpose of the <code>success</code> callback is to flag whether the file already exists, and to update the state of the file uploader accordingly to inform the user. The existence of the file is flagged by assigning the <code>fileToUploadExists</code> variable, and we will see its significance in the next section where we discuss the implementation of the upload of the file contents.
<h3>Handling the Upload</h3>
We've just seen exactly how the UI5 application can let the user choose a file, and we even used our model to check whether the chosen file is already present in the backend. Once these steps are done, we have successfully initialized two variables, <code>fileToUpload</code> and <code>fileToUploadExists</code>. This is all we need to handle the upload.
<br/>
<br/>
In the application, the user initiates the upload by clicking the Confirmation Button of the uploadDialog. This then triggers the button's <code>press</code> event, where we've attached the function <code>uploadFile</code> as handler.
<br/><br/>
So, in this handler, we must examine the value of the <code>fileToUploadExists</code> variable and take the appropriate action:<ul>
<li>If <code>fileToUploadExists</code> is <code><span style="color:blue">false</span></code>, we should tell our model to add a new item. This is done by calling the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/createEntry" target="ui5"><code>createEntry()</code>-method</a></li>
<li>If <code>fileToUploadExists</code> is <code><span style="color:blue">true</span></code>, we should tell our model to update the existing item. This is done by calling the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/update" target="ui5"><code>update()</code>-method</a></li>
</ul>
<h4>The path argument</h4>
Both methods take a path as their first argument to indicate where the new item should be added, or which item to update.<br/><br/>When adding a new item, the path is simply the path of the Files Entity Set within the model:<pre><code><span style="color:red">'/'</span> + filesEntityName</code></pre>(Note: this is exactly the same as the path we used in the <code>read()</code> call to figure out whether the file already exists.)
<br/>
<br/>
The path for updating an existing item also starts with the path of the Entity Set, but includes the key to identify the item that is to be updated. Lucky for us, the <code>sap.ui.model.odata.v2.ODataModel</code> model provides the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/createKey" target="ui5"><code>createKey()</code> method</a> which constructs such a path, including the key part, based on the values of the properties that make up the key. So, the code to construct the path for the <code>update</code> method becomes:<pre><code><span style="color:red">'/'</span> + <b>model.createKey</b>(filesEntityName, {<span style="color:red">"FILE_NAME"</span>: fileToUpload.name})</code></pre>(For more detailed information about how OData keys and paths work, see <a href="https://www.odata.org/documentation/odata-version-2-0/uri-conventions/" target="odata">OData URI Conventions</a>, in particular the section on "Addressing Entries".)
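A sketch of what such a key path looks like for an entity set with a single string key. The helper function and file name below are illustrative only - in the app, <code>createKey()</code> does this work, and more, for you:

```javascript
// Sketch of the OData V2 key addressing scheme: a single string key is
// rendered as a quoted literal in parentheses after the entity set name.
function createKeyPath(entitySetName, keyValue) {
  // per the OData string literal rules, a single quote inside the value
  // is escaped by doubling it
  return entitySetName + "('" + keyValue.replace(/'/g, "''") + "')";
}

var updatePath = '/' + createKeyPath('Files', 'report.pdf');
console.log(updatePath);   // /Files('report.pdf')
```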
<h4>The payload</h4>
In addition to the path, we also need to pass the data, which is sometimes referred to as <em>the payload</em>. While the path tells the model <em>where</em> to add or update an item, the payload specifies <em>what</em> should be added or updated.
<br/><br/>
Now, even though the UI5 documentation is not very specific about how to construct the payload, we have used the <code>createEntry()</code> and <code>update()</code> methods of the <code>sap.ui.model.odata.v2.ODataModel</code> in the past without any problems. It is normally quite intuitive and hassle-free: you simply create an Object with keys that match the property names of the target entity set, and assign JavaScript values, just as-is. So, if we disregard the <code>FILE_CONTENT</code> field for a moment, the payload for the Files entity set could be something like this:<pre><code><span style="color:blue">var</span> payload = {
<span style="color:red">"FILE_NAME"</span>: fileToUpload.name,
<span style="color:red">"FILE_TYPE"</span>: fileToUpload.type,
<span style="color:red">"FILE_LAST_MODIFIED"</span>: <span style="color:blue">new</span> Date(fileToUpload.lastModified),
<span style="color:red">"FILE_SIZE"</span>: fileToUpload.size,
<span style="color:red">"FILE_LAST_UPLOADED"</span>: <span style="color:blue">new</span> Date(Date.now())
};</code></pre>Let's compare this to the data types of the corresponding properties in the entity type of the entity set:<pre><code><EntityType Name=<span style="color:red">"FilesType"</span>>
<Key>
<PropertyRef Name=<span style="color:red">"FILE_NAME"</span>/>
</Key>
<Property Name=<span style="color:red">"FILE_NAME"</span> Type=<span style="color:red">"Edm.String"</span> Nullable=<span style="color:red">"false"</span> MaxLength=<span style="color:red">"256"</span>/>
<Property Name=<span style="color:red">"FILE_TYPE"</span> Type=<span style="color:red">"Edm.String"</span> Nullable=<span style="color:red">"false"</span> MaxLength=<span style="color:red">"256"</span>/>
<Property Name=<span style="color:red">"FILE_LAST_MODIFIED"</span> Type=<span style="color:red">"Edm.DateTime"</span> Nullable=<span style="color:red">"false"</span>/>
<Property Name=<span style="color:red">"FILE_SIZE"</span> Type=<span style="color:red">"Edm.Int32"</span> Nullable=<span style="color:red">"false"</span>/>
<b><Property Name=<span style="color:red">"FILE_CONTENT"</span> Type=<span style="color:red">"Edm.Binary"</span> Nullable=<span style="color:red">"false"</span>/></b>
<Property Name=<span style="color:red">"FILE_LAST_UPLOADED"</span> Type=<span style="color:red">"Edm.DateTime"</span> Nullable=<span style="color:red">"false"</span>/>
</EntityType>
</code></pre>(Note: this entity type is taken from <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-xsodata-metadata">the <code>$metadata</code> document</a> of our service.)
<br/><br/>
So, in short - there is a pretty straightforward mapping between the JavaScript runtime values and the <a href="https://www.odata.org/documentation/odata-version-2-0/overview/" target="odata">Edm Type System</a> (see: "6. Primitive Data Types") used by OData: JavaScript <code>String</code>s may be assigned to <code>Edm.String</code>s, JavaScript <code>Date</code> objects may be assigned to <code>Edm.DateTime</code>s, and JavaScript <code>Number</code>s may be assigned to <code>Edm.Int32</code>s.
<br/><br/>
This is less trivial than one might think when one considers what happens here: on the one hand, we have the types as declared by the OData service, which are Edm Types. Then, we have to consider the content type used to transport the payload in the HTTP request: OData services may support several content types, and by default OData supports both <a href="https://www.odata.org/documentation/odata-version-2-0/atom-format/" target="odata">application/atom+xml</a> and <a href="https://www.odata.org/documentation/odata-version-2-0/json-format/" target="odata">application/json</a>. So, when starting with a payload as a JavaScript runtime object, this first needs to be converted to an equivalent representation in one of these content types (UI5 uses the JSON representation) by the client before it can be sent off to the OData service in an HTTP request.
<br/>
<br/>It bears repeating that this is not a simple, standard JSON serialization, since the type system used by the JSON standard only knows how to represent JavaScript <code>String</code>s, <code>Number</code>s, <code>Boolean</code>s (and arrays and objects containing values of those types). The native JSON type system is simply too minimal to represent all the types in the Edm Type system used by OData, hence the need for an extra JSON representation format. The <code>sap.ui.model.odata.v2.ODataModel</code> does a pretty remarkable job of hiding all this complexity and making sure things work relatively painlessly.
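To make this concrete, consider <code>Edm.DateTime</code>: in the OData version 2 JSON format, a date/time value is represented as a string of the form <code>/Date(&lt;milliseconds&gt;)/</code> rather than as any native JSON value. A minimal sketch of that conversion - which the <code>ODataModel</code> performs internally, so you never write this yourself - looks like this:

```javascript
// Sketch: how a JavaScript Date maps to the OData v2 JSON representation
// of Edm.DateTime. The ODataModel does this conversion internally.
function toODataV2DateTime(date) {
  // milliseconds since the Unix epoch, wrapped in the "/Date(...)/" marker
  return "/Date(" + date.getTime() + ")/";
}

toODataV2DateTime(new Date(Date.UTC(2019, 11, 1))); // "/Date(1575158400000)/"
```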
<h4>Representing <code>FILE_CONTENT</code> in the payload</h4>
Now for the <code>FILE_CONTENT</code> property. In the entity type, we notice that the data type is <code>Edm.Binary</code>. What would be the proper JavaScript runtime type to construct the payload?
<br/>
<br/>
We just mentioned that normally, the mapping from JavaScript runtime types is usually taken care of by the <code>sap.ui.model.odata.v2.ODataModel</code>. So we might be tempted to simply pass the <code>File</code> object itself directly as value for the <code>FILE_CONTENT</code> property. But when we call either the <code>createEntry</code> or <code>update</code> method with a payload like this:<pre><code><span style="color:blue">var</span> payload = {
<span style="color:red">"FILE_NAME"</span>: fileToUpload.name,
<span style="color:red">"FILE_TYPE"</span>: fileToUpload.type,
<span style="color:red">"FILE_LAST_MODIFIED"</span>: <span style="color:blue">new</span> Date(fileToUpload.lastModified),
<span style="color:red">"FILE_SIZE"</span>: fileToUpload.size,
<span style="color:red">"FILE_LAST_UPLOADED"</span>: <span style="color:blue">new</span> Date(Date.now()),
<b><span style="color:red">"FILE_CONTENT"</span>: fileToUpload</b>
};</code></pre> we get an error in the response:<pre><code>The serialized resource has an invalid value in member 'FILE_CONTENT'.</code></pre>So clearly, the <code>sap.ui.model.odata.v2.ODataModel</code> needs some help here.
<br/>
<br/>
One might assume that the problem has to do with the <code>File</code> object being a little bit too specific for UI5 - after all, a <code>File</code> object is not just some binary value, but a subclass of the <code>Blob</code> object, adding all kinds of file-specific properties of its own. However, assigning a proper, plain <code>Blob</code> object in the payload yields exactly the same result, so that's not it either.
<br/>
<br/>
Instead of continuing to experiment with different values and types, we took a step back and took a look at the OData specification to see if we could learn a bit more about the <code>Edm.Binary</code> type. In the <a href="https://www.odata.org/documentation/odata-version-2-0/json-format/" target="odata">part about the JSON representation</a> (See: "4. Primitive Types") we found this: <pre>Base64 encoded value of an EDM.Binary value represented as a JSON string</pre>This suggests that whatever value represents the <code>Edm.Binary</code> value needs to be Base64 encoded, which yields a string value at runtime, and this string may then be serialized to a JSON string. So, if we could make a Base64 encoded string value of our binary value, we could assign that in the payload. (We already saw that the <code>sap.ui.model.odata.v2.ODataModel</code> will turn JavaScript <code>String</code> values into a JSON representation, so we don't have to do that step ourselves.)
<br/>
<br/>
Fortunately, it's easy to create Base64 encoded values. The browser built-in function <a href="https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/btoa" target="mdn"><code>btoa()</code></a> does this for us.
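For example (you can verify these values in any browser console):

```javascript
// btoa() Base64-encodes a "binary string": a string in which every
// character has a code point in the 0-255 range, representing one byte.
btoa("Hello");        // "SGVsbG8="
atob(btoa("Hello"));  // "Hello" - atob() is the inverse

// Characters above U+00FF do not fit in a single byte, so calling
// btoa() on them, e.g. btoa("€"), throws an InvalidCharacterError.
```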
<br/><br/>However, we're not there yet, as the spec starts with a binary value, and JavaScript does not have a binary type (and hence, no binary values).<br/><br/>
We then took a look at the specification to find out exactly what an <code>Edm.Binary</code> value is. We found something in <a href="https://www.odata.org/documentation/odata-version-2-0/overview/" target="odata">the section about Primitive Data Types</a> on how to create literal <code>Edm.Binary</code> values:<pre>binary'[A-Fa-f0-9][A-Fa-f0-9]*' OR X '[A-Fa-f0-9][A-Fa-f0-9]*'
NOTE: X and binary are case sensitive.
Spaces are not allowed between binary and the quoted portion.
Spaces are not allowed between X and the quoted portion.
Odd pairs of hex digits are not allowed.
Example 1: X'23AB'
Example 2: binary'23ABFF'</pre>
At this point the thinking was that we could take the bytes that make up the binary value, convert them to their hexadecimal string representation, single-quote the resulting hex string, and finally prepend either <code>X</code> or <code>binary</code> to it. At runtime, this would then be a JavaScript string value representing an <code>Edm.Binary</code> literal, which we could then turn into its Base64 encoded value and assign to the payload.<br/><br/>When we went this route, the error message went away, and sure enough, documents started to show up in our backend table. Unfortunately, the documents ended up there as <code>Edm.Binary</code> literals, that is, as strings that accurately represent the document as an <code>Edm.Binary</code> literal, but are otherwise useless.
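For illustration, that (ultimately wrong) intermediate step can be sketched as follows. Note that this is a reconstruction of the failed attempt, not code from the final application:

```javascript
// Reconstruction of the failed attempt: turn a binary string into an
// Edm.Binary literal such as X'23AB'. Sending this literal to the backend
// stores the literal text itself, not the original bytes.
function toEdmBinaryLiteral(binaryString) {
  var hex = "";
  for (var i = 0; i < binaryString.length; i++) {
    var h = binaryString.charCodeAt(i).toString(16).toUpperCase();
    hex += (h.length < 2 ? "0" : "") + h;  // left-pad to two hex digits
  }
  return "X'" + hex + "'";
}

toEdmBinaryLiteral("AB"); // "X'4142'" - "A" is byte 0x41, "B" is byte 0x42
```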
<br/><br/>
At this point the solution was clear though - just leave out the intermediate step of converting the original value to an <code>Edm.Binary</code> literal.
<h3>The <code>uploadFile</code> function</h3>
Remember, at this point we have the <code>File</code> object stored in the <code>fileToUpload</code> variable, and a flag <code>fileToUploadExists</code> is set to <code>true</code> or <code>false</code> depending upon whether the file is already stored in the backend table. This is the code we ended up with for uploading the file:<pre><code><span style="color:blue">function</span> uploadFile(){
<span style="color:blue">var</span> fileReader = <span style="color:blue">new</span> FileReader();
fileReader.onload = <span style="color:blue">function</span>(event){
<span style="color:blue">var</span> binaryString = event.target.result;
<span style="color:blue">var</span> payload = {
<span style="color:red">"FILE_NAME"</span>: fileToUpload.name,
<span style="color:red">"FILE_TYPE"</span>: fileToUpload.type,
<span style="color:red">"FILE_LAST_MODIFIED"</span>: <span style="color:blue">new</span> Date(fileToUpload.lastModified),
<span style="color:red">"FILE_SIZE"</span>: fileToUpload.size,
<span style="color:red">"FILE_CONTENT"</span>: btoa(binaryString),
<span style="color:red">"FILE_LAST_UPLOADED"</span>: <span style="color:blue">new</span> Date(Date.now())
};
<span style="color:blue">if</span> (fileToUploadExists) {
model.update(
<span style="color:red">'/'</span> + model.createKey(filesEntityName, {
<span style="color:red">"FILE_NAME"</span>: fileToUpload.name
}),
payload
);
}
<span style="color:blue">else</span> {
model.createEntry(<span style="color:red">'/'</span> + filesEntityName, {
properties: payload
});
}
model.submitChanges({
success: <span style="color:blue">function</span>(){
closeUploadDialog();
}
});
};
fileReader.readAsBinaryString(fileToUpload);
}</code></pre>As explained earlier, uploading the file breaks down into 2 subtasks, and this handler takes care of both:<ul>
<li>First, we use the <code>FileReader</code> to read the contents of the <code>File</code> object</li>
<li>Then, we send it to the backend. To do that, we construct the path and the payload, and call either the <code>createEntry</code> or the <code>update</code> method, depending on whether the file already exists, passing the path and the payload.</li>
</ul>
<h3>Using the <code>FileReader</code> to read the contents of a <code>File</code> object</h3>
First, we need to read the contents of the file. We do that using a <a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader" target="mdn"><code>FileReader</code></a>, which is also part of the Web File API. To get the contents of a <code>File</code> object, we can call one of the <code>FileReader</code>'s <code>read</code> methods.
<br/>
<br/>The <code>FileReader</code>'s read methods do not return the contents of the file directly: the Web File API is mostly asynchronous. Instead, we have to attach an event handler to the <code>FileReader</code> which can respond to the <code>FileReader</code>'s events. In this case we overrode the <code>FileReader</code>'s <code>onload()</code> method, which gets called when the <code>FileReader</code> is done reading a <code>File</code>. (Instead of the override, we could also have attached a handler with <code>addEventListener()</code>, but it really doesn't matter too much how the handler is attached.)
<br/><br/>Once set up, we can now call a <code>read()</code> method and wait for the reader to call our <code>onload()</code> handler.
<br/>
<br/>So the general structure to read the file is as follows:<pre><code><span style="color:blue">function</span> uploadFile(){
<span style="color:blue">var</span> fileReader = <b><span style="color:blue">new</span> FileReader()</b>;
fileReader.<b>onload</b> = <span style="color:blue">function</span>(event){
<span style="color:blue">var</span> binaryString = <b>event.target.result</b>;
...do something with the file contents...
};
<b>fileReader.readAsBinaryString(fileToUpload)</b>;
}</code></pre>
<br/><br/>
We already mentioned the <code>FileReader</code> provides a number of different <a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader#Methods" target="mdn"><code>read</code> methods</a>, and the chosen method determines the type of the value that will be available in <code>event.target.result</code> by the time the load handler is called. Today, the <code>FileReader</code> provides:<ul>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsArrayBuffer" target="mdn"><code>readAsArrayBuffer()</code></a></li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsBinaryString" target="mdn"><code>readAsBinaryString()</code></a></li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsDataURL" target="mdn"><code>readAsDataURL()</code></a></li>
<li><a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader/readAsText" target="mdn"><code>readAsText()</code></a></li>
</ul>To figure out which method we should use, we should consider how our backend expects to receive the data. Or rather, how our <code>sap.ui.model.odata.v2.ODataModel</code> wants us to pass the data so it can make the appropriate call to the backend. In a previous section we already explained our struggle to figure out how to represent an <code>Edm.Binary</code> value in the payload, and based on those findings, <code>readAsBinaryString()</code> is the appropriate method. With this read method, the <code>FileReader</code> turns each individual byte of the file contents into a JavaScript character, much like the <code>fromCharCode()</code> method of the <code>String</code> object would do. The resulting value is a JavaScript binary string: each character represents a byte.<br/><br/>
Note that this is very different from what the <code>readAsText()</code> method would do: that would attempt to decode the bytes as UTF-8 encoded characters; in other words, it would result in a character string, not a binary string.
<br/>
<br/>
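The difference is easy to demonstrate with a character outside the ASCII range. Here we assume the Euro sign, whose UTF-8 encoding happens to be three bytes:

```javascript
// The Euro sign "€" is encoded in UTF-8 as the three bytes 0xE2 0x82 0xAC.
var bytes = new Uint8Array([0xE2, 0x82, 0xAC]);

// What readAsBinaryString() would produce: one character per byte.
var binaryString = String.fromCharCode.apply(null, bytes);
binaryString.length; // 3

// What readAsText() would produce: the bytes decoded as UTF-8 text.
var textString = new TextDecoder("utf-8").decode(bytes);
textString;          // "€"
textString.length;   // 1
```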
After obtaining the file contents as binary string, we can apply the Base64 encoding and assign it to the payload:<pre><code> <span style="color:blue">var</span> payload = {
<span style="color:red">"FILE_NAME"</span>: fileToUpload.name,
<span style="color:red">"FILE_TYPE"</span>: fileToUpload.type,
<span style="color:red">"FILE_LAST_MODIFIED"</span>: <span style="color:blue">new</span> Date(fileToUpload.lastModified),
<span style="color:red">"FILE_SIZE"</span>: fileToUpload.size,
<span style="color:red">"FILE_CONTENT"</span>: <b>btoa(binaryString)</b>,
<span style="color:red">"FILE_LAST_UPLOADED"</span>: <span style="color:blue">new</span> Date(Date.now())
};
</code></pre>
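As an aside: <code>readAsBinaryString()</code> is nowadays considered a legacy method. An equivalent approach - sketched here under the assumption that the backend still expects the Base64-encoded bytes, and with <code>arrayBufferToBase64</code> being a helper of our own invention - uses <code>readAsArrayBuffer()</code> and builds the binary string manually:

```javascript
// Alternative sketch: readAsArrayBuffer() instead of the legacy
// readAsBinaryString(). The resulting Base64 string is identical.
function arrayBufferToBase64(arrayBuffer) {
  var bytes = new Uint8Array(arrayBuffer);
  var binaryString = "";
  for (var i = 0; i < bytes.length; i++) {
    // one character per byte, as readAsBinaryString() would produce
    binaryString += String.fromCharCode(bytes[i]);
  }
  return btoa(binaryString);
}

// In the onload handler, event.target.result is then an ArrayBuffer:
//   payload.FILE_CONTENT = arrayBufferToBase64(event.target.result);
// ...and the read is started with:
//   fileReader.readAsArrayBuffer(fileToUpload);
```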
<h2>Summary</h2>
This concludes the final installment of this blog series. In this post we learned how to use:<ul>
<li>the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.unified.FileUploader" target="ui5"><code>sap.ui.unified.FileUploader</code></a> to present a file chooser to the user</li>
<li>the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.unified.FileUploader/events/change" target="ui5"><code>sap.ui.unified.FileUploader</code>'s <code>change</code> event</a> to get a hold of the <a href="https://developer.mozilla.org/en-US/docs/Web/API/File" target="mdn"><code>File</code></a> object representing the user's selection.</li>
<li>the <a href="https://developer.mozilla.org/en-US/docs/Web/API/FileReader" target="mdn"><code>FileReader</code></a> to read the contents of a <code>File</code> object</li>
<li>the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/read" target="ui5"><code>sap.ui.model.odata.v2.ODataModel</code>'s <code>read()</code> method</a> to specify a query using a <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.Filter" target="ui5"><code>sap.ui.model.Filter</code></a> to check whether an item already exists in the backend.</li>
<li>the <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/createEntry" target="ui5"><code>createEntry()</code></a> and <a href="https://sapui5.hana.ondemand.com/#/api/sap.ui.model.odata.v2.ODataModel/methods/update" target="ui5"><code>update()</code></a> methods of the <code>sap.ui.model.odata.v2.ODataModel</code> to create or update an entry in the backend.</li>
<li>the <a href="https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/btoa" target="mdn"><code>btoa()</code></a> function to create payloads for OData properties of the <code>Edm.Binary</code> type</li>
</ul>
And by putting these elements together we created a File upload feature for a UI5 application, backed by a HANA OData service.
<h3>Odds and Ends</h3>
It's nice that we finally found out how to write the file contents to our OData Service. But something does not feel quite right. Although we happened to find a way to write the binary data that satisfies both the <code>sap.ui.model.odata.v2.ODataModel</code> and the SAP HANA <code>.xsodata</code> service that backs it, we still haven't found any official documentation, either in the OData specification or from SAP, that confirms this is really the correct way. We would hope that SAP HANA's <code>.xsodata</code> implementation is a faithful implementation of the standard, but for the <code>Edm.Binary</code> type, I'm just not 100% sure. If anybody could chime in and confirm this, and preferably point me to something in the OData specification that confirms it, I would be most grateful.
<h1>Building a UI5 Demo for SAP HANA Text Analysis: Part 3</h1>
We now continue our series on building a simple web application for exploring SAP HANA Text Analysis features. As a reminder, here are the links to the other installments in the series:
<ul>
<li><a href="http://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text.html">Part 1 - an Overview: SAP HANA Text Analysis on Documents uploaded by an end-user</a></li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html">Part 2 - Hands on: Building the backend for a SAP HANA Text Analysis application</a></li>
<li>Part 3 - Presenting: A UI5 front-end to upload documents and explore SAP HANA Text Analytics features</li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_2.html">Part 4 - Deep dive: How to upload documents with OData in a UI5 Application</a></li>
</ul>
In the <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html">previous blog post</a>, we built a backend for our SAP HANA Text Analysis application.
In this blog post I present a simple web application which lets end-users upload documents to the backend and inspect the SAP HANA Text analysis results.
<br/>
<br/>
The application is a very simple UI5 application. The code, along with the back-end code, is available on github, and instructions to install this application on your own SAP HANA system are provided as well.
<h2>The UI5 Application: Functional Overview</h2>
Here's an overview of the UI5 demo application:
<br/>
<br/>
<img src="https://drive.google.com/uc?id=11_w_i0Q89XhYFM4NQCdWGpsFCfB3AxWr&authuser=0&export=download"/>
<br/>
<br/>
The application features a single page, which is split vertically.
On the left hand side of the splitter is the list of uploaded files, and it shows all rows from the <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-CT_FILE-table"><code>CT_FILE</code></a> database table.
On the right hand side of the splitter is the list of text analysis results, and this shows rows from the <code>$TA_</code> database table.
<br/>
<br/>
In the screenshot, only the <code>FILE_NAME</code> column is visible, but you can reveal the other columns by choosing them from the column header menu, which pops up when you right-click a column header:
<br/>
<br/>
<img src="https://drive.google.com/uc?id=1m0mH5OQZv3yCcwn2M4u-XQj6BPeIyCKG&authuser=0&export=download"/>
<br/>
<br/>
Since we haven't uploaded any files yet, both lists are currently empty. So, let's upload a file to see it in action!
To upload a file, hit the button on the top left side of the application toolbar (1):
<br/>
<br/>
<img src="https://drive.google.com/uc?id=1FC6_7rzeRgDH8qJamjDqrTfwQDkTP-5G&authuser=0&export=download"/>
<br/>
<br/>
After clicking the "Upload File for Text Analysis" toolbar button, a dialog appears that lets you browse files so you can upload them.
Hit the "Browse File..." button in the dialog to open a File explorer (2). Use the file explorer to choose a file (3).
Note that this demo project's github repository provides a number of sample files in the <code>sample-docs</code> folder.
<br/>
<br/>
After choosing a file in the File explorer, the file name appears in the dialog:
<br/>
<br/>
<img src="https://drive.google.com/uc?id=1q1uD5n3PYhdZLYtAmsCHnBbtN4tV4yqj&authuser=0&export=download"/>
<br/>
<br/>
To actually upload the chosen file, confirm the dialog by clicking the "Upload" button at the bottom of the dialog.
The file will then appear in the file list left of the splitter, and is then selected.
<br/>
<br/>
Whenever the selection in the file list changes, the text analysis results in the list on the right of the splitter are updated to match the selected item.
As we <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#hana_fulltext_asynchronous">mentioned in the previous post</a>, collection of text analysis results is <code>ASYNCHRONOUS</code>, so after uploading a new file, there is a possibility that the text analysis results have not yet arrived. Unfortunately, there is not much that can be done about that at this point.
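If the delay turns out to be a problem in practice, one pragmatic (if crude) workaround is to simply re-query the results a few times after the upload. The sketch below assumes you have a handle on the list binding of the results table; <code>refreshWithBackoff</code> is a hypothetical helper, not part of the demo application:

```javascript
// Hypothetical workaround: refresh the results binding a few times with
// increasing delays, since no event signals completion of the analysis.
function refreshWithBackoff(listBinding, attempts, delayMs) {
  if (attempts <= 0) {
    return;
  }
  setTimeout(function() {
    listBinding.refresh();  // re-query the $TA_ results from the backend
    refreshWithBackoff(listBinding, attempts - 1, delayMs * 2);
  }, delayMs);
}

// e.g.: refreshWithBackoff(taTable.getBinding("rows"), 5, 1000);
```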
<br/>
<br/>
<img src="https://drive.google.com/uc?id=1wJlPq9osrJZQQyA2Aw2St_JjBn8ReH51&authuser=0&export=download"/>
<br/>
<br/>
You can now browse, filter, and sort the list of analysis results to explore the results of the text analysis.
Obviously, by itself this is not very useful, but the point of this app is to make it very easy to inspect the actual raw text analysis results.
Hopefully, it will give you some ideas on how you could use this type of information to build actual real world applications.
<br/>
<br/>
Once you're done with a particular file, you can also remove it using this application: in the File list, simply hit the trashbin icon to remove that particular file.
A dialog will appear where you need to confirm the deletion of that file. When you confirm the dialog, the file will be deleted from the <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-CT_FILE-table"><code>CT_FILE</code></a> table.
Note that any corresponding analysis results from the <code>$TA_</code> table will not be removed by this demo application, unless you manually added a foreign key constraint on the <code>$TA_</code> table that cascades the deletes from the <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html#ta-CT_FILE-table"><code>CT_FILE</code></a> table.
<h2>Installing this application on your own HANA System</h2>
Front-end and back-end code for this application is <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo" target="github">available on github</a> and licensed as open source software under the terms and conditions of <a href="https://www.apache.org/licenses/LICENSE-2.0" target="apache">the Apache 2.0 software license</a>.
The remainder of this post provides the installation instructions.
<h3>Obtaining the source and placing it in a destination package on your HANA system</h3>
<ul>
<li>Create a package with your favorite IDE for SAP HANA (Web IDE, SAP HANA Studio, Eclipse with SAP HANA Developer Tools)</li>
<li><a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo/archive/master.zip">Download</a> an archive of <a href="https://github.com/just-bi/hana-ui5-text-analysis-upload-demo" target="github">the github repository</a></li>
<li>Unzip the archive and transfer its contents to the HANA package you just created.</li>
</ul>
<h3>Updating Package and Schema names</h3>
<ul>
<li>With <code>db/CT_FILE.hdbdd</code>:
<ul>
<li>in the <code>namespace</code> declaration, update the package identifier from <code>"system-local"."public"."rbouman"."ta"</code> to the name of the package you just created.</li>
<li>modify the <code>@Schema</code> from <code>'RBOUMAN'</code> to whatever schema you want to use. (Create a schema yourself if you don't already have one)</li>
<li>Activate <code>db/CT_FILE.hdbdd</code>. In the database catalog, you should now have this table. HANA should have created a <a href="https://help.sap.com/viewer/fedd7e90a382415cbdd273891651ab4d/1.0.12/en-US/e580220fc1014045ab9f45ea9f82d8d8.html" rel="nofollow">corresponding <code>$TA_</code> table</a> as well.</li>
</ul>
</li>
<li>With <code>service/ta.xsodata</code>:
<ul>
<li>In the first entity definition, update the table repository object identifier <code>"system-local.public.rbouman.ta.db::CT_FILE"</code> so it matches the location of the table on your system.</li>
<li>In the second entity definition, update the catalog table identifier from <code>"RBOUMAN"."$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE"</code> so it matches the database schema and catalog table name on your system.</li>
<li>Activate <code>service/ta.xsodata</code>.</li>
</ul>
</li>
</ul>
<h3>Activation</h3>
You can now activate the package you created to activate all remaining objects, such as the <code>.xsapp</code> and <code>.xsaccess</code> files, as well as the <code>web</code> subpackage and all its contents.
<h3>Running the application</h3>
After installation, you should be able to open the web application. You can do this by navigating to:
<pre><code>http://yourhanahost:yourxsport/path/to/your/package/web/index.html</code></pre>
where:
<ul>
<li><code>yourhanahost</code> is the hostname or IP address of your SAP HANA system</li>
<li><code>yourxsport</code> is the <a href="https://help.sap.com/viewer/6b94445c94ae495c83a19646e7c3fd56/1.0.12/en-US/116cc3f3f3f645159ee138c3ba50a48b.html" rel="nofollow">port where your HANA's xs engine is running</a>. Typically this is 80 followed by your HANA instance number.</li>
<li><code>path/to/your/package</code> is the name of the package where you installed the app, but using slashes (/) instead of dots (.) as the separator character.</li>
</ul>
<h2>Summary</h2>
In this blog post we finally got to use <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html">the backend we built previously</a> by installing and running the UI5 App.
You can use the app to explore the SAP HANA Text Analysis results and to experiment with different document formats.
<br/>
<br/>
If you're also interested in how the actual upload process works and how it is implemented in the UI5 app, then you can read all about it in the next and final installment of this series.
<h1>Building a UI5 Demo for SAP HANA Text Analysis: Part 2</h1>
In the <a href="http://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text.html">previous blog post</a>, I explained some of the prerequisites for building a SAP HANA Text Analysis application, and gave some thought to how to expose these features to an end-user facing web application. In this blog post, these considerations are put to practice and a basic but functional backend is created to support such an application.
<br/>
<br/>
<ul>
<li><a href="http://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text.html">Part 1 - an Overview: SAP HANA Text Analysis on Documents uploaded by an end-user</a></li>
<li>Part 2 - Hands on: Building the backend for a SAP HANA Text Analysis application</li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_28.html">Part 3 - Presenting: A UI5 front-end to upload documents and explore SAP HANA Text Analytics features</a></li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_2.html">Part 4 - Deep dive: How to upload documents with OData in a UI5 Application</a></li>
</ul>
<h2>Building the HANA Text Analysis Backend</h2>
We'll quickly go through a setup so you can try this out yourself. This assumes you have access to a HANA System and development tools (either the <a href="https://developers.sap.com/topics/sap-webide.html" target="sap">SAP Web IDE</a>, <a href="https://help.sap.com/viewer/52715f71adba4aaeb480d946c742d1f6/1.0.12/en-US/ade083aeded84e289d1710a1cf131499.html" target="sap">HANA Studio</a>, or <a href="https://www.eclipse.org/downloads/" target="eclipse">Eclipse IDE</a> with <a href="https://tools.hana.ondemand.com/#hanatools" target="sap">SAP HANA Development tools</a>, or whatever - it doesn't really matter).
<h3>Package Structure</h3>
There is a lot that could be said about the proper package structure for HANA applications, but we won't go there now. It's enough to have just enough structure to keep the different responsibilities of the application apart.
We settled on a simple base package called <code>ta</code>, which acts as the root package of the demo project, with 3 subpackages:<ul>
<li><code>db</code> for anything related to the physical database structure</li>
<li><code>service</code> for the OData service, that exposes the database to our web-application</li>
<li><code>web</code> for the HTML5/UI5 application - i.e. the stuff that is served from HANA to run inside the client's web-browser</li>
</ul>
Apart from these 3 subpackages, the <code>ta</code> package also contains these 2 files, which are necessary to expose it as a web application:<ul>
<li><code>.xsapp</code> - an empty file to make the contents of the package <a href="https://help.sap.com/viewer/400066065a1b46cf91df0ab436404ddc/1.0.12/en-US/fac9ec6995a0426c840f85ae5a8f6930.html" target="sap">available via the XS webserver</a>.</li>
<li><code>.xsaccess</code> - A configuration file <a href="https://help.sap.com/viewer/400066065a1b46cf91df0ab436404ddc/1.0.12/en-US/804d4967affd4a43b6a109e6f3987b21.html" target="sap">for managing access and authorizations</a> for the web application.</li>
</ul>
Note that with this setup, all of our subpackages are exposed, whereas only the <code>service</code> and <code>web</code> subpackages actually need to be exposed. An actual, serious application would only expose whatever is minimally required, and would not expose any packages related to the physical database structure.
<h3 id="ta-CT_FILE-table">The <code>CT_FILE</code> table</h3>
We created a HANA table to hold our uploaded files by creating a file called <code>CT_FILE.hdbdd</code> in the <code>db</code> package. This allows you to maintain the table definition as a repository object, which makes it transportable.
<br/>
<br/>
The <code>CT_FILE.hdbdd</code> file has the following contents:<pre><code>namespace "system-local"."public"."rbouman"."ta"."db";
@Schema: 'RBOUMAN'
@Catalog.tableType: #COLUMN
Entity CT_FILE {
Key "FILE_NAME" : String(256) not null;
"FILE_TYPE" : String(256) not null;
"FILE_LAST_MODIFIED" : UTCTimestamp not null;
"FILE_SIZE" : Integer not null;
<b>"FILE_CONTENT" : LargeBinary not null;</b>
"FILE_LAST_UPLOADED" : UTCTimestamp not null ;
}
technical configuration {
<b>FULLTEXT INDEX "FT_IDX_CT_FILE" ON ("FILE_CONTENT")
ASYNCHRONOUS
LANGUAGE DETECTION ('en')
MIME TYPE COLUMN "FILE_TYPE"
FUZZY SEARCH INDEX off
PHRASE INDEX RATIO 0.721
SEARCH ONLY OFF
FAST PREPROCESS OFF
TEXT ANALYSIS ON
CONFIGURATION 'GRAMMATICAL_ROLE_ANALYSIS';</b>
};</code></pre>
The important feature here is the definition of the <code>FILE_CONTENT</code> column as a <code>LargeBinary</code>, and the <code>FULLTEXT INDEX</code> definition on that column.
The particular syntax to define the fulltext index in a <code>.hdbdd</code> table definition is described in the <a href="https://help.sap.com/viewer/09b6623836854766b682356393c6c416/1.0.12/en-US/ad036c56b5e545ae8b31ece0ab95379f.html#loioad036c56b5e545ae8b31ece0ab95379f__subsection_kmv_f5r_qt" target="sap">SAP HANA Core Data Services (CDS) Reference</a>, whereas the actual options that are applicable to <code>FULLTEXT INDEX</code>es are described in the <a href="https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20d4117e75191014ba5aaab91b3f087d.html" target="sap">SAP HANA SQL and System Views Reference</a>. Finally, guidance on the meaning and functionality of the text analysis configurations is provided in the <a href="https://help.sap.com/viewer/fedd7e90a382415cbdd273891651ab4d/1.0.12/en-US/31b772b1530349a5bf32ec345f5a0080.html" target="sap">SAP HANA Text Analysis Developer Guide</a>.
<br/>
<br/>
The short of it is that with this configuration, the (binary) content of documents stored in the <code>FILE_CONTENT</code> column will be analyzed automatically. The results of the analysis will be stored in a separate <code>$TA_</code> table called <code>$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE</code>. This table is created and maintained by the HANA system. The structure of this <code>$TA_</code> table is described <a href="https://help.sap.com/viewer/fedd7e90a382415cbdd273891651ab4d/1.0.12/en-US/e580220fc1014045ab9f45ea9f82d8d8.html" target="sap">here</a>.
<h4 id="hana_fulltext_asynchronous"><code>ASYNCHRONOUS</code></h4>
I just mentioned that the analysis results will be stored in the <code>$TA_</code> table automatically. While this is true, the analysis does not occur immediately. This is because the <code>FULLTEXT INDEX</code> is created with the <code>ASYNCHRONOUS</code> option. This allows HANA to store documents in the <code>CT_FILE</code> table without having to wait for the text analysis process to finish.
<br/>
<br/>
We could debate the advantages and drawbacks of the <code>ASYNCHRONOUS</code> option and whether it would make more sense to specify <code>SYNCHRONOUS</code> instead, or to leave the option out altogether (in which case <code>SYNCHRONOUS</code> would be implied). However, there is a very simple reason why it is currently specified as <code>ASYNCHRONOUS</code>: if a <code>FULLTEXT INDEX</code> specifies a <code>CONFIGURATION</code> option, then it must be specified as <code>ASYNCHRONOUS</code>, or else the following error occurs upon activation:
<pre><code>CONFIGURATION not supported with synchronous processing</code></pre>
For actual analysis, we really do need the <code>CONFIGURATION</code> option, as it offers all the truly interesting properties of text analysis. So, it seems there's just no way around it - text analysis results are collected in the background, and finish at some point after our document is stored in the table. And there seems to be no way of finding out whether the analysis is finished, or even if it is still busy. For instance, this makes it impossible to determine whether a recently uploaded document is still being analyzed, or whether the document was not eligible for text analysis at all: in both cases, the analysis results will remain absent.
<br/>
<br/>
That said, even though the <code>FULLTEXT INDEX</code> is specified as <code>ASYNCHRONOUS</code>, HANA will let you specify when the analysis results should be updated. At least, according to the <a href="https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/20d4117e75191014ba5aaab91b3f087d.html" target="sap">SAP HANA SQL and System Views Reference</a>, it is possible to specify a <code>FLUSH [QUEUE] <flush_queue_elem></code>-clause right after the <code>ASYNCHRONOUS</code> option, with <code><flush_queue_elem></code> indicating either a time interval (expressed as a number of minutes) or a number of documents. So, in theory, it would be possible to write:<pre><code>ASYNCHRONOUS FLUSH QUEUE AFTER 1 DOCUMENTS</code></pre> which would indicate that the analysis results should be updated as soon as a new document has been loaded.
<br/>
<br/>
Unfortunately, on the HANA system I have access to, this results in the following error upon activation:<pre><code>Flush based on documents/minutes not yet supported</code></pre>
The same error message occurs when I try <code>ASYNCHRONOUS FLUSH QUEUE EVERY 1 MINUTES</code> instead.
<br/>
<br/>
So, it looks like we'll just have to live with this for now. I did some checks and I noticed that analysis kicks in after a couple of seconds, but this is on a system that is not very heavily used. So for the purpose of exploration it's not too bad, but this does seem like it could become a problem for real-time applications (like chatbots).
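Since there is no status flag, about the best an application can do is poll the analysis results, for instance by requesting the <code>$count</code> of matching <code>$TA_</code> rows through the OData service described in the next section. A minimal sketch of such a polling URL (the helper name is made up, and keep in mind that a count that stays at zero may also mean the document was not analyzable at all):

```javascript
// Hypothetical helper: build an OData $count URL that returns the number
// of text analysis results currently available for one uploaded file.
// An application could poll this until the count becomes non-zero.
function taCountUrl(serviceRoot, fileName) {
  // OData string literals escape embedded single quotes by doubling them
  var literal = "'" + String(fileName).replace(/'/g, "''") + "'";
  return serviceRoot + "/TextAnalysis/$count?$filter=" +
         encodeURIComponent("FILE_NAME eq " + literal);
}
```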
<h4>The Key</h4>
Another thing worth mentioning here is the key of the <code>CT_FILE</code> table. For this very simple demo application, we chose to make only the <code>FILE_NAME</code> column the primary key of the table. The choice of key will depend on what kind of application you're building. In many practical cases you might not care about the physical name of the uploaded file at all, and a name given by the uploader might be a better choice. Or maybe you don't care about names at all, only about whether the content of the document may be considered unique, in which case some hash of the file contents may be a suitable choice.
<br/>
<br/>
No matter what key you choose for the <code>FULLTEXT INDEX</code>ed table, the column definitions that make up the key are copied to the corresponding <code>$TA_</code> table in order to maintain the relationship between the analysis results and the original source document, effectively using those columns as a foreign key. Note however that HANA does not automatically create a <code>FOREIGN KEY</code> constraint to enforce referential integrity. But you may add such a constraint yourself. This may be useful in particular to cascade deletes on the document table to the text analysis results table.
(Adding such a constraint manually is suggested in the introduction of the <a href="https://help.sap.com/viewer/fedd7e90a382415cbdd273891651ab4d/1.0.12/en-US/31b772b1530349a5bf32ec345f5a0080.html" target="sap">SAP HANA Text Analysis Developer Guide</a>.)
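A sketch of what such a constraint could look like is shown below. The schema name is an assumption (it follows the <code>RBOUMAN</code> schema used elsewhere in this post), and the exact DDL may vary per HANA version:

```sql
-- Hypothetical: cascade document deletes to the analysis results table.
-- Schema name and quoting follow the examples elsewhere in this post.
ALTER TABLE "RBOUMAN"."$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE"
ADD FOREIGN KEY ("FILE_NAME")
REFERENCES "RBOUMAN"."system-local.public.rbouman.ta.db::CT_FILE" ("FILE_NAME")
ON DELETE CASCADE;
```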
<br/>
<br/>
The primary key of the <code>$TA_</code> table consists of the key columns from the document table, plus two additional columns that identify an individual analysis result: <code>TA_RULE</code> and <code>TA_COUNTER</code>, where <code>TA_RULE</code> is the type of analysis that yielded the result, and <code>TA_COUNTER</code> is an integer that identifies, in order, each analysis result within a document and within a particular analysis type.
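To inspect the results for one document in that order, one might run a query like the following against the <code>$TA_</code> table (the schema and file name are illustrative):

```sql
-- Illustrative: list analysis results for one document, in analysis order.
SELECT "TA_RULE", "TA_COUNTER", "TA_TOKEN", "TA_TYPE", "TA_NORMALIZED"
FROM "RBOUMAN"."$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE"
WHERE "FILE_NAME" = 'example.pdf'
ORDER BY "TA_RULE", "TA_COUNTER";
```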
<h3 id="ta-xsodata-service-definition">The <code>.xsodata</code> Service Definition</h3>
We expose both the <code>CT_FILE</code> and the <code>$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE</code> tables via an OData service. The OData service is created by creating a <a href="https://help.sap.com/viewer/4505d0bdaf4948449b7f7379d24d0f0d/1.0.12/en-US/57920551c8ed4dea996c895ea05c6843.html" target="sap"><code>.xsodata</code> service definition file</a> called <code>ta.xsodata</code> in the <code>service</code> subpackage.
<br/>
<br/>
The contents of <code>ta.xsodata</code> service definition file are shown below:<pre><code>service {
entity <b>"system-local.public.rbouman.ta.db::CT_FILE"</b> as <b>"Files"</b>;
entity <b>"RBOUMAN"."$TA_system-local.public.rbouman.ta.db::CT_FILE.FT_IDX_CT_FILE"</b> as <b>"TextAnalysis"</b>;
}
annotations {
enable OData4SAP;
}
settings {
support null;
}</code></pre>
This creates an OData service and maps our two tables <code>CT_FILE</code> and the <code>$TA_</code> table to the OData EntitySets <code>Files</code> and <code>TextAnalysis</code> respectively. The OData service will be available at a url of which the path corresponds to the fully qualified package name of the <code>.xsodata</code> file.
<br/>
<br/>
Note that the syntax for mapping the tables to EntitySets is slightly different, depending upon whether the table is created as a repository object or as a database catalog object:<ul>
<li>for <code>CT_FILE</code>, it is the package name containing the table's <code>.hdbdd</code> file, followed by 2 colons, and then followed by the local table name.</li>
<li>for the <code>$TA_</code> table, it is the (quoted) database schema name, followed by a dot, and then followed by the quoted table name.</li>
</ul>
The reason for the difference is that we only maintain the <code>CT_FILE</code> table as a repository object. There is no corresponding repository object for the <code>$TA_</code> table, since HANA creates that autonomously as a result of the full text index on <code>CT_FILE</code>. Since the <code>$TA_</code> table is created automatically, we can assume the entire thing is transportable as it is, as long as we make sure we maintain our document table as a repository object, and refer to it in the <code>.xsodata</code> file using its repository object name.
<h3>Activation and Verification</h3>
Now that we have all these artifacts, we should try and activate our package and test the service. You can either attempt to activate the entire package, or activate each file individually. For the latter, you need to make sure to activate the <code>.hdbdd</code> file before attempting to activate the <code>.xsodata</code> file, because the <code>.xsodata</code> file is dependent upon the existence of the tables in the database catalog.
<br/>
<br/>
After successful activation, you can attempt to visit the service by navigating to its service document or metadata document using your web browser. These documents should be available at the following urls:<ul>
<li>Service Document: <code>http://yourhanahost:yourxsport/path/to/your/package/service/ta.xsodata</code></li>
<li>Metadata Document: <code>http://yourhanahost:yourxsport/path/to/your/package/service/ta.xsodata/$metadata</code></li>
</ul>
where: <ul>
<li><code>yourhanahost</code> is the hostname or IP address of your SAP HANA system</li>
<li><code>yourxsport</code> is the port number of your xsengine. This is normally 80 followed by the two-digit HANA instance number. For example, if the instance number is 10, the port will be 8010</li>
<li><code>path/to/your/package</code> is the path you get when you take the package identifier where you put the <code>db</code>, <code>service</code> and <code>web</code> subpackages and replace the dot (.) that separates the individual package names with a slash (/).</li>
</ul>
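The URL rules above can be captured in a small helper. This is merely a sketch with a made-up function name, implementing the port and path conventions just described:

```javascript
// Hypothetical helper implementing the URL rules listed above:
// port = "80" + two-digit instance number, package dots become slashes.
function serviceDocumentUrl(host, instanceNr, packageId, serviceFile) {
  var port = '80' + String(instanceNr).padStart(2, '0');
  return 'http://' + host + ':' + port + '/' +
         packageId.split('.').join('/') + '/' + serviceFile;
}
```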
<h3 id="ta-xsodata-metadata">Inspect the <code>$metadata</code> document of the service</h3>
If all is well, your <code>$metadata</code> document should look something like this:<pre><code>
<edmx:Edmx
xmlns:edmx=<span style="color:red">"http://schemas.microsoft.com/ado/2007/06/edmx"</span>
xmlns:sap=<span style="color:red">"http://www.sap.com/Protocols/SAPData"</span>
Version=<span style="color:red">"1.0"</span>
>
<edmx:DataServices
xmlns:m=<span style="color:red">"http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"</span>
m:DataServiceVersion=<span style="color:red">"2.0"</span>
>
<Schema
xmlns:d=<span style="color:red">"http://schemas.microsoft.com/ado/2007/08/dataservices"</span>
xmlns:m=<span style="color:red">"http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"</span>
xmlns=<span style="color:red">"http://schemas.microsoft.com/ado/2008/09/edm"</span>
Namespace=<span style="color:red">"system-local.public.rbouman.ta.service.ta"</span>
>
<EntityType Name=<span style="color:red">"FilesType"</span>>
<Key>
<PropertyRef Name=<span style="color:red">"FILE_NAME"</span>/>
</Key>
<Property Name=<span style="color:red">"FILE_NAME"</span> Type=<span style="color:red">"Edm.String"</span> Nullable=<span style="color:red">"false"</span> MaxLength=<span style="color:red">"256"</span>/>
<Property Name=<span style="color:red">"FILE_TYPE"</span> Type=<span style="color:red">"Edm.String"</span> Nullable=<span style="color:red">"false"</span> MaxLength=<span style="color:red">"256"</span>/>
<Property Name=<span style="color:red">"FILE_LAST_MODIFIED"</span> Type=<span style="color:red">"Edm.DateTime"</span> Nullable=<span style="color:red">"false"</span>/>
<Property Name=<span style="color:red">"FILE_SIZE"</span> Type=<span style="color:red">"Edm.Int32"</span> Nullable=<span style="color:red">"false"</span>/>
<Property Name=<span style="color:red">"FILE_CONTENT"</span> Type=<span style="color:red">"Edm.Binary"</span> Nullable=<span style="color:red">"false"</span>/>
<Property Name=<span style="color:red">"FILE_LAST_UPLOADED"</span> Type=<span style="color:red">"Edm.DateTime"</span> Nullable=<span style="color:red">"false"</span>/>
</EntityType>
<EntityType Name=<span style="color:red">"TextAnalysisType"</span>>
<Key>
<PropertyRef Name=<span style="color:red">"FILE_NAME"</span>/>
<PropertyRef Name=<span style="color:red">"TA_RULE"</span>/>
<PropertyRef Name=<span style="color:red">"TA_COUNTER"</span>/>
</Key>
<Property Name=<span style="color:red">"FILE_NAME"</span> Type=<span style="color:red">"Edm.String"</span> Nullable=<span style="color:red">"false"</span> MaxLength=<span style="color:red">"256"</span>/>
<Property Name=<span style="color:red">"TA_RULE"</span> Type=<span style="color:red">"Edm.String"</span> Nullable=<span style="color:red">"false"</span> MaxLength=<span style="color:red">"200"</span>/>
<Property Name=<span style="color:red">"TA_COUNTER"</span> Type=<span style="color:red">"Edm.Int64"</span> Nullable=<span style="color:red">"false"</span>/>
<Property Name=<span style="color:red">"TA_TOKEN"</span> Type=<span style="color:red">"Edm.String"</span> MaxLength=<span style="color:red">"5000"</span>/>
<Property Name=<span style="color:red">"TA_LANGUAGE"</span> Type=<span style="color:red">"Edm.String"</span> MaxLength=<span style="color:red">"2"</span>/>
<Property Name=<span style="color:red">"TA_TYPE"</span> Type=<span style="color:red">"Edm.String"</span> MaxLength=<span style="color:red">"100"</span>/>
<Property Name=<span style="color:red">"TA_NORMALIZED"</span> Type=<span style="color:red">"Edm.String"</span> MaxLength=<span style="color:red">"5000"</span>/>
<Property Name=<span style="color:red">"TA_STEM"</span> Type=<span style="color:red">"Edm.String"</span> MaxLength=<span style="color:red">"5000"</span>/>
<Property Name=<span style="color:red">"TA_PARAGRAPH"</span> Type=<span style="color:red">"Edm.Int32"</span>/>
<Property Name=<span style="color:red">"TA_SENTENCE"</span> Type=<span style="color:red">"Edm.Int32"</span>/>
<Property Name=<span style="color:red">"TA_CREATED_AT"</span> Type=<span style="color:red">"Edm.DateTime"</span>/>
<Property Name=<span style="color:red">"TA_OFFSET"</span> Type=<span style="color:red">"Edm.Int64"</span>/>
<Property Name=<span style="color:red">"TA_PARENT"</span> Type=<span style="color:red">"Edm.Int64"</span>/>
</EntityType>
<EntityContainer
Name=<span style="color:red">"ta"</span>
m:IsDefaultEntityContainer=<span style="color:red">"true"</span>
>
<EntitySet Name=<span style="color:red">"Files"</span> EntityType=<span style="color:red">"system-local.public.rbouman.ta.service.ta.FilesType"</span>/>
<EntitySet Name=<span style="color:red">"TextAnalysis"</span> EntityType=<span style="color:red">"system-local.public.rbouman.ta.service.ta.TextAnalysisType"</span>/>
</EntityContainer>
</Schema>
</edmx:DataServices>
</edmx:Edmx></code></pre>
<h2>Summary</h2>
In this installment, we executed the plan formulated in <a href="http://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text.html">part 1 of this series</a>.
We should now have a functional back-end which we may use to support our front-end application.
<br/>
<br/>
In <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_28.html">the next installment</a> we will present a front-end application, and explain how you can obtain it and install it yourself on your own SAP HANA System.
rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-73811528280516447622019-12-01T00:18:00.000+01:002019-12-12T01:08:49.835+01:00Building a UI5 Demo for SAP HANA Text Analysis: Part 1Last week, my <a href="https://www.just-bi.nl/about/#team-a-roles" target="just">Just-BI</a> co-workers Arjen Koot and Mitchell Beekink and I had a bit of a rumble with HANA (1.0) and <a href="https://sapui5.hana.ondemand.com/" target="sap">the UI5 toolkit</a>.
In the process, we made a few observations and found out a few things which we figured might be worth sharing in a couple of blog posts:
<ul>
<li>Part 1 - an Overview: SAP HANA Text Analysis on Documents uploaded by an end-user</li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html">Part 2 - Hands on: Building the backend for a SAP HANA Text Analysis application</a></li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_28.html">Part 3 - Presenting: A UI5 front-end to upload documents and explore SAP HANA Text Analytics features</a></li>
<li><a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_2.html">Part 4 - Deep dive: How to upload documents with OData in a UI5 Application</a></li>
</ul>
(Even though this was all done on HANA 1.0, many of these things should still work on HANA 2.0 as well using XS Classic).
<br/>
<br/>
<h2>Exploring HANA Text Analysis</h2>
The main use case of our concern is <a href="https://help.sap.com/viewer/fedd7e90a382415cbdd273891651ab4d/1.0.12/en-US/31b772b1530349a5bf32ec345f5a0080.html" target="sap">SAP HANA Text Analytics</a>.
SAP HANA's Text Analysis features let you extract tokens and semantics from various sources. Text analysis is not limited to plaintext but is also supported for binary documents in various formats, such as PDF and Microsoft Office documents such as Word documents and Excel workbooks. After analysis, the analysis result may then be used for further processing.
<br/>
<br/>
Business cases that might utilize text analysis features include automated classification of invoices or reimbursement requests, matching CVs from employees or job applicants to vacancies, and detection of plagiarism, to name just a few.<br/>
<br/>
The hard work of converting the documents, and performing the actual text analysis is all handled fully by HANA, which is great. In our specific case, this process includes the conversion of binary documents in PDF format to text.
<h3>What Document Types can HANA handle?</h3>
To find out which types and formats your HANA instance can handle, run a query on <a href="https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/2.0.02/en-US/20c865f5751910148cf1f22a1a3a22a1.html" target="sap"><code>"SYS"."M_TEXT_ANALYSIS_MIME_TYPES"</code></a>:
<pre><code>SELECT *
FROM "SYS"."M_TEXT_ANALYSIS_MIME_TYPES";
+---------------------------------------------------------------------------+--------------------------------------------+
|MIME_TYPE_NAME | MIME_TYPE_DESCRIPTION |
+---------------------------------------------------------------------------+--------------------------------------------+
| text/plain | Plain Text |
| text/html | HyperText Markup Language |
| text/xml | Extensible Markup Language |
| application/x-cscompr | SAP compression format |
| application/x-abap-rawstring | ABAP rawstring format |
| application/msword | Microsoft Word |
| application/vnd.openxmlformats-officedocument.wordprocessingml.document | Microsoft Word |
| application/vnd.ms-powerpoint | Microsoft PowerPoint |
| application/vnd.openxmlformats-officedocument.presentationml.presentation | Microsoft PowerPoint |
| application/vnd.ms-excel | Microsoft Excel |
| application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Microsoft Excel |
| application/rtf | Rich Text Format |
| application/vnd.ms-outlook | Microsoft Outlook e-mail (".msg") messages |
| message/rfc822 | Generic e-mail (".eml") messages |
| application/vnd.oasis.opendocument.text | Open Document Text |
| application/vnd.oasis.opendocument.spreadsheet | Open Document Spreadsheet |
| application/vnd.oasis.opendocument.presentation | Open Document Presentation |
| application/vnd.wordperfect | WordPerfect |
| application/pdf | Portable Document Format |
+---------------------------------------------------------------------------+--------------------------------------------+</code></pre>
<h3>HANA Text Analysis Applications: pre-requisites</h3>
To use the text analysis features, you need to<ul>
<li>Create a database table with a column having the <a href="https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/1.0.12/en-US/20a1569875191014b507cf392724b7eb.html?q=data%20types#loio20a1569875191014b507cf392724b7eb___csql_data_types_1sql_data_types_introduction_lob" target="sap"><code>BLOB</code></a> data type. SAP HANA's text analysis features also work with plaintext, but for our specific use case we are interested in analyzing PDF documents. From the point of view of the application and database storage, these documents are binary files, which is why we end up with a <code>BLOB</code>.</li>
<li>Create a <a target="sap" href="https://help.sap.com/viewer/4fe29514fd584807ac9f2a04f6754767/1.0.12/en-US/20d4117e75191014ba5aaab91b3f087d.html"><code>FULLTEXT INDEX</code></a> on that <code>BLOB</code> column, which configures all text analysis features we need, such as tokenization, stemming, and semantic extraction.</li>
</ul>
Once this is in place, we only need to store our documents (binary PDF files) in the <code>BLOB</code> column, and SAP HANA will do the rest (more or less automatically). The text analysis results can then be collected from <a href="https://help.sap.com/viewer/fedd7e90a382415cbdd273891651ab4d/1.0.12/en-US/e580220fc1014045ab9f45ea9f82d8d8.html" target="sap">a <code>$TA</code> table</a> and used for further, application specific processing.
<h2>Uploading Document Content</h2>
Now, as humble a task as it may seem, storing binary document content into a <code>BLOB</code> column is a bit of a challenge.
<br/>
<br/>
Various resources published by SAP (like <a href="https://blogs.sap.com/2014/10/13/text-search-and-text-analysis-with-sap-hana/" target="sap">this one</a>) focus on the text analysis features themselves. They only offer a simple and pragmatic suggestion for loading the data into the table, which relies on a (client-side) Python script that reads the file from the client, and then uses a plain SQL <code>INSERT</code>-statement to upload the file contents to the <code>BLOB</code> column.
<br/>
<br/>
This approach is fine for exploring just the text analysis features of course, but it's not of much use if you want to create an end-user facing application.
What we would like to offer instead is a web application or mobile app (for example, based on UI5), which would allow end users to upload their own documents to the database through an easy-to-use graphical user interface.
<h3>What about a <code>.xsjs</code> script?</h3>
Now, it is entirely possible to come up with a solution that is somewhat similar to the client-side Python script.
For example, we could write an <a href="https://help.sap.com/viewer/52715f71adba4aaeb480d946c742d1f6/1.0.12/en-US/90878018cccd40f7a4b6754c04e2d34a.html" target="sap"><code>.xsjs</code> script</a> that runs on the HANA server.
This script would then handle a HTTP POST request and receive the raw document bytes in the request body.
<br/>
<br/>
Then, the <code>.xsjs</code> script would run a similar kind of SQL statement, and store the received data in the <code>BLOB</code> column.
(An example <code>.xsjs</code> script using SQL can be found <a href="https://help.sap.com/viewer/52715f71adba4aaeb480d946c742d1f6/1.0.12/en-US/0d2aa67a44a94b14ae80dc883a4c6419.html" target="sap">here</a>.)
<h4>Drawbacks of a <code>.xsjs</code> approach</h4>
However, there are a couple of drawbacks to this approach.
<br/>
<br/>
First of all, it is unlikely that a user-facing application would only want to upload the binary content of the document. Most likely, some application-specific metadata will need to be stored as well.
<br/><br/>
Another problem is that we would need to write dedicated code to handle a very specific kind of request: uploading data to one particular table - that is: the unstructured document content itself, plus whatever structured data the application needs to associate with it.
The script will need to refer to the table and its columns by name (or perhaps via a stored procedure that does the actual loading - same difference), and when the name of the table or one of its columns changes, the script must also be changed.
<br/>
<br/>
This same issue bites back when you need to transport such a solution to another HANA system. Since there is no formal dependency between the <code>.xsjs</code> script and the database catalog objects it references, this poses a bit of a challenge for the HANA transport system which is typically used for these kinds of tasks. At least, such a dependency is not registered anywhere and thus needs to be managed manually.
<br/><br/>
Another thing to keep in mind is that if an application can write data somewhere, it generally also needs to read data from that source.
For example, the user may want to search some of the metadata fields to see if a particular file was already uploaded, or to see when a particular file was last uploaded, or to update a previously uploaded file.
<br/>
<br/>
Now, we could certainly write the <code>.xsjs</code> script so that it does those things as well. But the point is, even though we only minimally need a service that allows us to upload the document, this interface is incomplete from the application's point of view, so the actual script will need to implement many more data operations than only the upload. And so, what seemed like a small task (writing a simple script to do one simple thing) becomes the fairly serious task of writing a full-featured service that does a whole bunch of things. And even if writing that would seem okay, it then needs to be maintained and documented and so on.
<br/>
<br/>
Finally, writing a server side script that executes SQL could introduce a security risk.
Even though it might be written in a way that is safe, one actually needs to make sure that it is indeed written safely, and this needs to be implemented for all functionalities that the service offers.
<br/>
<br/>
So, in summary - even though it may seem like a simple task, a realistic, practical solution that is safe, maintainable and fully featured is not so trivial at all. Not with .xsjs anyway! It will require substantial effort to write, document and maintain. The time and effort would be much better spent when dedicated to designing or building actual business features.
<h3>Using OData instead</h3>
HANA also offers <a href="https://help.sap.com/viewer/52715f71adba4aaeb480d946c742d1f6/1.0.12/en-US/7cc43e570b5648d69231fbd7a9c7bf90.html" target="sap">an OData implementation</a>.
HANA OData services are defined in <a href="https://help.sap.com/viewer/52715f71adba4aaeb480d946c742d1f6/1.0.12/en-US/1a8c8a3eaefc4e2aa7ab23195b684b16.html" target="sap"><code>.xsodata</code> service definitions</a> and they pretty much solve all drawbacks mentioned above:<ul>
<li>The WEB API is generic, and works the same for just about any database table-like object you want to expose</li>
<li>The API is complete and well defined, and supports all CRUD data operations you need in practice</li>
<li>HANA registers the dependency between the <code>.xsodata</code> file and the repository or database catalog objects it references. This is essential to create transportable solutions.</li>
<li>Typically, changes in table structure will be automatically picked up by the <code>.xsodata</code> service, which makes maintenance a low-effort task. Writing a <code>.xsjs</code> service that adapts in the same way implies another level of complexity, which is certainly doable but increases development and maintenance effort.</li>
</ul>
So: what's not to like about OData? We'll certainly get back to that topic, especially in <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_2.html">the last part of the series</a>! But for now let's just be happy with the benefits of <code>.xsodata</code> over <code>.xsjs</code>.
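To make this a bit more concrete: creating a new entity through such an OData service boils down to a plain HTTP POST of a JSON document. Below is a minimal sketch only; the entity set and property names are illustrative, a real XS application first needs a GET to obtain an X-CSRF-Token, and binary content must be base64-encoded for an Edm.Binary property. The fetch implementation is injectable so the function can be exercised without a server:

```javascript
// Sketch of an OData create (POST) request. The "Files" entity set name
// and the entity shape are illustrative; fetchImpl defaults to the
// browser's global fetch when not supplied.
function uploadDocument(serviceRoot, entity, fetchImpl) {
  return (fetchImpl || fetch)(serviceRoot + '/Files', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(entity)
  });
}
```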
<h2>Summary</h2>
We learned that we must create a table with a <code>BLOB</code> column to hold our documents, and a <code>FULLTEXT INDEX</code> so HANA knows to analyze the contents. After some consideration, we decided to try whether we can use HANA's OData implementation to upload document content to such a table.
<br/>
<br/>
In <a href="https://rpbouman.blogspot.com/2019/12/building-ui5-demo-for-sap-hana-text_1.html">the next installment</a>, we will explain in more detail how to build these backend objects.rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-10294097945236327762018-03-18T01:10:00.002+01:002018-03-18T01:56:05.639+01:00A Tale of a JavaScript Memory LeakAbstract: Matching JavaScript regular expressions against large input strings with V8 can result in memory leaks. In this post, I explain how to troubleshoot the issue using Google Chrome heap snapshots. Finally, a fix proposed by my son David (age 14) is presented.
<h3>Background</h3>
At <a href="https://www.just-bi.nl/" target="just-bi">Just-BI</a> we developed a browser-based application for one of our customers. One way the application gets its data is by loading and parsing Microsoft Excel files. The app is successful and our customers are happy, a fact they express by attempting to load ever larger files.
<br/>
<br/>
On mobile Safari (iPad), our app starts crashing when the files reach a certain size. This happens around 5 to 6 MB. Arguably, that's not that large, but things are a bit more complicated: Microsoft Excel files, at least those of the .xlsx variety, are actually zip-compressed folders, containing mostly <a href="https://msdn.microsoft.com/en-us/library/dd922181(v=office.12).aspx" target="ms">OOXML spreadsheet</a> documents.
<br/>
<br/>
It is a well known fact that XML is really verbose, and I suppose we should be grateful that our 5 - 6 MB excel files uncompress to only 40 MB of XML.
<br/>
<br/>
We could debug the issue somewhat, and we noticed that by the time Safari crashes, it does so reporting it is out of heap space. We can't be sure if that's the actual cause, but we think we might be able to overcome or at least postpone this issue by somehow cutting down on memory usage.
<br/>
<br/>
This brings us to the main topic of our tale.
<h3>Parsing xlsx files in the Browser</h3>
To parse xlsx files, we use a particular javascript library called <a href="https://github.com/SheetJS/js-xlsx" target="_github">js-xlsx</a>. This is actually a pretty nice piece of work, and I do not hesitate to recommend it. We have used it for quite a while without major issues; it's just that the particular strategy that this library uses to parse the xlsx file will temporarily spike memory usage, and we believe this triggers some bug in Safari, which eventually leads to a crash.
<br/>
<br/>
So, we're currently investigating a less general, less standards-compliant way to parse xlsx files. In return, this allows us to parse xlsx faster, and with a much reduced peak-memory usage.
<br/>
<br/>
I don't want to talk too much in detail about the xlsx parser we're developing. All I can say is that it is not meant to be a general, fully featured xlsx parser; the only requirement it is designed to fulfill is to avoid, or at least postpone, the crash we observe in Safari, and to parse portions of xlsx workbooks having a well-known structure specific to our application requirements.
<br/>
<br/>
I am happy to report that, with just one day of work, we managed to get an initial version of our parser to work. It has a peak-memory usage that is half that of the previous solution. And it sounds even more spectacular when I put it the other way around: the old solution used 100% more memory!
<br/>
<br/>
As an added bonus, the new solution is just a little less than 10x as fast as the old solution. For some reason, it does not sound much cooler when I say the new solution is about an order of magnitude faster, so I won't.
<h3>JavaScript Regular Expressions</h3>
Our parser relies on <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions" target="_mdn">JavaScript regular expressions</a>, which are exposed through the global built-in <code><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp" target="_mdn">RegExp</a></code> object.
<br/>
<br/>
One might be aware that for at least 100 different programming languages there are at least 10,000 stackexchange answers to wittily denounce any attempt to parse XML using regular expressions. Some bloggers' entire careers seem to be built entirely around their particular brand of scorn and disdain about this topic.
<br/>
<br/>
We have little to add to discussions like these, other than that in modern JavaScript runtimes, regular expressions are a very productive and powerful tool for quickly building tokenizers. These tokenizers can have amazing performance, and can serve admirably as a foundation to build parsers of many kinds, including but certainly not limited to XML-parsers.
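As an illustration of the idea (this is not our actual parser, and the token rules are made up for the example), a sticky-flag regular expression can drive a tiny tokenizer that walks an input string token by token:

```javascript
// Minimal regex-driven tokenizer sketch. The sticky (y) flag anchors each
// match at lastIndex, so the regex advances through the input without
// rescanning; when no rule matches, exec() returns null and the loop ends.
function tokenize(input) {
  var rule = /\s+|<\/?[A-Za-z][\w.:-]*|>|"[^"]*"|[^<>\s"]+/y;
  var tokens = [], m;
  while (rule.lastIndex < input.length && (m = rule.exec(input))) {
    if (!/^\s+$/.test(m[0])) tokens.push(m[0]); // drop whitespace tokens
  }
  return tokens;
}
```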
<h3>A Memory Leak</h3>
Despite the initial success, not all is well with our new xlsx parser. We found that, notwithstanding lower peak-memory consumption, it did suffer from a memory leak. We noticed this by creating a simple sample application, loading only our parser, and then making <a href="https://developers.google.com/web/tools/chrome-devtools/memory-problems/heap-snapshots" target="_chrome">heap snapshots using Chrome developer tools</a> during various phases of the process. See the screenshot below:
<br/>
<br/>
<img width="900" height="596" src="https://drive.google.com/uc?export=download&id=1iSDYRxG4YrpF1bScHkcRpCTSnqMJ2mWr"/>
<br/>
<br/>
This is what happens:<ol>
<li>First heap snapshot was made directly after loading the application, and measures 6.5 MB. Whether you think this is a lot or not, this is our baseline, and there is not much we can do about it now.</li>
<li>Next, the user picks an xlsx workbook, and the application opens it. The snapshot is now 12.6 MB, which is an increase of 6.1 MB as compared to our baseline. The workbook file is a little less than 6 MB and accounts for most of the increase. At this point, our sample application has also extracted and parsed the list of worksheets contained within the workbook, as well as the shared string table. I haven't looked at that in detail, but for now I am satisfied to believe that this accounts for the remaining extra memory.</li>
<li>At this point, we extracted the worksheet of interest from the workbook and uncompressed it into a javascript string. This made our heap snapshot increase by almost 34 MB. That is certainly a lot! However, the filesize of the worksheet document itself is 34,445 kB, so it seems everything is accounted for.</li>
<li>The next heap snapshot was taken after parsing the worksheet and building a parse tree. The snapshot weighs 77.3 MB - an increase another 31 MB. Now, the sheet has 32,294 rows with 24 cells of data each, and most of the cells are unique decimal numbers, so it is a decent chunk of data. But even then, it still feels as if this is way too large.<br/><br/>That said, things probably look worse than they really are. Our new parser is event based: the parse method accepts a configuration object that contains only a callback, which is called every time a new row is extracted from the sheet. For our sample application, the callback is only a very naive proof of concept. I suspect there are plenty opportunities to make the parse tree builder smarter and the parse tree smaller.</li>
<li>The last heap snapshot was taken after the parse. At this point, the parse tree, the workbook object, and the XML string went out of scope. But we are still looking at a heap snapshot of more than 40 MB! This is bad news: we really should be back at something close to heap snapshot 1. So, there's about 34 MB unaccounted for.</li>
</ol>
In the screenshot, you can also see what's hogging the memory: in the top right pane, we find our XML document string, which indeed accounts for the retained 34 MB of memory. In the bottom right pane, we can see who's still referencing it: it's some property called <code>parent</code> of <code>sliced string @15298471</code>. And these are referenced twice in some array, which is referenced by something called <code>regexp_last_match_info</code> in the native context.
<h3>Memory Leak, Explained</h3>
Now, what I think we're looking at is the <code><a target="_mdn" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastMatch">lastMatch</a></code> property of the global built-in <code><a target="_mdn" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp">RegExp</a></code>-object.
<br/>
<br/>
If you're not familiar with JavaScript regular expressions, it might be helpful to consider exactly how our parser uses them. We're using code like this:<pre>
function parse(){
  //regular expression to match a <row> start-tag.
  var regexp = /<row\s([^>]+)\s*>/g;
  var match, data;
  var xml = "...this is our huge XML document, loaded into a JavaScript String...";
  match = regexp.exec(xml);
  if (match) {
    //the matched substrings are contained as array elements in match
    data = match[1];
    //...do something with data, etc...
  }
}
</pre>
(Note that this is just an example of the concept - not literally the actual code)
<br/>
<br/>
The <code>parse()</code> function first assigns a literal regular expression to the <code>regexp</code> variable. Under the covers, this literal regular expression results in a call to the built-in global <code>RegExp</code> constructor, instantiating a new <code>RegExp</code> instance. Then, the <code><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec" target="_mdn">exec</a></code>-method of the <code>RegExp</code> instance is called, passing the (huge) XML document string. The <code>exec</code>-method returns an object representing the result: if there was no match, <code>null</code> is returned; if there is a match, an object is returned that contains information about the match(es).
<br/>
<br/>
If there was a match, the <code>match</code> object will look a lot like an overloaded array object, having the matched parts of the string argument as elements. The element at index <code>0</code> of the match object (<code>match[0]</code>) is the substring matching the entire regular expression, the element at index <code>1</code> is the substring that matches the first parenthesized <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#special-capturing-parentheses" target="_mdn">capturing group</a>, and so on.
<br/>
<br/>
Now, since the <code>match</code> variable is a local variable in our parse function, everything should be garbage collectible after the function ends, right? <br/>
<br/>
Yes. But No.
<h4>About <code>RegExp.lastMatch</code></h4>
As it turns out, when an instance of the <code>RegExp</code> object finds a match, then the corresponding match info object containing all the matching substrings is stored in the <code><a target="_mdn" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastMatch">lastMatch</a></code> property of the global built-in <code><a target="_mdn" href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp">RegExp</a></code>-object. So, even if our <code>parse</code>-method is out of scope, the last match made by some regular expression inside it is still dangling around, attached to the global <code>RegExp</code> object in its <code>lastMatch</code> property.
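The retention is easy to observe in any V8-based runtime (Node.js or Chrome). The snippet below is my own minimal illustration - it uses a tiny string instead of a huge document, and reads the non-standard legacy <code>RegExp.lastMatch</code> property directly:

```javascript
function parse(xml) {
  // same pattern shape as in the parser: match a <row> start-tag
  var match = /<row\s([^>]+)\s*>/g.exec(xml);
  return match ? match[1] : null;
}

var attributes = parse('<row r="1" spans="1:3"></row>');
// parse() has returned and its locals are out of scope, yet the global
// RegExp object still holds the match - and, through the matched
// substrings, a reference to the input string they were taken from:
console.log(attributes);       // 'r="1" spans="1:3"'
console.log(RegExp.lastMatch); // '<row r="1" spans="1:3">'
```

With a multi-megabyte document as input, that lingering reference is exactly the 34 MB we saw retained in the heap snapshot.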
<h4>Substrings in V8</h4>
Now, if the <code>lastMatch</code>-object is still around, then the substrings representing the matches are also still around. As it turns out, V8 implements these substrings as "slice" objects. From within the JavaScript environment, they act and behave like <code>String</code> objects, but internally, the V8 JavaScript engine implements them as objects that have a <code>parent</code> property keeping a reference to the original <code>String</code> object from which they are a substring, along with some indexes to indicate what part of the original string makes up the substring.
<br/>
<br/>
Now, if you think about it, this way of implementing substrings is actually pretty clever, since it allows V8 to do many string manipulations very efficiently, minimizing the overhead and memory consumption caused by copying parts of strings to and fro. In my case, it just becomes an atrocious memory hog because of the <code>RegExp</code> object, which has decided to maintain a reference to the last match object (for whatever reason).
<br/>
<br/>
Other people have run into issues due to V8's substring design as well, and a bug was filed here:
<br/>
<br/>
<a href="https://bugs.chromium.org/p/v8/issues/detail?id=2869" target="_v8">https://bugs.chromium.org/p/v8/issues/detail?id=2869</a>
<h3>Solutions</h3>
My oldest son David, 14 years old, came up with a pretty creative solution: what if we'd write our own substring implementation, overriding the native one? If this makes you cringe, just think of 30 MB memory leaks and crashing browsers: it puts things in perspective. If this still sounds crazy to you, you should realize that when he came up with this idea, we had already been looking at this issue together for two hours. And even though we felt the substring issue was related, we still had no way to prove that this was actually the case. His idea was feasible and might confirm our suspicions, so we went ahead and did it, and attached our own implementations of <code><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substring" target="_mdn">substring</a></code> and <code><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/substr" target="_mdn">substr</a></code> to the <code><a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/proto">__proto__</a></code> object of our XML string, to override only these methods of that string instance.
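A per-instance override along those lines might look like the sketch below. Note that this is a reconstruction of the idea, not the code we actually used: it relies on the document being a String <i>object</i> (primitives cannot carry own properties), and it severs the slice's <code>parent</code> reference simply by rebuilding the result character by character:

```javascript
// xml must be a String *object* so we can shadow substring on this one
// instance without touching String.prototype for all other strings:
var xml = new String('<rows><row r="1"/></rows>');

xml.substring = function(start, end) {
  var stop = (end === undefined) ? this.length : end;
  var copy = "";
  // building the result char by char yields a fresh string that keeps
  // no internal "sliced string" reference to the huge original
  for (var i = start; i < stop; i++) {
    copy += this.charAt(i);
  }
  return copy;
};

console.log(xml.substring(7, 10)); // "row"
```

As described below, a hand-rolled loop like this is of course dramatically slower than the native implementation; its only purpose was to test the hypothesis.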
<br/>
<br/>
As was to be expected, our own substring implementations were way slower than the native ones, and the parse took about 25 times longer than before. However, it *did* solve the memory leak. This was a strong indication that we were on the right track.
<br/>
<br/>
Then, David suggested another solution: why don't we simply clear out the <code>lastMatch</code> property of the global built-in <code>RegExp</code> object? We tried to do this directly, simply by assignment:<pre>
RegExp.lastMatch = null;
</pre>
Unfortunately, this does not work. Although it does not throw a runtime exception, the <code>RegExp</code> object is protected against this kind of assignment, and the property never gets overwritten. However, it is still possible to achieve what we want, simply by instantiating a new <code>RegExp</code> object, and then forcing a match against a known, short string. We can then wrap that in a utility function, so we can always call it after doing some serious regular expression matching on large strings:<pre>
function freeRegExp(){
  /\s*/g.exec("");
}
</pre>
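The effect of the utility can be observed directly via the same legacy <code>RegExp.lastMatch</code> property (again, a small self-contained check of my own, not code from the application):

```javascript
function freeRegExp() {
  // force a trivial match so RegExp.lastMatch no longer references
  // whatever huge string was matched before
  /\s*/g.exec("");
}

/<row\s([^>]+)\s*>/g.exec('<row r="1" spans="1:3"></row>');
console.log(RegExp.lastMatch); // '<row r="1" spans="1:3">' - still referenced

freeRegExp();
console.log(RegExp.lastMatch); // '' - the previous input can now be collected
```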
Here's the heap snapshot after applying this fix:
<br/>
<br/>
<img width="900" height="596" src="https://drive.google.com/uc?export=download&id=1yU20ARIYRVq0xnF9Hiq9CXWUlJSb7aIS"/>
<h3>Summary</h3>
<ul>
<li>Globals are bad.</li>
<li>Side-effects are bad, in particular if the modifications are global.</li>
<li>V8's substring implementation may lead to unexpected memory leaks.</li>
<li>Chrome heap snapshots are a powerful tool to troubleshoot them.</li>
<li>After applying regular expressions to huge strings, always force a match against a small string to prevent memory leaks.</li>
<li>David Rocks! He truly impressed me with his troubleshooting skills and his knack for pragmatic, feasible solutions.</li>
</ul>rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com7tag:blogger.com,1999:blog-15319370.post-30454833080688821102017-06-18T14:59:00.000+02:002017-06-18T21:56:21.305+02:00UI5: per-view Internationalization - What about inheritance?I started to learn <a href="https://sapui5.hana.ondemand.com/#docs/guide/95d113be50ae40d5b0b562b84d715227.html" target="_ui5">UI5</a> in earnest in September 2016. Quite soon after that, I wrote two blog posts about per-view internationalization:<ul>
<li><a href="http://rpbouman.blogspot.nl/2016/09/sap-ui5-per-view-internationalization.html" target="_rpbouman">SAP UI5: Per-view Internationalization</a></li>
<li><a href="http://rpbouman.blogspot.nl/2016/09/sap-ui5-internationalization-for-each.html" target="_rpbouman">SAP UI5: Internationalization for each view - Addendum for Nested Views</a></li>
</ul>
<br/>
<br/>
(In case you're wondering: per-view internationalization is the ability to maintain <code>i18n.properties</code> files, which contain translations for human-readable texts, together with the view code wherein these texts appear, rather than maintaining just one giant i18n.properties file that contains the translations for any and all internationalized texts that appear throughout the application. I think the benefits of this idea are clear from a developer's point of view. If you're interested in more information about this concept, then please check out the two prior blog posts.)
<h2>Per-view i18n and inheritance</h2>
The reason for me to revisit the topic is that I found that the method I use to figure out which i18n.properties files to load, doesn't work very well when you're using inheritance. In UI5, inheritance is achieved by calling the <a href="https://sapui5.hana.ondemand.com/#docs/api/symbols/sap.ui.base.Object.html#.extend" target="_ui5"><code>extend</code>-method</a> on the constructor you want to inherit from.
<br/>
<br/>
To understand why it doesn't work well, we should first define the scope and the desired behavior.
<br/>
<br/>
In my case, I'm using inheritance to build controllers for views that are a lot like other existing views, but with some additional features. Most if not all of the behavior (and texts) of the existing view should be copied, and I'm managing that copy by extending the controller of the existing view. It's just that we need some extra things in the view, and these extra things might bring their own internationalization texts along.
<br/>
<br/>
So, what we really need is to load not only the i18n files for the extended controller, but also any i18n files that might be created for the views managed by the superclasses of the extended controller. We need to mind the order too: the extended view might choose to override some of the texts defined by a superclass, so the i18n texts that are closer in the chain of inheritance should be given precedence.
<h2>A Per-view i18n solution that works with inheritance</h2>
The following code will solve this problem:
<pre>
_initI18n: function(){
  //note: i18n is assumed to be a variable holding the model name "i18n",
  //and ResourceModel is sap.ui.model.resource.ResourceModel (see the prior posts)
  //first, check if we already constructed the i18n model for this class
  if (this.constructor._i18Model) {
    //we did! Don't do all that work again, just use the existing one.
    this.setModel(this.constructor._i18Model, i18n);
    return;
  }
  //check if the view and controller are in the same directory.
  //if they are not, then we need to take the possibility into account
  //that the view and the controller might both have their own i18n files.
  var stack = [];
  var controllerClass = this.getMetadata();
  var viewName = this.getView().getViewName();
  if (controllerClass.getName() !== viewName) {
    //if the view name is different from the controller name,
    //then we assume the view may have its own i18n
    //that overrides those of the controller
    stack.push(viewName);
  }
  //walk the chain of inheritance up to sap.ui.core.mvc.Controller,
  //storing each superclass at the front of the stack
  var className, rootControllerClassName = "sap.ui.core.mvc.Controller";
  while (true) {
    className = controllerClass.getName();
    if (className === rootControllerClassName) {
      break;
    }
    stack.unshift(className);
    controllerClass = controllerClass.getParent();
  }
  //walk the stack and create a resource bundle for each class;
  //use it to enhance this class' i18n model.
  stack.forEach(function(className){
    var bundleData;
    if (window[className] && window[className]._i18Model) {
      bundleData = window[className]._i18Model.getResourceBundle();
    }
    else {
      className = className.split(".");
      //snip off the local class name to get the directory name
      className.pop();
      //add i18n to make the i18n directory name
      className.push(i18n);
      //add i18n again to point to the i18n.properties file(s)
      className.push(i18n);
      bundleData = {bundleName: className.join(".")};
    }
    var i18nModel = this.getModel(i18n);
    if (i18nModel) {
      i18nModel.enhance(bundleData);
    }
    else {
      i18nModel = new ResourceModel(bundleData);
      this.setModel(i18nModel, i18n);
    }
  }.bind(this));
  //cache the i18n model for new instances of this class.
  this.constructor._i18Model = this.getModel(i18n);
}
</pre>
Note that this code replaces the <code>_initI18n</code>-method that appeared in my prior blog posts on this topic. It is also assumed that this method sits in some abstract base controller, which you'll extend to create actual concrete controllers for your views.
<br/>
<br/>
Here are a couple of highlights that explain the new and improved <code>_initI18n</code>-method:<ul>
<li>Examining the entire chain of inheritance, up until <code>sap.ui.core.mvc.Controller</code>. This is achieved with this snippet: <pre>
var stack = [];
...
var className, rootControllerClassName = "sap.ui.core.mvc.Controller";
while (true) {
  className = controllerClass.getName();
  if (className === rootControllerClassName) {
    break;
  }
  stack.unshift(className);
  controllerClass = controllerClass.getParent();
}
</pre>
The subsequent <code>forEach</code>-iteration of the stack then constructs the resource bundle to enhance the i18n model in the usual way.
<br/>
<br/>
Note that names of superclasses that are "higher up" in the hierarchy (or, put another way: more basal) are stacked in front of subclasses. This way, the <code>forEach</code>-array method will encounter the class names in the desired order, allowing subclasses to override texts added by superclasses.</li>
<li>Distinguish between texts defined by the view and the controller.
<br/><br/>
Admittedly this scenario is quite rare, but if the controller and view each define their own i18n files, then we'd like to enhance our i18n model with both resource bundles. I somewhat arbitrarily decided that in this case, the view's texts should probably override those of the controller.
<br/><br/>
I achieve this with this piece of code: <pre>
var stack = [];
var controllerClass = this.getMetadata();
var viewName = this.getView().getViewName();
if (controllerClass.getName() !== viewName) {
  //if the view name is different from the controller name,
  //then we assume the view may have its own i18n
  //that overrides those of the controller
  stack.push(viewName);
}
</pre>
In other words, the view name is placed as the very last item on the stack; if it differs from the controller name, then the controller name appears directly before it, making the view's resource bundle the last to enhance our i18n model.
</li>
<li>Caching the i18n model at the class level, so that every instance may reuse it.
<br/>
<br/>
While I fixed the inheritance issue, it occurred to me that all instances of the controller would each go through their own cycle of building the i18n model. Since the i18n model deals almost exclusively with static texts, it seemed wasteful to repeat all that work for each instance. We can simply store the i18n model as a property of the constructor, and retrieve it any time we're creating a new instance.
<br/>
<br/>
This is achieved with the very first and last bits of the <code>_initI18n</code>-method:<pre>
//first, check if we already constructed the i18n model for this class
if (this.constructor._i18Model) {
  //we did! Don't do all that work again, just use the existing one.
  this.setModel(this.constructor._i18Model, i18n);
  return;
}
...
//cache the i18n model for new instances of this class.
this.constructor._i18Model = this.getModel(i18n);
</pre>
</li>
</ul>
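The ordering logic described in the highlights above can be exercised in isolation with plain objects. In the sketch below, <code>buildClassStack</code> mirrors the while-loop from <code>_initI18n</code>, and <code>mockMeta</code> is a hypothetical stand-in for UI5's class metadata (it is not part of the UI5 API; the class names are made up for illustration):

```javascript
// Sketch of the stack-building walk: superclass names end up at the
// front, so a subsequent forEach applies them first and lets subclasses
// override superclass texts.
function buildClassStack(controllerClass, rootControllerClassName) {
  var stack = [];
  while (controllerClass.getName() !== rootControllerClassName) {
    stack.unshift(controllerClass.getName());
    controllerClass = controllerClass.getParent();
  }
  return stack;
}

// mock metadata objects standing in for UI5 class metadata:
function mockMeta(name, parent) {
  return {
    getName: function(){ return name; },
    getParent: function(){ return parent; }
  };
}

var root = mockMeta("sap.ui.core.mvc.Controller", null);
var base = mockMeta("my.app.controller.Base", root);
var sub  = mockMeta("my.app.controller.Orders", base);

console.log(buildClassStack(sub, "sap.ui.core.mvc.Controller"));
// superclass first: ["my.app.controller.Base", "my.app.controller.Orders"]
```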
<h2>Finally</h2>
I hope you enjoyed this post. Let me know and drop a line! rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com1tag:blogger.com,1999:blog-15319370.post-48797927273298037092017-06-17T12:48:00.000+02:002017-06-18T21:55:48.836+02:00Team Just-BI wins 2nd Prize at Dutch Accountability Hack 2017!To whom it may concern,
<br/>
<br/>
One week ago, on Friday the 9th of June 2017, I was at the Dutch House of Representatives ("de Tweede Kamer") to participate in the <a href="https://openstate.eu/en/2017/04/dutch-house-of-representatives-hosts-hackers-for-accountability-hack-2017/" target="_acchack">2017 Accountability Hackathon</a>.
<h2>The Accountability Hack Event</h2>
<img src="https://openstate.eu/wp-content/uploads/sites/14/2017/06/Screen-Shot-2017-06-10-at-00.03.17-260x260.png" style="float: left; margin:1em"/>
The event was organized and sponsored by a number of Dutch ministries, the <a href="http://www.courtofaudit.nl/english" target="rekenkamer">Court of Audit</a> ("Algemene Rekenkamer"), the Central Agency for Statistics ("Centraal Bureau voor de Statistiek") and the <a href="https://openstate.eu/en" target="_openstate">Open State Foundation</a>. The goal of the event was to invite programmers, developers, data analysts, journalists and so on to come together and create applications that use one or more of the numerous <a href="https://data.overheid.nl/" target="opendata">open data sources</a> published by the Dutch government, to gain insights into the performance or spending of Dutch governmental or publicly subsidized organisations.
<br/>
<br/>
This assignment alone might need some clarification. In Dutch democracy, there has always been a push towards more transparency. But in the last decade in particular, there has been an increasing demand to provide this transparency by publishing openly accessible data sets. The idea is that publishing records and metadata contributes to an environment where citizens can themselves answer any question about how their government is functioning, by querying and combining this data. Now, obviously, not everybody is capable of working with raw data sets, so there is also a demand for tools, applications and people with the know-how to bridge the technical gap and truly make all this data available on a functional level.
<br/>
<br/>
This is where events like the Accountability Hackathon come in: it is a direct attempt to stimulate individuals, but also commercial companies to apply their expertise to create applications and tools that provide meaningful information and insights, based on open data.
<h2>Team Just-BI</h2>
<a href="https://www.just-bi.nl" target="justbi"><img src="https://just-bi.nl/wp-content/uploads/2013/03/logo3.png" style="float: right; margin: 1em;"/></a>
I participated on behalf of my company <a href="https://www.just-bi.nl/" target="justbi">Just-Business Intelligence</a>. Just-BI provides end-to-end Business Intelligence consultancy. I'm in the custom development branch, which creates web and mobile applications in the realm of self-service and operational Business Intelligence.
<br/>
<br/>
Just-BI has a policy of assigning consultants to billable projects for at most 80% of their working time; the remaining 20% is meant to be invested in knowledge development. We try to align agendas and meet each other every Friday at our office in Rijswijk.
<br/>
<br/>
This arrangement made it possible for me to attend an event like the Accountability Hackathon - in fact, Just-BI stimulates its consultants to reach out and participate in events like these.
<h2>Submission: Jubilant</h2>
The Just-BI submission is a generic <a href="http://www.odata.org/" target="_odata">OData</a> query and exploration tool called Jubilant (short for <b>Ju</b>st <b>B</b>usiness <b>I</b>nte<b>l</b>ligence <b>An</b>alysis <b>T</b>ool).
<br/>
<br/>
<img src="https://openstate.eu/wp-content/uploads/sites/19/2017/06/593ac7fc4a456_Jubilant2eKApi.png"/>
<br/>
<br/>
Jubilant is an <a href="http://openui5.org/" target="ui5">Open UI5</a> web application that provides a plugin architecture that makes it easy for developers to write their own data visualisations based on OData services. Jubilant provides rich metadata about OData services, as well as a number of reusable components that make it easy to quickly build a query editor/designer.
<br/>
<br/>
The Jubilant concept allows a plugin developer to focus on making a cool visualisation, without having to invest time and effort to provide the user with a query builder. During the hackathon I managed to create two plugins - one simple table visualisation, which simply renders raw data in a data grid, and an OData Metadata Graph visualiser, which plots the structure, entity types and relationships exposed by the OData service as a graph.
<h2>OData and Open Data</h2>
The connection to open data and the assignment for the Accountability Hack is that a number of key open data API's use the OData protocol. A good example is the <a href="https://opendata.tweedekamer.nl/" target="tk">Dutch Parliament API</a>.
<br/>
<br/>
Interestingly, there are relatively few OData query tools available, and none of them are particularly affordable. In fact, during the accountability hackathon a few teams tried to work with these OData APIs and discovered they didn't quite know how to access and process them. I don't know if this finding influenced the jury in any way, but it certainly highlighted the need for a tool like Jubilant.
<br/>
<br/>
For Just-BI OData is a key protocol as well, since it happens to be the standard way of exposing data by many SAP products, like <a href="https://www.sap.com/products/hana.html" target="_sap">SAP/HANA</a>. While Just-BI is a general end-to-end Business Intelligence shop, many of our customer engagements have a strong focus on SAP products. This is also reflected in the Open UI5 framework, which has rather good support for OData.
<h2>Result: 2nd Prize!</h2>
<img src="https://www.just-bi.nl/wp-content/uploads/posts/2017/post_4962/hack-2017-300x146.jpg" style="float: left; margin: 1em;"/>
I was surprised, but obviously very happy, to have been awarded the 2nd prize, which is good for 1,500 EUR. It is an honour and a privilege to be in a position to work on stuff I like and maybe contribute something to the transparency of Dutch democracy. And, frankly, I just had a great time hacking!
<br/>
<br/>
<a href="https://www.just-bi.nl/just-care/" target="justbi" style="float: right; margin: 1em;"><img src="https://www.just-bi.nl/wp-content/uploads/posts/2013/post_2133/JustCareFB.png"/></a>
Since this was basically just a working day for me, I decided to donate 500 EUR of the prize money to <a href="https://www.just-bi.nl/just-care/" target="justbi">Just-Care</a>, which is a charity supported by Just-BI.
<br/>
<br/>
<h2>Where to get Jubilant?</h2>
At Just-BI, we're currently working out the exact details around a release of Jubilant. I will write an update as soon as I can disclose more, but I can already say that all the work I did for the accountability hack will become available as Open Source Software in the very near future.
<br/>
<br/>
In the meantime, if you're interested in Jubilant and OData-based self-service BI, don't hesitate to <a href="mailto:roland.bouman@gmail.com" target="me">contact me</a>. Or <a href="https://www.just-bi.nl/contact/" target="justbi">contact Just-BI</a>.
rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-45844231153096584142017-05-12T17:53:00.000+02:002017-08-15T10:34:55.493+02:00Do we still need to talk about Data Vault 2.0 Hash keys?A few days ago, I ran into the article <a href="http://blog.scalefree.com/2017/04/28/hash-keys-in-the-data-vault/" target="_scalefree">"Hash Keys In The Data Vault"</a>, published recently (2017-04-28) on <a target="_scalefree" href="http://blog.scalefree.com">the Scalefree Company blog</a>. Scalefree is a company <a href="http://www.scalefree.com/aboutus/#founders" target="_scalefree">founded</a> by Dan Linstedt and Michael Olschminke. Linstedt is the inventor of <a href="https://danlinstedt.com/solutions-2/data-vault-basics/" target="_linstedt">Data Vault</a>, which is a method to model and implement <a href="https://en.wikipedia.org/wiki/Data_warehouse" target="_wiki">enterprise data warehouses</a>.
<br/>
<br/>
The article focuses on the use of <a href="https://en.wikipedia.org/wiki/Hash_function" target="_wiki">hash-functions</a> in <a target="_wiki" href="https://en.wikipedia.org/wiki/Data_vault_modeling">Data Vault data warehousing</a>. To be precise, it explains how Data Vault 2.0 differs from Data Vault 1.0 by using hash-functions rather than sequences to generate <a href="https://en.wikipedia.org/wiki/Surrogate_key" target="_wiki">surrogate key</a> values for business keys. In addition, hash-functions are suggested as a tool to detect changes in non-business-key attributes, so as to track how their values change over time.
<h3>Abstract</h3>
First I will <a href="#scalefree-summary">analyze</a> and <a href="#scalefree-comments">comment</a> on the Scalefree article and DV 2.0, and explain a number of tenets of DV thinking along the way. <a href="#critique">Critical comments</a> will be made about using hash values as primary keys in DV 2.0, and the apparent lack of progress made in DV thinking regarding this matter. <a href="#birthday">A discussion</a> follows about the birthday problem and how it relates to DV 2.0 usage of hash functions. Using the <a href="#square-approximation">Square approximation method</a> it will be demonstrated how we can make informed and accurate decisions about the risk of a collision with regard to the data volume and choice of hash function. <a href="#impact-of-collision">Different scenarios</a> regarding the impact of a collision on data integrity will then be explored. Finally, a <a href="#proposal">practical proposal</a> is made to detect hash collisions and to prevent them from introducing data integrity violations into our data warehouse.
<h2 id="scalefree-summary">A Summary of the Scalefree article</h2>
I encourage you to first <a href="http://blog.scalefree.com/2017/04/28/hash-keys-in-the-data-vault/" target="_scalefree">read the original article</a>. My summary is here below.
<ol>
<li>The business key is what the business users use to identify a business object.</li>
<li>To identify business objects in the data warehouse, we should use a <a href="https://en.wikipedia.org/wiki/Surrogate_key" target="_wiki">surrogate key</a> that fits in one column instead of the (possibly composite) business key.</li>
<li>The reason to use a single-column surrogate key is that business keys can be large (many bytes per key field, multiple key fields) which makes them slow - in particular for join operations.</li>
<li>In DV 1.0, sequences are used to generate values for surrogate keys.</li>
<li>Using sequences to generate surrogate key values implies a loading process consisting of at least 2 phases that have to be executed in order.
<br/>
<br/>
The previous point requires some context regarding the architecture of the Data Vault model. Without pretending to completely represent all tenets of DV, I believe the explanation below is fair and complete enough to grasp the point:
<br/>
<br/>
DV stores the data that makes up a business object in 3 types of tables:
<br/>
<br/>
<ul>
<li>The business key and its corresponding surrogate key are stored in hub-tables.</li>
<li>The change history of descriptive attributes are stored in satellite-tables</li>
<li>Relationships between business objects are stored in link-tables.</li>
</ul>
<br/>
So, each distinct type of business object corresponds to one hub-table, and at least one, but possibly multiple satellite-tables. The satellite-tables refer to their respective hub-table via the surrogate key. Likewise, link-tables maintain relationships between multiple business objects by storing a combination of surrogate keys referring to the hub-tables that participate in the relationship.
<br/>
<br/>
This means that for a single new business object, any of its satellite-tables can be loaded no sooner than when it is known which surrogate key value belongs to its business key. Since the new sequence value is drawn when loading the new business key into its designated hub-table, this means that the new business object must be loaded into its hub-table prior to loading its satellite-tables.
Likewise, any relationships that the business object may have with other business objects can be loaded into link-tables only after *all* business objects that are related through the link have been loaded into their respective hub-tables.
<br/>
<br/>
In practice this means that in DV 1.0, you'd first have to load all hub-tables, drawing new surrogate key values from the sequences as you encounter new business keys. Only in a subsequent phase would you be able to load the satellite- and link-tables (possibly in parallel), using the business key to look up the previously generated surrogate key stored in the hub-tables.
<br/>
<br/>
</li>
<li>In DV 2.0, <a href="https://en.wikipedia.org/wiki/Hash_function" target="_wikipedia">hash-functions</a> are used to generate values for surrogate keys.</li>
<li>When using hash-functions to generate surrogate key values, hub-, satellite- and link-tables can all be loaded in parallel.
<br/>
<br/>
The previous point needs some explanation.
<br/>
<br/>
There is no strict, formal definition of a hash-function.
Rather, there are a number of aspects that many hash-functions share and on which DV 2.0 relies:<ul>
<li><a href="https://en.wikipedia.org/wiki/Hash_function#Determinism" target="_wiki">Deterministic</a>: a given set of input values will always yield the same output value. In a DV context, deterministic just means one business key maps to exactly one surrogate key.
<br/><br/>
To some extent, a solution based on a sequence as generator of surrogate keys also appears to be deterministic, at least within the confines of one physical system.
</li>
<li>Stateless: While a solution based on a sequence appears to be deterministic, its values are generated by incrementing the previous value, and the mapping from a business key to its corresponding surrogate key is "remembered" by storing them together in the hub. Remembering the mapping by storing it in the hub is what makes it deterministic, because only then do we have the ability to look up the surrogate key based on the business key (and vice versa). But suppose we had two separate but otherwise identical systems, and loaded the same set of business objects into both, each in a different order: the mapping from business key to surrogate key would then differ between the two systems, because whichever object was loaded first in a particular system draws the lower sequence number.
</li>
<li><a href="https://en.wikipedia.org/wiki/Hash_function#Defined_range" target="_wiki">Fixed output size</a>: DV expects a chosen hash-function to always return a fixed-length output value. (And typically, the hash key should be smaller, ideally much smaller, than its corresponding business key.)
</li>
<li><a href="https://en.wikipedia.org/wiki/Hash_function#Uniformity" target="_wiki">Uniformity</a>: This means that inputs of the hash-function yield results that are very evenly distributed across the output domain. If that is the case, the chance that two different inputs (i.e., two different business keys) yield the same output value is very small. The phenomenon where two calls to a hash-function, using different arguments, yield the same result value is called a <i>collision</i>. We will have much to say about collisions later on in this article.
</li>
</ul>
<br/>
Basically the idea is that a hash-function can be used to generate a surrogate key value that is typically much smaller (and thus, "faster", in particular for join operations) than its corresponding business key, and it can do so without an actual lookup against the set of existing surrogate key values. Ideally, a hash-function calculates its output based solely on its input, and nothing else.
<br/><br/>
It is this latter aspect that makes it different from a sequence, which generates values that have no relationship at all with the values that make up the business key. In the case of a sequence, the relationship between surrogate key and business key is maintained by storing it in the hub, and looking it up from there when it is needed.
<br/><br/>
</li>
<li>When using hash-functions there's a risk of collision.
A collision occurs when two different inputs to the hash-function generate identical output.
</li>
<li>
The risk of collision is very small. In a database with more than one trillion hash-values, the probability that you will get a collision is like the odds of a meteor landing on your data center.
</li>
<li><a href="https://en.wikipedia.org/wiki/MD5" target="_wiki">The MD5 hash-function</a> is recommended for DV 2.0. As compared to other popular hash-functions (like MD6, SHA1 etc.), MD5's storage requirements are relatively modest (128 bits), while the chance of a hash-collision is 'decently low'. It's also almost ubiquitously available across platforms and databases.</li>
</ol>
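To make the deterministic, stateless aspect concrete, here is a minimal Python sketch of hash-based surrogate key generation (the business-key values and the delimiter convention are made up for illustration; MD5 is used because that is what DV 2.0 recommends):

```python
import hashlib

def hash_key(*business_key_parts):
    """Derive a surrogate key by hashing the business key.

    Deterministic and stateless: the same business key always yields
    the same 128-bit value, on any system, without a lookup against
    previously generated keys and without a shared sequence.
    """
    # Join the (stringified) business key parts with a delimiter,
    # so that ("ab", "c") and ("a", "bc") hash differently.
    data = "|".join(str(part) for part in business_key_parts)
    return hashlib.md5(data.encode("utf-8")).hexdigest()

# The same business key yields the same surrogate key on every run:
k1 = hash_key("ACME Corp", "NL")
k2 = hash_key("ACME Corp", "NL")
assert k1 == k2 and len(k1) == 32   # 128 bits = 32 hex characters
```

Note that a real implementation would also have to agree on details such as case normalization, trimming and the delimiter, since any difference there breaks determinism across systems.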
<h2 id="scalefree-comments">Comments</h2>
Many tenets of Data Vault thinking make good sense: <ul>
<li>The focus on business keys, rather than blindly using the primary key of the source systems feeding into the EDW, ensures that the result will serve the needs of the business.</li>
<li>While the use of surrogate keys in transactional systems is still often a source of debate, this practice is not controversial in data warehousing.
<br/>
<br/>
That said, I think the technical reasons for introducing a surrogate key as mentioned by the article (making keys smaller and more wieldy to improve join performance)
are not as important as the functional requirement of any data warehouse to integrate data from multiple data sources, while ensuring that identity is resilient to changes in the source systems.
<br/>
<br/>
</li>
<li>There's a lot to be said in favor of maintaining attribute history in satellite-tables. In particular, the ability to have multiple satellite-tables is appealing, since it allows you to create and maintain groups of attributes that somehow belong together, and maintain them as a unit. For example, grouping attributes based on their rate of change, or according to whether they tend to appear together in queries sounds like a very useful thing.</li>
<li>Link-tables also sound like a good idea. Decoupling direct links from the business objects and maintaining them in completely separate tables makes the EDW very resilient to changes in the data sources. In addition it allows you to add extra relationships that may not be directly present in the source systems but which make sense from the point of view of data integration and/or business intelligence.
</li>
<li>
Once we accept the benefits of satellite- and link-tables and surrogate keys, then we must also embrace hub-tables. The model wouldn't make any sense without them, since we need an integration point anyway to maintain the mapping between business key and surrogate key (which is, of course, the hub-table).
</li>
<li>Once we accept the modeling entities of DV, we also need to accept any constraints that surrogate key generation may have on the loading process. Of course, a dependency itself is something we would like to avoid, but the fact that a known limitation is recognized and anticipated is a good thing.</li>
</ul>
<h2>Flashback a few years ago</h2>
When I read the Scalefree article, I experienced a bit of a flashback that took me back some two, three years to <a target="linkedin" href="https://www.linkedin.com/groups/44926/44926-5922882750100033537">this question ("Hash Key Collisions") by Charles Choi on the DataVault discussions group on linkedin</a>. In case you cannot access linkedin, I'm reprinting the question below: <blockquote>According to Dan's paper "DV2.0 and Hash Keys", the chances of having a hash key collision are nearly nonexistent (1/2^128). But I still worry.
<br/>
<br/>What if a collision actually does occur that results in the business making a catastrophic decision? Can we really say we have 100% confidence in the "system of fact" when we choose to accept the risk of a collision? What is our course of action from a technology point of view if a collision did actually occur?</blockquote>
The paper "DV2.0 and Hash Keys" that Charles refers to is publicly available though not freely. (You can purchase it via Amazon.)
Here's the relevant statement from the article:<blockquote>The mathematical chances of a collision as a result of using MD5 are (1 / (2^128)) which is 1 in 340 undecillion 282 decillion 366 nonillion 920 octillion 938 septillion 463 sextillion 463 quintillion 374 quadrillion 607 trillion 431 billion 768
million 211 thousand 456.<br/><br/>In reality, you would have to produce 6 billion new business keys per second per hub for 100 years to reach a 50% chance of getting a collision. Not very likely to happen in our lifetime.</blockquote>
A <a href="http://danlinstedt.com/allposts/datavaultcat/datavault-2-0-hashes-versus-natural-keys/#comment-2905" target="_linstedt">similar question</a> was asked by Ray OBrien in response to a post by Linstedt (see: <a target="_linstedt" href="http://danlinstedt.com/allposts/datavaultcat/datavault-2-0-hashes-versus-natural-keys">#datavault 2.0 hashes versus natural keys</a>):<blockquote>collisions are real and MD5 not a good choice. but generally the smaller the input key domain and the larger the Hash output size, the less chance of collision, BUT it is always there.. so I would like to see some comments on the verification steps needed and cost to load of Collision ManagementI. If Integrity of data is important, then this is important.</blockquote>
In <a target="_linstedt" href="http://danlinstedt.com/allposts/datavaultcat/datavault-2-0-hashes-versus-natural-keys/#comment-2906">Linstedt's answer</a>, you can find a similar expression of this probability, which goes:<blockquote>because we split business keys across multiple hubs, the chances of collision (even with MD5) are next to none.<br/><br/>Yes, they do exist – but you would have to produce 6 billion new business keys Per Second PER HUB in order to reach a 50% chance of a collision in the first place.</blockquote>
<h2 id="critique">Objections</h2>
Regardless of the merits of DV (which I believe I stated fairly in the previous section), I have a few doubts and objections about the writing and thinking by DV 2.0 advocates that I have observed so far, all around the topic of hash-key collisions. My objections are:
<ul>
<li>The wording around the probability of hash-collisions is not helpful to understand the risk. As such, it does not help to decide whether to use DV 2.0, let alone choose a suitable hash-function for a concrete use-case.</li>
<li>The actual numbers regarding probability of hash-collisions are stated flat-out wrong on more than just one occasion.</li>
<li>The probability doesn't matter if you can't afford to lose any data.</li>
<li>DV 2.0 does not discuss the consequences of a hash-collision, and no concrete advice is given on how to detect hash-collisions, let alone handle them.</li>
</ul>
I believe the questions by Charles Choi and Ray OBrien show that I am not alone. At the time they voiced their doubts, DV 2.0 was relatively new and I can understand that maybe at the time these tenets of DV 2.0 would still need to mature.
<br/>
<br/>
A couple of years have passed since, and after reading the article on the scalefree company blog, I am sad to observe that, apparently, no progress has been made in DV 2.0 thinking. At least, if such progress has been made, the scalefree company blog article doesn't seem to offer any new views on the matter. Instead, it comes up with an - equally unhelpful - restatement of the probability of hash-collision by comparing it to the chance of being hit by a meteor.
<br/>
<br/>
In the remainder of the article I will explain my objections and attempt to offer some thoughts that may help advance these matters.
<h3 id="unhelpful-wording">The DV 2.0 wording around the probability of hash-collisions is not helpful</h3>
First, let's try and analyze the wording around probabilities, and what message it conveys.
<h4>Distracting Rhetorics</h4>
What does it really mean when someone says that "the probability [...] is like the odds of a meteor landing on your data center"? What does it mean, really, when someone says that the probability is "1 in 340 undecillion 282 decillion 366 nonillion 920 octillion 938 septillion 463 sextillion 463 quintillion 374 quadrillion 607 trillion 431 billion 768 million 211 thousand 456"?
<br/>
<br/>
Well, obviously, they are saying the chance is very small. Maybe it's just me, but I also sense a level of rhetoric in the wording that seems intended to dwarf the reader with Big Serious Numbers.
<br/>
<br/>
It's almost as if they're saying: this won't happen, so you shouldn't worry. You're not worrying about meteors hitting your data center all the time, so why worry about a hash-collision? Right?
<br/>
<br/>
You can observe that the rhetoric is working too, just look at how Charles Choi voiced his question: "According to Dan (Linstedt) [...] the chances [...] <i>are nearly nonexistent</i> (1/2^128). <i>But I still worry.</i>".
<br/>
<br/>
It's as if Charles is apologizing in advance for worrying.
<h4>Probabilities are not absolute</h4>
There is a more fundamental problem with the probability wording of the previous 2 examples, and that's that they project probability as an absolute.
<br/>
<br/>
To be fair, if you read them in context, then you'll notice that both statements are about MD5 collisions. Obviously, this matters, since not all hash-functions have equal probability of collisions. For hash-functions that have a fixed-length output, the chance surely has to have a relationship with the length of the output, since that puts a hard limitation on the number of unique values it could possibly encode.
<br/>
<br/>
However, apart from the output length of the hash-function and the algorithm it uses, there is at least one other factor which determines the probability, and that is your data volume.
<br/>
<br/>
Intuitively, this is easy to understand: if you have an empty set, the probability of the first hash-value causing a collision is exactly zero, since there is nothing to collide with. At the other extreme end of possibilities, if the set contains as many items as the total number of unique values the hash-function is capable of generating, then the probability of a collision is exactly one, since the entire keyspace has been "used up" already. In between these extremes, we have a growing number of existing entries that could collide with a new entry, so the probability increases from zero to one as the actual number of items (i.e, the number of rows in the hub - the data volume) increases.
<br/>
<br/>
While this observation seems trivial, it is important to mention it in this discussion because the aspect of volume is, for some reason, often not touched at all by DV 2.0 advocates. It's a mystery why that should be so, because if we would know the relationship between probability of a collision, maximum possible number of unique values, and the maximum volume of data, then we could reason about these variables sensibly. Like:<ul>
<li>Given the maximum risk that I am willing to take to lose data due to a collision, what is the maximum volume I can store if I use a 128-bit hash-function?</li>
<li>Given the maximum risk of collision that I am willing to take, and given the maximum number of rows I need to store, what would be the minimum output length of the hash-function I should look for?</li>
<li>Given my current data volume, and my current choice of hash-function, what is the risk I am running now of losing data due to collision?</li>
</ul>
<h4 id="birthday">The Birthday Problem</h4>
Interestingly, Linstedt does provide one statement that at least takes the data volume into account: "you'd have to produce 6 billion new business keys Per Second PER HUB in order to reach a 50% chance of a collision in the first place". Let's see what that means exactly.
<br/>
<br/>
Apart from the probability, this statement includes the other two variables: the 128-bit key length of MD5, and a data volume of 6 billion rows per second for 100 years. But if you look at the probability (50%), you'll notice how completely useless this wording is, apart from its rhetorical power. Who in their right mind is interested in a system that can only accept half of the data you're trying to store in it? How can you possibly apply this piece of knowledge in any practical sense to a system you actually have to build?
<br/>
<br/>
I spent a while thinking about how this statement could have come about, and I have come to believe that it is actually a restatement of the classical <a href="https://en.wikipedia.org/wiki/Birthday_problem" target="_wiki">birthday paradox</a>: <blockquote>The birthday paradox, also known as the birthday problem, states that in a random gathering of 23 people, there is a 50% chance that two people will have the same birthday.</blockquote>
(As compared to Linstedt's statement, the 50% probability stays the same; the 6 billion rows per second for a 100 years is equivalent to the number of people gathered, and the number of possible unique values in the key corresponds to the number of days in a year.)
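The 23-people figure itself is easy to verify: the chance that n draws from m equally likely values are all distinct is the product of the factors (m - i)/m, and the collision probability is its complement. A quick Python check:

```python
def birthday_collision_probability(n, m=365):
    """Exact probability that among n random draws from m equally
    likely values, at least two are equal."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (m - i) / m
    return 1.0 - p_all_distinct

# With 23 people the probability first exceeds 50%:
p23 = birthday_collision_probability(23)   # ~0.5073
p22 = birthday_collision_probability(22)   # ~0.4757
```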
<br/>
<br/>
Whether or not the original birthday problem is actually what made Linstedt word his statement the way he did, I think it's clear that a 50% probability of a collision has no practical bearing on building any kind of database. To me, it just sounds like more rhetoric to convince us that hash-collisions are really rare.
<h3 id="square-approximation">Probabilities are stated flat out wrong</h3>
The discussion of the birthday problem brings us to an actual method to calculate the probability as a function of data volume and key length. The birthday problem wikipedia article explains it far better than I ever could: it provides an exact method as well as many useful approximations that are much easier to calculate, and it offers this really useful rule of thumb called <a href="https://en.wikipedia.org/wiki/Birthday_problem#Square_approximation">the square approximation</a>: <blockquote>A good rule of thumb which can be used for mental calculation is the relation<br/><br/><table style="font-size: 14pt;" cellpadding="0" cellspacing="0">
<tr><td rowspan="3">p(n) ≈</td><td style="text-align: center">n<sup>2</sup></td></tr>
<tr><td><div style="background-color: black; font-size: 2px; ">/</div></td></tr>
<tr><td style="text-align: center">2m</td></tr>
</table><br/>which can also be written as<br/><br/>
<div style="font-size:14pt">n ≈ <span style="font-family: courier">√</span> (2m * p(n))</div>
<br/>which works well for probabilities less than or equal to 0.5
</blockquote>
with
<dl>
<dt>n</dt><dd>the actual size of the keyset - i.e., the number of rows you need to fit into the hub</dd>
<dt>p(n)</dt><dd>the probability of a collision given <b>n</b>.</dd>
<dt>m</dt><dd>the theoretical maximum size of the keyset, i.e. the maximum number of unique values that your hash-function can encode.</dd>
</dl>
Given a fixed-length hash-function such as MD5, <b>m</b> can be calculated by raising 2 to the power of the key length (expressed as a number of bits): <blockquote>
<br/><br/>
m = 2 ^ bitlength (or 2<sup>bitlength</sup>)
<br/><br/>
</blockquote>(In case this needs explaining: if your key were just one bit long, it could hold only two (2<sup>1</sup>) values - 0 or 1. If it were 2 bits long, it could hold 2 * 2, or 2<sup>2</sup> = 4 unique values; with 3 bits, 2 * 2 * 2 or 2<sup>3</sup> = 8, and so on.)
<br/><br/>
From the discussion above as well as the previous section, we can now conclude that at least one statement of the probability:<blockquote>The mathematical chances of a collision as a result of using MD5 are (1 / (2^128))</blockquote>is simply flat-out wrong, since it does not take the data volume into account. Rather, since 2^128 is the number of possible unique values that MD5 can cover, 1 / 2 ^ 128 is the chance that the second row you put into your hub will collide with the first one.
<br/><br/>
So how big or small are the odds really of running into a MD5 collision in case we're handling a volume of 6 billion rows per second for a 100 years? Using square approximation, we get:<br/>
<br/>
<table style="font-size: 14pt;" cellpadding="0" cellspacing="0">
<tr><td rowspan="3">p(n) ≈</td><td style="text-align: center">(6,000,000,000 rows * 60 seconds * 60 minutes * 24 hours * 365.25 days * 100 years)<sup>2</sup></td><td rowspan="3"> ≈ 0.526</td></tr>
<tr><td><div style="background-color: black; font-size: 2px; ">/</div></td></tr>
<tr><td style="text-align: center">2 * (2 ^ 128 bits in a MD5 hash-value)</td></tr>
</table><br/>
<br/>
which is in fact closer to 53%. To figure out how many years we would need to insert 6 billion rows per second to achieve the 50% chance of running into a collision, we can use the second form of the formula:
<br/>
<br/>
<div style="font-size:14pt">n ≈ <span style="font-family: courier">√</span> (2 * (2 ^ 128 bits in MD5 hash-value) * 50% probability) ≈ 1.84467 × 10<sup>19</sup> rows in the hub</div>
<br/>
Dividing by 6,000,000,000 rows * 60 seconds * 60 minutes * 24 hours * 365.25 days gives slightly less than 97 and a half years.
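These figures are easy to reproduce; the following Python sketch applies the square approximation to Linstedt's scenario (the one-in-a-trillion risk threshold at the end is just an arbitrary example of the kind of question you can now answer):

```python
from math import sqrt

BITS = 128                      # MD5 output length
m = 2 ** BITS                   # number of distinct hash values
rows_per_year = 6_000_000_000 * 60 * 60 * 24 * 365.25

# Probability of at least one collision after 100 years at
# 6 billion rows per second, using p(n) ~ n^2 / 2m:
n = rows_per_year * 100
p = n ** 2 / (2 * m)            # ~0.527

# Number of rows needed for a 50% collision probability,
# and how long it takes to load them at that rate:
n_50 = sqrt(2 * m * 0.5)        # = 2**64, ~1.84e19 rows
years = n_50 / rows_per_year    # ~97.4 years

# The same formula answers a more practical question: how many rows
# can a 128-bit hash key hold before the collision risk exceeds,
# say, one in a trillion?
n_max = sqrt(2 * m * 1e-12)     # ~2.6e13 rows
```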
<br/>
<br/>
The point of doing these calculations here is obviously not to prove Linstedt wrong by showing you'd already arrive at a 50% chance after only 97 years and some, instead of after 100. Nor is it to determine that after 100 years, the chance is actually closer to 53%. Besides the fact that I'm using an approximation, neither makes any sense anyway, because a probability of 50% is already way, way beyond any definition of a working system.
<br/>
<br/>
The point I am trying to make is that it is perfectly possible to reason about large numbers and to clearly and transparently demonstrate how they are calculated. Using square approximation, you have a tool to calculate the value of the third variable once you have the value of the other two, allowing you to reason about it from three different angles.
<br/><br/>
I think we can all agree that's a much better position than getting stumped by Really Seriously Big Numbers.
<h4>Probabilities add up for each keyset</h4>
So far, we've just looked at the probability for encountering a collision while loading a single hub.
But the probability increases as you have more hubs.
<br/>
<br/>
Intuitively this is clear, because each hub could encounter a collision independently.
So, the chance of suffering a collision in any one of them grows as you have more hubs to maintain, and is quite a bit larger than the chance of suffering a collision in just one particular hub.
<br/>
<br/>
If we'd like to compute the chance of getting a collision in at least one hub we can apply the following calculation:<blockquote>
1 - ((1 - Ph1) * (1 - Ph2) * ... * (1 - PhN))
</blockquote>with: <dl>
<dt>Ph1</dt><dd>Probability of a collision in the first hub</dd>
<dt>Ph2</dt><dd>Probability of a collision in the second hub</dd>
<dt>...</dt><dd></dd>
<dt>PhN</dt><dd>Probability of a collision in the N-th (last) hub</dd>
</dl>
The rationale behind this is that if the probability of a collision is P(n), then the chance of not having a collision is 1 - P(n). To calculate the chance of not having a collision in any of the hubs, we multiply the individual chances of not having a collision in each particular hub with each other. If that number is the chance of not having a collision in any of the hubs, then all remaining probability must mean there is a collision in one or more hubs. So the chance of having at least one collision is obtained by subtracting the probability of having no collision at all from 1.
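A minimal Python sketch of this complement rule (the example probabilities are made up):

```python
def probability_any_collision(hub_probabilities):
    """Probability of at least one collision across independent hubs:
    the complement of no hub having a collision."""
    p_none = 1.0
    for p in hub_probabilities:
        p_none *= (1.0 - p)
    return 1.0 - p_none

# Ten hubs, each with a one-in-a-million collision probability,
# give a combined probability of roughly one in a hundred thousand:
p_total = probability_any_collision([1e-6] * 10)   # ~1e-5
```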
<br/>
<br/>So the chance that someone will ever encounter a collision could be quite a bit larger than you'd expect if you're focusing on just one hub.
<h3 id="I-cannot-afford-to-lose-data">No matter how slight a Probability, I can't afford to lose data</h3>
Now we arrive at a more fundamental objection regarding the matter of using hash-values as keys in your database.
<br/>
<br/>
If you re-read Charles' question, you'll notice that he is politely explaining that, although he understands and appreciates that an MD5 hash-collision may be really rare, he simply doesn't ever want to lose any data because of it. Ray OBrien raises the exact same point, even mentioning data integrity as the reason why he cares.
<br/>
<br/>
When this issue is put forward in DV 2.0 discussions, it usually means the end of any meaningful discussion. The answers that DV 2.0 advocates give in response typically match one of the following:
<br/>
<br/>
<table>
<tr>
<td>"Look, do you realize how rare a hash-collision is?"</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Yes thank you. You just stumped me with some Really Seriously Big Numbers, and I get it. Super rare. I just don't want to lose data though.</td>
</tr>
<tr>
<td>"You don't have to use 128-bits MD5, you can use a hash-function that returns larger values, like 160-bits SHA1. Collisions will then be, you know, even more rare."</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Perfect thanks. Did I mention I can't afford to lose data?</td>
</tr>
<tr>
<td>"We use 2 hash-functions as key and it works for us."</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Ah, I get it now. You made collisions Super-duper-rare, how clever. So will you never lose data now?</td>
</tr>
<tr>
<td>"I have built hundreds of Terabyte-sized data warehouses, and I never encountered a hash-collision."</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Well let me guess. Might that be because they are very rare?</td>
</tr>
<tr>
<td>"Teradata is using hashing to solve MPP data distribution, and Hadoop uses hashing in HDFS. If it works for them, then why wouldn't it work in DV 2.0"</td>
<td></td>
</tr>
<tr>
<td></td>
<td>So you're saying Teradata and Hadoop use hashing for some purpose, and DV2.0 is using hashing for a completely different purpose, and now you want me to explain why it works in one use-case but not for a completely different use-case? That's...interesting. <br/><br/>How about: Teradata and Hadoop are not using hash values as primary keys, and DV 2.0 is?</td>
</tr>
<tr>
<td>"Look, why are you so worried about hash-collisions? You're not worrying all the time about a meteor hitting your data center, are you?"</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Actually, I do. That's why our database is geographically distributed across data centers.</td>
</tr>
<tr>
<td>"Ha! Gotcha now. What about two meteors? Wouldn't that be comparable to using two hashes?"</td>
<td></td>
</tr>
<tr>
<td></td>
<td>I suppose it would. Difference is, I can't help meteors falling on my data centers. But I can choose to stick to sequences instead of hash-functions.</td>
</tr>
<tr>
<td>"But I just explained, sequences are a bottleneck and prevent you from parallel loading your EDW!"</td>
<td></td>
</tr>
<tr>
<td></td>
<td>Yes I heard. And I asked my customer: they are pretty sure about how they feel regarding the possibility of losing data, and they clearly told me they'd rather wait around for the data to be loaded as compared to being able to report super-quickly on wrong data.</td>
</tr>
<tr>
<td>"We use hash-keys to store tweets for sentiment analysis, and our results are pretty accurate, even if we lose data sometimes."</td>
<td></td>
</tr>
<tr>
<td></td>
<td>I'm sure you are- good for you! But we manage a monetary transaction log and we feel that the risk of losing one $1,000,000,000 transaction just doesn't justify loading 1,000,000,000 transactions worth $1 super-fast in a parallel fashion. Silly us eh?</td>
</tr>
</table>
<br/>
<br/>
And that's really all there is to it: Probabilities don't mean a thing if you're really sure you don't want to lose any data.
When it happens, it is no comfort that you were the one to have had such extraordinarily bad luck experiencing it.
<br/>
<br/>
Another thing people may overlook is that the probability also doesn't tell you when it will happen. The only real guarantee you have is that inserting the first key in an empty hub will always succeed. But already the 2nd row might collide. It probably won't, and the odds are really slim. But it might. If you're sure you don't want that, then don't use hash values as keys. It's really that simple.
<br/>
<br/>
Sure, there might be other risks that could make us lose data. For example, the probability of disk corruption might be larger than that of a hash-collision. But it doesn't follow that we should set ourselves a trap we can avoid, especially when we know how to avoid it.
<br/>
<br/>
We cannot control disaster like disk corruption or meteor impact. If we could though, we would! Whether to use hash keys or to stick to sequences is a conscious choice, so let's be sure we make it based on information and requirements, and rather not based on some analogy that is chosen with the express purpose of making you feel a little bit ridiculous for being so averse to taking a risk.
<br/>
<br/>
If you're building a database for someone else, and you're considering using hashes as keys for your data, then be prepared to ask your customer: "How much data can you afford to lose?" or "In the event that we cannot load some data due to hash-collisions, how much time and effort can we spend to take the system offline so we can fix it?"
<br/>
<br/>
Another way to think about it is this: suppose you would, in fact, lose data because of a collision. How comfortable are you then admitting to your customer that you constructed a solution that, by design, could end up giving wrong results, while there was an alternative that guarantees correctness, at least to the extent of things you can control? And suppose you did get wrong results: did you anticipate just how wrong those results could be?
<br/>
<br/>
I truly feel that considerations like this are not on the database/data warehousing professional - they are on the customer. It's their data. Please, respect that.
<h3 id="impact-of-collision">What if we do have a collision?</h3>
I get that there are use cases where you might want to accept the small risk of a collision. But you cannot really, truly make that assessment if you haven't considered and anticipated it as if it is a real event actually hitting you. I think DV 2.0 falls short in nourishing healthy discussion regarding the anticipation of such events.
<br/>
<br/>
So, what will happen if you have your hash-keys in place, and you encounter a collision? We can try and anticipate a few concrete scenarios.
<br/>
<br/>
First of all, will you even detect a collision when it happens? The Scalefree article has a few flow diagrams showing how raw data is staged, and then loaded into a hub. In that flow, the row is dropped when the hash already exists. So the question now is, why did the hash already exist?
<br/>
<br/>
Obviously, it's possible that we already loaded this business object, and we're merely seeing it again. In that case, we're fine, and we'll simply be loading newer data into the satellites and links for that business object. But it's also possible that this is an entirely different business object that happened to yield the exact same hash-value as a business object loaded earlier. In other words, you now have a collision. For the hub, it will pass by unnoticed, but the satellites and links that point to that business object will now store data pertaining to more than one distinct business object.
<br/>
<br/>
So in this case, we're not losing data, but compromising the integrity of the business object that arrived earlier. Our database integrity is now violated and our queries will return wrong results. You won't know though, because you didn't attempt to detect a collision. To your data vault, the distinction between multiple different business objects has ceased to exist.<br/><br/>
I don't know about you, but this does not feel like a happy place to me - especially since it is an entirely avoidable consequence of the design (combined with a whole bunch of bad luck, of course).
<br/>
<br/>
Alternatively, we build our solution in such a way that we can at least detect collisions. Once we detect it, we can maybe prevent loading associated data for the satellite-tables and link-tables for the colliding business object. This means we will be completely ignoring the later arriving business object, as if it isn't there.
<br/>
<br/>
We have now lost data. That is not a good thing, but at least this allows the earlier arriving business object to maintain its integrity. Our query results won't be wrong, they will just be incomplete. To me this is a slightly happier place, but the fact that it's a matter of fate which one of the objects made it into the database, and which one was rejected still makes me feel that the solution has failed.
<br/>
<br/>
But suppose we do want to go for that treatment (after of course getting confirmation from the business that this is really what they want) - how can we implement it? Well, at the very least, the load process for either the hub or the staging area would need to compare the hash value as well as the business key. Only if both are equal can we consider the objects equal.
<br/>
<br/>
Making the comparison is not hard but it will of course be slower than only calculating the hash, because in this case you need a lookup on the hub, just like you did with the DV 1.0 solution based on sequences.
<br/>
<br/>
But what if we detect the collision in this way? How can we then use this information to prevent loading the associated satellite- and link-table data?
<h2 id="proposal">Introducing a collision table</h2>
We could store collisions in a special collision table. We'd get one such table for each hub. The collision table would have the same layout as the hub table to which it corresponds, and you'd use it to store the colliding hash, as well as the business key for which the collision was met. The key of the collision table would have to be made up of both the hash key as well as the business key, so that we can handle multiple collisions for the same hash key.
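As a rough illustration (not part of DV 2.0 itself), here is an in-memory Python sketch of a hub load that detects collisions by comparing the business key as well as the hash, and records them in such a collision table; a real implementation would of course use database tables rather than dictionaries:

```python
import hashlib

def md5_key(business_key):
    """Hash-based surrogate key, as in DV 2.0."""
    return hashlib.md5(business_key.encode("utf-8")).hexdigest()

hub = {}            # hash key -> business key
collisions = set()  # (hash key, business key) pairs that collided

def load_hub(business_key):
    """Insert a business key into the hub, detecting collisions by
    comparing the business key in addition to the hash."""
    h = md5_key(business_key)
    existing = hub.get(h)
    if existing is None:
        hub[h] = business_key        # new business object
    elif existing != business_key:
        # Same hash, different business key: a genuine collision.
        # Record it so later loads can divert this object's data.
        collisions.add((h, business_key))
    # else: the same business object seen again - nothing to do
```

The satellite- and link-load processes would then look up the (hash key, business key) pair in the collision table, and discard or divert any rows found there.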
<br/>
<br/>
Once the collision table is in place, and the process for loading the hub-tables is modified to detect and store collisions, we can think about the process for loading the satellite- and link-tables. I can see two options:<ul>
<li>Have the load process for satellite- and link-tables do a lookup to the collision table to see if you need to discard data for business objects with collisions</li>
<li>Check collision tables after the load and run a clean-up process to restore integrity after detecting new collisions.</li>
</ul>
<h4>Collision table lookup</h4>
This solution relies on the process that loads the satellite- and link-tables to make a lookup to the collision table, using both the calculated hash and the business key. If collisions really are as rare as they should be, then that lookup should be really fast, because the collision tables will be pretty much empty.
<br/><br/>
If all goes well, then the lookup will fail. This means we have no collision and we can proceed loading the satellite- and link-tables.
In the rare event that the lookup succeeds there is apparently a hash-collision, and we must not load the satellite- and link-tables to prevent violating the integrity of our data. You are now in a position to either discard the data, or to store it someplace else in case you have a clever idea of reconciling the data later on.
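The lookup described above could be sketched like this (again an illustrative Python sketch, not actual DV tooling): rows whose (hash, business key) pair appears in the collision table are parked instead of loaded:

```python
def split_satellite_rows(collisions: set, rows: list) -> tuple:
    """Partition incoming satellite/link rows using a lookup on the collision
    table: rows whose (hash key, business key) pair is listed there belong to
    a colliding business object and must not be loaded.
    Each row is assumed to be a (hash_key, business_key, payload) tuple."""
    loadable, parked = [], []
    for row in rows:
        h, bk, _payload = row
        if (h, bk) in collisions:
            parked.append(row)    # discard, or keep for later reconciliation
        else:
            loadable.append(row)  # no collision: safe to load
    return loadable, parked
```

Because the collision table should be nearly empty, the lookup is cheap in the common case.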
<br/><br/>
However, we now have reintroduced the constraint on the loading process, because we now rely on the process that loads the hubs to also detect and store collisions in the collision table.
<br/>
<br/>
So, with this solution, we lose the ability to load hub-tables in parallel with satellite- and link-tables. What may be of some comfort in comparison with the DV 1.0 sequence based solution is that the collision table lookup will be much faster than a hub-lookup to find the value generated by a sequence, because the collision table will be pretty much empty. So, the burden of the constraint and loading dependency should be much lighter than in the case of a DV 1.0 sequence based solution.
<br/>
<br/>
Another important drawback is that if your solution spans multiple systems, you need to maintain one set of collision tables somewhere, and all loading processes will be dependent upon them. In other words, the solution is not stateless anymore.
<h4>Clean-up after load in case of new collisions</h4>
Alternatively, we keep loading hub-, satellite- and link-tables in parallel, and we check the collision tables after each load to see if the last load introduced any new collisions. If we find that it did, we need to perform clean-up after the load.
<br/>
<br/>
The way clean-up would work is as follows: our load process should have been logged, and our satellite- and link-tables should have metadata identifying the load process that put their contents in the data warehouse. The load identifier would also be stored in the collision table. Using that information, we can identify which new collisions our last load introduced. We now have the load identifier as well as the hash key of each new collision, and we can use that to delete all satellite- and link-table rows that have the load identifier of our latest load, as well as the hash key of the colliding business object.
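As an illustration of this clean-up step (a Python sketch with hypothetical column names; in practice this would be a DELETE against the satellite- and link-tables):

```python
def clean_up(satellite_rows: list, new_collisions: set, load_id: int) -> list:
    """After-the-fact clean-up: remove every satellite/link row that the
    latest load (identified by load_id) inserted for a colliding hash key.
    Rows are dicts carrying at least 'load_id' and 'hash_key' metadata."""
    colliding_hashes = {h for (h, _business_key) in new_collisions}
    return [row for row in satellite_rows
            if not (row["load_id"] == load_id
                    and row["hash_key"] in colliding_hashes)]
```

Note that this removes every row the last load inserted for a colliding hash, including rows that in fact belonged to the pre-existing business object: clean-up by itself cannot tell those apart.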
<br/><br/>
After clean-up, we have restored data integrity for those business objects that encountered a collision, up to the point prior to the last load.
<br/>
<br/>
Now, we know that our last load brought us a business object that had a hash collision with some business object that was already in our data warehouse, and we made the conscious decision to reject that data for now, or maybe store it someplace else until we know how to reconcile it. But the last load might also have brought us data that actually belonged to the business object that already existed in our data warehouse. We would really like to reload that part of the data for the existing business object. Our clean-up process had no way of distinguishing between satellite and link data belonging to one business object or the other, because it only knows about their colliding hash keys. In other words, our clean-up process might have removed data that actually did belong to the already existing business object, and now we need to put that back.
<br/>
<br/>
The solution would be to have an alternative load process especially for this -hopefully exceptional- case.
<br/>
<br/>
The alternative load process would be similar to the collision table lookup solution described above. It would only load satellite- and link-tables, and it would include a lookup to the collision table. If the lookup fails, we're dealing with data belonging to the existing business object and we can load it. If the lookup succeeds, then this is data that belongs to a new business object that caused a collision, and we should discard it or store it someplace else for later reconciliation.
<br/>
<br/>
If things go the way they should, the clean-up-and-reload process should seldom occur. And if we do need to run it, it would probably be quite fast, since it deals with only a few business objects - typically only one.
<br/>
<br/>
The only drawback now is that our data warehouse lived through a short period where integrity was compromised during the load process. But at least we can repair integrity for all existing business objects, and selectively discard only the data for those business objects that suffer from a hash collision with an already existing business object.
<br/><br/>
While this approach still relies on keeping collision tables around, we regain our ability to load hub-, satellite- and link-tables in parallel. We can even do loads spanning multiple systems; we just need to take care to clean those up as well in the case we do encounter a collision.
<h4>Other solutions?</h4>
I don't want to pretend this is an exhaustive list - I'm hoping there are more options and I just can't think of them right now.
<h3 id="How-to-load-colliding-business-objects">How to load the colliding business objects?</h3>
What these scenarios do not solve is loading the later arriving business object. We only managed to prevent these objects from entering our data warehouse; we have not found a solution to load that data as well.
<br/>
<br/>And we can't - not unless we change the key.
<br/>
<br/>
On the other hand, changing the key might just be doable: you could decide to try another hash-function for at least that hub. You would need to update all satellites and links pointing to that hub (and of course, the hub itself) and rehash every row that points to it.
<br/>
<br/>
Of course, since you're still relying on hash-keys, just - hopefully - larger ones, you haven't solved the problem, you've just improved the odds. And you might even run into new collisions while you're doing the rehashing operation. But that's just the life you've chosen. At least we now have something that resembles dealing with the problem rather than praying it won't happen.
<br/>
<br/>
I guess my main point here is - make sure every stakeholder actually accepts the risk, and make sure the procedures for dealing with a collision are specified and tested. The fact that the chance you'll need them may be next to negligible is not a license to pretend you do not need to be prepared for these tasks. If it is decided that you'll be taking the risk, then actually do take the risk, and take it seriously.
<h3 id="have-cake-eat-it-too">Can't we have our cake and eat it too?</h3>
Isn't there some way we can benefit from hashes and still, magically immunize ourselves against hash-collisions? It turns out we can.
<br/>
<br/>
To recap:<ul>
<li>DV - like other data warehousing methods - suggests using surrogate keys, because business keys are largish and unwieldy, and slow down join operations.</li>
<li>DV 2.0 suggests using hash-keys to avoid sequences, which make it impossible to load hubs, satellites and links all in parallel, and which slow down the load due to a lookup in a large hub.</li>
</ul>
This may be a long shot, but it has the advantage that it is guaranteed to work. As should be amply clear from the discussions in this article, hash-keys always come with a chance of collision, and it doesn't matter how small the risk is if you already know you do not want to accept losing data or giving up integrity. So, what is clear is that if that is the requirement, you cannot use hashes as keys. Period.
<br/>
<br/>
But that does not mean we cannot benefit from hashes.
<br/>
<br/>
Especially if the hash-key is small in comparison with the business key, we could build up our keys as a composite of the hash-code, followed by the field or fields that make up the business key. If our database uses B-tree indexing, then in the vast majority of cases joins will be resolved based only on the hash-key column, which should be the first column in the key definition. You would still keep using the fields of the business key in your join conditions, to ensure that, in case of a hash-collision, the query will still return the correct result. Since collisions are so rare, the database will end up with very few rows after it has resolved the join over the first field in the key, so the overhead of such a large key should be minimal.
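To illustrate why the composite key stays correct under collisions, here is a Python sketch of such a join (hypothetical row dicts standing in for hub and satellite rows): the hash does practically all the matching work, and the business-key comparison only disambiguates the rare collision:

```python
def composite_key_join(left_rows: list, right_rows: list) -> list:
    """Join on the composite key (hash_key, business_key). An index on the
    hash does virtually all of the matching work; the business-key comparison
    only matters in the rare collision case, where it keeps the result correct."""
    by_hash = {}
    for r in right_rows:
        by_hash.setdefault(r["hash_key"], []).append(r)
    joined = []
    for l in left_rows:
        for r in by_hash.get(l["hash_key"], []):        # cheap probe on the hash
            if r["business_key"] == l["business_key"]:  # rare disambiguation
                joined.append((l, r))
    return joined
```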
<br/>
<br/>
Of course, this solution does not help in cutting down the amount of data you need to store - that will be much more in this scenario, since you keep dragging the business keys literally everywhere: every satellite and every link table that points to this hub will inherit the column for the hash, as well as all columns that make up the business key. But the extra data should mostly add to the storage requirements, and not so much to the cost of join processing.
<br/>
<br/>
The benefit of the relatively fast join breaks down, though, when the database uses hash-joins. In that case, it will not be able to use the first column of the key as a prefix.
<h2>Finally</h2>
I hope you enjoyed this article. I am super curious to hear from DV practitioners if they have scenarios we can learn from. Drop a line in the comments! I'm looking forward to it.
rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com14tag:blogger.com,1999:blog-15319370.post-68901414153119408732017-03-30T00:52:00.002+02:002017-03-30T00:57:41.969+02:00Announcing Shivers - Visualizing SAP/HANA Information View DependenciesDear SAP HANA DBa's, developers, architects etc, we at <a href="http://www.just-bi.nl/" target="_just">Just BI</a> released <a href="https://github.com/just-bi/shivers">Shivers</a>.
<h4>What is Shivers?</h4>
Shivers stands for <b>S</b>AP/<b>H</b>ANA <b>I</b>nformation <b>V</b>iew<b>ers</b>. It is a tool to analyze and visualize dependencies between SAP/HANA information views (that is, Analytic Views, Attribute Views, and Calculation Views) and catalog objects (such as tables and views).
Below is a screenshot of Shivers so you can get an impression of what it looks like:
<br/>
<br/>
<a href="https://github.com/just-bi/shivers" target="_github"><img src="https://raw.githubusercontent.com/just-bi/shivers/master/doc/shivers-app-demo.png"/></a>
<h4>Why Shivers?</h4>
In our work as SAP- and Business Intelligence consultants, SAP/HANA information views are a key tool in delivering Business Intelligence and Custom App Development Solutions to our customers.
<br/>
<br/>
In some cases, information views can get quite complex. Visualization and documentation features offered by development tools, like HANA Studio, do not always provide the kind of insight we need to create or maintain our solutions.
<br/>
<br/>
One very particular and concrete case is creating technical documentation, or preparing presentation materials for knowledge transfer sessions to handover to the customer support organization. In the past, some of our consultants would manually piece together overview slides of the data objects (information views, tables) that make up our solutions. I decided to try to whip up a solution that would make tasks like this a little simpler.
<h4>How do I install and run Shivers?</h4>
The simplest option is to <a href="https://github.com/just-bi/shivers/archive/master.zip" target="_github">download a shivers.zip archive</a> from <a href="https://github.com/just-bi/shivers" target="_github">github</a>. You can then unzip the archive, and open the index.html file inside in a modern web browser (tested with Chrome and IE11, but it should work in all modern browsers).
<br/>
<br/>
Note that you do not need to install Shivers on a web server or HANA XS server. It runs directly from your local disk in your browser. Of course, you can install Shivers on a web server or HANA XS server if you like, and then you can open Shivers by navigating to the url corresponding to the place where you installed it. But this won't affect how Shivers works.
<h4>How do I load information views into Shivers so I can visualize them?</h4>
On the Shivers toolbar in the top of the application, you'll find a file chooser button. When you click that, a file browser will open. Use it to browse to whatever location on your local disk where you keep your information view source files.
<br/>
<br/>
For example, I have my HANA Studio workspace stored in C:\Users\rbouman\hana_work, so I browse to that location and then find the appropriate path in the subdirectories to the HANA System and package I'm interested in. When I find a directory containing information view source files, I select them and confirm the file chooser.
<br/>
<br/>
Shivers will then prompt for a package name. It needs to do this to know how other views could refer to this set of views. Since the package name is itself not stored in the information view source files, the user has to supply that information. Later, when analyzing dependencies, Shivers will use the entered package name and match it with package names it encounters inside the information view source files, whenever an information view refers to other information views.
<br/>
<br/>
After these steps, Shivers will populate the treeview on the left-hand side of the screen with the package structure and entries for the information views you loaded.
<br/>
<br/>
Note that Shivers has a log tab. The log tab will report about the loading process, and it will also report whenever it loads an information view that depends on other information views that are currently not loaded into Shivers. If you want to make complete and exhaustive graphs, you should then also load whichever file is reported in the log tab as an omitted dependency.
<br/>
<br/>
When the loaded files are visible in the treeview, you can click on any of the treenodes representing an information view. When you select one, a tab will open inside the Shivers window, showing a dependency graph of the selected information view.
<br/>
<br/>
Shivers currently does not do any clever layout of your dependency graphs. But you can manually drag and place items inside the graph to make it look good.
<br/>
<br/>
When you are done editing your graph, you can right click it, and choose "Export Image" to export the visualization to a png file, which you can then use in your technical documentation or knowledge transfer presentation slide deck.
<h4>What about HADES?</h4>
In my <a href="http://rpbouman.blogspot.nl/2016/10/sap-hana-on-which-base-columns-do-my.html" target="_me">previous blogpost</a> (<a target="_me" href="http://rpbouman.blogspot.nl/2016/10/sap-hana-on-which-base-columns-do-my.html">http://rpbouman.blogspot.nl/2016/10/sap-hana-on-which-base-columns-do-my.html</a>) I wrote about <a href="https://github.com/just-bi/hades" target="_github">HADES</a>, which is a bunch of (open source) utilities to report on information view metadata.
<br/>
<br/>
In a way, HADES and SHIVERS complement each other.
<br/>
<br/>
HADES is a set of server-side tools (so far, only stored routines) that help with the analysis of information views. HADES is data-oriented: it requires access to a HANA SQL client and allows you to extract just about any information you'll ever want to know about your information views. But HADES is in a way also very low-level, and using it effectively requires SQL skills.
<br/>
<br/>
Shivers is a graphically oriented client-side tool. It is implemented as a web page, that you start in your browser, directly from your file system. Shivers does not require any connection to a HANA Server. Rather, you use checked out information view source files (.analyticview, .attributeview and .calculationview files), load them into shivers, and then the tool does static code and dependency analysis. So far, possibilities to extend or influence reporting of Shivers are very limited. Shivers draws dependency graphs of your information views, and for now - that is it.
<h4>Why is Shivers an offline client-side tool?</h4>
Implementing Shivers as a non-server, client-side tool was a very deliberate decision. On our customers' systems, we cannot assume that we are allowed to install our tools on the HANA Server for just any purpose. So we really must have a solution that works regardless. What we can assume is that we have access to the information view source files, since we mostly work with HANA Studio and check out information view source files from the HANA repository all the time.
<h4>What are the terms and conditions for using Shivers?</h4>
Shivers is open source software under the <a href="https://github.com/just-bi/shivers/blob/master/LICENSE">Apache 2.0 License</a>.
This should give you all the freedom to copy, use, modify and distribute Shivers, free of charge, provided you respect the Apache License.
<br/>
<br/>
<h4>How can I get support for Shivers?</h4>
We at <a href="http://www.just-bi.nl">Just BI</a> are always happy to help if you run into any issue, but Shivers is not currently a for-profit solution. If necessary we can always negotiate professional support should you require it.
<h3>We Want your Feedback!</h3>
Please check out Shivers and let us know what you think. You can either drop me a line on this blog, or post issues in <a href="https://github.com/just-bi/shivers/issues" target="github">the github issue tracker</a>.
<br/>
<br/>
Don't be shy - bugs, new feature proposals, critique - everything is welcome. Just let us know :)
<br/>
<br/>
You can also <a target="_github" href="https://github.com/just-bi/shivers/issues#fork-destination-box">fork the shivers project</a>, and contribute your work back via a pull request.
rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com1tag:blogger.com,1999:blog-15319370.post-14310632421158076622016-10-24T22:30:00.000+02:002016-10-25T22:03:05.284+02:00SAP HANA: On which base columns do my information views depend?<p>
For one of <a href="http://just-bi.nl/">Just-BI</a>'s customers, we're currently working to productize a custom-built proof-of-concept SAP/HANA application.
</p>
<p>
This particular application is quite typical with regard to how we use SAP/HANA features and techniques: it is a web-application for desktop and tablet devices, that we serve through <a href="https://eaexplorer.hana.ondemand.com/_item.html?id=11384#!/overview">SAP/HANA's XS engine</a>. For database communication, we mostly use <a href="http://www.odata.org/documentation/odata-version-2-0/overview/">OData</a> (using <a href="https://help.sap.com/saphelp_hanaplatform/helpdata/en/7c/c43e570b5648d69231fbd7a9c7bf90/content.htm?frameset=/en/b8/0f8b626b3d44f882e8f2c3ff45952d/frameset.htm&current_toc=/en/6e/284b62132c41caa173bf590e9be084/plain.htm&node_id=128&show_children=false">XS OData Services</a>), as well as the odd <a href="https://help.sap.com/saphelp_hanaplatform/helpdata/en/90/878018cccd40f7a4b6754c04e2d34a/content.htm?frameset=/en/a2/acd502e9544de298c3959f250127f5/frameset.htm&current_toc=/en/6e/284b62132c41caa173bf590e9be084/plain.htm&node_id=176&show_children=false">xsjs</a> request (in this case, to offer MS Excel export using <a href="http://rpbouman.blogspot.nl/2016/05/odxl-generic-data-export-layer-for.html">ODXL</a>)
</p>
<p>
The OData services that our application uses are mostly backed by <a href="http://help.sap.com/saphelp_hanaplatform/helpdata/en/2b/914d0dec5a4e928a98e6b69f5347ec/content.htm?frameset=/en/dc/a9644841514bdea35d2825eff02c58/frameset.htm&current_toc=/en/51/778a327cf443f3941013f30d8cc003/plain.htm&node_id=7&show_children=false">SAP/HANA Calculation views</a>. These, in turn, are built on top of a mixed bag of objects:</p><ul>
<li>Some of these are custom base tables that belong to just our application;</li>
<li>Some are base tables that collect output from an advanced analytics recommendations algorithm that runs in an external R server</li>
<li>Some are information views (analytic views, attribute views and calculation views) that form a virtual datamart (of sorts) on top of base tables replicated from various SAP ERP source systems to our SAP/HANA database.</li>
</ul>
<p>
One of the prerequisites to productize the current solution is a re-design of the backend. Redesign is required because the new target system will be fed from even more ERP source systems than our proof-of-concept environment, and the new backend will need to align the data from all these different ERP implementations. In addition, the R algorithm will be optimized as well: in the proof-of-concept environment, the advanced analytics algorithm passes through a number of fields for convenience that will need to be acquired from elsewhere in the production environment.
</p>
<p>
To facilitate the redesign we need to have accurate insight into which base columns are ultimately used to implement our application's data services.
As it turns out, this is not so easily obtainable using standard tools. So, we developed something ourselves. We think this may be useful for others as well, which is why we'd like to share it with you through this blog.
</p>
<h3>Information View Dependencies</h3>
<p>
The standard toolset offers some support to obtain dependencies for information views (analytic views, attribute views and calculation views):
</p>
<ul>
<li>If you're a HANA Studio user, you might be able to use the "Where-used-list" and/or "Column lineage" features. Check out <a href="https://www.linkedin.com/pulse/hana-modeler-impact-analysis-refactoring-calculation-views-krishna?articleId=7946314775951594313">Krishnamoh Krishna's wonderful blog</a> about this topic.</li>
<li>You can query the <code><a href="https://help.sap.com/saphelp_hanaplatform/helpdata/en/20/cbd12e7519101489c7cfcd0f32868d/content.htm">OBJECT_DEPENDENCIES</a></code> system view</li>
</ul>
<p>
As it turns out, these standard tools do not give us the detailed information that we need.
The HANA studio features are mainly useful when designing and modifying information views, but do not let us obtain an overview of all dependencies, and not in a way that we can easily use outside of HANA Studio.
The usefulness of querying the <code>OBJECT_DEPENDENCIES</code> system view is limited by the fact that it only reports objects - that is, base tables or information views - but not the columns contained therein.
</p>
<p>
It looks like <a href="https://archive.sap.com/discussions/thread/3621015">we're not the only ones struggling with this issue</a>.
</p>
<h3>Getting the information view's definition as XML from _SYS_REPO.ACTIVE_OBJECT</h3>
<p>
To get the kind of information we need, we're just going to have to crack open the definition of the information view and look what's inside.
As it turns out, HANA stores this as XML in the <code>CDATA</code> column of the <code>_SYS_REPO.ACTIVE_OBJECT</code> system table, and we can query it by package name, object name and object suffix (which is basically the extension of the file containing the definition that is stored in the repository):
</p>
<pre>
SELECT <b>CDATA</b>
FROM <b>_SYS_REPO.ACTIVE_OBJECT</b>
WHERE PACKAGE_ID = <span style="color:red">'my.package.name'</span>
AND OBJECT_NAME = <span style="color:red">'CA_MY_CALCULATION_VIEW'</span>
AND OBJECT_SUFFIX = <span style="color:red">'calculationview'</span>
</pre>
<p>
With some effort, <code>_SYS_REPO.ACTIVE_OBJECT</code> can be joined to <code>OBJECT_DEPENDENCIES</code> to discover the objects on which the information view depends:
</p>
<pre>
SELECT od.BASE_SCHEMA_NAME
, od.BASE_OBJECT_NAME
, od.BASE_OBJECT_TYPE
FROM _SYS_REPO.ACTIVE_OBJECT ao
<b>INNER JOIN OBJECT_DEPENDENCIES od
ON <span style="color:red">'_SYS_BIC'</span> = od.DEPENDENT_SCHEMA_NAME
AND ao.PACKAGE_ID||<span style="color:red">'/'</span>||ao.OBJECT_NAME = od.DEPENDENT_OBJECT_NAME
AND <span style="color:red">'VIEW'</span> = od.DEPENDENT_OBJECT_TYPE</b>
WHERE ao.PACKAGE_ID = <span style="color:red">'my.package.name'</span>
AND ao.OBJECT_NAME = <span style="color:red">'CA_MY_CALCULATION_VIEW'</span>
AND ao.OBJECT_SUFFIX = <span style="color:red">'calculationview'</span>
</pre>
<p>
(Note: <code>OBJECT_DEPENDENCIES</code> reports all dependencies, not just direct dependencies)
</p>
<p>
Or we can query the other way around, and find the corresponding model for a dependency we found in <code>OBJECT_DEPENDENCIES</code>:
</p>
<pre>
SELECT ao.PACKAGE_ID
, ao.OBJECT_NAME
, ao.OBJECT_SUFFIX
, ao.CDATA
FROM object_dependencies od
<b>INNER JOIN _SYS_REPO.ACTIVE_OBJECT ao
ON SUBSTR_BEFORE(od.base_object_name, <span style="color:red">'/'</span>) = ao.package_id
AND SUBSTR_AFTER(od.base_object_name, <span style="color:red">'/'</span>) = ao.object_name</b>
AND ao.object_suffix in (
<span style="color:red">'analyticview'</span>
, <span style="color:red">'attributeview'</span>
, <span style="color:red">'calculationview'</span>
)
WHERE od.DEPENDENT_SCHEMA_NAME = <span style="color:red">'_SYS_BIC'</span>
AND od.DEPENDENT_OBJECT_NAME = <span style="color:red">'my.package.name/CA_MY_CALCULATION_VIEW'</span>
AND od.DEPENDENT_OBJECT_TYPE = <span style="color:red">'VIEW'</span>
</pre>
<p>
<b>NOTE:</b> It turns out that querying <code>OBJECT_DEPENDENCIES</code> fails at reporting dependencies between analytic views and the attribute views they use.
To capture those dependencies, you need to query <code>_SYS_REPO.ACTIVE_OBJECTCROSSREF</code>.
</p>
<h3>Parsing the information view's XML definition with stored procedure <code>p_parse_xml</code></h3>
<p>
Once we obtained the XML that defines the information view, we still need to pull it apart so we can figure out how it is tied to our base table columns.
To do that, we first apply a general XML parser that turns the XML text into a (temporary) table or table variable, such that each row represents a distinct, atomic element inside the XML document.
For this purpose I developed a HANA stored procedure called <code><a href="https://github.com/just-bi/hades/blob/master/procedures/p_parse_xml.sql">p_parse_xml</a></code>. Here is its signature:
</p>
<pre>
create PROCEDURE p_parse_xml (
<span style="color:grey">-- XML string to parse</span>
p_xml nclob
<span style="color:grey">-- Parse tree is returned as a table variable</span>
, out p_dom table (
<span style="color:grey">-- unique id of the node</span>
node_id int
<span style="color:grey">-- id of the parent node</span>
, parent_node_id int
<span style="color:grey">-- dom node type constant: 1=element, 2=attribute, 3=text, 4=cdata,</span>
<span style="color:grey">-- 5=entityref, 6=entity, 7=processing instruction, </span>
<span style="color:grey">-- 8=comment, 9=document, 10=document type, </span>
<span style="color:grey">-- 11=document fragment, 12=notation</span>
, node_type tinyint
<span style="color:grey">-- dom node name: tagname for element, attribute name for attribute,</span>
<span style="color:grey">-- target for processing instruction,</span>
<span style="color:grey">-- document type name for document type,</span>
<span style="color:grey">-- "#text" for text and cdata, "#comment" for comment,</span>
<span style="color:grey">-- "#document" for document, "#document-fragment" for document fragment.</span>
, node_name nvarchar(64)
<span style="color:grey">-- dom node value: text for text, comment, and cdata nodes, data for processing instruction node, null otherwise.</span>
, node_value nclob
<span style="color:grey">-- raw token from the parser</span>
, token_text nclob
<span style="color:grey">-- character position of token</span>
, pos int
<span style="color:grey">-- length of token.</span>
, len int
)
<span style="color:grey">-- flag whether to strip text nodes that only contain whitespace from the parse tree</span>
, p_strip_empty_text tinyint default 1
)
</pre>
<p>
Note that you can <a href="https://github.com/just-bi/hades/blob/master/procedures/p_parse_xml.sql">download the source code for the entire procedure from github</a>.
The <code>p_parse_xml</code> procedure depends on <a href="https://github.com/just-bi/hades/blob/master/procedures/p_decode_xml_entities.sql"><code>p_decode_xml_entities</code></a>,
so if you want to run it yourself, be sure to install that first.
</p>
<p>
To see how you can use this, consider the following, simple example:
</p>
<pre>
call p_parse_xml(
'<parent-element attribute1="value1">
<child-element attribute2="value2" attribute3="value3">
text-content1
</child-element>
<child-element att="bla">
text-content2
</child-element>
</parent-element>', ?);
</pre>
<p>This gives us the following result:</p>
<pre>
+---------+----------------+-----------+----------------+-----------------------------+-------------------------------------------------------------+-----+-----+
| NODE_ID | PARENT_NODE_ID | NODE_TYPE | NODE_NAME | NODE_VALUE | TOKEN_TEXT | POS | LEN |
+---------+----------------+-----------+----------------+-----------------------------+-------------------------------------------------------------+-----+-----+
| 0 | ? | 9 | #document | ? | ? | 1 | 221 |
| 1 | 0 | 1 | parent-element | ? | <parent-element attribute1=\"value1\"> | 1 | 36 |
| 2 | 1 | 2 | attribute1 | value1 | attribute1=\"value1\" | 2 | 20 |
| 3 | 1 | 1 | child-element | ? | <child-element attribute2=\"value2\" attribute3=\"value3\"> | 41 | 55 |
| 4 | 3 | 2 | attribute2 | value2 | attribute2=\"value2\" | 42 | 20 |
| 5 | 3 | 2 | attribute3 | value3 | attribute3=\"value3\" | 62 | 20 |
| 6 | 3 | 3 | #text | text-content1 | text-content1 | 96 | 23 |
| 7 | 1 | 1 | child-element | ? | <child-element att=\"bla\"> | 139 | 25 |
| 8 | 7 | 2 | att | bla | att=\"bla\" | 140 | 10 |
| 9 | 7 | 3 | #text | text-content2 | text-content2 | 164 | 23 |
+---------+----------------+-----------+----------------+-----------------------------+-------------------------------------------------------------+-----+-----+
</pre>
<p>
The result is a tabular representation of the XML parse tree. Each row essentially represents a <a href="https://www.w3.org/TR/REC-DOM-Level-1/level-one-core.html#ID-1950641247">DOM Node</a>, and the column values represent the node's properties:
</p>
<ul>
<li>The <code>NODE_TYPE</code> column tells us what kind of node we're dealing with. Values in this column conform to the w3c standard document object model (DOM) enumeration of node type values.
The most important ones are 1 for element nodes ("tags"); 2 for attributes, and 3 for text. The entire parse tree is contained in a document node, which has node type 9.
</li>
<li>The <code>NODE_ID</code> is the unique identifier of the node while <code>PARENT_NODE_ID</code> points to whatever node is considered the parent node of the current node.
The parent node is basically the container of the node.
As you can see, the element with <code>NODE_ID=3</code> has the element node with <code>NODE_ID=1</code> as parent.
These correspond to the first <code><child-element></code> and <code><parent-element></code> elements in the document.
Attribute nodes are also marked as children of the element to which they belong. The DOM standard does not consider attributes children of their respective element node, but <code>p_parse_xml</code> does, mainly to keep the result table as simple as possible.
</li>
<li>The <code>NODE_NAME</code> column is a further characterization of what kind of node we're dealing with. For most node types, the node name is a constant value which is essentially a friendly name for the node type. For example, document nodes (<code>NODE_TYPE=9</code>) always have <code>#document</code> as <code>NODE_NAME</code>, and text nodes (<code>NODE_TYPE=3</code>) always have <code>#text</code> as <code>NODE_NAME</code>.
For element nodes and attribute nodes (<code>NODE_TYPE</code> is <code>1</code> and <code>2</code> respectively), the <code>NODE_NAME</code> is not constant. Rather, their node name conveys information about the meaning of the node and its contents. In other words, element and attribute names are metadata.
</li>
<li>The <code>NODE_VALUE</code> column contains actual data. For element and document nodes, it is always NULL. For attributes, the <code>NODE_VALUE</code> column contains the attribute value, and for text nodes, it is the text content.</li>
<li>The <code>POS</code> column lists the position where the current element was found; the <code>LEN</code> column keeps track of the length of the current item as it appears in the document. Typically you won't need these columns, except maybe for debugging purposes. The <code>TOKEN_TEXT</code> column is also here mostly for debugging purposes.</li>
</ul>
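<p>To make the row layout concrete, here is a small plain-JavaScript sketch of such a parse tree (the rows and node ids are invented for illustration; the property names mirror the <code>p_parse_xml</code> output columns, and the <code>childrenOf()</code> helper does exactly what the SQL self-joins in this post do):</p>

```javascript
// Hypothetical rows as a parser like p_parse_xml might produce for:
// <parent-element attr1="value1"><child-element>text-content</child-element></parent-element>
// NODE_TYPE: 1 = element, 2 = attribute, 3 = text, 9 = document.
const rows = [
  {nodeId: 0, parentNodeId: null, nodeType: 9, nodeName: "#document",      nodeValue: null},
  {nodeId: 1, parentNodeId: 0,    nodeType: 1, nodeName: "parent-element", nodeValue: null},
  {nodeId: 2, parentNodeId: 1,    nodeType: 2, nodeName: "attr1",          nodeValue: "value1"},
  {nodeId: 3, parentNodeId: 1,    nodeType: 1, nodeName: "child-element",  nodeValue: null},
  {nodeId: 4, parentNodeId: 3,    nodeType: 3, nodeName: "#text",          nodeValue: "text-content"}
];

// Finding a node's children is a simple scan on parentNodeId.
// Note that attributes show up as children too, per the p_parse_xml convention.
function childrenOf(rows, nodeId) {
  return rows.filter(row => row.parentNodeId === nodeId);
}
```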
<h3>Extracting Base Columns from Analytic and Attribute views</h3>
<p>
If you examine the XML definition of Analytic and/or Attribute views, you'll notice that table base columns are referenced by <code><keyMapping></code> and <code><measureMapping></code>-elements like this:</p>
<pre>
<keyMapping schemaName="...database schema..." columnObjectName="...table name..." columnName="...column name..."/>
</pre>
<p>
So, assuming we already parsed the model of an analytic or attribute view using <code>p_parse_xml</code> and captured its result in a table variable called <code>tab_dom</code>, we can run a query like this to obtain all <code><keyMapping></code> and <code><measureMapping></code>-elements:
</p>
<pre>
select mapping.*
from :tab_dom mapping
<b>where mapping.node_type = <span style="color:red">1</span> <span style="color: gray">-- get us all elements</span>
and mapping.node_name in (<span style="color:red">'keyMapping'</span> <span style="color: gray">-- with tagnames 'keyMappping' or 'measureMappping'</span>
,<span style="color:red">'measureMapping'</span>)</b>
</pre>
<p>While this gives us the elements themselves, the data we're actually interested in is buried in the attributes of the <code><keyMapping></code> and <code><measureMapping></code>-elements. You might recall that in the <code>p_parse_xml</code> result, attribute nodes have <code>NODE_TYPE=2</code> and appear as child nodes of their respective elements. So, we can extract all attributes of all <code><keyMapping></code> and <code><measureMapping></code>-elements with a self-join like this:</p>
<pre>
select mapping_attributes.*
from :tab_dom mapping
<b>inner join :tab_dom mapping_attributes
on mapping.node_id = mapping_attributes.parent_node_id <span style="color: gray">-- find all nodes that have the keymapping element node as parent </span>
and <span style="color:red">2</span> = mapping_attributes.node_type <span style="color: gray">-- but only if their node type indicates they are attribute nodes</span></b>
where mapping.node_type = <span style="color:red">1</span>
and mapping.node_name in (<span style="color:red">'keyMapping'</span>, <span style="color:red">'measureMapping'</span>)
</pre>
<p>
Since we are interested in not just any attribute node, but attribute nodes having specific names like <code>schemaName</code>, <code>columnObjectName</code> and <code>columnName</code>, we should put a further restriction on the <code>NODE_NAME</code> of these attribute nodes. Also note that this query will potentially give us multiple rows per <code><keyMapping></code> or <code><measureMapping></code>-element (in fact, just as many as there are attributes). Since we'd like to have just one row for each <code><keyMapping></code> or <code><measureMapping></code>-element having the values of its <code>schemaName</code>, <code>columnObjectName</code> and <code>columnName</code> attributes in separate columns, we should rewrite this query so that each attribute gets its own self-join.
</p>
<p>Thus, the final query becomes:</p>
<pre>
<b>select mapping_schemaName.node_value as schema_name
, mapping_columnObjectName.node_value as table_name
, mapping_columnName.node_value as column_name</b>
from :tab_dom mapping
inner join :tab_dom <b>mapping_schemaName</b> <span style="color: gray">-- get the attribute called 'schemaName'</span>
on mapping.node_id = mapping_schemaName.parent_node_id
and <span style="color:red">2</span> = mapping_schemaName.node_type
<b>and <span style="color:red">'schemaName'</span> = mapping_schemaName.node_name </b>
inner join :tab_dom <b>mapping_columnObjectName</b> <span style="color: gray">-- get the attribute called 'columnObjectName'</span>
on mapping.node_id = mapping_columnObjectName.parent_node_id
and <span style="color:red">2</span> = mapping_columnObjectName.node_type
<b>and <span style="color:red">'columnObjectName'</span> = mapping_columnObjectName.node_name </b>
inner join :tab_dom <b>mapping_columnName</b> <span style="color: gray">-- get the attribute called 'columnName'</span>
on mapping.node_id = mapping_columnName.parent_node_id
and <span style="color:red">2</span> = mapping_columnName.node_type
<b>and <span style="color:red">'columnName'</span> = mapping_columnName.node_name </b>
where mapping.node_type = <span style="color:red">1</span>
and mapping.node_name in (<span style="color:red">'keyMapping'</span>, <span style="color:red">'measureMapping'</span>)
</pre>
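<p>To see why one self-join per attribute yields exactly one row per element, here is the same pivot mimicked in plain JavaScript (the rows, ids and helper names are invented for the illustration; this shows the join logic, not HANA code):</p>

```javascript
// Invented rows standing in for a parsed <keyMapping> element and its attributes.
const dom = [
  {nodeId: 10, parentNodeId: 9,  nodeType: 1, nodeName: "keyMapping",       nodeValue: null},
  {nodeId: 11, parentNodeId: 10, nodeType: 2, nodeName: "schemaName",       nodeValue: "MY_SCHEMA"},
  {nodeId: 12, parentNodeId: 10, nodeType: 2, nodeName: "columnObjectName", nodeValue: "MY_TABLE"},
  {nodeId: 13, parentNodeId: 10, nodeType: 2, nodeName: "columnName",       nodeValue: "MY_COLUMN"}
];

// One lookup per attribute name: the counterpart of one SQL self-join per attribute.
function attr(dom, elementId, name) {
  const node = dom.find(n =>
    n.parentNodeId === elementId && n.nodeType === 2 && n.nodeName === name
  );
  return node ? node.nodeValue : null;
}

// One output row per mapping element, with each attribute in its own column.
const baseColumns = dom
  .filter(n => n.nodeType === 1 && ["keyMapping", "measureMapping"].includes(n.nodeName))
  .map(n => ({
    schema_name: attr(dom, n.nodeId, "schemaName"),
    table_name:  attr(dom, n.nodeId, "columnObjectName"),
    column_name: attr(dom, n.nodeId, "columnName")
  }));
```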
<h3>Extracting base columns from Calculation views</h3>
<p>
Getting the base columns used in calculation views is a bit more work.
However, the good news is that, in terms of the queries we need to write, it does not get much more complicated than what we saw for analytic and attribute views in the previous section.
Querying the xml parse tree almost always boils down to finding elements and finding their attributes, and then doing something with their values.
</p>
<p>
The reason it is more work to write queries against the model underlying calculation views is that the XML documents defining calculation views use an extra level of mapping between the objects that represent the source of the columns and the way these columns are used inside the view.
The following snippet might illustrate this:
</p>
<pre>
<Calculation:scenario ...>
...
<dataSources>
...
<DataSource id=<span style="color:red">"...some id used to refer to this datasource..."</span> type=<span style="color:red">"DATA_BASE_TABLE"</span>>
...
<columnObject schemaName=<span style="color:red">"...db schema name..."</span> columnObjectName=<span style="color:red">"...table name..."</span>/>
...
</DataSource>
...
</dataSources>
...
<calculationViews>
...
<calculationView>
...
<input node=<span style="color:red">"#...id of a DataSource element..."</span>>
...
<mapping source=<span style="color:red">"...name of a column used as input..."</span> ... >
...
</input>
...
</calculationView>
...
</calculationViews>
...
</Calculation:scenario>
</pre>
<p>
The method for finding the base columns can be summarized as follows:
</p>
<ol>
<li>
<p>
Get all <code><DataSource></code>-elements having a <code>type</code>-attribute with the value <code>"DATA_BASE_TABLE"</code>.
These elements represent all base tables used by this view. Other types of objects used by this view will have another value for the <code>type</code>-attribute.
</p>
<p>
To obtain the schema and table name of the base table, find the <code><columnObject></code>-child element of the <code><DataSource></code>-element.
Its <code>schemaName</code> and <code>columnObjectName</code>-attributes respectively contain the database schema and table name of the base table.
</p>
<p>
The <code><DataSource></code>-elements have an <code>id</code>-attribute, and its value is used as a unique identifier to refer to this data source.
</p>
</li>
<li>
<p>
Find all instances where the base table datasources are used.
</p>
<p>
A calculation view is essentially a graph of data transformation steps, each of which takes one or more streams of data as input and turns them into a stream of output data.
In the XML document that defines the calculation view, these transformation steps are represented by <code><calculationView></code>-elements.
These <code><calculationView></code>-elements contain one or more <code><input></code>-child elements, each of which represents a data stream that is used as input for the transformation step.
</p>
<p>
The <code><input></code>-elements have a <code>node</code>-attribute.
The value of the <code>node</code>-attribute is the value of the <code>id</code>-attribute of whatever element it refers to, prefixed by a hash-sign (<code>#</code>).
</p>
<p>
Note that this is a general technique to reference elements within the same XML document.
So, in order to find where a <code><DataSource></code>-element is used,
it is enough to find all elements in the same XML document whose <code>node</code>-attribute references the value of the <code><DataSource></code>-element's <code>id</code>-attribute.
</p>
</li>
<li>
<p>
Once we have the elements that refer to our <code><DataSource></code>-element, we can find out which columns from the data source are used by looking for <code><mapping></code>-child elements.
</p>
<p>
The <code><mapping></code>-elements have a <code>source</code>-attribute, which holds the column-name.
</p>
</li>
</ol>
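<p>The three steps above can be sketched in plain JavaScript (the rows, ids and helper names are invented for the illustration; the actual procedure expresses the same logic as SQL self-joins on the parse-tree table):</p>

```javascript
// Invented parse-tree rows for a calculation view fragment: a DataSource "ds1"
// (a base table) and a calculationView input that references it via node="#ds1".
const dom = [
  {nodeId: 1,  parentNodeId: 0, nodeType: 1, nodeName: "DataSource",       nodeValue: null},
  {nodeId: 2,  parentNodeId: 1, nodeType: 2, nodeName: "id",               nodeValue: "ds1"},
  {nodeId: 3,  parentNodeId: 1, nodeType: 2, nodeName: "type",             nodeValue: "DATA_BASE_TABLE"},
  {nodeId: 4,  parentNodeId: 1, nodeType: 1, nodeName: "columnObject",     nodeValue: null},
  {nodeId: 5,  parentNodeId: 4, nodeType: 2, nodeName: "schemaName",       nodeValue: "MY_SCHEMA"},
  {nodeId: 6,  parentNodeId: 4, nodeType: 2, nodeName: "columnObjectName", nodeValue: "MY_TABLE"},
  {nodeId: 7,  parentNodeId: 0, nodeType: 1, nodeName: "input",            nodeValue: null},
  {nodeId: 8,  parentNodeId: 7, nodeType: 2, nodeName: "node",             nodeValue: "#ds1"},
  {nodeId: 9,  parentNodeId: 7, nodeType: 1, nodeName: "mapping",          nodeValue: null},
  {nodeId: 10, parentNodeId: 9, nodeType: 2, nodeName: "source",           nodeValue: "MY_COLUMN"}
];

const attr = (elId, name) => {
  const n = dom.find(x => x.parentNodeId === elId && x.nodeType === 2 && x.nodeName === name);
  return n ? n.nodeValue : null;
};

const baseColumns = [];
// Step 1: DataSource elements whose type-attribute is DATA_BASE_TABLE.
dom.filter(n => n.nodeType === 1 && n.nodeName === "DataSource")
   .filter(ds => attr(ds.nodeId, "type") === "DATA_BASE_TABLE")
   .forEach(ds => {
     const co  = dom.find(n => n.parentNodeId === ds.nodeId && n.nodeName === "columnObject");
     const ref = "#" + attr(ds.nodeId, "id");
     // Step 2: node-attributes that reference this DataSource's id, prefixed with '#'.
     dom.filter(n => n.nodeType === 2 && n.nodeName === "node" && n.nodeValue === ref)
        .forEach(usage => {
          // Step 3: mapping child elements; their source-attribute is the base column name.
          dom.filter(n => n.parentNodeId === usage.parentNodeId && n.nodeName === "mapping")
             .forEach(m => baseColumns.push({
               schema_name: attr(co.nodeId, "schemaName"),
               table_name:  attr(co.nodeId, "columnObjectName"),
               column_name: attr(m.nodeId, "source")
             }));
        });
   });
```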
<p>
With these steps in mind, the SQL query we need to do on the calculation view parse tree becomes:
</p>
<pre>
select distinct
ds_co_schemaName.node_value schema_name
, ds_co_columnObjectName.node_value table_name
, ds_usage_mapping_source.node_value column_name
<span style="color:grey">--
-- ds: DataSource elements (Note the WHERE clause)
-- </span>
from :tab_dom ds
<span style="color:grey">--
-- ds_type: demand that the value of the type-attribute of the DataSource elements equal 'DATA_BASE_TABLE'
-- this ensures we're only looking at base tables.
--</span>
inner join :tab_dom ds_type
on ds.node_id = ds_type.parent_node_id
and <span style="color:red">2</span> = ds_type.node_type
and <span style="color:red">'type'</span> = ds_type.node_name
and <span style="color:red">'DATA_BASE_TABLE'</span> = cast(ds_type.node_value as varchar(<span style="color:red">128</span>))
<span style="color:grey">--
-- ds_co: get the columnObject childelement of the DataSource element.
-- Also, get the schemaName and columnObjectName attributes of that columnObject-element.
--</span>
inner join :tab_dom ds_co
on ds.node_id = ds_co.parent_node_id
and <span style="color:red">1</span> = ds_co.node_type
and <span style="color:red">'columnObject'</span> = ds_co.node_name
inner join :tab_dom ds_co_schemaName
on ds_co.node_id = ds_co_schemaName.parent_node_id
and <span style="color:red">2</span> = ds_co_schemaName.node_type
and <span style="color:red">'schemaName'</span> = ds_co_schemaName.node_name
inner join :tab_dom ds_co_columnObjectName
on ds_co.node_id = ds_co_columnObjectName.parent_node_id
and <span style="color:red">2</span> = ds_co_columnObjectName.node_type
and <span style="color:red">'columnObjectName'</span> = ds_co_columnObjectName.node_name
<span style="color:grey">--
-- ds_id: get the id-attribute of the DataSource element.
--</span>
inner join :tab_dom ds_id
on ds.node_id = ds_id.parent_node_id
and <span style="color:red">2</span> = ds_id.node_type
and <span style="color:red">'id'</span> = ds_id.node_name
<span style="color:grey">--
-- ds_usage: find any attributes that refer to the id of the DataSource
--</span>
inner join :tab_dom ds_usage
on <span style="color:red">'node'</span> = ds_usage.node_name
and <span style="color:red">2</span> = ds_usage.node_type
and <span style="color:red">'#'</span>||ds_id.node_value = cast(ds_usage.node_value as nvarchar(<span style="color:red">128</span>))
<span style="color:grey">--
-- ds_usage_mapping: find any mapping child elements of the node that references the DataSource
--</span>
inner join :tab_dom ds_usage_mapping
on <span style="color:red">'mapping'</span> = ds_usage_mapping.node_name
and <span style="color:red">1</span> = ds_usage_mapping.node_type
and ds_usage.node_id = ds_usage_mapping.parent_node_id
<span style="color:grey">--
-- ds_usage_mapping_source: get the source-attribute of the mapping elements. These are our base column names.
--</span>
inner join :tab_dom ds_usage_mapping_source
on <span style="color:red">'source'</span> = ds_usage_mapping_source.node_name
and <span style="color:red">2</span> = ds_usage_mapping_source.node_type
and ds_usage_mapping.node_id = ds_usage_mapping_source.parent_node_id
where ds.node_type = <span style="color:red">1</span>
and ds.node_name = <span style="color:red">'DataSource'</span>
</pre>
<h3>Putting it all together</h3>
To recapitulate, we discussed:<ol>
<li>How to do general queries for dependencies using <code>OBJECT_DEPENDENCIES</code>, but that you need to query <code>_SYS_REPO.ACTIVE_OBJECTCROSSREF</code> to find out which Attribute views are used by Analytic views.</li>
<li>How to find the model XML code underlying our information views from the <code>_SYS_REPO.ACTIVE_OBJECT</code> table.</li>
<li>How to parse XML, and how to query the parse tree for elements and attributes.</li>
<li>How the XML documents for information views are structured, and how to find the base columns used in their models.</li>
</ol>
<p>
With all these bits and pieces of information, we can finally create a procedure that fulfills the original requirement to obtain the base columns used by our information views.
This is available as the <code><a href="https://github.com/just-bi/hades/blob/master/procedures/p_get_view_basecols.sql">p_get_view_basecols</a></code> stored procedure.
Here is its signature:
</p>
<pre>
create PROCEDURE p_get_view_basecols (
<span style="color:grey">-- package name pattern. Used to match packages containing analytic, attribute or calculation views. Can contain LIKE wildcards.</span>
p_package_id nvarchar(<span style="color:red">255</span>)
<span style="color:grey">-- object name pattern. Used to match name of analytic, attribute or calculation views. Can contain LIKE wildcards.</span>
, p_object_name nvarchar(<span style="color:red">255</span>) default <span style="color:red">'%'</span>
<span style="color:grey">-- object suffix pattern. Can be used to specify the type of view. Can contain LIKE wildcards.</span>
, p_object_suffix nvarchar(<span style="color:red">255</span>) default <span style="color:red">'%'</span>
<span style="color:grey">-- flag to indicate whether to recursively analyze analytic, attribute or calculation views on which the view to be analyzed depends. </span>
<span style="color:grey">-- 0 means only look at the given view, 1 means also look at underlying views.</span>
, p_recursive tinyint default <span style="color:red">1</span>
<span style="color:grey">-- result table: base columns on which the specified view(s) depends.</span>
, out p_cols table (
<span style="color:grey">-- schema name of the referenced base column</span>
schema_name nvarchar(<span style="color:red">128</span>)
<span style="color:grey">-- table name of the referenced base column</span>
, table_name nvarchar(<span style="color:red">128</span>)
<span style="color:grey">-- column name of the referenced base column</span>
, column_name nvarchar(<span style="color:red">128</span>)
<span style="color:grey">-- list of view names that depend on the base column</span>
, views nclob
)
)
</pre>
<p>
Obtaining the list of base columns on which our application depends is now as simple as calling the procedure, like so:
</p>
<pre>
call p_get_view_basecols(
<span style="color:grey">-- look in our application package (and its subpackages)</span>
<span style="color:red">'our.application.package.%'</span>
<span style="color:grey">-- consider all information views</span>
, <span style="color:red">'%'</span>
<span style="color:grey">-- consider all types of information views</span>
, <span style="color:red">'%'</span>
<span style="color:grey">-- consider also information views upon which our information views depend</span>
, <span style="color:red">1</span>
<span style="color:grey">-- put the results into our output table</span>
, ?
);
</pre>
<h3>Finally</h3>
<p>
I hope you enjoyed this post!
Feel free to leave a comment to share your insights or to give feedback.
</p>
<p>
Please note that all source code for this topic is freely available as open source software in our <a href="https://github.com/just-bi/hades">just-bi/hades github repository</a>.
You are free to use, modify and distribute it, as long as you respect the copyright notice.
</p>
<p>
We welcome contributions! You can contribute in many ways:
</p>
<ul>
<li>Simply use the procedures. Give us feedback. You can do so by leaving a comment on this blog.</li>
<li>Spread the word: tell your colleagues, and maybe tweet or write a blog post about it. Please use hashtag #justbihades.</li>
<li>Share your requirements. <a href="https://github.com/just-bi/hades/issues">Create an issue to ask for more features</a> so we can improve our software.</li>
<li><a href="https://github.com/just-bi/hades#fork-destination-box">Fork it!</a>. Send us <a href="https://github.com/just-bi/hades/pulls">pull requests</a>. We welcome your contribution and we will fully attribute you!</li>
</ul>
rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com1tag:blogger.com,1999:blog-15319370.post-3273712629800534412016-09-22T10:55:00.000+02:002016-09-22T10:56:05.589+02:00SAP UI5: Internationalization for each view - Addendum for Nested Views <p>
After writing my previous post on <a href="http://rpbouman.blogspot.nl/2016/09/sap-ui5-per-view-internationalization.html">SAP UI5: Per-view Internationalization</a>, I found out that the solution does not work completely as intended when using nested views.
</p>
<p>
If you're using nested views, each view would still have its own set of unique texts that are entirely specific to just that view, and for those cases, the solution as described still works. But there might also be a number of texts that are shared by both the outer and one or more of the inner views. It would make sense to be able to define those texts in the i18n model at the level of the outer view, and have the i18n models of the nested view pick up and enhance the i18n model of the outer view.
</p>
<h2>Problem: <code>onInit()</code> is not the right place to initialize the i18n model</h2>
<p>
The problem with the original solution is that the <code>onInit()</code> method of the nested views gets called before that of the outer view. It makes sense - the whole can be initialized only after its parts have been initialized. But this does mean that the <code>onInit()</code> method is not the right place to initialize the i18n model.</p>
<p>Please consider these lines from the <code>_initI18n()</code> method that I proposed to initialize the i18n model:</p>
<pre>
<span style="color:grey">//Use the bundledata to create or enhance the i18n model</span>
var i18nModel = this.getModel(i18n);
if (i18nModel) {
i18nModel.enhance(bundleData);
}
else {
i18nModel = new ResourceModel(bundleData);
}
<span style="color:grey">//set this i18n model.</span>
this.setModel(i18nModel, i18n);
</pre>
<p>
Suppose this code runs as part of a nested view's <code>onInit()</code>. The call to <code>getModel()</code> will try to acquire the i18n model that is already set, or else the i18n model of the owner component. That's how the <code>getModel()</code> method in the base controller works (<a href="http://rpbouman.blogspot.nl/2016/09/sap-ui5-per-view-internationalization.html">please see my previous blog post to review that code</a>).</p>
<p>Now, at this point, no i18n model has been set for the view, and so the owner component's i18n model will be picked up. The i18n model of the outer view will however never be found, since the <code>onInit()</code> of the controller of the outer view has not been called yet (and therefore, its <code>_initI18n()</code> has not been called either).</p>
<h2>Solution: Use <code>onBeforeRendering()</code> rather than <code>onInit()</code></h2>
<p>
It turns out that this can be solved by calling the <code>_initI18n()</code> method in the <code>onBeforeRendering()</code> method rather than in the <code>onInit()</code> method. While nested views are initialized before the outer view, it's the other way around for the rendering process. This makes sense: rendering the outer view requires rendering the views it contains. So the <code>onBeforeRendering()</code> method of the outer view will be called before the <code>onBeforeRendering()</code> method of its nested views. (It's the other way around for <code>onAfterRendering()</code>: an outer view is done rendering only after its nested views are done rendering.)
</p>
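<p>The call order described above can be illustrated with a tiny simulation (plain JavaScript, not UI5 code; the view objects and function names are made up for the illustration): initialization runs bottom-up, rendering runs top-down.</p>

```javascript
// Minimal simulation of the lifecycle ordering: onInit fires for nested views
// first (the whole is initialized after its parts), while onBeforeRendering
// fires for the outer view first (rendering the whole triggers its parts).
const calls = [];

function initView(view) {
  view.nested.forEach(initView);        // parts are initialized first...
  calls.push("onInit:" + view.name);    // ...then the whole
}

function renderView(view) {
  calls.push("onBeforeRendering:" + view.name); // outer view first...
  view.nested.forEach(renderView);              // ...then its nested views
}

const app = {name: "outer", nested: [{name: "inner", nested: []}]};
initView(app);
renderView(app);
```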
<h2>Ensure i18n initialization occurs only once</h2>
<p>
There is one extra consideration in moving the i18n initialization from <code>onInit()</code> to <code>onBeforeRendering()</code>: views may go through multiple rendering cycles, whereas <code>onInit()</code> runs only once. Since we do not want to reinitialize the i18n model on repeated rendering cycles, we add a flag that ensures the i18n model is initialized only once:
</p>
<pre>
...
onInit: function(){
<span style="color:red; text-decoration:line-through">this._initI18n();</span>
},
<b>onBeforeRendering: function(){</b>
<b>this._initI18n();</b>
<b>}</b>,
<b>_i18nInitialized: false,</b>
_initI18n: function(){
<b>if (this._i18nInitialized === true) {</b>
<b>return;</b>
<b>}</b>
var i18n = "i18n";
<span style="color:grey">//create bundle descriptor for this controllers i18n resource data</span>
var metadata = this.getMetadata();
var nameParts = metadata.getName().split(".");
nameParts.pop();
nameParts.push(i18n);
nameParts.push(i18n);
var bundleData = {bundleName: nameParts.join(".")};
<span style="color:grey">//Use the bundledata to create or enhance the i18n model</span>
var i18nModel = this.getModel(i18n);
if (i18nModel) {
i18nModel.enhance(bundleData);
}
else {
i18nModel = new ResourceModel(bundleData);
}
<span style="color:grey">//set this i18n model.</span>
this.setModel(i18nModel, i18n);
<b>this._i18nInitialized = true;</b>
},
...
</pre>
<h2>Overriding <code>onBeforeRendering()</code> in extensions of the base controller</h2>
<p>
And of course, when extending the base controller, you'll need to remember to call the <code>onBeforeRendering()</code> method of the ascendant when overriding the <code>onBeforeRendering()</code> method:
</p>
<pre>
sap.ui.define([
<b>"just/bi/apps/components/basecontroller/BaseController"</b>
], function(<b>Controller</b>){
"use strict";
var controller = Controller.extend("just.bi.apps.components.mainpanel.MainPanel", {
onBeforeRendering: function(){
<b>Controller.prototype.onBeforeRendering.call(this);</b>
...
}
});
return controller;
});
</pre>
<h2>Finally</h2>
I hope you enjoyed this addendum. Feel free to share your insights if you think there is a better way to handle i18n.rpboumanhttp://www.blogger.com/profile/13365137747952711328noreply@blogger.com0tag:blogger.com,1999:blog-15319370.post-65214130462044309382016-09-18T01:03:00.000+02:002016-09-22T11:01:40.783+02:00SAP UI5: Per-view Internationalization<p style="font-weight: bold">NOTE: There is an addendum to this blog post that suggests a number of improvements. You can check out the addendum here: <a href="http://rpbouman.blogspot.nl/2016/09/sap-ui5-internationalization-for-each.html">SAP UI5: Internationalization for each view - Addendum for Nested Views</a>.</p>
<p>Quite recently, I dove into SAP UI5 development. To educate myself, I followed a workshop and I used <a href="https://sapui5.netweaver.ondemand.com/sdk/#docs/guide/3da5f4be63264db99f2e5b04c5e853db.html">the Walkthrough</a>.</p>
<p></p>
<p>During my explorations, I ran into a particular issue which I didn't see very readily addressed. I also found a solution for this particular issue, and even though I still have a ton to learn, I think it is worth sharing. So, here goes:</p>
<h2>The Feature: Translatable texts and the i18n model</h2>
<p>One of the SAP UI5 features highlighted in the Walkthrough is <a href="https://sapui5.netweaver.ondemand.com/sdk/#docs/guide/df86bfbeab0645e5b764ffa488ed57dc.html">treatment of translatable texts</a>. In the walkthrough this is realized by setting up a resource model, the <a href="https://en.wikipedia.org/wiki/Internationalization_and_localization">i18n</a> model.</p>
<p><span style="font-size: 13.3333px;">The i18n model is sourced from i18n <code>.properties</code> files, which are essentially lists of key/value pairs, one per line, each key separated from its value by an equals sign:</span></p>
<pre>
<span style="color:grey"># Each line is a key=value pair.</span>
greetingAction=Say Hello
greeting=Hello {0}!
</pre>
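<p>Conceptually, such a file parses into a simple key/value map. Here is a simplified sketch of that parsing (an illustration only; a real <code>.properties</code> parser also handles escape sequences and line continuations):</p>

```javascript
// Simplified parser for the key=value lines shown above.
// Skips blank lines and #-comments; everything before the first '=' is the key.
function parseProperties(text) {
  const entries = {};
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    const pos = trimmed.indexOf("=");
    entries[trimmed.slice(0, pos)] = trimmed.slice(pos + 1);
  }
  return entries;
}

const bundle = parseProperties(
  "# Each line is a key=value pair.\n" +
  "greetingAction=Say Hello\n" +
  "greeting=Hello {0}!\n"
);
```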
<p style="font-size: 13.3333px;"><span style="font-size: 10pt;">To actually setup a i18n model using these texts, you can explicitly instantiate a <code>sap.ui.model.resource.ResourceModel</code>:</span></p>
<pre>
sap.ui.define([
"sap/ui/core/mvc/Controller",
"sap/ui/model/resource/ResourceModel"
], function(Controller, ResourceModel){
"use strict";
return Controller.extend("my.app.App", {
onInit: function(){
<b>var i18nModel = new ResourceModel({
bundleName: "just.bi.apps.JustBiApp.i18n.i18n"
});
this.getView().setModel(i18nModel, "i18n");</b>
}
});
});
</pre>
<p style="font-size: 13.3333px;"><span style="font-size: 10pt;">Or, you can have your application instantiate the model by listing it in the <code>models</code> property of the <code>sap.ui5</code> entry in the <a href="https://sapui5.hana.ondemand.com/sdk/#docs/guide/8f93bf2b2b13402e9f035128ce8b495f.html"><code>manifest.json</code> application descriptor</a> file:</span></p>
<pre >
"models": {
<b>"i18n": {
"type": "sap.ui.model.resource.ResourceModel",
"settings": {
"bundleName": "just.bi.apps.JustBiApp.i18n.i18n"
}
}</b>
}
</pre>
<p style="font-size: 13.3333px;"><span style="font-size: 10pt;">In many cases, the text is required for the static labels of ui elements like input fields, menus and so on. Inside a view, static texts may be retrieved from the i18n model through special data binding syntax, like so:</span></p>
<pre>
<span style="color:greay"><-- Button text will read "Say Hello" --></span>
<Button text="{<b>i18n>greetingAction</b>}"/>
</pre>
<p>Texts may also be retrieved programmatically inside controller code by calling the <code>.getText()</code> method on the resource bundle object. The resource bundle object is may be obtained from the i18n resource model with the <code>getResourceBundle()</code> getter method:</p>
<p></p>
<pre>var bundle = this.<b>getModel("i18n").getResourceBundle()</b>;
var text = <b>bundle.getText</b>("greeting", ["World"]); <span style="color:grey">// text has value "Hello World!"</span></pre>
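<p>The <code>{0}</code>-style placeholders in the bundle texts are filled from the arguments array passed to <code>getText()</code>. A minimal sketch of that substitution (the real implementation follows Java-MessageFormat-like rules, e.g. for quoting, which this sketch ignores):</p>

```javascript
// Minimal {0}-style placeholder substitution, as used by getText(key, args):
// each {n} in the pattern is replaced by the n-th element of args.
function formatMessage(pattern, args) {
  return pattern.replace(/\{(\d+)\}/g, (match, index) => args[index]);
}

const text = formatMessage("Hello {0}!", ["World"]);
```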
<p>Now, the cool thing is that you can write a separate i18n <code>.properties</code> file for each locale that you want to support. The framework discovers which locale is required by the client and uses that to find the i18n files that best match the client's locale.</p>
<p></p>
<p>The file name is used to identify to which language and/or locale the texts inside the file apply. For example, you'd put the German texts in a <code>i18n_de.properties</code> file, and the English texts in a <code>i18n_en.properties</code> file, and if you want to distinguish between British and American English, you'd create both a <code>i18n_en_GB.properties</code> and <code>i18n_en_US.properties</code> file.</p>
<p>(I haven't found out exactly which standard the SAP UI5 i18n <code>.properties</code> files follow, but from what I've seen so far I think it's safe to assume that you can use the <a href="https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes">two-letter lowercase ISO 639-1 code</a> for the language and the <a href="https://en.wikipedia.org/wiki/ISO_3166-1#Officially_assigned_code_elements">two-letter uppercase ISO 3166-1 code</a> for the country.)</p>
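<p>The file-name convention implies a lookup order per locale, from most to least specific. A sketch of how the candidate file names could be derived (an illustration of the fallback chain, not the framework's actual lookup code):</p>

```javascript
// Candidate .properties files for a locale, most specific first.
// E.g. "en_GB" -> i18n_en_GB, then i18n_en, then the raw i18n fallback.
function candidateBundles(baseName, locale) {
  const candidates = [];
  const parts = locale ? locale.split("_") : [];
  for (let i = parts.length; i > 0; i--) {
    candidates.push(baseName + "_" + parts.slice(0, i).join("_") + ".properties");
  }
  candidates.push(baseName + ".properties"); // locale-independent fallback
  return candidates;
}
```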
<h2><br />The Problem: One i18n Model for entire application</h2>
<p>Now, the walkthrough demonstrates the feature by adding one i18n model for the entire application so that it becomes available in any of the views that make up the application. I appreciate that the walkthrough is not the right place to cover all kinds of more advanced scenarios, so I can understand why it settles for just one application-wide i18n model.</p>
<p></p>
<p><span style="font-size: 13.3333px;">However, I can't help but feeling this is not an ideal approach. Main reason is that it seems at odds with the fact that many texts are specific to just one particular view. This challenges both the development workflow as well as the reusability of our application components:</span></p>
<p><span style="font-size: 13.3333px;"><br /></span></p>
<ul>
<li>Each time you create or modify a particular view, you also have to edit the global i18n <code>.properties</code> files. To keep things manageable, you will probably invent some kind of view-specific prefix to prefix the keys pertaining to that view, and you'll probably end up creating a view-specific block in that i18n file. At some point, you'll end up with a lot of lines per i18n file, which is not so maintainable</li>
<li>Suppose you want to reuse a particular view in another application. Contrary to the practice used in the Walkthrough, I like to keep a view and its associated controller together, in a folder separate from any other view and controller. This way I can easily copy, move or remove the things that belong together. Except that the texts, which also belong to that view/controller, live in the global i18n <code>.properties</code> file and need to be managed separately.</li>
</ul>
<p></p>
<h2>The Solution: Keep per-view<span style="font-size: 13.3333px;"> </span>i18n files near view and controller code</h2>
<p>The solution I found is to create an i18n subfolder beneath the folder that contains my Controller and View. Since I already keep each associated view and controller together, and separate from the other views and controllers, this approach makes sense: it's just one step further in keeping code and resources that depend directly on each other physically together.</p>
<p><br />So, this is what my file and folder structure looks like:</p>
<p><img alt="FolderStructure.png" src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_ea1BJemVhcXRvWWs" /></p>
<p></p>
<p>So, the <code>webapp</code> folder is the root of the sap ui5 project. The <code>components</code> folder is where I keep subfolders for each functional unit (i.e. View+Controller+resources) of the application. In the picture, you see two such subfolders, <code>basecontroller</code> (more about that below) and <code>mainpanel</code>.</p>
<p></p>
<p>The <code>mainpanel</code> folder is the one that contains an actual component of my application - a <code>MainPanel</code> View, and its controller (in <code>MainPanel.view.xml</code> and <code>MainPanel.controller.js</code> respectively). Here we also find the <code>i18n</code> folder specific to this view, and inside are the i18n <code>.properties</code> files (one for each locale we need to support).</p>
<p></p>
<p>In order to load and apply the view-specific i18n <code>.properties</code> files, I'm using generic extension of <code>sap.ui.core.mvc.Controller</code> which loads the "local", view-specific i18n resource bundle. This extension is called <code>BaseController</code> and is in the basecontroller folder. Here's the code:</p>
<p></p>
<pre>
sap.ui.define([
  "sap/ui/core/mvc/Controller",
  "sap/ui/model/resource/ResourceModel"
], function(Controller, ResourceModel){
  "use strict";
  var controller = Controller.extend("just.bi.apps.components.basecontroller.BaseController", {
    onInit: function(){
      this._initI18n();
    },
    _initI18n: function(){
      var i18n = "i18n";
      <span style="color:grey">//create bundle descriptor for this controller's i18n resource data</span>
      var metadata = this.getMetadata();
      var nameParts = metadata.getName().split(".");
      nameParts.pop();
      nameParts.push(i18n);
      nameParts.push(i18n);
      var bundleData = {bundleName: nameParts.join(".")};
      <span style="color:grey">//use the bundle data to create or enhance the i18n model</span>
      var i18nModel = this.getModel(i18n);
      if (i18nModel) {
        i18nModel.enhance(bundleData);
      }
      else {
        i18nModel = new ResourceModel(bundleData);
      }
      <span style="color:grey">//set this i18n model.</span>
      this.setModel(i18nModel, i18n);
    },
    getModel: function(modelname){
      var view = this.getView();
      var model = view.getModel.apply(view, arguments);
      if (!model) {
        var ownerComponent = this.getOwnerComponent();
        if (ownerComponent) {
          model = ownerComponent.getModel(modelname);
        }
      }
      return model;
    },
    setModel: function(model, modelName){
      var view = this.getView();
      view.setModel.apply(view, arguments);
    }
  });
  return controller;
});
</pre>
<p>Note how the <code>BaseController</code> initializes the i18n model by calling the <code>_initI18n()</code> method. In this method we extract the class name of the controller from its metadata (using the <code>.getName()</code> getter on the metadata obtained with the <code>.getMetadata()</code> getter), and we pop off the unqualified class name to obtain its namespace. We then add the string <code>"i18n"</code> twice - once for the folder, and once for the files inside it. The result is the bundle name, which we use to instantiate the actual <code>ResourceModel</code>.
</p>
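<p>Stripped of the UI5 specifics, the name derivation performed by <code>_initI18n()</code> can be sketched in plain JavaScript like this (the class name below is the one from this example app):</p>

```javascript
// Derive the i18n bundle name from a controller's fully qualified class name:
// drop the unqualified class name, then append "i18n" twice - once for the
// subfolder, once for the .properties base name inside it.
function i18nBundleName(controllerClassName){
  var i18n = "i18n";
  var nameParts = controllerClassName.split(".");
  nameParts.pop();
  nameParts.push(i18n);
  nameParts.push(i18n);
  return nameParts.join(".");
}

console.log(i18nBundleName("just.bi.apps.components.mainpanel.MainPanel"));
// just.bi.apps.components.mainpanel.i18n.i18n
```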
<p>
Before setting that model to the controller's view, we check if there is already an i18n model set using <code>getModel()</code>. This is a utility method that gets the model from this controller's associated view, or of the component that "owns" the view and this controller.
</p>
<p>
If an i18n model is already available, we enhance it by calling its <code>.enhance()</code> method, rather than replacing it. This way, any texts defined at a higher level remain available in this controller and view. This gives us a functional i18n model, which we then set using <code>setModel()</code>, which simply calls <code>setModel()</code> on the view associated with this controller.</p>
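<p>The effect of <code>enhance()</code> on text lookup can be illustrated with a plain-JavaScript sketch. The merge below only mimics the lookup behavior - <code>ResourceModel</code> itself works with <code>.properties</code> bundles - and the text keys are made up for the example:</p>

```javascript
// Sketch of what enhancing a resource bundle achieves for text lookup:
// texts from the enhancing (view-local) bundle take precedence, while texts
// only present in the earlier (app-wide) bundle remain available.
function enhance(baseTexts, localTexts){
  var merged = {};
  Object.keys(baseTexts).forEach(function(k){ merged[k] = baseTexts[k]; });
  Object.keys(localTexts).forEach(function(k){ merged[k] = localTexts[k]; });
  return merged;
}

var appTexts  = {appTitle: "My App", ok: "OK"};
var viewTexts = {panelTitle: "Main Panel", ok: "Okay"};
var texts = enhance(appTexts, viewTexts);

console.log(texts.appTitle);   // My App     (inherited from the app-wide bundle)
console.log(texts.panelTitle); // Main Panel
console.log(texts.ok);         // Okay       (view-local text wins)
```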
<p></p>
<p>To actually use this <code>BaseController</code>, we extend it when creating a "real" controller:</p>
<p></p>
<pre>
sap.ui.define([
  <b>"just/bi/apps/components/basecontroller/BaseController"</b>
], function(<b>Controller</b>){
  "use strict";
  var controller = Controller.extend("just.bi.apps.components.mainpanel.MainPanel", {
    onInit: function(){
      <b>Controller.prototype.onInit.call(this);</b>
      ...
    }
  });
  return controller;
});
</pre>
<p>Note that if that real controller has its own <code>onInit()</code> method, we need to first call the <code>onInit()</code> method of <code>BaseController</code>, or rather, of whatever class we're extending. Fortunately, since we are extending it we already have a reference to it (in the snippet above, it's the <code>Controller</code> parameter injected into our definition), so we call its <code>onInit()</code> method via the <code>prototype</code> while using the controller that is currently being defined (this) as scope.</p>
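<p>The underlying pattern here is ordinary JavaScript prototype-based inheritance. A minimal sketch, without any UI5 machinery:</p>

```javascript
// The base-call pattern in plain JavaScript: invoke the parent class's
// method with the subclass instance as `this`.
function Base(){}
Base.prototype.onInit = function(){
  this.initialized = true;
};

function Sub(){}
Sub.prototype = Object.create(Base.prototype);
Sub.prototype.onInit = function(){
  // same idea as Controller.prototype.onInit.call(this) in the UI5 snippet
  Base.prototype.onInit.call(this);
  this.subInitialized = true;
};

var s = new Sub();
s.onInit();
console.log(s.initialized, s.subInitialized); // true true
```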
<h2>Finally</h2>
<p>I hope you enjoyed this article, and that it will be of use to you. If you have an alternative solution - quite possibly a better one - then please feel free to leave a comment and point it out. I'm eager to hear and learn, so don't hesitate to share your opinion and point of view.</p>
<h2 style="color: red">ADDENDUM: Nested views</h2>
<p>
It turns out the approach described in this post does not work well when using nested views. Fortunately, a simple improvement of the ideas in this post solves that problem. The approach is described in my next blog post, <a href="http://rpbouman.blogspot.nl/2016/09/sap-ui5-internationalization-for-each.html">SAP UI5: Internationalization for each view - Addendum for Nested Views</a>.
</p>
</body>
<h1>ODXL - A generic Data Export Layer for SAP/HANA based on OData</h1>
<i>2016-05-04</i>
<br/>
I'm very pleased to be able to announce the immediate availability of the Open Data Export Layer (ODXL) for SAP/HANA!
<a href="http://scn.sap.com/community/hana-in-memory" target="saphana" style="float:right"><img src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_eQjNSTkg3dkxvVGM" /></a>
<h3>Executive summary</h3>
ODXL is a framework that provides generic data export capabilities for the SAP/HANA platform.
ODXL is implemented as a <a href="http://scn.sap.com/community/developer-center/hana/blog/2012/11/29/sap-hana-extended-application-services" target="_sap">xsjs Web service</a> that understands <a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/" target="odata">OData web requests</a>, and delivers a response by means of a pluggable data output handler.
Developers can use ODXL as a back-end component, or even as a global instance-wide service to provide clean, performant and extensible data export capabilities for their SAP/HANA applications.
<a href="http://just-bi.nl/" target="justbi" style="float:right"><img src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_eaXpVTldxVDg4UU0" /></a>
<br/>
<br/>
Currently, ODXL provides output handlers for <a href="https://tools.ietf.org/html/rfc4180" target="csv">comma-separated values (csv)</a> as well as Microsoft Excel output.
However, ODXL is designed so that developers can write their own response handlers and extend ODXL to export data to other output formats according to their requirements.
<br/>
<br/>
<a style="float:left" href="https://github.com/just-bi/odxl" target="github"><img src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_eVXFSTkhDTUt1cnM"/></a>
ODXL is provided by <a href="http://just-bi.nl/" target="justbi">Just BI</a> to the SAP/HANA developer community as open source software under the terms of <a href="http://www.apache.org/licenses/LICENSE-2.0" target="apache20">the Apache 2.0 License</a>. This means you are free to use, modify and distribute ODXL. For the exact terms and conditions, please refer to the license text.
<br/>
<br/>
The source code is <a href="https://github.com/just-bi/odxl" target="github">available on github</a>. Developers are encouraged to check out the source code and to contribute to the project.
You can contribute in many ways: we value any feedback, suggestions for new features, filing bug reports, or code enhancements.
<br/>
<br/>
If you require professional support for ODXL, please <a href="http://just-bi.nl/contact/" target="justbi">contact Just-BI</a> for details.
<h3>What exactly is ODXL?</h3>
ODXL started as an in-house project at the Just-BI department of custom development.
It was born from the observation that the SAP/HANA web applications we develop for our customers often require some form of data export, typically to Microsoft Excel.
Rather than creating this type of functionality again for each project, we decided to invest some time and effort to design and develop this solution in such a way that it can easily be deployed as a reusable component.
And preferably, in a way that feels natural to SAP/HANA xs platform application developers.
<br/>
<br/>
What we came up with is an xsjs web service that understands requests that look and feel like standard OData <code>GET</code> requests, but which returns the data in some custom output format.
ODXL was designed to make it easily extensible so that developers can build their own modules that create and deliver the data in whatever output format suits their requirements.
<br/>
<br/>
This is illustrated in the high-level overview below:
<br/>
<br/>
<a href="https://github.com/just-bi/odxl" target="github"><img src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_ebWZ1aEN0SUY2bFU" /></a>
<br/>
<br/>
For customers of Just-BI, there is an immediate requirement to get Microsoft Excel output.
So, we went ahead and implemented output handlers for .xlsx and .csv formats, and we included those in the project.
This means that ODXL supports data export to the .xlsx and .csv formats right out of the box.
<br/>
<br/>
However, support for any particular output format is entirely optional and can be controlled by configuration and/or extension:<ul>
<li>Developers can develop their own output handlers to supply data export to whatever output format they like.</li>
<li>SAP/HANA Admins and/or application developers can choose to install only those output handlers they require, and configure how Content-Type headers and OData $format values map to output handlers.</li>
</ul>
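To give an idea of the mechanism, here is a minimal sketch of a MIME-type-keyed output-handler registry. Note that this is illustrative only - it does not reflect ODXL's actual internal API - and the csv handler shown is deliberately naive (it does not quote field values):

```javascript
// Illustrative output-handler registry, keyed by MIME type.
var outputHandlers = {};

function registerOutputHandler(mimeType, handler){
  outputHandlers[mimeType] = handler;
}

// A naive csv handler: one line per row, fields separated by commas.
registerOutputHandler("text/csv", function(rows){
  return rows.map(function(row){
    return row.join(",");
  }).join("\r\n");
});

var csv = outputHandlers["text/csv"]([
  ["PRODUCTCODE", "PRODUCTNAME"],
  ["S10_1678", "1969 Harley Davidson Ultimate Chopper"]
]);
console.log(csv);
```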
<h3>So ODXL is OData? Doesn't SAP/HANA suppport OData already?</h3>
The SAP/HANA platform provides data access via the OData standard.
This facility is very convenient for object-level read- and write access to database data for typical modern web applications.
In this scenario, the web application would typically use asynchronous XML Http requests, and data would be exchanged in either Atom (a XML dialect) or JSON format.
<a href="http://www.odata.org/" target="odata" style="float: right"><img src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_ebGhmelFXbDJnWDA"/></a>
<br/>
<br/>
ODXL's primary goal is to provide web applications with a way to export datasets in the form of documents.
Data export tasks typically deal with data sets that are quite a bit larger than the ones accessed from within a web application.
In addition, a data export document may very well comprise multiple parts - in other words, it may contain multiple datasets.
A typical example is exporting multiple lists of different items from a web application to a workbook containing multiple spreadsheets with data.
In fact, the concrete use case from whence ODXL originated was the requirement to export multiple datasets to Microsoft Excel .xlsx workbooks.
<br/>
<br/>
So, ODXL is not OData.
Rather, ODXL is complementary to SAP/HANA OData services.
That said, the design of ODXL does borrow elements from standard OData.
<h3>OData Features, Extensions and omissions</h3>
ODXL <code>GET</code> requests follow the syntax and features of <a href="http://www.odata.org/documentation/odata-version-2-0/operations/" target="odata">OData standard <code>GET</code> requests</a>.
Here's a simple example to illustrate the ODXL <code>GET</code> request:
<pre>
GET "RBOUMAN"/"PRODUCTS"?$select=PRODUCTCODE, PRODUCTNAME&$filter=PRODUCTVENDOR eq 'Classic Metal Creations' and QUANTITYINSTOCK gt 1&$orderby=BUYPRICE desc&$skip=0&$top=5
</pre>
This request is built up like so:<ul>
<li><code>"RBOUMAN"/"PRODUCTS"</code>: get data from the <code>"PRODUCTS"</code> table in the database schema called <code>"RBOUMAN"</code>.</li>
<li><code>$select=PRODUCTCODE, PRODUCTNAME</code>: Only get values for the columns <code>PRODUCTCODE</code> and <code>PRODUCTNAME</code>.</li>
<li><code>$filter=PRODUCTVENDOR eq 'Classic Metal Creations' and QUANTITYINSTOCK gt 1</code>: Only get products from the vendor <code>'Classic Metal Creations'</code> that have more than one item in stock.</li>
<li><code>$orderby=BUYPRICE desc</code>: Order the data from highest price to lowest.</li>
<li><code>$skip=0&$top=5</code>: Only get the first five results.</li>
</ul>
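Since the query options are just URL parameters, client code can assemble them programmatically. Below is a hypothetical helper (not part of ODXL itself) that builds the query string used above; note that in a real request, the option values should additionally be passed through <code>encodeURIComponent()</code>:

```javascript
// Hypothetical helper: assemble an OData-style query string from an options
// object, prefixing each key with "$". Values are not URL-encoded here.
function buildODataQuery(options){
  return Object.keys(options)
    .map(function(key){
      return "$" + key + "=" + options[key];
    })
    .join("&");
}

var query = buildODataQuery({
  select: "PRODUCTCODE, PRODUCTNAME",
  filter: "PRODUCTVENDOR eq 'Classic Metal Creations' and QUANTITYINSTOCK gt 1",
  orderby: "BUYPRICE desc",
  skip: 0,
  top: 5
});
console.log(query);
```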
For more detailed information about invoking the ODXL service, check out the section about the sample application.
The sample application offers a very easy way to use ODXL for any table, view, or calculation view you can access, and allows you to familiarize yourself in detail with the URL format.
<br/>
<br/>
In addition, ODXL supports <a href="http://www.odata.org/documentation/odata-version-2-0/batch-processing/" target="odata">the OData <code>$batch</code> <code>POST</code> request</a> to support export of multiple datasets into a single response document.
<br/>
<br/>
The reasons to follow OData in these respects are quite simple:<ul>
<li>OData is simple and powerful. It is easy to use, and it gets the job done. There is no need to reinvent the wheel here.</li>
<li>ODXL's target audience, that is to say, SAP/HANA application developers, are already familiar with OData. They can integrate and use ODXL into their applications with minimal effort, and maybe even reuse the code they use to build their OData queries to target ODXL.</li>
</ul>
ODXL does not follow the OData standard with respect to the format of the response.
This is a feature: OData only specifies Atom (an XML dialect) and JSON output, whereas ODXL can supply any output format.
ODXL can support any output format because it allows developers to plug in their own modules, called output handlers, that create and deliver the output.
<br/>
<br/>
Currently ODXL provides two output handlers: one for comma-separated values (.csv), and one for Microsoft Excel (.xlsx).
If that is all you need, you're set. And if you need some special output format, you can use the code of these output handlers to see how it is done and then write your own output handler.
<br/>
<br/>
ODXL does respect the OData standard with regard to how the client can specify what type of response they would like to receive.
Clients can specify the MIME-type of the desired output format in a standard HTTP <code>Accept:</code> request header:<ul>
<li><code>Accept: text/csv</code> specifies that the response should be returned in comma separated values format.</li>
<li><code>Accept: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</code> specifies that the response should be returned in open office xml workbook format (Excel .xlsx format).</li>
</ul>
Alternatively, they can specify a <code>$format=&lt;format&gt;</code> query option, where <code>&lt;format&gt;</code> identifies the output format:<ul>
<li><code>$format=csv</code> for csv format</li>
<li><code>$format=xlsx</code> for .xlsx format</li>
</ul>
Note that a format specified by the <code>$format</code> query option will override any format specified in an <code>Accept:</code>-header, as per OData specification.
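The resolution rule can be summarized in a small sketch. The mapping table below is illustrative; the actual set of mappings is configurable, as described next:

```javascript
// Illustrative mapping from Accept MIME types to output format keys.
var formatsByMimeType = {
  "text/csv": "csv",
  "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx"
};

function resolveOutputFormat(formatQueryOption, acceptHeader){
  // per the OData specification, an explicit $format wins over Accept:
  if (formatQueryOption) {
    return formatQueryOption;
  }
  return formatsByMimeType[acceptHeader];
}

console.log(resolveOutputFormat("xlsx", "text/csv")); // xlsx
console.log(resolveOutputFormat(null, "text/csv"));   // csv
```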
<br/>
<br/>
ODXL admins can configure which MIME-types will be supported by a particular ODXL service instance, and how these map to pluggable output handlers.
In addition, they can configure how values passed for the <code>$format</code> query option map to MIME-types.
ODXL comes with a standard configuration with mappings for the predefined output handlers for .csv and .xlsx output.
<br/>
<br/>
On the request side of things, most of OData's features are implemented by ODXL:<ul>
<li>The <code><a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/#SelectSystemQueryOption" target="odata">$select</a></code> query option to specify which fields are to be returned</li>
<li>The <code><a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/#FilterSystemQueryOption" target="odata">$filter</a></code> query option allows complex conditions restricting the returned data. OData standard functions are implemented too.</li>
<li>The <code><a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/#SkipSystemQueryOption" target="odata">$skip</a></code> and <code><a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/#TopSystemQueryOption" target="odata">$top</a></code> query options to export only a portion of the data</li>
<li>The <code><a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/#OrderBySystemQueryOption" target="odata">$orderby</a></code> query option to specify how the data should be sorted</li>
</ul>
ODXL currently does not offer support for the following OData features:<ul>
<li><code><a href="http://www.odata.org/documentation/odata-version-2-0/uri-conventions/#ExpandSystemQueryOption" target="odata">$expand</a></code></li>
<li><code><a href="http://www.odata.org/blog/queryable-odata-metadata/" target="odata">$metadata</a></code></li>
</ul>
The features that are currently not supported may be implemented in the future.
For now, we feel the effort to implement them and adequately map their semantics to ODXL may not be worth the trouble.
However, an implementation can surely be provided should there be sufficient interest from the community.
<h3>Installation</h3>
Using ODXL presumes you already have a SAP/HANA installation with a properly working xs engine. You also need HANA Studio, or Eclipse with the SAP HANA Tools plugin installed.
The steps are a little bit different, depending on whether you just want to use ODXL, or whether you want to actively develop the ODXL project.
<br/>
<br/>
Here are the steps if you just want to use ODXL, and have no need to actively develop the project:<ol>
<li>In HANA Studio/Eclipse, create a new HANA xs project. Alternatively, find an existing HANA xs project.</li>
<li>Find the ODXL repository on github, and <a href="https://github.com/just-bi/odxl/archive/master.zip">download the project as a zipped folder</a>. (Select a particular branch if you desire so; typically you'll want to get the master branch)</li>
<li>Extract the project from the zip. This will yield a folder. Copy its contents, and place them into your xs project directory (or one of its sub directories)</li>
<li>Activate the new content.</li>
</ol>
After taking these steps, you should now have a working ODXL service, as well as a sample application.
The service itself is in the service subdirectory, and you'll find the sample application inside the app subdirectory.
<br/>
<br/>
The service and the application are both self-contained xs applications, and should be completely independent in terms of resources.
The service does not require the application to be present, but obviously, the application does rely on being able to call upon the service.
<br/>
<br/>
If you only need the service, for example because you want to call it directly from your own application, then you don't need the sample application.
You can safely copy only the contents of the service directory and put those right inside your project directory (or one of its subdirectories) in that case.
But even then, you might still want to hang on to the sample application, because you can use that to generate the web service calls that you might want to do from within your application.
<br/>
<br/>
If you want to hack on ODXL then you might want to fork or clone the <a href="https://github.com/just-bi/odxl" target="github">ODXL github repository</a>. If you do this inside a SAP/HANA xs project, or if you create a project pointing to that location, you can then deploy that to SAP/HANA and use that to send pull requests in case you want to contribute your changes back into the project.
<h3>Getting started with the sample application</h3>
To get up and running quickly, we included a sample web application in the ODXL project.
The purpose of this sample application is to provide an easy way to evaluate and test ODXL.
<br/>
<br/>
The sample application lets you browse the available database schemas and queryable objects: tables and views, including calculation views (or at least, their SQL queryable runtime representation).
After making the selection, it will build up a form showing the available columns. You can then use the form to select or deselect columns, apply filter conditions, and/or specify any sorting order.
If the selected object is a calculation view that defines input parameters, then a form will be shown where you can enter values for those too.
<br/>
<br/>
Meanwhile, as you enter options into the form, a textarea shows the URL that should be used to invoke the ODXL service. If you like, you can manually tweak this URL as well.
Finally, you can use one of the download links to immediately download the result corresponding to the current URL in either .csv or .xlsx format.
<br/>
<br/>
Alternatively, you can hit a button to add the URL to a batch request.
When you're done adding items to the batch, you can hit the download workbook button to download as single .xlsx workbook, containing one worksheet for each dataset in the batch.
<br/>
<br/>
<img src="https://drive.google.com/uc?export=download&id=0BzdLoKoT3p_edlFCVEFLNzFpZzg"/>
<h3>What versions of SAP/HANA are supported?</h3>
We initially built and tested ODXL on SPS9.
The initial implementation used the $.hdb database interface, as well as the $.util.Zip builtin.
<br/>
<br/>
We then built abstraction layers for both database access and zip support to allow automatic fallback to the $.db database interface, and to use a pure JavaScript implementation of the zip algorithm based on Stuart Knightley's JSZip library.
We tested this on SPS8, and everything seems to work fine there.
<br/>
<br/>
We have not actively tested earlier SAP/HANA versions, but as far as we know, ODXL should work on any earlier version.
If you find that it doesn't, then please let us know - we will gladly look into the issue and see if we can provide a solution.
<h3>Why Open Source? What's the Business Model? What's the catch?</h3>
For Just BI, Open Source software is not a business model, but a development model.
While some companies build a successful business model around selling custom code, this is currently not Just-BI's primary goal.
Rather, Just-BI is a consulting company that focuses mainly on Business Intelligence solutions around the SAP ecosystem.
Our areas of expertise include Business Objects, SAP BW, SAP HANA, as well as custom BI (web) applications.
Helping customers by providing solutions for their business problems is Just-BI's primary concern - not selling code.
<br/>
<br/>
However, we do acknowledge that sometimes, custom code plays an essential role in building a business solution for our customers.
In these cases, we will gladly help our customers to design, build and deploy such solutions.
But even in these cases we will try to look for standard component toolkits, like SAP UI5, or frameworks like Angular as a basis for our work.
<br/>
<br/>
The urge to standardize on familiar, well known toolkits and libraries hardly needs justification.
In the end, customers are not looking to acquire and own a pile of custom-coded solutions, because today's hot new custom solution is tomorrow's legacy.
The more a customer relies on custom code, the harder it becomes to maintain and to move forward.
<br/>
<br/>
Sometimes, a particular building block that we need for applications may not be publicly available already.
If such a building block is sufficiently generic (i.e., not bound to any particular customer) then we have every reason to want that to become a standard.
For a generic and reusable component like ODXL, we believe that an open source model is the right way to do that.
<br/>
<br/>
We think that an open source development model will help maintain and advance ODXL.
By using an open source release and development model, we have potentially more eyes to scrutinize our code, find bugs, suggest features, etc.
In addition we hope our customers will feel more confident to embrace an open source solution, since they need not be locked into only our company for support and ongoing development.
<h3>How to Contribute</h3>
If you want to, there are many different ways to contribute to ODXL.<ol>
<li>If you want to suggest a new feature, or report a defect, then please <a href="https://github.com/just-bi/odxl/issues" target="github">use the github issue tracker</a>.</li>
<li>If you want to contribute code for a bugfix, or for a new feature, then please send a pull request. If you are considering contributing code, then we do urge you to <a href="https://github.com/just-bi/odxl/issues" target="github">first create an issue</a> to open up discussion with fellow ODXL developers on how to best scratch your itch.</li>
<li>If you are using ODXL and if you like it, then consider to spread the word - tell your co-workers about it, write a blog, or a tweet, or a facebook post.</li>
</ol>
Thank you in advance for your contributions!
<h3>Finally</h3>
I hope you enjoyed this post! I hope ODXL will be useful to you. If so, I look forward to getting your feedback on how it works for you and how we might improve it. Thanks for your time!
<h1>Installing the Open Source Xavier XML/A client on the Jedox Premium OLAP Suite</h1>
<i>2016-03-20</i>
<br/>
<a href="http://www.jedox.com/en/" target="jedox">Jedox</a> is a software vendor that specializes in OLAP services and solutions. The company has been around quite a while and is probably best known for its <a href="https://en.wikipedia.org/wiki/Palo_(OLAP_database)" target="jedox">PALO</a> MOLAP engine and the matching add-in for Microsoft Excel.
<br/>
<br/>
Jedox' flagship product, Jedox Premium, comprises the Palo MOLAP engine, APIs, a REST server, an ETL server, and client tools. It also comes with an MDX interpreter and an XML for Analysis (XML/A) server. An interesting tidbit is that the MDX layer is not considered native; Jedox' own clients use a lower-level API, or address it via the REST service.
<br/>
<br/>
In this blog post I will explain how to install and configure the Open Source browser-based ad-hoc query and analysis tool Xavier to use it with Jedox. A video of the process is embedded below:<br/>
<iframe width="420" height="315" src="https://www.youtube.com/embed/18XoCj1aBz4" frameborder="0" allowfullscreen></iframe>
<br/>
<br/>
Here's a written list of instructions to get up and running with Xavier and Jedox:<ol>
<li>
<a href="http://www.jedox.com/en/product/free-software-trial/jedox-premium-trial" target="jedox">Download Jedox Premium</a>.
Run the downloaded installer to actually install the product.
By default, it will be installed in <code>C:\Program Files (x86)\Jedox\Jedox Suite</code>.
In the remainder of this post, I will refer to this directory as "the Jedox Suite directory".
</li>
<li>
<a href="https://github.com/rpbouman/xavier/blob/master/dist/xavier.zip?raw=true" target="xavier">Download xavier.zip</a>.
Unpack the zip. A xavier directory will be extracted.
</li>
<li>
Stop the JedoxSuiteHttpdService.
If you don't know about Windows services, then <a href="https://technet.microsoft.com/en-us/library/cc736564(v=ws.10).aspx#BKMK_services" target="microsoft">look here</a>.
</li>
<li>
Copy the xavier directory that you extracted from xavier.zip into the <code>Jedox Suite\httpd\app\docroot</code> directory.
</li>
<li>
Open the <code>Jedox Suite/httpd/conf/httpd.conf</code> file in a text editor.
You should probably make a backup copy of the <code>httpd.conf</code> file before editing it so you can always revert your changes.
</li>
<li>
Add a line to load the HTTP proxy module.
To do that, search the <code>httpd.conf</code> file for a bunch of lines that start with <code>LoadModule</code>.
Look for a line that reads:
<br/>
<br/>
<code>LoadModule proxy_http_module modules/mod_proxy_http.so</code>
<br/>
<br/>
In my installation, the line is already present, like this:
<pre>
&lt;IfDefine JDX_DEV&gt;
  LoadModule log_config_module modules/mod_log_config.so
  <b>LoadModule proxy_http_module modules/mod_proxy_http.so</b>
  LoadModule setenvif_module modules/mod_setenvif.so
&lt;/IfDefine&gt;
</pre>
Now, what you'll want to do is cut this line out of the <code>&lt;IfDefine JDX_DEV&gt;</code> block, and put it outside that block, for example, right before it, like this:
<pre>
<b>LoadModule proxy_http_module modules/mod_proxy_http.so</b>
&lt;IfDefine JDX_DEV&gt;
  LoadModule log_config_module modules/mod_log_config.so
  LoadModule setenvif_module modules/mod_setenvif.so
&lt;/IfDefine&gt;
</pre>
</li>
<li>
Add a proxy configuration so that web applications deployed on the Apache HTTP server can access the Jedox XML/A service as if it lives in the same domain as the web application.
To do that, add a <code>Location</code> directive at the end of the <code>httpd.conf</code> file, like this:
<pre>
&lt;Location /xavier/Xmla&gt;
  ProxyPass http://localhost:4242/xmla/
  ProxyPassReverse http://localhost:4242/xmla/
  SetEnv proxy-chain-auth
&lt;/Location&gt;
</pre>
This allows a web application on the Apache HTTP server to access the XML/A service via the URL <code>/xavier/Xmla</code>.
By default, the place where the Jedox XML/A service lives is <code>http://localhost:4242/xmla</code>.
You can verify this by cross-checking with the configuration in <code>Jedox Suite\odbo\config.ini</code>:
the values for the <code>MDXAddress</code> and <code>MDXPort</code> should match the server and port in the URLs configured for <code>ProxyPass</code> and <code>ProxyPassReverse</code>.
</li>
<li>
Save the changes to your httpd.conf file, and start the JedoxSuiteHttpdService.
If the service starts, you should be good to go.
If it doesn't, check the <code>Jedox Suite/log/apache_error.log</code> file and see if you can find some information there that can help you troubleshoot your problem.
</li>
</ol>
If all went well, you should now be able to navigate to <a href="http://localhost/xavier/resources/html/index.html" target="xavier">http://localhost/xavier/resources/html/index.html</a> and you should see the Xavier welcome screen. Note that this assumes the Jedox HTTP server is running on its default port (80). If you chose another port for the HTTP server when installing Jedox, the URL for Xavier has to be amended accordingly. For example, I chose port 8181, and hence my URL would be <code>http://localhost:8181/xavier/resources/html/index.html</code> instead.
<br/>
<br/>
If you're in doubt what port you chose for your Jedox HTTP server, you can look it up in the <code>Jedox Suite/httpd/conf/httpd.conf</code> file. Look for a line that starts with <code>Define JDX_PORT_HTTP</code>. The port is specified right after that, enclosed in double quotes.