Friday, December 21, 2012

Javascript Dagre & D3 to visualize Big Data dataflows

Big Data projects often involve a lot of stovepipe processing, and visualizing data flows is a powerful way to convey data provenance to the end user, and even allow control of the involved processes.

There are a number of tools available to visualize data flows, but most suffer from some limitation. Some allow labeling only nodes, not edges. Some have no provision for placing the node label inside the node. Most use general-purpose graph layout algorithms such as "gravity", whereas dataflow diagrams have distinct start nodes and end nodes and are better laid out mostly orthogonally, either top-down or left-right. (The well-known dataflow language G from LabVIEW flows left-right, but top-down is better suited for web browsers.) Graphviz can generate a nice top-down layout, but for the web it can at this time only produce static images (albeit generated dynamically), with no mouseovers to facilitate drill-downs into processes or data stores.

Dagre, a Javascript library built on top of the D3.js visualization toolkit, is very well-suited to visualizing Big Data data flows. Below is an example (contrived, but it illustrates the idea):

Dagre generated the above from the succinct input below:

digraph {
    A [label="Apache"];
    B [label=""];
    C [label="parseaccesslogs.pig"];
    D [label="parseerrorlogs.pig"];
    E [label="PageViewsPer503"];
    A -> B [label="access_log"];
    A -> B [label="error_log"];
    B -> C [label="/logs/access_log_$DATE.txt"];
    B -> D [label="/logs/error_log_$DATE.txt"];
    C -> E [label="MySQL:VISITORS"];
    D -> E [label="MySQL:ERRORS"];
}

You can paste it into Dagre's live demo yourself and see that mouseover highlighting works, which means the diagram can be hooked up to drive drill-downs.
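Because Dagre takes plain DOT text as input, a digraph like the one above does not have to be maintained by hand; the orchestration code of a Big Data pipeline could emit it from its own job metadata. Below is a minimal C++ sketch of that idea, assuming node IDs are simple identifiers. FlowEdge, dot_quote and to_dot are hypothetical names, not part of Dagre or D3. Per the DOT language, the only escape needed inside a quoted string is for the double quote.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One hop in a dataflow: source node ID, target node ID, and the dataset
// (file, table, queue) that flows along it. Hypothetical type.
struct FlowEdge {
    std::string from;
    std::string to;
    std::string label;
};

// Wrap a label in double quotes, escaping embedded double quotes for DOT.
std::string dot_quote(const std::string& s) {
    std::string out = "\"";
    for (char c : s) {
        if (c == '"') out += '\\';
        out += c;
    }
    out += '"';
    return out;
}

// Build DOT text like the digraph above from (node ID, label) pairs and edges.
std::string to_dot(const std::vector<std::pair<std::string, std::string>>& nodes,
                   const std::vector<FlowEdge>& edges) {
    std::ostringstream dot;
    dot << "digraph {\n";
    for (const auto& n : nodes)
        dot << "    " << n.first << " [label=" << dot_quote(n.second) << "];\n";
    for (const auto& e : edges)
        dot << "    " << e.from << " -> " << e.to
            << " [label=" << dot_quote(e.label) << "];\n";
    dot << "}\n";
    return dot.str();
}
```

Feeding to_dot the nodes and edges of the Apache example produces text in the same shape as the digraph above, ready to paste into the Dagre demo.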

Saturday, December 1, 2012

Apache Thrift in Windows

Apache Thrift, originally developed by Facebook, is an immensely useful general-purpose inter-process communication (IPC) code generation tool and library. Although it supports a variety of IPC mechanisms, sockets are its primary conduit, so it is naturally language-agnostic; its compiler can generate code for a dozen different languages.

I use it to provide PHP web interfaces to monitor and control C++ scientific/industrial semi-embedded systems (desktop PCs loaded with data acquisition and control hardware).

Sometimes, those PCs are running Windows. With the recent 0.9.0 release, Apache Thrift support for Windows is leaps and bounds beyond what it used to be, but it's still "only" 98% there. Here are the missing steps:

  1. First, the good news: use on Windows no longer requires Cygwin or MinGW, despite what the outdated documentation states.
  2. You can download a pre-built Thrift compiler directly from
  3. You will, however, still need to compile the Thrift libraries yourself if you plan to use Thrift with a compiled language such as C++. Thankfully, the Thrift distribution comes with a Microsoft Visual C++ .sln solution file. The thing to know is that it is a Visual C++ 2010 .sln file, and it will not work with Visual C++ 2008. You can use Visual Studio 2012, but recall that Visual Studio 2012 does not run on XP, which I still use for development because of both data acquisition hardware drivers and some legacy software development tools (for some legacy codebases). Thankfully, you can use the free Visual C++ 2010 Express, which is still available for download even though Visual Studio 2012 has been released. To download an ISO (to preserve your ability to reinstall in the future) instead of a stub/Internet installer, select the option for the "All-in-One ISO".
  4. The \thrift-0.9.0\lib\cpp\thrift.sln solution contains two projects: libthrift and libthriftnb. libthriftnb is for the non-blocking server; to use it, you must link in both libthrift and libthriftnb, and use TNonblockingServer instead of TSimpleServer. Note that "non-blocking" refers to the server's socket I/O (TNonblockingServer multiplexes connections with libevent), not to the serve() call: on the server side, server->serve() still blocks. To keep either TNonblockingServer or TSimpleServer from blocking your own server code, just run it inside a new boost::thread().
  5. Compiling libthriftnb is trickier. First, it requires libevent. To compile libevent, open Start->All Programs->Microsoft Visual Studio 2010 Express->Visual Studio Command Prompt (2010), navigate to the libevent directory, and run nmake -f Makefile.nmake. Second, libthriftnb pulls in Thrift library code that does #include <tr1/functional>; Visual C++ 2010 has no <tr1/functional> header (its TR1 components live in <functional>), so you can just replace it with <boost/functional.hpp>.
  6. To compile the libthrift project (and this applies to libthriftnb as well), from the Microsoft Visual C++ drop-down menu, go to Project->Properties and set Configuration Properties->C/C++->General->Additional Include Directories: C:\Program Files\boost\boost_1_51 (after downloading Boost, of course).
  7. Then, to compile your Visual C++ server code that links to libthrift, from the Microsoft Visual C++ drop-down menu, Project->Properties:
    • Configuration Properties->C/C++->General->Additional Include Directories: C:\Program Files\boost\boost_1_51;C:\thrift-0.9.0\lib\cpp\src (for libthriftnb, also include C:\libevent-2.0.21-stable\include;C:\libevent-2.0.21-stable\WIN32-Code;C:\libevent-2.0.21-stable)
    • Configuration Properties->Linker->General->Additional Library Directories: C:\thrift-0.9.0\lib\cpp\Release;C:\Program Files\boost\boost_1_51\lib
    • Configuration Properties->Linker->Input->Additional Dependencies: libboost_thread-vc100-mt-1_51.lib;libboost_chrono-vc100-mt-1_51.lib;libthrift.lib. (For the Debug version, substitute mt-gd for mt.)
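  8. In your server code, include the following code prior to invocation of any of the Thrift code:
    // Initialize Winsock 2.2 (needs #include <winsock2.h> and linking ws2_32.lib)
    WSADATA wsaData = {};
    WORD wVersionRequested = MAKEWORD(2, 2);
    int err = WSAStartup(wVersionRequested, &wsaData);
    if (err != 0) { /* Winsock is unavailable: report err and exit */ }
    // ... and call WSACleanup() when the server shuts down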
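Item 4's advice, running the blocking serve() call on its own thread, can be sketched with std::thread, the C++11 counterpart of boost::thread. To keep the sketch self-contained, it uses a hypothetical BlockingServer class in place of TSimpleServer or TNonblockingServer; with real Thrift you would construct the server as usual and call its serve() from the thread instead. That your Thrift server offers a stop() to make serve() return is an assumption to check against your Thrift version.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical stand-in for a Thrift server such as TSimpleServer or
// TNonblockingServer: serve() blocks until stop() is called.
class BlockingServer {
public:
    void serve() {
        running_ = true;
        while (!stopRequested_)
            std::this_thread::sleep_for(std::chrono::milliseconds(5));  // "handle requests"
        running_ = false;
    }
    void stop() { stopRequested_ = true; }
    bool running() const { return running_; }

private:
    std::atomic<bool> running_{false};
    std::atomic<bool> stopRequested_{false};
};

// Run server.serve() on a background thread so the calling thread (for example
// a data acquisition loop) is not blocked. Returns once the serve loop is live.
std::thread start_in_background(BlockingServer& server) {
    std::thread t([&server] { server.serve(); });
    while (!server.running())
        std::this_thread::yield();
    return t;
}
```

The caller must call stop() and then join() on the returned thread before the server object goes away; destroying a std::thread that is still joinable calls std::terminate.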

Tuesday, October 30, 2012

Installing 64-bit drivers from 32-bit installer

If you are using a 32-bit Windows installer, it is not straightforward to have it install a 64-bit driver. There are at least two reasons why you might be in this situation:
  1. Your installer software is not the latest (or maybe doesn't even have a 64-bit version yet) ... or ...
  2. Most of your components are 32-bit with just one or two that you want to differentiate 32 vs 64 bit.
The problem arises because when a 32-bit installer shells out to msiexec.exe (in the case of InstallShield, whether from InstallScript or from a Custom Action), WOW64 file system redirection runs the 32-bit C:\Windows\SysWOW64\msiexec.exe instead of the 64-bit C:\Windows\System32\msiexec.exe.

The basic answer comes from TechNet, and the VB.Net code below is adapted from it with a slight improvement. Compiling the VB.Net code into an executable and shelling out to that as an intermediary lets the installer escape the 32-bit world (provided the intermediary itself runs as a 64-bit process). The slight improvement is that the code below preserves quotes around quoted arguments, such as pathnames containing spaces.

Module Module1
    Sub Main()
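    Dim arrArgs As String()
    Dim Args As String = ""
    Dim intCount As Integer = 0

    arrArgs = System.Environment.GetCommandLineArgs()
    For Each Arg As String In arrArgs
        ' Element 0 is this executable's own path, so skip it
        If intCount <> 0 Then
            If Arg.IndexOf(" ") > -1 Then
                ' Re-add the quotes stripped during argument parsing, e.g. paths with spaces
                Args = Args & " """ & Arg & """"
            Else
                Args = Args & " " & Arg
            End If
        End If
        intCount = intCount + 1
    Next
    ' Run the reassembled command line and wait for it to finish
    Shell("cmd.exe /C" & Args, AppWinStyle.NormalFocus, True)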
    End Sub
End Module
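The quoting rule in Module1 does not depend on VB.Net. Here is the same logic as a C++ function, for instance if the intermediary were built as a native 64-bit executable instead. join_args is a hypothetical name, and, like the VB.Net original, it does not escape double quotes that appear inside an argument.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Rebuild a command line from already-split arguments, re-adding double quotes
// around any argument that contains a space (mirrors the VB.Net Module1 logic).
// Like the original, embedded double quotes are not escaped.
std::string join_args(const std::vector<std::string>& args) {
    std::string cmd;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i > 0) cmd += ' ';
        if (args[i].find(' ') != std::string::npos)
            cmd += '"' + args[i] + '"';
        else
            cmd += args[i];
    }
    return cmd;
}
```

As in Module1, the result would then be appended to "cmd.exe /C " and run.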

Then the InstallScript to invoke it is below.  It detects whether the OS is 64-bit, and if so installs the 64-bit drivers via the VB.Net code above (which is compiled to an executable cmd64.exe); otherwise, it installs the 32-bit drivers.

if ( REMOVEALLMODE=0 ) then
    if (Is(FILE_EXISTS, WINSYSDIR^"CsSsm.dll") = FALSE) then
        if (SYSINFO.bIsWow64) then
            svProgramCmd64 = TARGETDIR^"GaGe64\\cmd64.exe";
            svCmd64MsiExecPath = WINSYSDIR64^"msiexec.exe";
            svCmd64MsiPath = TARGETDIR^"GaGe64\\CompuScope.msi";
            svCmd64Param = svCmd64MsiExecPath + " /i " +
                           svCmd64MsiPath + " /passive /norestart";
            svProgramMsiExec = WINSYSDIR^"msiexec.exe";
            svGaGe32MsiPath = TARGETDIR^"GaGe32\\CompuScope.msi";
            svGaGe32Param = "/i " + svGaGe32MsiPath +
                            " /passive /norestart";

The code above is for installing drivers for a GaGe CompuScope analog-to-digital converter board.  I am a user of GaGe boards, not an employee or representative of GaGe.

Sunday, October 21, 2012

XML/XSL/HTML5 for reports instead of PDF

Since video of my actual presentation to the Denver HTML5 Meetup on October 22, 2012 won't be posted for a few more months, I quickly recorded the 10-minute YouTube video below.

Below are the slides.


The official documentation on embedding XSL in XML actually dates from circa 2000.  Firefox still allows it.  Here is the overall structure of the .XML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="#stylesheet"?>
<!DOCTYPE doc [
<!ATTLIST xsl:stylesheet
  id ID #REQUIRED>
]>
<doc>
 <!--Start XSL-->
 <xsl:stylesheet id="stylesheet" version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >

  <xsl:template match="xsl:stylesheet" />
  <xsl:template match="/doc">
   <html>
    <head>
     <style type="text/css">
       <!-- Your CSS goes here -->
     </style>
     <script type="text/javascript">
       <!-- Your Javascript goes here -->
     </script>
    </head>
    <body>
       <!-- Whatever is in here will get printed at the top of every page -->
       <!-- Main HTML, including "canvas" tags etc. -->
    </body>
   </html>
  </xsl:template>
 </xsl:stylesheet>

 <!--Start XML-->
 <seriesdata>
  <datapoint x="2.1" y="3.0" />
  <datapoint x="2.9" y="5.2" />
 </seriesdata>
</doc>

Below is the bit of magic that pulls the XML data into Javascript memory space. Assuming there is a Javascript constructor called ChartSeries that takes four parameters (name, array of x values, array of y values, color), the code below uses XSL to write the x values, and then the y values, inline into the Javascript as comma-separated array literals.

var mychartseries = new ChartSeries("Channel 1", [
 <xsl:for-each select="seriesdata/datapoint">
  <xsl:if test="position() > 1">,</xsl:if>
  <xsl:value-of select="@x"/>
 </xsl:for-each>
 ], [
 <xsl:for-each select="seriesdata/datapoint">
  <xsl:if test="position() > 1">,</xsl:if>
  <xsl:value-of select="@y"/>
 </xsl:for-each>
 ], "Yellow");
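The XML half of the file is what the native acquisition or analysis software would write out. Here is a minimal C++ sketch of that step, assuming the datapoints sit inside a <seriesdata> element as the select="seriesdata/datapoint" expressions above imply. series_xml is a hypothetical helper, and values are printed with the stream's default precision of six significant digits.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Emit a <seriesdata> element with one <datapoint> per (x, y) sample, in the
// shape that the XSL for-each loops iterate over. Hypothetical helper.
std::string series_xml(const std::vector<double>& xs, const std::vector<double>& ys) {
    std::ostringstream out;
    out << "<seriesdata>\n";
    for (std::size_t i = 0; i < xs.size() && i < ys.size(); ++i)
        out << "  <datapoint x=\"" << xs[i] << "\" y=\"" << ys[i] << "\" />\n";
    out << "</seriesdata>\n";
    return out.str();
}
```

With the default formatting, 3.0 comes out as 3, which is equally valid in the XML attribute and in the generated Javascript array.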

Friday, October 19, 2012

Memory writes expensive but parallelizable on Radeon GPGPU

Using a Radeon 7970 as a GPGPU, I was running into some seeming limitations on how quickly I could download data off of the board into main CPU RAM. There seemed to be about a 20 MB/sec limit for the board, which is of course nowhere near the 16 GB/sec limit of PCIe 3.0 x16. It turns out the limitation is for a single work unit (out of the 2048 work units/processors on the board). It also turns out that because writes to global memory (i.e. memory sharable with the CPU host) are so expensive, it can often become more important to parallelize the memory writes than to parallelize the computations! To me, this was counterintuitive because I envisioned writes to shared memory as being serial and fast, but they instead seem to be on some kind of time multiplex across the multiple work units.

Consider the following code that computes the first 1024 Fibonacci numbers, and does so 1024 times over:

#include <iostream>
#include <Windows.h>
#include <CL/cl.h>

int main(int argc, char ** argv) {
 cl_platform_id platform;
 clGetPlatformIDs(1, &platform, NULL);
 cl_device_id device;
 clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
 cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
 cl_command_queue queue = clCreateCommandQueue(context, device, 0, NULL);
 const char *source =
 "__kernel void fibonacci(__global double* dst) {\n"
 "    __local double buff[1026];\n"
 "    buff[0] = 0, buff[1] = 1;\n"
 "    for (int i = 0; i < 1024; i++) {\n"
 "        for (int j = 0; j < 1024; j++)\n"
 "            buff[j+2] = buff[j+1] + buff[j];\n"
 "        event_t e = async_work_group_copy(&dst[i*1024], &buff[2], 1024, 0);\n"
 "        wait_group_events(1, &e);\n"  // buff must not be overwritten while the copy is in flight
 "    }\n"
 "}\n";
 const size_t global_work_size = 1;
 cl_program program = clCreateProgramWithSource(context, 1, &source, NULL, NULL);
 clBuildProgram( program, 1, &device, NULL, NULL, NULL);
 cl_kernel kernel = clCreateKernel( program, "fibonacci", NULL);
 // 1024 x 1024 doubles = 8 MB of output
 cl_mem buf = clCreateBuffer(context, CL_MEM_WRITE_ONLY, 1024 * 1024 * 8, NULL, NULL);
 clSetKernelArg(kernel, 0, sizeof(buf), (void*)&buf);
 LARGE_INTEGER pcFreq = {}, pcStart = {}, pcEnd = {};
 QueryPerformanceFrequency(&pcFreq);
 QueryPerformanceCounter(&pcStart);
 clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global_work_size, NULL, 0, NULL, NULL);
 clFinish(queue);  // wait for the kernel to finish before stopping the clock
 QueryPerformanceCounter(&pcEnd);
 // 8 MB divided by elapsed seconds (ticks / frequency)
 std::cout << 8.0 * pcFreq.QuadPart / (pcEnd.QuadPart-pcStart.QuadPart) << " MB/sec" << std::endl;
 clReleaseMemObject(buf);
 clReleaseKernel(kernel);
 clReleaseProgram(program);
 clReleaseCommandQueue(queue);
 clReleaseContext(context);
 return 0;
}

Running on an i5-2500, the benchmarks are:
As-is: 21 MB/sec
Memory transfer commented out: 2814 MB/sec
Inner for loop commented out: 38 MB/sec

Clearly the memory transfer is taking the bulk of the time, and the computation of Fibonacci numbers hardly any time at all. The way to speed it up is to speed up the memory write, but what could possibly be faster than async_work_group_copy()? It appears there is a bit of intelligent cache maintenance going on behind the scenes: if we can write to buff[] from multiple work units, then async_work_group_copy() can pull the data from the memory associated with multiple work units, and it goes much faster.

But how can Fibonacci be parallelized, when it is seemingly a serial, recursive calculation?  We can do so with lookahead.  Based on the basic calculation x2 = x1 + x0, we have:
x3 = x2 + x1
   = x1 + x0 + x1
   = 2 * x1 + x0
x4 = x3 + x2
   = 2 * x1 + x0 + x1 + x0
   = 3 * x1 + 2 * x0
x5 = x4 + x3
   = 3 * x1 + 2 * x0 + 2 * x1 + x0
   = 5 * x1 + 3 * x0
x6 = x5 + x4
   = 5 * x1 + 3 * x0 + 3 * x1 + 2 * x0
   = 8 * x1 + 5 * x0
x7 = x6 + x5
   = 8 * x1 + 5 * x0 + 5 * x1 + 3 * x0
   = 13 * x1 + 8 * x0
x8 = x7 + x6
   = 13 * x1 + 8 * x0 + 8 * x1 + 5 * x0
   = 21 * x1 + 13 * x0
x9 = x8 + x7
   = 21 * x1 + 13 * x0 + 13 * x1 + 8 * x0
   = 34 * x1 + 21 * x0
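The derivation can be checked on the CPU before moving it to the GPU. The C++ sketch below (hypothetical helper names) applies the same coefficient table one 8-value lookahead step at a time and can be compared against the plain recurrence. In general, x(k) = F(k)·x1 + F(k-1)·x0, where F is the Fibonacci sequence starting F(1) = F(2) = 1. The values stay exact in double precision as long as they and the intermediate products stay below 2^53, i.e. up to about the 78th Fibonacci number.

```cpp
#include <vector>

// CPU model of the 8-way lookahead. Given two known values buff[j] and
// buff[j+1], "work unit" g computes buff[j+2+g] directly as
// coef[g][0]*buff[j+1] + coef[g][1]*buff[j], using the coefficient pairs
// from the derivation above. count should be a multiple of 8.
std::vector<double> fib_lookahead(int count) {
    const double coef[8][2] = {{1, 1}, {2, 1}, {3, 2}, {5, 3},
                               {8, 5}, {13, 8}, {21, 13}, {34, 21}};
    std::vector<double> buff(count + 2);
    buff[0] = 0;
    buff[1] = 1;
    for (int j = 0; j + 8 <= count; j += 8) {
        // The 8 updates are independent of each other, so they can run as 8
        // work units. Step j+8 reads buff[j+8] and buff[j+9], written in this
        // step, so on a GPU a barrier is needed between steps.
        for (int g = 0; g < 8; ++g)
            buff[j + 2 + g] = coef[g][0] * buff[j + 1] + coef[g][1] * buff[j];
    }
    return buff;
}

// Serial recurrence x2 = x1 + x0 for comparison.
std::vector<double> fib_serial(int count) {
    std::vector<double> buff(count + 2);
    buff[0] = 0;
    buff[1] = 1;
    for (int j = 0; j < count; ++j)
        buff[j + 2] = buff[j + 1] + buff[j];
    return buff;
}
```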
And our new parallel code is:

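 const char *source =
 "__kernel void fibonacci(__global double* dst) {\n"
 "    __local double buff[1026];\n"
 "    __private double coef[8][2] = {{1,1}, {2,1}, {3,2}, {5,3},\n"
 "                                   {8,5}, {13,8}, {21,13}, {34,21}};\n"
 "    buff[0] = 0, buff[1] = 1;\n"
 "    barrier(CLK_LOCAL_MEM_FENCE);\n"
 "    for (int i = 0; i < 1024; i++) {\n"
 "        for (int j = 0; j < 1024; j += 8) {\n"
 "            buff[j+2+get_global_id(0)] =\n"
 "                coef[get_global_id(0)][0] * buff[j+1]\n"
 "                + coef[get_global_id(0)][1] * buff[j];\n"
 "            barrier(CLK_LOCAL_MEM_FENCE);\n"  // the next step reads buff[j+8], buff[j+9] written here
 "        }\n"
 "        event_t e = async_work_group_copy(&dst[i*1024], &buff[2], 1024, 0);\n"
 "        wait_group_events(1, &e);\n"
 "    }\n"
 "}\n";
 // All 8 work units must be in one work group (pass a local work size of 8 to
 // clEnqueueNDRangeKernel), since buff[] is __local and barrier() only
 // synchronizes within a work group.
 const size_t global_work_size = 8;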

This runs at
8 work units: 122 MB/sec
That's a 6x speedup for increasing the number of work units by 8x! We could likely speed it up even more by extending the look-ahead to increase the number of work units.

Recall that when we commented out the computation completely it was only 38 MB/sec, so the speedup is from parallelizing the memory writes, not from parallelizing the computation.

Thanks once again to the folks at for helping me work through this.

Tuesday, October 2, 2012

Supercomputing for $500

Desktop supercomputing is now cheap, mainstream, and mature.  Using GPGPU (General Purpose computing on a Graphics Processing Unit), you can write C programs that execute 25x as fast as a high-end desktop computer alone for just $500 more.

The OpenCL standard, started in 2008, is now mature.  It provides a way for C/C++ programs on Windows and Linux to compile and load special OpenCL C programs onto GPGPUs, which are just off-the-shelf high-end graphics cards that videogame enthusiasts usually buy.  When you buy one of these cards for your supercomputing project, expect lots of snickers from your purchasing or shipping/receiving department when it arrives with computer videogame monsters on the box.

As an example, the approx. $500 Radeon 7970 has 2048 processing cores on it, running at about 1 GHz. Its double-precision rate is one quarter of its single-precision rate, which works out to a peak of roughly 1 TFLOPS in double precision. The double-precision is actually new to this generation of Radeon, and the OpenCL PDF specification hasn't even been updated yet to include the data type, even though the SDK header files have been.

Using the freeware GPU Caps software, the Radeon 7970 by itself (without assistance from my desktop computer's 3.3 GHz Intel i5 2500) clocks in at 25x the computational power of the four-core (single processor) Intel i5 by itself.

Getting a dual-processor Intel motherboard and a second Intel processor is a $1000 increment, and that's only a 2x speedup, so a 25x speedup for a $500 increment isn't just a better deal, it's a new paradigm. As Douglas Engelbart said, a large enough quantitative change produces a qualitative change.

Up to four such cards can be ganged together in a single computer for a total 100x speedup.  But since each card is physically three cards wide (to accommodate the built-in liquid cooling and fans) even though it has just one PCIe connector, you will need a special rack-mount motherboard to go to that extreme (note I have not tried this!).

By comparison, to go 100x in the other direction, to get a computer with 1% of the computation power of my desktop i5, it would require going back 15 years to a Pentium II.  So a four-Radeon system represents a sudden 15-year leap into the future.

Sunday, September 9, 2012

libav in C++

libav, which is intended for codecs, also serves as a nice signal processing library for scientific applications as it has assembly FFT routines optimized for various processors including x86 SSE and ARM NEON.  Written in C for C, it can be a little tricky to use in C++.

The first trick is to wrap the #includes within extern "C":

extern "C" {
    #include <libavutil/avutil.h>
    #include <libavcodec/avfft.h>
}

The second trick is that to use the nice C++ array types std::vector or boost::multi_array, it is necessary to use a custom allocator class that calls the libav av_malloc() and av_free(). The reason is that these ensure the 32-byte alignment that libav assumes (without checking every time) when it uses SIMD instructions. Without a custom allocator, std::vector and boost::multi_array use the default std::allocator, which calls plain operator new and does not guarantee that alignment, and libav, in the process of ANDing addresses, ends up running past the end of the buffer and generating a segmentation fault.

The code below uses a custom allocator adapted from The C++ Standard Library -- a Tutorial and Reference.  The advantage of using std::vector or boost::multi_array, of course, is automated memory management using the Resource Acquisition Is Initialization pattern/idiom, similar to C++ auto_ptr (which can't be used for arrays because it is hard-coded to delete instead of delete[]).  Although std::vector is used below, the same allocator works equally well with boost::multi_array.


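#include <cstddef>
#include <limits>
#include <new>
#include <vector>

extern "C" {
#include <libavutil/avutil.h>
#include <libavutil/mem.h>
#include <libavcodec/avfft.h>
}

template <typename T> class allocator_av {
public:
    typedef T               value_type;
    typedef T*              pointer;
    typedef const T*        const_pointer;
    typedef T&              reference;
    typedef const T&        const_reference;
    typedef std::size_t     size_type;
    typedef std::ptrdiff_t  difference_type;

    template <typename U>
    struct rebind {
        typedef allocator_av<U> other;
    };

    allocator_av() {}
    template <typename U> allocator_av(const allocator_av<U>&) {}
    size_type max_size () const { return std::numeric_limits<size_type>::max() / sizeof(T); }

    // av_malloc() returns memory aligned the way libav's SIMD code expects
    pointer allocate (size_type num, const void* = 0) {
        pointer p = static_cast<pointer>(av_malloc(num*sizeof(T)));
        if (!p) throw std::bad_alloc();
        return p;
    }

    void construct (pointer p, const T& value) { new (static_cast<void*>(p)) T(value); }
    void destroy (pointer p) { p->~T(); }
    void deallocate (pointer p, size_type) { av_free(p); }
};

// All allocator_av instances are interchangeable
template <typename T, typename U>
bool operator==(const allocator_av<T>&, const allocator_av<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const allocator_av<T>&, const allocator_av<U>&) { return false; }

int main(int argc, char** argv) {
    std::vector<FFTComplex, allocator_av<FFTComplex> > z(256);  // 2^8 points, zeroed
    // ... fill z[i].re and z[i].im with samples ...
    FFTContext* c = av_fft_init(8, 0);  // nbits = 8 (256 points), forward FFT
    av_fft_permute(c, &z[0]);           // reorder input as av_fft_calc() expects
    av_fft_calc(c, &z[0]);              // in-place FFT
    av_fft_end(c);
    return 0;
}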
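If libav is not at hand, the same allocator pattern can be tried out with C++17's aligned operator new standing in for av_malloc()/av_free(). This is a sketch under that substitution, not the libav code above: aligned_allocator and is_aligned are hypothetical names, the 32-byte default mirrors the alignment discussed above, and a C++17 compiler is required.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Minimal allocator whose storage comes from C++17 aligned operator new,
// standing in for av_malloc()/av_free(). Align = 32 is an assumption.
template <typename T, std::size_t Align = 32>
struct aligned_allocator {
    typedef T value_type;
    // Explicit rebind is needed because Align is a non-type template parameter
    template <typename U> struct rebind { typedef aligned_allocator<U, Align> other; };

    aligned_allocator() = default;
    template <typename U> aligned_allocator(const aligned_allocator<U, Align>&) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t(Align)));
    }
    void deallocate(T* p, std::size_t) {
        ::operator delete(p, std::align_val_t(Align));
    }
};

template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) { return false; }

// True if p sits on an Align-byte boundary.
template <std::size_t Align>
bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % Align == 0;
}
```

A std::vector<double, aligned_allocator<double>> keeps its buffer 32-byte aligned even after it grows and reallocates, which is exactly the property the plain default allocator does not promise.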

Tuesday, August 28, 2012

Samsung 11.8 tablet key for signal processing

The Samsung 11.8, information about which came out of the Apple lawsuit, is a good match for in-the-field scientific/data acquisition/NDT applications.  Why?  Because it will be the first tablet with an ARM Cortex-A15 processor, specifically the Samsung Exynos 5.

It's just an ARM processor, right? ARM has extremely low power consumption, but mediocre computational performance, right? Wrong -- ARM has grown up. ARM has maintained its extremely low power consumption while catching up to Intel performance, and the Cortex-A15 is a huge step forward in that regard. It features NEON, the ARM equivalent of Intel MMX/SSE/AVX vector processing, which speeds up the signal and image processing used in scientific computing by multiples. The Cortex-A15's VFPv4 floating-point unit also handles double-precision, although NEON itself on ARMv7 is limited to single-precision floating point. Oddly, the current-generation Samsung Galaxy tablet, the 10.1, dropped NEON even though its even older predecessor had it. But the Samsung 11.8's Cortex-A15 NEON has 128-bit ALUs instead of the 64-bit ALUs that earlier ARM NEON processors had.

The Samsung 10.1 has thus far escaped injunction from Apple, so there is hope the 11.8 will as well.  The iPad won't get Cortex-A15 until the iPad 5.  There is hope that the Samsung 11.8 will be unveiled tomorrow at the Unpacked event in Berlin.

Monday, August 27, 2012

C++ regex source string cannot be a temporary

regex is new to C++11 and was available via Boost before then.  In playing around with it, I could not figure out why the following worked sometimes, but not all the time:

// Bad code
smatch sm1;
regex_search(string("helloworld.jpg"), sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;

Then, thanks to I learned that smatch holds only iterators into the source string "helloworld.jpg", not copies of the matched substrings. So when the temporary string("helloworld.jpg") is destroyed at the end of the regex_search() statement, the smatch iterators are left pointing into released memory, and reading through them is undefined behavior.

The correct code is:

// Good code
smatch sm1;
string s1("helloworld.jpg");
regex_search(s1, sm1, regex("(.*)jpg"));
cout << sm1[1] << endl;

Now, in real code you wouldn't pass a hard-coded temporary object, but during debug/development you might to test out how regex handles different scenarios, which is what I was trying to do when I ran into this.
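Here is a complete version of the good code, wrapped in a function. Since C++14, the standard library deletes the regex_search() overload that takes a std::string rvalue together with a match_results, so the bad code above no longer compiles under a C++14 compiler, which turns this pitfall into a compile-time error. stem_before_jpg is a hypothetical name for illustration.

```cpp
#include <regex>
#include <string>

// Return capture group 1 of "(.*)jpg" in filename, or "" if there is no match.
// filename outlives sm, so the sub_match iterators stay valid until str()
// copies the characters out.
std::string stem_before_jpg(const std::string& filename) {
    std::smatch sm;
    if (std::regex_search(filename, sm, std::regex("(.*)jpg")))
        return sm[1].str();
    return "";
}
```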

Thursday, June 21, 2012

W7 can finally rotate bitmap fonts

With HTML5 canvas and its built-in ability to rotate text, I learned the hard way about the limitations of bitmap fonts, such as MS Sans Serif, in XP. XP has trouble rotating bitmap fonts at small point sizes, while Windows 7 does not. So the following HTML5 canvas code works in Windows 7 & Firefox but fails under XP & Firefox:
  <script type="text/javascript">
   window.onload = function() {
    var ctx = document.getElementById('canvas').getContext('2d');
    ctx.font = "9px MS Sans Serif";
    ctx.fillText("TEST", 10, 10);
   };
  </script>
  <canvas id="canvas" width="800" height="500"></canvas>
Switching to a TrueType font such as Arial solves the problem.  So the lesson is to avoid bitmap fonts such as MS Sans Serif when using HTML5 Canvas.

This change in font handling for Windows 7 is probably related to Microsoft's revamping of font scaling to deal with high-resolution netbooks with tiny screens.

Sunday, June 17, 2012

HTML5 is the holy grail

I was initially excited about the UI design philosophy of Win8 Metro. But then I realized that HTML5 can do 95% of what Metro can, and also be truly cross-platform.

I see HTML5 as the cross-platform holy grail that developers have been seeking since the WORA ("write once, run anywhere") days of Java 15 years ago. First it was supposed to be Java, but Microsoft embraced and extinguished it, and besides, its download footprint was too big (and the download process clumsy to boot). Then Flash was supposed to be the universal small-footprint runtime. It was just about to take off when Apple extinguished it by not supporting it at all (completely skipping the "embracing" step). Then Microsoft finally decided to stop holding back .NET from web development -- the purpose for which it seemingly was originally designed but never delivered upon until Silverlight. But by then Windows market share was too small for Microsoft to force a Windows-only solution on the web world.

Even when it comes to CPU-intensive signal & image processing, Javascript seems to be "fast enough". E.g. these guys show real-time 2-D FFT. Admittedly, the combination of SSE/AVX and multi-threading/multi-core could have provided something like a 30x speedup, but I've been playing around with real-time 2D graphics in JavaScript and have been amazed at its performance. I was even guilty of premature optimization: I started out coding for double-buffering the graphics with two Canvases and ended up throwing out the double-buffering because with just one Canvas there was no flicker.

On today's processors, Javascript will be "fast enough" for many applications. E.g., I do scientific software for a living, and I'm partitioning the work into what has to be done natively -- mostly the acquisition and crunching of tens of gigabytes of data at a time -- vs. what can be done cross-platform -- the final post-processing of tens of megabytes of pre-processed data.

Javascript can even access hardware-accelerated 3D with WebGL.

HTML5 is a W3C standard. It's not Sun. It's not Adobe. It's not Microsoft. It's W3C.

HTML5 is the holy grail of WORA.

Wednesday, February 22, 2012

ProgramData virtualization

It's not enough to move data stores from Program Files to ProgramData when upgrading applications from XP to Vista/W7. The application installer must also set the permissions on those files to read/write for the group "Users". Otherwise, Vista/W7 will silently virtualize those files into per-user shadow copies under each user's profile (AppData\Local\VirtualStore).