Understanding SmartArrays

From a programmer's point of view, SmartArrays is all of the following:

From a software architecture perspective, SmartArrays consists of an array engine and wrapper classes for using that engine with an object-oriented programming language. The SmartArrays array engine is packaged as a dynamic link library (DLL) under Windows or as a shared object library under Linux and Unix. It performs all the work of allocating memory to hold arrays, storing data in the arrays, and performing operations on those arrays.

The SmartArrays wrapper classes provide an object model for working with data. These classes are provided for different languages, with each version implementing the same set of array methods.

The different language interfaces all use the same array engine executable library.

The SmartArrays classes are consistent across different language wrappers. This means that code that uses SmartArrays objects is highly portable. Algorithms expressed as array operations are almost identical in all the supported languages. This allows a data-intensive application written in Java to be readily translated to C#. The only code changes needed are those compelled by syntactic differences of the languages themselves.

Because all the SmartArrays Developer Kits use exactly the same executable component, performance is consistent across languages: data-intensive algorithms deliver essentially identical performance no matter which language you choose. Debates about the speed differences between languages are therefore largely moot for applications based on SmartArrays.

Virtual Databases

Most applications of SmartArrays are built around the virtual database classes provided in the SmartArrays data management toolkit.  These classes provide a framework for structuring analytical data in a form that resembles the tables and columns of a relational database.  Other classes provide a variety of analytical data manipulations and calculations on bulk data.  Tools also exist to import bulk data from external databases into SmartArrays data objects.

Array Thinking

Most programmers are able to build SmartArrays-based programs quickly and master the syntax and the most commonly used array operations in a few days. But if you have not worked with an array-oriented approach before, you will probably find that it takes longer than that to become fluent in array-oriented thinking. This is to be expected -- you're stretching your mind around a new paradigm.

If you are old enough to remember the days before object-oriented programming, you will understand that it takes a little time for a new way of thinking about algorithms to settle in. When you learned your first object-oriented language, you probably had to make a conscious effort to express your ideas in an object model. With time, you probably found that thinking in objects became second nature and that you automatically tackled new problems by thinking in terms of classes and objects.

Array programming is a similar conceptual shift. As you learn the vocabulary of array operations and practice putting them to use, you will find that it gets easier and easier. You will visualize collections of data as arrays and algorithms as sequences of array operations. With practice, thinking in an array model will become second nature, like thinking in an object model. At this point you will find that you can develop fast, data-intensive programs very rapidly, writing a few lines of code in minutes that might have taken pages of code and days of work before SmartArrays.

Array Programming

The first and essential concept to grasp when programming with SmartArrays is that there is only one array class: SmArray. Every array is an instance of SmArray, no matter how large or small, how simple or complex its structure, or what kind of data it holds.

Class SmArray provides a large vocabulary of array methods. Most of these methods follow a consistent pattern of syntax: they operate on the array object, perhaps taking other arrays as arguments, and return a new array object as their result. 
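The pattern described above can be sketched in Java. This is a minimal illustration only: ToyArray, plus, times and toDoubles are hypothetical names invented for this sketch and are not the SmArray API. It shows methods that operate on the array object, take another array as an argument, and return a new array, which is what lets calls chain.

```java
import java.util.Arrays;

// Hypothetical stand-in for an array class; NOT the SmartArrays SmArray API.
final class ToyArray {
    private final double[] items;

    ToyArray(double... items) {
        this.items = items.clone();
    }

    // Element-wise sum; returns a new array and leaves both operands unchanged.
    // For brevity this sketch assumes both arrays have the same length.
    ToyArray plus(ToyArray other) {
        double[] out = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            out[i] = items[i] + other.items[i];
        }
        return new ToyArray(out);
    }

    // Multiplies every item by one scalar; again returns a new array.
    ToyArray times(double factor) {
        double[] out = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            out[i] = items[i] * factor;
        }
        return new ToyArray(out);
    }

    // Copies the items out into a plain Java array.
    double[] toDoubles() {
        return items.clone();
    }

    public static void main(String[] args) {
        ToyArray prices = new ToyArray(10.0, 20.0, 30.0);
        ToyArray shipping = new ToyArray(1.0, 2.0, 3.0);
        // Each call returns a new array, so calls chain naturally.
        ToyArray total = prices.plus(shipping).times(2.0);
        System.out.println(Arrays.toString(total.toDoubles()));
    }
}
```

Because every method returns a fresh array, the operands are never modified, and an expression such as prices.plus(shipping).times(2.0) reads as one calculation on whole arrays.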

The metadata about an array - its shape and type - is stored inside the array itself. This allows a great deal of flexibility in working with arrays. Your program can, for example, express a calculation as operations on numbers without concern for the internal representation. SmartArrays will automatically select an internal data type that holds the answer correctly. Similarly, SmartArrays automatically enforces array bounds and conformability, producing an appropriate exception instead of a more serious failure.
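To make the idea of conformability checking concrete, here is a small Java sketch. It is an assumption-laden illustration: the class Conformability, the method add, and the choice of IllegalArgumentException are all hypothetical, and the exception types SmartArrays actually raises are not described here. The point is only that a mismatch in shape is detected up front and reported as an exception.

```java
// Hypothetical illustration only; this is not SmartArrays code, and the
// exception type is an arbitrary choice made for the sketch.
final class Conformability {
    // Adds two vectors item by item, rejecting operands whose lengths differ.
    static double[] add(double[] left, double[] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException(
                "length mismatch: " + left.length + " vs " + right.length);
        }
        double[] out = new double[left.length];
        for (int i = 0; i < left.length; i++) {
            out[i] = left[i] + right[i];
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            add(new double[] {1, 2, 3}, new double[] {1, 2});
        } catch (IllegalArgumentException e) {
            // The mismatch is reported cleanly instead of corrupting memory.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```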

Because all SmartArrays objects are of the same class, any array can in principle interoperate with any other array. This provides one of the key benefits to working with SmartArrays - the ability to take data from different sources, put it all on an equal footing, and make it work together.

The data management toolkit classes are all built using SmArray objects as their data containers.

Array Data

SmartArrays defines exactly three conceptual data types - number, character, and string - though it uses a wider variety of internal data representations. As far as array operations are concerned, you do not need to think about data representation. Instead, you can concentrate on the computation being performed and let SmartArrays select the appropriate internal representation. Most programming languages and databases require a numeric variable to be declared as boolean, integer or real; with SmartArrays, a number is a number. If a number is integral and not too large, it is stored as a 32-bit integer. If a value computed in an array is too large to fit in an integer field, or if it needs a fractional part, SmartArrays automatically changes the internal representation to a real number. Nothing in your program changes. This has profound implications should you ever move to a machine architecture with different machine data types (like the emerging 64-bit architectures). Only when you move data between a SmartArray and your traditional program do you need to worry about data types, because you will have declared that data in C++ or Java as one of the language's strict data types.
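The automatic switch from integer to real can be imitated in plain Java. The sketch below is not how the SmartArrays engine is implemented; NumberPromotion and add are hypothetical names, and the approach (try exact 32-bit arithmetic, fall back to a double on overflow) is simply one way to show the behavior the caller sees.

```java
// Hypothetical sketch of "a number is a number"; not SmartArrays internals.
final class NumberPromotion {
    // Adds two 32-bit integers. The result is an Integer when the sum fits
    // in 32 bits, and a Double when it does not.
    static Number add(int a, int b) {
        try {
            return Math.addExact(a, b); // throws ArithmeticException on overflow
        } catch (ArithmeticException overflow) {
            return (double) a + (double) b; // promote to a real number
        }
    }

    public static void main(String[] args) {
        Number small = add(1, 2);
        Number large = add(Integer.MAX_VALUE, 1);
        System.out.println(small.getClass().getSimpleName() + " " + small);
        System.out.println(large.getClass().getSimpleName() + " " + large);
    }
}
```

The calling code is the same in both cases; only the representation of the result differs.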

The methods of SmartArrays are all array oriented. You don't build an array just so you can iterate through its items; you apply operations to whole arrays, generating a whole array as the result. You write lines of code in your native language as though it were an array processing language. Thus, SmartArrays aren't designed merely to hold data; they are designed to crunch data.
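The difference between iterating through items and operating on a whole array can be seen even in standard Java. The sketch below uses java.util.Arrays streams as a rough analogue, not SmartArrays itself; WholeArrayStyle, meanAboveLoop and meanAbove are hypothetical names. Both methods compute the mean of the values above a threshold.

```java
import java.util.Arrays;

// Analogy using standard Java streams; this is not SmartArrays code.
final class WholeArrayStyle {
    // Item-at-a-time style: visit each item and accumulate by hand.
    static double meanAboveLoop(double[] values, double threshold) {
        double total = 0;
        int count = 0;
        for (double v : values) {
            if (v > threshold) {
                total += v;
                count++;
            }
        }
        return count == 0 ? Double.NaN : total / count;
    }

    // Whole-collection style: select, then average, with no explicit loop.
    static double meanAbove(double[] values, double threshold) {
        return Arrays.stream(values)
                     .filter(v -> v > threshold)
                     .average()
                     .orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double[] values = {2, 4, 6, 8};
        System.out.println(meanAboveLoop(values, 3));
        System.out.println(meanAbove(values, 3));
    }
}
```

The second form states what is computed (a selection followed by an average) and leaves the iteration to the library, which is the habit the array-oriented approach encourages.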

Often a SmartArray will contain all items of the same type - all numbers or all strings. However, this is not required. You can have an array that contains numbers in some positions and strings in other positions. For example, a relational table can be represented as a matrix where each column contains either all numbers or all strings.
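A mixed-type matrix of this kind can be pictured with a plain Java Object[][]. This is only an analogy under stated assumptions: MixedTable, ROWS and sumColumn are hypothetical names, the sample rows are made up for illustration, and nothing here reflects how SmartArrays stores such data.

```java
// Hypothetical analogy in plain Java; not the SmartArrays representation.
final class MixedTable {
    // Illustrative rows: column 0 holds strings, column 1 holds numbers.
    static final Object[][] ROWS = {
        {"widget", 4},
        {"gadget", 7},
        {"gizmo", 2},
    };

    // Sums one column, assuming every item in that column is a Number.
    static double sumColumn(Object[][] rows, int column) {
        double total = 0;
        for (Object[] row : rows) {
            total += ((Number) row[column]).doubleValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumColumn(ROWS, 1));
    }
}
```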

Beyond this, a SmartArray may contain other SmartArrays. Such an array is said to be nested. Again, the value of a nested SmartArray is not just that it contains other SmartArrays but that you can compute on the array. For example, you can take a SmartArray method designed to operate on a whole array and instead apply it to each item of an array. This is the multi-dimensional analogue of control structures or iterators in your traditional code. The number of times the method is applied depends on the number of items in the array, not on controls the programmer explicitly puts into the program. This can lead to very compact programs that express a mathematical computation in not many more characters than the mathematics itself.
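Applying a whole-array operation to each item of a nested array can be sketched in standard Java as follows. EachDemo, sum and each are hypothetical names chosen for this illustration, and a List of int[] stands in for a nested array; the corresponding SmartArrays method is not shown here.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch in standard Java; not the SmartArrays "each" mechanism.
final class EachDemo {
    // A whole-array operation: the sum of one vector.
    static int sum(int[] vector) {
        return Arrays.stream(vector).sum();
    }

    // Applies a whole-array operation to each item of a nested array.
    // The number of applications is set by the data, not by the caller.
    static int[] each(List<int[]> nested, ToIntFunction<int[]> operation) {
        return nested.stream().mapToInt(operation).toArray();
    }

    public static void main(String[] args) {
        List<int[]> nested = Arrays.asList(
            new int[] {1, 2, 3},
            new int[] {10, 20},
            new int[] {});
        int[] sums = each(nested, EachDemo::sum);
        System.out.println(Arrays.toString(sums));
    }
}
```

The caller never writes a loop over the nested items: adding or removing inner arrays changes how many times sum runs without any change to the program.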

Thus, well-written SmartArrays programs can be quite concise. This means that programmers are more productive and, perhaps more important, that the programs they produce are more maintainable. Less code is good!