For Really, Really, Big Data

Applications can now exploit all the memory available on large-scale servers, up to and beyond the terabyte range.


Image: NASA/ESA

 SmartArrays Release 5

64-Bit SmartArrays

Earlier versions of SmartArrays were constrained by the limits of 32-bit addressing, which prevented SmartArrays from holding more than roughly 1.5 gigabytes of data in memory at once. These limits have forced developers to adopt creative workarounds, such as partitioning data into segments, which complicates code and reduces performance. 64-bit SmartArrays essentially eliminates the address space constraint, allowing applications to fully exploit today’s large-memory servers and workstations with 64-bit operating systems. Applications can now exploit all the memory available on large-scale servers, up to and beyond the terabyte range.

64-bit SmartArrays requires a 64-bit operating system that uses 64-bit memory addresses. As in prior releases, the 32-bit version can be used in a 64-bit OS if it is incorporated in a 32-bit application.

Both 32-bit and 64-Bit Versions Included

Release 5 of the SmartArrays SDK ships with both 32-bit and 64-bit versions. Developers can select the appropriate library for the address model of the application being developed. The SmartArrays engine (native code library) comes in 32-bit and 64-bit variants, which must be matched with the appropriate .NET or Java wrapper library.

The SmartArrays engines, used with C++, Java, and .NET, are found in the \bin install directory:

  • com_smartarrays_engineV5.dll for 32-bit applications.
  • com_smartarrays_engineV5_x64.dll for 64-bit applications.

The SmartArrays wrapper class libraries for .NET are found in the \bin install directory:

  • SmartArraysV5.dll for 32-bit applications.
  • SmartArraysV5_x64.dll for 64-bit applications.

The SmartArrays wrapper class libraries for Java are found in the \lib install directory:

  • SmartArraysV5.jar for 32-bit applications.
  • SmartArraysV5_x64.jar for 64-bit applications.
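
The naming pattern in the lists above can be captured in a small helper. The following is only an illustrative sketch: the class LibraryNames and its methods are hypothetical and are not part of the SmartArrays SDK. Only the file names themselves come from the lists above.

```java
// Hypothetical helper (not part of the SmartArrays SDK): maps an address
// model to the Release 5 file names listed above.
public class LibraryNames {

    // The 64-bit files carry an "_x64" suffix; the 32-bit files carry none.
    static String suffix(boolean is64Bit) {
        return is64Bit ? "_x64" : "";
    }

    // Native engine, found in the \bin install directory.
    static String engine(boolean is64Bit) {
        return "com_smartarrays_engineV5" + suffix(is64Bit) + ".dll";
    }

    // .NET wrapper class library, found in the \bin install directory.
    static String dotNetWrapper(boolean is64Bit) {
        return "SmartArraysV5" + suffix(is64Bit) + ".dll";
    }

    // Java wrapper class library, found in the \lib install directory.
    static String javaWrapper(boolean is64Bit) {
        return "SmartArraysV5" + suffix(is64Bit) + ".jar";
    }

    public static void main(String[] args) {
        for (boolean is64Bit : new boolean[] {false, true}) {
            System.out.println((is64Bit ? "64-bit: " : "32-bit: ")
                    + engine(is64Bit) + ", "
                    + dotNetWrapper(is64Bit) + ", "
                    + javaWrapper(is64Bit));
        }
    }
}
```

Whichever variant is chosen, the engine and the wrapper must come from the same column of that mapping.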

Upgrading Your Applications to 64-Bit

The 32-bit and 64-bit versions of SmartArrays are essentially 100% compatible except for the limits on array size. The format of arrays and the behavior of array operations are identical, and upgrading an existing application is usually just a matter of linking to the 64-bit version and recompiling (for .NET) or referencing the 64-bit libraries (for Java).

The data formats of saved SmartArrays arrays and SmData table and column objects remain the same. Data that was created by earlier releases of 32-bit SmartArrays can be loaded and used by both the 32-bit and 64-bit variants of Release 5.

Array Sizes and Limits in 64-bit

The 64-bit version of SmartArrays continues to use a 32-bit integer data type, which means that arrays can be indexed or manipulated with 32-bit integers as in prior releases. This approach provides binary compatibility with file-based SmartArrays data and conforms to the behavior of 64-bit Java and 64-bit .NET, both of which also keep the same size limits as their 32-bit counterparts. Note, however, that 64-bit SmartArrays handles larger arrays than either Java or .NET because SmartArrays uses its own internal memory management. For example, the largest possible double[] in either Java or .NET is 2 billion bytes (or about 268 million values), while the largest SmArray of type dtDouble is 2 billion values, or 16 billion bytes.
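
The arithmetic behind these figures can be checked with a few lines of Java. This is a sketch under stated assumptions: the class ArrayLimits and its method names are hypothetical, and the element widths (2 bytes for dtChar, 4 for dtInt, 8 for dtDouble) are inferred from the sizes quoted in this document rather than taken from the SmartArrays API.

```java
// Hypothetical worked example: size limits implied by a 32-bit element index.
public class ArrayLimits {

    // Largest element count addressable with a signed 32-bit index.
    static final long MAX_ELEMENTS = Integer.MAX_VALUE; // 2,147,483,647

    // Bytes needed for a maximal array whose elements are bytesPerElement wide.
    static long maxBytes(int bytesPerElement) {
        return MAX_ELEMENTS * bytesPerElement;
    }

    // Number of whole values that fit in a fixed byte budget, which is how
    // the text above describes the native double[] limit (2 billion bytes).
    static long valuesInByteBudget(long byteBudget, int bytesPerElement) {
        return byteBudget / bytesPerElement;
    }

    public static void main(String[] args) {
        System.out.println("dtChar   (2 bytes): " + maxBytes(2) + " bytes");
        System.out.println("dtInt    (4 bytes): " + maxBytes(4) + " bytes");
        System.out.println("dtDouble (8 bytes): " + maxBytes(8) + " bytes");
        System.out.println("doubles in a 2-billion-byte array: "
                + valuesInByteBudget(Integer.MAX_VALUE, 8));
    }
}
```

The last line works out to roughly 268 million values, against roughly 2 billion values for a dtDouble SmArray, which is the factor-of-eight difference described above.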

Comparison: 64 versus 32 Bit SmartArrays

The table below compares the size limits of a single array in the 32-bit and 64-bit versions of SmartArrays. For context, it also shows the maximum size of native int[] and double[] arrays on the Java and .NET platforms.

Attribute                                  32 Bit                   64 Bit
Maximum memory used                        2GB (note 1)             8TB (note 2)
Maximum array dimension                    2,147,483,647            2,147,483,647
Maximum array memory size:
  dtBoolean                                268.4 MB                 268.4 MB
  dtByte                                   2048 MB                  2048 MB
  dtChar (2-byte)                          1024 MB                  4096 MB
  dtInt                                    512 MB                   8192 MB
  dtDouble                                 256 MB                   16384 MB
  dtMixed                                  107 MB                   40 GB
  dtString                                 512 MB (note 3)          8192 MB (note 3)
  dtNested                                 address space (note 4)   address space (note 4)
Maximum size of a native int[] array:
  Java platform                            512 MB                   512 MB
  .NET platform                            512 MB                   512 MB
Maximum size of a native double[] array:
  Java platform                            256 MB                   256 MB
  .NET platform                            256 MB                   256 MB

Notes:

1. In Windows Server 32-bit it is possible to increase the process address space to 3GB under certain circumstances.

2. This is the architectural limit of the 64-bit Windows memory model. Some versions have smaller limits. See http://msdn.microsoft.com/en-us/library/aa366778.aspx for details. Currently Windows cannot support more than 2TB of physical memory on servers and 128GB on workstations. Limits for Linux/Unix kernels are similar. No processor currently manufactured can decode addresses of more than 48 bits (256TB).

3. Size of the string array itself, excluding the space occupied by the strings in the string table.

4. A tree-structure array can contain an arbitrary number of arrays at its leaves; therefore the limit is the amount of virtual memory that SmartArrays can obtain from the operating system.

Memory Mapping Very Large Files

SmartArrays can memory map portions of arbitrarily large flat files in both the 32-bit and 64-bit versions. However, the maximum size of the region that can be mapped to a single array is limited by the addressable range. With the 32-bit engine, you can map up to 2GB of memory at any location in the file, or you can have any number of separate arrays mapped to different parts of the file at once, provided that the total amount of virtual address space used does not exceed the OS process limit of 2GB. With the 64-bit version you can map up to 2G data values (i.e. 16GB of doubles or 8GB of ints) at any location in the file, and any number of separate arrays can be mapped concurrently, limited only by the virtual memory available from the operating system.

You can specify the offset parameter to SmArray.fileArray() as an integer-valued double to map a region that begins more than 2 billion bytes into a large file.
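
An integer-valued double works for this purpose because a double represents every whole number up to 2^53 exactly, far beyond the range of a 32-bit int. The sketch below shows one way such an offset might be computed when a large file of fixed-width values is processed in windows. The class FileOffsets and its method are hypothetical. The call to SmArray.fileArray() itself is deliberately not shown, since its full parameter list is not given here.

```java
// Hypothetical helper: computes byte offsets into a large flat file as
// integer-valued doubles, suitable for passing as an offset argument.
public class FileOffsets {

    // Whole numbers up to 2^53 are exactly representable as a double.
    static final long MAX_EXACT = 1L << 53;

    // Byte offset at which window number windowIndex (0-based) begins,
    // when each window holds valuesPerWindow values of bytesPerValue bytes.
    static double windowOffsetBytes(long windowIndex, long valuesPerWindow, int bytesPerValue) {
        if (windowIndex < 0 || valuesPerWindow <= 0 || bytesPerValue <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        if (valuesPerWindow > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("a single array holds at most 2G values");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        long offset = Math.multiplyExact(
                Math.multiplyExact(windowIndex, valuesPerWindow), (long) bytesPerValue);
        if (offset > MAX_EXACT) {
            throw new IllegalArgumentException("offset not exactly representable as a double");
        }
        return (double) offset;
    }

    public static void main(String[] args) {
        // Fourth window (index 3) of one billion doubles each.
        double offset = windowOffsetBytes(3, 1_000_000_000L, 8);
        System.out.printf("offset = %.0f bytes%n", offset);
    }
}
```

In this example the offset is 24 billion bytes, well past the 2-billion-byte point where a 32-bit int would overflow.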

Developing and Deploying 64-Bit ASP.NET Web Applications

We recommend the use of IIS (Internet Information Services) version 7 or newer when deploying 64-bit ASP.NET applications because it allows 64-bit and 32-bit application pools to be used simultaneously. It is necessary to create an application pool, and set Enable 32-Bit Applications to False. Then your 64-bit web application can be added to this application pool.

Developing and debugging an ASP.NET web application that targets 64-bit processes can be a challenge. The ASP.NET test web server program is a handy way to debug web applications within Visual Studio, but it is only available as a 32-bit program and therefore cannot host a 64-bit ASP.NET project. One approach is to develop and debug using 32-bit IIS, then move the application to IIS on a 64-bit operating system. However, we are investigating other tools that can assist in this effort and will post notes on our web site on the ones that work well.

Running 64-Bit SmartArrays with Java

SmartArrays provides two separate .jar files of wrapper classes for using SmartArrays with Java:

  • SmartArraysV5.jar is the Java wrapper for the 32-bit SmartArrays engine.
  • SmartArraysV5_x64.jar is the Java wrapper for the 64-bit SmartArrays engine. Despite the “x64” in the name, this is the correct Java library to use with any 64-bit processor supported by SmartArrays, such as SPARC or Itanium.

Internally, there is no difference between these wrapper libraries except that one loads the 32-bit SmartArrays engine and the other loads the 64-bit engine. You will need to use a 32-bit or 64-bit JVM (Java Virtual Machine) that matches the SmartArrays engine. It is not possible to use the 32-bit engine with the 64-bit .jar file or vice versa.

The Java command line launcher java.exe accepts the -d32 and -d64 parameters and loads the appropriate JVM. Your code can tell whether it is running under 32-bit or 64-bit by examining the value of the system property sun.arch.data.model.
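
A minimal sketch of that check follows. The class JvmDataModel is hypothetical. Only System.getProperty and the property name sun.arch.data.model are taken as given. The property is specific to certain JVMs and may be absent on others, so the sketch treats any value other than "32" or "64" as unknown.

```java
// Hypothetical helper: reports whether the running JVM is 32-bit or 64-bit.
public class JvmDataModel {

    // Interprets a raw property value; returns 0 when it is missing or unrecognized.
    static int parseDataModel(String value) {
        if ("32".equals(value)) {
            return 32;
        }
        if ("64".equals(value)) {
            return 64;
        }
        return 0;
    }

    // Reads the property from the running JVM.
    static int detect() {
        return parseDataModel(System.getProperty("sun.arch.data.model"));
    }

    public static void main(String[] args) {
        int bits = detect();
        if (bits == 0) {
            System.out.println("Data model unknown; sun.arch.data.model is not set on this JVM.");
        } else {
            System.out.println("Running on a " + bits + "-bit JVM.");
        }
    }
}
```

An application could run this check at startup and report a clear error if the JVM does not match the SmartArrays .jar file it was deployed with.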

Additional useful information on Java platforms and general JVM tuning can be obtained here: http://www.oracle.com/technetwork/java/hotspotfaq-138619.html