Apache Spark ADO.NET Provider

Read, Write, and Update Spark from .NET Apps.

Easily connect Apache Spark SQL with BI, ETL, Reporting, and Custom Apps.



The Spark ADO.NET Data Provider enables users to easily connect to Spark data from .NET applications. Rapidly create and deploy powerful .NET applications that integrate with Apache Spark.

Features

  • Maps SQL to Spark SQL, enabling direct standard SQL-92 access to Apache Spark
  • Fully compatible with the Databricks Enterprise Platform
  • Connect to live Apache Spark SQL data, for real-time data access
  • Full support for data aggregation and complex JOINs in SQL queries
  • Secure connectivity through modern cryptography, including TLS 1.2, SHA-256, ECC, etc.
  • Seamless integration with leading BI, reporting, and ETL tools and with custom applications

Specifications

  • DataBind to Spark using standard Visual Studio wizards.
  • Comprehensive support for CRUD (Create, Read, Update, and Delete).
  • Supports ADO.NET Entity Framework (EF 5 & 6), LINQ to Datasets, etc.
  • Full Unicode support for data, parameters, and metadata.
  • Support for 32-bit and 64-bit operating systems.
  • Supports .NET Framework 4.0+ and .NET Standard 2.0 (.NET Core 2.1+, .NET 6.0).

Supported platforms: Windows, macOS, Unix


Where can I use the Apache Spark ADO.NET Provider?

ADO.NET Access to Apache Spark SQL

Full-featured and consistent SQL access to any supported data source through ADO.NET


  • Fully-managed .NET

    Fully managed ADO.NET libraries supporting .NET Standard, .NET Core 2.0, and Xamarin.

  • Developer Friendly

    Seamless integration with all versions of Visual Studio.

  • Powerful ADO.NET Features

    Including support for ADO.NET Entity Framework (EF 5 & 6), ADO.NET 2.0, LINQ to Datasets, etc.

  • Replication and Caching

    Our replication and caching commands make it easy to copy data to local and cloud data stores such as Oracle, SQL Server, Google Cloud SQL, etc. The replication commands include many features that allow for intelligent incremental updates to cached data.

  • String, Date, Numeric SQL Functions

    The driver includes a library of more than 50 functions that can manipulate column values into the desired result. Popular examples include Regex, JSON, and XML processing functions.

  • Collaborative Query Processing

    Our drivers extend the data source's capabilities with additional client-side processing, when needed, to enable analytic summaries of data such as SUM, AVG, MAX, MIN, etc.

  • Easily Customizable and Configurable

    The data model exposed by our ADO.NET Providers can easily be customized to add or remove tables/columns, change data types, etc. without requiring a new build. These customizations are supported at runtime using human-readable schema files that are easy to edit.

  • Secure Connectivity

    Includes standard enterprise-class security features such as TLS/SSL data encryption for all client-server communications.
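
To illustrate the idea behind the Collaborative Query Processing item above, the following is a minimal sketch of a client-side aggregate computed over rows that have already been fetched into a DataTable. It uses only the standard System.Data classes and is not the provider's internal implementation. The class name, the SampleOrders helper, and the Region and Amount columns are hypothetical and exist only for illustration.

```csharp
using System;
using System.Data;

public static class ClientSideAggregateSketch
{
    // Evaluates an aggregate expression such as "SUM(Amount)" over rows
    // that are already held in memory, optionally restricted by a filter.
    public static double Aggregate(DataTable table, string expression, string filter)
    {
        object result = table.Compute(expression, filter);
        // DataTable.Compute returns DBNull when no rows match the filter.
        return result is DBNull ? double.NaN : Convert.ToDouble(result);
    }

    // Builds a small in-memory table standing in for fetched rows (hypothetical data).
    public static DataTable SampleOrders()
    {
        var table = new DataTable("Orders");
        table.Columns.Add("Region", typeof(string));
        table.Columns.Add("Amount", typeof(double));
        table.Rows.Add("EU", 10.5);
        table.Rows.Add("EU", 4.5);
        table.Rows.Add("US", 20.0);
        return table;
    }
}
```

For example, ClientSideAggregateSketch.Aggregate(table, "AVG(Amount)", "Region = 'EU'") averages only the rows whose Region is EU, without sending another query to the server.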

Standard ADO.NET Access to Spark

The Apache Spark ADO.NET Provider offers the most natural way to access Spark data from any .NET application. Simply use Apache Spark Data Provider objects to connect and access data just as you would access any traditional database. You will be able to use the Apache Spark Data Provider through Visual Studio Server Explorer, in code through familiar classes, and in data controls like DataGridView, GridView, DataSet, etc.

The CData ADO.NET Provider for Apache Spark hides the complexity of accessing data and provides additional powerful security features, smart caching, batching, socket management, and more.

Working with DataAdapters, DataSets, DataTables, etc.

The Apache Spark Data Provider has the same ADO.NET architecture as the native .NET data providers for SQL Server and OLE DB, including: SparkConnection, SparkCommand, SparkDataAdapter, SparkDataReader, SparkDataSource, SparkParameter, etc. Because of this, you can access Spark data in an easy, familiar way.

For example:

// Requires "using System.Data;" for DataTable, plus the provider's own namespace.
using (SparkConnection conn = new SparkConnection("..."))
using (SparkCommand cmd = new SparkCommand("SELECT * FROM ApacheSpark", conn))
using (SparkDataAdapter adapter = new SparkDataAdapter(cmd)) {
	// Fill opens the connection if it is closed and closes it again afterward.
	DataTable table = new DataTable();
	adapter.Fill(table);
	...
}
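
The DataAdapter example fills a DataTable in one call. When rows should be streamed instead, a data reader is the usual alternative. The sketch below is written against the provider-neutral System.Data abstractions (DbConnection, DbCommand, IDataReader) and assumes that SparkConnection derives from DbConnection, as ADO.NET providers conventionally do. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.Common;

public static class ReaderSketch
{
    // Copies every row from any ADO.NET reader into a list of column-name/value maps.
    // Database NULLs are stored as DBNull.Value, exactly as the reader returns them.
    public static List<Dictionary<string, object>> ReadAll(IDataReader reader)
    {
        var rows = new List<Dictionary<string, object>>();
        while (reader.Read())
        {
            var row = new Dictionary<string, object>();
            for (int i = 0; i < reader.FieldCount; i++)
            {
                row[reader.GetName(i)] = reader.GetValue(i);
            }
            rows.Add(row);
        }
        return rows;
    }

    // Runs a query on any provider's connection and returns all rows.
    public static List<Dictionary<string, object>> Query(DbConnection connection, string sql)
    {
        if (connection.State != ConnectionState.Open)
        {
            connection.Open();
        }
        using (DbCommand command = connection.CreateCommand())
        {
            command.CommandText = sql;
            using (DbDataReader reader = command.ExecuteReader())
            {
                return ReadAll(reader);
            }
        }
    }
}
```

Under that assumption, ReaderSketch.Query(conn, "SELECT * FROM ApacheSpark") would work with the conn object from the example above. ReadAll can also be exercised without any database by passing it the reader returned by DataTable.CreateDataReader().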

More Than Read-Only: Full Update/CRUD Support

The Apache Spark Data Provider goes beyond read-only functionality to deliver full support for Create, Read, Update, and Delete (CRUD) operations. Your end users can interact with the data presented by the Apache Spark Data Provider as easily as interacting with a database table.

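using (SparkConnection connection = new SparkConnection(connectionString)) {
	// "Where" is a reserved SQL word, so the column name is escaped with square brackets.
	SparkDataAdapter dataAdapter = new SparkDataAdapter(
		"SELECT Id, [Where] FROM ApacheSpark", connection);

	dataAdapter.UpdateCommand = new SparkCommand(
		"UPDATE ApacheSpark SET [Where] = @Where " +
		"WHERE Id = @Id", connection);

	// Bind each parameter to its source column so that Update() sends the edited row's values.
	var whereParam = dataAdapter.UpdateCommand.CreateParameter();
	whereParam.ParameterName = "@Where";
	whereParam.SourceColumn = "Where";
	dataAdapter.UpdateCommand.Parameters.Add(whereParam);

	// The row is identified by the Id value it had when it was loaded.
	var idParam = dataAdapter.UpdateCommand.CreateParameter();
	idParam.ParameterName = "@Id";
	idParam.SourceColumn = "Id";
	idParam.SourceVersion = DataRowVersion.Original;
	dataAdapter.UpdateCommand.Parameters.Add(idParam);

	DataTable apacheSparkTable = new DataTable();
	dataAdapter.Fill(apacheSparkTable);

	DataRow firstRow = apacheSparkTable.Rows[0];
	firstRow["Where"] = "New Location";

	// Executes UpdateCommand once for each modified row.
	dataAdapter.Update(apacheSparkTable);
}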

ADO.NET Provider Performance

With traditional approaches to remote access, performance bottlenecks can spell disaster for applications. Whether an application is built for internal use, as a commercial product, or for the web or mobile devices, slow performance can rapidly lead to project failure. Accessing data from any remote source has the potential to create these problems. Common issues include:

  1. Network Connections - Slow network connections and latency issues are common in mobile applications.
  2. Service Delays - Delays due to service interruptions resulting from server hardware or software updates.
  3. Large Data - Intentional or unintentional requests for large amounts of data.
  4. Disconnects - Complete loss of network connectivity.

The CData ADO.NET Provider for Apache Spark addresses these issues with smart caching technology that can greatly improve performance and reduce application bottlenecks.

Smart Caching

Smart caching is a configurable option that works by storing queried data in a local database. Enabling smart caching creates a persistent local cache database that contains a replica of data retrieved from the remote source. The cache database is small, lightweight, and fast, and it can be shared by multiple connections as persistent storage.

Caching with our ADO.NET Providers is highly configurable, including options for:

  • Auto Cache - Maintain an automatic local cache of data on all requests. The provider will automatically load data into the cache database each time you execute a SELECT query. Each row returned by the query will be inserted or updated as necessary into the corresponding table in the cache database.
  • Explicit Cache - Cache only on demand. Developers decide exactly what data gets stored in the cache and when it is updated. Explicit caching provides full control over the cache contents by using explicit execution of CACHE statements.
  • No Cache - All requests access only live data and no local cache file is created.
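
As a rough sketch of how the three modes above might be selected, the helper below builds a connection string with the standard DbConnectionStringBuilder class. The property names AutoCache and CacheLocation are assumptions about the provider's connection settings and should be checked against the included help files. The CacheConfigSketch class and CacheMode enum are hypothetical.

```csharp
using System;
using System.Data.Common;

public static class CacheConfigSketch
{
    public enum CacheMode { None, Auto, Explicit }

    // Returns a copy of baseConnectionString with cache settings applied.
    // "AutoCache" and "CacheLocation" are assumed property names.
    public static string Build(string baseConnectionString, CacheMode mode, string cacheLocation)
    {
        var builder = new DbConnectionStringBuilder
        {
            ConnectionString = baseConnectionString
        };

        if (mode == CacheMode.Auto)
        {
            // Auto Cache: every SELECT also refreshes the local cache.
            builder["AutoCache"] = true;
        }

        if (mode != CacheMode.None)
        {
            // Auto and Explicit modes both need a place to keep the cache database.
            // In Explicit mode the application issues CACHE statements itself.
            builder["CacheLocation"] = cacheLocation;
        }

        // No Cache: the connection string is returned without cache settings.
        return builder.ConnectionString;
    }
}
```

Keeping the mode in one helper makes it straightforward to switch between live and cached access per environment without touching the query code.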

This powerful caching functionality increases application performance and allows applications to disconnect and continue limited functioning without writing code for additional local storage and/or data serialization/deserialization.

More information about ADO.NET Provider caching and best caching practices is available in the included help files.

Visual Studio Integration & Server Explorer

Working with the new Apache Spark ADO.NET Provider is easy. As a fully-managed .NET Data Provider, the Apache Spark Data Provider integrates seamlessly with the Visual Studio development environment as well as any .NET application.

As an ADO.NET Data Provider, the Apache Spark ADO.NET Provider can be used to access and explore Apache Spark SQL data directly from the Visual Studio Server Explorer.


It's easy. Because the provider is a standard ADO.NET adapter, developers can connect the Server Explorer to the Apache Spark ADO.NET Provider just as they would connect to any standard database.

  • Add a new Data Connection from the Server Explorer and select the Apache Spark SQL Data Source

  • Configure the basic connection properties to access your Apache Spark SQL account data.

Explore all of the data available! Apache Spark ADO.NET Provider makes it easy to access live Apache Spark SQL data from Visual Studio.

  • After configuring the connection, explore the feeds, views, and services provided by the Apache Spark SQL Data Source.
  • These constructs return live Apache Spark SQL data that developers can work with directly from within Visual Studio!

Developer Integration: Databind to Spark

Connecting Web, Desktop, and Mobile .NET applications with Apache Spark SQL is just like working with SQL Server. It is even possible to integrate Apache Spark ADO.NET Provider into applications without writing code.


Developers are free to access the Apache Spark ADO.NET Provider in whatever way they like best, either visually through the Visual Studio WinForms or Web Forms designers or directly through code.

  • Developers can connect the Apache Spark SQL Data Source directly to form components by configuring the object's smart tags.

  • Add a new Data Connection from the Server Explorer and select the Apache Spark SQL Data Source. Then, select the feed, view, or service you would like to connect the object to.

Done! It's just like connecting to SQL Server.

  • Once the object is bound to the data source, applications can easily interact with Apache Spark SQL data with full read/write (CRUD) support.