How to Access HDFS Data Using Entity Framework
This article shows how to access HDFS data using an Entity Framework code-first approach. Entity Framework 6 is available in .NET 4.5 and above.
Microsoft Entity Framework serves as an object-relational mapping framework for working with data represented as objects. Although Visual Studio offers the ADO.NET Entity Data Model wizard to automatically generate the Entity Model, this model-first approach may present challenges when your data source undergoes changes or when you require greater control over entity operations. In this article, we will delve into the code-first approach for accessing HDFS data through the CData ADO.NET Provider, providing you with more flexibility and control.
- Open Visual Studio and create a new Windows Forms Application. This article uses a C# project with .NET 4.5.
- Run the command 'Install-Package EntityFramework' in the Package Manager Console in Visual Studio to install the latest release of Entity Framework.
- Modify the App.config file in the project to add a reference to the HDFS Entity Framework 6 assembly and the connection string.
In order to authenticate, set the following connection properties:
- Host: Set this value to the host of your HDFS installation.
- Port: Set this value to the port of your HDFS installation. Default port: 50070
    <configuration>
      ...
      <connectionStrings>
        <add name="HDFSContext" connectionString="Offline=False;Host=sandbox-hdp.hortonworks.com;Port=50070;Path=/user/root;User=root;" providerName="System.Data.CData.HDFS" />
      </connectionStrings>
      <entityFramework>
        <providers>
          ...
          <provider invariantName="System.Data.CData.HDFS" type="System.Data.CData.HDFS.HDFSProviderServices, System.Data.CData.HDFS.Entities.EF6" />
        </providers>
      </entityFramework>
    </configuration>
- Add a reference to System.Data.CData.HDFS.Entities.EF6.dll, located in the lib -> 4.0 subfolder in the installation directory.
- Build the project at this point to ensure everything is working correctly. Once that's done, you can start coding using Entity Framework.
- Add a new .cs file to the project and add a class to it. This will be your database context, and it will extend the DbContext class. In the example, this class is named HDFSContext. The following code example overrides the OnModelCreating method to make the following changes:
- Remove PluralizingTableNameConvention from the ModelBuilder Conventions.
- Remove requests to the MigrationHistory table.
    using System.Data.Entity;
    using System.Data.Entity.Infrastructure;
    using System.Data.Entity.ModelConfiguration.Conventions;

    class HDFSContext : DbContext
    {
        public HDFSContext() { }

        protected override void OnModelCreating(DbModelBuilder modelBuilder)
        {
            // To remove the requests to the Migration History table
            Database.SetInitializer<HDFSContext>(null);
            // To remove the plural names
            modelBuilder.Conventions.Remove<PluralizingTableNameConvention>();
        }
    }
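The parameterless constructor above relies on an Entity Framework 6 convention: the context class name (HDFSContext) is matched against the name of a connection string in App.config. If you prefer to make that link explicit, one possible variant is to pass the name to the base constructor. This is only a sketch of the constructor; it assumes the connection string is named "HDFSContext" as in the configuration shown earlier, and it omits the OnModelCreating override, which stays unchanged.

```csharp
using System.Data.Entity;

class HDFSContext : DbContext
{
    // "name=HDFSContext" asks Entity Framework to look up the connection
    // string with that name in App.config and to fail if it is missing,
    // instead of falling back to a default connection.
    public HDFSContext() : base("name=HDFSContext") { }

    // The OnModelCreating override from the example above goes here, unchanged.
}
```

With this form, a misspelled or missing connection string name surfaces as an exception when the context is first used rather than as an unexpected default connection.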
- Create another .cs file and name it after the HDFS entity you are retrieving, for example, Files. In this file, define both the Entity and the Entity Configuration, which will resemble the example below:
    using System.Data.Entity.ModelConfiguration;
    using System.ComponentModel.DataAnnotations.Schema;

    [System.ComponentModel.DataAnnotations.Schema.Table("Files")]
    public class Files
    {
        [System.ComponentModel.DataAnnotations.Key]
        public System.String FileId { get; set; }
        public System.String ChildrenNum { get; set; }
    }
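The entity above is mapped with data annotations. If you would rather keep the mapping in a separate configuration class, Entity Framework 6 also offers the fluent API through EntityTypeConfiguration. The sketch below is one way this might look; the class name FilesMap is an assumption made for illustration, and the column names simply mirror the property names of the Files entity.

```csharp
using System.Data.Entity.ModelConfiguration;

// Hypothetical configuration class for the Files entity defined above.
// The name FilesMap is illustrative and not required by the provider.
public class FilesMap : EntityTypeConfiguration<Files>
{
    public FilesMap()
    {
        // Map the entity to the Files table and declare its key.
        ToTable("Files");
        HasKey(f => f.FileId);

        // Map each property to a column of the same name.
        Property(f => f.FileId).HasColumnName("FileId");
        Property(f => f.ChildrenNum).HasColumnName("ChildrenNum");
    }
}
```

To take effect, a configuration class like this has to be registered in the context, for example with modelBuilder.Configurations.Add(new FilesMap()); inside OnModelCreating. When the fluent mapping is used, the Table and Key attributes on the entity become redundant.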
- Now that you have created an entity, add the entity to your context class:
public DbSet<Files> Files { set; get; }
- With the context and entity finished, you are now ready to query the data in a separate class. For example:
    HDFSContext context = new HDFSContext();
    context.Configuration.UseDatabaseNullSemantics = true;
    var query = from line in context.Files select line;
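Note that a LINQ query like the one above is deferred: nothing is requested from HDFS until the query is enumerated. The following sketch shows one way to execute it and release the context afterwards. It assumes the HDFSContext and Files classes from the earlier steps, including the Files property on the context; the FileLister class and its GetFiles method are hypothetical names used only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; relies on the HDFSContext and Files
// classes defined in the earlier steps of this article.
static class FileLister
{
    public static List<Files> GetFiles()
    {
        // The using block disposes the context and its connection when done.
        using (var context = new HDFSContext())
        {
            context.Configuration.UseDatabaseNullSemantics = true;

            // Building the query does not contact HDFS yet.
            var query = from line in context.Files select line;

            // ToList() enumerates the query, which is when the data is fetched.
            return query.ToList();
        }
    }
}
```

In the Windows Forms project, the returned list could then be bound to a control, for example by assigning it to the DataSource property of a DataGridView.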