How to Run SQL Queries Across Servers and Access Multiple Databases
Organizations often maintain data across various servers and databases, each serving distinct functions or located in different geographical locations. Accessing and analyzing data stored in these distributed databases is crucial for decision-making, data analysis, and reporting. This article explores various methods for executing SQL across multiple servers, providing step-by-step instructions for each. Through a deeper understanding of SQL query execution across servers, you can optimize data management practices and unlock valuable insights from your distributed database environments.
5 methods for running SQL queries across servers
There are various tools and methods available for executing SQL queries across different servers, each with its own advantages and use cases. This section explores some of the most prominent approaches to multi-server query execution.
1. Using linked servers
Linked servers are a feature in database management systems (DBMS) that allow connections between different database servers. These servers can be of the same type (e.g., two SQL Server instances) or different types (e.g., SQL Server and Oracle).
By setting up linked servers, you can execute queries that reference tables and data residing on remote servers as if they were local objects. This capability enables distributed querying, where data from various sources can be combined and analyzed within a single query, simplifying data integration and reporting processes. Linked servers provide a convenient way to access and manipulate data distributed across a network, eliminating the need to manually transfer data between servers.
Setting up a linked server
Creating and managing linked servers in SQL Server involves several steps. Below is a step-by-step guide:
- Open SQL Server Management Studio (SSMS):
- Launch SSMS and connect to the SQL Server instance where you want to create the linked server.
- Navigate to "Server Objects":
- Expand the server in Object Explorer, then expand "Server Objects" to reveal the "Linked Servers" node.
- Right-click on "Linked Servers" and select "New Linked Server...":
- This will open the "New Linked Server" dialog box.
- Provide linked server details:
- In the "General" tab of the "New Linked Server" dialog box, enter the following details:
- Linked server: Name for the linked server.
- Server type: Select the type of the linked server (e.g., SQL Server, Oracle, etc.).
- Provider: Choose the appropriate OLE DB provider for the linked server type.
- Data source: The network name or IP address of the linked server.
- Catalog: (Optional) The default database to connect to on the linked server.
- Configure security options:
- In the "Security" tab, specify how SQL Server will connect to the linked server.
- Choose the appropriate security context:
- Be made using the login's current security context: Use the credentials of the currently logged-in user.
- Be made using this security context: Specify a remote login and password for connecting to the linked server.
- Set up server options:
- In the "Server Options" tab, configure additional options such as RPC (Remote Procedure Call) and RPC Out settings based on your requirements.
- Test the connection:
- Click on the "Test Connection" button to ensure that SQL Server can establish a connection to the linked server successfully.
- Click "OK" to create the linked server:
- Once you have configured all the necessary settings, click "OK" to create the linked server. It will now appear under "Linked Servers" in Object Explorer.
- Manage linked server properties:
- You can right-click on the newly created linked server and select "Properties" to modify its settings, including security, server options, and provider-specific properties.
- Access objects on the linked server:
- Once the linked server is created, you can reference its objects (tables, views, stored procedures, etc.) in SQL queries using four-part naming convention:
[linked_server_name].[database_name].[schema_name].[object_name].
With the linked server configured, you can execute joins such as:
SELECT * FROM [LinkedServerName].[RemoteDatabaseName].[dbo].[RemoteTable] AS R JOIN [LocalTableName] AS L ON R.CommonColumn = L.CommonColumn;
If you would like to set up linked servers to hundreds of data sources beyond those natively supported by SQL Server, see our article on creating linked servers to external data sources with Connect Cloud.
2. Using PolyBase
PolyBase is a powerful technology integrated into SQL Server that enables querying and analyzing data stored in external sources seamlessly. PolyBase supports querying external SQL Server, Teradata, Oracle, and Hadoop instances, among others, as if they were a local SQL database, meaning you can execute joins between local and external databases. Since PolyBase can communicate directly with external databases, the data in the external databases stays in its original format and location.
When a query is submitted to SQL Server involving external data sources, PolyBase's query optimizer analyzes the query and generates an optimized execution plan. It then moves only the necessary data between external sources and SQL Server, minimizing network overhead and optimizing data transfer. Once data is accessed from external sources, PolyBase processes the query using a combination of SQL Server's relational engine and distributed query processing capabilities.
PolyBase's versatility and scalability make it a valuable tool for organizations seeking to unlock insights from perse data sources while leveraging the familiar SQL Server environment.
Setting up PolyBase and executing a query
- Verify PolyBase installation: Ensure that the PolyBase feature is installed and configured in your SQL Server instance. You can check this during SQL Server installation or by modifying the feature set through SQL Server Configuration Manager.
- Configure external data sources: Use SQL Server Management Studio (SSMS) or Transact-SQL (T-SQL) to create external data sources pointing to the external systems you want to query, such as Hadoop or Azure Blob storage. For example:
CREATE EXTERNAL DATA SOURCE MyHadoopCluster
WITH (LOCATION = 'hdfs://', CREDENTIAL = HadoopCredentials);
- Define external tables: Create external tables in SQL Server that map to the data in the external sources. This step defines the schema and metadata for the external data. For example:
CREATE EXTERNAL TABLE dbo.MyExternalTable
(
Column1 INT,
Column2 VARCHAR(50)
)
WITH
(
LOCATION = '/path/to/external/data',
DATA_SOURCE = MyHadoopCluster
);
- Set up credentials: If accessing secured external data sources, create and configure credentials to authenticate with those sources. For example:
CREATE DATABASE SCOPED CREDENTIAL HadoopCredentials
WITH IDENTITY = 'username',
SECRET = 'password';
- Write the query: Compose a T-SQL query that references the external tables defined in SQL Server along with any native SQL Server tables you wish to query. For example:
SELECT * FROM dbo.NativeTable nt
INNER JOIN dbo.MyExternalTable et ON nt.CommonColumn = et.CommonColumn;
- Execute the query: Use SSMS or any SQL client to execute the PolyBase query against SQL Server. Ensure that the SQL Server instance has sufficient permissions to access the external data sources and execute the query.
3. Using OPENQUERY
OPENQUERY is a SQL Server query function used to execute a pass-through query on a linked server. It allows you to send a query to a remote database server and retrieve the results directly into the local SQL Server instance. OPENQUERY is particularly useful when you need to execute a query that includes functions or features not supported by the local server but are available on the linked server.
First, define a linked server to the remote SQL Server instance:
EXEC sp_addlinkedserver 'RemoteServer', 'SQL Server';
Here, “RemoteServer” is the name of the linked server to create and “SQL Server” specifies the type of server being linked.
Next, execute a SELECT query on the linked server using OPENQUERY:
SELECT * FROM OPENQUERY(RemoteServer, 'SELECT * FROM RemoteDatabase.dbo.RemoteTable');
4. Using OPENROWSET
OPENROWSET is a function in SQL Server that allows you to execute ad hoc queries against OLE DB data sources. It enables you to access and query data from external sources without the need to create a linked server. The key difference between OPENQUERY and OPENROWSET is that OPENROWSET retrieves data from the remote server before executing the query locally, while OPENQUERY lets the remote server execute the query.
The syntax for OPENROWSET is as follows:
SELECT * FROM OPENROWSET('SQLNCLI', 'Server=RemoteServer;Database=RemoteDatabase;Trusted_Connection=yes;', 'SELECT * FROM dbo.RemoteTable')
In this example, “SQLNCLI” is the OLE DB provider to use, “RemoteServer” is the remote SQL Server instance you are targeting, and “RemoteDatabase” is the remote database you are targeting.
5. Using SQL Server replication
SQL Server replication is a data distribution and synchronization technology that allows you to replicate data across multiple SQL Server instances. It can be used to query data across multiple servers by replicating data from one server (the publisher) to one or more destination servers (subscribers). Once the data is replicated, you can query it locally on each subscriber server, enabling distributed querying.
You can leverage automated replications so that changes made to the data on the publisher are propagated to the subscriber servers in near real-time.
Setting up SQL Server replication
To use SQL Server replication to query data across servers. Do the following (exact steps will vary based on the replication tool used):
- Set Up Replication: Configure SQL Server replication by defining publications on the publisher server and subscriptions on the subscriber servers.
- Replicate Data: Replicate the desired tables or databases from the publisher to the subscriber servers.
- Query Replicated Data: Once replication is set up and synchronized, you can query the replicated data locally on each subscriber server using standard SQL queries. This allows you to perform distributed queries across multiple servers without the need for linked servers or other complex configurations.
CData Connect Cloud: SQL Server virtualization for every cloud tool
Data virtualization creates a logical data layer, providing a single, unified hub where you can access data from several data sources at once. This allows you to query your consolidated data from a single interface, including queries that combine data from multiple data sources.
CData Connect Cloud leverages data virtualization to establish connections to hundreds of data sources and integrate them with an extensive list of data applications (including those for analytics, business intelligence, data pipelines, and more).
Try CData Connect Cloud
Get a 30-day free trial of CData Connect Cloud to start upleveling your data management strategy today!
Get started