Applies to: SQL Server (all supported versions) Azure SQL Database
The nodes() method is useful when you want to shred an xml data type instance into relational data. It allows you to identify nodes that will be mapped into a new row.
Every xml data type instance has an implicitly provided context node. For the XML instance stored in a column or variable, this node is the document node. The document node is the implicit node at the top of every xml data type instance.
The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In these logical copies, the context node of every row instance is set to one of the nodes that is identified with the query expression. This way, later queries can navigate relative to these context nodes.
You can retrieve multiple values from the rowset. For example, you can apply the value() method to the rowset returned by nodes() and retrieve multiple values from the original XML instance. The value() method, when applied to the XML instance, returns only one value.
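As a conceptual analogy only (this is Python, not Transact-SQL, and it does not call SQL Server), the following sketch mimics what nodes() followed by value() does: it produces one row per matched node and reads several values relative to that context node. The sample XML, the function name `shred_locations`, and the row keys are illustrative assumptions, not part of the product.

```python
import xml.etree.ElementTree as ET

# Illustrative document (an assumption): three <Location> elements under <root>.
SAMPLE_XML = """
<root>
  <Location LocationID="10"><step>Insert</step><step>Weld</step></Location>
  <Location LocationID="20"><step>Paint</step></Location>
  <Location LocationID="30"></Location>
</root>
"""


def shred_locations(xml_text, path="Location"):
    """Return one row per node matched by `path`, relative to the root element.

    Each matched element plays the role of a context node: the values in the
    row are looked up relative to it, not relative to the whole document.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for context_node in root.findall(path):
        rows.append({
            "LocationID": int(context_node.get("LocationID")),
            "steps": [step.text for step in context_node.findall("step")],
        })
    return rows


if __name__ == "__main__":
    for row in shred_locations(SAMPLE_XML):
        print(row)
```

A path that matches nothing yields an empty list, which parallels the empty rowset that nodes() returns for an empty sequence.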
Note
To view Transact-SQL syntax for SQL Server 2014 and earlier, see Previous versions documentation.
XQuery
Is a string literal, an XQuery expression. If the query expression constructs nodes, these constructed nodes are exposed in the resulting rowset. If the query expression results in an empty sequence, the rowset is empty as well. If the query expression statically results in a sequence that contains atomic values instead of nodes, a static error is raised.
Table(Column)
Is the table name and the column name for the resulting rowset.
As an example, assume that you have the following table:
The following manufacturing instructions document is stored in the table. Only a fragment is shown. Notice that there are three manufacturing locations in the document.
A nodes() method invocation with the query expression /root/Location would return a rowset with three rows, each containing a logical copy of the original XML document, and with the context item set to one of the <Location> nodes:
You can then query this rowset by using xml data type methods. The following query extracts the subtree of the context item for each generated row:
Here is the result:
The rowset returned has maintained the type information. You can apply xml data type methods, such as query(), value(), exist(), and nodes(), to the result of a nodes() method. However, you can't apply the modify() method to modify the XML instance.
Also, the context node in the rowset can't be materialized. That is, you can't use it in a SELECT statement. However, you can use it in IS NULL and COUNT(*).
Scenarios for using the nodes() method are the same as for using OPENXML (Transact-SQL), which provides a rowset view of the XML. However, you don't have to use cursors when you use the nodes() method on a table that contains several rows of XML documents.
The rowset returned by the nodes() method is an unnamed rowset. So, it must be explicitly named by using aliasing.
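The nodes() function can't be applied directly to the results of a user-defined function. To use the nodes() function with the result of a scalar user-defined function, you can either assign the result of the user-defined function to a variable, or use a derived table to assign a column alias to the user-defined function return value and then use CROSS APPLY to select from the alias.
The following example shows one way to use CROSS APPLY to select from the result of a user-defined function.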
In the following example, there's an XML document that has a <Root> top-level element and three <row> child elements. The query uses the nodes() method to set separate context nodes, one for each <row> element. The nodes() method returns a rowset with three rows. Each row has a logical copy of the original XML, with each context node identifying a different <row> element in the original document.
The query then returns the context node from each row:
In the following example result, the query method returns the context item and its content:
Applying the parent accessor on the context nodes returns the <Root> element for all three:
Here is the result:
The bicycle manufacturing instructions are used in this example and are stored in the Instructions xml type column of the ProductModel table.
In the following example, the nodes() method is specified against the Instructions column of xml type in the ProductModel table.
The nodes() method sets the <Location> elements as context nodes by specifying the /MI:root/MI:Location path. The resulting rowset includes logical copies of the original document, one for each <Location> node in the document, with the context node set to the <Location> element. As a result, the nodes() function gives a set of <Location> context nodes.
The query() method against this rowset requests self::node() and returns the <Location> element in each row.
In this example, the query sets each <Location> element as a context node in the manufacturing instructions document of a specific product model. You can use these context nodes to retrieve values such as these:
Find Location IDs in each <Location>
Retrieve manufacturing steps (<step> child elements) in each <Location>
This query returns the context item, in which the abbreviated syntax '.' for self::node() is specified, in the query() method.
Note the following:
The nodes() method is applied to the Instructions column and returns a rowset, T (C). This rowset contains logical copies of the original manufacturing instructions document with /root/Location as the context item.
CROSS APPLY applies nodes() to the Instructions column of each row in the ProductModel table and returns only the rows that produce a result set.
Here is the partial result:
The following code queries the XML documents for the manufacturing instructions in the Instructions column of the ProductModel table. The query returns a rowset that contains the product model ID, manufacturing locations, and manufacturing steps.
Note the following:
The nodes() method is applied to the Instructions column and returns the T1 (Locations) rowset. This rowset contains logical copies of the original manufacturing instructions document, with the /root/Location element as the context item.
nodes() is applied to the T1 (Locations) rowset and returns the T2 (steps) rowset. This rowset contains logical copies of the original manufacturing instructions document, with the /root/Location/step element as the context item.
Here is the result:
The query declares the MI prefix two times. Instead, you can use WITH XMLNAMESPACES to declare the prefix one time and use it in the query:
Add Namespaces to Queries with WITH XMLNAMESPACES
Create Instances of XML Data
xml Data Type Methods
In an Azure Batch workflow, a compute node (or node) is a virtual machine that processes a portion of your application's workload. A pool is a collection of these nodes for your application to run on. This article explains more about nodes and pools, along with considerations when creating and using them in an Azure Batch workflow.
A node is an Azure virtual machine (VM) or cloud service VM that is dedicated to processing a portion of your application's workload. The size of a node determines the number of CPU cores, memory capacity, and local file system size that is allocated to the node.
You can create pools of Windows or Linux nodes by using Azure Cloud Services, images from the Azure Virtual Machines Marketplace, or custom images that you prepare.
Nodes can run any executable or script that is supported by the operating system environment of the node. Executables or scripts include *.exe, *.cmd, *.bat, and PowerShell scripts (for Windows) and binaries, shell, and Python scripts (for Linux).
All compute nodes in Batch also include:
By default, nodes can communicate with each other, but they can't communicate with virtual machines that are not part of the same pool. To allow nodes to communicate securely with other virtual machines, or with an on-premises network, you can provision the pool in a subnet of an Azure virtual network (VNet). When you do so, your nodes can be accessed through public IP addresses. These public IP addresses are created by Batch and may change over the lifetime of the pool. You can also create a pool with static public IP addresses that you control, which ensures that they won't change unexpectedly.
A pool is the collection of nodes that your application runs on.
Azure Batch pools build on top of the core Azure compute platform. They provide large-scale allocation, application installation, data distribution, health monitoring, and flexible adjustment (scaling) of the number of compute nodes within a pool.
Every node that is added to a pool is assigned a unique name and IP address. When a node is removed from a pool, any changes that are made to the operating system or files are lost, and its name and IP address are released for future use. When a node leaves a pool, its lifetime is over.
A pool can be used only by the Batch account in which it was created. A Batch account can create multiple pools to meet the resource requirements of the applications it will run.
The pool can be created manually, or automatically by the Batch service when you specify the work to be done. When you create a pool, you can specify the following attributes:
Important
Batch accounts have a default quota that limits the number of cores in a Batch account. The number of cores corresponds to the number of compute nodes. You can find the default quotas and instructions on how to increase a quota in Quotas and limits for the Azure Batch service. If your pool is not achieving its target number of nodes, the core quota might be the reason.
When you create a Batch pool, you specify the Azure virtual machine configuration and the type of operating system you want to run on each compute node in the pool.
There are two types of pool configurations available in Batch.
Important
While you can currently create pools using either configuration, new pools should be configured using Virtual Machine Configuration and not Cloud Services Configuration. All current and new Batch features will be supported by Virtual Machine Configuration pools. Cloud Services Configuration pools do not support all features and no new capabilities are planned. You won't be able to create new 'CloudServiceConfiguration' pools or add new nodes to existing pools after February 29, 2024.
The Virtual Machine Configuration specifies that the pool is composed of Azure virtual machines. These VMs may be created from either Linux or Windows images.
The Batch node agent is a program that runs on each node in the pool and provides the command-and-control interface between the node and the Batch service. There are different implementations of the node agent, known as SKUs, for different operating systems. When you create a pool based on the Virtual Machine Configuration, you must specify not only the size of the nodes and the source of the images used to create them, but also the virtual machine image reference and the Batch node agent SKU to be installed on the nodes. For more information about specifying these pool properties, see Provision Linux compute nodes in Azure Batch pools. You can optionally attach one or more empty data disks to pool VMs created from Marketplace images, or include data disks in custom images used to create the VMs. When including data disks, you need to mount and format the disks from within a VM to use them.
Warning
Cloud Services Configuration pools are deprecated. Please use Virtual Machine Configuration pools instead. For more information, see Migrate Batch pool configuration from Cloud Services to Virtual Machine.
The Cloud Services Configuration specifies that the pool is composed of Azure Cloud Services nodes. Cloud Services provides only Windows compute nodes.
Available operating systems for Cloud Services Configuration pools are listed in the Azure Guest OS releases and SDK compatibility matrix, and available compute node sizes are listed in Sizes for Cloud Services. When you create a pool that contains Cloud Services nodes, you specify the node size and its OS Family (which determines which versions of .NET are installed with the OS). Cloud Services is deployed to Azure more quickly than virtual machines running Windows. If you want pools of Windows compute nodes, you may find that Cloud Services provide a performance benefit in terms of deployment time.
As with worker roles within Cloud Services, you can specify an OS Version. We recommend that you specify Latest (*) for the OS Version so that the nodes are automatically upgraded, and there is no work required to cater to newly released versions. The primary use case for selecting a specific OS version is to ensure application compatibility, which allows backward compatibility testing to be performed before allowing the version to be updated. After validation, the OS Version for the pool can be updated and the new OS image can be installed. Any running tasks will be interrupted and requeued.
When you create a pool, you need to select the appropriate nodeAgentSkuId, depending on the OS of the base image of your VHD. You can get a mapping of available node agent SKU IDs to their OS Image references by calling the List Supported Node Agent SKUs operation.
To learn how to create a pool with custom images, see Use the Shared Image Gallery to create a custom pool.
Alternatively, you can create a custom pool of virtual machines using a managed image resource. For information about preparing custom Linux images from Azure VMs, see How to create an image of a virtual machine or VHD. For information about preparing custom Windows images from Azure VMs, see Create a managed image of a generalized VM in Azure.
When creating a Virtual Machine Configuration pool using the Batch APIs, you can set up the pool to run tasks in Docker containers. Currently, you must create the pool using an image that supports Docker containers. Use the Windows Server 2016 Datacenter with Containers image from the Azure Marketplace, or supply a custom VM image that includes Docker Community Edition or Enterprise Edition and any required drivers. The pool settings must include a container configuration that copies container images to the VMs when the pool is created. Tasks that run on the pool can then reference the container images and container run options.
For more information, see Run Docker container applications on Azure Batch.
When you create a pool, you can specify which types of nodes you want and the target number for each. The two types of nodes are:
Low-priority nodes may be preempted when Azure has insufficient surplus capacity. If a node is preempted while running tasks, the tasks are requeued and run again once a compute node becomes available. Low-priority nodes are a good option for workloads where the job completion time is flexible and the work is distributed across many nodes. Before you decide to use low-priority nodes for your scenario, make sure that any work lost due to preemption will be minimal and easy to recreate.
You can have both low-priority and dedicated compute nodes in the same pool. Each type of node has its own target setting, for which you can specify the desired number of nodes.
The number of compute nodes is referred to as a target because, in some situations, your pool might not reach the desired number of nodes. For example, a pool might not achieve the target if it reaches the core quota for your Batch account first. Or, the pool might not achieve the target if you have applied an automatic scaling formula to the pool that limits the maximum number of nodes.
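The effect of the core quota on a target can be sketched with a little arithmetic. The following Python function is a simplified illustration, not how the Batch service accounts for quota internally: the name `achievable_nodes` and its parameters are hypothetical, and it assumes a single quota shared by all nodes of one size.

```python
def achievable_nodes(target_nodes, cores_per_node, core_quota, cores_in_use=0):
    """Estimate how many nodes a pool can reach before hitting the core quota.

    Assumption: every node in the pool has the same number of cores and all
    nodes draw from one quota. Real quota accounting may be more granular.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    free_cores = max(core_quota - cores_in_use, 0)
    # The pool never exceeds its target; the quota can only lower the result.
    return min(target_nodes, free_cores // cores_per_node)


if __name__ == "__main__":
    # A target of 50 four-core nodes against a 100-core quota stops short.
    print(achievable_nodes(target_nodes=50, cores_per_node=4, core_quota=100))
```

If the returned value is lower than the target, the quota is the limiting factor in this simplified model.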
For pricing information for both low-priority and dedicated nodes, see Batch Pricing.
When you create an Azure Batch pool, you can choose from among almost all the VM families and sizes available in Azure. Azure offers a range of VM sizes for different workloads, including specialized HPC or GPU-enabled VM sizes. Note that node sizes can only be chosen at the time a pool is created. In other words, once a pool is created, its node size cannot be changed.
For more information, see Choose a VM size for compute nodes in an Azure Batch pool.
For dynamic workloads, you can apply an automatic scaling policy to a pool. The Batch service periodically evaluates your formula and dynamically adjusts the number of nodes within the pool according to the current workload and resource usage of your compute scenario. This allows you to lower the overall cost of running your application by using only the resources you need, and releasing those you don't need.
You enable automatic scaling by writing an automatic scaling formula and associating that formula with a pool. The Batch service uses the formula to determine the target number of nodes in the pool for the next scaling interval (an interval that you can configure). You can specify the automatic scaling settings for a pool when you create it, or enable scaling on a pool later. You can also update the scaling settings on a scaling-enabled pool.
As an example, perhaps a job requires that you submit a large number of tasks to be executed. You can assign a scaling formula to the pool that adjusts the number of nodes in the pool based on the current number of queued tasks and the completion rate of the tasks in the job. The Batch service periodically evaluates the formula and resizes the pool, based on workload and your other formula settings. The service adds nodes as needed when there are a large number of queued tasks, and removes nodes when there are no queued or running tasks.
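The scaling behavior described above can be sketched in Python to show the kind of decision a formula encodes. This is not Batch autoscale formula syntax, and the function name, parameters, and the default cap of 20 nodes are assumptions chosen for illustration.

```python
import math


def target_node_count(queued_tasks, running_tasks, tasks_per_node=1, max_nodes=20):
    """Pick a target node count from the current task backlog.

    Illustrative logic only: scale to zero when nothing is queued or running,
    otherwise request enough nodes for all active tasks, capped at max_nodes.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    active_tasks = queued_tasks + running_tasks
    if active_tasks == 0:
        return 0
    needed = math.ceil(active_tasks / tasks_per_node)
    return min(needed, max_nodes)


if __name__ == "__main__":
    print(target_node_count(queued_tasks=45, running_tasks=5, tasks_per_node=4))
```

In a real pool, the equivalent logic is written as an autoscale formula and evaluated by the Batch service at each scaling interval.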
A scaling formula can be based on the following metrics:
When automatic scaling decreases the number of compute nodes in a pool, you must consider how to handle tasks that are running at the time of the decrease operation. To accommodate this, Batch provides a node deallocation option that you can include in your formulas. For example, you can specify that running tasks are stopped immediately and then requeued for execution on another node, or allowed to finish before the node is removed from the pool. Note that setting the node deallocation option as taskcompletion or retaineddata will prevent pool resize operations until all tasks have completed, or all task retention periods have expired, respectively.
For more information about automatically scaling an application, see Automatically scale compute nodes in an Azure Batch pool.
Tip
To maximize compute resource utilization, set the target number of nodes to zero at the end of a job, but allow running tasks to finish.
The max tasks per node configuration option determines the maximum number of tasks that can be run in parallel on each compute node within the pool.
The default configuration specifies that one task at a time runs on a node, but there are scenarios where it is beneficial to have two or more tasks executed on a node simultaneously. See the example scenario in the concurrent node tasks article to see how you can benefit from multiple tasks per node.
You can also specify a fill type, which determines whether Batch spreads the tasks evenly across all nodes in a pool, or packs each node with the maximum number of tasks before assigning tasks to another node.
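To make the difference between the two fill types concrete, the following Python sketch distributes a number of tasks over the nodes of a pool under each policy. It is a simplified model, not the Batch scheduler: the function name and the strings "spread" and "pack" are illustrative labels, and tasks beyond the pool's total capacity are simply left unassigned.

```python
def assign_tasks(task_count, node_count, max_tasks_per_node, fill_type="spread"):
    """Return the number of tasks placed on each node.

    "spread" balances tasks as evenly as possible across all nodes.
    "pack" fills each node to max_tasks_per_node before using the next one.
    """
    if node_count <= 0 or max_tasks_per_node <= 0:
        raise ValueError("node_count and max_tasks_per_node must be positive")
    # Tasks that do not fit in the pool stay queued in this model.
    to_place = min(task_count, node_count * max_tasks_per_node)
    if fill_type == "pack":
        loads = []
        for _ in range(node_count):
            placed = min(max_tasks_per_node, to_place)
            loads.append(placed)
            to_place -= placed
        return loads
    if fill_type == "spread":
        base, extra = divmod(to_place, node_count)
        return [base + (1 if i < extra else 0) for i in range(node_count)]
    raise ValueError("fill_type must be 'spread' or 'pack'")


if __name__ == "__main__":
    print(assign_tasks(5, 4, 4, "spread"))
    print(assign_tasks(5, 4, 4, "pack"))
```

Packing leaves whole nodes idle, which can pair well with scaling down, while spreading gives each task more of a node's resources.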
In most scenarios, tasks operate independently and do not need to communicate with one another. However, there are some applications in which tasks must communicate, like MPI scenarios.
You can configure a pool to allow internode communication so that nodes within a pool can communicate at runtime. When internode communication is enabled, nodes in Cloud Services Configuration pools can communicate with each other on ports greater than 1100, and Virtual Machine Configuration pools do not restrict traffic on any port.
Enabling internode communication also impacts the placement of the nodes within clusters and might limit the maximum number of nodes in a pool because of deployment restrictions. If your application does not require communication between nodes, the Batch service can allocate a potentially large number of nodes to the pool from many different clusters and data centers to enable increased parallel processing power.
If desired, you can add a start task that executes on each node as that node joins the pool, and each time a node is restarted or reimaged. The start task is especially useful for preparing compute nodes for the execution of tasks, like installing the applications that your tasks run on the compute nodes.
You can specify application packages to deploy to the compute nodes in the pool. Application packages provide simplified deployment and versioning of the applications that your tasks run. Application packages that you specify for a pool are installed on every node that joins that pool, and every time a node is rebooted or reimaged.
For more information about using application packages to deploy your applications to your Batch nodes, see Deploy applications to compute nodes with Batch application packages.
When you provision a pool of compute nodes in Batch, you can associate the pool with a subnet of an Azure virtual network (VNet). To use an Azure VNet, the Batch client API must use Azure Active Directory (AD) authentication. Azure Batch support for Azure AD is documented in Authenticate Batch service solutions with Active Directory.
The VNet must be in the same subscription and region as the Batch account you use to create your pool.
The subnet specified for the pool must have enough unassigned IP addresses to accommodate the number of VMs targeted for the pool; that is, the sum of the targetDedicatedNodes and targetLowPriorityNodes properties of the pool. If the subnet doesn't have enough unassigned IP addresses, the pool partially allocates the compute nodes, and a resize error occurs.
Your Azure Storage endpoint needs to be resolved by any custom DNS servers that serve your VNet. Specifically, URLs of the form <account>.table.core.windows.net, <account>.queue.core.windows.net, and <account>.blob.core.windows.net should be resolvable.
Multiple pools can be created in the same VNet or in the same subnet (as long as it has sufficient address space). A single pool can't exist across multiple VNets or subnets.
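The subnet sizing requirement above (enough unassigned addresses for the sum of targetDedicatedNodes and targetLowPriorityNodes) can be checked ahead of time with a rough calculation. The following Python sketch uses the standard `ipaddress` module. The function name and parameters are hypothetical, and the constant of five addresses reserved per subnet is an assumption about Azure that is not stated in this article; verify it for your environment.

```python
import ipaddress

# Assumption (not from this article): Azure reserves 5 addresses in each subnet.
AZURE_RESERVED_PER_SUBNET = 5


def subnet_can_fit_pool(subnet_cidr, target_dedicated_nodes,
                        target_low_priority_nodes, addresses_in_use=0):
    """Return True if the subnet has enough unassigned IPs for the pool targets."""
    network = ipaddress.ip_network(subnet_cidr)
    free = network.num_addresses - AZURE_RESERVED_PER_SUBNET - addresses_in_use
    required = target_dedicated_nodes + target_low_priority_nodes
    return required <= max(free, 0)


if __name__ == "__main__":
    print(subnet_can_fit_pool("10.0.0.0/24", 200, 40))
```

When several pools share a subnet, pass the addresses already consumed by the other pools as `addresses_in_use`.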
Additional VNet requirements differ, depending on whether the Batch pool is in the Virtual Machine configuration or the Cloud Services configuration. For new pool deployments into a VNet, the Virtual Machine configuration is recommended.
Supported VNets - Azure Resource Manager-based VNets only
Subnet ID - When specifying the subnet using the Batch APIs, use the resource identifier of the subnet. The subnet identifier is of the form:
/subscriptions/{subscription}/resourceGroups/{group}/providers/Microsoft.Network/virtualNetworks/{network}/subnets/{subnet}
Permissions - Check whether your security policies or locks on the VNet's subscription or resource group restrict a user's permissions to manage the VNet.
Additional networking resources - Batch automatically creates additional networking resources in the resource group containing the VNet.
Important
For each 100 dedicated or low-priority nodes, Batch creates: one network security group (NSG), one public IP address, and one load balancer. These resources are limited by the subscription's resource quotas. For large pools, you might need to request a quota increase for one or more of these resources.
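As a rough planning aid, the per-100-node rule above can be turned into a small calculation. This Python sketch assumes that a partial group of nodes still needs a full set of resources (that is, the count rounds up); the article does not state this explicitly, and the function name and dictionary keys are illustrative.

```python
import math


def batch_network_resources(node_count, nodes_per_set=100):
    """Estimate the networking resources Batch creates for a pool in a VNet.

    Assumption: one NSG, one public IP address, and one load balancer per
    started group of `nodes_per_set` nodes (rounded up).
    """
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    sets = math.ceil(node_count / nodes_per_set)
    return {
        "network_security_groups": sets,
        "public_ip_addresses": sets,
        "load_balancers": sets,
    }


if __name__ == "__main__":
    print(batch_network_resources(250))
```

Comparing these estimates with your subscription's resource quotas shows whether a quota increase is likely to be needed before you create a large pool.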
The subnet must allow inbound communication from the Batch service to be able to schedule tasks on the compute nodes, and outbound communication to communicate with Azure Storage or other resources as needed by your workload. For pools in the Virtual Machine configuration, Batch adds NSGs at the network interfaces (NICs) level attached to compute nodes. These NSGs are configured with the following additional rules:
BatchNodeManagement service tag.
Important
Use caution if you modify or add inbound or outbound rules in Batch-configured NSGs. If communication to the compute nodes in the specified subnet is denied by an NSG, the Batch service will set the state of the compute nodes to unusable. Additionally, no resource locks should be applied to any resource created by Batch, since this can prevent cleanup of resources as a result of user-initiated actions such as deleting a pool.
You don't have to specify NSGs at the virtual network subnet level, because Batch configures its own NSGs (see above). If you have an NSG associated with the subnet where Batch compute nodes are deployed, or if you would like to apply custom NSG rules to override the defaults applied, you must configure this NSG with at least the inbound and outbound security rules shown in the following tables.
Configure inbound traffic on port 3389 (Windows) or 22 (Linux) only if you need to permit remote access to the compute nodes from outside sources. You may need to enable port 22 rules on Linux if you require support for multi-instance tasks with certain MPI runtimes. Allowing traffic on these ports is not strictly required for the pool compute nodes to be usable.
Warning
Batch service IP addresses can change over time. Therefore, we highly recommend that you use the BatchNodeManagement service tag (or a regional variant) for the NSG rules indicated in the following tables. Avoid populating NSG rules with specific Batch service IP addresses.
Inbound security rules
Source IP addresses | Source service tag | Source ports | Destination | Destination ports | Protocol | Action |
---|---|---|---|---|---|---|
N/A | BatchNodeManagement service tag (if using a regional variant, in the same region as your Batch account) | * | Any | 29876-29877 | TCP | Allow |
User source IPs for remotely accessing compute nodes and/or compute node subnet for Linux multi-instance tasks, if required. | N/A | * | Any | 3389 (Windows), 22 (Linux) | TCP | Allow |
Outbound security rules
Source | Source ports | Destination | Destination service tag | Destination ports | Protocol | Action |
---|---|---|---|---|---|---|
Any | * | Service tag | Storage (if using regional variant, in the same region as your Batch account) | 443 | TCP | Allow |
Any | * | Service tag | BatchNodeManagement (if using regional variant, in the same region as your Batch account) | 443 | TCP | Allow |
Outbound to BatchNodeManagement is required for contacting the Batch service from the compute nodes, such as for Job Manager tasks.
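If you manage your own subnet-level NSG, it can help to compare its rules against the required rows in the tables above. The following Python sketch encodes those required rows as data and reports which are missing. It is a simplification under stated assumptions: the `Rule` type and function name are hypothetical, matching is exact on every field, and rule priorities, port ranges, regional service tag variants, and the optional remote-access rule are not modeled.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    direction: str    # "inbound" or "outbound"
    service_tag: str  # source tag for inbound, destination tag for outbound
    ports: str        # destination ports, written as in the tables
    protocol: str
    action: str


# Required rules transcribed from the Virtual Machine configuration tables.
REQUIRED_RULES = frozenset({
    Rule("inbound", "BatchNodeManagement", "29876-29877", "TCP", "Allow"),
    Rule("outbound", "Storage", "443", "TCP", "Allow"),
    Rule("outbound", "BatchNodeManagement", "443", "TCP", "Allow"),
})


def missing_rules(configured_rules):
    """Return the required rules that are absent from configured_rules."""
    return sorted(REQUIRED_RULES - set(configured_rules))


if __name__ == "__main__":
    configured = [Rule("inbound", "BatchNodeManagement", "29876-29877", "TCP", "Allow")]
    for rule in missing_rules(configured):
        print("missing:", rule)
```

An empty result means only that the three required rows are present in this simplified model; it does not prove that a higher-priority rule is not blocking the traffic.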
Warning
Cloud Service Configuration Pools are deprecated. Please use Virtual Machine Configuration Pools instead.
Supported VNets - Classic VNets only
Subnet ID - When specifying the subnet using the Batch APIs, use the resource identifier of the subnet. The subnet identifier is of the form:
/subscriptions/{subscription}/resourceGroups/{group}/providers/Microsoft.ClassicNetwork/virtualNetworks/{network}/subnets/{subnet}
Permissions - The Microsoft Azure Batch service principal must have the Classic Virtual Machine Contributor Azure role for the specified VNet.
The subnet must allow inbound communication from the Batch service to be able to schedule tasks on the compute nodes, and outbound communication to communicate with Azure Storage or other resources.
You do not need to specify an NSG, because Batch configures inbound communication only from Batch IP addresses to the pool nodes. However, if the specified subnet has associated NSGs and/or a firewall, configure the inbound and outbound security rules as shown in the following tables. If communication to the compute nodes in the specified subnet is denied by an NSG, the Batch service sets the state of the compute nodes to unusable.
Configure inbound traffic on port 3389 for Windows if you need to permit RDP access to the pool nodes. This is not required for the pool nodes to be usable.
Inbound security rules
Source IP addresses | Source ports | Destination | Destination ports | Protocol | Action |
---|---|---|---|---|---|
Any. Although this effectively requires 'allow all', the Batch service applies an ACL rule at the level of each node that filters out all non-Batch service IP addresses. | * | Any | 10100, 20100, 30100 | TCP | Allow |
Optional, to allow RDP access to compute nodes. | * | Any | 3389 | TCP | Allow |
Outbound security rules
Source | Source ports | Destination | Destination ports | Protocol | Action |
---|---|---|---|---|---|
Any | * | Any | 443 | Any | Allow |
For more information about setting up a Batch pool in a VNet, see Create a pool of virtual machines with your virtual network.
Tip
To ensure that the public IP addresses used to access nodes don't change, you can create a pool with specified public IP addresses that you control.
When you design your Azure Batch solution, you must specify how and when pools are created, and how long compute nodes within those pools are kept available.
On one end of the spectrum, you can create a pool for each job that you submit, and delete the pool as soon as its tasks finish execution. This maximizes utilization because the nodes are only allocated when needed, and they are shut down once they're idle. While this means that the job must wait for the nodes to be allocated, it's important to note that tasks are scheduled for execution as soon as nodes are individually allocated and the start task has completed. Batch does not wait until all nodes within a pool are available before assigning tasks to the nodes. This ensures maximum utilization of all available nodes.
At the other end of the spectrum, if having jobs start immediately is the highest priority, you can create a pool ahead of time and make its nodes available before jobs are submitted. In this scenario, tasks can start immediately, but nodes might sit idle while waiting for them to be assigned.
A combined approach is typically used for handling a variable but ongoing load. You can have a pool in which multiple jobs are submitted, and can scale the number of nodes up or down according to the job load. You can do this reactively, based on current load, or proactively, if load can be predicted. For more information, see Automatic scaling policy.
An autopool is a pool that is created by the Batch service when a job is submitted, rather than being created prior to the jobs that will run in the pool. The Batch service will manage the lifetime of an autopool according to the characteristics that you specify. Most often, these pools are also set to delete automatically after their jobs have completed.
You typically need to use certificates when you encrypt or decrypt sensitive information for tasks, like the key for an Azure Storage account. To support this, you can install certificates on nodes. Encrypted secrets are passed to tasks via command-line parameters or embedded in one of the task resources, and the installed certificates can be used to decrypt them.
You use the Add certificate operation (Batch REST) or CertificateOperations.CreateCertificate method (Batch .NET) to add a certificate to a Batch account. You can then associate the certificate with a new or existing pool.
When a certificate is associated with a pool, the Batch service installs the certificate on each node in the pool. The Batch service installs the appropriate certificates when the node starts up, before launching any tasks (including the start task and job manager task).
If you add a certificate to an existing pool, you must reboot its compute nodes in order for the certificate to be applied to the nodes.