The Black Box of .NET

January 12, 2025

What is LINQ to Xml Atomization?

Atomization is the process that the LINQ to Xml runtime uses to store and reuse string instances. Atomization means that if two XName objects have the same local name, and they're in the same namespace, they share the same instance. Likewise, if two XNamespace objects have the same namespace URI, they share the same instance. This optimizes memory usage and improves performance when comparing for equality of:
  • One XName instance to a different XName instance
  • One XNamespace instance to a different XNamespace instance
because the runtime only has to compare references instead of performing string comparisons, which take longer. Taking advantage of this requires that both XName and XNamespace implement the equality and inequality operators, which of course they do. This concept is particularly relevant when working with XElement and XAttribute instances, where the same names might appear multiple times.

XName and XNamespace also implement an implicit conversion operator that converts strings to XName or XNamespace instances. This lets you pass plain strings to LINQ to Xml methods and have them converted to atomized instances automatically, so the methods still benefit from atomization. For example, this code implicitly passes an XName (with a value of "x") to the Descendants method:
var root = new XElement("root",
  new XElement("x", "1"),
  new XElement("y",
    new XElement("x", "1"),
    new XElement("x", "3")
  )
);

foreach (var e in root.Descendants("x").Where(e => e.Value == "1"))
{
  ...
}
Atomization is similar to string interning in .NET, where identical string literals are stored only once in memory. When LINQ to Xml processes XML data, it often encounters repeated element and attribute names. Instead of creating a new name object for each occurrence, it reuses the existing instance. It does this by caching XNamespace instances in an internal static XHashtable<WeakReference<XNamespace>>; each XNamespace in turn caches the XName instances that belong to it. You can find the source code here:

Practical Implications

So how does atomization affect my application?
  1. Element and Attribute Names: When you create elements or attributes using LINQ to Xml, the names are atomized. For example, if you create multiple elements with the same name, LINQ to Xml will store the name only once.
  2. Namespace Handling: Atomization also applies to XML namespaces. When you define namespaces in your XML, LINQ to XML ensures that each unique namespace URI is stored only once.
  3. Value Atomization: While element and attribute names are atomized, the values are not. If you're feeling adventurous and you frequently use the same values, you might consider implementing your own caching mechanism to achieve similar benefits. Before you go off and write one, though, consider that the .NET team has done a lot of work to ensure that the caching of names is both thread-safe and performant. I've never found that I needed to do this, though the largest XML documents I've had to work with are in the tens of MB. If you're working with much larger XML documents, hundreds of MB or even 1 GB+ in size, then you may find this worthwhile.
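
As a rough illustration of the value caching idea in item 3, here is a minimal sketch. The valueCache dictionary and the Atomize helper are hypothetical names invented for this example; they are not part of LINQ to Xml, and a real implementation would need to think about eviction and lifetime.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Xml.Linq;

// Hypothetical value cache: maps each distinct string to one shared instance.
var valueCache = new ConcurrentDictionary<string, string>(StringComparer.Ordinal);

// Returns the cached instance for an equal string, adding it on first use.
string Atomize(string value) => valueCache.GetOrAdd(value, value);

XElement root = XElement.Parse(
    "<root><item>pending</item><item>pending</item></root>");

// The raw values as LINQ to Xml returns them (values are not atomized).
string[] raw = root.Elements("item").Select(e => e.Value).ToArray();

// After passing through the cache, equal values share one instance.
string[] cached = raw.Select(Atomize).ToArray();

Console.WriteLine(
    $"raw values share a reference: {object.ReferenceEquals(raw[0], raw[1])}");
Console.WriteLine(
    $"cached values share a reference: {object.ReferenceEquals(cached[0], cached[1])}");
```

Note that this only deduplicates the strings you hold on to yourself. Whether it is worth the extra dictionary lookups is something to measure against your own documents.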

Pre-Atomization

You might be thinking, "This is great, but there's nothing left for me to do to improve performance here!" But there is. Even though you can effectively pass atomized instances to LINQ to Xml methods by passing strings, there is a small cost, because the implicit conversion operator has to be invoked on every call. You can refer to the Atomization Benchmarks at the end of this post for the details of some benchmarking I did. In a nutshell, the results show that pre-atomizing is roughly 2.6x faster (136.0 ns versus 355.9 ns). That being said, we are talking about a couple hundred nanoseconds in the context of my test, which builds an element with 3 child elements, each having an attribute and string content. The benefit of pre-atomization becomes much more evident with very large XML documents. Here is the XML used for test data in the benchmarks:
<aw:Root xmlns:aw="http://www.adventure-works.com">
  <aw:Data ID="1">4,100,000</aw:Data>
  <aw:Data ID="2">3,700,000</aw:Data>
  <aw:Data ID="3">1,150,000</aw:Data>
</aw:Root>
The full code for the benchmarks can be found at this gist: Linq to Xml - XName Atomization Benchmark.cs
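
For readers who want to see what pre-atomizing looks like in practice, here is a minimal sketch that builds the test XML above. It is not the benchmark code from the gist; the variable names (rootName, dataName, idName) are my own choices for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Resolve each name once, up front, instead of converting strings on every call.
XNamespace aw = "http://www.adventure-works.com";
XName rootName = aw + "Root";
XName dataName = aw + "Data";
XName idName = "ID";

// Every constructor call below receives an existing XName instance,
// so no string-to-XName conversion happens during construction.
var root = new XElement(rootName,
    new XAttribute(XNamespace.Xmlns + "aw", aw),
    new XElement(dataName, new XAttribute(idName, 1), "4,100,000"),
    new XElement(dataName, new XAttribute(idName, 2), "3,700,000"),
    new XElement(dataName, new XAttribute(idName, 3), "1,150,000"));

// Queries can reuse the same pre-atomized instances.
int count = root.Elements(dataName).Count();
Console.WriteLine($"Data elements: {count}");
Console.WriteLine(root);
```

In a real application these names would typically live in static readonly fields so they are resolved once for the lifetime of the process.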

Atomization and ReferenceEquals

Let's take a look at XName as an example. There are two ways to directly create XName instances:
  1. The XName.Get(String) or XName.Get(String, String) methods. See here
  2. The XNamespace.Addition(XNamespace, String) Operator method. See here
They are also created indirectly by LINQ to Xml when you create XDocument, XElement, and XAttribute instances. Here is some code to demonstrate that there is a single atomized instance referred to regardless of how the instance was created.
using System;
using System.Xml.Linq;

// Note that these are not explicitly declared as const
string x = "element";
string y = "element";
var stringsHaveSameReference = object.ReferenceEquals(x, y);
Console.WriteLine(
  $"string 'x' has same reference as string 'y' (expect true): {stringsHaveSameReference}");

// Example values for the local name and namespace URI (any strings would do)
string localName = "element";
string namespaceUri = "http://www.adventure-works.com";

// Create new XName instances directly via XName.Get
var xNameViaGet1 = XName.Get(localName, namespaceUri);
var xNameViaGet2 = XName.Get(localName, namespaceUri);

// Check if XName instances are the same
bool namesHaveSameReference = object.ReferenceEquals(xNameViaGet1, xNameViaGet2);
Console.WriteLine($"xNameViaGet1 is same reference as xNameViaGet2 (expect true): {namesHaveSameReference}");

// Create XName instances via XNamespace.Addition Operator
XNamespace ns = namespaceUri;
XName xNameViaNSAddition1 = ns + localName;
XName xNameViaNSAddition2 = ns + localName;

// Check if XName instances are the same
namesHaveSameReference = object.ReferenceEquals(xNameViaNSAddition1, xNameViaNSAddition2);
Console.WriteLine(
  $"xNameViaNSAddition1 is same reference as xNameViaNSAddition2 (expect true): {namesHaveSameReference}");

// Create XElement and XAttribute using XName instances
XElement ele = new XElement(xNameViaGet1, "value1");
XAttribute attr = new XAttribute(xNameViaGet2, "value2");

// Check if XName instances in XElement and XAttribute are the same
namesHaveSameReference = object.ReferenceEquals(ele.Name, attr.Name);
Console.WriteLine(
  $"ele.Name is same reference as attr.Name (expect true): {namesHaveSameReference}");

// Compare XName references that were created differently
namesHaveSameReference = object.ReferenceEquals(xNameViaGet1, xNameViaNSAddition1);
Console.WriteLine(
  $"xNameViaGet1 is same reference as xNameViaNSAddition1 (expect true): {namesHaveSameReference}");
namesHaveSameReference = object.ReferenceEquals(xNameViaGet1, ele.Name);
Console.WriteLine(
  $"xNameViaGet1 is same reference as ele.Name (expect true): {namesHaveSameReference}");

// Create 2 XElement instances with the same name of 'root' and same value
XElement eleViaCtor = new XElement("root", "value");
XElement eleViaParse = XElement.Parse("<root>value</root>");

// Note that the 2 XElements DO NOT have the same reference
bool xelementsHaveSameReference = object.ReferenceEquals(eleViaCtor, eleViaParse);
Console.WriteLine(
  $"eleViaCtor and eleViaParse refer to same instance: {xelementsHaveSameReference}");

// However, their respective XName properties DO have the same reference
namesHaveSameReference = object.ReferenceEquals(eleViaCtor.Name, eleViaParse.Name);
Console.WriteLine($"eleViaCtor.Name and eleViaParse.Name refer to same instance: {namesHaveSameReference}");

Atomization Benchmarks

These tests compared creating XElement and XAttribute instances by passing either a string or an XName instance to the constructors. Note that the memory allocations were identical since both approaches end up referring to the same XName instances.
Method                            | Mean     | Error   | StdDev  | Ratio | Gen0   | Allocated | Alloc Ratio |
----------------------------------|---------:|--------:|--------:|------:|-------:|----------:|------------:|
XNode_Construction_Passing_Strings| 355.9 ns | 3.69 ns | 3.45 ns |  1.00 | 0.0391 |     656 B |        1.00 |
XNode_Construction_Passing_XName  | 136.0 ns | 1.95 ns | 1.82 ns |  0.38 | 0.0391 |     656 B |        1.00 |
The full code for the benchmarks can be found at this gist: Linq to Xml - XName Atomization Benchmark.cs

January 11, 2025

How to Control Namespace Prefixes with LINQ to Xml

LINQ to Xml is a great improvement over the XmlDocument DOM approach that uses types that are part of the System.Xml namespace. LINQ to Xml uses types that are part of the System.Xml.Linq namespace. In my experience, it shines for a large majority of the use cases for working with XML by offering a simpler method of accessing:
  • Elements (XElement)
  • Attributes (XAttribute)
  • Nodes (XNode)
  • Comments (XComment)
  • Text (XText)
  • Declarations (XDeclaration)
  • Namespaces (XNamespace)
  • Processing Instructions (XProcessingInstruction)
However, this post isn't intended to be an intro to LINQ to Xml. You can read more about it here. The remaining use cases that need more complex and advanced XML processing are better handled by the XmlDocument DOM. It has been around longer (since .NET Framework 1.1), whereas LINQ to Xml was added in .NET Framework 3.5, and as a result the XmlDocument DOM is richer in features. I'd liken it to working with C++ instead of C#: you can do a lot more with it and it's been around a lot longer, but it is more cumbersome to use.

Now getting back to the matter at hand... When you work with namespace prefixes, LINQ to Xml handles the serialization of the prefixes for you, both when writing through an XmlWriter and when calling XDocument.ToString() or XElement.ToString(). This is helpful in most cases as it abstracts away all of the details. It's not helpful when you need to control the prefixes that are serialized. Some examples of when you might need to control prefix serialization are:
  • comparing documents for semantic equivalence
  • normalizing an XML document
  • canonicalizing an XML document

Even LINQ to Xml's XNode.DeepEquals() method doesn't consider elements or attributes with different prefixes across XElement or XDocument instances to be equivalent. How disappointing. 
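
To make that concrete, here is a small sketch of the behavior. The two documents and the variable names are made up for illustration; they differ only in the prefix bound to the same namespace.

```csharp
using System;
using System.Xml.Linq;

// Same namespace, same structure, different prefixes (x versus y).
XElement withX = XElement.Parse("<x:root xmlns:x='http://ns1'><x:child/></x:root>");
XElement withY = XElement.Parse("<y:root xmlns:y='http://ns1'><y:child/></y:root>");

// The element names are equal because XName ignores the prefix.
bool sameName = withX.Name == withY.Name;

// DeepEquals also compares attributes, and the namespace declarations
// (xmlns:x versus xmlns:y) are attributes with different names.
bool deepEqual = XNode.DeepEquals(withX, withY);

Console.WriteLine($"Names equal: {sameName}");
Console.WriteLine($"DeepEquals: {deepEqual}");
```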

Here is an example of how LINQ to Xml takes care of prefix serialization for you:
const string xml =  @"
<x:root xmlns:x='http://ns1' xmlns:y='http://ns2'>
    <child a='1' x:b='2' y:c='3'/>
</x:root>";

XElement original = XElement.Parse(xml);

// Output the XML
Console.WriteLine(original.ToString());
The output is:
<x:root xmlns:x="http://ns1" xmlns:y="http://ns2">
  <child a="1" x:b="2" y:c="3" />
</x:root>
Now create an XElement based on the original. This illustrates that the namespace prefixes specified in the original XElement are not maintained in the copy:
XElement copy =
  new XElement(original.Name,
    original.Elements()
      .Select(e => new XElement(e.Name, e.Attributes(), e.Elements(), e.Value)));

Console.WriteLine("\nNew XElement:");
Console.WriteLine(copy.ToString());
The output is:
New XElement:
<root xmlns="http://ns1">
  <child a="1" p2:b="2" p3:c="3" xmlns:p3="http://ns2" xmlns:p2="http://ns1" xmlns="" />
</root>

What happened to my prefixes of x and y? Where did the prefixes p2 and p3 come from? Also, note how there are some additional namespace declarations that aren't in the original XElement. We didn't touch anything related to namespaces when we created the copy; we just selected the elements and attributes from the original. You can't even use XNode.DeepEquals() to compare them for equivalence since the attributes are now different as well as the element names. I'll leave that as an exercise for the reader.

Controlling the prefixes yourself, although not very intuitive, is actually really easy. You might consider trying to change the Name property on the xmlns namespace declaration attribute to get the prefix rewritten. However, unlike the XElement.Name property, which is writable, the XAttribute.Name property is read-only.

One caveat here: if you happen to have superfluous (duplicate) namespace prefixes pointing to the same namespace, you are out of luck. I've written a lengthy description about this on Stack Overflow answering this question: c# linq xml how to use two prefixes for same namespace.

Would you believe me if I told you that LINQ is actually going to help you rewrite the prefixes? In fact, it doesn't just help you, it rewrites all of the prefixes for you! I bet you didn't see that coming! This is because when you modify the XML at runtime, LINQ to Xml keeps its tree updated so that it stays in a consistent state. Imagine if you made a programmatic change that left the XML tree out of whack and the LINQ to Xml runtime didn't make updates to keep things in sync. Well, welcome to debugging hell.

So, all we have to do are a couple of really simple things and LINQ to Xml handles the rewriting for us. There are two approaches: you can either create a new element or modify the existing one. I've done some performance profiling and found that for larger documents (above 1 MB) it is more efficient to create new elements and attributes rather than modifying existing ones. Some of this has to do with all of the allocations that are made when LINQ to Xml invokes the event handlers on the XObject class: see XObject Class. These are fired, thus allocating memory, regardless of whether you have event handlers wired to them or not. I'll show you both patterns so you can decide what works for you. The steps to rewrite vary depending on which approach you choose.

Let's use the following XML as an example. Two namespaces are declared, and we want to rewrite prefix a to aNew and prefix b to bNew:
XNamespace origANamespace = "http://www.a.com";
XNamespace origBNamespace = "http://www.b.com";
const string originalAPrefix = "a";
const string originalBPrefix = "b";

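const string xml = @"
<a:foo xmlns:a='http://www.a.com' xmlns:b='http://www.b.com'>
  <b:bar/>
  <b:bar/>
  <b:bar/>
  <a:bar b:att1='val'/>
</a:foo>";

// The element whose prefixes will be rewritten in the steps below
XElement originalElement = XElement.Parse(xml);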

Creating a new XElement

There are three steps.
Step 1: Create a new XNamespace and point it to the namespace whose prefix should be rewritten
// Define new namespaces
XNamespace newANamespace = origANamespace;
XNamespace newBNamespace = origBNamespace;
Step 2: Create new namespace declaration attributes that use the XNamespaces you created in Step 1
XAttribute newANamespaceXmlnsAttr =
  new XAttribute(XNamespace.Xmlns + "aNew", newANamespace);
XAttribute newBNamespaceXmlnsAttr =
  new XAttribute(XNamespace.Xmlns + "bNew", newBNamespace);
Step 3: Create an XElement that uses the new namespace declarations and contains all of the elements, attributes, and children of the original XElement
// Create a new XElement with the new namespace  
XElement newElement = 
  new XElement(newANamespace + originalElement.Name.LocalName,
    newANamespaceXmlnsAttr,
    newBNamespaceXmlnsAttr,
    originalElement
      .Elements()
      .Select(e => 
        new XElement(
          e.Name,
          e.Attributes(),
          e.Elements(),
          e.Value)));
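
Putting the three steps together, here is a self-contained sketch you can run to see the rewritten prefixes. It uses a shortened version of the sample XML, and the names nsA and nsB are just shorthand I chose for this example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Shortened version of the sample document (illustration only).
const string xml =
    "<a:foo xmlns:a='http://www.a.com' xmlns:b='http://www.b.com'>" +
    "<b:bar/><a:bar b:att1='val'/></a:foo>";

XElement originalElement = XElement.Parse(xml);

// Step 1: XNamespace instances for the namespaces being re-prefixed.
XNamespace nsA = "http://www.a.com";
XNamespace nsB = "http://www.b.com";

// Steps 2 and 3: new declarations on a new root, children copied over.
// The original xmlns:a and xmlns:b attributes are deliberately not copied.
XElement newElement = new XElement(originalElement.Name,
    new XAttribute(XNamespace.Xmlns + "aNew", nsA),
    new XAttribute(XNamespace.Xmlns + "bNew", nsB),
    originalElement.Elements()
        .Select(e => new XElement(e.Name, e.Attributes(), e.Elements(), e.Value)));

// Serialization picks up the prefixes from the declarations in scope.
string output = newElement.ToString();
Console.WriteLine(output);
```

Because only the aNew and bNew declarations are in scope on the new root, those are the prefixes the serializer uses for every element and attribute in the two namespaces.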

Modifying an existing XElement

There are four steps, with the last step being optional depending on whether you want to remove the old namespace prefix declaration. The first two steps are identical to those of the "creating a new XElement" approach.
Step 1: Create a new XNamespace that has the new prefix you want and point it to the namespace whose prefix should be rewritten
// Define new namespaces
XNamespace newANamespace = origANamespace;
XNamespace newBNamespace = origBNamespace;
Step 2: Create new namespace declaration attributes that use the XNamespaces you created in Step 1
XAttribute newANamespaceXmlnsAttr =
  new XAttribute(XNamespace.Xmlns + "aNew", newANamespace);
XAttribute newBNamespaceXmlnsAttr =
  new XAttribute(XNamespace.Xmlns + "bNew", newBNamespace);
Step 3: Modify the original XElement by adding the new namespace declaration attributes
originalElement.Add(
  newANamespaceXmlnsAttr,
  newBNamespaceXmlnsAttr);
Step 4 (optional): Remove the original namespace declaration that has now been rewritten
// now remove the original 'a' and 'b' namespace declarations
originalElement
  .Attribute(XNamespace.Xmlns + originalAPrefix)?
  .Remove();
originalElement
  .Attribute(XNamespace.Xmlns + originalBPrefix)?
  .Remove();
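
Here is a compact, self-contained sketch of the in-place approach, including the optional Step 4, so you can check the result. The one-namespace document and the nsA name are simplifications I made for illustration.

```csharp
using System;
using System.Xml.Linq;

// A small document with a single prefixed namespace (illustration only).
XElement element = XElement.Parse(
    "<a:foo xmlns:a='http://www.a.com'><a:bar/></a:foo>");

XNamespace nsA = "http://www.a.com";

// Step 3: add the declaration that carries the new prefix.
element.Add(new XAttribute(XNamespace.Xmlns + "aNew", nsA));

// Step 4: remove the old declaration so only the new prefix is in scope.
element.Attribute(XNamespace.Xmlns + "a")?.Remove();

string output = element.ToString();
Console.WriteLine(output);
```

With the old declaration removed, aNew is the only prefix bound to the namespace, so the serializer has nothing else to choose from.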
I hope you found this helpful. Drop a comment and let me know.


December 24, 2024

Memory Leaks - Part 2: Debugging Tools to Diagnose a Memory Leak

If you are reading this you may be in the unfortunate position of looking for an elusive and possibly intermittent problem in your production application. Debugging and diagnosing a memory leak or an OutOfMemoryException can be a daunting, intimidating and challenging task. Fortunately, there are a number of tools to help; some of these are "paid license" apps, but there are even more tools that are free. I'll discuss some .NET Debugging Tools available. I've used all of the following tools with the exception of SciTech .NET Memory Profiler. Each has its advantages and disadvantages. Personally, I prefer the free tools for several reasons:

  1. Getting approval for licensing and purchase of a 3rd-party tool is an uphill task and you rarely have time to wait for the approval process when it is a production issue.
  2. I find the feature set of a combination of the free tools gives you the largest "surface area" of features to help. Some are better at strict data collection, others have graphical views, some are static for post-mortem debugging, and others can do real-time analysis.
  3. Even though these are free, they are very robust and provide just as much data as the paid license tools. A good example of this is WinDbg which was developed in 1993 by Microsoft for in-house debugging of the Windows Kernel.

Free Tools

Advantages

  • Duh! It's free
  • There are lots of tools to choose from
  • All of the ones I've seen are from reputable and well-known companies.
  • You can often find blog posts (ahem...), articles, reddit threads, etc. that can provide some direction for getting started.

Disadvantages

  • Formal documentation can be lacking. Finding a blog post or article is great but comprehensive detail is often missing or at best glossed over. This can make getting started a bit more of a challenge if the tool is new to you.
  • Your company may have restrictions on using free tools for fear of malware, liability, or other reasons.

Licensed Tools

Hope this helps!



May 14, 2019

Unifying .NET - One framework to rule them all?

.NET 5 is on the way and scheduled to be delivered in November 2020. It is meant to unify .NET Framework, .NET Core, Mono (and possibly .NET Standard) - i.e. the entire .NET Platform.

After 17 years of working in .NET (since it was first released in 2002), I'm excited to see a unified solution to the splintered framework that .NET has become as of late.  IMO, .NET Framework, .NET Core, .NET Standard, and Mono have grown into a big bad monster that is growing faster than its parts can manage. While we know (and expect) technology to evolve and grow over time, we hope that it doesn't fragment and splinter so much that its parts become dissimilar.  I've seen it grow from a set of APIs that were questionable to something that is almost overwhelming - and certainly difficult to keep up with.  I'm pleased to see that Microsoft has acknowledged the splintering and has been working on a solution since at least December 2018.  I'm looking forward to seeing how much simpler things will become.


Now if they could just FINALLY fix "Edit and Continue"...........

(image is copyright of Microsoft)


April 19, 2019

Missing Start Page from Visual Studio 2019

Oh Microsoft, why hast thou forsaken the beloved 'Start Page' and replaced it with a modal window?

The new 'Start Window' which replaces the 'Start Page'

You can see the design reasoning behind the new VS 2019 Start Window as posted at 'Get to code: How we designed the new Visual Studio start window' and 'New Start Window and New Project Dialog Experience in Visual Studio 2019'.

I sincerely appreciate any amount of thought, consideration, or testing that a company decides to invest in their products, especially a flagship product like Visual Studio. Based on the design reasoning, Microsoft certainly had good intentions and did put a good amount of thought and testing into the effort. However, I think they missed the mark. Perform any Google search on "missing start page visual studio 2019" or look on the Developer Community Feedback Site and you'll see devs crying out for the beloved Start Page.

Some things are better left alone and the Start Page is one of them. Some might argue the new 'Start Window' is a better experience, but why make it a modal window? Really? In Visual Studio 2019 Preview 1, at least the option to restore the 'Start Page' was available in the Startup settings:



However, somewhere along the way the 'Start Page' item has disappeared from the drop-down...headsmack!  Here's what the options are in version 16.0.2:



Ok, now I'm getting frustrated.  I get it. You're trying to funnel me into this new window that you think is better.  Well, my response is



Fortunately, Microsoft hasn't completely done away with the 'Start Page'...yet.  You can still add it by customizing the toolbar to add the Start Page button:

1. Right-click the toolbar and select 'Customize':

2. Select the 'Commands' tab:

3. Select 'Toolbar' and change the dropdown to whatever menu you'd like, then click the 'Add Command' button:
4. Choose 'File' from the Categories list box, then select 'Start Page' from the Commands list box:

So, there you go! At least it's still there for now. I'd bet any amount of money that they change the experience back so that the 'Start Page' option is available from the Environment/Startup setting. To be fair, Microsoft has improved significantly at listening to community feedback.