How to generate TwiML using Strings in C#

Niels Swimberghe - 2/21/2023 - .NET

This blog post was written for Twilio and originally published at the Twilio blog.

Over the decades, C# has added different ways to create a string, each with its own benefits. In this tutorial, you'll learn how to generate TwiML using the different C# string features with an ASP.NET Core Minimal API and compare it to the object-oriented way of generating TwiML.

Along the way, you'll get a short refresher on how Twilio uses webhooks and TwiML to respond to text messages and voice calls.

Prerequisites #

Here's what you will need to follow along:

.NET 6 SDK, or .NET 7 SDK to use Raw String Literals (later versions will work too)
A code editor or IDE. I recommend JetBrains Rider, Visual Studio 2022 Preview (will also work in the non-preview Visual Studio in the future), or VS Code with the C# plugin
A free Twilio account (sign up with Twilio for free and get trial credit)
A Twilio Phone Number
The ngrok CLI, and optionally, a free ngrok account

Experience with ASP.NET Core and Minimal APIs is not required but recommended. Here's an article that brings you up to speed on how to integrate ASP.NET Core Minimal APIs with Twilio.

You can find the source code for this tutorial on GitHub. Use it as a reference if you get stuck, or submit an issue if you run into problems.

Before creating your application, let's get you up to speed on how Twilio uses webhooks and TwiML to respond to text messages and voice calls.

Webhooks and TwiML #

Using Twilio, you can build programmatic text message and voice call applications. Twilio can do a lot of things like playing audio, gathering input, and recording a call. But Twilio doesn't know how you want to respond to voice calls and text messages. That's why when Twilio receives a call or text message, Twilio will send an HTTP request to your application asking for instructions.

A webhook is a user-defined HTTP callback. When an event happens in a service, your application is notified of that event using an HTTP request.

Twilio uses webhooks heavily throughout all its products. Here's a diagram of what it looks like when there's an incoming text message to your Twilio Phone Number and your application is handling the messaging webhook:

Phone texts "Ahoy!" to a Twilio Phone Number, Twilio sends the SMS details (from and to phone number and the body of the message) via HTTP to your web application, then your application responds with TwiML instructions instructing to respond with "Hi!". Twilio receives the instructions and sends "Hi!" back to the original sender.

When your Twilio Phone Number receives a text message, Twilio will send an HTTP request with the message details to your application. Your application then has to respond with TwiML (Twilio Markup Language) to instruct Twilio how to respond. TwiML is XML with special tags defined by Twilio to provide instructions on how to respond to messages and voice calls. In the diagram depicted above, Twilio will respond with "Hi!" because the app responded with the following TwiML:

<?xml version="1.0" encoding="utf-8"?>
<Response>
  <Message>Hi!</Message>
</Response>

You can learn more about TwiML for Programmable Messaging here and TwiML for Programmable Voice here.
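As a rough sketch of what "responding with TwiML" means in code, the snippet below builds the messaging TwiML from the diagram as a plain string. The helper name BuildMessageTwiml is hypothetical and not part of any Twilio library. A real application would return this string from an HTTP endpoint with the application/xml content type.

```csharp
using System;
using System.Security;

// Hypothetical helper (not part of any Twilio library): wraps a reply
// in the <Response><Message> TwiML shown above.
static string BuildMessageTwiml(string reply) =>
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
    "<Response>" +
    "<Message>" + SecurityElement.Escape(reply) + "</Message>" +
    "</Response>";

Console.WriteLine(BuildMessageTwiml("Hi!"));
// <?xml version="1.0" encoding="utf-8"?><Response><Message>Hi!</Message></Response>
```

The reply is passed through SecurityElement.Escape so that characters such as < and & in the reply text cannot break the XML.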

Create an ASP.NET Core Minimal API to handle voice calls #

First, let's create an ASP.NET Core Minimal API to handle voice calls without using Raw String Literals.

Open a terminal and create a new ASP.NET Core Minimal API project using the .NET CLI:

dotnet new web -o TwimlStrings 
cd TwimlStrings

These commands will create the project in a folder named TwimlStrings and navigate into the folder.

Open the project in your editor, and update the Program.cs file with the following C# code:

using System.Security;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/what-does-the-fox-say", () => Results.Text("" +
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
    "<Response>" +
    "    <Gather action=\"/answer\" method=\"GET\" input=\"speech\"> " +
    "        <Say>What does the fox say?</Say>" +
    "   </Gather>" +
    "    <Say>Ring-ding-ding-ding-dingeringeding!</Say>" +
    "</Response>",
    contentType: "application/xml"
));

app.MapGet("/answer", (string speechResult) => Results.Text("" +
    "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
    "<Response>" +
    "    <Say>You said: " + SecurityElement.Escape(speechResult) + "</Say>" +
    "</Response>",
    contentType: "application/xml"
));

app.Run();

When your Twilio Phone Number receives a call, Twilio will send an HTTP GET request to the /what-does-the-fox-say endpoint, which will set the content-type response header to application/xml, and return the following TwiML in the response:

<?xml version="1.0" encoding="utf-8"?><Response>    <Gather action="/answer" method="GET" input="speech">         <Say>What does the fox say?</Say>   </Gather>    <Say>Ring-ding-ding-ding-dingeringeding!</Say></Response>

That's not very readable, so here's the formatted version:

<?xml version="1.0" encoding="utf-8"?>
<Response>
    <Gather action="/answer" method="GET" input="speech">
        <Say>What does the fox say?</Say>
    </Gather>
    <Say>Ring-ding-ding-ding-dingeringeding!</Say>
</Response>

When Twilio receives the TwiML, it'll ask the caller "What does the fox say?" and listen for a speech response. When the caller responds, Twilio will send an HTTP request, this time to the /answer endpoint, with the transcription of what the caller said. If the caller doesn't respond, Twilio will say "Ring-ding-ding-ding-dingeringeding!" to the caller.

The /answer endpoint will retrieve the transcription from the query string parameter SpeechResult by binding it to the speechResult parameter. The speechResult is used to construct some more TwiML, but it is first escaped using SecurityElement.Escape so that any XML special characters in the caller's words are encoded as entities instead of being interpreted as markup. The /answer endpoint will generate the following TwiML when the caller says "ring ding ding":

<?xml version="1.0" encoding="utf-8"?>
<Response>
    <Say>You said: ring ding ding</Say>
</Response>

When Twilio receives this TwiML, Twilio will convert "You said: ring ding ding" to speech and stream the audio to the caller.
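To see why the escaping step matters, here is a small sketch of what SecurityElement.Escape does to input that contains markup. The speechResult value is a made-up example, not something a speech transcription is likely to produce, but the same applies to any user-controlled text you embed in TwiML.

```csharp
using System;
using System.Security;

// Made-up input that tries to smuggle extra TwiML into the response.
string speechResult = "</Say><Hangup/><Say>fox & \"friends\"";

var escaped = SecurityElement.Escape(speechResult);
Console.WriteLine(escaped);
// &lt;/Say&gt;&lt;Hangup/&gt;&lt;Say&gt;fox &amp; &quot;friends&quot;
```

Because the angle brackets, ampersand, and quotes are replaced with entities, the text ends up inside the Say element as plain words rather than as new TwiML verbs.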

In case you're not familiar with the song I am referencing in this app, "What does the fox say?" originates from this famous song.

Start your .NET application using your editor or using the terminal using the .NET CLI:

dotnet run

Before integrating Twilio, open a browser and navigate to your web application URL with the suffix /what-does-the-fox-say, to verify the TwiML that is generated.

Make the application publicly accessible #

For Twilio to be able to send HTTP requests to your local web server, the server needs to be publicly accessible. ngrok is a free and secure tunneling service that can make your local web server public.

To start ngrok, run the following ngrok command in a separate terminal:

ngrok http [YOUR_ASPNET_URL]

Replace [YOUR_ASPNET_URL] with the localhost URL from your .NET application. If you're using an HTTPS localhost URL, you'll need to authenticate ngrok. The ngrok command will display an HTTPS Forwarding URL that makes your local web server public.

ngrok http command output showing information about the secure tunnel, most importantly the public forwarding URLs

Configure your Twilio Phone Number #

Now it's time to update your Twilio Phone Number to send HTTP requests to your /what-does-the-fox-say endpoint via the ngrok Forwarding URL. The URL should look something like https://cd2f8809cbd0.ngrok.io/what-does-the-fox-say.

Go to the Active Phone Numbers section in the Twilio Console and click on your Twilio Phone Number.
This will take you to the configuration for the phone number. Find the Voice & Fax section and under the "A CALL COMES IN" label, set the dropdown to "Webhook". In the text field next to it, enter your ngrok forwarding URL with /what-does-the-fox-say appended to it. Select “HTTP GET” in the last dropdown. Finally, click the Save button at the bottom of the page.

Voice & Fax section in the Twilio Phone Number configuration page. Under the "A CALL COMES IN" label, a dropdown is set to "Webhook", the text field next to it is configured with "https://cd2f8809cbd0.ngrok.io/what-does-the-fox-say", and the dropdown next to that is set to "HTTP GET".

Test that the application works #

To test out your application, call your Twilio Phone Number. You should hear the question "What does the fox say?". Say your answer, wait 5 seconds, and your response will be read back to you.

After you've done that, stop your .NET application using your editor, or if you're using the terminal, press ctrl + c.

Use Verbatim Strings #

The first way you can improve your code is by using Verbatim Strings. To turn your string into a Verbatim String, prefix your string with the "at" @ symbol.

@"The fox said: ""I like using verbatim strings,
because it supports multi-line text,
you don't need to escape forward slashes (/), and backslashes (\),
and you can easily escape double quotes by doubling the double quotes""."

As the Verbatim String above says, backslashes are treated as literal characters, so you don't need to escape them (forward slashes never need escaping in C# strings). You do still need to escape double quotes, but instead of using a backslash, you write two double quotes and the resulting character will be a single double quote.
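A quick way to convince yourself of these escaping rules is to write the same text both ways and compare. The file path in this sketch is made up purely for illustration.

```csharp
using System;

// The same text written as a regular string and as a Verbatim String.
string regular = "C:\\twiml\\fox.xml says \"Ring-ding\"";
string verbatim = @"C:\twiml\fox.xml says ""Ring-ding""";

Console.WriteLine(verbatim);            // C:\twiml\fox.xml says "Ring-ding"
Console.WriteLine(regular == verbatim); // True
```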

Applying this knowledge to your TwiML Program.cs file, you will have the following endpoint code:

app.MapGet("/what-does-the-fox-say", () => Results.Text(
    @"<?xml version=""1.0"" encoding=""utf-8""?>
    <Response>
        <Gather action=""/answer"" method=""GET"" input=""speech""> 
            <Say>What does the fox say?</Say>
       </Gather>
        <Say>Ring-ding-ding-ding-dingeringeding!</Say>
    </Response>",
    contentType: "application/xml"
));

app.MapGet("/answer", (string speechResult) => Results.Text(
    @"<?xml version=""1.0"" encoding=""utf-8""?>
    <Response>
        <Say>You said: " + SecurityElement.Escape(speechResult) + @"</Say>
    </Response>",
    contentType: "application/xml"
));

The resulting TwiML will be formatted better, but still be indented oddly because the whitespace used to nicely indent the code is also included in the string:

<?xml version="1.0" encoding="utf-8"?>
    <Response>
        <Gather action="/answer" method="GET" input="speech"> 
            <Say>What does the fox say?</Say>
       </Gather>
        <Say>Ring-ding-ding-ding-dingeringeding!</Say>
    </Response>

By using Verbatim Strings, you are able to use multi-line text so you don't have to concatenate the string for every line of TwiML. You may think that this will also improve performance because you aren't creating and concatenating as many strings. However, the compiler already combines concatenated string literals into a single string at compile time, so for the constant parts of your TwiML there's no difference in the end result.
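One way to observe the compile-time behavior is with a const declaration, which only accepts compile-time constants. The sketch below is an illustration with made-up TwiML rather than a benchmark.

```csharp
using System;

// A const initializer must be a compile-time constant, so this only compiles
// because the compiler folds the concatenation into a single string.
const string Concatenated =
    "<Response>" +
    "<Say>Hi!</Say>" +
    "</Response>";

const string Verbatim = @"<Response><Say>Hi!</Say></Response>";

Console.WriteLine(Concatenated == Verbatim); // True
```

Note that this only applies to literals. Concatenating a runtime value such as speechResult still happens when the request is handled.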

Use String Interpolation #

By using Verbatim Strings, you were able to decrease the number of opening and closing double quotes needed to construct your TwiML string, which makes the code a lot easier to read. However, you still had to use string concatenation to embed the speechResult into your TwiML string for the /answer endpoint.

To embed variables in your string, you can use String.Format or, much better, String Interpolation. First, start your string with the dollar $ sign then use the curly brackets {} to embed C# expressions.

For example, the following code will write "What does the fox say?" to the console.

var animal = "fox";
Console.WriteLine($"What does the {animal} say?");
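For comparison, here is a small sketch that puts String.Format and String Interpolation side by side. It also shows one detail worth knowing when you interpolate: because curly brackets mark expressions, a literal curly bracket is written by doubling it.

```csharp
using System;

var animal = "fox";

// Both lines produce the same text; interpolation keeps the value next to
// the place where it is used instead of in a numbered argument list.
string formatted = string.Format("What does the {0} say?", animal);
string interpolated = $"What does the {animal} say?";

// Literal curly brackets are written by doubling them.
string braces = $"{{{animal}}}";

Console.WriteLine(formatted == interpolated); // True
Console.WriteLine(braces);                    // {fox}
```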

Applying this knowledge to the /answer endpoint, you can further improve the code like this:

app.MapGet("/answer", (string speechResult) => Results.Text(
    $@"<?xml version=""1.0"" encoding=""utf-8""?>
    <Response>
        <Say>You said: {SecurityElement.Escape(speechResult)}</Say>
    </Response>",
    contentType: "application/xml"
));

C# 11 and .NET 7 only: Use Raw String Literals #

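Raw String Literals are a C# 11 feature, and C# 11 shipped with .NET 7. Make sure you have the .NET 7 SDK (or later) installed and that your project targets .NET 7. Then, open your project file at TwimlStrings.csproj and check the <TargetFramework> node. If you are using a preview build of the .NET 7 SDK, also add <LangVersion>preview</LangVersion> to the first <PropertyGroup> node to enable the C# 11 preview features. With the released SDK, C# 11 is the default language version for net7.0 projects, so this element is optional.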

Your TwimlStrings.csproj in the project folder, should look something like this:

<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net7.0</TargetFramework>
    <Nullable>enable</Nullable>
    <ImplicitUsings>enable</ImplicitUsings>
    <LangVersion>preview</LangVersion>
  </PropertyGroup>
</Project>

Now that you have enabled C# 11, let's talk about the new Raw String Literals. To use Raw String Literals, you start and end your string with at least 3 double quotes. For example:

"""
    I like using Raw String Literals,
    because it supports multi-line text,
    you don't need to escape forward slashes (/), backslashes (\), and double quotes* ("),
    and you can indent your code better without having the indentation included in your string. 
"""


As with Verbatim Strings, you don't have to escape forward slashes and backslashes, but, as opposed to Verbatim Strings, in Raw String Literals you also don't have to escape your double quotes, more or less.

You can use double quotes inside the string as long as every run of consecutive double quotes is shorter than the sequence of double quotes you started and ended your string with. This is clearer with an example:

// 3 start and end double quotes, so 1 and 2 consecutive double quotes are allowed
"""This " and this "" is allowed."""

// 3 start and end double quotes, so 3 consecutive double quotes are NOT allowed
"""This """ is not allowed."""

// 4 start and end double quotes, so 3 consecutive double quotes are allowed
""""This """ is allowed.""""

The first and third examples are valid Raw String Literals because the number of consecutive double quotes is lower than the number of double quotes used to start and end the Raw String Literal. The second example is invalid because the three double quotes in the middle are read as the closing delimiter.
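To check the quote-counting rule yourself, you can compare the valid literals against their escaped regular-string equivalents. This sketch assumes a project that compiles with C# 11 or later.

```csharp
using System;

// Three delimiter quotes: runs of one or two quotes are plain content.
string three = """This " and this "" is allowed.""";

// Four delimiter quotes: a run of three quotes is now plain content too.
string four = """"This """ is allowed."""";

Console.WriteLine(three == "This \" and this \"\" is allowed."); // True
Console.WriteLine(four == "This \"\"\" is allowed.");            // True
```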

Raw String Literals also make it easier to format your strings. Take the following method that generates a Verbatim String as an example:

public string GenerateHtml()
{
    return @"
    <p>
    This is a <b>multi-line Verbatim String</b> to generate some HTML
    </p>
    ";
}

The resulting string will be:


    <p>
    This is a <b>multi-line Verbatim String</b> to generate some HTML
    </p>

Notice the newlines and the tab indentation? I formatted the Verbatim String for the sake of readability, but as a result I also got the newlines and tabs in it which I do not want.

Let's take a look at the Raw String Literal version:

string GenerateString()
{
    return """
    <p>
    This is a <b>multi-line Raw String Literal</b>
    </p>
    """;
}

The resulting string now looks like this:

<p>
This is a <b>multi-line Raw String Literal</b>
</p>

The unnecessary newlines and tabs are gone. The way Raw String Literals indent your strings depends on where you place the starting and ending double quotes, and also how you indent the content of your string. It's a little funky and hard to explain all the variations, so I recommend experimenting with it and reading the Microsoft documentation on Raw String Literals to learn more.
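As a starting point for that experimentation, here is a minimal sketch of the basic rule as I understand it: the whitespace in front of the closing quotes is removed from the start of every content line. It assumes C# 11 or later, and the TwiML is just sample content.

```csharp
using System;

// The closing quotes sit at the same column as the content,
// so all of the leading whitespace is stripped.
string flush = """
    <Response>
        <Say>Hi!</Say>
    </Response>
    """;

// The closing quotes sit four columns to the left of the content,
// so four spaces of indentation remain on the line.
string kept = """
        <Say>Hi!</Say>
    """;

Console.WriteLine(flush);
Console.WriteLine($"[{kept}]"); // [    <Say>Hi!</Say>]
```

In the first literal, only the nested Say line keeps indentation, because it is indented further than the closing quotes.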

Using this knowledge, you can apply Raw String Literals to your endpoints like this:

app.MapGet("/what-does-the-fox-say", () => Results.Text("""
    <?xml version="1.0" encoding="utf-8"?>
    <Response>
        <Gather action="/answer" method="GET" input="speech">
            <Say>
                What does the fox say?
            </Say>
        </Gather>
        <Say>Ring-ding-ding-ding-dingeringeding!</Say>
    </Response>
    """,
    contentType: "application/xml"
));

app.MapGet("/answer", (string speechResult) => Results.Text($"""
    <?xml version="1.0" encoding="utf-8"?>
    <Response>
        <Say>You said: {SecurityElement.Escape(speechResult)}</Say>
    </Response>
    """,
    contentType: "application/xml"
));

Now the resulting TwiML will be readable and not contain unnecessary newlines and tabs.

Other ways to generate TwiML #

There are many other ways you could generate TwiML. Since TwiML is XML, you can use the APIs from the System.Xml and System.Xml.Linq namespaces. Twilio also has a helper library for .NET that lets you generate TwiML in an object-oriented way, and the helper library for ASP.NET helps you return the TwiML in the HTTP response.
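As a minimal sketch of the System.Xml.Linq route, the snippet below builds the /answer response with XElement. The speechResult value is made up, and the output here omits the XML declaration. Notice that XElement encodes the special characters in the text for you.

```csharp
using System;
using System.Xml.Linq;

// Made-up caller input containing XML special characters.
string speechResult = "<Hangup/> & more";

var response = new XElement("Response",
    new XElement("Say", $"You said: {speechResult}"));

// DisableFormatting keeps the output on a single line.
string twiml = response.ToString(SaveOptions.DisableFormatting);
Console.WriteLine(twiml);
// <Response><Say>You said: &lt;Hangup/&gt; &amp; more</Say></Response>
```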

In the terminal where you ran your project, run the following commands to add the Twilio package and Twilio.AspNet.Core package:

dotnet add package Twilio
dotnet add package Twilio.AspNet.Core

Then, update the Program.cs file with the following code:

using Twilio.AspNet.Core.MinimalApi;
using Twilio.TwiML;
using Twilio.TwiML.Voice;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapGet("/what-does-the-fox-say", () =>
{
    var response = new VoiceResponse();
    var gather = new Gather(
        input: new List<Gather.InputEnum> {Gather.InputEnum.Speech},
        action: new Uri("/answer", UriKind.Relative),
        method: Twilio.Http.HttpMethod.Get
    );
    gather.Append(new Say("What does the fox say?"));
    response.Append(gather);
    response.Say("Ring-ding-ding-ding-dingeringeding!");
    return Results.Extensions.TwiML(response);
});

app.MapGet("/answer", (string speechResult) =>
{
    var response = new VoiceResponse()
        .Say($"You said: {speechResult}");
    return Results.Extensions.TwiML(response);
});

app.Run();

The resulting TwiML will be the same as before. Notice, however, that you don't need to use SecurityElement.Escape anymore to escape the user input, because the Twilio helper library will XML encode it for you. Under the hood, the helper library uses the APIs from System.Xml.Linq to generate the XML string, which will also do the formatting for you.

Now, you may be wondering, why would you use strings when you can use the Twilio helper library, or maybe you're wondering why you would use the helper library when you can use strings.

The advantages of creating TwiML using strings are that you don't need any dependencies and that it uses less memory and runs faster. However, it is easier to make a mistake in your TwiML and there will be no compilation errors to prevent you from doing that. You also have to XML escape user input yourself to protect yourself from XML injection.

The advantages of using the Twilio helper library are that you are using fully typed objects and methods that provide IntelliSense in IDEs. If you make a typo, your project will not build and you will receive a compiler error telling you where your typo is. However, you do depend on the Twilio helper library and its dependencies, and you have to create a bunch of objects that are ultimately serialized to an XML string which is less memory efficient and slower.

They both have their own advantages and disadvantages, but you can mix and match based on your use case.
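If you do go the string route, one possible safety net is to check in a unit test that your TwiML strings at least parse as XML. The helper below is a hypothetical sketch and not part of any Twilio library. It only checks well-formedness, not whether the tags are valid TwiML.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper: returns true when the text parses as XML.
static bool IsWellFormedXml(string xml)
{
    try
    {
        XDocument.Parse(xml);
        return true;
    }
    catch (XmlException)
    {
        // Thrown for problems such as a missing or mismatched closing tag.
        return false;
    }
}

Console.WriteLine(IsWellFormedXml("<Response><Say>Hi!</Say></Response>")); // True
Console.WriteLine(IsWellFormedXml("<Response><Say>Hi!</Response>"));       // False
```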

Next steps #

You learned how Twilio uses webhooks and TwiML to give you control over how to respond to a call or text message. You then learned how to generate TwiML using different string features in C#, such as Verbatim Strings, String Interpolation, and C# 11's new Raw String Literal feature. You then looked at the difference between generating TwiML using strings and generating TwiML using the Twilio helper library.

Here are a couple more resources to further your learning on Minimal APIs and Twilio:

We can't wait to see what you build. Let us know!