Moving from AnsiString to UnicodeString in Delphi

If your organization is running a Delphi application developed in version 2007 or older, you are sitting on a «pre-Unicode» codebase. With Delphi 2009, Embarcadero introduced a fundamental change that redefined the very fabric of the Delphi language: the switch from AnsiString to UnicodeString as the default string type.

While this change was necessary to support global character sets and modern Windows APIs, it remains the single biggest technical hurdle in legacy migration.

Why the Shift is More Than «Just a Recompile»

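Before Delphi 2009, string was an alias for AnsiString: a sequence of bytes in which, for single-byte code pages, one character equaled one byte. From Delphi 2009 onward, string is an alias for UnicodeString, which uses UTF-16 encoding. Each Char is two bytes, and characters outside the Basic Multilingual Plane occupy two Chars (a surrogate pair). WideString is a separate, COM-oriented type and is not the default string type.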
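The difference is easy to see in a small console program. The following is a minimal sketch, assuming Delphi XE2 or later (for the System.SysUtils unit scope name); the program and variable names are illustrative only.

```pascal
program StringSizes;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  U: string;      // UnicodeString in Delphi 2009 and later
  A: AnsiString;  // the pre-2009 default, one byte per element
  Smiley: string;
begin
  U := 'Delphi';
  A := 'Delphi';

  Writeln(SizeOf(Char));      // 2: Char is now a UTF-16 code unit
  Writeln(SizeOf(AnsiChar));  // 1
  Writeln(Length(U));         // 6 elements
  Writeln(ByteLength(U));     // 12 bytes of character data
  Writeln(Length(A));         // 6 elements, also 6 bytes

  // U+1F600 lies outside the Basic Multilingual Plane, so it is stored
  // as a surrogate pair: one visible character, two Char elements.
  Smiley := #$D83D#$DE00;
  Writeln(Length(Smiley));    // 2
end.
```

Length counts Char elements in both cases; only the size of each element has changed.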

This shift impacts your code in three critical areas:

1. Pointer Arithmetic and Buffer Sizes

If your code derives buffer sizes from string length (e.g., Length(MyString)), it may now be off by a factor of two: Length returns the number of Char elements, not the number of bytes. Logic that assumes SizeOf(Char) = 1 can cause memory corruption or buffer overflows because, in modern Delphi, SizeOf(Char) is 2. Byte counts should be computed as Length(MyString) * SizeOf(Char).
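A typical case is a Move() call that copies string data into a raw buffer. The sketch below shows one way to size such a copy correctly; PayloadBytes and CopyToBytes are hypothetical helper names introduced for illustration, not part of the RTL.

```pascal
program BufferSizeDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Bytes needed for the character data of S (no null terminator).
function PayloadBytes(const S: string): Integer;
begin
  Result := Length(S) * SizeOf(Char);
end;

// Copies the raw UTF-16 data of S into a byte array.
// The pre-Unicode habit was Move(S[1], Buf, Length(S)), which now
// copies only half of the data.
function CopyToBytes(const S: string): TBytes;
begin
  SetLength(Result, PayloadBytes(S));
  if S <> '' then
    Move(Pointer(S)^, Result[0], Length(Result));
end;

begin
  Writeln(PayloadBytes('Hello'));         // 10
  Writeln(Length(CopyToBytes('Hello')));  // 10
end.
```

Writing the size as Length(S) * SizeOf(Char) keeps the same expression correct for both AnsiString-era and Unicode-era compilers.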

2. Windows API Calls

Legacy applications often call Windows API functions (like SendMessage or GetWindowText). In Delphi 2007 and earlier, the unsuffixed names resolved to the «A» (Ansi) versions of these functions; since Delphi 2009 they resolve to the «W» (Wide/Unicode) variants. Code that calls the «A» versions explicitly, or casts its arguments to PAnsiChar, should be moved to the «W» variants, or the data must be converted explicitly at the call boundary.
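As an illustration, here is a minimal sketch of a Unicode-safe GetWindowText wrapper, assuming a Windows target and Delphi XE2 or later. GetTitle is a hypothetical helper name; the key points are that the buffer is a native string passed as PChar and that the size argument is a character count, not a byte count.

```pascal
program WindowTitleDemo;
{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils;

// Returns the caption of Wnd, or an empty string if there is none.
function GetTitle(Wnd: HWND): string;
var
  Len: Integer;
begin
  Result := '';
  Len := GetWindowTextLength(Wnd);
  if Len > 0 then
  begin
    SetLength(Result, Len);
    // In Delphi 2009+ this resolves to GetWindowTextW. nMaxCount is in
    // characters and includes the null terminator that every Delphi
    // string carries after its last element.
    Len := GetWindowText(Wnd, PChar(Result), Len + 1);
    SetLength(Result, Len);
  end;
end;

begin
  Writeln(GetTitle(GetForegroundWindow));
end.
```

Legacy code in this spot often declares an array[0..255] of AnsiChar and casts it to PChar, which compiles only with a hard cast and then hands a half-sized buffer to the «W» function.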

3. Data Integrity and External Integrations

When your application communicates with external DLLs, file streams, or older databases, the other side expects a specific byte-stream format. If you send a 2-byte-per-character Unicode string to a system expecting a 1-byte Ansi string, you won’t just get garbled characters («mojibake»); you risk crashing the integration or corrupting the database entirely.
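One common way to handle such boundaries is to convert to an explicit encoding at the point where data leaves the application, instead of relying on implicit string conversions. The following is a minimal sketch using TEncoding from System.SysUtils; ToLegacyBytes is a hypothetical helper, and code page 1252 is only an assumed example of what a legacy consumer might expect.

```pascal
program EncodingBoundaryDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Encodes S in the given code page for a consumer that expects
// one byte per character.
function ToLegacyBytes(const S: string; CodePage: Integer): TBytes;
var
  Enc: TEncoding;
begin
  // GetEncoding returns a new instance that the caller must free.
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetBytes(S);
  finally
    Enc.Free;
  end;
end;

var
  Cafe: string;
begin
  Cafe := 'Caf'#$00E9;  // "Café", written with an escape so the
                        // source file encoding does not matter

  Writeln(ByteLength(Cafe));                       // 8: raw UTF-16 data
  Writeln(Length(ToLegacyBytes(Cafe, 1252)));      // 4: Windows-1252
  Writeln(Length(TEncoding.UTF8.GetBytes(Cafe)));  // 5: UTF-8
end.
```

The same four characters produce three different byte streams, which is why the encoding has to be chosen deliberately for each integration.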

The Danger of Manual Conversion

The risk of a manual conversion is «silent failure.» Your application might compile successfully, but it could fail at runtime when a user enters a special character or when a specific file is processed. Auditing millions of lines of code for SizeOf errors or incorrect pointer math is a monumental task that often takes months and carries a high risk of human error.
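A silent failure of this kind can be made visible with a round-trip check. The sketch below is one possible approach, not a complete audit technique: it assumes a Windows target, uses code page 1252 as an example legacy code page, and the names Win1252String and SurvivesRoundTrip are hypothetical.

```pascal
program RoundTripDemo;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // AnsiString bound to a fixed code page, so the result does not
  // depend on the machine's regional settings.
  Win1252String = type AnsiString(1252);

// True if S can be stored as Windows-1252 and read back unchanged.
function SurvivesRoundTrip(const S: string): Boolean;
begin
  Result := string(Win1252String(S)) = S;
end;

begin
  // Plain ASCII is representable in code page 1252.
  Writeln(SurvivesRoundTrip('Invoice 42'));
  // Greek capital omega (U+03A9) has no slot in code page 1252, so
  // the conversion substitutes another character and data is lost.
  Writeln(SurvivesRoundTrip(#$03A9));
end.
```

Both conversions compile and run without raising an exception; only the comparison reveals that the second one changed the data.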

The GDK Approach: Precision Through Automation

At GDK Software, we have refined a methodology to navigate the Unicode minefield without the manual headache. We don’t just «find and replace»; we use a data-driven approach to ensure structural integrity.

Our GDK Duster tool is specifically programmed to recognize patterns that indicate Unicode risks. Duster scans the codebase for risky patterns, such as pointer arithmetic on strings, Move() operations, and legacy API calls. Instead of a developer having to manually investigate every string usage, Duster applies proven solutions from our extensive database to refactor the code to be Unicode-compliant.

As recently announced, this automated approach is now available for C++Builder projects as well. We use the C++ compiler to catch errors and apply our «solution database» to bridge the gap between Borland C++ 4/5/6 and the modern Embarcadero environment.

Secure Your Data, Modernize Your Code

The transition from Ansi to Unicode is the most «dangerous» part of a Delphi upgrade, but it is also the most rewarding. It opens the door to modern web services, internationalization, and 64-bit performance.

Don’t let the fear of corrupted data hold your software back. Let GDK Software provide the roadmap and the tools to make your transition seamless.

Ready to modernize?

Explore our Delphi Upgrade Services or see how we upgrade legacy Borland C++ applications.