I have a C++ DLL I want to call into from Python. I have no control over the C++ DLL nor do I have its source code or headers.

The mangled functions are of the form:

2980  BA3 005A3060 ?getFoo@FooLib@@YAAEAVFoo@1@XZ
2638  A4D 005A3020 ?getApplicationData@Foo@FooLib@@QEAAAEAVApplicationData@2@XZ
2639  A4E 005A3030 ?getApplicationData@Foo@FooLib@@QEBAAEBVApplicationData@2@XZ
2738  AB1 000F8A30 ?getDataRootPath@ApplicationData@FooLib@@QEBA?AV?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@@std@@XZ

With the aid of Copilot, I was able to translate these to (crossing fingers that this is right):

Foo __cdecl FooLib::getFoo()
ApplicationData& __thiscall FooLib::Foo::getApplicationData()
const ApplicationData& __thiscall FooLib::Foo::getApplicationData() const
std::wstring __thiscall FooLib::ApplicationData::getDataRootPath() const

From Python, with the ctypes docs, I was able to assemble this:

from ctypes import *
dll = WinDLL(r"c:\path\to\Foo.dll")

getFoo = getattr(dll, "?getFoo@FooLib@@YAAEAVFoo@1@XZ")
getFoo.argtypes = []
getFoo.restype = c_void_p

getAppData = getattr(dll, "?getApplicationData@Foo@FooLib@@QEAAAEAVApplicationData@2@XZ")
getAppData.argtypes = [c_void_p]
getAppData.restype = c_void_p

getAppData(getFoo()) # returns a pointer reliably, same value each time

Note that as far as Python ctypes is concerned, the two DLL entries for getApplicationData produce the same values.

However, the final function does not work, crashing Python or throwing an Access Violation, likely because it returns a C++ string type, and ctypes cannot handle that. Best recommendations I've seen have been to create a shim DLL that can call the C++ function and convert it to a C string, which Python ctypes can handle.

On the shim side, I can load the function pointer addresses, but two parts seem to be a problem for the getApplicationData and getDataRootPath functions, which are member functions of the Foo class and ApplicationData classes, respectively. Solutions to this kind of problem on the internet seem to be sparse, or at minimum not fully explained. I'm sure there are also subtleties about addresses and pointers in C++ that are getting in my way as well.

So: How can I solve this issue?

  • More likely crashing because the __thiscall calling convention isn't supported by ctypes ("C" types). Commented Sep 23 at 17:38
  • The getApplicationData call (seems to) work though, I just pass the pointer from getFoo() as the first argument. Commented Sep 23 at 17:42
  • That doesn't mean it's a correct pointer. In any case, using extern "C" wrappers is the way to go. Commented Sep 23 at 17:48
  • What makes you even go this way? I can only agree with Nostradamus here: "trouble ahead". You don't have a documented API to use, and the name mangling is compiler dependent. I really hope this isn't going to end up in a product. Commented Sep 23 at 18:30
  • Down below you say "I've gotten the static functions in the library to call correctly, but not the member functions." That is even more tricky: you will need instances of those classes first, which means calling their constructors. Commented Sep 23 at 18:35

3 Answers

2

For completeness, here is the shim that resulted from @Quuxplusone's answers. Caveat that the final function call into FooLib is throwing an exception, but it seems to be a separate issue.

Key elements:

  • An extern "C" function should not return C++ types such as std::wstring (MSVC warns about this), so the C++ work goes in a separate function that returns a C type.

  • Dummy namespace, class, and function definitions matching the mangled functions being looked up.

  • using statements to declare the pointer type locally.

  • Pass the class instance as the first argument to the member function.

#include <string>
#include <windows.h>
#include <cstdio>

// Forward declarations for the Foo library classes
namespace FooLib {
    class ApplicationData {
        public:
            std::wstring getDataRootPath() const;
    };

    class Foo {
        public:
            ApplicationData& getApplicationData();
    };

    Foo& getFoo();
}

// Static variable to store the library path
static HMODULE g_hFooLib = NULL;

wchar_t* CopyWideString(const std::wstring& wstr) {
    if (wstr.empty()) {
        wchar_t* result = new wchar_t[1];
        result[0] = L'\0';
        return result;
    }

    size_t len = wstr.length() + 1; // +1 for null terminator
    wchar_t* result = new wchar_t[len];
    wcscpy_s(result, len, wstr.c_str());
    return result;
}

// C++ types need to be kept away from C interface
wchar_t* GetDataRootPathCPP() {
    if (!g_hFooLib) {
        fwprintf(stdout, L"GetDataRootPathCPP: library not loaded, call LoadFooLib(path) first.\n");
        return nullptr;
    }
    using FooLib_GetFoo = FooLib::Foo&();
    auto *getFoo = (FooLib_GetFoo*)GetProcAddress(g_hFooLib, "?getFoo@FooLib@@YAAEAVFoo@1@XZ");
    if (!getFoo) {
        fwprintf(stdout, L"GetDataRootPathCPP: GetProcAddress for getFoo failed\n");
        return nullptr;
    }

    using FooLib_GetApplicationData = FooLib::ApplicationData&(FooLib::Foo&);
    auto *getApplicationData = (FooLib_GetApplicationData*)GetProcAddress(g_hFooLib, "?getApplicationData@Foo@FooLib@@QEAAAEAVApplicationData@2@XZ");
    if (!getApplicationData) {
        fwprintf(stdout, L"GetDataRootPathCPP: GetProcAddress for getApplicationData failed\n");
        return nullptr;
    }
    
    using FooLib_GetDataRootPath = std::wstring(FooLib::ApplicationData&);
    auto *getDataRootPath = (FooLib_GetDataRootPath*)GetProcAddress(g_hFooLib, "?getDataRootPath@ApplicationData@FooLib@@QEBA?AV?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@@std@@XZ");
    if (!getDataRootPath) {
        fwprintf(stdout, L"GetDataRootPathCPP: GetProcAddress for getDataRootPath failed\n");
        return nullptr;
    }

    try {
        FooLib::Foo& Foo = getFoo();
        fwprintf(stdout, L"getFoo succeeded.\n");
        FooLib::ApplicationData& appData = getApplicationData(Foo);
        fwprintf(stdout, L"getApplicationData succeeded.\n");
        std::wstring ws = getDataRootPath(appData);
        fwprintf(stdout, L"getDataRootPath succeeded.\n");
        auto path = CopyWideString(ws);
        fwprintf(stdout, L"CopyWideString succeeded.\n");
        return path;
    }
    catch (...) {
        fwprintf(stdout, L"GetDataRootPathCPP: exception caught while obtaining data root path\n");
        return nullptr;
    }
}

//// External interface

// Function to set the library path
extern "C" __declspec(dllexport) bool LoadFooLib(const wchar_t* path) {
    // If we already have a library loaded, free it
    if (g_hFooLib) {
        fwprintf(stdout, L"LoadFooLib: freeing previously loaded library\n");
        FreeLibrary(g_hFooLib);
        g_hFooLib = NULL;
    }

    // Try to load the library
    g_hFooLib = LoadLibraryW(path);

    if (g_hFooLib == NULL) {
        DWORD error = GetLastError();
        LPWSTR msgBuf = NULL;
        DWORD fmt_flags = FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS;
        DWORD len = FormatMessageW(fmt_flags, NULL, error, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), (LPWSTR)&msgBuf, 0, NULL);

        // Trim trailing newlines/spaces commonly appended by FormatMessage
        if (msgBuf && len > 0) {
            // remove CR/LF and trailing spaces
            while (len > 0 && (msgBuf[len-1] == L'\n' || msgBuf[len-1] == L'\r' || msgBuf[len-1] == L' ' || msgBuf[len-1] == L'\t')) {
                msgBuf[len-1] = L'\0';
                --len;
            }
        }

        if (msgBuf && len > 0) {
            fwprintf(stdout, L"LoadFooLib: Failed to load library '%s' (Win32 error %lu): %s\n", path ? path : L"(null)", error, msgBuf);
            LocalFree(msgBuf);
        } else {
            fwprintf(stdout, L"LoadFooLib: Failed to load library '%s' (Win32 error %lu). No additional message available.\n", path ? path : L"(null)", error);
        }
        return false;
    }
    fwprintf(stdout, L"LoadFooLib: successfully loaded '%s'\n", path ? path : L"(null)");
    return true;
}

extern "C" __declspec(dllexport) wchar_t* GetDataRootPath() {
    return GetDataRootPathCPP();
}

extern "C" __declspec(dllexport) void FreeString(wchar_t* str) {
    delete[] str;
}
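
For reference, a minimal sketch of how the Python side might drive this shim with ctypes. The shim file name and both paths in the usage comment are placeholders, and read_data_root_path is a hypothetical helper name; only the three exported functions (LoadFooLib, GetDataRootPath, FreeString) come from the shim above.

```python
import ctypes


def read_data_root_path(shim, foo_dll_path):
    """Load the target DLL through the shim and return the data root path as a str."""
    # Declare signatures explicitly; the default int restype would truncate 64-bit pointers.
    shim.LoadFooLib.argtypes = [ctypes.c_wchar_p]
    shim.LoadFooLib.restype = ctypes.c_bool
    shim.GetDataRootPath.argtypes = []
    # c_void_p (not c_wchar_p) keeps the raw pointer so it can be handed back to FreeString.
    shim.GetDataRootPath.restype = ctypes.c_void_p
    shim.FreeString.argtypes = [ctypes.c_void_p]
    shim.FreeString.restype = None

    if not shim.LoadFooLib(foo_dll_path):
        raise OSError(f"shim could not load {foo_dll_path!r}")

    ptr = shim.GetDataRootPath()
    if not ptr:  # the shim returns nullptr on failure
        raise RuntimeError("GetDataRootPath returned NULL")
    try:
        # Copy the NUL-terminated wchar_t string into a Python str.
        return ctypes.wstring_at(ptr)
    finally:
        # Release the buffer with the shim's own delete[].
        shim.FreeString(ptr)


# Usage on Windows (placeholder paths):
#   shim = ctypes.CDLL(r"c:\path\to\FooShim.dll")
#   print(read_data_root_path(shim, r"c:\path\to\Foo.dll"))
```

Returning the pointer as c_void_p matters here: with restype set to c_wchar_p, ctypes converts the result to a str immediately and the original address is no longer available for FreeString.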

1 Comment

LGTM, except that you don't need the if (wstr.empty()) branch; it's doing exactly the same thing (new[] and wcscpy) as the other branch. Stylistically, recommend wstr.size() over wstr.length().
1

That's what I'm stumbling on, getting the function pointers to the DLL code and calling those functions [...] I've gotten the static functions in the library to call correctly, but not the member functions.

Fortunately, the MSVC calling convention for x86-64 is such that calling a member function works exactly like calling a non-member function with a hidden first parameter corresponding to the this pointer. So, given:

struct S {
  int f(int arg);
};
int f(S *self, int arg);

we find that ps->f(42) and f(ps, 42) give the same codegen.

So here's another attempt at an answer (Godbolt). Same as before, we set up some "dummy" Foo and ApplicationData classes. But this time we don't even need them for any physical reason; I just kept them for pedagogical purposes.

namespace {
  class ApplicationData {
  public:
    std::wstring getDataRootPath() const;
  };

  class Foo {
  public:
    ApplicationData& getApplicationData();
    const ApplicationData& getApplicationData() const;
  };
  Foo& getFoo();
}
char *createMyThing() {
  using GetFoo = Foo&();
  auto *getFoo = (GetFoo*)GetProcAddress("?getFoo@FooLib@@YAAEAVFoo@1@XZ");
  using GetApplicationData = ApplicationData&(Foo&);
  auto *getApplicationData = (GetApplicationData*)GetProcAddress("?getApplicationData@Foo@FooLib@@QEAAAEAVApplicationData@2@XZ");
  using GetDataRootPath = std::wstring(ApplicationData&);
  auto *getDataRootPath = (GetDataRootPath*)GetProcAddress("?getDataRootPath@ApplicationData@FooLib@@QEBA?AV?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@@std@@XZ");

  std::wstring ws = getDataRootPath(getApplicationData(getFoo()));
  char *p = (char*)std::malloc(ws.size() + 1);
  for (size_t i = 0; i <= ws.size(); ++i) {
    p[i] = ws[i];  // assuming there's no actually non-ASCII characters in there
  }
  return p;
}
void freeMyThing(char *p) {
  std::free(p);
}

I say "we don't need them" because we could just replace all our pointers and references with void* (or anything else that's passed and returned in the same way as a pointer or reference). We could do this:

  using GetFoo = void*();
  auto *getFoo = (GetFoo*)GetProcAddress("?getFoo@FooLib@@YAAEAVFoo@1@XZ");
  using GetApplicationData = void*(void*);
  auto *getApplicationData = (GetApplicationData*)GetProcAddress("?getApplicationData@Foo@FooLib@@QEAAAEAVApplicationData@2@XZ");
  using GetDataRootPath = std::wstring(void*);
  auto *getDataRootPath = (GetDataRootPath*)GetProcAddress("?getDataRootPath@ApplicationData@FooLib@@QEBA?AV?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@@std@@XZ");

Buyer beware: I'm still very hazy on the actual "DLL" parts of this — what an actual call to GetProcAddress would look like, and whether it's a problem that both your Python thing and this shim DLL are trying to load the original DLL — stuff like that.
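
One possible cause of a crash on the last call, stated as my understanding of the MSVC x64 convention and not something confirmed in this thread: when a non-static member function returns a class type by value, the caller passes this first and a pointer to the return slot second, whereas a free function declared as std::wstring(void*) expects the return slot first. The sketch below calls such a getter from ctypes with that argument order and decodes the result. WSTRING_SIZE, the field layout (a 16-byte inline buffer or heap pointer, then size, then capacity) and both helper names are assumptions about release builds of Microsoft's STL, so dump the bytes and check them before relying on it. The std::wstring destructor never runs here, so a heap-allocated string is leaked.

```python
import ctypes
import struct

WSTRING_SIZE = 32    # assumed sizeof(std::wstring), MSVC x64 release build
INLINE_CAPACITY = 7  # assumed small-string capacity in wchar_t units


def call_wstring_getter(func, this_ptr):
    """Call func(this, return_slot) and return the raw bytes of the slot."""
    slot = ctypes.create_string_buffer(WSTRING_SIZE)
    # Explicit c_void_p wrappers keep the 64-bit values intact even if argtypes is unset.
    func(ctypes.c_void_p(this_ptr), ctypes.c_void_p(ctypes.addressof(slot)))
    return slot.raw


def decode_wstring(raw):
    """Decode the assumed layout: 16-byte union, then 8-byte size and 8-byte capacity."""
    size, capacity = struct.unpack_from("<QQ", raw, 16)
    if capacity > INLINE_CAPACITY:
        # Long string: the first 8 bytes hold a pointer to the characters.
        (heap_ptr,) = struct.unpack_from("<Q", raw, 0)
        data = ctypes.string_at(heap_ptr, size * 2)
    else:
        # Short string: the characters are stored inline in the first 16 bytes.
        data = raw[: size * 2]
    return data.decode("utf-16-le")


# Usage on Windows, with getDataRootPath and app_data obtained as in the question:
#   raw = call_wstring_getter(getDataRootPath, app_data)
#   print(raw.hex(), decode_wstring(raw))
```

Printing raw.hex() first is the cheap sanity check: if the size and capacity fields look implausible, the layout assumption does not hold for that DLL and the decode step should not be trusted.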


And/or... You already know how to get a "function pointer" on the Python side:

getDataRootPath = getattr(dll, "?getDataRootPath@ApplicationData@FooLib@@QEBA?AV?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@@std@@XZ")

Can you figure out how to convert that Python object into a void* or void(*)() suitable for passing into your shim C function? Like, can you write the C++ side simply as...?

char *createMyThing(
    void *theApplicationData_fromPython,
    void *getDataRootPath_fromPython
) {
  using GetDataRootPath = std::wstring(void*);
  auto *getDataRootPath = (GetDataRootPath*)getDataRootPath_fromPython;

  std::wstring ws = getDataRootPath(theApplicationData_fromPython);
  char *p = (char*)std::malloc(ws.size() + 1);
  for (size_t i = 0; i <= ws.size(); ++i) {
    p[i] = ws[i];  // assuming there's no actually non-ASCII characters in there
  }
  return p;
}
void freeMyThing(char *p) {
  std::free(p);
}
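
On the Python side, a ctypes function object can be turned into a plain address with ctypes.cast. A minimal sketch, using ctypes.pythonapi.Py_IsInitialized purely as a stand-in for a function loaded from the real DLL; function_address is a hypothetical helper name.

```python
import ctypes


def function_address(func):
    """Return the entry-point address of a ctypes function object as an int."""
    return ctypes.cast(func, ctypes.c_void_p).value


# Stand-in for a function fetched with getattr(dll, "?mangled@name...").
stand_in = ctypes.pythonapi.Py_IsInitialized
address = function_address(stand_in)

# The integer can be passed to a shim parameter declared as c_void_p,
# or rebuilt into a callable from a prototype to confirm it round-trips.
rebuilt = ctypes.CFUNCTYPE(ctypes.c_int)(address)
print(hex(address), rebuilt())
```

With the real DLL, the same address would go into the second argument of createMyThing, with that parameter's argtypes entry set to c_void_p.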

6 Comments

First the good news - this solution did allow me to call a member function! Now the bad news - the compiler is warning that std::wstring is not valid in C-linkage functions. Now, warnings aren't errors. But when I run it, calling that function results in a crash. It's the same place Python crashes via ctypes, so at least I reached parity. Now I'm left wondering whether the issue is related to the wstring and C linkage; or the other possibility is that some internal state of the library is wrong, and I'd have to figure that out first.
@NickBauer: Hm. Are you marking any of these functions as extern "C"? If so, stop doing that. I can confirm that marking f as extern "C" here seems to make it give wrong codegen. So the fix is don't do that, then.
I'm not, but I'm wondering if it's taking up the state of the enclosing function?
Probably need to post the complete code for your shim DLL, then (directly in your question if it's short, or via a Godbolt link if not). Also: The original DLL is giving you a bunch of bytes it claims is a wstring. What does that bunch of bytes look like in hex? (Define your own struct WString { alignas(8) char bytes_[sizeof(std::wstring)]; ~WString() {} }; and print out the bytes.) This could show that (contrary to my expectation) the DLL was built with a different idea of what wstring is than what's in your current platform's Microsoft STL.
It seems I just needed to move the C++ code into a separate function that returns a C string, and just have a thin extern "C" wrapper around the call to the C++ function. Don't know if it works yet, but compiler stops complaining.
I'm going to mark this as the answer, even though it's partial, but I'll put my full shim in another answer.
0

Part of the issue here (which you might not have realized, coming from Python) is that in C++ there's a difference between std::string and std::wstring. The latter is a string of wchar_t "wide characters," which are popular on Windows but not really anywhere else — they're 16-bit UTF-16. So your function isn't even giving you a string of char suitable for Python!
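
For what it's worth, ctypes can read a wchar_t buffer directly, so a shim may hand back wide characters instead of narrowing them to char. A small sketch, with a local buffer standing in for memory returned by a DLL:

```python
import ctypes

# Local buffer standing in for a wchar_t* returned from native code.
native = ctypes.create_unicode_buffer("C:\\Data\\Ünïcode")
address = ctypes.addressof(native)

# wstring_at copies the NUL-terminated wide string at the address into a str.
text = ctypes.wstring_at(address)

# sizeof(c_wchar) reports the platform's wchar_t width (2 bytes on Windows).
print(text, ctypes.sizeof(ctypes.c_wchar))
```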

Writing a shim DLL is definitely the path of least resistance. It could be something like this [completely untested, of course]:

#include <cstdlib>
#include <string>

namespace FooLib {
  class ApplicationData {
  public:
    std::wstring getDataRootPath() const;
  };

  class Foo {
  public:
    ApplicationData& getApplicationData();
    const ApplicationData& getApplicationData() const;
  };
  Foo& getFoo();
}
char *createMyThing() {
  std::wstring ws = FooLib::getFoo().getApplicationData().getDataRootPath();
  char *p = (char*)std::malloc(ws.size() + 1);
  for (size_t i = 0; i <= ws.size(); ++i) {
    p[i] = ws[i];  // assuming there's no actually non-ASCII characters in there
  }
  return p;
}
void freeMyThing(char *p) {
  std::free(p);
}

We see that this code compiled with MSVC calls:

call ?getFoo@FooLib@@YAAEAVFoo@1@XZ
call ?getApplicationData@Foo@FooLib@@QEAAAEAVApplicationData@2@XZ
call ?getDataRootPath@ApplicationData@FooLib@@QEBA?AV?$basic_string@_WU?$char_traits@_W@std@@V?$allocator@_W@2@@std@@XZ

which at a glance looks like the right symbol names for what you said.


Be aware that this assumes your shim DLL will be built with the same version of std::wstring as the original DLL was. Microsoft's ABI is not officially stable (AFAIK), although I wouldn't expect them to have changed anything so fundamental as basic_string either.

Btw, Copilot got the first one wrong: it's not Foo FooLib::getFoo() but rather Foo& FooLib::getFoo(). Which is lucky, because it makes the shim a wee bit easier to write, compared to if we had to deal with a Foo returned by value. :)

6 Comments

Thanks! Though, Python can work with wchar_t strings just fine, which is one less headache.
But wait, the root issue is getting this to work with the loaded, existing library. That's what I'm stumbling on, getting the function pointers to the DLL code and calling those functions.
[deleted]
[deleted]
"cast appropriately" is possibly stickiest part. :-) I've gotten the static functions in the library to call correctly, but not the member functions.
Yeah, I'm trying to make it work by creating a DLL with similar signatures, and ran into the same issue. I deleted my comment before realizing you replied. I did get it to work by linking the target DLL to the wrapper DLL, but you still need to create a header file that mimics the API calls for it to link. Also if you do that, you don't need to dynamically load the static functions.
Hmm, wonder if I could convince the manufacturer to share the header...
@NickBauer: Oh, I see now what you mean. You're using the-equivalent-of-dlsym (is that GetProcAddress?), but that gives you a 64-bit pointer to some code (the moral equivalent of a function pointer), whereas what you need is a member function pointer. You would be basically out of luck if this were x86-32, but since you're on x86-64 you can just treat s->f(args...) as if it were f(s, args...). I'll post another attempted answer.
