I am trying to fetch data from a website with Google Apps Script and put it directly into a spreadsheet. The fetch does not work, whereas the equivalent Python requests call works just fine.

Python code:

page = requests.get("someurl?as_data_structure", headers={'user-agent':'testagent'})

GAS code:

var page = UrlFetchApp.fetch("someurl?as_data_structure", headers={'user-agent':'testagent'});

The only required header is the user-agent, and the error I get from the GAS code is the one I would get from the Python code if I left the header out. I am new to JavaScript, but as far as I know this is the proper way to do it.

EDIT: I have now put the headers in the right place, but the issue persists with exactly the same error as before.

var options = {"headers": {"User-Agent": "testagent"}};
var page = UrlFetchApp.fetch("someurl?as_data_structure", options);
  • Quote the error? Commented May 12, 2019 at 12:57
  • Error: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>403 Forbidden</title> </head><body> <h1>Forbidden</h1> <p>You don't have permission to access / on this server.<br /> </p> (This is exactly what I get in Python if I don't send the header.) Commented May 12, 2019 at 13:06

1 Answer

Star ★ (top left) the issue here so that Google developers prioritize it.


Google doesn't always document its restrictions (annoying?). One such restriction concerns the user agent, which is fixed to

"User-Agent": "Mozilla/5.0 (compatible; Google-Apps-Script)"

You can't change it.

Sample Test:

function testUrlFetchAppHeaders() {
  var options = {
    headers: {
      'User-Agent':
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    },
  };
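  // getRequest() only builds the request object and sends nothing (fake assurance).
  var fakeRequest = UrlFetchApp.getRequest(
    'https://www.httpbin.org/headers',
    options
  );
  // fetch() really sends the request; httpbin echoes back the headers it received.
  var realResponse = UrlFetchApp.fetch(
    'https://www.httpbin.org/headers',
    options
  );
  // Parse the echoed JSON so both sides are logged as plain objects.
  Logger.log(
    JSON.stringify(
      { fake: fakeRequest, real: JSON.parse(realResponse.getContentText()) },
      null,
      2
    )
  );
}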

Sample Response:

{
  "fake": {
    "headers": {
      "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36"
    },
    "method": "get",
    "payload": "",
    "followRedirects": true,
    "validateHttpsCertificates": true,
    "useIntranet": false,
    "contentType": null,
    "url": "https://www.httpbin.org/headers"
  },
  "real": {
    "headers": {
      "Accept-Encoding": "gzip,deflate,br",
      "Host": "www.httpbin.org",
      "User-Agent": "Mozilla/5.0 (compatible; Google-Apps-Script)"
    }
  }
}

getRequest(url)

Returns the request that would be made if the operation was invoked.

This method does not actually issue the request.

Neither does it accurately return the request that would be made.
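
To see what the target site returns for the fixed user agent without the script throwing, one option is to pass `muteHttpExceptions: true` and inspect the status code yourself. This is a minimal sketch, not a fix: the function name `probeUrl` and the injectable `fetcher` parameter are illustrative assumptions (the parameter exists only so the function can be tried with a stub outside Apps Script).

```javascript
// Hypothetical helper: fetch a URL and report the status instead of throwing on 4xx/5xx.
// `fetcher` defaults to the Apps Script UrlFetchApp service; it is a parameter
// only so the function can be exercised with a stub outside Apps Script.
function probeUrl(url, fetcher) {
  var service = fetcher || UrlFetchApp;
  var response = service.fetch(url, {
    muteHttpExceptions: true, // hand back the 403 response rather than throwing
  });
  var code = response.getResponseCode();
  return {
    status: code,
    ok: code >= 200 && code < 300,
    body: response.getContentText(),
  };
}
```

Inside Apps Script, `Logger.log(probeUrl("someurl?as_data_structure"))` would then log the status and the HTML error body, which makes it easy to tell a user-agent rejection from any other failure.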

8 Comments

Ah, thanks. I wanted to update the spreadsheet with data from this website using a button on the spreadsheet itself. Since GAS has this limitation, is it still possible to achieve what I'm trying to do?
@STUD Depends. Can the website provide the data without this header?
Perhaps, but that would take some convincing. Would it be possible to add the GAS User-Agent to the website alongside the existing one? If that's possible, it would probably be accepted.
@STUD If you have admin access to the website, you probably can. If not, and if this header is required, then you'd need to host your own server: make requests to your server, and let your server issue the same request as a new request with a new user agent.
I don't think I would go so far as to host my own server for it; it's a small project. I know the admin though, so I might be able to convince him. Thanks for all the help :)
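
For reference, the relay idea from the comments could look roughly like the following in Node.js. This is only a sketch under stated assumptions: `TARGET_URL` is a placeholder for the real site, `USER_AGENT` reuses the `testagent` value from the question, and the helper names `buildUpstreamOptions` and `createRelay` are made up for illustration. The target is hard-coded so the relay cannot be used as an open proxy.

```javascript
const http = require('http');
const https = require('https');

// Placeholders (assumptions): substitute the real site and the agent it expects.
const TARGET_URL = 'https://example.com/data?as_data_structure';
const USER_AGENT = 'testagent';

// Build the options for the outgoing request, setting the user agent
// that Apps Script itself is not able to send.
function buildUpstreamOptions(targetUrl, userAgent) {
  const u = new URL(targetUrl);
  return {
    protocol: u.protocol,
    hostname: u.hostname,
    port: u.port || undefined, // fall back to the protocol default
    path: u.pathname + u.search,
    method: 'GET',
    headers: { 'User-Agent': userAgent },
  };
}

// Create (but do not start) a server that answers every incoming request
// by fetching the fixed target and streaming the response back.
function createRelay(targetUrl, userAgent) {
  return http.createServer((req, res) => {
    const opts = buildUpstreamOptions(targetUrl, userAgent);
    const client = opts.protocol === 'https:' ? https : http;
    const upstream = client.request(opts, (upRes) => {
      res.writeHead(upRes.statusCode, {
        'Content-Type': upRes.headers['content-type'] || 'text/plain',
      });
      upRes.pipe(res);
    });
    upstream.on('error', (err) => {
      res.writeHead(502);
      res.end('Upstream error: ' + err.message);
    });
    upstream.end();
  });
}
```

Starting it would be `createRelay(TARGET_URL, USER_AGENT).listen(8080)`, after which the Apps Script side calls `UrlFetchApp.fetch()` on the relay's address instead of the original site. Because Apps Script runs on Google's servers, the relay has to be reachable from the public internet.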