tmdbdata
TMDB data functions with httr2::functions()
This package uses httr2 functions to generate and perform requests,
then extract results from the response body.
Create Basic Request
The base_tmdb_request() function is built from settings described in
the API documentation.
For example:
- httr2::req_headers(): info in the Response Format and Authentication Documentation.
- httr2::req_throttle(): info in the Rate Limits Documentation.
base_tmdb_request
#> function (auth_token = get_auth_token())
#> {
#> httr2::request("https://api.themoviedb.org/3/") %>% httr2::req_headers(accept = "application/json",
#> Authorization = paste("Bearer", auth_token), .redact = "Authorization") %>%
#> httr2::req_error(body = tmdb_error_body) %>% httr2::req_user_agent("tmdbdata (https://github.com/novakowd/tmdbdata)") %>%
#> httr2::req_throttle(rate = 40, realm = "https://api.themoviedb.org/3/")
#> }
#> <bytecode: 0x55b7d9ed8c48>
#> <environment: namespace:tmdbdata>
📝 The default, auth_token = get_auth_token(), will only work after the environment variable has been set up as described in Authentication.
The Authorization header info is also redacted in the
following objects.
Running the above base_tmdb_request() function generates
a basic <httr2_request> object:
- Normally auth_token should not need to be specified if the Authentication steps were followed.
base_request <- base_tmdb_request(auth_token = decrypt_auth_token())
base_request
#> <httr2_request>
#> GET https://api.themoviedb.org/3/
#> Headers:
#> • accept : "application/json"
#> • Authorization: <REDACTED>
#> Body: empty
#> Options:
#> • useragent: "tmdbdata (https://github.com/novakowd/tmdbdata)"
#> Policies:
#> • error_body : <function>
#> • throttle_realm: "https://api.themoviedb.org/3/"
The functions req_headers(), req_error(), req_user_agent(), and
req_throttle() do NOT change the URL https://api.themoviedb.org/3/,
but they DO add other elements to the request such as Headers,
Options, and Policies.
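As a small offline check of the point above, the sketch below builds a throwaway request and confirms that adding headers and a user agent leaves the URL untouched. The token and the user-agent string are placeholders made up for illustration, nothing is sent to the server, and reading `req$url` and `req$options` relies on the request being a plain list, which is an assumption about httr2's internal structure rather than a documented interface.

```r
# Start from the same base URL used by base_tmdb_request()
req <- httr2::request("https://api.themoviedb.org/3/")
url_before <- req$url

# Placeholder credentials and user agent (illustrative only, never sent)
req <- httr2::req_headers(
  req,
  accept = "application/json",
  Authorization = "Bearer PLACEHOLDER_TOKEN"
)
req <- httr2::req_user_agent(req, "demo-agent (placeholder)")

# The URL is unchanged; only headers and options were added
url_unchanged <- identical(req$url, url_before)
url_unchanged
req$options$useragent
```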
Append Request Details
The next step is to append a path to the base URL and add relevant arguments to help with the request. Find appropriate arguments by consulting the API Documentation.
The below example looks at the Search>Movie Documentation which states:
- there is a required parameter called query, and
- some other optional parameters, some of which have default values.
Let’s try putting together a request to
search the
movies for
Avengers:
base_request %>%
req_url_path_append("search","movie") %>%
req_url_query(query = "Avengers")
#> <httr2_request>
#> GET https://api.themoviedb.org/3/search/movie?query=Avengers
#> Headers:
#> • accept : "application/json"
#> • Authorization: <REDACTED>
#> Body: empty
#> Options:
#> • useragent: "tmdbdata (https://github.com/novakowd/tmdbdata)"
#> Policies:
#> • error_body : <function>
#> • throttle_realm: "https://api.themoviedb.org/3/"
The functions req_url_path_append() and req_url_query() modified the URL to https://api.themoviedb.org/3/search/movie?query=Avengers
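Optional parameters are added the same way as the required one. The sketch below is built offline (no request is performed) and assumes `language` is a valid optional parameter for this endpoint, so confirm it against the Search>Movie Documentation before relying on it. The `page` parameter is the one used later in this article.

```r
# Build the request step by step without sending it
req <- httr2::request("https://api.themoviedb.org/3/")
req <- httr2::req_url_path_append(req, "search", "movie")

# `query` is required; `language` and `page` are optional (assumed names)
req <- httr2::req_url_query(
  req,
  query = "Avengers",
  language = "en-US",
  page = 1
)

# Inspect the resulting URL
req$url
```

Each named argument to req_url_query() becomes one name=value pair in the query string, so the request can be checked by eye before it is performed.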
Perform Request
Now that we have built a request,
httr2::req_perform() allows us to get a
response.
- httr2::req_perform() submits the request to the server.
- Assigning the result to a variable stores the response in memory, so the same request does not have to be re-sent, which helps stay within the API's [Rate Limits](https://developer.themoviedb.org/docs/rate-limiting).
response <- base_request %>%
req_url_path_append("search", "movie") %>%
req_url_query(query = "Avengers") %>%
req_perform()
response
#> <httr2_response>
#> GET https://api.themoviedb.org/3/search/movie?query=Avengers
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (11773 bytes)
Response Structure
Printing the <httr2_response> object above shows only a short
summary; more detail is available on closer inspection:
class(response)
#> [1] "httr2_response"
data.frame(class = unlist(lapply(response,class)),
length = unlist(lapply(response,length)))
#> class length
#> method character 1
#> url character 1
#> status_code integer 1
#> headers httr2_headers 17
#> body raw 11773
#> request httr2_request 8
#> cache environment 0
names(response$headers)
#> [1] "content-type" "date" "etag" "content-encoding"
#> [5] "server" "cache-control" "x-memc" "x-memc-key"
#> [9] "x-memc-age" "x-memc-expires" "vary" "x-cache"
#> [13] "via" "x-amz-cf-pop" "alt-svc" "x-amz-cf-id"
#> [17] "vary"
The actual data that we’re looking for is in the $body element, though
it is stored as raw bytes and is not in a usable format yet.
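To see why a raw body is not directly usable, the sketch below uses a small hand-written JSON string (made-up values, not a real API response) and converts it to raw bytes and back with base R. For a real response, httr2 offers resp_body_raw() and resp_body_string() to do the equivalent extraction.

```r
# A made-up JSON payload standing in for a response body
json_text <- '{"page":1,"total_pages":7,"total_results":128}'

# The body of a response is stored as a raw vector of bytes
raw_body <- charToRaw(json_text)
class(raw_body)
head(raw_body)

# Converting the bytes back to text recovers the JSON string,
# which still needs to be parsed before it can be used as data
recovered_text <- rawToChar(raw_body)
recovered_text
```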
Response Body
httr2
provides many resp_body_*()
functions to extract the body data, depending on the API
response format(s).
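The API Documentation states the only supported response format is
JSON, so we use the resp_body_json() function. Its optional
simplifyVector = TRUE argument simplifies JSON arrays into vectors and
data frames where possible, which can make the result easier to work
with; the first call here leaves it at the default, and the page 2
request later on sets it.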
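resp_body_json() passes the response text to jsonlite::fromJSON(), so the effect of simplifyVector can be illustrated offline. The sketch below uses a tiny hand-written JSON string shaped like a search response; the titles and ids are made up for illustration.

```r
# Made-up JSON with the same top-level shape as a search response
json_text <- '{
  "page": 1,
  "results": [
    {"id": 1, "title": "Movie A", "genre_ids": [16, 35]},
    {"id": 2, "title": "Movie B", "genre_ids": [28]}
  ],
  "total_pages": 1,
  "total_results": 2
}'

# Without simplification, $results stays a list of lists
nested <- jsonlite::fromJSON(json_text, simplifyVector = FALSE)
class(nested$results)

# With simplification, $results becomes a data frame, one row per movie
simplified <- jsonlite::fromJSON(json_text, simplifyVector = TRUE)
class(simplified$results)
nrow(simplified$results)
```

Either form can be turned into a tibble; the nested form keeps the raw JSON structure, while the simplified form is closer to a table from the start.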
body <- response %>%
resp_body_json()
Inspecting the body shows it is a list with 4 elements.
lapply(body, class)
#> $page
#> [1] "integer"
#>
#> $results
#> [1] "list"
#>
#> $total_pages
#> [1] "integer"
#>
#> $total_results
#> [1] "integer"
The $page, $total_pages, and $total_results elements are all integer
values:
body[c("page", "total_pages", "total_results")]
#> $page
#> [1] 1
#>
#> $total_pages
#> [1] 7
#>
#> $total_results
#> [1] 128
The $results element is a nested list from the JSON response.
Wrapping it in tibble::tibble() and calling tidyr::unnest_wider()
cleans the data and returns a data frame with one row per movie:
tibble::tibble(results = body$results) %>%
tidyr::unnest_wider(results) %>%
dplyr::glimpse()
#> Rows: 20
#> Columns: 14
#> $ adult <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ backdrop_path <chr> "/Al127H6f1RXpESdg0MGNL2g8mzO.jpg", "/mDfJG3LC3Dqb67…
#> $ genre_ids <list> [16, 35, 878], [12, 28, 878], [878, 28, 12], [28, 1…
#> $ id <int> 1359227, 299536, 24428, 40081, 257346, 1154598, 2995…
#> $ original_language <chr> "en", "en", "en", "zh", "ja", "en", "en", "en", "en"…
#> $ original_title <chr> "LEGO Marvel Avengers: Mission Demolition", "Avenger…
#> $ overview <chr> "A young, aspiring hero and superhero fan inadverten…
#> $ popularity <dbl> 4.2844, 33.0676, 30.4445, 1.1969, 2.0574, 2.5716, 21…
#> $ poster_path <chr> "/4KfgyzCgJeG0XJDbqNztdP730Pv.jpg", "/7WsyChQLEftFiD…
#> $ release_date <chr> "2024-10-17", "2018-04-25", "2012-04-25", "1978-12-2…
#> $ title <chr> "LEGO Marvel Avengers: Mission Demolition", "Avenger…
#> $ video <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ vote_average <dbl> 6.807, 8.236, 7.747, 6.800, 6.365, 6.552, 8.238, 7.2…
#> $ vote_count <int> 96, 30518, 31785, 98, 275, 126, 26330, 23421, 724, 0…
Missing Data
body$total_results is 128, yet body$results only contains 20 rows.
This is because body$total_pages = 7 and each response only returns
one page.
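The numbers above are consistent with each other, which can be checked with a little arithmetic. The page size of 20 is taken from the row count observed in the output, not from a documented constant.

```r
total_results <- 128L
page_size <- 20L  # observed rows per page in the output above

# Number of pages needed to hold all results
total_pages <- ceiling(total_results / page_size)
total_pages  # 7

# Rows expected on the final, partially filled page
last_page_rows <- total_results - (total_pages - 1) * page_size
last_page_rows  # 8
```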
To get the other pages, we need to add the argument page = n to the
request, like so:
page2_response <- base_request %>%
req_url_path_append("search", "movie") %>%
req_url_query(query = "Avengers",
page = 2) %>% ### NEW ARGUMENT
req_perform() %>%
resp_body_json(simplifyVector = TRUE)
tibble::tibble(results = page2_response$results ) %>%
tidyr::unnest_wider(results) %>%
dplyr::glimpse()
#> Rows: 20
#> Columns: 14
#> $ adult <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ backdrop_path <chr> "/r5uAQQahZzcTYyPdlomDggyxHkV.jpg", "/d1bKi2TRB8iFEz…
#> $ genre_ids <list<list>> [<35, 14, 28>], [<35, 27>], [<10751, 16, 28, …
#> $ id <int> 538153, 1353766, 940543, 385411, 48230, 109088, 5217…
#> $ original_language <chr> "en", "xx", "en", "el", "ja", "zh", "en", "fr", "it"…
#> $ original_title <chr> "Avengers of Justice: Farce Wars", "風流少女殺人事件", "LEGO…
#> $ overview <chr> "While trying to remain a good husband and father, S…
#> $ popularity <dbl> 1.3169, 0.2304, 0.3979, 0.0409, 0.3615, 0.3538, 0.60…
#> $ poster_path <chr> "/yymsCwKPbJIF1xcl2ih8fl7OxAa.jpg", "/alcvekyWjaRdCZ…
#> $ release_date <chr> "2018-07-20", "2024-09-22", "2022-01-17", "1975-01-0…
#> $ title <chr> "Avengers of Justice: Farce Wars", "A Brighter Summe…
#> $ video <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ vote_average <dbl> 5.300, 7.800, 7.031, 8.400, 7.100, 6.800, 5.100, 7.4…
#> $ vote_count <int> 28, 2, 16, 9, 14, 23, 64, 25, 33, 24, 6, 149, 4, 17,…
This ‘second page’ table contains the next 20 rows.
To get all rows we need to repeat this until page = 7.
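The repetition described above can be sketched as a small loop. This is a minimal illustration, not the package's own implementation: `collect_all_pages()`, `fake_fetch()`, and `fetch_search_page()` are hypothetical names introduced here, and the demo runs against made-up data so that it works offline.

```r
# Generic pager: `fetch_page` is any function that takes a page number and
# returns a parsed body containing $results and $total_pages.
collect_all_pages <- function(fetch_page) {
  first <- fetch_page(1L)
  pages <- list(first)
  if (first$total_pages > 1) {
    for (n in 2:first$total_pages) {
      pages[[n]] <- fetch_page(n)
    }
  }
  # Flatten the per-page result lists into one list of movies
  all_results <- unlist(lapply(pages, function(p) p$results), recursive = FALSE)
  tidyr::unnest_wider(tibble::tibble(results = all_results), results)
}

# Offline stand-in for the API: 3 pages with 2 made-up movies each
fake_fetch <- function(page) {
  list(
    page = page,
    total_pages = 3L,
    results = list(
      list(id = page * 10L + 1L, title = paste0("Movie ", page, "a")),
      list(id = page * 10L + 2L, title = paste0("Movie ", page, "b"))
    )
  )
}

all_movies <- collect_all_pages(fake_fetch)
all_movies

# Hypothetical real fetcher (not run here): it needs `base_request` from
# earlier in this article and a valid token.
fetch_search_page <- function(page) {
  req <- httr2::req_url_path_append(base_request, "search", "movie")
  req <- httr2::req_url_query(req, query = "Avengers", page = page)
  httr2::resp_body_json(httr2::req_perform(req))
}
```

Passing the fetcher in as an argument keeps the paging logic separate from the network call, so the loop can be tried on fake data before it is pointed at the API.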
To combine all of the pages from the search we can simply call the wrapper function:
search_movies(query = "Avengers",
auth_token = decrypt_auth_token()) %>%
dplyr::glimpse()
#> Rows: 128
#> Columns: 14
#> $ adult <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ backdrop_path <chr> "/rthMuZfFv4fqEU4JVbgSW9wQ8rs.jpg", "/mDfJG3LC3Dqb67…
#> $ genre_ids <list> [28, 878, 12], [12, 28, 878], [878, 28, 12], [12, 8…
#> $ id <int> 986056, 299536, 24428, 299534, 99861, 1003596, 10035…
#> $ original_language <chr> "en", "en", "en", "en", "en", "en", "en", "en", "en"…
#> $ original_title <chr> "Thunderbolts*", "Avengers: Infinity War", "The Aven…
#> $ overview <chr> "After finding themselves ensnared in a death trap, …
#> $ popularity <dbl> 239.5162, 33.0676, 30.4445, 21.9561, 15.8466, 15.175…
#> $ poster_path <chr> "/m9EtP1Yrzv6v7dMaC9mRaGhd1um.jpg", "/7WsyChQLEftFiD…
#> $ release_date <chr> "2025-04-30", "2018-04-25", "2012-04-25", "2019-04-2…
#> $ title <chr> "Thunderbolts*", "Avengers: Infinity War", "The Aven…
#> $ video <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ vote_average <dbl> 7.500, 8.236, 7.747, 8.238, 7.271, 0.000, 0.000, 6.8…
#> $ vote_count <int> 874, 30518, 31785, 26330, 23421, 0, 0, 96, 126, 724,…