tmdbdata

library(tmdbdata)
library(dplyr)
library(tidyr)
library(httr2)

TMDB data functions with httr2

This package uses httr2 functions to generate and perform requests, then extracts the results from the response body.

Create Basic Request

base_tmdb_request() is constructed from settings described in the API documentation.

For example:

The httr2::req_headers() values come from the Response Format and Authentication Documentation.
The httr2::req_throttle() values come from the Rate Limits Documentation.

base_tmdb_request
#> function (auth_token = get_auth_token()) 
#> {
#>     httr2::request("https://api.themoviedb.org/3/") %>% httr2::req_headers(accept = "application/json", 
#>         Authorization = paste("Bearer", auth_token), .redact = "Authorization") %>% 
#>         httr2::req_error(body = tmdb_error_body) %>% httr2::req_user_agent("tmdbdata (https://github.com/novakowd/tmdbdata)") %>% 
#>         httr2::req_throttle(rate = 40, realm = "https://api.themoviedb.org/3/")
#> }
#> <bytecode: 0x55b7d9ed8c48>
#> <environment: namespace:tmdbdata>

📝 The default of auth_token = get_auth_token() will only work after the environment variable has been set up as described in Authentication.
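As a rough illustration of why the environment variable has to exist first, a token lookup could look like the sketch below. Both the helper read_token_from_env() and the variable name TMDB_TOKEN_EXAMPLE are hypothetical placeholders for this sketch; they are not the package's get_auth_token() or its real variable name, which are covered in Authentication.

```r
# Hypothetical helper, NOT the package's get_auth_token().
# "TMDB_TOKEN_EXAMPLE" is a placeholder name chosen for this sketch.
read_token_from_env <- function(var = "TMDB_TOKEN_EXAMPLE") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable '", var, "' is not set.", call. = FALSE)
  }
  token
}

# Set a dummy value for the current session only, then read it back.
Sys.setenv(TMDB_TOKEN_EXAMPLE = "dummy-token")
read_token_from_env()
```

If the variable is missing, the helper stops with an error instead of silently sending an empty token.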

Running the above base_tmdb_request() function generates a basic <httr2_request> object. Because of .redact = "Authorization", the Authorization header is shown as <REDACTED> whenever this object, or any request built from it, is printed.

Normally auth_token does not need to be specified if the Authentication steps were followed; it is passed explicitly in this example:

base_request <- base_tmdb_request(auth_token = decrypt_auth_token())
base_request
#> <httr2_request>
#> GET https://api.themoviedb.org/3/
#> Headers:
#> • accept       : "application/json"
#> • Authorization: <REDACTED>
#> Body: empty
#> Options:
#> • useragent: "tmdbdata (https://github.com/novakowd/tmdbdata)"
#> Policies:
#> • error_body    : <function>
#> • throttle_realm: "https://api.themoviedb.org/3/"

The functions req_headers(), req_error(), req_user_agent(), and req_throttle() do NOT change the URL https://api.themoviedb.org/3/, but they DO add other elements to the request such as Headers, Options, and Policies.
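This can be checked without a token or network access. The sketch below uses a stripped-down request (no authorization, error handler, or throttle) and a made-up user agent string, and reads the url, headers, and options elements of the request object directly.

```r
library(httr2)

# A stripped-down request: no token, error handler, or throttle.
root <- request("https://api.themoviedb.org/3/")
decorated <- root %>%
  req_headers(accept = "application/json") %>%
  req_user_agent("example-agent (illustration only)")

# The URL is untouched; only headers and options were added.
identical(root$url, decorated$url)
names(decorated$headers)
names(decorated$options)
```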

Append Request Details

The next step is to append a path to the base URL and add the query parameters the endpoint expects. Find the appropriate parameters by consulting the API Documentation.

The example below uses the Search > Movie Documentation, which lists:

a required parameter called query, and
several optional parameters, some of which have default values.

Let’s put together a request to search movies for Avengers:

base_request %>% 
  req_url_path_append("search","movie") %>% 
  req_url_query(query = "Avengers")
#> <httr2_request>
#> GET https://api.themoviedb.org/3/search/movie?query=Avengers
#> Headers:
#> • accept       : "application/json"
#> • Authorization: <REDACTED>
#> Body: empty
#> Options:
#> • useragent: "tmdbdata (https://github.com/novakowd/tmdbdata)"
#> Policies:
#> • error_body    : <function>
#> • throttle_realm: "https://api.themoviedb.org/3/"

The functions req_url_path_append() and req_url_query() modified the URL to https://api.themoviedb.org/3/search/movie?query=Avengers. Everything else in the request is unchanged.
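Optional parameters are added the same way as query. The sketch below is built without a token so it can run offline; language is assumed here to be one of the optional Search > Movie parameters, and the values are only illustrative. Reading the url element shows the final URL without contacting the server.

```r
library(httr2)

req <- request("https://api.themoviedb.org/3/") %>%
  req_url_path_append("search", "movie") %>%
  req_url_query(query = "The Avengers", language = "en-US", page = 1)

# Inspect the final URL; the space in the query value is URL-encoded.
req$url
```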

Perform Request

Now that we have built a request, httr2::req_perform() submits it to the server and returns a response.

Assigning the result to a variable keeps the response in memory, so it can be inspected repeatedly without sending the request again. This helps stay within the API's [Rate Limits](https://developer.themoviedb.org/docs/rate-limiting).

response <- base_request %>% 
  req_url_path_append("search", "movie") %>% 
  req_url_query(query = "Avengers") %>% 
  req_perform()

response
#> <httr2_response>
#> GET https://api.themoviedb.org/3/search/movie?query=Avengers
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (11773 bytes)

Response Structure

Printing the <httr2_response> object above shows only a short summary; inspecting its elements reveals more:

class(response)
#> [1] "httr2_response"

data.frame(class = unlist(lapply(response,class)),
           length = unlist(lapply(response,length)))
#>                     class length
#> method          character      1
#> url             character      1
#> status_code       integer      1
#> headers     httr2_headers     17
#> body                  raw  11773
#> request     httr2_request      8
#> cache         environment      0

names(response$headers)
#>  [1] "content-type"     "date"             "etag"             "content-encoding"
#>  [5] "server"           "cache-control"    "x-memc"           "x-memc-key"      
#>  [9] "x-memc-age"       "x-memc-expires"   "vary"             "x-cache"         
#> [13] "via"              "x-amz-cf-pop"     "alt-svc"          "x-amz-cf-id"     
#> [17] "vary"
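Besides indexing into the object with $, httr2 exports accessor functions for the same information. The sketch below builds a mock response locally with httr2::response() so it runs without a token; the body is made-up JSON, not real API output.

```r
library(httr2)

# Mock response built locally; the body is made-up JSON, not real API output.
mock_resp <- response(
  status_code = 200,
  url = "https://api.themoviedb.org/3/search/movie?query=Avengers",
  method = "GET",
  headers = list(`content-type` = "application/json"),
  body = charToRaw('{"page": 1, "results": [], "total_pages": 1, "total_results": 0}')
)

resp_status(mock_resp)                  # status code
resp_content_type(mock_resp)            # media type from the header
resp_header(mock_resp, "content-type")  # any single header by name
```

The same calls should work on the real response object from req_perform().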

The actual ‘Data’ that we’re looking for is in the $body element, though it is not in a usable format yet:

response$body %>% glimpse()
#>  raw [1:11773] 7b 22 70 61 ...
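The raw vector is simply the JSON text stored as bytes. As a small check, the four bytes shown in the output above can be decoded with base R:

```r
# First four bytes shown in the glimpse() output above.
first_bytes <- as.raw(c(0x7b, 0x22, 0x70, 0x61))

# Decodes to the four characters {"pa, presumably the start of {"page".
rawToChar(first_bytes)
```

Calling rawToChar(response$body) would decode the whole body the same way, and httr2::resp_body_string() is the accessor for that.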

Response Body

httr2 provides several resp_body_*() functions to extract the body data, depending on the API response format(s).
The API Documentation states the only supported response format is JSON, so we use the resp_body_json() function. Its optional simplifyVector = TRUE argument, used further below, simplifies nested lists into vectors and data frames where possible.

body <- response %>% 
  resp_body_json()

Inspecting the body shows it is a list with 4 elements.

lapply(body, class)
#> $page
#> [1] "integer"
#> 
#> $results
#> [1] "list"
#> 
#> $total_pages
#> [1] "integer"
#> 
#> $total_results
#> [1] "integer"

The $page, $total_pages, and $total_results elements are all integer values:

body[c("page", "total_pages", "total_results")]
#> $page
#> [1] 1
#> 
#> $total_pages
#> [1] 7
#> 
#> $total_results
#> [1] 128
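The effect of simplifyVector on the results element can be seen offline. The sketch below calls jsonlite::parse_json() directly on a made-up JSON string in the same shape as a search response, on the assumption that resp_body_json() forwards simplifyVector to jsonlite as its documentation describes.

```r
library(jsonlite)

# Made-up JSON in the same shape as a search response (two results only).
txt <- '{"page": 1,
  "results": [
    {"id": 24428,  "title": "The Avengers",           "genre_ids": [878, 28, 12]},
    {"id": 299536, "title": "Avengers: Infinity War", "genre_ids": [12, 28, 878]}
  ],
  "total_pages": 1, "total_results": 2}'

as_lists  <- parse_json(txt, simplifyVector = FALSE)
as_frames <- parse_json(txt, simplifyVector = TRUE)

class(as_lists$results)   # a plain list: one nested list per movie
class(as_frames$results)  # a data frame: one row per movie
```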

The $results element is a nested list parsed from the JSON response. Wrapping it in tibble::tibble() and calling tidyr::unnest_wider() turns each list element into one row of a data frame:

tibble::tibble(results = body$results) %>% 
  tidyr::unnest_wider(results) %>% 
  dplyr::glimpse()
#> Rows: 20
#> Columns: 14
#> $ adult             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ backdrop_path     <chr> "/Al127H6f1RXpESdg0MGNL2g8mzO.jpg", "/mDfJG3LC3Dqb67…
#> $ genre_ids         <list> [16, 35, 878], [12, 28, 878], [878, 28, 12], [28, 1…
#> $ id                <int> 1359227, 299536, 24428, 40081, 257346, 1154598, 2995…
#> $ original_language <chr> "en", "en", "en", "zh", "ja", "en", "en", "en", "en"…
#> $ original_title    <chr> "LEGO Marvel Avengers: Mission Demolition", "Avenger…
#> $ overview          <chr> "A young, aspiring hero and superhero fan inadverten…
#> $ popularity        <dbl> 4.2844, 33.0676, 30.4445, 1.1969, 2.0574, 2.5716, 21…
#> $ poster_path       <chr> "/4KfgyzCgJeG0XJDbqNztdP730Pv.jpg", "/7WsyChQLEftFiD…
#> $ release_date      <chr> "2024-10-17", "2018-04-25", "2012-04-25", "1978-12-2…
#> $ title             <chr> "LEGO Marvel Avengers: Mission Demolition", "Avenger…
#> $ video             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ vote_average      <dbl> 6.807, 8.236, 7.747, 6.800, 6.365, 6.552, 8.238, 7.2…
#> $ vote_count        <int> 96, 30518, 31785, 98, 275, 126, 26330, 23421, 724, 0…
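To see what unnest_wider() does on a smaller scale, the sketch below applies the same two steps to two hand-written records in the shape of body$results, with most fields omitted. The values are typed in by hand for illustration rather than fetched from the API.

```r
library(tidyr)

# Two hand-written records in the shape of body$results (most fields omitted).
results <- list(
  list(id = 24428L, title = "The Avengers",
       genre_ids = list(878L, 28L, 12L)),
  list(id = 299536L, title = "Avengers: Infinity War",
       genre_ids = list(12L, 28L, 878L))
)

# Each list element becomes a row; each named field becomes a column.
movies <- tibble::tibble(results = results) %>%
  unnest_wider(results)

movies
```

Fields holding a single value per record (id, title) become ordinary columns, while genre_ids stays a list column because each record holds several values.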

Missing Data

body$total_results is 128, yet body$results contains only 20 rows.
This is because each response returns a single page of 20 results, and body$total_pages is 7 (128 results at 20 per page, with the last page holding the remaining 8).

To get the other pages, we need to add an argument page = n to the request, like so:

page2_response <- base_request %>% 
  req_url_path_append("search", "movie") %>% 
  req_url_query(query = "Avengers",
                page = 2) %>%                ### NEW ARGUMENT
  req_perform() %>% 
  resp_body_json(simplifyVector = TRUE)

tibble::tibble(results = page2_response$results ) %>% 
  tidyr::unnest_wider(results) %>% 
  dplyr::glimpse()
#> Rows: 20
#> Columns: 14
#> $ adult             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ backdrop_path     <chr> "/r5uAQQahZzcTYyPdlomDggyxHkV.jpg", "/d1bKi2TRB8iFEz…
#> $ genre_ids         <list<list>> [<35, 14, 28>], [<35, 27>], [<10751, 16, 28, …
#> $ id                <int> 538153, 1353766, 940543, 385411, 48230, 109088, 5217…
#> $ original_language <chr> "en", "xx", "en", "el", "ja", "zh", "en", "fr", "it"…
#> $ original_title    <chr> "Avengers of Justice: Farce Wars", "風流少女殺人事件", "LEGO…
#> $ overview          <chr> "While trying to remain a good husband and father, S…
#> $ popularity        <dbl> 1.3169, 0.2304, 0.3979, 0.0409, 0.3615, 0.3538, 0.60…
#> $ poster_path       <chr> "/yymsCwKPbJIF1xcl2ih8fl7OxAa.jpg", "/alcvekyWjaRdCZ…
#> $ release_date      <chr> "2018-07-20", "2024-09-22", "2022-01-17", "1975-01-0…
#> $ title             <chr> "Avengers of Justice: Farce Wars", "A Brighter Summe…
#> $ video             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ vote_average      <dbl> 5.300, 7.800, 7.031, 8.400, 7.100, 6.800, 5.100, 7.4…
#> $ vote_count        <int> 28, 2, 16, 9, 14, 23, 64, 25, 33, 24, 6, 149, 4, 17,…

This ‘second page’ table contains the next 20 rows.
To get all rows, we need to repeat this request for every page up to page = 7.
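The repetition can be sketched as a loop over page numbers. The helper collect_all_pages() and the stand-in fake_fetch() below are hypothetical and written for this illustration; this is not necessarily how the package implements it. The fake fetcher replaces the API so the loop can run without a token.

```r
library(dplyr)

# Hypothetical helper: `fetch_page` is any function that takes a page number
# and returns a parsed body with `total_pages` and a `results` data frame.
collect_all_pages <- function(fetch_page) {
  first <- fetch_page(1)
  n_pages <- first$total_pages
  # Fetch pages 2..n_pages (nothing to do when there is only one page).
  rest <- lapply(seq_len(n_pages)[-1], function(n) fetch_page(n)$results)
  dplyr::bind_rows(c(list(first$results), rest))
}

# Stand-in for the API: 3 pages of 2 rows each, no network access needed.
fake_fetch <- function(page) {
  list(
    page = page,
    total_pages = 3L,
    results = data.frame(id = (page - 1) * 2 + 1:2)
  )
}

all_rows <- collect_all_pages(fake_fetch)
nrow(all_rows)
```

With the real API, fetch_page could wrap the page = n request shown above, ending in resp_body_json(simplifyVector = TRUE).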

To combine all of the pages from the search we can simply call the wrapper function:

search_movies(query = "Avengers",
              auth_token = decrypt_auth_token()) %>% 
  dplyr::glimpse()
#> Rows: 128
#> Columns: 14
#> $ adult             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ backdrop_path     <chr> "/rthMuZfFv4fqEU4JVbgSW9wQ8rs.jpg", "/mDfJG3LC3Dqb67…
#> $ genre_ids         <list> [28, 878, 12], [12, 28, 878], [878, 28, 12], [12, 8…
#> $ id                <int> 986056, 299536, 24428, 299534, 99861, 1003596, 10035…
#> $ original_language <chr> "en", "en", "en", "en", "en", "en", "en", "en", "en"…
#> $ original_title    <chr> "Thunderbolts*", "Avengers: Infinity War", "The Aven…
#> $ overview          <chr> "After finding themselves ensnared in a death trap, …
#> $ popularity        <dbl> 239.5162, 33.0676, 30.4445, 21.9561, 15.8466, 15.175…
#> $ poster_path       <chr> "/m9EtP1Yrzv6v7dMaC9mRaGhd1um.jpg", "/7WsyChQLEftFiD…
#> $ release_date      <chr> "2025-04-30", "2018-04-25", "2012-04-25", "2019-04-2…
#> $ title             <chr> "Thunderbolts*", "Avengers: Infinity War", "The Aven…
#> $ video             <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ vote_average      <dbl> 7.500, 8.236, 7.747, 8.238, 7.271, 0.000, 0.000, 6.8…
#> $ vote_count        <int> 874, 30518, 31785, 26330, 23421, 0, 0, 96, 126, 724,…