
[Adobe DSP] Automatically send Adobe TubeMogul (DSP) data to mail using API

I have automated sending a data report every Monday using Adobe DSP (TubeMogul) data.
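The "every Monday" part relies on an external scheduler invoking the script; since the post doesn't show that side, here is a minimal, hypothetical sketch of a weekday guard, assuming the script is simply launched daily by cron or Windows Task Scheduler (`should_run` is my own helper name, not part of the script below):

```python
import datetime

def should_run(today=None):
    # In Python's datetime, Monday is weekday 0.
    today = today or datetime.date.today()
    return today.weekday() == 0

if should_run():
    print('Run the weekly TubeMogul report.')
```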

 

While Adobe DSP has a built-in report email feature, I received specific requirements for extracting and formatting data that the default reports couldn't meet. As a result, I embarked on a personal project to fulfill the requests from the Campaign team.

 

Luckily, I obtained API information that wasn't readily available through Google searches. Adobe (thanks, P) provided me with the details, but unfortunately I couldn't find an API for user authentication, so I had to log in through the web form and scrape the auth token from the page to gain access.

 

Although I majored in computer science, I'm not a professional developer and only indulge in coding as a hobby when I'm curious about something. Due to my concern that sharing the code might reveal my skill level, I hesitated to make it public.

 

Nevertheless, I've decided to document the progress of this project for personal records.

 

import requests
from bs4 import BeautifulSoup as bs
import re
import datetime
import os
import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import numpy as np
import pandas as pd
from pandas import json_normalize

def login():
    login_data = {
        'user': *Adobe DSP ID*,
        'password': *Adobe DSP PW*
    }

    # Don't create the session in a with-block here: it would be closed
    # before the later API calls get a chance to reuse it.
    session = requests.Session()
    session.post('https://advertising.adobe.com/auth/login', data=login_data)
    return session

def extract_token(session):
    # No auth API is available, so scrape the token from the reports page.
    response = session.post('https://advertising.adobe.com/playtime/brand_reports')
    soup = bs(response.text, 'html.parser')
    script = soup.find('script', string=re.compile('jsSharedGlobalVariables'))
    token = re.findall('[A-Za-z0-9]{32}', str(script))[0]
    return token

def fetch_data(session, token, start_day, end_day):
    bearer_token = 'Bearer ' + token
    headers = {
        'Authorization': bearer_token,
        'Content-Type': 'application/json',
        'X-Requested-With': 'XMLHttpRequest'
    }
    params = {
        'start_day': start_day,
        'end_day': end_day,
        'start_hour': '00',
        'end_hour': '00',
        'timezone': 'Asia/Seoul'
    }

    params_b = {
        'start_day': start_day,
        'end_day': start_day,
        'timezone': 'Asia/Seoul'
    }

    campaigns_request = session.get('https://api.tubemogul.com/v2/reporting/campaigns', headers=headers, params=params)
    campaigns_json = campaigns_request.json()
    campaigns_df = pd.DataFrame(campaigns_json['items'])
    campaign_ids = campaigns_df['campaign_id'].tolist()

    # Collect the placements that belong to each campaign.
    placements_frames = []
    placements_url = 'https://api.tubemogul.com/v3/trafficking/campaigns/{}/placements'
    for campaign_id in campaign_ids:
        placements_request = session.get(placements_url.format(campaign_id), headers=headers)
        placements_json = placements_request.json()
        placements_frames.append(pd.DataFrame(placements_json.get('items')))

    placements_data = pd.concat(placements_frames, ignore_index=True)
    placement_ids = placements_data['placement_id'].tolist()

    # Collect the ads (and their stats buckets) that belong to each placement.
    ads_frames = []
    stats_frames = []
    ads_url = 'https://api.tubemogul.com/v2/reporting/placements/{}/ads'
    for placement_id in placement_ids:
        ads_request = session.get(ads_url.format(placement_id), headers=headers, params=params_b)
        ads_json = ads_request.json()
        ads_frames.append(pd.DataFrame(ads_json['items']))
        stats_frames.append(json_normalize(data=ads_json['items'], record_path=['stats', 'buckets']))

    # ignore_index keeps the row indexes unique so the axis=1 concat aligns.
    ads_data = pd.concat(ads_frames, ignore_index=True)
    ads_stats = pd.concat(stats_frames, ignore_index=True)
    excel_data = pd.concat([ads_data, ads_stats], axis=1)

    excel_data['AD ID'] = excel_data['ad_id']
    excel_data['AD name'] = excel_data['ad_name']
    excel_data['Spent'] = excel_data['data.advertiser_costs'] / 1000000
    excel_data['Gross(17%)'] = excel_data['data.advertiser_costs'] * 1.38 / 1000000
    excel_data['Impressions'] = excel_data['data.impressions']
    excel_data['Views'] = excel_data['data.views']
    excel_data['25% Views'] = excel_data['data.completions25']
    excel_data['50% Views'] = excel_data['data.completions50']
    excel_data['75% Views'] = excel_data['data.completions75']
    excel_data['100% Views'] = excel_data['data.completions100']
    excel_data['Clicks'] = excel_data['data.total_click_throughs']

    ad_summary = excel_data[['AD ID', 'AD name', 'Spent', 'Gross(17%)', 'Impressions', 'Views', '25% Views',
                             '50% Views', '75% Views', '100% Views', 'Clicks']]
    return ad_summary.groupby(['AD ID', 'AD name']).sum()

def save_csv(data, filename):
    data.to_csv(filename)

def send_email(from_address, to_addresses, subject, body, files):
    msg = MIMEMultipart()
    msg['From'] = from_address
    msg['To'] = ', '.join(to_addresses)
    msg['Subject'] = subject
    msg.attach(MIMEText(body, 'plain'))

    for file in files:
        with open(file, 'rb') as attachment:
            part = MIMEBase('application', 'octet-stream')
            part.set_payload(attachment.read())
            encoders.encode_base64(part)
            part.add_header('Content-Disposition', 'attachment', filename=os.path.basename(file))
            msg.attach(part)

    smtp_server = 'smtp.worksmobile.com' 
    smtp_port = 465
    smtp_user = from_address
    smtp_password = *Sender Mail Password*

    with smtplib.SMTP_SSL(smtp_server, smtp_port) as server:
        server.login(smtp_user, smtp_password)
        server.sendmail(from_address, to_addresses, msg.as_string())

def main():
    session = login()
    token = extract_token(session)

    num_days = 7
    today = datetime.date.today()
    end_day = today.strftime('%Y-%m-%d')
    # One start date per day of the past week, newest to oldest.
    dates = [(today - datetime.timedelta(days=day)).strftime('%Y-%m-%d')
             for day in range(1, num_days + 1)]

    for start_day in dates:
        data = fetch_data(session, token, start_day, end_day)
        save_csv(data, f'{start_day}.csv')

    from_address = 'jihee.yoo@jellyfish.com'
    to_addresses = ['jihee.yoo@jellyfish.com']
    subject = f'{dates[-1]} ~ {end_day} TubeMogul AD Report'
    body = f'{dates[-1]} ~ {end_day} TubeMogul AD Report'
    files = [f'{date}.csv' for date in dates]

    send_email(from_address, to_addresses, subject, body, files)

if __name__ == '__main__':
    main()

 

 

A brief explanation of the code:

  • Login to a Website: The login() function uses the requests library to log in to the Adobe DSP site with a stored account and password. It establishes a session and returns the session object.
  • Extract Token: The extract_token(session) function extracts a token from a webpage using BeautifulSoup to parse the HTML content. The token is used for authentication in later API requests.
  • Fetch Data: The fetch_data(session, token, start_day, end_day) function retrieves data from an API using the session and token. It collects data for a given range of days and processes the received JSON data into a pandas DataFrame.
    • The campaigns API is used to fetch the campaign list, and the placements API is then called for each campaign ID; the resulting placement IDs are stored in the placement_ids variable.
    • Next, a for loop is used to iterate over each placement ID in the placement_ids list. For each placement ID, the script makes an API call to the ads API to extract the ad data associated with that specific placement.
    • The ad data is then stored in the ads_data and ads_stats variables, and these two DataFrames are merged into excel_data.
    • In summary, the process extracts placement IDs via the campaigns and placements APIs, then calls the ads API for each placement to fetch its ad-level values.
  • Save CSV: The save_csv(data, filename) function saves the DataFrame into a CSV file with a specified filename.
  • Send Email: The send_email(from_address, to_addresses, subject, body, files) function sends an email with attachments using the smtplib library. It requires the sender's email address, a list of recipient addresses, a subject, body text, and a list of files to attach.
  • Main Function: The main() function is the script's entry point. It logs in to the website, extracts the token, fetches data for the last 7 days, saves it to CSV files, and then sends an email with the CSV attachments to specified recipients.
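As a small illustration of the token step above: the regex simply grabs the first 32-character alphanumeric run inside the script tag. A self-contained sketch with a made-up page snippet (the real markup isn't public, so this input is purely hypothetical):

```python
import re

# Hypothetical inline script resembling the one on the brand_reports page.
fake_script = '<script>var jsSharedGlobalVariables = {"token": "a1B2c3D4e5F6a1B2c3D4e5F6a1B2c3D4"};</script>'

# Same extraction as extract_token(): the first 32-char alphanumeric run wins.
token = re.findall('[A-Za-z0-9]{32}', fake_script)[0]
print(token)  # → a1B2c3D4e5F6a1B2c3D4e5F6a1B2c3D4
```

This only works because no other alphanumeric run in the page reaches 32 characters; a stricter pattern anchored to the token's key would be more robust.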

Result:

The data in the screenshot is just a sample, representing the results for a single day. However, in reality, the data includes information for the entire past week. To make it convenient for the stakeholders, I compressed the data into a zip file and emailed it to them.
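The zipping step mentioned above isn't part of the script itself; a minimal sketch using only the standard-library zipfile module (bundle_reports is a hypothetical helper name) might look like:

```python
import os
import zipfile

def bundle_reports(csv_paths, zip_path):
    # Pack each daily CSV into one archive so a single file can be attached.
    with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        for path in csv_paths:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

The resulting archive can then be passed as the single entry in the files list given to send_email().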