Get your personal Signal Ocean API subscription key (acquired here) and replace it below:
signal_ocean_api_key = '' # Replace with your subscription key
Scraped Cargoes API¶
The goal of the Scraped Cargoes API is to collect and return scraped cargoes matching the given filters or cargo IDs. This is done by using the ScrapedCargoesAPI class and calling the appropriate methods.
1. Request by filters¶
Cargoes can be retrieved for specific filters by calling the get_cargoes method with the following arguments:
Required¶
vessel_type: The vessel type.

Additionally, at least one of the following is required:

message_ids: A list of MessageIDs.
external_message_ids: A list of ExternalMessageIDs.
received_date_from: The earliest date the cargo was received.
received_date_to: The latest date the cargo was received.
updated_date_from: The earliest date the cargo was updated.
updated_date_to: The latest date the cargo was updated.

Mixing received and updated dates is not allowed. It is highly recommended to use UTC dates, since this is the format used internally.
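As a minimal sketch of a filters-based request using an updated-date window (the 3-day lookback is illustrative; the call itself assumes an `api` instance created as shown in the Quickstart below):

```python
from datetime import datetime, timedelta, timezone

# Illustrative 3-day lookback built from timezone-aware UTC datetimes
updated_date_to = datetime.now(timezone.utc)
updated_date_from = updated_date_to - timedelta(days=3)

# cargoes = api.get_cargoes(
#     vessel_type=1,  # Tanker
#     updated_date_from=updated_date_from,
#     updated_date_to=updated_date_to,
# )
```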
2. Request by cargo IDs¶
Cargoes can be retrieved for specific cargo IDs by calling the get_cargoes_by_cargo_ids method with the following argument:
Required¶
cargo_ids: A list of cargo IDs to retrieve.
Additional optional arguments¶
Both methods also accept the following optional arguments:
include_details: If True, the following columns will be included in the response (otherwise they will be None): parsed_part_id, line_from, line_to, in_line_order, source.

include_scraped_fields: If True, the following columns will be included in the response (otherwise they will be None): scraped_laycan, scraped_load, scraped_load2, scraped_discharge, scraped_discharge_options, scraped_discharge2, scraped_charterer, scraped_cargo_type, scraped_quantity, scraped_delivery_date, scraped_delivery_from, scraped_delivery_to, scraped_redelivery_from, scraped_redelivery_to.

include_labels: If True, the following columns will be included in the response (otherwise they will be None): load_name, load_taxonomy, load_name2, load_taxonomy2, discharge_name, discharge_taxonomy, discharge_name2, discharge_taxonomy2, charterer, cargo_type, cargo_type_group, delivery_from_name, delivery_from_taxonomy, delivery_to_name, delivery_to_taxonomy, redelivery_from_name, redelivery_from_taxonomy, redelivery_to_name, redelivery_to_taxonomy, charter_type, cargo_status.

include_content: If True, the content column will be included in the response (otherwise it will be None).

include_sender: If True, the sender column will be included in the response (otherwise it will be None).

include_debug_info: If True, the is_private column will be included in the response (otherwise it will be None).

The default value is True for all of the arguments described above.
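These flags can be combined to trim the response payload. A sketch of such a request follows; the call itself is commented out because it needs a valid `api` instance, and the one-day lookback is illustrative:

```python
from datetime import datetime, timedelta

# Hypothetical request that skips the raw content and debug columns
request_kwargs = dict(
    vessel_type=1,  # Tanker
    received_date_from=datetime.utcnow() - timedelta(days=1),
    include_content=False,     # 'content' column will be None
    include_debug_info=False,  # 'is_private' column will be None
)
# cargoes = api.get_cargoes(**request_kwargs)
```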
Installation¶
To install the Signal Ocean SDK, simply run the following command:
%%capture
%pip install signal-ocean
Quickstart¶
Import signal-ocean and the other modules required for this demo:
from signal_ocean import Connection
from signal_ocean.scraped_cargoes import ScrapedCargoesAPI, ScrapedCargo
from datetime import datetime, timedelta
import pandas as pd
import plotly.graph_objects as go
Create a new instance of the ScrapedCargoesAPI class:
connection = Connection(signal_ocean_api_key)
api = ScrapedCargoesAPI(connection)
Now you are ready to retrieve your data
Request by date¶
To get all tanker cargoes received in the last 4 days, declare appropriate vessel_type and received_date_from variables:
vessel_type = 1 # Tanker
received_date_from = datetime.utcnow() - timedelta(days=4)
Then call the get_cargoes method, as shown below:
scraped_cargoes = api.get_cargoes(
vessel_type=vessel_type,
received_date_from=received_date_from,
)
next(iter(scraped_cargoes), None)
ScrapedCargo(cargo_id=33891609, message_id=47953999, external_message_id=None, parsed_part_id=58810511, line_from=14, line_to=14, in_line_order=1, source='Email', updated_date=datetime.datetime(2023, 9, 22, 12, 25, 49, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 22, 12, 23, 30, tzinfo=datetime.timezone.utc), is_deleted=False, scraped_laycan='29-30', laycan_from=datetime.datetime(2023, 9, 29, 0, 0, tzinfo=datetime.timezone.utc), laycan_to=datetime.datetime(2023, 9, 30, 0, 0, tzinfo=datetime.timezone.utc), scraped_load='pembroke', load_geo_id=3433, load_name='Pembroke Dock', load_taxonomy_id=2, load_taxonomy='Port', scraped_load2=None, load_geo_id2=None, load_name2=None, load_taxonomy_id2=None, load_taxonomy2=None, scraped_discharge='ecc', scraped_discharge_options=None, discharge_geo_id=24740, discharge_name='Canada Atlantic Coast', discharge_taxonomy_id=4, discharge_taxonomy='Level0', scraped_discharge2=None, discharge_geo_id2=None, discharge_name2=None, discharge_taxonomy_id2=None, discharge_taxonomy2=None, scraped_charterer='valero', charterer_id=1796, charterer='Valero', scraped_cargo_type='ums', cargo_type_id=135, cargo_type='Unleaded Motor Spirit', cargo_type_group_id=120000, cargo_type_group='Clean', scraped_quantity='37kt', quantity=37000.0, quantity_buffer=0.0, quantity_from=37000.0, quantity_to=37000.0, size_from=None, size_to=None, scraped_delivery_date=None, delivery_date_from=None, delivery_date_to=None, scraped_delivery_from=None, delivery_from_geo_id=None, delivery_from_name=None, delivery_from_taxonomy_id=None, delivery_from_taxonomy=None, scraped_delivery_to=None, delivery_to_geo_id=None, delivery_to_name=None, delivery_to_taxonomy_id=None, delivery_to_taxonomy=None, scraped_redelivery_from=None, redelivery_from_geo_id=None, redelivery_from_name=None, redelivery_from_taxonomy_id=None, redelivery_from_taxonomy=None, scraped_redelivery_to=None, redelivery_to_geo_id=None, redelivery_to_name=None, 
redelivery_to_taxonomy_id=None, redelivery_to_taxonomy=None, charter_type_id=0, charter_type='Voyage', cargo_status_id=None, cargo_status=None, content='valero 37kt ums pembroke ta-ecc-uswc off 29-30', subject='SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER', sender='SSY', is_private=False)
For better visualization, it is convenient to insert the data into a DataFrame:
df = pd.DataFrame(scraped_cargoes)
df.head()
cargo_id | message_id | external_message_id | parsed_part_id | line_from | line_to | in_line_order | source | updated_date | received_date | ... | redelivery_to_taxonomy_id | redelivery_to_taxonomy | charter_type_id | charter_type | cargo_status_id | cargo_status | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 33891609 | 47953999 | None | 58810511 | 14 | 14 | 1.0 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | ... | None | None | 0 | Voyage | NaN | None | valero 37kt ums pembroke ta-ecc-uswc off 29-30 | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
1 | 33891610 | 47953999 | None | 58810511 | 15 | 15 | 1.0 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | ... | None | None | 0 | Voyage | NaN | None | shell 37kt ums brofjorden ukc-ta off 28-30 - b... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
2 | 33891611 | 47953999 | None | 58810511 | 14 | 14 | 2.0 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | ... | None | None | 0 | Voyage | NaN | None | valero 37kt ums pembroke ta-ecc-uswc off 29-30 | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
3 | 33891612 | 47953999 | None | 58810511 | 15 | 15 | NaN | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | ... | None | None | 0 | Voyage | NaN | None | shell 37kt ums brofjorden ukc-ta off 28-30 - b... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
4 | 33891613 | 47953999 | None | 58810511 | 14 | 14 | NaN | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | ... | None | None | 0 | Voyage | NaN | None | valero 37kt ums pembroke ta-ecc-uswc off 29-30 | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False |
5 rows × 81 columns
Request by Message or ExternalMessage IDs¶
To retrieve cargoes for particular message ID(s), pass the extra parameter message_ids when calling the get_cargoes method. This parameter should contain a list of message IDs. For instance:
message_ids = [47502652, 47503150, 47528120]
scraped_cargoes_by_message_ids = api.get_cargoes(
vessel_type=vessel_type,
message_ids=message_ids,
)
next(iter(scraped_cargoes_by_message_ids), None)
ScrapedCargo(cargo_id=33640251, message_id=47502652, external_message_id=None, parsed_part_id=58483539, line_from=35, line_to=35, in_line_order=None, source='Email', updated_date=datetime.datetime(2023, 9, 15, 3, 10, 17, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 15, 3, 7, 42, tzinfo=datetime.timezone.utc), is_deleted=False, scraped_laycan='27-sep', laycan_from=datetime.datetime(2023, 9, 27, 0, 0, tzinfo=datetime.timezone.utc), laycan_to=datetime.datetime(2023, 9, 27, 0, 0, tzinfo=datetime.timezone.utc), scraped_load='nigeria', load_geo_id=171, load_name='Nigeria', load_taxonomy_id=3, load_taxonomy='Country', scraped_load2=None, load_geo_id2=None, load_name2=None, load_taxonomy_id2=None, load_taxonomy2=None, scraped_discharge='ukcm', scraped_discharge_options=None, discharge_geo_id=25025, discharge_name='Mediterranean / UK Continent', discharge_taxonomy_id=6, discharge_taxonomy='Level2', scraped_discharge2=None, discharge_geo_id2=None, discharge_name2=None, discharge_taxonomy_id2=None, discharge_taxonomy2=None, scraped_charterer='exxon', charterer_id=529, charterer='ExxonMobil', scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_quantity='130', quantity=130000.0, quantity_buffer=0.0, quantity_from=130000.0, quantity_to=130000.0, size_from=None, size_to=None, scraped_delivery_date=None, delivery_date_from=None, delivery_date_to=None, scraped_delivery_from=None, delivery_from_geo_id=None, delivery_from_name=None, delivery_from_taxonomy_id=None, delivery_from_taxonomy=None, scraped_delivery_to=None, delivery_to_geo_id=None, delivery_to_name=None, delivery_to_taxonomy_id=None, delivery_to_taxonomy=None, scraped_redelivery_from=None, redelivery_from_geo_id=None, redelivery_from_name=None, redelivery_from_taxonomy_id=None, redelivery_from_taxonomy=None, scraped_redelivery_to=None, redelivery_to_geo_id=None, redelivery_to_name=None, redelivery_to_taxonomy_id=None, 
redelivery_to_taxonomy=None, charter_type_id=0, charter_type='Voyage', cargo_status_id=None, cargo_status=None, content='exxon 130 nigeria / ukcm 27-sep', subject='ALLIANCE SUEZMAX MARKET REPORT 15 SEP 2023', sender='Alliance Tanker', is_private=False)
You can achieve a similar result for external message IDs by providing the external_message_ids argument.
Request by Cargo IDs¶
To get data for specific cargo ID(s), call the get_cargoes_by_cargo_ids method with a list of the desired cargo ID(s). Note that date arguments are not available in this method.
cargo_ids = [23780101, 23799896, 23799890, 23799892, 23790303] # Or add a list of your desired cargo IDs
scraped_cargoes_by_ids = api.get_cargoes_by_cargo_ids(
cargo_ids=cargo_ids,
)
df_by_ids = pd.DataFrame(scraped_cargoes_by_ids)
df_by_ids.head()
cargo_id | message_id | external_message_id | parsed_part_id | line_from | line_to | in_line_order | source | updated_date | received_date | ... | redelivery_to_taxonomy_id | redelivery_to_taxonomy | charter_type_id | charter_type | cargo_status_id | cargo_status | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 23790303 | 30829741 | None | 45820017 | 494 | 494 | None | 2022-11-18 08:39:59+00:00 | 2022-11-18 00:00:00+00:00 | ... | None | None | 0 | Voyage | None | None | exxon 145 26-27 nov usg/ukcm - firm 2nd cargo ... | SUEZMAX MORNING UPDATE FROM SIMPSON SPENCE YOUNG | SSY | False | |
1 | 23799890 | 30842695 | None | 45831137 | 110 | 110 | None | 2022-11-18 12:31:38+00:00 | 2022-11-18 00:00:00+00:00 | ... | None | None | 0 | Voyage | None | None | ioc 130 22-23 dec greater plutonio/paradip - q... | AFTERNOON SUEZMAX FIXTURE REPORT FROM SIMPSON ... | SSY | False | |
2 | 23799892 | 30842695 | None | 45831137 | 100 | 100 | None | 2022-11-18 12:31:38+00:00 | 2022-11-18 00:00:00+00:00 | ... | None | None | 0 | Voyage | None | None | cnr 140 ely dec basrah/west - rumoured | AFTERNOON SUEZMAX FIXTURE REPORT FROM SIMPSON ... | SSY | False | |
3 | 23799896 | 30842695 | None | 45831137 | 108 | 108 | None | 2022-11-18 12:31:38+00:00 | 2022-11-18 00:00:00+00:00 | ... | None | None | 0 | Voyage | None | None | repsol 130 11-12 dec wafr/ukcm - firm | AFTERNOON SUEZMAX FIXTURE REPORT FROM SIMPSON ... | SSY | False | |
4 | 23780101 | 30814158 | None | 45808575 | 69 | 69 | None | 2022-11-18 03:48:04+00:00 | 2022-11-18 03:44:48+00:00 | ... | None | None | 0 | Voyage | None | None | houston ref 70-145 ecmex/usg 27-29/11 | SIMPSON SPENCE YOUNG SINGAPORE SUEZMAX REPORT ... | SSY | False |
5 rows × 81 columns
Usage of optional arguments¶
By default, all fields are returned. In many cases, it is convenient to select specific columns. For example, if we want to compare scraped and mapped fields:
scraped_mapped_columns = [
'scraped_charterer',
'charterer',
'scraped_quantity',
'quantity',
'scraped_load',
'load_name',
]
scraped_mapped_df = pd.DataFrame(scraped_cargoes, columns=scraped_mapped_columns)
scraped_mapped_df.head()
scraped_charterer | charterer | scraped_quantity | quantity | scraped_load | load_name | |
---|---|---|---|---|---|---|
0 | valero | Valero | 37kt | 37000.0 | pembroke | Pembroke Dock |
1 | shell | Shell | 37kt | 37000.0 | brofjorden | Brofjorden |
2 | valero | Valero | 37kt | 37000.0 | pembroke | Pembroke Dock |
3 | shell | Shell | 37kt | 37000.0 | brofjorden | Brofjorden |
4 | valero | Valero | 37kt | 37000.0 | pembroke | Pembroke Dock |
Examples¶
Let's start by fetching all tanker cargoes received in the last 2 weeks:
example_vessel_type = 1 # Tanker
example_date_from = datetime.utcnow() - timedelta(days=14)
example_scraped_cargoes = api.get_cargoes(
vessel_type=example_vessel_type,
received_date_from=example_date_from,
)
Exclude deleted scraped cargoes¶
The is_deleted property of a scraped cargo indicates whether it is still valid. If it is set to True, the corresponding cargo_id has been replaced by a new one. For the sake of completeness, we will exclude deleted scraped cargoes in the following examples:
example_scraped_cargoes = [cargo for cargo in example_scraped_cargoes if not cargo.is_deleted]
next(iter(example_scraped_cargoes), None)
ScrapedCargo(cargo_id=33530079, message_id=47319156, external_message_id=None, parsed_part_id=58350583, line_from=16, line_to=16, in_line_order=None, source='Email', updated_date=datetime.datetime(2023, 9, 12, 11, 55, 35, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 12, 11, 52, 28, tzinfo=datetime.timezone.utc), is_deleted=False, scraped_laycan='21-23', laycan_from=datetime.datetime(2023, 9, 21, 0, 0, tzinfo=datetime.timezone.utc), laycan_to=datetime.datetime(2023, 9, 23, 0, 0, tzinfo=datetime.timezone.utc), scraped_load='nspain', load_geo_id=75, load_name='Spain', load_taxonomy_id=3, load_taxonomy='Country', scraped_load2=None, load_geo_id2=None, load_name2=None, load_taxonomy_id2=None, load_taxonomy2=None, scraped_discharge='ta', scraped_discharge_options='wccam-ukc-med', discharge_geo_id=25019, discharge_name='Atlantic America', discharge_taxonomy_id=6, discharge_taxonomy='Level2', scraped_discharge2=None, discharge_geo_id2=None, discharge_name2=None, discharge_taxonomy_id2=None, discharge_taxonomy2=None, scraped_charterer='repsol', charterer_id=1380, charterer='Repsol', scraped_cargo_type='ums', cargo_type_id=135, cargo_type='Unleaded Motor Spirit', cargo_type_group_id=120000, cargo_type_group='Clean', scraped_quantity='37kt', quantity=37000.0, quantity_buffer=0.0, quantity_from=37000.0, quantity_to=37000.0, size_from=None, size_to=None, scraped_delivery_date=None, delivery_date_from=None, delivery_date_to=None, scraped_delivery_from=None, delivery_from_geo_id=None, delivery_from_name=None, delivery_from_taxonomy_id=None, delivery_from_taxonomy=None, scraped_delivery_to=None, delivery_to_geo_id=None, delivery_to_name=None, delivery_to_taxonomy_id=None, delivery_to_taxonomy=None, scraped_redelivery_from=None, redelivery_from_geo_id=None, redelivery_from_name=None, redelivery_from_taxonomy_id=None, redelivery_from_taxonomy=None, scraped_redelivery_to=None, redelivery_to_geo_id=None, redelivery_to_name=None, 
redelivery_to_taxonomy_id=None, redelivery_to_taxonomy=None, charter_type_id=0, charter_type='Voyage', cargo_status_id=None, cargo_status=None, content='repsol 37kt ums nspain ta-wccam-ukc-med off 21-23', subject='SSY CPP MR LIST + UPDATE - TUESDAY 12TH SEPTEMBER', sender='SSY', is_private=False)
Now we are ready to insert our data into a DataFrame and keep only specific fields:
example_columns = [
'charterer',
'laycan_from',
'load_name',
'quantity',
'is_deleted',
]
data = pd.DataFrame(example_scraped_cargoes, columns=example_columns)
data.head()
charterer | laycan_from | load_name | quantity | is_deleted | |
---|---|---|---|---|---|
0 | Repsol | 2023-09-21 00:00:00+00:00 | Spain | 37000.0 | False |
1 | Irving | 2023-09-23 00:00:00+00:00 | Continent | 37000.0 | False |
2 | Repsol | 2023-09-21 00:00:00+00:00 | Spain | 37000.0 | False |
3 | Irving | 2023-09-23 00:00:00+00:00 | Continent | 37000.0 | False |
4 | Repsol | 2023-09-21 00:00:00+00:00 | Spain | 37000.0 | False |
Top 10 Charterers¶
In this example, we will find the top 10 charterers, based on the number of distinct available cargoes:
top_chrtr_ser = data[['charterer', 'laycan_from']].drop_duplicates().charterer.value_counts().head(10)
top_chrtr_df = top_chrtr_ser.rename('CargoCount').rename_axis('charterer').reset_index()
top_chrtr_df
charterer | CargoCount | |
---|---|---|
0 | GCC BUNKERS | 10 |
1 | BP | 9 |
2 | Trafigura | 8 |
3 | ENI | 8 |
4 | Petrobras | 8 |
5 | Shell | 8 |
6 | Vitol | 8 |
7 | Unipec | 6 |
8 | Bharat Petroleum | 6 |
9 | Repsol | 6 |
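The distinct-count step above can be illustrated with a tiny synthetic frame (the values below are invented for illustration); duplicate (charterer, laycan_from) pairs are counted only once:

```python
import pandas as pd

toy = pd.DataFrame({
    'charterer': ['Valero', 'Valero', 'Shell', 'Shell'],
    'laycan_from': ['2023-09-21', '2023-09-21', '2023-09-23', '2023-09-24'],
})

# Drop duplicate (charterer, laycan_from) pairs before counting,
# so repeated reports of the same cargo do not inflate the count
counts = toy.drop_duplicates().charterer.value_counts()
# Shell has two distinct laycans, Valero only one
```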
And display the results in a bar plot:
top_chrtr_fig = go.Figure()
bar = go.Bar(
x=top_chrtr_df.charterer.tolist(),
y=top_chrtr_df.CargoCount.tolist(),
)
top_chrtr_fig.add_trace(bar)
top_chrtr_fig.update_xaxes(title_text="Charterer")
top_chrtr_fig.update_yaxes(title_text="Number of available Cargoes")
top_chrtr_fig.show()
Total quantity to load in specific areas per day over the next week¶
this_week_days = pd.date_range(start=datetime.utcnow().date(), freq='D', periods=7, tz='UTC')
areas = data[data.load_name.notna()].load_name.value_counts().head().index.tolist()
areas
['Spain', 'Arabian Gulf', 'Continent', 'US Gulf', 'Ras Tanura']
Create the pivot table:
areas_mask = data.load_name.isin(areas) & data.laycan_from.isin(this_week_days)
df_areas = data[areas_mask]
df_pivot = pd.pivot_table(
df_areas,
columns='load_name',
index='laycan_from',
values='quantity',
aggfunc=pd.Series.sum,
fill_value=0,
).reindex(index=this_week_days, fill_value=0).reset_index().rename(columns={'index': 'laycan_from'})
df_pivot
load_name | laycan_from | Arabian Gulf | Continent | Ras Tanura | Spain | US Gulf |
---|---|---|---|---|---|---|
0 | 2023-09-26 00:00:00+00:00 | 0 | 0 | 0 | 120000 | 0 |
1 | 2023-09-27 00:00:00+00:00 | 0 | 0 | 0 | 90000 | 0 |
2 | 2023-09-28 00:00:00+00:00 | 0 | 0 | 0 | 180000 | 0 |
3 | 2023-09-29 00:00:00+00:00 | 0 | 0 | 0 | 0 | 0 |
4 | 2023-09-30 00:00:00+00:00 | 75000 | 30000 | 0 | 60000 | 0 |
5 | 2023-10-01 00:00:00+00:00 | 1070000 | 240000 | 260000 | 0 | 145000 |
6 | 2023-10-02 00:00:00+00:00 | 790000 | 37000 | 260000 | 0 | 0 |
And display the results as time series:
def area_button(area):
args = [
{'visible': [i == areas.index(area) for i in range(len(areas))]},
{
'title': f'Total Quantity to load in {area} per day',
'showlegend': True
},
]
return dict(
label=area,
method='update',
args=args,
)
title = 'Total Quantity to load per day'
today = datetime.combine(datetime.utcnow().date(), datetime.min.time())
areas_fig = go.Figure()
area_buttons = []
for area in areas:
if area not in df_pivot.columns:
continue
area_scatter_plot = go.Scatter(
x=df_pivot.laycan_from,
y=df_pivot[area],
name=area,
mode='lines',
)
areas_fig.add_trace(area_scatter_plot)
area_buttons.append(area_button(area))
buttons = list([
dict(
label='All',
method='update',
args=[
{'visible': [True for _ in range(len(areas))]},
{
'title': title,
'showlegend': True
}
],
),
*area_buttons,
])
areas_fig.update_layout(
title=title,
updatemenus=[go.layout.Updatemenu(
active=0,
buttons=buttons,
)],
xaxis_range=[today - timedelta(hours=4), today + timedelta(hours=24*6 + 4)],
)
areas_fig.show()
Export data to CSV¶
output_path = '' # Change output_path with your path
filename = 'last_two_weeks_cargoes.csv'
if not data.empty:
data.to_csv(output_path+filename, index=False)
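If you prefer, pathlib avoids manual string concatenation when building the output path. A sketch with a hypothetical directory and a small synthetic frame standing in for your data:

```python
import tempfile
from pathlib import Path
import pandas as pd

output_dir = Path(tempfile.gettempdir())  # replace with your directory
out_file = output_dir / 'last_two_weeks_cargoes.csv'

# Synthetic stand-in for the 'data' DataFrame built earlier
sample = pd.DataFrame({'charterer': ['Valero'], 'quantity': [37000.0]})
if not sample.empty:
    sample.to_csv(out_file, index=False)
```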