Get your personal Signal Ocean API subscription key (acquired here) and replace it below:
signal_ocean_api_key = '' # Replace with your subscription key
Scraped Positions API¶
The Scraped Positions API collects and returns scraped positions for given filters or position IDs. This is done by using the ScrapedPositionsAPI class and calling the appropriate methods.
1. Request by filters¶
Positions can be retrieved for specific filters by calling the get_positions method with the following arguments:
Required¶
- vessel_type: The vessel type.

Additionally, at least one of the following is required:
- imos: List of IMOs.
- message_ids: List of MessageIDs.
- external_message_ids: List of ExternalMessageIDs.
- received_date_from: Earliest date the position was received.
- received_date_to: Latest date the position was received.
- updated_date_from: Earliest date the position was updated.
- updated_date_to: Latest date the position was updated.

Mixing received and updated dates is not allowed. It is highly recommended to use UTC dates, since this is the format used internally.
2. Request by position IDs¶
Positions can be retrieved for specific position IDs by calling the get_positions_by_position_ids method with the following argument:
Required¶
- position_ids: A list of position IDs to retrieve.
Additional optional arguments¶
Both methods also accept the following optional arguments:
- include_details: If True, the following columns are included in the response (otherwise they are None): parsed_part_id, line_from, line_to, source
- include_scraped_fields: If True, the following columns are included in the response (otherwise they are None): scraped_vessel_name, scraped_deadweight, scraped_year_built, scraped_open_date, scraped_open_port, scraped_commercial_operator, scraped_cargo_type, scraped_last_cargo_types
- include_vessel_details: If True, the following columns are included in the response (otherwise they are None): vessel_name, deadweight, year_built, liquid_capacity, vessel_type_id, vessel_type, vessel_class
- include_labels: If True, the following columns are included in the response (otherwise they are None): open_name, open_taxonomy, commercial_operator, cargo_type, cargo_type_group, last_cargo_types
- include_content: If True, the content column is included in the response (otherwise it is None)
- include_sender: If True, the sender column is included in the response (otherwise it is None)
- include_debug_info: If True, the is_private column is included in the response (otherwise it is None)

The default value is True for all of the optional arguments described above. An example that uses these flags is shown at the end of the Usage of optional arguments section below.
Installation¶
To install the Signal Ocean SDK, simply run the following command:
%%capture
%pip install signal-ocean
Quickstart¶
Import signal-ocean and other modules required for this demo:
from signal_ocean import Connection
from signal_ocean.scraped_positions import ScrapedPositionsAPI
from datetime import datetime, timedelta
import pandas as pd
import plotly.graph_objects as go
Create a new instance of the ScrapedPositionsAPI class:
connection = Connection(signal_ocean_api_key)
api = ScrapedPositionsAPI(connection)
Now you are ready to retrieve your data.
Request by date¶
To get all tanker positions received in the last 4 days, declare appropriate vessel_type and received_date_from variables:
vessel_type = 1 # Tanker
received_date_from = datetime.utcnow() - timedelta(days=4)
Then call the get_positions method, as below:
scraped_positions = api.get_positions(
    vessel_type=vessel_type,
    received_date_from=received_date_from,
)
next(iter(scraped_positions), None)
ScrapedPosition(position_id=233128785, message_id=47953999, external_message_id=None, parsed_part_id=58810511, line_from=95, line_to=95, source='Email', updated_date=datetime.datetime(2023, 9, 22, 12, 25, 49, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 22, 12, 23, 30, tzinfo=datetime.timezone.utc), is_deleted=False, scraped_vessel_name='eco yosemite park', scraped_deadweight='50', scraped_year_built=None, imo=9877573, vessel_name='Eco Yosemite Park', deadweight=49999, year_built=2020, liquid_capacity=51942, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=88, vessel_class='MR2', scraped_open_date='05/10', open_date_from=datetime.datetime(2023, 10, 5, 0, 0, tzinfo=datetime.timezone.utc), open_date_to=datetime.datetime(2023, 10, 5, 0, 0, tzinfo=datetime.timezone.utc), scraped_open_port='aratu', open_geo_id=3228, open_name='Aratu', open_taxonomy_id=2, open_taxonomy='Port', scraped_commercial_operator='clearlake', commercial_operator_id=1901, commercial_operator='Clearlake Shipping', scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_last_cargo_types=None, last_cargo_types_ids=None, last_cargo_types=None, has_ballast=False, has_dry_dock=False, has_if=False, has_on_hold=False, has_on_subs=False, has_prompt=False, has_uncertain=False, is_position_list=False, content='18/10 eco yosemite park 50 53 183 20 aratu 05/10 clearlake', subject='SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER', sender='SSY', is_private=False)
For better visualization, it is convenient to insert the data into a DataFrame:
df = pd.DataFrame(scraped_positions)
df.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 233128785 | 47953999 | None | 58810511 | 95 | 95 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | False | ... | False | False | False | False | False | False | 18/10 eco yosemite park 50 53 183 20 aratu 05/... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
1 | 233128786 | 47953999 | None | 58810511 | 88 | 88 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | False | ... | False | False | False | False | False | False | 10/10 ncc reem 45 53 183 12 vila do conde 28/0... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
2 | 233128787 | 47953999 | None | 58810511 | 83 | 83 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | False | ... | False | False | False | False | False | False | 07/10 dalma 50 53 183 09 tema 25/09 tmc ulsd/ ... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
3 | 233128788 | 47953999 | None | 58810511 | 75 | 75 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | False | ... | False | False | False | False | False | False | 05/10 akrisios 50 54 183 23 gib 01/10 heidmar ... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False | |
4 | 233128789 | 47953999 | None | 58810511 | 81 | 81 | 2023-09-22 12:25:49+00:00 | 2023-09-22 12:23:30+00:00 | False | ... | False | False | False | False | False | False | 07/10 ps queen 51 54 183 06 lagos 25/09 prive ... | SSY CPP MR LIST+ UPDATE - FRIDAY 22TH SEPTEMBER | SSY | False |
5 rows × 53 columns
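You can filter on updated dates in the same way (remember that mixing received and updated date filters is not allowed). Here is a minimal sketch, reusing the vessel_type defined above:
# Sketch: tanker positions updated during the last 2 days
updated_date_from = datetime.utcnow() - timedelta(days=2)
positions_by_updated_date = api.get_positions(
    vessel_type=vessel_type,
    updated_date_from=updated_date_from,
)
df_by_updated_date = pd.DataFrame(positions_by_updated_date)
df_by_updated_date.head()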
Request by IMOs¶
To get positions for specific vessel(s) by their IMO number(s), simply call the get_positions method with a list of the desired IMO(s). Date arguments can also be added to the request:
imos = [9321720, 9385192, 9325049, 9406013, 9645437] # Or add a list of your desired IMOs
scraped_positions_by_imos = api.get_positions(
    vessel_type=vessel_type,
    received_date_from=received_date_from,
    imos=imos,
)
df_by_imos = pd.DataFrame(scraped_positions_by_imos)
df_by_imos.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 233161704 | 47960547 | None | 58816771 | 81 | 81 | 2023-09-22 15:59:45+00:00 | 2023-09-22 15:56:37+00:00 | False | ... | False | False | False | False | False | False | 14-oct cap victor 158 07 fos 26-sep euronav et... | MJLF USG SUEZMAX POSITIONS | MJLF | False | |
1 | 233161740 | 47960547 | None | 58816771 | 55 | 55 | 2023-09-22 15:59:45+00:00 | 2023-09-22 15:56:37+00:00 | False | ... | False | False | True | False | False | False | 10-oct subs foc aegean vision 158 17 fos 22-se... | MJLF USG SUEZMAX POSITIONS | MJLF | False | |
2 | 233407796 | 48060201 | None | 58889512 | 61 | 61 | 2023-09-25 14:34:29+00:00 | 2023-09-25 14:28:55+00:00 | False | ... | False | False | False | False | False | False | 15-oct cap victor 7 158 fos 28-sep euronav arr... | BRAZIL SUEZMAX LIST FROM SIMPSON SPENCE YOUNG | SSY | False | |
3 | 233407875 | 48060201 | None | 58889512 | 104 | 104 | 2023-09-25 14:34:29+00:00 | 2023-09-25 14:28:55+00:00 | False | ... | False | False | False | False | False | False | 22-oct aegean vision 17 158 fos 5-oct arcadia ... | BRAZIL SUEZMAX LIST FROM SIMPSON SPENCE YOUNG | SSY | False | |
4 | 233489943 | 48066651 | None | 58895655 | 70 | 70 | 2023-09-25 17:07:32+00:00 | 2023-09-25 17:04:59+00:00 | False | ... | False | False | False | False | False | False | 16-oct cap victor 158 07 fos 28-sep euronav ar... | MJLF USG SUEZMAX POSITIONS | MJLF | False |
5 rows × 53 columns
Request by Message or ExternalMessage IDs¶
To retrieve positions for particular message ID(s), include the extra message_ids parameter when calling the get_positions method. This parameter should contain a list of message IDs. For instance:
message_ids = [47238320, 47244008, 47244573]
scraped_positions_by_message_ids = api.get_positions(
    vessel_type=vessel_type,
    message_ids=message_ids,
)
next(iter(scraped_positions_by_message_ids), None)
ScrapedPosition(position_id=231186063, message_id=47238320, external_message_id=None, parsed_part_id=58295114, line_from=75, line_to=75, source='Email', updated_date=datetime.datetime(2023, 9, 11, 13, 40, 25, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 11, 13, 36, 39, tzinfo=datetime.timezone.utc), is_deleted=False, scraped_vessel_name='t.kurucesme', scraped_deadweight='105', scraped_year_built='15', imo=9692478, vessel_name='T. Kurucesme', deadweight=105171, year_built=2015, liquid_capacity=116922, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=86, vessel_class='Aframax', scraped_open_date='28/09', open_date_from=datetime.datetime(2023, 9, 28, 0, 0, tzinfo=datetime.timezone.utc), open_date_to=datetime.datetime(2023, 9, 28, 0, 0, tzinfo=datetime.timezone.utc), scraped_open_port='milazzo', open_geo_id=3557, open_name='Milazzo', open_taxonomy_id=2, open_taxonomy='Port', scraped_commercial_operator='ditas', commercial_operator_id=412, commercial_operator='Ditas Deniz', scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_last_cargo_types=None, last_cargo_types_ids=None, last_cargo_types=None, has_ballast=False, has_dry_dock=False, has_if=True, has_on_hold=False, has_on_subs=False, has_prompt=False, has_uncertain=False, is_position_list=False, content='t.kurucesme ditas 105 15 milazzo 28/09 2 prjctng x usg', subject='aframax med-blsea and ukc-balt position list(s) ...', sender='Banchero & Costa', is_private=False)
You can achieve a similar result for external message IDs by providing the external_message_ids argument.
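For instance, a minimal sketch (the external message IDs below are placeholders; replace them with IDs from your own systems):
external_message_ids = ['example-id-1', 'example-id-2']  # Placeholder values, not real external message IDs
scraped_positions_by_external_ids = api.get_positions(
    vessel_type=vessel_type,
    external_message_ids=external_message_ids,
)
next(iter(scraped_positions_by_external_ids), None)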
Request by Position IDs¶
In the same manner, to get data for specific position ID(s), call the get_positions_by_position_ids method with a list of the desired position ID(s). Date arguments are not available in this method:
position_ids = [182459667, 182459702, 182624943, 182624998, 182508037] # Or add a list of your desired position IDs
scraped_positions_by_ids = api.get_positions_by_position_ids(
    position_ids=position_ids,
)
df_by_ids = pd.DataFrame(scraped_positions_by_ids)
df_by_ids.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 182459667 | 30791168 | None | 45785098 | 21 | 21 | 2022-11-17 11:57:58+00:00 | 2022-11-17 11:54:41+00:00 | False | ... | False | False | False | False | False | False | 20/11 SEASHARK 32 178 37,947 11.00 GER 04 ... | SIMPSON|SPENCE|YOUNG LTD – HANDY LIST (BASIS M... | SSY | False | |
1 | 182459702 | 30791168 | None | 45785098 | 59 | 59 | 2022-11-17 11:57:58+00:00 | 2022-11-17 11:54:41+00:00 | False | ... | False | False | False | False | False | False | 27/11 MOUNT OLYMPUS 40 182 42,241 11.97 MAR... | SIMPSON|SPENCE|YOUNG LTD – HANDY LIST (BASIS M... | SSY | False | |
2 | 182508037 | 30800115 | None | 45793950 | 27 | 27 | 2022-11-17 15:27:04+00:00 | 2022-11-17 15:23:16+00:00 | False | ... | False | False | False | False | False | False | ardmore exporter 49 52 14 yabucoa 19/11 ardmor... | MJLF MR LIST BSS HOUSTON NOV 17 | MJLF | False | |
3 | 182624943 | 30849799 | None | 45839901 | 88 | 88 | 2022-11-18 16:58:54+00:00 | 2022-11-18 16:55:19+00:00 | False | ... | False | False | False | False | False | False | 14-dec advantage spice 156 10 48.1 rotterdam 2... | MJLF USG SUEZMAX POSITIONS | MJLF | False | |
4 | 182624998 | 30849799 | None | 45839901 | 12 | 12 | 2022-11-18 16:58:54+00:00 | 2022-11-18 16:55:19+00:00 | False | ... | True | False | False | False | False | False | 26-nov proj eagle san pedro 157 12 49.0 off ga... | MJLF USG SUEZMAX POSITIONS | MJLF | False |
5 rows × 53 columns
Usage of optional arguments¶
By default, all of the optional arguments are True, so every field is returned. In many cases it is convenient to keep only specific columns. For example, if we want to compare the scraped and mapped fields:
scraped_mapped_columns = [
    'scraped_vessel_name',
    'vessel_name',
    'scraped_deadweight',
    'deadweight',
    'scraped_commercial_operator',
    'commercial_operator',
    'scraped_open_port',
    'open_name',
]
scraped_mapped_df = pd.DataFrame(scraped_positions, columns=scraped_mapped_columns)
scraped_mapped_df.head()
scraped_vessel_name | vessel_name | scraped_deadweight | deadweight | scraped_commercial_operator | commercial_operator | scraped_open_port | open_name | |
---|---|---|---|---|---|---|---|---|
0 | eco yosemite park | Eco Yosemite Park | 50 | 49999.0 | clearlake | Clearlake Shipping | aratu | Aratu |
1 | ncc reem | NCC Reem | 45 | 45498.0 | bahri | Bahri | vila do conde | Vila Do Conde |
2 | dalma | Dalma | 50 | 50162.0 | tmc | TMC Shipping | tema | Tema |
3 | akrisios | Akrisios | 50 | 50113.0 | heidmar | Heidmar | gib | Gibraltar (WP) |
4 | ps queen | Ps Queen | 51 | 51218.0 | prive shipping | Prive Shipping | lagos | Lagos |
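The include_* flags described in the optional arguments section can also be used to trim the response itself, so that the excluded columns come back as None. A minimal sketch, reusing the vessel_type and received_date_from defined above:
# Sketch: skip the raw content, sender and debug info columns
slim_positions = api.get_positions(
    vessel_type=vessel_type,
    received_date_from=received_date_from,
    include_content=False,
    include_sender=False,
    include_debug_info=False,
)
slim_df = pd.DataFrame(slim_positions)
slim_df.head()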
Examples¶
Let's start by fetching all tanker positions received during the last week:
example_vessel_type = 1 # Tanker
example_date_from = datetime.utcnow() - timedelta(days=7)
example_scraped_positions = api.get_positions(
    vessel_type=example_vessel_type,
    received_date_from=example_date_from,
)
Exclude deleted scraped positions¶
The is_deleted property of a scraped position indicates whether the position is still valid. If it is set to True, the corresponding position_id has been replaced by a new one.
For the sake of completeness, we will exclude deleted scraped positions in the following examples.
example_scraped_positions = [position for position in example_scraped_positions if not position.is_deleted]
next(iter(example_scraped_positions), None)
ScrapedPosition(position_id=232554159, message_id=47725226, external_message_id=None, parsed_part_id=58644923, line_from=104, line_to=104, source='Email', updated_date=datetime.datetime(2023, 9, 19, 12, 53, 33, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 19, 12, 46, 39, tzinfo=datetime.timezone.utc), is_deleted=False, scraped_vessel_name='baker spirit', scraped_deadweight='157', scraped_year_built='09', imo=9408073, vessel_name='Baker Spirit', deadweight=156929, year_built=2009, liquid_capacity=171257, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=85, vessel_class='Suezmax', scraped_open_date='25/09', open_date_from=datetime.datetime(2023, 9, 25, 0, 0, tzinfo=datetime.timezone.utc), open_date_to=datetime.datetime(2023, 9, 25, 0, 0, tzinfo=datetime.timezone.utc), scraped_open_port='cape horn', open_geo_id=7337, open_name='Cape Horn (WP)', open_taxonomy_id=1, open_taxonomy='GeoAsset', scraped_commercial_operator='teekay', commercial_operator_id=1663, commercial_operator='Teekay Corp', scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_last_cargo_types=None, last_cargo_types_ids=None, last_cargo_types=None, has_ballast=True, has_dry_dock=False, has_if=False, has_on_hold=False, has_on_subs=False, has_prompt=False, has_uncertain=False, is_position_list=False, content='baker spirit teekay 157 09 cape horn 25/09 12/10 in ballast', subject='suezmax lists bss meg, emed, nweurope, usg and waf ....', sender='Banchero & Costa', is_private=False)
Now we are ready to insert our data into a DataFrame and keep only specific fields:
example_columns = [
    'imo',
    'commercial_operator',
    'open_date_to',
    'open_name',
    'is_deleted',
]
data = pd.DataFrame(example_scraped_positions, columns=example_columns).astype({'imo': 'Int64'})
data.head()
imo | commercial_operator | open_date_to | open_name | is_deleted | |
---|---|---|---|---|---|
0 | 9408073 | Teekay Corp | 2023-09-25 00:00:00+00:00 | Cape Horn (WP) | False |
1 | 9772125 | Sun Enterprises | 2023-09-19 00:00:00+00:00 | Trieste | False |
2 | 9790983 | Thenamaris | 2023-09-25 00:00:00+00:00 | Greece | False |
3 | 9579511 | Almi Tankers | 2023-09-22 00:00:00+00:00 | New Orleans | False |
4 | 9831854 | Frontline | 2023-09-28 00:00:00+00:00 | Sarroch | False |
Top 10 Commercial Operators¶
In this example, we will find the top 10 commercial operators, based on the number of their vessels opening:
# Count distinct vessels per commercial operator and keep the ten largest
top_co_ser = data[['commercial_operator', 'imo']].drop_duplicates().commercial_operator.value_counts().head(10)
top_co_df = top_co_ser.to_frame(name='VesselCount').reset_index()
top_co_df
commercial_operator | VesselCount | |
---|---|---|
0 | Trafigura | 59 |
1 | Frontline | 31 |
2 | Tankers International | 25 |
3 | Scorpio Commercial Management | 24 |
4 | Maran Tankers Management | 24 |
5 | Minerva Marine | 23 |
6 | Thenamaris | 23 |
7 | Cardiff Marine | 22 |
8 | Shell | 22 |
9 | Penfield Marine | 20 |
And display the results in a bar plot:
top_co_fig = go.Figure()
bar = go.Bar(
    x=top_co_df.commercial_operator.tolist(),
    y=top_co_df.VesselCount.tolist(),
)
top_co_fig.add_trace(bar)
top_co_fig.update_xaxes(title_text="Commercial Operator")
top_co_fig.update_yaxes(title_text="Number of Vessels opening")
top_co_fig.show()
Vessels opening at specific ports¶
In this example, we will create a visualization of the number of distinct vessels opening at specific ports per day over the next week:
this_week_days = pd.date_range(start=datetime.utcnow().date(), freq='D', periods=7, tz='UTC')
ports = data[data.open_name.notna()].open_name.value_counts().head().index.tolist()
ports
['Gibraltar', 'US Gulf', 'Singapore', 'Rotterdam', 'New York']
Create the pivot table:
ports_mask = data.open_name.isin(ports) & data.open_date_to.isin(this_week_days)
df_ports = data[ports_mask]
df_pivot = pd.pivot_table(
    df_ports,
    columns='open_name',
    index='open_date_to',
    values='imo',
    aggfunc=pd.Series.nunique,
    fill_value=0,
).reindex(index=this_week_days, fill_value=0).reset_index().rename(columns={'index': 'open_date_to'})
df_pivot
open_name | open_date_to | Gibraltar | New York | Rotterdam | Singapore | US Gulf |
---|---|---|---|---|---|---|
0 | 2023-09-26 00:00:00+00:00 | 1 | 4 | 3 | 6 | 2 |
1 | 2023-09-27 00:00:00+00:00 | 1 | 1 | 2 | 5 | 2 |
2 | 2023-09-28 00:00:00+00:00 | 1 | 4 | 1 | 4 | 2 |
3 | 2023-09-29 00:00:00+00:00 | 2 | 1 | 1 | 7 | 1 |
4 | 2023-09-30 00:00:00+00:00 | 3 | 1 | 8 | 10 | 2 |
5 | 2023-10-01 00:00:00+00:00 | 5 | 1 | 4 | 7 | 1 |
6 | 2023-10-02 00:00:00+00:00 | 2 | 1 | 0 | 4 | 1 |
And display the results as a time series:
def port_button(port):
    args = [
        {'visible': [i == ports.index(port) for i in range(len(ports))]},
        {
            'title': f'Vessels opening at {port} per day',
            'showlegend': True
        },
    ]
    return dict(
        label=port,
        method='update',
        args=args,
    )
title = 'Vessels opening per day'
today = datetime.combine(datetime.utcnow().date(), datetime.min.time())
ports_fig = go.Figure()
port_buttons = []
for port in ports:
    if port not in df_pivot.columns:
        continue
    port_scatter_plot = go.Scatter(
        x=df_pivot.open_date_to,
        y=df_pivot[port],
        name=port,
        mode='lines',
    )
    ports_fig.add_trace(port_scatter_plot)
    port_buttons.append(port_button(port))
buttons = list([
    dict(
        label='All',
        method='update',
        args=[
            {'visible': [True for _ in range(len(ports))]},
            {
                'title': title,
                'showlegend': True
            }
        ],
    ),
    *port_buttons,
])
ports_fig.update_layout(
    title=title,
    updatemenus=[go.layout.Updatemenu(
        active=0,
        buttons=buttons,
    )],
    xaxis_range=[today - timedelta(hours=4), today + timedelta(hours=24*6 + 4)],
)
ports_fig.show()
Export data to CSV¶
output_path = ''  # Replace with your output path (end it with a path separator if it is a directory)
filename = 'last_week_positions.csv'
if not data.empty:
    data.to_csv(output_path + filename, index=False)