Get your personal Signal Ocean API subscription key and replace it below:
signal_ocean_api_key = '' # Replace with your subscription key
Scraped Positions API¶
The goal of the Scraped Positions API is to collect and return scraped positions matching the given filters. This is done by using the ScrapedPositionsAPI class and calling the appropriate methods.
1. Request by filters¶
Positions can be retrieved for specific filters by calling the get_positions method with the following arguments:
Required¶
vessel_type: The vessel type

Additionally, at least one of the following is required:
position_ids: List of position IDs to retrieve
imos: List of IMOs
message_ids: List of message IDs
external_message_ids: List of external message IDs
received_date_from: Earliest date the position was received
received_date_to: Latest date the position was received
updated_date_from: Earliest date the position was updated
updated_date_to: Latest date the position was updated

Mixing received and updated dates is not allowed.
It is highly recommended to use UTC dates, since this is the format used internally.
2. Request by page token¶
Positions can also be retrieved by calling the get_positions_incremental method. This method also returns a next_request_token, which the user should use in their next incremental request to retrieve only the latest updated positions.
Required¶
vessel_type: The vessel type
Optional¶
page_token: The token that should be used as a parameter to retrieve the relevant page.
3. Retrieve page token¶
The token referenced above can be retrieved for a specific date by calling the get_positions_incremental_token method with the following argument:
Required¶
updated_date_from: Earliest date the position was updated.
Additional optional arguments¶
The get_positions and get_positions_incremental methods also accept the following optional arguments:
include_details: If True, the response includes parsed_part_id, line_from, line_to, source (otherwise these columns are None)
include_scraped_fields: If True, the response includes scraped_vessel_name, scraped_deadweight, scraped_year_built, scraped_open_date, scraped_open_port, scraped_commercial_operator, scraped_cargo_type, scraped_last_cargo_types (otherwise these columns are None)
include_vessel_details: If True, the response includes vessel_name, deadweight, year_built, liquid_capacity, vessel_type_id, vessel_type, vessel_class (otherwise these columns are None)
include_labels: If True, the response includes open_name, open_taxonomy, commercial_operator, cargo_type, cargo_type_group, last_cargo_types (otherwise these columns are None)
include_content: If True, the response includes content (otherwise it is None)
include_sender: If True, the response includes sender (otherwise it is None)
include_debug_info: If True, the response includes is_private (otherwise it is None)

The default value is True for all the optional arguments described above.
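As a sketch, the keyword arguments below switch off several of these column groups to slim down the response; the flag choices are illustrative only, and the actual call is commented out because it requires a live api instance.

```python
# Illustrative keyword arguments for a slimmer response.
slim_kwargs = dict(
    vessel_type=1,                 # Tanker
    include_details=False,         # parsed_part_id, line_from, line_to, source -> None
    include_scraped_fields=False,  # scraped_* columns -> None
    include_content=False,         # content -> None
    include_sender=False,          # sender -> None
)
# With a live ScrapedPositionsAPI instance:
# positions = api.get_positions(received_date_from=received_date_from, **slim_kwargs)
```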
Installation¶
To install the Signal Ocean SDK, simply run the following command:
%%capture
%pip install signal-ocean
Quickstart¶
Import signal-ocean and the other modules required for this demo:
from signal_ocean import Connection
from signal_ocean.scraped_positions import ScrapedPositionsAPI
from datetime import datetime, timedelta
import pandas as pd
import plotly.graph_objects as go
Create a new instance of the ScrapedPositionsAPI
class
connection = Connection(signal_ocean_api_key)
api = ScrapedPositionsAPI(connection)
Now you are ready to retrieve your data
Request by date¶
To get all tanker positions received in the last 4 days, declare the appropriate vessel_type and received_date_from variables:
vessel_type = 1 # Tanker
received_date_from = datetime.utcnow() - timedelta(days=4)
And then call the get_positions method, as below:
scraped_positions = api.get_positions(
vessel_type=vessel_type,
received_date_from=received_date_from,
)
next(iter(scraped_positions), None)
ScrapedPosition(position_id=280995305, message_id=69566610, external_message_id=None, parsed_part_id=74684036, line_from=5, line_to=5, source='Email', updated_date=datetime.datetime(2024, 7, 29, 8, 58, 8, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2024, 7, 29, 8, 56, 1, tzinfo=datetime.timezone.utc), is_deleted=False, low_confidence=False, scraped_vessel_name='BLUE SKY I', scraped_deadweight='38402mts', scraped_year_built=None, imo=9335903, vessel_name='Blue Sky I', deadweight=38402, year_built=2006, liquid_capacity=43744, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=89, vessel_class='MR1', scraped_open_date='05/Aug', open_date_from=datetime.datetime(2024, 8, 5, 0, 0, tzinfo=datetime.timezone.utc), open_date_to=datetime.datetime(2024, 8, 5, 0, 0, tzinfo=datetime.timezone.utc), scraped_open_port='Chittagong', open_geo_id=3215, open_name='Chittagong', open_taxonomy_id=2, open_taxonomy='Port', scraped_commercial_operator=None, commercial_operator_id=None, commercial_operator=None, scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_last_cargo_types=None, last_cargo_types_ids=None, last_cargo_types=None, has_ballast=False, has_dry_dock=False, has_if=False, has_on_hold=False, has_on_subs=False, has_prompt=False, has_uncertain=False, is_position_list=True, content='Vessel: M/T BLUE SKY I dwt 38402mts open Chittagong 05/Aug', subject='Harper Petersen Albis - Special Vessel Position - MT BLUE SKY I', sender='harperpetersen.com', is_private=True)
For better visualization, it is convenient to insert the data into a DataFrame:
df = pd.DataFrame(scraped_positions)
df.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 280995305 | 69566610 | None | 74684036 | 5 | 5 | 2024-07-29 08:58:08+00:00 | 2024-07-29 08:56:01+00:00 | False | ... | False | False | False | False | False | True | Vessel: M/T BLUE SKY I dwt 38402mts open Chit... | Harper Petersen Albis - Special Vessel Positio... | harperpetersen.com | True | |
1 | 280996508 | 69568756 | None | 74685478 | 99 | 99 | 2024-07-29 09:08:29+00:00 | 2024-07-29 09:07:57+00:00 | False | ... | False | False | False | False | False | False | grand winner 5 50 52 21 mackay 09/08 pan ocean... | BRS CPP MR LIST BSS YOSU - 29/07/24 | BRS Group | True | |
2 | 280996509 | 69568756 | None | 74685478 | 21 | 21 | 2024-07-29 09:08:29+00:00 | 2024-07-29 09:07:57+00:00 | False | ... | False | False | True | False | False | False | grand winner 2 49 52 21 mackay 22/07 pan ocean... | BRS CPP MR LIST BSS YOSU - 29/07/24 | BRS Group | True | |
3 | 280996510 | 69568756 | None | 74685478 | 110 | 110 | 2024-07-29 09:08:29+00:00 | 2024-07-29 09:07:57+00:00 | False | ... | False | False | False | False | False | False | grand winner 3 49 52 21 tauranga 08/08 pan oce... | BRS CPP MR LIST BSS YOSU - 29/07/24 | BRS Group | True | |
4 | 280996511 | 69568756 | None | 74685478 | 75 | 75 | 2024-07-29 09:08:29+00:00 | 2024-07-29 09:07:57+00:00 | False | ... | False | False | False | False | False | False | nomiki 44 42 02 pulau laut 05/08 unknown go - ... | BRS CPP MR LIST BSS YOSU - 29/07/24 | BRS Group | True |
5 rows × 54 columns
Request by IMOs¶
To get positions for specific vessel(s) by their IMO number(s), you can simply call the get_positions method with a list of the desired IMO(s). A date argument can also be added, as below:
imos = [9321720,9385192,9325049,9406013,9645437] # Or add a list of your desired IMOs
scraped_positions_by_imos = api.get_positions(
vessel_type=vessel_type,
received_date_from=received_date_from,
imos=imos,
)
df_by_imos = pd.DataFrame(scraped_positions_by_imos)
df_by_imos.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 281075434 | 69600287 | None | 74706024 | 10 | 10 | 2024-07-29 12:55:06+00:00 | 2024-07-29 12:54:33+00:00 | False | ... | False | False | False | False | False | False | 30/07 aegean vision arcadia 158 17 dty 29/07 a... | SIDI KERIR SUEZMAX POSITIONS LIST 29TH JUL 2024 | Bravo Tankers | True | |
1 | 281075440 | 69600287 | None | 74706024 | 42 | 42 | 2024-07-29 12:55:06+00:00 | 2024-07-29 12:54:33+00:00 | False | ... | False | False | False | False | False | False | 09/08 pinnacle spirit teekay 159 08 dty 29/07 ... | SIDI KERIR SUEZMAX POSITIONS LIST 29TH JUL 2024 | Bravo Tankers | True | |
2 | 281139185 | 69611578 | None | 74714401 | 56 | 56 | 2024-07-29 15:01:49+00:00 | 2024-07-29 14:59:22+00:00 | False | ... | False | False | False | False | False | False | 14-aug pinnacle spirit 160 08 wilhelmshaven 29... | GALBRAITHS SUEZMAX LIST BASIS USG | Ifchor Galbraiths | True | |
3 | 281139194 | 69611578 | None | 74714401 | 7 | 7 | 2024-07-29 15:01:49+00:00 | 2024-07-29 14:59:22+00:00 | False | ... | True | False | False | False | False | False | 2-aug cape benat 157 10 houston 2-aug maersk proj | GALBRAITHS SUEZMAX LIST BASIS USG | Ifchor Galbraiths | True | |
4 | 281161463 | 69615532 | None | 74717700 | 69 | 69 | 2024-07-29 16:03:05+00:00 | 2024-07-29 16:00:47+00:00 | False | ... | False | False | False | False | False | False | 16-aug pinnacle spirit 159 08 wilhelmshaven 29... | MJLF USG SUEZMAX POSITIONS | MJLF | False |
5 rows × 54 columns
Request by Message or ExternalMessage IDs¶
To retrieve positions for particular message ID(s), you should include an extra parameter called message_ids
when using the get_positions
method. This parameter should contain a list of message IDs. For instance,
message_ids = [47238320,47244008,47244573,69350262,69348062]
scraped_positions_by_message_ids = api.get_positions(
vessel_type=vessel_type,
message_ids=message_ids,
)
next(iter(scraped_positions_by_message_ids), None)
ScrapedPosition(position_id=231186063, message_id=47238320, external_message_id=None, parsed_part_id=58295114, line_from=75, line_to=75, source='Email', updated_date=datetime.datetime(2023, 9, 11, 13, 40, 25, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2023, 9, 11, 13, 36, 39, tzinfo=datetime.timezone.utc), is_deleted=False, low_confidence=False, scraped_vessel_name='t.kurucesme', scraped_deadweight='105', scraped_year_built='15', imo=9692478, vessel_name='T. Kurucesme', deadweight=105171, year_built=2015, liquid_capacity=116922, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=86, vessel_class='Aframax', scraped_open_date='28/09', open_date_from=datetime.datetime(2023, 9, 28, 0, 0, tzinfo=datetime.timezone.utc), open_date_to=datetime.datetime(2023, 9, 28, 0, 0, tzinfo=datetime.timezone.utc), scraped_open_port='milazzo', open_geo_id=3557, open_name='Milazzo', open_taxonomy_id=2, open_taxonomy='Port', scraped_commercial_operator='ditas', commercial_operator_id=412, commercial_operator='Ditas Deniz', scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_last_cargo_types=None, last_cargo_types_ids=None, last_cargo_types=None, has_ballast=False, has_dry_dock=False, has_if=True, has_on_hold=False, has_on_subs=False, has_prompt=False, has_uncertain=False, is_position_list=False, content='t.kurucesme ditas 105 15 milazzo 28/09 2 prjctng x usg', subject='aframax med-blsea and ukc-balt position list(s) ...', sender='Banchero & Costa', is_private=False)
You can achieve a similar result for external message IDs by providing an argument called external_message_ids
.
Request by Position IDs¶
In the same manner, to get data for specific position ID(s), call the get_positions method with a list of the desired position ID(s). Note that date arguments are not available in this method.
position_ids = [182459667,182459702,182624943,182624998,182508037] # Or add a list of your desired position IDs
scraped_positions_by_ids = api.get_positions(
vessel_type=vessel_type,
position_ids=position_ids,
)
df_by_ids = pd.DataFrame(scraped_positions_by_ids)
df_by_ids.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 182459667 | 30791168 | None | 45785098 | 21 | 21 | 2022-11-17 11:57:58+00:00 | 2022-11-17 11:54:41+00:00 | False | ... | False | False | False | False | False | False | 20/11 SEASHARK 32 178 37,947 11.00 GER 04 ... | SIMPSON|SPENCE|YOUNG LTD – HANDY LIST (BASIS M... | SSY | False | |
1 | 182459702 | 30791168 | None | 45785098 | 59 | 59 | 2022-11-17 11:57:58+00:00 | 2022-11-17 11:54:41+00:00 | False | ... | False | False | False | False | False | False | 27/11 MOUNT OLYMPUS 40 182 42,241 11.97 MAR... | SIMPSON|SPENCE|YOUNG LTD – HANDY LIST (BASIS M... | SSY | False | |
2 | 182508037 | 30800115 | None | 45793950 | 27 | 27 | 2022-11-17 15:27:04+00:00 | 2022-11-17 15:23:16+00:00 | False | ... | False | False | False | False | False | False | ardmore exporter 49 52 14 yabucoa 19/11 ardmor... | MJLF MR LIST BSS HOUSTON NOV 17 | MJLF | False | |
3 | 182624943 | 30849799 | None | 45839901 | 88 | 88 | 2022-11-18 16:58:54+00:00 | 2022-11-18 16:55:19+00:00 | False | ... | False | False | False | False | False | False | 14-dec advantage spice 156 10 48.1 rotterdam 2... | MJLF USG SUEZMAX POSITIONS | MJLF | False | |
4 | 182624998 | 30849799 | None | 45839901 | 12 | 12 | 2022-11-18 16:58:54+00:00 | 2022-11-18 16:55:19+00:00 | False | ... | True | False | False | False | False | False | 26-nov proj eagle san pedro 157 12 49.0 off ga... | MJLF USG SUEZMAX POSITIONS | MJLF | False |
5 rows × 54 columns
Retrieve incremental token¶
The get_positions_incremental_token method retrieves a page_token for a given updated_date_from value. This page_token can then be passed to the get_positions_incremental method in order to retrieve only new or updated positions. For instance, to retrieve the relevant page token for the last 3 days' data, call get_positions_incremental_token as below:
updated_date_from = datetime.utcnow() - timedelta(days=3)
page_token = api.get_positions_incremental_token(
updated_date_from=updated_date_from,
)
page_token
'MjAyNC0wNy0zMFQwODo1MDoxOC4wMDBa'
Request by page token¶
If the page_token parameter is not specified, the get_positions_incremental method will retrieve all positions. You can call the get_positions_incremental method as below:
scraped_positions_incremental = api.get_positions_incremental(
vessel_type=vessel_type,
page_token=page_token,
)
df_incremental = pd.DataFrame(scraped_positions_incremental.data)
df_incremental.head()
position_id | message_id | external_message_id | parsed_part_id | line_from | line_to | source | updated_date | received_date | is_deleted | ... | has_if | has_on_hold | has_on_subs | has_prompt | has_uncertain | is_position_list | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 281297965 | 69693869 | None | 74767967.0 | 43.0 | 43.0 | 2024-07-30 09:15:14+00:00 | 2024-07-30 09:12:43+00:00 | False | ... | True | False | False | False | False | False | 15/08 ncc danah 46 183 52590 12.17 sau 11 gibr... | SSY – MR LIST (BASIS MALTA) – TUESDAY 30TH JUL... | SSY | False | |
1 | 281297976 | 69693869 | None | 74767967.0 | 45.0 | 45.0 | 2024-07-30 09:15:14+00:00 | 2024-07-30 09:12:43+00:00 | False | ... | False | False | False | False | False | False | 20/08 uog constantine g 50 183 53879 13.14 lib... | SSY – MR LIST (BASIS MALTA) – TUESDAY 30TH JUL... | SSY | False | |
2 | 281297977 | 69693869 | None | 74767967.0 | 40.0 | 40.0 | 2024-07-30 09:15:14+00:00 | 2024-07-30 09:12:43+00:00 | False | ... | False | False | False | False | True | False | 14/08 arvin 47 183 53184 12.22 lib 08 gaeta 13... | SSY – MR LIST (BASIS MALTA) – TUESDAY 30TH JUL... | SSY | False | |
3 | 281297978 | 69693869 | None | 74767967.0 | 39.0 | 39.0 | 2024-07-30 09:15:14+00:00 | 2024-07-30 09:12:43+00:00 | False | ... | False | False | False | False | True | False | 14/08 hakata princess 50 183 56949 13.10 pan 1... | SSY – MR LIST (BASIS MALTA) – TUESDAY 30TH JUL... | SSY | False | |
4 | 281297979 | 69693869 | None | 74767967.0 | 32.0 | 32.0 | 2024-07-30 09:15:14+00:00 | 2024-07-30 09:12:43+00:00 | False | ... | False | False | False | False | False | False | 11/08 avax 50 183 53658 13.32 mar 23 gibraltar... | SSY – MR LIST (BASIS MALTA) – TUESDAY 30TH JUL... | SSY | False |
5 rows × 54 columns
The get_positions_incremental method also returns a next_request_token, which should be used as the page_token for the next incremental call in order to retrieve only new or updated positions.
next_request_token = scraped_positions_incremental.next_request_token
next_request_token
'MjAyNC0wOC0wMlQwODo1MDoxOC45MDha'
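The token handoff can be sketched as a polling loop. The snippet below substitutes a small stub for the real ScrapedPositionsAPI client and assumes, purely for illustration, that an empty page means there is nothing new to fetch; with a live connection you would keep the final token and reuse it on the next polling cycle.

```python
class StubIncrementalAPI:
    """Stand-in for ScrapedPositionsAPI, returning canned pages (hypothetical data)."""
    def __init__(self, pages):
        self._pages = pages  # list of (data, next_request_token) tuples
        self._calls = 0

    def get_positions_incremental(self, vessel_type, page_token=None):
        data, token = self._pages[self._calls]
        self._calls += 1
        return type('Result', (), {'data': data, 'next_request_token': token})()

def poll_all(api, vessel_type, first_token):
    """Collect pages, chaining each next_request_token into the next page_token."""
    positions, token = [], first_token
    while True:
        result = api.get_positions_incremental(vessel_type=vessel_type, page_token=token)
        positions.extend(result.data)
        if not result.data:          # assumed stop condition: an empty page
            return positions, token  # keep this token for the next polling cycle
        token = result.next_request_token

stub = StubIncrementalAPI([(['p1', 'p2'], 'tok1'), (['p3'], 'tok2'), ([], 'tok2')])
positions, last_token = poll_all(stub, vessel_type=1, first_token='tok0')
```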
Usage of optional arguments¶
By default, all fields are returned. In many cases, it is convenient to select specific columns. For example, if we want to compare scraped and mapped fields
scraped_mapped_columns = [
'scraped_vessel_name',
'vessel_name',
'scraped_deadweight',
'deadweight',
'scraped_commercial_operator',
'commercial_operator',
'scraped_open_port',
'open_name',
]
scraped_mapped_df = pd.DataFrame(scraped_positions, columns=scraped_mapped_columns)
scraped_mapped_df.head()
scraped_vessel_name | vessel_name | scraped_deadweight | deadweight | scraped_commercial_operator | commercial_operator | scraped_open_port | open_name | |
---|---|---|---|---|---|---|---|---|
0 | BLUE SKY I | Blue Sky I | 38402mts | 38402.0 | None | None | Chittagong | Chittagong |
1 | grand winner | None | 50 | NaN | pan ocean | Pan Ocean | mackay | Mackay |
2 | grand winner | None | 49 | NaN | pan ocean | Pan Ocean | mackay | Mackay |
3 | grand winner | None | 49 | NaN | pan ocean | Pan Ocean | tauranga | Tauranga |
4 | nomiki | Nomiki | 44 | 44485.0 | unknown | None | pulau laut | Tanjung Pemancingan |
Examples¶
Let's start by fetching all tanker positions received during the last week:
example_vessel_type = 1 # Tanker
example_date_from = datetime.utcnow() - timedelta(days=7)
example_scraped_positions = api.get_positions(
vessel_type=example_vessel_type,
received_date_from=example_date_from,
)
Exclude deleted scraped positions¶
The is_deleted property of a scraped position indicates whether it is still valid. If it is set to True, the corresponding position_id has been replaced by a new one.
For the sake of completeness, we will exclude deleted scraped positions in the following examples.
example_scraped_positions = [position for position in example_scraped_positions if not position.is_deleted]
next(iter(example_scraped_positions), None)
ScrapedPosition(position_id=280874923, message_id=69455617, external_message_id=None, parsed_part_id=74607938, line_from=21, line_to=21, source='Email', updated_date=datetime.datetime(2024, 7, 26, 9, 30, 24, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2024, 7, 26, 9, 29, 39, tzinfo=datetime.timezone.utc), is_deleted=False, low_confidence=False, scraped_vessel_name='british sailor', scraped_deadweight='45', scraped_year_built='16', imo=9724673, vessel_name='British Sailor', deadweight=45999, year_built=2016, liquid_capacity=53838, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=88, vessel_class='MR2', scraped_open_date='01/08', open_date_from=datetime.datetime(2024, 8, 1, 0, 0, tzinfo=datetime.timezone.utc), open_date_to=datetime.datetime(2024, 8, 1, 0, 0, tzinfo=datetime.timezone.utc), scraped_open_port='lagos', open_geo_id=3683, open_name='Lagos', open_taxonomy_id=2, open_taxonomy='Port', scraped_commercial_operator=None, commercial_operator_id=None, commercial_operator=None, scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_type_group_id=None, cargo_type_group=None, scraped_last_cargo_types=None, last_cargo_types_ids=None, last_cargo_types=None, has_ballast=False, has_dry_dock=False, has_if=False, has_on_hold=False, has_on_subs=False, has_prompt=False, has_uncertain=False, is_position_list=False, content='british sailor 45 53 16 12.32 183.06 3 bp lagos 01/08 waiting to berth', subject='WAFR HANDY/MR LIST BSS LAGOS', sender='BRS Group', is_private=True)
Now we are ready to insert our data into a DataFrame and keep only specific fields:
example_columns = [
'imo',
'commercial_operator',
'open_date_to',
'open_name',
'is_deleted',
]
data = pd.DataFrame(example_scraped_positions, columns=example_columns).astype({'imo': 'Int64'})
data.head()
imo | commercial_operator | open_date_to | open_name | is_deleted | |
---|---|---|---|---|---|
0 | 9724673 | None | 2024-08-01 00:00:00+00:00 | Lagos | False |
1 | 9682394 | Vitol | 2024-08-01 00:00:00+00:00 | Walvis Bay | False |
2 | 9465966 | Torm | 2024-08-04 00:00:00+00:00 | Lome | False |
3 | 9637076 | Ardmore Shipping | 2024-08-02 00:00:00+00:00 | West Africa | False |
4 | 9455741 | Tankerska Plovidba | 2024-08-01 00:00:00+00:00 | Limboh | False |
Top 10 Commercial Operators¶
In this example, we will find the top 10 Commercial Operators, based on the number of their vessels opening
top_co_ser = data[['commercial_operator', 'imo']].drop_duplicates().commercial_operator.value_counts().head(10)
top_co_df = top_co_ser.to_frame(name='VesselCount').reset_index()  # reset_index already names the column 'commercial_operator'
top_co_df
commercial_operator | VesselCount | |
---|---|---|
0 | Trafigura | 107 |
1 | Hafnia | 93 |
2 | Shell | 78 |
3 | Maersk | 74 |
4 | Torm | 70 |
5 | Vitol | 66 |
6 | Scorpio Commercial Management | 64 |
7 | Norden | 58 |
8 | Clearlake Shipping | 56 |
9 | Norden Tanker Pool | 53 |
And display results in a bar plot
top_co_fig = go.Figure()
bar = go.Bar(
x=top_co_df.commercial_operator.tolist(),
y=top_co_df.VesselCount.tolist(),
)
top_co_fig.add_trace(bar)
top_co_fig.update_xaxes(title_text="Commercial Operator")
top_co_fig.update_yaxes(title_text="Number of Vessels opening")
top_co_fig.show()
Vessels opening at specific ports¶
In this example, we will create a visualization of the number of distinct vessels opening at specific ports per day over the next week
this_week_days = pd.date_range(start=datetime.utcnow().date(), freq='D', periods=7, tz='UTC')
ports = data[data.open_name.notna()].open_name.value_counts().head().index.tolist()
ports
['Singapore', 'US Gulf', 'New York', 'Fujairah', 'Rotterdam']
Create the pivot table
ports_mask = data.open_name.isin(ports) & data.open_date_to.isin(this_week_days)
df_ports = data[ports_mask]
df_pivot = pd.pivot_table(
df_ports,
columns='open_name',
index='open_date_to',
values='imo',
aggfunc=pd.Series.nunique,
fill_value=0,
).reindex(index=this_week_days, fill_value=0).reset_index().rename(columns={'index': 'open_date_to'})
df_pivot
open_name | open_date_to | Fujairah | New York | Rotterdam | Singapore | US Gulf |
---|---|---|---|---|---|---|
0 | 2024-08-02 00:00:00+00:00 | 23 | 13 | 10 | 33 | 17 |
1 | 2024-08-03 00:00:00+00:00 | 20 | 8 | 8 | 24 | 12 |
2 | 2024-08-04 00:00:00+00:00 | 12 | 3 | 9 | 17 | 14 |
3 | 2024-08-05 00:00:00+00:00 | 11 | 4 | 7 | 21 | 11 |
4 | 2024-08-06 00:00:00+00:00 | 10 | 4 | 5 | 22 | 7 |
5 | 2024-08-07 00:00:00+00:00 | 6 | 7 | 6 | 19 | 3 |
6 | 2024-08-08 00:00:00+00:00 | 8 | 3 | 5 | 25 | 3 |
And display the results as a timeseries
def port_button(port):
args = [
{'visible': [i == ports.index(port) for i in range(len(ports))]},
{
'title': f'Vessels opening at {port} per day',
'showlegend': True
},
]
return dict(label=port,
method='update',
args=args,
)
title = 'Vessels opening per day'
today = datetime.combine(datetime.utcnow().date(), datetime.min.time())
ports_fig = go.Figure()
port_buttons = []
for port in ports:
if port not in df_pivot.columns:
continue
port_scatter_plot = go.Scatter(
x=df_pivot.open_date_to,
y=df_pivot[port],
name=port,
mode='lines',
)
ports_fig.add_trace(port_scatter_plot)
port_buttons.append(port_button(port))
buttons = list([
dict(
label='All',
method='update',
args=[
{'visible': [True for _ in range(len(ports))]},
{
'title': title,
'showlegend': True
}
],
),
*port_buttons,
])
ports_fig.update_layout(
title=title,
updatemenus=[go.layout.Updatemenu(
active=0,
buttons=buttons,
)],
xaxis_range=[today - timedelta(hours=4), today + timedelta(hours=24*6 + 4)],
)
ports_fig.show()
Export data to csv¶
output_path = '' # Change output_path with your path
filename = 'last_week_positions.csv'
if not data.empty:
data.to_csv(output_path+filename, index=False)
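As a variant of the export above, pathlib keeps the path handling explicit. This sketch writes a small stand-in frame to the current directory; with real data you would reuse the data DataFrame and your own output directory.

```python
from pathlib import Path
import pandas as pd

# Hypothetical small frame standing in for `data`.
frame = pd.DataFrame({'imo': [9724673], 'open_name': ['Lagos']})

output_file = Path('.') / 'last_week_positions.csv'
if not frame.empty:
    frame.to_csv(output_file, index=False)
```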