Capesize Discharges to China¶
Run this example in Colab.¶
APIs Used : Voyages API
Description :
The goal of this notebook is to retrieve a dataset containing the voyages of Capesizes that discharged, are currently discharging, or will discharge in China, with first load from one year ago onwards.
The script walks through the installation of the Signal Ocean SDK and the import of the dependencies required to process the data. The parameters
operations_to_display, country_to_filter, segment, vessels_class_to_filter, voyages_first_load_from
are then initialized and used to produce the desired output.
Next, the Voyages API is called to fetch the data matching our query.
After that, we expand the voyage events, clean the data so that only the rows matching our criteria remain, and keep the most relevant columns of the dataset.
Lastly, the result is exported as a downloadable Excel file.
Output : Excel file containing the resulting table.
Setup¶
Install the Signal Ocean package:
!pip install signal-ocean
Successfully installed signal-ocean-13.3.0 strictly-typed-pandas-0.1.4 typeguard-2.13.3
Import the required dependencies:
from signal_ocean import Connection
from signal_ocean.voyages import VoyagesAPI
import pandas as pd
from datetime import datetime, timedelta
import plotly.express as px
Parametrisation¶
signal_ocean_api_key = ''  # Replace with your subscription key
operations_to_display = 'Discharge'  # One of: 'Load', 'Discharge', 'Stop', 'Start', 'Dry dock'
country_to_filter = 'China'  # The country whose discharges we want
segment = 'Dry'  # One of: 'Dry', 'Tanker', 'LPG', 'LNG'
vessels_class_to_filter = 'Capesize'  # Value from the set
# ('Handysize', 'Handymax', 'Supramax', 'Panamax', 'Capesize', 'VLOC')
voyages_first_load_from = (datetime.today() - timedelta(days=365)).strftime("%Y-%m-%d")  # Start of the time window: voyages with first load from one year ago onwards
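Optionally, you can sanity-check these parameters before calling the API. This is a minimal illustrative sketch, not part of the original workflow; the allowed values simply mirror the comments above:

allowed_operations = {'Load', 'Discharge', 'Stop', 'Start', 'Dry dock'}  # Values as listed above (illustrative check)
allowed_segments = {'Dry', 'Tanker', 'LPG', 'LNG'}
assert operations_to_display in allowed_operations, f"Unknown operation: {operations_to_display}"
assert segment in allowed_segments, f"Unknown segment: {segment}"
assert signal_ocean_api_key, "Please set your Signal Ocean API subscription key first"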
Main Code Block¶
Get your personal Signal Ocean API subscription key (acquired here) and use it to create a Connection:
connection = Connection(signal_ocean_api_key)
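If you prefer not to hard-code the key in the notebook, one option is to fall back to an environment variable. A minimal sketch; the variable name SIGNAL_OCEAN_API_KEY is just a convention chosen here, not something the SDK requires:

import os

# Use the environment variable when no key was pasted above (illustrative pattern)
api_key = signal_ocean_api_key or os.environ.get('SIGNAL_OCEAN_API_KEY', '')
connection = Connection(api_key)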
Call the Voyages API¶
The Voyages API retrieves information about vessel voyages.
api = VoyagesAPI(connection)
For more information, see the Voyages API section.
Get the vessel class id for Capesizes:
vessel_classes = api.get_vessel_classes()  # Retrieve all available vessel classes
vessel_classes_df = pd.DataFrame(v.__dict__ for v in vessel_classes)  # Load into a DataFrame for processing
vessel_classes_df = vessel_classes_df[vessel_classes_df['vessel_type'] == segment]  # Keep only the Dry vessel classes
vessel_classes_df = vessel_classes_df[vessel_classes_df['vessel_class_name'] == vessels_class_to_filter]  # Keep only the desired vessel class
vessel_classes_df
 | vessel_class_id | vessel_class_name | vessel_type_id | vessel_type |
---|---|---|---|---|
5 | 70 | Capesize | 3 | Dry |
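The same lookup can also be done directly on the API response, without a DataFrame. A minimal equivalent sketch using the attributes shown above (the name capesize_id is hypothetical):

# Equivalent one-step lookup of the desired vessel class id (illustrative)
capesize_id = next(vc.vessel_class_id for vc in vessel_classes
                   if vc.vessel_type == segment and vc.vessel_class_name == vessels_class_to_filter)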
Call the Voyages API to get the data and build a single DataFrame linking voyages with their events¶
vessel_classes_df = vessel_classes_df.reset_index(drop=True)
vessel_class_id_to_look_for = vessel_classes_df.loc[0, 'vessel_class_id']
voyages = api.get_voyages_by_advanced_search(vessel_class_id=vessel_class_id_to_look_for,
                                             first_load_arrival_date_from=voyages_first_load_from,
                                             event_purpose=operations_to_display)
# We call the API to fetch Capesize voyages with a Discharge event and a first load arrival within the past year
voyages = pd.DataFrame([v.__dict__ for v in voyages])  # Load the result into a DataFrame
pd.set_option('display.max_columns', None)
voyages.tail(2)  # Display the 2 most recent voyages for illustration
 | imo | voyage_number | vessel_type_id | vessel_class_id | vessel_status_id | commercial_operator_id | deleted | events | id | horizon_id | horizon | latest_received_ais | vessel_name | pit_vessel_name | vessel_type | vessel_class | trade | trade_id | vessel_status | deadweight | year_built | commercial_operator | start_date | first_load_arrival_date | end_date | charterer_id | charterer | rate | rate_type | ballast_bonus | ballast_bonus_type | cargo_type_id | cargo_type | cargo_type_source_id | cargo_type_source | cargo_sub_type_id | cargo_sub_type | cargo_sub_type_source_id | cargo_sub_type_source | cargo_group_id | cargo_group | cargo_group_source_id | cargo_group_source | cargo_sub_group_id | cargo_sub_group | cargo_sub_group_source_id | cargo_sub_group_source | quantity | quantity_unit_id | quantity_unit | quantity_in_barrels | quantity_source_id | quantity_source | cubic_size | laycan_from | laycan_to | fixture_status_id | fixture_status | fixture_date | fixture_is_coa | fixture_is_hold | is_implied_by_ais | has_manual_entries | ballast_distance | predicted_ballast_distance | laden_distance | predicted_laden_distance | suez_crossing | panama_crossing | canakkale_crossing | bosporus_crossing | torres_strait_crossing | magellan_strait_crossing | great_belt_crossing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9611 | 1019709 | 1 | 3 | 70 | 1 | 441.0 | False | (VoyageEvent(id='IF8F3D46SEDEEACD00', voyage_i... | IF8F3D46VEDEEACD00 | 2 | Current | 2025-02-24 15:36:45+00:00 | Mount Anjin | None | Dry | Capesize | Bulk | 4 | Voyage | 211500 | 2025 | Eastern Pacific Shipping | 2025-01-08 06:36:25+00:00 | 2025-02-24 17:59:54.635000+00:00 | 2025-03-20 02:33:57.244000+00:00 | NaN | None | NaN | None | NaN | None | 5258 | Iron Ore Fines | 8 | EstimatedLow | NaN | None | NaN | None | 5644 | Ores and Rocks | 8 | EstimatedLow | 28 | Iron Ore | 8 | EstimatedLow | 169000.0 | 1 | MetricTonnes | None | 1 | Estimated | NaN | NaT | NaT | 5.0 | PossFixed | 2025-02-10 23:54:22+00:00 | False | False | True | None | 5190.99 | 31.82 | NaN | 3820.94 | None | None | None | None | None | None | None |
9612 | 1032268 | 1 | 3 | 70 | 1 | 1109.0 | False | (VoyageEvent(id='IFC04C46SEDEC34000', voyage_i... | IFC04C46VEDEC34000 | 2 | Current | 2025-02-24 15:15:25+00:00 | SG Aquamarine | None | Dry | Capesize | Bulk | 4 | Voyage | 180000 | 2024 | NYK Line | 2024-11-12 03:10:46+00:00 | 2025-01-24 19:55:49+00:00 | 2025-03-08 19:51:51.715000+00:00 | NaN | None | NaN | None | NaN | None | 5258 | Iron Ore Fines | 10 | EstimatedHigh | NaN | None | NaN | None | 5644 | Ores and Rocks | 10 | EstimatedHigh | 28 | Iron Ore | 10 | EstimatedHigh | 174000.0 | 1 | MetricTonnes | None | 1 | Estimated | NaN | NaT | NaT | -2.0 | NotSet | NaT | False | False | None | None | 11958.04 | NaN | 7056.57 | 1410.78 | None | None | None | None | None | None | None |
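Before selecting columns, a quick sanity check of what came back can be useful; a small illustrative snippet:

# Quick sanity check of the fetched voyages (illustrative)
print(f"Voyages fetched: {len(voyages)}")
print(f"Earliest first load arrival: {voyages['first_load_arrival_date'].min()}")
print(voyages['horizon'].value_counts())  # Mix of Historical / Current / Predicted voyages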
selected_columns = ['id', 'imo', 'voyage_number', 'vessel_name',  # The Voyages API returns a lot of information per voyage,
                    'vessel_class', 'cargo_sub_group',            # so it is a good idea to keep only the columns
                    'cargo_type', 'start_date', 'events']         # that we are most interested in
voyages_new = voyages[selected_columns].copy()  # Copy the selected columns to a new DataFrame
voyages_new.rename(columns={'id': 'voyage_id', 'start_date': 'voyage_start_date'}, inplace=True)
voyages_new.head(5)  # Display the first 5 rows
 | voyage_id | imo | voyage_number | vessel_name | vessel_class | cargo_sub_group | cargo_type | voyage_start_date | events |
---|---|---|---|---|---|---|---|---|---|
0 | I10210046VEDE250C00 | 1057024 | 1 | Granitz | Capesize | Coal | Thermal Coal | 2024-08-10 20:51:47+00:00 | (VoyageEvent(id='I10210046SEDE250C00', voyage_... |
1 | I8BDAC0VEDD5F4B00 | 9165504 | 125 | Lady Cedros | Capesize | Minerals | Salt | 2024-02-22 19:56:46+00:00 | (VoyageEvent(id='I8BDAC0SEDD5F4B00', voyage_id... |
2 | I8BDAC0VEDDAE6500 | 9165504 | 126 | Lady Cedros | Capesize | Minerals | Salt | 2024-04-24 19:59:54+00:00 | (VoyageEvent(id='I8BDAC0SEDDAE6500', voyage_id... |
3 | I8BDAC0VEDDFD7F00 | 9165504 | 127 | Lady Cedros | Capesize | Minerals | Salt | 2024-06-20 23:47:08+00:00 | (VoyageEvent(id='I8BDAC0SEDDFD7F00', voyage_id... |
4 | I8BDAC0VEDE4C9900 | 9165504 | 128 | Lady Cedros | Capesize | Minerals | Salt | 2024-08-15 03:56:28+00:00 | (VoyageEvent(id='I8BDAC0SEDE4C9900', voyage_id... |
Explode the 'events' column to create one row per event and extract the event info into columns:
events = voyages_new.explode('events')  # One row per voyage event
events['events'] = events['events'].apply(lambda x: x.__dict__)  # Convert each VoyageEvent object to a dict
events = pd.concat([events.drop(['events'], axis=1), events['events'].apply(pd.Series)], axis=1)  # Expand the dicts into columns
events.head(2)
 | voyage_id | imo | voyage_number | vessel_name | vessel_class | cargo_sub_group | cargo_type | voyage_start_date | id | voyage_id | event_type_id | event_type | event_horizon_id | event_horizon | purpose | event_date | arrival_date | sailing_date | latitude | longitude | geo_asset_id | geo_asset_name | port_id | port_name | port_unlocode | country_id | country | area_idlevel0 | area_name_level0 | area_idlevel1 | area_name_level1 | area_idlevel2 | area_name_level2 | area_idlevel3 | area_name_level3 | low_ais_density | quantity | quantity_unit_id | quantity_unit | quantity_in_barrels | event_details |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | I10210046VEDE250C00 | 1057024 | 1 | Granitz | Capesize | Coal | Thermal Coal | 2024-08-10 20:51:47+00:00 | I10210046SEDE250C00 | None | 2 | VoyageStart | 0 | Historical | Start | 2024-08-10 20:51:47+00:00 | NaT | NaT | 10.3608 | -75.5102 | 6822 | Mammonal Shipyard | 3324 | Mamonal | CO MAM | 56 | Colombia | 24746 | Caribs | 9 | Caribs | 25019 | Atlantic America | 25028 | West | None | NaN | NaN | None | None | None |
0 | I10210046VEDE250C00 | 1057024 | 1 | Granitz | Capesize | Coal | Thermal Coal | 2024-08-10 20:51:47+00:00 | I10210046TEDE250C00 | None | 1 | PortCall | 1 | Current | Load | NaT | 2024-08-10 23:58:47+00:00 | 2024-08-30 00:10:32.300000+00:00 | 10.3608 | -75.5102 | 6822 | Mammonal Shipyard | 3324 | Mamonal | CO MAM | 56 | Colombia | 24746 | Caribs | 9 | Caribs | 25019 | Atlantic America | 25028 | West | True | 138000.0 | 1.0 | MetricTonnes | None | (VoyageEventDetail(id='I10210046LEDE250C00', e... |
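On large datasets, apply(pd.Series) can be slow because it builds one Series per row. A minimal alternative sketch that could replace the pd.concat line above; it assumes the events column still holds the dicts produced by the lambda step, and should yield the same columns:

# Alternative expansion: normalize the event dicts in one pass (usually faster),
# then attach them to the voyage-level columns positionally
event_cols = pd.json_normalize(events['events'].tolist(), max_level=0)
events = pd.concat([events.drop(columns=['events']).reset_index(drop=True), event_cols], axis=1)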
Filter events based on the desired output and drop columns that are not needed for this analysis¶
if not country_to_filter:  # This checks for both None and the empty string
    print("country_to_filter is null or empty")
else:
    events = events[events['country'] == country_to_filter]  # Keep only the events that take place in China
events = events[events['purpose'] == operations_to_display]  # Keep only the discharge events
events.drop(columns=['port_id', 'voyage_id', 'event_date', 'geo_asset_id',
                     'country_id', 'area_idlevel0', 'area_idlevel1',
                     'area_idlevel2', 'area_idlevel3', 'quantity_unit_id',
                     'quantity_in_barrels', 'event_details'], inplace=True)
df = events.copy()  # Copy the output to a new DataFrame
df = df.sort_values(by='arrival_date')  # Sort the DataFrame by the vessel arrival date
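Since plotly.express was imported above, we can also take a quick look at the monthly distribution of the discharges before exporting. An optional illustrative sketch:

# Optional: plot the number of discharge events per month of arrival (illustrative)
monthly = (df.groupby(df['arrival_date'].dt.to_period('M').astype(str))
             .size()
             .reset_index(name='discharges')
             .rename(columns={'arrival_date': 'month'}))
fig = px.bar(monthly, x='month', y='discharges',
             title=f'{vessels_class_to_filter} {operations_to_display}s to {country_to_filter}')
fig.show()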
Export the data to an Excel file¶
file_path = segment + '-' + vessels_class_to_filter + '-' + operations_to_display + ' for ' + country_to_filter + '.xlsx'  # Name of the exported Excel file
# Make the datetimes timezone-unaware, since Excel does not support timezone-aware datetimes
df['voyage_start_date'] = df['voyage_start_date'].dt.tz_localize(None)
df['arrival_date'] = df['arrival_date'].dt.tz_localize(None)
df['sailing_date'] = df['sailing_date'].dt.tz_localize(None)
df.to_excel(file_path, index=False)
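When running in Colab, as the link at the top of the notebook suggests, the exported file can then be downloaded straight to your machine; a minimal sketch:

# In Colab, trigger a browser download of the exported Excel file
from google.colab import files
files.download(file_path)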