Capesize Discharges to China¶
Run this example in Colab.¶
APIs Used : Voyages API
Description :
The goal of this notebook is to retrieve a dataset containing the voyages of Capesizes that discharged, are currently discharging, or will discharge in China, with first load from one year ago onwards.
The script walks through the installation of the Signal Ocean SDK and the import of the dependencies required to process the data. The parameters
operations_to_display, country_to_filter, segment, vessels_class_to_filter, voyages_first_load_from
are then initialized and used to produce the desired output.
Next, the Voyages API is called to fetch the data matching our query.
After that, we expand the voyage events, clean the data so that only the rows matching our criteria remain, and keep the most relevant columns of the dataset.
Lastly, the result is exported as a downloadable Excel file.
Output : Excel file containing the resulting table.
Setup¶
Install the Signal Ocean package:
!pip install signal-ocean
Successfully installed signal-ocean-13.3.0 strictly-typed-pandas-0.1.4 typeguard-2.13.3
Import the required dependencies:
from signal_ocean import Connection
from signal_ocean.voyages import VoyagesAPI
import pandas as pd
from datetime import datetime, timedelta
import plotly.express as px
Parametrisation¶
signal_ocean_api_key = ''  # Replace with your subscription key
operations_to_display = 'Discharge'  # One of: 'Load', 'Discharge', 'Stop', 'Start', 'Dry dock'
country_to_filter = 'China'  # The country whose discharges we want
segment = 'Dry'  # One of: 'Dry', 'Tanker', 'LPG', 'LNG'
vessels_class_to_filter = 'Capesize'  # Value from the set
# ('Handysize', 'Handymax', 'Supramax', 'Panamax', 'Capesize', 'VLOC')
voyages_first_load_from = (datetime.today() - timedelta(days=365)).strftime("%Y-%m-%d")  # Start of the time window: voyages with first load from one year ago onwards
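Optionally, you can sanity-check these parameters before calling the API. This is a minimal illustrative sketch, not part of the original workflow; the allowed values simply mirror the comments above:

allowed_operations = {'Load', 'Discharge', 'Stop', 'Start', 'Dry dock'}  # Values as listed above (illustrative check)
allowed_segments = {'Dry', 'Tanker', 'LPG', 'LNG'}
assert operations_to_display in allowed_operations, f"Unknown operation: {operations_to_display}"
assert segment in allowed_segments, f"Unknown segment: {segment}"
assert signal_ocean_api_key, "Please set your Signal Ocean API subscription key first"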
Main Code Block¶
Get your personal Signal Ocean API subscription key (acquired here) and use it to create a Connection:
connection = Connection(signal_ocean_api_key)
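If you prefer not to hard-code the key in the notebook, one option is to fall back to an environment variable. A minimal sketch; the variable name SIGNAL_OCEAN_API_KEY is just a convention chosen here, not something the SDK requires:

import os

# Use the environment variable when no key was pasted above (illustrative pattern)
api_key = signal_ocean_api_key or os.environ.get('SIGNAL_OCEAN_API_KEY', '')
connection = Connection(api_key)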
Call the Voyages API¶
The Voyages API retrieves information about vessel voyages.
api = VoyagesAPI(connection)
For more information, see the Voyages API section.
Get the vessel class id for Capesizes:
vessel_classes = api.get_vessel_classes()  # Retrieve all available vessel classes
vessel_classes_df = pd.DataFrame(v.__dict__ for v in vessel_classes)  # Load into a DataFrame for processing
vessel_classes_df = vessel_classes_df[vessel_classes_df['vessel_type'] == segment]  # Keep only the Dry vessel classes
vessel_classes_df = vessel_classes_df[vessel_classes_df['vessel_class_name'] == vessels_class_to_filter]  # Keep only the desired vessel class
vessel_classes_df
 | vessel_class_id | vessel_class_name | vessel_type_id | vessel_type |
---|---|---|---|---|
5 | 70 | Capesize | 3 | Dry |
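The same lookup can also be done directly on the API response, without a DataFrame. A minimal equivalent sketch using the attributes shown above (the name capesize_id is hypothetical):

# Equivalent one-step lookup of the desired vessel class id (illustrative)
capesize_id = next(vc.vessel_class_id for vc in vessel_classes
                   if vc.vessel_type == segment and vc.vessel_class_name == vessels_class_to_filter)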
Call the Voyages API to get the data and build a single DataFrame linking voyages with their events¶
vessel_classes_df = vessel_classes_df.reset_index(drop=True)
vessel_class_id_to_look_for = vessel_classes_df.loc[0, 'vessel_class_id']
voyages = api.get_voyages_by_advanced_search(vessel_class_id=vessel_class_id_to_look_for,
                                             first_load_arrival_date_from=voyages_first_load_from,
                                             event_purpose=operations_to_display)
# We call the API to fetch Capesize voyages with a Discharge event and a first load arrival within the past year
voyages = pd.DataFrame([v.__dict__ for v in voyages])  # Load the result into a DataFrame
pd.set_option('display.max_columns', None)
voyages.tail(2)  # Display the 2 most recent voyages for illustration
 | imo | voyage_number | vessel_type_id | vessel_class_id | vessel_status_id | commercial_operator_id | deleted | events | id | horizon_id | horizon | latest_received_ais | vessel_name | pit_vessel_name | vessel_type | vessel_class | trade | trade_id | vessel_status | deadweight | year_built | commercial_operator | start_date | first_load_arrival_date | end_date | charterer_id | charterer | rate | rate_type | ballast_bonus | ballast_bonus_type | cargo_type_id | cargo_type | cargo_type_source_id | cargo_type_source | cargo_sub_type_id | cargo_sub_type | cargo_sub_type_source_id | cargo_sub_type_source | cargo_group_id | cargo_group | cargo_group_source_id | cargo_group_source | cargo_sub_group_id | cargo_sub_group | cargo_sub_group_source_id | cargo_sub_group_source | quantity | quantity_unit_id | quantity_unit | quantity_in_barrels | quantity_source_id | quantity_source | cubic_size | laycan_from | laycan_to | fixture_status_id | fixture_status | fixture_date | fixture_is_coa | fixture_is_hold | is_implied_by_ais | has_manual_entries | ballast_distance | predicted_ballast_distance | laden_distance | predicted_laden_distance | suez_crossing | panama_crossing | canakkale_crossing | bosporus_crossing | torres_strait_crossing | magellan_strait_crossing | great_belt_crossing |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9611 | 1019709 | 1 | 3 | 70 | 1 | 441.0 | False | (VoyageEvent(id='IF8F3D46SEDEEACD00', voyage_i... | IF8F3D46VEDEEACD00 | 2 | Current | 2025-02-24 15:36:45+00:00 | Mount Anjin | None | Dry | Capesize | Bulk | 4 | Voyage | 211500 | 2025 | Eastern Pacific Shipping | 2025-01-08 06:36:25+00:00 | 2025-02-24 17:59:54.635000+00:00 | 2025-03-20 02:33:57.244000+00:00 | NaN | None | NaN | None | NaN | None | 5258 | Iron Ore Fines | 8 | EstimatedLow | NaN | None | NaN | None | 5644 | Ores and Rocks | 8 | EstimatedLow | 28 | Iron Ore | 8 | EstimatedLow | 169000.0 | 1 | MetricTonnes | None | 1 | Estimated | NaN | NaT | NaT | 5.0 | PossFixed | 2025-02-10 23:54:22+00:00 | False | False | True | None | 5190.99 | 31.82 | NaN | 3820.94 | None | None | None | None | None | None | None |
9612 | 1032268 | 1 | 3 | 70 | 1 | 1109.0 | False | (VoyageEvent(id='IFC04C46SEDEC34000', voyage_i... | IFC04C46VEDEC34000 | 2 | Current | 2025-02-24 15:15:25+00:00 | SG Aquamarine | None | Dry | Capesize | Bulk | 4 | Voyage | 180000 | 2024 | NYK Line | 2024-11-12 03:10:46+00:00 | 2025-01-24 19:55:49+00:00 | 2025-03-08 19:51:51.715000+00:00 | NaN | None | NaN | None | NaN | None | 5258 | Iron Ore Fines | 10 | EstimatedHigh | NaN | None | NaN | None | 5644 | Ores and Rocks | 10 | EstimatedHigh | 28 | Iron Ore | 10 | EstimatedHigh | 174000.0 | 1 | MetricTonnes | None | 1 | Estimated | NaN | NaT | NaT | -2.0 | NotSet | NaT | False | False | None | None | 11958.04 | NaN | 7056.57 | 1410.78 | None | None | None | None | None | None | None |
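Before selecting columns, a quick sanity check of what came back can be useful; a small illustrative snippet:

# Quick sanity check of the fetched voyages (illustrative)
print(f"Voyages fetched: {len(voyages)}")
print(f"Earliest first load arrival: {voyages['first_load_arrival_date'].min()}")
print(voyages['horizon'].value_counts())  # Mix of Historical / Current / Predicted voyages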
selected_columns = ['id', 'imo', 'voyage_number', 'vessel_name',  # The Voyages API returns a lot of information per voyage,
                    'vessel_class', 'cargo_sub_group',            # so it is a good idea to keep only the columns
                    'cargo_type', 'start_date', 'events']         # that we are most interested in
voyages_new = voyages[selected_columns].copy()  # Copy the selected columns to a new DataFrame
voyages_new.rename(columns={'id': 'voyage_id', 'start_date': 'voyage_start_date'}, inplace=True)
voyages_new.head(5)  # Display the first 5 rows
 | voyage_id | imo | voyage_number | vessel_name | vessel_class | cargo_sub_group | cargo_type | voyage_start_date | events |
---|---|---|---|---|---|---|---|---|---|
0 | I10210046VEDE250C00 | 1057024 | 1 | Granitz | Capesize | Coal | Thermal Coal | 2024-08-10 20:51:47+00:00 | (VoyageEvent(id='I10210046SEDE250C00', voyage_... |
1 | I8BDAC0VEDD5F4B00 | 9165504 | 125 | Lady Cedros | Capesize | Minerals | Salt | 2024-02-22 19:56:46+00:00 | (VoyageEvent(id='I8BDAC0SEDD5F4B00', voyage_id... |
2 | I8BDAC0VEDDAE6500 | 9165504 | 126 | Lady Cedros | Capesize | Minerals | Salt | 2024-04-24 19:59:54+00:00 | (VoyageEvent(id='I8BDAC0SEDDAE6500', voyage_id... |
3 | I8BDAC0VEDDFD7F00 | 9165504 | 127 | Lady Cedros | Capesize | Minerals | Salt | 2024-06-20 23:47:08+00:00 | (VoyageEvent(id='I8BDAC0SEDDFD7F00', voyage_id... |
4 | I8BDAC0VEDE4C9900 | 9165504 | 128 | Lady Cedros | Capesize | Minerals | Salt | 2024-08-15 03:56:28+00:00 | (VoyageEvent(id='I8BDAC0SEDE4C9900', voyage_id... |
Explode the 'events' column to create one row per event and extract the event info into columns:
events = voyages_new.explode('events')  # One row per voyage event
events['events'] = events['events'].apply(lambda x: x.__dict__)  # Convert each VoyageEvent object to a dict
events = pd.concat([events.drop(['events'], axis=1), events['events'].apply(pd.Series)], axis=1)  # Expand the dicts into columns
events.head(2)
 | voyage_id | imo | voyage_number | vessel_name | vessel_class | cargo_sub_group | cargo_type | voyage_start_date | id | voyage_id | event_type_id | event_type | event_horizon_id | event_horizon | purpose | event_date | arrival_date | sailing_date | latitude | longitude | geo_asset_id | geo_asset_name | port_id | port_name | port_unlocode | country_id | country | area_idlevel0 | area_name_level0 | area_idlevel1 | area_name_level1 | area_idlevel2 | area_name_level2 | area_idlevel3 | area_name_level3 | low_ais_density | quantity | quantity_unit_id | quantity_unit | quantity_in_barrels | event_details |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | I10210046VEDE250C00 | 1057024 | 1 | Granitz | Capesize | Coal | Thermal Coal | 2024-08-10 20:51:47+00:00 | I10210046SEDE250C00 | None | 2 | VoyageStart | 0 | Historical | Start | 2024-08-10 20:51:47+00:00 | NaT | NaT | 10.3608 | -75.5102 | 6822 | Mammonal Shipyard | 3324 | Mamonal | CO MAM | 56 | Colombia | 24746 | Caribs | 9 | Caribs | 25019 | Atlantic America | 25028 | West | None | NaN | NaN | None | None | None |
0 | I10210046VEDE250C00 | 1057024 | 1 | Granitz | Capesize | Coal | Thermal Coal | 2024-08-10 20:51:47+00:00 | I10210046TEDE250C00 | None | 1 | PortCall | 1 | Current | Load | NaT | 2024-08-10 23:58:47+00:00 | 2024-08-30 00:10:32.300000+00:00 | 10.3608 | -75.5102 | 6822 | Mammonal Shipyard | 3324 | Mamonal | CO MAM | 56 | Colombia | 24746 | Caribs | 9 | Caribs | 25019 | Atlantic America | 25028 | West | True | 138000.0 | 1.0 | MetricTonnes | None | (VoyageEventDetail(id='I10210046LEDE250C00', e... |
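On large datasets, apply(pd.Series) can be slow because it builds one Series per row. A minimal alternative sketch that could replace the pd.concat line above; it assumes the events column still holds the dicts produced by the lambda step, and should yield the same columns:

# Alternative expansion: normalize the event dicts in one pass (usually faster),
# then attach them to the voyage-level columns positionally
event_cols = pd.json_normalize(events['events'].tolist(), max_level=0)
events = pd.concat([events.drop(columns=['events']).reset_index(drop=True), event_cols], axis=1)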
Filter events based on the desired output and drop columns that are not needed for this analysis¶
if not country_to_filter:  # This checks for both None and the empty string
    print("country_to_filter is null or empty")
else:
    events = events[events['country'] == country_to_filter]  # Keep only the events that take place in China
events = events[events['purpose'] == operations_to_display]  # Keep only the discharge events
events.drop(columns=['port_id', 'voyage_id', 'event_date', 'geo_asset_id',
                     'country_id', 'area_idlevel0', 'area_idlevel1',
                     'area_idlevel2', 'area_idlevel3', 'quantity_unit_id',
                     'quantity_in_barrels', 'event_details'], inplace=True)
df = events.copy()  # Copy the output to a new DataFrame
df = df.sort_values(by='arrival_date')  # Sort the DataFrame by the vessel arrival date
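Since plotly.express was imported above, we can also take a quick look at the monthly distribution of the discharges before exporting. An optional illustrative sketch:

# Optional: plot the number of discharge events per month of arrival (illustrative)
monthly = (df.groupby(df['arrival_date'].dt.to_period('M').astype(str))
             .size()
             .reset_index(name='discharges')
             .rename(columns={'arrival_date': 'month'}))
fig = px.bar(monthly, x='month', y='discharges',
             title=f'{vessels_class_to_filter} {operations_to_display}s to {country_to_filter}')
fig.show()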
Export the data to an Excel file¶
file_path = segment + '-' + vessels_class_to_filter + '-' + operations_to_display + ' for ' + country_to_filter + '.xlsx'  # Name of the exported Excel file
# Make the datetimes timezone-unaware, since Excel does not support timezone-aware datetimes
df['voyage_start_date'] = df['voyage_start_date'].dt.tz_localize(None)
df['arrival_date'] = df['arrival_date'].dt.tz_localize(None)
df['sailing_date'] = df['sailing_date'].dt.tz_localize(None)
df.to_excel(file_path, index=False)
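When running in Colab, as the link at the top of the notebook suggests, the exported file can then be downloaded straight to your machine; a minimal sketch:

# In Colab, trigger a browser download of the exported Excel file
from google.colab import files
files.download(file_path)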