Get your personal Signal Ocean API subscription key (acquired here) and replace it below:
signal_ocean_api_key = '' # Replace with your subscription key
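Alternatively, if you prefer not to hard-code the key, you can read it from an environment variable. A minimal sketch (the variable name SIGNAL_OCEAN_API_KEY is just an example):
import os

# Hypothetical variable name; export it in your shell before running the notebook
signal_ocean_api_key = os.environ.get('SIGNAL_OCEAN_API_KEY', '')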
Scraped Lineups API¶
The goal of the Scraped Lineups API is to collect and return scraped lineups matching the given filters. This is done by using the ScrapedLineupsAPI class and calling the appropriate methods.
1. Request by filters¶
Lineups can be retrieved for specific filters by calling the get_lineups method with the following arguments:
Required¶
- vessel_type: The vessel type
Additionally, at least one of the following is required:
- lineup_ids: List of LineupIDs
- imos: List of IMOs
- message_ids: List of MessageIDs
- external_message_ids: List of ExternalMessageIDs
- received_date_from: Earliest date the lineup was received
- received_date_to: Latest date the lineup was received
- updated_date_from: Earliest date the lineup was updated
- updated_date_to: Latest date the lineup was updated
Mixing received and updated dates is not allowed.
It's highly recommended to use UTC dates, since this is the format used internally.
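For instance, a filter-based request combining the required vessel_type with a received date window could look like the following sketch (the api object is instantiated in the Quickstart section below):
from datetime import datetime, timedelta

# Sketch: tanker lineups received during the last 24 hours;
# `api` is the ScrapedLineupsAPI instance created in the Quickstart
lineups = api.get_lineups(
    vessel_type=1,  # Tanker
    received_date_from=datetime.utcnow() - timedelta(hours=24),
)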
2. Request by page token¶
Lineups can also be retrieved by calling the get_lineups_incremental method.
This method also returns a next_request_token, which should be used in the next incremental request to retrieve only the latest updated lineups.
Required¶
- vessel_type: The vessel type
Optional¶
- page_token: The token to pass as a parameter in order to retrieve the relevant page
3. Retrieve page token¶
The token referenced above can be retrieved for a specific date by calling the get_lineups_incremental_token method with the following argument:
Required¶
- updated_date_from: Earliest date the lineup was updated
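Putting sections 2 and 3 together, the incremental flow is: fetch a token for a starting date, pass it to get_lineups_incremental, then keep the returned next_request_token for the next call. A minimal sketch (again assuming the api object from the Quickstart below):
from datetime import datetime, timedelta

# Sketch of the incremental flow
page_token = api.get_lineups_incremental_token(
    updated_date_from=datetime.utcnow() - timedelta(days=1),
)
response = api.get_lineups_incremental(vessel_type=1, page_token=page_token)
lineups = response.data                    # new or updated lineups
page_token = response.next_request_token  # reuse in the next call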
Additional optional arguments¶
Methods get_lineups and get_lineups_incremental also accept the following optional arguments. When an argument is True, the corresponding columns are included in the response (otherwise they are None):
- include_details: parsed_part_id, line_from, line_to, in_line_order, source
- include_scraped_fields: scraped_vessel_name, scraped_imo, scraped_deadweight, scraped_year_built, scraped_eta, scraped_etb, scraped_etd, scraped_location, scraped_quantity, scraped_cargo_type, scraped_api_gravity, scraped_origin, scraped_destination, scraped_supplier, scraped_charterer, scraped_buyer, scraped_port_agent
- include_vessel_details: vessel_name, deadweight, year_built, liquid_capacity, vessel_type_id, vessel_type, vessel_class, commercial_operator_id, commercial_operator
- include_labels: location_name, location_taxonomy, operation_type, cargo_type, cargo_group, origin_name, origin_taxonomy, destination_name, destination_taxonomy, supplier, charterer, buyer, port_agent, vessel_status
- include_content: content, subject
- include_sender: sender
- include_debug_info: is_private
The default value is True for all of the arguments described above.
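For example, to trim the response you can switch off the column groups you don't need (a sketch; any combination of the flags above works the same way, and api is again the instance created in the Quickstart below):
# Sketch: skip the message content and debug columns for a slimmer response
slim_lineups = api.get_lineups(
    vessel_type=1,  # Tanker
    received_date_from=datetime.utcnow() - timedelta(hours=12),
    include_content=False,
    include_debug_info=False,
)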
Installation¶
To install the Signal Ocean SDK, simply run the following command:
%%capture
%pip install signal-ocean
Quickstart¶
Import signal-ocean and the other modules required for this demo:
from signal_ocean import Connection
from signal_ocean.scraped_lineups import ScrapedLineupsAPI
from datetime import datetime, timedelta
import pandas as pd
import plotly.graph_objects as go
Create a new instance of the ScrapedLineupsAPI class:
connection = Connection(signal_ocean_api_key)
api = ScrapedLineupsAPI(connection)
Now you are ready to retrieve your data
Request by date¶
To get all tanker lineups received in the last twelve hours, declare appropriate vessel_type and received_date_from variables:
vessel_type = 1 # Tanker
received_date_from = datetime.utcnow() - timedelta(hours=12)
Then call the get_lineups method, as below:
scraped_lineups = api.get_lineups(
vessel_type=vessel_type,
received_date_from=received_date_from,
)
next(iter(scraped_lineups), None)
ScrapedLineup(lineup_id=46348340, message_id=69973665, external_message_id=None, parsed_part_id=74958019, line_from=82, line_to=82, in_line_order=None, source='Email', updated_date=datetime.datetime(2024, 8, 2, 6, 26, 27, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2024, 8, 2, 6, 24, 24, tzinfo=datetime.timezone.utc), is_deleted=False, low_confidence=False, scraped_vessel_name='SUEZ PROTOPIA', scraped_imo=None, scraped_deadweight=None, scraped_year_built=None, imo=9380893, vessel_name='Suez Protopia', deadweight=159999, year_built=2008, liquid_capacity=172092, vessel_type_id=1, vessel_type='Tanker', vessel_class_id=85, vessel_class='Suezmax', commercial_operator_id=1597, commercial_operator='Stealth Maritime Corp', scraped_eta='02.08.2024', eta=datetime.datetime(2024, 8, 2, 0, 0, tzinfo=datetime.timezone.utc), scraped_etb=None, etb=None, scraped_etd=None, etd=None, scraped_location='DARDANELLES', location_geo_id=7258, location_name='Dardanelles (WP)', location_taxonomy_id=1, location_taxonomy='GeoAsset', operation_type_id=3, operation_type='Waypoint', scraped_quantity=None, quantity=None, quantity_unit=None, scraped_cargo_type=None, cargo_type_id=None, cargo_type=None, cargo_group_id=None, cargo_group=None, scraped_api_gravity=None, api_gravity=None, scraped_origin='KOCAELI', origin_geo_id=-1, origin_name=None, origin_taxonomy_id=-1, origin_taxonomy='Unknown', scraped_destination='TALLIN', destination_geo_id=3369, destination_name='Tallinn', destination_taxonomy_id=2, destination_taxonomy='Port', scraped_supplier=None, supplier_id=None, supplier=None, scraped_charterer=None, charterer_id=None, charterer=None, scraped_buyer=None, buyer_id=None, buyer=None, scraped_port_agent=None, port_agent_id=None, port_agent=None, vessel_status_id=None, vessel_status=None, content='TTA/ SUEZ PROTOPIA KOCAELI TALLIN 02.08.2024- 08:15 HRS', subject='Turkish Straits Daily Report by Boutrosmar Istanbul August 2, 2024', sender='Boutros Maritime & Transports', is_private=True)
For better visualization, it's convenient to insert the data into a DataFrame:
df = pd.DataFrame(scraped_lineups)
df.head()
lineup_id | message_id | external_message_id | parsed_part_id | line_from | line_to | in_line_order | source | updated_date | received_date | ... | buyer | scraped_port_agent | port_agent_id | port_agent | vessel_status_id | vessel_status | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 46348340 | 69973665 | None | 74958019 | 82 | 82 | NaN | 2024-08-02 06:26:27+00:00 | 2024-08-02 06:24:24+00:00 | ... | None | None | None | None | NaN | None | TTA/ SUEZ PROTOPIA KOCAELI TALLIN 02.08.202... | Turkish Straits Daily Report by Boutrosmar Ist... | Boutros Maritime & Transports | True | |
1 | 46348341 | 69973665 | None | 74958019 | 81 | 81 | NaN | 2024-08-02 06:26:27+00:00 | 2024-08-02 06:24:24+00:00 | ... | None | None | None | None | NaN | None | TTA/ ADVANTAGE LOVE TUAPSE PORT SAID 02.08.... | Turkish Straits Daily Report by Boutrosmar Ist... | Boutros Maritime & Transports | True | |
2 | 46348342 | 69973665 | None | 74958019 | 62 | 62 | NaN | 2024-08-02 06:26:27+00:00 | 2024-08-02 06:24:24+00:00 | ... | None | None | None | None | NaN | None | TTA/ AEGEAN NOBILITY JEDDAH NOVOROSSIYSK 02... | Turkish Straits Daily Report by Boutrosmar Ist... | Boutros Maritime & Transports | True | |
3 | 46348343 | 69973665 | None | 74958019 | 83 | 83 | NaN | 2024-08-02 06:26:27+00:00 | 2024-08-02 06:24:24+00:00 | ... | None | None | None | None | NaN | None | TTA/ AGIOS GERASIMOS NOVOROSSIYSK TRIESTE 0... | Turkish Straits Daily Report by Boutrosmar Ist... | Boutros Maritime & Transports | True | |
4 | 46348344 | 69973665 | None | 74958019 | 60 | 60 | NaN | 2024-08-02 06:26:27+00:00 | 2024-08-02 06:24:24+00:00 | ... | None | None | None | None | NaN | None | TTA/ VELOS AMBER UST LUGA ISTANBUL 02.08.20... | Turkish Straits Daily Report by Boutrosmar Ist... | Boutros Maritime & Transports | True |
5 rows × 78 columns
Request by IMOs¶
To get lineups for specific vessel(s) by their IMO number(s), you can simply call the get_lineups method with a list of the desired IMO(s).
Date arguments can always be combined with this filter.
imos = [9380673,9359571,9377200,9383651,9287833] # Or add a list of your desired IMOs
scraped_lineups_by_imos = api.get_lineups(
vessel_type=vessel_type,
received_date_from=received_date_from,
imos=imos,
)
df_by_imos = pd.DataFrame(scraped_lineups_by_imos)
df_by_imos.head()
lineup_id | message_id | external_message_id | parsed_part_id | line_from | line_to | in_line_order | source | updated_date | received_date | ... | buyer | scraped_port_agent | port_agent_id | port_agent | vessel_status_id | vessel_status | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 46350600 | 69990985 | None | 74968644 | 14 | 14 | NaN | 2024-08-02 08:23:00+00:00 | 2024-08-02 08:21:37+00:00 | ... | None | ISS-AUGUSTA | None | None | None | None | 01/08/2024 11:04\tCALAJUNCO M\tITA\t13671\tBRI... | Augusta and S.ta Panagia Bay daily situation | Nolarma | True | |
1 | 46350616 | 69990985 | None | 74968644 | 77 | 77 | NaN | 2024-08-02 08:23:00+00:00 | 2024-08-02 08:21:37+00:00 | ... | None | ISS-AUGUSTA | None | None | None | None | 01/08/2024 12:00\t25 Isab Nord\tCALAJUNCO M\tI... | Augusta and S.ta Panagia Bay daily situation | Nolarma | True | |
2 | 46350961 | 69991561 | None | 74968988 | 17 | 17 | NaN | 2024-08-02 08:26:40+00:00 | 2024-08-02 08:26:06+00:00 | ... | None | None | None | None | None | None | calajunco m\tbrindisi\teni\t03/1500\ttbc\ttbc\... | TARANTO TANKER SITUATION - | Nolarma | True | |
3 | 46350963 | 69991561 | None | 74968988 | 9 | 9 | 1.0 | 2024-08-02 08:26:40+00:00 | 2024-08-02 08:26:06+00:00 | ... | None | None | None | None | None | None | divina\tortona\teni 1\t02/0700\t02/0848\t03/12... | TARANTO TANKER SITUATION - | Nolarma | True | |
4 | 46350964 | 69991561 | None | 74968988 | 9 | 9 | NaN | 2024-08-02 08:26:40+00:00 | 2024-08-02 08:26:06+00:00 | ... | None | None | None | None | None | None | divina\tortona\teni 1\t02/0700\t02/0848\t03/12... | TARANTO TANKER SITUATION - | Nolarma | True |
5 rows × 78 columns
Request by Message or ExternalMessage IDs¶
To retrieve lineups for particular message ID(s), include an extra parameter called message_ids when using the get_lineups method. This parameter should contain a list of message IDs. For instance:
message_ids = [69347177,69344714,69344260,69342271,69339979]
scraped_lineups_by_message_ids = api.get_lineups(
vessel_type=vessel_type,
message_ids=message_ids,
)
next(iter(scraped_lineups_by_message_ids), None)
ScrapedLineup(lineup_id=45619246, message_id=69339979, external_message_id=None, parsed_part_id=74531114, line_from=5, line_to=5, in_line_order=None, source='Email', updated_date=datetime.datetime(2024, 7, 25, 7, 29, 5, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2024, 7, 25, 7, 27, 20, tzinfo=datetime.timezone.utc), is_deleted=False, low_confidence=False, scraped_vessel_name='meltemi', scraped_imo=None, scraped_deadweight=None, scraped_year_built=None, imo=-1, vessel_name=None, deadweight=None, year_built=None, liquid_capacity=None, vessel_type_id=-1, vessel_type='Unknown', vessel_class_id=-1, vessel_class='Unknown', commercial_operator_id=None, commercial_operator=None, scraped_eta='20/07/2024', eta=datetime.datetime(2024, 7, 20, 0, 0, tzinfo=datetime.timezone.utc), scraped_etb='25/07/2024', etb=datetime.datetime(2024, 7, 25, 0, 0, tzinfo=datetime.timezone.utc), scraped_etd='26/07/2024', etd=datetime.datetime(2024, 7, 26, 0, 0, tzinfo=datetime.timezone.utc), scraped_location='trieste', location_geo_id=3565, location_name='Trieste', location_taxonomy_id=2, location_taxonomy='Port', operation_type_id=2, operation_type='Discharge', scraped_quantity='139,091', quantity=139091.0, quantity_unit='Tonnes', scraped_cargo_type='CRUDE', cargo_type_id=19, cargo_type='Crude Oil', cargo_group_id=19, cargo_group=None, scraped_api_gravity=None, api_gravity=None, scraped_origin='marsa el hariga', origin_geo_id=3625, origin_name='Marsa El Hariga', origin_taxonomy_id=2, origin_taxonomy='Port', scraped_destination=None, destination_geo_id=-1, destination_name=None, destination_taxonomy_id=-1, destination_taxonomy='Unknown', scraped_supplier=None, supplier_id=None, supplier=None, scraped_charterer=None, charterer_id=None, charterer=None, scraped_buyer='p66d/ruhr-roor', buyer_id=6064, buyer='Ruhr Oel', scraped_port_agent=None, port_agent_id=None, port_agent=None, vessel_status_id=None, vessel_status=None, content='meltemi\t20/07/2024\t23:00\tyes\t25/07/2024\t13:00\tsiot\t3\t\t26/07/2024\t22:00\tmarsa el hariga\tsame\t139,091\tp66d/ruhr-roor', subject='SIOT TERMINAL TRIESTE DAILY SITUATION', sender='Nolarma', is_private=True)
You can achieve a similar result for external message IDs by providing an argument called external_message_ids.
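For instance, with placeholder IDs (the call mirrors the message_ids example above):
external_message_ids = ['ext-id-1', 'ext-id-2']  # Placeholder external message IDs
scraped_lineups_by_ext_ids = api.get_lineups(
    vessel_type=vessel_type,
    external_message_ids=external_message_ids,
)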
Request by Lineup IDs¶
In the same manner, to get data for specific lineup ID(s), call the get_lineups method with a list of the desired lineup ID(s).
Note that date arguments do not apply when requesting by lineup IDs.
lineup_ids = [45618852,45621364,45619822,45619821,45619820] # Or add a list of your desired lineup IDs
scraped_lineups_by_ids = api.get_lineups(
vessel_type=vessel_type,
lineup_ids=lineup_ids,
)
df_by_ids = pd.DataFrame(scraped_lineups_by_ids)
df_by_ids.head()
lineup_id | message_id | external_message_id | parsed_part_id | line_from | line_to | in_line_order | source | updated_date | received_date | ... | buyer | scraped_port_agent | port_agent_id | port_agent | vessel_status_id | vessel_status | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 45618852 | 69335109 | None | 74528225 | 81 | 81 | None | 2024-07-25 07:07:11+00:00 | 2024-07-25 07:05:19+00:00 | ... | None | None | None | None | None | None | TTA/ NEW VISION KOCAELI ALIAGA 25.07.2024- ... | Turkish Straits Daily Report by Boutrosmar Ist... | Boutros Maritime & Transports | True | |
1 | 45619820 | 69344260 | None | 74533628 | 33 | 33 | None | 2024-07-25 07:47:03+00:00 | 2024-07-25 07:45:21+00:00 | ... | None | None | None | None | None | None | \tvalle di granada\t\t\t\t\t27/1500\tload cpp | Milazzo daily situation | Nolarma | True | |
2 | 45619821 | 69344260 | None | 74533628 | 32 | 32 | None | 2024-07-25 07:47:03+00:00 | 2024-07-25 07:45:21+00:00 | ... | None | None | None | None | None | None | \tred garnet\t\t\t\t\t26/0800\tload vgo | Milazzo daily situation | Nolarma | True | |
3 | 45619822 | 69344260 | None | 74533628 | 31 | 31 | None | 2024-07-25 07:47:03+00:00 | 2024-07-25 07:45:21+00:00 | ... | None | None | None | None | None | None | \ttiger eye\t\t\t\t\t26/0800\tload cpp | Milazzo daily situation | Nolarma | True | |
4 | 45621364 | 69347177 | None | 74535337 | 19 | 19 | None | 2024-07-25 07:59:13+00:00 | 2024-07-25 07:58:51+00:00 | ... | Saras | None | None | None | None | None | ohio\t\t30/1900\t\t\t\tceyhan\tdisch\tcrude oi... | Sarroch daily situation - | Nolarma | True |
5 rows × 78 columns
Retrieve incremental token¶
The get_lineups_incremental_token method returns a page_token for a given updated_date_from value.
This page_token can then be passed to the get_lineups_incremental method in order to retrieve only new or updated lineups.
For instance, to retrieve the relevant page token for last week's data, call get_lineups_incremental_token as below:
updated_date_from = datetime.utcnow() - timedelta(days=7)
page_token = api.get_lineups_incremental_token(
updated_date_from=updated_date_from,
)
page_token
'MjAyNC0wNy0yNlQwODo0MjoyNi4wMDBa'
Request by page token¶
If the page_token parameter is not specified, the get_lineups_incremental method retrieves all lineups. You can call the get_lineups_incremental method as below:
scraped_lineups_incremental = api.get_lineups_incremental(
vessel_type=vessel_type,
page_token=page_token,
)
df_incremental = pd.DataFrame(scraped_lineups_incremental.data)
df_incremental.head()
lineup_id | message_id | external_message_id | parsed_part_id | line_from | line_to | in_line_order | source | updated_date | received_date | ... | buyer | scraped_port_agent | port_agent_id | port_agent | vessel_status_id | vessel_status | content | subject | sender | is_private | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 45726016 | 69455361 | None | 74607744 | 23 | 23 | NaN | 2024-07-26 09:28:49+00:00 | 2024-07-26 09:27:55+00:00 | ... | None | None | None | None | 1.0 | Expected | \tWHITE STAR\tGIBILTERRA\tENI\t27/0200\tTBC\tT... | CAMBIASO RISSO - DAILY PORT SITUATION | Cambiaso Risso Group | True | |
1 | 45726017 | 69455361 | None | 74607746 | 24 | 24 | NaN | 2024-07-26 09:28:49+00:00 | 2024-07-26 09:27:55+00:00 | ... | None | None | None | None | 3.0 | AtAnchor | 25/2000\tSYN ZUBE\tITA\t3827\tNEMRUT\tL\t2000\... | CAMBIASO RISSO - DAILY PORT SITUATION | Cambiaso Risso Group | True | |
2 | 45726018 | 69455361 | None | 74607746 | 34 | 34 | NaN | 2024-07-26 09:28:49+00:00 | 2024-07-26 09:27:55+00:00 | ... | None | None | None | None | 1.0 | Expected | 28/0700\tLADY\tITA\t24494\tLIVORNO\t\t\t\t | CAMBIASO RISSO - DAILY PORT SITUATION | Cambiaso Risso Group | True | |
3 | 45726019 | 69455361 | None | 74607743 | 3 | 3 | NaN | 2024-07-26 09:28:49+00:00 | 2024-07-26 09:27:55+00:00 | ... | None | None | None | None | 4.0 | AtBerth | ORIENTAL TULIP\tAT BERTH\t23/07 14,36\t26/07 P... | CAMBIASO RISSO - DAILY PORT SITUATION | Cambiaso Risso Group | True | |
4 | 45726020 | 69455361 | None | 74607743 | 4 | 4 | NaN | 2024-07-26 09:28:49+00:00 | 2024-07-26 09:27:55+00:00 | ... | None | None | None | None | 4.0 | AtBerth | SUPERBA\tAT BERTH\t23/07 13,18\t28/07 P.M.\tEL... | CAMBIASO RISSO - DAILY PORT SITUATION | Cambiaso Risso Group | True |
5 rows × 78 columns
The get_lineups_incremental method also returns the next_request_token, which should be used as the page_token of the next incremental call to retrieve only new or updated lineups.
next_request_token = scraped_lineups_incremental.next_request_token
next_request_token
'MjAyNC0wOC0wMlQwODo0MjoyNi43NDNa'
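In a scheduled job this naturally becomes a polling loop. A minimal sketch, assuming you persist the token between rounds yourself:
import time

# Sketch: each round fetches only lineups added or updated since the last one
token = next_request_token
for _ in range(3):  # e.g. three polling rounds
    response = api.get_lineups_incremental(vessel_type=vessel_type, page_token=token)
    new_lineups = [lineup for lineup in response.data if not lineup.is_deleted]
    # ... process new_lineups ...
    token = response.next_request_token
    time.sleep(60)  # wait before the next round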
Usage of optional arguments¶
By default, all fields are returned. In many cases, it is convenient to select specific columns, for example to compare scraped and mapped fields:
scraped_mapped_columns = [
'scraped_vessel_name',
'vessel_name',
'scraped_location',
'location_name',
'scraped_origin',
'origin_name',
'scraped_cargo_type',
'cargo_type',
'scraped_quantity',
'quantity',
]
scraped_mapped_df = pd.DataFrame(scraped_lineups, columns=scraped_mapped_columns)
scraped_mapped_df.head()
scraped_vessel_name | vessel_name | scraped_location | location_name | scraped_origin | origin_name | scraped_cargo_type | cargo_type | scraped_quantity | quantity | |
---|---|---|---|---|---|---|---|---|---|---|
0 | SUEZ PROTOPIA | Suez Protopia | DARDANELLES | Dardanelles (WP) | KOCAELI | None | None | None | None | NaN |
1 | ADVANTAGE LOVE | Advantage Love | DARDANELLES | Dardanelles (WP) | TUAPSE | Tuapse | None | None | None | NaN |
2 | AEGEAN NOBILITY | Aegean Nobility | BOSPHORUS | Bosphorus | JEDDAH | Jeddah | None | None | None | NaN |
3 | AGIOS GERASIMOS | Agios Gerasimos | DARDANELLES | Dardanelles (WP) | NOVOROSSIYSK | Novorossiysk | None | None | None | NaN |
4 | VELOS AMBER | Velos Amber | BOSPHORUS | Bosphorus | UST LUGA | Ust-Luga | None | None | None | NaN |
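A quick way to quantify the mapping coverage is to measure, for each scraped/mapped pair, how often a scraped value received a mapped counterpart (an illustrative snippet over the DataFrame built above):
# Illustrative: share of scraped values that were successfully mapped
pairs = [
    ('scraped_location', 'location_name'),
    ('scraped_origin', 'origin_name'),
    ('scraped_cargo_type', 'cargo_type'),
]
for scraped_col, mapped_col in pairs:
    scraped = scraped_mapped_df[scraped_col].notna()
    mapped = scraped_mapped_df[mapped_col].notna()
    coverage = (scraped & mapped).sum() / max(scraped.sum(), 1)
    print(f'{scraped_col} -> {mapped_col}: {coverage:.0%} mapped')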
Examples¶
Let's start by fetching all tanker lineups received during the last week:
example_vessel_type = 1 # Tanker
example_date_from = datetime.utcnow() - timedelta(days=7)
example_scraped_lineups = api.get_lineups(
vessel_type=example_vessel_type,
received_date_from=example_date_from,
)
Exclude deleted scraped lineups¶
The is_deleted property of a scraped lineup indicates whether it is still valid. If it is set to True, the corresponding lineup_id has been replaced by a new one.
For the sake of completeness, we will exclude deleted scraped lineups in the following examples.
example_scraped_lineups = [lineup for lineup in example_scraped_lineups if not lineup.is_deleted]
next(iter(example_scraped_lineups), None)
ScrapedLineup(lineup_id=45726016, message_id=69455361, external_message_id=None, parsed_part_id=74607744, line_from=23, line_to=23, in_line_order=None, source='Email', updated_date=datetime.datetime(2024, 7, 26, 9, 28, 49, tzinfo=datetime.timezone.utc), received_date=datetime.datetime(2024, 7, 26, 9, 27, 55, tzinfo=datetime.timezone.utc), is_deleted=False, low_confidence=False, scraped_vessel_name='WHITE STAR', scraped_imo=None, scraped_deadweight=None, scraped_year_built=None, imo=-1, vessel_name=None, deadweight=None, year_built=None, liquid_capacity=None, vessel_type_id=-1, vessel_type='Unknown', vessel_class_id=-1, vessel_class='Unknown', commercial_operator_id=None, commercial_operator=None, scraped_eta='27/0200', eta=datetime.datetime(2024, 7, 27, 0, 0, tzinfo=datetime.timezone.utc), scraped_etb='TBC', etb=None, scraped_etd='TBC', etd=None, scraped_location='TARANTO', location_geo_id=3564, location_name='Taranto', location_taxonomy_id=2, location_taxonomy='Port', operation_type_id=1, operation_type='Load', scraped_quantity='5.5 KT', quantity=5500.0, quantity_unit='Tonnes', scraped_cargo_type='BITUMEN', cargo_type_id=18, cargo_type='Bitumen', cargo_group_id=130000, cargo_group='Dirty', scraped_api_gravity=None, api_gravity=None, scraped_origin=None, origin_geo_id=None, origin_name=None, origin_taxonomy_id=None, origin_taxonomy=None, scraped_destination='TBC', destination_geo_id=-1, destination_name=None, destination_taxonomy_id=-1, destination_taxonomy='Unknown', scraped_supplier=None, supplier_id=None, supplier=None, scraped_charterer=None, charterer_id=None, charterer=None, scraped_buyer=None, buyer_id=None, buyer=None, scraped_port_agent=None, port_agent_id=None, port_agent=None, vessel_status_id=1, vessel_status='Expected', content='\tWHITE STAR\tGIBILTERRA\tENI\t27/0200\tTBC\tTBC\tLOADING\tBITUMEN\t5.5 KT\tTBC\t', subject='CAMBIASO RISSO - DAILY PORT SITUATION', sender='Cambiaso Risso Group', is_private=True)
Now we are ready to insert our data into a DataFrame, keeping only specific fields:
example_columns = [
'imo',
'location_name',
'operation_type',
'eta',
'quantity',
'is_deleted'
]
data = pd.DataFrame(example_scraped_lineups, columns=example_columns).astype({'imo': 'Int64'})
data.head()
imo | location_name | operation_type | eta | quantity | is_deleted | |
---|---|---|---|---|---|---|
0 | -1 | Taranto | Load | 2024-07-27 00:00:00+00:00 | 5500.0 | False |
1 | -1 | Raffineria Di Milazzo | Load | 2024-07-25 00:00:00+00:00 | 2000.0 | False |
2 | -1 | Raffineria Di Milazzo | Discharge | 2024-07-28 00:00:00+00:00 | NaN | False |
3 | 9890549 | Genoa | Load | 2024-07-23 00:00:00+00:00 | 12000.0 | False |
4 | 9538165 | Genoa | None | 2024-07-23 00:00:00+00:00 | 30000.0 | False |
Top 10 Ports¶
In this example, we will find the top 10 ports, based on the number of distinct operations:
top_ports = data[data.operation_type.notna()]\
[['location_name', 'imo', 'operation_type']]\
.drop_duplicates()\
.location_name\
.value_counts()\
.head(10)\
.index\
.tolist()
top_ports_df = data[data.location_name.isin(top_ports)]
top_ports_pivot = pd.pivot_table(
top_ports_df,
index='location_name',
columns='operation_type',
values='imo',
aggfunc=lambda x: len(x.unique()),
fill_value=0,
dropna=False,
).reset_index().rename_axis(None, axis=1)
top_ports_pivot.index = top_ports_pivot.location_name
top_ports_pivot = top_ports_pivot.loc[top_ports].reset_index(drop=True)
top_ports_pivot
location_name | Discharge | Load | Unknown | Waypoint | |
---|---|---|---|---|---|
0 | Sikka | 30 | 22 | 0 | 0 |
1 | Ceyhan | 0 | 44 | 0 | 0 |
2 | Hodeidah | 36 | 1 | 0 | 0 |
3 | Basrah | 10 | 17 | 0 | 0 |
4 | Aden | 17 | 3 | 0 | 6 |
5 | Sarroch | 8 | 17 | 1 | 0 |
6 | Mumbai | 18 | 5 | 0 | 0 |
7 | Vadinar | 18 | 4 | 0 | 0 |
8 | Dardanelles (WP) | 0 | 0 | 0 | 20 |
9 | Trieste | 18 | 0 | 0 | 0 |
And display the results in a stacked bar plot:
ports_list = top_ports_pivot.location_name.tolist()
# Build one bar per operation type that appears in the pivot table
fig_data = [
    go.Bar(name=operation, x=ports_list, y=top_ports_pivot[operation].tolist())
    for operation in ('Discharge', 'Load', 'Waypoint')
    if operation in top_ports_pivot
]
fig = go.Figure(data=fig_data)
fig.update_layout(barmode='stack')
fig.show()
Imported quantity in specific ports¶
In this example, we will create a visualization of the total cargo quantity discharged in specific ports per day over the next week.
ports = data[data.operation_type == 'Discharge'][['imo', 'location_name']]\
.drop_duplicates()\
.location_name\
.value_counts()\
.head()\
.index\
.tolist()
this_week_days = pd.date_range(start=datetime.utcnow().date(), freq='D', periods=7, tz='UTC')
ports
['Hodeidah', 'Sikka', 'Trieste', 'Mumbai', 'Vadinar']
Create the pivot table
imports_mask = data.location_name.isin(ports) & (data.operation_type == 'Discharge')
df_imports = data[imports_mask][['eta', 'location_name', 'quantity', 'imo']].drop_duplicates()
df_pivot = pd\
    .pivot_table(
        df_imports,
        columns='location_name',
        index='eta',
        values='quantity',
        aggfunc='sum',
        fill_value=0,
    )\
    .reindex(index=this_week_days, fill_value=0)\
    .reset_index()\
    .rename(columns={'index': 'eta'})\
    .rename_axis(None, axis=1)
df_pivot
df_pivot
eta | Hodeidah | Mumbai | Sikka | Trieste | Vadinar | |
---|---|---|---|---|---|---|
0 | 2024-08-02 00:00:00+00:00 | 0 | 11700 | 153000 | 209932 | 278000 |
1 | 2024-08-03 00:00:00+00:00 | 0 | 0 | 532302 | 264951 | 1015201 |
2 | 2024-08-04 00:00:00+00:00 | 0 | 0 | 332308 | 187887 | 0 |
3 | 2024-08-05 00:00:00+00:00 | 0 | 0 | 100000 | 425414 | 0 |
4 | 2024-08-06 00:00:00+00:00 | 0 | 29000 | 0 | 137357 | 0 |
5 | 2024-08-07 00:00:00+00:00 | 0 | 0 | 200000 | 0 | 0 |
6 | 2024-08-08 00:00:00+00:00 | 0 | 0 | 0 | 0 | 0 |
And display the results as a time series:
def port_button(port):
    args = [
        {'visible': [i == ports.index(port) for i in range(len(ports))]},
        {
            'title': f'Imported quantity in {port} per day',
            'showlegend': True
        },
    ]
    return dict(
        label=port,
        method='update',
        args=args,
    )
title = 'Imported quantity per day'
today = datetime.combine(datetime.utcnow().date(), datetime.min.time())
ports_fig = go.Figure()
port_buttons = []
for port in ports:
    if port not in df_pivot.columns:
        continue
    port_scatter_plot = go.Scatter(
        x=df_pivot.eta,
        y=df_pivot[port],
        name=port,
        mode='lines',
    )
    ports_fig.add_trace(port_scatter_plot)
    port_buttons.append(port_button(port))
buttons = [
    dict(
        label='All',
        method='update',
        args=[
            {'visible': [True for _ in range(len(ports))]},
            {
                'title': title,
                'showlegend': True
            }
        ],
    ),
    *port_buttons,
]
ports_fig.update_layout(
title=title,
updatemenus=[go.layout.Updatemenu(
active=0,
buttons=buttons,
)],
xaxis_range=[today - timedelta(hours=4), today + timedelta(hours=24*6 + 4)],
)
ports_fig.show()
Export data to csv¶
output_path = ''  # Replace with your output path
filename = 'last_week_lineups.csv'
if not data.empty:
    data.to_csv(output_path + filename, index=False)
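If you export regularly, a timestamped filename keeps successive runs from overwriting each other (a small variation on the snippet above):
# Optional variation: timestamp the export so repeated runs don't overwrite it
timestamped_filename = f"lineups_{datetime.utcnow():%Y%m%d_%H%M}.csv"
if not data.empty:
    data.to_csv(output_path + timestamped_filename, index=False)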