> ## Documentation Index
> Fetch the complete documentation index at: https://docs.mangrovesystems.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Bulk import data

> In the Data Inputs section, bulk import data through a project-specific Data Source.

## Importing data

<Steps>
  <Step title="Prepare your data">
    Prepare the raw data file that you need to bring into your project in Mangrove. Acceptable file formats include CSV, XLS, and XLSX.

    * Ensure you're using the right import template
    * Populate all required columns in the template
      <Note>To streamline imports, make sure the data in the import templates uses the correct formatting (e.g., date or datetime format) and units (e.g., percentage (%) values should be between 0 and 100). As a best practice, paste values only into the import templates so that external links and formulas are not copied into the template.</Note>
  </Step>

  <Step title="Upload the file">
    Upload the file to the corresponding Data Source in the Mangrove platform.

  </Step>

  <Step title="Review the loaded events">
    Events generated from your bulk import populate the Analytics charts and the Events Feed.

    * View event data: Select an event in the feed to see its data.
    * Attach evidence files to relevant events: Use **Upload Evidence** on an event.
    * Filter events: Filter for specific event types and time ranges in the feed and on the Analytics charts.

    To check the status of ongoing bulk imports, visit the Data Inputs > Bulk Jobs section. If you encounter an error that you need assistance with, please reach out to Mangrove through your shared channel.

    * Review issues with imports: Bulk imports that run into issues appear as `errored` bulk jobs. Review the job's error message for details on what went wrong while transforming the data in your import file.
    * Cancel or reverse an import: Imported the wrong data by mistake? A completed bulk job can be **Reversed**. A bulk job that has already started can be **Canceled**; any data already transformed from your import file will be deleted.
    * Retry an import: Use **Retry** to run a bulk job again.
  </Step>
</Steps>
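The formatting checks from the **Prepare your data** step can be run on a CSV file before it is uploaded. The sketch below is not a Mangrove tool: the function `check_import_file`, its parameters, and the `Date` and `Moisture (%)` columns are hypothetical, it handles CSV only (not XLS or XLSX), and the `%m/%d/%Y` date format is an assumption to adjust to your template.

```python
import csv
import io
from datetime import datetime


def check_import_file(csv_text, required_columns, date_columns=(),
                      date_format="%m/%d/%Y", percent_columns=()):
    """Return a list of problems found in a CSV import file (empty if none)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [column for column in required_columns if column not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    # Line 1 is the header row, so data rows start on line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in required_columns:
            if not (row.get(column) or "").strip():
                problems.append(f"line {line_no}: {column} is empty")
        for column in date_columns:
            try:
                datetime.strptime(row.get(column) or "", date_format)
            except ValueError:
                problems.append(f"line {line_no}: {column} is not in {date_format} format")
        for column in percent_columns:
            try:
                value = float(row.get(column) or "")
            except ValueError:
                problems.append(f"line {line_no}: {column} is not a number")
                continue
            # The template expects percentages on a 0-100 scale.
            if not 0 <= value <= 100:
                problems.append(f"line {line_no}: {column} should be between 0 and 100")
    return problems


sample = (
    "Date,Moisture (%)\n"
    "02/01/2024,45\n"
    "2024-02-02,40\n"
    "02/03/2024,145\n"
)
problems = check_import_file(
    sample,
    required_columns=["Date", "Moisture (%)"],
    date_columns=["Date"],
    percent_columns=["Moisture (%)"],
)
```

Here the second data row (line 3) uses the wrong date format and the third (line 4) has a percentage above 100, so both are reported and can be fixed before the bulk job runs.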

## Data transformations

Transformations are written by Admin users and applied to every new file bulk imported into a Data Source, generating events and evidence from the file's contents.

See [Transformation contract](#transformation-contract) for the variables a script can read and the event object shape it must produce.

### Transformation contract

A transformation is a Python script that runs once per import file. It works with three top-level variables: Mangrove provides `rows` and `filename`, and the script itself declares `results`:

| Variable   | Type         | Description                                                                                             |
| ---------- | ------------ | ------------------------------------------------------------------------------------------------------- |
| `rows`     | `list[dict]` | One dictionary per row in the import file. Access columns by header name, e.g. `row["Date"]`.           |
| `filename` | `str`        | The name of the uploaded file, including extension (e.g. `"march-shipments.csv"`).                      |
| `results`  | `list[dict]` | The script must declare this (`results = []`) and append one event object per event it wants to create. |

A minimal script:

```python theme={null}
from datetime import datetime

results = []
for row in rows:
    # Assumes "Date" looks like 02/01/2024 and "Timezone" holds a UTC offset such as -04:00
    timestamp = datetime.strptime(f'{row["Date"]} {row["Timezone"]}', "%m/%d/%Y %z")
    # isoformat(sep=" ") gives e.g. "2024-02-01 00:00:00-04:00", matching %Y-%m-%d %H:%M:%S%z
    start_time = timestamp.isoformat(sep=" ")
    results.append({
        "event_type": "shipment",
        "start_time": start_time,
        "end_time": start_time,
        "data_points": [
            {"slug": "shipment-mass", "value": row["Shipment Mass"]},
        ],
    })
```
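To try a transformation before pasting it into the editor, you can mimic the contract above locally: parse the file into `rows` with `csv.DictReader`, run the script with `rows` and `filename` in scope, then read back `results`. This is a hypothetical harness (`run_locally` is not part of Mangrove), and it assumes every cell arrives as a string, as it does from `csv`; Mangrove's own parsing, especially of XLS and XLSX files, may differ.

```python
import csv
import io

# A transformation script as it might be pasted into the Transformations Editor.
TRANSFORMATION = '''
from datetime import datetime

results = []
for row in rows:
    ts = datetime.strptime(row["Date"] + " " + row["Timezone"], "%m/%d/%Y %z").isoformat(sep=" ")
    results.append({
        "event_type": "shipment",
        "start_time": ts,
        "end_time": ts,
        "data_points": [{"slug": "shipment-mass", "value": row["Shipment Mass"]}],
    })
'''


def run_locally(script, csv_text, filename):
    """Run a transformation against CSV text, mimicking the documented contract."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Expose `rows` and `filename` as top-level variables, like the runtime does.
    namespace = {"rows": rows, "filename": filename}
    exec(script, namespace)
    # The script is expected to have declared and filled `results`.
    return namespace["results"]


sample_csv = "Date,Timezone,Shipment Mass\n02/01/2024,-04:00,12.5\n"
results = run_locally(TRANSFORMATION, sample_csv, "march-shipments.csv")
```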

#### Event object shape

Each object appended to `results` represents one event. The fields below mirror what Mangrove's transformation runtime accepts.

**Required**

| Field         | Type       | Description                                                                                                                                                                                                       |
| ------------- | ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `event_type`  | `string`   | Slug of the event type to create. Must match a configured event type on the project.                                                                                                                              |
| `start_time`  | `string`   | Timestamp in `%Y-%m-%d %H:%M:%S%z` format (e.g. `2024-02-01 12:00:00-04:00`).                                                                                                                                     |
| `end_time`    | `string`   | Timestamp in `%Y-%m-%d %H:%M:%S%z` format.                                                                                                                                                                        |
| `data_points` | `object[]` | Each entry has a `slug` (matching a configured data point) and a `value` matching that data point's value type. Optionally include `evidence_refs: string[]` to link to entries in `evidences` by their `ref_id`. |
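
A common pitfall is Python's default `datetime.isoformat()`, which puts a `T` between the date and time (`2024-02-01T12:00:00-04:00`) instead of the space shown above. A minimal sketch of a helper, assuming you already have a timezone-aware `datetime` (the name `to_event_time` is illustrative, not a Mangrove function):

```python
from datetime import datetime, timedelta, timezone

EVENT_TIME_FORMAT = "%Y-%m-%d %H:%M:%S%z"


def to_event_time(moment):
    """Render a timezone-aware datetime as 'YYYY-MM-DD HH:MM:SS+HH:MM'."""
    if moment.utcoffset() is None:
        # A naive datetime has no offset, so the %z part cannot be produced.
        raise ValueError("attach a tzinfo (UTC offset) before formatting")
    # The default isoformat() separator is 'T'; sep=" " matches the required format.
    return moment.isoformat(sep=" ", timespec="seconds")


start = datetime(2024, 2, 1, 12, 0, tzinfo=timezone(timedelta(hours=-4)))
start_time = to_event_time(start)  # "2024-02-01 12:00:00-04:00"
# Round-trip check: the string parses back with the documented format.
assert datetime.strptime(start_time, EVENT_TIME_FORMAT) == start
```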

**Optional**

| Field         | Type       | Description                                                                                                                                                                                                                                                                                                                                                                                                         |
| ------------- | ---------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `tracking_id` | `string`   | External identifier for deduplication and reconciliation. Required when the event type is configured to require one.                                                                                                                                                                                                                                                                                                |
| `notes`       | `string`   | Free-text notes attached to the event.                                                                                                                                                                                                                                                                                                                                                                              |
| `locations`   | `object[]` | Each entry has a `name` (string, required) and may include `lat` and `long` (float). Mangrove matches on `name`; new locations are created if no match exists.                                                                                                                                                                                                                                                      |
| `feedstock`   | `object`   | An existing project feedstock identified by `name` and `feedstock_type`. Required when the event type is configured to require one.                                                                                                                                                                                                                                                                                 |
| `evidences`   | `object[]` | Text-based evidence files to attach to the event. Each entry takes one of these shapes: <br />• `{ "url_reference": "...", "display_name": "..." }` <br />• `{ "url": "...", "name": "..." }` <br />• `{ "base64": "...", "name": "...", "type": "..." }` <br />• `{ "name": "...", "type": "...", "content": "..." }` <br />Add a `ref_id` to link the evidence to specific data points via their `evidence_refs`. |
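
Before saving a transformation, it can help to check its output against the tables above. The sketch below is a hypothetical local check (`check_event` is not a Mangrove API) covering only the rules stated here: required fields present, timestamps parseable with `%Y-%m-%d %H:%M:%S%z`, every `evidence_refs` entry matching an evidence `ref_id`, and every location having a `name`. Mangrove also validates things this cannot see locally, such as whether slugs match the project's configured event types and data points.

```python
from datetime import datetime

REQUIRED_FIELDS = ("event_type", "start_time", "end_time", "data_points")
TIME_FORMAT = "%Y-%m-%d %H:%M:%S%z"


def check_event(event):
    """Return a list of problems found in one event object (empty if none)."""
    problems = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in event]

    # Timestamps must parse with the documented format (a 'T' separator fails).
    for name in ("start_time", "end_time"):
        if name in event:
            try:
                datetime.strptime(event[name], TIME_FORMAT)
            except (TypeError, ValueError):
                problems.append(f"{name} is not in {TIME_FORMAT} format: {event[name]!r}")

    # Every evidence_refs entry should point at an evidence ref_id on the same event.
    ref_ids = {evidence["ref_id"] for evidence in event.get("evidences", []) if "ref_id" in evidence}
    for point in event.get("data_points", []):
        for ref in point.get("evidence_refs", []):
            if ref not in ref_ids:
                problems.append(f"data point {point.get('slug')!r} references unknown evidence {ref!r}")

    # Locations are matched by name, so each entry needs one.
    for location in event.get("locations", []):
        if "name" not in location:
            problems.append("location is missing a name")

    return problems


event = {
    "event_type": "shipment",
    "start_time": "2024-02-01T12:00:00-04:00",
    "end_time": "2024-02-01 15:30:00-04:00",
    "data_points": [{"slug": "shipment-mass", "value": 12.5, "evidence_refs": ["evidence_1"]}],
}
# Reports the 'T' in start_time and the dangling evidence_1 reference.
for problem in check_event(event):
    print(problem)
```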

### Editing transformations on each Data Source

Admin users can use the Transformations Editor to define which event types are generated from each Data Source.

* In **Data Inputs > Input Settings**, select an existing Data Source
* Select **Add transformation**
* Select an event type for data from the Data Source
* Edit the Python transformation in the editor

Transformations are written in Python and must populate a top-level `results` list with one object per event to generate from the import data. An example is shown below:

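```python Transformation theme={null}
from datetime import datetime

# Helper defined in the script itself (it is not provided by Mangrove). It assumes
# the time columns look like "02/01/2024 12:00" and the timezone column holds a UTC
# offset such as "-04:00"; adjust the format string to match your template.
def to_timestamp(time_value, utc_offset):
    parsed = datetime.strptime(f"{time_value} {utc_offset}", "%m/%d/%Y %H:%M %z")
    # Produces e.g. "2024-02-01 12:00:00-04:00" (%Y-%m-%d %H:%M:%S%z)
    return parsed.isoformat(sep=" ")

results = []
for row in rows:
    event_object = {
        "event_type": "<event slug>",
        "notes": row["<notes column>"],
        "start_time": to_timestamp(row["<start time column>"], row["<timezone column>"]),
        "end_time": to_timestamp(row["<end time column>"], row["<timezone column>"]),
        "locations": [
            {
                "name": row["<location name column>"],
                "lat": 43.6499286,
                "long": -79.3858228,
            }
        ],
        # Must identify an existing project feedstock
        "feedstock": {
            "name": "Dairy - Krol Farms",
            "feedstock_type": "Animal Waste",
        },
        "evidences": [
            {
                "name": "<name for evidence file>",
                "ref_id": "evidence_1",  # referenced by data points via evidence_refs
                "type": "json",
                "content": {
                    "some_key": "some_content"
                },
            }
        ],
        "data_points": [
            {
                "slug": "<datapoint 1 slug>",
                "value": row["<datapoint1 column>"],
                "evidence_refs": ["evidence_1"],
            },
            {
                "slug": "<datapoint 2 slug>",
                "value": row["<datapoint2 column>"],
            },
        ],
    }
    results.append(event_object)
```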

Here’s an example of what the `results` array might look like following execution of the transformation above:

```json results theme={null}
[
    {
        "event_type": "<event slug>",
        "notes": "This is an example event note",
        "start_time": "2024-02-01 12:00:00-04:00",
        "end_time": "2024-02-01 15:30:00-04:00",
        "locations": [
            {
                "name": "<location name>",
                "lat": 43.6499286,
                "long": -79.3858228
            }
        ],
        "feedstock": {
            "name": "Dairy - Krol Farms",
            "feedstock_type": "Animal Waste"
        },
        "evidences": [
            {
                "name": "<name for evidence file>",
                "ref_id": "evidence_1",
                "type": "json",
                "content": {
                    "some_key": "some_content"
                }
            }
        ],
        "data_points": [
            {
                "slug": "<datapoint 1 slug>",
                "value": "<datapoint 1 value>",
                "evidence_refs": ["evidence_1"]
            },
            {
                "slug": "<datapoint 2 slug>",
                "value": "<datapoint 2 value>"
            }
        ]
    }
]
```

Mangrove will process this result to create events in the `Data Inputs > Events` feed that can be used to run production models.

#### Using external packages in transformations

Apart from the Python standard library, Mangrove also supports a curated list of external packages that you can import and use in transformations:

* `mapbox`
* `boto3`
* `geopy`
* `pandas`
* `numpy`
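
As an illustration of an external package in a transformation, the sketch below uses `pandas` to sum the `Shipment Mass` column per date and emit one event per day instead of one per row. The column names and the `shipment` / `shipment-mass` slugs follow the minimal script earlier on this page; the `Timezone` column is assumed to hold a UTC offset such as `-04:00`, and `daily_shipment_events` is a hypothetical helper.

```python
from datetime import datetime

import pandas as pd


def daily_shipment_events(rows):
    """Sum "Shipment Mass" per date and emit one shipment event per day."""
    df = pd.DataFrame(rows)
    # Cells from a CSV arrive as strings; coerce the mass column to numbers.
    df["Shipment Mass"] = pd.to_numeric(df["Shipment Mass"])
    daily = df.groupby(["Date", "Timezone"], as_index=False)["Shipment Mass"].sum()

    events = []
    for record in daily.to_dict("records"):
        timestamp = datetime.strptime(
            f'{record["Date"]} {record["Timezone"]}', "%m/%d/%Y %z"
        ).isoformat(sep=" ")
        events.append({
            "event_type": "shipment",
            "start_time": timestamp,
            "end_time": timestamp,
            "data_points": [
                # float() turns the numpy scalar into a plain Python number.
                {"slug": "shipment-mass", "value": float(record["Shipment Mass"])},
            ],
        })
    return events


# Stand-in for the `rows` variable Mangrove provides.
sample_rows = [
    {"Date": "02/01/2024", "Timezone": "-04:00", "Shipment Mass": "12.5"},
    {"Date": "02/01/2024", "Timezone": "-04:00", "Shipment Mass": "7.5"},
    {"Date": "02/02/2024", "Timezone": "-04:00", "Shipment Mass": "10"},
]
results = daily_shipment_events(sample_rows)
```

In a real transformation, you would write `results = daily_shipment_events(rows)` so that the script fills the top-level `results` list from the rows Mangrove provides.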