MongoDB
Airbyte's certified MongoDB connector offers the following features:
- Change Data Capture (CDC) via MongoDB's change streams/Replica Set Oplog.
- Reliable replication of any collection size with checkpointing and chunking of data reads.
- Full refresh syncing of collections (new).
Quick Start
This section provides information about configuring the MongoDB V2 source connector. If you are upgrading from a previous version of the MongoDB V2 source connector, please refer to the upgrade instructions in this document.
New Installation/New Source Connector Configuration
Here is an outline of the minimum required steps to configure a new MongoDB V2 source connector:
- Create or discover the configuration of a MongoDB replica set, either hosted in MongoDB Atlas or self-hosted.
- Create a new MongoDB source in the Airbyte UI.
- (Airbyte Cloud only) Allow inbound traffic from Airbyte IPs.
Once this is complete, you will be able to select MongoDB as a source for replicating data.
Step 1: Create a dedicated read-only MongoDB user
These steps create a dedicated, read-only user for replicating data. Alternatively, you can use an existing MongoDB user with access to the database.
MongoDB Atlas
- Log in to the MongoDB Atlas dashboard.
- From the dashboard, click on "Database Access" under "Security".
- Click on the "+ ADD NEW DATABASE USER" button.
- On the "Add new Database User" modal dialog, choose "Password" for the "Authentication Method".
- In the "Password Authentication" section, set the username to READ_ONLY_USER in the first text box and set a password in the second text box.
- Under "Database User Privileges", click on "Select one built-in role for this user" under "Built-in Role" and choose "Only read any database".
- Enable "Restrict Access to Specific Clusters/Federated Database instances" and enable only the clusters/databases that you wish to replicate.
- Click on "Add User" at the bottom to save the user.
Self Hosted
These instructions assume that the MongoDB shell (mongosh) is installed. To install it, follow the MongoDB shell installation instructions.
- From a terminal window, launch the MongoDB shell:
> mongosh <connection string to cluster> --username <user with admin permissions>;
- Switch to the admin database:
test> use admin
switched to db admin
- Create the READ_ONLY_USER user with the read role:
admin> db.createUser({user: "READ_ONLY_USER", pwd: "READ_ONLY_PASSWORD", roles: [{role: "read", db: "TARGET_DATABASE"}]})
Replace READ_ONLY_PASSWORD with a password of your choice and TARGET_DATABASE with the name of the database to be replicated.
- Next, enable authentication, if it is not already enabled. Start by editing /etc/mongodb.conf, adding or editing these keys:
net:
  bindIp: 0.0.0.0
security:
  authorization: enabled
Setting the bindIp key to 0.0.0.0 allows connections to the database from any IP address. Setting the security.authorization key to enabled turns on access control, so only authenticated users can access the database.
- Restart the MongoDB server (mongod) so that the configuration changes take effect.
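If you script user creation with a driver instead of typing it into mongosh, the same request can be expressed as the createUser database command that the db.createUser() call above issues. The sketch below only builds that command document; the helper name build_read_only_user_command is hypothetical, and actually running it assumes a driver such as PyMongo, which appears only in a comment.

```python
# Hypothetical helper: builds the createUser database command that the mongosh
# db.createUser() call above sends, granting only the built-in "read" role.
def build_read_only_user_command(user: str, password: str, target_db: str) -> dict:
    """Return a createUser command document (the command name must be the first key)."""
    return {
        "createUser": user,
        "pwd": password,
        "roles": [{"role": "read", "db": target_db}],
    }


cmd = build_read_only_user_command("READ_ONLY_USER", "READ_ONLY_PASSWORD", "TARGET_DATABASE")
print(cmd)

# Assumption: with PyMongo installed and an admin connection, the command would be
# issued against the admin database, for example:
#   from pymongo import MongoClient
#   MongoClient("<connection string>").admin.command(cmd)
```

Keeping the role list to a single read role on one database mirrors the least-privilege intent of the dedicated user.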
Step 2: Discover the MongoDB cluster connection string
These steps outline how to discover the connection string of your MongoDB instance.
MongoDB Atlas
Atlas is MongoDB's cloud-hosted offering. Below are the steps to discover the connection configuration for a MongoDB Atlas-hosted replica set cluster:
- Log in to the MongoDB Atlas dashboard.
- From the dashboard, click on the "Connect" button of the source cluster.
- On the "Connect to <cluster name>" modal dialog, select "Shell" under the "Access your data through tools" section.
- Copy the connection string from the entry labeled "2. Run your connection string in your command line" on the modal dialog, omitting the surrounding quotation marks.
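The Atlas dialog shows a complete mongosh command, not just the connection string. A minimal sketch of pulling out only the URI and dropping the quotation marks is shown below, using Python's shlex to apply shell quoting rules. The sample command and the cluster host cluster0.example.mongodb.net are made up for illustration; the exact text Atlas displays may differ.

```python
import shlex


def extract_connection_string(shell_command: str) -> str:
    """Return the mongodb:// or mongodb+srv:// argument of a copied mongosh command."""
    # shlex.split applies POSIX shell quoting rules, which strips the quotation marks.
    for token in shlex.split(shell_command):
        if token.startswith(("mongodb://", "mongodb+srv://")):
            return token
    raise ValueError("no MongoDB connection string found")


copied = 'mongosh "mongodb+srv://cluster0.example.mongodb.net/" --apiVersion 1 --username <username>'
print(extract_connection_string(copied))  # mongodb+srv://cluster0.example.mongodb.net/
```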
Self Hosted Cluster
Self-hosted clusters are MongoDB instances that are hosted outside of MongoDB Atlas. Below are the steps to discover the connection string for a MongoDB self-hosted replica set cluster.
- Refer to the MongoDB connection string documentation for instructions on discovering a self-hosted deployment connection string.
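A self-hosted replica set connection string usually lists several member hosts and a replicaSet option, in the standard form mongodb://[user:pass@]host1[:port],host2[:port]/[database][?options]. The sketch below splits such a string into its parts with urllib.parse so you can sanity-check it before pasting it into Airbyte. The host names and the replica set name rs0 are hypothetical, and the helper deliberately ignores edge cases of the full specification (IPv6 literals, percent-decoding of credentials, SRV resolution).

```python
from urllib.parse import parse_qs, urlsplit


def describe_connection_string(uri: str) -> dict:
    """Split a MongoDB connection string into scheme, hosts, database and options."""
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # The netloc may carry credentials ("user:pass@") and a comma-separated host list.
    hosts = parts.netloc.rsplit("@", 1)[-1].split(",")
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "hosts": hosts,
        "database": parts.path.lstrip("/") or None,
        "replica_set": options.get("replicaSet"),
        "options": options,
    }


info = describe_connection_string(
    "mongodb://db1.example.net:27017,db2.example.net:27017/?replicaSet=rs0&authSource=admin"
)
print(info["hosts"], info["replica_set"])
```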
Step 3: Configure the Airbyte MongoDB Source
To configure the Airbyte MongoDB source, use the database credentials and connection string from steps 1 and 2, respectively. The source will test the connection to the MongoDB instance upon creation.
Replication Methods
The MongoDB source uses change data capture (CDC) as a reliable way to keep your data up to date. In addition, the MongoDB source supports syncing in full refresh mode.
CDC
Airbyte uses the change streams feature of a MongoDB replica set to incrementally capture inserts, updates and deletes using a replication plugin. To learn more about how Airbyte implements CDC, refer to Change Data Capture (CDC).
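As a rough illustration of how captured inserts, updates and deletes keep a downstream copy current, the sketch below applies change events to an in-memory dict keyed by _id. The event fields operationType, documentKey and fullDocument follow MongoDB's change event format; the function name and the dict standing in for the destination are hypothetical, and this is not Airbyte's implementation.

```python
def apply_change_event(replica: dict, event: dict) -> None:
    """Apply one change event to an in-memory copy keyed by _id."""
    op = event["operationType"]
    doc_id = event["documentKey"]["_id"]
    if op in ("insert", "replace", "update"):
        # Assumption: the stream supplies fullDocument for updates (as with a
        # full-document lookup). A lookup can return null if the document was
        # deleted in the meantime; a later delete event then removes it.
        full_document = event.get("fullDocument")
        if full_document is not None:
            replica[doc_id] = full_document
    elif op == "delete":
        replica.pop(doc_id, None)
    # Other event types (drop, rename, invalidate, ...) are ignored in this sketch.


events = [
    {"operationType": "insert", "documentKey": {"_id": 1}, "fullDocument": {"_id": 1, "name": "a"}},
    {"operationType": "insert", "documentKey": {"_id": 2}, "fullDocument": {"_id": 2, "name": "b"}},
    {"operationType": "update", "documentKey": {"_id": 1}, "fullDocument": {"_id": 1, "name": "a2"}},
    {"operationType": "delete", "documentKey": {"_id": 2}},
]
replica: dict = {}
for event in events:
    apply_change_event(replica, event)
print(replica)  # {1: {'_id': 1, 'name': 'a2'}}
```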
Full Refresh
The Full Refresh sync mode, added in v4.0.0, reads the entire contents of a collection on every sync. The connector checkpoints its progress during a full refresh read, so if a sync fails partway through (for example, because of a network error), it resumes the read from the last checkpoint rather than starting over.
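The resume-from-checkpoint behavior can be sketched as follows, assuming documents are read in ascending _id order and the last emitted _id is saved after each chunk. The collection is modeled as an in-memory list, and the function name, state layout and failure injection are hypothetical; the document does not describe Airbyte's actual checkpoint format.

```python
from typing import Optional


def full_refresh_read(collection: list[dict], state: dict, destination: list[dict],
                      chunk_size: int = 2, fail_at_chunk: Optional[int] = None) -> None:
    """Copy documents in _id order, checkpointing the last emitted _id after each chunk."""
    last_id = state.get("last_id")
    ordered = sorted(collection, key=lambda doc: doc["_id"])
    # Resume point: skip everything at or before the last checkpointed _id.
    remaining = [doc for doc in ordered if last_id is None or doc["_id"] > last_id]
    for chunk_no, start in enumerate(range(0, len(remaining), chunk_size)):
        if chunk_no == fail_at_chunk:
            raise ConnectionError("simulated network error")
        chunk = remaining[start:start + chunk_size]
        destination.extend(chunk)
        state["last_id"] = chunk[-1]["_id"]  # checkpoint after the chunk is emitted
    state.pop("last_id", None)  # read finished: the next full refresh starts over


collection = [{"_id": i} for i in range(1, 6)]
state: dict = {}
destination: list[dict] = []
try:
    full_refresh_read(collection, state, destination, fail_at_chunk=1)
except ConnectionError:
    checkpoint_after_failure = dict(state)  # {'last_id': 2}
full_refresh_read(collection, state, destination)  # resumes after _id 2
print([doc["_id"] for doc in destination])  # [1, 2, 3, 4, 5]
```

Because the checkpoint is written only after a chunk has been emitted, the retry neither skips nor duplicates documents in this model.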
Schema Enforcement
By default, the MongoDB V2 source connector enforces a schema. This means that during connector setup it samples a configurable number of documents and builds the set of fields to sync. An admin can then deselect specific fields on the Replication screen to exclude them from the sync.
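The sketch below shows the idea behind schema discovery by sampling: take up to a fixed number of documents, record every top-level field seen, and note the value types observed for each. It is an illustration under stated assumptions, not the connector's code: the connector samples on the server (see the Discovery Sample Size parameter), whereas this sketch uses random.sample on an in-memory list, and the function name discover_fields is hypothetical.

```python
import random
from collections import defaultdict


def discover_fields(documents: list[dict], sample_size: int, seed: int = 0) -> dict[str, list[str]]:
    """Sample up to sample_size documents and map each top-level field to the types seen."""
    rng = random.Random(seed)
    if len(documents) > sample_size:
        sample = rng.sample(documents, sample_size)
    else:
        sample = documents
    fields: dict[str, set[str]] = defaultdict(set)
    for doc in sample:
        for name, value in doc.items():
            fields[name].add(type(value).__name__)
    return {name: sorted(types) for name, types in fields.items()}


docs = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b", "age": 30},
    {"_id": 3, "age": "unknown"},
]
schema = discover_fields(docs, sample_size=1_000)
print(schema)  # {'_id': ['int'], 'name': ['str'], 'age': ['int', 'str']}

# Deselecting a field on the Replication screen is, in effect, dropping it from the set.
selected = {name: types for name, types in schema.items() if name != "age"}
```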
When the Schema Enforced option is disabled, MongoDB collections are read in schema-less mode, which does not assume that documents share the same structure. This allows greater flexibility when reading data that is unstructured or varies considerably between documents in a single collection. When schema is not enforced, each document generates a record that contains only the following top-level fields:
{
  "_id": <document id>,
  "data": {<a JSON object containing the entire set of fields found in the document>}
}
The contents of data vary according to the contents of each document read from MongoDB.
Unlike in schema-enforced mode, the same field can vary in type between documents. For example, field "xyz" may be a string in one document and a date in another.
As a result, no field is omitted and no document is rejected.
When schema is not enforced, there is no way to deselect fields, as all fields are read for every document.
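A minimal sketch of the schema-less record shape described above: each document is wrapped as {"_id": ..., "data": {...}} without inspecting or coercing field types, so a field such as "xyz" can hold a string in one record and a date in another. The helper name to_schemaless_record and the use of json.dumps with default=str for dates are assumptions for illustration; the connector's actual serialization of BSON types is not described here.

```python
import datetime
import json


def to_schemaless_record(document: dict) -> dict:
    """Wrap a document as {"_id": ..., "data": {...}} without inspecting field types."""
    # Every field, including _id, is kept under "data"; nothing is dropped or coerced.
    return {"_id": document["_id"], "data": dict(document)}


documents = [
    {"_id": "a", "xyz": "hello"},
    {"_id": "b", "xyz": datetime.date(2024, 1, 1)},  # same field, different type
]
records = [to_schemaless_record(doc) for doc in documents]
for record in records:
    # default=str is an assumption for serializing non-JSON types such as dates.
    print(json.dumps(record, default=str))
```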
Limitations & Troubleshooting
To see connector limitations, or troubleshoot your MongoDB connector, see more in our MongoDB troubleshooting guide.
Configuration Parameters
Parameter Name | Description |
---|---|
Cluster Type | The type of the MongoDB cluster (MongoDB Atlas replica set or self-hosted replica set). |
Connection String | The connection string of the source MongoDB cluster. For Atlas hosted clusters, see the quick start guide for steps to find the connection string. For self-hosted clusters, refer to the MongoDB connection string documentation for more information. |
Database Name | The name of the database that contains the source collection(s) to sync. |
Username | The username which is used to access the database. Required for MongoDB Atlas clusters. |
Password | The password associated with this username. Required for MongoDB Atlas clusters. |
Authentication Source | (MongoDB Atlas clusters only) Specifies the database that the supplied credentials should be validated against. Defaults to admin. See the MongoDB documentation for more details. |
Schema Enforced | Controls whether schema is discovered and enforced. See discussion in Schema Enforcement. |
Initial Waiting Time in Seconds (Advanced) | The amount of time the connector will wait when it launches to determine if there is new data to sync or not. Defaults to 300 seconds. Valid range: 120 seconds to 1200 seconds. |
Size of the queue (Advanced) | The size of the internal queue. This setting affects the connector's memory consumption and efficiency, so change it with care. |
Discovery Sample Size (Advanced) | The maximum number of documents to sample when attempting to discover the unique fields for a collection. Default is 10,000 with a valid range of 1,000 to 100,000. See the MongoDB sampling method for more details. |
Update Capture Mode (Advanced) | Determines how Airbyte looks up the value of an updated document. Default is "Lookup". IMPORTANT: "Post image" is only supported in MongoDB version 6.0+. In addition, the collections of interest must be set up to return pre- and post-images. Failure to do so will lead to data loss. |
For more information regarding configuration parameters, please see the MongoDB documentation.
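The ranges and defaults in the table can be checked before a connection is saved. The sketch below validates three of the advanced settings against the documented limits, including the MongoDB 6.0+ requirement for "Post image". The dataclass and its field names are hypothetical and are not Airbyte's configuration keys; the sketch also cannot verify that pre- and post-images are actually enabled on each collection (in MongoDB 6.0+ this is the collMod option changeStreamPreAndPostImages).

```python
from dataclasses import dataclass


@dataclass
class AdvancedOptions:
    """Hypothetical container for three advanced settings; defaults follow the table."""
    initial_waiting_seconds: int = 300
    discovery_sample_size: int = 10_000
    update_capture_mode: str = "Lookup"


def validate_advanced_options(options: AdvancedOptions, server_major_version: int) -> list[str]:
    """Return a list of problems; an empty list means the settings are acceptable."""
    problems = []
    if not 120 <= options.initial_waiting_seconds <= 1200:
        problems.append("Initial Waiting Time must be between 120 and 1200 seconds")
    if not 1_000 <= options.discovery_sample_size <= 100_000:
        problems.append("Discovery Sample Size must be between 1,000 and 100,000")
    if options.update_capture_mode not in ("Lookup", "Post image"):
        problems.append(f"Unknown Update Capture Mode: {options.update_capture_mode!r}")
    elif options.update_capture_mode == "Post image" and server_major_version < 6:
        problems.append('"Post image" requires MongoDB 6.0 or later')
    return problems


print(validate_advanced_options(AdvancedOptions(), server_major_version=7))  # []
print(validate_advanced_options(
    AdvancedOptions(initial_waiting_seconds=60, update_capture_mode="Post image"),
    server_major_version=5,
))
```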