DocumentDB Data Migration Tool

Overview


The DocumentDB Data Migration Tool is an open source solution to import data to DocumentDB from a variety of sources, including:

  • JSON files

  • MongoDB

  • SQL Server

  • CSV files

  • Azure Table storage

  • DocumentDB collections


While the import tool includes a GUI (dtui.exe), it can also be driven from the command line (dt.exe). In fact, there is an option to output the associated command after setting up an import through the UI. Tabular source data (e.g. from SQL Server or CSV files) can be transformed during import so that hierarchical relationships (subdocuments) are created (more on this below).

This document includes information on:

  • Each source option

  • Sample command lines to import from each source

  • Each target option

  • Viewing import results

Installation

The migration tool source code is available on GitHub in this repository, and a compiled version is available from the Microsoft Download Center. You may either compile the solution or simply download and extract the compiled version to a directory of your choice. Then run either:

  • Dtui.exe: Graphical interface version of the tool

  • Dt.exe: Command-line version of the tool

Source Options

JSON File(s)


The JSON file source importer option allows you to import one or more single document JSON files or JSON files that each contain an array of JSON documents. When adding folders that contain JSON files to import, you have the option of recursively searching for files in subfolders:



Here are some command line samples for JSON file import:

Import a single JSON file

dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Sessions /t.CollectionTier:S3
Import a directory of JSON files

dt.exe /s:JsonFile /s.Files:C:\TESessions\*.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Sessions /t.CollectionTier:S3
Import a directory (including sub-directories) of JSON files

dt.exe /s:JsonFile /s.Files:C:\LastFMMusic\**\*.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Music /t.CollectionTier:S3
Import a directory (single), directory (recursive), and individual JSON files

dt.exe /s:JsonFile /s.Files:C:\Tweets\*.*;C:\LargeDocs\**\*.*;C:\TESessions\Session48172.json;C:\TESessions\Session48173.json;C:\TESessions\Session48174.json;C:\TESessions\Session48175.json;C:\TESessions\Session48177.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:subs /t.CollectionTier:S3
Import a single JSON file and partition the data across 4 collections

dt.exe /s:JsonFile /s.Files:D:\CompanyData\Companies.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:comp[1-4] /t.PartitionKey:name /t.CollectionTier:S3

MongoDB


The MongoDB source importer option allows you to import from an individual MongoDB collection and optionally filter documents using a query and/or modify the document structure by using a projection.



The connection string is in the standard MongoDB format:

mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database>

Tip: Use the Verify command to ensure that the MongoDB instance specified in the connection string field can be accessed.

Enter the name of the collection from which data will be imported. You may optionally specify or provide a file for a query (e.g. {pop: {$gt:5000}}) and/or projection (e.g. {loc:0}) to both filter and shape the data to be imported.

Here are some command line samples to import from MongoDB:

Import all documents from a MongoDB collection

dt.exe /s:MongoDB /s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database> /s.Collection:zips /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:BulkZips /t.IdField:_id /t.CollectionTier:S3
Import documents from a MongoDB collection which match the query and exclude the loc field

dt.exe /s:MongoDB /s.ConnectionString:mongodb://<dbuser>:<dbpassword>@<host>:<port>/<database> /s.Collection:zips /s.Query:{pop:{$gt:50000}} /s.Projection:{loc:0} /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:BulkZipsTransform /t.IdField:_id /t.CollectionTier:S3

MongoDB export files


The MongoDB export JSON file source importer option allows you to import one or more JSON files produced from the mongoexport utility.



When adding folders that contain MongoDB export JSON files for import, you have the option of recursively searching for files in subfolders.

Here is a command line sample to import from MongoDB export JSON files:

dt.exe /s:MongoDBExport /s.Files:D:\mongoemployees.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:employees /t.IdField:_id /t.Dates:Epoch /t.CollectionTier:S3

SQL Server


The SQL source importer option allows you to import from an individual SQL Server database and optionally filter the records to be imported using a query. In addition, you can modify the document structure by specifying a nesting separator (more on that in a moment).



The format of the connection string is the standard SQL connection string format.

Tip: Use the Verify command to ensure that the SQL Server instance specified in the connection string field can be accessed.

The nesting separator property is used to create hierarchical relationships (sub-documents) during import. Consider the following SQL query:

select CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as [Address.AddressType], AddressLine1 as [Address.AddressLine1], City as [Address.Location.City], StateProvinceName as [Address.Location.StateProvinceName], PostalCode as [Address.PostalCode], CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses WHERE AddressType='Main Office'

Which returns the following (partial) results:



Note the aliases such as Address.AddressType and Address.Location.StateProvinceName. By specifying a nesting separator of ‘.’, the import tool will create Address and Address.Location subdocuments during the import. Here is an example of a resulting document in DocumentDB:

{
    "id": "956",
    "Name": "Finer Sales and Service",
    "Address": {
        "AddressType": "Main Office",
        "AddressLine1": "#500-75 O'Connor Street",
        "Location": {
            "City": "Ottawa",
            "StateProvinceName": "Ontario"
        },
        "PostalCode": "K4B 1S2",
        "CountryRegionName": "Canada"
    }
}

Here are some command line samples for SQL import:

Import records from SQL which match a query

dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, * from Sales.vStoreWithAddresses WHERE AddressType='Main Office'" /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Stores /t.IdField:Id /t.CollectionTier:S3
Import records from SQL which match a query and create hierarchical relationships

dt.exe /s:SQL /s.ConnectionString:"Data Source=<server>;Initial Catalog=AdventureWorks;User Id=advworks;Password=<password>;" /s.Query:"select CAST(BusinessEntityID AS varchar) as Id, Name, AddressType as [Address.AddressType], AddressLine1 as [Address.AddressLine1], City as [Address.Location.City], StateProvinceName as [Address.Location.StateProvinceName], PostalCode as [Address.PostalCode], CountryRegionName as [Address.CountryRegionName] from Sales.vStoreWithAddresses WHERE AddressType='Main Office'" /s.NestingSeparator:. /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:StoresSub /t.IdField:Id /t.CollectionTier:S3

CSV File(s)


The CSV file source allows you to import one or more CSV files. When adding folders which contain CSV files for import, you have the option of recursively searching for files in subfolders:



Similar to the SQL source, the nesting separator property may be used to create hierarchical relationships (sub-documents) during import. Consider the following CSV header row and data rows:



Note the aliases such as DomainInfo.Domain_Name and RedirectInfo.Redirecting. By specifying a nesting separator of ‘.’, the import tool will create DomainInfo and RedirectInfo subdocuments during the import. Here is an example of a resulting document in DocumentDB:

{
    "DomainInfo": {
        "Domain_Name": "ACUS.GOV",
        "Domain_Name_Address": "http://www.ACUS.GOV"
    },
    "Federal Agency": "Administrative Conference of the United States",
    "RedirectInfo": {
        "Redirecting": "0",
        "Redirect_Destination": ""
    },
    "id": "9cc565c5-ebcd-1c03-ebd3-cc3e2ecd814d"
}

The import tool will attempt to infer type information for unquoted values in CSV files (quoted values are always treated as strings). Types are identified in the following order: number, datetime, boolean.
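For illustration, consider a hypothetical CSV row such as the following:

Name,Age,Joined,Active
"Jane",31,2013-10-21,true

The unquoted values 31, 2013-10-21, and true would be inferred as a number, a datetime, and a boolean respectively, while the quoted value "Jane" is always treated as a string. (How a datetime value is ultimately persisted depends on the date handling option of the chosen target.)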

There are two other things to note with CSV import:

  1. By default, unquoted values are always trimmed for tabs and spaces, while quoted values are preserved as-is. This behavior can be overridden with the Trim quoted values checkbox or the /s.TrimQuoted command line option.


  2. By default, an unquoted null is treated as a null value. This behavior can be overridden (i.e. treat an unquoted null as a “null” string) with the Treat unquoted NULL as string checkbox or the /s.NoUnquotedNulls command line option.

Here is a command line sample for CSV import:

dt.exe /s:CsvFile /s.Files:.\Employees.csv /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Employees /t.IdField:EntityID /t.CollectionTier:S3
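
To additionally trim whitespace inside quoted values and to treat an unquoted null as the literal string "null" (overriding the defaults described above), the two switches named earlier can be combined, for example:

dt.exe /s:CsvFile /s.Files:.\Employees.csv /s.TrimQuoted /s.NoUnquotedNulls /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Employees /t.IdField:EntityID /t.CollectionTier:S3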

Azure Table Storage


The Azure Table storage source importer option allows you to import from an individual Azure Table storage table and optionally filter the table entities to be imported.



The format of the Azure Table storage connection string is:

DefaultEndpointsProtocol=<protocol>;AccountName=<Account Name>;AccountKey=<Account Key>;

Tip: Use the Verify command to ensure that the Azure Table storage instance specified in the connection string field can be accessed.

Enter the name of the Azure table from which data will be imported. You may optionally specify a filter.

The Azure Table storage source importer option has the following additional options:

  1. Include Internal Fields

    1. All - Include all internal fields (PartitionKey, RowKey, and Timestamp)

    2. None - Exclude all internal fields

    3. RowKey - Only include the RowKey field

  2. Select Columns

    1. Azure Table storage filters do not support projections. If you want to only import specific Azure Table entity properties, add them to the Select Columns list. All other entity properties will be ignored.

Here is a command line sample to import from Azure Table storage:

dt.exe /s:AzureTable /s.ConnectionString:"DefaultEndpointsProtocol=https;AccountName=<Account Name>;AccountKey=<Account Key>" /s.Table:metrics /s.InternalFields:All /s.Filter:"PartitionKey eq 'Partition1' and RowKey gt '00001'" /s.Projection:ObjectCount;ObjectSize /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:metrics /t.CollectionTier:S3


DocumentDB


The DocumentDB source importer option allows you to import from one or more DocumentDB collections and optionally filter documents using a query.



The format of the DocumentDB connection string is:

AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;

Tip: Use the Verify command to ensure that the DocumentDB instance specified in the connection string field can be accessed.

To import from a single DocumentDB collection, enter the name of the collection from which data will be imported. To import from multiple DocumentDB collections, provide a regular expression to match one or more collection names (e.g. collection01 | collection02 | collection03). You may optionally specify - or provide a file for - a query to both filter and shape the data to be imported.

Note: Since the collection field accepts regular expressions, if you are importing from a single collection whose name contains regular expression characters, then those characters will need to be escaped accordingly.
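For example, to import from a single collection with a (hypothetical) name such as data.2017, the period would need to be escaped so it is not interpreted as a regular expression wildcard:

dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /s.Collection:data\.2017 /t:JsonFile /t.File:data2017.json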

The DocumentDB source importer option has the following advanced options:

  1. Include Internal Fields: Specifies whether or not to include DocumentDB document system properties in the export (e.g. _rid, _ts).

  2. Number of Retries on Failure: Specifies the number of times to retry the connection to DocumentDB in case of transient failures (e.g. network connectivity interruption).

  3. Retry Interval: Specifies how long to wait between retrying the connection to DocumentDB in case of transient failures (e.g. network connectivity interruption).

  4. Connection Mode: Specifies the connection mode to use with DocumentDB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.



Tip: The import tool defaults to connection mode DirectTcp. If you experience firewall issues, switch to connection mode Gateway, as it only requires port 443.

Here are some command line samples to import from DocumentDB:

Migrate data from one DocumentDB collection to another DocumentDB collection

dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /s.Collection:TEColl /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:TESessions /t.CollectionTier:S3
Migrate data from multiple DocumentDB collections to a single DocumentDB collection

dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /s.Collection:comp1|comp2|comp3|comp4 /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:singleCollection /t.CollectionTier:S3

Export a DocumentDB collection to a JSON file

dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /s.Collection:StoresSub /t:JsonFile /t.File:StoresExport.json /t.Overwrite

Target Options



DocumentDB – Bulk import


The DocumentDB Bulk importer allows you to import from any of the available source options, using a DocumentDB stored procedure for efficiency. The tool supports import to a single DocumentDB collection, as well as sharded import, whereby data is partitioned across multiple DocumentDB collections. Read more about partitioning data in DocumentDB here. The tool will create, execute, and then delete the stored procedure from the target collection(s).



The format of the DocumentDB connection string is:

AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;

Tip: Use the Verify command to ensure that the DocumentDB instance specified in the connection string field can be accessed.

To import to a single collection, enter the name of the collection to which data will be imported and click the Add button. To import to multiple collections, either enter each collection name individually or use the following syntax to specify multiple collections: collection_prefix[start index - end index]. When specifying multiple collections via the aforementioned syntax, keep the following in mind:

  1. Only integer range name patterns are supported. For example, specifying collection[0-3] will produce the following collections: collection0, collection1, collection2, collection3.

  2. You can use an abbreviated syntax: collection[3] will emit the same set of collections as in step 1.

  3. More than one substitution can be provided. For example, collection[0-1][0-9] will generate 20 collection names with leading zeros (collection00, collection01, ..., collection19).

Once the collection name(s) have been specified, choose the desired pricing tier of the collection(s) (S1, S2, or S3). For best import performance, choose S3.

Tip: The performance tier setting only applies to collection creation. If the specified collection already exists, its pricing tier will not be modified.

When importing to multiple collections, the import tool supports hash based sharding. In this scenario, specify the document property you wish to use as the Partition Key (if Partition Key is left blank, documents will be sharded randomly across the target collections).

You may optionally specify which field in the import source should be used as the DocumentDB document id property during the import (note that if documents do not contain this property, then the import tool will generate a GUID as the id property value).

There are a number of advanced options available during import. First, while the tool includes a default bulk import stored procedure (BulkInsert.js), you may choose to specify your own import stored procedure:



Additionally, when importing date types (e.g. from SQL Server or MongoDB), you can choose between three import options:



  • String: persist as a string value

  • Epoch: persist as an Epoch number value

  • Both: Persist both string and Epoch number values. This option will create a subdocument, for example:

    "date_joined": {
        "Value": "2013-10-21T21:17:25.2410000Z",
        "Epoch": 1382390245
    }

The DocumentDB Bulk importer has the following additional advanced options:

  1. Batch Size: The tool defaults to a batch size of 50. If the documents to be imported are large, consider lowering the batch size. Conversely, if the documents to be imported are small, consider raising the batch size.

  2. Max Script Size (bytes): The tool defaults to a max script size of 960 KB.

  3. Disable Automatic Id Generation: If every document to be imported contains an id field, then selecting this option can increase performance. Documents missing a unique id field will not be imported.

  4. Number of Retries on Failure: Specifies the number of times to retry the connection to DocumentDB in case of transient failures (e.g. network connectivity interruption).

  5. Retry Interval: Specifies how long to wait between retrying the connection to DocumentDB in case of transient failures (e.g. network connectivity interruption).

  6. Connection Mode: Specifies the connection mode to use with DocumentDB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.



Tip: The import tool defaults to connection mode DirectTcp. If you experience firewall issues, switch to connection mode Gateway, as it only requires port 443.
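
As a sketch of how these advanced options combine on the command line, and assuming they are exposed as switches named /t.BatchSize and /t.DisableIdGeneration (verify the exact switch names against your version of the tool), a bulk import tuned for small documents that already carry an id field might look like:

dt.exe /s:JsonFile /s.Files:.\SmallDocs.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:SmallDocs /t.BatchSize:200 /t.DisableIdGeneration /t.CollectionTier:S3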

DocumentDB – Sequential record import


The DocumentDB sequential record importer allows you to import from any of the available source options on a record by record basis. You might choose this option if you're importing to an existing collection that has reached its quota of stored procedures. The tool supports import to a single DocumentDB collection, as well as sharded import, whereby data is partitioned across multiple DocumentDB collections. Read more about partitioning data in DocumentDB here.



The format of the DocumentDB connection string is:

AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;

Tip: Use the Verify command to ensure that the DocumentDB instance specified in the connection string field can be accessed.

To import to a single collection, enter the name of the collection to which data will be imported and click the Add button. To import to multiple collections, either enter each collection name individually or use the following syntax to specify multiple collections: collection_prefix[start index - end index]. When specifying multiple collections via the aforementioned syntax, keep the following in mind:

  1. Only integer range name patterns are supported. For example, specifying collection[0-3] will produce the following collections: collection0, collection1, collection2, collection3.

  2. You can use an abbreviated syntax: collection[3] will emit the same set of collections as in step 1.

  3. More than one substitution can be provided. For example, collection[0-1][0-9] will generate 20 collection names with leading zeros (collection00, collection01, ..., collection19).

When importing to multiple collections, the import tool supports hash based sharding. In this scenario, specify the document property you wish to use as the Partition Key (if Partition Key is left blank, documents will be sharded randomly across the target collections).

You may optionally specify which field in the import source should be used as the DocumentDB document id property during the import (note that if documents do not contain this property, then the import tool will generate a GUID as the id property value).

Once the collection name(s) have been specified, choose the desired pricing tier of the collection(s) (S1, S2, or S3). For best import performance, choose S3.

Tip: The performance tier setting only applies to collection creation. If the specified collection already exists, its pricing tier will not be modified.

There are a number of advanced options available during import. First, when importing date types (e.g. from SQL Server or MongoDB), you can choose between three import options:



  • String: persist as a string value

  • Epoch: persist as an Epoch number value

  • Both: Persist both string and Epoch number values. This option will create a subdocument, for example:

    "date_joined": {
        "Value": "2013-10-21T21:17:25.2410000Z",
        "Epoch": 1382390245
    }

The DocumentDB sequential record importer has the following additional advanced options:

  1. Number of Parallel Requests: The tool defaults to 2 parallel requests. If the documents to be imported are small, consider raising the number of parallel requests. Note that if this number is raised too much, the import may experience throttling.

  2. Disable Automatic Id Generation: If every document to be imported contains an id field, then selecting this option can increase performance. Documents missing a unique id field will not be imported.

  3. Number of Retries on Failure: Specifies the number of times to retry the connection to DocumentDB in case of transient failures (e.g. network connectivity interruption).

  4. Retry Interval: Specifies how long to wait between retrying the connection to DocumentDB in case of transient failures (e.g. network connectivity interruption).

  5. Connection Mode: Specifies the connection mode to use with DocumentDB. The available choices are DirectTcp, DirectHttps, and Gateway. The direct connection modes are faster, while the gateway mode is more firewall friendly as it only uses port 443.



Tip: The import tool defaults to connection mode DirectTcp. If you experience firewall issues, switch to connection mode Gateway, as it only requires port 443.
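
As a sketch, and assuming the sequential importer is selected with /t:DocumentDB and the parallel request count is exposed as a /t.ParallelRequests switch (verify both names against your version of the tool), a sequential import of small documents might look like:

dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDB /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Sessions /t.ParallelRequests:4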

JSON file


The DocumentDB JSON exporter allows you to export any of the available source options to a JSON file that contains an array of JSON documents. The tool will handle the export for you, or you can choose to view the resulting migration command and run the command yourself.



You may optionally choose to prettify the resulting JSON, which will increase the size of the resulting document while making the contents more human readable.
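
Assuming the prettify option is exposed on the JSON file target as a /t.Prettify switch (verify against your version of the tool), a prettified export might look like:

dt.exe /s:DocumentDB /s.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /s.Collection:Stores /t:JsonFile /t.File:StoresExport.json /t.Overwrite /t.Prettify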

Standard JSON export

[{"id":"Sample","Title":"About Paris","Language":{"Name":"English"},"Author":{"Name":"Don","Location":{"City":"Paris","Country":"France"}},"Content":"Don's document in DocumentDB is a valid JSON document as defined by the JSON spec.","PageViews":10000,"Topics":[{"Title":"History of Paris"},{"Title":"Places to see in Paris"}]}]
Prettified JSON export

[
    {
        "id": "Sample",
        "Title": "About Paris",
        "Language": {
            "Name": "English"
        },
        "Author": {
            "Name": "Don",
            "Location": {
                "City": "Paris",
                "Country": "France"
            }
        },
        "Content": "Don's document in DocumentDB is a valid JSON document as defined by the JSON spec.",
        "PageViews": 10000,
        "Topics": [
            {
                "Title": "History of Paris"
            },
            {
                "Title": "Places to see in Paris"
            }
        ]
    }
]

Advanced Configuration


In the Advanced configuration screen, specify the location of the log file to which you would like any errors written. The following rules apply to this page:

  1. If a file name is not provided, then all errors will be returned on the Results page.

  2. If a file name is provided without a directory, then the file will be created (or overwritten) in the current environment directory.

  3. If you select an existing file, then the file will be overwritten; there is no append option.
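
Assuming the error log location is exposed on the command line as an /ErrorLog switch (verify against your version of the tool), an import that writes any failures to a file might look like:

dt.exe /s:JsonFile /s.Files:.\Sessions.json /t:DocumentDBBulk /t.ConnectionString:"AccountEndpoint=<DocumentDB Endpoint>;AccountKey=<DocumentDB Key>;Database=<DocumentDB Database>;" /t.Collection:Sessions /t.CollectionTier:S3 /ErrorLog:import-errors.csv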


Confirm Import Settings and View Command


  1. After specifying source and target information, review the migration information and, optionally, view/copy the resulting migration command (copying the command is useful to automate import operations):






  2. Once you're satisfied with your source and target options, click Import. The elapsed time, transferred count, and failure information (if you didn't provide a file name in the Advanced configuration) will update while the import is in progress. Once complete, you can export the results (e.g. to deal with any import failures).




  3. You may also start a new import, either keeping the existing settings (e.g. connection string information, source and target choice, etc.) or resetting all values.


