Monday, 30 June 2014

CLUSTERING: Challenges and Opportunities for Big Data

Demystifying Big Data Analytics

Predictive Cyber Security: Big Data Analytics

Amplifying Security Intelligence with Big Data and Advanced Analytics

Acting on Analytics: How to Build a Data-Driven Enterprise

https://www.brighttalk.com/webcast/1829/80223

Security Analytics for Targeted Attack Detection, Incident Response & Forensics

How Big Data analytics can protect your brand and revenues

Beyond attack signatures: Leveraging realtime analytics to pinpoint threats

https://www.brighttalk.com/webcast/574/108031

Big Data and Investigative Analytics: The New BI Frontier

All you want to know about IT Operational Analytics

Big Data Analytics: How to Generate Return on Information

Hadoop or Bust- Key Considerations for High-Performance Analytics Platform

Big Data: Let's Cut Through the Hype and Get Serious

High Performance Analytics and Big Data with Microsoft

Fighting Advanced Threats with Big Data Analytics

Webtrends Streams - The New Era of Web Analytics

Powering Customer Analytics with Embedded Cloud Integration

How predictive analytics can prevent business outages

Cool Enterprise Apps: How to Add Personalized Analytics

Real-time Analytics from Small Data, Big Data and Huge Data

Hadoop or Bust- Key Considerations for High-Performance Analytics Platform

Analytics for All

Proving Social ROI with Predictive Social Analytics

Transforming Business with Advanced Analytics & Intel Xeon E7 v2

Why Big Data Analytics Needs Business Intelligence Too

Customer Case Studies of Self-Service Big Data Analytics

SAP HANA One Success Story: Performing Text Analytics with SAP HANA One

Risk management and analytics in the real world: Turning problems into profits

Tapping Into The Benefits Of Next Generation Store Analytics

Cloud Analytics: Why it matters in a Hybrid Cloud Management solution

Webtrends Streams - The New Era of Web Analytics

How Big Data analytics can protect your brand and revenues

Customer Analytics: Turn Big Data into Big Value

Self-Service BI + Analytics = Productivity and Smarter Business Decisions

Customer Case Studies of Self-Service Big Data Analytics

How to Build a Data Driven Business Using Data Analytics

Solving IO Bottlenecks in Big Data Analytics

All you want to know about IT Operational Analytics

Hadoop or Bust- Key Considerations for High-Performance Analytics Platform

Increasing Service Management Performance with IT Operations Analytics

Big Data: Let's Cut Through the Hype and Get Serious

Acting on Analytics: How to Build a Data-Driven Enterprise

Big Data and Investigative Analytics: The New BI Frontier

Business unIntelligence: Beyond Analytics and Big Data

The Power of Predictive Analytics: Game-Changing Strategies for Marketing

Demystifying Big Data Analytics

Big Data – Text Analytics

High Performance, Real-Time Analytics: Reality or Myth?

Cloud Analytics: Why it matters in a Hybrid Cloud Management solution

Webtrends Analytics for Optimization

Analytics for All

How to Build a Data Driven Business Using Data Analytics

Taking the Pulse of Healthcare: Big Data and Analytics in the Cloud

Revolutionary Business Analytics

Delivering Cloud-based Big Analytic Solutions for Financial Services

Analytics at the Speed of Social - 5 Ways to Analyze Your Social Data

BEYOND BI: Big Data Analytics -The New Path to Value

Maximising Your Revenue With Big Data Customer Analytics

Understand Customer Needs related to SAP HANA Advanced Analytics

Scalable Cross-Platform R-Based Predictive Analytics

Saturday, 28 June 2014

MuleSoft's New API Platform: An Interview with Ross Mason

Creating a Mapping, Session, Workflow using Informatica Powercenter 8.6.1

Informatica Tutorial - Working with Text Files

Informatica Tutorial Part 5

Informatica Tutorial Part 4

Informatica Tutorial Part 3

Informatica Tutorial Part 2

Informatica Tutorial For Beginners Part 1

TOGAF - a quick guide

The Exascale Architecture

Archimate 2.0 for Solution Architects

Introduction to Enterprise Architecture

An Enterprise Architecture introduction

Authentication & Resource Sharing over the Web: OAuth protocol


If you reached this blog and you are not a Mule user (yet), keep reading; I will not cover anything Mule-specific. If you are new to OAuth or want an introduction to its concepts, this post is the right one!
Authentication is vital in any kind of system, but it is even more relevant when it comes to the web. As the web grows, more and more sites rely on distributed services and cloud computing. With resources spread all over the web, sharing them across multiple sites is not an unrealistic requirement, considering scenarios like these: a photo lab printing your Flickr photos, a social network using your Google address book to look for friends, or a third-party application utilizing APIs from multiple services. In order for these applications to access user data on other sites, they ask for usernames and passwords. Not only does this require exposing user passwords to someone else, it also gives these applications unlimited access to do as they wish.

The valet key metaphor

I will borrow this metaphor from Eran Hammer-Lahav and use it as a starting point to go over the OAuth concepts:
“Many luxury cars today come with a valet key. It is a special key you give the parking attendant and unlike your regular key, will not allow the car to drive more than a mile or two. Some valet keys will not open the trunk, while others will block access to your onboard cell phone address book. Regardless of what restrictions the valet key imposes, the idea is very clever. You give someone limited access to your car with a special key, while using another key to unlock everything else.”
How is this related to authentication and OAuth? The regular key would be your username and password: whoever has them can access your resources without any limit (just as if they were you). So the question is what you can do when you want to give someone limited access to your resources without disclosing your credentials, while still being able to revoke that access at any time. Keeping these things in mind, we can summarize what we want from such a method:
  • give partial/restricted access to the protected resources;
  • avoid disclosing your credentials to a third party site;
  • revoke access to these resources at any time.

OAuth

The solution to this problem is OAuth. OAuth provides a method for users to grant third-party access to their resources without sharing their passwords. It also provides a way to grant limited access (in scope, duration, etc.). I will leave the details of OAuth 1.0 and OAuth 2.0 to a later post and focus on the basics for now. Associate the following concepts with the valet key metaphor. Stay with me!
  • Service Provider: a web application that allows access via OAuth.
  • Resource Owner/User: an individual who has an account with the Service Provider.
  • Consumer: a website or application that uses OAuth to access the Service Provider on behalf of the User.
  • Protected Resource: data controlled by the Service Provider, which the Consumer can access through authentication.
Have you figured out how these terms map to the valet key metaphor? The Protected Resource is the luxury car. The Resource Owner is the owner of the car. The Consumer is the parking attendant. The Service Provider… well, it is not as clear-cut: in the metaphor the Resource Owner also plays the Service Provider's part, because the car owner not only owns the resource but also has the means to provide access to it.
Instead of giving its credentials to the Consumer, the Resource Owner authenticates against the Service Provider, and the Service Provider passes a Token to the Consumer, with which it can access the Protected Resources. This Token provides limited access to the resources; for example, the Consumer will not be able to change the Resource Owner’s password, as it could if it held the username and password! Additionally, the Resource Owner can revoke access at any time, invalidating the Token the Consumer holds.
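To make the token idea concrete, here is a minimal, hypothetical sketch in plain Java (java.net.http, Java 11+) of a Consumer calling a Protected Resource with an access token previously issued by the Service Provider. The URL and token value are placeholders, and the Bearer header shown here is the OAuth 2.0 style; OAuth 1.0 signs requests differently, which I will cover in the next post.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsumerExample {
    public static void main(String[] args) throws Exception {
        // Token previously issued by the Service Provider after the
        // Resource Owner granted access; placeholder value.
        String accessToken = "ya29.EXAMPLE_TOKEN";

        HttpClient client = HttpClient.newHttpClient();

        // The Consumer never sees the Resource Owner's password; it only
        // presents the limited-scope token (the "valet key").
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.example.com/v1/contacts"))
                .header("Authorization", "Bearer " + accessToken)
                .GET()
                .build();

        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}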

Conclusion

I have not yet dealt with the complexities of OAuth (version 1.0 in particular), but as you can see this mechanism allows sharing resources across multiple sites in a safe way. Also, the user who owns these resources stays in control, as he/she can revoke access to third-party sites at any time. Most sites have an “Application” menu in your personal settings that lists all of the sites you have allowed access to and gives you the option to cancel their access. The terminology can be confusing, so just make sure you remember the valet key story and the concepts will clearly emerge.
In my next post I will go over OAuth 1.0 flow and then finally I will show how Mule can help you easily build integrations around APIs that use OAuth. Stay tuned!
Follow: @mulejockey @federecio 

Source: http://blogs.mulesoft.org/authentication-resource-sharing-over-the-web-oauth-protocol/

Encrypting passwords in Mule

Jasypt is an open source Java library which provides basic encryption capabilities through a high-level API. This library can be used with Mule to avoid clear-text passwords for connectors and endpoints. First, download the latest distribution, unpack it and copy the icu4j and jasypt jars to the MULE_HOME/lib/user directory.



Then add the following snippet to your Mule config file:

<!-- -->
<!-- Configuration for encryptor, based on environment variables. -->
<!-- -->
<!-- In this example, the encryption password will be read from an -->
<!-- environment variable called "MULE_ENCRYPTION_PASSWORD" which, once -->
<!-- the application has been started, could be safely unset. -->
<!-- -->
<spring:bean id="environmentVariablesConfiguration"
    class="org.jasypt.encryption.pbe.config.EnvironmentStringPBEConfig">
    <spring:property name="algorithm" value="PBEWithMD5AndDES" />
    <spring:property name="passwordEnvName" value="MULE_ENCRYPTION_PASSWORD" />
</spring:bean>
<!-- -->
<!-- This will be the encryptor used for decrypting configuration values. -->
<!-- -->
<spring:bean id="configurationEncryptor" class="org.jasypt.encryption.pbe.StandardPBEStringEncryptor">
    <spring:property name="config" ref="environmentVariablesConfiguration" />
</spring:bean>
<!-- -->
<!-- The EncryptablePropertyPlaceholderConfigurer will read the -->
<!-- .properties files and make their values accessible as ${var} -->
<!-- -->
<!-- Our "configurationEncryptor" bean (which implements -->
<!-- org.jasypt.encryption.StringEncryptor) is set as a constructor arg. -->
<!-- -->
<spring:bean id="propertyConfigurer"
    class="org.jasypt.spring.properties.EncryptablePropertyPlaceholderConfigurer">
    <spring:constructor-arg ref="configurationEncryptor" />
    <spring:property name="locations">
        <spring:list>
            <spring:value>credentials.properties</spring:value>
        </spring:list>
    </spring:property>
</spring:bean>
Next, you will need to encrypt your passwords using the Jasypt command-line tools. For example, if your Mule application connects to a MySQL database using the password “dbpassword”, encrypt it using the following command:

$ ./encrypt.sh input="dbpassword" password=MyEncryptionPassword algorithm=PBEWithMD5AndDES
Where MyEncryptionPassword is your encryption key.  This command will produce the following output:



ka56rcI0bDpUWoAhy5Y+PrVvqu/wMCnL


Now create a properties file that will list your encrypted passwords and place it in your project src/main/resources directory, e.g. credentials.properties:


database.password=ENC(ka56rcI0bDpUWoAhy5Y+PrVvqu/wMCnL)
Note the ENC() around our encrypted password; this is a cue for Jasypt that it is dealing with an encrypted value.
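If you prefer to generate or check these values programmatically rather than with encrypt.sh, the same Jasypt class configured above can be called directly. A minimal sketch, using the example password and key from this post (note that PBE encryption is salted, so your ciphertext will differ from run to run but will still decrypt correctly):

import org.jasypt.encryption.pbe.StandardPBEStringEncryptor;

public class EncryptProperty {
    public static void main(String[] args) {
        StandardPBEStringEncryptor encryptor = new StandardPBEStringEncryptor();
        encryptor.setAlgorithm("PBEWithMD5AndDES");    // must match the Spring config
        encryptor.setPassword("MyEncryptionPassword"); // the encryption key

        String cipherText = encryptor.encrypt("dbpassword");
        System.out.println("database.password=ENC(" + cipherText + ")");

        // Decryption works the same way; this is essentially what the
        // EncryptablePropertyPlaceholderConfigurer does at startup.
        System.out.println(encryptor.decrypt(cipherText));
    }
}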

Add the name of this file to the list of locations in the propertyConfigurer bean. Now you can use the property name in your data source configuration:

<spring:bean id="jdbcDataSource"
    class="org.enhydra.jdbc.standard.StandardDataSource" destroy-method="shutdown">
    <spring:property name="driverName" value="com.mysql.jdbc.Driver" />
    <spring:property name="url" value="jdbc:mysql://localhost/db1" />
    <spring:property name="user" value="dbuser" />
    <spring:property name="password" value="${database.password}" />
</spring:bean>
Finally, create an environment variable with the same name as the value of the passwordEnvName property in the first snippet, e.g. MULE_ENCRYPTION_PASSWORD, and set its value to the encryption key used for encrypting your password, e.g.:

$ export MULE_ENCRYPTION_PASSWORD=MyEncryptionPassword
That’s it. You can now encrypt passwords or any other configuration values, and Mule will decrypt them when it starts up.

Source: http://blogs.mulesoft.org/encrypting-passwords-in-mule/

Wednesday, 18 June 2014

Building APIs around Microsoft Azure SQL Databases

In this post we are going to discuss a few emerging trends within computing, including cloud-based database platforms, APIs, and Integration Platform as a Service (iPaaS) environments. More specifically, we are going to discuss how to:
  • Connect to a SQL Database using Mule ESB  (for the remainder of this post I will call it Azure SQL)
  • Expose a simple API around an Azure SQL Database
  • Demonstrate the symmetry between Mule ESB On-Premise and its iPaaS equivalent CloudHub
For those not familiar with Azure SQL, it is a fully managed relational database service in Microsoft’s Azure cloud. Since this is a managed service, we are not concerned with the underlying database infrastructure. Microsoft has abstracted all of that for us and we are only concerned with managing our data within our SQL instance. For more information on Azure SQL please refer to the following
link.


Prerequisites

In order to complete all of the steps in this blog post we will need a Microsoft Azure account, a MuleSoft account and MuleSoft’s AnyPoint Studio – Early Access platform. A free Microsoft trial account can be obtained here and a free CloudHub account can be found here. To enable database connectivity between MuleSoft’s ESB platform and Azure SQL, we will need to download the Microsoft JDBC Driver 4.0 for SQL Server, which is available here.


Provisioning a SQL Instance


  1. From the Microsoft Azure portal we are going to provision a new Azure SQL Database by clicking on the + New label.




  2. Click on Data Services – SQL Database – Quick Create.

  3. Provide a DATABASE NAME of muleAPI, select an existing SERVER and click CREATE SQL DATABASE. Note: if you do not have an existing SQL Server you can create a new one by selecting New SQL database server from the SERVER drop-down list.

  4. After about 30 seconds we will discover that our new Azure SQL instance has been provisioned.







  5. Click on the muleapi label to bring up the home page for our database.

  6. Click on the View SQL Database connection strings link.

  7. Note the JDBC connection information. We will need this later in the Mule ESB portion of this blog post.

  8. Next, we want to create a table where we can store our Product list. In order to do so we need to click on the Manage icon.

  9. We will be prompted to include the IP Address of the computer we are using. By selecting Yes we will be able to manage our Database from this particular IP Address. This is a security measure that Microsoft puts in place to prevent unauthorized access.

  10. A new window will open where we need to provide the credentials that we created when provisioning a new Azure SQL Server.

  11. We will be presented with a Summary of our current database. In order to create a new table we need to click on the Design label.

  12. Click on the New Table label and then provide a Table Name of Products. We then need to create columns for ProductName, Manufacturer, Quantity, and Price. Once we have finished adding the columns, click the Save icon to commit the changes.

  13. We now want to add some data to our Products table and can do so by clicking on the Data label. Next we can add rows by clicking on the Add row label. Populate each column and then click on the Save icon once all of your data has been populated.


This concludes the Azure SQL portion of the walk through.  We will now focus on building our Mule Application.



Building Mule Application


  1. The first thing we need to do is to create a new Mule Project and we can do so by clicking on File – New – Mule Project.

  2. For the purpose of this blog post we will call our project SQLAzureAPI. This blog post will take advantage of some of the new Mule 3.5 features, so we will use the Mule Server 3.5 EE Early Access edition; click the Finish button to continue.

  3. A Mule Flow will automatically be added to our solution. Since we want to expose an API, we will select an HTTP Endpoint from our palette.

  4. Next, drag this HTTP Endpoint onto our Mule Flow.

  5. Click on the HTTP Endpoint and set the Host to localhost, Port to 8081 and Path to Products. This will allow Mule ESB to listen for API requests at the following location: http://localhost:8081/Products

  6. Next, search for a Database Connector within our palette. Notice there are two versions; we want the version that is not deprecated.

  7. Drag this Database Connector onto our Mule Flow next to our HTTP Endpoint.

  8. As mentioned in the Prerequisites section of this blog post, the Microsoft JDBC Driver is required for this solution to work. If you haven’t downloaded it, please do so now. We need to add a reference to this driver from our project. We can do so by right-clicking on our project and then selecting Properties.

  9. Our Properties form will now load. Click on Java Build Path and then click on the Libraries tab. Next click on the Add External JARs button. Select the sqljdbc4.jar file from the location where you downloaded the JDBC driver and then click the OK button to continue.

  10. We are now ready to configure our Database Connection and can do so by clicking on our Database Connector so that we can specify our connection string. Next, click on the green plus (+) sign to create a new Connector configuration.

  11. When prompted, select Generic Database Configuration and click the OK button.

  12. In the portion of this blog post where we provisioned our Azure SQL Database, it was suggested to make a note of the JDBC connection string. This is the portion of the walk-through where we need that information. We need to put this value in our URL text box. Please note that you will need to update this connection string with your actual password, as it is not exposed on the Microsoft Azure portal. For the Driver Class Name, find com.microsoft.sqlserver.jdbc.SQLServerDriver by clicking on the … button. Once these two values have been set, click on the Test Connection button to ensure you can connect to Azure SQL successfully. Once this connection has been verified, click on the OK button.
Note: For this blog post, the connection string was embedded directly within our Mule Flow configuration. In certain scenarios this obviously is not a good idea. To learn about how these values can be set within a configuration file, please visit the following link or see the section below when we deploy our application to CloudHub.

  13. With our connection string verified, we need to specify a query that we want to execute against our Azure SQL Database. The query that we want to run will retrieve all of the products from our Products table. We want to add the following query to the Parameterized query text box: Select ID, ProductName, Manufacturer, Quantity, Price from Products

  14. For the purpose of this API, we want to expose our response data as JSON. In the Mule ESB platform this is as simple as finding an Object to JSON transformer in our palette.

  15. Once we have located our Object to JSON transformer we can drag it onto our Mule Flow. It is that easy in the Mule ESB platform; no custom pipeline components or 3rd-party libraries are required for this to work. In the event we want to construct our own JSON format we can use the AnyPoint DataMapper to define our own format and transform our Database result into a more customized JSON structure without any custom code. (For a plain-Java sketch of what this flow does end to end, see the example after this list.)
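Conceptually, the flow we just assembled behaves roughly like the plain-Java sketch below: query the Products table over JDBC using the Microsoft driver, then serialize the rows to JSON (Jackson here stands in for the Object to JSON transformer). The connection string is a placeholder in the general Azure SQL format and the column types are assumptions; in the Mule flow all of this is configuration rather than code.

import com.fasterxml.jackson.databind.ObjectMapper;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProductsQuerySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL: use the "View SQL Database connection strings"
        // value from the Azure portal and fill in your real password.
        String url = "jdbc:sqlserver://yourserver.database.windows.net:1433;"
                + "database=muleAPI;user=youruser@yourserver;password=yourPassword;"
                + "encrypt=true;hostNameInCertificate=*.database.windows.net;";

        List<Map<String, Object>> products = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT ID, ProductName, Manufacturer, Quantity, Price FROM Products")) {
            while (rs.next()) {
                // Column types (int, string, decimal) are assumed for the example.
                Map<String, Object> row = new LinkedHashMap<>();
                row.put("ID", rs.getInt("ID"));
                row.put("ProductName", rs.getString("ProductName"));
                row.put("Manufacturer", rs.getString("Manufacturer"));
                row.put("Quantity", rs.getInt("Quantity"));
                row.put("Price", rs.getBigDecimal("Price"));
                products.add(row);
            }
        }

        // Rough equivalent of the Object to JSON transformer.
        System.out.println(new ObjectMapper().writeValueAsString(products));
    }
}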
This concludes the build of our very simple API. The key message to take away is how easy it was to build this with the Mule ESB platform and without any additional custom coding.

Testing our Mule Application


  1. For now we will just deploy our application locally and can do so by clicking on Run – Run As – Mule Application.

  2. To call our API, launch Fiddler, a Web Browser or any other HTTP-based testing tool and navigate to http://localhost:8081/Products. As you can see, the contents of our Azure SQL Database are returned in JSON format…pretty cool.
Not done… One of the benefits of the Mule ESB platform is that there is complete symmetry between the On-Premise version of the ESB and the Cloud version. What this means is that we can build an application for use locally, or On-Premise, and if we decide that we do not want to provision local infrastructure we can take advantage of MuleSoft’s managed service. There are no code migration wizards or tools required in order to move an application between environments. We simply choose to deploy our Mule Application to a different endpoint.
In MuleSoft’s case we call our Integration Platform as a Service (iPaaS) – CloudHub.  There are too many details to share in this post so for the time being, please visit the CloudHub launch page for more information.


Deploying to CloudHub


  1. In order to deploy our application to CloudHub there is one configuration change that we need to make. Within our src/main/app/mule-app.properties file we want to specify a dynamic port for our HTTP Endpoint. Within this file we want to specify http.port=8081.

  2. Next, we want to update our HTTP Endpoint to use this dynamic port. In order to enable this, we will click on our HTTP Endpoint and then update our Port text box to include ${http.port}. This will allow our application to read the port number from configuration instead of it being hard-coded into our Mule Flow. Since CloudHub is a multi-tenant environment we want to drive this value through configuration.

  3. With our configuration value set, we can now deploy to CloudHub by right-clicking on our Project and then selecting CloudHub – Deploy to CloudHub. Note: For the purpose of this blog post we have chosen to deploy our application from our IDE. While this may be the only way to deploy your application in some other platforms, it is not the only way to perform this deployment with MuleSoft. We can also deploy our application via the portal or through a continuous integration process.

  4. We now need to provide our CloudHub credentials and provide some additional configuration including Environment, Domain, Description, and Mule Version. For this blog post I am using the Early Access edition, but prior versions of Mule ESB are capable of running in CloudHub. Also note that our dynamic port value has been carried over from our local configuration to our CloudHub configuration. Once we have completed this configuration we can click on the Finish button to begin our Deployment.

  5. Within a few seconds we will receive a message indicating that our Application has been successfully uploaded to CloudHub. This doesn’t mean that it is ready for use; the provisioning process is still taking place.

  6. If we log into our CloudHub portal we will discover that our application is being provisioned.

  7. After a few minutes, our application will be provisioned and available for API calls.

  8. Before we try to run our application, there is one more outstanding activity. Earlier in this walk-through we discussed how Microsoft Azure restricts access to Azure SQL Databases by providing a firewall. We now need to ‘white list’ our CloudHub IP Address in Microsoft Azure. To get our CloudHub IP Address, click on Logs and then set the All Priorities drop-down to System. Next, look for the line that reads “… Your application has started successfully.” Copy this IP Address and then log back into the Azure Portal.

  9. Once we have logged back into the Microsoft Azure Portal, we need to select our MuleAPI database and then click on the DASHBOARD label. Finally, we need to click on the Manage allowed IP Addresses link.

  10. Add a row to the IP Address table, include the IP Address from the CloudHub logs and click the Save icon.
NOTE: A question that you may be asking yourself is: what happens if my CloudHub IP Address changes? The answer is that you can provision a CloudHub instance with a Static IP Address by contacting MuleSoft Support. Another option is to specify a broader range of IP Addresses to ‘white list’ within the Microsoft Azure portal. Once again, MuleSoft Support can provide some additional guidance in this area if this is a requirement.


 

Testing our API


  1. We can now test our API that is running in MuleSoft’s CloudHub instead of our local machine. Once again, fire up Fiddler or whatever API tool you like to use, and provide your CloudHub URL this time. As you can see, our results are once again returned in JSON format, but we are not using any local infrastructure this time!

Telemetry

A mandatory requirement of any modern-day cloud platform is some level of visibility into the services that are being utilized. While the purpose of this post is not to get into any exhaustive detail, I did think it would be interesting to briefly display the CloudHub Dashboard after our API test. Similarly, we can also see some database analytics via the Microsoft Azure portal.

Conclusion

In this blog post we discussed a few concepts including:

  • Connecting to a Microsoft Azure SQL Database using Mule ESB

  • Exposing a simple API around our Azure SQL Database

  • Demonstrating the symmetry between Mule ESB On-Premise and its iPaaS equivalent, CloudHub
These different concepts highlight some of the popular trends within the computing industry. We are seeing broader adoption of cloud-based database platforms, an explosion of APIs being introduced, and the evolution of Integration Platform as a Service offerings. As demonstrated in this blog post, MuleSoft is positioned very well to support each of these scenarios. Another important consideration is that we were only ‘scratching the surface’ of the features available in the MuleSoft platform. For instance, this post didn’t even touch on our comprehensive Anypoint Platform for APIs, which provides full life-cycle support for designing and engaging, building, and running/managing APIs. If this sounds interesting, sign up for a free trial account and give it a try.

The Competitive Advantage and Cost Savings of Cloud Communications

Mining IT Big Data: Using Analytics to Improve your Cloud/Datacenter Operations

Why Big Data Analytics Needs Business Intelligence Too

Mobility and Cloud Communication Trends

benefits of a cloud phone system

Windows Azure Cloud Storage and You

Performance Benchmarking for Private and Hybrid Cloud

Thursday, 12 June 2014

Intro to Data Integration Patterns – Migration

Hi all, in this post I wanted to introduce you to how we are thinking about integration patterns at MuleSoft. Patterns are the most logical sequences of steps for solving a generic problem. Like a hiking trail, patterns are discovered and established based on use. Patterns always come in degrees of perfection, with much room to optimize or adapt based on the business needs being solved. An integration application is composed of a pattern and a business use case. You can think of the business use case as an instantiation of the pattern, a use for the generic process of data movement and handling. In thinking about creating integration templates, we had to first establish the base patterns, or best practices, to make them atomic, reusable, extensible, and easily understandable.


When thinking about the format of a simple point-to-point, atomic integration, one can use the following structure to describe a Mule application:





Application(s) A to Application(s) B – Object(s) – Pattern
For example, you may have something like:


  • Salesforce to Salesforce – Contact – Migration
  • Salesforce to Netsuite – Account – Aggregation
  • SAP to Salesforce – Order – Broadcast
In this mini-series of posts, I will walk through the five basic patterns that we have discovered and defined so far. I am sure that there are many more, so please leave a comment below if you have any ideas on additional patterns that may be valuable to provide templates for.


Pattern 1: Migration

What is it?

Migration is the act of moving a specific set of data at a point in time from one system to the other. A migration contains a source system where the data resides prior to execution, criteria that determine the scope of the data to be migrated, a transformation that the data set will go through, a destination system where the data will be inserted, and an ability to capture the results of the migration so you know the final state versus the desired state. Just to disambiguate here, I am referring to the act of migrating data, rather than application migration, which is the act of moving functional capabilities between systems.





Why is it valuable?

Migrations are essential to any data systems and are executed
extensively in any organization that has data operations. We spend a lot
of time creating and maintaining data, and migration is key to keep
that data agnostic from the tools that we use to create it, view it, and
manage it. Without migration, we would be forced to lose all data that
we have amassed anytime that we wanted to change tools, and this would
cripple our ability to be productive in the digital world.


When is it useful?

Migrations will most commonly occur whenever you are moving from one
system to another, moving from an instance of a system to another or
newer instance of that system, spinning up a new system which is an
extension to your current infrastructure, backing up a dataset, adding
nodes to database clusters, replacing database hardware, consolidating
systems, and many more.


What are the key things to keep in mind when building applications using the migration pattern?

Here are some key considerations to keep in mind:


Triggering the migration app:




It is a best practice to create services out of any functionality that will be shared across multiple people or teams. Usually data migration is handled via scripts, custom code, or database management tools that developers or IT operations teams create. One neat way to use our Anypoint Platform is to capture your migration scripts or apps as integration apps. The benefit of this is that you can easily put an HTTP endpoint in front of the application to initiate it, which means that if you kick off the integration by hitting a URL you can get a response in your browser, and if you kick off the migration programmatically you can get back JSON or any other type of response you configure and do additional steps with it. You can also take our migration templates to the next level by making the configuration parameters values you pass into the API call. This means that you can do things like build a service to migrate your Salesforce account data either on command, or expose an API that can grab a scoped dataset and move it to another system. Having services for common migrations can go a long way in saving both your development team and your operations team a lot of time.
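As a hypothetical illustration of the HTTP-triggered idea (outside of Mule, just to show the shape of it), the sketch below exposes a /migrate endpoint with the JDK's built-in HttpServer, treats the query string as the migration parameters, and returns a JSON response; in practice the Anypoint Platform gives you this trigger as an HTTP endpoint on the flow itself, so runMigration here is only a placeholder.

import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MigrationTriggerSketch {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8081), 0);
        server.createContext("/migrate", exchange -> {
            // A real template would read the query parameters here (object type,
            // scope criteria, etc.) and pass them into the migration logic.
            String result = runMigration(exchange.getRequestURI().getQuery());
            byte[] body = result.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
    }

    // Placeholder for the rest of the flow; returns a JSON summary to the caller.
    static String runMigration(String criteria) {
        return "{\"migrated\": 0, \"criteria\": \"" + (criteria == null ? "" : criteria) + "\"}";
    }
}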


Pull or push model:




In general, you have two options for how to design your application. With migration, you will usually want to execute it at a specific point in time, and you want it to process as fast as possible. Hence, you will probably want to use a pull mechanism, which is the equivalent of saying “let the data collect in the origin system until I am ready to move it; once I am ready, pull all the data that matches my criteria and move it to the destination system.” Another way to look at the problem is to say “once a specific event happens in the origin system, push all the data that matches a specific criteria to a processing app that will then insert it into the destination system.” The issue with this latter approach is that the origination system needs to have an export function which is initiated to push the data, and it needs to know where the data will be pushed to, which implies programming this functionality into the originating application. Hence it is much simpler to use a pull (query) approach in migration, because the application pulling the data out, like a Mule application, will be much easier to create than modifying each origin system that will be used in migrations.
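As a rough illustration of the pull (query) approach, here is a hypothetical plain-JDBC sketch: connect to the origin system, pull everything matching the criteria, apply a trivial transformation, and write the rows to the destination. The URLs, table, and column names are invented for the example; a Mule template would express the same steps as connectors and DataMapper mappings rather than code.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ContactMigrationSketch {
    public static void main(String[] args) throws Exception {
        try (Connection source = DriverManager.getConnection(
                     "jdbc:mysql://source-host/crm", "user", "pass");
             Connection destination = DriverManager.getConnection(
                     "jdbc:mysql://destination-host/crm", "user", "pass");
             // Pull: the criteria scopes the data set (only active contacts here).
             PreparedStatement query = source.prepareStatement(
                     "SELECT email, first_name, last_name FROM contacts WHERE active = 1");
             PreparedStatement insert = destination.prepareStatement(
                     "INSERT INTO contacts (email, full_name) VALUES (?, ?)")) {

            try (ResultSet rs = query.executeQuery()) {
                while (rs.next()) {
                    // Transform: the destination stores a single full_name field.
                    String fullName = rs.getString("first_name") + " " + rs.getString("last_name");
                    insert.setString(1, rs.getString("email"));
                    insert.setString(2, fullName);
                    insert.addBatch();
                }
            }
            insert.executeBatch();
        }
    }
}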


Scoping the dataset:




One of the major problems with accumulating data is that over time data with varying degrees of value will accumulate. A migration, like moving house, is a perfect time to clean up shop, or at least move only the valuables to the new destination. The issue is that the person who is usually executing the migration does not know which data is valuable versus a waste of bits. The usual solution is to just move everything, which tends to increase the scope of migrations in an effort to preserve unknowingly worthless structures, relationships, and objects. As the person executing the migration, I would recommend that you work with the data owner to scope down the set of data that should be migrated. If you use a Mule application, like the ones provided via our templates, you can simply write a query in the connector which specifies the object criteria and which fields will be mapped. This can reduce the amount of data in terms of number of records, objects, object types, and fields per object, saving you space, design, and management overhead. The other nice thing about making it a Mule application is that you can create a variety of inbound flows (the ones that trigger the application), or parameterize the inputs so that you can have the migration application behave differently based on the endpoint that was called to initialize it, or based on the parameters provided in that call. You can use this as a way to pass in the scoping of the data set or behave differently based on the scope provided.


Snapshots:




Since migrations are theoretically executed at a point in time, any data which is created after execution will not be brought over. To synchronize the delta (the additional information) you can use a broadcast pattern – something I will talk about in a future post. It’s important to keep in mind that data can still be created during the time that the migration is running, unless you prevent that or create a separate table where new data is going to be stored. Generally it is best practice to run migrations while the database is offline. If it’s a live system that cannot be taken offline, then you can always use the last modified time (or some equivalent for the system you are using) in the query to grab all the data which was created since the execution of the migration.
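For that delta case, the only change to the pull query is a filter on the last modified time; a short hypothetical example, assuming the origin table keeps a last_modified column:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.time.Instant;

public class DeltaQuerySketch {
    // Pull only records created or changed since the original migration ran.
    static PreparedStatement deltaQuery(Connection source, Instant migrationStartedAt)
            throws Exception {
        PreparedStatement ps = source.prepareStatement(
                "SELECT email, first_name, last_name FROM contacts WHERE last_modified >= ?");
        ps.setTimestamp(1, Timestamp.from(migrationStartedAt));
        return ps;
    }
}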


Transforming the dataset:




Other than for a few exceptions like data backup/restore, migrating to new hardware, etc., you will usually want to modify the formats and structures of data during a migration. For example, if you are moving from an on-premise CRM to one in the cloud, you will most likely not have an exact mapping between the source and destination. Using an integration platform like Mule is beneficial in the sense that you can easily see and map the transformations with components like DataMapper. You may also want to use different mappings based on the data objects being migrated, so having the flexibility to build logic into the migration can be very beneficial, especially if you would like to reuse it in the future.


Insert vs update:




When writing a migration app or customizing one of our templates, especially one that will be run multiple times against the same data set and/or the same destination system, you should expect to find cases where you need to create new records as well as modify existing records. All of our migration templates come with logic which checks to see if a record already exists; if so it performs an update operation, otherwise it performs a create operation. A unique field, or set of fields, that is immutable or doesn’t change often is the best approach to ensuring that you are dealing with the same object in both the source and destination systems. For example, an email address is an acceptable unique field when dealing with uniqueness of people in many cases, because it is unlikely to change and represents only one person at any given time. However, when working with an HR system, or customers, you will probably want to check the existence of multiple fields like SSN, birthdate, customer ID, and last name: things that will not change. Intelligently picking the right fields is particularly important when you are going to have multiple migrations populate a new system with data from a variety of systems, especially if the new system is going to serve as a system of record for the records stored in it. Duplicate or incomplete entries can be extremely costly, so having a solution for testing whether a record already exists and modifying it rather than creating a duplicate can make a huge difference. Having the update logic in the migration will also allow you to run the migration multiple times to fix errors that arose in previous runs. Salesforce provides a nice upsert function to reduce the need for separate insert vs update logic, but in many cases systems will not have such a function, which means that you will need to build that logic into the integration application.
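The exists-check logic described above can be sketched roughly as follows, keyed on email purely as an example of a field that rarely changes; systems with a native upsert (such as Salesforce's upsert call or SQL MERGE) let you collapse the two branches. Table and column names are again invented for the example.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class UpsertSketch {
    // Update the record if the unique key already exists in the destination,
    // otherwise insert a new one, so re-running the migration stays safe.
    static void upsertContact(Connection destination, String email, String fullName)
            throws Exception {
        try (PreparedStatement exists = destination.prepareStatement(
                "SELECT 1 FROM contacts WHERE email = ?")) {
            exists.setString(1, email);
            try (ResultSet rs = exists.executeQuery()) {
                String sql = rs.next()
                        ? "UPDATE contacts SET full_name = ? WHERE email = ?"
                        : "INSERT INTO contacts (full_name, email) VALUES (?, ?)";
                try (PreparedStatement write = destination.prepareStatement(sql)) {
                    write.setString(1, fullName);
                    write.setString(2, email);
                    write.executeUpdate();
                }
            }
        }
    }
}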


To summarize, migration is the one-time act of taking a specific data set from one system and moving it to another. Creating an application using the Anypoint Platform for the purpose of migration is valuable because you can make it a reusable service, have it take input parameters which affect how the data is processed, scope the data that will be migrated, build in processing logic before the data is inserted into the destination system, and have create-vs-update logic so that you can more easily merge data from multiple migration sources or run the same migration multiple times after you fix errors that arise. To get started building data migration applications, you can start with one of our recently published Salesforce to Salesforce migration templates. These templates will take you a fair way even if you are looking to build an X to Salesforce or a Salesforce to X migration application.


In the next post, I will walk you through similar types of considerations for the broadcast pattern – stay tuned!