Operation and Monitoring Guide


Audience

This document provides design, operation, and monitoring instructions for the Microsoft CityNext Big Data Solution Accelerator. The audience includes application specialists, IT administrators, and operators who are responsible for maintaining and operating the solution accelerator.

Design Goal

Management

Manage by Management Studio

Management Studio manages the components of the solution accelerator and provides the user interface for end-to-end user scenarios, including:
  • System Health
    • Service Status
  • Data Ingestion
    • Pull Source Management
    • Push Source Management
    • Entity Type Management
  • City Artifacts Management
    • Entity Schema Management
    • Object Schema Management
  • City Analytics
    • Analytic Node Management
    • Job Management
    • Hadoop User Management
    • Set Credentials
    • Analytics Resource
  • City Services Management
    • Artifact Management
    • Data Management
    • Interface Management
    • Category Management
  • Information Dissemination
    • Data Producer Management
    • Adapter Management
    • User Group Management
    • Notification
  • Diagnostic
    • Schedule
    • Job Monitor
    • OData Explorer
    • Rubber Duck
    • HBase Admin
    • Yarn Admin
    • HDFS Admin

2.1.1 Manage by Management Studio.png

Manage by PowerShell Cmdlets

PowerShell cmdlets provide an alternative way to manage the solution accelerator, with the same capabilities. For detailed guidelines, refer to http://aka.ms/codeplex.

Monitoring

2.2 Monitoring.png
  • Management Studio: A centralized UI portal for managing and monitoring the solution accelerator services visually.
  • Monitoring Service: A dedicated monitoring service runs a smoke test suite to verify the availability of the individual solution accelerator services.
  • SCOM Integration (Optional): Management Studio and the monitoring service integrate with System Center Operations Manager 2012 to collect performance counters for system load and service usage. It is not implemented in the Sandbox environment.

Overall Monitoring

2.2.1 Overall Monitoring.png

Service Availability Monitoring

2.2.2 Service Availability Monitoring.png

Service Functionality Monitoring

“Rubber Duck” is the nickname of the functionality monitoring for the solution accelerator. It runs a set of automated smoke tests to verify that the fundamental functionality of the solution accelerator is available. A test-in-production strategy is applied during monitoring to keep test data isolated from production data. The user can start a round of functionality monitoring on demand; otherwise it runs automatically on a schedule every 8 hours.

Operation

Operation relies on Management Studio UI, PowerShell and other Microsoft products.
  • Diagnostic: An on-demand diagnostic service checks the functionality of the solution accelerator, reports any errors, and provides solutions.
  • Data Migration: An on-demand data migration service imports and exports frequently used configuration files and data.
  • Telemetry: A telemetry service runs continually in the background to collect logs, traces, and performance counters, and provides end-to-end analysis for operators and stakeholders.
  • Data Storage Backup/Restore: Data storage backup and restore rely on the functionality of both SQL Server and Hadoop.

Roles and Permissions

The table below shows the groups assigned to the solution accelerator for operational roles. Each administrator should have a dedicated account to log in and manage the system. An audit mechanism should also be in place to track every action on the system, with designated personnel in charge of it.

2.4 Roles and Permissions.png

Monitoring Guideline

Entry Point

Open the Management Studio site with IE through this link: http://<CityNextVMExternalFQDN>:8100/

Monitor Health

In the “System Health” view, you can monitor the overall status (clock shape) of the solution accelerator services at the top of the page, including:
  • Running virtual machines
  • Running Services
  • Overall CPU Percentage In Use
  • Overall Memory Usage
  • Overall Storage Usage
  • Overall Network Usage
3.2 Monitor Health.png

Monitor Individual Service Health

In the “System Health” view, you can monitor the running status of each service (colored tile). The three colors below represent the service status:
  • Green: the service is running with correct functionality and good (fast) performance.
  • Yellow: the service is running with correct functionality but poor (slow) performance.
  • Red: the service is running with incorrect functionality or is out of service.
  • The round-trip time (RTT) threshold can be set in the monitoring service configuration file. By default, it is 1 second.
3.3 Monitor Individual Service Health.png
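
The color rules above can be read as a small classification function. The following is a minimal sketch, not the monitoring service's actual code: it assumes that the RTT threshold is what separates green from yellow (the guide implies this but does not state it), and the names ServiceStatus and classify_service are hypothetical.

```python
from enum import Enum
from typing import Optional


class ServiceStatus(Enum):
    GREEN = "green"    # correct functionality, fast response
    YELLOW = "yellow"  # correct functionality, slow response
    RED = "red"        # incorrect functionality or out of service


# Default threshold stated in this guide (1 second).
DEFAULT_RTT_THRESHOLD_SECONDS = 1.0


def classify_service(
    functional: bool,
    rtt_seconds: Optional[float],
    threshold: float = DEFAULT_RTT_THRESHOLD_SECONDS,
) -> ServiceStatus:
    """Map one probe result to a tile color.

    rtt_seconds is None when the service did not answer at all.
    Assumption: an RTT at or below the threshold counts as "good".
    """
    if not functional or rtt_seconds is None:
        return ServiceStatus.RED
    if rtt_seconds <= threshold:
        return ServiceStatus.GREEN
    return ServiceStatus.YELLOW
```

For example, classify_service(True, 2.5) yields ServiceStatus.YELLOW with the default threshold, and the same probe yields ServiceStatus.GREEN if the threshold is raised to 3 seconds.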

Monitor VM Host Status

In the “System Health” view, you can click the “View All VM” icon at the bottom left to view the topology of the VMs that host the services. The two colors below represent the VM status:
  • Green: the VM is available.
  • Red: the VM is down.
3.4 Monitor VM Host Status.png

Monitor Service Usage Chart (SCOM Required)

Pre-requisites:
  • Ensure SCOM is installed on your system
  • Ensure SCOM is well configured to monitor web services and SQL
  • Set the correct URL in the monitoring service configuration file
  • Enable SCOM integration in the Management Studio web.config file
Clicking the service name inside the tile takes you to the service usage page in SCOM, where you can monitor the following KPI and its trend in a time-series chart:
  • Service usage (API requests per minute)
3.5 Monitor Service Usage Chart (SCOM Required).png

Monitor VM Performance Chart (SCOM Required)

Pre-requisites:
  • Ensure SCOM is installed on your system
  • Ensure SCOM is well configured to monitor VMs
  • Set the correct URL in the monitoring service configuration file
  • Enable SCOM integration in the Management Studio web.config file
Clicking the VM name inside the tile (with the VM list expanded) takes you to the VM load page in SCOM, where you can monitor the following KPIs and their trends in a time-series chart:
  • CPU
  • Memory
  • Disk I/O
  • Free Disk Space
  • Network Throughput
3.6 Monitor VM Performance Chart (SCOM Required).png

Monitor Service Functionality

  • Open Management Studio.
  • Click Diagnostic -> Rubber Duck.
  • Click “New” at the bottom left to launch a new round of smoke tests. If a job is already running, you will be asked to wait.
  • After 10 to 20 minutes, the results are shown in the list.
3.7 Monitor Service Functionality.png
  • Select the round you ran and click “View Result” at the bottom right. The result report is shown.
3.7 Monitor Service Functionality b.png
  • Click “View Log” to see detailed information about that smoke test.

Operation Guideline

Diagnostic

Overview

This section explains the Management Studio diagnostic features for detecting data issues, monitoring data progress in Data Ingestion, and summarizing information for City Artifacts Management.

Monitoring
  • Data Ingestion (DI) pipeline progress status
  • Object Data distribution detail and scale
  • Object Schema related data distribution details and scale
  • Entity Schema binding Object Schema related data distribution details and scale
  • The index details for specified Entity Schema
Diagnostic
  • Object data to Object Schema definition integrity check
  • Object Schema to Entity Schema integrity check
  • Index table to Entity Schema integrity check

Data Ingestion Diagnostic

Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “Data Ingestion”
  • Click “Pull Source Management”
  • Each schedule shows its progress status in the pull source list
  • Click the “Schedule” button to show the schedule details and progress status for the specified source
    • Status options (pending, processing, complete)

City Artifacts Management Diagnostic

HBase data integrity check

Object and Object Schema
Integrity check item
  • Invalid data value
    • The attribute definition exists for Object, but value is empty (except string type).
  • Non-nullable value is null
    • The attribute definition is non-nullable for Object Schema, but in object data the attribute is missing.
  • Attribute definition mismatch
    • The data type for the Object Schema attribute definition is mismatched with actual Object data values.
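
The three check items above can be illustrated with a small in-memory validator. This is only a sketch of the rules as described in this guide, not the diagnostic job that Management Studio runs against HBase; AttributeDef and check_object are hypothetical names, and Python types stand in for the schema's data types.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AttributeDef:
    """One attribute definition of a (hypothetical) Object Schema."""
    name: str
    data_type: type
    nullable: bool = True


def check_object(obj: Dict[str, Any], schema: List[AttributeDef]) -> List[str]:
    """Return one error message per integrity rule the object violates."""
    errors: List[str] = []
    for attr in schema:
        # Rule: non-nullable value is null (attribute missing in the data).
        if attr.name not in obj or obj[attr.name] is None:
            if not attr.nullable:
                errors.append(f"{attr.name}: non-nullable value is null")
            continue
        value = obj[attr.name]
        # Rule: invalid data value (empty value, except for string type).
        if value == "" and attr.data_type is not str:
            errors.append(f"{attr.name}: invalid data value (empty)")
        # Rule: attribute definition mismatch (wrong data type).
        elif not isinstance(value, attr.data_type):
            errors.append(f"{attr.name}: attribute definition mismatch")
    return errors
```

With a schema of id (int, non-nullable), name (str), and speed (float), the object {"name": "a", "speed": ""} produces two errors: one for the missing id and one for the empty speed value.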

Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Object Schema Management”
  • Click the “Diagnostic” button
  • Click the “Verification” button for the specified Object Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Object Schema
  • The Details page shows the test results and a detailed error list

Object Schema and Entity Schema
Integrity check item
  • Data type mismatch
    • The binding check verifies that the attribute data types match between the specified Entity Schema and the related Object Schema.

Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema Management”
  • Click the “Diagnostic” button
  • Click the “Verification” button for the specified Entity Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Entity Schema
  • The Details page shows the test results and a detailed error list

Entity Schema and Index
Integrity check item
  • Missing index column (index DB)
    • The Entity Schema index attribute is missing in the index table.
  • Redundant index column (index DB)
    • The index table column definition does not match the Entity Schema.
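
Both index checks amount to a set comparison between the index attributes declared in the Entity Schema and the columns present in the index table. The sketch below assumes that a "redundant" column is one that exists in the index table but is not declared in the Entity Schema; compare_index_columns is a hypothetical helper, not part of the product.

```python
from typing import Dict, Iterable, List


def compare_index_columns(
    schema_index_attributes: Iterable[str],
    index_table_columns: Iterable[str],
) -> Dict[str, List[str]]:
    """Report missing and redundant index columns.

    missing:   declared in the Entity Schema, absent from the index table
    redundant: present in the index table, not declared in the Entity Schema
    """
    expected = set(schema_index_attributes)
    actual = set(index_table_columns)
    return {
        "missing": sorted(expected - actual),
        "redundant": sorted(actual - expected),
    }
```

An empty list under both keys means that the index table and the Entity Schema agree.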

Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema Management”
  • Click the “Diagnostic” button
  • Click the “Index Verification” button for the specified Entity Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Entity Schema
  • The Details page shows the test results and a detailed error list

HBase data scale measurement

Data scale measurement will show the actual data count in the Object table for the specified entity.

Data Scale per Data Source
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Object Schema Management”
  • Click the “Diagnostic” button
  • Click the “Summary” button for the specified Object Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Object Schema
  • The Details page shows the actual data count in the Object table for the specified Object Schema

Data Scale per Entity Schema
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema Management”
  • Click the “Diagnostic” button
  • Click the “Summary” button for the specified Entity Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Entity Schema
  • The Details page shows the actual data count in the Object table for the specified Entity Schema

HBase data distribution measurement

Data distribution measurement shows how the Object data for the specified entity is distributed across the HBase region servers.

Data distribution per Data Source
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Object Schema Management”
  • Click the “Diagnostic” button
  • Click the “Summary” button for the specified Object Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Object Schema
  • The Details page shows how the Object data for this Object Schema is distributed across the region servers

Data distribution per Entity Schema
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema Management”
  • Click the “Diagnostic” button
  • Click the “Summary” button for the specified Entity Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Entity Schema
  • The Details page shows how the Object data for this Entity Schema is distributed across the region servers

Index information

Index information includes two items: the list of indexed attributes of the Entity Schema, and the indexed rate of the data related to the Entity Schema.

Steps to get Index information
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema Management”
  • Click the “Diagnostic” button
  • Click the “Index Summary” button for the specified Entity Schema
  • When the job status is shown as “Done”, click the status link to go to the Details page for the specified Entity Schema
  • The Details page shows the indexed attribute list and the indexed data rate for the specified Entity Schema

Job Collection and Scheduler

Job collection is a container of jobs (diagnostic). It is a scheduled triage target and vector.

How to set up scheduled batch jobs in Management Studio?
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “Diagnostic”
  • Click “Schedule”
  • Click the “New” button
  • Enter a meaningful job collection name
  • Set the triage type
  • Set the schedule time
  • Click the “Add” button to add a new job
  • Configure the new job
  • Click “Submit” to activate the schedule
Triage Type
  • Schedule
    • Weekly settings
  • Interval
    • Periodic settings

Job Type
4.1.4 Job Collection and Scheduler.png
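
To make the difference between the two triage types concrete, the sketch below computes the next run time for each. It is an illustration only: the class names, the fields, and the interpretation of "Schedule" as a weekly day-and-time and "Interval" as a fixed period are assumptions based on the list above, not the scheduler's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class IntervalTrigger:
    """Periodic settings: run again after a fixed period."""
    every: timedelta

    def next_run(self, after: datetime) -> datetime:
        return after + self.every


@dataclass(frozen=True)
class WeeklyTrigger:
    """Weekly settings: run on a given weekday (0 = Monday) at a given time."""
    weekday: int
    hour: int
    minute: int = 0

    def next_run(self, after: datetime) -> datetime:
        # Start from the requested time of day on the current date.
        candidate = after.replace(
            hour=self.hour, minute=self.minute, second=0, microsecond=0
        )
        # Move forward to the requested weekday (0 to 6 days ahead).
        candidate += timedelta(days=(self.weekday - after.weekday()) % 7)
        # If that moment has already passed, use the following week.
        if candidate <= after:
            candidate += timedelta(days=7)
        return candidate
```

For example, IntervalTrigger(timedelta(hours=8)).next_run(t) returns a time 8 hours after t, while WeeklyTrigger(weekday=0, hour=2).next_run(t) returns the first Monday 02:00 that falls strictly after t.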

Data Import/Export

Data migration in Management Studio targets Data Ingestion and City Artifacts Management. The processes for importing and exporting data are described below.

Export Process
  1. Submit the export job in Management Studio.
  2. The result is saved as a CSV file and stored on the FTP server.
  3. View or export the result by downloading it from the FTP server.
Import Process
  1. Upload the file to the FTP server by following the tips listed below.
  2. Submit the job with an FTP file path.
Tips:
File path sample: ftp://<CityNextVMExternalFQDN>:2500/{UploadFileFolder}

How to upload a file to FTP using IE?
  1. Open IE.
  2. Go to Tools > Internet Options > Advanced.
  3. Check “Enable FTP folder view (outside of Internet Explorer)” and “Use Passive FTP (for firewall and DSL modem compatibility)”.
  4. Navigate to ftp://<CityNextVMExternalFQDN>:2500/ and sign in with a domain account.
  5. Press “Alt”, then select View > “Open FTP site in File Explorer” and sign in with a domain account.
  6. Create your folder and copy the file there.
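
The same upload can be scripted instead of using IE. The following is a minimal sketch using Python's standard ftplib module; it assumes that the FTP server accepts a plain user name and password login for the domain account and allows passive mode, and the helper names ftp_file_url and upload_file are hypothetical. Port 2500 is taken from the file path sample above.

```python
import ftplib
import posixpath
from pathlib import Path

FTP_PORT = 2500  # port used in the file path sample in this guide


def ftp_file_url(host: str, folder: str, filename: str, port: int = FTP_PORT) -> str:
    """Build the FTP path to submit with an import job."""
    return f"ftp://{host}:{port}/{posixpath.join(folder.strip('/'), filename)}"


def upload_file(host: str, user: str, password: str, folder: str,
                local_path: str, port: int = FTP_PORT) -> str:
    """Upload one file into `folder` and return its FTP path."""
    local = Path(local_path)
    with ftplib.FTP() as ftp:
        ftp.connect(host, port)
        ftp.login(user, password)
        ftp.set_pasv(True)  # mirrors the "Use Passive FTP" option in IE
        try:
            ftp.mkd(folder)  # create the upload folder
        except ftplib.error_perm:
            pass  # assumption: the error means the folder already exists
        ftp.cwd(folder)
        with local.open("rb") as fh:
            ftp.storbinary(f"STOR {local.name}", fh)
    return ftp_file_url(host, folder, local.name, port)
```

The returned string has the same shape as the file path sample and can be entered as the FTP path when submitting an import job.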

DSML Import/Export

Export
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “Data Ingestion”
  • Click “Pull Source Management”
  • Select one data source
  • Click the “Export” button
  • Click “Ok” to submit the export job, or “Cancel” to cancel it
  • When the job status is successful, click the status link to go to the Details page for the specified pull source
  • The Details page shows the export result summary and the result file path on the FTP server
Import
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “Data Ingestion”
  • Click “Pull Source Management”
  • Click the “Import” button
  • Enter the FTP path, following the tips above
  • Click “Diagnostic” > “Job Monitor” and select the job
  • Click the “View Details” button to view the details

Object and Object Schema Import/Export

Object Schema Import/Export targets both Object table data and Object Schema data.

Export
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Object Schema”
  • Click the “Migration” button
  • Select one Object Schema from the list page
  • Click the “Export” button
  • Click “Ok” to submit the export job, or “Cancel” to cancel it
  • When the job status is successful, click the status link to go to the Details page for the specified Object Schema
  • The Details page shows the export result summary and the result file path on the FTP server
Import
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Object Schema”
  • Click the “Migration” button
  • Click the “Import” button
  • Enter the FTP path, following the tips above
  • Click “Diagnostic” > “Job Monitor” and select the job
  • Click the “View Details” button to view the details

Entity Schema Import/Export

Entity Schema Import/Export targets only Entity Schema table data.

Export
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema”
  • Click the “Migration” button
  • Select one Entity Schema from the list page
  • Click the “Export” button
  • Click “Ok” to submit the export job, or “Cancel” to cancel it
  • When the job status is successful, click the status link to go to the Details page for the specified Entity Schema
  • The Details page shows the export result summary and the result file path on the FTP server
Import
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “City Artifacts Management”
  • Click “Entity Schema”
  • Click the “Migration” button
  • Click the “Import” button
  • Enter the FTP path, following the tips above
  • Click “Diagnostic” > “Job Monitor” and select the job
  • Click the “View Details” button to view the details

Job Collection and Scheduler

Job collection is a container of jobs (migration). It is a scheduled triage target and vector.

How to set up scheduled batch jobs in Management Studio?
Open Management Studio with IE through this link: http://<CityNextVMExternalFQDN>:8100/
  • Select “Diagnostic”
  • Click “Schedule”
  • Click the “New” button
  • Enter a meaningful job collection name
  • Set the triage type
  • Set the schedule time
  • Click the “Add” button to add a new job
  • Configure the new job
  • Click “Submit” to activate the schedule
Triage Type
  • Schedule
    • Weekly settings
  • Interval
    • Periodic settings
Job Type
4.2.4 Job Collection and Scheduler.png

Data Storage Backup/Restore

HBase Backup

  • Frequency: Daily
  • Approach: $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> <versions> [<starttime> [<endtime>]]
  • Reference: HBase live backup
  • Scope:
    • EntitySchema
    • ObjectSchema
    • Object
    • System
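
A daily backup of the four tables can be driven by a small wrapper that builds one Export command per table. The sketch below only assembles the argument lists for the command shown above; it assumes that the scope entries are the exact HBase table names, and the dated output directory layout and the function names are hypothetical.

```python
from datetime import date
from typing import List, Optional

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
# Assumption: the backup scope entries are the HBase table names.
TABLES = ["EntitySchema", "ObjectSchema", "Object", "System"]


def export_command(table: str, output_root: str, day: date, versions: int = 1,
                   start_ms: Optional[int] = None,
                   end_ms: Optional[int] = None) -> List[str]:
    """Build: bin/hbase <Export> <tablename> <outputdir> <versions> [<starttime> [<endtime>]]"""
    if end_ms is not None and start_ms is None:
        raise ValueError("endtime requires starttime")
    # Hypothetical layout: one directory per day and table.
    output_dir = f"{output_root}/{day.isoformat()}/{table}"
    cmd = ["bin/hbase", EXPORT_CLASS, table, output_dir, str(versions)]
    if start_ms is not None:
        cmd.append(str(start_ms))
    if end_ms is not None:
        cmd.append(str(end_ms))
    return cmd


def daily_export_commands(output_root: str, day: date) -> List[List[str]]:
    """One Export command per table in the backup scope."""
    return [export_command(table, output_root, day) for table in TABLES]
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the HBase installation directory, so that a failed export stops the backup run instead of being silently skipped.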

HBase Restore

  • Frequency: On demand
  • Approach: $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
  • Reference: HBase live backup
  • Scope:
    • EntitySchema table
    • ObjectSchema table
    • Object table
    • System table

Hive Backup (TBD)

Hive Restore (TBD)

SQL Database Backup

  • Frequency: Daily
  • Approach:
    • SQL Management Studio
    • Transact-SQL
-- Create a full database backup first.
BACKUP DATABASE <db_name>
TO DISK = '<bakfilefullpath>'
WITH INIT;
GO
-- Time elapses.
-- Create a differential database backup, appending the backup
-- to the backup device containing the full database backup.
BACKUP DATABASE <db_name>
TO DISK = '<bakfilefullpath>'
WITH DIFFERENTIAL;
GO
  • Scope:
    • AnalyticUsers.{UserName}
    • ArchiveMessageDB
    • CamIndexDB
    • CityNext.AnalyticsSink
    • CityNext.Configuration
    • CityNext.Core
    • CityNext.DataIngestion.DsmDB
    • CityNext.DataIngestion.RawDataDB
    • CityNext.DiagnosticDB
    • IDManagementDB
    • IdPayloadDB
    • Power BI
    • WSS_Content
    • TelemetryDW
Reference: http://technet.microsoft.com/zh-cn/library/ms187510.aspx
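
Because the scope covers many databases, the backup statements are usually generated rather than typed by hand. The sketch below builds the BACKUP DATABASE text for one database; backup_statement is a hypothetical helper, and the backup path is whatever the operator chooses. Database names are wrapped in square brackets because several names in the scope contain periods, which T-SQL would otherwise read as a multi-part name.

```python
def backup_statement(db_name: str, bak_path: str, differential: bool = False) -> str:
    """Return a T-SQL BACKUP DATABASE statement for one database.

    A full backup uses WITH INIT (overwrite the backup file); a
    differential backup uses WITH DIFFERENTIAL and is appended.
    """
    # Delimit the identifier and escape any closing bracket inside it.
    quoted_db = "[" + db_name.replace("]", "]]") + "]"
    # Escape single quotes inside the path literal.
    quoted_path = "'" + bak_path.replace("'", "''") + "'"
    option = "DIFFERENTIAL" if differential else "INIT"
    return f"BACKUP DATABASE {quoted_db}\nTO DISK = {quoted_path}\nWITH {option};"
```

The generated text can be saved to a script file and run with a SQL Server client; this sketch does not connect to SQL Server itself.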

SQL Database Restore

  • Frequency: On demand
  • Approach:
    • SQL Management Studio
    • Transact-SQL
-- Assume the database is lost, and restore the full database,
-- specifying the original full database backup and NORECOVERY,
-- which allows subsequent restore operations to proceed.
RESTORE DATABASE <db_name>
FROM DISK = '<bakfilefullpath>'
WITH NORECOVERY;
GO
-- Now restore the differential database backup, the second backup on
-- the same backup device.
RESTORE DATABASE <db_name>
FROM DISK = '<bakfilefullpath>'
WITH FILE = 2,
RECOVERY;
GO
  • Scope:
    • Raw DB
    • Configuration DB
    • DSM DB
    • OLAP Cubes
    • Analytic Temp DB
    • ArchiveMessageDB
    • CamIndexDB
    • Core DB
    • IDManagementDB
    • IDPayloadDB
    • Power BI
    • SharePoint DB
Reference: http://technet.microsoft.com/zh-cn/library/ms175510.aspx

Telemetry

The telemetry service provides an end-to-end solution that collects tracing, logging, and performance counter information from HBase and analyzes it through OLAP cubes. For details, refer to http://aka.ms/mscitynextbigdata.

Password Management

When an account password expires, the associated services are disabled. Password management, covering expiration alerting, password protection, password lifecycle management, and password strength enforcement, is mandatory for the accounts listed below:
  • All accounts in "Roles and Permissions"
  • Azure/Office 365 subscription accounts
  • Exchange Email accounts

Certificate Management

When a certificate expires, the associated services are disabled. Certificate management, covering expiration alerting, key protection, and lifecycle management, is mandatory for the certificates listed below:
  • Service Bus Certificate
  • Azure Certificate
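
Expiration alerting comes down to comparing each expiry date with a warning window. The sketch below shows that comparison only; how the expiry dates are obtained is left out, and the 30-day default window and the function name expiring_soon are assumptions, not values from this guide.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def expiring_soon(not_after_by_name: Dict[str, datetime], now: datetime,
                  warn_days: int = 30) -> List[str]:
    """Return the names whose expiry date falls within the warning window.

    Items that have already expired are included as well.
    """
    deadline = now + timedelta(days=warn_days)
    return sorted(
        name for name, not_after in not_after_by_name.items()
        if not_after <= deadline
    )
```

For example, with now set to 1 August 2014 (UTC), a certificate that expires on 20 August 2014 is reported, while one that expires on 1 January 2015 is not. The same check applies to password expiry dates.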

Encryption Key Management

The encryption key for the configuration service should be managed and refreshed periodically to ensure that the configurations of the solution accelerator are protected against malicious modification.
  • Configuration DB Encryption Key

Security Patching

Some necessary security patches for Microsoft products may cause a VM or service to restart, which can affect the availability of the solution accelerator. Security patching must be centrally managed and scheduled to avoid unexpected downtime.

Disaster Recovery

Disaster recovery is currently not supported for the solution accelerator.

Upgrade (TBD)

Rollback (TBD)

Daily Checklist

5 DAILY CHECKLIST.png

Monthly Checklist

6 MONTHLY CHECKLIST - Edited.png

Contact and Support

For any issue, please send an email to mscitynextbigdata_sp@microsoft.com.

Last edited Aug 1, 2014 at 9:51 AM by gheadd, version 7