mike-obrien.net Curriculum Vitae Blog Labs
Thursday, October 05, 2006

Some interesting peices on interviewing developers:

http://www.joelonsoftware.com/articles/fog0000000073.html

http://www.hanselman.com/blog/WhatGreatNETDevelopersOughtToKnowMoreNETInterviewQuestions.aspx

Among the many good points Joel makes, one that really struck me was on "illeagal questions" to ask an interviewee. I think some are pretty obvious but I think one really needs to be careful about questions asked in an attempt to take an interest in the person and/or be friendly. For example I love to learn about and understand different cultures, and most people love to talk about their culture and where they are from. So I usually have no qualms about engaging people in this type of conversation. But on an interview this is off limits since it could be taken, not as friendly conversation, but as discriminatory questions regarding race.The following FAQ covers these issues.

http://www.eeoc.gov/facts/qanda.html

Thursday, October 05, 2006 3:14:34 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

Joel Spolsky (Who runs Fog Creek Software) compiled a very insightful 10 item hit list on how to write better code. A lot of good points. I especially like the points about quiet working conditions and one step builds... Check it out:

http://www.joelonsoftware.com/articles/fog0000000043.html

Wednesday, October 04, 2006 11:30:57 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Wednesday, October 04, 2006

When making schema changes to a DB in SQL Server 2005 Management Studio you may receive the following error for long running operations:

"Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding."

Then you’ll probably go check out the execution timeout only to find that it’s set to be unlimited (As I did… ;-)). Then you’ll probably search the web to find why this doesn’t make a difference and find that there is timeout override hiding out under the “Designers” node (The LAST place I would have thought to look!). Now the problem is if you uncheck the timeout override it will not actually remove the override (Bug?). So you have to actually explicitly set the timeout.

Wednesday, October 04, 2006 3:21:57 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [6]  | 
Monday, September 25, 2006

I have been running into this problem lately where I try to create a VSS source control binding on a project that already has a binding (Albeit an incorrect one). When I try to bind the project I get this error:

"One or more of the project's destination paths are invalid. To open this solution, you must specify valid paths for each project. The error was "The Web at 'http://localhost/yada-yada-yada' already contains a project. You need to choose another location for your Web project.".

So I have found a solution in a number of blogs/forums that seems to work. Basically you need to remove the source control information for the project in question in the solution file. The lines will look like this:

SccProjectUniqueName1 = http://localhost/SolutionName/SolutionName.csproj
SccLocalPath1 = .
CanCheckoutShared = true
SccProjectEnlistmentChoice1 = 2

Then decrement the SccNumberOfProjects value by 1. Once this has been done, manually reset the binding and your golden!

Monday, September 25, 2006 4:24:36 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [1]  | 
Saturday, September 23, 2006

Recently I was looking up some information on enhancing query performance and came accross a new term; Sargable. A "contraction of Search ARGument" according the wikipedia. Basically sargability means that the DB server can take advantage of an index to speed up the query. WHERE clauses that arent sargable can cause a table scan because every value, in certain referenced column(s), in the entire table must be evaluated. One example would be MyValue=LTRIM(SomeTable.SomeValue). The result of the LTRIM is not indexed, therefore every value, in the SomeValue column, in the SomeTable table must be evaluated. If possible, you can rework a clause so that DB server can take advantage of an index. For example, instead of saying Year(ThisDate)=Year(GETDATE()) you could say ThisDate BETWEEN YearBegin(GETDATE()) AND YearEnd(GETDATE()). Where YearBegin and YearEnd are deterministic functions that return the first and last day of the year respectively. Or instead of saying WHERE DATEDIFF(day, SomeDateField, GETDATE()) > 7 you could say WHERE SomeDateField < DATEADD(day, -7, GETDATE()). In the first statement, every record must be evaluated in order to see if the condition is true. But in the second statement the current date is processed with the DATEADD function to give is a max date and the evaluation is done with this constant result (Since the DATEADD on GETDATE will only be run once when the query starts) which could then take advantage of an index. If the clause cannot be reworked to be sargable then you could create an indexed computed column.

Saturday, September 23, 2006 4:08:13 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

I have found it very useful to have date tables to perform date range queries. For example if you want to show total sales per month over the last 12 months. You can base your range off of the months over the last 12 months from your month date table. Doing a left join will allow you to return a record for each month even if there are no invoices for that month.

Date tables also give you an easy way of determining related values such as the next month begin date without having to calculate them on the fly. When you are dealing with many records, those dynamic calculations can have a definite performance impact.

The following SQL script creates a date table for the days, months and years between the start date and end date specified at the beginning of the script.

BuildDateTables.zip (.7 KB)

So in the previous example we could compile the sales summary as follows:

SELECT M.[Month], ISNULL(SUM(INV.Amount),0) AS TotalSales
FROM Months AS M LEFT OUTER JOIN Invoice AS INV ON INV.Date BETWEEN M.[Month] AND M.
NextMonthBegin
WHERE M.[Month] BETWEEN DATEADD(month, -12, GETDATE()) AND GETDATE
()
GROUP BY M.
[Month]
ORDER BY M.[Month]
DESC

Saturday, September 23, 2006 3:31:22 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [1]  | 

Sometimes when reading binary files, integers will be stored in the big endian format, especially if the file or the application creating it was initially built on a Sun or Mac machine. Intel systems normally store integers in little endian format. Thus the .NET framework does not have built in functionality to read in big endian integers (As of v1.1). One simple solution is to read the integer into a byte array and use the Array.Reverse function to simply reverse the order of the bytes. Then use the BitConverter class to turn the bytes into an integer. For example:

Dim Data As Byte()
Dim Value As Long

Redim Data(3) '--> 32 bit integer

'--> Big endian value of 360
'--> Equals Hx0168
Data(0) = 0 '--> Hx0 = 0, Most significant byte
Data(1) = 0 '--> Hx0 = 0
Data(2) = 1 '--> Hx01 = 1
Data(3) = 104 '--> Hx68 = 104, Least significant byte

Array.Reverse(Data)

'--> Now it looks like this, little endian style:
'--> Data(0) = 104, Most significant byte
'--> Data(1) = 1
'--> Data(2) = 0
'--> Data(3) = 0, Least significant byte

Value = BitConverter.ToInt32(Data, 0)

'--> Value = 360

You can read more about "endianness" here.

Saturday, September 23, 2006 2:11:05 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

Here is a little trick that will “strip” (In reality zero out) the date or time portion from a datetime value. Basically the SQL datetime is just two 4 byte integers. When you convert a datetime to a float the integer portion represents the date (Which is the number of days since 1/1/1900) and the fractional portion represents the time (The number of 3 millisecond increments that have elapsed since midnight). To zero out one or the other simply requires removing the integer or fractional portion then converting that back into a datetime. The following illustrates this:

DECLARE @DateTime float

SET @DateTime = CAST(GETDATE() AS FLOAT)

/* Date & Time */
SELECT CAST(@DateTime AS datetime) AS DateAndTime

/* Date only */
SELECT CAST(FLOOR(@DateTime) AS datetime) AS DateOnly

/* Time only */
SELECT CAST((@DateTime - FLOOR(@DateTime)) AS datetime) AS TimeOnly


I think this is much cleaner than the common method of converting the datetime to a varchar, truncating the date or time portion, then converting the varchar back to a datetime. You can read more about the datetime data type here in the BOL.

Saturday, September 23, 2006 1:56:45 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

The library found here allows you to obtain the physical coordinates of a zip code. This can be useful when dealing with weather or map related services that require a latitude and longitude as input but where the user would be providing a zip code. This small library (With a footprint of about 650k) contains the zip code to coordinate mapping as a binary resource embedded in the DLL, so no additional files or interaction with a web service is required. The data contained in the library was obtained through the US Census Bureau and is current as of the year 2000. If a zip code is not contained in the data the nearest match is returned.

Saturday, September 23, 2006 1:00:06 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Tuesday, September 19, 2006

Apparently you cannot set the IDENTITY flag for a column by simply using an ALTER TABLE/ALTER COLUMN in SQL Server 2000 (Not sure about 2005). I spent some time roaming around BOL and couldn’t seem to figure out how to do it. From what I have been reading Enterprise Manager has to jump through some hoops to accomplish this when you set the flag though the GUI.

Tuesday, September 19, 2006 4:26:36 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 

Here are some good SaaS blog's I have come across:

Gianpaolo's SaaS Blog: http://blogs.msdn.com/gianpaolo/

Fred Chong's SaaS Blog: http://blogs.msdn.com/fred_chong/

Brad's SaaS Blog: http://saasmaan.blogspot.com/

 

Tuesday, September 19, 2006 1:17:33 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Monday, September 11, 2006

When architecting a SaaS there are two ends of the spectrum when it comes to a data strategy; a shared multi-tenant schema where there is one database in which all tenants reside, then on the other end there is the isolated multi-tenant approach where each tenant has his/her own database in which schema can vary. In the whitepaper “Multi-Tenant Data Architecture”, the SaaS guys at MS describe the options as a “continuum” between these two sides (Which include a number of other important factors but I will focus on the data strategy in this entry). I think that’s a great way to describe the architecture options, especially since both ends of the spectrum can be mixed to provide an optimal solution. In my analysis of the options I don’t see either end by itself as being optimal. Let’s look at a few of the pros and cons (This list is by no means all inclusive!):

  1. Shared
    1. Pros
      1. Ability to update the schema in one place and affect all tenants.
      2. Only one instance to support, less personnel.
      3. Only requires one server (Unless vertically partitioning), less hardware.
      4. Excellent economy of scale!
    2. Cons
      1. Can only update the schema in one place and thereby affect all tenants. (This is both a pro and a con!)
      2. Difficulty restoring individual tenant data.
      3. Can only scale up.
  2. Isolated
    1. Pros
      1. Ability to update tenant schema selectively.
      2. Ability to easily restore individual tenant data.
      3. Ability to easily scale out.
    2. Cons
      1. Must update hundreds or thousands of instances to implement a schema change.
      2. Many instances to support on many servers, more personnel.
      3. Many servers to support, a lot of hardware.
      4. Poor economy of scale (Relative to Shared multi-tenant.)

Like I said, definitely not all inclusive but does represent some of the major pro’s and cons of each end of the spectrum. A little more about these issues:

  1. Schema updates – The shared approach is great when applying schema updates, once and you’re done! This is especially nice when a bug is discovered or when delivering a new version. With an isolated approach the schema must now be applied to hundreds or even thousands of instances! Ouch, no fun at all! On the other hand, what if you are releasing a new version but only want to expose a few clients to it, not all of them? For the most part (Depending on the types of changes made) this is not possible in a shared environment. Everybody gets the changes. And if it turns out that there are major issues with this version it will affect all clients.
  2. Software support – In a shared environment software support is very simple (Relatively speaking… ;-)). Schema changes are applied to one instance, there is only one version of the schema to support, bug fixes are relatively straight forward. This requires less people to support. The isolated approach on the other hand has hundreds or even thousands of database instances and schema variations to support. Bug fixes are not as straight forward since not all schemas might be affected by a particular bug. Developing new versions of the application can be tricky since there can be many variations to the schema which the application will have to account for. On the other hand an isolated approach does allow you the latitude to make custom changes to the schema and accommodate individual tenant needs.
  3. Hardware support – Shared environment, one server, one database (Unless vertically partitioning), a lot less hardware to support. Bad thing is, for the most part, you can only scale up which can seriously limit your ability to handle more tenants or resource hungry tenants. Isolated environment, many databases (Possibly hundreds or thousands) and servers to support. This means more hardware and more support personnel. BUT, you can scale out very easily. A resource hungry tenant can be put on his/her own server and as more tenants are added more hardware can be added allowing you to have more customers.
  4. Restores – Shared, the restored data must be merged in with existing data. This can be difficult, especially if the system was not designed, with the ability to do this, in mind from inception. Isolated is obviously very simple (For the most part) since it will more than likely just require a single database restore.

In an attempt to get all the pro’s and as few of the cons as possible, what if we used a hybrid approach. I will call it Shared Assignable. Basically the system and schema are architected from inception to be shared but support multiple databases which can contain one or more tenants. Tenants can be assigned to database instances through provisioning. This thought is described in the MS whitepaper mentioned previously under “Tenant-Based Horizontal Partitioning” as a scaling technique for a shared database (I would highly recomend reading the entire whitepaper!). Shared Assignable would have the following characteristics:

  1. Support multiple tenants in one database – Each database instance can support one or more tenants. This follows the shared tenant, single databse paradigm.
  2. Support multiple database instances – By means of meta-data, the system will support one or more database instances on one or more servers. Tenants are assigned to an existing database instance or to a new one during the provisioning process.
  3. Support multiple application instances – By means of meta-data a particular version of the application can be tied to a particular database instance. This enables us to run one version of the application for tenants in a particular database instance and another version for another instance. This is important for allowing limited release of a new version of the application.
  4. Support abstract data and schema identification – The data and schema will follow an identification pattern (Similar to implementing an interface on a class) that ties records and schema elements to an individual tenant. This allows the system to identify tenants and also to perform ETL operations on individual clients without being required to understand the full extent of the schema. For example, at a basic level, every schema element would begin with a “MyApp_” and every table will be required to have a “ClientId” column. In the event a tenant must be moved into another database instance for performance or size reasons, the ETL process can simply enumerate tables that begin with “MyApp_” and records that have a “ClientId” of 123. These can then be moved to the new instance without the ETL process having prior understanding of the schema other than the table prefix and “ClientId” column name. These ETL operations can be written once and not have to be updated to handle schema variances between instances or versions.

This approach allows you to group tenants much the same way you would group applications in application pools in IIS 6. For example let’s say you have 1000 tenants. The following factors determining their usage (Again, not all inclusive):

  1. Feature exploitation
  2. Number of concurrent users
  3. Usage time
  4. Data size

As such, they can be classified as follows:

  1. Lightweight – Exploit a small number of features, low number of concurrent users, use the system for a small amount of time each day and do not have much data. Data can reside in a shared environment.
  2. Middleweight Shared– Exploit many of the features, moderate number of concurrent users, use the system during business hours, moderate amount of data. Data can reside in a shared environment.
  3. Middleweight Isolated – Same as above but data cannot be shared because of privacy and/or performance and disk space reasons.
  4. Heavyweight – Exploit all features and drive new features, high number of concurrent users, heavy usage around the clock because of global staff, large quantity of data. Data can not be shared because of privacy and/or performance and disk space reasons.


With that classification let’s say that 800 of those tenants are lightweight, 195 are middleweights 145 of which can be in a shared database and 50 that cannot. And 5 are heavyweights. The isolated approach would produce 1000 separate instances. Since we are obviously “catching the long tail” as seen with the tenant numbers above, we would seriously be loosing economy of scale with the isolated approach. Plus since more than 80% of our clients have very low hardware requirements we would be throwing away those resources which are cutting into profit margins and could be focused on the heavyweights. Schema updates would be expensive as well. The shared approach would be challenging too, schema updates would be a cinch and we would have great economy of scale but our higher end clients could potentially scuttle our hardware as a result of their resource requirements. The Shared Assignable architecture however would in many ways give us the best of both worlds. The following illustrates how we could architect our SaaS data with the above scenario:

 

In this architecture we are appropriately distributing our resources so as to make the most of them. We are giving more to those who need it and less to those who don’t. We are also not creating a high maintenance environment for smaller tenants which do not require it. In this example we are spreading 60 databases across 8 servers. In a purely isolated environment we would be dealing with 1000 databases over many servers. In a strictly shared environment we are dealing with one server, one huge database. Both scenarios would be very difficult to maintain and scale.

When creating a new tenant our provisioning process will give us the ability to add the tenant to an existing database instance or create a new one.  This will allow us to judiciously place the tenant according to their needs and our architecture. Gianpaolo Carraro made an excellent observation in regards to service levels in this type of environment. By giving the provisioning process the ability to provision a tenant in a shared database or isolated database, on a shared server or isolated server we are enabling a pricing structure for the service. A silver package could be a shared instance, a gold package could be an isolated instance on a shared server and a platinum package could be an isolated instance on a standalone server. This allows you to offer a package that best suites the customer and recover costs for the services rendered and resources used.

Using meta-data we can also tie databases instances with a particular application instance. For example let’s say that when we make a major release we only want to release it to one instance in our middleweight pool. The meta-data can direct requests for tenants in that database instance to use the beta version of the application. All other tenants would use the stable version. This mapping would only be available per database instance since the schema is tied to a particular application version.

Lastly, it is key that tenant data be easily movable thus allowing it to be "Assignable". It is inevitable that we will want to move tenants around as needs change and as the tenant base grows. This can be easily achieved by implementing an abstract identification schema on all database objects owned by a tenant. One way to do this is define object prefixes. For example all tables in our app could be prefixed with a “MyApp_Client_”. This could tell any ETL process in our system that this is a “My App” table that contains client records. Secondly we will need to include a client identification column in all client tables, for example “ClientId. Obviously this will be present by virtue of the fact that we are using a shared schema but it will need to follow a set convention in all tables. In other words it needs to have the same column name and data type across all client tables. In an ETL process that moves a tenant for example, we could provide the tenant we want to move and the server/database we want to move it too. The ETL process could then grab all tables that begin with “MyApp_Client_” in the source database and all the records in those tables that have a ClientId of xyz. Our ETL process could then move these records into the target database. In the case where the system allow additional columns to be added to a table, the ETL process can dynamically detect these from the source table and add them to the target table. It is also noteworthy that the base data schema be metadata driven to allow the provisioning process to create new databases either for a new tenant or one that is moved into a new database. Plus it is required for schema upgrades. But that’s a whole other can of worms… ;-)

Monday, September 11, 2006 8:30:17 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [4]  | 
Thursday, September 07, 2006

Never realized this but in order to implement one or more interfaces on an interface in VB.NET you use the Inherits keyword, not the Implements keyword...

Namespace LineChart

   Public Interface IRange

      Inherits IEnumerable
      Inherits IDisposable

      ReadOnly Property Count() As Integer
      ReadOnly Property Items() As RangeItem()

   End Interface

End Namespace

Thursday, September 07, 2006 4:32:57 PM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  | 
Sunday, September 03, 2006

The following function allows you to obtain (x,y) coordinates within a 3d perspective pie slice by specifying the pie dimensions, the radius (Distance from the pie origin out) and the inner angle (Degrees from the pie start angle, clockwise). The first step is to inverse our angle since our trig functions assume that degrees increase counter-clockwise from the x axis. The .NET pie function works in the opposite way, with degrees increasing clockwise from the x axis. Next we convert the degrees into radians. Then calulate the absolute coordinates of the pie origin so we can transform the relative corrdinates returned by the trig functions. Next we do the math; the the cosine of the target radian multipled by the radius gives us the X coordinate and the sine of the same gives us the Y coordinate. We need to inverse the Y coordinate since the trig functions assume a grid with an ascending Y axis above the X axis and Windows is the oppisite; ascending Y axis below the X axis. Note: This method does not work for the default pie behavior. This method only works when angles are transformed to a 3d perspective. 

The Get3dPiePoint and support functions:

Private Function Get3dPiePoint(ByVal Bounds As System.Drawing.Rectangle, _
   ByVal Angle As Single, _
   ByVal SweepAngle As Single, _
   ByVal XRadius As Integer, ByVal YRadius As Integer, _
   ByVal InnerAngle As Single) As Point

   Dim MidRadian As Single = ToRadian(GetInverseAngle(Angle, InnerAngle))
   Dim TransformX As Integer = Bounds.X + (Bounds.Width / 2)
   Dim TransformY As Integer = Bounds.Y + (Bounds.Height / 2)

   Return New Point((Math.Cos(MidRadian) * XRadius) + TransformX, (Math.Sin(MidRadian) * YRadius * -1) + TransformY)

End Function

Private Function GetInverseAngle(ByVal StartAngle As Single, _
   ByVal InnerAngle As Single) As Single

   Return 360 - (StartAngle + InnerAngle)

End Function

Private Function ToRadian(ByVal Angle As Single) As Single

   Return (Math.PI * Angle) / 180

End Function

The following code snippet demonstrates how to draw a text label in the center of a 3d perspective pie slice.

Private Sub Form1_Paint(ByVal sender As Object, _
   ByVal e As System.Windows.Forms.PaintEventArgs) Handles Me.Paint

   Dim PieBounds As New Rectangle(20, 20, 400, 100)
   Dim PieAngle As Single = 150
   Dim PieSweepAngle As Single = 100
   Dim PieText As String = "Strongbadia (28%)"
   Dim PieTextFont As New Font("Arial", 10, FontStyle.Bold)
   Dim PieTextPoint As Point
   Dim PieTextXRadius As Single = PieBounds.Width / 4
   Dim PieTextYRadius As Single = PieBounds.Height / 4
   Dim PieTextAngle As Single = PieSweepAngle / 2
   Dim PieTextBounds As SizeF

   '--> Grab the centerline of the pie slice (Sweep angle div 2)
   '--> and half the radius (Diameter div 4)
   PieTextPoint = Get3dPiePoint(PieBounds, PieAngle, PieSweepAngle, PieTextXRadius, PieTextYRadius, PieTextAngle)

   '--> Grab the dimensions of the text
   PieTextBounds = e.Graphics.MeasureString(PieText, PieTextFont)

   '--> Determine the 3d angles
   PieSweepAngle = To3dSweepAngle(PieBounds, PieAngle, PieSweepAngle)
   PieAngle = To3dAngle(PieBounds, PieAngle)

   '--> Draw the pie
   e.Graphics.FillPie(Brushes.SteelBlue, PieBounds, PieAngle, PieSweepAngle)

   '--> Center the text over the mid point
   PieTextPoint.X -= PieTextBounds.Width / 2
   PieTextPoint.Y -= PieTextBounds.Height / 2

   '--> Draw the text
   e.Graphics.DrawString(PieText, PieTextFont, Brushes.White, PieTextPoint)

End Sub

Private Function To3dSweepAngle(ByVal Bounds As Rectangle, _
   ByVal Angle As Single, _
   ByVal SweepAngle As Single) As Single

   If SweepAngle Mod 180 <> 0 Then

      Dim Angle3d As Single = To3dAngle(Bounds, Angle)
      SweepAngle = To3dAngle(Bounds, Angle + SweepAngle) - Angle3d

   End If

   If SweepAngle < 0 Then

      SweepAngle += 360

   End If

   Return SweepAngle

End Function

Private Function To3dAngle(ByVal Bounds As System.Drawing.Rectangle, _
   ByVal Angle As Single) As Single

   Dim Radians As Single = ToRadian(Angle)
   Dim X As Double = Bounds.Width * Math.Cos(Radians)
   Dim Y As Double = Bounds.Height * Math.Sin(Radians)
   Dim Angle3D As Single = Math.Atan2(Y, X) * 180 / Math.PI

   If Angle3D < 0 Then

      Return Angle3D + 360

   Else

      Return Angle3D

   End If

End Function

 

Sunday, September 03, 2006 6:21:53 AM (GMT Standard Time, UTC+00:00)  #   |  Comments [0]  |